Session Management with CGI::Session
The Perl Journal January 2003
By Sherzod Ruzmetov
Sherzod is an undergraduate student at Central Michigan University, where he studies Marketing. He is the author of several CPAN libraries, including CGI::Session.
HyperText Transfer Protocol (HTTP) is stateless. Each request to a web server is treated as brand new, and no state information from previous requests carries over. This lack of persistence makes applications such as shopping carts and login-authentication routines a challenge to build. Session management on the Web is a system designed to remedy the statelessness of HTTP.
Users of other web technologies, such as PHP and JSP, enjoy the luxury of built-in, reliable session-management packages, leaving Perl programmers to reinvent the wheel. In this article, I'll go over the problems involved, and introduce a relatively new Perl 5 library, CGI::Session, which provides Perl programmers with all the tools required for managing user sessions on the Web.
Persistence
Persistence is a way of relating successive requests to a web site to one another. For example, when we add a product to a shopping cart, we expect it to stay there until it is time to check out. Managing user sessions also requires us to keep in mind that a web site can have more than one shopper at a time, and that each one needs a private cart.
CGI::Session is a library that manages these issues in a very reliable and friendly way, providing persistence to your web applications even though the Web wasn't designed for it.
Persistence can also be achieved without implementing a complex server-side system: query strings, path_info, and cookies can each carry state information from one click to the next. We use them in our examples to transport the session ID. However, they have limitations that prevent them from being used on a larger scale. The data must travel with every request, it is visible to (and editable by) the user, and cookies in particular are limited in size. For more information, please refer to the "References" section at the end of the article.
Overview
Here are the highlights of the session-management process:
1. When a new visitor retrieves your page for the first time, your application creates a unique ID for that user, and creates storage somewhere on the server-side system associated with that ID.
2. After creating the ID, we need to "mark" the visitor with it. We do this either by sending the ID to his/her computer as a cookie, or by appending it to each dynamic URL as a query string. Either way, subsequent requests to the site carry the already-created ID back.
3. If the user performs an action that needs to be remembered later (for example, adding an item to a virtual cart), we store the related information on the server, in the storage device (for example, a file) created in step 1.
4. To retrieve previously stored session data, we match the session ID from the user's cookie and/or query string parameter against the names of the stored devices. If one matches, we initialize the session data from that storage device (file).
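To make the four steps concrete, here is a minimal sketch of the same flow without CGI::Session, using only core Perl modules. Several details are hypothetical choices for this illustration only:

- the storage directory
- the sess_ file-name prefix
- the SID cookie name
- the subroutine names

Step 2 is shown only as the header line that would be sent. Note that rand() is not a cryptographically strong source, so a production ID generator should use a stronger one.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Storable qw(nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical private storage directory for this sketch.
my $dir = tempdir(CLEANUP => 1);

sub session_file {
    my ($sid) = @_;
    return File::Spec->catfile($dir, "sess_$sid");
}

# Step 1: create a hard-to-guess ID and empty storage tied to it.
sub create_session {
    my $sid = md5_hex(time(), $$, rand());
    nstore({}, session_file($sid));
    return $sid;
}

# Step 2: "mark" the visitor, here as a Set-Cookie header line.
sub cookie_header {
    my ($sid) = @_;
    return "Set-Cookie: SID=$sid; path=/";
}

# Step 3: remember something on the server side under that ID.
sub save_session {
    my ($sid, $data) = @_;
    nstore($data, session_file($sid));
}

# Step 4: accept a claimed ID only if it is well formed and its storage exists.
sub load_session {
    my ($sid) = @_;
    return undef unless defined $sid && $sid =~ /\A[0-9a-f]{32}\z/;
    my $file = session_file($sid);
    return -e $file ? retrieve($file) : undef;
}

my $sid = create_session();
print cookie_header($sid), "\n";
save_session($sid, { cart => ['perl-book'] });
my $data = load_session($sid);
print "cart: @{ $data->{cart} }\n";
```

Rejecting IDs that are malformed or have no storage behind them is the sketch's version of step 4's "if they match" test.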
This is a fairly complex system to manage, but CGI::Session can manage this for you transparently.
Syntax
The syntax of the library is very similar to that of CGI.pm, except for the object initialization part. This is done intentionally to reduce the time spent learning the library syntax:
$session = new CGI::Session(DSN, SID, DSN_OPTIONS);
The rest of the syntax (adding data to the session file, retrieving it, and updating it) should be very familiar to anyone who has used CGI.pm; see Example 1.
Object Initialization
Object creation is crucial. new(), the CGI::Session constructor, requires three arguments.
The first is the DSN, which is a set of key-value pairs. The DSN mainly tells the library which driver to use. Consider the following syntax:
$session = new CGI::Session("driver:File", undef, {Directory=>"/tmp"});
$session = new CGI::Session("driver:MySQL", undef, {Handle=>$dbh});
$session = new CGI::Session("driver:DB_File", undef, {Directory=>"/tmp", FileName=>"sessions.db"});
You can also pass undef instead of the DSN to force default settings for driver:File.
The second argument is either the session ID to be initialized or undef, which forces the library to create a new session for the user. A new session is also created if a claimed session ID turns out to be invalid or expired. Instead of a session ID, we can also pass an instance of the CGI object; in this case, the library tries to retrieve the session ID from either a cookie or a query string.
The third argument must be a hash reference; it is used solely by the DSN components (such as the driver). Consult the library manuals for details.
Most of the time, the following syntax should suffice to provide your programs with a session object:
$cgi = new CGI;
$session = new CGI::Session(undef, $cgi, {Directory=>"/tmp"});
Here we created an instance of the CGI object and gave it to CGI::Session. We also told the library where the session files are to be stored: Directory=>"/tmp".
You should then send the newly generated session ID back to the user either as a cookie or appended to the document's links as a query string:
my $cookie = $cgi->cookie( "CGISESSID", $session->id );
print $cgi->header(-cookie=>$cookie);
Notice we're creating a cookie object using CGI.pm's cookie() method, and sending it as part of the HTTP header. CGI::Session by default expects the name of the cookie and the CGI parameter holding the session ID to be CGISESSID. You can change this setting if needed.
The latest releases of CGI::Session provide their own header() method, which condenses the aforementioned two lines of code into one:
print $session->header();
Storing Data in the Session
Data is stored in the file in the form of key/value pairs, where keys are required to be strings or variables that resolve into strings, and values can be any arbitrary Perl data structure including references to scalars, arrays, hashes, and even objects. In the following example, we store the user's profile into the PROFILE session parameter in the form of a hashref:
my $profile = $dbh->selectrow_hashref(qq|SELECT * FROM profile WHERE login=? AND psswd=PASSWORD(?)|, undef, $login, $password);
$session->param( PROFILE => $profile );
You can also store values straight from the CGI object's parameters, either all of them or selectively. This makes it easier to implement complex HTML forms, such as advanced search forms, so that they retain their previously submitted data for later use:
# stores all the parameters available through
# the $cgi object's param() method
$session->save_param( $cgi );
# or, to store $cgi parameters selectively:
$session->save_param( $cgi, ["_cmd", "query", "sort_by", "sort_type"] );
To fill in complex HTML forms with the data stored in the session, you should simply load the session data into the CGI.pm object:
# loads all the session parameters into the $cgi object
$session->load_param( $cgi );
# or, to load session parameters selectively:
$session->load_param( $cgi, ["_cmd", "query", "sort_by", "sort_type"] );
This comes in handy when outputting HTML drop-down menus, checkbox groups, and radio buttons with previously submitted form data:
$session->load_param( $cgi, ["words"] );
print $cgi->checkbox_group("words", ["eenie","meenie","minie","moe"]);
This example loads the words parameter from the session object into the $cgi object, if it exists. Then, when a group of checkboxes is printed with CGI.pm's checkbox_group() method, all the previously selected (and saved) checkboxes will be prechecked. The same applies to all the other HTML form elements (except file-upload fields) generated with standard CGI.pm.
Reading Stored Data
CGI::Session allows us to access previously stored data via the same param() method that we used to store them.
my $first_name = $session->param("first_name");
my $email = $session->param("email");
print qq~<a href="mailto:$email">$first_name</a><br />~;
load_param() also allows one to access the session data, but this time via the CGI.pm object.
CGI::Session can also be associated with HTML::Template, a module that enables the separation of logic and presentation:
my $tmpl = new HTML::Template(filename => "some.tmpl",
                              associate => $session,
                              die_on_bad_params => 0);
print $tmpl->output();
Now inside your "some.tmpl" template file, you can access data stored in the session object like so:
Hi <a href="mailto:<TMPL_VAR email>"> <TMPL_VAR first_name></a>!
Properly saving data structures in the session enables you to create complex loops, such as shopping cart contents, in your template files with minimal coding, as shown in Example 2.
Clearing Session Data
You will often want to delete certain data from the session. For instance, when the user clicks the "sign out" link, you want to delete the "logged-in" flag. Another common case is a login-authentication form that "locks" the user's session after three unsuccessful attempts. As soon as the user logs in successfully, that failure counter must be deleted. That's where the clear() method comes in:
# clears all the session data ( ouch! )
$session->clear();
# or, to clear certain session data:
$session->clear(["logged-in", "login_failures"]);
Consider the following example, which I use to keep track of consecutive login failures and lock the user's browsing session:
# somewhere at the top of your script:
if ( ($session->param("login_failures") || 0) >= 3 ) {
    print error_session_locked();
    exit(0);
}
authenticate($cgi, $dbh, $session);
The authenticate() function is shown in Example 3.
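Example 3 contains the actual authenticate() function. As a rough, hypothetical illustration of the counter logic alone, the sketch below makes a few substitutions:

- a plain hash stands in for the session object's param() storage;
- a hard-coded check stands in for the database lookup;
- check_password() and record_login() are names made up for this sketch.

```perl
use strict;
use warnings;

# A plain hash stands in for the session's param() storage.
my %session;

# Hypothetical credential check; a real one would query the database.
sub check_password {
    my ($login, $password) = @_;
    return $login eq 'alice' && $password eq 'secret';
}

# Returns 'locked', 'ok', or 'failed', maintaining a login_failures
# counter the way the session parameter above is meant to be used.
sub record_login {
    my ($sess, $login, $password) = @_;
    return 'locked' if ($sess->{login_failures} || 0) >= 3;
    if (check_password($login, $password)) {
        delete $sess->{login_failures};    # like clear(["login_failures"])
        $sess->{'logged-in'} = 1;
        return 'ok';
    }
    $sess->{login_failures}++;
    return 'failed';
}

print record_login(\%session, 'alice', 'wrong'), "\n" for 1 .. 3;
print record_login(\%session, 'alice', 'secret'), "\n";   # already locked
```

The lock check comes before the password check, so once the counter reaches three, even correct credentials are refused until the session is cleared or expires.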
Deleting the Session
Deleting the session is somewhat different from clear()ing it. clear() deletes certain session parameters but keeps the session open, whereas deleting a session makes sure that all the information, including the session file itself, is gone from the disk. You will want to call delete() mainly for expired sessions, which will no longer be of any use.
Expiring
CGI::Session also provides a limited means to expire session data. Expiring a session is the same as deleting it via delete(), but deletion takes place automatically. To expire a session, you need to tell the library how long the session will be valid after the last access time. After that time, CGI::Session refuses to retrieve the session. It deletes the session and returns a brand new one. To assign an expiration ticker for a session, use the expire() method:
$session->expire(3600);     # expire after 3600 seconds
$session->expire('+1h');    # expire after 1 hour
$session->expire('+15m');   # expire after 15 minutes
$session->expire('+1M');    # expire after a month
# and so on.
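To show how such relative offsets reduce to seconds, here is a small converter plus an idle-time check. The unit letters follow the examples above (s, m, h, M), with d, w, and y added. The exact multipliers, in particular treating M as 30 days, are assumptions for this sketch rather than a copy of CGI::Session's own table.

```perl
use strict;
use warnings;

# Seconds per unit; M as 30 days and y as 365 days are assumptions.
my %SECONDS_PER = (
    s => 1,
    m => 60,
    h => 3600,
    d => 86400,
    w => 604800,
    M => 2592000,
    y => 31536000,
);

# Accepts plain seconds (3600) or a '+<number><unit>' string ('+15m').
sub interval_to_seconds {
    my ($spec) = @_;
    return $spec if $spec =~ /\A\d+\z/;
    my ($n, $unit) = $spec =~ /\A\+?(\d+)([smhdwMy])\z/
        or die "Unrecognized interval: $spec\n";
    return $n * $SECONDS_PER{$unit};
}

# A session idle longer than its interval (since last access) is expired.
sub is_expired {
    my ($last_access, $spec, $now) = @_;
    return $last_access + interval_to_seconds($spec) < $now;
}

printf "%-6s => %d seconds\n", $_, interval_to_seconds($_)
    for 3600, '+1h', '+15m', '+1M';
```

Measuring from the last access time, rather than the creation time, is what makes this an idle timeout.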
Sometimes it makes more sense to expire a single session parameter than the whole session. I usually do this on sites with login authentication: after the user logs in successfully, I set a _logged_in flag to a True value and put a 30-minute expiration on that flag. After 30 idle minutes, CGI::Session clear()s the _logged_in flag, indicating the user should log in again. Expiring the session itself would achieve the same effect, but then we would also lose other session parameters, such as the user's shopping cart, session preferences, and the like.
This feature can also be used to simulate layered security/authentication. For instance, you can keep the user's access to his/her personal profile information for as long as 10 idle hours after successful login, but expire access to that user's credit-card information after 10 idle minutes. To achieve this effect, we will use the expire() method again, but with a slightly different syntax:
$session->expire('_profile_access', '+10h');
$session->expire('_cc_access', '+10m');
With this syntax, the user would still have access to personal information after, say, five idle hours, but would have to log in again to access or update credit-card information.
Remember that time intervals given to expire() are relative to a session's last access time, and expirations are carried out before modifying this time in the session's metatable.
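The ordering in that last sentence matters. Expirations are checked against the previous access time, and only then is the access time updated. The sketch below models this with a hypothetical in-memory record holding three things:

- the last access time;
- the parameters;
- per-parameter lifetimes in seconds.

It is an illustration of the rule, not CGI::Session's actual metadata layout.

```perl
use strict;
use warnings;

# Hypothetical record: last access time, parameters, per-parameter TTLs.
sub new_record {
    my ($now) = @_;
    return { atime => $now, data => {}, ttl => {} };
}

sub set_param {
    my ($rec, $name, $value, $ttl) = @_;
    $rec->{data}{$name} = $value;
    $rec->{ttl}{$name} = $ttl if defined $ttl;
}

# Called on each request: drop parameters idle past their TTL,
# measured from the *previous* access, then record the new access.
sub on_access {
    my ($rec, $now) = @_;
    for my $name (keys %{ $rec->{ttl} }) {
        next unless $rec->{atime} + $rec->{ttl}{$name} < $now;
        delete $rec->{data}{$name};
        delete $rec->{ttl}{$name};
    }
    $rec->{atime} = $now;
    return $rec;
}

my $rec = new_record(0);
set_param($rec, _profile_access => 1, 10 * 3600);   # '+10h'
set_param($rec, _cc_access      => 1, 10 * 60);     # '+10m'
on_access($rec, 5 * 3600);                          # five idle hours later
print join(', ', sort keys %{ $rec->{data} }), "\n";
```

After five idle hours, only _profile_access survives, matching the layered-access scenario described above.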
Although expire() is quite handy, it cannot solve every problem related to expiring session data in the real world. Some expired sessions are never requested again, so your program never gets the chance to notice and remove them. The only solution currently available is to clean them up with a script, run either manually or from cron. For this purpose, CGI::Session provides the touch() method, which initializes a session without modifying its last access time. That is the minimum required to trigger automatic expiration:
use strict;
use warnings;
use CGI::Session;
use IO::Dir;

# Walk the session directory; touch() initializes each session without
# updating its last access time, which lets CGI::Session expire it if due.
tie my %dir, "IO::Dir", "/tmp";
while ( my ($filename, $stat) = each %dir ) {
    my ($sid) = $filename =~ m/^cgisess_(\w{32})/ or next;
    CGI::Session->touch(undef, $sid, {Directory=>"/tmp"});
}
untie(%dir);
You can treat touch() as an alternative to the new() constructor, so it expects the same set of arguments as new() does.
There are, however, several proposed solutions to let CGI::Session deal with orphan session data gracefully. One is to implement a master session table, the purpose of which is to keep track of all the session data in a specific location. Another, better solution is to implement a session service, to which CGI::Session would connect each time to create and/or initialize a session. We will be working on implementing these solutions in subsequent releases.
Security
With a server-side session-management mechanism, persistence does not require sensitive information, such as users' logins and passwords, to travel across the Internet at each mouse click or form submission. These are transmitted only the first time a user logs in. After that, only a session identifier is transmitted. The identifier looks something like 9f2b14b2008b9885abb07a30a09bab9c, and it should make no sense to anyone except the library implementing the system (CGI::Session, in this case). We no longer need to embed logins and passwords in hidden form fields, which get cached by browsers and can be read by viewing the page source. Nor do we need to append them to the URL as a query string, which tends to get recorded in the server's access logs. And we no longer need to plant sensitive data in the user's cookie files in plain text.
But there are several issues that we need to be aware of to prevent unpleasant surprises.
Storage
Although we no longer need to store data in the user's cookie file, we still store it somewhere on the server side. That does not improve security at all unless we take the proper precautions:
- If you use MySQL or a similar RDBMS driver, protect the data tables with a login/password pair to (one hopes) keep prying eyes out. If you use the File or DB_File drivers, you need to make the extra effort of setting proper file permissions to hide the data from curious eyes.
- If you can go without having to store very sensitive data in the session, that's even better.
- If a session is likely to have sensitive information at some point, reduce the lifecycle of such sessions by specifying a shorter expiration period:
$session->expire("+30m");
- For sensitive sessions, delete() is always preferred over calling clear() on certain parameters. Read on for details.
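For the File and DB_File drivers, one way to act on the first point is to keep session files in a directory that only the web server's user can read. Below is a minimal sketch with core Perl, assuming the script runs as that user. The directory and file names are hypothetical. Also note that /tmp, used in the examples above, is shared by every account on the machine.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical location; a real setup would use a fixed, private path.
my $base = tempdir(CLEANUP => 1);
my $session_dir = File::Spec->catdir($base, 'sessions');

# Owner-only directory; chmod because mkdir's mode is filtered by the umask.
mkdir $session_dir, 0700 or die "mkdir $session_dir: $!";
chmod 0700, $session_dir;

# Files created while umask is 077 get mode 0600 (owner read/write only).
my $old_umask = umask 077;
my $file = File::Spec->catfile($session_dir, 'cgisess_example');
open my $fh, '>', $file or die "open $file: $!";
print {$fh} "session data\n";
close $fh or die "close $file: $!";
umask $old_umask;

printf "dir %04o, file %04o\n",
    (stat $session_dir)[2] & 07777, (stat $file)[2] & 07777;
```

Setting the umask around file creation, rather than chmod-ing afterwards, avoids a window in which the file is briefly readable by others.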
Session Identifiers
Suppose you are shopping at a web site where you're currently logged in to your profile. If someone correctly guesses your session ID, can that person now trick the site and appear to that site as you?
Yes. Consider the scenario where the session object is initialized like so:
$claimed_id = $cgi->param("SID") || undef;
$session = new CGI::Session(undef, $claimed_id, {Directory=>"/tmp"});
A person can simply append your session ID to the URL of that site (for example, ?SID=12345). If that ID is valid, the program will initialize that session and instantly give that person access to all of your profile information. Ouch! What an accident waiting to happen! We could take the session ID from a cookie instead of the query string, but it is still quite possible to make a browser send a forged cookie to the server.
This scenario raises another question: are the IDs guessable? By default, CGI::Session generates random 32-character strings using the MD5 digest algorithm, which makes it practically impossible to anticipate subsequent IDs.
Even if someone does obtain an ID, we should make it nearly impossible for them to use it to impersonate the user. For this purpose, CGI::Session supports an -ip_match switch:
use CGI::Session qw/-ip_match/;
$session = new CGI::Session(undef, $cgi, {Directory=>"/tmp"});
If this switch is turned on, CGI::Session refuses to retrieve any session if the IP address of the person who created the session doesn't match that of the person asking for it. The same effect can be achieved by setting $CGI::Session::IP_MATCH to a True value.
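Conceptually, the check remembers the creator's address when the session is made and compares it on every later request. The sketch below is a hypothetical illustration of that idea, not CGI::Session's code. The _creator_ip field name is made up. In a real CGI script, the address would come from the standard REMOTE_ADDR environment variable.

```perl
use strict;
use warnings;

# Record the creator's address alongside the session data.
# In a CGI script this would be called with $ENV{REMOTE_ADDR}.
sub create_record {
    my ($remote_addr) = @_;
    return { _creator_ip => $remote_addr, data => {} };
}

# Refuse (return undef) when the requester's address differs.
sub retrieve_if_ip_matches {
    my ($record, $remote_addr) = @_;
    return undef unless defined $remote_addr;
    return $record->{_creator_ip} eq $remote_addr ? $record : undef;
}

my $rec   = create_record('192.0.2.10');
my $same  = retrieve_if_ip_matches($rec, '192.0.2.10');
my $other = retrieve_if_ip_matches($rec, '198.51.100.7');
print defined $same  ? "same address: allowed\n"  : "same address: refused\n";
print defined $other ? "other address: allowed\n" : "other address: refused\n";
```

The trade-off is that a legitimate user whose address changes mid-session (for example, behind a pool of proxies) is also refused and gets a new session.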
Driver Specification
CGI::Session uses drivers to access the storage device for reading and writing the session data. At this point, CGI::Session supplies File, DB_File, and MySQL drivers for storing session data in plain files, Berkeley DB files, and MySQL tables, respectively. The corresponding driver modules are CGI::Session::File, CGI::Session::DB_File, and CGI::Session::MySQL.
These three drivers are enough most of the time. If they aren't, CGI::Session lets you write your own driver, for example, to store session data in MS Access or PostgreSQL tables.
What Is a Driver?
A driver is another Perl5 library that simply extends (inherits from) CGI::Session and implements certain features (methods):
retrieve() is called when an object is being created by passing a session ID or a CGI object instance as the second argument. It takes three arguments: $self, a CGI::Session object itself; $sid, the currently effective session ID; and $options, an arrayref that holds the second and third arguments passed to new(). retrieve() must return deserialized session data as a Perl data structure (hashref). On failure, it should log the error message in $self->error("error message"), and return undef.
store() stores session data on disk. It takes four arguments: $self, $sid, $options, and $data. $data is session data in the form of hashref. store() should return any True value indicating success. On failure, it should log the error message in $self->error("error message"), and return undef.
teardown() is called just before the session object is destroyed, and what it does is up to the driver. For example, the MySQL driver would close the database connection if it opened one, the DB_File driver would get rid of lock files, and the File driver would simply close all its open file handles. It takes three arguments: $self, $sid, and $options. teardown() should return True, indicating success. On failure, it should log the error message in $self->error("error message"), and return undef. If the error isn't crucial to the persistence and validity of session data, the failure should be ignored and a True value returned.
remove() implements the delete() method. It takes three arguments: $self, $sid, and $options. It should return True, indicating success. On failure, it should log the error message in $self->error("error message"), and return undef.
In addition, the driver needs to provide a generate_id() method, which returns an ID for a new session when one is needed. For the sake of consistency, the CGI::Session distribution comes with several ID generators you can inherit from instead of writing your own, including CGI::Session::ID::MD5 and CGI::Session::ID::Incr.
Serialization
$data, which your driver's store() method receives as its fourth argument, is a hash reference. Its values can hold almost any Perl data structure, from simple strings to references to hashes and arrays, and even objects. You cannot simply save such a structure in a file (or anywhere else) without first converting it into a string or a stream of bytes. This process is called "serialization of data." To recreate the data structure, your retrieve() method must deserialize it accordingly.
You can use your own serializing engines if you wish, but the CGI::Session distribution comes with three different serializers you can simply inherit from: CGI::Session::Serialize::Default, CGI::Session::Serialize::Storable, and CGI::Session::Serialize::FreezeThaw. This makes it easy to serialize the data. You can simply call freeze() and store its return value on disk, and call thaw() to deserialize and return it from within your retrieve() method.
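The sketch below uses Storable directly to show what serialization buys. It turns a nested structure, including an object, into a byte string, writes it to a file, and rebuilds it. It calls Storable's own nfreeze() and thaw(), not the CGI::Session serializer classes, so check those classes' manuals for their exact interface. The Cart class is hypothetical.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use File::Temp qw(tempfile);

# A tiny hypothetical class, to show that blessed objects survive.
package Cart;
sub new   { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub count { return scalar @{ $_[0]{items} } }

package main;

my $data = {
    first_name => 'Ann',
    prefs      => { sort_by => 'price', words => ['eenie', 'moe'] },
    cart       => Cart->new('book', 'mug'),
};

# Serialize: nested structure -> byte string (nfreeze uses network byte order).
my $frozen = nfreeze($data);

# Write it to disk and read it back, as a file-based driver might.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} $frozen;
close $fh or die "close: $!";

open my $in, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

# Deserialize: byte string -> an independent copy of the structure.
my $copy = thaw($bytes);
printf "%s has %d items; sort_by=%s\n",
    ref $copy->{cart}, $copy->{cart}->count, $copy->{prefs}{sort_by};
```

The thawed copy is a new structure, equal in content to the original, and its cart is still a Cart object with working methods.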
For namespace consistency, all the drivers should belong to CGI::Session::*, serializers to CGI::Session::Serialize::*, and ID generators to CGI::Session::ID::*.
For driver authors, the CGI::Session distribution includes BluePrint.pm, which can be used as a starting point for any driver. Just fill in the blanks.
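Putting the method list together, the skeleton below shows the general shape of a driver that keeps sessions in memory. It is a hypothetical, standalone sketch. A real driver would live under CGI::Session::*, inherit from CGI::Session (including its new() and error() methods), and start from BluePrint.pm. Here, new() and error() are defined locally only so the package runs on its own. The package name and the in-memory %STORE are also assumptions, and so is the use of Storable for serialization. The argument lists follow the descriptions above.

```perl
use strict;
use warnings;

package My::SessionDriver;    # hypothetical; real drivers use CGI::Session::*

use Digest::MD5 qw(md5_hex);
use Storable qw(freeze thaw);

my %STORE;    # sid => serialized data; stands in for a file or table

# Real drivers inherit new() from CGI::Session; this one exists to run alone.
sub new { my ($class) = @_; return bless { error => undef }, $class }

# Stand-in for the error() method a real driver inherits.
sub error {
    my ($self, $msg) = @_;
    $self->{error} = $msg if defined $msg;
    return $self->{error};
}

sub generate_id { return md5_hex(time(), $$, rand()) }

# retrieve($self, $sid, $options): deserialized hashref, or undef on failure.
sub retrieve {
    my ($self, $sid, $options) = @_;
    my $frozen = $STORE{$sid};
    unless (defined $frozen) {
        $self->error("no such session: $sid");
        return undef;
    }
    return thaw($frozen);
}

# store($self, $sid, $options, $data): True on success.
sub store {
    my ($self, $sid, $options, $data) = @_;
    $STORE{$sid} = freeze($data);
    return 1;
}

# remove($self, $sid, $options): backs delete().
sub remove {
    my ($self, $sid, $options) = @_;
    unless (exists $STORE{$sid}) {
        $self->error("cannot remove unknown session: $sid");
        return undef;
    }
    delete $STORE{$sid};
    return 1;
}

# teardown($self, $sid, $options): nothing to close for an in-memory store.
sub teardown { return 1 }

package main;

my $driver = My::SessionDriver->new;
my $sid    = $driver->generate_id;
$driver->store($sid, [], { first_name => 'Ann' });
print $driver->retrieve($sid, [])->{first_name}, "\n";
```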
For More Information
To learn more about CGI::Session, check out its online documentation and the manuals for the drivers you intend to use. Perhaps you have a tough problem that you think CGI::Session could solve, or you want to participate in new releases of the module. Either way, join the CGI::Session mailing list at http://ultracgis.com/mailman/listinfo/cgi-session_ultracgis.com/.
References
- CGI::Session, Apache::Session, and CGI modules. (All available through CPAN mirrors.)
- RFC 2965, HTTP State Management Mechanism (ftp://ftp.rfc-editor.org/in-notes/rfc2965.txt).
TPJ