WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl Part 2
May 15, 2000, 23:27
By Aaron Weiss, WDVL.com

Last month we took a magical journey to the land of mod_perl, an idyllic paradise where the penalties of executing interpreted Perl code are greatly minimized. In that article, we focused on the relationship between the Apache web server and the mod_perl module, and how it optimizes the execution of Perl by reducing forking and by caching pre-compiled code within Apache child processes. This month, we shift our attention towards actual code, and towards some ways in which your Perl code may need to be adapted to work properly within the environment mod_perl creates.

A Warm Familiar Place

When your Perl scripts are executed within a mod_perl environment, it is as if they are contained within a soothing and comfortable bubble. This bubble is a microcosm of the "real world", the real world in this case being the operating system environment where Perl scripts are normally executed. Think of the mod_perl bubble as one of those miniature villages captured inside a glass ball, which you can shake to cause a snowstorm or an earthquake. Life in this bubble is familiar and most things happen as they should, but the bubble is not really the outside world, and there are some differences.

Like the terms "Xerox" and "Kleenex", the term "CGI" has come to be stretched over all server-side scripts that sit between the web server and the browser. Technically, though, mod_perl scripts are not CGI scripts. However, because CGI has come to define how parameters pass between the web server and our scripts, and tools built around it, such as Perl's famous CGI.pm module, are so widely used, we generally continue to use the CGI model and vocabulary when talking about mod_perl. And we continue to use CGI.pm, for instance, to manipulate form and state parameters sent from the web browser.
All of this assumes you are using CGI.pm version 2.36 or later and Perl 5.004 or later. It is still worth understanding, though, that mod_perl is in effect emulating the CGI environment for our convenience.

In a "real" operating system environment, Perl scripts are by default executed within the package named "main", unless you declare otherwise. We haven't yet discussed packages in the Perl You Need to Know series, but you can think of a package as a namespace, or a context. So, you can have a subroutine or variable named PEPSI in package "blue" which is independent of a subroutine or variable named PEPSI in package "red". This matters in mod_perl because Perl scripts do not run within the package "main"; rather, they run within a uniquely named package based upon the URI of the web server resource. If you are already familiar with packages, this fact alone will tell you a lot. If you're thinking "so?", over the next few pages we'll see two important reasons why this package business matters, along with the solutions.

Scripts are advised not to call exit(), which mod_perl does not support; it prefers the Apache::exit() call. That said, there is a safety net in place: existing exit() calls will be intercepted and passed to the Apache exit routine.

Because the mod_perl bubble is such a familiar environment, many Perl scripts do not require modification to run under mod_perl. However, there are some significant cases of handling variables and subroutines where you may need to modify your Perl scripts to behave properly within the familiar yet sometimes strange bubble world of mod_perl.

My() Troubles

First, some ground rules. When developing and testing your Perl scripts under mod_perl, there are some conditions you should set which will greatly help you create properly functioning code. To wit:

- Enable Perl's warnings with the -w switch, and watch the Apache error log for what they report.
- Always use strict; under mod_perl it is a must.
- While testing, run Apache in single-process mode (-X), so that every request is handled by the same process.
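To make the package idea concrete, here is a minimal sketch of the PEPSI example from above as an ordinary standalone Perl script (no mod_perl needed). The string values are made up for illustration, and each package's variable is declared with our (Perl 5.6 and later; on older Perls, use vars plays the same role) so the script passes strict.

```perl
use strict;
use warnings;

# Two packages, each with its own $PEPSI variable and PEPSI() subroutine.
{
    package blue;
    our $PEPSI = "blue pepsi";
    sub PEPSI { return "blue::PEPSI() sees $PEPSI"; }
}
{
    package red;
    our $PEPSI = "red pepsi";
    sub PEPSI { return "red::PEPSI() sees $PEPSI"; }
}

# Code outside any package statement runs in package main.
my @lines = (
    __PACKAGE__,                      # "main"
    blue::PEPSI(),                    # "blue::PEPSI() sees blue pepsi"
    red::PEPSI(),                     # "red::PEPSI() sees red pepsi"
    "$blue::PEPSI / $red::PEPSI",     # "blue pepsi / red pepsi"
);
print "$_\n" for @lines;
```

Under Apache::Registry, the first line would instead report the long, URI-derived package name your script was compiled into, rather than main.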
All that said, let's look at a sample mod_perl script which suffers from a mysterious ailment.
We run this script through the browser, passing it a first and last name as parameters in the URL, and the web page displays:

MARTIN MUNGBEAN

The one simple function of this script is to output the supplied name in all uppercase, via the uc() function. Keeping in mind that Apache is running in single-process mode (-X), we run the script again through the web browser, this time with the parameters firstname=jane&lastname=frowny. And so the web page displays:

MARTIN MUNGBEAN

No, that's not a typo on our part. The script seemed to ignore Jane Frowny. Yet, if we executed this script from the command line rather than through the web browser (i.e. mod_perl) and supplied the appropriate parameters, it would output the correct name each time. So what's wrong with mod_perl? Nothing; the question is, what is wrong with this script?

Ideally, we were hoping to create $name as a global variable that any subroutine could access. This is often not recommended, but there are cases where it is realistic. We can't just create a true global variable, though, because the strict rule, which we must use with mod_perl, disallows it. So, we scope our variables using my(). In this example we declared $name using my() in the outer code, and then referenced $name from inside a subroutine. This kind of reference causes problems in Perl when subroutines are nested, that is, when a named subroutine is defined inside another subroutine. In fact, because we have Perl's warnings enabled, the Apache error log contains a warning about exactly this problem, reporting that the variable "$name" will not stay shared.
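Because mod_perl wraps the whole script inside a subroutine named handler, the effect can be reproduced in plain Perl outside of Apache. The sketch below is a simplified model, not mod_perl's actual code: the handler() subroutine stands in for that wrapper, its two arguments stand in for the CGI parameters, and calling it twice plays the role of two requests served by the same child process.

```perl
use strict;
use warnings;

# Simplified model of mod_perl wrapping a whole script in a subroutine.
# handler() is a stand-in for that wrapper; each call is one request.
sub handler {
    my ($first, $last) = @_;            # stand-ins for the CGI parameters
    my $name = "$first $last";          # the my() "global" from the script
    return formatName();

    # Named subs are compiled only once, so formatName() stays bound to the
    # first instance of $name; warnings reports "$name" will not stay shared.
    sub formatName { return uc($name); }
}

my @pages;
push @pages, handler("martin", "mungbean");   # first request
push @pages, handler("jane", "frowny");       # second request, same process
print "$_\n" for @pages;                      # MARTIN MUNGBEAN, twice
```

With warnings enabled, Perl emits the "will not stay shared" warning at compile time, before either call runs.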
From the looks of it, our example code does not use a nested subroutine, so where's the problem? As we alluded to earlier, Perl scripts under mod_perl do not run inside the main package. Code that appears to be outside of any subroutine in our script is actually nested, because the whole script is wrapped, in a manner of speaking, inside a mod_perl subroutine named handler. So we suffer from the nested subroutine problem with my() variables: as we've seen, the Apache child process "remembers" the parameter values supplied on the first invocation, and formatName() keeps using those values on every subsequent invocation.

This is not good. Solutions are good, and there are several. One solution, of course, is not to treat $name as a global variable, but rather to pass it in and out of any subroutines that need it:

```perl
print &formatName($name);

sub formatName {
    my ($name) = @_;    # use the value passed in, not an outer my() variable
    return uc($name);
}
```

This solution is advised where possible, but sometimes it is just not feasible to pass a variable around to every subroutine that uses it, especially when one subroutine does not need the variable but must call another subroutine which does (it happens!). There's a better way.

Repackage Your Way to Success

Often the easiest solution to this problem is to move your Perl code into a package of its own. Typically you can do this by breaking your Perl script into two parts. Below, we break the example script we've seen into two files: "name.cgi" and "name_lib.pl".

name.cgi:

```perl
#!/usr/bin/perl -w
use strict;
use CGI;
require "/home/username/cgi-bin/name_lib.pl";
&name_main::init();
```

name_lib.pl:

```perl
package name_main;

sub init {
    my $cgiobj = new CGI;
    print $cgiobj->header;
    # No my() here: $name is a package variable ($name_main::name)
    $name = $cgiobj->param("firstname") . ' ' . $cgiobj->param("lastname");
    print &formatName($name);

    sub formatName {
        return uc($name);
    }
}

1;
__END__
```

What we've done here is encapsulate our code in a library package, which we pull in via require() from the mod_perl package space where name.cgi is executed. The name.cgi file, which is the file we call in a URL, specifies which modules and libraries to pull in: as you see, this is where we've specified the strict and CGI modules, and where we pull in name_lib.pl, which now holds what was the meat of the script. The main code of our script has been wrapped inside a subroutine, init(), which name.cgi calls using its fully qualified name.

The 1; at the end of name_lib.pl is needed to appease the Perl interpreter: require() expects the file it pulls in to return a true value. If you leave it out, the script won't work, and an appropriate reminder will be dropped into your Apache error log. The __END__ token that follows simply marks the end of the code and is optional.

Looking at the code in name_lib.pl, you can see that we've dropped the my() declaration from $name. Now it's as if we've created the global variable we want, and we have, although it is only global within this package, which we've named name_main. (This is allowed because use strict in name.cgi applies only to that file; it does not extend into files pulled in with require().) Note that if we later decided to use $cgiobj within a subroutine in this package, we would also have to drop its my() declaration; otherwise the nested subroutine problem would surface again.

This all certainly seems like a lot of acrobatics, but it solves our problem: by placing most of the code within its own package, we can create "global" variables within that package, free from the nested subroutine problem with my() scoped variables. Although it may seem a bit confusing at first, when you boil it all down, it works. The above name.cgi can be reloaded again and again in your browser with different parameters, and it will never "remember" an earlier invocation.
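Both fixes can be checked with the same kind of plain-Perl model, again with made-up names standing in for the CGI parameters. One swap to note: this sketch declares the package variable with our (Perl 5.6 and later) instead of simply omitting my() in a file without use strict, as name_lib.pl does; either way $name becomes the package variable $name_main::name, so the nested subroutine no longer captures a per-call lexical.

```perl
use strict;
use warnings;

# Fix 1: pass the value in. The nested sub only sees its own argument.
sub handler_with_args {
    my ($first, $last) = @_;
    my $name = "$first $last";
    return formatNameArg($name);

    sub formatNameArg { my ($n) = @_; return uc($n); }
}

# Fix 2: make $name a package variable, as name_lib.pl does.
{
    package name_main;
    our $name;                # alias for the package variable $name_main::name

    sub init {
        my ($first, $last) = @_;
        $name = "$first $last";
        return formatName();

        sub formatName { return uc($name); }   # reads $name_main::name
    }
}

my @pages;
for my $request (["martin", "mungbean"], ["jane", "frowny"]) {
    push @pages, handler_with_args(@$request), name_main::init(@$request);
}
print "$_\n" for @pages;
```

Each "request" now produces its own name: MARTIN MUNGBEAN from both fixes, then JANE FROWNY from both.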
Compilation Amnesia

The curious reader will wonder (besides why people are madly buying cars too large to fit into parking spaces) why we wrapped the outer code of name_lib.pl inside a seemingly unnecessary init() subroutine, and then called that subroutine from name.cgi. After all, wasn't the require() enough to pull all the code together? In fact, if you did not wrap the code of name_lib.pl inside a subroutine, it would simply be executed in sequence when name.cgi was called. It seems we didn't really need &name_main::init(); after all.

But wait! If you've tried this, now hit reload in the browser and execute the script a second time. Something new: "Document returned no data". Try again: same message. Now what's going on? "Naked" top-level code in a pulled-in file (often loosely called a BEGIN block, although strictly speaking a BEGIN block is code explicitly marked with BEGIN { ... }) is executed only when the file is first loaded. require() records every file it loads in the %INC hash and skips files it has already loaded, so mod_perl generally executes this code only once per compilation. That usually translates into once per child process, on the first invocation. The solution, as we've already seen, is not to code this way, but to enclose your control code inside a subroutine within its package, and call that subroutine from the mod_perl script, exactly as we did in name.cgi.

Stubborn, Too

Not only will mod_perl conveniently "forget" to rerun naked code pulled in via require(), it will also tend to ignore changes you make to your library packages or modules. Return to the script we've created, name.cgi, with the library name_lib.pl pulled in via require(). Suppose the script doesn't work, or it does work but we want to change its output. So we fire up the trusty editor and hack away at name_lib.pl. Heading back to the browser, we hit reload with great anticipation, and what happens? Whatever happened the previous time you ran the script, because mod_perl doesn't see any of the changes you've slaved over!
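Both behaviors come from the same %INC bookkeeping, which the following plain-Perl sketch simulates in a single long-lived process. The library file, the demo_lib package, and the counters are hypothetical stand-ins for name_lib.pl, written to a temporary file with File::Temp. The mtime check at the end is a hand-rolled illustration of the idea Apache::StatINC automates, not its actual code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Writes a hypothetical stand-in for name_lib.pl. Its naked top-level code
# bumps $main::naked_runs; its init() bumps $main::init_calls.
sub write_lib {
    my ($path, $greeting) = @_;
    open my $fh, '>', $path or die "cannot write $path: $!";
    print $fh "package demo_lib;\n",
              "\$main::naked_runs++;\n",
              "sub init { \$main::init_calls++; return '$greeting' }\n",
              "1;\n";
    close $fh or die "cannot close $path: $!";
}

our ($naked_runs, $init_calls) = (0, 0);
my (undef, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
write_lib($path, 'hello');

my @pages;
for my $request (1 .. 3) {          # three requests to one child process
    require $path;                  # loads and runs the file the first time only
    push @pages, demo_lib::init();  # runs on every request
}
my $loaded_mtime = (stat $path)[9];

# Edit the library. A plain require() still uses the copy already loaded.
write_lib($path, 'goodbye');
my $later = time + 60;
utime $later, $later, $path;        # make sure the mtime visibly changes
require $path;
push @pages, demo_lib::init();

# Hand-rolled version of the idea behind Apache::StatINC: if the file's
# mtime changed, make require() forget it so the next require reloads it.
if ((stat $path)[9] != $loaded_mtime) {
    delete $INC{$path};
    require $path;
}
push @pages, demo_lib::init();

print "naked code ran $naked_runs time(s), init() ran $init_calls time(s)\n";
print "pages: @pages\n";
```

The naked code runs only twice (at first load and at the forced reload), while init() runs on all five "requests", and the edited greeting appears only after the %INC entry is deleted.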
The stubborn little thing does not check whether name_lib.pl has changed, so when name.cgi is invoked again it simply uses the compiled name_lib.pl it has been using all along. One way to shock mod_perl into its senses is the drastic method, akin to cold water in the face: kill or restart the Apache server. For instance (modify to match your installation):

```
/usr/local/apache/bin/apachectl restart
```

That forces Apache to reload any requested files from disk. This method works fine in testing, but if you make many small changes to your script during development, you'll quickly find restarting the server every time you want to test a change rather tedious. No, not coal-mining tedious, or transcontinental-railroad-spike-pounding tedious, but tedious nonetheless. Sometimes it's the little things.

Our hero is named StatINC, an Apache Perl module (Apache::StatINC) that checks the files you pull in via require() to see whether they've been updated, every time your mod_perl script is invoked. To enable StatINC, you'll need to modify the Apache server configuration file, httpd.conf, found in /path/to/apache/conf/ (modify to match your installation). Find the portion of your httpd.conf file where mod_perl is configured; using last month's example, that section might look like:

```
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
```

Now we need to add three lines to the above (or to whatever similar configuration is in your httpd.conf): PerlModule Apache::StatINC, PerlInitHandler Apache::StatINC, and PerlSetVar StatINCDebug On.
```
PerlModule Apache::StatINC
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
PerlInitHandler Apache::StatINC
PerlSetVar StatINCDebug On
```

With StatINC enabled and debugging on, as above, the Apache error log will now contain a message whenever a change to one of your require'd files is detected:

```
Apache::StatINC: process 15420 reloading /home/username/cgi-bin/test_lib.pl
```

Conclusion

Every Perl script is different (except plagiarized ones!), but we've looked at some general guidelines for common traps and hang-ups many coders run into when writing scripts for, or migrating them to, the mod_perl environment. It may seem like a lot of work, but in most cases the mod_perl constraints result in better coding practice and a deeper understanding of Perl, which must be worth something! Remember, though, that mod_perl is not about adding functionality but about optimization: if modifying large, hairy existing scripts seems unrealistic, it may be worth considering whether to use mod_perl at all for those scripts.

In a worst-case scenario, if you simply cannot modify an old or beastly script to work properly in the mod_perl fantasy bubble world, all efficiency is not lost. One simple change to your Apache httpd.conf can salvage the effort, to some extent. Change the line

```
PerlHandler Apache::Registry
```

to:

```
PerlHandler Apache::PerlRun
```

This change abandons the mod_perl fantasy bubble world. Apache::PerlRun does not cache your compiled script; it compiles the script afresh on every request and clears out its namespace afterward, much like executing it the old-fashioned way. The exception is that the interpreter is still built into the web server, and therefore doesn't need to be launched as a separate process. You certainly lose many optimization possibilities by resorting to the PerlRun module, but in some cases any execution is better than none. Again, this is really a last-resort option, and only a small improvement over not using mod_perl at all.
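The practical difference between the two handlers comes down to caching compiled code. Here is a rough, hypothetical model of that difference in plain Perl, not the real implementation of Apache::Registry or Apache::PerlRun: registry_style() compiles a script's source into a code reference once and caches it by name, while perlrun_style() compiles it again on every call. The script body and the name "name.cgi" are placeholders.

```perl
use strict;
use warnings;

# Rough model of the two handlers (NOT their real implementations):
# registry_style() compiles a script once and caches the code ref;
# perlrun_style() compiles it again on every request.
my %compiled;          # script name -> cached code ref
my $compilations = 0;  # how many times we compiled anything

sub compile_script {
    my ($source) = @_;
    $compilations++;
    my $code = eval "sub { $source }";
    die "compile failed: $@" unless $code;
    return $code;
}

sub registry_style {
    my ($name, $source) = @_;
    $compiled{$name} ||= compile_script($source);
    return $compiled{$name}->();
}

sub perlrun_style {
    my ($name, $source) = @_;   # $name unused: nothing is cached
    return compile_script($source)->();
}

my $script = 'return uc("martin mungbean");';   # hypothetical script body

registry_style('name.cgi', $script) for 1 .. 3;
my $registry_compiles = $compilations;

$compilations = 0;
perlrun_style('name.cgi', $script) for 1 .. 3;
my $perlrun_compiles = $compilations;

print "Registry-style: $registry_compiles compile(s); ",
      "PerlRun-style: $perlrun_compiles compile(s)\n";
```

Three Registry-style requests cost one compilation, while three PerlRun-style requests cost three; that repeated compilation is the price of not relying on cached state.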