WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl Part 2

May 15, 2000, 23:27

By Aaron Weiss, WDVL.com

Last month we took a magical journey to the land of mod_perl, an idyllic paradise where the penalties of executing interpreted Perl code are greatly minimized. In that article, we focused on the exciting relationship between the Apache web server and the mod_perl module, and how this relationship optimizes execution of Perl by reducing forking and by caching pre-compiled code within Apache child processes. This month, we shift our attention more towards actual code, and some ways in which your Perl code may need to be adapted to function properly within the environment created by mod_perl.

A Warm Familiar Place

When your Perl scripts are executed within a mod_perl environment, it is as if they are contained within a soothing and comfortable bubble. This bubble is like a microcosm of the "real world", in this case the real world being the operating system environment where Perl scripts are normally executed. Think of the mod_perl bubble like one of those miniature villages captured inside a glass ball which you can shake to cause a snowstorm or an earthquake. Life in this bubble is familiar and most things happen as they should -- but this bubble is not really the outside world, and there are some differences.

Like the terms "Xerox" and "Kleenex", the term "CGI" has come to be overextended to all server-side scripts that relate to the web server and browser. Technically, though, mod_perl scripts are not CGI scripts. However, because the CGI technology has come to define how we pass parameters between the web server and our scripts, including such famous examples as Perl's CGI.pm module, we generally continue to use the CGI model and vocabulary in talking about mod_perl. And we continue to use CGI.pm, for instance, to manipulate form and state parameters sent from the web browser. All of this assumes you are using CGI.pm version 2.36 or later and Perl 5.004 or later. It is still worth understanding, though, that mod_perl is in effect emulating the CGI environment for our convenience.

In a "real" operating system environment, Perl scripts are by default executed within the package named "main", assuming you have not declared otherwise. We haven't yet discussed packages in the Perl You Need to Know series, but you can think of a package as a namespace, or a context. So, you can have a subroutine or variable named PEPSI in package "blue" which will be independent of a subroutine or variable named PEPSI in package "red". This matters in mod_perl because Perl scripts do not run within the package "main" -- rather, they run within a uniquely named package based upon the URI of the web server resource. If you are familiar with packages, this information alone will be important. If you're thinking "so?", we'll see two important reasons this package stuff matters and the solutions over the next few pages.
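To make the namespace idea concrete, here is a minimal sketch in plain Perl, run from the command line rather than under mod_perl. The package names "blue" and "red" and the name PEPSI come from the paragraph above; the values and messages are made up for illustration, and the our keyword assumes Perl 5.6 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two packages, each with its own $PEPSI variable and PEPSI subroutine.
{
    package blue;
    our $PEPSI = 'the blue can';
    sub PEPSI { return 'blue::PEPSI() was called' }
}
{
    package red;
    our $PEPSI = 'the red can';
    sub PEPSI { return 'red::PEPSI() was called' }
}

# Outside the blocks we are back in the default package, "main".
print 'current package: ', __PACKAGE__, "\n";

# Fully qualified names reach into each namespace independently.
print "blue: $blue::PEPSI\n";
print "red:  $red::PEPSI\n";

my $from_blue = blue::PEPSI();
my $from_red  = red::PEPSI();
print "$from_blue\n$from_red\n";
```

Run from the command line, the first line of output reports "main". Under mod_perl, as described above, a script's code would instead sit in a package named after its URI.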

Scripts should avoid calling exit() directly: mod_perl does not support it and prefers the Apache::exit() call instead. That said, there is a safety net in place and existing exit() calls will be intercepted and passed to the Apache exit routine.

Because the mod_perl bubble is such a familiar environment, many Perl scripts do not require modification to run under mod_perl. However, there are some significant cases of handling variables and subroutines where you may need to modify your Perl scripts to behave properly within the familiar yet sometimes strange bubble world of mod_perl.

My() Troubles

First, some ground rules. When developing and testing your Perl scripts under mod_perl, there are some conditions you should set forth which will greatly help you to create properly functional code. To wit:

  1. Always "use strict" in your Perl code. This rule will force you to obey certain coding practices which are necessary under the mod_perl environment, such as how you scope variables. We'll see more on this momentarily.

  2. Enable the warning switch in your Perl script header, such as #!/usr/bin/perl -w. These warnings will be output to your Apache server's errorlog and can help you debug and track down mysterious problems. Do remember however to remove the warning switch once your code goes live, especially if your script does produce harmless warnings, or else your errorlog may grow faster than Louie Anderson at a state fair.

  3. Run your Apache server in "single process mode". Do this from the command line with httpd -X, using whatever the appropriate path to your Apache httpd is. In this mode, Apache will not spawn any children. One of the most common problems of mod_perl development is that sometimes Apache children will "remember" values from a previous invocation of a script, caused by the optimized nature of mod_perl and improperly coded scripts. Often this problem is masked when you test as a single user because new Apache children are spawned, thus failing to guarantee that your repeated tests are handled by a single child process. Confirming your scripts in a single Apache process will provide peace of mind that there are no hidden problems being obscured by the presence of multiple Apache children.

All that said, let's look at a sample mod_perl script which suffers from a mysterious ailment.

#!/usr/bin/perl -w
use strict;
use CGI;
my $cgiobj=new CGI;
print $cgiobj->header;
my $name=$cgiobj->param("firstname").' '.
   $cgiobj->param("lastname");
print &formatName($name);
sub formatName {
 return uc($name);
}

We run this script through the browser, by passing a URL to it with some parameters; e.g.
http://my.host/cgi-bin/welcome.cgi?firstname=martin&lastname=mungbean

And so the web page displays:

MARTIN MUNGBEAN

Notice that the one simple function of this script is to output the supplied name in all uppercase, via the uc() function.

Keeping in mind that Apache is running in single process mode (-X), we run the script again through the web browser, this time with the parameters firstname=jane&lastname=frowny. And so the web page displays:

MARTIN MUNGBEAN

No, that's not a typo on our part. The script seemed to ignore Jane Frowny. Yet, if we went and tried to execute this script from the command line rather than through the web browser (i.e. mod_perl) and supplied the appropriate parameters, it would output the correct name each time. So what's wrong with mod_perl? Nothing -- the question is, what is wrong with this script?

Ideally, we were hoping to create $name as a global variable that any subroutine could access. Oftentimes this is not recommended, but there are cases where it is realistic. But we can't just create a true global variable because undeclared globals are disallowed by the strict rule, which we must use with mod_perl. So, we scope our variables using my(). You can see that we declared $name using my() in the outer code of this example, and then we attempt to reference $name from inside a subroutine. This type of reference will cause problems in Perl when we use nested subroutines -- that is, a subroutine within a subroutine. In fact, because we have the warning switch enabled for Perl, you can see in the Apache errorlog a warning about just this problem:

Variable "$name" will not stay shared at
/home/username/cgi-bin/test.cgi line 10.

From the looks of it, our example code does not use a nested subroutine -- so where's the problem? As we alluded to earlier, Perl scripts under mod_perl do not run inside the main package; consequently, code that appears to be outside of any subroutines in our script is actually nested, because our whole script is nested, in a manner of speaking, inside a mod_perl subroutine named handler. This causes us to suffer from the nested subroutine problem with my() variables -- and, as we've seen, the Apache child process "remembers" the parameter values supplied on first invocation, and uses those precompiled values for each subsequent invocation. This is not good.
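The same effect can be reproduced from the command line, without Apache, by doing the nesting by hand. The sketch below is only an illustration: the handler() subroutine here is a hypothetical stand-in for the wrapper that mod_perl puts around a script, and its two arguments stand in for the CGI parameters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Remove the next line to see Perl's own "will not stay shared" warning.
no warnings 'closure';

# Stand-in for the subroutine that wraps our whole script under mod_perl.
sub handler {
    my ($first, $last) = @_;
    my $name = "$first $last";

    # A named subroutine nested inside another subroutine. It binds to
    # the $name that exists during the FIRST call to handler() and never
    # sees the fresh $name created by later calls.
    sub formatName {
        return uc($name);
    }

    return formatName();
}

print handler('martin', 'mungbean'), "\n";   # first "request"
print handler('jane',   'frowny'),   "\n";   # second "request"
```

Both lines of output read MARTIN MUNGBEAN, which is the same stale result we saw in the browser.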

Solutions are good, and there are several. One solution, of course, is not to treat $name as a global variable, but rather to pass it in and out of any subroutines:

    
print &formatName($name);

sub formatName {
 my ($name)=@_;
 return uc($name);
}

This solution is advised where possible, but sometimes it is just not feasible to pass a variable around to every subroutine that uses it, especially when one subroutine does not need it but it needs to call another subroutine which does (it happens!). There's a better way.

Repackage Your Way to Success

Often the easiest solution to this problem is to separate your Perl code into a package of its own. Typically you can do this by breaking your Perl script into two parts. Below, we break the example script we've seen into two files: "name.cgi" and "name_lib.pl".

name.cgi

#!/usr/bin/perl -w                             
use strict;
use CGI;
require "/home/username/cgi-bin/name_lib.pl"; 

&name_main::init();

name_lib.pl

package name_main;

sub init {
  my $cgiobj=new CGI;
  print $cgiobj->header;

  # $name has no my(), so it is a global of package name_main
  $name=$cgiobj->param("firstname").' '.
        $cgiobj->param("lastname");
  print &formatName($name);

  sub formatName {
    return uc($name);
  }
}

1;
__END__

What we've done here is basically to encapsulate our code into a library package, which we then pull in via a require into the mod_perl package space where name.cgi will be executed. The name.cgi file, which is the file we call in a URL, specifies which modules and libraries to pull in -- as you see, this is where we've specified the strict pragma and the CGI module, and where we pull in name_lib.pl, which is now our library and holds what was the meat of this script. The main code of our script has been wrapped inside a subroutine, init(), which is called using the fully qualified name in name.cgi.

The strange notation at the end of the package deserves a word. The number 1 is needed to appease the Perl interpreter, which wants to receive a true value when it pulls in the file. The __END__ token simply marks the end of the code and is optional. If you leave out the 1; the script won't work and an appropriate reminder will be dropped into your Apache errorlog.

Looking at the code in name_lib.pl, you can see that we've dropped the my() declaration from $name. Now it's as if we've created the global variable we want -- and we have -- although it is only global within this package that we've named name_main. Perl accepts the undeclared $name because the use strict in name.cgi applies only to the file it appears in, and does not extend into a file pulled in via require. Note that if we later decided to use $cgiobj within a subroutine in this package, we would also have to drop its my() declaration, otherwise the nested subroutine problem would surface again.

This all certainly seems like a lot of acrobatics, but it solves our problems: by placing most of the code within its own package, we can create "global" variables within that package, free from the nested subroutine problem with my() scoped variables. Although this all may seem a bit confusing at first, when you boil it all down, it works. The above name.cgi can be reloaded and reloaded in your browser with different parameters and it will never "remember" an earlier invocation.
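For readers who want to watch the package-global approach behave without Apache, here is a single-file sketch. It makes some assumptions of its own: the CGI parameters are replaced by plain arguments to init(), the output is returned rather than printed, formatName() is written alongside init() rather than nested inside it, and $name is declared with use vars so that the file can stay under use strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package name_main;
use vars qw($name);   # declares $name_main::name as a package global

sub init {
    my ($first, $last) = @_;    # stand-ins for the CGI parameters
    $name = "$first $last";     # assigned afresh on every call
    return formatName();
}

sub formatName {
    # Reads the package global, so it always sees the current value.
    return uc($name);
}

package main;

print name_main::init('martin', 'mungbean'), "\n";   # first "request"
print name_main::init('jane',   'frowny'),   "\n";   # second "request"
```

This prints MARTIN MUNGBEAN and then JANE FROWNY: because $name is looked up in the package each time, nothing is left over from the earlier call.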

Compilation Amnesia

The curious reader will wonder, besides why people are madly buying cars that are too large to fit into parking spaces, why we wrapped the outer code of name_lib.pl inside a seemingly unnecessary init() subroutine, and then called this subroutine from name.cgi? After all, wasn't the require() enough to pull all the code together?

In fact, it is true that if you did not wrap the code of name_lib.pl inside of a subroutine, it would simply have been executed in sequence when name.cgi was called. It seems we didn't really need &name_main::init(); after all. But wait! If you've tried to test this, now hit reload in the browser and execute the script a second time. Something new: "Document returned no data". Try again -- same message. Now what's going on?

When you have "naked" code at the top level of your pulled-in package, outside of any subroutine, Perl executes that code only when the file is actually loaded and compiled. Because require() loads a given file only once, and mod_perl keeps the interpreter alive between requests, this usually translates into once per child process -- on first invocation. The solution, as we've already seen, is not to code this way, but to enclose your control code inside a subroutine within its package, and call this subroutine from the mod_perl script, exactly as we did in name.cgi.
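The run-once behaviour of require() can be seen from the command line too. In this sketch a loop stands in for three requests handled by one long-lived Apache child. The library file, the package name naked_lib and the %count hash are all invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our %count = (top_level => 0, init => 0);

# Write a small throwaway library to a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my $lib = "$dir/naked_lib.pl";
open my $fh, '>', $lib or die "cannot write $lib: $!";
print $fh <<'END_LIB';
package naked_lib;
$main::count{top_level}++;            # "naked" code: runs when the file is loaded
sub init { $main::count{init}++ }     # wrapped code: runs each time it is called
1;
END_LIB
close $fh;

# Each pass plays the part of one request in the same process.
for my $request (1 .. 3) {
    require $lib;          # loads the file on the first pass only
    naked_lib::init();     # runs on every pass
}

print "top-level code ran $count{top_level} time(s)\n";
print "init() ran $count{init} time(s)\n";
```

The top-level line is counted once and init() three times, which is why the control code belongs inside a subroutine.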

Stubborn, Too

Not only will mod_perl conveniently "forget" about top-level code pulled in via require, it will also tend to ignore changes you make to your library packages or modules. Return to the script we've created, name.cgi with the library name_lib.pl pulled in via a require(). Suppose the script doesn't work, or the script does work but we want to make changes to its output. So, we fire up the trusty editor and hack away at name_lib.pl. Heading back to the browser, hit reload with great anticipation, and what happens?

What happens is whatever happened the previous time you ran the script, because mod_perl doesn't see any of the changes you've slaved over! The stubborn little thing does not check to see if name_lib.pl has changed, and so when name.cgi is invoked again it simply pulls in the compiled name_lib.pl that it's been using all along.

One way to shock mod_perl into its senses is the drastic method, akin to cold-water-in-the-face: kill or restart the Apache server. For instance (modify to match your installation):

/usr/local/apache/bin/apachectl restart

That will force it to re-load any requested files from the disk. This method works fine in testing, but assuming you make many small changes to your script during development, you'll quickly find restarting the server every time you want to test a change rather tedious. No, not coal mining tedious, or transcontinental railroad spike pounding tedious, but tedious nonetheless. Sometimes it's the little things.

Our hero is named StatINC, which is an Apache Perl module that will effectively check the files you pull in via a require() to see if they've been updated, every time your mod_perl script is invoked. To enable StatINC, you'll need to modify the Apache server configuration file, httpd.conf, found in /path/to/apache/conf/ (modify to match your installation).

Find the portion of your httpd.conf file where mod_perl is configured; using last month's example, that section might look like:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"

 SetHandler perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 PerlSendHeader On


Now, we need to add three lines to the above (or whatever configuration is similar to the above in your httpd.conf): the PerlModule line at the top, and the PerlInitHandler and PerlSetVar lines at the bottom.


PerlModule Apache::StatINC
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"

 SetHandler perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 PerlSendHeader On
 PerlInitHandler Apache::StatINC
 PerlSetVar StatINCDebug On


With StatINC enabled with debugging on, as seen above, the Apache errorlog will now contain messages when a change to one of your require'd script files is detected:

Apache::StatINC: process 15420 
 reloading /home/username/cgi-bin/test_lib.pl
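The idea behind this kind of reloading can be sketched in a few lines of plain Perl. This is not the actual source of Apache::StatINC, only an illustration of the general technique: remember the modification time of every file listed in %INC, and re-require any file whose time has moved forward. The helper names write_lib() and reload_changed(), and the demo_lib package, are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $lib = "$dir/demo_lib.pl";

# Write the library with a given greeting and a given modification time.
sub write_lib {
    my ($greeting, $mtime) = @_;
    open my $fh, '>', $lib or die "cannot write $lib: $!";
    print $fh "package demo_lib;\nsub greet { return '$greeting' }\n1;\n";
    close $fh;
    utime $mtime, $mtime, $lib or die "utime failed: $!";
}

my %seen_mtime;   # last modification time seen for each %INC entry

# Re-require every loaded file that has changed since the last check.
sub reload_changed {
    my @reloaded;
    for my $key (sort keys %INC) {
        my $path = $INC{$key};
        next unless defined $path && !ref $path && -f $path;
        my $mtime = (stat $path)[9];
        if (!exists $seen_mtime{$key}) {      # first sighting: record only
            $seen_mtime{$key} = $mtime;
            next;
        }
        next if $mtime <= $seen_mtime{$key};
        $seen_mtime{$key} = $mtime;
        delete $INC{$key};                    # make require load it again
        require $key;
        push @reloaded, $key;
    }
    return @reloaded;
}

write_lib('hello', time() - 100);
require $lib;
reload_changed();                  # records the baseline times
print demo_lib::greet(), "\n";     # original version

write_lib('goodbye', time());      # "edit" the library on disk
print demo_lib::greet(), "\n";     # unchanged: still the compiled copy

reload_changed();                  # notices the newer file and reloads it
print demo_lib::greet(), "\n";     # now the edited version
```

The three lines printed are hello, hello and goodbye: the edit on disk has no effect until something checks the file and pulls it in again.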

Conclusion

Every Perl script is different (except plagiarized ones!), but we've looked at some general guidelines for common traps and hangups many coders run into when attempting to write or migrate scripts to the mod_perl environment. It may seem like a lot of work, but in most cases the mod_perl constraints result in better coding practice, and a deeper understanding of Perl -- which must be worth something! Remember though that mod_perl is not literally about functionality, but optimization -- if modifying large, hairy existing scripts seems unrealistic, it might be worth considering whether to use mod_perl at all for those scripts.

In a worst case scenario, if you simply cannot modify an old or beastly script to work properly in the mod_perl fantasy bubble world, not all efficiency is lost. One simple change to your Apache httpd.conf can salvage the effort -- to some extent. Change the line

PerlHandler Apache::Registry

to:

PerlHandler Apache::PerlRun

The above change will abandon the use of the mod_perl fantasy bubble world, and simply execute your Perl scripts the old-fashioned way, with the exception being that the interpreter is still built into the web server and therefore doesn't need to be launched as a separate process. You certainly lose many optimization possibilities in resorting to the PerlRun module, but in some cases any execution is better than none. Again, this is really a last resort option, and is only a small improvement over not using mod_perl at all.

Additional Resources

  • The mod_perl Guide by Stas Bekman is a most thorough exploration of mod_perl.
  • mod_perl_traps is a concise and clear summary of many issues we've seen this month.
