WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl Part 2

By Aaron Weiss, WDVL.com

Last month we took a magical journey to the land of mod_perl, an
idyllic paradise where the penalties of executing interpreted Perl
code are greatly minimized. In that article, we focused on the
exciting relationship between the Apache web server and the mod_perl module, and how this
relationship optimizes execution of Perl by reducing forking and
by caching pre-compiled code within Apache child processes. This
month, we shift our attention more towards actual code, and some
ways in which your Perl code may need to be adapted to function
properly within the environment created by mod_perl.

A Warm Familiar Place

When your Perl scripts are executed within a mod_perl
environment, it is as if they are contained within a soothing and
comfortable bubble. This bubble is like a microcosm of the “real
world”, in this case the real world being the operating system
environment where Perl scripts are normally executed. Think of the
mod_perl bubble like one of those miniature villages captured
inside a glass ball which you can shake to cause a snowstorm or an
earthquake. Life in this bubble is familiar and most things happen
as they should — but this bubble is not really the
outside world, and there are some differences.

Like the terms “Xerox” and “Kleenex”, the term “CGI” has come to
be overextended to all server-side scripts that relate to the web
server and browser. Technically, though, mod_perl scripts are
not CGI scripts. However, because CGI has come to define how
parameters are passed between the web server and our scripts,
through such famous examples as Perl’s CGI.pm module, we generally
continue to use the CGI model and vocabulary in talking about
mod_perl. And we continue to use CGI.pm, for instance, to
manipulate form and state parameters sent from the web browser.
All of this assumes you are using CGI.pm version 2.36 or later
and Perl 5.004 or later. It is still worth understanding, though,
that mod_perl is in effect emulating the CGI environment for our
convenience.

In a “real” operating system environment, Perl scripts are by
default executed within the package named “main”, assuming you have
not declared otherwise. We haven’t yet discussed packages in the
Perl You Need to Know series, but you can think of a package as a
namespace, or a context. So, you can have a subroutine or variable named
PEPSI in package “blue” which will be independent of a
subroutine or variable named PEPSI in package “red”. This matters
in mod_perl because Perl scripts do not run within the package
“main” — rather, they run within a uniquely named package based
upon the URI of the web server resource. If you are familiar with
packages, this information alone will be important. If you’re
thinking “so?”, we’ll see two important reasons this package stuff
matters and the solutions over the next few pages.
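
If packages are new to you, a tiny standalone script may help. The sketch below is only an illustration run from the command line, not anything mod_perl itself does; the package names “blue” and “red” and the name PEPSI come from the paragraph above, and the strings are made up for the demonstration.

```perl
use strict;

# Package "blue" gets its own variable and subroutine named PEPSI.
package blue;
$blue::PEPSI = "the variable in blue";
sub PEPSI { return "the subroutine in blue" }

# Package "red" reuses the same names without any conflict.
package red;
$red::PEPSI = "the variable in red";
sub PEPSI { return "the subroutine in red" }

# Back to the default package, where command-line scripts normally run.
package main;

# Fully qualified names pick out which PEPSI we mean.
my @report = (
    $blue::PEPSI, blue::PEPSI(),
    $red::PEPSI,  red::PEPSI(),
);
print "$_\n" for @report;

# __PACKAGE__ names the package the surrounding code was compiled in.
print "this code was compiled in package ", __PACKAGE__, "\n";
```

Run from the command line, the last line reports “main”. Under mod_perl the same script would report a package name derived from its URI instead.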

Scripts should also avoid calling exit() directly. Under mod_perl
a genuine exit() would terminate the Apache child process itself
rather than just the script, so mod_perl provides the
Apache::exit() call instead. That said, there is a safety net in
place and existing exit() calls will be intercepted and
passed to the Apache exit routine.

Because the mod_perl bubble is such a familiar environment, many
Perl scripts do not require modification to run under mod_perl.
However, there are some significant cases of handling variables and
subroutines where you may need to modify your Perl scripts to
behave properly within the familiar yet sometimes strange bubble
world of mod_perl.

My() Troubles

First, some ground rules. When developing and testing your Perl
scripts under mod_perl, there are some conditions you should set
forth which will greatly help you to create properly functional
code. To wit:

Always “use strict” in your Perl code. This rule will force you
to obey certain coding practices which are necessary under the
mod_perl environment, such as how you scope variables. We’ll see
more on this momentarily.

Enable the warning switch in your Perl script header, such as
#!/usr/bin/perl -w. These warnings will be output to your
Apache server’s errorlog and can help you debug and track down
mysterious problems. Do remember however to remove the warning
switch once your code goes live, especially if your script does
produce harmless warnings, or else your errorlog may grow faster
than Louie Anderson at a state fair.

Run your Apache server in “single process mode”. Do this with
the commandline httpd -X, or wherever the appropriate path
to your Apache httpd is. In this mode, Apache will not
spawn any children. One of the most common problems of mod_perl
development is that sometimes Apache children will “remember”
values from a previous invocation of a script, caused by the
optimized nature of mod_perl and improperly coded scripts. Often
this problem is masked when you test as a single user because new
Apache children are spawned, thus failing to guarantee that your
repeated tests are handled by a single child process. Confirming
your scripts in a single Apache process will provide peace of mind
that there are no hidden problems being obscured by the presence of
multiple Apache children.

All that said, let’s look at a sample mod_perl script which
suffers from a mysterious ailment.

#!/usr/bin/perl -w
use strict;
use CGI;

my $cgiobj=new CGI;
print $cgiobj->header;

my $name=$cgiobj->param("firstname").' '.
   $cgiobj->param("lastname");
print &formatName($name);

sub formatName {
 return uc($name);
}

We run this script through the browser, by passing a URL to it
with some parameters; e.g.
http://my.host/cgi-bin/welcome.cgi?firstname=martin&lastname=mungbean

And so the web page displays:

MARTIN MUNGBEAN

Notice that the one simple function of this script is to output
the supplied name in all uppercase, via the uc()
function.

Keeping in mind that Apache is running in single process mode
(-X), we run the script again through the web browser, this time
with the parameters firstname=jane&lastname=frowny.
And so the web page displays:

MARTIN MUNGBEAN

No, that’s not a typo on our part. The script seemed to ignore
Jane Frowny. Yet, if we went and tried to execute this script from
the command line rather than through the web browser (i.e.
mod_perl) and supplied the appropriate parameters, it would output
the correct name each time. So what’s wrong with mod_perl? Nothing
— the question is, what is wrong with this script?

Ideally, we were hoping to create $name as a global
variable that any subroutine could access. This is often not
recommended, but there are cases where it is realistic. But we
can’t just use an undeclared global variable because this is
disallowed by the strict rule, which we must use with mod_perl. So,
we scope our variables using my(). You can see that we
declared $name using my() in the outer code of
this example, and then we attempt to reference $name from
inside a subroutine. This type of reference will cause problems in
Perl when we use nested subroutines, that is, a named subroutine
defined within another subroutine. In fact, because we have the
warning switch enabled for Perl, you can see in the Apache errorlog
a warning about just this problem:

Variable "$name" will not stay shared at
/home/username/cgi-bin/test.cgi line 10.

From the looks of it, our example code does not use a nested
subroutine, so where’s the problem? As we alluded to earlier,
Perl scripts under mod_perl do not run as standalone programs
inside the main package. Instead, Apache::Registry compiles the
whole script into the body of a subroutine named handler, inside
the script’s own package. Code that appears to be outside of any
subroutine is therefore actually nested inside handler, and
formatName becomes a subroutine within a subroutine. This causes
us to suffer from the nested subroutine problem with my()
variables: formatName stays bound to the $name that existed on
first invocation, so, as we’ve seen, the Apache child process
“remembers” the parameter values supplied on first invocation and
uses those stale values for each subsequent invocation. This is
not good.
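
You can reproduce the effect without Apache at all. The following is a minimal sketch, not mod_perl’s real wrapper: the handler subroutine here is written by hand to stand in for the one Apache::Registry generates, and the two names are passed in as plain arguments instead of arriving as CGI parameters.

```perl
use strict;

# Hand-written stand-in for the wrapper subroutine that
# Apache::Registry builds around a script (an assumption made
# purely for illustration).
sub handler {
    my ($input) = @_;

    my $name = $input;    # a fresh lexical on every call

    # A named subroutine is compiled only once, so it binds to the
    # $name that existed during the first call and never lets go.
    sub formatName {
        return uc($name);
    }

    return formatName();
}

# Two "requests" served by the same process.
my $first  = handler("martin mungbean");
my $second = handler("jane frowny");

print "$first\n";
print "$second\n";
```

Both lines print MARTIN MUNGBEAN. Run the script with the -w switch and Perl also issues the “will not stay shared” warning when it compiles formatName.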

Solutions are good, and there are several. One solution, of
course, is not to treat $name as a global variable, but
rather to pass it in and out of any subroutines:

    
print &formatName($name);

sub formatName {
 my ($name)=@_;
 return uc($name);
}

This solution is advised where possible, but sometimes it is
just not feasible to pass a variable around to every subroutine
that uses it, especially when one subroutine does not need it but
it needs to call another subroutine which does (it happens!).
There’s a better way.

Repackage Your Way to Success

Often the easiest solution to this problem is to separate your
Perl code into a package of its own. Typically you can do this by
breaking your Perl script into two parts. Below, we break the
example script we’ve seen into two files: “name.cgi” and
“name_lib.pl“.

name.cgi

#!/usr/bin/perl -w                             
use strict;
use CGI;
require "/home/username/cgi-bin/name_lib.pl"; 

&name_main::init();

name_lib.pl

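package name_main;

# Entry point, called from name.cgi on every request.
sub init {
  my $cgiobj=new CGI;
  print $cgiobj->header;

  # No my() here: $name is a package global, $name_main::name.
  $name=$cgiobj->param("firstname").' '.
        $cgiobj->param("lastname");
  print &formatName($name);

  sub formatName {
    return uc($name);
  }
}

1;
__END__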

What we’ve done here is basically to encapsulate our code into a
library package, which we then pull in via a require into
the mod_perl package space where name.cgi will be
executed. The name.cgi file, which is the file we call in
a URL, specifies which modules and libraries to pull in. As you
see, this is where we’ve specified the strict and
CGI module, and where we pull in what is now our library,
what was the meat of this script, name_lib.pl. Note that
use strict applies only to the file in which it appears, which is
why name_lib.pl can use $name without declaring it. The main
code of our script has been wrapped inside a subroutine,
init(), which is called using the fully qualified name in
name.cgi.

The strange notation at the end of the package deserves a word.
The 1; is needed to appease the Perl interpreter, which wants to
receive a true value when it pulls in a file via require. If you
leave it out the script won’t work and an appropriate reminder
will be dropped into your Apache errorlog. The __END__ token is
optional; it simply tells Perl that the code stops here.

Looking at the code in name_lib.pl, you can see that
we’ve dropped the my() declaration from $name.
Now it’s as if we’ve created the global variable we want — and we
have — although it is only global within this package that we’ve
named name_main. Note that if we later decided to use
$cgiobj within a subroutine in this package, we would also
have to drop its my() declaration, otherwise the nested
subroutine problem would surface again.

This all certainly seems like a lot of acrobatics, but it solves
our problems: by placing most of the code within its own package,
we can create “global” variables within that package, free from the
nested subroutine problem with my() scoped variables.
Although this all may seem a bit confusing at first, when you boil
it all down, it works. The above name.cgi can be reloaded
and reloaded in your browser with different parameters and it will
never “remember” an earlier invocation.
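
The same idea can be tried from the command line. This is a sketch under a few assumptions of our own: init() takes the two names as arguments rather than reading them through CGI.pm, formatName is written as a sibling of init() rather than nested inside it, and the package global is declared with use vars (not used in the listing above) so that the code also passes use strict.

```perl
use strict;

package name_main;

# Declares the package global $name_main::name, which keeps
# "use strict" satisfied without resorting to my().
use vars qw($name);

sub init {
    my ($first, $last) = @_;    # stand-ins for the CGI parameters
    $name = "$first $last";     # assigned afresh on every call
    return formatName();
}

sub formatName {
    return uc($name);           # always sees the current package global
}

package main;

# Two "requests" in the same process.
my $first_run  = name_main::init("martin", "mungbean");
my $second_run = name_main::init("jane", "frowny");

print "$first_run\n";
print "$second_run\n";
```

This time the second line reads JANE FROWNY: a package global has only one copy, so there is no stale private copy for formatName to cling to.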

Compilation Amnesia

The curious reader will wonder, besides why people are madly
buying cars that are too large to fit into parking spaces, why we
wrapped the outer code of name_lib.pl inside a seemingly
unnecessary init() subroutine, and then called this
subroutine from name.cgi. After all, wasn’t the
require() enough to pull all the code together?

In fact, it is true that if you did not wrap the code of
name_lib.pl inside of a subroutine, it would simply have
been executed in sequence when name.cgi was called. It
seems we didn’t really need &name_main::init(); after
all. But wait! If you’ve tried to test this, now hit reload in the
browser and execute the script a second time. Something new:
“Document returned no data”. Try again — same message. Now what’s
going on?

When you have “naked” code at the top level of your pulled-in
package, outside of any subroutine, it runs only when the file is
actually loaded. Perl’s require loads a given file just once per
interpreter and skips it on later calls, much as a BEGIN block
runs only once, at compile time. Under mod_perl the interpreter
lives as long as the Apache child, so that usually translates
into once per child process, on first invocation. The solution,
as we’ve already seen, is not to code this way, but to enclose your
control code inside a subroutine within its package, and call this
subroutine from the mod_perl script, exactly as we did in
name.cgi.
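
Plain Perl shows the same once-only behavior. In this sketch a throwaway library file is written to a temporary directory, and a loop plays the part of three requests arriving at one child process; the file name naked_lib.pl and the counter $main::runs are invented for the demonstration.

```perl
use strict;
use File::Temp qw(tempdir);
use File::Spec;

# Write a tiny library whose top-level ("naked") code bumps a counter.
my $dir = tempdir(CLEANUP => 1);
my $lib = File::Spec->catfile($dir, "naked_lib.pl");
open(my $fh, '>', $lib) or die "cannot write $lib: $!";
print $fh '$main::runs++;', "\n", '1;', "\n";
close($fh) or die "cannot close $lib: $!";

$main::runs = 0;

# Three "requests" handled by the same long-lived interpreter.
for my $request (1 .. 3) {
    require $lib;    # only the first call loads and runs the file
}

print "top-level code ran $main::runs time(s)\n";
```

The counter ends up at 1, not 3. After the first require the file is recorded in %INC, and later calls return immediately without running anything.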

Stubborn, Too

Not only will mod_perl conveniently “forget” to rerun top-level code
pulled in via require, it will also tend to ignore changes you make
to your library packages or modules. Return to the script we’ve
created, name.cgi with the library name_lib.pl
pulled in via a require(). Suppose the script doesn’t
work, or the script does work but we want to make changes to its
output. So, we fire up the trusty editor and hack away at
name_lib.pl. Heading back to the browser, hit reload with
great anticipation, and what happens?

What happens is whatever happened the previous time you ran the
script, because mod_perl doesn’t see any of the changes you’ve
slaved over! The stubborn little thing does not check to see if
name_lib.pl has changed, and so when name.cgi is
invoked again it simply pulls in the compiled name_lib.pl
that it’s been using all along.

One way to shock mod_perl into its senses is the drastic method,
akin to cold-water-in-the-face: kill or restart the Apache server.
For instance (modify to match your installation):

/usr/local/apache/bin/apachectl restart

That will force it to re-load any requested files from the disk.
This method works fine in testing, but assuming you make many small
changes to your script during development, you’ll quickly find
restarting the server every time you want to test a change rather
tedious. No, not coal mining tedious, or transcontinental railroad
spike pounding tedious, but tedious nonetheless. Sometimes it’s the
little things.

Our hero is named StatINC, which is an Apache Perl
module that will effectively check the files you pull in via a
require() to see if they’ve been updated, every time your
mod_perl script is invoked. To enable StatINC, you’ll need
to modify the Apache server configuration file,
httpd.conf, found in /path/to/apache/conf/
(modify to match your installation).
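
To see what such a reload involves, here is a rough sketch of the underlying idea in plain Perl. It is not the actual code of Apache::StatINC; the library greet_lib.pl and the helper write_lib() are hypothetical, created only for this demonstration. Perl records every file it has loaded in the %INC hash, and deleting a file’s entry makes the next require load it again.

```perl
use strict;
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);
my $lib = File::Spec->catfile($dir, "greet_lib.pl");

# Hypothetical helper: (re)write the library with a given greeting.
sub write_lib {
    my ($greeting) = @_;
    open(my $fh, '>', $lib) or die "cannot write $lib: $!";
    print $fh "package greet_lib;\n";
    print $fh "sub greeting { return '$greeting' }\n";
    print $fh "1;\n";
    close($fh) or die "cannot close $lib: $!";
}

write_lib("hello");
require $lib;
my $before = greet_lib::greeting();    # the first version

write_lib("goodbye");                  # "edit" the file on disk
require $lib;                          # no effect: already in %INC
my $stale = greet_lib::greeting();

delete $INC{$lib};                     # forget that it was loaded
require $lib;                          # now the new version is compiled
my $fresh = greet_lib::greeting();

print "$before / $stale / $fresh\n";
```

The script prints “hello / hello / goodbye”: the edit goes unnoticed until the %INC entry is removed.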

Find the portion of your httpd.conf file where mod_perl
is configured; using last month’s example, that section might look
like:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
<Location /cgi-perl>
 SetHandler perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 PerlSendHeader On
</Location>

Now, we need to add three lines to the above (or whatever
configuration is similar to the above in your httpd.conf):
the PerlModule line at the top, and the PerlInitHandler and
PerlSetVar lines at the bottom.

PerlModule Apache::StatINC
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
<Location /cgi-perl>
 SetHandler perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 PerlSendHeader On
 PerlInitHandler Apache::StatINC
 PerlSetVar StatINCDebug On
</Location>

With StatINC enabled with debugging on, as seen above,
the Apache errorlog will now contain messages when a change to one
of your require’d script files is detected:

Apache::StatINC: process 15420 
 reloading /home/username/cgi-bin/test_lib.pl

Conclusion

Every Perl script is different (except plagiarized ones!), but
we’ve looked at some general guidelines for common traps and
hangups many coders run into attempting to write or migrate scripts
to the mod_perl environment. It may seem like a lot of work, but in
most cases the mod_perl constraints result in better coding
practice, and a deeper understanding of Perl — which must be worth
something! Remember though that mod_perl is not literally about
functionality, but optimization — if modifying large, hairy
existing scripts seems unrealistic, it might be worth considering
whether to use mod_perl at all for those scripts.

In a worst case scenario, if you simply cannot modify an old or
beastly script to work properly in the mod_perl fantasy bubble
world, all efficiency is not lost. One simple change to your Apache
httpd.conf can salvage the effort — to some extent.
Change the line

PerlHandler Apache::Registry

to:

PerlHandler Apache::PerlRun

The above change gives up most of the mod_perl fantasy bubble
world and executes your Perl scripts much more like the
old-fashioned way: Apache::PerlRun recompiles the script on every
request instead of caching it. The exception is that the
interpreter is still built into the web server and therefore
doesn’t need to be launched as a separate process. You certainly
lose many optimization possibilities in resorting to the PerlRun
module, but in some cases any execution is better than none.
Again, this is really a last resort option, and is only a small
improvement over not using mod_perl at all.

Additional Resources

The mod_perl Guide
by Stas Bekman is a most thorough exploration of mod_perl.
mod_perl_traps
is a concise and clear summary of many issues we’ve seen this
month.