---

WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl

By Aaron Weiss, WDVL.com

Mod_perl, the module that makes for a happy but complex marriage
between Perl and the Apache web server, can ultimately offer
significant performance improvements in Perl-backed web sites.

Perl is a powerful and flexible backend language for web
developers, as the Perl You Need to Know series no doubt
illustrates. However, serving many pages that rely on Perl
processing can come at a cost in memory and time. This month we
introduce the wonders of mod_perl, an Apache module which integrates
Perl into the Apache web server. We’ll begin by discussing the
reasoning behind mod_perl and its uses, pros, and cons, and in
follow-up articles delve into some code-specific issues when working
in a mod_perl environment. Readers should already be familiar with
the Perl covered in the Perl You Need to Know series; furthermore,
you’ll need hands-on access to your own Apache server to employ
mod_perl.

The Story of Forks

Apache, as you may know, is a very popular web server. So
popular, in fact, that as of March 2000 Apache is believed to power
some 60% of web sites on the Internet — and, thank goodness for
open source, it’s free to boot. What an age to be alive! A web
server would be an extremely simple thing if your site only ever
attracted a single visitor at a time. With 6 billion people on this
planet, that’s rather unlikely. Instead, the web server must juggle
and serve a number of suitors simultaneously, not unlike a harried
waitress scurrying between restaurant patrons. Web servers in
general employ one of several schemes for handling incoming
requests, some schemes more efficient than others. Apache, in its
current 1.x incarnation, is what they call a pre-forking
server. This does not mean Apache is older than silverware (“the
time before forks”). Rather, it means that the parent Apache
process “spawns” (like a demon) a number of child processes that
lie in wait anticipating an incoming connection. When a request
comes in and one child is busy, another idle child handles that
request. If all the children are busy, Apache may birth more
children, depending on the server’s configuration; once the
configured maximum number of children is reached, additional
requests are queued and the end user must wait for the next
available child process.
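To make the pre-forking model concrete, here is a toy pre-forking server in Perl. It is only a sketch under stated assumptions: the pool size of three, the localhost address, and the one-line request format are all hypothetical, and Apache itself (written in C) manages its pool far more carefully, growing and shrinking it with demand. The essential shape is the same, though: the parent opens the listening socket, forks its children before any visitor arrives, and each idle child waits in accept() for the next connection.

```perl
#!/usr/bin/perl
# A toy pre-forking server: the parent forks a fixed pool of children,
# each of which waits in accept() on the socket it inherited.
use strict;
use warnings;
use IO::Socket::INET;

my $pool_size = 3;   # hypothetical; Apache sizes its pool from its configuration

# The parent opens the listening socket once, before forking.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,          # let the OS pick a free port
    Listen    => 10,
    ReuseAddr => 1,
) or die "listen: $@";
my $port = $listener->sockport;

# Pre-fork: the children exist before any visitor arrives.
my @kids;
for (1 .. $pool_size) {
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        # Child: sleep in accept() until the kernel hands over a connection.
        while (my $client = $listener->accept) {
            my $request = <$client> // '';
            print {$client} "child $$ handled: $request";
            close $client;
        }
        exit 0;
    }
    push @kids, $pid;
}

# Play a few visitors; each connection goes to whichever child is idle.
my @replies;
for my $n (1 .. 5) {
    my $sock = IO::Socket::INET->new(PeerAddr => '127.0.0.1', PeerPort => $port)
        or die "connect: $@";
    print {$sock} "request $n\n";
    push @replies, scalar <$sock>;
    close $sock;
}
print @replies;

# The parent, not the children, decides when the pool shrinks.
kill 'TERM', @kids;
waitpid($_, 0) for @kids;
```

Each reply names the process ID of the child that served it, which shows that requests are being handled by the pre-forked children rather than by the parent.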

Each child spawned takes up space and resources — namely,
memory and possibly processing time (depending on what it’s doing).
Ideally, Apache keeps just enough children alive to handle incoming
requests. If additional children must be spawned to handle a surge
of requests, Apache will ruthlessly kill them lest they lie around
forever idle, simply consuming resources. The world of Apache is a
brutal place.

How does all of this relate to Perl? Suppose a request arriving at
an Apache child process asks for, say, a CGI script. CGI programs
run outside the Apache child, which means that the child must fork
a new process to launch the CGI script. In the case of a CGI coded
in Perl, that new process is the Perl interpreter itself, because a
Perl script is not compiled ahead of time into a standalone
executable. The interpreter compiles and executes the Perl code and
returns the results to the Apache child, which then passes them
along to the visitor. Works great, except for two problems: it’s
slow, since the Perl script has to be recompiled every time it is
run, and it consumes even more memory, because a separate Perl
interpreter must be launched for each execution of the script.
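The cost described above is easy to feel. The sketch below runs a trivial, hypothetical page-generating snippet two ways: once by starting a fresh perl for each simulated request, as CGI does, and once by compiling the same code into a subroutine a single time and calling it repeatedly, which is roughly what keeping the interpreter resident makes possible. The snippet and the request count are illustrative assumptions, no web server is involved, and absolute timings will vary from machine to machine.

```perl
#!/usr/bin/perl
# Compare a fresh interpreter per request (the CGI model) with code that
# is compiled once and kept resident. The "page" code is hypothetical.
use strict;
use warnings;
use Time::HiRes qw(time);

my $page_code = 'my $t = 0; $t += $_ for 1 .. 1000; "total=$t\n"';
my $requests  = 20;

# CGI style: start a new perl process, which must compile the code again.
sub run_in_new_perl {
    my ($code) = @_;
    open(my $fh, '-|', $^X, '-e', "print do { $code };") or die "exec perl: $!";
    my $out = do { local $/; <$fh> };
    close $fh;
    return $out;
}

my $t0 = time;
my @cgi_out = map { run_in_new_perl($page_code) } 1 .. $requests;
my $cgi_secs = time - $t0;

# Resident style: compile once into a code reference, then just call it.
my $page = eval "sub { $page_code }" or die $@;
$t0 = time;
my @resident_out = map { $page->() } 1 .. $requests;
my $resident_secs = time - $t0;

printf "new interpreter per request: %.4f s for %d requests\n", $cgi_secs, $requests;
printf "compiled once, then called:  %.4f s for %d requests\n", $resident_secs, $requests;
```

Both approaches produce identical page text; expect the first timing to be much larger, because every simulated request pays for process creation and compilation.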

The above describes your standard garden-variety CGI
environment. For sites with low traffic and/or low processing
demands, CGI is easy to implement and the costs are still
reasonable (keep in mind that “slow” in computer terms is still
very, very fast in human terms). Where the CGI model begins to break
down is with sites that must process more than several simultaneous
requests for Perl scripts, and those scripts perform a variety of
activities such as database queries. A web site with these needs
will quickly become bogged down by the sheer inefficiency of CGI,
wasting memory and leaving visitors frustrated with noticeable wait
times.

Enter the Hero

One sunny (or cloudy, we just don’t know) day, a bright fellow
named Doug MacEachern resolved to marry Perl and Apache, so that
rather than interacting as two foreign independent entities, the
two would be joined in holy matrimony, with the advantages of both
combined in union, able to tackle the world till obsolescence do
they part. With a knack for hacking, but perhaps not such a gift
for names, Doug named his new hybrid mod_perl. More precisely,
mod_perl is an Apache module that embeds the Perl interpreter in
the Apache web server.

The benefits of this integration are twofold:

  1. Because the Perl interpreter is built into the Apache parent
    process, Perl scripts can be executed much more quickly. At the
    least, the Perl interpreter does not need to be launched for each
    script invocation — at best, depending on configuration, Perl
    modules and/or scripts can be wholly or partially pre-compiled and
    stored in memory. Our coverage of mod_perl will focus on making
    the most of this advantage.

  2. Another benefit of Perl integration is that the Apache server’s
    internal workings are exposed to the Perl interpreter. In short,
    this means that Perl code can intervene at any stage of request
    processing to take over, or re-implement, the way in which
    Apache handles that stage. This allows a great deal of
    customization of server behavior, but is admittedly a more
    complex and obscure endeavor than enhancing script performance,
    and we will not focus on this benefit of mod_perl in this short
    series.
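The first benefit above, compiling once and reusing the result, can be sketched in a few lines of plain Perl. This is not Apache::Registry's actual code; Registry does more (for instance, it wraps each script in a handler subroutine inside a package derived from the script's URL). The hypothetical run_script() below keeps only the core idea: read a script, compile it into a code reference on first use, cache it keyed by path, and recompile only if the file's modification time changes.

```perl
#!/usr/bin/perl
# Compile-once, call-many: a stripped-down illustration of the caching idea
# behind Apache::Registry (not its actual implementation).
use strict;
use warnings;
use File::Temp qw(tempfile);

my %cache;              # script path => { code => CODE ref, mtime => epoch }
my $compilations = 0;   # how many times we actually compiled something

sub run_script {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "cannot stat $path: $!" unless defined $mtime;

    my $entry = $cache{$path};
    unless ($entry && $entry->{mtime} == $mtime) {
        open my $fh, '<', $path or die "cannot open $path: $!";
        my $source = do { local $/; <$fh> };
        close $fh;
        # Wrap the script body in an anonymous sub and compile it once.
        my $code = eval "sub { $source }" or die "compile error in $path: $@";
        $compilations++;
        $entry = $cache{$path} = { code => $code, mtime => $mtime };
    }
    return $entry->{code}->();
}

# A hypothetical one-line "script" that returns its page text.
my ($tmp_fh, $script_path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$tmp_fh} q{my $n = 6 * 7; "answer=$n\n"};
close $tmp_fh;

print run_script($script_path) for 1 .. 3;   # compiled on the first call only
print "compilations: $compilations\n";
```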

Getting the Goods

Most sites that run Apache are based on Unix-like operating systems
such as Linux or FreeBSD, although Apache is also available for the
Windows platforms. You will need to be running an Apache web
server, preferably the newest stable release available (1.3.12 at
the time of writing) to make use of mod_perl, although there are
plug-ins similar to mod_perl for other web servers (nsapi_perl for Netscape
servers, or the commercial PerlEx by ActiveState for
O’Reilly, Microsoft, and Netscape servers).

On Apache under a Unix-like operating system, you can download
the source for mod_perl (current version is 1.22). Alternatively,
if you are familiar with the CPAN.pm module, the command
install Bundle::Apache (typed at the CPAN shell prompt) will
install mod_perl and several related Perl modules that you may or
may not wish to use. You can also install mod_perl manually from
the source and then type perldoc Bundle::Apache to view a list of
related modules that you can retrieve and install if you wish.

Apache is also available for Windows, but many Windows Perl
coders use ActiveState’s popular port, ActivePerl. This is a
problem for us here, because mod_perl will not (yet) work under
Windows with ActivePerl. There is hope: you can freely download a
fully bundled set of binaries containing Apache, mod_perl, and an
alternate port of Perl, all for Windows 95/98/NT.

While Windows users have downloaded binaries, many Unix-like
users have downloaded source code. The vagaries of compiling
anything under a Unix environment are complex, but in a typical
scenario you can rely on the built-in compilation scripts included
with Apache and mod_perl. The compilation procedure builds mod_perl
first, which in turn builds the Apache binaries; the end result will
be a new Apache httpd binary. The installation summary below is
reproduced from Stas Bekman’s thorough “mod_perl Guide”. You can
skip the first five lines if you’ve already downloaded and unpacked
the Apache and mod_perl sources (which is what these lines do).

% cd /usr/src
% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
% tar xzvf apache_x.x.x.tar.gz
% tar xzvf mod_perl-x.xx.tar.gz
% cd mod_perl-x.xx
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
    DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
% make && make test && make install
% cd ../apache_x.x.x
% make install

As illustrated, you simply need to unpack the Apache and
mod_perl sources into respective subdirectories, then change into
the mod_perl source directory and execute the “perl Makefile.PL”
command illustrated above. This tells the build scripts where to
find the Apache sources and what options to build in; the
EVERYTHING=1 option above asks for “everything”, which is
satisfactory for most uses and certainly for a first-time
installation. Finally, the sources are all built while your
computer churns and smokes for a few minutes, and installed into
place, typically /usr/local/apache.

Assuming a /usr/local/apache destination, the new httpd
(the binary for the Apache server) will be found in
/usr/local/apache/bin.

Gee, it’s huge

If you’ve previously compiled an Apache server you may have
noticed that the typical httpd size is between 300-400K. Now, with
mod_perl integrated, the httpd has ballooned to over 1 megabyte.
Perl is, you can see, as William Shatner would shill, “big! really
big!”. This brings us to the subject of tradeoffs.

Life is a box of compromises. Buffalo wings and cheesecake are a
swell meal, but make you fatter. Chicken broth and celery stalks
are slimming and dull. And so it is with the Apache web server,
which is much more robust with a belly full of Perl. The trouble in
the henhouse is that Apache, as we discussed, is pre-forking —
which means that a fat parent server will spawn fat children.
Several of them. Isn’t that always the way. That’s the cost of
doing business when you want to execute heavy Perl scripts with
aplomb, but most web sites are composed of more than simply Perl
scripts — such as static web pages. And a static web page is like
a sheet of paper, lightweight. Unfortunately, if your site is
running mod_perl and has many static pages to serve in addition to
Perl scripts, that is one fat child process running around carrying
a tiny load.

So it’s a battle of inefficiencies: vanilla Apache is
inefficient at executing Perl scripts via CGI, while a mod_perl
beefed-up Apache is inefficient at serving simple web pages. You
need to consider the general breakdown of pages served by your
site: are we looking at 90% Perl scripts vs. 10% simple pages, or
10% Perl scripts vs. 90% simple pages? Likely somewhere in between.
At the extremes, your best choice is whichever server is most
efficient for the bulk of your traffic. In a scenario where 10% of
your requests trigger Perl scripts, it might be justifiable to live
with the relative penalty of CGI for the benefit of a small and
compact server process, allowing for more simultaneous visitors in
a given amount of memory. If you serve relatively few simple pages,
the advantages of a beefy mod_perl server will outweigh the penalty
of a few extra, though large, processes. Many readers find
themselves somewhere between these two poles, though: say, 30/70,
40/60, or 50/50.
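The memory side of this tradeoff is simple arithmetic: the memory you can devote to Apache, divided by the size of one child, caps how many visitors you can serve at once. The figures below are purely hypothetical placeholders, not measurements; substitute sizes observed on your own server. The calculation also ignores memory that children share with their parent, so it errs on the pessimistic side.

```perl
#!/usr/bin/perl
# Back-of-the-envelope capacity estimate; every number here is hypothetical.
use strict;
use warnings;

my $ram_mb        = 256;   # memory available to Apache children
my $slim_child_mb = 2;     # assumed size of a vanilla Apache child
my $fat_child_mb  = 10;    # assumed size of a mod_perl Apache child

# Whole children only: a fraction of a process serves nobody.
sub max_children {
    my ($ram, $per_child) = @_;
    return int($ram / $per_child);
}

printf "slim server: about %d simultaneous children\n", max_children($ram_mb, $slim_child_mb);
printf "fat server:  about %d simultaneous children\n", max_children($ram_mb, $fat_child_mb);
```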

A nifty solution to this quandary is to run two Apache servers.
One Apache server is the small, compact vanilla version while the
other is the robust and hefty mod_perl enabled Apache server.
Incoming requests are then routed to the mod_perl server when Perl
scripts are required, while simple page requests are handled by the
lightweight server. Elegant enough, but the devil is in the
details. Ultimately, this is the preferred solution when you can’t
justify serving all content from either a slim or a fat Apache
server, but it has its own pitfalls. You’ll need to maintain a
separate installation tree for each Apache server, including
separate configuration files, and each server will spit out
separate log files, making the job of analyzing traffic a bit more
complicated. The mod_perl server is typically configured to listen
on an alternate network port, such as 8080, but you don’t want end
users to see this: all pages should appear to come from one server,
lest problems arise with firewalls, bookmarks, and so on. This is
solved by employing internal proxying within the slim Apache
server’s configuration file, to forward requests for Perl scripts
to the mod_perl server “behind the scenes”. That’s the short of it;
the long is simply too long and too off-topic for this article, but
we again direct you to Stas Bekman’s thorough coverage of multiple
server arrangements.
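The routing decision itself is simple, even if the configuration is not. The sketch below models, in plain Perl, what the slim server decides for each incoming URL under a hypothetical layout: requests under /cgi-perl/ are relayed to a mod_perl server on localhost port 8080, and everything else is served directly. In a real setup this job is done by proxy directives in the slim server's httpd.conf (mod_proxy's ProxyPass and ProxyPassReverse, for example), not by Perl code; route_request() is illustrative only.

```perl
#!/usr/bin/perl
# Models the slim front-end server's routing decision (illustrative only;
# a real setup expresses this with proxy directives in httpd.conf).
use strict;
use warnings;

# Hypothetical layout: Perl scripts live under /cgi-perl/ and the mod_perl
# server listens only on localhost port 8080.
my $perl_prefix = '/cgi-perl/';
my $backend     = 'http://localhost:8080';

sub route_request {
    my ($path) = @_;
    if (index($path, $perl_prefix) == 0) {
        # Relayed "behind the scenes"; the visitor never sees port 8080.
        return "proxy to $backend$path";
    }
    return "serve $path locally";   # static content stays on the slim server
}

print route_request($_), "\n" for '/index.html', '/images/logo.gif', '/cgi-perl/search.pl';
```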

Basic Configuration

For the sake of simplicity in this introduction, we’ll assume a
single Apache server which is mod_perl enabled, even if this is not
the ideal architecture for sites with lots of static content. The
Apache server is configured, prior to launch, in the very long but
well-commented httpd.conf file which, in a default
installation, is found in the /usr/local/apache/conf
subdirectory. Once again, and not to pass the buck too often,
Apache server configuration is a career unto itself, so we will
focus only on configuration of the mod_perl aspect.

Simply put, we want to tell Apache to process Perl scripts via
the Apache::Registry module, which is mod_perl’s pseudo-CGI
environment. This allows us to run Perl scripts written for a
typical CGI environment (such as using the CGI.pm module) under
mod_perl, which is technically not a CGI extension.

The default httpd.conf file installed with Apache is
not configured to use mod_perl; instead, it is configured to
execute scripts via CGI. You will probably find a configuration
directive in your httpd.conf file that looks something
like:

ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"

This directive tells Apache that any request for a URL under
/cgi-bin/ maps to the /usr/local/apache/cgi-bin/ directory, and
that the files there should be treated as scripts and launched
accordingly. You need to consider whether all scripts on your web
site will be Perl and handled by mod_perl, or whether there are
other scripts that may still need to execute via CGI. The safest
approach is to retain at least one subdirectory for traditional
old-style CGI scripts and one subdirectory for your mod_perl Perl
scripts. The ScriptAlias directive above must
only point to a path with CGI scripts, and not to
the path where you want Perl scripts executed from. Let’s say,
then, that you create a new path —
/usr/local/apache/cgi-perl/ for your mod_perl enabled
scripts.

Of course, if you are running mod_perl scripts exclusively, you
could simply comment out the ScriptAlias directive by
preceding it with a pound symbol (#), and simply use the
cgi-bin/ path for your Perl scripts.

Now we’re ready to add mod_perl-specific configuration
directives. If you scroll through the httpd.conf file,
you’ll find a section which contains the commented heading
“Aliases: Add here as many aliases as you need …”. It’s easiest
to scroll down towards the end of this section, just before the
closing tag that ends it, and add our new alias there.

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"

<Location /cgi-perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>

Above, we define an alias, linking the URL path /cgi-perl/ to the
system path /usr/local/apache/cgi-perl/. The <Location> directive
references this alias and defines a number of attributes for it.
First, we tell Apache to let mod_perl handle these files via the
SetHandler directive, and we tell mod_perl to handle them
using its Apache::Registry module. The Registry module is
basically the star of the show here, as it is what handles
emulating a CGI environment and compiling/caching the Perl code. We
tell Apache that these files may be executed via the ExecCGI
option; without it, Apache::Registry refuses to run them and the
visitor gets a “Forbidden” error instead. Finally, we tell Apache
to send an HTTP header to the browser on script execution; this is
not strictly necessary if your Perl script is well behaved about
sending the header itself, such as by the CGI->header() method of
the CGI.pm module.

Start Your Coding

Our mod_perl Apache server is ready to serve. That’s the good
news. But, like any high performance piece of machinery, mod_perl
is not going to provide its optimum benefits right out of the box
like this. Before you’re ready to tweak and tune, however, it’s
important to get used to developing scripts in the mod_perl
environment (and for better or worse, there is a lot of
tweaking and tuning that can be done under the hood). Of course,
you’ll want to save your Perl scripts to the system directory
aliased to /cgi-perl/ or whatever name you chose. Whether
you are adapting existing scripts or writing anew, your Perl should
interact with the browser just as it did before, via the
CGI.pm module, which we looked at way back in Part 2 of
the Perl You Need to Know. You can retrieve parameters and send
output to the browser just as before, but keep in mind that
although we continue to use the label “CGI” as a manner of
speaking, scripts executed by mod_perl are not technically using
the CGI extension.
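A first script for the /cgi-perl/ directory might look like the hypothetical hello.pl below. It prints its own CGI-style header (with PerlSendHeader On, mod_perl turns such lines into a proper HTTP response header), and it reports whether it is running under mod_perl by checking the MOD_PERL environment variable, which mod_perl sets, falling back to GATEWAY_INTERFACE, which ordinary CGI sets. To stay self-contained it avoids CGI.pm; in practice you would typically let CGI.pm's header() method produce the header, as described above.

```perl
#!/usr/bin/perl
# hello.pl: a hypothetical first script for /cgi-perl/.
use strict;
use warnings;

# Build the whole response; the header block ends with a blank line.
sub page {
    my $engine = $ENV{MOD_PERL}          ? "mod_perl ($ENV{MOD_PERL})"
               : $ENV{GATEWAY_INTERFACE} ? "plain CGI ($ENV{GATEWAY_INTERFACE})"
               :                           'the command line';
    return "Content-Type: text/plain\r\n\r\n"
         . "Hello from process $$, running under $engine.\n";
}

print page();
```

As with CGI scripts, make the file executable (for example, chmod 755 hello.pl). Reloading the page a few times may show different process IDs, one for each Apache child that happens to serve the request.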

Although many Perl scripts will run as-is in the mod_perl
environment, you are not yet taking full advantage of mod_perl’s
benefits. We’ll close out this month’s installment by looking at
pre-loading Perl modules. Next month we’ll look further at
optimizations, and also at some thorny pitfalls in coding practice
that could undermine Perl scripts that otherwise work fine outside
of mod_perl.

Your Perl scripts most probably begin by linking in some modules
via the use statement. At the least, you probably have:

#!/usr/bin/perl
use CGI;

Because your script invocations will likely keep using many of
the same modules, one mod_perl optimization is to pre-load these
modules, allowing mod_perl to compile them once and keep them
resident in memory. Future script executions do not then need to
recompile these modules, shaving a few more milliseconds off total
execution time. The typical way you can pre-load Perl modules is
with the PerlModule directive, which you can place in
Apache’s httpd.conf file along with your other mod_perl
directives:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
PerlModule CGI

<Location /cgi-perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>

You can list any other Perl modules you wish to pre-load in the
one PerlModule directive, simply separated by spaces.
There is a slightly more sophisticated method of pre-loading
modules that involves using the PerlRequire directive to
load a short script that contains a “use Module ();” statement for
each module. This is not a necessary step to begin with, but is
nicely illustrated in Vivek Khera’s mod_perl_tuning document.
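The PerlRequire approach loads a file like the sketch below, once, into the Apache parent, via a line such as PerlRequire /usr/local/apache/conf/startup.pl in httpd.conf (the file name and location are hypothetical; choose your own). The module list here uses only modules bundled with Perl so the sketch runs anywhere; a real startup file would list the modules your scripts actually use, such as CGI. The empty parentheses after each module name tell Perl to compile the module without importing any of its functions into the startup file's own namespace.

```perl
# startup.pl: loaded once by the Apache parent via PerlRequire (path is your choice).
use strict;
use warnings;

# Uncomment and adjust if your own modules live outside the standard @INC
# (hypothetical path):
# use lib '/usr/local/apache/lib/perl';

# Compile each module now, importing nothing: "use Module ();".
use POSIX ();
use Data::Dumper ();
use File::Basename ();
# A real mod_perl startup file would list the modules your scripts use, e.g.:
# use CGI ();

1;    # a file loaded with require must end by returning a true value
```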

Just because you’ve pre-loaded a Perl module does not mean that
you forgo the use statement in your Perl script. Leave those in as
they are. Perl will not waste time recompiling the module sources,
but it will import the necessary elements of the module into your
script’s namespace, allowing you to leave calls to the module
unchanged in syntax within your script.
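You can watch this happen outside Apache. The sketch below writes a tiny, hypothetical Greeter module to a temporary directory, loads it once with require (standing in for the parent's PerlModule or PerlRequire pre-load), and then compiles a “script” that contains its own use Greeter statement. The module's file is loaded only once, yet the script still gets greet() imported into its own package.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a tiny hypothetical module that counts how often its file is loaded.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', File::Spec->catfile($dir, 'Greeter.pm') or die "write: $!";
print $out <<'PM';
package Greeter;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = ('greet');
our $COMPILED;
$COMPILED++;                  # runs each time this file is loaded
sub greet { "hello, $_[0]" }
1;
PM
close $out;
unshift @INC, $dir;

# The "parent" pre-load: compile the module, import nothing.
require Greeter;

# The "script": Greeter is already in %INC, so this use statement skips
# reloading the file and only calls Greeter->import('greet').
eval q{
    package MyScript;
    use Greeter qw(greet);
    print greet('mod_perl'), "\n";
    1;
} or die $@;

my $compiled = do { no warnings 'once'; $Greeter::COMPILED };
print "Greeter.pm was loaded $compiled time(s)\n";
```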

Take Home Message: Optimizations

It is tempting and simple to walk away from an introduction to
mod_perl thinking that it magically takes care of all
optimizations. The magical mod_perl genie just compiles your Perl
and everything is milk and honey. Not so fast! The ways in which
mod_perl compiles and caches code vary depending on how it is
used. Before we become immersed in details next month, go to
sleep tonight with a good overview of the ways in which mod_perl
can optimize Perl execution.

“Better Than Nothing” Optimization:

If your scripts will not (yet) run under Apache::Registry,
substitute the module Apache::PerlRun. This simplest form of
optimization will only keep the Perl interpreter inside the Apache
server, saving the need to fork a Perl interpreter for each script
execution. The savings here are small, but this is still faster
than forking a new interpreter for every request.

“Hands Off” Optimization:

Your script runs under Apache::Registry, but you are too
busy/tired/lazy/hungry to further adapt the environment for better
optimization. Mod_perl will have to compile both your modules and
your scripts once per child server process. When a request is
handed to a child process that has not yet run the script, the
script (and any modules that child has not loaded yet) must be
compiled there. Subsequent
requests served by that child can rely on the already-compiled code
in memory.

“Typical” Optimization:

You pre-load your Perl modules by including them via a
PerlModule or PerlRequire directive in Apache’s
httpd.conf file. In doing so, the Apache parent now
possesses the compiled code for these modules in memory. When a
child is spawned, that child inherits the pre-compiled modules, but
the child must still compile the main Perl script itself once per
process. The compiled script is cached for future requests handled
by this child. This is probably the most common level of mod_perl
optimization, as it balances Apache parent size against compilation
time.

“Extreme but Bloated” Optimization:

It is even possible to give all the Perl code to
Apache, compiled into the parent process. Each child would then
have no code to compile, as it would inherit all pre-compiled code
from the parent. While potentially fastest, this also consumes the
most memory, as each child possesses compiled code that it may not
use, depending on how many scripts the site uses. Certain
benchmarks indicate that the time saved versus Typical Optimization
is not generally worth the extra memory consumption, but there are
probably exceptions which would benefit from Extreme but Bloated
Optimization.
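To compare the Hands Off, Typical, and Extreme levels side by side, the toy model below counts how many compilations each one performs. Everything about it is a hypothetical simplification: four children, three scripts that share two modules, and forty requests dealt out round-robin, whereas real Apache hands each request to whichever child is free. It counts compile work only; the memory cost of the Extreme level, where every child carries every compiled script, is not modeled.

```perl
#!/usr/bin/perl
# Toy model: count compilations under the Hands Off, Typical and Extreme
# strategies. All parameters are hypothetical.
use strict;
use warnings;

my ($children, $scripts, $modules, $requests) = (4, 3, 2, 40);

sub compilations {
    my ($strategy) = @_;
    my $count = 0;
    my (%child_has_modules, %child_has_script);

    # Work done once in the parent before forking; children inherit it.
    $count += $modules            if $strategy eq 'typical';
    $count += $modules + $scripts if $strategy eq 'extreme';
    return $count if $strategy eq 'extreme';   # children have nothing left to compile

    for my $r (0 .. $requests - 1) {
        my $child  = $r % $children;   # round-robin stand-in for "next idle child"
        my $script = $r % $scripts;

        # Hands Off: each child compiles the shared modules on its first request.
        if ($strategy eq 'hands_off' && !$child_has_modules{$child}++) {
            $count += $modules;
        }
        # Hands Off and Typical: a script is compiled once per child, then cached.
        $count++ unless $child_has_script{"$child:$script"}++;
    }
    return $count;
}

printf "%-9s %2d compilations\n", $_, compilations($_) for qw(hands_off typical extreme);
```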

Additional Resources: