WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl
Apr 11, 2000, 18:45
(Other stories by Aaron Weiss)
% cd /usr/src
% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
% tar xzvf apache_x.x.x.tar.gz
% tar xzvf mod_perl-x.xx.tar.gz
% cd mod_perl-x.xx
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
    DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
% make && make test && make install
% cd ../apache_x.x.x
% make install
As illustrated, you simply unpack the Apache and mod_perl sources into their respective subdirectories, then change into the mod_perl source directory and run the "perl Makefile.PL" command shown above. This tells the build where to find the Apache sources and which options to build in -- the above routine defaults to "everything", which is satisfactory for most uses and certainly for a first-time experience. Finally, the sources are all built while your computer churns and smokes for a few minutes, and installed into place, typically /usr/local/apache.
Assuming a /usr/local/apache destination, the new httpd (the binary for the Apache server) will be found in /usr/local/apache/bin.
If you've previously compiled an Apache server you may have noticed that the typical httpd size is between 300K and 400K. Now, with mod_perl integrated, the httpd has ballooned to over 1 megabyte. Perl is, you can see, as William Shatner would shill, "big! really big!". This brings us to the subject of tradeoffs.
Life is a box of compromises. Buffalo wings and cheesecake are a swell meal, but make you fatter. Chicken broth and celery stalks are slimming and dull. And so it is with the Apache web server, which is much more robust with a belly full of Perl. The trouble in the henhouse is that Apache, as we discussed, is pre-forking -- which means that a fat parent server will spawn fat children. Several of them. Isn't that always the way. That's the cost of doing business when you want to execute heavy Perl scripts with aplomb, but most web sites are composed of more than simply Perl scripts -- such as static web pages. And a static web page is like a sheet of paper, lightweight. Unfortunately, if your site is running mod_perl and has many static pages to serve in addition to Perl scripts, that is one fat child process running around carrying a tiny load.
So it's a battle of inefficiencies: vanilla Apache is inefficient at executing Perl scripts via CGI, while an Apache beefed up with mod_perl is inefficient at serving simple web pages. You need to consider the general breakdown of pages served by your site -- are we looking at 90% Perl scripts vs. 10% simple pages, or 10% Perl scripts vs. 90% simple pages? Likely somewhere in between. At the extremes, your best bet is the server that is most efficient most of the time. In a scenario where 10% of your requests trigger Perl scripts, it might be justifiable to live with the relative penalty of CGI for the benefit of a small and compact server process, allowing for more simultaneous visitors in a given amount of memory. If you serve relatively few simple pages, the advantages of a beefy mod_perl server will outweigh the penalty of a few extra, though large, processes. Many readers find themselves somewhere between these two poles, though -- say, 30/70 or 40/60 or 50/50.
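To make the memory side of this tradeoff concrete, here is a back-of-the-envelope sketch in Perl. Every figure in it (the memory budget and both per-child sizes) is hypothetical and chosen only to show the arithmetic; it also ignores memory pages shared between children, so measure your own processes before drawing conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All figures are hypothetical, chosen only to make the arithmetic visible.
my $ram_for_apache_mb = 256;   # memory you are willing to give to httpd children
my $slim_child_mb     = 2;     # assumed size of a plain Apache child
my $fat_child_mb      = 10;    # assumed size of a mod_perl child

# How many children of a given size fit in the budget (rounded down).
sub max_children {
    my ($budget_mb, $child_mb) = @_;
    return int($budget_mb / $child_mb);
}

printf "slim children: %d\n", max_children($ram_for_apache_mb, $slim_child_mb);  # 128
printf "fat children:  %d\n", max_children($ram_for_apache_mb, $fat_child_mb);   # 25
```

With these made-up numbers the slim server can keep roughly five times as many children alive, which is the "more simultaneous visitors" side of the bargain.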
A nifty solution to this quandary is to run two Apache servers. One Apache server is the small, compact vanilla version while the other is the robust and hefty mod_perl-enabled Apache server. Incoming requests are then routed to the mod_perl server when Perl scripts are required, while simple page requests are handled by the lightweight server. Elegant enough, but the devil is in the details. Ultimately, this is the preferred solution when you can't justify serving all content from either a slim or fat Apache server, but it has its own pitfalls. You'll need to maintain a separate installation tree for each Apache server, including separate configuration files, and each server will spit out its own log files, making the job of analyzing traffic a bit more complicated. The mod_perl server is typically configured to listen on an alternate network port, such as 8080, but you don't want end users to see this -- all pages should appear to come from one server lest problems arise with firewalls, bookmarks, and so on. This is solved by employing internal proxying within the slim Apache server's configuration file, to pass requests for Perl scripts to the mod_perl server "behind the scenes". That's the short of it -- the long is simply too long and too off-topic for this article, but we again direct you to Stas Bekman's thorough coverage of multiple server arrangements.
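As a rough sketch of the proxying idea, the slim server's httpd.conf might contain something like the fragment below. It assumes that mod_proxy is compiled into the slim server, that the mod_perl server runs on the same machine on port 8080, and that it serves its scripts under /cgi-perl/; adjust all three to your own layout.

```apache
# httpd.conf fragment for the slim front-end server.
# Assumptions: mod_proxy is available, and the mod_perl server
# listens on localhost:8080 and serves scripts under /cgi-perl/.

# Hand any request under /cgi-perl/ to the mod_perl server.
ProxyPass        /cgi-perl/ http://localhost:8080/cgi-perl/

# Rewrite redirects sent by the mod_perl server so that port 8080
# never leaks out to the end user.
ProxyPassReverse /cgi-perl/ http://localhost:8080/cgi-perl/
```

Everything else, static pages included, continues to be served directly by the slim server.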
For the sake of simplicity in this introduction, we'll assume a single Apache server which is mod_perl enabled, even if this is not the ideal architecture for sites with lots of static content. The Apache server is configured, prior to launch, in the very long but well-commented httpd.conf file which, in a default installation, is found in the /usr/local/apache/conf subdirectory. Once again, and not to pass the buck too often, Apache server configuration is a career unto itself, so we will focus only on configuration of the mod_perl aspect.
Simply put, we want to tell Apache to process Perl scripts via the Apache::Registry module, which is mod_perl's pseudo-CGI environment. This allows us to run Perl scripts written for a typical CGI environment (such as using the CGI.pm module) under mod_perl, which is technically not a CGI extension.
The default httpd.conf file installed with Apache is not configured to use mod_perl; instead, it is configured to execute scripts via CGI. You will probably find a configuration directive in your httpd.conf file that looks something like:
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
This directive tells Apache that any files in the relative path /cgi-bin/ should be considered scripts, and launched accordingly. You need to consider whether all scripts on your web site will be Perl and handled by mod_perl, or whether there are other scripts that may still need to execute via CGI. The safest approach is to retain at least one subdirectory for traditional old-style CGI scripts and one subdirectory for your mod_perl Perl scripts. The ScriptAlias directive above must only point to a path with CGI scripts, and not to the path where you want Perl scripts executed from. Let's say, then, that you create a new path -- /usr/local/apache/cgi-perl/ for your mod_perl enabled scripts.
Of course, if you are running mod_perl scripts exclusively, you could simply comment out the ScriptAlias directive by preceding it with a pound symbol (#), and simply use the cgi-bin/ path for your Perl scripts.
Now we're ready to add mod_perl specific configuration directives. If you scroll through the httpd.conf file, you'll find a section which contains the commented heading "Aliases: Add here as many aliases as you need ...". It's easiest to scroll down towards the end of this section, just before it is closed with the </IfModule> tag, and add our new alias here.
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
<Location /cgi-perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
</Location>
Above, we define an alias, linking /cgi-perl/ to the system path /usr/local/apache/cgi-perl/. The <Location> directive references this alias and defines a number of attributes for it. First, we tell Apache to let mod_perl handle these files via the SetHandler directive, and we tell mod_perl to handle them using its Apache::Registry module. The Registry module is basically the star of the show here, as it is what handles emulating a CGI environment and compiling/caching the Perl code. We tell Apache to treat these files as executable via the ExecCGI option; without it, Apache::Registry refuses to run them and the request is answered with a "Forbidden" error. Finally, we tell Apache to send an HTTP header to the browser on script execution -- this is not strictly necessary if your Perl script is well behaved about sending the header itself, such as by the CGI->header() method of the CGI.pm module.
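The "compile once, then reuse" idea behind Apache::Registry can be illustrated with a few lines of plain Perl. This is a deliberately simplified sketch and not the module's actual code: the run_script function, the %compiled cache, and the hello.pl name are all invented for the illustration, and the real module does considerably more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %compiled;            # script name => compiled code reference
my $compile_count = 0;   # how many times we actually had to compile

# Run a "script" given as source text, compiling it only on first use.
sub run_script {
    my ($name, $source) = @_;
    unless ($compiled{$name}) {
        $compile_count++;
        # Wrap the source in an anonymous subroutine and compile it.
        my $code = eval "sub { $source }";
        die "compile failed for $name: $@" unless $code;
        $compiled{$name} = $code;
    }
    return $compiled{$name}->();
}

my $source = q{ return "Hello from a cached script" };
run_script('hello.pl', $source) for 1 .. 3;

print "requests served: 3, compilations: $compile_count\n";
# prints "requests served: 3, compilations: 1"
```

The first call pays the compilation cost; the next two simply call the cached subroutine, which is the saving each Apache child enjoys after its first request for a given script.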
Our mod_perl Apache server is ready to serve. That's the good news. But, like any high performance piece of machinery, mod_perl is not going to provide its optimum benefits right out of the box like this. Before you're ready to tweak and tune, however, it's important to get used to developing scripts in the mod_perl environment (and for better or worse, there is a lot of tweaking and tuning that can be done under the hood). Of course, you'll want to save your Perl scripts to the system directory aliased to /cgi-perl/ or whatever name you chose. Whether you are adapting existing scripts or writing anew, your Perl should interact with the browser just as you did before, via the CGI.pm module, which we looked at way back in Part 2 of the Perl You Need to Know. You can retrieve parameters and send output to the browser just as before, but keep in mind that although we continue to use the label "CGI" as a manner of speaking, scripts executed by mod_perl are not technically using the CGI extension.
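For reference, here is a minimal sketch of the kind of CGI.pm script this paragraph has in mind. It assumes that CGI.pm is installed and that the script is saved in the directory aliased to /cgi-perl/; the hello.pl file name and the "name" parameter are made up for the example.

```perl
#!/usr/bin/perl
# hello.pl (hypothetical name), saved in the directory aliased to /cgi-perl/
use strict;
use warnings;
use CGI;

my $q = CGI->new;

# Read a form or query-string parameter, falling back to a default.
my $name = $q->param('name');
$name = 'world' unless defined $name && length $name;

# Send the HTTP header and a small HTML page, just as under plain CGI.
print $q->header('text/html'),
      $q->start_html('Hello'),
      $q->h1('Hello, ' . $q->escapeHTML($name)),
      $q->end_html;
```

A request such as /cgi-perl/hello.pl?name=Aaron would greet Aaron; nothing in the script itself knows or cares that mod_perl, rather than CGI, is running it.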
Although many Perl scripts will run as-is in the mod_perl environment, you are not yet taking full advantage of mod_perl's benefits. We'll close out this month's installment looking at pre-loading Perl modules. Next month we'll look some more at optimizations, and also at some thorny pitfalls in coding practice that could undermine Perl scripts that otherwise work fine outside of mod_perl.
Your Perl scripts most probably begin by linking in some modules via the use() statement. At the least, you probably have:
#!/usr/bin/perl
use CGI;
Because your script invocations will likely keep using many of the same modules, one mod_perl optimization is to pre-load these modules, allowing mod_perl to compile them once and keep them resident in memory. Future script executions do not then need to recompile these modules, shaving a few more milliseconds off total execution time. The typical way you can pre-load Perl modules is with the PerlModule directive, which you can place in Apache's httpd.conf file along with your other mod_perl directives:
Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
PerlModule CGI
<Location /cgi-perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
</Location>
You can list any other Perl modules you wish to pre-load in the one PerlModule directive, simply separated by spaces. There is a slightly more sophisticated method of pre-loading modules that involves using the PerlRequire directive to load a short script that contains "use ()" statements for each module -- this is not a necessary step to begin with, but is nicely illustrated in Vivek Khera's mod_perl_tuning document.
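A PerlRequire startup script along these lines might look like the sketch below. The startup.pl file name and its location are assumptions for the example, as is the choice of CGI as the only module to pre-load; list whichever modules your own scripts share.

```perl
# startup.pl (hypothetical name). Point a PerlRequire directive at
# wherever you save it, for example:
#   PerlRequire /usr/local/apache/conf/startup.pl
use strict;
use warnings;

# The empty parentheses mean "compile the module now, but import
# nothing into this file"; each script still does its own importing.
use CGI ();

1;  # a file loaded this way must end by returning a true value
```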
Just because you've pre-loaded a Perl module does not mean that you should forgo the "use ()" statement in your Perl script. Leave those in as they are. Perl will not waste time recompiling the module sources, but it will import necessary elements of the module into your script's namespace, allowing you to leave calls to the module unchanged in syntax within your script.
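The split between loading and importing can be seen in plain Perl, outside of mod_perl altogether. The sketch below uses List::Util purely as a stand-in because it ships with Perl; the is_loaded and is_imported helpers are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for pre-loading: compile the module, import nothing.
use List::Util ();

# %INC records every module file Perl has already compiled.
sub is_loaded   { exists $INC{'List/Util.pm'} ? 1 : 0 }

# Has max() been installed in our own (main) namespace yet?
sub is_imported { defined &main::max ? 1 : 0 }

my $before = is_imported();

# What "use List::Util qw(max);" does, spelled out at run time:
require List::Util;            # a no-op, the module is already in %INC
List::Util->import('max');     # cheap: just installs main::max

my $after = is_imported();

print "loaded=", is_loaded(), " before=$before after=$after\n";
# prints "loaded=1 before=0 after=1"
```

The require step costs nothing the second time around, while the import step is what makes max() callable by its short name. That is why the "use" lines in your scripts remain useful even when the module itself was compiled long ago.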
It is tempting and simple to walk away from an introduction to mod_perl thinking that it magically takes care of all optimizations. The magical mod_perl genie just compiles your Perl and everything is milk and honey. Not so fast! The ways in which mod_perl compiles and caches code vary depending on how it is used -- before we become immersed in details next month, go to sleep tonight with a good overview of the ways in which mod_perl can optimize Perl execution.
"Better Than Nothing" Optimization:
If your scripts will not (yet) run under Apache::Registry, substitute the module Apache::PerlRun. This simplest form of optimization will only keep the Perl interpreter inside the Apache server, saving the need to fork a Perl interpreter for each script execution. The savings here are small, but it is still faster than pure forking.
"Hands Off" Optimization:
Your script runs under Apache::Registry, but you are too busy/tired/lazy/hungry to further adapt the environment for better optimization. Mod_perl will have to compile both your modules and your scripts once per child server process. When a request is handed to a child process that has not yet invoked the script, the script and modules will need to be compiled again. Subsequent requests served by that child can rely on the already-compiled code in memory.
"Typical" Optimization:
You pre-load your Perl modules by including them via a PerlModule or PerlRequire directive in Apache's httpd.conf file. In doing so, the Apache parent now possesses the compiled code for these modules in memory. When a child is spawned, that child inherits the pre-compiled modules, but the child must still compile the main Perl script itself once per process. The compiled script is cached for future requests handled by this child. This is probably the most common level of mod_perl optimization, as it balances Apache parent size against compilation time.
"Extreme but Bloated" Optimization:
It is even possible to give all the Perl code to Apache, compiled into the parent process. Each child would then have no code to compile, as it would inherit all pre-compiled code from the parent. While potentially fastest, this also consumes the most memory, as each child possesses compiled code that it may not use, depending on how many scripts the site uses. Certain benchmarks indicate that the time saved versus Typical Optimization is not generally worth the extra memory consumption, but there are probably exceptions which would benefit from Extreme but Bloated Optimization.