Linux Today: Linux News On Internet Time.

WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl Part 3

Jun 20, 2000, 08:25

By Aaron Weiss, WDVL.com

Like all gripping tales of conquering heroes and/or Godfathers, our story of mod_perl is a trilogy. We conclude the mod_perl trilogy with some tricks and sleight-of-hand that squeeze a few extra processor cycles out of your web server, otherwise known as optimizations. Specifically, we'll look at ways to exploit the mod_perl environment to optimize the use and, more importantly, the reuse of Perl scripts and database requests.

Reuse and Recycle

If your mod_perl-enabled Apache server only needed to execute your Perl scripts one time, forever, we probably wouldn't need this article. Or mod_perl. Or the Internet, for that matter. So it's established that your scripts need to run repeatedly, whether "repeatedly" means once per day, once per minute, or ten times per second. Nobody likes doing the same thing over and over, Apache included; it's silly and inefficient to repeat an entire task every time you need to reach a certain goal. This brings us to the core purpose of optimization: do the expensive setup work once, and reuse it as often as possible.

Suppose that you need to wash a sinkful of dirty dishes. You'll need hot water from the tap to fill the sink and an open bottle of dishwashing detergent, preferably the kind that leaves your hands silky smooth. You could turn the tap on and off for each dish, waiting for the water to get hot each time, and close and re-open the detergent bottle for each dish, and so on. An onlooker would probably observe that you're not the sharpest tool in the shed, and they'd be right. But without optimizations, this is exactly what we make Apache do every time it executes a Perl script: turn the tap on and off for each dish, wait for the water to get hot, re-open the detergent bottle, and so forth.

To get the most out of mod_perl, we need to think about how mod_perl re-uses compiled code, and design a strategy that best balances the efficiencies of re-using compiled code versus the costs in memory consumption. Understanding mod_perl in this way means understanding the parent-child relationship, which we mean in a purely server-centric way, and not at all in a Dr. Laura/John Bradshaw/Benjamin Spock sort of way.

Who's Your Daddy?

The Apache web server behaves like Wilt Chamberlain, spawning many children. It is a single parent, expected to raise enough children to handle the average load of incoming HTTP requests. Unlike human babies, an Apache child is "born" with all the knowledge of its parent. So, the more know-how, or compiled Perl in this case, we stuff into the parent server, the more its children know at birth. This can be a good thing, but it can also be taken too far -- if daddy has a big full head, then every baby also has a big full head, and with many babies that can eat up a lot of RAM.
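The "born with all the knowledge of its parent" behavior comes from the Unix fork() call, which starts each child as a copy of the parent process. The following plain-Perl sketch (no Apache involved) shows the effect; expensive_setup and the data it returns are hypothetical stand-ins for modules compiled at server startup.

```perl
use strict;
use warnings;

# Hypothetical stand-in for costly startup work (e.g. compiling modules).
my $setup_runs = 0;
sub expensive_setup {
    $setup_runs++;
    return { greeting => 'knowledge from the parent' };
}

# The parent does the work once, before any child exists.
my $knowledge = expensive_setup();

# Spawn three children; each starts as a copy of the parent's memory.
my @pids;
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: the data is already present, and setup is not repeated.
        my $ok = $knowledge->{greeting} eq 'knowledge from the parent'
              && $setup_runs == 1;
        exit($ok ? 0 : 1);
    }
    push @pids, $pid;
}

# Parent: reap each child and record its exit code.
my @exit_codes;
for my $pid (@pids) {
    waitpid($pid, 0);
    push @exit_codes, $? >> 8;
}
print "setup ran $setup_runs time(s); child exit codes: @exit_codes\n";
```

On systems whose fork() uses copy-on-write, the inherited pages start out shared between parent and children rather than physically duplicated; a page is copied only when one of the processes writes to it.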

Apache's parent process is created when the Apache server starts and evaluates its httpd.conf configuration file. This is our opportunity to decide what knowledge the parent will pass on to its children. You may recall a typical set of mod_perl configuration directives placed in the httpd.conf file of an Apache server compiled with mod_perl:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"
<Location /cgi-perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
</Location>

Under the above configuration, all scripts in /usr/local/apache/cgi-perl/ will be handled by mod_perl. In this minimal configuration, the Apache parent server contains the Perl interpreter, but that is the only knowledge it has or passes on to its spawned children. The children themselves must compile any Perl scripts, along with any modules those scripts require. Remember that a child process compiles each Perl script only once during its lifetime, on the first request for that script. But there may be many child processes, spawning and dying, so this can still add up to quite a few compilations of every script and module.
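The compile-once-per-child behavior can be sketched in plain Perl. This is a simplified illustration of the idea behind Apache::Registry, not its actual code: the %source table, the script name and the run_script helper are all hypothetical. Each process keeps its own cache of compiled subroutines, so only the first request for a given script pays the compilation cost.

```perl
use strict;
use warnings;

# Hypothetical script sources, standing in for files in cgi-perl/.
my %source = (
    'hello.pl' => 'my ($who) = @_; return "Hello, $who";',
);

# Per-process cache: script name => compiled code reference.
my %compiled;
my $compilations = 0;

sub run_script {
    my ($name, @args) = @_;
    unless ($compiled{$name}) {
        # First request for this script in this process: compile it.
        my $code = eval "sub { $source{$name} }";
        die "cannot compile $name: $@" unless $code;
        $compiled{$name} = $code;
        $compilations++;
    }
    # Every later request reuses the already compiled subroutine.
    return $compiled{$name}->(@args);
}

my @replies = map { run_script('hello.pl', "visitor $_") } 1 .. 5;
print "$replies[0]\n";
print "5 requests, $compilations compilation(s)\n";
```

Because the cache lives in each process's own memory, a freshly spawned child starts with an empty cache, which is why many short-lived children mean many compilations.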

The PerlRequire directive lets us add more knowledge to the Apache parent server, all of which is passed on to its children. The recommended procedure is to PerlRequire a Perl script, inside which we specify the modules and other code to build into the parent server:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/"

PerlRequire /usr/local/apache/cgi-perl/startup.pl
<Location /cgi-perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
</Location>
Let's consider a sample startup.pl script, which fills the Apache parent server with knowledge:

#! /usr/bin/perl
use strict;
use Apache::Registry ();
# pull in the Perl modules and packages we want to compile once;
# each spawned Apache child will inherit this knowledge
use CGI (); CGI->compile(':all');
use DBI ();
use DBD::mysql ();

require "/usr/local/apache/cgi-perl/fastdb.pl";

# a file loaded with PerlRequire must return a true value
1;

Our startup.pl is typical for a server that executes Perl scripts performing database queries, a common scenario. The thinking is that all of our Perl scripts running under this mod_perl server will want the CGI module for easy access to environment variables and form parameters, as well as the database modules DBI and DBD::mysql, since we happen to be using the MySQL database.
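The empty parentheses in lines such as use DBI (); ask Perl to load and compile the module while importing nothing into startup.pl's own namespace. A small demonstration, using the core List::Util module only because it ships with Perl:

```perl
use strict;
use warnings;

# Load and compile List::Util, importing nothing (note the empty list).
use List::Util ();

# The module is compiled and recorded in %INC ...
my $loaded = exists $INC{'List/Util.pm'} ? 1 : 0;

# ... its functions work when called by their fully qualified names ...
my $total = List::Util::sum(1, 2, 3);

# ... but no 'sum' was imported into this package.
my $imported = defined &main::sum ? 1 : 0;

print "loaded=$loaded total=$total imported=$imported\n";
```

Since %INC already lists the module, a later use or require of it (in a child, for instance) finds the entry and skips recompiling the file.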

Notice the inclusion of another Perl script, fastdb.pl. In fact, we'll soon see the value of this script, which we use to optimize database queries in several fun and fascinating ways.

To summarize, we've seen that it is wise to prepare the Apache parent server process with certain bits of "knowledge", or Perl code, which will be compiled only once and passed on to each spawned child process.

More Who's Your Daddy

Deciding just which Perl modules to pre-load into the Apache parent server, such as with our startup.pl script, is a judgment call. If, for instance, only one of your Perl scripts used the module Text::ParseWords, it would not be a great idea to pre-load this module in startup.pl, or else every single Apache child process will inherit this code. That can only lead to bloated chubby children, which simply eat more memory than they're worth.

On a similar note, it would also be possible to pre-load an entire Perl script into the Apache parent server. The obvious allure is that the whole script would only be compiled once, then inherited by each child. But like a stack of chocolate chip pancakes, the idea sounds a lot better than it is: pre-loading these scripts into the Apache parent will significantly bloat the children. Once again, bloated children eat RAM. This is especially bad because if you eat too much RAM, you risk running out of physical memory, forcing the operating system to swap to virtual memory on the hard drive. Indigestion indeed, because hitting the disk for virtual memory incurs such a large performance penalty that it will more than wipe out any gains made by pre-loading a script.

Measuring and tuning actual memory consumption is unfortunately not a simple task, and depends quite a bit on the operating system your Apache runs on. Rather than fill these pages with detailed and mind-numbing memory analyses, we must instead cop out and refer you to the very thorough, and not at all tedious, benchmarking procedures described by Stas Bekman in his mod_perl guide: Know Your Operating System.
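For a rough first look on Linux only, a process can read its own resident set size (RSS) from /proc/self/status. This sketch assumes a Linux /proc filesystem and simply reports that nothing is available elsewhere. Note that RSS also counts pages shared with the parent, so it overstates what each child really costs; the guide cited above describes more accurate methods.

```perl
use strict;
use warnings;

# Return this process's resident set size in kB, or undef when
# /proc/self/status is unavailable (i.e. not Linux).
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

my $before = rss_kb();
require Data::Dumper;    # a core module, loaded only to have something to measure
my $after = rss_kb();

if (defined $before && defined $after) {
    printf "RSS changed by %d kB after loading Data::Dumper\n", $after - $before;
} else {
    print "No /proc/self/status here; use your operating system's own tools\n";
}
```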

Database Savoir Faire

Earlier we saw in our pre-loading script, startup.pl, a reference to fastdb.pl. This mysterious script contains a variety of optimizations which significantly grease the wheel of database queries between our Perl scripts and the database management system, and is a recommended approach if your Perl scripts perform any more than a smidgen of database queries.

Unfortunately, without these optimizations, database queries are horrifyingly inefficient. Often, each script invocation must establish a connection to the database, which is itself a very costly process. Beyond that, statement handles must be created, and SQL statements must be parsed and compiled by the database driver. The more often you repeat similar queries, the more wasteful this whole process becomes; recall our dishwashing example. Or, to use a different analogy, imagine unlocking and entering a toolshed to retrieve a single tool, and then repeating this process in its entirety for each of 10 individual tools.

The optimizations we've used in fastdb.pl are inspired by, and only slightly modified from, the work of Jeffrey W. Baker, who has nicely integrated a number of different database optimizations. The first and foremost database optimization is the persistent connection. As we said earlier, it is very costly to re-establish a connection to the database every time Apache executes a Perl script. Large savings come from opening the connection once per Apache child process and keeping it open for the life of that child, so that only the first request each child handles pays the connection cost. (The connection should not be opened in the parent server itself: every child would then inherit the same open connection, and children sharing one connection would trip over one another.) In effect, this is like unlocking and opening the toolshed door once, and leaving it open for subsequent trips. Makes sense!

Sending an SQL statement to a database also involves a number of steps, including creating a statement handle and having the database driver compile the SQL statement. An elegant way to optimize the compilation of SQL statements is to construct a statement "map" for the database driver, which maps out what a particular SQL statement will look like. This allows the driver to compile the statement once, and re-use it on subsequent calls of the same statement. For example, suppose a typical query performed by your Perl script requests the name of the user with a known ID value:

select FIRSTNAME, LASTNAME
from USERTABLE
where ID = ?

Notice the question mark, which marks a placeholder in this SQL statement. The database driver can compile this statement map and, when necessary, insert the actual value for ID at execution time by virtue of parameter binding, a feature of the DBI module that lets you assign a specific value to a placeholder.
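To make "assigning a value to a placeholder" concrete, here is a toy version of what a driver that emulates placeholders has to do: fill each ? with a safely quoted value. The bind_placeholders helper is hypothetical and deliberately simplified; it is not DBI code, and in real scripts you let DBI do this for you.

```perl
use strict;
use warnings;

# Toy illustration only: fill each '?' placeholder with a quoted value,
# doubling embedded single quotes (standard SQL quoting). Real drivers use
# their database's own quoting rules (see DBI's quote method) or send the
# values to the server separately, and they must also cope with '?'
# characters inside string literals, which this sketch does not.
sub bind_placeholders {
    my ($sql, @values) = @_;
    # Split on '?'; the -1 limit keeps a trailing empty piece.
    my @pieces = split /\?/, $sql, -1;
    my $slots  = @pieces - 1;
    die "expected $slots value(s), got " . scalar(@values) . "\n"
        unless $slots == @values;
    my $bound = shift(@pieces);
    for my $value (@values) {
        (my $quoted = $value) =~ s/'/''/g;
        $bound .= "'$quoted'" . shift(@pieces);
    }
    return $bound;
}

# One statement map, executed with two different bound values.
my $map   = 'select FIRSTNAME, LASTNAME from USERTABLE where ID = ?';
my $first = bind_placeholders($map, 42);
my $other = bind_placeholders($map, "7' or '1'='1");
print "$first\n$other\n";
```

The second call shows why binding beats pasting values into SQL by hand: the hostile-looking input stays inside a single string literal instead of changing the query.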

Using statement mapping, we can pre-construct the common SQL statements that our script may need in the fastdb.pl script, both compiling these statement maps in the database driver and establishing statement handles that can be used "ready-to-wear" from within the Perl script. It all sounds like a lot of nice talk, so let's look at fastdb.pl and see how this works in action.


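package MyPerl::FastDB;
#This package opens a persistent connection to the database
#(instead of using Apache::DBI), and prepares the primary SELECT
#statements we use, assigning each to a scalar statement handle.
#Optimization adapted heavily from Jeffrey W. Baker's guide to mod_perl performance

use strict;
use DBI;

sub connect {
   #reuse the existing connection if it still answers a ping
   #(depending on the driver, ping may return false or die)
   if (defined $MyPerl::FastDB::conn) {
      my $alive = eval { $MyPerl::FastDB::conn->ping };
      if (!$@ && $alive) {
         return $MyPerl::FastDB::conn;
      }
   }

   #otherwise open a new connection; the DSN, user name and
   #password below are placeholders for your own settings
   $MyPerl::FastDB::conn = DBI->connect('DBI:mysql:database=DBNAME',
                                        'DBUSER', 'DBPASSWORD')
                                      || die $DBI::errstr;

   #statement handles belong to a connection, so (re)prepare them here
   #get username from ID query
   $MyPerl::FastDB::selectUserFromID = $MyPerl::FastDB::conn->prepare(q{
      select FIRSTNAME, LASTNAME
      from USERTABLE
      where ID = ?
   });
   #get ID from username query (handle name is illustrative)
   $MyPerl::FastDB::selectIDFromUser = $MyPerl::FastDB::conn->prepare(q{
      select ID
      from USERTABLE
      where FIRSTNAME = ?
      and LASTNAME = ?
   });

   return $MyPerl::FastDB::conn;
}

1;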

As a file loaded with require, fastdb.pl must end with a true value, hence the 1; on the last line. Moving to the top, the connect subroutine checks whether a connection already exists and still responds to a ping; if so, it returns that connection, and otherwise it establishes a new one. Our Perl scripts call this subroutine whenever they need a database handle (we'll see the code shortly).
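The reuse-or-reconnect logic in connect can be exercised without a database. In this sketch, FakeHandle is a hypothetical stand-in for a DBI database handle (only ping is modeled), and get_handle mirrors the shape of connect; neither is DBI code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI database handle: only ping() is modeled.
package FakeHandle;
my $opened = 0;
sub new    { $opened++; return bless { alive => 1 }, shift }
sub ping   { return $_[0]{alive} }
sub opened { return $opened }

package main;

my $conn;    # plays the role of $MyPerl::FastDB::conn

# Same shape as connect(): reuse a live handle, otherwise open a new one.
sub get_handle {
    if (defined $conn) {
        # ping may return false or die, depending on the driver
        my $alive = eval { $conn->ping };
        return $conn if !$@ && $alive;
    }
    $conn = FakeHandle->new;    # the real code calls DBI->connect here
    return $conn;
}

get_handle() for 1 .. 10;          # ten requests, one connection
my $after_ten = FakeHandle->opened;
$conn->{alive} = 0;                # simulate the server dropping the link
get_handle();                      # ping fails, so we reconnect
print "connections opened: $after_ten, then ", FakeHandle->opened, "\n";
```

This is also why the prepare calls in fastdb.pl sit right after DBI->connect: statement handles belong to a single connection, so they have to be prepared again whenever the connection is replaced.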

Whenever connect opens a new connection, it also pre-constructs two statement maps: one to retrieve a user's name given their ID, and one to retrieve the ID given the user's name. Of course, these statements are based on a purely fictional database that we simply imagine could exist. For each statement, we prepare the statement map and assign the resulting statement handle to a package scalar, such as $MyPerl::FastDB::selectUserFromID. We'll use these statement handles from within our Perl scripts, thereby enjoying the benefits of the pre-compiled and cached SQL code.

Building the Optimized Beast

All of what we've seen so far in this last of the mod_perl trilogy involves preparing an optimized environment. We've pre-loaded common Perl modules and prepared connections to the database as well as some specific SQL statements. Leveraging all this in the active Perl script is the key, and it's not too difficult to boot.

We'll build a very simple Perl script here, which will retrieve the user's ID as submitted from a browser, and spit out their name to the web page. The point isn't the outcome, but our implementation, whose principles you can adapt to far more complex arrangements.

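use strict;

use CGI;
use DBI;

#prepare CGI object and browser for output

my $cgiobj=CGI->new;
print $cgiobj->header;

#retrieve persistent database handle
my $dbh=MyPerl::FastDB->connect;

#retrieve form data
my $userID=$cgiobj->param("userID");

#setup statement handle and bind parameter
my $sth=$MyPerl::FastDB::selectUserFromID;
$sth->bind_param(1,$userID);

#execute SQL statement
$sth->execute || die "Statement failed: ".$dbh->errstr;

#retrieve result using column binding
#(bind_columns is called after execute for portability between drivers)
my ($firstName,$lastName);
$sth->bind_columns(\$firstName,\$lastName);

if ($sth->fetchrow_arrayref) {
   #escape the names before writing them into the page
   print "Hello, ", $cgiobj->escapeHTML("$firstName $lastName");
} else {
   print "No user found with that ID.";
}

#tidy up the reused statement handle for the next request
$sth->finish;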
Even though the CGI and DBI modules were pre-loaded into the Apache parent server, we still use them in this Perl script as usual; because they are already loaded, Perl will not compile them a second time. After we create an instance of the CGI object and output an HTTP header, we retrieve a database handle from the persistent connection into a local scalar variable, $dbh, as well as the userID value submitted from a fictional web form.

Now the juicy bits: the local scalar variable $sth is assigned the statement handle for the pre-compiled SQL statement map that we built in fastdb.pl, which queries the database for a user's name given their ID. To get this statement ready to go, we need to substitute a real value for the placeholder: calling bind_param on the statement handle binds the value of $userID to the first placeholder in the statement map. (Our map has only one placeholder, but a statement can use several, like the name-to-ID query in fastdb.pl.)

The statement is executed, and should it fail for some reason, the script dies with the error message reported by the database handle. Assuming success, we simply need to retrieve the result from the database. Here we use another optimization called column binding: we set up two local variables, $firstName and $lastName, to receive the results from the database. The bind_columns call tells the statement handle that these two variables are the destination for the results; notice that the variables are passed as references, preceded by backslashes.

The fetchrow_arrayref call, the fastest way to retrieve data with DBI, places the returned values into the bound variables, which we simply output to the web page.
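Column binding works through ordinary Perl references: the statement handle remembers references to your variables and writes each fetched value through them. The toy below imitates only the shape of DBI's bind_columns and fetchrow_arrayref; ToySth and its canned row are hypothetical, and real DBI does this work inside the driver.

```perl
use strict;
use warnings;

# Hypothetical toy statement handle holding canned rows.
package ToySth;
sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows], bound => [] }, $class;
}

# Remember references to the caller's variables.
sub bind_columns {
    my ($self, @refs) = @_;
    $self->{bound} = [@refs];
    return 1;
}

# Return the next row (undef when exhausted) and copy each column
# into its bound variable through the stored reference.
sub fetchrow_arrayref {
    my $self = shift;
    my $row = shift @{ $self->{rows} } or return undef;
    ${ $self->{bound}[$_] } = $row->[$_] for 0 .. $#{ $self->{bound} };
    return $row;
}

package main;

my $sth = ToySth->new(['Jane', 'Doe']);
my ($firstName, $lastName);
$sth->bind_columns(\$firstName, \$lastName);   # backslash takes a reference
$sth->fetchrow_arrayref;
print "Hello, $firstName $lastName\n";
```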

There's nothing miraculous about the outcome of our script -- it's very basic. And you could have coded the same functionality without ever reading this article. The difference is that the above is fast -- much faster than had we coded the same script using an unoptimized mod_perl environment, and will hold up to repeated execution more elegantly, saving processor time and computing resources, allowing for more hits more frequently.

Conclusion and Resources

We don't have the hubris to suggest that our final code is the most optimized script possible; or even that our optimizations are the be-all and end-all of tweaks and tuning. In fact, there are many more tweaks available which span the distance from specific techniques in Perl, to Apache configurations, to operating system fundamentals. Reality constrains most of us from exploring every conceivable performance boost, and so we've completed the mod_perl trilogy with common performance enhancers that provide the most bang-for-the-buck.

Jeffrey W. Baker's "Application Performance using DBI and mod_perl" nicely details and integrates much of the database optimizations we've surveyed here.

Stas Bekman's Performance Tuning section of his mod_perl guide contains thorough, gory details for those who want hard numbers and nearly every tweak imaginable.

The DBI documentation provides a concise overview of working with database calls, especially with regard to performance and statement handles.
