WDVL.com: The Perl You Need to Know Special: Introduction to mod_perl Part 3

By Aaron Weiss, WDVL.com

Like all gripping tales of conquering heroes and/or Godfathers,
our story of mod_perl is a trilogy. We conclude the mod_perl
trilogy with some tricks and sleight-of-hand with which you can
squeeze a few extra processor cycles out of your web server — also
known as optimizations. Specifically, we’ll look at some
ways to take advantage of the mod_perl environment to better
optimize usage and, more importantly, re-usage of Perl scripts and
database requests.

Reuse and Recycle

If your mod_perl-enabled Apache server only needed to execute
your Perl scripts one time, forever, we probably wouldn’t need this article.
Or mod_perl. Or the Internet, for that matter. So it’s established
that your scripts need to run repeatedly, whether “repeatedly”
means once per day, once per minute, or ten times per second.
Nobody likes doing the same thing over and over, Apache included:
it’s silly and inefficient to repeat an entire task every time you
need to reach a certain goal. This brings us to the core purpose of
optimization: avoiding needless repetition.

Suppose that you need to wash a sinkful of dirty dishes. You’ll
need hot water from the tap to fill the sink and an open bottle of
dish washing detergent, preferably the kind that leaves your hands
silky smooth. You could turn on and off the tap for each
dish, waiting for the water to get hot each time, re-close and
re-open the detergent bottle for each dish, and so on. An onlooker
would probably observe that you’re not the sharpest tool in the
shed — and they’d be right. But without optimizations, this is
exactly what we do with Apache every time a Perl script is executed:
turn on and off the tap water for each dish, wait for the water to
get hot, re-open the detergent bottle, and so forth.

To get the most out of mod_perl, we need to think about how
mod_perl re-uses compiled code, and design a strategy that best
balances the efficiencies of re-using compiled code versus the
costs in memory consumption. Understanding mod_perl in this way
means understanding the parent-child relationship, which we mean in
a purely server-centric way, and not at all in a Dr. Laura/John
Bradshaw/Benjamin Spock sort of way.

Who’s Your Daddy?

The Apache web server behaves like Wilt Chamberlain, spawning
many children. It is a single parent, expected to raise enough
children to handle the average load of incoming HTTP
requests. Unlike human babies, an Apache child is “born” with all
the knowledge of its parent. So, the more know-how, or compiled
Perl in this case, we stuff into the parent server, the more its
children know at birth. This can be a good thing, but it can also
be taken too far — if daddy has a big full head, then every baby
also has a big full head, and with many babies that can eat up a
lot of RAM.

Apache’s parent process is created at the time the Apache server
is started, and its httpd.conf configuration file is
evaluated. This is our opportunity to decide what knowledge the
parent will pass on to its children. You may recall a typical set of
mod_perl configuration directives placed in the httpd.conf
file of a mod_perl-compiled Apache server:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/" 
<Location /cgi-perl> 
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>

Under the above configuration, all scripts in
/usr/local/apache/cgi-perl/ will be handled by mod_perl.
With this simple mod_perl configuration, the Apache parent server will
contain the Perl interpreter, but that’s the only knowledge it will
have or pass on to its spawned children. The children themselves
will have to compile any Perl scripts and any modules that those
Perl scripts require. It is worth remembering that a child process
will only need to compile its Perl scripts once during its lifetime
— the first time — but there may be many child processes,
spawning and dying, and so this may still result in quite a few
compilations of all scripts and modules.

The PerlRequire directive lets us add more knowledge to
the Apache parent server, which will be passed on wholly to its
children. The recommended procedure is to PerlRequire a
Perl script, inside which we will specify which modules and such to
build into the parent server:

Alias /cgi-perl/ "/usr/local/apache/cgi-perl/" 

PerlRequire /usr/local/apache/cgi-perl/startup.pl
<Location /cgi-perl>
 SetHandler perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 PerlSendHeader On
</Location>

Let’s consider a sample startup.pl script, which fills the
Apache parent server with knowledge:

#! /usr/bin/perl
use strict;
 
use Apache::Registry ();
 
# pull in the Perl modules and packages we want to compile once,
# each spawned Apache child will inherit this knowledge
use CGI (); CGI->compile(':all');
use DBI ();
use DBD::mysql ();

require "/usr/local/apache/cgi-perl/fastdb.pl";
 
1;

Our startup.pl is typical for a server which will execute
Perl scripts which perform database queries — a common scenario.
The thinking is that all of our Perl scripts which run under this
mod_perl server will want the CGI module for easy
access to environment variables and form parameters, as well as the
database modules DBI and DBD::mysql since we happen to be using the
MySQL database.

Notice the inclusion of another Perl script, fastdb.pl.
In fact, we’ll soon see the value of this script, which we use to
optimize database queries in several fun and fascinating ways.

To summarize, we’ve seen that it is wise to prepare the Apache
parent server process with certain bits of “knowledge”, or Perl
code, which will be compiled only once and passed on to each
spawned child process.

More Who’s Your Daddy

Deciding just which Perl modules to pre-load into the Apache
parent server, such as with our startup.pl script, is a judgment
call. If, for instance, only one of your Perl scripts used the
module Text::ParseWords, it would not be a great idea to
pre-load this module in startup.pl, or else every single
Apache child process will inherit this code. That can only lead to
bloated chubby children, which simply eat more memory than they’re
worth.
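
One rough way to inform that judgment is to look at what a given script actually pulls in. The sketch below is only an illustration: it relies on Perl’s built-in %INC hash, which records every file loaded through use or require, and the helper name loaded_modules is our own invention. Run inside a candidate script, it lists the modules that script loads, which you can compare across scripts before deciding what belongs in startup.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text::ParseWords stands in for a module that only this one script needs
use Text::ParseWords ();

# hypothetical helper: turn the file names recorded in %INC
# (for example "Text/ParseWords.pm") back into module names
sub loaded_modules {
    my @names;
    for my $file (keys %INC) {
        next unless $file =~ /\.pm$/;
        (my $module = $file) =~ s{/}{::}g;
        $module =~ s/\.pm$//;
        push @names, $module;
    }
    return sort @names;
}

print "$_\n" for loaded_modules();
```

A module that shows up for nearly every script is a reasonable candidate for pre-loading; one that shows up for a single script probably is not.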

On a similar note, it would also be possible to pre-load an
entire Perl script into the Apache parent server. The obvious
allure is that the whole script would only be compiled once,
inherited by each child. But like a stack of chocolate chip
pancakes, the idea sounds a lot better than it is — pre-loading
these scripts into the Apache parent will significantly bloat the
children. Once again, bloated children eat RAM — this is
especially bad because when you eat too much RAM, you run the
gamble of running out of physical RAM, forcing the operating system
to hit virtual memory on the hard drive. Indigestion indeed,
because hitting the hard disk for virtual RAM will incur such a
large performance penalty that any gains made by pre-loading a
script will be more than wiped out.

Measuring and tuning actual memory consumption is
unfortunately not a simple task, and depends quite a bit on the
operating system your Apache runs on. Rather than fill these pages
with detailed and mind-numbing memory analysis, we must
instead cop out and refer you to the very thorough and not at all
tedious benchmarking procedures described by Stas Bekman in his
mod_perl guide: Know Your Operating System.

Database Savoir Faire

Earlier we saw in our pre-loading script, startup.pl, a
reference to fastdb.pl. This mysterious script contains a
variety of optimizations which significantly grease the wheel of
database queries between our Perl scripts and the database
management system, and is a recommended approach if your Perl
scripts perform any more than a smidgen of database queries.

Unfortunately, without these optimizations, database queries are
horrifyingly inefficient. Oftentimes, each script invocation must
establish a connection to the database, which itself is a very
costly process. Beyond that, statement handles must be established
and SQL statements need to be parsed and compiled by the database
driver. The more often you repeat similar queries, the more
inefficient this whole process becomes — recall our dishwashing
example. Or, to use a different analogy, imagine unlocking and
entering a toolshed to retrieve a single tool, and then repeating
this process in its entirety for each of 10 individual tools.

The optimizations we’ve used in fastdb.pl are inspired by, and
barely modified from, the work of Jeffrey W. Baker,
who has nicely integrated a number of different database
optimizations. The first and foremost optimization when it comes to
database queries is persistent connections. As we said
earlier, it is very costly to re-establish a connection to the
database every time a Perl script is executed by Apache. Large
savings are gained by having each Apache child process open its
database connection once and keep it open for the rest of its life,
so that every later request served by that child reuses the
already open connection. (A live connection should not be opened in
the parent and shared across forked children; each child needs its
own.) In effect, this is like unlocking and
opening the toolshed door once, and leaving it open for subsequent
entries. Makes sense!

Sending an SQL statement to a database also involves a number of steps,
including the creation of a statement handle and the compilation of
the SQL statement by the database driver. An elegant way to
optimize compilation of SQL statements is to construct a statement
“map” for the database driver, which maps out what a particular SQL
statement will look like. This allows the driver to compile the
statement once, and re-use this on subsequent calls of the same
statement. For example, suppose a typical query performed by your
Perl script requests the name of a user with a known ID value:

select FIRSTNAME,LASTNAME
from USERTABLE
where ID = ?

Notice the question mark, which marks a placeholder in this SQL
statement. The database driver can compile this statement map, and
when necessary, can insert the actual value for ID in real-time by
virtue of parameter binding, a feature of the DBI module which lets you
assign a specific value to a placeholder.
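
To make placeholders concrete, here is a minimal sketch of preparing a statement once and executing it with different bound values. It rests on assumptions that are not part of this article’s setup: it presumes the DBD::SQLite driver is installed and uses an in-memory SQLite database in place of MySQL so that it can run on its own, and the two sample rows are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# ASSUMPTION: DBD::SQLite is installed; an in-memory database stands in
# for the MySQL server so the sketch needs no external setup
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { PrintError => 0, RaiseError => 1 });

# invented sample data for the fictional USERTABLE
$dbh->do('create table USERTABLE (ID integer, FIRSTNAME text, LASTNAME text)');
$dbh->do(q{insert into USERTABLE values (1, 'Jane', 'Doe')});
$dbh->do(q{insert into USERTABLE values (2, 'John', 'Smith')});

# prepare the statement once, with a placeholder where the ID will go
my $sth = $dbh->prepare(q{
   select FIRSTNAME,LASTNAME
   from USERTABLE
   where ID = ?
});

# execute it repeatedly, binding a different value each time
my @names;
for my $id (1, 2) {
    $sth->bind_param(1, $id);
    $sth->execute;
    my ($first, $last) = $sth->fetchrow_array;
    $sth->finish;
    push @names, "$first $last";
}

print "$_\n" for @names;
$dbh->disconnect;
```

The SQL text is handed to the driver a single time; only the bound value changes between executions.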

Using statement mapping, we can pre-construct the common SQL
statements that our script may need in the fastdb.pl
script, both compiling these statement maps in the database driver
and establishing statement handles that can be used “ready-to-wear”
from within the Perl script. It all sounds like a lot of nice talk,
so let’s look at fastdb.pl and see how this works in
action.

fastdb.pl

package MyPerl::FastDB;
#This package opens a persistent connection to the database 
#(instead of using Apache::DBI), and prepares the primary SELECT
#statements we use, assigning each to a scalar statement handle.
#Optimization adapted heavily from Jeffrey W. Baker's guide to mod_perl performance

use strict;
use DBI;

sub connect {
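   #reuse the existing connection if it still answers a ping;
   #ping reports a dead connection by returning false, so test its result
   if (defined $MyPerl::FastDB::conn) {
      if (eval { $MyPerl::FastDB::conn->ping }) {
         return $MyPerl::FastDB::conn;
      }
   }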
   
   $MyPerl::FastDB::conn=DBI->connect('dbi:mysql:database_name',
                                      'username','password',
                                      {PrintError=>1,RaiseError=>1}) 
                                      || die $DBI::errstr;
   
   #get username from ID query
   $MyPerl::FastDB::selectUserFromID=$MyPerl::FastDB::conn->prepare(q{
      select FIRSTNAME,LASTNAME
      from USERTABLE
      where ID = ?
   });
   
   #get ID from username query
   $MyPerl::FastDB::selectIDfromUser=$MyPerl::FastDB::conn->prepare(q{
      select ID 
      from USERTABLE 
      where FIRSTNAME = ? 
      and LASTNAME = ? 
   });

   #hand the (re)established connection back to the caller
   return $MyPerl::FastDB::conn;
}

1;

Because it is loaded with require, fastdb.pl must end with a true value,
hence the 1; on the last line. Moving to the top, the
connect subroutine will attempt to establish a database
connection, if none exists, or else will return the existing
connection. This subroutine will be called from our Perl script
when it needs to retrieve a database handle (we’ll see the code
shortly).
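
The reuse-or-reconnect decision described above can be sketched on its own, without a database. Everything named here is hypothetical: FakeHandle is a stand-in for a real DBI handle and get_connection plays the role of the connect subroutine. The point it illustrates is that a cached handle is reused only while its ping returns true.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical stand-in for a database handle, so the sketch runs without
# a database; a real handle would come from DBI->connect
package FakeHandle;
my $opened = 0;
sub new   { my ($class) = @_; $opened++; return bless { alive => 1 }, $class }
sub ping  { return $_[0]{alive} }
sub count { return $opened }

package main;

my $conn;   # survives between calls, like a package variable in a mod_perl child

sub get_connection {
    # reuse the cached handle only if ping succeeds; the eval also
    # guards against a ping that dies instead of returning false
    return $conn if defined $conn && eval { $conn->ping };
    $conn = FakeHandle->new;   # otherwise (re)connect
    return $conn;
}

my $first  = get_connection();   # nothing cached yet: opens a connection
my $second = get_connection();   # ping succeeds: the same handle is reused
$first->{alive} = 0;             # simulate the server dropping the link
my $third  = get_connection();   # ping fails: a fresh handle is opened

print "connections opened: ", FakeHandle->count, "\n";
```

Three requests cost only two connections here, and the second connection was made only because the first one had gone stale.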

In the remainder of fastdb.pl, we pre-construct two
statement maps: one to retrieve a user’s name given their ID, and
one to retrieve the ID given the user’s name. Of course, these
statements are based on a purely fictional database that we simply
imagine could exist. For each statement, we prepare the statement
map and assign the resulting statement handle to a Perl scalar
value, such as $MyPerl::FastDB::selectUserFromID. We’ll
use these statement handles from within our Perl scripts, thereby
enjoying the benefits of the pre-compiled and cached SQL code.

Building the Optimized Beast

All of what we’ve seen so far in this last of the mod_perl
trilogy involves preparing an optimized environment. We’ve
pre-loaded common Perl modules and prepared connections to the
database as well as some specific SQL statements. Leveraging all
this in the active Perl script is the key, and it’s not too
difficult to boot.

We’ll build a very simple Perl script here, which will retrieve
the user’s ID as submitted from a browser, and spit out their name
to the web page. The point isn’t the outcome, but our
implementation, whose principles you can adapt to far more complex
arrangements.

#!/usr/bin/perl
use strict;

use CGI;
use DBI;


#prepare CGI object and browser for output

my $cgiobj=new CGI;
print $cgiobj->header;


#retrieve persistent database handle
my $dbh=MyPerl::FastDB->connect;

#retrieve form data
my $userID=$cgiobj->param("userID");

#setup statement handle and bind parameter
my $sth=$MyPerl::FastDB::selectUserFromID;
$sth->bind_param(1,$userID);

#execute SQL statement
$sth->execute || die "Statement failed: ".$dbh->errstr;

#retrieve result using column binding
my ($firstName,$lastName);
$sth->bind_columns(\$firstName,\$lastName);
$sth->fetchrow_arrayref;

print "Hello, $firstName $lastName";

Despite the fact that the CGI and DBI modules were pre-loaded into
the Apache parent server, we must still use them in this
Perl script as usual, even though they will not be compiled a
second time. After we create an instance of the CGI object and
output an HTML header, we retrieve a database handle from the
persistent connection into a local scalar variable, $dbh,
as well as the userID value submitted from a fictional web
form.

Now the juicy bits: the local scalar variable $sth is
assigned a statement handle for the pre-compiled SQL statement map
that we built into fastdb.pl, designed to query the
database for a user’s name given their ID. To get this statement
ready to go, we need to substitute a real value for the
placeholder. Using bind_param on the statement handle, we
bind the value of $userID to the first placeholder in the
statement map (although our map only had one placeholder, you could
use multiple placeholders for certain statements).

The statement is executed, and the script dies with an error
message reported by the database handle should the statement fail for
some reason. Assuming success, we simply need to retrieve the
result from the database. Here, we use another optimization called
column binding: we set up two local variables which are
prepared to receive the results from the database,
$firstName and $lastName. The
bind_columns call tells the statement handle that these
two variables will be the destination for the results. Notice
that these variables are passed as references, preceded by
backslashes.

The fetchrow_arrayref call, the fastest way to retrieve
data from the database, will place the returned values into the bound
variables, which we simply output to the web page.
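
Column binding pays off most when a query returns several rows, since every fetch refreshes the same variables in place. The following is a minimal sketch under the same kind of assumptions as before: it presumes DBD::SQLite is installed and uses an in-memory database with invented rows rather than the MySQL setup in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# ASSUMPTION: DBD::SQLite is installed; the in-memory database and its
# rows are invented so the sketch can run without a MySQL server
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { PrintError => 0, RaiseError => 1 });
$dbh->do('create table USERTABLE (ID integer, FIRSTNAME text, LASTNAME text)');
$dbh->do(q{insert into USERTABLE values (1, 'Jane', 'Doe')});
$dbh->do(q{insert into USERTABLE values (2, 'John', 'Smith')});

my $sth = $dbh->prepare(q{
   select FIRSTNAME,LASTNAME
   from USERTABLE
   order by ID
});
$sth->execute;

# bind after execute; the backslashes pass references to the variables
my ($firstName, $lastName);
$sth->bind_columns(\$firstName, \$lastName);

# each successful fetch overwrites the bound variables with the next row
my @greetings;
while ($sth->fetchrow_arrayref) {
    push @greetings, "Hello, $firstName $lastName";
}

print "$_\n" for @greetings;
$dbh->disconnect;
```

No per-row copying into fresh variables is needed; the loop body simply reads $firstName and $lastName.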

There’s nothing miraculous about the outcome of our script —
it’s very basic. And you could have coded the same functionality
without ever reading this article. The difference is that the above
is fast — much faster than had we coded the same script using an
unoptimized mod_perl environment, and will hold up to repeated
execution more elegantly, saving processor time and computing
resources, allowing for more hits more frequently.

Conclusion and Resources

We don’t have the hubris to suggest that our final code is the
most optimized script possible; or even that our optimizations are
the be-all and end-all of tweaks and tuning. In fact, there are
many more tweaks available which span the distance from specific
techniques in Perl, to Apache configurations, to operating system
fundamentals. Reality constrains most of us from exploring every
conceivable performance boost, and so we’ve completed the mod_perl
trilogy with common performance enhancers that provide the most
bang-for-the-buck.

Jeffrey W. Baker’s “Application Performance using DBI and mod_perl”
nicely details and integrates many of the database optimizations
we’ve surveyed here.

Stas Bekman’s Performance Tuning section of his mod_perl guide
contains thorough, gory details for those who want hard numbers and
nearly every tweak imaginable.

The DBI documentation provides a concise overview of
working with database calls, especially with regards to performance
and the statement handle.
