Mail Filtering with procmail
This is the third installment in our detailed look at administering electronic mail. Previously, we considered general mail concepts and the sendmail transport agent. This month, we will look at procmail, a package designed for filtering electronic mail based upon a variety of criteria. This program was written by Stephen van den Berg, and the package's homepage can be found at http://www.procmail.org/.
procmail is a very powerful, general-purpose mail filtering facility that can be used for several different purposes.
In fact, procmail is the mail filtering tool of choice for most users of Unix systems. It is usually applied to incoming messages in two main ways -- by using it as the local delivery agent (the program to which the transport agent hands off local messages for actual delivery) or by piping incoming mail for individual users to it, usually in the .forward file, as in this example:
|IFS=' ' && exec /usr/bin/
This example first sets the shell's inter-field separator character to a space and then executes procmail, specifying its -Y (assume BSD mailbox format) and -f- (tells the program to update the timestamp in the leading From header) options. You may need to modify the path to one appropriate for your system. If you want to be extra cautious, you can use an entry like:
This version tests for the existence of the procmail executable before running it. The output is wrapped here, but it is a single line in the .forward file. In any case, if procmail fails, the process returns an exit code of 75. The final item is a shell comment; it is required. As the procmail man page explains, this item "is not actually a parameter that is required by procmail; in fact, it will be discarded by sh before procmail ever sees it; it is however a necessary kludge against overoptimizing sendmail programs."
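For reference, the cautious form follows the pattern given in the procmail man page. The line below is a sketch of that pattern rather than the column's original listing; the /usr/bin/procmail path and the trailing user name (jdoe) are assumptions to adapt to your system:

```
"|IFS=' ' && p=/usr/bin/procmail && test -f $p && exec $p -Yf- || exit 75 #jdoe"
```

Storing the path in a variable (p) lets the same value be used for both the existence test and the exec.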
Note: The individual user .forward file entries are not needed -- and in fact should not be used -- when procmail is the local delivery agent.
procmail gets its instructions about which mail filtering operations to perform from a configuration file. The system-wide configuration file is /etc/procmailrc. The user-specific procmail configuration file is ~/.procmailrc. The system-wide configuration file is also invoked when individual users run procmail unless its -p option is included or the configuration file to use is explicitly specified as the command's final argument.
When procmail is being used only on a per-user basis, it is best to leave the global configuration file empty. Actions specified in the global configuration file are run in the root account context, so you have to set up this file very carefully to avoid security risks.
procmail works by examining each successive mail message it receives, applying the various filters defined in the configuration file (known as "recipes") in turn. The first recipe that results in a destination or other disposition for the message causes all further processing to stop. If all of the recipes are applied without effect (if the messages pass unaffected through all of the filters), then the mail is appended to the user's normal mailbox (which can be defined via the procmail DEFAULT variable). procmail configuration file entries have the general format outlined in Figure One.
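As a rough outline based on the procmailrc man page (not a reproduction of the figure), a recipe has a start line, zero or more condition lines, and exactly one disposition line. The bracketed items and the parenthesized notes are placeholders, not literal text:

```
:0 [flags] [ : [lockfile] ]
* condition       (zero or more lines, each starting with an asterisk)
disposition       (exactly one line: a mail folder, a forward, or a pipe)
```

A disposition beginning with ! forwards the message to the listed addresses, one beginning with | pipes it to a command, and anything else is treated as a file name.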
Let's begin with some simple procmailrc example configurations, as illustrated in Figure Two.
The initial section of the configuration file defines some procmail variables: the mail directory, the search path, and the default message destination for messages not redirected or discarded by any recipe.
The first recipe filters out mail from user jerk at bad-guys.org by redirecting it to /dev/null. Note that the condition is a regular expression against which incoming message text is matched. Contrary to expectations, however, pattern-matching is not case sensitive by default.
The second recipe unconditionally copies all incoming messages to the file ~/Mail/archive -- relative pathnames are interpreted with respect to MAILDIR -- while retaining the original message in the input stream. Since there is no condition specified, all messages will match and be processed.
Copying occurs because the c flag (clone the message) is included in the start line. As this recipe indicates, the start line can potentially include a variety of items. The 0 can be followed by one or more code letters (flags specifying message handling variations), and the entire string can be followed by another colon, which causes procmail to use a lock file when processing a message with this recipe. The lock file serves to prevent multiple procmail processes handling different mail messages from trying to write to the same file simultaneously. The terminal colon can optionally be followed by a lock file name. In most cases, the file name is left blank (as it was here), allowing procmail to generate the name itself.
If this were the entire .procmailrc configuration file, then all messages not discarded by the first recipe would end up in the location specified by the DEFAULT variable, ~/Mail/unseen.
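A minimal configuration consistent with this description might look like the following. It is a sketch, not the original figure: the PATH value and the exact form of the From pattern are assumptions, while the folder names come from the text above.

```
# Variable definitions (the PATH value is an assumption)
PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/Mail
DEFAULT=$MAILDIR/unseen

# Recipe 1: discard mail from one sender
:0
* ^From:.*jerk@bad-guys\.org
/dev/null

# Recipe 2: no conditions, so every message matches. The c flag
# delivers a copy, and the trailing colon requests a lock file.
:0 c:
archive
```

Because archive is a relative path, it resolves to $MAILDIR/archive, that is, ~/Mail/archive.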
Similar recipes can be used to direct procmail to sort incoming mail into bins, as illustrated in Figure Three.
The first recipe sends mail from various users at notaol.org (some of my siblings) to the indicated mail folder. The remaining three recipes copy all messages addressed to help into the file archive in the indicated directory and then sort the messages into two other mail folders. The third recipe directs messages whose subject line begins with Case and contains one of the indicated letters followed by three or more consecutive digits into the existing file; all other messages go into the incoming file (both in my ~/support subdirectory).
The ordering of recipes can be important. For example, mail to help from one of my siblings will still go into the new-family file, not one of the ~/Mail/support files.
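The sorting scheme just described can be sketched as follows. The user names, the letters in the bracket expression, and the support/ directory being relative to MAILDIR are all assumptions made for illustration; only the folder names are taken from the text.

```
# Mail from family members (user names are hypothetical)
:0:
* ^From:.*(patti|louie|vince)@notaol\.org
new-family

# Keep a copy of everything addressed to help
:0 c:
* ^TO_help@
support/archive

# Subject starts with "Case" and holds a letter plus 3 or more digits
# (the letters F, S, and X are hypothetical)
:0:
* ^TO_help@
* ^Subject: Case.*[FSX][0-9][0-9][0-9]+
support/existing

# Everything else addressed to help
:0:
* ^TO_help@
support/incoming
```

Since the family recipe comes first and has no c flag, a message from a sibling to help is delivered there and never reaches the later recipes.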
The ^TO_ component used in some of the preceding recipes is actually a procmail keyword, and it causes the program to check all recipient-related headers for the specified pattern.
You can specify more than one condition by including multiple asterisk lines, as illustrated in Figure Four. The first recipe discards mail from anyone in the indicated domain that contains the indicated string in the subject line. Note that conditions are joined with AND logic. If you want to use OR logic, you must make a single condition using the regular expression | construct. The second recipe provides an example of doing so. Its search expression could be written more succinctly, but it is easier to read this way.
This recipe also illustrates the use of configuration file variables. We define a variable named FROM that matches a variety of headers that indicate the sender/origin of the incoming message (the square brackets contain a space and a tab character). The variable is then used in the first condition, and the initial dollar sign is required to force variable de-referencing within the pattern.
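The difference between AND and OR conditions can be sketched with two simple recipes. These are illustrations only, not the recipes from the figure; the subject string, user names, and header list are hypothetical.

```
# AND: both condition lines must match for the recipe to fire
:0
* ^From:.*@bad-guys\.org
* ^Subject:.*make money fast
/dev/null

# OR: alternatives are combined inside a single regular expression
:0:
* ^(From|Sender|Reply-To):.*(patti|louie|vince)@notaol\.org
new-family
```

Splitting the second recipe's alternatives across separate asterisk lines would require all of them to match at once, which is why they must share one condition.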
Other Disposition Options
You can also use a pipe as the destination by including a vertical bar as the first character in the line, which is illustrated in Figure Five (pg. 52). This recipe sends all mail not from root or cron (the exclamation mark indicates a negative test) to the indicated Perl script. We don't use procmail locking here; if the script does any writing to files, it will need to do its own locking (procmail locking is not recommended for this purpose).
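A recipe of this kind might be written as shown below. This is a sketch only: the script path is hypothetical, and the sender pattern is a simple approximation.

```
# The leading ! negates the test; the start line has no trailing
# colon, so procmail does no locking for this recipe
:0
* !^From:.*(root|cron)
| $HOME/bin/process-mail.pl
```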
procmail assumes that commands will be executed in the context of the Bourne (sh) shell at a very deep level. If your login shell is a C shell variant, place the following command at the top of your procmail configuration file:
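The command itself does not appear above. The setting usually suggested for this purpose in the procmail example documentation is the following assignment (assuming sh is located in /bin):

```
SHELL=/bin/sh
```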
In the examples in Figure Six, we forward mail to another user and generate and send a mail message within procmail recipes.
The first recipe distributes selected items from a mailing list to a group of local users. Messages from the mailing list are identifiable by the beginning of their subject lines, and the recipe selects those with either "gaussian" or "g9" anywhere in the subject line. The selected messages are forwarded to the two indicated local users, which are actually aliases that expand to a list of users.
The second recipe sends all of the remaining messages from the same list to the ccl_all alias. The users in this internal list want to receive the entire mailing list, and the combination of recipes one and two produces that result.
The final recipe sends a reply to any mail messages from the specified user. It uses the formail utility that is part of the procmail package. The formail -r command creates a reply to the mail message that the command receives as input, discarding existing message headers and the message body.
The new body text is then created via the two echo commands that follow, and then the completed message is piped to sendmail for submission to the mail facility. sendmail's -t option tells the program to determine the recipient(s) from the message headers, and -oi causes it not to treat a line containing a sole period as the end of input (only rarely needed, but traditionally included just to be safe).
This recipe also illustrates a technique for avoiding mail loops with procmail. The formail command adds an X-Loop header to the outgoing mail message (via the -a option). The conditions also check for the presence of this header, bypassing the message when it is found. In this way, this recipe will prevent procmail from processing the generated message should it bounce. Table One (pg. 54) lists some useful formail options.
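An auto-reply recipe along these lines can be sketched as follows. The addresses, the reply text, and the sendmail path are hypothetical; the overall structure follows the description above and the similar examples in the procmail documentation.

```
# Reply to one sender, unless the message already carries our
# X-Loop header (which would mean it is our own reply coming back)
:0
* ^From:.*someone@example\.org
* !^X-Loop: me@example\.org
| (formail -r -a"X-Loop: me@example.org" ; \
   echo "Your message has been received." ; \
   echo "This is an automated reply.") | /usr/sbin/sendmail -t -oi
```

The parentheses group the formail output (the reply headers) with the echo output (the new body) so that sendmail receives them as one message.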
procmail recipes can also be used to transform incoming mail messages. Figure Seven contains a nice example by Tony Nugent (slightly modified).
These recipes introduce several new procmail flags. The set in the first recipe, Bfw, tells procmail to search the message body only (B -- the default is to search the headers only), that the recipe is a filter (f) and messages should continue to be processed by later configuration file entries after it completes, and finally that the program should wait for the filter program specified as the disposition to complete before proceeding to the next recipe in the configuration file (w).
The sed command in the disposition searches for various PGP-related strings within the message body (because of the B flag). When found, it edits the message, replacing two space-separated hyphens at the beginning of a line with a single hyphen and removing various PGP-related text, signature blocks, and public key blocks (accomplishing the last two operations by using sed's text section removal feature).
The next recipe will be applied only to messages that matched the conditions in the previous recipe (the A flag), operating as a filter (f flag) on the message headers only (h flag), and waiting for the filter program to complete before continuing with the remainder of the configuration file (w flag). The disposition causes the message to be piped to formail, where an X-PGP header is added to the message or an existing header of this type is replaced (-I option). Table Two lists the most important procmail start line flags.
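The pair of recipes described above might look roughly like this. It is an approximation written from the description, not Tony Nugent's original recipe: the exact search pattern, the sed expressions, and the X-PGP header text are assumptions.

```
# Filter: look for PGP markers in the body, then strip them with sed
:0 Bfw
* (BEGIN|END) PGP (SIGNED MESSAGE|SIGNATURE|PUBLIC KEY BLOCK)
| sed -e 's/^- -/-/' \
      -e '/^-----BEGIN PGP SIGNED MESSAGE-----/d' \
      -e '/^-----BEGIN PGP SIGNATURE-----/,/^-----END PGP SIGNATURE-----/d' \
      -e '/^-----BEGIN PGP PUBLIC KEY BLOCK-----/,/^-----END PGP PUBLIC KEY BLOCK-----/d'

# Runs only if the previous recipe matched: record the change in a header
:0 Afhw
| formail -I"X-PGP: PGP signature stripped"
```

The two-address form /start/,/end/d is the sed feature that removes an entire block of text, here the signature and public key blocks.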
As you can probably gather from what we've looked at so far, procmail represents a lot of material -- more than we can cover in one column. So stay tuned for next month, when we'll look at some of the more powerful things procmail can do, such as automatically discarding spam and scanning mail for security purposes (such as e-mail viruses).