A macro processor scans input text for defined symbols — the macros — and replaces that text by other text, or possibly by other symbols. For instance, a macro processor can convert one language into another.
If you’re a C programmer, you know cpp, the C preprocessor, a simple macro processor. m4 is a powerful macro processor that’s been part of Unix for some 30 years, but it’s almost unknown — except for special purposes, such as generating the sendmail.cf file. It’s worth knowing because you can do things with m4 that are hard to do any other way.
The GNU version of m4 has some extensions from the original V7 version. (You’ll see some of them.) As of this writing, the latest GNU version was 1.4.2, released in August 2004. Version 2.0 is under development.
You won’t become an m4 wizard in three pages (or in six, as the discussion of m4 continues next month), but you can master the basics. So, let’s dig in.
Simple Macro Processing
A simple way to do macro substitution is with tools like sed and cpp. For instance, the command sed 's/XPRESIDENTX/President Bush/g' reads lines of text, changing every occurrence of XPRESIDENTX to President Bush. (Without the trailing g, sed changes only the first occurrence on each line.) sed can also test and branch, for some rudimentary decision-making.
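To see the substitution in action, here is a small sketch you can paste into a shell. The sample sentence and the variable names (line, first_only, all) are made up for illustration; only the sed s command and its g flag come from sed itself.

```shell
# Hypothetical input line; the placeholder appears twice on purpose.
line='XPRESIDENTX spoke today. Reporters questioned XPRESIDENTX.'

# Without the g flag, sed replaces only the first match on each line.
first_only=$(printf '%s\n' "$line" | sed 's/XPRESIDENTX/President Bush/')

# With the g flag, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/XPRESIDENTX/President Bush/g')

printf '%s\n' "$first_only" "$all"
```

The first line printed still contains one XPRESIDENTX; the second has none.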
As another example, here’s a C program with a cpp macro named ABSDIFF() that accepts two arguments, a and b.
#define ABSDIFF(a, b) \
    ((a)>(b) ? (a)-(b) : (b)-(a))
Given that definition, cpp will replace the code…
diff = ABSDIFF(v1, v2);
… with
diff = ((v1)>(v2) ? (v1)-(v2) : (v2)-(v1));
v1 replaces a everywhere, and v2 replaces b. ABSDIFF() saves typing and reduces the chance for error.
Introducing m4
Unlike sed and other languages, m4 is designed specifically for macro processing. m4 manipulates files, performs arithmetic, has functions for handling strings, and can do much more.
m4 copies its input (from files or standard input) to standard output. It checks each token (a name, a quoted string, or any single character that’s not a part of either a name or a string) to see if it’s the name of a macro. If so, the token is replaced by the macro’s value, and then that text is pushed back onto the input to be rescanned. (If you’re new to m4, this repeated scanning may surprise you, but it’s one key to m4’s power.) Quoting text, like `text', prevents expansion. (See the section on “Quoting.”)
m4 comes with a number of predefined macros, and you can write your own macros by calling the define() function. A macro can have multiple arguments (up to 9 in original m4, and an unlimited number in GNU m4). Macro arguments are substituted before the resulting text is rescanned.
Here’s a simple example (saved in a file named foo.m4):
one
define(`one', `ONE')dnl
one
define(`ONE', `two')dnl
one ONE oneONE
`one'
The file defines two macros named one and ONE. It also has four lines of text. If you feed the file to m4 using m4 foo.m4, m4 produces:
one
ONE
two two oneONE
one
Here’s what’s happening:
*Line 1 of the input, which is simply the characters one and a newline, doesn’t match any macro (so far), so it’s copied to the output as-is.
*Line 2 defines a macro named one(). (The opening parenthesis before the arguments must come just after define with no whitespace between.) From this point on, any input string one will be replaced with ONE. (The dnl is explained below.)
*Line 3, which is again the characters one and a newline, is affected by the just-defined macro one(). So, the text one is converted to the text ONE and a newline.
*Line 4 defines a new macro named ONE(). Macro names are case-sensitive.
*Line 5 has three space-separated tokens. The first two are one and ONE. The first is converted to ONE by the macro named one(), then both are converted to two by the macro named ONE(). Rescanning doesn’t find any additional matches (there’s no macro named two()), so the first two words are output as two two. The rest of line 5 (a space, oneONE, and a newline) doesn’t match a macro so it’s output as-is. In other words, a macro name is only recognized when it’s surrounded by non-alphanumerics.
*Line 6 contains the text one inside a pair of quotes, then a newline. (As you’ve seen, the opening quote is a backquote or grave accent; the closing quote is a single quote or acute accent.) Quoted text doesn’t match any macros, so it’s output as-is: one. Next comes the final newline.
Input text is copied to the output as-is, and that includes newlines. The built-in dnl function, which stands for “delete to new line,” reads and discards all characters up to and including the next newline. (One of its uses is to put comments into an m4 file.) Without dnl, the newline after each of our calls to define would be output as-is. We could demonstrate that by editing foo.m4 to remove the two dnls. But, to stretch things a bit, let’s use sed to remove those two dnl calls from the file and pipe the result to m4:
$ sed 's/dnl//' foo.m4 | m4
one

ONE

two two oneONE
one
If you compare this example to the previous one, you’ll see that there are two extra newlines at the places where dnl used to be.
Let’s summarize. You’ve seen that input is read from the first character to the last. Macros affect input text only after they’re defined. Input tokens are compared to macro names and, if they match, replaced by the macro’s value. Any input modified by a macro is pushed back onto the input and is rescanned for possible modification. Other text (that isn’t modified by a macro) is passed to the output as-is.
Quoting
Any text between ` and ' (a grave accent and an acute accent) isn’t expanded immediately. Whenever m4 evaluates something, it strips off one level of quotes. When you define a macro, you’ll often want to quote the arguments, but not always. Listing One has a demo. It uses m4 interactively, typing text to its standard input.