In Macro Processors and Techniques for Portable Software, P.J. Brown says:
"When attending computer conferences and the like, I have listened to (and probably delivered) my full share of boring lectures, but there is one class of bore who easily outshines all the others: This is the man who talks in full details about the way his system has been implemented."
I'm going to try to outshine even the bores of Brown's nightmares by not only describing the implementation but also giving the complete code. We'll start with the simple version that supports definition and replacement.
Listing One shows the complete Awk program, which contains two pattern-action pairs.
Listing One
    awk '
    /^@define[ \t]/ {
        name = $2
        $1 = $2 = ""; sub(/^[ \t]+/, "")
        symtab[name] = $0
        next
    }
    {
        for (i in symtab)
            gsub("@" i "@", symtab[i])
        print
    }
    ' $*
The first pattern recognizes @define lines. Its action stores the name, erases the @define and name fields and the white space around them, then stores the remainder of the input line in the symbol table (implemented as an Awk associative array). Execution then proceeds with the next input line. The null second pattern ensures that the action will be executed on all other input lines. The for loop iterates over all entries in the symbol table, and the gsub globally substitutes replacement values for their names. The print statement writes the transformed input line.
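To see the two pattern-action pairs at work, here is a small sketch that feeds the Listing One program two lines on standard input. The wrapper name m0 and the sample text are assumptions made for illustration; only the Awk program itself comes from the listing.

```shell
# Hypothetical wrapper around the Listing One program (the name m0 is ours).
m0() {
    awk '
    /^@define[ \t]/ {
        name = $2
        $1 = $2 = ""; sub(/^[ \t]+/, "")
        symtab[name] = $0
        next
    }
    {
        for (i in symtab)
            gsub("@" i "@", symtab[i])
        print
    }
    ' "$@"
}

# Sample input: one definition, then a line that calls the macro.
printf '%s\n' '@define AUTHOR Jon Bentley' 'Written by @AUTHOR@.' | m0
```

The definition line produces no output of its own; the second line comes out as "Written by Jon Bentley."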
In the next version of the program we will add a simple include facility. The input line @include filename is replaced by the contents of filename. We will restructure the program around a recursive routine to read files and add functions to make it easier to extend.
Listing Two shows the resulting code. If the program is invoked with a single argument, the BEGIN block takes that as the name of the input file; otherwise it processes the standard input. The function dofile processes a file, dodef processes a definition, and dosubs applies the substitutions in the symbol table to its input string. The dodef function uses a complex regular expression in a sub command to remove the first two fields (because setting them to blanks as in the first version causes Awk to replace all field separators with a single blank).
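The remark about field separators is easy to check in isolation. The sketch below runs both approaches on a definition whose replacement text contains runs of spaces; the sample line and the brackets added around the output are assumptions for illustration.

```shell
line='@define TITLE A   spaced   title'

# First-version approach: assigning to $1 and $2 makes Awk rebuild the
# record, joining the fields with single blanks.
printf '%s\n' "$line" |
    awk '{ $1 = $2 = ""; sub(/^[ \t]+/, ""); print "[" $0 "]" }'

# Second-version approach: one sub strips the first two fields and leaves
# the rest of the line exactly as it was typed.
printf '%s\n' "$line" |
    awk '{ sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, ""); print "[" $0 "]" }'
```

The first command prints "[A spaced title]" and the second prints "[A   spaced   title]", so only the sub approach preserves the spacing of the replacement text.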
Listing Two
    awk '
    function dofile(fname) {
        while (getline < fname > 0) {
            if (/^@define[ \t]/)
                dodef()
            else if (/^@include[ \t]/)
                dofile(dosubs($2))
            else
                print dosubs($0)
        }
        close(fname)
    }
    function dodef(    name) {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        symtab[name] = $0
    }
    function dosubs(s,    i) {
        for (i in symtab)
            gsub("@" i "@", symtab[i], s)
        return s
    }
    BEGIN {
        if (ARGC == 2)
            dofile(ARGV[1])
        else
            dofile("/dev/stdin")
    }
    ' $*
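A quick way to exercise the include facility is to build two small files and run the program on the outer one. The sketch below is a test harness under several assumptions: the wrapper name m_inc, the temporary file names, and the sample text are all made up; the file name is passed with -v rather than through ARGV; and the getline test is parenthesized as (getline < fname) > 0, which is a cautious spelling of the same loop condition.

```shell
# Hypothetical wrapper: the three functions follow Listing Two.
m_inc() {
    awk -v input="$1" '
    function dofile(fname) {
        while ((getline < fname) > 0) {
            if (/^@define[ \t]/)
                dodef()
            else if (/^@include[ \t]/)
                dofile(dosubs($2))
            else
                print dosubs($0)
        }
        close(fname)
    }
    function dodef(    name) {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        symtab[name] = $0
    }
    function dosubs(s,    i) {
        for (i in symtab)
            gsub("@" i "@", symtab[i], s)
        return s
    }
    BEGIN { dofile(input) }
    '
}

# Build an outer file that defines a macro and includes an inner file.
dir=$(mktemp -d)
printf '%s\n' 'Included by @WHO@' > "$dir/part.txt"
printf '%s\n' '@define WHO m1' "@include $dir/part.txt" 'done' > "$dir/main.txt"
m_inc "$dir/main.txt"
rm -rf "$dir"
```

The included line is expanded with the definitions in force at the point of inclusion, so the run prints "Included by m1" followed by "done".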
So far we have assumed that macro definitions expand into unadorned text. But look what happens when the replacement text contains further macro calls, as in:
    @define DIR /usr/jlb/macro.paper
    @define PROBSECFILE @DIR@/sec2.in
After these definitions, the string @PROBSECFILE@ should be expanded into /usr/jlb/macro.paper/sec2.in. The previous implementation may or may not handle this correctly (details are left as an exercise for Awkophiles). The implementation of dosubs in Listing Three handles nested macros by repeatedly expanding the string until no more expansions are made.
Listing Three
    function dosubs(s,    changes, i) {
        do {
            changes = 0
            for (i in symtab)
                changes += gsub("@" i "@", symtab[i], s)
        } while (changes)
        return s
    }
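To watch the repeated expansion on the DIR and PROBSECFILE example, the Listing Three function can be dropped into a tiny harness. In this sketch the wrapper name expand_nested is an assumption, and the symbol table is filled in a BEGIN block instead of being read from @define lines.

```shell
# Hypothetical harness: the dosubs body is the one from Listing Three.
expand_nested() {
    awk '
    function dosubs(s,    changes, i) {
        do {
            changes = 0
            for (i in symtab)
                changes += gsub("@" i "@", symtab[i], s)
        } while (changes)
        return s
    }
    BEGIN {
        # Preloaded definitions (normally stored by dodef).
        symtab["DIR"] = "/usr/jlb/macro.paper"
        symtab["PROBSECFILE"] = "@DIR@/sec2.in"
    }
    { print dosubs($0) }
    '
}

printf '%s\n' 'see @PROBSECFILE@' | expand_nested
```

Whatever order the for loop visits the table in, the outer loop keeps going until a full pass makes no substitution, so the output is "see /usr/jlb/macro.paper/sec2.in". One consequence of this design is that a macro whose replacement text calls itself would never reach a pass with zero changes.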
That version is correct but slow; we can speed it up with a guard to check for the common case of no remaining @ characters:
    ...
    changes = 0
    if (s ~ /@.*@/)
        for (i in symtab)
            ...
Without the guard, the program takes 5.4 seconds to process one large file; with the guard, the time drops to 2.3 seconds. The faster version of dosubs described in "A Substitution Function" takes just 0.8 seconds on the same file.
A Substitution Function
Several versions of the dosubs function perform macro substitution. The final version of the program (Listing Four) uses an even faster version of the function.
The idea is to process the string from left to right, searching for the first substitution to be made. We then make the substitution and rescan the string starting at the fresh text. We implement this idea by keeping two strings: the text processed so far is in L (for left), and unprocessed text is in R (for right). Here is the pseudocode for dosubs (the final version will be shown in Listing Four).
    L = Empty
    R = Input String
    while R contains an "@" sign do
        let R = A @ B; set L = L A and R = B
        if R contains no "@" then
            L = L "@"
            break
        let R = A @ B; set M = A and R = B
        if M is in SymTab then
            R = SymTab[M] R
        else
            L = L "@" M
            R = "@" R
    return L R
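The pseudocode maps fairly directly onto Awk's index and substr functions. What follows is one possible transcription, offered as a sketch and not as the Listing Four code; the wrapper name expand_scan, the lowercase variable names, and the preloaded symbol table are assumptions for illustration.

```shell
# Hypothetical harness around a transcription of the pseudocode.
expand_scan() {
    awk '
    function dosubs(s,    l, r, i, m) {
        l = ""
        r = s
        while ((i = index(r, "@")) > 0) {
            l = l substr(r, 1, i - 1)    # L = L A
            r = substr(r, i + 1)         # R = B
            i = index(r, "@")
            if (i == 0) {                # no closing "@": keep the lone "@"
                l = l "@"
                break
            }
            m = substr(r, 1, i - 1)      # candidate macro name M
            r = substr(r, i + 1)
            if (m in symtab)
                r = symtab[m] r          # rescan starting at the fresh text
            else {
                l = l "@" m              # not a macro: emit "@" M unchanged
                r = "@" r                # the second "@" may open a real name
            }
        }
        return l r
    }
    BEGIN {
        symtab["DIR"] = "/usr/jlb/macro.paper"
        symtab["PROBSECFILE"] = "@DIR@/sec2.in"
    }
    { print dosubs($0) }
    '
}

printf '%s\n' 'mail a@b about @PROBSECFILE@' | expand_scan
```

On this input the text between the first two @ signs ("b about ") is not a macro name, so it is copied to the left string and the second @ is pushed back to start the next candidate. The result is "mail a@b about /usr/jlb/macro.paper/sec2.in". Each character is examined only a bounded number of times per substitution, which is where the speedup over looping through the whole symbol table would come from.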
Sometimes you want to make a file you can conditionally change. Consider the arduous task of writing a Ph.D. thesis, which can strain even the best professor-student relationship. A friend of mine organized his thesis so that by setting a given flag, he could remove all reference to his thesis advisor. The version he showed his advisor (whom we'll call "Professor Newton" to protect the innocent) was compiled from a file like this:
    @define WANTNEWT 1
    ...
    @if WANTNEWT
    This area was profoundly influenced by the groundbreaking work of Professor Newton.
    @fi
For his private amusement, the poor student could recompile the document after setting WANTNEWT to zero. The semantics of the @if statement are that the text up to the next @fi statement is included if the variable is defined and not equal to zero. To implement the statement, we need this function to discard text:
    function gobble(fname) {
        while (getline < fname > 0)
            if (/^@fi/)
                break
    }
We then add these lines to the chain of if statements in dofile:
    } else if (/^@if[ \t]/) {
        if (!($2 in symtab) || symtab[$2] == 0)
            gobble(fname)
    }...
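The sketch below puts gobble and the @if branch into a cut-down dofile so that the conditional can be tried on its own. Several parts are assumptions and not the author's code: the wrapper name m_if, passing the file name with -v, storing only a one-word value for each definition, and silently dropping a bare @fi line when the enclosed text is kept.

```shell
# Hypothetical harness: handles only @define, @if, and @fi.
m_if() {
    awk -v input="$1" '
    function gobble(fname) {
        while ((getline < fname) > 0)
            if (/^@fi/)
                break
    }
    function dofile(fname) {
        while ((getline < fname) > 0) {
            if (/^@define[ \t]/)
                symtab[$2] = $3          # simplified: one-word values only
            else if (/^@if[ \t]/) {
                if (!($2 in symtab) || symtab[$2] == 0)
                    gobble(fname)
            } else if (!/^@fi/)          # assumed: a bare @fi line is dropped
                print
        }
        close(fname)
    }
    BEGIN { dofile(input) }
    '
}

# The thesis example with the flag turned off.
tmp=$(mktemp)
printf '%s\n' '@define WANTNEWT 0' 'Chapter 1' '@if WANTNEWT' \
    'Thanks to Professor Newton.' '@fi' 'Chapter 2' > "$tmp"
m_if "$tmp"
rm -f "$tmp"
```

With WANTNEWT set to 0 the run prints only "Chapter 1" and "Chapter 2"; changing the definition to 1 lets the acknowledgment through. Because gobble stops at the first @fi it sees, this scheme does not support nested @if blocks.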