The complete m1 program has a couple of additions to this simple conditional. Text may contain nested if statements; gobble is modified to keep a counter of the current if/fi nesting. The @unless statement is the complement of @if: it includes the subsequent text (up to the same @fi delimiter) if the variable is undefined or defined to be zero.
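The nesting counter can be illustrated in isolation. The following is a minimal sketch, not the code from the listing: the wrapper name skip_to_matching_fi is hypothetical, and it assumes the first input line is an @if whose test has already failed.

```sh
# Hypothetical helper: treat the first line as an @if whose test failed,
# skip to its matching @fi, and print whatever follows.
skip_to_matching_fi() {
    awk '
    NR == 1 { depth = 1; next }                 # the failed @if opens depth 1
    depth > 0 {
        if ($0 ~ /^@(if|unless)[ \t]/)
            depth++                             # a nested conditional opens
        else if ($0 ~ /^@fi/)
            depth--                             # each @fi closes one level
        next                                    # skipped text is never printed
    }
    { print }                                   # depth is 0: normal output
    '
}

# Prints only "kept": the inner @fi does not end the outer block.
printf '%s\n' '@if debug' 'a' '@unless quiet' 'b' '@fi' 'c' '@fi' 'kept' |
    skip_to_matching_fi
```

Without the counter, the first @fi would end the skipped region and the line c would leak into the output.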
The final version of m1 also supports multiline @defines. If a @define line ends with a backslash (\), the text is continued on the next line (discarding white space before the first text character). To implement long defines, we make a minor change to dodef to continue reading text as long as lines end with a backslash. We must also make a major change to the I/O structure of the entire program because macro expansion can generate lines that need to be read by the dofile function. The new readline function reads a line from the text buffer if it is not empty; otherwise, it reads from the current file. The string s can be pushed back onto the input stream by concatenating it on the front (left) of buffer by the idiom buffer = s buffer.
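The pushback idiom can be tried on its own. This is a rough sketch under stated assumptions: the wrapper pushback_demo and the EXPAND trigger line are hypothetical, input comes from standard input rather than a named file, and this readline returns 1 or 0 instead of the EOF sentinel used in the listing.

```sh
pushback_demo() {
    awk '
    # Put the next line in $0: from the pushback buffer when it is
    # non-empty, otherwise from standard input. Returns 0 at end of input.
    function readline(   i) {
        if (buffer != "") {
            i = index(buffer, "\n")
            $0 = substr(buffer, 1, i - 1)
            buffer = substr(buffer, i + 1)
            return 1
        }
        return (getline) > 0
    }
    BEGIN {
        while (readline()) {
            if ($0 == "EXPAND")                  # hypothetical trigger line
                buffer = "one\ntwo\n" buffer     # push two lines back
            else
                print
        }
    }
    '
}

# Prints: before, one, two, after (one per line)
printf '%s\n' before EXPAND after | pushback_demo
```

Because pushed-back text goes on the front of buffer, it is read before anything that was already waiting, which is what lets expanded text be rescanned in order.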
The complete program is adorned with several other bells and whistles. Here are the most interesting and important:
- Comments. It is immoral to design a language without comments. Lines that begin with @comment are therefore ignored.
- Error checking. The final Awk program has a number of if statements that check for weird conditions, which are reported by the error function.
- Defaults. The @default statement is a @define that takes effect only if the variable was not previously defined; we'll see its use shortly. We could get the same effect with an @unless around a @define, but the @default is used frequently enough to merit its own command.
- Performance. When dofile reads a line of text unadorned with @ characters, it performs several tests and function calls. The final version adds a new if statement to print the line immediately.
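The difference between @define and @default comes down to one membership test on the symbol table. The sketch below is illustrative only: defaults_demo is a hypothetical name, values are assumed to be single words, and the result is piped through sort because Awk does not specify the order of a for-in loop.

```sh
defaults_demo() {
    awk '
    $1 == "@define"  { symtab[$2] = $3; next }                      # always set
    $1 == "@default" { if (!($2 in symtab)) symtab[$2] = $3; next } # set if new
    END { for (k in symtab) print k "=" symtab[k] }
    ' | sort
}

# Prints "color=red" then "size=10": the earlier @define of size wins.
printf '%s\n' '@define size 10' '@default size 99' '@default color red' |
    defaults_demo
```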
Figure 1 summarizes the m1 language.
@comment Any text | |
@define name value | |
@default name value | Set if name undefined |
@include filename | |
@if varname | Include subsequent text if varname != 0 |
@fi | Terminate @if or @unless |
@unless varname | Include subsequent text if varname == 0 |
Anywhere in line @name@ | |
The m1 program could be extended in many ways. Here are some of the biggest temptations to "feeping creaturism":
- A long definition with a trail of backslashes might be more graciously expressed by a @longdefine statement terminated by a @longend.
- An @undefine statement would remove a definition from the symbol table.
- I've been tempted to add parameters to macros, but so far I have gotten around the problem by using an idiom described in the next section.
- It would be easy to add stack-based arithmetic and strings to the language through @push and @pop commands that read and write variables.
- As soon as you try to write interesting macros, you need to have mechanisms for quoting strings (to postpone evaluation) and forcing immediate evaluation.
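Of these, @undefine is probably the smallest change, since Awk's delete removes an entry from an associative array. The following is only a sketch of how it might look and is not part of m1: undefine_demo is a hypothetical wrapper, @show is a hypothetical statement added so the demo has something to print, and values are assumed to be single words.

```sh
undefine_demo() {
    awk '
    $1 == "@define"   { symtab[$2] = $3; next }
    $1 == "@undefine" { delete symtab[$2]; next }   # hypothetical statement
    $1 == "@show"     {                             # hypothetical, demo only
        print $2, (($2 in symtab) ? symtab[$2] : "(undefined)")
        next
    }
    '
}

# Prints "x 1" and then "x (undefined)".
printf '%s\n' '@define x 1' '@show x' '@undefine x' '@show x' | undefine_demo
```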
Listing Four contains the complete implementation of m1 in about 100 lines of Awk, which is significantly shorter than other macro processors.
Listing Four
    awk '
    function error(s) {
        print "m1 error: " s | "cat 1>&2"; exit 1
    }

    function dofile(fname,  savefile, savebuffer, newstring) {
        if (fname in activefiles)
            error("recursively reading file: " fname)
        activefiles[fname] = 1
        savefile = file; file = fname
        savebuffer = buffer; buffer = ""
        while (readline() != EOF) {
            if (index($0, "@") == 0) {
                print $0
            } else if (/^@define[ \t]/) {
                dodef()
            } else if (/^@default[ \t]/) {
                if (!($2 in symtab))
                    dodef()
            } else if (/^@include[ \t]/) {
                if (NF != 2) error("bad include line")
                dofile(dosubs($2))
            } else if (/^@if[ \t]/) {
                if (NF != 2) error("bad if line")
                if (!($2 in symtab) || symtab[$2] == 0)
                    gobble()
            } else if (/^@unless[ \t]/) {
                if (NF != 2) error("bad unless line")
                if (($2 in symtab) && symtab[$2] != 0)
                    gobble()
            } else if (/^@fi[ \t]?/) { # Could do error checking
            } else if (/^@comment[ \t]?/) {
            } else {
                newstring = dosubs($0)
                if ($0 == newstring || index(newstring, "@") == 0)
                    print newstring
                else
                    buffer = newstring "\n" buffer
            }
        }
        close(fname)
        delete activefiles[fname]
        file = savefile
        buffer = savebuffer
    }

    function readline(  i, status) {
        status = ""
        if (buffer != "") {
            i = index(buffer, "\n")
            $0 = substr(buffer, 1, i-1)
            buffer = substr(buffer, i+1)
        } else {
            if ((getline <file) <= 0)
                status = EOF
        }
        return status
    }

    function gobble(  ifdepth) {
        ifdepth = 1
        # readline returns "" on success, so compare against EOF explicitly
        while (readline() != EOF) {
            if (/^@(if|unless)[ \t]/)
                ifdepth++
            if (/^@fi[ \t]?/ && --ifdepth <= 0)
                break
        }
    }

    function dosubs(s,  l, r, i, m) {
        if (index(s, "@") == 0)
            return s
        l = ""  # Left of current pos; ready for output
        r = s   # Right of current; unexamined at this time
        while ((i = index(r, "@")) != 0) {
            l = l substr(r, 1, i-1)
            r = substr(r, i+1)  # Currently scanning @
            i = index(r, "@")
            if (i == 0) {
                l = l "@"
                break
            }
            m = substr(r, 1, i-1)
            r = substr(r, i+1)
            if (m in symtab) {
                r = symtab[m] r
            } else {
                l = l "@" m
                r = "@" r
            }
        }
        return l r
    }

    function dodef(  name, str) {
        name = $2
        # Strip "@define name" and the blanks around it; the trailing *
        # lets a define with no value yield the empty string
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "")
        str = $0
        while (str ~ /\\$/) {
            if (readline() == EOF)
                error("EOF inside definition")
            sub(/^[ \t]+/, "")
            # Replace the trailing backslash by hand: inside sub() an & in
            # the continuation text would expand to the matched backslash
            str = substr(str, 1, length(str) - 1) "\n" $0
        }
        symtab[name] = str
    }

    BEGIN {
        EOF = "EOF"
        if (ARGC == 1)
            dofile("/dev/stdin")
        else if (ARGC == 2)
            dofile(ARGV[1])
        else
            error("usage: m1 fname")
    }
    ' $*
The program uses several techniques that can be applied in many Awk programs:
- Symbol tables are easy to implement with Awk's associative arrays.
- The program makes extensive use of Awk's string-handling facilities: regular expressions, string concatenation, sub, index, and substr.
- Awk's file handling makes the dofile procedure straightforward.
- The readline function and pushback mechanism associated with buffer are of general utility.
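The first two techniques combine naturally: an associative array holds the definitions, and index and substr walk the line looking for @name@ pairs. The sketch below is a simplified illustration, not the dosubs function from the listing. The names expand_demo and expand and the predefined user entry are assumptions made for the example, and unlike dosubs it does not rescan substituted text for further macros.

```sh
expand_demo() {
    awk '
    # Replace each @name@ whose name is in symtab; leave other text alone.
    function expand(s,   out, i, j, name) {
        out = ""
        while ((i = index(s, "@")) != 0) {
            out = out substr(s, 1, i - 1)          # text before the first @
            s = substr(s, i + 1)
            j = index(s, "@")                      # look for the closing @
            if (j == 0) { out = out "@"; break }   # lone @: keep it literally
            name = substr(s, 1, j - 1)
            if (name in symtab) {
                out = out symtab[name]             # known name: substitute
                s = substr(s, j + 1)
            } else {
                out = out "@" name                 # unknown: keep the text and
                s = substr(s, j)                   # let the second @ open a pair
            }
        }
        return out s
    }
    BEGIN { symtab["user"] = "world" }             # assumed sample definition
    { print expand($0) }
    '
}

# Prints: hello world, mail me@example.com
printf '%s\n' 'hello @user@, mail me@example.com' | expand_demo
```

Keeping a lone @ as literal text is what allows ordinary input such as an email address to pass through unchanged.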