Embedding Python objects in HTML pages
Recently, I was prototyping a CGI infrastructure in Python to investigate possible implementation strategies for a production system. I had achieved a multithreaded, fast CGI service with persistent objects, but the HTML text was embedded inside the Python code, represented as print statements. This was cumbersome and failed to adhere to the object paradigm present elsewhere in the Python code (see Listing One). Also, when editing, I had to contend with two syntaxes at the same time -- Python and HTML -- and I introduced errors in the latter with nearly every change. Furthermore, I was unable to easily use Emacs' wonderful HTML editing mode, with its complement of template HTML constructs and syntax coloring -- reason enough to look for an alternative.
At the most primitive level, I needed to insert run-time values into specific, tagged locations in a block of text. However, since I was working with rows of data from an object database, I wanted conditional and iterative control over regions of text; the former would include a block of text only if a condition was True, while the latter would map a set of values onto a text block, substituting different values for the tagged locations with each iteration. I had tried the HTML classes available for Python, but they focused on run-time HTML generation. I wanted, instead, to create a template of an HTML document, and at run time, substitute the placeholders in the template with run-time values.
The solution I developed is called "BoilerPlate." Listing Two is a sample BoilerPlate template that contains examples of the three properties just described. The %()... constructs represent placeholders for run-time values. Conditional inclusion is represented by #if# and #else# tags, and the #for# tag starts a region of text iterated over with a run-time set of values. Compared to Listing One, the HTML structure is predominant, and the BoilerPlate # tags clearly stand out. Unfortunately, the placeholder tag %()... does not. This is Python's string formatting descriptor, similar to that found in the C Standard Library printf family. I kept the Python syntax over a custom tag to keep the BoilerPlate implementation simple. As you will see, its power more than makes up for its orphaned appearance.
The complete BoilerPlate source is available electronically (see "Resource Center," page 3). This includes two source files and a patch file for Python 1.4. BoilerPlate.py contains the classes described in this article. The file Sink.py contains the definition of the Sink class, a utility class used by BoilerPlate to collect formatted output. It is a bit faster than standard Python string concatenation using the "+" operator.
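The idea behind a collector such as Sink can be sketched in a few lines. This is an illustration only, written in modern Python 3 syntax, and not the code from Sink.py: the Append name is taken from Listing Three, while the Value accessor and the list-and-join strategy are assumptions.

```python
class Sink:
    """Illustrative stand-in for the article's Sink class (not the real source)."""

    def __init__(self):
        self._chunks = []

    def Append(self, text):
        # Same method name as the call made in Listing Three.
        self._chunks.append(text)

    def Value(self):
        # Hypothetical accessor: join all collected chunks in one pass
        # instead of growing a string with repeated "+" operations.
        return "".join(self._chunks)


sink = Sink()
for piece in ("<B>", "No data available", "</B>"):
    sink.Append(piece)
result = sink.Value()
```

Collecting pieces and joining them once avoids building a new intermediate string for every concatenation.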
Although I predominantly use BoilerPlate for HTML documents, it has no ties to the HTML language. It can be used to process files of any content -- including Python code and, with some care, binary files.
BoilerPlate Processing Model
Processing within the BoilerPlate classes occurs in three stages. These stages work with three conceptual data types which roughly correspond to specific BoilerPlate classes; Table 1 presents the class hierarchy. The data types in the processing model are:
- template, text with embedded BoilerPlate tags. Represented in the code as native Python strings.
- text block, a range of text taken from a template. Represented by the RawText and Block classes.
- format dictionary, a mapping of BoilerPlate tag names to values. Represented by the Formatter class.
These data types are used in the following processing stages:
- fragmentation, which takes a template and breaks it up into one or more text blocks based on the BoilerPlate tags encountered.
- value acquisition, which generates a mapping or dictionary of BoilerPlate tag names and their run-time value.
- formatting, which creates a Python string by applying the current format dictionary to the text blocks of the template document.
The fragmentation stage normally occurs only once for each template object. The BoilerPlate classes save the resulting text blocks for later processing in the formatting stage. The other stages can occur at any time, to place new values in the active format dictionary or to format output. The BoilerPlate base class also remembers the result of the last formatting stage. However, any change to the active format dictionary clears this cached value so that BoilerPlate instances always emit up-to-date information.
Fragmentation Stage
The BoilerPlate base class Block performs the bulk of the processing of template objects during the fragmentation stage. It looks for special tags that denote regions of text in the template. All tags begin and end with the "#" character. I chose this character because it is Python's comment character, and is the least likely to be found in the expressions allowed in BoilerPlate constructs. A BoilerPlate text block begins with a tag that follows the format #<kind><data>#, where <kind> represents the block type, and <data> contains whatever information is required by the tag to process the block. A block ends with a corresponding tag formatted as #end<kind># (an optional space between end and the <kind> value is allowed). The <kind> values of the start and end tags must match.
Also recognized, though not actually a block, is the sequence #char# which will leave a single "#" in the formatted text. You would use this in the unlikely event that there is a conflict between a BoilerPlate tag and unformatted template text. Table 2 presents a list of legal BoilerPlate tags.
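A minimal sketch of the tag-scanning part of fragmentation, written in modern Python 3 syntax. It is not the BoilerPlate source: the fragment function, the regular expression, and the flat list of pieces are all assumptions made for illustration, and the sketch does not build nested Block instances.

```python
import re

# Assumption: a tag is the text between two "#" characters on a single line.
TAG = re.compile(r"#([^#\n]*)#")


def fragment(template):
    """Split a template into a flat list of ("text", s) and ("tag", s) pieces."""
    pieces = []
    pos = 0
    for match in TAG.finditer(template):
        if match.start() > pos:
            pieces.append(("text", template[pos:match.start()]))
        body = match.group(1).strip()
        if body == "char":
            # #char# stands for a literal "#" in the output.
            pieces.append(("text", "#"))
        else:
            pieces.append(("tag", body))
        pos = match.end()
    if pos < len(template):
        pieces.append(("text", template[pos:]))
    return pieces


pieces = fragment("A #if x# yes #end if# costs #char#1")
```

A real implementation would go on to pair each start tag with its matching end tag, which is where nesting is resolved.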
BoilerPlate supports nested text blocks. The template in Listing Two has a for block nested inside an else block. Apart from memory constraints, there is no limit to how deep you can nest.
Normally, all text between matching tags is owned by the Block instance indicated by the start tag. This includes any spaces and line-terminating characters that may appear after the block's start tag or immediately before the block's end tag. For some applications, such behavior may result in excessive whitespace when a BoilerPlate instance emits its output. To overcome this, the BoilerPlate Block class hierarchy supports a parsing mode called lineMode, which trims the first and last characters of the text owned by a Block instance if the characters are either a space or a line terminator. To enter lineMode, pass a nonzero value for the lineMode argument in the __init__ method of any BoilerPlate Block class.
I usually set lineMode to True when using a template with HTML tables since the HTML table tag <TD> begins accepting data right after the closing ">" character. With lineMode on, I can place BoilerPlate tags on their own lines within <TD> elements without introducing spurious whitespace characters. Example 1 shows the difference between normal and lineMode processing. Notice how the lines of output terminate at different locations.
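The trimming rule described for lineMode can be sketched as follows, in modern Python 3 syntax. The trim_line_mode function is a hypothetical helper, not part of BoilerPlate, and it assumes that exactly one character is examined at each end of the owned text.

```python
def trim_line_mode(text):
    """Drop one leading and one trailing space or line terminator, if present."""
    trimmable = " \r\n"
    if text and text[0] in trimmable:
        text = text[1:]
    if text and text[-1] in trimmable:
        text = text[:-1]
    return text


# Text owned by a block whose tags sit on their own lines.
owned = "\n<B>No data available</B>\n"
trimmed = trim_line_mode(owned)
```

Text that does not start or end with a space or line terminator passes through unchanged.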
Conditional Text
BoilerPlate IfBlock instances represent if blocks. Their behavior should be familiar, since they act like the traditional conditional constructs found in programming languages. An IfBlock instance expects a valid Python expression as its <data> component. During the formatting stage, the instance evaluates the expression using Python's built-in eval function. If the result is True, then the corresponding text block is formatted and output. Example 2 shows the results of if processing.
After an initial if block but before its matching end tag, a template may contain any number of elif blocks. If the if condition does not result in a True value, the IfBlock instance visits each elif block in succession until one returns True and emits its output. Finally, an if block can close with an else block that will output its formatted text block if all preceding conditions return False. The classes ElifBlock and ElseBlock represent these additional constructs.
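The selection logic for if, elif, and else blocks might look like the sketch below, in modern Python 3 syntax. The choose_branch function and the list-of-pairs representation are assumptions for illustration; the article's IfBlock, ElifBlock, and ElseBlock classes are not reproduced here.

```python
def choose_branch(branches, values):
    """Return the text of the first branch whose condition evaluates true.

    branches is a list of (condition, text) pairs; a condition of None
    marks the else branch. values supplies the names used in conditions.
    """
    for condition, text in branches:
        if condition is None or eval(condition, {}, values):
            return text
    return ""


branches = [
    ("len( data ) == 0", "<B>No data available</B>"),  # if
    ("len( data ) == 1", "<B>One row</B>"),            # elif
    (None, "<B>Many rows</B>"),                        # else
]
empty = choose_branch(branches, {"data": []})
single = choose_branch(branches, {"data": ["a"]})
many = choose_branch(branches, {"data": ["a", "b"]})
```

Conditions are tried in order, so later branches are never evaluated once an earlier one succeeds.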
Iterating Over Text
A for block represents an iteration over a block of text. Its behavior is defined in the ForBlock class. The syntax of the <data> part of the for tag is <name> in <value>, where <name> is a legal Python identifier and <value> is a valid Python expression. This is the same as Python's native for syntax. Before formatting its text blocks, the ForBlock instance evaluates <value> to obtain the set of run-time values to iterate over (again, using Python's eval function). The result of this evaluation must be a sequence (list, tuple, string) or a dictionary; anything else is an error (see Example 3).
During the formatting stage, the ForBlock instance makes available to the template text certain iterator values through the identifier <name>. For instance, if <name> is foo, you can access the current iterator value with the qualification foo.value. Table 3 lists the iterator attributes available from an iteration. The presence of a sequence or dictionary is dependent on the Python type being iterated on.
The iterator attributes of a for block are available through the <name> identifier even after the proper end of the block. As a result, you can show summary values at the end of formatted text, or use an iterator attribute in future if or for constructs. However, if a future for block has the same value for <name>, it will overwrite whatever values were there before. This also applies to any values stored in the formatting dictionary: Any slot in the dictionary that has a key that matches a for block's <name> value will be overwritten with the iteration attributes.
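A rough sketch of for-block processing over a sequence, in modern Python 3 syntax. The format_for function and the Resolver class are hypothetical; only the index and value attribute names come from the article, and dictionary iteration is left out.

```python
from types import SimpleNamespace


class Resolver:
    """Hypothetical mapping that evaluates each placeholder key."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return eval(key.strip(), {}, self.values)


def format_for(tag_data, body, values):
    """Format body once per item named by a '<name> in <value>' tag."""
    name, expression = [part.strip() for part in tag_data.split(" in ", 1)]
    items = eval(expression, {}, values)
    output = []
    for index, value in enumerate(items):
        # The iterator attributes replace whatever was stored under <name>.
        values[name] = SimpleNamespace(index=index, value=value)
        output.append(body % Resolver(values))
    return "".join(output)


values = {"data": ["red", "green"], "each": "overwritten by the loop"}
body = "<TR><TD>%(each.index)d</TD><TD>%(each.value)s</TD></TR>\n"
rows = format_for("each in data", body, values)
last = values["each"]  # still available after the block ends
```

After the call, values["each"] holds the attributes of the final iteration, which mirrors both the summary-value use and the overwrite hazard described above.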
BoilerPlate Comments
BoilerPlate supports comments through the CommentBlock class. Comments begin and end with the sequences #!# and #end!#, respectively. All text contained in a comment block is ignored during the formatting stage, and is never output.
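The effect of a comment block can be approximated with a single substitution, shown here in modern Python 3 syntax. This is only a sketch of the visible behavior: the strip_comments function is hypothetical, it does not use the CommentBlock class, and it does not handle comments nested inside comments.

```python
import re

# Matches #!# ... #end!# (or "#end !#"), including across line breaks.
COMMENT = re.compile(r"#!#.*?#end ?!#", re.DOTALL)


def strip_comments(template):
    """Remove every BoilerPlate comment block from the template text."""
    return COMMENT.sub("", template)


visible = strip_comments(
    "<P>kept</P>#!# not shown\nat all #end!#<P>also kept</P>"
)
```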
Value Acquisition
The BoilerPlate base class lets you supply values for the format dictionary through three methods: the __init__ method during instance creation, the Remember method, and the output generation method Value. You can call the Remember method as many times as you want to build up the values held in the format dictionary. However, key collisions are not detected; only the last value supplied for a particular key is remembered.
All three methods accept format dictionary values in two ways: You can simply pass in a dictionary, or you can list name/value assignments within the method call, using Python's keyword argument feature. In brief, keyword arguments look just like Python assignment statements, but appear within function calls. For instance, if a function is defined as
def foo( a, b, c, d ):
then the call
foo( d=4, c=3, b=2, a=1 )
will invoke foo with the assignments listed in the call. The assignments do not have to follow the order in which the argument names appear in the function's definition. Furthermore, if the last variable in the definition begins with "**" (def bar( a, b, **c ), for instance), then upon entry to the function, the variable will contain a dictionary of all keyword arguments that do not match a named parameter. If bar is called with the same keyword arguments as foo above, a and b would again receive the values 1 and 2, respectively; however, c will contain a dictionary with the keys c and d, and values 3 and 4.
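The keyword-argument behavior can be checked directly. The foo and bar definitions follow the text; the remember function is a hypothetical helper (not the article's Remember method) that shows how one call could accept either a dictionary or keyword arguments. The syntax is modern Python 3.

```python
def foo(a, b, c, d):
    return (a, b, c, d)


def bar(a, b, **c):
    # c collects the keyword arguments that match no named parameter.
    return (a, b, c)


foo_result = foo(d=4, c=3, b=2, a=1)
bar_result = bar(d=4, c=3, b=2, a=1)


def remember(store, values=None, **keywords):
    """Hypothetical helper: merge a dictionary and/or keywords into store."""
    store.update(values or {})
    store.update(keywords)  # a later value silently replaces an earlier one
    return store


merged = remember({}, {"title": "Draft"}, title="Final", rows=2)
```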
Formatting Stage
The BoilerPlate classes use the Python string class format operator (%) to substitute placeholder tags in a template with run-time values. This entire operation takes place in the Cook method of the RawText class (Listing Three). The syntax of the format operator is <string> % <data>, where <string> is a Python string instance with embedded format descriptors that start with a "%" character (similar to those of the C printf). For each descriptor, the format operator takes a value from <data>, formats it according to the descriptor flags, and replaces the descriptor with the resulting value.
Although the <data> part of the expression is usually a single value or a Python tuple, there is a variant called named value formatting that requires a dictionary on the right side of the % operator. Inside the <string> value, each format descriptor has a key value. The syntax for this extension is %(<key>).... When the format operator encounters the "(" character in a format descriptor, it grabs <key>, attempts to fetch a value from the <data> dictionary that corresponds to <key>, and formats the value per the rest of the descriptor.
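The two forms of the format operator can be compared side by side. The sample strings and values are made up for illustration; the syntax is modern Python 3.

```python
# Positional formatting: values are taken from a tuple, in order.
positional = "<TD>%d</TD><TD>%s</TD>" % (3, "widgets")

# Named value formatting: each descriptor names the dictionary key it wants,
# so the order of entries in the dictionary does not matter.
row = {"value": "widgets", "index": 3}
named = "<TD>%(index)d</TD><TD>%(value)s</TD>" % row
```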
There are three interesting facts about the % operator that are not obvious:
- Text between the left and right parentheses of the % descriptor can contain any character, including spaces and additional embedded parenthesis pairs. (Support for embedded parentheses is in Python Version 1.5. There is a patch available for Version 1.4.)
- The key is not evaluated in any way by the Python interpreter before it is used to access a value in the dictionary.
- The right side of the % operator can be an instance of a class that implements a __getitem__ method, and not just a native Python dictionary.
As a result of these conditions, a Python script can gain control during the processing of each format descriptor when it uses named value formatting. This is how Formatter class instances resolve attribute references and function names.
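The three facts combine into a very small mechanism, sketched here in modern Python 3 syntax. The EvalMapping class is a hypothetical stand-in for the article's Formatter, whose actual source is not shown in the text.

```python
class EvalMapping:
    """Hypothetical mapping that treats each key as a Python expression."""

    def __init__(self, **values):
        self.values = values

    def __getitem__(self, key):
        # The % operator hands over the raw text between the parentheses,
        # spaces and nested parentheses included, without evaluating it.
        return eval(key.strip(), {}, self.values)


mapping = EvalMapping(data=["a", "b", "c"], title="report")
line = "%( title.upper() )s has %( len( data ) )d rows" % mapping
```

Because the object on the right side only needs a __getitem__ method, the script decides what each key means at the moment the descriptor is processed.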
Formatter Functions
The stock Formatter class in BoilerPlate contains simple methods that you can invoke within a % descriptor to change a value before Python applies it to a formatting descriptor. In Example 4, for instance, the format descriptor
%( HtmlEncode( '<' + Lower( Roman( foo ) ) + '>' ) )s
invokes three Formatter methods: the first, Roman, converts the contents of foo (a number) into its Roman numeral equivalent. The result is next given to Lower, which converts all uppercase characters in the string to lowercase. That result is then used in a Python string concatenation operation, which is the argument to the last Formatter method, HtmlEncode. It replaces characters in the set (<, &, >, ") with their corresponding HTML encoding.
Finally, the format operator converts the encoded result into a string because of the "s" format flag at the end of the descriptor (an NOP in this case). Table 4 lists the formatting functions in the Formatter class.
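The descriptor discussed above can be exercised with stand-in functions. Roman, Lower, and HtmlEncode below are illustrative reimplementations based only on the behavior described in the text, and the Resolver class is hypothetical; none of this is the BoilerPlate source. The syntax is modern Python 3.

```python
def Roman(number):
    """Convert a positive integer to uppercase Roman numerals."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
        (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
        (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)


def Lower(text):
    return text.lower()


def HtmlEncode(text):
    # "&" is replaced first so the entities added later are not re-encoded.
    for char, entity in (("&", "&amp;"), ("<", "&lt;"),
                         (">", "&gt;"), ('"', "&quot;")):
        text = text.replace(char, entity)
    return text


FUNCTIONS = {"Roman": Roman, "Lower": Lower, "HtmlEncode": HtmlEncode}


class Resolver:
    """Hypothetical mapping: evaluates keys with the functions in scope."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return eval(key.strip(), dict(FUNCTIONS), self.values)


descriptor = "%( HtmlEncode( '<' + Lower( Roman( foo ) ) + '>' ) )s"
encoded = descriptor % Resolver({"foo": 14})
```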
An unusual format function is the Null method. Like its siblings, it takes as its first argument the value to work on. Its second argument specifies a value to return if the first is a Python False value -- a member of the set (None, 0, 0.0, "", (), {}). Because Formatter uses the eval function, you can also use Python's logical operators to achieve the same effect. For instance,
%( foo or 'N/A' )s
and
%( Null( foo, 'N/A' ) )s
will always produce the same formatted output for all values of foo.
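The equivalence is easy to confirm with a stand-in for Null. The function below is a hypothetical version written from the description in the text, in modern Python 3 syntax.

```python
def Null(value, default):
    """Hypothetical Null: return default when value is a Python false value."""
    if not value:
        return default
    return value


false_values = [None, 0, 0.0, "", (), {}]
with_null = [Null(foo, "N/A") for foo in false_values]
with_or = [foo or "N/A" for foo in false_values]
kept = Null("widgets", "N/A")
```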
You can easily add your own functions either by subclassing the Formatter class or by installing them in the active format dictionary via the Remember method.
BoilerPlate and Python eval
BoilerPlate uses Python's eval built-in function to obtain conditional, iterative, and placeholder values. CGI developers might be concerned about this if they plan to use HTML form data in BoilerPlate expression tags. First, a standard rule to follow in any CGI application is to never blindly accept values received from a form. Otherwise, malicious users might be able to cause problems in your program with the data they enter. I have tried various Python expressions (range( 0, 99999999 ), for instance, and 1 / 0) in a simple CGI application without incident: Python gracefully raises an appropriate exception (MemoryError and ZeroDivisionError, respectively) and continues on. That is not to say that Python will always properly handle all errors or even on all platforms; however, I think the amount of mischief that can be caused is minimal since only expressions are evaluated, and not Python statements. Again, always be wary of data obtained from an external source.
If this is unsatisfactory, you can implement your own expression resolution mechanism for BoilerPlate instances to use. The Formatter class is the only one that uses Python's eval built-in function; it does so in its Resolve method. Simply create a Formatter subclass with your own custom Resolve method.
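One possible shape for such a subclass is sketched below in modern Python 3 syntax. The Formatter class here is a minimal hypothetical stand-in, since the real class is not shown in the text; only the Resolve method name comes from the article. The NameOnlyFormatter policy (plain names only, no expressions) is one assumption among many possible ones.

```python
class Formatter:
    """Minimal hypothetical stand-in for the article's Formatter class."""

    def __init__(self, **values):
        self.values = values

    def Resolve(self, key):
        # Default behavior: evaluate the key as a Python expression.
        return eval(key, {}, self.values)

    def __getitem__(self, key):
        return self.Resolve(key.strip())


class NameOnlyFormatter(Formatter):
    """Accepts plain names only; anything else is rejected without eval."""

    def Resolve(self, key):
        if not key.isidentifier():
            raise KeyError(key)
        return self.values[key]


safe = NameOnlyFormatter(title="Report")
accepted = "<TITLE>%(title)s</TITLE>" % safe

try:
    # An expression can reach any callable visible to eval, such as
    # __import__; the restricted resolver refuses it outright.
    "%( __import__('os').getcwd() )s" % safe
    rejected = False
except KeyError:
    rejected = True
```

The trade-off is that attribute references and formatter functions inside placeholders no longer work with this resolver, so a practical version would need to allow a controlled subset of them.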
Conclusion
BoilerPlate has proven to be an extremely useful library for CGI programming. My CGI applications are clearer and less cluttered, and editing an application's HTML components is no longer a chore. Perhaps most important, the incidence of HTML coding errors has dramatically decreased.
DDJ
Listing One
print "<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>" % title

# Trap when there is no data to show.
if len( data ) == 0:
    print "<B>No data available</B><P>"
else:
    # Print table heading, then each row
    print "<TABLE><TR><TH>Index</TH><TH>Value</TH></TR>"
    for index in range( 0, len( data ) ):
        print "<TR><TD>%d</TD>" % index
        print "<TD>%s</TD></TR>" % data[ index ]
    print "</TABLE>"
print "</BODY></HTML>"
Listing Two
<HTML>
<!-- Example of BoilerPlate HTML -->
<HEAD>
<TITLE>%(title)s</TITLE>
</HEAD>
<BODY>
<!--
  -- Trap when there is no data
  -->
#if len( data ) == 0#
<B>No data available</B><P>
#else#
<!--
  -- Print table heading, then each row
  -->
<TABLE>
<TR><TH>Index</TH><TH>Value</TH></TR>
#for each in data#
<TR><TD>%(each.index)d</TD><TD>%(each.value)s</TD></TR>
#end for#
<TR><TH>Total:</TH><TD>%(each.sum)s</TD></TR>
</TABLE>
#end if#
</BODY>
</HTML>
Listing Three
# Cook -- apply the given dictionary to a range of text we own.
def Cook( self, sink, dict ):
    sink.Append( self.text % dict )
Copyright © 1998, Dr. Dobb's Journal