The Rule
Ultimately, we want to encapsulate an arbitrarily complex parser expression in a placeholder variable that can be used just like any other parser. This is the tricky part. How do you encode an arbitrarily complex type, produced by a rather complex compositional expression, into a variable without knowing its exact type beforehand?
The problem is that each expression evaluates to a unique type signature.
As shown above, the expression a | b
results in a type quite different
from the expression a | b | c
, for example. Yet, we want both rule
= a | b
and rule = a | b | c
to work. The only similarity between
any types produced by compositional expressions is that they have the parser
base class. Even then, parser
is parameterized by its derived class.
Thus, two parser bases with different derived template parameters are essentially
unique types.
The solution: encode the type in a run-time polymorphic class that is derived
from an abstract_parser
class and contains a virtual member function
parse
. Make both the copy constructor and assignment operator of the
rule member templates parameterized by the type of parser argument. Whenever
the copy constructor or assignment operator of the rule is invoked, a concrete
instance of this abstract_parser
is created. A pointer to abstract_parser
is saved in a member variable.
Listing Six shows the class declaration
of our abstract_parser
. Also shown is concrete_parser
, a subclass
that knows about the type of parser and multiply inherits from parser
and abstract_parser
.
Listing Six: The class declaration of abstract_parser and concrete_parser
// Copyright (c) 2001, Joel de Guzman and Dan Nuffer // Permission is granted to use this code without restriction as // long as this copyright notice appears in all source files. template <typename IteratorT> class abstract_parser { public: abstract_parser() {} virtual ~abstract_parser() {} virtual match parse( IteratorT& first, IteratorT const& last) const = 0; }; template <typename ParserT, typename IteratorT> class concrete_parser : public ParserT , public abstract_parser<IteratorT> { public: concrete_parser(ParserT const& parser) : ParserT(parser) {} virtual ~concrete_parser() {} virtual match parse(IteratorT& first, IteratorT const& last) const { return ParserT::parse(first, last); } };
Listing Seven shows a rule's assignment
operator. This assignment operator is a template member function parameterized
by the type of the parser parameter. Any unique parser type composed using SEBNF
expressions, no matter how complex, can be encapsulated this way. Now that the
right-hand side (rhs
) of an assignment (and copy constructor) can be
encapsulated in a rule, the rule may be used anywhere just like any other parser.
Listing Seven: A rule's assignment operator
// Copyright (c) 2001, Joel de Guzman and Dan Nuffer // Permission is granted to use this code without restriction as // long as this copyright notice appears in all source files. template <typename ParserT> rule& operator = (ParserT const& parser) { assert(meta == 0); meta = new impl::concrete_parser<ParserT, IteratorT>(parser); return *this; }
Conclusion
Spirit makes it easy to write parsers directly in C++. Thanks to the expressiveness of C++ operator overloading and the flexibility of template metaprogramming, grammars can be written in a high-level syntax very similar to EBNF. These grammar descriptions can mix freely with C++ code. Semantic actions, attached to any part of the grammar, seamlessly link EBNF grammar specifications to C++ code. The intricacies of the parsing engine framework are hidden beneath an intuitive C++/EBNF interface. Using Spirit is an order of a magnitude easier, compared to coding a parser by hand or perhaps even using a stand-alone parser such as YACC. Since everything is done in C++, there is no need to use an external parser generator along with all the excess baggage that comes with it. Spirit is a flexible, lightweight, and simple-to-use C++ library suitable for many parsing related programming tasks.
Notes and References
[1] James Coplien. "Curiously Recurring Template Patterns," C++ Report, February 1995.
[2] Todd Veldhuizen. "Expression Templates," C++ Report, June 1995.
[3] Peter Naur (ed.). "Report on the Algorithmic Language ALGOL 60," CACM, May 1960.
[4] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns, Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).
[5] T. J. Parr, H. G. Dietz, and W. E. Cohen. PCCTS Reference Manual, Version 1.00 (School of Electrical Engineering, Purdue University, West Lafayette, August 1991).
[6] Adrian Johnstone and Elizabeth Scott. "RDP, A Recursive Descent Compiler Compiler," Technical Report CSD TR 97 25, Dept. of Computer Science, Egham, Surrey, England, Dec. 20, 1997.
[7] Joel de Guzman. "Spirit Version 1.1," July 2001.
[8] Joel de Guzman. "Spirit Version 1.2," November 2001.
Dan Nuffer is a software engineer. His interests include parsers, compilers, C++ XML, software engineering and Linux. Joel de Guzman is a consultant and software engineer at Boost Consulting (www.boost-consulting.com) with specialization and expertise in the implementation of generic libraries and frameworks, Joel is adept at writing code using modern C++ techniques, especially, but not limited to, template metaprogramming and functional programming in C++. After work, Joel loves to play distortion-laden heavy rock guitar, play around in his home studio, and record his own compositions.