Composites
By itself, the chlit
class might not seem very useful. Yet, composed to
form a hierarchy, trivial parser classes can produce quite complex and elaborate
parsing functionalities (see the alternative
class in Listing
Three for an example).
Listing Three: The alternative class.
// Copyright (c) 2001, Joel de Guzman and Dan Nuffer // Permission is granted to use this code without restriction as // long as this copyright notice appears in all source files. template <typename A, typename B> struct alternative : public binary<A, B> , public parser<alternative<A, B> > { alternative(A const& a, B const& b) : binary<A, B>(a, b) {} template <typename IteratorT> match parse(IteratorT& first, IteratorT const& last) const { if (match hit = this->left().parse(first, last)) return hit; return this->right().parse(first, last); } };
This class is a composite parser class. It has two sub-parsers: left
and right
. parse
returns a success if one of its subjects is successful.
Through object composition alone, we can build parsers of arbitrary complexity.
The library includes a couple of predefined primitives (like chlit
) as
well as composites (such as alternative
). Primitives for parsing include
literal strings, ranges, and numerics. Composites include alternatives, sequences,
intersections, and differences. These parsers can be combined to form a hierarchy
in a multitude of ways.
Operators
Primitives and composites can be composed to form more complex composites using
operator overloading, coded in a way that mimics EBNF. Composing an alternative
,
for example requires two operands and is accomplished by overloading the operator
|
as shown here:
template <typename A, typename B> inline alternative<A, B> operator | (parser<A> const& a, parser<B> const& b) { return alternative<A, B>( a.derived(), b.derived()); }
Given two chlit
objects a
and b
, we can form an alternative
composite object using the |
operator:
a | b
Assuming that both a
and b
have the type chlit<char>
,
the resulting composite is:
alternative<chlit<char>, chlit<char> >
Continuing the example, the expression:
a | b | c
where a
, b
, and c
all have the type chlit<char>
,
will generate a nested composite with the type:
alternative<alternative< chlit<char>, chlit<char> >, chlit<char> >
All composite parsers follow the same style of object aggregation. The difference
lies in the algorithms used by each class's parse
member function.
The sequence
class appears in Listing Four.
Listing Four: The sequence class
// Copyright (c) 2001, Joel de Guzman and Dan Nuffer // Permission is granted to use this code without restriction as // long as this copyright notice appears in all source files. template <typename A, typename B> struct sequence : public binary<A, B> , public parser<sequence<A, B> > { sequence(A const& a, B const& b) : binary<A, B>(a, b) {} template <typename IteratorT> match parse(IteratorT& first, IteratorT const& last) const { IteratorT s = first; match ma, mb; if ((ma = this->left().parse(s, last)) && (mb = this->right().parse(s, last))) { first = s; return ma + mb; } return match(); } };
The sequence
class is very similar to alternative
, but it has
a slight twist in the implementation of its parse
member function. sequence
matches both its subjects (left
and right
) in strict sequence.
Like alternative
, sequence
has a corresponding overloaded operator:
the >>
operator.
Unary and Binary Composites
So far, we have just tackled binary composites such as alternative
and
sequence
. Unary composites work similarly to binary composites but with
only a single subject. The Kleene star, used for zero or more iterations, and
the positive closure, used for one or more iterations, are examples of unary composites. Listing Five shows the Kleene star class implementation.
Listing Five: The Kleene star class implementation.
// Copyright (c) 2001, Joel de Guzman and Dan Nuffer // Permission is granted to use this code without restriction as // long as this copyright notice appears in all source files. template <typename S> struct kleene_star : public unary<S> , public parser<kleene_star<S> > { kleene_star(S const& a) : unary<S>(a) {} template <typename IteratorT> match parse(IteratorT& first, IteratorT const& last) const { match hit(0); while (match next = this->subject().parse(first, last)) hit += next; return hit; } };
As is obvious by now, binary composites inherit from the binary class that
does the work of storing a copy of its left and right subjects. In the same
vein, unary composites inherit from the unary class, similar to the binary class
but with only one subject. Most fundamentally, all parsers, whether primitive
or composite, inherit from the base parser
class.
All entities in the library are variations of the examples presented. There
are entities that deal with set
-like operations, iteration (Kleene star
and its brethren), semantic actions (hooks to external C/C++ code), directives
(unary composites that modify the functionality of its subjects), parser assertions,
and error handlers.