C/C++

Practical Parsing for ANSI C

By Daniele Paolo Scarpazza, December 12, 2006

Daniele discusses the design of an ANSI C parser front-end, identifying the pitfalls that make design tricky.

...And Trickier

Type stacks are used to process type specifiers, but type specifiers can appear both in declarations and in cast expressions. The two uses do not conflict, except when they appear together, as in:


1: typedef int * intp_t;
2: char pc;
3: void f() {
4: int * pi = (intp_t) & pc;
5: }

where line 4 contains a variable declaration with an initializer, and the initializer contains a cast expression. Countermeasures are needed so that intp_t in line 4 is tokenized as TYPE_NAME; under the logic described so far, it is tokenized as IDENTIFIER. The lexer should suppress the "force to IDENTIFIER" behavior when it is in the middle of a variable initializer. Unfortunately, only the parser can tell whether we are in the middle of a variable initializer, and only after lexical analysis is complete. The lexical choice for intp_t would then depend on how a larger span of input, one that contains intp_t itself, is parsed after lexical analysis: a circular dependency.

To solve the issue, I exploit the fact that cast expressions always appear between parentheses. The lexer maintains a "parenthesis level"; that is, a counter of how many parentheses are currently open. This counter is incremented at each "(" encountered and decremented at each ")". Type names are then not forced to IDENTIFIER when we are inside a pair of parentheses. The corresponding modification can be seen in Listing Three.

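int category()
{
  Block * blockp = Block::currentp;
  ...
  // Walk outward from the current block through the enclosing scopes.
  while (blockp)
  {
     if (blockp->symbol_table.name_is_defined(yytext))
     {
        if (blockp->symbol_table.name_is_typedef(yytext))
        {
           // Force to IDENTIFIER only when the type stack already holds
           // a valid type and we are not inside parentheses (possible cast).
           if (Block::currentp->type_stack.is_valid_type() &&
               parenthesis_level==0)
              return IDENTIFIER;
           else
              return TYPE_NAME;
        }
        else
           return IDENTIFIER;
     }
     blockp = blockp->parentp;
  }
  return IDENTIFIER;
}
 ...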

Listing Three
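To see the parenthesis counter at work outside the full front-end, the following is a minimal, self-contained sketch and not the article's actual lexer. ToyLexer, typedef_names, type_seen, classify_name, and tokenize are hypothetical names introduced only for illustration; type_seen is a crude stand-in for type_stack.is_valid_type(), and a flat set of names replaces the block-scoped symbol tables.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum Token { IDENTIFIER, TYPE_NAME, KEYWORD_TYPE, PUNCT };

// Hypothetical toy lexer; stands in for the Block/type_stack machinery.
struct ToyLexer {
    std::set<std::string> typedef_names;  // names introduced by typedef (assumed flat scope)
    bool type_seen = false;               // stand-in for type_stack.is_valid_type()
    int parenthesis_level = 0;            // number of currently open "("

    // Same decision as in Listing Three, reduced to its core.
    Token classify_name(const std::string& name) const {
        assert(!name.empty());
        if (name == "int" || name == "char" || name == "void")
            return KEYWORD_TYPE;
        if (typedef_names.count(name)) {
            // Force to IDENTIFIER only outside parentheses.
            if (type_seen && parenthesis_level == 0)
                return IDENTIFIER;
            return TYPE_NAME;
        }
        return IDENTIFIER;
    }

    // Tokenizes names and single-character punctuation only.
    std::vector<Token> tokenize(const std::string& src) {
        std::vector<Token> out;
        std::size_t i = 0;
        while (i < src.size()) {
            unsigned char c = static_cast<unsigned char>(src[i]);
            if (std::isspace(c)) { ++i; continue; }
            if (std::isalpha(c) || c == '_') {
                std::size_t start = i;
                while (i < src.size() &&
                       (std::isalnum(static_cast<unsigned char>(src[i])) ||
                        src[i] == '_'))
                    ++i;
                Token t = classify_name(src.substr(start, i - start));
                if (t == KEYWORD_TYPE || t == TYPE_NAME)
                    type_seen = true;     // a type specifier has now been seen
                out.push_back(t);
                continue;
            }
            if (c == '(') {
                ++parenthesis_level;
            } else if (c == ')') {
                if (parenthesis_level > 0) --parenthesis_level;
            } else if (c == ';') {
                type_seen = false;        // declaration ended; reset the stand-in
            }
            out.push_back(PUNCT);
            ++i;
        }
        return out;
    }
};
```

With intp_t registered in typedef_names, tokenizing the text of line 4, "int * pi = (intp_t) & pc;", classifies intp_t as TYPE_NAME because the counter is 1 when the name is reached. Tokenizing "int intp_t;" still classifies the name as IDENTIFIER, since a type specifier has been seen and no parenthesis is open.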
