...And Trickier
Type stacks are used to process type specifiers, but type specifiers can appear both in declarations and in cast expressions. The two uses do not conflict, except when they occur together, as in:
1: typedef int * intp_t;
2: char pc;
3: void f() {
4:     int * pi = (intp_t) & pc;
5: }
where line 4 contains a variable declaration with an initializer, and the initializer contains a cast expression. Here intp_t in line 4 must be tokenized as TYPE_NAME. Under the logic described so far, however, it is tokenized as IDENTIFIER, because the type stack already holds a valid type (int) when the lexer reaches it. The lexer should therefore suppress the "force to IDENTIFIER" behavior when it is in the middle of a variable initializer. Unfortunately, whether or not we are in the middle of a variable initializer is something that only the parser can say, and only after the lexical analysis is complete. The lexical choice for intp_t would then depend on how a larger span of the input text is parsed after lexical analysis: It is a circular dependency.
To solve the issue, I exploit the fact that the type name in a cast expression always appears between parentheses. The lexer maintains a "parenthesis level"; that is, a counter of how many parentheses are currently open. This counter is incremented at each "(" encountered, and decremented at each ")". Type names are then not forced to IDENTIFIER when we are inside a pair of parentheses. The corresponding modification can be seen in Listing Three.
int category()
{
    Block * blockp = Block::currentp;
    ...
    while (blockp) {
        if (blockp->symbol_table.name_is_defined(yytext)) {
            if (blockp->symbol_table.name_is_typedef(yytext)) {
                if (Block::currentp->type_stack.is_valid_type()
                        && parenthesis_level==0)
                    return IDENTIFIER;
                else
                    return TYPE_NAME;
            }
            else
                return IDENTIFIER;
        }
        blockp = blockp->parentp;
    }
    return IDENTIFIER;
}
...
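Listing Three shows only the lookup, not how the counter is maintained. The following is a minimal, self-contained sketch of the whole idea, not the actual lexer: the Lexer struct, its typedefs set, the type_seen flag (a crude stand-in for type_stack.is_valid_type()), and the scan() helper are all hypothetical names introduced for illustration. Only parenthesis_level, IDENTIFIER, and TYPE_NAME come from the listing, and the walk over nested blocks is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum Token { IDENTIFIER, TYPE_NAME };

// Hypothetical, simplified stand-in for the lexer state (an assumption).
struct Lexer {
    std::set<std::string> typedefs;  // names declared with typedef
    bool type_seen = false;          // stand-in for type_stack.is_valid_type()
    int parenthesis_level = 0;       // number of currently open "("

    // Same decision as Listing Three, minus the walk over nested blocks.
    Token category(const std::string& name) const {
        assert(parenthesis_level >= 0);
        if (typedefs.count(name) == 0) return IDENTIFIER;
        // Force to IDENTIFIER only outside parentheses.
        if (type_seen && parenthesis_level == 0) return IDENTIFIER;
        return TYPE_NAME;
    }

    // Scan one piece of text and report how each name is categorized.
    std::vector<std::pair<std::string, Token>> scan(const std::string& text) {
        static const std::set<std::string> builtin_types = {"int", "char", "void"};
        std::vector<std::pair<std::string, Token>> names;
        std::size_t i = 0;
        while (i < text.size()) {
            unsigned char c = static_cast<unsigned char>(text[i]);
            if (c == '(') {
                ++parenthesis_level;
                ++i;
            } else if (c == ')') {
                if (parenthesis_level > 0) --parenthesis_level;
                ++i;
            } else if (c == ';') {
                type_seen = false;  // the declaration ends, start afresh
                ++i;
            } else if (std::isalpha(c) || c == '_') {
                std::size_t start = i;
                while (i < text.size() &&
                       (std::isalnum(static_cast<unsigned char>(text[i])) ||
                        text[i] == '_'))
                    ++i;
                std::string word = text.substr(start, i - start);
                if (builtin_types.count(word)) {
                    type_seen = true;  // a type specifier has been read
                } else {
                    Token t = category(word);
                    if (t == TYPE_NAME) type_seen = true;
                    names.emplace_back(word, t);
                }
            } else {
                ++i;  // operators and whitespace are irrelevant here
            }
        }
        return names;
    }
};
```

With intp_t registered as a typedef, scanning line 4 of the example categorizes pi as IDENTIFIER and the parenthesized intp_t as TYPE_NAME, while scanning "int intp_t;" still forces intp_t to IDENTIFIER because no parenthesis is open.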