Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Quincy: The Architecture of a C Interpreter


SP 94: Quincy: The Architecture of a C Interpreter

Quincy: The Architecture of a C Interpreter

A complete environment for rapid application development

Al Stevens

Al is a DDJ contributing editor and can be contacted on CompuServe at 71101,1262.


Quincy is an interactive C-language interpreter with a user interface similar to that of QBasic. Quincy runs under MS-DOS in text mode. I originally developed Quincy as a Standard C language teaching aid and included it with my book, Al Stevens Teaches C (MIS:Press, 1994), a C tutorial for developers who program in other languages. I subsequently presented the interpreter in a series of my "C Programming" columns in Dr. Dobb's Journal, commencing with the May, 1994 issue. This article provides an overview of Quincy's software architecture.

Architectural Overview

Figure 1 shows Quincy's two primary subsystems: the integrated development environment (IDE) and the translator. The IDE provides a user interface, source-code editor, and source-level debugger. The translator contains the C preprocessor, lexical scanner, linker, and interpreter.

The two subsystems are loosely coupled in that neither depends heavily on the details of the other. My goals were to build an IDE that is independent of the programming language it supports and a language translator that I could port to other operating environments with a minimum of effort.

The C-Language Implementation

Quincy interprets a subset of Standard C, including most of the preprocessing directives (#line and #pragma are not implemented), most of the standard library functions, and most of the C language, with these exceptions: Structure bitmap fields are not supported, and arrays are limited to four dimensions.

A Quincy program consists of one translation unit, which means that the program does not link object files and libraries built by a compiler. All the code for the interpreted program is contained in one source-code file and the header files that it includes. Programs developed with Quincy can be compiled with ANSI Standard C compilers.

Programs written in Quincy use the standard input/output console device. Its implementation uses Standard C library functions. There are also some nonstandard conio.h functions to support direct console input/output that bypasses DOS and goes directly to BIOS.

Error detection is shared between the processes that prepare the code for execution and the run-time interpreter.

Performance is biased toward the development cycle. Compile-time efficiency is emphasized, and execution time takes a backseat. This strategy produces an interpreter that begins running the program almost immediately. By deferring most of the translation to the run-time interpreter, the strategy penalizes execution time in favor of fast turnaround in the interactive development environment. This trade-off is necessary to support the tutorials for which I designed Quincy.

The IDE

Quincy's IDE implements the user interface with menus and dialog boxes, a source-code editor, and an interactive source-level debugger including breakpoints, watch variables, and support for examining and modifying variables while the interpreted program is running.

User Interface

To approximate the common user access interface shared by contemporary applications, I used the D-Flat function library, which I developed as a Dr. Dobb's Journal "C Programming" column project over the past several years. D-Flat supports an application window, drop-down menus, and dialog boxes, with user access through the mouse and keyboard. It offers a look-and-feel similar to those of MS-DOS utility programs such as DOSSHELL, QBasic, and EDIT. Its Windows-like Help subsystem was easily adapted to the tutorial sessions that I built as exercises in chapters of the book.

Quincy's user interface consists of a source-code-editor application window with menus and dialog boxes that support text editing and interactive debugging of the program. The user can shell out to DOS, get online help, and set several options, which are automatically saved for subsequent sessions.

The application window can display the source-code text and an optional watch window that displays variables being watched during a debug session.

Source-Code Editor

D-Flat has an EDITBOX control oriented more toward simple word processing than source-code editing. For Quincy I derived an EDITOR class from the EDITBOX class to remove word wrapping and add support for tab characters embedded in the text.

You can load an existing source-code file into the editor or begin writing a new program. A clipboard includes cut, copy, and paste commands, and there are text search and replace commands. You can print the current source-code file and save it to disk, giving it any name you choose. The program recognizes when you have changed and not saved a file and prompts you to save it if you are exiting the program or replacing the file in memory.

Debugger

The debugger responds to commands from the user interface to run the program, step through it, set breakpoints, set watch variables, and examine and modify variable values. When the user runs or steps through the program, the debugger calls the translator, which compiles and interprets the program.

After interpreting each statement, the interpreter calls back into the debugger, which updates watch variables, tests for breakpoints, and gives control back to the user interface if the user is single-stepping through the program. The debugger updates the editor's source-code display to reflect the current interpreted statement and to highlight any breakpoints.

The debugger's watch and examine processes call the translator to dereference the variables being viewed.

The Translator

The translator compiles and runs the program. Compilation consists of running the preprocessor, the lexical scanner, and the linker. If the compile was successful, the program begins running. The translator builds and interprets a one-line program with a call to the main function, which can be anywhere in the source code. Command-line arguments are simulated by the debugger, which passes them to the translator. The translator constructs argc and argv arguments to the main function.

Preprocessor

The preprocessor translates a source-code file with preprocessing directives into a source-code file ready to compile. The debugger has passed the address of the editor's source-code buffer to the translator. The translator passes the preprocessor the source-code buffer address and the address of a buffer to receive the preprocessed source code.

The preprocessor strips all comments and unnecessary white space. It defines and resolves macros and processes compile-time conditional directives. It inserts a short C comment for each nonblank source-code line. This comment identifies the file and line number of the source-code line. File numbers represent the source-code file in the editor buffer and all source-code files included by the #include preprocessing directive. These file/line number comments provide line-number information for error reporting and debugger actions.

Lexical Scanner

When the preprocessor returns with no errors, the translator calls the lexical scanner to convert the preprocessed source code into language tokens. Tokens are character values that represent discrete language elements. Each C keyword and operator is a token and each constant is a token that identifies the constant and the constant's value. Each unique identifier is a token that identifies the identifier and an integer offset into a symbol table. The scanner recognizes function declarations and puts them into a symbol table. Subsequent uses of the same identifier are assigned a function-call token and an integer offset into the function table. The scanner searches the standard-library table for standard-library function names and resolves them to their own token and an integer value that identifies the function. The scanner is ahead of traditional scanners in its treatment of function names. It also recognizes statement labels, puts them into a table, and resolves goto references to them.

When it is finished, the lexical scanner has produced a stream of tokens ready to be linked. Identifiers in the token stream are resolved to point to their respective symbol- or function-table entries.

Linker

The linker passes through the program's token stream and resolves global declarations, building tables of global variables and function prototypes and initializing the global variables. For each statement block in each function definition, the linker builds tables of local variable declarations and initializes the static local variables. Each function has a table of parameter variables and local variables. These tables are used to build a run-time declaration and initialization of the variables when the function is called.

Interpreter

The interpreter executes the program by interpreting tokens in the token streams of functions. The translator calls the interpreter and tells it to begin executing the tokens in the main function. The interpreter interprets the tokens in main and returns when main returns. The main function, of course, may call other functions, and the interpreter processes these calls.

Statements

Quincy's interpreter executes code one statement or statement block at a time. A statement block is initiated when Quincy sees the left brace in the token stream. The statement process initializes the block's local variables and then calls itself recursively until it sees a right brace.

Each individual statement is examined to see if it is a flow-control keyword (goto, if, else, while, do, for, switch, case, default, return, break, or continue). If so, the interpreter evaluates the controlling expressions and executes the statements that should execute as a result of the flow control. Statements that are not flow-control keywords are assumed to be expressions.

Expressions

Quincy evaluates expressions by interpreting the tokens and performing a recursive-descent parse at run time. Quincy employs an expression evaluation stack, which can contain entries of any data type that can be computed by expression evaluation, passed to a function as an argument, or returned from a function. The stack contains lvalue entries that consist of indirect pointers to the values in variable memory and rvalue entries that consist of values themselves. Constants and addresses are examples of rvalues. Variable references are examples of lvalues. The expression evaluator evaluates each element of the expression and pushes the result on the stack. Operators in the expression act on values that are already on the stack. The binary addition operator, for example, assumes that the stack has the left value already evaluated and pushed. The operator's function calls the expression evaluator recursively to evaluate the right value. Then it pops the two values, sums them, and pushes the result. Other operators behave similarly. The position of the operator in the recursive descent determines its precedence. Operators with the same precedence are processed in the same position. Their associativity is determined by the sequence in which they are processed.

A translator that strives for maximum run-time performance converts prefix-notation expressions to postfix notation and evaluates them in postfix order. The burden of the recursive descent is borne by the compiler rather than the interpreter. As a result, it takes longer to prepare a program to run than it does when the run-time interpreter processes the recursive descent. Quincy intentionally produces programs that run slower to gain the advantage of rapid turnaround in the tutorial environment.

Quincy similarly evaluates structure-member operations and subscript operators at run time.

Function Calls

One element in an expression can be a function call. When the expression evaluator sees one, it suspends interpreting the current function, saves its context, and prepares to interpret the called function. First, the interpreter checks the types of the arguments in the call against the function's prototype. When that test has passed, the interpreter calls the expression evaluator once for each argument in the function call's argument list. This operation pushes the arguments onto the expression-evaluation stack.

The interpreter tests to see if the function being called is in the user's program or taken from the standard library. If it is a user function, the interpreter pops the arguments from the stack and initializes parameter variables with the argument values. Then the interpreter begins executing the function by executing its first, outer statement block. When that block has finished executing, the interpreter restores the context of the function that made the call and resumes executing it.

Library Functions

When a program calls a Standard-C library function, the interpreter passes control to a process that executes the library function. Library functions are not interpreted. Quincy uses the standard-library functions of the compiler with which it is compiled to service standard-library function calls from the interpreted program. Depending on which function is called, the interpreter pops the arguments into local variables, calls the standard-library function, and pushes the returned value, if any, onto the expression-evaluation stack.

If the library function uses the keyboard or screen, the interpreter notifies the IDE to relinquish the screen to the run-time system. This is necessary when you step through the code one line at a time. After the library function executes, the interpreter notifies the IDE that it can take the screen back if it needs to.

When the function is of printf or scanf form, the interpreter uses the host compiler's vprintf and vscanf functions to build the variable argument list.

Quincy keeps a table of file handles that the interpreted program opens, deleting table entries when the program closes the files. If the program terminates without closing all the files it opened, the interpreter closes them. This behavior emulates that of a program running under DOS and prevents an errant interpreted program from gobbling up DOS's limited number of file handles.

Quincy does not recognize standard-library function calls unless they have been declared, so the using program must include the standard header files that declare library functions. The prototypes assure that the function calls pass the correct types and number of arguments.

Memory Allocation

The interpreted program uses Quincy's heap for mallocs. Quincy uses the DOS system heap. To prevent an interpreted program from allocating memory and not freeing it, the interpreter maintains a table of allocated memory buffers and frees any buffers left allocated after the interpreted program terminates.

Error Processing

Quincy can detect source-code errors at any time during translation. A common error-processing function accepts an error number as a parameter and performs a longjmp call to restore the interpreter to its condition just prior to beginning the translation.

Future Directions

Quincy evolved from a K&R interpreter developed nearly ten years ago. When I separated the interpreter from the user interface and added support for the ANSI language extensions, I had a long-term goal in mind other than the immediate need for a tutorial tool. I wanted a C-interpreter engine I could install wherever it made sense--as a scripting language, a macro language, or a portable visual programming environment. Separating the interpreter from its operating environment was the first step toward that goal. Other improvements should rebalance the compile and run-time responsibilities and improve performance. The translator should resolve the expression evaluation at compile time. The interpreter should use tokens as the argument for a finite state machine rather than to the large switch statements it currently uses. The compiler should support incremental compiles to rebuild only those discrete program entities--functions--that you changed since the last compile. Whether or not Quincy ever gets those improvements depends on time available to do the work and how compelling a need I feel to get it done.

How to Get Quincy

Quincy is available to download from the DDJ Forum on CompuServe and on the Internet by anonymous ftp at site ftp.mv.com. Alternatively, you can send a diskette and a stamped, addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402 and I'll send you a copy of the source code. Quincy is free, but if you want to support my Careware charity, include a dollar for the Brevard County Food Bank.

Figure 1 Quincy's two primary subsystems--the integrated development environment (IDE) and the translator.


Copyright © 1994, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.