What Is Forth Anyway?
Forth is a stack-based procedural programming language that was invented by Charles H. Moore around 1968. By 1970 Forth was being used to control the space telescopes at Kitt Peak Observatory in Arizona. At that time programming tools for small computer systems were rather crude and computer memory was very expensive. C compilers were not yet widely available, BASIC was still fully interpreted and therefore very slow, and Fortran compilers were too expensive for many. Programming during this time period usually meant assembler language or worse yet machine language. Forth offered a higher level of programming that increased productivity immensely while still offering the assembly language interface if required for an application. And as counter-intuitive as it seems, the larger a Forth program got the more efficient it used memory. In fact, Forth programs could be smaller than equivalent assembler language programs on the same hardware platform.
Forth source code is line-oriented and can be entered interactively via a keyboard or stored on a mass storage device of some kind and loaded in. In the early days, Forth source code was stored in 1K blocks on disk, but today it is stored in normal files. A line of Forth source consists of a series of space character delimited tokens. The Forth interpreter reads in the tokens and either executes them interactively or, if in compile mode, compiles them for later execution.
The Forth word is a language element that can be used interactively or within a program. Source code for a particular word is called its "definition." Primitive Forth words typically have their definitions coded in the CPU's native machine language, whereas non-primitive words have their definitions coded in Forth. A Forth program is run by executing the highest level word which defines the program.
The Forth word ":" (colon) puts the interpreter into compile mode and the word ";" (semicolon) ends compile mode. Immediate Forth words execute even in compile mode, whereas non-immediate Forth words have their addresses compiled into the word being defined.
Many things make Forth unique but two things need to be specifically pointed out.
- First, Forth is stack-oriented language, meaning that all arguments to and values returned from Forth words are unnamed, untyped, and located on a stack.
- Second, Forth uses Reverse Polish Notation (RPN) like many HP calculators for expression evaluation.
Let's use the Forth word DUP as an example of argument passing. DUP duplicates the value currently on the top of the data stack. DUP takes its input value from the stack and pushes two copies of that value back onto the stack.
Using RPN for expression evaluation takes getting used to when you're used to normal algebraic or infix evaluation. The algebraic expression 2 + 5 * 3 evaluates as 2 + (5 * 3), whereas the RPN evaluation becomes 2 5 3 * +. Expression conversion to RPN is easy if you keep the following in mind:
- Identifiers are used in the same order.
- Operators are used in the same order.
- Operators follow identifiers.
It should be noted that Forth leaves all data typing issues to the programmer. Typing is usually handled by having Forth words defined for each data type. For example, the addition operator "+" expects its arguments to be integers and integers only. If floating-point addition is required in an application a special operator like "F+" might be defined. (In the Forth implementation discussed later, I have overloaded operators like "+" to handle all valid data types.)
Traditionally Forth was implemented as a threaded, interpretive language (TIL) though that is not necessarily the case today where Forth can be compiled into machine code for direct execution just like C++. A TIL implementation (as discussed in this article) has to do with how a language's source code is translated into executable form. TILs generally employ a two-phase translation process. In the first phase (compile mode), source code is interpreted into a list of fully resolved address references to underlying functionality. At runtime, a thread of execution is established from the top-level Forth word through the lists of compiled addresses of the words with which it was defined. Forth source code is converted into a fully analyzed internal form (lists of addresses) by the first phase of the translation process so the compiler is no longer needed. In addition, no further interpretation is used at runtime as the thread of execution has been fully established.
Three components -- a dictionary, the inner interpreter, and the outer interpreter -- make up traditional Forth implementations. Typically implemented with a linked list, the dictionary contains an entry for every Forth word (primitives and non-primitives) that can be executed interactively or used in the definition of new Forth words. New, successfully compiled Forth words are added to the dictionary for subsequent lookup.
The outer interpreter provides users with a crude command-line interface to the Forth environment. It typically provides a prompt such as ">" and waits for user input. The outer interpreter gathers up a line of input at a time and submits that line to the inner interpreter where all the action happens. If the line is interpreted successfully, "OK" is output and the process repeats until no more input is available.
Listing One is pseudocode for the inner interpreter I used in my Java implementation. It is fairly typical in its operation.
begin prepare tokenizer for parsing line of text while tokens exist on line get next token if not in compile mode if token is string literal push string onto data stack continue endif search dictionary for word if word found in dictionary execute it else try to parse word as number if numeric push number onto data stack else output error indication - word followed by ? return false endif endif else ( in compile mode ) if token is string literal add string to word being defined continue endif search dictionary for word if word found in dictionary if word marked as immediate execute word else add word to word being defined endif else ( word not found in dictionary ) try to parse word as number if numeric add number to word being defined else output error indication - word followed by ? return false endif endif endif end while return true end
The following examples give you a sense of how Forth works. The ">" prompt and the "OK" are provided by the outer interpreter.
> 1 . 1 OK
Since we are not in compile mode, the first 1 is identified as a numeric literal and is pushed onto the data stack. The "." (called "dot") causes the item at the top of the stack to be popped off and output to the console; hence the second 1.
> 10 HEX . A OK
This example performs number base conversion. Unless changed, Forth uses base 10 for numeric operations. Here we push a decimal 10 value onto the stack, change the numeric base to HEX and then use dot to print the top of the stack value. An "A" is printed as A is 10 in hex.
> : hello_world "*** Hello World ***" . cr cr ; OK
This is an example of Forth compilation. Here we define a new word called hello_world which echoes the string *** Hello World *** to the console followed by a couple of carriage returns. The final OK signifies successful compilation. Later when hello_world is typed on the command line, the string will be output.
> "c:/archive/java/jforth/testcode" load OK
Here a file/pathname string is pushed onto the data stack and the Forth word load pops this string off of the stack and compiles the Forth code in the file. This is how you would compile a complete Forth program. It is interesting to note that both compile and runtime behaviors can be combined in a file of loaded Forth code. Below, the definition of dl1 is compiled and the very next statement executes the new word interactively.
( Do loop with positive 1 loop increment ) : dl1 10 0 do i . loop ; " do loop with positive 1 loop increment: " . dl1 cr
Consult any book on Forth for hundreds of other examples.