Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

JVM Languages

JForth: Implementing Forth in Java


What Is Forth Anyway?

Forth is a stack-based procedural programming language that was invented by Charles H. Moore around 1968. By 1970 Forth was being used to control the space telescopes at Kitt Peak Observatory in Arizona. At that time programming tools for small computer systems were rather crude and computer memory was very expensive. C compilers were not yet widely available, BASIC was still fully interpreted and therefore very slow, and Fortran compilers were too expensive for many. Programming during this time period usually meant assembler language or worse yet machine language. Forth offered a higher level of programming that increased productivity immensely while still offering the assembly language interface if required for an application. And as counter-intuitive as it seems, the larger a Forth program got the more efficient it used memory. In fact, Forth programs could be smaller than equivalent assembler language programs on the same hardware platform.

Forth source code is line-oriented and can be entered interactively via a keyboard or stored on a mass storage device of some kind and loaded in. In the early days, Forth source code was stored in 1K blocks on disk, but today it is stored in normal files. A line of Forth source consists of a series of space character delimited tokens. The Forth interpreter reads in the tokens and either executes them interactively or, if in compile mode, compiles them for later execution.

The Forth word is a language element that can be used interactively or within a program. Source code for a particular word is called its "definition." Primitive Forth words typically have their definitions coded in the CPU's native machine language, whereas non-primitive words have their definitions coded in Forth. A Forth program is run by executing the highest level word which defines the program.

The Forth word ":" (colon) puts the interpreter into compile mode and the word ";" (semicolon) ends compile mode. Immediate Forth words execute even in compile mode, whereas non-immediate Forth words have their addresses compiled into the word being defined.

Many things make Forth unique but two things need to be specifically pointed out.

  • First, Forth is stack-oriented language, meaning that all arguments to and values returned from Forth words are unnamed, untyped, and located on a stack.
  • Second, Forth uses Reverse Polish Notation (RPN) like many HP calculators for expression evaluation.

Let's use the Forth word DUP as an example of argument passing. DUP duplicates the value currently on the top of the data stack. DUP takes its input value from the stack and pushes two copies of that value back onto the stack.

Using RPN for expression evaluation takes getting used to when you're used to normal algebraic or infix evaluation. The algebraic expression 2 + 5 * 3 evaluates as 2 + (5 * 3), whereas the RPN evaluation becomes 2 5 3 * +. Expression conversion to RPN is easy if you keep the following in mind:

  • Identifiers are used in the same order.
  • Operators are used in the same order.
  • Operators follow identifiers.

It should be noted that Forth leaves all data typing issues to the programmer. Typing is usually handled by having Forth words defined for each data type. For example, the addition operator "+" expects its arguments to be integers and integers only. If floating-point addition is required in an application a special operator like "F+" might be defined. (In the Forth implementation discussed later, I have overloaded operators like "+" to handle all valid data types.)

Traditionally Forth was implemented as a threaded, interpretive language (TIL) though that is not necessarily the case today where Forth can be compiled into machine code for direct execution just like C++. A TIL implementation (as discussed in this article) has to do with how a language's source code is translated into executable form. TILs generally employ a two-phase translation process. In the first phase (compile mode), source code is interpreted into a list of fully resolved address references to underlying functionality. At runtime, a thread of execution is established from the top-level Forth word through the lists of compiled addresses of the words with which it was defined. Forth source code is converted into a fully analyzed internal form (lists of addresses) by the first phase of the translation process so the compiler is no longer needed. In addition, no further interpretation is used at runtime as the thread of execution has been fully established.

Three components -- a dictionary, the inner interpreter, and the outer interpreter -- make up traditional Forth implementations. Typically implemented with a linked list, the dictionary contains an entry for every Forth word (primitives and non-primitives) that can be executed interactively or used in the definition of new Forth words. New, successfully compiled Forth words are added to the dictionary for subsequent lookup.

The outer interpreter provides users with a crude command-line interface to the Forth environment. It typically provides a prompt such as ">" and waits for user input. The outer interpreter gathers up a line of input at a time and submits that line to the inner interpreter where all the action happens. If the line is interpreted successfully, "OK" is output and the process repeats until no more input is available.

Listing One is pseudocode for the inner interpreter I used in my Java implementation. It is fairly typical in its operation.

begin
    prepare tokenizer for parsing line of text
    while tokens exist on line
        get next token
        if not in compile mode
            if token is string literal
                push string onto data stack
                continue
            endif
            search dictionary for word
            if word found in dictionary
                execute it
            else
                try to parse word as number
                if numeric
                    push number onto data stack
                else
                    output error indication - word followed by ?
                    return false
                endif
            endif
        else ( in compile mode )
            if token is string literal
                add string to word being defined
                continue
            endif
            search dictionary for word
            if word found in dictionary
                if word marked as immediate
                    execute word
                else
                    add word to word being defined
                endif
            else ( word not found in dictionary )
                try to parse word as number
                if numeric
                    add number to word being defined
                else
                    output error indication - word followed by ?
                    return false
                endif
            endif
        endif
     end while
     return true
end
Listing One: Pseudocode for the Inner Interpreter

The following examples give you a sense of how Forth works. The ">" prompt and the "OK" are provided by the outer interpreter.

> 1 . 1 OK

Since we are not in compile mode, the first 1 is identified as a numeric literal and is pushed onto the data stack. The "." (called "dot") causes the item at the top of the stack to be popped off and output to the console; hence the second 1.

> 10 HEX . A OK

This example performs number base conversion. Unless changed, Forth uses base 10 for numeric operations. Here we push a decimal 10 value onto the stack, change the numeric base to HEX and then use dot to print the top of the stack value. An "A" is printed as A is 10 in hex.

>  : hello_world  "*** Hello World ***" .  cr cr ; OK

This is an example of Forth compilation. Here we define a new word called hello_world which echoes the string *** Hello World *** to the console followed by a couple of carriage returns. The final OK signifies successful compilation. Later when hello_world is typed on the command line, the string will be output.

> "c:/archive/java/jforth/testcode" load OK

Here a file/pathname string is pushed onto the data stack and the Forth word load pops this string off of the stack and compiles the Forth code in the file. This is how you would compile a complete Forth program. It is interesting to note that both compile and runtime behaviors can be combined in a file of loaded Forth code. Below, the definition of dl1 is compiled and the very next statement executes the new word interactively.

( Do loop with positive 1 loop increment ) 
: dl1 10 0 do i . loop ;
" do loop with positive 1 loop increment: " . dl1 cr

Consult any book on Forth for hundreds of other examples.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.