In recent years, a multitude of little languages have been proposed for extending and customizing applications. In general, these extension languages should have the following attributes:
- Clear and simple syntax (since it is not the main language for most of its users).
- Small size and small implementation (so the cost of adding it to the host will not be too high).
- Good data-description facilities (to make it useful as a configuration language).
- Adequate extensibility (to allow its use at high levels of abstraction for interfacing with users in diverse domains).
Since extension languages are not for writing large pieces of software, mechanisms for supporting programming-in-the-large, like static type checking and information hiding, are not essential.
Lua, the extensible, embedded language we present here, satisfies these requirements. Its syntax and control structures are simple and familiar. Lua is small--the whole implementation is less than 6000 lines of ANSI C. Besides the facilities common to most procedural languages, Lua has special features that make it a powerful high-level extensible language:
- Ability to define and manipulate functions as first-class values, which greatly simplifies the implementation of object-oriented facilities.
- Associative arrays, powerful language constructs that implement most data containers.
- Garbage collection, eliminating the need for explicit management of memory allocation, a major source of programming errors.
- A fallback mechanism, allowing extension of the semantics of the language.
- Reflexive facilities, allowing the creation of highly polymorphic parts.
Lua is a general-purpose embedded programming language designed to support
procedural programming with data-description facilities. Although it is not
in the public domain (TeCGraf retains the copyright), Lua is freely available for
both academic and commercial purposes at http://www.inf.puc-rio.br/~roberto/lua.html. The distribution also includes a standard library of mathematical
functions (sin, cos, and so on), I/O and system functions, and
string-manipulation functions. This optional library adds some 1000 lines of code
to the system. Also included are a debugger and a separate compiler that produces
portable binary files containing bytecodes. The code compiles without change in
most ANSI C compilers, including gcc (on AIX, IRIX, Linux, Solaris, SunOS, and
ULTRIX), Turbo C (on DOS), Visual C++ (on Windows 3.1/95/NT), Think C (MacOS),
and CodeWarrior (MacOS).
All external identifiers are prefixed with lua to avoid name clashes when
linking with applications. Even the code generated by yacc passes through a sed
filter to comply with this rule, so that it is possible to link Lua with
applications that use yacc for other purposes.
The Lua Implementation
Lua is provided as a small library of C functions to be linked to host
applications. For example, the simplest Lua client is the interactive,
stand-alone interpreter in Listing One. In this
program, the function lua_dostring calls the interpreter over a section of
code contained in a string. Each chunk of Lua code may contain a mixture of
statements and function definitions.
The header file lua.h defines Lua's API, which has about 30 functions. Besides
lua_dostring, there is a lua_dofile function to interpret Lua code contained in
files, lua_getglobal and lua_setglobal to manipulate Lua global variables,
lua_call to call Lua functions, lua_register to make C functions accessible
from Lua, and so on.
Lua has a syntax somewhat similar to Pascal. To avoid dangling elses, control
structures like ifs and whiles finish with an explicit end. Comments follow the
Ada convention, starting with "--" and running until the end of the line. Lua
supports multiple assignment; for example, x, y = y, x swaps the values of x
and y. Likewise, functions can return multiple values.
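The syntax points above can be sketched in a few lines. This is only an illustration, assuming a current Lua 5.x interpreter rather than the release described in the article; the names sign and divmod are hypothetical and do not come from the distribution.

```lua
-- Comments start with "--" and run to the end of the line.

-- Control structures close with an explicit "end", so an else can never dangle.
function sign(n)
  if n > 0 then
    return 1
  elseif n < 0 then
    return -1
  else
    return 0
  end
end

-- Multiple assignment: both right-hand values are read before any is stored.
x, y = 1, 2
x, y = y, x            -- x is now 2, y is now 1

-- A function can return several values at once.
function divmod(a, b)
  local q = math.floor(a / b)
  return q, a - q * b
end

q, r = divmod(17, 5)   -- q receives the quotient, r the remainder
```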
Lua is a dynamically typed language. This means that values have types but
variables don't, so there are no type or variable declarations. Internally, each
value has a tag that identifies its type; the tag can be queried at run time with
the built-in function type. Variables are typeless and can hold values of
any type. Lua's garbage collection keeps track of which values are being used,
discarding those that are not.
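A short sketch of what "values have types but variables don't" means in practice. It assumes a current Lua 5.x interpreter; the type names returned by type may differ slightly in the release the article describes.

```lua
-- The same variable holds values of different types over its lifetime;
-- type() reports the tag of the value currently stored.
v = 10
t1 = type(v)           -- "number"

v = "ten"
t2 = type(v)           -- "string"

v = {}
t3 = type(v)           -- "table"

function greet() return "hi" end
v = greet
t4 = type(v)           -- "function"

v = nil                -- the old values are now garbage, reclaimed automatically
t5 = type(v)           -- "nil"
```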
Lua provides the types nil, string, number, user data, function, and table. nil
is the type of the value nil; its main property is that it is different from
any other value. This is handy to use as the initial value of variables, for
instance. The type number represents floating-point real numbers. string has
the usual meaning. Type user data corresponds to a generic void* pointer in C,
and represents host objects in Lua. All of these types are useful, but the
flexibility of Lua is due to functions and tables, the product of two key
lessons from Lisp and Scheme:
- Functions should be first-class values.
- Languages should have a single and strong unifying data constructor (lists in Lisp, tables in Lua).
Function values in Lua can be stored into variables, passed as parameters to other functions, stored in tables, and the like.
When you declare a function in Lua (see Listing Two), the function body is precompiled into bytecodes, creating a function value. This value is assigned to a global variable with the given name. C functions, on the other hand, are provided by the host program through an appropriate call to the API. Lua cannot call C functions that have not been registered by their host. Therefore, the host has complete control over what a Lua program can do, including any potentially dangerous access to the operating system.
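As a rough illustration of functions as first-class values, the sketch below stores functions in variables and tables and passes them as arguments. The names inc, double, twice, and ops are hypothetical, and the code assumes a current Lua 5.x interpreter.

```lua
-- Declaring a function creates a function value and assigns it to a global.
function inc(n) return n + 1 end
function double(n) return n * 2 end

-- A function that receives another function as a parameter.
function twice(f, x) return f(f(x)) end

alias = inc                   -- store a function value in another variable
ops = { grow = double }       -- store a function value in a table field

r1 = twice(alias, 5)          -- inc(inc(5))
r2 = twice(ops.grow, 5)       -- double(double(5))
```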
Tables
Tables are for Lua what lists are for Lisp: powerful data-structuring mechanisms. A table in Lua is similar to an associative array. Associative arrays can be indexed with values of any type, not just numbers.
Many algorithms become trivial when implemented with associative arrays, because the data structures and algorithms for searching them are implicitly provided by the language. Lua implements associative arrays as hash tables.
Unlike other languages that implement associative arrays, tables in Lua are not bound to a variable name. Instead, they are dynamically created objects that can be manipulated much like pointers in conventional languages. In other words, tables are objects, not values. Variables do not contain tables, only references to them. Assignment, parameter passing, and function returns always manipulate references to tables, and do not imply any kind of copy. While this means that a table must be explicitly created before it is used, it also allows tables to freely refer to other tables. So, tables in Lua can be used to represent recursive data types and to create generic graph structures, even those with cycles.
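The reference semantics described above can be seen in a small sketch (assuming a current Lua 5.x interpreter; all names are illustrative):

```lua
a = {}                 -- a table must be created explicitly before use
b = a                  -- b refers to the same table; nothing is copied
b.count = 1            -- visible through a as well

-- Parameter passing also hands over a reference, so the callee
-- modifies the caller's table.
function bump(t) t.count = t.count + 1 end
bump(a)

-- Because variables hold references, tables can point at each other,
-- which is enough to build cyclic structures.
node1 = { name = "first" }
node2 = { name = "second", next = node1 }
node1.next = node2     -- closes the cycle
```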
Tables simulate records simply by using field names as indices. Lua makes this
easier by providing a.name as syntactic sugar for a["name"]. Sets also can be
easily implemented by storing their elements as indices of a table. Note that
tables (and therefore sets) need not be homogeneous; they can store values of
all types simultaneously, including functions and tables.
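A minimal sketch of records, sets, and heterogeneous tables, assuming a current Lua 5.x interpreter. The helper member and the variable names are hypothetical.

```lua
-- Record: field names are just string indices.
point = {}
point.x = 10
point["y"] = 20                 -- same effect as point.y = 20

-- Set: the elements are stored as indices; the stored value is irrelevant.
seen = {}
seen["apple"] = 1
seen["pear"] = 1

function member(set, element)
  return set[element] ~= nil    -- absent indices yield nil
end

-- Heterogeneous table: indices and values of several types at once.
mixed = {}
mixed[1] = "one"
mixed["check"] = member         -- a function as a value
mixed[point] = "keyed by a table"
```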
Lua provides a constructor, a special kind of expression to create tables, that is handy for initializing lists, arrays, records, etc. See Example 1.
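For orientation, constructors in current Lua 5.x look roughly like the sketch below; the exact syntax in the release described here may differ, and the variable names are illustrative only.

```lua
-- List-style constructor: values are stored at indices 1, 2, 3, ...
days = { "Sun", "Mon", "Tue" }

-- Record-style constructor: values are stored under the given field names.
origin = { x = 0, y = 0 }

-- Constructors are expressions, so they can appear inside one another.
segment = { from = origin, to = { x = 3, y = 4 } }
```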
User-Defined Constructors
Sometimes you need finer control over the data structures you are building.
Following the philosophy of providing only a few general metamechanisms, Lua
provides user-defined constructors. These constructors are written name{...},
which is a more intuitive version of name({...}). In other words, with such a
constructor, a table is created, initialized, and passed as a parameter to a
function. This function can do whatever initialization is needed, such as
dynamic type checking, initialization of absent fields, and auxiliary
data-structure update, even in the host program (see Listing Three).
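A user-defined constructor might be sketched as follows. Point is a hypothetical name chosen for illustration, and the code assumes a current Lua 5.x interpreter.

```lua
-- Point receives the freshly built table and finishes its initialization.
function Point(t)
  -- Dynamic type checking of a required field.
  if type(t.x) ~= "number" then
    error("Point: field x must be a number")
  end
  -- Initialization of an absent field.
  if t.y == nil then
    t.y = 0
  end
  return t
end

p = Point{ x = 3 }              -- sugar for Point({ x = 3 })
q = Point({ x = 1, y = 2 })     -- the explicit form
```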
User-defined constructors can be used to provide higher-level abstractions. So,
in an environment with proper definitions, you can write
window1 = Window{x = 200, y = 300, color = "blue"} and think about "windows,"
not plain tables. Moreover, because constructors are expressions, they can be
nested to describe more complex structures in a declarative style, as in
Listing Four.
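One possible set of "proper definitions" is sketched below. Window and Button here are hypothetical stand-ins, not the definitions used in the listings, and the code assumes a current Lua 5.x interpreter.

```lua
-- Hypothetical constructors that tag the table and fill in defaults.
function Window(t)
  t.kind = "window"
  if t.color == nil then t.color = "white" end
  return t
end

function Button(t)
  t.kind = "button"
  return t
end

window1 = Window{ x = 200, y = 300, color = "blue" }

-- Nested constructors describe a small hierarchy declaratively.
dialog = Window{
  x = 10, y = 10,
  children = {
    Button{ label = "OK" },
    Button{ label = "Cancel" }
  }
}
```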
Object-Oriented Programming
Because functions are first-class values, table fields can refer to functions. This is a step toward object-oriented programming, and one made easier by simpler syntax for defining and calling methods.
A method definition is written as Example 2(a), which is equivalent to Example
2(b). In other words, defining a method is equivalent to defining a function
with a hidden first parameter called self and storing the function in a table
field.
A method call is written as receiver:method(params), which is translated to
receiver.method(receiver,params). The receiver of the method is passed as the
first argument of the method, giving the expected meaning to the parameter
self.
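The two notations can be compared side by side in a small sketch. The account table and its methods are hypothetical, and the code assumes a current Lua 5.x interpreter.

```lua
account = { balance = 100 }

-- Method definition with the colon: self is an implicit first parameter.
function account:deposit(amount)
  self.balance = self.balance + amount
end

-- The same kind of definition with self written out explicitly.
function account.withdraw(self, amount)
  self.balance = self.balance - amount
end

account:deposit(50)             -- translated to account.deposit(account, 50)
account.withdraw(account, 30)   -- explicit receiver
account:withdraw(20)            -- colon call works for either definition
```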
These constructions do not provide information hiding, so purists may (correctly) claim that an important part of object orientation is missing. Moreover, Lua does not provide classes; each object carries its own method-dispatch tables. Nevertheless, these constructions are extremely light, and classes can be simulated using inheritance, as is common in other prototype-based languages, such as Self.
Fallbacks
Because Lua is a dynamically typed language, many abnormal run-time events can happen: arithmetic operations being applied to nonnumerical operands, nontable values being indexed, nonfunction values being called. In statically typed, stand-alone languages, some of these conditions are flagged by the compiler; others result in aborting the program at run time. It's rude for an embedded language to abort its host program, so embedded languages usually provide hooks for error handling.
In Lua, these hooks are called "fallbacks" and are also used for handling
situations that are not strictly error conditions, such as accessing an absent
field in a table and signaling garbage collection. Lua provides default
fallback handlers, but you can set your own handlers by calling the built-in
function setfallback with two arguments: a string identifying the fallback
condition (see Table 1), and the function to be called whenever the condition
occurs. setfallback returns the old fallback function, so you can chain
fallback handlers if necessary.
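setfallback belongs to the Lua release described here; later releases dropped it in favor of other mechanisms (metatables in Lua 5). So that the idea can be tried on a current interpreter, the sketch below swaps in that successor mechanism: an __index handler attached with setmetatable plays the role of the "absent field" hook, and the second handler chains to the first much as one would chain on the value returned by setfallback. All names are hypothetical, and this is an analogy, not the article's API.

```lua
-- ASSUMPTION: Lua 5.x. setfallback is not available there, so a metatable's
-- __index handler stands in for the absent-field fallback.
misses = 0

function on_absent(tbl, key)
  misses = misses + 1
  return "default:" .. tostring(key)
end

config = { host = "localhost" }
setmetatable(config, { __index = on_absent })

h  = config.host     -- field is present: the handler is not called
p1 = config.port     -- field is absent: on_absent supplies a value

-- Chaining: keep the old handler and defer to it for unhandled keys.
old = getmetatable(config).__index
getmetatable(config).__index = function(tbl, key)
  if key == "port" then
    return 80
  end
  return old(tbl, key)
end

p2 = config.port     -- answered by the new handler
u  = config.user     -- passed along to the old handler
```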