SWIG extends scripting languages
Dave is the developer of SWIG. He is currently a graduate student in computer science at the University of Utah, and can be reached at [email protected].
Developers routinely face the challenge of building software that is easy both to maintain and to extend. For the past eight years, for instance, I've spent much of my time developing computational physics applications at Los Alamos National Laboratory. As computing problems have grown in size, the nature of modern research applications has become more complicated. Users demand simulation, data analysis, visualization, networking capabilities, and more. At the same time, there are varying platforms, including supercomputers, symmetric multiprocessors, workstation clusters, and PCs. Developing software in this environment can best be described as total chaos.
One approach to addressing problems like these is to use scripting languages such as Perl, Python, or Tcl. Scripting languages are good at simplifying complex programming tasks and are portable between a variety of platforms. In addition, you can extend them with C/C++ libraries or third-party extensions. This makes scripting languages excellent "glue" languages for mixing different packages together, writing extensible applications, and reusing existing code. In this mixed-language environment, you can exploit the best of both worlds -- the performance of C/C++ and the flexibility of scripting. Of course, how you go about integrating these disparate languages can be problematic.
Most scripting languages share a common model for interfacing with C code. The Simplified Wrapper and Interface Generator (SWIG) is a freely available, nonproprietary, and fully documented programming tool that exploits this commonality, letting you generate interfaces to a variety of scripting languages from a common interface description. SWIG runs on most platforms including UNIX, Windows 95/NT, and Power Macintosh. The SWIG web page can be accessed at http://www.cs.utah.edu/~beazley/SWIG/, and the software can be downloaded via anonymous FTP at ftp://ftp.cs.utah.edu/pub/beazley/SWIG/ and from DDJ (see "Resource Center," page 3).
Scripting Language Basics
When creating scripting-language extensions, you typically write a wrapper function for each C function to be accessed. In addition, an initialization function is written to register the wrapper functions with the scripting-language interpreter. Finally, all of this code is compiled into a shared library (or DLL) that can be dynamically loaded into the scripting-language interpreter. When the scripting language wants to use the module, it loads the shared library and executes the initialization function. At this point, all of your new functions can be used (as new scripting language commands).
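The three pieces described above (a wrapper per C function, an initialization function that registers the wrappers, and an interpreter that invokes commands by name) can be modeled in a few lines of Python. This is only a toy sketch: `ToyInterpreter`, `wrap_fact`, and `init_example` are hypothetical names, the "interpreter" is just a table of commands that take and return strings, and `fact` is written in Python to stand in for the C function.

```python
def fact(n):
    """Stand-in for the C function: int fact(int n)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


class ToyInterpreter:
    """Hypothetical minimal 'scripting language': commands take and return strings."""

    def __init__(self):
        self.commands = {}

    def create_command(self, name, wrapper):
        # Make the wrapper callable under the given command name.
        self.commands[name] = wrapper

    def eval(self, name, *args):
        if name not in self.commands:
            raise NameError("unknown command: " + name)
        return self.commands[name](list(args))


def wrap_fact(args):
    """Wrapper: check the argument count, convert in, call, convert out."""
    if len(args) != 1:
        raise TypeError("Usage: fact(int)")
    arg0 = int(args[0])     # scripting value -> integer argument
    result = fact(arg0)     # call the underlying function
    return str(result)      # integer result -> scripting value


def init_example(interp):
    """Initialization function: register every wrapper with the interpreter."""
    interp.create_command("fact", wrap_fact)


interp = ToyInterpreter()
init_example(interp)        # roughly what loading the module amounts to
print(interp.eval("fact", "4"))
```

The real wrappers in the listings follow the same shape; only the conversion calls and the registration call differ from one interpreter to the next.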
Listings One, Two, and Three provide the wrapper code needed to call a simple factorial function from Perl, Python, and Tcl. Listing One (a) is a Perl5 wrapper module, and Listing One (b) is the Perl code needed to dynamically load the module. Listing Two is a Python wrapper module, and Listing Three is the Tcl wrapper module. In each case, the wrapper functions are similar in structure; only the specific details vary.
While writing a few small wrapper functions may not be difficult, the task becomes tedious as the number of functions increases. Luckily, most scripting languages have code-generation tools. For example, Perl is packaged with xsubpp and h2xs, and Python is packaged with modulator. Tcl has a variety of automated wrapper generators. Sometimes, large packages are bundled with special-purpose interface generators, but these tend to be domain specific.
These tools can simplify the interface-generation process, but they are usually specific to a single scripting language and can be limited in their ability to handle common C idioms such as pointers, arrays, structures, and classes. In addition, extension tools can be poorly documented and misunderstood by users. Consequently, the development of scripting-language extensions seems to be confined to the world of scripting gurus, not everyday application developers.
SWIG
SWIG uses a subset of ANSI C/C++ syntax that has been extended with additional directives. The idea behind SWIG is simple: Given a list of ANSI C prototypes, SWIG automatically constructs the glue code (or "interface") to the chosen scripting language, so you don't have to worry about the underlying technical details. As for implementation details, SWIG is an extensible system consisting of a common interface parser and a collection of "language modules" that are used for code generation and documentation; see Figure 1. This design lets SWIG support multiple scripting languages and allows SWIG to be extended with new capabilities.
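To make the first stage concrete, here is a much-simplified Python sketch of turning ANSI C prototypes into a language-neutral description that a code generator could consume. It is not SWIG's parser: `parse_prototype`, `parse_parameter`, and the regular expressions are assumptions made for illustration, and they only handle simple declarations such as `int fact(int n);`.

```python
import re

# Matches simple prototypes only: optional "extern", return type, name, parameters.
PROTOTYPE = re.compile(
    r"^\s*(?:extern\s+)?([\w\s\*]+?)\s*(\w+)\s*\(([^)]*)\)\s*;\s*$")

# Words that belong to a type, so a trailing one is not mistaken for a name.
TYPE_WORDS = {"int", "char", "short", "long", "float", "double",
              "void", "unsigned", "signed"}


def parse_parameter(text):
    """Split one parameter into (type, name); name is None when omitted."""
    text = text.strip()
    match = re.match(r"^(.+?[\s\*])(\w+)$", text)
    if match and match.group(2) not in TYPE_WORDS:
        return match.group(1).strip(), match.group(2)
    return text, None


def parse_prototype(line):
    """Return a dictionary describing a simple C function prototype."""
    match = PROTOTYPE.match(line)
    if match is None:
        raise ValueError("cannot parse: " + line)
    returns, name, params = match.groups()
    params = params.strip()
    if params in ("", "void"):
        parsed = []
    else:
        parsed = [parse_parameter(p) for p in params.split(",")]
    return {"name": name, "returns": returns.strip(), "params": parsed}


print(parse_prototype("int fact(int n);"))
print(parse_prototype("void foo(double *vector);"))
```

A description in this form is what would let several code generators share one front end.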
SWIG does not require modifications to existing C code and is relatively easy to apply to existing systems. Interfaces are defined entirely by the input to SWIG, and the output is a fully functional scripting-language module that should never have to be modified. Finally, SWIG is designed to be as transparent as a C compiler.
To illustrate the idea behind SWIG, consider the factorial function previously described. Rather than writing wrappers, you would write the SWIG interface file in Example 1. In this interface file, you specify the name of the extension module and list the ANSI C prototypes of functions that you would like to access.
To create the wrapper code, you run SWIG with an appropriate command-line option (-perl5, -python, -tcl, -guile, etc.). This produces a C file that needs to be compiled and linked with the rest of your C program into a shared library or DLL. Should you decide to use a different scripting language, you can easily run SWIG with a different command-line option and recompile the resulting wrapper code.
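The point that switching languages is only a matter of changing one option can be sketched with a small Python helper that builds the SWIG command line without running it. The `-perl5`, `-python`, `-tcl`, and `-guile` options are the ones named above; the helper name `swig_command` and the file name `example.i` are hypothetical, and the compile and link steps are left out because they vary by platform and compiler.

```python
# Command-line option for each target language mentioned in the text.
TARGET_OPTIONS = {
    "perl5": "-perl5",
    "python": "-python",
    "tcl": "-tcl",
    "guile": "-guile",
}


def swig_command(target, interface_file):
    """Return the argument list for generating wrappers for one target."""
    if target not in TARGET_OPTIONS:
        raise ValueError("unsupported target: " + target)
    return ["swig", TARGET_OPTIONS[target], interface_file]


# The interface file stays the same; only the option changes.
for target in ("python", "tcl"):
    print(" ".join(swig_command(target, "example.i")))
```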
At this point, you can start using your C functions from the chosen scripting language; see Example 2. In the process of creating the scripting interface, SWIG turns C functions into scripting-language commands, C variables into scripting variables, and so on. You can view the resulting interface as an interpreted extension of the underlying C code; that is, you can interactively call C functions, write test scripts, prototype new features, and so forth.
The SWIG Pointer Model
The biggest obstacle in building scripting interfaces is the mapping of C datatypes to and from an appropriate scripting representation. For fundamental C datatypes (int, short, long, char, float, double, void, and so on), SWIG applies a natural mapping that turns scripting integers into C integers, scripting floats into C floating-point values, and the like. However, most C programs have derived datatypes (such as arrays, pointers, and structures) all over the place.
To address this problem, SWIG allows C pointers to be managed directly from a scripting-language interface. For example, the SWIG interface in Example 3 could be used to access a few functions in the C stdio library (not particularly useful, but simple enough to illustrate the point). From the scripting language, these functions could then be used in a completely natural manner. In the Perl script in Example 4, for instance, $f1, $f2, and $buffer are all pointer values. If you printed them out, you would see values containing both an address and type signature (Example 5). The type signature is used to perform run-time checking of pointer values. When a mismatch occurs, an error message is generated as in Example 6.
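A pointer value that carries its own type signature can be modeled in Python as follows. The string layout used here (an underscore, the address in hexadecimal, another underscore, then the type name followed by `_p`) is an assumption for illustration and may not match what a given SWIG release produces; `make_pointer` and `check_pointer` are hypothetical helpers, not SWIG functions.

```python
def make_pointer(address, type_name):
    """Encode an address and a type name into one opaque string."""
    return "_%x_%s_p" % (address, type_name)


def check_pointer(value, expected_type):
    """Return the address if value has the expected type, else raise TypeError."""
    suffix = "_%s_p" % expected_type
    if not (value.startswith("_") and value.endswith(suffix)):
        raise TypeError("expected a %s pointer, got %s" % (expected_type, value))
    hex_part = value[1:len(value) - len(suffix)]
    try:
        return int(hex_part, 16)
    except ValueError:
        raise TypeError("malformed pointer value: " + value)


f1 = make_pointer(0x1008E124, "FILE")
print(f1)
print(hex(check_pointer(f1, "FILE")))

try:
    check_pointer(f1, "Vector")     # wrong type: rejected at run time
except TypeError as err:
    print("error:", err)
```

The scripting language never looks inside the value; it only passes the string along, and the check happens when the value is handed back to a wrapper.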
While using pointers this way may seem dangerous, it has proven to be reliable and powerful in practice. Perhaps the greatest strength is that it avoids the problem of data representation entirely. SWIG does not force you to write code conforming to any standard. In addition, objects need not be precisely specified (as in CORBA IDL or a similar specification language).
Using pointers, you can work with incomplete information and essentially use any kind of C/C++ object. For example, the scripting language does not need to know the structure of a FILE to use it effectively. Many of SWIG's users work on large, data-intensive applications involving many gigabytes of memory. The pointer-based model is both simple and memory efficient -- avoiding data duplication and providing straightforward handling of underlying C data structures.
Objects
The pointer model is used throughout SWIG as a mechanism for managing complex objects. In some cases, however, you may want to provide greater support for classes and structures by providing methods for construction, destruction, and access to other members. Suppose you have a Stack class like Example 7. One method for accessing it would be to write accessor functions (see Listing Four). With these accessor functions, you could manipulate Stacks, as shown in Example 8.
Writing accessor functions is simple, but can be tedious if there are lots of classes and structures. Therefore, SWIG automatically generates accessor functions if it is given the definition of a C structure, C++ class, or Objective-C interface such as that shown in Example 7.
One twist on the use of accessor functions is to write container classes in the target scripting language (also known as a shadow class); see Listing Five. These classes provide a high-level interface to the underlying object and, in many cases, completely hide the low-level accessor functions. For example, using the container class in Listing Five, you could rewrite the Python script as in Example 9. This provides a more natural interface to objects, allowing them to be used in a "friendly" manner. Since the creation of container classes is relatively straightforward, SWIG can automatically generate them for defined classes (when enabled with a command-line option).
Objects can be manipulated at a number of levels. At the lowest level, the scripting language can pass opaque object pointers between functions. If you need access to the internals of an object, special accessor functions can be written for the relevant object members. If you want high-level access, SWIG can automatically generate accessor functions and scripting container classes to make objects work naturally. All of these modes have proven to be useful in practice. Depending on the application, you might use a mix of styles.
Advanced SWIG Features
SWIG includes a library of useful modules for manipulating various data types, building interfaces to common C libraries, handling pointer manipulation, and customizing. In many cases, the library files are language independent.
Library files are used by placing directives such as %include malloc.i in a SWIG interface file. In this case, wrappers would be created around the C memory allocation functions in the target scripting language. In other cases, language-specific library files are available. For example, %include wish.i would include Tcl-specific code needed to rebuild the wish shell (this code is not included when building non-Tcl extensions, however). The real power of the library is that it lets you reuse interfaces and build new packages by grabbing different pieces and gluing them together as needed.
Documentation files describing the resulting scripting interface can be generated in ASCII, HTML, and LaTeX formats. The documentation files use the same syntax rules as the target language, and C/C++ comments can be turned into descriptive text. While not intended to be a substitute for a full-blown documentation system, this approach makes it easier to figure out what SWIG has built into an interface. Given that many SWIG applications involve hundreds of C functions, this can be useful.
Even with no additional specification, SWIG will gladly make interfaces to all sorts of packages. However, many users want more control and flexibility. SWIG's typemap mechanism provides a means for changing SWIG's behavior.
For example, when SWIG sees void foo(double *vector), it makes the assumption that vector is a pointer. Of course, this by itself doesn't say much. Perhaps vector is meant to be a single input value or an array of values. Maybe it's a list of output values. Unfortunately, there's no way for SWIG to infer any meaning beyond the fact that it's a pointer. However, typemaps let you customize SWIG and provide special processing for certain datatypes; see Example 10.
Typemaps work by looking for specific type/name patterns in the interface file. In this case, whenever double *vector appears, SWIG applies special processing by inserting the C code provided in the typemap definition. In this process, the $source and $target symbols are replaced by the names of C variables containing the source and target values of a type conversion.
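The lookup-and-substitute step can be sketched in Python. This models only the mechanism described above: typemaps keyed on a type/name pattern, with `$source` and `$target` replaced by variable names. The table layout, the `apply_typemap` helper, the variable names `_obj0` and `_arg0`, and the C fragment are all hypothetical; in particular, `convert_to_vector` is an invented placeholder, not a real function.

```python
# Typemap table: (C type, parameter name) -> C code template.
TYPEMAPS = {
    ("double *", "vector"): "$target = convert_to_vector($source);",
}


def apply_typemap(c_type, name, source_var, target_var):
    """Return the conversion code for a parameter, or None if no typemap matches."""
    template = TYPEMAPS.get((c_type, name))
    if template is None:
        return None     # no special processing; default handling would apply
    return template.replace("$source", source_var).replace("$target", target_var)


print(apply_typemap("double *", "vector", "_obj0", "_arg0"))
print(apply_typemap("double *", "result", "_obj1", "_arg1"))
```

Only the parameter whose type and name both match the pattern receives the special code; every other `double *` is left alone.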
To improve the reliability of modules, SWIG can convert C++ exceptions and other kinds of errors into scripting-language exceptions. This is done with the %except directive; see Example 11. It is also possible to apply constraints and conditions on particular values using typemaps. For example, you can require a pointer to be non-NULL, a numerical value to be positive, and so on. Should one of these conditions be violated, a scripting error will occur. Such techniques are useful for improving the reliability of C extensions -- particularly those that weren't originally designed for a scripting interface.
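The effect of both techniques can be sketched in Python: arguments are checked against constraints before the underlying function runs, and an error from the underlying code is converted into an ordinary scripting exception. This models the behavior only, not SWIG's generated code; `checked`, `non_null`, `positive`, and `CLibraryError` are hypothetical names, and the two wrapped functions are Python stand-ins for C code.

```python
import functools


class CLibraryError(Exception):
    """Stand-in for an error reported by the wrapped C/C++ code."""


def non_null(value):
    return value is not None


def positive(value):
    return value > 0


def checked(*constraints):
    """Wrap a function so each positional argument is validated first."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            for position, (check, arg) in enumerate(zip(constraints, args)):
                if not check(arg):
                    raise ValueError(
                        "%s: argument %d violates constraint '%s'"
                        % (func.__name__, position, check.__name__))
            try:
                return func(*args)
            except CLibraryError as err:
                # Convert the low-level error into a scripting exception.
                raise RuntimeError("%s failed: %s" % (func.__name__, err)) from err
        return wrapper
    return decorate


@checked(positive)
def reciprocal(x):
    return 1.0 / x


@checked(non_null)
def first_item(items):
    if len(items) == 0:
        raise CLibraryError("empty container")
    return items[0]


print(reciprocal(4))
for call in (lambda: reciprocal(-1), lambda: first_item(None), lambda: first_item([])):
    try:
        call()
    except (ValueError, RuntimeError) as err:
        print(type(err).__name__, "-", err)
```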
Finally, all of the different SWIG scripting language modules are implemented as C++ classes. New target scripting languages can be added to the system by inheriting from a Language base class and implementing about a dozen virtual functions. At this writing, SWIG supports Perl4, Perl5, Python, Tcl, Tcl8, and Guile. Several users have developed other modules for the likes of Java and MatLab 5. In the future, SWIG may support even more target languages.
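The shape of this design can be sketched in Python, even though the real modules are C++ classes. The sketch below reduces the interface to a single overridable method; the class names, the method name `register_function`, and the `generate` helper are hypothetical. The emitted strings are patterned on the registration calls that appear in Listings One, Two, and Three.

```python
class Language:
    """Hypothetical base class; a target language overrides its methods."""

    def register_function(self, module, name):
        raise NotImplementedError


class Perl5(Language):
    def register_function(self, module, name):
        return 'newXS("%s::%s", _wrap_%s, file);' % (module, name, name)


class Python(Language):
    def register_function(self, module, name):
        return '{ "%s", _wrap_%s, 1 },' % (name, name)


class Tcl(Language):
    def register_function(self, module, name):
        return ('Tcl_CreateCommand(interp, "%s", _wrap_%s, '
                '(ClientData) NULL, (Tcl_CmdDeleteProc *) NULL);' % (name, name))


def generate(language, module, function_names):
    """Driver code that is identical for every target language."""
    return [language.register_function(module, name) for name in function_names]


for language in (Perl5(), Python(), Tcl()):
    print(generate(language, "example", ["fact"])[0])
```

Adding a target in this model means writing one more subclass; the driver and the parsed interface stay untouched.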
Limitations and Future Directions
While SWIG is being used successfully in a wide variety of applications, development is an ongoing effort. One limitation of the current system is that it is not a full C/C++ compiler. As a result, SWIG can be confused by complex C declarations or non-ANSI syntax.
There is also minimal support for several C++ features such as function overloading, operator overloading, templates, and namespaces (although many of these limitations are being addressed for future releases). It is also unclear whether SWIG would be able to work directly with advanced C++ features such as expression templates or the STL (SWIG could be used with applications that use these features, but might not be able to provide a direct interface to them).
Certain applications may not work well in a scripting environment due to namespace clashes, use of nonportable features, inability to use shared libraries, and so forth. Finally, SWIG is primarily designed for use with preexisting C code. It is not always appropriate if you are writing Tk widgets or other kinds of highly specialized scripting-language extensions.
SWIG's future development is focused along a number of fronts. First, there are continuing efforts to improve and expand the SWIG library. I hope that the library will eventually provide substantial functionality and interfaces to a wide variety of common packages. Second, there are efforts to add new language modules to SWIG. Many of these efforts are aimed at simplifying the process of adding new modules. There is also considerable interest in extending the SWIG parser to allow access to Fortran and other interface definition formats such as CORBA IDL. Finally, I'm finding that SWIG can be used as a powerful tool for exploiting the cross-platform nature of scripting languages and allowing the construction of highly extensible C/C++ applications under UNIX, MacOS, and Windows NT.
Acknowledgment
I'd like to acknowledge everyone who has made SWIG possible by providing bug reports, suggesting improvements, and contributing to the distribution (unfortunately, there are too many to list here, but you know who you are). In particular, I'd like to acknowledge Michael Bell, David Brydon, Roger Burnham, Kevin Butler, Dominique Dumont, Simon Gibson, Mark Hammond, Soeren Henkel, Carolyn Johnston, Jonah Lee, Bharat Mediratta, Daniel Michelson, Chris Myers, Harald Singer, Peter Tinker, Mike Weiblen, and Jody Winston, for providing feedback and supplying descriptions of SWIG applications. I'd also like to thank Peter-Pike Sloan and Patrick Tullmann for their helpful comments and their review of this article. Finally, I'd like to acknowledge Peter Lomdahl at Los Alamos National Laboratory and Chris Johnson at the University of Utah for their continued support.
DDJ
Listing One
(a)
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
extern int fact(int);
XS(_wrap_fact) {
    int _result;
    int _arg0;
    dXSARGS ;
    if (items != 1)
        croak("Usage: fact(int );");
    _arg0 = (int )SvIV(ST(0));
    _result = (int )fact(_arg0);
    ST(0) = sv_newmortal();
    sv_setiv(ST(0),(IV) _result);
    XSRETURN(1);
}
XS(boot_example) {
    dXSARGS;
    char *file = __FILE__;
    newXS("example::fact", _wrap_fact, file);
    ST(0) = &sv_yes;
    XSRETURN(1);
}

(b)
# file : example.pm
package example;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
package example;
bootstrap example;
@EXPORT = qw( );
1;
Listing Two
#include "Python.h"
extern int fact(int);
static PyObject *_wrap_fact(PyObject *self, PyObject *args) {
    PyObject * _resultobj;
    int _result;
    int _arg0;
    if(!PyArg_ParseTuple(args,"i:fact",&_arg0))
        return NULL;
    _result = (int )fact(_arg0);
    _resultobj = Py_BuildValue("i",_result);
    return _resultobj;
}
static PyMethodDef exampleMethods[] = {
    { "fact", _wrap_fact, 1 },
    { NULL, NULL }
};
void initexample() {
    Py_InitModule("example", exampleMethods);
}
Listing Three
#include <tcl.h>
#include <string.h>
#include <stdlib.h>

extern int fact(int );
static int _wrap_fact(ClientData clientData, Tcl_Interp *interp,
                      int argc,char *argv[]) {
    int _result;
    int _arg0;
    if ((argc < 2) || (argc > 2)) {
        Tcl_SetResult(interp, "Wrong # args. fact { int } ",TCL_STATIC);
        return TCL_ERROR;
    }
    _arg0 = (int ) atol(argv[1]);
    _result = (int )fact(_arg0);
    sprintf(interp->result,"%ld", (long) _result);
    return TCL_OK;
}
/* Initialization function that gets called upon module loading */
int Example_Init(Tcl_Interp *interp) {
    if (interp == 0)
        return TCL_ERROR;
    Tcl_CreateCommand(interp, "fact", _wrap_fact, (ClientData) NULL,
                      (Tcl_CmdDeleteProc *) NULL);
    return TCL_OK;
}
Listing Four
%module stack
%{
#include "stack.h"
%}
// These functions are inlined into SWIG's output and wrapped
%inline %{
Stack *new_Stack() { return new Stack(); }
void delete_Stack(Stack *s) { delete s; }
void Stack_push(Stack *s, char *value) { s->push(value); }
char *Stack_pop(Stack *s) { return s->pop(); }
int Stack_depth(Stack *s) { return s->depth(); }
%}
Listing Five
import stack
class Stack :
    def __init__(self):
        self.this = stack.new_Stack()
    def __del__(self):
        stack.delete_Stack(self.this)
    def push(self,arg0):
        stack.Stack_push(self.this,arg0)
    def pop(self):
        return stack.Stack_pop(self.this)
    def depth(self):
        return stack.Stack_depth(self.this)
    def __repr__(self):
        return "<C Stack instance>"
Copyright © 1998, Dr. Dobb's Journal