Eli is a software developer, and Chuck the chief architect at Borland Software. They can be contacted at [email protected] and [email protected], respectively.
When introduced in the early 1990s, Borland's Delphi application development environment targeted Windows only. With Kylix, however, Borland (the company we work for) has brought the Delphi toolset and environment to Linux. As with Delphi, Object Pascal is at the heart of Kylix. Consequently, in bringing Delphi to Linux we first had to port Object Pascal and all of its language features. And, as it turns out, one of the most challenging features to port involved exception handling.
Language Specifics
Object Pascal sports a nonresumptive exception-handling model specifically designed to be efficient enough to be used as the basis of error handling for a variety of applications. The syntax of exception handling is straightforward, and the semantics are kept in control because the language is comparatively well designed, and exceptions are restricted (they can only be thrown by reference and must be derived from TObject).
There are two methods of raising a language exception and three ways of catching exceptions. As illustrated in Example 1 (where the value of <expression> must be a class instance derived from TObject), the first method of raising an exception is straightforward. The second method of raising an exception, however, is somewhat more complex (and we'll examine it shortly).
The three forms used to catch exceptions have different semantics and usage.
- The first and simplest is try/except, as in Example 2(a). The try part of a try/except statement specifies the start of a block of one or more statements to be protected. The except part specifies the start of a block of one or more statements to be executed in the event an exception occurred while executing the statements in the try part. Within the except block, users can elect to use the second method of raising an exception, which is simply using raise without any expression following it. This causes the current exception to be reraised from the current execution point.
- A more specialized form of try/except is try/except/on; see Example 2(b), where the statements in the except block only execute if the exception being raised is of the type TMyException, or derived from there. If the except block is executed, the variable E is bound to the exception object for the duration of the exception block.
- The last form for handling exceptions is the more specialized try/finally; see Example 2(c). try/finally differs from the other forms of exception-handling statements in that the statements in the finally clause are always executed, even if no exception is thrown while executing the try block. This is useful for writing cleanup code that must always be executed for a function, to free resources, or to bring data structures back to sanity from a partial dissection.
Delphi Implementation
The Delphi implementation of exception handling is, in many ways, simpler than the Kylix one. The overriding contributing factor here is the fact that Windows has a built-in mechanism for dispatching and unwinding exceptions.
Delphi implements exception support via a stack-based chain of exception handlers, threaded through a global, per-thread list that is accessible by both the operating system and the client application. (For a more detailed explanation of the Window's implementation, see http://www.borland.com/borlandcpp/papers/cppexcp/cppexcp1.html.)
There are several advantages to the Windows scheme. In the first place, everyone uses the same method for dispatching exceptions, including the operating system when hardware exceptions such as access violations occur. This enables nearly seamless behavior in the implementation of support for hardware exceptions, as opposed to language exceptions. The global list of handlers, threaded through the FS segment register, makes it efficient to dispatch exceptions. It is also easy to perform stack unwinding because the records on the stack contain the information needed to reset the stack and frame pointers to the proper offsets for a given function in the event of exceptions. There is no need to inspect activation records of functions that do not have exception-handling constructs. Finally, the run-time installation of handlers laid out on the stack makes it easy for any language to participate in exception handling, including assembler code.
The one large disadvantage of the Windows scheme is the impact on performance of the nonexception case. Any function that contains exception-handling constructs must install a frame on the stack, and thread the entry onto the per-thread list of handlers even if no exception is raised during the execution of the function. By their nature, exceptions are intended to be uncommon in nature, not the normal case during the average execution of a function body. Therefore, the installation of the exception frame is a burden on the normal execution thread of most functions. This impact is more sorely felt by frequently called small functions. An added negative is the impact that the installation of the exception frame has on the optimizer. The compiler is hampered by the need to install this handler in the optimization of many functions, most notably small functions.
Linux Implementation
The Linux method for dealing with exceptions is radically different from Windows. For instance, there is no OS-defined mechanism for dispatching and unwinding exceptions. Language vendors are on their own with respect to how to deal with exceptions. Most vendors use a PC-mapped exception-handling scheme on this platform. In a PC-mapped scheme, the compiler and linker collude to emit a static map of address ranges for functions, and for blocks of code that are wrapped with exception handlers, and for their handlers. In addition, the compiler generates static data describing each function's stack layout to allow a run-time system to unwind the stack past an invocation of a given function. An external run-time library is provided by the language implementation to handle the dispatching and unwinding of exceptions.
The run-time library unwinds an exception by looking at the address from which the exception is raised, then looking that up in the PC map generated by the linker to identify which function this came from. It uses this to gain access to the per-function descriptors about the stack layout, and for the per-function ranges of potential handlers in the function for the exception. The former are used to unwind the stack to the next caller, if need be, while the latter are used to evaluate whether this function had any protected code that needs to have its exception handlers called in the unwind process.
This scheme has a couple of significant advantages over the Windows scheme. For one thing, there is zero impact on the nonexception execution case because all the data that is emitted for dealing with exceptions is purely static data. Additionally, the absence of run-time stack frames for exceptions tends to make the optimizer's job easier for dealing with exception handling. Many people feel that zero impact in the nonexception case is absolutely necessary if exception handling is to be adopted as a normal tool in software development.
The disadvantages of the PC-mapped implementation are numerous, though they are mostly restricted to the language implementations. In some cases, they can be mitigated through standards, which we will discuss shortly.
It is much harder on the compiler to participate in a PC-mapped scheme. The compiler front- and back-end require significant new interaction to be able to emit the data required by the run-time library for unwinding stack frames. The linker has to be involved as well. Care must be taken to ensure that the static data generated by the compiler is position independent, or prohibitive load-time hits can occur due to relocating of exception-only data at application startup. Stack unwinding is much slower because the static information for functions must be found and interpreted in order to unwind function frames, even for functions that do not contain exception-handling constructs. It is not easy for many languages to participate in the exception-handling process since there is no way to install an exception handler dynamically. Most notably, assembly code suffers from this. It becomes necessary to add language constructs, which have implications all through the compiler to support any sort of exception handling within a language. There is currently no operating-system specification for dealing with exception handling under Linux, which makes it difficult for languages to interoperate with each other when exceptions are being unwound past frames from two or more intermixed languages. Finally, hardware exceptions don't mix easily with language exceptions. Signal handlers under Linux are global, and there are some interesting issues surrounding dispatching certain signals (SIGINT, for instance) to threads.
In short, these issues make it difficult but not impossible to implement a reasonable exception-handling scheme under Linux.
Delphi Specifics: Linux
For Kylix, we decided to implement a PC-mapped scheme. We couldn't do this under Windows, because the operating system had a standard dispatching mechanism, and we needed to be able to interoperate with multiple languages. The scheme we use has two levels, the first taken largely from some C++ Application Binary Interface (ABI) discussions, and the second entirely specific to the Kylix implementation.
The basic flow of control in the unwinding process for an exception is as follows: The exception is raised to the run-time library (RTL). The RTL then does a two-phase walk up the stack. In the first phase (the search phase), it looks for exception handlers interested in dealing with the exception. Once it finds a handler that is interested, the RTL enters the second phase walk (the unwind phase). This phase tells any handlers between the current stack frame and target handler stack frame that we are unwinding, and they should perform whatever cleanup they deem necessary. The basics of this part of the unwinding mechanism can be factored out of practically all languages that support exception handling, and so are encapsulated in the first level of exception-handling support. The second level is where things get specific to an implementation. For the most part, this includes specifics of the language semantics what sort of cleanup needs to be done during the unwind phase, and how to interpret the exception itself. For the rest of this discussion, we will refer to the library that handles all the mechanics of the first layer of control as the "unwinder," and the second, language-dependent layer as the "RTL."
For Kylix, we implemented the first level of control by taking the current incarnation of the C++ ABI discussions for IA-64, then altered it slightly to deal with some issues specific to the IA-32 platform. Specifically, the current Intel IA-64 specifications have completely covered both the linker-generated PC maps (including their location) and the format of the stack unwind information generated by the compiler. There is no such industry-wide specification for the IA-32, and we didn't feel it was to anyone's benefit at this point to attempt introduction of such a standard. Therefore, we changed the semantics of the implementation slightly in one place, and added a couple of APIs to specify information about the linker-generated PC map.
To raise an exception, the language calls a routine in the unwind library, passing in a reference to an exception object of arbitrary, implementation-defined format: RaiseException(Exception). This routine implements the search and unwind phases of exception handling. It captures the current processor context (registers and the like), and uses the caller's address as a starting point to do the unwind. The caller's address is looked up in a global map of function address ranges created by the linker. Associated with this entry in the map is a personality routine specific to a language, implemented in the RTL: UnwindResult Personality(Exception, Phase, Context). The unwind library calls this routine with the exception, the phase that we're in (search/unwind), and the processor context. The personality routine is expected to return whether to continue the search, or to stop searching, unwind the stack, and execute a handler routine. In addition, the RTL is expected to update the processor context to contain values that reflect the state of the processor prior to the call of the function in question. To put it differently, it has to unwind the current stack frame, as indicated by the processor context. It is this last bit of the task where the compiler really has to get its hands dirty.
Crawling the stack under a PC-mapped system can be a tricky task. The simple case, when a simple function has a stack frame, is easy. However, you don't want to burden functions with stack frames that's the whole point of PC mapping. When the optimizer gets its teeth into a function and that function's activation record is on the stack between the place where the exception has been raised and where it must be caught, then things get dicey. To allow the RTL to unwind the frame on something other than a geological time scale, the compiler has to supply information to assist in the unwinding.
Basically, what is required from the compiler is information about how a given function modifies the stack throughout its execution, and where certain registers have been saved on the stack. Given an address within a function's code, the RTL must be able to find out what stack modifications have occurred up to that point in the execution in the function. It uses that information so that it can reverse those modifications, and calculate a stack pointer value at which it can find the return address into the caller. Given that return address, the unwinder continues the process, crawling its way up the stack of callers. This process must not miss any function calls, because you might miss routines whose handlers needed to be called. Nor may it fail because of lack of information.
Example 3(a) is a simple function, and 3(b) its pseudogenerated code. It's not what would really be generated, but is useful for discussion. In this example, suppose that an exception happens while you're inside of G. When the unwinder unwinds the stack, it finds a return address pointing to label1. The RTL is asked to unwind this frame, and needs to know that the push instruction (just before the call) modified the stack. So it takes the offset of label1 relative to the beginning of the function, and feeds it into the information generated by the compiler to determine how much the stack has been modified by. It will get back the result of 4 bytes, add that to the stack pointer in the context provided by the unwinder, pull the return address of f off the stack at that location, and set that into the context as the next place the unwinder needs to go. It also modifies the stack pointer in the context by an additional 4 bytes for the size of the return address, then returns to the unwinder. At this point, the stack frame for the invocation of f has been completely erased.
Besides the information regarding the unwinding of the stack frame itself, the compiler also emits language-specific information about the frame; see Example 4.
To make the language semantics for Object Pascal exception handling work, the compiler has to emit code ranges for the try clause, along with type information for the variable E. During the unwinder's search phase, the RTL's personality routine has to look up the current instruction pointer in the function to see if it falls into the range of the try block. If so, then we want the exception; otherwise the search phase continues. During the unwind phase, once the unwinder has noted that this function wishes to handle the exception, the RTL's personality routine has to be able to find the offset of the code fragment in the function to transfer control to when we have finished unwinding the stack. It is important that this information the unwind information, language-specific information of code ranges, and type information be position independent. Under Linux, having pointers in this information is a significant liability, since relocations are applied at loadtime of the image to the entire image. Because the unwind information is generally proportional in size to the size of the generated code, this can introduce terrible startup time issues, as the application loader traverses and touches this data at startup time if it contains relocations.
Special Considerations
The compiler handles all the unwind information for code that it generates for Object Pascal. It also has the burden of dealing with another language assembler. Since Object Pascal supports inline-assembler code and the Kylix RTL has a significant amount of assembler code in it it was necessary to make the compiler support the generation of unwind information for user-supplied assembly code. This is not the easiest thing to accomplish, and has its limitations.
Basically, the compiler looks at the instructions generated for a block of code, and notes which ones modify the stack pointer, and by how much. Given a particular offset into the function, you need to know by how much the stack pointer has been modified. Why? So that if you have to unwind the entire stack frame, you can use that offset to calculate the inverse change necessary to clean up the stack, and get back to the instruction pointer that is the return address. This method introduces some restrictions. First of all, the mechanism can only record static modifications to the stack. You do not track the case where modifications are made to the stack pointer based on run-time values that are passed into a function. For instance, if you use a parameter passed into a function and subtract it from ESP to get a new stack pointer, you won't be able to properly clean up the stack, if there is no stack frame; see Example 5. If G raises an exception, then you won't be able to unwind properly, because there is no way to make a static determination of how much ESP was modified during the course of execution of the function.
If there is a stack frame, then things are easier. You can just use the EBP register to throw away the stack frame, and continue the unwind. The compiler automatically detects the presence of a standard form of stack frame in inline-assembler code, then notes it, greatly reducing the size of the unwind information. So if you are not overly concerned about the efficiency of the prologue of your function, and your inline assembly function might raise an exception, then it's a good idea to put in a standard stack frame to limit the size of the unwind information. In particular, if you want to violate any of these rules (where the stack is being modified dynamically in your inline-assembly routine), you must use a standard stack frame.
Signal Handling
Signal handling under Linux is tricky. While there are global handlers, the semantics under which they are called for different threads is unusual. For signals that are raised by a particular thread, the signal is generally dispatched to that thread only. For signals that are raised externally (users pressing Control-C, for example), the signal is dispatched to all threads.
Delphi supports handling of hardware/platform-specific exceptions as language exceptions in a seamless fashion. To accomplish this under Linux, we installed signal handlers for a range of hardware exceptions, and wrapped the resulting signal-specific information into a Kylix class instance, which is then thrown via the normal language implementation. The details of the special cases (and the artful dodging we have to do to deal with the nuances of Linux) are beyond the scope of this article. By way of example, the contents of the stack between the user code that caused the signal and the signal handler itself are generally filled by frames with return addresses that are in code that participates in no PC-mapping system whatsoever, and therefore, are impossible to unwind through.
Separation of Unwinder and RTL
From a design standpoint, we kept the unwinder and RTL separate, but we also separated the implementations. The unwinder is implemented in C and distributed in two forms.
- In the default form, it is bound statically into Kylix applications.
- In the shared form, it is bound as a shared object.
The sources used to build the unwinder library we shipped in Kylix are available in the distribution or from the FreeCLX projext (http://freeclx.sourceforge.net/). The sources that implement the interfaces that we modeled after the C++ ABI discussions are under LGPL, in a different license format from the rest of the Kylix source code. It is our hope that we can use this as a basis for a system-level unwind library, shared by all applications and languages not just Kylix. We will encourage other tool vendors to adopt this. For the time, we've kept the namespace controlled so that we won't step on people's toes. If the rest of the community is willing to adopt this unwinder, we will change the namespace over to one agreed on by everyone, and switch our RTL to use the resulting unwinder. Standardizing on the first level stack unwinder will solve a number of interoperability issues that we have with other languages under Linux, and will make things easier for other language vendors in general.
Conclusion
Exceptions are an integral part of the implementation of Delphi and we strongly believe in their use as a means of error recovery. You can expect Borland to improve the implementation of exception handling in all our languages on all platforms in the future, and we will continue to share information with the community, in hopes of furthering interoperability.
DDJ