Singleton Creation the Thread-safe Way
Jonathan Ringle
Singletons avoid problems with order of construction, at the cost of more problems for multithreading.
Introduction
Recently, I was troubleshooting some code that had stopped working, even though nothing in it had changed since the last time it worked. After stepping through the code, I discovered that it used a global object A whose constructor depended upon global object B. Object A and object B were defined in different translation units, and C++ provides no way to specify the order of construction of non-local static objects (i.e., global objects) across translation units. I found myself in the situation described by Scott Meyers in Effective C++, Item 47: "Ensure that non-local static objects are initialized before they're used." [1] When the compiler and linker happened to change the order of construction so that object A was constructed before object B, the code stopped working: object A's constructor was acting on an object B that had not yet been constructed.
I looked at object A and object B, and decided that both objects were prime candidates for the Singleton Pattern [2], and proceeded to wrap each with a singleton implementation. However, a feature of the singleton's global point of access is that the construction of the singleton object does not occur until first use. This presented me with a new problem, because now multiple threads could contend to execute the "first use" code. If the "first use" construction mechanism in the global point of access is not protected properly against the possibility of multiple threads, then it is possible for either multiple instances of the singleton to be created, or for a thread to act on a partially constructed singleton object.
The global point of access to the singleton is typically implemented through a static member function of the singleton, returning either a pointer or a reference to the singleton instance. My personal preference is for the Singleton::instance function to return a reference to the singleton object. The issue of returning a pointer or a reference is irrelevant to the multithreading issues I am presenting here.
To set the context, consider a traditional singleton implementation running in a multithreaded environment. The following code shows how a typical singleton might be coded:
    class Singleton
    {
    public:
        static Singleton& instance();
    protected:
        Singleton();
    private:
        static Singleton* _instance;
    };

    Singleton* Singleton::_instance = NULL;

    Singleton& Singleton::instance()
    {
        if(!_instance)  // Race condition
            _instance = new Singleton;
        return *_instance;
    }
The problem with this approach is that it is not thread-safe. In a multithreaded environment, two threads can enter Singleton::instance while _instance is still NULL, causing two instances of Singleton to be created. Consider the following race condition: Thread 1 enters Singleton::instance and evaluates if(!_instance) as true. Thread 1 then proceeds to allocate and construct a Singleton instance. Suppose that during the construction of Singleton, Thread 1 is preempted by the operating system and Thread 2 gets control. Thread 2 now enters Singleton::instance and also evaluates if(!_instance) as true, because Thread 1 has not yet finished constructing Singleton and therefore has not yet assigned a value to _instance. Thread 2 then proceeds to allocate and construct another instance of Singleton!
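Because this interleaving depends on timing, it rarely appears in a quick test. The following sketch forces it deterministically. It uses standard C++11 threads, mutexes, and condition variables rather than the Win32 calls of the article's listings, and the rendezvous in the constructor, the probe namespace, and naive_constructions are hypothetical instrumentation, not part of the original code. Each constructor refuses to finish until a second constructor is running, which is exactly the window in which Thread 2 still sees a NULL _instance.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical instrumentation (not from the article): each Singleton
// constructor waits until a second constructor is also running, pinning
// down the preemption scenario described in the text.
namespace probe {
std::mutex m;
std::condition_variable cv;
int running = 0;                  // constructors that reached the rendezvous
std::atomic<int> constructed{0};  // total Singleton constructions
}

class Singleton {
public:
    static Singleton& instance();
protected:
    Singleton() {
        ++probe::constructed;
        std::unique_lock<std::mutex> lock(probe::m);
        ++probe::running;
        probe::cv.notify_all();
        probe::cv.wait(lock, [] { return probe::running >= 2; });
    }
private:
    // std::atomic only keeps the sketch free of a formal data race on the
    // pointer; the check-then-act race of the original code is unchanged.
    static std::atomic<Singleton*> _instance;
};

std::atomic<Singleton*> Singleton::_instance{nullptr};

Singleton& Singleton::instance() {
    if (!_instance.load())               // check ...
        _instance.store(new Singleton);  // ... then act: not one atomic step
    return *_instance.load();
}

// Two threads race through Singleton::instance(); returns how many
// Singleton objects were constructed. The losing object is leaked.
int naive_constructions() {
    std::thread t1([] { Singleton::instance(); });
    std::thread t2([] { Singleton::instance(); });
    t1.join();
    t2.join();
    return probe::constructed.load();
}
```

Calling naive_constructions() returns 2: neither thread can store a pointer until both constructors are running, so both threads pass the if(!_instance) test and each constructs its own Singleton.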
Testing Common Solutions
When you're faced with a problem, sometimes the solution doesn't come until you've given your mind a rest and slept on it for a while. The same is true when debugging multithreaded programs involving race conditions: "sleeping" on the problem widens the timing window of the race condition, so the problem is much easier to reproduce. It also makes it easier to test the validity of a proposed solution.
With the help of a Sleep call in the Singleton constructor, the race condition is easy to reproduce. In addition, if the Singleton holds some state data that is initialized after the Sleep call, it is possible to verify whether a thread is acting on a fully constructed instance of the Singleton. The singleton can be exercised in a multithreaded manner by a short program, test.cpp, shown in Listing 1. This program immediately spawns a new thread, and both the main thread and the new thread exercise the singleton with a call to Singleton::instance().show_state(). The main thread then waits for the spawned thread to terminate (waiting for a thread to terminate is commonly referred to as "joining" it), calls Singleton::instance().show_state() one more time, and exits. The top of test.cpp has a series of commented-out #include directives, one for each solution to be explored.
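Listing 1 is not reproduced here. As a rough sketch of the same test bed, assuming standard C++11 threads in place of the original thread API, the program below instruments Singleton as described (a sleep in the constructor, a state value assigned only after the sleep) and performs the three show_state() calls. The Observation record, the observations log, and run_test_bed are hypothetical names. The accessor plugged in is the unguarded try1-style test; swapping in a different instance() implementation lets the same harness exercise each solution.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// One show_state() report: which object, what state, which thread.
struct Observation {
    const void* self;
    unsigned state;
    std::thread::id tid;
};

std::mutex log_mutex;
std::vector<Observation> observations;

class Singleton {
public:
    static Singleton& instance();
    void show_state() const {
        std::lock_guard<std::mutex> lock(log_mutex);
        observations.push_back({this, state_, std::this_thread::get_id()});
        std::printf("show_state(): this=%p state=0x%04X\n",
                    static_cast<const void*>(this), state_);
    }
protected:
    Singleton() {
        // The sleep widens the race window; state_ is set only afterwards,
        // so a reader that sees 0 was handed a partially constructed object.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        state_ = 0xABCD;
    }
private:
    unsigned state_ = 0;
    static std::atomic<Singleton*> _instance;
};

std::atomic<Singleton*> Singleton::_instance{nullptr};

// Accessor under test: the unguarded try1-style check. The pointer is
// atomic only so the sketch has no formal data race on _instance.
Singleton& Singleton::instance() {
    if (!_instance.load())
        _instance.store(new Singleton);
    return *_instance.load();
}

// Mirrors the test program: spawn a thread, exercise the singleton from
// both threads, join, then exercise it once more from the main thread.
void run_test_bed() {
    std::thread thread2([] { Singleton::instance().show_state(); });
    Singleton::instance().show_state();
    thread2.join();
    Singleton::instance().show_state();
}
```

With this accessor every report shows state 0xABCD, since a thread only obtains a pointer after some constructor has finished, but a run will usually report two different addresses, matching the two objects in the output below.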
The first attempt, shown in try1.h (Listing 2), reflects the singleton implementation described earlier. Executing this code yields output similar to the following:
    main tid is 244
    Singleton 0x00301F00 has been allocated by tid 244
    thread2 tid is 313
    Singleton 0x00301ED0 has been allocated by tid 313
    Singleton 0x00301F00 -state initialized by tid 244
    show_state(): this=0x00301F00 state=0xABCD tid=244
    Singleton 0x00301ED0 -state initialized by tid 313
    show_state(): this=0x00301ED0 state=0xABCD tid=313
    show_state(): this=0x00301ED0 state=0xABCD tid=244
This output shows that both threads created an instance of Singleton (at 0x00301F00 and 0x00301ED0). Clearly, this is not the intended behavior.
The Local Static Singleton Implementation
Another common implementation of Singleton::instance uses a local static object for the singleton instance variable [1].
    Singleton& Singleton::instance()
    {
        static Singleton _instance;
        return _instance;
    }
This implementation relies on the fact that the local static variable _instance will be automatically constructed the first time Singleton::instance is invoked. The C++ Programming Language, 3rd Edition, Section 7.1.2, states the following:
If a local variable is declared static, a single, statically allocated object will be used to represent that variable in all calls of the function. It will be initialized only the first time the thread of execution reaches its definition [3].
At first glance, this implementation appears to be a viable solution, since the language specification will take care of the "first use" code implicitly. However, notice that the above excerpt says nothing about the behavior of the language in a multithreaded environment. In fact, as far as I am aware, the C++ specification is silent on multithreading issues altogether. That means it is up to the compiler implementers to decide what to do about local static objects in a multithreaded environment, if anything at all. Empirical testing with Visual C++ 6.0, using my little test-bed program, reveals that a method of a local static object may be invoked before the local static object is fully constructed! The following is the test program output using the Singleton::instance implementation shown in try2.h, Listing 3:
    main tid is 201
    Singleton 0x00436048 has been allocated by tid 201
    thread2 tid is 310
    show_state(): this=0x00436048 state=0x0000 tid=310
    Singleton 0x00436048 -state initialized by tid 201
    show_state(): this=0x00436048 state=0xABCD tid=201
    show_state(): this=0x00436048 state=0xABCD tid=201
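Because the standard does not define how a local static object behaves in a multithreaded environment, and at least one widely used compiler hands a partially constructed object to a second thread, I have rejected this feature as a viable solution.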
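This article predates C++11. Since C++11, the standard does cover this case: if a second thread reaches the declaration of a block-scope static while another thread is still initializing it, the second thread waits until the initialization completes. On a conforming compiler the try2.h accessor is therefore thread-safe. GCC and Clang do this by default (GCC's -fno-threadsafe-statics turns it off), and Visual C++ adopted it in Visual Studio 2015. The sketch below, with a hypothetical constructions counter and observed_states helper, checks that concurrent callers all see a fully initialized object constructed exactly once.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

std::atomic<int> constructions{0};  // hypothetical instrumentation

class Singleton {
public:
    // try2.h-style accessor. Under C++11 and later, threads that reach this
    // declaration while _instance is being initialized block until the
    // initialization has finished.
    static Singleton& instance() {
        static Singleton _instance;
        return _instance;
    }
    unsigned state() const { return state_; }
private:
    Singleton() {
        ++constructions;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        state_ = 0xABCD;  // set after the sleep, as in the test bed
    }
    unsigned state_ = 0;
};

// Starts n threads that call instance() concurrently and records the state
// each one observes (each thread writes only its own slot).
std::vector<unsigned> observed_states(int n) {
    std::vector<unsigned> seen(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = Singleton::instance().state(); });
    for (std::thread& t : threads)
        t.join();
    return seen;
}
```

On a pre-C++11 compiler such as Visual C++ 6.0 the same code gives no such guarantee, which is what the output above shows.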
Locking the Implementation Down
This brings me back to the original implementation of Singleton::instance. Clearly, the test of _instance and the construction that follows it must execute atomically, and a locking mechanism is required to make that code thread-safe.
I am a big fan of the "resource acquisition is initialization" C++ idiom to manage resources. Since mutexes and critical sections are resources that can be acquired and released, I have implemented the locking mechanism using resource management techniques. The code for the Critical_Section class and the Lock_Guard template class used appears in sync.h, Listing 4. The release of the Critical_Section object is guaranteed and exception-safe when the local Lock_Guard object goes out of scope. (For more information on resource management, point your browser to http://www.relisoft.com/resource/index.htm and read Bartosz Milewski's excellent tutorial on the subject [4].)
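sync.h is not reproduced here. A minimal sketch of the same idea might look like the following, assuming std::mutex stands in for the Win32 critical section; the member names acquire, release, and try_acquire, and the released_after_throw demonstration, are hypothetical. The demonstration throws while a Lock_Guard is alive and then confirms from a second thread that the lock was released during stack unwinding.

```cpp
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for Critical_Section, built on std::mutex.
class Critical_Section {
public:
    Critical_Section() = default;
    Critical_Section(const Critical_Section&) = delete;
    Critical_Section& operator=(const Critical_Section&) = delete;
    void acquire() { m_.lock(); }
    void release() { m_.unlock(); }
    bool try_acquire() { return m_.try_lock(); }
private:
    std::mutex m_;
};

// RAII guard: acquires in the constructor and releases in the destructor,
// so the lock is dropped on every exit path, including exceptions.
template <class Lock>
class Lock_Guard {
public:
    explicit Lock_Guard(Lock& lock) : lock_(lock) { lock_.acquire(); }
    ~Lock_Guard() { lock_.release(); }
    Lock_Guard(const Lock_Guard&) = delete;
    Lock_Guard& operator=(const Lock_Guard&) = delete;
private:
    Lock& lock_;
};

// Throws inside a guarded scope, then reports whether the lock is free.
bool released_after_throw(Critical_Section& cs) {
    try {
        Lock_Guard<Critical_Section> gate(cs);
        throw std::runtime_error("failure inside the critical section");
    } catch (const std::runtime_error&) {
        // gate's destructor already ran during stack unwinding.
    }
    bool was_free = false;
    // Probe from another thread: calling try_lock on a std::mutex that the
    // calling thread already owns is undefined behavior.
    std::thread probe([&cs, &was_free] {
        // try_lock may fail spuriously, so allow several attempts.
        for (int i = 0; i < 100 && !was_free; ++i)
            was_free = cs.try_acquire();
        if (was_free)
            cs.release();
    });
    probe.join();
    return was_free;
}
```

Standard C++11 provides the same idiom directly as std::lock_guard and std::unique_lock.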
In try3.h, Listing 5, a static Critical_Section object is added to the Singleton class definition to provide the locking resource. The pseudo-code for the resulting Singleton::instance implementation is as follows:
    {
        acquire the mutex lock
        if(_instance == NULL)
        {
            allocate and create _instance
        }
        release the mutex lock
        return _instance
    }
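The pseudo-code maps onto a short C++11 sketch, assuming std::mutex and std::lock_guard in place of the article's Critical_Section and Lock_Guard (try3.h itself is not reproduced here). The instrumented constructor, the constructions counter, and the seen_instances helper are hypothetical test scaffolding.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> constructions{0};  // hypothetical instrumentation

class Singleton {
public:
    static Singleton& instance();
    unsigned state() const { return state_; }
protected:
    Singleton() {
        ++constructions;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        state_ = 0xABCD;
    }
private:
    unsigned state_ = 0;
    static Singleton* _instance;
    static std::mutex _key;  // plays the role of the static Critical_Section
};

Singleton* Singleton::_instance = nullptr;
std::mutex Singleton::_key;

Singleton& Singleton::instance() {
    std::lock_guard<std::mutex> gate(_key);  // acquire the mutex lock
    if (_instance == nullptr)
        _instance = new Singleton;           // allocate and create _instance
    return *_instance;                       // lock released when gate is destroyed
}

// Calls instance() from n threads at once; records the object each one got.
std::vector<const Singleton*> seen_instances(int n) {
    std::vector<const Singleton*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &Singleton::instance(); });
    for (std::thread& t : threads)
        t.join();
    return seen;
}
```

Every call, including the millions that happen long after construction, pays for a lock and an unlock; that cost is what the next sections try to remove.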
Running the try3.h test program shows that the code is functionally correct and thread-safe:
    main tid is 276
    Singleton 0x00301E50 has been allocated by tid 276
    thread2 tid is 299
    Singleton 0x00301E50 -state initialized by tid 276
    show_state(): this=0x00301E50 state=0xABCD tid=299
    show_state(): this=0x00301E50 state=0xABCD tid=276
    show_state(): this=0x00301E50 state=0xABCD tid=276
Only one instance of Singleton is created, and Singleton::show_state always acts on a fully constructed object. However, the locking mechanism imposes a significant performance penalty every time Singleton::instance is invoked. The penalty is especially wasteful because the lock is needed only once, when the singleton is constructed.
One erroneous optimization attempt is to move the check to the outer layer, as follows:
    Singleton& Singleton::instance()
    {
        if(!_instance)  // Race condition here
        {
            Lock_Guard<Critical_Section> gate(_key);
            _instance = new Singleton;
        }
        return *_instance;
    }
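This version only reintroduces the original race condition: two threads can both evaluate if(!_instance) as true before either one acquires the lock. The lock then merely serializes the two constructions, and each thread creates its own Singleton.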
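The failure can again be forced deterministically. In this sketch, a hypothetical hook::rendezvous call placed between the unlocked test and the lock holds each caller until a second caller has also passed the test; otherwise it follows the code above, with std::mutex and std::lock_guard standing in for Critical_Section and Lock_Guard.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical test hook (not from the article): each caller that passes the
// unlocked test waits here until a second caller has passed it as well.
namespace hook {
std::mutex m;
std::condition_variable cv;
int passed = 0;
void rendezvous() {
    std::unique_lock<std::mutex> lock(m);
    ++passed;
    cv.notify_all();
    cv.wait(lock, [] { return passed >= 2; });
}
}

std::atomic<int> constructions{0};

class Singleton {
public:
    static Singleton& instance();
protected:
    Singleton() { ++constructions; }
private:
    static std::atomic<Singleton*> _instance;  // atomic only to avoid a formal data race
    static std::mutex _key;
};

std::atomic<Singleton*> Singleton::_instance{nullptr};
std::mutex Singleton::_key;

// The "optimized" accessor: the test sits outside the lock, so the lock only
// serializes the constructions instead of preventing the second one.
Singleton& Singleton::instance() {
    if (!_instance.load()) {        // race condition here
        hook::rendezvous();         // widen the window deterministically
        std::lock_guard<std::mutex> gate(_key);
        _instance.store(new Singleton);
    }
    return *_instance.load();
}

// Returns the number of Singleton objects built by two racing threads.
int racing_constructions() {
    std::thread t1([] { Singleton::instance(); });
    std::thread t2([] { Singleton::instance(); });
    t1.join();
    t2.join();
    return constructions.load();
}
```

racing_constructions() returns 2: each construction happens under the lock, but nothing stops the second thread from constructing again once it obtains the lock.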
The Double-Checked Locking Pattern
The Double-Checked Locking Pattern [5] provides an elegant way to synchronize a critical section of code that is invoked infrequently. In this case, the ICS (infrequent critical section) is the allocation and construction of the Singleton instance, which needs to run only once for the lifetime of the Singleton. The pattern is probably best described by the following pseudo-code:
    {
        if(flag == false)      // Lock-hint test (Race condition exists)
        {
            acquire the mutex lock
            if(flag == false)  // Double-check test (Race resolved)
            {
                execute infrequent critical section of code
                flag = true
            }
            release the mutex lock
        }
    }
The idea is that a cheap "lock-hint" test checks for the rare first-use case; only if the flag is false is the expensive locking mechanism used. The first thread to enter acquires the mutex and proceeds to the double-check test. Since the flag has not changed, that thread proceeds to execute the ICS. Any other thread that passes the "lock-hint" test during this time will block trying to acquire the mutex. By the time such a thread acquires the mutex, the first thread will have set the flag to true, so the double-check test fails and the race condition is resolved. Refer to solution.h, Listing 6, for the implementation that uses the double-checked locking pattern.
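solution.h is not reproduced here, so the following is a sketch of the pattern under C++11, not the article's code. One caveat matters in practice: with a plain pointer (or a plain flag), nothing stops the compiler or the processor from making the pointer assignment visible to other threads before the constructor's writes, so a thread that passes the lock-hint test could still see a partially constructed object. Since C++11, the portable remedy is to make _instance a std::atomic and pair a release store after construction with an acquire load in the lock-hint test, as below. The instrumented constructor, the constructions counter, and the seen_instances helper are hypothetical test scaffolding.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> constructions{0};  // hypothetical instrumentation

class Singleton {
public:
    static Singleton& instance();
    unsigned state() const { return state_; }
protected:
    Singleton() {
        ++constructions;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        state_ = 0xABCD;
    }
private:
    unsigned state_ = 0;
    static std::atomic<Singleton*> _instance;
    static std::mutex _key;
};

std::atomic<Singleton*> Singleton::_instance{nullptr};
std::mutex Singleton::_key;

Singleton& Singleton::instance() {
    // Lock-hint test: the acquire load pairs with the release store below,
    // so a non-null pointer is only seen after the constructor's writes.
    Singleton* p = _instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> gate(_key);
        // Double-check test: the mutex already orders this load after any
        // store made under the same lock, so relaxed ordering suffices.
        p = _instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Singleton;                              // the ICS
            _instance.store(p, std::memory_order_release);  // flag = true
        }
    }
    return *p;
}

// Calls instance() from n threads at once; records the object each one got.
std::vector<const Singleton*> seen_instances(int n) {
    std::vector<const Singleton*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &Singleton::instance(); });
    for (std::thread& t : threads)
        t.join();
    return seen;
}
```

On the fast path, the only cost beyond the unguarded version's test is the acquire load, which on x86 compiles to an ordinary load.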
Using the double-checked locking pattern in a singleton results in a thread-safe implementation that does not impose the performance penalty of a critical section lock for the nominal execution of Singleton::instance. In fact, once _instance has been allocated and constructed, the invocation of Singleton::instance has the same performance as the original non-thread-safe solution.
What About Singleton Destruction?
As it turns out, the Singleton implementation I arrived at requires some different magic at the time of the singleton's destruction, which is the subject of another discussion. Ultimately, since most singletons face the same challenges, the singleton cries out for a templatized solution. I have written a template that takes care of the double-checked locking discussed here and also manages the destruction of singletons. The code is available on the CUJ ftp site; perhaps one day I will have the opportunity to describe it in detail. [Elsewhere in this issue, see the article "Controlling the Destruction Order of Singleton Objects," by Evgeniy Gabrilovich. It provides a templatized method for ensuring the correct destruction order of singleton objects; however, it is not guaranteed to be thread-safe. mb]
Closing Remarks
Before I ran across this problem, I knew that multithreaded programming was difficult. Now I realize it is even more difficult than I had first thought! I gained a new respect for multithreaded programming when I learned that even language features have to be scrutinized in the context of multithreading.
References
[1] Scott Meyers. Effective C++: 50 Specific Ways to Improve Your Programs and Designs (Addison-Wesley Longman, 1998), pp. 219-223.
[2] Erich Gamma, et al. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995), pp. 127-134.
[3] Bjarne Stroustrup. The C++ Programming Language, 3rd Edition (Addison-Wesley, 1997), p. 145.
[4] Bartosz Milewski. The Official Resource Management Page, http://www.relisoft.com/resource/index.htm.
[5] Douglas Schmidt and Tim Harrison. "Double-Checked Locking: An Optimization Pattern for Efficiently Initializing and Accessing Thread-safe Objects," http://www.cs.wustl.edu/~schmidt/DC-Locking.ps.gz. (Most of the papers available at Doug's web site are in PostScript format. If you don't have a way to read or print PostScript files, visit http://www.ghostscript.com and obtain the latest version of Ghostscript. It is available on a variety of platforms.)
Jonathan Ringle is a software engineer with over 15 years of programming experience, including eight years of experience writing Computer Telephony products for companies both in the United States and internationally. He is currently working for Key Voice Technologies in Sarasota, Florida, on their Unified Messaging product. He can be reached by email at [email protected].