In my most recent articles [1, 2], I presented reasons why concurrency (for example, multithreading) will be the next revolution in the way we develop software: a sea change of the same order as the object-oriented revolution. I also stated that "the vast majority of programmers today don't grok concurrency, just as the vast majority of programmers 15 years ago didn't yet grok objects."
In this column, I'd like to consider just one question that several people wrote to ask, namely: "Is concurrency really that hard?" In particular, a few readers felt that lock-based programming is well understood; it is, after all, the status quo mainstream solution to concurrency control.
So, "is concurrency really that hard?" My short answer is this:
- Lock-based programming, our status quo, is difficult for experts to get right. Worse, it is also fundamentally flawed for building large programs. This article focuses exclusively on lock-based programming just because there's so much to say in even a 50,000-foot overview that I ran out of room.
- Lock-free programming is difficult for gurus to get right. I'll save this for another time, but if you're interested, you should check out Andrei Alexandrescu's recent articles for a sampling of the issues in lock-free programming and hazard pointers [3, 4]. (Aside: Yes, I'm implying Andrei is a guru. I hope he doesn't mind my outing him in public like this. I don't think it was much of a secret.) (More aside: The hazard pointer work shows in particular why, if you're writing lock-free data structures, you really really want garbage collection. You can do it yourself without garbage collection, but it's like working with knives that are sharp on both edges and don't have handles. But that's another article. Specifically, it's Andrei's other article.)
Unfortunately, today's reality is that only thoughtful experts can write explicitly concurrent programs that are correct and efficient. This is because today's programming models for concurrency are subtle, intricate, and fraught with pitfalls that easily (and frequently) result in unforeseen races (i.e., program corruption), deadlocks (i.e., program lockup), and performance cliffs (e.g., priority inversion, convoying, and sometimes complete loss of parallelism and/or even worse performance than a single-threaded program). And even when a correct and efficient concurrent program is written, it takes great care to maintain: it's usually brittle and difficult to maintain correctly, because current programming models set a very high bar of expertise required to reason reliably about the operation of concurrent programs, so that apparently innocuous changes to a working concurrent program can (and commonly do, in practice) render it entirely or intermittently nonworking in unintended and unexpected ways. Because getting it right and keeping it right is so difficult, at many major software companies there is a veritable priesthood of gurus who write and maintain the core concurrent code.
Some people think I'm overstating this, so let me amplify. In this article, I'll focus on just the narrow question of how to write a lock-based program correctly, meaning that it works (avoids data corruption) and doesn't hang (avoids deadlock and livelock). That's pretty much the minimum requirement to write a program that runs at all.
Question: Is the following code thread-safe? If it is, why is it safe? If it isn't always, under what conditions is it thread-safe?
T Add( T& a, T& b ) {
    T result;
    // ... read a and b and set result to proper values ...
    return result;
}
There are a lot of possibilities here. Let's consider some of the major ones.
Lock-Based Solutions?
Assume that reading a T object isn't an atomic operation. Then, if a and/or b are accessible from another thread, we have a classic race condition: While we are reading the values of a and/or b, some other thread might be changing those objects, resulting in blowing up the program (if you're lucky, say, by causing the object to follow an internal pointer some other thread just deleted) or reading corrupt values.
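To make that concrete, here's a minimal sketch of my own (not part of the code under discussion), assuming purely hypothetically that T is a simple two-field struct whose fields can't all be read in one atomic step:

#include <thread>

// Hypothetical T: two fields that cannot both be read in one atomic operation.
struct T {
    long lo = 0;
    long hi = 0;
};

T Add( T& a, T& b ) {
    T result;
    result.lo = a.lo + b.lo;   // reads of a can interleave with the writer below
    result.hi = a.hi + b.hi;
    return result;
}

int main() {
    T shared, other;
    std::thread writer( [&]{
        for ( long i = 1; i <= 1000000; ++i ) {
            shared.lo = i;     // some other thread is changing the object...
            shared.hi = i;
        }
    } );
    for ( long i = 0; i < 1000000; ++i ) {
        T r = Add( shared, other );   // ...while we are reading it: a data race,
        (void) r;                     // so r.lo and r.hi may come from different
    }                                 // writer iterations, or worse
    writer.join();
}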
How would you solve that? Please stop and think about it before reading on...
Ready? Okay: Now please stop a little longer and think about your solution some more, and consider whether it might have any further holes, before reading on...
Now that you've thought about it deeply, let's consider some alternatives.
Today's typical lock-based approach is to acquire locks so that uses of a and b on one thread won't interleave. Typically, this is done by acquiring a lock on an explicit synchronization object (a mutex, for instance) that covers both objects, or by acquiring locks on an implicit mutex associated with the objects themselves. To acquire a lock that covers both objects, Add has to know what that lock is, either because a and b and their lock are globals (but then why pass a and b as parameters?) or because the caller acquires the lock outside of Add (which is usually preferable). To acquire a lock on each object individually, we could write:
T SharedAdd( T& a, T& b ) {
    T result;
    lock locka( a );    // lock is a helper whose constructor acquires a lock
    lock lockb( b );    // and whose destructor releases the lock
    // ... read a and b and set result to proper values ...
    return result;
}                       // release the locks
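For concreteness, here's a minimal sketch of the kind of lock helper this code assumes; how each object's mutex is associated with it is my own (hypothetical) choice, namely a mutex member inside T itself:

#include <mutex>

// Hypothetical T: a value plus the "implicit mutex associated with the object."
struct T {
    int        value = 0;
    std::mutex m;
};

// RAII helper: the constructor acquires the object's mutex,
// the destructor releases it (same idea as std::lock_guard).
class lock {
public:
    explicit lock( T& obj ) : guard_( obj.m ) {}
private:
    std::lock_guard<std::mutex> guard_;
};

With something along those lines in place, the locks are released automatically when SharedAdd returns, whether normally or via an exception.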
I renamed the function because probably most T objects are never shared and we don't want to incur the cost of the lock on most Add operations.
Aside: In some cases, we might even be more efficient and have a reader-writer lock that allows multiple concurrent readers, but to do that we have to guarantee at least that the readers won't physically change the object, including any internal data indirectly reachable by the object; this is a much stronger requirement than just plain old const, because const is logical and shallow, not physical and deep.
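If that guarantee can be made, a reader-writer version might look roughly like the following sketch; it uses std::shared_mutex from a later C++ standard (C++17) purely to illustrate, plus a hypothetical RWShared wrapper of my own that pairs the value with its lock:

#include <shared_mutex>

// Hypothetical wrapper: a value of type T plus the reader-writer lock guarding it.
template<typename T>
struct RWShared {
    T value;
    mutable std::shared_mutex rw;
};

template<typename T>
T SharedAdd( const RWShared<T>& a, const RWShared<T>& b ) {
    // Shared (read) locks: many readers may hold them concurrently, which is
    // only safe if reading really is physically non-mutating, deeply so.
    std::shared_lock<std::shared_mutex> ra( a.rw );
    std::shared_lock<std::shared_mutex> rb( b.rw );
    T result = a.value;
    result += b.value;      // assumes T provides operator+=
    return result;
}

(Like the plain version, this sketch acquires its two locks in whatever order the arguments happen to arrive, which will matter in a moment.)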
Is taking locks inside SharedAdd enough to make the function thread-safe? No. For one thing, there's no way to guarantee that all code that uses a and b will respect the lock. There has to be an enforced discipline to ensure that everyone plays by the locking rules and takes a lock (and it has to be the same lock) before touching a or b.
So let's assume you have a nice baseball bat (or Nerf gun) and can go around your office and force everyone on your team to play by the rules, so that nobody ever forgets to acquire the right locks when touching a or b. Now are we thread-safe? No, for a completely different reason: SharedAdd as written above is a recipe for deadlock. For a simple example, consider the following code that calls SharedAdd:
T global1, global2;

// on thread A
SharedAdd( global1, global2 );

// on thread B
SharedAdd( global2, global1 );
One fine day, thread A acquires the lock on global1, gets interrupted by (or runs concurrently with) thread B, which acquires the lock on global2, and then both wait forever for each other's locks. Not a great place to be.
I'm sure a good number of readers saw this one coming; good for you. The next question is, how would you fix this? One approach would be to recognize that we're okay as long as the locks are always acquired in the same order, and smart folks like Andrei have published helper types that make it easier to automate that. Let's roll our own: Let's have SharedAdd harden itself against this by consistently taking the locks in a given global order, say by the objects' addresses, and we'll even be smart and remember that the Standard doesn't allow direct comparisons between pointers to arbitrary objects but does allow them using the std::less library helper:
T SharedAdd( T& a, T& b ) {
    T result;
    lock lock1( less<T*>()( &a, &b ) ? a : b );
    lock lock2( less<T*>()( &a, &b ) ? b : a );
    // ... read a and b and set result to proper values ...
    return result;
}                       // release the locks
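To see what a reusable helper type of the kind mentioned above might look like, here's a rough sketch that packages the same address-ordering policy; as in the earlier sketch, it assumes a hypothetical T that carries its own std::mutex member, and the name ordered_lock is mine:

#include <functional>
#include <mutex>

// Hypothetical T, as before: a value plus its associated mutex.
struct T {
    int        value = 0;
    std::mutex m;
};

// Packages the "always acquire in a fixed global order (by address)" policy
// so individual call sites can't get the ordering wrong.
class ordered_lock {
public:
    ordered_lock( T& a, T& b )
        : first_ ( std::less<T*>()( &a, &b ) ? a.m : b.m ),
          second_( std::less<T*>()( &a, &b ) ? b.m : a.m )
    {
        first_.lock();                          // lower address first...
        if ( &first_ != &second_ )
            second_.lock();                     // ...then the other, unless
    }                                           // a and b are the same object
    ~ordered_lock() {
        if ( &first_ != &second_ ) second_.unlock();
        first_.unlock();
    }
    ordered_lock( const ordered_lock& ) = delete;
    ordered_lock& operator=( const ordered_lock& ) = delete;
private:
    std::mutex& first_;
    std::mutex& second_;
};

With that, SharedAdd would shrink to a single ordered_lock both( a, b ); declaration, and the ordering rule would live in exactly one place instead of being repeated at every call site.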
Now the problematic code we showed earlier is fine, because regardless of the order in which global1 and global2 are passed in, we will always lock the one having the lower address first, and so the deadlock cannot happen...right?

True, it cannot happen in that example. But any other code in the world might have acquired a lock on whichever one has the higher address, then called SharedAdd. Ouch. And it could be code we don't control, such as third-party libraries for which we only have binaries. (And it could be code that hasn't been written yet and won't exist until a few years from now, such as new features or new overrides of virtual functions. More on this in a moment.)