Frameworks and Lock Hierarchies
It is a curious thing that major frameworks that supply mutexes and locks offer no direct support for lock hierarchies. Everyone is taught that lock hierarchies are a best practice, but is then generally told to go roll their own.
The framework vendors will undoubtedly fix this little embarrassment in the future, but for now, here's a useful recipe to follow as you roll your own level-aware mutex wrapper. You can adapt this simple sketch to your project's specific needs (for example, to suit details such as whether your lock operation is a method or a separate class):
- Write a wrapper around each of your favorite language- or platform-specific mutex types, and let the wrapper's constructor(s) take a level number parameter that it saves in a myLevel member. Use these wrappers everywhere. (Where practical, save time by making the wrapper generic, as a C++ template or a Java or .NET generic, so that it can be instantiated to wrap arbitrary mutex types that have similar lock/unlock features. You might only have to write it once.)
- Give the wrapper class a thread-local static variable called currentLevel, initialized to a value higher than any valid lock level.
- In the wrapper's lock method (or similar), assert that currentLevel is greater than myLevel, the level of the mutex that you're about to try to acquire. Remember the previous value of currentLevel using another member variable, then set currentLevel = myLevel and acquire the lock.
- In the wrapper's unlock method (or similar), restore the previous value of currentLevel.
- As needed, also wrap any other methods you need to be able to use, such as try_lock. Any of these methods that might try to acquire the lock should do the same things as lock does.
- Finally, write a "lock-multiple" method lock( m1, m2, ... ) that takes a variable number of lockable objects, asserts that they are all at the same level, and locks them in their address order (or their GUID order, or some other globally consistent order).
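The recipe above could be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names LeveledMutex, lockAll, enter, and the currentLevel() accessor are hypothetical, levels are plain ints, and locks are assumed to be released in the reverse order of acquisition. The sketch also updates the level bookkeeping just after acquiring the underlying mutex rather than just before, so that the saved previous level is only ever written by the thread that owns the lock.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <limits>
#include <mutex>

// Per-thread level of the most recently acquired wrapper; starts above any
// valid lock level. Kept outside the template so all wrapper types share it.
inline int& currentLevel() {
    thread_local int level = std::numeric_limits<int>::max();
    return level;
}

// Hypothetical wrapper; Mutex is any type offering lock/unlock/try_lock.
template <typename Mutex>
class LeveledMutex {
public:
    explicit LeveledMutex(int level) : myLevel_(level) {}

    void lock() {
        assert(currentLevel() > myLevel_ && "lock hierarchy violation");
        mutex_.lock();
        enter(currentLevel());
    }

    bool try_lock() {
        assert(currentLevel() > myLevel_ && "lock hierarchy violation");
        if (!mutex_.try_lock()) return false;
        enter(currentLevel());
        return true;
    }

    void unlock() {
        currentLevel() = previousLevel_;  // restore while still holding the lock
        mutex_.unlock();
    }

    // Lock-multiple: all arguments must share one level; they are acquired in
    // address order so every thread takes same-level locks in the same order.
    template <typename... Rest>
    static void lockAll(LeveledMutex& first, Rest&... rest) {
        LeveledMutex* all[] = {&first, &rest...};
        for (LeveledMutex* m : all) {
            assert(m->myLevel_ == first.myLevel_ && "levels must match");
            (void)m;  // silences unused-variable warnings when NDEBUG is set
        }
        assert(currentLevel() > first.myLevel_ && "lock hierarchy violation");
        std::sort(std::begin(all), std::end(all), std::less<LeveledMutex*>());
        const int outer = currentLevel();
        for (LeveledMutex* m : all) {
            m->mutex_.lock();
            m->enter(outer);  // each member remembers the level outside the group
        }
    }

private:
    // Bookkeeping runs only after the lock is held, so previousLevel_ is
    // written by the owning thread alone.
    void enter(int previous) {
        previousLevel_ = previous;
        currentLevel() = myLevel_;
    }

    Mutex mutex_;
    const int myLevel_;
    int previousLevel_ = std::numeric_limits<int>::max();
};
```

Because the wrapper exposes lock and unlock, it should also work with std::lock_guard. One design trade-off in this sketch: every mutex locked through lockAll restores the level that was current before the group was taken, so the members can be unlocked in any order, but the check is relaxed between the first and last unlock of the group.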
The reason for using assertions in the lock methods is so that, in a debug build, we force any errors to be exposed the first time we execute the code path that violates the lock hierarchy rules. That way, we can expect to find violations at test time and have high confidence that the program is deadlock-free based on code path coverage. Enabling such deterministic test-time failures is a great improvement over the way concurrency errors usually manifest, namely as nondeterministic runtime failures that can't be thoroughly tested using code path coverage alone. But often our test-time code path coverage isn't complete, either because it's impossible to cover all possible code path combinations or because we might forget a few cases; so prefer to also perform the tests in release builds, recording violations in a log or diagnostic dump that you can review later if a problem does occur.
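One way to get both behaviors is to route the level check through a small helper that always records a violation and additionally asserts in debug builds. The following is only a sketch under assumptions: recordIfViolation and checkLevel are hypothetical names, the log is any std::ostream (std::cerr by default), and the message format is made up for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream, handy as an in-memory log in tests

// Hypothetical helper: returns true when acquiring a mutex at level
// `requested` is legal while the thread's current level is `current`;
// otherwise writes one diagnostic line to `log` and returns false.
inline bool recordIfViolation(int current, int requested, std::ostream& log) {
    if (current > requested) return true;
    log << "lock hierarchy violation: current level " << current
        << ", requested level " << requested << '\n';
    return false;
}

// Debug builds stop at the first violation; release builds (NDEBUG defined)
// compile the assert out, keep running, and leave the record in the log.
inline bool checkLevel(int current, int requested,
                       std::ostream& log = std::cerr) {
    const bool ok = recordIfViolation(current, requested, log);
    assert(ok && "lock hierarchy violation");
    return ok;
}
```

A wrapper's lock method could call checkLevel in place of a bare assert. In a real program the log would more likely be a file or a diagnostic buffer than std::cerr.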