A Solution: Lock Hierarchies and Layering
The idea of a lock hierarchy is to assign a numeric level to every mutex in the system, and then consistently follow two simple rules:
- Rule 1: While holding a lock on a mutex at level N, you may only acquire new locks on mutexes at lower levels (< N).
- Rule 2: Multiple locks at the same level must be acquired at the same time, which means we need a "lock-multiple" operation such as lock( mut1, mut2, mut3, ... ). This operation internally has the smarts to make sure it always takes the requested locks in some consistent global order. [1] Note that any consistent order will do; for example, one typical strategy is to acquire mutexes at the same level in increasing address order. (Both rules are sketched in code below.)
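To make Rule 1 concrete, here is a minimal sketch of a level-checked mutex wrapper; the class name, layout, and the choice of assert as the error policy are illustrative assumptions, not part of the rules themselves. Each thread tracks the lowest level it currently holds, so an acquire at an equal or higher level is reported immediately instead of deadlocking later. (Same-level acquisition is deliberately rejected here; Rule 2 handles that case through the lock-multiple operation sketched further below.)

// Illustrative sketch only: a mutex wrapper that enforces Rule 1 at runtime.
#include <cassert>
#include <climits>
#include <mutex>

class leveled_mutex {
public:
    explicit leveled_mutex( int level ) : level_(level) { }

    void lock() {
        // Rule 1: you may only acquire mutexes at strictly lower levels
        // than anything this thread already holds.
        assert( level_ < lowest_held_level && "lock hierarchy violation" );
        mut_.lock();
        previous_level_   = lowest_held_level;  // safe: we now hold mut_
        lowest_held_level = level_;
    }

    void unlock() {
        lowest_held_level = previous_level_;
        mut_.unlock();
    }

private:
    std::mutex mut_;
    const int  level_;
    int        previous_level_ = INT_MAX;
    static thread_local int lowest_held_level;  // lowest level held by this thread
};

thread_local int leveled_mutex::lowest_held_level = INT_MAX;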
If the entire program follows these rules, then there can be no deadlock among the mutex acquire operations, because no two pieces of code can ever try to acquire two mutexes a and b in opposite orders: Either a and b are at different levels, and so the one at the higher level must be taken first; or else they are at the same level, in which case they must be requested at the same time and the system will automatically acquire them in the same order. The two simple rules provide a convenient and understandable way to express a total order on all locking performed in the system.
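For Rule 2, one hedged sketch of a lock-multiple operation for two same-level mutexes, using the increasing-address-order strategy mentioned above, might look like the following; the function names are invented for illustration. (In C++11 and later, std::lock, or std::scoped_lock in C++17, already provides a lock-multiple with its own deadlock-avoidance algorithm.)

#include <cassert>
#include <functional>
#include <mutex>

// Acquire two same-level mutexes in a consistent global order (by address),
// so every caller that needs both takes them in the same order.
void lock_both( std::mutex& a, std::mutex& b ) {
    assert( &a != &b );  // locking the same mutex twice would deadlock
    if( std::less<std::mutex*>()( &a, &b ) ) { a.lock(); b.lock(); }
    else                                     { b.lock(); a.lock(); }
}

void unlock_both( std::mutex& a, std::mutex& b ) {
    a.unlock();
    b.unlock();
}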
But where do we find the levels? The answer is: You probably already have them. Mutexes protect data, and the data is already in layers.
Lock levels should directly leverage and mirror the layering already in place in the modular structure of your application. Figure 1 illustrates a typical example of layering (or "hierarchical decomposition into a directed acyclic graph," if you prefer five-dollar words), a time-tested technique to control the dependencies in your software. The idea is to group your code into modules and the modules into layers, where code at a given layer can only call code at the same or lower layers, and should avoid calling upward into higher layers.
Figure 1: Sample module/layer decomposition.
If that sounds a lot like the Two Rules of lock hierarchies, that's no coincidence. After all, both the layering and the mutexes are driven by the same goal: to protect and control access to the encapsulated data that is owned by each piece of code, and to keep it free from corruption by maintaining its invariants correctly. As in Figure 1, the levels you assign to mutexes will normally closely follow the levels in your program's layered structure. A direct consequence of Rule 1 is that locks held on mutexes at lower levels have a shorter duration than locks held at higher levels; this is just what we expect of calls into code at lower layers of a layered software system.
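In practice, the mapping from layers to levels can be as simple as a handful of named constants. The layer names below are hypothetical, chosen only to illustrate the idea; substitute whatever layers your own design already has, and pair each constant with the leveled_mutex sketch shown earlier (or your own equivalent).

// Hypothetical layer names, for illustration only.
enum lock_level : int {
    level_storage = 100,   // lowest layer: file/database access
    level_logic   = 200,   // middle layer: business rules
    level_ui      = 300    // highest layer: presentation
};

leveled_mutex storage_mut( level_storage );
leveled_mutex logic_mut  ( level_logic );
leveled_mutex ui_mut     ( level_ui );

// UI code may take ui_mut and then call down and take logic_mut or
// storage_mut; storage code must never try to take ui_mut.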
Software can't always be perfectly layered, but exceptions should be rare. After all, if you can't define such layers, it means that there is a cycle among the modules: code in what should be a lower level subsystem calls into higher level code, for example via a callback, and you have the potential for reentrancy even in single-threaded code. And remember, reentrancy is a form of concurrency, so even a single-threaded program can observe corrupt state. The trouble is that if higher level code is in the middle of taking the system from one valid state to another, thus temporarily breaking some invariant, and calls into lower level code, that call could ultimately call back into the higher level code and see the broken invariant. Layering helps to solve this single-threaded concurrency problem for the same reasons it helps to solve the more general multithreaded version.
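To see the single-threaded hazard concretely, consider this hedged sketch (the type and names are invented for illustration): higher level code temporarily breaks its invariant, calls downward, and a callback re-enters before the invariant is restored.

#include <functional>

// Intended invariant: audited_total always equals balance...
// except in the middle of deposit().
struct Account {
    int balance       = 0;
    int audited_total = 0;
    std::function<void(const Account&)> on_change;  // may be invoked from lower level code

    void deposit( int amount ) {
        balance += amount;                    // invariant temporarily broken here
        if( on_change ) on_change( *this );   // if this ultimately calls back up,
                                              // it sees balance != audited_total
        audited_total += amount;              // invariant restored
    }
};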