Consequences: What Is "Unknown Code"?
It's one thing to say "avoid calling unknown code while holding a lock" or while inside a similar kind of critical section. It's another to do it, because there are so many ways to get into "someone else's code." Let's consider a few. While inside a critical section, including while holding a lock:
- Avoid calling library functions. A library function is the classic case of "someone else's code." Unless the library function is documented to not take any locks, deadlock problems can arise.
- Avoid calling plug-ins. Clearly, a plug-in is "someone else's code."
- Avoid calling other callbacks, function pointers, functors, delegates, and so on. C function pointers, C++ functors, C# delegates, and the like can also fairly obviously lead to "someone else's code." Sometimes, you know that a function pointer, functor, or delegate will always lead to your own code, and calling it is safe; but if you don't know that for certain, avoid calling it from inside a critical section.
- Avoid calling virtual methods. This may be less obvious and quite surprising, even Draconian; after all, virtual functions are common and pervasive. But every virtual function is an extensibility point that can lead to executing code that doesn't even exist today. Unless you control all derived classes (for example, the virtual function is internal to your module and not exposed for others to derive from), or you can somehow enforce or document that overrides must not take locks, deadlock problems can arise if it is called while inside a critical section.
- Avoid calling generic methods, or methods on a generic type. When writing a C++ template or a Java or .NET generic, we have yet another way to parameterize and intentionally let "someone else's code" be plugged into our own. Remember that anything you call on a generic type or method can be provided by anyone, and unless you know and control all the types with which your template or generic can be instantiated, avoid calling something generic from within a critical section.
Some of these restrictions may be obvious to you; others may be surprising at first.
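To make the callback and generic-method cases concrete, here is a minimal C++ sketch. The Registry class, its member names, and the TryAddFromOtherThread helper are all hypothetical, invented only for illustration. The visitor passed to ForEachLocked is "someone else's code" that runs while the lock is held. The helper stands in for a callee that depends on another thread needing the same lock, and it uses a non-blocking try-lock so that the sketch reports the conflict instead of actually hanging.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class, for illustration only.
class Registry {
 public:
  void Add(int value) {
    std::lock_guard<std::mutex> hold(mutex_);
    items_.push_back(value);
  }

  // Non-blocking variant: returns false if the lock cannot be acquired.
  bool TryAdd(int value) {
    std::unique_lock<std::mutex> hold(mutex_, std::try_to_lock);
    if (!hold.owns_lock()) return false;
    items_.push_back(value);
    return true;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> hold(mutex_);
    return items_.size();
  }

  // HAZARD: Visitor is a template parameter, so visit() can be anyone's
  // code, and it runs while mutex_ is held.
  template <typename Visitor>
  void ForEachLocked(Visitor visit) const {
    std::lock_guard<std::mutex> hold(mutex_);
    for (int item : items_) visit(item);
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int> items_;
};

// Models a callee that waits for another thread that needs the same lock.
inline bool TryAddFromOtherThread(Registry& registry, int value) {
  bool added = false;
  std::thread worker([&] { added = registry.TryAdd(value); });
  worker.join();  // the caller waits for the worker to finish
  return added;
}
```

If the worker used the blocking Add instead of TryAdd, the visitor would wait for the worker, the worker would wait for the lock, and the thread holding the lock would never get to release it.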
Avoidance: Noncritical Calls
So you want to remove a call to unknown code from a critical section. But how? What can you do? Four options are: (a) move the call out of the critical section if you didn't need the exclusion anyway; (b) make copies of data inside the critical section and later pass the copies; (c) reduce the granularity or power of the critical section being held at the point of the call; or (d) instruct the callee sternly and hope for the best.
We can apply the first approach directly to Example 2. There is no reason the plug-in needs to call browser.CountHiddenElements() while holding its internal lock. That call should simply be moved to before or after the critical section.
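As a rough sketch of the first approach, consider the following C++ fragment. Example 2's code is not reproduced here, so the Browser and Plugin classes are hypothetical stand-ins; only the name CountHiddenElements comes from the text. The point is just the ordering inside Refresh: the call into the browser happens before the plug-in takes its own lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the browser; it takes its own internal lock.
class Browser {
 public:
  explicit Browser(int hidden) : hidden_(hidden) {}

  int CountHiddenElements() const {
    std::lock_guard<std::mutex> hold(mutex_);
    return hidden_;
  }

 private:
  mutable std::mutex mutex_;
  int hidden_;
};

// Hypothetical plug-in with its own internal lock.
class Plugin {
 public:
  explicit Plugin(const Browser& browser) : browser_(browser) {}

  void Refresh() {
    // Call into the browser first, while holding no lock of our own.
    const int hidden = browser_.CountHiddenElements();

    // Take the plug-in's lock only to update the plug-in's own state.
    std::lock_guard<std::mutex> hold(mutex_);
    lastHiddenCount_ = hidden;
  }

  int LastHiddenCount() const {
    std::lock_guard<std::mutex> hold(mutex_);
    return lastHiddenCount_;
  }

 private:
  const Browser& browser_;
  mutable std::mutex mutex_;
  int lastHiddenCount_ = 0;
};
```

Because Refresh never holds both locks at once, it cannot take part in a lock-ordering cycle with the browser, whatever the browser does internally.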
The second approach is to pass copies of data, which solves the correctness problem at the expense of space and performance. Variants of this approach include passing a subset of the data, and passing the copies via messages to run the callee asynchronously.
To improve Example 1, for instance, it might be appropriate to change the RenderElements method to hold the lock only long enough to copy the necessary shared information into a local container, then do the processing outside the lock, passing the copied elements. (This could be inappropriate if the data is very expensive to copy, or the callee needs to work on the real data.) Alternatively, perhaps the callee doesn't really need all the information it gets from being given direct access to the protected object, and it would be both sufficient and efficient to pass copies of just those parts of the data the callee does need.
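The copy-then-call idea might look roughly like this in C++. Example 1's code is not reproduced here, so the Page class, its members, and the callback signature are assumptions made for illustration; only the name RenderElements comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class, for illustration only.
class Page {
 public:
  void AddElement(std::string element) {
    std::lock_guard<std::mutex> hold(mutex_);
    elements_.push_back(std::move(element));
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> hold(mutex_);
    return elements_.size();
  }

  // render is unknown code, so it must not run while mutex_ is held.
  void RenderElements(
      const std::function<void(const std::string&)>& render) const {
    std::vector<std::string> snapshot;
    {
      // Hold the lock only long enough to copy the shared data.
      std::lock_guard<std::mutex> hold(mutex_);
      snapshot = elements_;
    }
    // The lock is released; the callee sees only the copies.
    for (const std::string& element : snapshot) render(element);
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> elements_;
};
```

With this shape, a callback that calls back into AddElement is harmless, because the lock is no longer held during the call. The cost is the copy, and the callee sees a snapshot rather than the live data.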
The third option is to reduce the power or granularity of the critical section, which implicitly trades off ease-of-use because making your synchronization finer-tuned and/or finer-grained also makes it harder to code correctly. One example of reducing the power of the critical section is to replace a mutex with a reader-writer mutex so that multiple concurrent readers are allowed; if the only deadlocks that could arise are among threads that only read the protected resources, then this can be a valid solution, because those threads can take a read-only lock instead of a read-write lock. And an example of making the critical section finer-grained is to replace a single mutex protecting a large data structure with mutexes protecting parts of the structure; if the only deadlocks possible are among threads that use different parts of the structure, then this can be a valid solution (Example 1 is not such a case).
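Here is a small sketch of the reader-writer variant only, assuming a C++17 compiler for std::shared_mutex. The Catalog class and the ReadOnOtherThread helper are hypothetical names chosen for illustration. The visitor runs while a shared (read) lock is held, and a second reader on another thread can still get in.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>

// Hypothetical class, for illustration only.
class Catalog {
 public:
  void Set(const std::string& key, int value) {
    std::unique_lock<std::shared_mutex> write(mutex_);  // exclusive
    data_[key] = value;
  }

  // Returns -1 if the key is absent.
  int Get(const std::string& key) const {
    std::shared_lock<std::shared_mutex> read(mutex_);  // shared
    auto it = data_.find(key);
    return it == data_.end() ? -1 : it->second;
  }

  // Runs visit() on each entry while holding only a shared (read) lock.
  template <typename Visitor>
  void ForEach(Visitor visit) const {
    std::shared_lock<std::shared_mutex> read(mutex_);
    for (const auto& entry : data_) visit(entry.first, entry.second);
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, int> data_;
};

// Models a callee that waits for a second reader on another thread.
inline int ReadOnOtherThread(const Catalog& catalog, const std::string& key) {
  int result = -1;
  std::thread reader([&] { result = catalog.Get(key); });
  reader.join();
  return result;
}
```

This helps only when every party involved is a reader. A visitor that waited for a thread calling Set would still deadlock, since the writer needs exclusive access. The sketch also reads from a different thread on purpose: std::shared_mutex does not support the same thread acquiring it recursively.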
The fourth option is to tell the callee not to block, which trades off enforceability. In particular, if you have the power to impose requirements on the callee (as you do with plug-ins to your software, but not with simple calls into existing third-party libraries), then you can require it not to take locks or otherwise perform blocking actions. Alas, these requirements are typically going to be limited to documentation, and are typically not enforceable automatically. Tell the callee what (not) to do, and hope it follows the yellow brick road.
Summary
Be aware of the many opportunities modern languages give us to call "someone else's code," and eliminate external opportunities for deadlock by not calling unknown code from within a critical section. If you additionally eliminate internal opportunities for deadlock by applying a lock hierarchy discipline within the code you control, your use of locks will be highly likely to be correct...and we'll consider lock hierarchies next month. Stay tuned.