Ordering Problems in Consume
Here's another reordering problem, this time from Consume:
  if (iNext != iTail) {
    iHead = iNext;    // C
    t = *iHead;       // D
Note that Consume updates iHead to advertise that it has consumed another item before it actually reads the item's value. Is that a problem? We might think it's innocuous, because the producer always leaves the iHead item alone to stay at least one item away from the part of the list the consumer is using.
It turns out this code is broken regardless of which order we write lines C and D, because the compiler or processor or cache can reorder either version in unfortunate ways. Consider what happens if the consumer thread performs two consecutive calls to Consume: The memory reads and writes performed by those two calls could be reordered so that iHead is incremented twice before we copy the two list nodes' values, and then we have a problem because the producer may try to remove nodes the consumer is still using.
Note: This doesn't mean the compiler or processor transformations are broken; they're not. Rather, the code is racy and has insufficient synchronization, so it forfeits the memory model's guarantees, and that is what makes such transformations possible and visible.
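To make the two-call scenario concrete, here is a small single-threaded model of that schedule. It is only a sketch under assumptions not stated above: QueueModel, advance_head, trim, and first_item_lost are hypothetical names, the list holds plain ints, and the producer is assumed to clean up by erasing every node before iHead. The model replays the schedule by hand instead of racing real threads, so its outcome is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical model of the queue: nodes before `head` count as consumed
// and may be erased by the producer at any time.
struct QueueModel {
    std::list<int> items;
    std::list<int>::iterator head;

    explicit QueueModel(const std::vector<int>& values)
        : items(values.begin(), values.end()), head(items.begin()) {
        assert(!items.empty());  // the design always keeps at least one node
    }

    // The consumer's write from line C: advertise one more consumed item.
    void advance_head() { ++head; }

    // Assumed producer cleanup: erase every node before head and report
    // which values were destroyed.
    std::vector<int> trim() {
        std::vector<int> erased(items.begin(), head);
        items.erase(items.begin(), head);
        return erased;
    }
};

// Replays one schedule. If hoist_second_update is true, the second call's
// line C lands before the first call's line D, and the producer trims in
// between. Returns true if the value the first call still has to copy was
// destroyed by that trim.
bool first_item_lost(bool hoist_second_update) {
    QueueModel q({0, 10, 20, 30});            // 0 is the already-consumed node at head
    const int wanted = *std::next(q.head);    // 10: what the first call must copy
    q.advance_head();                         // call 1, line C
    if (hoist_second_update) {
        q.advance_head();                     // call 2, line C, hoisted early
    }
    const std::vector<int> erased = q.trim(); // producer runs here
    return std::find(erased.begin(), erased.end(), wanted) != erased.end();
}
```

In this model, first_item_lost(true) reports that the node holding 10 was erased before the consumer copied it, while the in-order schedule, first_item_lost(false), erases only the node that was already consumed.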
Reordering isn't the only issue. Another problem is that compilers and processors can invent writes, so they could inject a transient value:
  // Problematic compiler/processor transformation
  if (iNext != iTail) {
    iHead = 0xDEADBEEF;   // invented write of a transient value
    iHead = iNext;
    t = *iHead;
Clearly, that would break the producer thread, which would read a bad value for iHead. More likely, the compiler or processor might speculate that most of the time iNext != iTail:
  // Another problematic transformation
  __temp = iHead;
  iHead = iNext;          // speculatively set to iNext
  if (iNext == iTail) {   // note: inverted test!
    iHead = __temp;       // undo if we guessed wrong
  } else {
    t = *iHead;
But now iHead could equal iTail, which breaks the essential invariant that iHead must never equal iTail, on which the whole design depends.
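The window in which the invariant breaks can be shown with a small model of the speculative version. This is a sketch, not the original code: SpeculativeConsume and store_head are hypothetical names, list positions are modeled as plain ints, and the check inside store_head stands in for a producer thread that happens to read iHead immediately after each store.

```cpp
#include <cassert>

// Hypothetical model: iHead and iTail are positions in the list, and
// invariant_broken records whether any store ever made them equal.
struct SpeculativeConsume {
    int iHead;
    int iTail;
    bool invariant_broken;

    SpeculativeConsume(int head, int tail)
        : iHead(head), iTail(tail), invariant_broken(false) {}

    // Every store to iHead is a point where another thread could look.
    void store_head(int value) {
        iHead = value;
        if (iHead == iTail) {
            invariant_broken = true;
        }
    }

    // The speculative transformation, one store at a time.
    bool consume() {
        const int iNext = iHead + 1;
        const int temp = iHead;
        store_head(iNext);        // speculatively set to iNext
        if (iNext == iTail) {     // inverted test
            store_head(temp);     // undo if we guessed wrong
            return false;
        }
        return true;              // t = *iHead would go here
    }
};
```

With an empty queue, for example SpeculativeConsume q(0, 1), a call to q.consume() returns false and leaves iHead at 0, so the single-threaded result looks untouched. Yet invariant_broken is set, because the speculative store briefly made iHead equal to iTail.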
Can we solve these problems by writing line D before C, then separating them with a full fence? Not entirely: That will prevent most of the aforementioned optimizations, but it will not eliminate all of the problematic invented writes. More is needed.
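As a rough illustration of what "more" can look like, the sketch below uses C++11's std::atomic with explicit acquire/release ordering in a single-producer, single-consumer linked queue. It is a sketch under stated assumptions and is not presented as the fix for the original code: SpscQueue, first_, divider_, and last_ are hypothetical names, the hand-rolled list replaces the original container, T must be default-constructible for the dummy node, and exactly one producer thread and one consumer thread are assumed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-producer/single-consumer queue (illustrative names).
template <typename T>
class SpscQueue {
    struct Node {
        T value;
        Node* next;
        explicit Node(const T& v) : value(v), next(nullptr) {}
    };

    Node* first_;                 // touched only by the producer
    std::atomic<Node*> divider_;  // written by the consumer, read by the producer
    std::atomic<Node*> last_;     // written by the producer, read by the consumer

public:
    // Starts with one dummy node so divider_ always has a node to point at.
    SpscQueue() : first_(new Node(T())), divider_(first_), last_(first_) {}

    ~SpscQueue() {
        while (first_ != nullptr) {
            Node* n = first_;
            first_ = n->next;
            delete n;
        }
    }

    SpscQueue(const SpscQueue&) = delete;
    SpscQueue& operator=(const SpscQueue&) = delete;

    // Producer thread only.
    void Produce(const T& t) {
        Node* tail = last_.load(std::memory_order_relaxed);
        tail->next = new Node(t);
        // Release: the new node's contents are visible before the new tail is.
        last_.store(tail->next, std::memory_order_release);
        // Lazily free nodes the consumer has advertised as finished.
        Node* done = divider_.load(std::memory_order_acquire);
        while (first_ != done) {
            Node* n = first_;
            first_ = n->next;
            delete n;
        }
    }

    // Consumer thread only.
    bool Consume(T& result) {
        Node* head = divider_.load(std::memory_order_relaxed);
        if (head == last_.load(std::memory_order_acquire)) {
            return false;  // nothing published past the divider
        }
        Node* next = head->next;
        result = next->value;                             // copy first (the role of line D)
        divider_.store(next, std::memory_order_release);  // then advertise (the role of line C)
        return true;
    }
};
```

Because divider_ and last_ are atomic objects, the compiler may not invent or speculate stores to them, and the release store in Consume cannot be reordered ahead of the copy that precedes it. A single-threaded smoke test is enough to check the FIFO behavior: produce 1, 2, 3, then consume three times and observe the same order, with a fourth Consume returning false.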
Next Steps
These are a sample of the concurrency problems in the original code. Marginean showed a good algorithm, but the implementation is broken because it uses an inappropriate type and performs insufficient synchronization/ordering. Fixing the code will require a rewrite, because we need to change the data structure and the code to let us use proper ordered atomic lock-free variables. But how? Next month, we'll consider a fixed version. Stay tuned.
Notes
[1] H. Sutter, "The Trouble With Locks," C/C++ Users Journal, March 2005. (www.ddj.com/cpp/184401930).
[2] P. Marginean, "Lock-Free Queues," Dr. Dobb's Journal, July 2008. (www.ddj.com/208801974).
[3] B. Dawes, et al., "Thread-Safety in the Standard Library," ISO/IEC JTC1/SC22/WG21 N2669, June 2008. (www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2669.htm).