Ordering Problems in Produce
Second, reads and writes of a lock-free variable must occur in an expected order, which is nearly always the exact order they appear in the program source code. But compilers, processors, and caches love to optimize reads and writes, and will helpfully reorder, invent, and remove memory reads and writes unless you prevent it from happening. The right prevention happens implicitly when you use mutex locks or ordered atomic variables (C++0x std::atomic, Java/.NET volatile); you can also do it explicitly, but with considerably more effort, using ordered API calls (e.g., Win32 InterlockedExchange) or memory fences/barriers (e.g., Linux mb). Trying to write lock-free code without using any of these tools can't possibly work.
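As a minimal sketch of the "implicit" route mentioned above, the following uses an ordered atomic flag to publish plain data. The names Item, slot, ready, produce, consume, and run_demo are hypothetical and are not taken from the queue code discussed in this article; the sketch only illustrates that a default (sequentially consistent) std::atomic store and load keep the data writes ordered before the publishing write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type standing in for T; not from the article's queue.
struct Item {
    int a = 0;
    int b = 0;
};

Item slot;                        // plain, non-atomic data being published
std::atomic<bool> ready{false};   // ordered atomic used as the publication flag

void produce() {
    slot.a = 1;                   // A: fill in the object
    slot.b = 2;
    ready.store(true);            // B: publish; default ordering is sequentially
                                  // consistent, so A cannot be moved after B
}

int consume() {
    while (!ready.load()) {       // ordered load; pairs with the store in produce
        std::this_thread::yield();
    }
    return slot.a + slot.b;       // both writes from A are visible here
}

// Runs one producer and one consumer and returns what the consumer observed.
int run_demo() {
    slot = Item{};                // reset state before any thread starts
    ready.store(false);
    int seen = 0;
    std::thread consumer([&seen] { seen = consume(); });
    std::thread producer(produce);
    producer.join();
    consumer.join();
    assert(seen == 3);            // a partly assigned Item would fail this check
    return seen;
}
```

If ready were a plain bool instead, the program would contain a data race and the compiler or processor would be free to reorder the writes in produce.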
Consider again this code from Produce, and ignore that the assignment to iTail isn't atomic as we look for other problems:
list.push_back(t);     // A: add the new item
iTail = list.end();    // B: publish it
This is a classic publication race because lines A and B can be (partly or entirely) reordered. For example, let's say that some of the writes to the T object's members are delayed until after the write to iTail, which publishes that the new object is available; then the consumer thread can see a partly assigned T object.
What is the minimum necessary fix? We might be tempted to write a memory barrier between the two lines:
// Is this change enough?
list.push_back(t);     // A: add the new item
mb();                  // full fence
iTail = list.end();    // B: publish it
Before reading on, think about it and see if you're convinced that this is (or isn't) right.
Have you thought about it? As a starter, here's one issue: Although list.end is unlikely to perform writes, it might, and any such writes are side effects that must be complete before we publish iTail. The general issue is that you can't make assumptions about the side effects of library functions you call, and you have to make sure they're fully performed before you publish the new state. So a slightly improved version might try to store the result of list.end into a local unshared variable and assign it after the barrier:
// Better, but is it enough?
list.push_back(t);
tmp = list.end();
mb();                  // full fence
iTail = tmp;
Unfortunately, this still isn't enough. Besides the fact that assigning to iTail isn't atomic and that we still have a race on iTail in general, compilers and processors can also invent writes to iTail that break this code. Let's consider write invention in the context of another problem area: Consume.