A Limited Lock-Free Queue
Marginean's goal was to write a limited lock-free queue that can be used safely without internal or external locking. To simplify the problem, the article imposed some significant restrictions, including that the queue must only be used from two threads with specific roles: one Producer thread that inserts into the queue, and one Consumer thread that removes items from the queue.
Marginean uses a nice technique that is designed to prevent conflicts between the writer and reader:
- The producer and consumer always work in separate parts of the underlying list, so that their work won't conflict. At any given time, the first "unconsumed" item is the one after the one iHead refers to, and the last (most recently added) "unconsumed" item is the one before the one iTail refers to.
- The consumer increments iHead to tell the producer that it has consumed another item in the queue.
- The producer increments iTail to tell the consumer that another item is now available in the queue.
Only the producer thread ever actually modifies the queue. That means the producer is responsible not only for adding items to the queue, but also for removing consumed items. To maintain separation between the producer and consumer and prevent them from doing work in adjacent nodes, the producer won't clean up the most recently consumed item (the one referred to by iHead).
The idea is reasonable; only the implementation is fatally flawed. Here's the original code, written in C++ and using an STL doubly linked list<T> as the underlying data structure. I've reformatted the code slightly for presentation, and added a few comments for readability:
// Original code from [1]
// (broken without external locking)
//
template <typename T>
struct LockFreeQueue {
private:
  std::list<T> list;
  typename std::list<T>::iterator iHead, iTail;

public:
  LockFreeQueue() {
    list.push_back(T());    // add dummy separator
    iHead = list.begin();
    iTail = list.end();
  }
Produce is called on the producer thread only:
  void Produce(const T& t) {
    list.push_back(t);                 // add the new item
    iTail = list.end();                // publish it
    list.erase(list.begin(), iHead);   // trim unused nodes
  }
Consume is called on the consumer thread only:
  bool Consume(T& t) {
    typename std::list<T>::iterator iNext = iHead;
    ++iNext;
    if (iNext != iTail) {   // if queue is nonempty
      iHead = iNext;        // publish that we took an item
      t = *iHead;           // copy it back to the caller
      return true;          // and report success
    }
    return false;           // else report queue was empty
  }
};
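To see the intended bookkeeping in action, here is a minimal single-threaded walk-through. It is only a sketch: it copies the queue logic into a hypothetical SerialDemoQueue and adds a NodeCount() helper, and neither name is in the original code. Because everything runs on one thread, none of the concurrency problems arise. The sketch shows only how iHead, iTail, and the producer's lazy trimming are meant to interact.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Same logic as the original LockFreeQueue, plus a hypothetical
// NodeCount() helper (not in the original) for inspecting the list.
template <typename T>
struct SerialDemoQueue {
private:
  std::list<T> list;
  typename std::list<T>::iterator iHead, iTail;

public:
  SerialDemoQueue() {
    list.push_back(T());               // dummy separator
    iHead = list.begin();
    iTail = list.end();
  }
  void Produce(const T& t) {
    list.push_back(t);
    iTail = list.end();
    list.erase(list.begin(), iHead);   // trims everything before iHead
  }
  bool Consume(T& t) {
    typename std::list<T>::iterator iNext = iHead;
    ++iNext;
    if (iNext != iTail) {
      iHead = iNext;
      t = *iHead;
      return true;
    }
    return false;
  }
  std::size_t NodeCount() const { return list.size(); }
};

// Hypothetical record of what the walk-through observed.
struct SerialTrace {
  std::vector<int> consumed;
  bool emptyReported;
  std::size_t nodesBeforeTrim;
  std::size_t nodesAfterTrim;
};

// Produce and consume on ONE thread only, so there is no race here.
inline SerialTrace RunSerialDemo() {
  SerialTrace trace;
  SerialDemoQueue<int> q;
  int value = 0;

  q.Produce(10);
  q.Produce(20);
  while (q.Consume(value)) trace.consumed.push_back(value);

  // The queue is now logically empty, but Consume never erased anything:
  // the dummy node plus both consumed nodes are still in the list.
  trace.emptyReported = !q.Consume(value);
  trace.nodesBeforeTrim = q.NodeCount();

  // The next Produce trims every node before iHead. The node iHead
  // refers to (the most recently consumed item) is deliberately kept.
  q.Produce(30);
  trace.nodesAfterTrim = q.NodeCount();

  if (q.Consume(value)) trace.consumed.push_back(value);
  return trace;
}
```

In this serial run the items come back in FIFO order, and consumed nodes linger until the producer's next call to Produce removes them. Only the producer ever changes the list itself.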
The fundamental reason that the code is broken is that it has race conditions on both would-be lock-free variables, iHead and iTail. To avoid a race, a lock-free variable must have two key properties that we need to watch for and guarantee: atomicity and ordering. These variables have neither property.
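To make the two properties concrete, here is a small sketch of a shared variable that does have both. It assumes a C++11 compiler with std::atomic, which the original code did not use. It is not Marginean's queue, nor a repaired version of it, and the names PublishedSlots, kSlotCount, and RunPublishDemo are hypothetical. The atomic counter gives atomicity, because a reader can never observe a half-written value. The release store paired with an acquire load gives ordering, because the slot write that precedes the store is visible to any thread that sees the new count.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

constexpr std::size_t kSlotCount = 1000;   // hypothetical size for the demo

struct PublishedSlots {
  int slots[kSlotCount];                   // ordinary (non-atomic) payload
  std::atomic<std::size_t> published{0};   // how many slots are ready to read
};

// One producer thread publishes slots; one consumer thread reads them.
inline long long RunPublishDemo() {
  PublishedSlots shared;
  long long sum = 0;

  std::thread producer([&shared] {
    for (std::size_t i = 0; i < kSlotCount; ++i) {
      shared.slots[i] = static_cast<int>(i) + 1;   // write the payload first
      // Release: the slot write above cannot be reordered after this store.
      shared.published.store(i + 1, std::memory_order_release);
    }
  });

  std::thread consumer([&shared, &sum] {
    std::size_t seen = 0;
    while (seen < kSlotCount) {
      // Acquire: once we see a count, the slots below it are visible too.
      std::size_t available = shared.published.load(std::memory_order_acquire);
      if (seen == available) {
        std::this_thread::yield();   // nothing new yet
        continue;
      }
      for (; seen < available; ++seen) sum += shared.slots[seen];
    }
  });

  producer.join();
  consumer.join();
  return sum;   // safe to read: join() synchronizes with the consumer
}
```

By contrast, iHead and iTail in the original code are ordinary iterators, so neither guarantee applies to them.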