A Two-Lock Multiproducer/Consumer Queue
The queue data structure itself is a singly linked list. To make the code simpler for the empty-queue boundary case, the list always contains a dummy node at the head, and so the first
element logically in the queue is the one contained in the node after first. Figure 1 shows the layout of an empty queue, and Figure 2 shows a queue that holds some objects.
Each node holds the T
object by pointer and adds padding:
template <typename T> struct LowLockQueue { private: struct Node { Node( T* val ) : value(val), next(nullptr) { } T* value; atomic<Node*> next; char pad[CACHE_LINE_SIZE - sizeof(T*)- sizeof(atomic<Node*>)]; };
Like any shared variable, the next
pointer needs to be protected by a mutex or, as here, declared as an ordered atomic type (C++0x atomic<>
or Java/.NET volatile
). The padding here is to ensure that two Node
objects won't occupy the same cache line; in particular, in a nonempty queue, having the first and last nodes in the same cache line would penalize performance by causing invisible contention between the producer and consumer. Note that the amount of padding shown here and later on errs on the side of being too conservative: Each Node
will be allocated on the heap, and a heap allocator may add its own extra overhead to each allocated block for alignment and housekeeping information, which effectively adds extra padding. If so, and if we know how much that will be, we could reduce our internal padding proportionately to make a heap-allocated Node
exactly fill one cache line.
Next, here are our queue control variables:
char pad0[CACHE_LINE_SIZE]; // for one consumer at a time Node* first; char pad1[CACHE_LINE_SIZE - sizeof(Node*)]; // shared among consumers atomic<bool> consumerLock; char pad2[CACHE_LINE_SIZE - sizeof(atomic<bool>)]; // for one producer at a time Node* last; char pad3[CACHE_LINE_SIZE - sizeof(Node*)]; // shared among producers atomic<bool> producerLock; char pad4[CACHE_LINE_SIZE - sizeof(atomic<bool>)];
Again, we add padding to make sure hat data used by different threads stay physically separate in memory and cache. Clearly, we want the consumer-end data and the producer-end data to be on separate cache lines; but even though only one producer and one consumer will be active at a time, we want to keep the locking variable separate so that waiting consumers spinning on consumerLock
won't contend on the cache line that contains first
which the active consumer is updating, and that waiting producers spinning on producerLock
won't slow down the active producer who is updating last
.
The constructor just sets up the initial empty state, and the destructor (in .NET or Java, this would be the disposer) just walks the internal list and tears it down:
public: LowLockQueue() { first = last = new Node( nullptr ); producerLock = consumerLock = false; } ~LowLockQueue() { while( first != nullptr ) { // release the list Node* tmp = first; first = tmp->next; delete tmp->value; // no-op if null delete tmp; } }