Parallel

Writing a Generalized Concurrent Queue

By Herb Sutter, October 29, 2008

Herb tackles the general problem of supporting multiple producers and multiple consumers with as much concurrency as possible.

A Two-Lock Multiproducer/Consumer Queue

The queue data structure itself is a singly linked list. To make the code simpler for the empty-queue boundary case, the list always contains a dummy node at the head, and so the first element logically in the queue is the one contained in the node after first. Figure 1 shows the layout of an empty queue, and Figure 2 shows a queue that holds some objects.

Figure 1: Structure of an empty queue.

Figure 2: A queue containing four T objectsempty queue.

Each node holds the T object by pointer and adds padding:

template <typename T>
struct LowLockQueue {
private:
  struct Node {
    Node( T* val ) : value(val), next(nullptr) { }
    T* value;
    atomic<Node*> next;
    char pad[CACHE_LINE_SIZE - sizeof(T*)- sizeof(atomic<Node*>)];
  };

Like any shared variable, the next pointer needs to be protected by a mutex or, as here, declared as an ordered atomic type (C++0x atomic<> or Java/.NET volatile). The padding here is to ensure that two Node objects won't occupy the same cache line; in particular, in a nonempty queue, having the first and last nodes in the same cache line would penalize performance by causing invisible contention between the producer and consumer. Note that the amount of padding shown here and later on errs on the side of being too conservative: Each Node will be allocated on the heap, and a heap allocator may add its own extra overhead to each allocated block for alignment and housekeeping information, which effectively adds extra padding. If so, and if we know how much that will be, we could reduce our internal padding proportionately to make a heap-allocated Node exactly fill one cache line.

Next, here are our queue control variables:

    char pad0[CACHE_LINE_SIZE];
 
  // for one consumer at a time
  Node* first;
 
  char pad1[CACHE_LINE_SIZE
           - sizeof(Node*)];
 
  // shared among consumers
  atomic<bool> consumerLock;
 
  char pad2[CACHE_LINE_SIZE 
           - sizeof(atomic<bool>)];
 
  // for one producer at a time
  Node* last; 
 
  char pad3[CACHE_LINE_SIZE 
           - sizeof(Node*)];
 
  // shared among producers
  atomic<bool> producerLock;
 
  char pad4[CACHE_LINE_SIZE 
           - sizeof(atomic<bool>)];

Again, we add padding to make sure hat data used by different threads stay physically separate in memory and cache. Clearly, we want the consumer-end data and the producer-end data to be on separate cache lines; but even though only one producer and one consumer will be active at a time, we want to keep the locking variable separate so that waiting consumers spinning on consumerLock won't contend on the cache line that contains first which the active consumer is updating, and that waiting producers spinning on producerLock won't slow down the active producer who is updating last.

The constructor just sets up the initial empty state, and the destructor (in .NET or Java, this would be the disposer) just walks the internal list and tears it down:


public:
  LowLockQueue() {
    first = last = new Node( nullptr );
    producerLock = consumerLock = false;
  }
  ~LowLockQueue() {
    while( first != nullptr ) {      // release the list
      Node* tmp = first;
      first = tmp->next;
      delete tmp->value;       // no-op if null
      delete tmp;
    }
  }

Previous 1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Parallel

Writing a Generalized Concurrent Queue

A Two-Lock Multiproducer/Consumer Queue

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Parallel

Writing a Generalized Concurrent Queue

A Two-Lock Multiproducer/Consumer Queue

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content