The Ping-Pong Test
To compare the three approaches (NAIVE_POLLING, SLEEP, and TIMED_WAIT), I implemented a test called "Ping-Pong" that is similar to the game of table tennis (the source code is available online). In Figure 2, there are two identical queues between the threads T1 and T2. You first load one of the queues with a number of "balls," then ask each thread to read from one queue and write to the other. The result is a controlled infinite loop. By limiting the game to a fixed number of reads/writes ("shots"), you can see how the queue behaves as you vary the waiting strategy, the wait/sleep time, and the number of "balls." The faster the game, the better the performance. You should also check CPU usage to see how much of it goes to real work.
- "No ball" means "do nothing" (like two players waiting for the other to start). This gives you an idea of how good the queues are when there is no data, that is, how nervous the players are. Ideally, CPU usage should be zero.
- "One ball" is like the real ping-pong game: Each player shoots and waits for the other to shoot.
- "Two (or more) balls" means both players could shoot at the same time, modulo collision and waiting issues.
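The game itself can be sketched in a few lines. The following is only an illustration of the idea, not the article's test program: DemoQueue is a hypothetical mutex-protected stand-in for the queue under test, and Player, PlayPingPong, and GameResult are names invented for this sketch. Each player polls its input queue and returns every ball to the other queue until it has made its share of shots.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in queue (mutex-protected), NOT the article's LockFreeQueue.
class DemoQueue {
public:
    void Produce(int ball) {
        std::lock_guard<std::mutex> lock(mutex_);
        balls_.push(ball);
    }
    // Returns false immediately when the queue is empty.
    bool Consume(int& ball) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (balls_.empty()) return false;
        ball = balls_.front();
        balls_.pop();
        return true;
    }
    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return balls_.size();
    }
private:
    std::mutex mutex_;
    std::queue<int> balls_;
};

// One player: take `shots` balls from `in` and send each one to `out`.
void Player(DemoQueue& in, DemoQueue& out, long shots) {
    for (long done = 0; done < shots; ) {
        int ball = 0;
        if (in.Consume(ball)) {
            out.Produce(ball);
            ++done;
        } else {
            std::this_thread::yield();  // naive polling while the queue is empty
        }
    }
}

struct GameResult {
    double seconds;          // wall-clock duration of the game
    std::size_t balls_left;  // balls found in both queues afterwards
};

// Requires balls >= 1: with no balls the players would poll forever.
GameResult PlayPingPong(int balls, long shots_per_player) {
    DemoQueue q1, q2;
    for (int i = 0; i < balls; ++i) q1.Produce(i);  // load one queue

    const auto start = std::chrono::steady_clock::now();
    std::thread t1(Player, std::ref(q1), std::ref(q2), shots_per_player);
    std::thread t2(Player, std::ref(q2), std::ref(q1), shots_per_player);
    t1.join();
    t2.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return {elapsed.count(), q1.Size() + q2.Size()};
}
```

Calling PlayPingPong(1, 100000) plays the one-ball game; raising the first argument lets both players shoot at the same time. The number of balls is conserved, so balls_left should equal the number loaded at the start.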
In a wait-free system, the more balls in the game, the better the performance gain compared to the classic locking strategy. This is because wait-free is an optimistic concurrency control method (works best when there is no contention), while classical lock-based concurrency control is pessimistic (assumes contention happens and preemptively inserts locking).
Ready to play? Here is the Ping-Pong test command line:
$> ./pingpong [strategy] [timeout] [balls] [shots]
Running the program produces the results in the table in Figure 3:
- The best combination is timed_wait() with a small wait time (1 ms in the TIMED_WAIT test). It has a very fast response time and almost 0 percent CPU usage when the queue is empty.
- Even when the sleep time is 0 (usleep(0)), the worst seems to be the sleep() method, especially when the queue is likely to be empty. (The number of shots in this case is 100 times smaller than in the other cases because of the long duration of the game.)
- The NO_WAIT strategy is fast but behaves worst when there are no balls (100-percent CPU usage to do nothing). It has the best performance when there is a single ball.
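The difference between the three waiting policies can be sketched on the consumer side as follows. This is a minimal illustration under stated assumptions, not the code that was measured: WaitingQueue and WaitStrategy are hypothetical names, the queue is protected by a plain mutex rather than being lock-free, and std::condition_variable::wait_for stands in for timed_wait().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

enum class WaitStrategy { NoWait, Sleep, TimedWait };

// Hypothetical queue used only to contrast the waiting policies.
class WaitingQueue {
public:
    void Produce(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        not_empty_.notify_one();  // wakes a consumer blocked in TimedWait
    }

    // Tries to take one item. On an empty queue it waits according to
    // `strategy`, then returns false if the queue is still empty.
    bool Consume(int& value, WaitStrategy strategy,
                 std::chrono::microseconds wait) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (items_.empty()) {
            switch (strategy) {
            case WaitStrategy::NoWait:
                // Caller retries at once: fast, but burns CPU when idle.
                return false;
            case WaitStrategy::Sleep:
                // Sleeps for at least `wait`, even if data arrives earlier.
                lock.unlock();
                std::this_thread::sleep_for(wait);
                lock.lock();
                break;
            case WaitStrategy::TimedWait:
                // Blocks without using CPU, but wakes as soon as Produce()
                // signals, or after `wait` at the latest.
                not_empty_.wait_for(lock, wait,
                                    [this] { return !items_.empty(); });
                break;
            }
            if (items_.empty()) return false;
        }
        value = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
};
```

The sketch shows why the timed wait combines the advantages of the other two: an idle consumer blocks instead of spinning, yet a new item ends the wait immediately instead of after a full sleep interval.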
Figure 4 presents a table with the results for a classic approach (see SafeQueue). These results show that this queue is, on average, more than four times slower than the LockFreeQueue. The slowdown comes from the synchronization between threads: Produce() and Consume() each have to wait for the other to finish. CPU usage is almost 100 percent for this test (similar to the NO_WAIT strategy, but with nowhere near its performance).
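CPU usage figures like the ones quoted for these tests can be approximated from inside a program by dividing the CPU time consumed by the elapsed wall-clock time. The sketch below is one possible way to do that, not necessarily how the figures in the tables were obtained; CpuUsage, SpinFor, and SleepFor are hypothetical helpers. It assumes a POSIX-like platform where std::clock() reports the CPU time of the whole process (on some platforms it reports wall-clock time instead, which makes the ratio meaningless).

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Runs work() and returns (process CPU time) / (wall-clock time).
// Assumes std::clock() measures CPU time, as it does on POSIX systems.
template <typename Work>
double CpuUsage(Work work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const std::clock_t cpu_end = std::clock();
    const std::chrono::duration<double> wall =
        std::chrono::steady_clock::now() - wall_start;

    const double cpu =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return wall.count() > 0.0 ? cpu / wall.count() : 0.0;
}

// Busy-waits for the given duration (what a polling loop does when idle).
void SpinFor(std::chrono::milliseconds d) {
    const auto end = std::chrono::steady_clock::now() + d;
    while (std::chrono::steady_clock::now() < end) {
    }
}

// Blocks for the given duration without using the CPU.
void SleepFor(std::chrono::milliseconds d) {
    std::this_thread::sleep_for(d);
}
```

Under the assumption above, CpuUsage([] { SpinFor(std::chrono::milliseconds(50)); }) should come out close to 1, while the same call with SleepFor should be close to 0. Because the CPU time covers all threads of the process, two threads that both spin can push the ratio toward 2.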