In your application, how many independent pieces of work are ready to run at any given time? Put another way, how many cores (or hardware threads, or nodes) can your code harness to get its answers faster? And when should the answers to these questions not be "as many as possible"?
Table 1 summarizes three "orders" of throughput available in a given application or operation, using big-Oh notation to describe the amount of CPU-intensive work that will typically be ready to execute at a given time. The rest of this article surveys these orders of scalability and summarizes when each one is appropriate.
Last month [2], we looked at three "pillars" of concurrency: Pillar 1, isolation by structuring work asynchronously and communicating through messages; Pillar 2, scalable throughput by exploiting parallelism in algorithms and data structures; and Pillar 3, dealing with mutable shared state. This month's topic delves into throughput, drawing on techniques from Pillar 1 and Pillar 2.
| Order | O(1): Single-Core | O(K): Fixed | O(N): Scalable |
| --- | --- | --- | --- |
| Tagline | One thing at a time | Explicit threading | Re-enable the free lunch |
| Summary | Sequential applications, and bottlenecked parallel applications | Explicitly express how much work can be done in parallel | Express lots of latent concurrency in a way that can be efficiently mapped to N cores |
| Examples | Multithreaded code convoyed on a global lock or message queue; occasional or intermittent background work | Pipelining; hardwired division of tasks; regular or continuous background computation | Tree traversal; quicksort; compilation |
| Applicability | Single-core hardware, single-threaded OS, or non-CPU-bound app | Hardware with fixed concurrency, or app whose CPU-bound parts have limited scalability | Hardware with variable (esp. growing) concurrency, and app with CPU-bound parts that are scalably parallelizable |
| Examples | Code targeting legacy hardware, small embedded systems, single-core game consoles; simple text processor | Game targeting one multicore game console generation; code whose key operations are order-sensitive (e.g., can be pipelined but not fully parallelized) | Mainstream desktop or server software with CPU-bound features and targeting commodity hardware or future upgradeable game consoles |
| Pillar (see Notes [2]), and today's mainstream tools | | Pillar 1: Threads, message queues, futures | Pillar 2: Thread pools, futures, OpenMP |
Table 1: Orders of throughput.