The basics of parallel computing
By Blaise Barney, Lawrence Livermore National Laboratory
January 29, 2009
URL: http://drdobbs.com/introduction-to-parallel-computing-part/212903586
Flynn's Classical Taxonomy

| S I S D: Single Instruction, Single Data | S I M D: Single Instruction, Multiple Data |
|---|---|
| M I S D: Multiple Instruction, Single Data | M I M D: Multiple Instruction, Multiple Data |
Like everything else, parallel computing has its own "jargon". Some of the more commonly used terms associated with parallel computing are listed below. Most of these will be discussed in more detail later.
Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall clock execution time to increase.
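To make that cost concrete, here is a minimal sketch (assuming a C compiler with OpenMP support, e.g. `gcc -fopenmp`; not code from the original article) in which threads finish unequal amounts of work and must all wait at a barrier. The idle time of the faster threads shows up directly as extra wall-clock time.

```c
#include <omp.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    double start = omp_get_wtime();

    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        sleep(id);           /* stand-in for unequal work: thread 3 "computes" for ~3 s */
        #pragma omp barrier  /* every thread waits here until the slowest one arrives */
        if (id == 0)
            printf("all threads are past the barrier\n");
    }

    printf("elapsed: %.1f s (bounded by the slowest thread)\n",
           omp_get_wtime() - start);
    return 0;
}
```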
Observed speedup = wall-clock time of serial execution / wall-clock time of parallel execution
One of the simplest and most widely used indicators for a parallel program's performance.
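As an illustrative sketch (the workload and timing code are assumptions, not taken from the article), the same loop can be timed once serially and once in parallel, and the observed speedup computed directly from the two wall-clock times:

```c
#include <omp.h>
#include <stdio.h>

#define N 200000000L

int main(void) {
    double t0, t_serial, t_parallel;
    double sum;

    /* Serial run of a stand-in workload */
    sum = 0.0;
    t0 = omp_get_wtime();
    for (long i = 0; i < N; i++) sum += 0.5 * i;
    t_serial = omp_get_wtime() - t0;
    printf("serial   sum = %g, time = %.3f s\n", sum, t_serial);

    /* Parallel run of the same loop */
    sum = 0.0;
    t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++) sum += 0.5 * i;
    t_parallel = omp_get_wtime() - t0;
    printf("parallel sum = %g, time = %.3f s\n", sum, t_parallel);

    /* Observed speedup: serial wall-clock time over parallel wall-clock time */
    printf("observed speedup = %.2f\n", t_serial / t_parallel);
    return 0;
}
```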
Comparison of Shared and Distributed Memory Architectures

| Architecture | CC-UMA | CC-NUMA | Distributed |
|---|---|---|---|
| Examples | SMPs, Sun Vexx, DEC/Compaq, SGI Challenge, IBM POWER3 | SGI Origin, Sequent, HP Exemplar, DEC/Compaq, IBM POWER4 (MCM) | Cray T3E, Maspar, IBM SP2 |
| Communications | MPI, Threads, OpenMP, shmem | MPI, Threads, OpenMP, shmem | MPI |
| Scalability | to 10s of processors | to 100s of processors | to 1000s of processors |
| Drawbacks | Memory-CPU bandwidth | Memory-CPU bandwidth; non-uniform access times | System administration; programs are hard to develop and maintain |
| Software Availability | many 1000s of ISVs | many 1000s of ISVs | 100s of ISVs |
The machine's memory was physically distributed but appeared to the user as a single shared memory (global address space). Generically, this approach is referred to as "virtual shared memory". Note: although KSR is no longer in business, a similar implementation could well be offered by another vendor in the future.
The SGI Origin employed the CC-NUMA type of shared memory architecture, where every task has direct access to global memory. However, the ability to send and receive messages with MPI, as is commonly done over a network of distributed-memory machines, was not only implemented but also very commonly used.
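As a hedged illustration of that message-passing style (a generic MPI sketch, not code from the article), one process sends a value and another receives it, regardless of whether the underlying hardware is shared or distributed memory:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                       /* data to communicate */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Built with `mpicc` and launched with `mpirun -np 2`, the same program runs unchanged on a shared-memory SMP or across a cluster of distributed-memory nodes.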
In both cases, the programmer is responsible for determining all parallelism.
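To show what "determining all parallelism" can look like in the explicit, library-based case, here is a minimal POSIX Threads sketch (an assumed example, not from the article) in which the programmer chooses the work decomposition and creates and joins every thread by hand:

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000L

static double partial[NTHREADS];   /* one slot per thread, so no races on updates */

/* Each thread sums the slice of the index range explicitly assigned to it. */
static void *sum_slice(void *arg) {
    long id    = (long)arg;
    long begin = id * (N / NTHREADS);
    long end   = (id == NTHREADS - 1) ? N : begin + N / NTHREADS;
    double s = 0.0;
    for (long i = begin; i < end; i++) s += 0.5 * i;
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];

    /* The programmer decides the decomposition and manages every thread. */
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, sum_slice, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];
    }
    printf("total = %f\n", total);
    return 0;
}
```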
Implementations are available for most common parallel platforms.