Parallel

RapidMind: C++ Meets Multicore

By Stefanus Du Toit and Michael McCool, June 08, 2007

RapidMind is a framework for expressing data-parallel computations from within C++ and executing them on multicore processors.

Real Applications

A number of applications written using the RapidMind platform illustrate the performance gains it can deliver. One example comes from RTT AG (www.rtt.ag), a provider of automotive visualization software. RTT's RealTrace software was built using RapidMind and is the world's first real-time raytracer for workstations. It displays accurate reflections and refractions on interactive models; see Figure 3. By using GPUs to perform the complicated computations involved in raytracing, RTT was able to put a product on the market that surpasses the state of the art in raytracing performance. The same code was ported to the Cell BE processor within three weeks and shown as a demonstration in IBM's SIGGRAPH 2006 booth.

Figure 3: RTT's RealTrace application produces real-time, ray-traced images.

With quad-core CPUs now available (and larger numbers of cores on the horizon), we are preparing for the pervasiveness of multicore systems. To this end, we recently demonstrated a prototype of a RapidMind x86 multicore backend on a financial modeling application; see Figure 4. Compared to hand-tuned C code built with the Intel C++ Compiler, the RapidMind version of the algorithm achieved twice the performance on a single core, scaling to eight times the performance on four cores, with no additional work on the application. The increased performance on a single core may come as a surprise. It stems from the fact that the semantics of the platform let our code generators perform much better analysis and optimization of the application code. We have been careful to design the system so that it is not plagued by issues (such as pointer aliasing) that inhibit effective optimization of C or C++ code. At the same time, our system integrates so cleanly with C++ that all the modularity constructs of the language are available to the developer, generally at no performance cost.


Figure 4: Performance comparison between a tuned C++ implementation and a RapidMind-enabled version running on an x86 multicore backend prototype.
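
The pointer-aliasing issue mentioned above can be illustrated with a minimal sketch in plain C++. This is not RapidMind code, and the function names here are hypothetical, chosen only for illustration. In the pointer version, the compiler has to assume that the scale factor might live inside the output array, so it cannot safely keep that factor in a register across iterations. In the value version, that possibility does not exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer version: `scale` may legally point into `out`, so the compiler
// must assume each store to out[i] can change *scale and reload it.
void scale_by_pointer(float* out, const float* in, const float* scale,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * (*scale);
    }
}

// Value version: `scale` is a private copy that no store can modify, so it
// can stay in a register and the loop is easier to vectorize.
void scale_by_value(float* out, const float* in, float scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * scale;
    }
}

// Runs the pointer version with `scale` deliberately aliasing out[0].
std::vector<float> run_aliased() {
    const std::vector<float> in = {3.0f, 1.0f, 1.0f};
    std::vector<float> out = {2.0f, 0.0f, 0.0f};
    scale_by_pointer(out.data(), in.data(), &out[0], in.size());
    return out;
}

// Runs the value version on the same starting data.
std::vector<float> run_by_value() {
    const std::vector<float> in = {3.0f, 1.0f, 1.0f};
    std::vector<float> out = {2.0f, 0.0f, 0.0f};
    scale_by_value(out.data(), in.data(), out[0], in.size());
    return out;
}

// Checks that the two versions disagree once aliasing is present.
void demo() {
    const std::vector<float> aliased = run_aliased();
    const std::vector<float> by_value = run_by_value();
    // The first store overwrites the scale factor in the aliased run.
    assert(aliased[1] == 6.0f);
    assert(by_value[1] == 2.0f);
}
```

Because the two runs produce different results, a C++ compiler is not free to treat the pointer version as if it were the value version. A programming model whose semantics rule out this kind of overlap gives the code generator that freedom by construction.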

Likewise, Hewlett-Packard and RapidMind compared CPUs and GPUs on scientific algorithms such as the Fast Fourier Transform (FFT) and the Basic Linear Algebra Subprograms (BLAS) single-precision matrix multiply (SGEMM). The results showed that the RapidMind implementations on a GPU were between 2.4 and 32.2 times as fast as the same algorithms running on a CPU core. The CPU versions used highly tuned numerical libraries.
