Site Search Results
Results for: Farber
A Massively Parallel Stack for Data Allocation
A fast, constant-type memory allocator and a parallel stack are essential for initiating kernel launches from the CUDA device - Parallel
A Robust Histogram for Massive Parallelism
Preserving highly parallel performance when every thread is simultaneously trying to increment a single object - Parallel
CUDA: Unifying Host/Device Interactions with a Single C++ Macro
A general method to move data transparently between the host and the CUDA device - Parallel
Atomic Operations and Low-Wait Algorithms in CUDA
Used correctly, atomic operations can help implement a wide range of generic data structures and algorithms in the massively threaded GPU programming environment. However, incorrect usage can turn massively parallel GPUs into poorly performing sequential processors. - Parallel
Exceeding Supercomputer Performance with Intel Phi
Using MPI on inexpensive clusters of Intel Xeon Phi coprocessors can produce results that exceed the performance of today's high-end supercomputers. - Parallel
Numerical and Computational Optimization on the Intel Phi
How tuning functions for large data sets and profiling the results get most of the benefit of the Phi's 60 cores without hand-wringing and late-night hacking. - Parallel
Getting to 1 Teraflop on the Intel Phi Coprocessor
The key to truly high performance with the Phi coprocessor is to express sufficient parallelism and vector capability to fully utilize the device. Here is a timing framework that enables you to measure and optimize performance and push it past 1 teraflop. - Parallel
Programming the Xeon Phi
A series of articles on getting the best performance out of the new Intel Xeon Phi coprocessor - Parallel
Comparing OpenCL, CUDA, and OpenACC [video]
Rob Farber takes you on a tour of the paths to massively parallel x86, multi-GPU, and CPU+GPU applications. - Parallel
CUDA vs. Phi: Phi Programming for CUDA Developers
Both CUDA and Phi coprocessors provide high degrees of parallelism that can deliver excellent application performance. For the most part, CUDA programmers with existing application code have already written their software so it can run well on Phi coprocessors. However, additional work may be required to achieve the highest possible performance. - Parallel