Every successful company, maybe even every successful team, has a development process that works: what team members think of as "their" process. Adjusting that process to accommodate parallel software can have a big impact. It lets the process keep working, at the same high level of productivity you enjoy today, as you transition to multi-core.
Intel's Developer Products Division (DPD) has a process, too. It is tuned for developing parallel software and has been honed by more than ten years of work with customers, primarily in HPC domains. The DPD process works for multi-core development just as it has for other parallel processing environments. It may help to look at its approach not as a model but as a sample, something to crib from as you consider how multi-core may change your own development practices.
Intel DPD breaks its process into four steps. The steps are meant to be applied continuously, version after version, so that parallelism improves with each revision.
- Discover: Find the natural modes of parallelism in an application. Determine which problems are suited to parallel decomposition and how functions might be refactored so that different sections can run simultaneously.
- Express: Design and build an implementation structured around the parallel decomposition you've developed in the discovery step.
- Confidence: Test the implementation to build confidence that it works correctly. Feed your findings back into the earlier steps, either improving the implementation or updating the model.
- Optimize: Tune at the end of each incremental revision, once you're satisfied with the correctness and baseline performance of the implementation.
These four steps are general and can be mapped onto any development methodology. Within an agile framework, for example, the process can be carried out in a single development iteration, restricted to the sections of the software actively under development and completed within a release cycle. More generally, if you already have a refactoring process, consider adding this threading process to it as a special case.
Parallel in Every Phase
Developing for multi-core changes the structure of the process, and it brings new considerations to each step along the way. What follows are some practices to adopt (and pitfalls to avoid) in directing a multi-core development project, taken step by step.
In the discovery step, consider tools early. Look for the threading and messaging tools that best suit your application. Tools are a key element of success in developing parallel applications, and they aren't all alike, so put in the effort up front. Find the right analysis and debugging tools as well as the right threading or messaging libraries. Test components and libraries for thread safety, relying as little as possible on vendor claims.
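As a sketch of what testing for thread safety yourself can look like, the Python harness below calls an operation from many threads at once, released together by a barrier to maximize overlap, and then checks an invariant on the final state. `LockedCounter` and `stress_test` are hypothetical names standing in for your own component and harness, and the thread and iteration counts are arbitrary assumptions. A passing run does not prove thread safety, since races depend on timing; a failing run does prove a bug.

```python
import threading

class LockedCounter:
    """Hypothetical component under test: a counter guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1

def stress_test(component, operation, n_threads=8, n_iters=10_000):
    """Call operation(component) n_iters times from each of n_threads threads.

    Returns the total number of calls, so the caller can compare the
    component's final state against it.
    """
    start = threading.Barrier(n_threads)  # release all threads together

    def worker():
        start.wait()  # maximize overlap between threads
        for _ in range(n_iters):
            operation(component)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n_threads * n_iters

counter = LockedCounter()
expected = stress_test(counter, LockedCounter.increment)
# A lost update would leave the count short of the number of calls.
assert counter.value == expected, f"lost updates: {counter.value} != {expected}"
```

To probe a third-party library instead, replace `operation` with a call into that library and the final assertion with whatever invariant its documentation promises.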
Threading is a design consideration, not an optimization. Include threading and thread coordination in the discussions as you work through the project design; this helps minimize threading conflicts once you reach implementation. Where you can, prefer data decomposition over strictly functional threading. Data decomposition scales better to more cores because the data can usually be divided into as many pieces as there are cores, while a functional split is limited by the number of distinct tasks.
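The difference between data decomposition and functional threading shows up in how the work is divided. In the minimal Python sketch below, all names are hypothetical and `transform` stands in for real per-element work. Every worker runs the same code on its own contiguous slice of the input, so the degree of parallelism is simply the `n_workers` parameter; a functional split into, say, three pipeline stages could never keep more than three threads busy. One caveat: in CPython the global interpreter lock keeps CPU-bound threads from executing Python code in parallel, so this illustrates the structure rather than a speedup. `concurrent.futures.ProcessPoolExecutor` offers the same `submit` interface using separate processes, and the same pattern maps directly onto native threads in C or C++.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(x):
    """Hypothetical per-element work; any pure function of x would do."""
    return x * x + 1

def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous, nearly equal ranges."""
    base, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

def process_chunk(data, start, end):
    """The same code runs on every chunk; only the slice differs."""
    return [transform(x) for x in data[start:end]]

def data_parallel_map(data, n_workers):
    """Data decomposition: one chunk per worker, results kept in input order."""
    bounds = chunk_bounds(len(data), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_chunk, data, s, e) for s, e in bounds]
        results = []
        for f in futures:  # collect in submission order, not completion order
            results.extend(f.result())
    return results

# Scaling to more cores means raising n_workers; the code does not change.
print(data_parallel_map(list(range(10)), n_workers=4))
# [1, 2, 5, 10, 17, 26, 37, 50, 65, 82]
```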
Train application experts in threading and parallel techniques. These are the developers who will do most of the implementation in the expression step. If you're going back to thread an existing program, have the original developer of each module thread his or her own code rather than handing it to a parallel-programming expert.
The confidence step in a threaded project introduces thread-specific testing requirements. Beyond exercising the new thread-interaction scenarios, test the threaded version of the application for consistency with a single-threaded implementation, if one exists. This becomes more important as more users deploy on multi-core systems.
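Consistency with a single-threaded version is not always bit-for-bit equality. In the hedged Python sketch below, a parallel reduction sums chunks concurrently and then combines the partial sums. Because that reorders the additions, and floating-point addition is not associative, the check compares against the serial reference with a tolerance instead of `==`. The function names and the `rel_tol`/`abs_tol` defaults are assumptions to adjust per application; for integer or other exactly associative results, exact comparison remains the right test.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def serial_sum(values):
    """Single-threaded reference: adds values strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total

def parallel_sum(values, n_workers=4):
    """Sum contiguous chunks concurrently, then combine the partial sums."""
    size = max(1, math.ceil(len(values) / n_workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(serial_sum, chunks))
    return serial_sum(partials)

def consistent_with_serial(values, rel_tol=1e-9, abs_tol=1e-12):
    """True if the threaded result matches the single-threaded reference
    within tolerance. Exact equality is too strict here: chunking changes
    the order of additions, and floating-point addition is not associative."""
    return math.isclose(serial_sum(values), parallel_sum(values),
                        rel_tol=rel_tol, abs_tol=abs_tol)

assert consistent_with_serial([0.1] * 1000)
```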
Optimization is critical for the best parallel performance. Don't expect that simply adding threads will produce a dramatic performance improvement. Spend time on the back end to get the most out of the design by tuning locking, shared-memory access, cache interactions, and other performance parameters. Automated tools such as Intel's VTune and Thread Profiler can help a great deal in this part of the process.
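Lock tuning often comes down to how often threads touch shared state. The two hypothetical Python functions below compute the same count of even numbers: the first acquires a shared lock for every match, while the second counts into a thread-local variable and acquires the lock once per thread to merge. For the four chunks of 1,000 consecutive integers in the usage lines, that is 2,000 lock acquisitions versus 4. This is a sketch of the pattern, not a benchmark; in native threaded code the same change also reduces cache-line traffic on the shared counter, which is the kind of hot spot a profiler can help locate.

```python
import threading

def _run_all(worker, chunks):
    """Start one thread per chunk and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def count_even_shared_lock(chunks):
    """Takes the shared lock on every match: correct, but highly contended."""
    count = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal count
        for x in chunk:
            if x % 2 == 0:
                with lock:
                    count += 1

    _run_all(worker, chunks)
    return count

def count_even_local_then_merge(chunks):
    """Counts privately per thread and takes the lock once to merge."""
    count = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal count
        local = sum(1 for x in chunk if x % 2 == 0)  # no shared state touched
        with lock:
            count += local

    _run_all(worker, chunks)
    return count

chunks = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]
assert count_even_shared_lock(chunks) == count_even_local_then_merge(chunks) == 2000
```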
Creating a fast threaded application, or threading existing code, takes attention to parallelism in every phase. If you skip the front end of the process, you'll end up with threading bolted on as an optimization, which is less comprehensive and more likely to introduce bugs. If you skip the back end, you end up with an under-tested, under-performing implementation. Treat each round of parallel performance enhancement as a continuous task that runs through a release cycle, not as a feature that can be introduced late or back-burnered under schedule pressure.