In the Field: Commercial Software
As we did with HPC earlier in this series, let's relate some of these points to the development of a few actual business software products. We'll discuss two projects, one a revision to a voice communication and collaboration application that is now complete, and the other the ongoing threading of a large desktop productivity application. Intel engineers played, or are playing, a consulting role on these development efforts and they provided us with these brief case studies.
The voice application had an unusual goal. Instead of using multiple cores to improve performance, developers sought to distribute the workload among cores as evenly as possible. With more even core utilization, CPU clock frequency could be dropped, conserving power.
The project team included two company engineers and one Intel engineer. At the start of the project, the application was functionally threaded only, and core utilization was uneven. The discovery step began with a review of the entire system, from the driver level up, to find opportunities for parallelism. Engineers settled on applying data decomposition techniques to the software's audio codec as the best approach. In the design, pooled threads both encode and decode audio data. The number of threads in the pool is proportional to the number of cores, for even loading. The main thread pulls threads from the pool and uses them in the codec as needed, returning them to the pool when they complete their tasks. Each thread runs at the same priority, again for the most even distribution of active threads among cores.
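The pooled design described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's code: `encode_frame`, `FRAME_SIZE`, and the one-thread-per-core ratio are assumptions made for the example, and the toy codec treats frames as independent, which a real codec with inter-frame state may not allow. CPython's global interpreter lock also keeps pure-Python threads from running on several cores at once, so the sketch shows the structure of the approach rather than its power or performance behavior.

```python
import os
from concurrent.futures import ThreadPoolExecutor

FRAME_SIZE = 160  # samples per frame; hypothetical value chosen for the example


def encode_frame(frame):
    """Toy stand-in for a codec: keep the top 8 bits of each 16-bit sample."""
    return bytes((sample >> 8) & 0xFF for sample in frame)


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Data decomposition: cut the sample stream into independent frames."""
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def encode_stream(samples, pool):
    """Encode every frame on the pool; map() yields results in input order."""
    return b"".join(pool.map(encode_frame, split_into_frames(samples)))


# Pool size proportional to the core count (a factor of 1 is assumed here).
workers = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=workers) as pool:
    samples = [(i * 37) % 65536 for i in range(1600)]  # synthetic 16-bit audio
    encoded = encode_stream(samples, pool)
```

The main thread hands frames to the pool and collects the encoded output; worker threads return to the pool automatically when each frame is done. Thread priorities are left at their defaults, so all workers run at the same priority.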
As the expression step began and developers began to thread the codec, they found that a third-party library they had been using was not thread-safe. Rather than replace the entire library, which is used throughout the application, developers chose to build thread-safe replacements for only the library functions that were used in the threaded sections. That complete, developers found only a few synchronization bugs as they began to test for confidence.
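The case study does not say how the thread-safe replacements were built. One common approach, shown here only as a sketch, is to keep the same call signature and serialize access to the shared state with a lock. `LegacyCounter`, `SafeCounter`, and `hammer` are hypothetical names invented for this example.

```python
import threading


class LegacyCounter:
    """Hypothetical stand-in for a library routine that is not thread-safe:
    add() performs an unguarded read-modify-write on shared state."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        current = self.total
        self.total = current + n


class SafeCounter:
    """Replacement with the same add() call, guarded by a lock."""

    def __init__(self):
        self._inner = LegacyCounter()
        self._lock = threading.Lock()

    def add(self, n):
        with self._lock:
            self._inner.add(n)

    @property
    def total(self):
        with self._lock:
            return self._inner.total


def hammer(counter, threads=8, calls=10_000):
    """Call counter.add(1) from several threads and return the final total."""
    def work():
        for _ in range(calls):
            counter.add(1)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.total


safe_total = hammer(SafeCounter())
```

Only the functions called from threaded sections need this treatment, which matches the team's choice to leave the rest of the library alone. The cost is contention on the lock, so a replacement like this belongs outside the hottest inner loops where possible.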
Although the voice project was focused on core utilization and not application performance, developers proceeded to an optimization step after verifying that the threaded codec was working properly. They focused particularly on serial optimization within the threaded section, since that would reduce maximum core utilization. The team used Intel's VTune and Thread Profiler tools in tuning.
The second project -- at a very different stage -- is the threading of a large desktop productivity application. In this project, still in the discovery step, engineers are unraveling an inefficient threaded implementation before they can begin to look for better, more natural opportunities for both functional and data parallelism. This amounts to forensic work on the application, using VTune and a debugger to find dependencies and to map out the work done by the main thread.
The desktop application has close to 40 threads, but the main thread is doing over 95% of the work. Engineers need to first address some basic performance issues, such as reducing the time spent in system calls, before moving on to discover new opportunities for threading. The initial threading plan is to decompose the main thread, almost starting from the same point as one might with a single-threaded application.
It's hard to say how the desktop application project will proceed, but the rough schedule is for the three engineers working on the project to spend three months in further discovery (while simultaneously working on basic performance) and then the following three months on the threaded implementation.
The Development Manager's Role
In their 1937 Papers on the Science of Administration, Gulick and Urwick described a manager's role in terms of seven activities. Theirs is a seminal work, one that helped to define the discipline of management. In this series, we've focused quite a bit on how multi-core changes development process and development practice. To wrap up this installment, let's take a look at multi-core from another perspective, focusing instead on how the development of parallel software changes the manager's job. We'll break it down according to the seven functions provided by Gulick and Urwick.
- Planning. In planning a project, consider how parallelism will change your development process. Take an integrated approach to parallel programming that runs through every phase of development.
- Organizing. Organize the team according to the roles we've discussed earlier in this series. Include a lead architect who understands threading, and application experts who understand the problem domain. Include testers who can verify that parallel code works on the range of target hardware, varying the number and speed of processing cores.
- Staffing. Start with application experts and train them in parallel techniques. Attempt to develop parallel skills throughout the development team. As a goal, the overall team's expertise should be 80% in the application domain and 20% in parallelism.
- Directing. Guide the project through your development process, as modified to accommodate multi-core. Know the exit signals from each step of the process. As in the voice application case study, know when changing conditions (a library that's not thread-safe) require a change to the implementation (the development of thread-safe replacements).
- Coordinating. For large applications, you'll need to carefully coordinate the threading team's changes with those of other teams making functional or architectural changes. If you don't have internal threading expertise, coordinate your team's efforts so that they follow the technology lead of experts in your application area.
- Reporting. Parallel programming is seen as a performance activity, so you'll often be targeting a performance metric, but correctness is always important. Parallelism adds testing requirements of its own. Track whether bugs are due to parallel interactions or are present in a single thread.
- Budgeting. Budget for staff requirements discussed above. Consider the tradeoff between the software costs of buying threading components vs. the development costs of building parallelism into the application itself. The best budgeting news is on the hardware side, where multi-core is the new mainstream for desktop and laptop systems. That means that even the natural obsolescence of development and test machines will put more and more developers and testers onto the target multi-core platform.
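The Organizing and Reporting points above suggest a simple kind of check that testers can automate: run the same workload serially and then with different worker counts, and compare the results. The sketch below is an assumption about how such a check might look, not a prescribed process. `process`, `run_serial`, `run_parallel`, and `diverging_worker_counts` are hypothetical names, and varying the worker count is only a rough proxy for testing on machines with different numbers of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def process(item):
    """Hypothetical unit of work; stands in for real application code."""
    return item * item


def run_serial(items):
    """Single-threaded baseline used as the reference result."""
    return [process(x) for x in items]


def run_parallel(items, workers):
    """Same workload spread over a pool of the given size."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, items))


def diverging_worker_counts(items, worker_counts=(1, 2, 4, 8)):
    """Return the worker counts whose output differs from the serial run."""
    expected = run_serial(items)
    return [n for n in worker_counts if run_parallel(items, n) != expected]


failing = diverging_worker_counts(list(range(1000)))
```

If the serial baseline itself is wrong, the bug lives in a single thread. If only some parallel runs diverge, the bug is more likely a parallel interaction, and the list of failing worker counts gives a starting point for the report.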
Multi-core is the new mainstream for more than business customers—it's time to start putting those cores to good use in consumer software, too. In our next installment, we'll look at the management issues around the development of multi-core games and other consumer software projects.
Steve Apiki is senior developer at Appropriate Solutions, Inc., a Peterborough, NH consulting firm that builds server-based software solutions for a wide variety of platforms using an equally wide variety of tools. Steve has been writing about software and technology for over 15 years.