Optimization Process
Optimization using the compiler begins with a characterization of the application. The goal of this step is to identify properties of the code that favor one optimization over another and to help prioritize the optimizations. Applications with large working sets may benefit from optimizations for cache memory. If the application contains substantial floating-point computation, vectorization, which uses SIMD instructions to perform the same floating-point operation on multiple data elements at once, may provide a benefit.
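For example, a loop of the following form is the kind of code that characterization might flag as a vectorization candidate (a minimal C sketch; the function and array names are illustrative):

```c
#include <stddef.h>

/* A SAXPY-style loop: each iteration is independent, so a vectorizing
 * compiler can map the work onto SIMD instructions. The unit-stride
 * array accesses are also cache-friendly. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];   /* no loop-carried dependence */
    }
}
```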
The second step is to prioritize testing of compiler optimization settings based on an understanding of which optimizations are likely to provide a worthwhile performance increase. Performance runs take time and effort, so it is essential to prioritize the optimizations that are most likely to pay off and to anticipate any challenges in applying them. For example, some advanced optimizations require changes to the build environment; if you want to measure the performance of these optimizations, you must be willing to invest the time to make those changes. At the least, the effort required may lower their priority. Another example is the effect of higher optimization on debug information: in general, higher optimization levels reduce the quality of debug information. So besides measuring performance during your evaluation, consider the effects on other software-development requirements. If the debug information degrades to an unacceptable level, you may decide against using the advanced optimization, or you may investigate compiler options that improve debug information.
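As a concrete illustration of the debug-information tradeoff, consider the following sketch. The compile lines in the comment use GCC/Clang-style flags purely as an example; your compiler's equivalents may differ.

```c
#include <stdio.h>

/* At higher optimization levels this helper is typically inlined and
 * `tmp` eliminated, so a debugger may be unable to set a breakpoint in
 * the function body or display the variable's value.
 *
 * Illustrative compile lines (GCC/Clang-style flags):
 *   cc -O0 -g app.c    # full-fidelity debugging, slowest code
 *   cc -O2 -g app.c    # faster code; some variables report "optimized out"
 */
static double scale(double x)
{
    double tmp = x * 0.5;    /* candidate for elimination at -O2 and above */
    return tmp + 1.0;
}

int main(void)
{
    printf("%f\n", scale(8.0));
    return 0;
}
```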
The third step, selecting a benchmark, involves choosing a small input set for your application so that the performance of builds with different optimization settings can be compared. In selecting a benchmark, keep in mind the following criteria (a minimal timing harness illustrating them follows the list). The benchmark:
- Should produce reproducible run times, not substantially different times on every run.
- Should run quickly enough to allow many performance experiments, yet not so quickly that run-to-run variation under the same optimizations becomes significant.
- Should be representative of what your clients typically run.
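The following C sketch shows one way to time such a benchmark; `run_workload` is a placeholder for your application's benchmark kernel, and the iteration count is an arbitrary assumption sized so a run takes seconds rather than microseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Placeholder for the application's benchmark workload. The volatile
 * accumulator prevents the compiler from eliminating the loop. */
static void run_workload(void)
{
    volatile double acc = 0.0;
    for (long i = 0; i < 50 * 1000 * 1000; i++)
        acc += (double)i * 0.5;
}

/* Time one run of the workload with a monotonic wall clock. */
static double elapsed_seconds(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    run_workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* A run lasting a few seconds keeps experiments cheap while staying
     * well above the clock's resolution and run-to-run noise. */
    printf("elapsed: %.3f s\n", elapsed_seconds());
    return 0;
}
```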
The next step is to build the application using the desired optimizations, run the tests, and evaluate the performance. Run each test at least three times; our recommendation is to discard the slowest and fastest times and use the middle time as representative (a small sketch of this selection appears below). Check your results as you obtain them to see whether they match your expectations. If a performance run takes significant time, analyzing and verifying runs as they are collected lets you catch mistakes or flawed assumptions early. Finally, if the measured performance meets your performance targets, it is time to place the build changes into production. If it does not, consider applying a performance analysis tool, such as the Intel VTune Performance Analyzer, to your application code. Figure 1 summarizes the process for applying aggressive compiler optimization to application code.
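To make the middle-of-three selection concrete, here is a small C sketch; the three run times are hypothetical values standing in for measurements of the same build.

```c
#include <stdio.h>

/* Return the middle of three run times: discard the fastest and the
 * slowest, keeping the median as the representative measurement. */
static double median3(double a, double b, double c)
{
    if (a > b) { double t = a; a = b; b = t; }
    if (b > c) { double t = b; b = c; c = t; }
    if (a > b) { double t = a; a = b; b = t; }
    return b;
}

int main(void)
{
    /* Hypothetical times (in seconds) from three runs of one build. */
    double runs[3] = { 12.41, 12.07, 12.13 };
    printf("representative time: %.2f s\n",
           median3(runs[0], runs[1], runs[2]));
    return 0;
}
```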
