Comparative Benchmarks and Conclusions
Defining the GFFT object factory means instantiating all needed GFFT template classes during compilation, and each of them contains O(P) recursive template classes. A reasonable question is: how long will the compilation take? GNU C++ 4.x compiled the performance tests shown in Figures 1 and 2 in about 10 seconds with optimization flags, for the full range of values of P in the object factory, because the overall number of distinct templates to be instantiated stays small. Microsoft Visual C++ 2003 and Intel C++ 8.x and later showed similar results, although not all C++ compilers handle the recursive templates of the GFFT so efficiently. I suppose some older compilers try to instantiate template classes anew for each member function call, even when the template classes are the same, and there are O(2^P) such member function calls in the GFFT. As a result, the compilation hangs and finally fails once all memory has been exhausted. The unlucky examples are IBM C++ 6 (Visual Age) and GNU C++ 3.x, but not GNU C++ 2.96: they could compile the GFFT object factory quite quickly without optimization options, but hung with optimization turned on.
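To make the structure of the problem concrete, here is a minimal sketch, with assumed names rather than the actual GFFT sources, of the two points above: a single GFFT<P> pulls in O(P) recursively nested template classes, and the object factory instantiates GFFT<P> for every supported P at compile time. A compiler that reuses already-instantiated templates handles this cheaply; one that re-instantiates templates per member function call faces the exponential number of calls mentioned above.

```cpp
#include <complex>
#include <vector>

typedef std::complex<double> Complex;

// One class per recursion level: instantiating Recursion<P> also
// instantiates Recursion<P-1>, ..., Recursion<1>, i.e. O(P) classes.
template<unsigned P>
struct Recursion {
   Recursion<P-1> next;
   void apply(Complex* data) {
      next.apply(data);                    // two half-length transforms ...
      next.apply(data + (1u << (P-1)));
      // ... plus a combine pass (the butterflies are omitted in this sketch)
   }
};

template<>
struct Recursion<1> {                      // recursion anchor
   void apply(Complex*) { /* a length-2 butterfly would go here */ }
};

struct AbstractFFT {                       // runtime interface returned by the factory
   virtual void fft(Complex* data) = 0;
   virtual ~AbstractFFT() {}
};

template<unsigned P>
struct GFFT : public AbstractFFT {         // fully specialized transform of length 2^P
   Recursion<P> impl;
   void fft(Complex* data) { impl.apply(data); }
};

// "Object factory" initialization: this instantiates GFFT<1> ... GFFT<Max>
// during compilation, so every supported length is available at runtime.
template<unsigned Max>
struct FactoryInit {
   static void apply(std::vector<AbstractFFT*>& table) {
      FactoryInit<Max-1>::apply(table);
      table.push_back(new GFFT<Max>);
   }
};

template<>
struct FactoryInit<0> {
   static void apply(std::vector<AbstractFFT*>&) {}
};
```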
A nice property of the GFFT that distinguishes it from other well-known FFT libraries is that no runtime computation of roots of unity using sine and cosine functions is needed any more. Usual FFT implementations in known libraries such as FFTW or Intel MKL proceed in two steps:

1. Planning: an execution plan for the given transform length is set up, including the runtime computation of the roots of unity with sine and cosine calls.
2. Execution: the plan is applied to the data.
The second step can be repeated for different data of the same length. The GFFT omits the first step without any additional penalty. Figure 2 shows benchmarks of the GFFT compared with FFTW using the options FFTW_ESTIMATE and FFTW_MEASURE, as well as the routine zfft1dc from Intel MKL 7.0, where the total CPU time of both steps was measured. While very good for small P, the performance of FFTW becomes poor for large P. Only the hardware-optimized Intel MKL achieves similarly high performance to the GFFT. Qualitatively the same benchmark results were obtained on an Itanium 2 processor with the Intel C++ compiler.
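For readers unfamiliar with the two-step interface, the following is a minimal sketch using the FFTW3 C API; it illustrates only the pattern being benchmarked, not the benchmark code itself. The planning step is where a traditional library evaluates the roots of unity at runtime; the GFFT has no counterpart to it.

```cpp
#include <fftw3.h>

void transform(fftw_complex* data, int n)
{
   // Step 1: planning. The plan (including the runtime-computed roots of
   // unity) depends on the length n, not on the values stored in data.
   fftw_plan plan = fftw_plan_dft_1d(n, data, data, FFTW_FORWARD, FFTW_ESTIMATE);

   // Step 2: execution. This step can be repeated for any data of length n.
   fftw_execute(plan);

   fftw_destroy_plan(plan);

   // With the GFFT, a single call on the object obtained from the factory
   // replaces both steps (member function name assumed for illustration):
   //    gfft->fft(data);
}
```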
Finally, I would like to draw a general conclusion about the differences between the GFFT and traditional implementations. What makes the GFFT so efficient? The essential step was the assumption that P is a static constant. The reason was not simply to play with template metaprogramming, but to give additional information to the compiler, so that it could optimize the code better. Indeed, many numerical algorithms have integer parameters that vary only in a small range (like P in the FFT). Such a parameter can be assumed static and exploited as far as possible in template metaprogramming, taking care not to overload the compiler with the total number of template classes to be instantiated. The resulting implementation can be more efficient, because the compiler has more information and therefore more opportunities to optimize the code. You can use such an implementation directly with the static parameter, or compile it for all possible or needed parameter values within an object factory and use it as a library. The latter means you can choose highly optimized pieces of code without dynamic binding at runtime.
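As a generic illustration of this recommendation, independent of the FFT, consider a kernel whose small integer parameter becomes a template parameter; the names below are purely illustrative. The compiler sees a constant trip count and can unroll the loop, while a small table compiled for all supported values provides the "library" form of the same code: a single lookup selects the kernel, and the computation inside it contains no dynamic binding.

```cpp
// Illustrative sketch: a small-range integer parameter made static.
template<unsigned Order>
double evaluate(const double* c, double x)
{
   double r = c[Order];
   for (unsigned i = Order; i > 0; --i)    // trip count is a compile-time constant,
      r = r*x + c[i-1];                    // so the compiler can fully unroll the loop
   return r;
}

typedef double (*EvalFn)(const double*, double);

// "Library" form: every supported Order is compiled in advance,
// and the runtime choice is a single table lookup.
static const EvalFn eval_table[] = {
   &evaluate<0>, &evaluate<1>, &evaluate<2>, &evaluate<3>,
   &evaluate<4>, &evaluate<5>, &evaluate<6>, &evaluate<7>
};

double evaluate_runtime(unsigned order, const double* c, double x)
{
   return eval_table[order](c, x);
}
```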
This article described a C++ implementation technique using the simple, classical Cooley-Tukey FFT algorithm as an example. Of course, there are many more advanced and complicated FFT algorithms, such as the Winograd FFT, parallelized FFTs, and the like. These modern FFT approaches can also take advantage of template recursion and metaprogramming to improve performance significantly.