Converting C code to an HDL accelerator with a C-to-HDL tool is an efficient method for creating hardware coprocessors. Fig 3 and the steps listed below it summarize the C-to-HDL conversion process:
Fig 3. C-to-HDL design flow.
- Implement the application or algorithm using standard C tools. Develop a software test bench for baseline performance and correctness (host or desktop simulations). Use a profiler (such as gprof) to begin identifying critical functions.
- Determine if floating-to-fixed point conversion is appropriate. Use libraries or macros to aid in this conversion. Use a baseline test bench to analyze performance and accuracy. Use the profiler to reevaluate critical functions.
- Using a C-to-HDL tool, such as Impulse C, iterate on each of the critical functions to:
  - Partition the algorithm into parallel processes.
  - Create hardware/software process interfaces (streams, shared memories, signals).
  - Automatically optimize and parallelize the critical code sections (such as inner code loops).
  - Test and verify the resulting parallel algorithm using desktop simulation, cycle-accurate C simulation, and actual in-system testing.
- Using the C-to-HDL tool, convert the critical code segment to an HDL coprocessor.
- Attach the coprocessor to the APU interface for final testing.
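As an illustration of the floating-to-fixed point step in the flow above, the following is a minimal sketch of the kind of macros and helpers that can aid the conversion. The Q16.16 format and every name here (`fix16_t`, `FLOAT_TO_FIX`, `fix_mul`, `fix_mul_error`) are assumptions chosen for illustration; they are not part of Impulse C or of the design described in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits. */
typedef int32_t fix16_t;

#define FIX_FRAC_BITS 16
#define FIX_ONE ((fix16_t)1 << FIX_FRAC_BITS)

/* Convert with round-to-nearest; only valid while |x| < 32768. */
#define FLOAT_TO_FIX(x) ((fix16_t)((x) * FIX_ONE + ((x) >= 0 ? 0.5 : -0.5)))
#define FIX_TO_FLOAT(x) ((double)(x) / FIX_ONE)

/* Multiply in 64 bits, then rescale. Dividing by FIX_ONE (instead of
 * shifting) keeps the result well defined for negative products. */
static fix16_t fix_mul(fix16_t a, fix16_t b)
{
    int64_t wide = ((int64_t)a * (int64_t)b) / FIX_ONE;
    assert(wide >= INT32_MIN && wide <= INT32_MAX); /* overflow check */
    return (fix16_t)wide;
}

/* Absolute error of the fixed-point product against the double product.
 * A baseline test bench could track this value over real input data. */
static double fix_mul_error(double x, double y)
{
    double exact = x * y;
    double approx = FIX_TO_FLOAT(fix_mul(FLOAT_TO_FIX(x), FLOAT_TO_FIX(y)));
    double err = exact - approx;
    return err < 0 ? -err : err;
}
```

Running a helper like `fix_mul_error` over representative inputs in the baseline test bench gives a quick view of how much accuracy the chosen format gives up before any hardware is generated.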
Impulse: C-to-HDL tool
Impulse C, shown in Fig 4, enables embedded system designers to create highly parallel, FPGA-accelerated applications by using C-compatible library functions in combination with the Impulse CoDeveloper C-to-hardware compiler. Impulse C simplifies the design of mixed hardware/software applications through the use of well-defined data communication, message passing, and synchronization mechanisms. Impulse C provides automated optimization of C code (such as loop pipelining, unrolling, and operator scheduling) and interactive tools, allowing you to analyze cycle-by-cycle hardware behavior.
Fig 4. Impulse C.
Impulse C is designed for dataflow-oriented applications, but it is also flexible enough to support alternate programming models, including the use of shared memory. This is important because different FPGA-based applications have different performance and data requirements. In some applications, it makes more sense to move data between the embedded processor and the FPGA through block memory reads and writes; in other cases, a streaming communication channel might provide higher performance. The ability to quickly model, compile, and evaluate alternate algorithm approaches is an important part of achieving the best possible results for a given application.
To this end, the Impulse C library comprises minimal extensions to the C language in the form of new data types and predefined function calls. Using Impulse C function calls, you can define multiple, parallel program segments (called processes) and describe their interconnections using streams, signals, and other mechanisms. The Impulse C compiler translates and optimizes these C-language processes into either:
- Lower-level HDL that can be synthesized to FPGAs, or
- Standard C (with associated library calls) that can be compiled onto supported microprocessors through the use of widely available C cross-compilers.
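To make the process-and-stream model more concrete, here is a plain-C sketch of two processes connected by streams. It deliberately does not use the Impulse C library: the `mock_stream` type and all `mock_stream_*` functions are hypothetical stand-ins written for this example, and the processes run one after another rather than in parallel as they would in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_STREAM_DEPTH 64

/* Hypothetical FIFO standing in for a hardware/software stream. */
typedef struct {
    int32_t data[MOCK_STREAM_DEPTH];
    size_t head;   /* next slot to read */
    size_t tail;   /* next slot to write */
    size_t count;  /* number of buffered values */
    int closed;    /* set once the writer signals end of stream */
} mock_stream;

static void mock_stream_init(mock_stream *s)
{
    s->head = s->tail = s->count = 0;
    s->closed = 0;
}

/* Returns 1 on success, 0 if the FIFO is full. */
static int mock_stream_write(mock_stream *s, int32_t value)
{
    assert(!s->closed);
    if (s->count == MOCK_STREAM_DEPTH)
        return 0;
    s->data[s->tail] = value;
    s->tail = (s->tail + 1) % MOCK_STREAM_DEPTH;
    s->count++;
    return 1;
}

/* Returns 1 and stores a value, or 0 when nothing is buffered. */
static int mock_stream_read(mock_stream *s, int32_t *value)
{
    if (s->count == 0)
        return 0;
    *value = s->data[s->head];
    s->head = (s->head + 1) % MOCK_STREAM_DEPTH;
    s->count--;
    return 1;
}

static void mock_stream_close(mock_stream *s) { s->closed = 1; }

/* "Software" process: sends the values 0..n-1 into the stream.
 * n must not exceed MOCK_STREAM_DEPTH in this sequential mock. */
static void producer_process(mock_stream *out, int32_t n)
{
    for (int32_t i = 0; i < n; i++) {
        int ok = mock_stream_write(out, i);
        assert(ok);
        (void)ok;
    }
    mock_stream_close(out);
}

/* "Hardware" process: drains its input, squares each sample, and
 * forwards the result. This is the part a C-to-HDL tool would target. */
static void square_process(mock_stream *in, mock_stream *out)
{
    int32_t v;
    while (mock_stream_read(in, &v)) {
        int ok = mock_stream_write(out, v * v);
        assert(ok);
        (void)ok;
    }
    mock_stream_close(out);
}
```

The point of the structure is that each process touches only its own streams, so the same source can in principle be compiled either for the processor or for the FPGA fabric without changing how the processes communicate.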
The complete CoDeveloper development environment includes desktop simulation libraries compatible with standard C compilers and debuggers, including Microsoft Visual Studio and GCC/GDB. Using these libraries, Impulse C programmers are able to compile and execute their applications for algorithm verification and debugging purposes. C programmers are also able to examine parallel processes, analyze data movement, and resolve process-to-process communication problems using the CoDeveloper Application Monitor.
The output of an Impulse C application, when compiled, is a set of hardware and software source files that are ready for importing into FPGA synthesis tools. These files include:
- Automatically generated HDL files representing the compiled hardware process.
- Automatically generated HDL files representing the stream, signal, and memory components needed to connect hardware processes to a system bus.
- Automatically generated software components (including a run-time library) establishing the software side of any hardware/software stream connections.
- Additional files, including script files, for importing the generated application into the target FPGA place and route environment.
The result of this compilation process is a complete application, including the required hardware/software interfaces, ready for implementation on an FPGA-based programmable platform.
Design example
The Mandelbrot image shown in Fig 5, a classic example of fractal geometry, is widely used in the scientific and engineering communities to simulate chaotic events such as weather. Fractals are also used to generate textures and imaging in video-rendering applications. Mandelbrot images are described as self-similar; on magnifying a portion of the image, another image similar to the whole is obtained.
Fig 5. Mandelbrot image and code acceleration.
The Mandelbrot image is an ideal candidate for hardware/software co-design because it has a single computation-intensive function. Making this critical function faster by moving it to the hardware domain significantly increases the speed of the whole system. The Mandelbrot application also lends itself nicely to clear divisions between hardware and software processes, making it easy to implement using C-to-HDL tools.
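For readers unfamiliar with the computation, the following is a minimal software-only sketch of a Mandelbrot escape-time kernel. It is the standard textbook formulation, not the code from the design files, and the function names, the coordinate window, and the use of `double` arithmetic are assumptions made for illustration.

```c
#include <assert.h>

/* Critical function: counts iterations of z = z*z + c until |z| > 2
 * or max_iter is reached. Hypothetical name, not from the design files. */
static int mandelbrot_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;

    assert(max_iter > 0);
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        n++;
    }
    return n;
}

/* Outer loops: map each pixel to a point c in the window
 * [-2, 1] x [-1, 1] and store its iteration count. */
static void mandelbrot_render(int *out, int width, int height, int max_iter)
{
    assert(width > 1 && height > 1);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / (width - 1);
            double ci = -1.0 + 2.0 * y / (height - 1);
            out[y * width + x] = mandelbrot_iterations(cr, ci, max_iter);
        }
    }
}
```

Nearly all of the run time is spent inside the inner `while` loop, and each pixel is computed independently of the others, which is why a kernel of this shape separates cleanly into a hardware process fed by a software process.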
We used the CoDeveloper tool set as the C-to-HDL tool set for this design example. We modified a software-only Mandelbrot C program to make it compatible with the C-to-HDL tools. Our changes included division of the software project into distinct processes (independent units of sequential execution); conversion of function interfaces (hardware to software) into streams; and the addition of compiler directives to optimize the generated hardware. We subsequently used the CoDeveloper tool set to create the Pcore coprocessor that was imported into Xilinx Platform Studio (XPS). Using XPS, we attached the Pcore to the PowerPC APU controller interface and tested the system.
Xilinx Application Note XAPP901 provides a full description of the design along with design files for downloading. Meanwhile, User Guide UG096 provides a step-by-step tutorial in implementing the design example.
Performance improvement examples
We measured performance improvements for the Mandelbrot image texturing problem, an image filtering application, and triple DES encryption. The performance improvements, demonstrating acceleration ranging from 11X to 34X that of software, are documented in Table 2.
Table 2. Algorithm acceleration through coprocessor accelerators.
Conclusion
Constrained by power, space, and cost, you might need to make a non-ideal processor choice. Frequently, the chosen processor delivers lower performance than desired. When the software code does not run fast enough, a coprocessor code accelerator becomes an attractive solution. You can hand-craft an accelerator in HDL or use a C-to-HDL tool to automatically convert the C code to HDL.
Using a C-to-HDL tool such as Impulse C enables quick and easy accelerator generation. Virtex-4 FX FPGAs, with one or two embedded PowerPCs, enable tight coupling of the processor instruction pipeline to software accelerators. As demonstrated in this article, critical software routines can be accelerated from 10X to more than 30X, enabling a 300 MHz PowerPC to provide performance equaling or exceeding that of a high-performance multi-gigahertz processor. The above examples were generated in just a few days each, demonstrating the rapid design, implementation, and testing possible with a C-to-HDL flow.
Glenn Steiner is Sr. Engineering Manager, Advanced Products Division, Xilinx, Inc.
Kunal Shenoy is a Design Engineer, Advanced Products Division, Xilinx, Inc.
Dan Isaacs is Director of Embedded Processing, Advanced Products Division, Xilinx, Inc.
David Pellerin is Chief Technology Officer at Impulse Accelerated Technologies.
Editor's Note: This article first appeared in the Xilinx Embedded Magazine and is presented here with the kind permission of Xcell Publications.