The FIFO semantics of the FSL channels map naturally onto both the standard Linux software FIFO model and the streaming programming model of Impulse C. An FSL port can be opened, read, and written just like a normal file. The following simple example shows how easily a software application can interface to a hardware co-processing core through the FSL interconnect (Figure 4).

Figure 4. Simple communication between μClinux applications and Impulse C hardware using the generic FSL FIFO device driver
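As a rough illustration of the pattern that Figure 4 describes, here is a minimal C sketch of an application that opens an FSL FIFO device, writes one block of data to the hardware core, and reads the results back. It is not the original listing. The function name fsl_process_block, the helper names, and the choice of 32-bit words as the transfer unit are assumptions made for illustration; only the idea of treating the FSL port as an ordinary file comes from the text.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes or EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; an early end of stream counts as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: send nwords 32-bit words to the core behind
 * dev_path, then read nwords result words back.
 * Returns 0 on success, -1 on any I/O error.
 */
int fsl_process_block(const char *dev_path, const uint32_t *in,
                      uint32_t *out, size_t nwords)
{
    assert(in != NULL && out != NULL);

    int fd = open(dev_path, O_RDWR);   /* the FSL port opens like a file */
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write_all(fd, in, nwords * sizeof *in) != 0 ||
        read_all(fd, out, nwords * sizeof *out) != 0)
        rc = -1;

    close(fd);
    return rc;
}
```

A call might look like fsl_process_block("/dev/fslfifo0", samples, spectrum, 256), where samples and spectrum are the caller's arrays. Because the whole block is written before anything is read, this sketch assumes the block fits within the FIFO depth of the channel; larger transfers would need to interleave writes and reads.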
You can easily modify this basic structure to further exploit the parallelism available. One easy performance improvement is to overlap I/O and computation, using a double-buffering approach (Figure 5).

Figure 5. Overlapping communication and computation for greater system throughput
From these basic building blocks, you are ready to tune and optimize your application. For example, it becomes a simple matter to instantiate a second FFT core in the system, connect it to the MicroBlaze processor, and integrate it into an embedded Linux application.
An interesting benefit of the embedded Linux integration approach is that it allows developers to take advantage of all that Linux has to offer. For example, with the FFT core mapped onto FSL channel 0, we can use MicroBlaze Linux shell commands to drive and test the core:
$ cat input.dat > /dev/fslfifo0 & cat /dev/fslfifo0 > output.dat
Linux symbolic links permit us to alias the device names onto something more user-friendly:
$ ln -s /dev/fslfifo0 fft_core
$ cat input.dat > fft_core & cat fft_core > output.dat
Conclusion
Although our example demonstrates how you can accelerate a single embedded application using one FSL-attached accelerator, Xilinx Platform Studio tools also permit multiple MicroBlaze CPUs to be instantiated in the same system, on the same FPGA. By connecting these CPUs with FSL channels and employing the generic FSL device driver architecture, it becomes possible to create a small-scale, single-chip multiprocessor system with fast inter-processor communication. In such a system, each CPU may have one or more hardware acceleration modules (generated using Impulse C), providing a balanced and scalable hybrid multiprocessor architecture. The result is, in essence, a single-chip, hardware-accelerated cluster computer.
To discover what reconfigurable cluster-on-chip technology combined with C-to-hardware compilation can do for your application, visit www.petalogix.com and www.impulsec.com.
About the Authors
Dr. John Williams is CEO of PetaLogix. He can be reached at [email protected]
Dr. Scott Thibault is President of Green Mountain Computing Systems, Inc. He can be reached at [email protected]
David Pellerin is CTO of Impulse Accelerated Technologies, Inc. He can be reached at [email protected]
[Editor's Note: This article first appeared in the Xilinx Embedded Magazine and is presented here with the kind permission of Xcell Publications.]