![]() |
We impliment fixed-point matrix inversion on a Virtex-4 FPGA using a synthesizable QR-decomposition MATLAB model and the AccelDSP Synthesis tool. The resulting function occupies 12% of a XC4VSX55 device and has a 1.7 MSPS data rate.
October 12, 2006
URL:http://drdobbs.com/embedded-systems/implementing-matrix-inversions-in-fixed-/193300128
Matrix inversion is an important operation
in many state-of-the-art DSP algorithms
and implementations, including
radar, sonar, and multiple antenna systems
for communications. A common
component of these algorithms is a beamformer
or spatial filter, whose function is
to steer (in some optimal fashion) the
response of an array of sensors for the
reception of signal sources.
When using the least-squares (LS) criterion, the computation of optimum weights is based on the solution of a system of linear equations known as the deterministic normal equation. This is shown in the equation:
Here, w is a vector of beamformer weights, which can be obtained with inversion of the correlation matrix Rx as shown in the equation
From a numerical point of view, the best approach to matrix inversion is to not do it explicitly, whenever possible. Instead, it is better to solve the system of equations using an adequate solution technique.
Traditionally, implementations like this have been done with general-purpose DSP devices using floating-point arithmetic to minimize round-off error. A disadvantage of these implementations, however, is the limited processing power because of the small number of floating-point processing units commonly available per device. An appealing alternative for implementation is to use the Xilinx Virtex-4 FPGA family, which offers large amounts of parallelism. One complication with these silicon fabrics is that they are tailored for fixed-point arithmetic, and implementation in these is inherently challenging because of sensitivity to round-off error.
In this article, we'lI present an efficient
methodology that enables the implementation
of algorithms involving matrix-inversion operations
in hardware with fixed-point arithmetic.
This methodology includes three
essential steps to follow in the development
process:
Using this methodology, you can fully exploit the benefits of the processing power offered by implementations in FPGA or ASIC fixed-point hardware.
Beamforming and Matrix Inversion
Figure 1 shows a basic narrowband beamformer
with K sensor elements arranged in a
uniform linear array (ULA); this also shows
a signal source sq(t) impinging on the array
at an angle of incidence q. The K beamformer
weights (w1, w2, …, wK) are used to
linearly combine the array data observation
samples (x1(n), x2(n), ..., xK(n)). These are
set to "steer" the response of the array for
optimum reception. The output of the
beamformer is the scalar y(n).
A generalized sidelobe canceller (GSC) is a special beamformer structure that allows the use of unconstrained optimization methods in the design of the optimum beamformer weights. The structure of the GSC is shown in Figure 2. To find the optimum weights wa using the LS criterion, the following deterministic normal equation must be solved:
Here, Rx is the correlation matrix of the input to the unconstrained section of the GSC and the vector b is the cross-correlation of the input Xa and the ideal response.
One effective technique for the solution of this equation is the recursive least-squares (RLS) approximation with QR decomposition of the input data matrix. This technique finds the solution without explicit inversion of a matrix and avoids constructing the correlation matrix, explicitly reducing the dynamic range requirements of signals involved in the computations.
Figure 3 shows the diagram of an adaptive GSC beamformer that uses a QRD-RLS algorithm for a recursive solution of the normal equation.
The GSC MATLAB model consists of three parts. A top-level script generates signals and displays results to analyze the performance the beamformer. The script invokes the QRD-RLS algorithm function in a streaming fashion to perform interference cancellation. Figure 4 shows an excerpt of this script.
... % combine signal, interference and noise s = s_p + s_i + DetNoise; Wc = ones(NSensors,1); d = s*Wc; % broadside array output % blocking matrix B = [eye(NSensors-1); zeros(1,NSensors-1)] - - [zeros(1,NSensors-1); eye(NSensors-1)]; s_n = s * B; % interferer and noise only % streaming loop for recursive LS computation for n = 1:NUM_ITER [e_rls(n)] = qrd_rls_spatial(s_n(n,:),d(n)); end ... |
The second part of the model is a synthesizable QRD-RLS algorithm function, qrd_rls_spatial(), which performs optimum cancellation of the interferer signal. This function is shown in Figure 5.
function [e] = qrd_rls_spatial(xa,d) % QRD-RLS for adaptive spatial filtering M = 3; persistent R p if isempty(R) R = zeros(1,M*M); R(1:M:end) = 0.01; p = zeros(1,M); end Ci = 1.0; lambda = sqrt(0.99); % update R matrix and p vector for row = 1:M xvec = lambda*[R(1:M) p(1) Ci]; yvec = [xa d 0]; xvec_rot,yvec_rot] = givens_rotation(xvec,yvec); R = [R(M+1:end) xvec_rot(1:M)]; xa = [yvec_rot(2:M) 0]; p = [p(2:end) xvec_rot(M+1)]; d = yvec_rot(M+1); Ci = xvec_rot(end); end e = Ci*d; % recursive error estimate |
The last part of the GSC model is the synthesizable function that rotates arrays of values to perform orthogonal Givens rotations (givens_rotation). This rotation function can be automatically generated by the AccelDSP Synthesis tool. This function is shown in Figure 6.
function [v, w] = givens_rotation( x, y ) % Givens rotation r_sqr = x(1)^2 + y(1)^2; r_inv = invsqrt_001(r_sqr); sin_phi = y(1)*r_inv; cos_phi = x(1)*r_inv; v = x*cos_phi + y*sin_phi; w = y*cos_phi - x*sin_phi; |
The analysis and visualization of the performance of the GSC model is shown with plots in Figure 7. The signal of interest is shown on the top plot.
The middle plot shows the signal from the data-independent portion of the GSC. This signal clearly shows the distortion caused by the interferer signal impinging on the sensor array.
The bottom plot shows the output of the GSC after effective cancellation of the interferer done by the QRD-RLS algorithm function.
Figure 8 shows beampatterns of the GSC. The top plot shows the beampattern of the data-independent portion of the GSC. This shows that the interferer signal impinging at 10° suffers an attenuation of only 2 dB approximately relative to that of the desired signal at 0°; this small attenuation is what causes the distortion in the received signal from the broadside.
The middle plot shows the overall GSC beampattern. The improvement in the cancellation of the interfering signal can be seen with the larger attenuation at 10°. This is what accounts for the cancellation of the interferer signal obtained at the output of the GSC.
The bottom plot is a zoomed view of
the overall GSC beampattern to highlight
the attenuation achieved around 10°.Definition of the Fixed-Point Model
The starting point of the design is the
original "golden" reference MATLAB
model of the GSC. The next step is to
define a fully parameterized fixed-point
arithmetic model. This model is directly
coupled to the original MATLAB model
to maintain lockstep with this golden reference.
There are two critical aspects for
efficiency in this step:
Defining a fully parameterized fixedpoint arithmetic model is an iterative process. In the case of the GSC with QRD-RLS, the numerical performance of the implicit matrix inversion operation is measured by the attenuation shown in the overall beampattern. We evaluated several input bit-widths with the intermediate variables sized to avoid overflows—the effect on the attenuation in the beampattern is shown in Figure 9.
We selected a 16-bit implementation that achieves an almost ideal interference rejection, as shown in Figure 9.
Generation of Hardware Implementation
The final step in our methodology is to
generate the hardware implementation.
There are two critical aspects to achieve
efficiency:
The generation of a suitable hardware
implementation is also done iteratively
to balance resource utilization
and speed of operation. For the QRD-RLS
algorithm, there are two points
where the area/speed of the implementation
can be affected:
We performed iterations of this step exploring different combinations of resource utilization and speed of operation of the QRD-RLS function. The results of RTL synthesis, summarized in Figure 10, are for a Xilinx Virtex-4 XC4VSX55 target device.
Metric | Multiplier Parallel | Multiplier Sequential | CORDIC Parallel | CORDIC Sequential |
---|---|---|---|---|
LUTs | 5% | 4% | 10% | 9% |
DSP48s | 51 | 19 | 1 | 1 |
Sustainable Data Rate | 1.9 MSPS | 0.8 MSPS | 1.8 MSPS | 0.18 MSPS |
The results indicate a small decrease in resource utilization (LUTs) with sequential implementations. With a goal of maximum speed of operation and a minimum use of hardware multipliers, the CORDIC parallel implementation was picked for the place and route of the netlist from RTL synthesis. The results of the implementation using the Xilinx ISE software mapped to the same target device as during synthesis are shown in Figure 11.
Occupied Slices | 3076 (12%) |
DSP48s | 1 |
Sustainable Data Rate | 1.7 MSPS |
The hardware implementation results show that the QRD-RLS function can be implemented in 12% of the logic resources of a XC4VSX55 device with a sustainable data rate of 1.7 megasamples per second.
Conclusion
You can create an efficient hardware implementation of DSP algorithms in Xilinx FPGAs using matrix inversion operations with fixed-point arithmetic. The efficiency with which you can implement these algorithms is based on the use of the AccelDSP Synthesis tool to enable a high level of automation. The results show the effectiveness of this methodology in the implementation of a challenging algorithm in fixed-point arithmetic hardware.
To get more information about Xilinx DSP solutions, visit www.xilinx.com/dsp/.
About the authors
Ramon Uribe is a Sr. Principal IP Development Engineer at Xilinx. His focus is on the development of linear algebra intellectual property models for DSP applications. Prior to Xilinx, Ramon worked in design and development of DSP algorithms and applications at Bell Laboratories, Motorola and Tellabs. He received a BSEE degree from the University of Illinois at Chicago, and a MSEE from Stanford University. He can be reached at [email protected] .
Tom Cesear is a Principal Engineer with the DSP Engineering Group managing the North American IP development team at Xilinx. Prior to Xilinx, Tom was the Chief Scientist and Director of IP at AccelChip. He has more than 20 years of entrepreneurial experience in all phases of marketing, strategic business development, and high-speed DSP design at companies such as bit-tru, Centerpoint Broadband Technologies, Mentor Graphics, dQdt and Hughes Aircraft Company. Dr. Cesear holds a Ph.D. and M.S. in Electrical Engineering from UCSD, and a B.S. in Electrical Engineering from Purdue University. He can be reached at [email protected].
Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.