Dr. Dobb's | Implementing Matrix Inversions in Fixed-point Hardware

Implementing Matrix Inversions in Fixed-point Hardware

We impliment fixed-point matrix inversion on a Virtex-4 FPGA using a synthesizable QR-decomposition MATLAB model and the AccelDSP Synthesis tool. The resulting function occupies 12% of a XC4VSX55 device and has a 1.7 MSPS data rate.

October 12, 2006
URL:http://drdobbs.com/embedded-systems/implementing-matrix-inversions-in-fixed-/193300128

Matrix inversion is an important operation in many state-of-the-art DSP algorithms and implementations, including radar, sonar, and multiple antenna systems for communications. A common component of these algorithms is a beamformer or spatial filter, whose function is to steer (in some optimal fashion) the response of an array of sensors for the reception of signal sources.

When using the least-squares (LS) criterion, the computation of optimum weights is based on the solution of a system of linear equations known as the deterministic normal equation. This is shown in the equation:

R_x w = p

Here, w is a vector of beamformer weights, which can be obtained with inversion of the correlation matrix R_x as shown in the equation

w = R_x^-1 p

From a numerical point of view, the best approach to matrix inversion is to not do it explicitly, whenever possible. Instead, it is better to solve the system of equations using an adequate solution technique.

Traditionally, implementations like this have been done with general-purpose DSP devices using floating-point arithmetic to minimize round-off error. A disadvantage of these implementations, however, is the limited processing power because of the small number of floating-point processing units commonly available per device. An appealing alternative for implementation is to use the Xilinx Virtex-4 FPGA family, which offers large amounts of parallelism. One complication with these silicon fabrics is that they are tailored for fixed-point arithmetic, and implementation in these is inherently challenging because of sensitivity to round-off error.

In this article, we'lI present an efficient methodology that enables the implementation of algorithms involving matrix-inversion operations in hardware with fixed-point arithmetic. This methodology includes three essential steps to follow in the development process:

Capturing the DSP algorithm description in the MATLAB language
Definition of the fixed-point parameters directly coupled to the MATLAB algorithm description
Automated generation of a hardware implementation that is bit-accurate to the fixed-point arithmetic model and that meets area/speed requirements for a particular application

Using this methodology, you can fully exploit the benefits of the processing power offered by implementations in FPGA or ASIC fixed-point hardware.

Beamforming and Matrix Inversion
Figure 1 shows a basic narrowband beamformer with K sensor elements arranged in a uniform linear array (ULA); this also shows a signal source s_q(t) impinging on the array at an angle of incidence q. The K beamformer weights (w₁, w₂, …, w_K) are used to linearly combine the array data observation samples (x₁(n), x₂(n), ..., x_K(n)). These are set to "steer" the response of the array for optimum reception. The output of the beamformer is the scalar y(n).

1.Narrowband beamformer

A generalized sidelobe canceller (GSC) is a special beamformer structure that allows the use of unconstrained optimization methods in the design of the optimum beamformer weights. The structure of the GSC is shown in Figure 2. To find the optimum weights w_a using the LS criterion, the following deterministic normal equation must be solved:

R_x w_a = b

Here, R_x is the correlation matrix of the input to the unconstrained section of the GSC and the vector b is the cross-correlation of the input X_a and the ideal response.

2. Generalized sidelobe canceller (GSC)

One effective technique for the solution of this equation is the recursive least-squares (RLS) approximation with QR decomposition of the input data matrix. This technique finds the solution without explicit inversion of a matrix and avoids constructing the correlation matrix, explicitly reducing the dynamic range requirements of signals involved in the computations.

Figure 3 shows the diagram of an adaptive GSC beamformer that uses a QRD-RLS algorithm for a recursive solution of the normal equation.

3. Adaptive GSC beamformerExample Beamformer Model for Hardware Synthesis
The GSC beamformer MATLAB model we used for the design includes the following features:

A ULA array of four sensor elements
A narrowband input signal of interest, impinging at an angle of 0°
A narrowband interfering signal impinging at an angle of 10°, with the same amplitude as the signal of interest
Uncorrelated white noise to model receiver noise at a level of -20 dB relative to the signal of interest

The GSC MATLAB model consists of three parts. A top-level script generates signals and displays results to analyze the performance the beamformer. The script invokes the QRD-RLS algorithm function in a streaming fashion to perform interference cancellation. Figure 4 shows an excerpt of this script.

... % combine signal, interference and noise s = s_p + s_i + DetNoise; Wc = ones(NSensors,1); d = s*Wc; % broadside array output % blocking matrix B = [eye(NSensors-1); zeros(1,NSensors-1)] - - [zeros(1,NSensors-1); eye(NSensors-1)]; s_n = s * B; % interferer and noise only % streaming loop for recursive LS computation for n = 1:NUM_ITER [e_rls(n)] = qrd_rls_spatial(s_n(n,:),d(n)); end ...

4. GSC MATLAB top-level script

The second part of the model is a synthesizable QRD-RLS algorithm function, qrd_rls_spatial(), which performs optimum cancellation of the interferer signal. This function is shown in Figure 5.

function [e] = qrd_rls_spatial(xa,d) % QRD-RLS for adaptive spatial filtering M = 3; persistent R p if isempty(R) R = zeros(1,M*M); R(1:M:end) = 0.01; p = zeros(1,M); end Ci = 1.0; lambda = sqrt(0.99); % update R matrix and p vector for row = 1:M xvec = lambda*[R(1:M) p(1) Ci]; yvec = [xa d 0]; xvec_rot,yvec_rot] = givens_rotation(xvec,yvec); R = [R(M+1:end) xvec_rot(1:M)]; xa = [yvec_rot(2:M) 0]; p = [p(2:end) xvec_rot(M+1)]; d = yvec_rot(M+1); Ci = xvec_rot(end); end e = Ci*d; % recursive error estimate

5. Synthesizable QRD-RLS function

The last part of the GSC model is the synthesizable function that rotates arrays of values to perform orthogonal Givens rotations (givens_rotation). This rotation function can be automatically generated by the AccelDSP Synthesis tool. This function is shown in Figure 6.

function [v, w] = givens_rotation( x, y ) % Givens rotation r_sqr = x(1)^2 + y(1)^2; r_inv = invsqrt_001(r_sqr); sin_phi = y(1)*r_inv; cos_phi = x(1)*r_inv; v = x*cos_phi + y*sin_phi; w = y*cos_phi - x*sin_phi;

6. Givens rotation function

The analysis and visualization of the performance of the GSC model is shown with plots in Figure 7. The signal of interest is shown on the top plot.

(Click to enlarge)
7. GSC time-domain signals

The middle plot shows the signal from the data-independent portion of the GSC. This signal clearly shows the distortion caused by the interferer signal impinging on the sensor array.

The bottom plot shows the output of the GSC after effective cancellation of the interferer done by the QRD-RLS algorithm function.

Figure 8 shows beampatterns of the GSC. The top plot shows the beampattern of the data-independent portion of the GSC. This shows that the interferer signal impinging at 10° suffers an attenuation of only 2 dB approximately relative to that of the desired signal at 0°; this small attenuation is what causes the distortion in the received signal from the broadside.

(Click to enlarge)
8. GSC beampatterns

The middle plot shows the overall GSC beampattern. The improvement in the cancellation of the interfering signal can be seen with the larger attenuation at 10°. This is what accounts for the cancellation of the interferer signal obtained at the output of the GSC.

The bottom plot is a zoomed view of the overall GSC beampattern to highlight the attenuation achieved around 10°.Definition of the Fixed-Point Model
The starting point of the design is the original "golden" reference MATLAB model of the GSC. The next step is to define a fully parameterized fixed-point arithmetic model. This model is directly coupled to the original MATLAB model to maintain lockstep with this golden reference. There are two critical aspects for efficiency in this step:

The ability to intuitively associate fixed-point parameters with variables in the MATLAB algorithm description
The ability to quickly evaluate the effects of the fixed-point arithmetic on the overall performance of the algorithm

Defining a fully parameterized fixedpoint arithmetic model is an iterative process. In the case of the GSC with QRD-RLS, the numerical performance of the implicit matrix inversion operation is measured by the attenuation shown in the overall beampattern. We evaluated several input bit-widths with the intermediate variables sized to avoid overflows—the effect on the attenuation in the beampattern is shown in Figure 9.

(Click to enlarge)
9. GSC beampattern in fixed point

We selected a 16-bit implementation that achieves an almost ideal interference rejection, as shown in Figure 9.

Generation of Hardware Implementation
The final step in our methodology is to generate the hardware implementation. There are two critical aspects to achieve efficiency:

The ability to automatically generate an implementation that is bit-accurate against the fixed-point model of the DSP algorithm
The ability to tailor the hardware architecture of the implementation to meet area/speed requirements

The generation of a suitable hardware implementation is also done iteratively to balance resource utilization and speed of operation. For the QRD-RLS algorithm, there are two points where the area/speed of the implementation can be affected:

Controlling the degree of resource sharing of the givens_rotation function
The rotation of row elements in the Givens rotation function can be achieved with different computation styles, including Newton-Raphson (using multipliers) and CORDIC (multiplier-less) microarchitectures

We performed iterations of this step exploring different combinations of resource utilization and speed of operation of the QRD-RLS function. The results of RTL synthesis, summarized in Figure 10, are for a Xilinx Virtex-4 XC4VSX55 target device.

Metric	Multiplier Parallel	Multiplier Sequential	CORDIC Parallel	CORDIC Sequential
LUTs	5%	4%	10%	9%
DSP48s	51	19	1	1
Sustainable Data Rate	1.9 MSPS	0.8 MSPS	1.8 MSPS	0.18 MSPS

10. RTL synthesis results

The results indicate a small decrease in resource utilization (LUTs) with sequential implementations. With a goal of maximum speed of operation and a minimum use of hardware multipliers, the CORDIC parallel implementation was picked for the place and route of the netlist from RTL synthesis. The results of the implementation using the Xilinx ISE software mapped to the same target device as during synthesis are shown in Figure 11.

Occupied Slices	3076 (12%)
DSP48s	1
Sustainable Data Rate	1.7 MSPS

11. Implementation results

The hardware implementation results show that the QRD-RLS function can be implemented in 12% of the logic resources of a XC4VSX55 device with a sustainable data rate of 1.7 megasamples per second.

Conclusion
You can create an efficient hardware implementation of DSP algorithms in Xilinx FPGAs using matrix inversion operations with fixed-point arithmetic. The efficiency with which you can implement these algorithms is based on the use of the AccelDSP Synthesis tool to enable a high level of automation. The results show the effectiveness of this methodology in the implementation of a challenging algorithm in fixed-point arithmetic hardware. To get more information about Xilinx DSP solutions, visit www.xilinx.com/dsp/.

About the authors
Ramon Uribe is a Sr. Principal IP Development Engineer at Xilinx. His focus is on the development of linear algebra intellectual property models for DSP applications. Prior to Xilinx, Ramon worked in design and development of DSP algorithms and applications at Bell Laboratories, Motorola and Tellabs. He received a BSEE degree from the University of Illinois at Chicago, and a MSEE from Stanford University. He can be reached at [email protected] .

Tom Cesear is a Principal Engineer with the DSP Engineering Group managing the North American IP development team at Xilinx. Prior to Xilinx, Tom was the Chief Scientist and Director of IP at AccelChip. He has more than 20 years of entrepreneurial experience in all phases of marketing, strategic business development, and high-speed DSP design at companies such as bit-tru, Centerpoint Broadband Technologies, Mentor Graphics, dQdt and Hughes Aircraft Company. Dr. Cesear holds a Ph.D. and M.S. in Electrical Engineering from UCSD, and a B.S. in Electrical Engineering from Purdue University. He can be reached at [email protected].