Comparing Fortran 90 and C++ Valarrays

By Shyam Bhat and B. Arun, March 01, 1999

Template class valarray was added to the Standard C++ Library to attract numerical programmers. It appears to be working.

March 1999/Comparing Fortran 90 and C++ Valarrays

Introduction

Finite Element Analysis is extensively used in Engineering and Scientific applications, and routinely generates very large systems of equations. These are usually solved using a variation of Gaussian Elimination, and almost always in Fortran. We have tested an alternative implementation using the C++ STL (Standard Template Library), based on slices and valarrays. In this article we compare the solution times between C++ and Fortran 90. We found C++ to be faster.

Numerical Background

In the FEM (Finite Element Method), the system is modeled using a stiffness matrix formulation. This matrix represents the relationship between displacements at nodes of elements, and applied loads at these nodes, based on properties of elements. A variety of 2-D and 3-D elements may be used in combination to model the system. Based on the modeling, up to six equations are generated per node for static analysis. With thousands of nodes, the total number of equations runs into tens of thousands or more.

These equations are solved using variants of the Guassian elimination method. The unknowns being solved for are the node displacements, given a set of applied loads.

A system of n equations can be expressed as a single matrix equation:

v * disp = rhs

where v is an n x n square stiffness matrix; disp is an n-length vector of unknown displacements; and rhs is an n-vector of applied forces. We consider the cases (which are the majority) where v is symmetric, and banded. A banded matrix is one for which all non-zero elements are found in diagonal bands near the main diagonal. Due to symmetry, only the upper triangular matrix must be stored; and due to bandedness, only a band portion of this triangular matrix must be stored, since the zero valued items beyond the bands do not contribute to the solution.

We use a variation on the technique for storing banded matrices. We store v as a one-dimensional vector, made up of sub-vectors. Each of these sub-vectors is a taken from a column of the matrix, starting from the first non-zero element of each column, down to the diagonal. (This method is known as "Skyline" storage. Interested readers may refer to extensive literature [1-5].)

Matrix v can be decomposed as L * diag * Lt, where L is the lower triangle, diag is a matrix whose diagonal is the same as v's and whose other elements are zero, and Lt is the transpose of L. The matrix L will also be stored using the skyline storage method; in fact the elements of v will be replaced by those of L after decomposition. Once the decomposition has been carried out, the solution for a given force can be carried out by a back substitution process. Back substitution is particularly easy since L is triangular. However, accessing the elements of L is complicated by L's representation as a one-dimensional vector.

The pseudocode for the decomposition algorithm appears in Figure 1. The pseudocode for the back substitution used in solving L * diag * Lt * disp = rhs is shown in Figure 2. (Note that the pseudocode assumes that arrays start at index 1, not index 0 as in C/C++.) The Fortran 90 and C++ implementations are described below. The Fortran and C++ source code is available on the CUJ ftp site in this month's download (see p. 3 for instructions).

Fortran 90 Implementation

We implemented the algorithm in Fortran 90 with the Microsoft Fortran PowerStation v. 4.0 compiler, and we used most of the new Fortran facilities provided. We performed all computations using double precision. Some of the salient features are as follows:

We used modules with allocatable arrays for the matrices. This also localized the storage and access of vectors.
We used pointers, to minimize copying and extraction. Both scalar and array pointers have been used.
We used slice arrays for compact/optimized numerical operations. Thus, in subroutine ldldecom, we use

      ptwk1vc = ptsk1vc / diag(nj1 : j-1);

ptwk1vc

ptsk1vc

nj1:j-1

We used intrinsic functions, such as dot_product, with slice vectors, for very compact and optimal inner loops (where computation effort is 90% of total).
We stored the matrix v as a one-dimensional vector, indexing it via two auxiliary integer vectors. Vector nrow locates the starting row of each column (as per the skyline storage);

Vector ncol locates the position in v of each column start row. The mapping of (i,j) is v[ncol[j] + i - nrow[j]].

We used full optimization on the Fortran compiler, with maximize speed option, under the release compilation. The use of slices provides for an elegant notation in Fortran, which is unmatched by C++. However, the overall organization of Fortran leaves much to be desired for this task.

C++ Implementation

We used the Microsoft Visual C++ 5.0 compiler, with full optimization and maximize speed enabled.

The C++ STL provides valarrays to facilitate numerical computations. Valarrays are supposed to support aggressive optimizations (which is, of course, compiler dependent) for numerical computations. Valarrays also support use of slices, so we were able to simulate use of slices in Fortran. Together with C++ classes, this provided for elegant and optimal coding.

The two-dimensional stiffness matrix v is stored as a one dimensional valarray<double> denoted as v. The diagonal of the matrix is stored in a separate valaray<double> as diag, for faster access. Both v and diag use column major order. The start of each column is stored in a vector<Sliceiter> of slice iterators, to provide faster access to individual columns. This storage method also takes advantage of the jagged (skyline) shape of the matrix.

We have implemented slice iterators (Sliceiter) along the lines of STL iterators, albeit in simpler fashion. Since the stride is always one, the iterator stores only the start position. A reference to the valarray<double> representing v is also stored in the iterator, for fast lookup. The use of Sliceiter provides a clean interface to the user, hiding the details of mapping the two-dimensional matrix to the compact skyline storage scheme used.

Typical Program Flow

The program flow is as follows

1. Read number of columns — ndf.

2. For each column, read number of elements per column and a list of the elements in the column. These are read into a vector<double>.

3. Compute the size of the valarray v, based on cumulative column sizes.

4. Allocate the main array, and set up the auxiliary Sliceiter array for each column.

5. Perform forward decomposition of the matrix. This step overwrites v so as to conserve space.

6. Read in any number of force vectors, and solve for displacements. These displacements are overwritten on the force vectors.

Comparison between Fortran and C++

All programs were run on a Pentium 150 MHz PC, with 48 MB of EDO RAM, and 100 MB of free disk space, on Microsoft NT Workstation 4.0. Time was measured using the _ftime runtime functions. We tested four problems, and found that the C++-based program typically ran 5% faster than Fortran. This could definitely be improved further. The solution times are in milliseconds, and are shown for the decomposition and back substitution functions only (see Table 1). Typically, back substitution takes only 3% of the time needed for decomposition. The length of v is also shown. The length is greatest for Problem 1, consisting of 520,000 items. Since we used double precision, this required 48 MB of memory. We have restricted problems to this size so as to avoid any disk paging by the OS, which would lead to incorrect evaluation of performance.

Conclusions

C++ is well suited for FEM solution algorithms. Very elegant implementations are possible, which are also faster. The entire FEM modeling process, including element representation, equation generation, solving, and storing results, involves object-oriented processes. FEM lends itself to C++ very well. It is time to make a shift from Fortran to C++, both to take advantage of the latest development systems, and for the sake of easy maintenance and upgrading of code.

References

[1] A. Jennings. "A Compact Storage Scheme for the Solution of Symmetric Linear Simultaneous Equations," Computer Journal #9, 1966, pp. 281-285.

[2] E. L. Wilson, K. J. Bathe, and W. P. Doherty. "Direct Solution of Large Systems of Linear Equations," Computers and Structures #4, 1974, pp. 363-372.

[3] R. J. Taylor. "Solution of Linear Equations by a Profile Solver," Engineering Computations #2, 1985, pp. 344-350.

[4] E. L. Wilson. "Static Condensation Algorithm," International Journal for Numerical Methods in Engineering #8, 1974, pp. 199-203.

[5] P. V. Marcal (editor). "General Purpose Finite Element Computer Programs," Proceedings of Seminar of the A. S. M. E., Winter Annual Meeting, New York, November 1970.

[6] Bjarne Stroustrup. The C++ Programming Language, Third Edition (Addision-Wesley, 1997). See Chapter 22, "Numerics."

Shyam S. Bhat has a Bachelors Degree, and ten years' experience in the Civil Engineering field. He has six years' experience as a software developer, specializing in computer-based analysis, design, and engineering drawings. He has six years' experience in C++ and 18 years' in Fortran.

B. Arun has a Masters Degree in Civil Engineering. He has had two years' experience in C++ and six years' in Fortran.

Both authors work for IMI Software in Hyderabad, India.

1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Comparing Fortran 90 and C++ Valarrays

Introduction

Numerical Background

C++ Implementation

Typical Program Flow

Comparison between Fortran and C++

Conclusions

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Comparing Fortran 90 and C++ Valarrays

Introduction

Numerical Background

C++ Implementation

Typical Program Flow

Comparison between Fortran and C++

Conclusions

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content