Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Comparing Fortran 90 and C++ Valarrays


March 1999/Comparing Fortran 90 and C++ Valarrays


Introduction

Finite Element Analysis is extensively used in Engineering and Scientific applications, and routinely generates very large systems of equations. These are usually solved using a variation of Gaussian Elimination, and almost always in Fortran. We have tested an alternative implementation using the C++ STL (Standard Template Library), based on slices and valarrays. In this article we compare the solution times between C++ and Fortran 90. We found C++ to be faster.

Numerical Background

In the FEM (Finite Element Method), the system is modeled using a stiffness matrix formulation. This matrix represents the relationship between displacements at nodes of elements, and applied loads at these nodes, based on properties of elements. A variety of 2-D and 3-D elements may be used in combination to model the system. Based on the modeling, up to six equations are generated per node for static analysis. With thousands of nodes, the total number of equations runs into tens of thousands or more.

These equations are solved using variants of the Guassian elimination method. The unknowns being solved for are the node displacements, given a set of applied loads.

A system of n equations can be expressed as a single matrix equation:

v * disp = rhs

where v is an n x n square stiffness matrix; disp is an n-length vector of unknown displacements; and rhs is an n-vector of applied forces. We consider the cases (which are the majority) where v is symmetric, and banded. A banded matrix is one for which all non-zero elements are found in diagonal bands near the main diagonal. Due to symmetry, only the upper triangular matrix must be stored; and due to bandedness, only a band portion of this triangular matrix must be stored, since the zero valued items beyond the bands do not contribute to the solution.

We use a variation on the technique for storing banded matrices. We store v as a one-dimensional vector, made up of sub-vectors. Each of these sub-vectors is a taken from a column of the matrix, starting from the first non-zero element of each column, down to the diagonal. (This method is known as "Skyline" storage. Interested readers may refer to extensive literature [1-5].)

Matrix v can be decomposed as L * diag * Lt, where L is the lower triangle, diag is a matrix whose diagonal is the same as v's and whose other elements are zero, and Lt is the transpose of L. The matrix L will also be stored using the skyline storage method; in fact the elements of v will be replaced by those of L after decomposition. Once the decomposition has been carried out, the solution for a given force can be carried out by a back substitution process. Back substitution is particularly easy since L is triangular. However, accessing the elements of L is complicated by L's representation as a one-dimensional vector.

The pseudocode for the decomposition algorithm appears in Figure 1. The pseudocode for the back substitution used in solving L * diag * Lt * disp = rhs is shown in Figure 2. (Note that the pseudocode assumes that arrays start at index 1, not index 0 as in C/C++.) The Fortran 90 and C++ implementations are described below. The Fortran and C++ source code is available on the CUJ ftp site in this month's download (see p. 3 for instructions).

Fortran 90 Implementation

We implemented the algorithm in Fortran 90 with the Microsoft Fortran PowerStation v. 4.0 compiler, and we used most of the new Fortran facilities provided. We performed all computations using double precision. Some of the salient features are as follows:

  • We used modules with allocatable arrays for the matrices. This also localized the storage and access of vectors.
  • We used pointers, to minimize copying and extraction. Both scalar and array pointers have been used.
  • We used slice arrays for compact/optimized numerical operations. Thus, in subroutine ldldecom, we use
      ptwk1vc = ptsk1vc / diag(nj1 : j-1);
      where ptwk1vc and ptsk1vc are array pointers, and nj1:j-1 represents a slice.
  • We used intrinsic functions, such as dot_product, with slice vectors, for very compact and optimal inner loops (where computation effort is 90% of total).
  • We stored the matrix v as a one-dimensional vector, indexing it via two auxiliary integer vectors. Vector nrow locates the starting row of each column (as per the skyline storage);

Vector ncol locates the position in v of each column start row. The mapping of (i,j) is v[ncol[j] + i - nrow[j]].

We used full optimization on the Fortran compiler, with maximize speed option, under the release compilation. The use of slices provides for an elegant notation in Fortran, which is unmatched by C++. However, the overall organization of Fortran leaves much to be desired for this task.

C++ Implementation

We used the Microsoft Visual C++ 5.0 compiler, with full optimization and maximize speed enabled.

The C++ STL provides valarrays to facilitate numerical computations. Valarrays are supposed to support aggressive optimizations (which is, of course, compiler dependent) for numerical computations. Valarrays also support use of slices, so we were able to simulate use of slices in Fortran. Together with C++ classes, this provided for elegant and optimal coding.

The two-dimensional stiffness matrix v is stored as a one dimensional valarray<double> denoted as v. The diagonal of the matrix is stored in a separate valaray<double> as diag, for faster access. Both v and diag use column major order. The start of each column is stored in a vector<Sliceiter> of slice iterators, to provide faster access to individual columns. This storage method also takes advantage of the jagged (skyline) shape of the matrix.

We have implemented slice iterators (Sliceiter) along the lines of STL iterators, albeit in simpler fashion. Since the stride is always one, the iterator stores only the start position. A reference to the valarray<double> representing v is also stored in the iterator, for fast lookup. The use of Sliceiter provides a clean interface to the user, hiding the details of mapping the two-dimensional matrix to the compact skyline storage scheme used.

Typical Program Flow

The program flow is as follows

1. Read number of columns — ndf.

2. For each column, read number of elements per column and a list of the elements in the column. These are read into a vector<double>.

3. Compute the size of the valarray v, based on cumulative column sizes.

4. Allocate the main array, and set up the auxiliary Sliceiter array for each column.

5. Perform forward decomposition of the matrix. This step overwrites v so as to conserve space.

6. Read in any number of force vectors, and solve for displacements. These displacements are overwritten on the force vectors.

Comparison between Fortran and C++

All programs were run on a Pentium 150 MHz PC, with 48 MB of EDO RAM, and 100 MB of free disk space, on Microsoft NT Workstation 4.0. Time was measured using the _ftime runtime functions. We tested four problems, and found that the C++-based program typically ran 5% faster than Fortran. This could definitely be improved further. The solution times are in milliseconds, and are shown for the decomposition and back substitution functions only (see Table 1). Typically, back substitution takes only 3% of the time needed for decomposition. The length of v is also shown. The length is greatest for Problem 1, consisting of 520,000 items. Since we used double precision, this required 48 MB of memory. We have restricted problems to this size so as to avoid any disk paging by the OS, which would lead to incorrect evaluation of performance.

Conclusions

C++ is well suited for FEM solution algorithms. Very elegant implementations are possible, which are also faster. The entire FEM modeling process, including element representation, equation generation, solving, and storing results, involves object-oriented processes. FEM lends itself to C++ very well. It is time to make a shift from Fortran to C++, both to take advantage of the latest development systems, and for the sake of easy maintenance and upgrading of code.

References

[1] A. Jennings. "A Compact Storage Scheme for the Solution of Symmetric Linear Simultaneous Equations," Computer Journal #9, 1966, pp. 281-285.

[2] E. L. Wilson, K. J. Bathe, and W. P. Doherty. "Direct Solution of Large Systems of Linear Equations," Computers and Structures #4, 1974, pp. 363-372.

[3] R. J. Taylor. "Solution of Linear Equations by a Profile Solver," Engineering Computations #2, 1985, pp. 344-350.

[4] E. L. Wilson. "Static Condensation Algorithm," International Journal for Numerical Methods in Engineering #8, 1974, pp. 199-203.

[5] P. V. Marcal (editor). "General Purpose Finite Element Computer Programs," Proceedings of Seminar of the A. S. M. E., Winter Annual Meeting, New York, November 1970.

[6] Bjarne Stroustrup. The C++ Programming Language, Third Edition (Addision-Wesley, 1997). See Chapter 22, "Numerics."

Shyam S. Bhat has a Bachelors Degree, and ten years' experience in the Civil Engineering field. He has six years' experience as a software developer, specializing in computer-based analysis, design, and engineering drawings. He has six years' experience in C++ and 18 years' in Fortran.

B. Arun has a Masters Degree in Civil Engineering. He has had two years' experience in C++ and six years' in Fortran.

Both authors work for IMI Software in Hyderabad, India.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.