Visualizing Scientific Data
Will Schroeder
There's a real difference between data and information. Visualization is one way to help map the former to the latter.
Introduction
For more than a quarter century, scientists and engineers have relied on computers to perform research and development. There are two major reasons for this. First, computers can simulate physical processes or validate hypotheses, without the time and cost needed to build experimental apparatus or measurement devices. Examples of this include modeling aircraft aerodynamics, designing mechanical structures, or simulating the origins of the universe. Second, computers can rapidly measure physical phenomena at high levels of detail. Radar mapping of the Earth's surface, laser digitizers to capture surface geometry, and CT and MRI imaging are all examples of methods to measure physical data.
In either case, applying computers to scientific research results in very large datasets. These datasets range anywhere from megabytes to terabytes of data. For example, the Earth Orbiting Satellite (EOS) will generate terabytes of data every day while it is in operation. CT and MRI scans can create datasets on the order of 5123 sixteen-bit values, for a total of 2.14 gigabytes of data. Numerical simulations of fluid flow, especially time-varying flow, often generates hundreds to thousands of megabytes of data.
As you can imagine, digesting data of this size is no easy task. Many techniques are being developed to address this problem, including methods in artificial intelligence, data mining, and visualization. In this article, I focus on methods in visualization to treat large datasets. I describe how visualization systems are architected and applied, and the type of data they treat. I demonstrate these ideas using the freely available Visualization Toolkit, a C++ class library that runs on UNIX and PC platforms.
What is Visualization?
An informal definition for visualization is that visualization is the process of transforming data to pictures. Visualization may also mean transforming data into other sensory stimulus, such as sound or touch. But the most common application is to create images for the human visual system, since images and animation are very powerful ways to communicate large amounts of complex data.
Visualization is often referred to as scientific visualization when it is applied to scientific data; data visualization when applied to more general types of data such as business or financial data; and information visualization when applied to data lacking structure, such as a text document or database. In each case, the techniques and general approach are similar, but the form of the original data varies.
Normally when we talk about visualization we mean 3-D (or higher-dimension) displays. So although x-y plots, bar charts, pie charts, and methods in computer imaging are part of visualization, we are focusing on 3-D techniques such as streamlines or isosurfaces.
Features of a Visualization System
You've probably already guessed that a necessary component of visualization systems is a 3-D graphics subsystem. In fact, a working definition of visualization is the transformation of data into graphics primitives (for example, lines or polygons), with associated data such as color and texture. So every visualization system is necessarily a graphics system as well. A major difference between pure graphics software and visualization systems is the emphasis on rendering realism. Most of the 3-D graphics you see today is oriented towards photorealistic rendering, at the expense of system interac- tivity. In contrast, visualization systems stress information display and interactivity, since these are the features that best enable exploration of large datasets.
Note also that my definition of visualization uses the word transformation. This word is important because it describes the process of ingesting data, filtering it, and eventually creating graphics primitives suitable for display. This leads us to another key feature of visualization systems their flexibility.
Because visualization systems must deal with many different types of data, they are designed to be flexible. As a result, most visualization systems today use a variation of a data flow network (called visualization networks). These networks consist of objects to represent data, and objects, or filters, to operate on the data. The networks are created by connecting these data and filter objects together, often in elaborate and complex combinations, which work together to create a desired result.
The advantage of the programmable network approach is that visualization applications can be rapidly created to adapt to new data forms; in fact, many commercial systems such as AVS, Iris Explorer, or IBM Data Explorer, provide a graphical user interface to select, connect, and execute visualization networks interactively.
The Object Model
To clarify these concepts, I turn to the Visualization Toolkit (or vtk), which is a freely available C++ class library for graphics and visualization. (See the web site http://www.cs.rpi.edu/~martink/ for the code and more information.) This system is portable across UNIX and Windows 95 and NT systems, and will take advantage of 3-D graphics cards if you have access to one. Since this system is quite large (100,000+ lines of C++ code), and comes with some large datasets, you may prefer to obtain the CD available with the associated textbook (see references at end of article).
Vtk uses an object model to represent the components in a 3-D scene. Vtk uses actors, lights, and cameras to represent the objects, lighting, and view for a particular scene. It uses a renderer to coordinate the drawing of the scene, and draws into a rendering window. Vtk uses a property object to represent actor properties such as color, transparency, and lighting (ambient, diffuse, and specular effects). The actual geometry of the actor is represented by a mapper, which has the function of mapping data through a color lookup table, and interfacing the geometry to the rendering library on your system. (For example, on PCs OpenGL is used as the rendering library. The rendering DLLs are bundled with the vtk1.0 and vtk1.1 distribution.)
Listing 1 shows the C++ code required to draw a sphere. We start by creating some graphics objects. The object vtkRenderMaster will construct the rendering window with the MakeRenderWindow method. The vtkRenderMaster class serves the important purpose of ensuring device independence. It checks the environment variable VTK_RENDERER for a renderer (e.g., graphics library) type, or if not found, it uses a default type. As a result, the same code will run unchanged on different hardware or with different graphics libraries.
Another useful graphics object is vtkRenderWindowInteractor. You can think of this object as a 3-D widget. It captures mouse and keystroke events in the rendering window, and enables interactive rotation, zoom, and pan of the camera around the scene.
After the graphics objects are instantiated, the program constructs a simple visualization network from a sphere (vtkSphereSource) and a mapper (vtkPolyMapper). Note that the SetInput and GetOutput methods are used to connect the objects together in the network. The strong type checking of C++ comes into play here, allowing only filters of the correct output/input type to be connected together.
A program associates the mapper with the actor via the SetMapper method. This example is completed by associating the actor with the renderer aren, setting background color, generating an initial image with the Render method, and starting the interactive renderer's event loop. (Note that in this example a light and camera are automatically created by the system, since neither was specified.)
To show how easy it is to modify the visualization network, I will modify Listing 1 by inserting a shrink filter between the sphere and the mapper. This results in shrinking the polygons defining the sphere geometry, resulting in a "tiled" effect. The modified code is shown in Listing 2 and the resulting image appears in Figure 1.
How does vtk control execution of the pipeline? When the Render method is invoked by the rendering window, a synchronization process begins. In vtk, the process is based on comparing an internal system time against each object's last modified time. Objects that have been modified since the last render time update themselves. This process may result in a propagation of execution throughout the visualization network. For example, if an internal instance variable is modified (e.g., the sphere's SetRadius method), the next render will result in re-execution of all filters downstream of the filter.
Visualization networks are made up of three different types of filter objects sources, filters, and mappers. Source objects (like our sphere in the previous example), generate data or read data from a file or other I/O port (such as a socket). As a result, source objects have no inputs to hook to other visualization network objects, and have one or more outputs. Filters accept at least one input and generate at least one output. Mappers take one or more inputs and terminate the visualization network. In the previous example, vtkSphereSource is a source object, vtkShrinkPolyData is a filter, and vtkPolyMapper is a mapper object. It is often useful to sketch the connections between objects to view the topology of the network. Figure 2 is an example of such a sketch for the visualization network from the code in Listing 2.
Along with filters, visualization networks contain data objects to represent different types of data. In vtk, collections of data are called datasets and consist of an organizing geometric structure and attribute data. The geometric structure, for example, might contain points and polygons, and forms a relationship between neighboring data points. The attribute data consists of data associated with the geometric structure, such as scalars, vectors, tensors, texture coordinates, and surface normals. In general, attributes modify the appearance of an object by application across its entire structure. For example, scalar data mapped through a color lookup table will be rendered as colors on the surface of the actors.
The Visualization Toolkit has five dataset types. These are polygonal data (points, lines, polygons, and triangle strips), structured points (pixmaps, volumes of regularly sampled data), structured grids (topologically regular grids such as those found in finite difference analysis), unstructured grids (irregular sets of points and data cells), and unstructured points (points located irregularly in space without any topological relationship). These types cover many of the datasets found in scientific and engineering computing. One interesting point about the C++ implementation of these datasets: each type is a derived class of the abstract class vtkDataSet. Consequently, each dataset class behaves uniformly and can be treated in a general manner by the filters in the system.
Visualizing Surface Normals
Now that I've got the basics down I can show something more interesting. In the next example I'll demonstrate a visualization technique called glyphing. Glyphing is a powerful technique which can be used to visualize many types of data: we need only construct a glyph and attach data values to the features of the glyph. In the next example I use a cone glyph to indicate the direction of surface normals.
Listing 3 shows the C++ code used to implement this example, and Figure 3 illustrates the visualization network. As you can see, we have a pipeline consisting of two separate branches. The left branch reads the data file, in this case a Cyberware laser digitizer file, and computes surface normals. The right branch creates a cone to be used as a glyph. The branches join at the vtkGlyph3D filter, which takes the input points and normals from the digitized surface, and positions and orients cone glyphs at selected points. To avoid the clutter of drawing a glyph at every point on the surface, I use the filter vtkMaskPoints to randomly select input points. Also, the cone should be rotated about its base point rather than its center, so I use a transform filter to translate the cone slightly. Figure 4 shows the resulting image, which combines the glyphs plus the original digitized surface.
Visualizing Medical Data
One of the most exciting applications of visualization is in medical imaging. In the next example I show a simple application to read a CT dataset and generate a view of the skull.
CT is a technique that measures X-ray attenuation in a volume of data. The volume, which is really a regular lattice of discrete points, is represented as a series of planes stacked on top of one another. Each plane is essentially a 2-D X-ray image. To create a 3-D skull surface, we gather these slices and use an iso-contouring algorithm to generate a surface of constant X-ray intensity. Like 2-D contours you see on a weather map, 3-D contours represent a regions of constant data value. But instead of generating contour lines, this technique generates 3-D contour surfaces (i.e., isosurfaces).
Listing 4 shows the C++ code used to generate the isosurface. Here I use a reader to read 16-bit medical data, a filter (vtkMarchingCubes) to generate the isosurface, and a mapper to couple with the graphics system. Figure 5 shows the resulting image.
The algorithms used to create the visual displays are of special interest to visualization researchers. The marching cubes algorithm used in the previous example is a widely known contouring technique. Many other contouring methods have been developed as well. One advantage of using a programmable visualization network is that you can quickly implement new algorithms, and then try them out by plugging them into the network.
Conclusion
Visualization is a useful, fun, and exciting way to view data. C++ plays an important role in implementing visualization tools because of its ability to abstract data, provide plug-in modules or filters, and specify uniform interfaces to objects.
In the future, 3-D visualization and graphics will play a greater role in computing systems. You can see this already with the arrival of 3-D graphics boards and rendering libraries for PCs. Also, graphics and visualization software is becoming more widespread, and is much easier to use than even a few years ago.
If you are interested in finding out more about visualization, I recommend the first reference listed below. You may also wish to try out some commercial systems, such as given in references [2] , [3] , and [4] .
References and Information Sources
[1] W. Schroeder, K. Martin, and W. Lorensen. The Visualization Toolkit An Object-Oriented Approach to 3-D Graphics. Prentice-Hall, 1996. ISBN 0-13-199837-4.
[2] C. Upson, T. Faulhaber Jr., D. Kamins and others. "The Application Visualization System: A Computational Environment for Scientific Visualization." IEEE Computer Graphics and Applications, 9(4):3042, July, 1989.
[3] Iris Explorer User's Guide. Silicon Graphics Inc., Mountain View, CA, 1991.
[4] Data Explorer Reference Manual. IBM Corp, Armonk, NY, 1991.
[5] vtk is available on the web at http://www.cs.rpi.edu/~martink/.
[6] Source code that implements OpenGL is available via ftp at iris.ssec.wisc.edu.
[7] OpenGL man pages are available at http://www.digital.com:80/pub/doc/opengl/.
[8] The OpenGL spec is available at http://www.sgi.com/Technology/openGL/glspec/glspec.html.
William J. Schroeder is a computational scientist at GE's Research & Development Center. He has designed the object-oriented VISAGE visualization system used throughout GE. Will's contributions to the visualization field include the decimation polygon reduction algorithm, the stream polygon for vector/tensor visualization, and swept surfaces for motion representation. He has recently completed the text "The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics" along with co-authors Ken Martin and Bill Lorensen. Dr. Schroeder received a BS in mechanical engineering at the University of Maryland, and MS and PhD in applied mathematics at Rensselaer Polytechnic Institute.