Ray is a technical editor for DDJ and can be reached at 501 Galveston Drive, Redwood City, CA 94063.
BioComputing refers to biologically inspired approaches to creating software. In recent years a number of these techniques and technologies have emerged. Some are proven and out in the market, others are still being nurtured in research labs. Together they hold great promise for pushing the envelope of program complexity and size, and for making tractable such thorny programming problems as pattern recognition.
The most visible success of these techniques is neural networks, also known as neurocomputing, neoconnectionism, and parallel distributed processing. Other BioComputing technologies include genetic algorithms, iterated function systems, fuzzy logic, simulated annealing, fractal systems, cellular automata, L-systems, classifier systems, and chaotic dynamics. To our knowledge, this is the first time they have been grouped in this way. Yet, as we learn more about them, the interconnections become clearer.
In some cases, the techniques are related at a deep theoretical level--for example, chaotic dynamics, fractals, L-systems, and iterated function systems stand on common mathematical ground. In other cases, a technique occupies its own distinct territory (fuzzy logic, for example) but still maintains some connection to the natural and biological world.
The Need for BioComputing
The 1980s was the decade in which real-time networked systems, GUIs, and distributed applications appeared en masse, with program size and complexity taking a big leap. It was during this time that many of us encountered the limits of existing software technology.
For example, the ten million lines of code in the Space Shuttle software system ground to a halt during one launch countdown when the system encountered an error caused by a missing comma in one program. Last year, the AT&T long-distance phone network was crippled for much of a day when it encountered an anomalous condition in its multiprocessor switching software.
Closer to home, many of the leading companies in the PC software industry have been embarrassed by long-delayed and/or bug-ridden programs (such as dBase IV 1.0, Lotus 1-2-3/G, OS/2, Windows, Macintosh System 7, and so on). The bigger they are, the harder they fall, and the longer it takes.
It's no surprise that computer scientists have been looking for ways to build software systems that are less brittle, more reliable, robust, and flexible, yet still allow for high levels of functionality. Many directions have been explored, from OOP to CASE to AI to Steve Jobs's fresh-squeezed orange juice and high salaries for small, dedicated teams. One exciting set of approaches falls under the rubric of BioComputing.
Mother Nature Knows Best
Lest anyone misunderstand, the rationale behind BioComputing is not to create big programs. In fact, most current implementations are modest in size and resource consumption. Rather, the rationale comes from the fact that, as the number of lines in a conventional program increases, its complexity approaches that of a small bioorganism.
Living systems are intricate structures made out of simpler components (cells), with high degrees of redundancy, fault tolerance, and adaptiveness. Incredibly detailed and complex biostructures arise from what must be very small sets of rules. The engineering specifications for a Boeing 747 are equal in size and volume to the aircraft itself. The specifications for a six-foot human being (that is, our DNA genetic code) fit into a container less than a millionth of a cubic inch in size, and human beings are orders of magnitude more complex than a 747. Moreover, unlike the Space Shuttle, we generally don't come to a screeching halt when there is a misplaced comma in our DNA sequences.
Hans Moravec, a computer scientist and robotics researcher at CMU, has tried to compare the processing power of natural organisms to that of artificial CPUs. In his view, one MIPS is roughly equivalent to 100,000 neuronal cells in the brain. In terms of both storage capacity and processing speed, then, a Macintosh computer is roughly equivalent to a snail, and a Cray-2 is at the level of a small rodent.
We can extend Moravec's hardware comparison to software. The DNA sequence for a human being is about six billion bits long, and represents the engineering specification for the human body. This sounds like a lot of information, but is surprisingly small when compared to contemporary software packages. Our DNA is a package of code and data only 1000 times larger than the information content of products like dBase IV or Lotus 1-2-3. Clearly, nature's programmers are not using C or assembler.
Actually, the effective information in DNA is much less than 6 billion bits, because only 5 percent of our DNA is used as data by the human body. This active DNA works out to about 40 Mbytes of information (5 percent of 6 billion bits is 300 million bits, or roughly 37 Mbytes), the size of a PC/AT disk drive. The other 95 percent is so-called "junk DNA," much of it introns, which contains information not used by the target system. Some of this data can be seen figuratively as stop bits or parity bits: information used internally for error correction and preservation of the genetic signal. Other parts of the junk DNA represent outdated information that was applicable further back along the evolutionary timeline--in other words, on obsolete hardware platforms. Clearly, nature's programmers do know how to use #ifdefs to comment out code that is no longer needed.
Neural Nets
Artificial neural nets are loosely based on current theories of how the natural brain works, that is, through interconnections of neuronal cells. In reality, the specifics of mammalian brain function continue to elude researchers, but this has not stopped neural net researchers from making great strides in neural net models and implementations.
The applications of neural nets are surprisingly widespread, although the principal task is pattern recognition. One of the earliest neural net models, Bernard Widrow's adaptive linear element (Adaline), has been in use since 1959 in adaptive hardware filters that eliminate echoes on phone lines. In recent years artificial neural nets have been used for pattern recognition, image processing, compression, speech synthesis, natural language processing, noise filtering, robotic control, and financial modeling.
A key advantage of neural nets over conventionally written programs is the way in which nets imitate the brain's ability to make decisions and draw conclusions when presented with complex, noisy, irrelevant, and/or partial information. As computer applications broaden to include hand-held devices that process handwriting and voice input, the neural net's ability to make sense of messy real-world data becomes more critical.
Another advantage is that a neural net application is not a handcrafted program, but rather the result of feeding training data to a net model, which then learns to output the desired results. Once a net has been trained, it will also be able to deal with input that differs from what it was trained with, as long as the input is not too different. This is a big advantage over conventional software, which must be specifically programmed to handle every anticipated input. Presumably, this will allow us to avoid "missing comma" fiascos like the aborted Shuttle launch.
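To make the idea of training concrete, here is a minimal sketch (in C) of a single Adaline element learning by Widrow and Hoff's least-mean-squares rule. The training samples, learning rate, and epoch count are arbitrary choices for illustration, not anything taken from a real product; a practical application would use far larger nets and data sets.

/* adaline.c -- one Adaline element trained with the Widrow-Hoff      */
/* least-mean-squares (LMS) rule. The noisy training samples and the  */
/* learning rate are arbitrary illustrative choices.                  */
#include <stdio.h>

#define NINPUTS   2
#define NSAMPLES  4

int main(void)
{
    /* Noisy samples of the logical AND function, with targets -1/+1. */
    double x[NSAMPLES][NINPUTS] = {
        { 0.1, 0.0 }, { 0.0, 0.9 }, { 1.1, 0.1 }, { 0.9, 1.0 }
    };
    double target[NSAMPLES] = { -1.0, -1.0, -1.0, 1.0 };
    double w[NINPUTS] = { 0.0, 0.0 };          /* adaptive weights */
    double bias = 0.0;
    double rate = 0.1;                         /* learning rate */
    double test[NINPUTS] = { 0.95, 0.85 };     /* input never seen in training */
    double sum;
    int epoch, i, j;

    for (epoch = 0; epoch < 500; epoch++) {
        for (i = 0; i < NSAMPLES; i++) {
            double err;
            sum = bias;
            for (j = 0; j < NINPUTS; j++)
                sum += w[j] * x[i][j];
            err = target[i] - sum;             /* error on the linear output */
            for (j = 0; j < NINPUTS; j++)      /* nudge weights toward the target */
                w[j] += rate * err * x[i][j];
            bias += rate * err;
        }
    }

    sum = bias;                                /* classify the unseen input */
    for (j = 0; j < NINPUTS; j++)
        sum += w[j] * test[j];
    printf("output for (%.2f, %.2f): %s\n",
           test[0], test[1], sum >= 0.0 ? "+1" : "-1");
    return 0;
}

After training on the four noisy samples, the element places the pattern (0.95, 0.85), which it never saw during training, on the correct (positive) side of its decision boundary.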
Fractals and Iterated Function Systems
Fractal shapes are a class of objects that result from repeatedly evaluating a simple mathematical function, producing a complex shape with an infinite level of detail. (Actually, that is a loose definition. More precisely, Mandelbrot defines a fractal as a set whose Hausdorff dimension exceeds its topological dimension -- which in most cases means a fractional dimension.)
Although discovered by a mathematician, fractal techniques have proved invaluable in rendering computer graphic images of natural objects with uncanny detail and texture. In recent years, the deep interconnections between fractal objects, iterated function systems (IFS), and chaotic dynamics have begun to be plumbed, producing results in the biological and natural sciences as well as in computer science.
In the natural sciences, fractal methods have proved a useful tool for visualizing the chaotic dynamics of nonlinear systems. Chaos science, also known as nonlinear dynamics, is a mathematical modeling technique used to represent the complex behavior of feedback-based systems, which can range from the human heart to global weather systems to the interactions between mammalian neurons to the movements of planets.
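To give a flavor of what chaotic feedback means in practice, the short C sketch below iterates the logistic map, a standard toy model of a nonlinear feedback system (it is not a model of any of the systems just named). Two starting values that differ by only one part in a million diverge completely within a few dozen iterations.

/* logistic.c -- sensitive dependence on initial conditions in the    */
/* logistic map x(n+1) = r * x(n) * (1 - x(n)), a standard toy model  */
/* of chaotic feedback. The value r = 4.0 puts the map in its fully   */
/* chaotic regime.                                                    */
#include <stdio.h>

int main(void)
{
    double r = 4.0;
    double a = 0.400000;    /* two starting values that differ */
    double b = 0.400001;    /* by only one part in a million   */
    int n;

    for (n = 0; n <= 40; n++) {
        if (n % 10 == 0)
            printf("n=%2d  a=%.6f  b=%.6f  diff=%+.6f\n", n, a, b, a - b);
        a = r * a * (1.0 - a);      /* one feedback step for each trajectory */
        b = r * b * (1.0 - b);
    }
    return 0;
}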
Perhaps the most practical use of fractals has been in the PC industry, with Michael Barnsley's use of IFSs for data compression of scanned images. IFSs are a compact way of representing a subclass of fractals, those that can be partitioned into a number of tiles. Barnsley's technology uses the iterative tiling characteristics of IFSs to obtain dramatic compression ratios of 500:1, or in some cases, as much as 10,000:1.
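The listing below is a minimal sketch of the IFS idea, rendering the well-known Barnsley fern with the "chaos game": pick one of four affine maps at random each step and plot the resulting point. The coefficients are the widely published fern values; the image size and iteration count are arbitrary choices, and the program simply writes an ASCII PGM gray-scale image to standard output rather than drawing on a screen.

/* fern.c -- render an IFS attractor (Barnsley's fern) with the       */
/* "chaos game": apply one of four affine maps, chosen at random each */
/* step, and mark the resulting point in an image. Output is an ASCII */
/* PGM file on stdout.                                                */
#include <stdio.h>
#include <stdlib.h>

#define W 300
#define H 300

static unsigned char img[H][W];     /* image buffer, initially all zero */

int main(void)
{
    double x = 0.0, y = 0.0;
    long i;
    int row, col;

    for (i = 0; i < 200000L; i++) {
        double nx, ny;
        int r = rand() % 100;       /* choose a map with the usual weights */

        if (r < 1) {                          /* stem */
            nx = 0.0;                  ny = 0.16 * y;
        } else if (r < 86) {                  /* successively smaller fronds */
            nx = 0.85 * x + 0.04 * y;  ny = -0.04 * x + 0.85 * y + 1.6;
        } else if (r < 93) {                  /* left leaflet */
            nx = 0.20 * x - 0.26 * y;  ny = 0.23 * x + 0.22 * y + 1.6;
        } else {                              /* right leaflet */
            nx = -0.15 * x + 0.28 * y; ny = 0.26 * x + 0.24 * y + 0.44;
        }
        x = nx;  y = ny;

        if (i > 20) {                         /* skip the initial transient */
            int px = (int)((x + 3.0) / 6.0 * (W - 1));
            int py = (H - 1) - (int)(y / 10.0 * (H - 1));
            if (px >= 0 && px < W && py >= 0 && py < H)
                img[py][px] = 255;
        }
    }

    printf("P2\n%d %d\n255\n", W, H);         /* ASCII gray-scale header */
    for (row = 0; row < H; row++)
        for (col = 0; col < W; col++)
            printf("%d\n", img[row][col]);
    return 0;
}

Barnsley's compression scheme runs this process in reverse: given a scanned image, it searches for a small set of affine maps whose attractor approximates the image, and stores only the map coefficients.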
L-Systems
Although fractal systems of equations can produce images of natural objects like trees or mountains with an uncanny resemblance, it's hard to see the direct connection between the iterated function z -> z^2 + c and a natural organism. The discoverer of the set defined by that function, Benoit Mandelbrot, would have no quarrel with that, because he never intended to directly model the internal workings of biological processes.
In contrast, Lindenmayer systems, or L-systems, are similar to fractals in some respects, but were created with the specific intent of modeling nature. Aristid Lindenmayer, a biologist, conceived a mathematical theory of plant development in 1968. An L-system is a set of rules that specify a repeated sequence of transformations on a starting shape. L-system rules are not unlike the BNF rules that specify the syntax of programming languages; but instead of parsing expressions, they generate data. Given the appropriate starting shape and rules, you can generate images of plant-like objects that are incredibly similar to the real thing.
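The following C sketch shows how small such a generator can be. It rewrites a string, generation by generation, using a single textbook production (F becomes F[+F]F[-F]F); feeding the result to a turtle-graphics interpreter, which is not shown here, would draw a branching, plant-like shape.

/* lsys.c -- minimal L-system string rewriter. Starting from the      */
/* axiom "F", each generation rewrites every 'F' in parallel with the */
/* production  F -> F[+F]F[-F]F, a textbook plant-like rule. When the */
/* string is drawn with turtle graphics, '+' and '-' mean "turn" and  */
/* '[' and ']' mean "save/restore the current position and heading".  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *axiom = "F";
static const char *rule  = "F[+F]F[-F]F";     /* production for 'F' */

static char *rewrite(const char *s)
{
    size_t rlen = strlen(rule);
    char *out = malloc(strlen(s) * rlen + 1); /* worst-case output size */
    char *p = out;

    for (; *s; s++) {
        if (*s == 'F') {                      /* apply the production */
            memcpy(p, rule, rlen);
            p += rlen;
        } else {                              /* other symbols copy through */
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}

int main(void)
{
    char *s = malloc(strlen(axiom) + 1);
    int gen;

    strcpy(s, axiom);
    for (gen = 1; gen <= 3; gen++) {
        char *next = rewrite(s);
        free(s);
        s = next;
        printf("generation %d: %s\n", gen, s);
    }
    free(s);
    return 0;
}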
As with fractals, a small amount of information can produce a large detailed object. In fact, one can say that an L-system is merely another way to represent a fractal set. But unlike conventional fractals, L-systems are used to mimic both the natural end result (such as the shape of a full-grown tree), as well as the stages of growth along the way, from twig to sapling to mature tree.
The dramatic "compression ratios" in L-systems (and IFS) provide some inkling as to how a portion of the DNA, some 10^8 bits, specifies the human neural network of 10^11 processing elements and 10^14 interconnections. No biologist has found the place inside a plant where L-system rules are stored. That was never a goal of this research. Rather, L-systems should be seen as a formal exercise in understanding how the natural processes of growth can be specified, modeled, controlled, and predicted.
More recently, computer graphics researchers such as Przemyslaw Prusinkiewicz (who generated the image on the cover of this issue) have been using L-systems as a technique for rendering more natural-looking images.
Genetic Algorithms
A biologically inspired technique that is not used in computer graphics is the genetic algorithm (GA), invented by John Holland in 1975. (See the article entitled "Genetic Algorithms" in this issue for a more detailed description.)
Briefly, this approach provides programs with a means for finding a particular solution in a general search space by mimicking the natural processes of evolution, mutation, and natural selection. Although GAs are directly inspired by biological processes, in practice the connection is rather loose. A GA is about as accurate a model of evolution as an artificial neural network is of the brain -- which is to say, not very accurate at all.
Nevertheless, a close imitation of nature is not a requirement. These problem-solving techniques are valuable in and of themselves.
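As an illustration of the mechanics, here is a minimal GA sketch in C. The "organisms" are 32-bit strings and the toy fitness function simply counts 1 bits; the population size, mutation rate, and other parameters are arbitrary choices for illustration, not recommendations.

/* ga.c -- minimal genetic algorithm sketch. Each organism is a       */
/* 32-bit string, and the toy fitness function counts its 1 bits.     */
/* Tournament selection, one-point crossover, and point mutation are  */
/* the textbook operators; all parameters are illustrative choices.   */
#include <stdio.h>
#include <stdlib.h>

#define POP   40
#define GENS  60
#define NBITS 32

static int fitness(unsigned long g)           /* count the 1 bits */
{
    int n = 0;
    while (g) { n += (int)(g & 1UL); g >>= 1; }
    return n;
}

static unsigned long tournament(unsigned long *pop)
{
    unsigned long a = pop[rand() % POP], b = pop[rand() % POP];
    return fitness(a) > fitness(b) ? a : b;   /* better of two random picks */
}

int main(void)
{
    unsigned long pop[POP], next[POP];
    int g, i, best;

    for (i = 0; i < POP; i++)                 /* random initial population */
        pop[i] = (((unsigned long)rand() << 16) ^ (unsigned long)rand())
                 & 0xFFFFFFFFUL;

    for (g = 0; g < GENS; g++) {
        for (i = 0; i < POP; i++) {
            unsigned long mom = tournament(pop), dad = tournament(pop);
            int cut = rand() % NBITS;         /* one-point crossover */
            unsigned long mask =
                (cut == 0) ? 0UL : (0xFFFFFFFFUL >> (NBITS - cut));
            unsigned long child = (mom & mask) | (dad & ~mask);

            if (rand() % 100 < 5)             /* occasional point mutation */
                child ^= 1UL << (rand() % NBITS);
            next[i] = child;
        }
        for (i = 0; i < POP; i++)             /* next generation replaces old */
            pop[i] = next[i];
    }

    best = 0;
    for (i = 1; i < POP; i++)
        if (fitness(pop[i]) > fitness(pop[best]))
            best = i;
    printf("best after %d generations: %d of %d bits set\n",
           GENS, fitness(pop[best]), NBITS);
    return 0;
}

Swap the bit-counting function for any other measure of merit, and the same loop will grope its way toward better circuit layouts, schedules, or neural net weights.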
Another semirandom search technique, called "simulated annealing" and closely related to GAs, mimics the crystallization of a liquid as it cools (or the annealing of a metal as it is heated and cooled). Like a GA, it repeatedly generates a trial solution, tests it against the desired goal, and then semirandomly mutates the solution to see if a better fit can be found. As the "temperature" cools down, the program settles into a near-optimal solution to the problem. The kinship between simulated annealing and genetic algorithms points out how the same process can be manifested in both biological and nonbiological natural systems. (See "Simulated Annealing" by Michael McLaughlin, DDJ September 1989.)
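Here is a correspondingly minimal annealing sketch in C, hunting for the minimum of a made-up, bumpy one-dimensional cost function. The cost function, cooling schedule, and move size are all arbitrary illustrative choices.

/* anneal.c -- minimal simulated-annealing sketch. The toy goal is to */
/* find the minimum of a bumpy one-dimensional cost function. A trial */
/* move is accepted if it improves the cost, or, with a probability   */
/* that shrinks as the temperature drops, even if it makes things     */
/* worse -- which is what lets the search escape local minima early.  */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double cost(double x)          /* bumpy curve, global minimum near x = 2 */
{
    return (x - 2.0) * (x - 2.0) + 3.0 * sin(5.0 * x);
}

static double frand(void)             /* uniform random number in [0,1) */
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

int main(void)
{
    double x = -8.0;                  /* start far from the answer */
    double t = 10.0;                  /* initial "temperature"     */
    int step;

    for (step = 0; step < 20000; step++) {
        double trial = x + (frand() - 0.5);   /* small random move */
        double delta = cost(trial) - cost(x);

        /* Metropolis rule: always accept improvements; accept a worse */
        /* solution with probability exp(-delta / t).                  */
        if (delta < 0.0 || frand() < exp(-delta / t))
            x = trial;

        t *= 0.9995;                  /* slow geometric cooling */
    }
    printf("settled at x = %.3f, cost = %.3f\n", x, cost(x));
    return 0;
}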
Genetic algorithmists such as John Holland are now working on an improved technique for parallel search and optimization, called "classifier systems." Classifier systems are similar to L-systems in that they are rule-based; however, they incorporate genetic evolution and mutation. They have been proven formally equivalent to connectionist systems (artificial neural networks).
Other researchers are combining GAs with other techniques--for example, using GAs to evolve different neural net models, or to transform and mutate conventional programs into, say, a near-optimal sort program for a given set of data. Like neural nets, GAs map very easily onto massively parallel hardware, which explains the serious interest of semiconductor makers (Intel, TI) in neural nets and other BioComputing technologies.
Other BioComputing Techniques
Fuzzy logic, or the theory of fuzzy sets, is a technique that is inspired by nature without intending to be a realistic model of any physical object. Invented by Lotfi Zadeh in 1965, fuzzy logic is an extension to mathematical logic that allows for "soft" values lying between the hard values of true and false (1 and 0). The intent is to deal in a meaningful way with imprecise notions or concepts that do not have exact boundaries. This does not mimic how the physical human brain works, but it does follow how the human mind seems to carry out its reasoning.
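A minimal sketch of the arithmetic involved: membership in a fuzzy set such as "warm" is a degree between 0.0 and 1.0, and the usual connectives are min for AND, max for OR, and 1 - x for NOT. The temperature and humidity breakpoints below are invented for illustration.

/* fuzzy.c -- minimal fuzzy-set sketch. Membership in the set "warm"  */
/* is not a yes/no matter but a degree between 0.0 and 1.0; the usual */
/* fuzzy connectives are min (AND), max (OR), and 1 - x (NOT). The    */
/* breakpoints used here are arbitrary illustrative choices.          */
#include <stdio.h>

static double warm(double deg_c)      /* ramps up 10..20 C, down 25..35 C */
{
    if (deg_c <= 10.0 || deg_c >= 35.0) return 0.0;
    if (deg_c < 20.0)  return (deg_c - 10.0) / 10.0;
    if (deg_c <= 25.0) return 1.0;
    return (35.0 - deg_c) / 10.0;
}

static double humid(double pct)       /* ramps up from 40% to 80% humidity */
{
    if (pct <= 40.0) return 0.0;
    if (pct >= 80.0) return 1.0;
    return (pct - 40.0) / 40.0;
}

static double fand(double a, double b) { return a < b ? a : b; }  /* fuzzy AND */
static double f_or(double a, double b) { return a > b ? a : b; }  /* fuzzy OR  */
static double fnot(double a)           { return 1.0 - a; }        /* fuzzy NOT */

int main(void)
{
    double t = 27.0, h = 65.0;

    printf("warm(%.0f C)     = %.2f\n", t, warm(t));
    printf("humid(%.0f%%)     = %.2f\n", h, humid(h));
    printf("warm AND humid  = %.2f\n", fand(warm(t), humid(h)));
    printf("warm OR humid   = %.2f\n", f_or(warm(t), humid(h)));
    printf("NOT warm        = %.2f\n", fnot(warm(t)));
    return 0;
}

A fuzzy controller in a camera or washing machine evaluates a handful of rules of the form "if the scene is fairly dark and fairly backlit, open the aperture a bit more," using exactly this kind of graded membership in place of hard thresholds.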
Fuzzy logic has found many practical adherents in Japanese companies. This technology is now a key element in products such as Canon autofocus cameras, Hitachi washing machines, and Nissan and Subaru transmissions, as well as in the handwriting input recognition found in the Sony Palmtop computer.
The Flip Side of BioComputing
Until now, I've focused on biologically inspired ways of creating software. But no discussion of BioComputing would be complete without mentioning its flip side: the use of computers and information science to study biological systems, which we call computational biology. Some techniques mentioned earlier (neural nets, L-systems) started out with the intent of understanding biological systems, but they have since become more computational techniques than biologically faithful models.
Nevertheless, researchers continue to work on the flip side, studying bioorganisms as information-processing entities. For example, one author in the field of immunology makes the case that the immune system, a complex network of interacting elements distributed throughout the human body, can be viewed as a cognitive process: an entity capable of parallel distributed search, pattern recognition, and associative memory, not unlike an artificial neural network or genetic classifier system.
Computational biology has had its largest impact in the field of medical molecular genetics, although researchers there would not use this phrase to describe their work. Given that bioorganisms are the result of "executing" biochemical programs stored on a digitally encoded tape (the long sequences of genetic DNA), medical researchers have discovered that some diseases are quite literally data transmission errors in the genetic signal. The fatal hereditary disorder cystic fibrosis involves a single change in a data value, like the missing comma in the Space Shuttle software. Like that missing comma, this fatal data error can now be detected, and researchers hope one day to correct it and restore the program to a healthy state.
Naturally, better tools are being built for this kind of work: automatic computer-controlled DNA sequencers ("disassemblers"), polymerase chain reaction machines ("digital tape duplicators"), molecular design workstations ("CASE tools"), and so on. There are of course serious bioethical issues that need to be addressed here, as well as profound technical challenges. And who knows, maybe 30 years from now Borland or Multiscope will come out with an interactive debugger for home-brewed life-forms, complete with a read/write device for DNA sequences.
Copyright © 1991, Dr. Dobb's Journal