Making a slow program fast can lead to both joy and frustration. Frequently, the best you can do is a low-level trick to double or maybe quadruple the speed of a program; for instance, many readers may have implemented John Conway's "Game of Life" using bit-level operations for a significant speedup. But sometimes a whole new approach, combining just a few ideas, yields amazing improvements. A simple algorithm called "HashLife," invented by William Gosper ("Exploiting Regularities in Large Cellular Spaces," Physica 10D, 1984), combines quadtrees and memoization to yield astronomical speedup to the Game of Life. In this article, I evolve the simplest Life implementation into this algorithm, explain how it works, and run some universes for trillions of generations as they grow to billions of cells.
Martin Gardner's columns on Conway's Game of Life, published in Scientific American from October 1970 onwards, inspired a whole generation of programmers; an amazing world of two-dimensional lifeforms springs from just a few simple rules that are easy to program and display. The Game of Life (Figure 1) is a solitaire game played on an infinite two-dimensional grid. Each cell in the grid is either alive (1, denoted by a black circle) or dead. Every living cell that has two or three living neighbors (of the eight immediately adjacent) remains alive, otherwise it dies. Any dead cell surrounded by exactly three living neighbors is a birth cell and is born into the next generation. All the rules are applied to all the cells at the same instant. These rules lead to life forms that are stable and oscillategliders and spaceships that transport themselves across the universe, life forms that grow without bound or completely die away. Seed patterns with as few as nine initial living cells can mutate for many thousands of generations, with constantly moving areas of chaotic activity spewing forth gliders among a large stable menagerie.
Figure 1: Some Life examples. On the left is a stable loaf pattern. In the upper left are the two phases of the blinker, an oscillator with period 2. In a diagonal line from the lower left to the upper right are the four phases a glider goes through as it traverses the universe.
The simplest implementations of Life use a pair of two-dimensional arrays representing the state of a finite portion of the universe. In every generation, the neighbors of a cell in the old array are counted to calculate the state of that cell in the new array, then the arrays are swapped and the screen is updated. This approach has a number of frustrating aspects: The larger the universe, the slower the program runs; yet, a smaller universe bounds the growth of patterns, constraining them artificially. My first improvement to the basic algorithm is to replace this finite universe with an unbounded one; at any given time it is finite, but you can increase its size as necessary. I use a simple tree representation, called a "quadtree" (Figure 2).
Figure 2: A quadtree representation of a 4×4 cell containing a Loaf.
Each node of the tree represents a square portion of the universe. The leaves of the tree are single bits, either 1 (alive) or 0 (dead). Each nonleaf node represents a larger square composed of four children, who are named by their direction from center: nw
is the name of the northwest smaller square, ne
the northeast smaller square, and so on. The level of a node is the distance to the leaves; the leaves are at level 0. A node at level n
thus represents a square of size 2n on each edge.
The straightforward way to implement the Life algorithm is to write a recursive function that takes a tree at a given level and either updates the tree in place or returns an entirely new tree. I choose to return an entirely new tree. If I could return a tree at the same level, the algorithm would be very simple. Unfortunately, this is not possible without some additional information because the neighbors of that node also influence the calculation of the next generation for the border cells. One approach to solve this would be to pass in the neighbor nodes at the same level; another is to maintain neighbor node pointers. Instead, I choose a simpler approach: The result of the recursive function is a node one level down and centered with respect to the original node. For instance, the function takes a node at level 2, representing a 4×4 square of the universe, and return a node at level 1, representing the 2×2 square you can directly compute from the information in that 4×4 square. Similarly, it takes a node at level 5, representing a 32×32 square of the universe, and return a square at level 4, representing the central 16×16 pixels that are one generation advanced. This decision lets you use a completely functional approach on an immutable data structure, assuming you enjoy such things.
Most recursive algorithms working on trees simply recurse on the children and somehow combine the result. If I do that here, using code like Listing One, there will be gaps between the squares calculated, as in Figure 3. Something more complicated is called for. I cannot use the existing children directly in the recursive call; instead, I must create temporary children that are shifted in place, to make sure I get a result square aligned the way we require.
Listing One
class Node { Node nextGeneration() { if (level == 2) { ... do base case through normal simulation ... } else { return new Node(nw.nextGeneration(), ne.nextGeneration(), sw.nextGeneration(), se.nextGeneration()) ; } } }
Figure 3: Why a simple recursive algorithm will not work. The black outer square represents the area of a node; the four red inner squares are the subnodes. I want to calculate the area of the blue inner square in the next generation, but the inner squares of the subnodes are the green squares, which cannot be combined to form the blue inner square because they do not overlap it.
I construct nine new nodes two levels down that are shifted appropriately. To construct these nodes two levels down, I actually access nodes that are three levels down to compose the new ones. From these nine new nodes, I create four new nodes that are one level down. It is these four new nodes that I recurse on. This gives me result nodes at the appropriate locations so I can combine them into the required output node at the right level (see Figure 4).
Figure 4: A solution to the gap problem: Construct new subnodes from the components of the existing node to calculate the squares we actually need. Again, blue is the area I wish to calculate; I split it into four green nodes. I compute these four blue nodes by building subnodes out of various combinations of the nine red subnodes that can be constructed from subnodes of the original node.
I do make a few minor concessions to practicality in Listing Two. Any tree that I know is completely empty, I construct by sharing nodes; see Figure 4. When I calculate the next generation of any node, I first check to see if it is empty and immediately return an empty subtree if so.
Listing Two
Node centeredSubnode() { return new Node(nw.se, ne.sw, sw.ne, se.nw) ; } Node centeredHorizontal(Node w, Node e) { return new Node(w.ne.se, e.nw.sw, w.se.ne, e.sw.nw) ; } Node centeredVertical(Node n, Node s) { return new Node(n.sw.se, n.se.sw, s.nw.ne, s.ne.nw); } Node centeredSubSubnode() { return new Node(nw.se.se, ne.sw.sw, sw.ne.ne, se.nw.nw) ; } Node nextGeneration() { if (level == 2) { ... do base case through normal simulation ... } else { Node n00 = nw.centeredSubnode(), n01 = centeredHorizontal(nw, ne), n02 = ne.centeredSubnode(), n10 = centeredVertical(nw, sw), n11 = centeredSubSubnode(), n12 = centeredVertical(ne, se), n20 = sw.centeredSubnode(), n21 = centeredHorizontal(sw, se), n22 = se.centeredSubnode() ; return new Node( new Node(n00, n01, n10, n11).nextGeneration(), new Node(n01, n02, n11, n12).nextGeneration(), new Node(n10, n11, n20, n21).nextGeneration(), new Node(n11, n12, n21, n22).nextGeneration()); } }
So far, my algorithm is not any better. It takes more memory and is slower. It generates a lot of new nodes that need to be managed or garbage collected. It has a somewhat complicated recursion. I fix all of these issues in two stepscanonicalization of the nodes, and "memoization."
The quadtree data structure (see Figure 5) takes significantly more space than a simple bitmap (although only by a constant factor); for a tree representing a 256×256 universe, the number of 0 leaves plus the number of 1 leaves is 65,536, and there are 16,384+ 4096+1024+256+64+16+4+1 or 21,845 nonleaf nodes. Each node represents a constant, immutable bitmap, so there is no need to have distinct nodes that share the same value. All of the 1 leaves can be represented by a single canonical 1 leaf; similarly, all nodes whose northwest quadrant is a 1 leaf and whose other three quadrants are 0 leaves can be represented by a single canonical instance of this node. This canonicalization takes place all the way to the root. Canonicalizing the nodes requires a simple hash set with the usual recurrence on trees; once the nodes are canonicalized, the value of a node is completely represented (including for comparison) by the pointer to that node, so the hash function of a node can be as simple as some mixing of the addresses of the four subnodes. This canonicalization step is similar to what the Java String intern
function does and represents one advantage of immutable structures.
Figure 5: Another Methuselah.
By itself, this canonicalization step is a powerful form of compression of space. All empty areas of the universe compress into a single small set of nodes. Common life forms, such as gliders and blocks, squeeze into commonly shared nodes. Indeed, for some life forms being constructed now, a serialized form of this quadtree provides the only reasonable storage format, as the forms are simply too large to be distributed in the run-length-encoded format in common use.
Further, the hash set providing the canonicalization can be trivially garbage collected, even in languages like C++, when desired (assuming the hash set stores pointers); simply create a new empty hash set, swap it with the current canonicalized hash set, recanonicalize from the current root, then delete all nodes that are in the old hash set but not the new one.
The canonicalization is implemented by the CanonicalTreeNode
class, where I provide the necessary infrastructure for hashing and override the factory methods responsible for creating every node instance.