Jerry Dwyer is a Professor of Economics at Clemson University with a Ph.D. from the University of Chicago. He primarily writes programs for statistical analyses and for real-time experiments. He has been programming for 20 years, and he has been writing in C for 10 years. He can be reached at [email protected].
Introduction
A computer-generated random number is a contradiction in terms. Random numbers are unpredictable and computers are perfectly predictable. Computers must do the same thing whenever they are in the same state. Typical random-number generators iterate a function to get a sequence of numbers. No matter how seemingly arbitrary, a sequence of numbers produced by a computer is perfectly predictable; it is not random. Hence the common term "pseudo-random numbers" arises for what I for brevity will call "random numbers."
In addition to not being random, random-number generators on computers are not trivial to implement correctly. In the process of doing some simulations, I spent quite a bit of time reading fairly technical material. In this article, I will explain the most common random-number generators. You will see some 32-bit random number generators that you can easily and comfortably include in your own code for any ANSI C compiler. The routines only assume that the maximum signed integer is at least 231-1, which is required by ANSI C. Along the way, you will see how to use approximate factoring to avoid overflow in modular arithmetic.
Lehmer Generators
The most commonly used random-number generator is what I will call the "Lehmer generator." Suggested by D.H. Lehmer in 1951, the "multiplicative congruential random-number generator" is based on the recursion:
x(t+1) = (a*x(t)) mod mwhere x(t) is the value at iteration t used to generate the next value x(t+1), a is a multiplier, and mod is the modulo function. If a*x is positive or zero and m is positive, the modulo function generates the same value as the remainder operator (%) in C. If m is positive, the modulo function is the residue
a*x - RES(a*x/m)*mwhere RES(a*x/m) is notation representing the largest integer less than or equal to a*x/m.
For evaluating random-number generators, the relationship between consecutive values is sometimes informative. Figure 1 shows graphs of x(t+1) against x(t) for integers, where x(t) can be any integer from one to six. Figure 1(a) shows the graph for a series of random numbers. It makes sense to think of a series of true random numbers as being able to take on any possible set of values. In other words, true random numbers fill up the space. In addition, it makes sense to require that the values occur with equal probability. In other words, true random numbers have a uniform distribution.
The Lehmer generator produces a nonrandom structure in two or more dimensions. For example, suppose the generator is:
x(t+1) = (5 * x(t)) mod 7The full set of values in order is 1, 5, 4, 6, 2, 3, 1, ... This sequence generates all of the values from 1 to 6. This sequence does not, however, begin to fill up a two-dimensional space. Figure 1(b) shows the values of x(t) and x(t+1) that occur. Only a few combinations actually occur, and they fall on a couple of lines. When triplets of consecutive values are plotted instead of just pairs as in Figure 1, the combinations of values fall on planes. In higher dimensions, the values continue to fall on higher-dimensional planes.
This particular structure is an inevitable consequence of the Lehmer generator. The structure does not mean that the Lehmer generator is worthless but it is suspect for an application sensitive to correlations between consecutive pseudo-random numbers. Other random-number generators have different, but discoverable, structures. For this random-number generator, the values of the multiplier and the modulus determine the number and location of the planes. The issue becomes: what values of the multiplier and the modulus generate reasonably good random numbers for a given purpose?
In addition to this particular structure of Lehmer generators, they (and any others on a computer) are repetitive. With a multiplier of 5 and a modulus of 7, the Lehmer generator has a period of 6: the sequence repeats after 6 iterations. Repetition is not peculiar to Lehmer generators. No matter what iterative function is used, a computer has a finite number of states. Sooner or later a sequence must repeat because the function iterates into a state that it has seen before.
It is fairly common to soup up the Lehmer generator by adding an increment to the product a*x(t). This increased complexity has one advantage. The Lehmer generator cannot produce a random number equal to zero. If it did, every succeeding number also would equal zero. Adding an increment makes it possible to get a random number equal to zero, thereby increasing the maximum possible period by one. This is not a compelling reason for the additional complexity, however.
Good Lehmer Generators
Even though random-number generators eventually repeat and have some discoverable structure of sequential values, there are better and worse generators. All other things being equal, longer periods are better than shorter ones. As I mentioned above, the generator:
x(t+1) = (5*x(t)) mod 7has only 6 possible values of the random numbers, and a period of 6, which is too short for most purposes. Periods on the order of 215 or 32,768, are too short for marginally serious computations, and at least some compiler vendors seem to agree. For example, Borland C++3.1 and Microsoft Visual C 1.0 both have 32-bit random-number generators with longer periods. (However, I still consider these generators unappealing, as their periods are shorter than those in this article. Besides, these compilers do not use the best known multipliers and they rely on non-portable integer overflow.)
The period before repetition is not the only consideration. In any application, you use only a small fraction of the total period. As a result, the seeming randomness of partial sequences of the random numbers is important. You know that the "random numbers" aren't random! It is important, though, that the numbers appear to be random from the viewpoint of what you're doing with them. For example, the predictability of new values of the random numbers given part of the sequence but not the multiplier or the modulus is important for cryptography. Lehmer generators are not good for this purpose because they are predictable by algorithms that know the numbers are produced by a Lehmer generator.
In order to get long periods, I consider only generators of 32-bit integers. There are many possible values for the multiplier and modulus. What are some good values? The modulus has a big impact on the calculations, so it is a natural place to start limiting the search.
One natural value for the modulus is 232. Why? Assume that multiplication is defined for 32-bit signed integers. Consider the line of pseudocode
x <- a *xThe maximum possible value of x is 2311. If signed integer overflow ignores the high bits (a standard fixup) and the sign bit is masked, this code can implement a Lehmer generator with a modulus of 232. Signed integer overflow need not have this standard fixup, however. In ANSI C, integer overflow is an exception and the behavior of the program is undefined. [You can used unsigned integer arithmetic, but often at a notable cost in performance. pjp] Furthermore, the effect of this code depends on a machine's representation of integers. You can lose portability across machines, languages, possibly compilers, and even versions of a compiler. It is easy to spend more time cleaning up such incompatibilities than ever is saved by the faster function.
Another common value of the modulus is 2311, or 2147483647. Why this seemingly odd value? It is prime, which affects the maximum period. In addition, it is the maximum value of signed integers on many machines in languages with only signed integers, such as BASIC, FORTRAN, and Pascal.
At first glance surprisingly, the generator with the larger modulus has about half the period. For the modulus 232, if the initial value is odd, then the period can be as long as 230. For the modulus 2311, the period can be as long as 2312. All of the multipliers in this article for this modulus have the maximum period, and the program that combines generators has a period not less than about 261. No need to settle for a shorter period when there are lots of long-period Lehmer generators.
What are some good values for multipliers? It might seem desirable to check on your own for good multipliers, but there is no point if you are simply writing a program that uses a random-number generator. Others have exhaustively studied both of these multipliers and others have worked on ways to write portable 32-bit generators on machines with 32-bit signed integers.
Portable Generators
It is easy to write a 32-bit routine with a modulus of 2311 that does not generate integer overflow and is portable. The basic problem is integer overflow: the product of the multiplier and the last value of the random number can be larger than a 32-bit integer. As the sidebar on approximate factoring indicates, Linus Schrage has worked out a fast way to take the modulus of a product by approximately factoring the modulus.
George Fishman and Louis Moore [1] have thoroughly studied Lehmer generators with a modulus of 2311. All of their best multipliers are too large for approximate factoring. Nonetheless, several multipliers are small enough and are almost as good by the same criteria. Table 1 shows, in rough order of preference, four good multipliers for this modulus. Among the values is the multiplier 16,807, used by many people over the last 25 years.
How can you check whether you have written the algorithm correctly? Testing the apparent randomness of the numbers from the viewpoint of your application is important for being sure that the generator is suitable. This does not check your program though. The most straightforward way to check your program is to see whether you get the correct values. In Table 1, I provide the values that I get in Mathematica and Borland C in DOS and OS/2 for the generators on the 10,000th draw starting from an initial value of one.
If there is an error in your program, the probability that the same seed will generate the correct value on the 10,000th draw is virtually zero. Because of the nonlinear relationship between consecutive values, as illustrated in Figure 1, by the 10,000th iteration of your incorrect function the likelihood that you will get the correct value probably is on the order of one over the modulus. Listing 1 shows the file rand.ma, which is a program in Mathematica that generates the 10,000th value. For me, this is a convenient check on the results from C code at the bottom of Listing 2.
Listing 2, the file rand_por.c, is an implementation in C of this portable routine for a particular multiplier. The initialization code may overflow in some compilers, but the rest of the code is portable. Whenever there is a choice between readability and speed in this code, I opt for readability. The most important consequence of emphasizing readability is the implementation of the generator as a function: genr_ rand_port.
The routines have four basic external functions. Two functions, init_rand and rand_port, are similar to the Standard C functions srand and rand. They behave as you would expect. The additional routines are get_init_rand and rand_rect_port. You can call get_init_rand to get a seed to pass to init_rand.
If you call init_rand with an incorrect value, say 0 or 1 (accidentally or on purpose), init_rand_port calls get_init_rand_port to get a seed. The return from init_rand_port provides the user with the actual seed used, important information for rerunning the generator and getting the same sequence of random numbers. Because a common correct seed is the value one, the routine throws away some initial values to get a more arbitrary first random number than the result of the first few calls after this seed: the multiplier and powers of it. Throwing away exactly 16 values is arbitrary.
The routine rand_rect_port in Listing 2 correctly generates a value between zero and one that has a uniform (also known as rectangular) distribution between zero and one. Because many uses of uniformly distributed values assume that the value is not exactly zero, the routine does not generate either zero or one.
Often it is desirable to generate non-overlapping sequences of random numbers, for example in Monte Carlo studies. Listing 2 includes the routine skip_ahead to calculate the random number that is skip values ahead of the value init_rand. From an initial random number of say 3, you can calculate the value of the random number that is generated by rand_port 4,000 or 1,000,000 iterations later.
This routine is quite a bit faster than computing the intermediate values. Doing this computation without overflow requires care because the desired output is:
(a^skip * init_rand) mod mfor all values of skip less than the modulus m. The routine skip_ahead and the called routine mult_mod use the Russian-peasant algorithm [4] and approximate factoring to avoid overflow.
Economists have a standard saying, "There's no such thing as a free lunch," and that's true here. The benefits of portability and a longer period have a cost in speed. From my point of view, the cost is not huge. The routine rand_port is not exactly sluggish. It takes about 0.23 seconds to get 100,000 random numbers on my 66 MHz 486 under OS/2, which can be compared to 0.09 seconds using Borland's overflow routine. The performance hit is harder in 16-bit DOS: the portable routine takes 1.7 seconds versus 0.22 seconds for Borland's routine. Even at that price, I prefer the portable routine, which has twice the period and lends itself to skipping ahead.
Combining Generators and Shuffling
While a period of 2312, which is about 2.15 billion, is not exactly short for many purposes, it is not necessarily long enough for extensive computations. Furthermore, a Lehmer generator can produce a particular value only once in a complete cycle, and the planes illustrated in Figure 1 can be a problem for some uses of the random numbers. Two techniques for mitigating these problems and lengthening the period are combining numbers from Lehmer generators and shuffling the numbers from a generator.
Obvious combinations of the numbers from Lehmer generators are the sum and the difference. L'Écuyer [2, 3] suggests a difference between two numbers that effectively obliterates planes such as those in Figure 1. The difference between numbers from Lehmer generators in Listing 3 has a period of about 2.31x1018, or 2.31 billion billion.
Shuffling the output of a random-number generator is another way to mitigate problems and lengthen the period. The basic idea is simple. Rather than take the numbers in sequence from the Lehmer generator, several values are stored and one of them is picked at "random" using the most recent random number. For a particular multiplier and modulus, the actual increase in the period from shuffling the generator is difficult to determine analytically, but this technique cannot worsen the generator. (Oddly enough, using a second Lehmer generator to pick the next value can worsen a generator, one of many pitfalls of random numbers.)
The sequence of numbers is more apparently random because it is not the set of consecutive values. Shuffling is especially useful if you are going to use the random numbers in another routine that can have unfortunate interactions with a Lehmer generator, which some transformations of consecutive values can.
Listing 3, the file rand_com.c, is a set of routines that combines the results of two generators and shuffles the output. Table 1 gives you the values of the 10,000th draws for the two underlying generators and for the combined generator.
There are no routines for skipping ahead in Listing 3. If the results of a random-number generator are shuffled, there is no known way to skip ahead without computing intermediate values. If you are happy with a period of about 2.31 billion billion, though, you can easily modify Listing 3 to use the routines for skipping ahead in Listing 2. Just delete shuffling from the routines in Listing 3 and use the routines for skipping ahead in Listing 2 for the two underlying Lehmer generators in Listing 3.
In summary, if you want a well-behaved and reasonably quick routine to calculate random numbers and simply want values with equal probabilities, I recommend rand_port. If you want a routine with a longer period and more apparently random numbers even though it is slower, I recommend rand_comb with or without shuffling.
References
[1] George S. Fishman and Louis R. Moore III. "An Exhaustive Analysis of Multiplicative Congruential Random-Number Generators with Modulus 2311," SIAM Journal on Scientific and Statistical Computing January: 1986.
[2] Pierre L'Écuyer. "Efficient and Portable Combined Random-Number Generators," Communications of the ACM June: 1988.
[3] Pierre L'Écuyer. "Random Numbers for Simulation," Communications of the ACM October: 1990.
[4] Donald Knuth. The Art of Computer Programming, Vol. 2, "Seminumerical Algorithms," Second Edition (Reading MA: Addison Wesley, 1981), pp 441-443.
Further Reading
If you want to read more about random-number generators, you will want to begin with Chapter 3 of Donald Knuth's classic (listed above). But "classic" is a double-edged sword. Some of his discussion is dated. Bratley, Fox, and Schrage provide a nice overview of the issues. James provides a nice overview of alternative generators. Some references in a rough order of accessibility, are:
Bratley, Paul, Bennett L. Fox and Linus E. Schrage. A Guide to Simulation, Second Edition. New York NY: Springer Verlag, 1987.
James, F. "A Review of Pseudorandom-Number Generators," Computer Physics Communications. October 1990.
Press, William H., et al. Numerical Recipes in C, Second Edition. Cambridge: Cambridge University Press, 1992.
Park, Stephen K., and Keith W. Miller. "Random-Number Generators: Good Ones Are Hard to Find," Communications of the ACM. October 1988.
L'Écuyer, Pierre, and Serge Ct. "Implementing a Random-Number Package with Splitting Facilities," ACM Transactions on Mathematical Software. March 1991.
Fishman, George S. "Multiplicative Congruential Random-Number Generators with Modulus 2b: An Exhaustive Analysis for b=32 and a Partial Analysis for b=48," Mathematics of Computation. January 1990.