Programmers have used the Cyclic Redundance Check (CRC) algorithm for years to uncover errors in a data transmission. It turns out that you can also use CRCs to correct a single-bit error in any transmission. I first heard about error correcting CRCs in a conversation I had several years ago [1]. At the time, I thought this feature of CRCs was general knowledge, but as I did more research, I saw no mention of CRC error correction in the popular literature. The traditional response to a CRC error is re-transmission. However, the advance of computer technology has led to some situations where it is actually preferable to correct single-bit errors rather than to resend. Some examples include:
- Satellite transmission -- If a host is sending data via a satellite, the cost of sending a regular packet is high, so the cost of a resend just doubles the price for the packet.
- High-speed transmission -- In the future, there may be a tendency to push the technology. (Let's crank this baby up and see what it will do.)The faster bits move through a medium, the higher the probability of error.
- PowerLine Carriers -- Metricom Corporation, a supplier of integrated circuits for computer applications states, "There is a growing interest in the use of PowerLine Carrier (PLC) for data communication using the intrabuilding electric power distribution circuits. Power lines were not designed for data communications and exhibit highly variable levels of impedance, signal attenuation and noise... Harmful effects of impulse noise on data communications systems can be expected." [2].
Error Correcting CRCs
The algorithm for error correcting CRCs involves determining the remainder after dividing in binary (modulo 2). The algorithm is table driven. The length of the message is determined, a priori, and agreed to by both the sender and the receiver. A generator polynomial value that can be used to determine the position of errors anywhere in the message is also selected at that time.
Before a message is sent, a checksum is calculated and appended onto the end of the message. The message and checksum are then sent. The receiver gets the message and calculates a second checksum (of both parts). If the new checksum value is 0 then the message is considered valid. If not, the checksum value is used as a subscript in an error correction table to determine the position of the bad bit. This method will find and correct 1-bit errors. The receiver builds the error correction table using the chosen Generator Polynomial (GP). The GP has to be carefully selected, because different GPs have different characteristics. I discuss the selection of an error correcting GP later in this article.
Building the Tables
To build the error correction table, I begin with a series of 0s representing correct data. One of the bits is then set to 1. Using modulo 2 division (exclusive-or), the receiver divides the message by the GP, calculating the remainder that is used as a subscript in the error correction table. See Algorithm 1.
Algorithm 1:
for (i = 1; i<=Message_Length; i++) { set all bits in the Message to 0 change the i'th bit to 1 calculate the checksum (cs) EC_table[cs] = i }You only have to build this table once for each generator polynomial.
Building the EC Table for 1011
In the example that follows, I have chosen a 4-bit generator polynomial. With 4-bit GPs, only two GP values can be used for error correction--decimal values 11 and 13. I chose 11. When one divides the message by the 4-bit GP, the result is a 3-bit remainder, yielding values from 0 to 7. This result implies that I can use this GP for a message with a total length from 1 to 7 bits. (Four bits for the message and 3 bits for the checksum). The result of the calculation of the checksums is shown in Table 1. At the bottom of the table are the decimal values of the remainders after the division.
These values are then used to fill the Error Correction (EC) table. The EC table, in checksum order, is shown in Table 2. You could initialize an array in C with these values. Usually, EC[0] is not used.
int EC[] = {0,7,6,4,5,1,3,2};This process makes up the test for error correcting generator polynomial validity. For a given number, if the entire table cannot be built (i.e., if two or more numbers map to 1 slot), the number chosen cannot be used as an error correcting generator polynomial. In practice, when data is sent, the sender computes the checksum (cs) and then appends the checksum onto the end of the data. When the receiver receives the message combined with the checksum, the receiver computes another checksum (cs2). If the second checksum is zero, the data is (supposedly) ok. If cs2 is NOT zero, EC[cs2] contains the location of the bit in error.
Sending a Message with GP = 1011
Start with the following generator polynomial: 1011. Assume the Original Data is 1100. First append 3 additional bits (with value 000) on the end. Then calculate the checksum (using exclusive-or):
1100000 1011 ---- 0111000 1011 ---- 1010 1011 ---- 010The resulting checksum is: 010 and the string that is sent is the original data appended with the checksum: 1100010. Suppose the following string of bits is received 1101010. (Numbering from left to right, bit 4 has changed.) The receiver computes the checksum.
1101010 1011 ---- 01100 1011 ---- 1111 1011 ---- 1000 1011 ---- 011The checksum is binary 011, which is 3 in decimal. EC[3] is 4, which is the position of the bad bit! (Refer to Table 2) You can use this scheme for much longer messages, but the preceeding example demonstrates the essence of the algorithm. I wrote a program that determines whether a given bit pattern can be used as an error correcting GP (see Listing 1). None of the existing widely used GPs work for error correction (see the sidebar titled "Generator Polynomials").
There are ways of finding the bad bit without using tables. The following code is a slight modification of an algorithm presented by Fred Halsall [4] for computing an 8-bit CRC.
Algorithm 2:
hpo2 = 8; v = 1; for (i = hpo2-1; i>=1; i--) { if (v == checksum) bad_bit_position = i; v = v + v; if (v >= hpo2) v = v ^ GP; }If you have a lot of packets to check, table lookup is faster.
Using a Finite State Machine
Intrigued by the idea of CRC error correction, I looked at the method from different angles, and, as a result, I discovered a method for error correcting CRCs using finite state methods. Given the following generator polynomial: 11 (1011 in binary), I first determine the value of the left-most 1 bit (8) in the GP. I call it hpo2, and it is equal to the highest power of 2 in the GP. Then I build a Finite State Table (FST) for GP = 1011. This table represents states of the machine. The table has hpo2-1 rows and 2 columns (0 and 1). The procedure for building an FST is as follows: Let t equal the current row number. The row number represents the current state. Double it. (t = t * 2) Add the current column number (zero or one) to t. If the value of t is >= hpo2, then exclusive-or it with the GP. Place the value of t into the current row and current column of the table. The following C code computes the values in the FST.
Algorithm 3:
for (r = 0; r < hpo2; r++) for (c = 0; c < 2; c++) { t = r*2 + c; if (t >= hpo2) t ^= GP; FST[r][c] = t; }For GP = 1011, this algorithm produces the finite state table shown in Table 3.
Both the sender and the receiver must build the FST. Only the receiver builds the error correction table and does so using the following code:
Algorithm 4:
pos = 7; cc = 1; for (r = 0; r < hpo2; r++) { EC[cc] = pos; pos--; cc *= 2; if (cc >= hpo2) cc ^= GP; }The result is the CRC error correction table for 1011 and is equivalent to Table 2.
Correcting a 1-bit Error Using an FST
An example sequence of sending a message, losing a bit, then recovering the bad bit follows:
Algorithm 5:
Sender Given an original message, say 1100 Append zeros for the checksum on the end: 1100000. Traverse the FST... cs = 0; for (i=1; i < hpo2; i++) cs = FST[cs][msg[i]]; ... giving a 2 (010 in binary) for the checksum (cs). Send the message with a binary 2 appended: 1100 010 Receiver Receive a message with 1 bit changed (say bit 4): 1101 010 Traverse the FST to get a value (cs2) of 3 if cs2 = 0 then the message is ok else EC[3] = 4, therefore bit 4 is bad Correct the bad bit: 1100 010 Discard the checksum: 1100
A Sample Program
Listing 2 is a sample program that builds the tables, generates a random string, and changes a random bit. Part 2 of the program determines the position of the changed bit and corrects it. (The code is written with the message and checksum in an array of int. This approach demonstrates the logic while making the code much simpler to read.)
References
[1] Conversation with Maartin van Sway, now Professor Emeritus at Kansas State University.
[2] Direct quote from "http://www.metricom-corp.com/fec.html"
[3] Andrew S. Tanenbaum, Computer Networks, Prentice-Hall. 1996.
[4] Fred Halsall Data Communications, Computer Networks and Open Systems, Addison-Wesley, 1996, (p. 137).
About the Author
Bill McDaniel received his Ph.D. from Kansas State University and is currently the Chair of the Department of Computer Science at the University of Central Oklahoma. His interests include networking, encryption, CGI programming, and operating systems. He is the author of the article "Efficiently Sorting Linked Lists," which appeared in the June 1999 issue of Dr. Dobbs Journal.