Decoding
Decoding of received words is the most challenging part of any system that uses error-correcting codes. Fortunately, an efficient algorithm exists for the decoding of RS codes. Decoding a received word r(x)
amounts to finding the closest codeword, the one that was most likely transmitted. You need to know at which locations the errors occurred and what their magnitudes are.
Since
then
Given that all codewords are a multiple of g(x)
,
holds for any valid codeword c(x).
Since we know that all multiples of g(x)
are codewords, you can infer that a received word r(x)
is erroneous when any of the values is nonzero. This set of 2T
values is actually referred to as the syndrome of the received word r(x), and it can be expressed as the polynomial
It is important to realize that the syndrome contains all the information needed to decode the received word r(x).
Conceptually, there are three stages in the decoding process:
- Compute the syndrome of the received codeword.
- If the syndrome is nonzero, find the number of errors, their locations, and their magnitude.
- Correct the errors.
Stage 1. To compute the syndrome, the decoder could subsequently evaluate r
() through . Unfortunately, this basic syndrome computation cannot be optimized as well as the encoding algorithm. I realized later that, by definition, the syndrome of r(x)
is the same as the syndrome of m(x) = r(x)
modulo g(x).
And this is good, because m(x)
is a much shorter polynomial (of degree up to 2T) than r(x),
which makes computing m
(1) through m
(2T-1) cheap. For calculating m(x)
from r(x)
and g(x),
the decoder uses the same efficient technique as the encoder. Obviously, if m(x)
= 0, then r(x)
is a valid codeword, and there is no need for further decoding. The function calc_syndrome
in codecsup.asm (available electronically) contains the code to compute m(x)
and the syndrome of the received word.
Stage 2. The main question in this stage is: How can we derive the error locations and magnitudes from the syndrome? What is their mathematical relationship? Again, assume that is a primitive element of GF(28). I will define the set B
of error locations as
For each , the corresponding error magnitude is . Given this mathematical representation of error magnitudes and locations, the error locator polynomial, (x),
and the error evaluator polynomial, (x)
of a received word, are defined by Figure 1. The error locations are simply the set of values for x
, where (x)=0,
which is B
. For each error locator , the corresponding error magnitude is shown in Figure 2.
Figure 1: (a) Defining the error-locator polynomial; (b) defining the error-evaluator polynomial.
Figure 2: Error magnitude.
At this point, I'll turn to the key equation for the decoding algorithm. After all, all that is known is the syndrome S(x).
Recall that T
is the error-correcting capability of the code and the degree of the generator polynomial is 2T
. The key decoding equation is:
It is interesting that a variation of the ancient algorithm discovered around 300 b.c. by Euclid can be used to determine (x
) and (x
) from the aforementioned equation. I am referring to the extended algorithm for finding the greatest-common divisor of two polynomials. In this case, it is applied to the two polynomials x
2T
and S(x)
until the degree of the resulting polynomial falls below T
.
If, at this stage, the degree of S(x)
is greater than T
, it can be concluded that too many errors occurred, and any attempt to correct them will be utterly fruitless. The decoder will return an error code. Now that (x
) and (x
) are known, the next step is to find B
, the set of error locations. There is a shortcut for the case where the received codeword has only 1 error, which implies that (x
) has degree 1. Then, (x
) can be written as ax+b
. This polynomial is zero when x=b/a
and B={b/a}.
When the degree of (x
) is greater than 1, all nonzero elements of GF(28)
are plugged into (x
) to find all its zeros. This stage is actually performed by an assembly language routine because assembly offers optimization opportunities in this kind of computation. Examine codecsup.asm to see how short the code is (at label evalpoly
, in the function find_roots
) that evaluates (x
) in each iteration! If, at this stage, the number of zeros found does not match the degree of (x
), the decoder returns an error code because too many errors must have occurred. Finally, for each , is computed using the error magnitude formula defined above.
Stage 3. At this stage, you are finally able to enjoy the fruits of your labor, as the algorithm seems to apply the right corrections at the right places in the received symbol sequence to recover the original codeword! By definition, each is equal to , where loc
is the error location. loc
can be expressed as . The correction of the error at loc
is done by adding (the error magnitude) to the loc
th symbol in the received word. This translates to the following simple statement in C data[loc] ^= magn
.
The program decode.c (available electronically at the top of the first page of this article) contains the function RSG_decode
that integrates stages 1, 2, and 3 of the decoding process. The program codecsup.asm contains the assembly function find_roots
that searches for all zeros of (x
).
Main Program
The main program (rsmain.c, Listing Three) lets you play with the Reed-Solomon error-correction functions that I have discussed. When you run the program it sets up the arrays for GF(28) arithmetic and computes the generator polynomial. Then, it asks you for an original string. This string is padded with zeros to create an array of K
symbols. This array is encoded to form the original codeword. Next, it asks you for a string with up to T
symbol errors. The program now overwrites the first K
symbols of the original codeword with the second string and executes the decoding algorithm. After this, the process will repeat itself. To exit the program, just enter an empty string at any of the prompts.
Listing Three
/****************************************************** * File: rsmain.c -- main function: repeatedly asks for original string and * string with errors and tries to recover original string from string with * errors using Reed-Solomon decoding * Copyright (c) 1995 by Hugo Lyppens with permission to print in DDJ ******************************************************/ #include "hdr.h" #include <conio.h> #include "rs.h" main(argc, argv) int argc; char *argv[]; { char str[N], str2[N]; int r; RSG_ConstructGaloisField(); RSG_CalcGeneratorPoly(); for(;;) { printf("Enter original string or enter empty string to quit:\n"); memset(str, 0, N); str2[0] = 0; gets(str); if(!str[0]) break; RSG_encode((UBYTE *)str); printf("Enter string with up to %d symbol errors or enter empty string to quit:\n", T); gets(str2); if(!str2[0]) break; strncpy(str, str2, K); r = RSG_decode((UBYTE *)str); if(r < 0) { printf("Decoding error -- too many errors!\n"); } else { printf("Decoding OK, recovered '%s' from '%s' by correcting %d errors!\n", str, str2, r); } } return(0); }
The program generates a lot of diagnostic output to let you follow the decoding process. In Figure 3, the strings "Coding theory is fun!" and "C0ding th.ory is f&n?" are entered by you.
Figure 3: Program output.
Generator polynomial: 54 + 01x^1 + F3x^2 + 6Cx^3 + 1Bx^4 + 5Fx^5 + 62x^6 + D4x^7 + B2x^8 + CFx^9 + 1Ex^10 + 72x^11 + BAx^12 + F4x^13 + E7x^14 + 81x^15 + 01x^16 Enter original string or enter empty string to quit: Coding theory is fun! Enter string with up to 8 symbol errors or enter empty string to quit: C0ding th.ory is f&n? Syndrome 0: 59 Syndrome 1: 8d Syndrome 2: 5d Syndrome 3: 4d Syndrome 4: 5 Syndrome 5: bf Syndrome 6: ae Syndrome 7: 5c Syndrome 8: 18 Syndrome 9: ad Syndrome 10: 6b Syndrome 11: b4 Syndrome 12: c9 Syndrome 13: c3 Syndrome 14: e6 Syndrome 15: fe Degree of err. loc. polynomial: 4 Error locator polynomial: CC + B8x^1 + 05x^2 + 21x^3 + 3Cx^4 Error evaluator polynomial: 5C + 2Fx^1 + 4Dx^2 + ADx^3 number of roots of elp: 4 correcting error at location 20 of magnitude 1E correcting error at location 9 of magnitude 4B correcting error at location 1 of magnitude 5F correcting error at location 18 of magnitude 53 Decoding OK, recovered `Coding theory is fun!' from `C0ding th.ory is f&n?' by correcting 4 errors!
Although this program only lets you test cases where the information symbols (the first K
symbols of each codeword) are affected by errors, the algorithm also works if some or all of the errors occur in the redundancy part of a codeword.
Conclusion
Reed-Solomon codes are extremely efficient because they only need 2T
redundancy symbols to achieve an error correction capability of up to T
symbol errors. They are also fairly easy to decode. As demonstrated, a Reed-Solomon code with the right parameters lends itself extremely well to implementation in software on the Intel 32-bit x86 and other computer architectures. As my positive experiences with the Amiga Video Backup System suggest, software Reed-Solomon implementations could be used to build a wide variety of systems (especially low-volume or custom applications) which wouldn't otherwise be feasible because of the high cost of using hardware-error correction.
References
van Tilborg, H.C.A. Coding Theory: A First Course. Eindhoven Technical University, 1993.
Hoffman, D.G., D.A. Leonard, C.C. Lindner, K.T. Phelps, C.A. Rodger, and J.R. Wall. Coding Theory: The Essentials. Auburn, AL: Marcel Dekker. 1991.
Hugo is a programmer/analyst at Goldman, Sachs & Company. He can be reached at [email protected] or at http://www.stack.urc.tue.nl/~hugo.
Dr. Dobb's Journal January 1997: A Complete Reed-Solomon Encoding and Decoding Example