Calculating CRC Checksums in C++
Colin Mahoney
Here's a handy template that computes a variety of CRC checksums.
A while back I was asked to produce some C++ code to calculate CRC (cyclic redundancy code) checksums. I was given a copy of Ross Williams' article, "A Painless Guide to CRC Error Detection Algorithms" [1], which includes a detailed description of the basic CRC algorithm, its optimization using tables and other means, and information about various CRC protocols. The article describes a parameterized model for CRC algorithms called the "Rocksoft Model CRC Algorithm." The article includes C source code for an implementation of the model. The model seemed to be made to measure for an implementation in C++ using templates. In this article I describe CRCGenerator, a C++ class which I produced based on Ross's original C code. Readers who want a detailed description of CRC algorithms should refer to Ross's original article.
CRCs The Basics
CRCs create hash values for a block of data. These values can be used to check for file integrity and transmission errors in streams. They should not be used to check for malicious tampering with data, as it is quite easy to change the original data in a way that conserves the original CRC value.
A CRC checksum is obtained by treating the input data as a very long binary number, and dividing it by a shorter number usually 16 or 32 bits. The remainder of this division is the CRC value. The quotient is not used and thus does not have to be recorded. The division is carried out modulo two without carries, and can be calculated efficiently using shifts and XORs (exclusive ORs). The divisor is known as the "polynomial," for reasons related to the mathematical theory underlying such calculations. CRC checksums have a useful property: if the checksum is appended to the original data, the new checksum for <data + original checksum> is zero. This makes testing for transmission errors very easy; the checksum can be transmitted after the data, and the receiver has only to check that the checksum for all the data received is zero.
The CRCGenerator Implementation
CRCGenerator (Figure 1 a partial listing of crc.h) uses a nested table class, CRCTable, to do most of the work of calculating the checksum. CRCTable has one data member, Table, and three member functions a constructor, a subscripting operator, and CalcTableEntry. The latter calculates the values for the elements of the table, and is called repeatedly from the constructor to fill the table. CRCGenerator has a static data member Table, of type CRCTable. The unsigned long member Register holds the current value of the CRC calculation. It is initialized to the value of the template parameter Init. The code uses template parameters in conditional expressions to select the appropriate element of Table when processing data. This will produce a lot of "conditional expression is constant" style messages on compilation. You may want to disable this warning when compiling crc.h.
CRCGenerator is defined in the namespace cdm_crc. Also defined in this namespace are CRCMaxBits, which defines the maximum number of bits that can be held in an unsigned long, and an insertion operator to write the CRC value to an ostream.
The Rocksoft Model
The Rocksoft model uses seven parameters to describe each CRC protocol. Six of these make natural template parameters in C++, while the seventh is a check value used to validate implementations of the algorithm. I wrote the CRCGenerator class to use just five of the Rocksoft parameters as well as one additional parameter. The declaration of the CRCGenerator class is:
template < int Width, unsigned long Poly, unsigned long Init, unsigned long XOrOut, bool Ref, bool Direct=true > class CRCGenerator;
If you want to calculate CRC-16 checksums, you can typedef the type:
typedef CRCGenerator < 16, 0x8005, 0, 0, true > CRC16;
The template parameters are as follows.
Width
This parameter describes the width of the polynomial used to calculate the checksum. It is equal to the highest exponent of x used in the polynomial representation of the divisor. CRCGenerator works with polynomials of widths from 8 to 32 bits.
Poly
This parameter is the value representing the polynomial used to calculate the CRC, but not including the most significant bit. This value should have Width bits. The ability of a CRC algorithm to detect errors depends to a large part on the mathematical properties of the polynomial chosen. For this reason it is not really a good idea to "invent" your own polynomial unless you really know what you're doing you should use one that has been tried and tested.
Init
This is the value used to initialize the CRC value, as given by the CRC protocol in question. Common values are 0, as in CRC-16, and 0xFFFFFFFF, as in CRC-32.
XOrOut
This is the value to be XORed with the final result of the CRC calculation to obtain the CRC value for the data. If XOrOut is non-zero, the CRCGenerator function GetNormalCRC will not return zero after processing <data+ original checksum>.
Ref
Some CRC protocols process bytes least-significant bit first the bytes are reflected. Ref controls whether the input bytes should be reflected before being processed, and whether the final CRC value should be reflected before it is read. The Rocksoft model uses separate parameters for input and output, but my implementation uses a single value for both.
Direct
When Direct is true (the default value) each byte processed is passed immediately through Register that is, there are no bytes sitting in Register waiting to be propagated out by the following data. The effect is as if each byte processed had been pushed through Register by Width zero bits. Any actual data that follows is superimposed on top of these zero bits and then processed in the same way.
The value that has been fed into Register by the end of the calculation is equal to the total input data augmented by Width zero bits. This is the required behavior for most CRC alorithms. For some algorithms, however, the bytes processed should be placed at the bottom of the register then pushed through it by the following data. You can get this behavior by setting the Direct parameter to false. If you have Direct set to false and you want to force all the data through the register, you can combine calls to Process and ProcessBits to pass Width zero bits through the register after processing all your data.
The "Check" parameter in the Rocksoft Model is the value that should be obtained when the string "123456789" is processed.
The CRCGenerator Interface
The public interface for CRCGenerator is shown below:
CRCGenerator(); unsigned long GetCRC() const; unsigned long GetNormalCRC() const; bool GetDirect() const; bool GetReflect() const; bool GetWidth() const; void LoadRegister ( unsigned long value ); void Process(unsigned char byte); void Process ( unsigned char* block, int block_size ); template<class InIter> void Process(InIter first, InIter last); void ProcessBits ( unsigned char bits, int count ); void Reset(); void Write(ostream& os) const;
A synopsis of these functions follows.
CRCGenerator
Only a default constructor is required since all the necessary information is given in the template parameters.
GetCRC
This function returns the value in the CRC register after XORing with XOrOut. If the Ref parameter is true this value is bitwise reflected. If you want to feed this value into the CRC calculation, you should do so least significant byte first you don't need to worry about the bits, as the calculation takes care of reflecting them. If Ref is false the bytes are correctly ordered. The value is returned in the bottom Width bits of the return value.
GetNormalCRC
This function reflects the bytes in the CRC register if Ref is true, and returns the CRC value in the most significant bytes of the return value. The value returned from GetNormalCRC can be fed into the calculation most-significant byte first, irrespective of whether Ref is true or false.
GetDirect, GetReflect, GetWidth
These functions return the values of the Direct, Ref, and Width parameters respectively. These may be useful for shifting or formatting the value returned by GetCRC or GetNormalCRC.
LoadRegister
This function can be called to XOR a value into the register. The value is reflected as necessary. This function should not be required for most calculations.
Process
The three Process functions allow data to be fed into the calculation either one byte at a time or in blocks. An alternative to the version using member templates is provided; it is commented out and can be used if your compiler doesn't support member templates.
ProcessBits
Processes count bits from bits. If Ref is false they are read in least-significant bit first; if Ref is true, most-significant bit first. Unless you want to do something unusual with CRCs having widths that are not integral numbers of bytes you probably won't need to use this function.
Reset
This function sets the CRC register value to the value of the Init parameter.
Write
Writes the CRC value to a stream, taking Ref into account. Write can be used to append the CRC value to a file stream. The file will then produce a CRC value of zero if read in and processed through CRCGenerator.
An Example: CRC-16
I have already shown the typedef used by the standard CRC-16 protocol. The following code runs the test case for the Rocksoft model:
typedef CRCGenerator < 16, 0x8005UL, 0UL, 0UL, true > CRC16; unsigned char data[] = {'1','2','3', '4','5','6','7', '8','9'}; CRC16 crc16; crc16.Process(data, 9); unsigned long crc_val; crc_val=crc16.GetNormalCRC();
Figure 2 shows complete code to calculate and print CRC-16 checksums for a list of files specified on the command line. Note the call to infile.unsetf(ios_base::skipws), which is necessary because istream_iterator<> skips whitespace by default.
Conclusion
The above examples demonstrate the convenience offered by C++ template classes. The template mechanism takes care of the creation and initialization of the static look-up table, allowing the class user to write code uncluttered by global variables or calls to initialization functions. My intention in writing the CRCGenerator class was to provide efficiency, ease of use, and flexiblity. In as much as this has been achieved, it is because C++ has made these often conflicting goals easily attainable.
Note
[1] Available online at Ross's web site at http://www.ross.net/crc/.
Colin Mahoney teaches English and is an independent developer of software for language learners in Sabadell, Spain. He can be contacted at [email protected].