Preparing for the inevitable
Arch is the lead developer for KAI C++. He cut his 64-bit teeth on the nCUBE-2 processor in 1990. He can be reached at [email protected].
Fabrication advances are causing most processors to move from 32-bit words to 64-bit words. So what good are 64-bit machines? One advantage is that they can address more than 4 GB of memory. Another is that they can operate on 64 bits of data at once. The data need not be a single 64-bit integer; it could be eight characters. Listing One shows an example implementation of the C function strlen, which returns the offset of the first 0 byte in a string. It is tailored to Big-endian 64-bit machines. The gimmick is this line:
b = ((w&~m) + ((w&m)>>7)) & m;
that looks for the first "all ones" byte in w. It does this by adding the high-order bit in each byte to the 7 low-order bits, and looking for a carry into the high-order bit. Hence 8 bytes are processed in parallel on each loop iteration. Once it has found such a carry, the code performs a series of shift and mask operations to identify the leftmost such carry; see the accompanying text box entitled "Formula for Repetitive Constants."
The disadvantage of the technique is the extra startup overhead for alignment considerations, and the fact that it needs more instructions per loop iteration. Thus, for short strings and 32-bit processors (which yield only four-way parallelism), it rarely pays off. Furthermore, some newer processors have multimedia instructions for parallel operations on 64 or even 128 bits without the need for the carry trickery in the example.
The choice of strlen is for illustrative purposes; most modern compilers and hardware optimize the native version of strlen so well that the parallel version may well be slower on your system these days.
As you might expect, there are a number of issues involved in migrating from 32 to 64 bits. In this article, I'll describe some of the migration issues that I have found to be significant.
Java solves the portability problem by creating a virtual machine model everywhere that promises (or at least advertises) identical results for sequential programs. (Parallel programs are often naturally nondeterministic.) Doing so dumbs all machines down to the lowest common denominator. Java's Procrustean approach also prevents programs from growing with their machines. For example, Java limits arrays to about 2 billion elements. This makes sense on 32-bit toaster ovens, but is a limitation on 64-bit machines.
There are many theoretically possible problems in the migration from 32 to 64 bits. I'll stick to the ones that I've seen happen in practice, starting with the common language-independent issues, and then moving on to those peculiar to C/C++.
Perhaps the biggest hassle with migration from 32-bit to 64-bit platforms is not the details of coding; it's the fact that 64-bit binaries won't run on 32-bit platforms. Thus, to exploit the 64-bit capabilities, you'll end up shipping both 32-bit binaries and 64-bit binaries, which drives up production and testing costs. I know of no panacea for this. It's a cost of progress.
Layout Issues
Obviously, 64-bit addresses and integers take up twice as much space as their 32-bit counterparts. The doubling can slow a program down; half as many integers or pointers can fit in cache at once. Thus bigger is not necessarily better.
The size of a structure is affected not only by the sizes of its fields, but also by their alignment constraints. On 64-bit machines, the alignment of long may be stricter than that of int or float: it may be required to reside at a multiple of 64 bits instead of 32 bits, and likewise for type double. Consequently, structures containing type double may grow about 33 percent larger than their size on a 32-bit machine, even though the sizes of the fields are unchanged, because the compiler must insert more padding to satisfy the alignment requirements. For instance:
struct Foo {
    double x;  // 8 bytes
    bool y;    // 1 byte
    // Implicit padding here.
};
requires 12 bytes for 32-bit alignment, but 16 bytes for 64-bit alignment. (Coding tip: For good packing of structures, declare fields in decreasing order of size.)
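As a hedged sketch of that tip in action (the structures and field names here are mine, not from the article), compare a padding-prone ordering with a packed one:

struct Sloppy {
    bool a;    // 1 byte, then 7 bytes of padding to align the double
    double b;  // 8 bytes
    bool c;    // 1 byte, then 3 bytes of padding to align the int
    int d;     // 4 bytes
};             // typically 24 bytes on a 64-bit machine

struct Packed {
    double b;  // 8 bytes
    int d;     // 4 bytes
    bool a;    // 1 byte
    bool c;    // 1 byte, then 2 bytes of tail padding
};             // typically 16 bytes on a 64-bit machine

The same four fields shrink by a third simply by declaring the most strictly aligned members first.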
Numerical Issues
Sixty-four-bit integers have much more range. Whereas the maximum value for a signed 32-bit integer is about 2 billion, the maximum value for a signed 64-bit integer is about 9.223×10^18. I recommend learning the order of magnitude and the first few leading digits of this number, and likewise for the maximum for an unsigned 64-bit integer (about 18.45×10^18). It's handy to be able to recognize them in decimal dumps, particularly the unsigned value, since it corresponds to a word of all ones.
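If you'd rather let the machine do the memorizing, here is a minimal sketch (assuming a 64-bit long, as on the clean 64-bit systems discussed later) that prints both extremes:

#include <stdio.h>
#include <limits.h>

int main() {
    printf( "LONG_MAX  = %ld\n", LONG_MAX );   // 9223372036854775807
    printf( "ULONG_MAX = %lu\n", ULONG_MAX );  // 18446744073709551615
    return 0;
}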
While most problems in migrating to 64-bit platforms arise from the extra high-order bits, the low-order bits can cause trouble too. Both 32- and 64-bit machines typically use IEEE floating point. An IEEE double-precision floating-point number (typically type double in C) has 64 bits, but 12 of those bits hold the sign and exponent. Converting a 64-bit integer to a double can cause a loss of precision, which was never a problem with 32-bit integers.
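Here is a minimal sketch of that loss, assuming a 64-bit long and IEEE doubles:

#include <stdio.h>

int main() {
    long big = (1L<<60) + 1;  // needs 61 significant bits
    double d = (double)big;   // a double has only a 53-bit significand
    printf( "%ld\n", big );   // prints 1152921504606846977
    printf( "%.0f\n", d );    // prints 1152921504606846976; the +1 is lost
    return 0;
}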
Avoiding C/C++ Traps
C/C++ sometimes have a bad reputation for portability. In fact, it is usually straightforward to write portable C/C++. Think of the machine architecture as one of the inputs to your program, and parameterize your program accordingly.
When writing portable code, it is important to understand the difference between a language standard and an implementation. For some proprietary languages, there is no difference: The implementation defines the Standard. But for many languages (for instance, C, C++, Fortran, Ada, Cobol) there are international Standards that say what is guaranteed and what is not. For example, given the statement:
a[j++]=b[j++]
some implementations of C increment the value of j twice. However, the ISO standard for C says that the effect on j is undefined. Some of the problems that I will point out are actually latent violations of the C/C++ Standards that are exposed by the migration to 64-bit platforms. To improve the chances that your compiler catches the problems for you, I strongly recommend that you turn on all the type-checking available.
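The portable fix for the a[j++]=b[j++] example is mechanical: give each modification of j its own statement. One unambiguous rewrite (assuming the intent was for the right-hand j++ to happen first; other intents need other orderings):

int t = b[j++];
a[j++] = t;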
Adhering to the Standards and parameterizing your code appropriately are good, but only up to a point. You have to draw a line somewhere. In principle, machines can be really weird and still have implementations of C/C++ that conform to ISO standards. For example, 13-bit characters and one's complement arithmetic are theoretically possible. But practically speaking, except for DSP chips and antiques, machines have 8-bit characters and two's complement integers with a size that is a power of two. Writing excessively general code can be just as much a waste of time as writing excessively narrow code, so give heed to Samuel Johnson's advice before coding: "No man but a blockhead ever wrote except for money."
Here are the three principal sources of information for parameterizing your code for word size. The first is the sizeof operator. Use it religiously to compute the sizes of types, not only when allocating objects, but also when packing or unpacking bits in integers. The second is <limits.h>. It defines CHAR_BIT, the number of bits in type char, and the maximum and minimum values for integral types; for example, ULONG_MAX for the maximum value of an unsigned long. The third is the integral types themselves. Choosing the right type for a purpose is part of parameterizing your code for word size.
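A minimal sketch combining the first two sources (the macro and constant names are mine): compute, never hard-code, the number of bits in a type.

#include <limits.h>

#define BITS_OF(type) ( sizeof(type) * CHAR_BIT )

// A mask of all ones in the low half of an unsigned long,
// correct whether long is 32 or 64 bits.
const unsigned long LOW_HALF = ULONG_MAX >> BITS_OF(unsigned long)/2;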
long and int: Noticeably Different
The most frequent gotchas in 64-bit code arise not from bigger absolute sizes of types, but from changes in relative sizes: Types that were the same size on 32-bit systems are often no longer so on 64-bit systems. Indeed, this was the major stumbling block in the migration from 16-bit to 32-bit systems. In particular, modern 32-bit code sometimes presumes, often accidentally, that sizeof(int)==sizeof(long)==sizeof(pointer), which is almost never true on 64-bit machines.
long Bigger Than int
On 64-bit platforms, it is usually true that sizeof(long)>sizeof(int). The exceptions are cretinous implementations with 32-bit long and int, and the long long abomination for 64-bit integers. The rest of this article presumes a clean 64-bit system: 32 bits for int, 64 bits for long and pointers.
The growth of long to bigger than int can cause some subtle errors in programs. The most common program error is accidental truncation, as shown here:
long x = 0;
for( int k=0; k<sizeof(long)*CHAR_BIT; ++k )
    x |= 1<<k;
This code appears to set all bits in x to 1, and does so when sizeof(long)==sizeof(int). However, when sizeof(long)>sizeof(int), there is a problem: The 1 has type int, and thus the shift is done using int arithmetic, not long arithmetic. When the shift count is greater than or equal to the number of bits in the type, the ISO rules say the result is undefined. For instance, one common platform simply ignores all but the rightmost 5 bits in the shift count, and thus evaluates 1<<32 the same as 1<<0, which is 1, not the intuitively obvious 0.
The correct way to write the affected line is:
x |= 1L<<k;
or, if writing a template and x is of some unknown type T, write:
x |= T(1)<<k;
Be on the lookout for similar truncation problems with other operators. I know of no good automatic defense against the problem.
While on the subject of bit manipulation, there is one other gotcha for 64-bit systems: You need 6 bits, not 5, to specify the index of a bit within a word. Bit fields that hold bit indices must be declared with 6 bits instead of 5. Otherwise, the critical 6th bit of such indices is silently lost.
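A hedged illustration (the structure and field names are mine, not from the article):

struct BitPosition {
    // 5 bits suffice for indices 0..31 in a 32-bit word, but a 64-bit
    // word has indices 0..63; with only 5 bits, index 32 would be
    // silently stored as 0.
    unsigned bit_index : 6;
    unsigned word_index : 26;
};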
Pointers Bigger Than int
Pointers and type int usually have the same size on 32-bit platforms these days, but not on 64-bit platforms. There are two common kinds of mischief (or deserved punishment) arising from this change.
First, C permits conversions between pointers and integers, and regrettably, some programs habitually do so. If the conversion is between a pointer and type int, the code will likely break on a 64-bit platform because the 32-bit int will not hold all the bits that were in the 64-bit pointer.
C++ helps a bit in this matter: it expressly disallows converting a pointer to an integer type with fewer bits. Given a pointer p, the expression (int)p might compile without complaint on a 32-bit platform, and be rejected by a compiler for a 64-bit platform. Hence, if you must cast between integers and pointers, it is probably safest to use type long for the integers.
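A minimal round-trip sketch, assuming the clean 64-bit model above, where long and pointers are both 64 bits:

#include <assert.h>

int main() {
    int object = 42;
    int * p = &object;
    long bits = (long)p;    // long is as wide as a pointer here
    int * q = (int *)bits;  // the round trip recovers the pointer
    assert( q == p && *q == 42 );
    return 0;
}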
Second, the difference in sizes for pointers and int exposes the difference between adding a signed or unsigned int to a pointer. Consider:
char * p;
int i;
unsigned int u;
Does p+(u+i) point to the same memory location as (p+u)+i? That is, is addition associative? Let's plug in some numbers to find out. Let u = -1 (which, stored in an unsigned, becomes UINT_MAX) and i = 1. The first expression simplifies to p+(unsigned)0, which is p. The second expression is most likely illegal on most 32-bit machines, because p+u is out of bounds unless the object pointed to by p occupies all of memory. But most hardware will happily execute it, and once i is added, the address wraps around to yield the value of p, a valid address. Thus the error goes unnoticed. But on a 64-bit machine, (p+u)+i simplifies to a pointer 4 GB away from p, which is either an illegal address (resulting in a machine fault when dereferenced), or a legal address in a really big object (which is even worse if the intent was address p, because it silently works).
A related problem involves changes in the relative sizes of size_t and ptrdiff_t compared to int. Typically, type size_t is 32 bits on a 32-bit machine, and 64 bits on a 64-bit machine. Furthermore, size_t is often unsigned int on 32-bit machines, and unsigned long on 64-bit machines: a change in type as well as size. Type ptrdiff_t usually changes likewise, except that it is always a signed integral type.
Choosing Integral Types
Here are my rules of thumb for picking integral types: If a value fits in 8 bits (and space is at a premium), use an explicitly signed or unsigned char. Be leery of using a plain char for anything other than ASCII because the standards permit implementations to treat plain char as unsigned or signed. If a value fits in 16 bits, use a short or unsigned short. If a value needs to be handled fast and fits in 16 bits (in critical loops, for instance), use type int or unsigned int. If a value needs at least 32 bits, use type long or unsigned long because the C/C++ Standards guarantee only at least 16 bits for int and unsigned int. If a value needs to be bigger than 32 bits and you might be running on a 32-bit machine, use some sort of package for multiprecision arithmetic. In C++, overloaded operators make such packages painless to use.
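As a hedged sketch of these rules in action (the declarations and names are illustrative, not from the article):

signed char temperature;  // fits in 8 bits, and space is at a premium
short       year;         // fits in 16 bits
int         loop_index;   // fits in 16 bits, but speed matters
long        byte_count;   // needs at least 32 bits, which only long guarantees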
Formatting and I/O
The switch from 32 to 64 bits can introduce formatting and I/O problems. The problems arise from both increases in magnitudes and from changes in the relative sizes. Consider the following example, which has both sorts of mistakes:
long j = ...
char buffer[12];
sprintf( buffer, "%d", j );
There are two problems here when a long grows from 32 to 64 bits. First, more digits are necessary to print numbers. The array size 12 was big enough to hold an optional minus sign followed by 10 decimal digits and a null terminator. The switch to a 64-bit long potentially adds nine more digits. Table 1 lists the digit requirements.
Second, the format string has a subtle error, even though it works when sizeof(long)==sizeof(int). The format for printing a long requires an l modifier: It should be %ld, not %d. On a machine with sizeof(long)>sizeof(int), such an error may cause the value to be printed incorrectly or, for formats that print multiple values, cause errors in printing the other values, because the format string "lies" about the parameters that were passed. Always remember the l modifier when printing a long. Here's the corrected example:
long j = ...
char buffer[21];
sprintf( buffer, "%ld", j );
Also note that type size_t may be an unsigned int or unsigned long. Thus when printing a size_t, cast it to an unsigned long and include the l modifier, as shown here:
size_t s;
printf("size=%ld\n", (unsigned long)s );
The best solution is to use C++ and std::ostringstream; then you are not responsible for preallocating storage or remembering the l modifier.
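A minimal sketch of that alternative: the stream grows as needed, and operator<< picks the correct overload for long.

#include <sstream>
#include <string>

std::string format_long( long j ) {
    std::ostringstream out;
    out << j;   // no buffer sizing, no format modifiers
    return out.str();
}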
External file formats can cause eternal portability grief since they become set in stone. Think hard about the following when inventing a new file format. The common sin is writing and reading raw structs to and from files. While such code is easy to write, it locks the file format to the particular structure layout and byte order of a machine.
The simplest portable approach is to always write and read bytes, not words. The DWARF debugging format uses an encoding that is an excellent way to write and read integers in a machine-independent fashion. The format writes out an unsigned integer 7 bits at a time, starting with the least-significant 7 bits. The upper bit of each byte is used as a delimiter: 0 means end of integer; 1 means another byte follows. Not only is this format machine independent, it also compresses small integer values. Thus, the decrease in I/O traffic can easily make up for the added time for encoding and decoding, particularly in the current milieu where processors grossly outrace storage devices.
For example, unsigned integers between 0 and 127 take up only a single byte. There is a similar format for signed integers that differs only in that when read, the 6th bit of the last byte is treated as a sign bit. Listing Two shows some concise C++ template functions for such encoding and decoding in both the signed and unsigned formats.
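To make the encoding concrete: the unsigned value 300 (binary 100101100) is written as two bytes, 0xAC (the low 7 bits, 0101100, with the continuation bit set), then 0x02 (the remaining bits, with the continuation bit clear).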
Of course, sometimes efficiency or simplicity constraints dictate that machine integers be written out directly. If you need to do so, I recommend at least inventing a file header that describes the number of bytes in an integer, and the byte order. One gimmick for describing byte order is to write a number such as 0x0706050403020100 as a raw integer at the beginning of the file. On a Big-endian machine, the resulting byte sequence is "7 6 5 4 3 2 1 0"; on a Little-endian machine it is "0 1 2 3 4 5 6 7." The byte with value n shows exactly where the nth least-significant byte (counting from zero) is stored.
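A minimal sketch of the gimmick, assuming a 64-bit unsigned long (the function name is mine):

#include <stdio.h>

void write_signature( FILE * f ) {
    unsigned long signature = 0x0706050403020100UL;
    fwrite( &signature, sizeof(signature), 1, f );
}

A reader of the file can then compare the byte sequence it finds against the one its own machine would have written.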
Conclusion
The migration from 32-bit machines to 64-bit machines is really a test of how clean your code is: Does it work accidentally, or according to the standards? Do it right, and you'll be able to brag to your grandchildren when they are groaning over the 128-bit to 256-bit migration.
DDJ
Listing One
// Big-endian code. Presumes 8-bit bytes and 64-bit type long.
#include <stddef.h>

size_t strlen( const char * s ) {
    typedef unsigned long word;
    // Align on word boundary.
    // Presumes that sizeof(word) is a power of 2.
    const char * t = s;
    for( ; (unsigned long)t & (sizeof(word)-1); ++t )
        if( !*t ) return t-s;
    // Search for word containing a byte equal to zero.
    word b;
    const word m = ((word)-1 / 0xFF)<<7;
    const word * u = (word*)(void*)t;
    do {
        word w = ~*u++;
        // For each byte in b, set the high bit of the byte if the
        // corresponding byte of the string is zero (all ones in w).
        // Do this by adding the high bit of each byte to the other
        // 7 bits of each byte, and look for a carry into the high bit.
        // There are many other possible sequences -- the best depends
        // upon your compiler and target machine.
        b = ((w&~m) + ((w&m)>>7)) & m;
    } while( !b );
    t = (char*)(void*)(u);
    // t now points to the word *after* the word with the zero.
    // Binary search b for position of the high-order 1.
    // The logic here is for big-endian machines with 64-bit words.
    if( b&0xFFFFFFFF00000000UL ) { b >>= 32; t -= 4; }
    if( b&0xFFFF0000UL ) { b >>= 16; t -= 2; }
    if( b&0xFF00UL ) { b >>= 8; t -= 1; }
    return (char*)(void*)t-s-1;
}
Listing Two
// The functions below are written as templates so that they
// employ the signed or unsigned format according to whether
// type T is a signed or unsigned integral type respectively.
// The idiom "-(T)1>0" is a way of asking "is T an unsigned type?"
// For simplicity, the routines below presume that when type T
// is a signed integral type, operator>> does sign extension.
// This is widely common behavior, but not mandated by ISO C/C++.
typedef unsigned char byte;

// Template function "encode" writes the encoding of value i to
// the sequence starting at p, and returns an iterator that points
// to the end of the sequence.
template<typename Iterator, typename T>
Iterator encode( Iterator p, T i ) {
    // If T is an unsigned type, loop while i is outside range 0..127.
    // If T is a signed type, loop while i is outside range -64..63.
    for( ; -(T)1>0 ? i>(T)127 : i<(T)-64 || i>(T)63; i>>=7 )
        *p++ = (i&127)|128;
    *p++ = i&127;
    return p;
}

// Template function "decode" sets "result" to the value decoded
// from a byte sequence starting at p, and returns an iterator
// that points to the end of the sequence.
template<typename Iterator, typename T>
Iterator decode( Iterator p, T& result ) {
    T u = 0;
    int s = 0;
    byte c;
    // Accumulate value in u from bytes until terminator is found.
    do {
        c = *p++;
        u |= (T)(c&127)<<s;
        s += 7;
    } while( c&128 );
    // Subtract 0 or 1, shifted into place, depending upon bit 6 of the
    // last byte. This has the effect of sign-extending u.
    result = -(T)1>0 ? u : u - ((T)(c>>6&1)<<s);
    return p;
}
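Here is a quick round-trip sketch of Listing Two in use, assuming the templates above are in scope and a 64-bit long:

#include <assert.h>

int main() {
    byte buffer[10];   // 10 bytes suffice for any 64-bit value
    long original = -300;
    byte * end = encode( buffer, original );
    long restored;
    decode( buffer, restored );
    assert( restored == original );
    assert( end - buffer == 2 );   // -300 fits in two 7-bit groups
    return 0;
}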