Bruce is president of Counterpane Systems, a security consulting firm, and author of Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (John Wiley & Sons, 1996). Subscribe to his monthly newsletter at http://www .counterpane.com/.
Sidebar: The History of AES
Sidebar: The Current State of DES
Sidebar: The AES Candidates
In 1997, the National Institute of Standards and Technology (NIST) called for the replacement of the DES encryption algorithm. Fifteen candidates came forward. The selection process will take about two years. (For more information on the process, see the accompanying text boxes entitled "The History of AES" and "The AES Candidates.") Twofish is our submission.
John Kelsey, Chris Hall, Niels Ferguson, David Wagner, Doug Whiting, and I designed Twofish to be fast, flexible, and secure. It's conservative -- there are no radical new security ideas or design elements. And it's completely free -- there are no patent royalties on the algorithm, copyright on the code, or license fees on anything. We have not applied for a patent on Twofish, and have no plans to do so.
Twofish
Twofish is a symmetric block cipher; a single key is used for encryption and decryption. Twofish has a block size of 128 bits, and accepts a key of any length up to 256 bits. (NIST required the algorithm to accept 128-, 192-, and 256-bit keys.) Twofish is fast on both 32-bit and 8-bit CPUs (smart cards, embedded chips, and the like), and in hardware. And it's flexible; it can be used in network applications where keys are changed frequently and in applications where there is little or no RAM and ROM available.
As Figure 1 illustrates, Twofish is a Feistel network. This means that in each round, half of the text block is sent through an F function, and then XORed with the other half of the text block. DES is a Feistel network. Blowfish (another Schneier algorithm) is a Feistel network. Five of the AES submissions are Feistel networks. Feistel networks have long been studied in cryptography, and we know how they work.
In each round of Twofish, two 32-bit words (the two vertical lines along the left of Figure 1) serve as input into the F function. Each word is broken up into four bytes. Those four bytes are sent through four different key-dependent S-boxes. The four output bytes (the S-boxes have 8-bit input and output) are combined using a Maximum Distance Separable (MDS) matrix and combined into a 32-bit word. Then the two 32-bit words are combined using a Pseudo-Hadamard Transform (PHT), added to two round subkeys, then XORed with the right half of the text. There are also two 1-bit rotations going on, one before and one after the XOR. Twofish also has something called "prewhitening" and "postwhitening;" additional subkeys are XORed into the text block both before the first round and after the last round.
The algorithm might look haphazard, but we did everything for a reason. Nothing is in Twofish by chance. Anything in the algorithm that we couldn't justify, we removed. The result is a lean, mean algorithm that is strong and conceptually simple.
Each step of the round function is bijective. That is, every output is possible. We've seen too many attacks against ciphers that don't have this property not to include it. The round function mixes up operations from different algebraic groups: S-box substitution, an MDS matrix in GF(28), addition in GF(232), addition in GF(2) (also called XOR), and 1-bit rotations. This makes the algorithm difficult to attack mathematically.
The key-dependent S-boxes are designed to be resistant against the two big attacks of the early 1990s -- differential cryptanalysis and linear cryptanalysis -- and resistant against whatever unknown attacks come next. Too many algorithm designers optimize their designs against specific attacks, without thinking about resistance against the unknown. Our design philosophy was a bit different: good enough against known attacks, and enough nastiness to (hopefully) resist unknown attacks. Key-dependent S-boxes were one way we did that.
Key-dependent S-boxes were not selected randomly, as they were in Blowfish. Instead, we carefully designed S-box construction rules, and tested them with all possible 128-bit keys (and a subset of possible longer keys) to make sure that all the S-boxes were indeed strong. This approach allowed us to combine the strength of fixed, strong S-boxes with the strength of secret S-boxes. And Twofish has no weak keys, as Blowfish does in reduced-round variants.
The MDS matrix was carefully chosen to provide good diffusion, to retain its MDS property even after the 1-bit rotation, and to be fast in both hardware and software. This means that we had to search through all possible matrices and find the one that best met our criteria.
The PHT and key addition provide diffusion between the subblocks and the key. And using the LEA instruction on the Pentium (and above), we can do all four additions in just two operations.
The round subkeys are carefully calculated, using a mechanism similar to the S-box construction rules, to prevent related-key attacks and to provide good key mixing. One of the things we learned during this process is that a good key schedule is not grafted onto a cipher, but designed in tandem with the cipher. We spent a lot of time on the Twofish key schedule, and are proud of the results.
The 1-bit rotation is designed to break up the byte structure; without it, everything operates on bytes. This operation exists to frustrate cryptanalysts; it certainly frustrated our attempts at cryptanalyzing Twofish.
The prewhitening and postwhitening seems to add at least a round to the difficulty of any attack. Since eight XORs are cheaper than a round, it makes sense to leave them in.
Twofish's Performance
Twofish screams on high-end CPUs, and it's flexible enough for tiny smart-card CPUs. It also works well in hardware. And there are several performance trade-offs between key-setup time and encryption speed that make it unique among the AES candidates.
Almost all encryption algorithms have some kind of key-setup routine: a way to take the key and make the round subkeys that the algorithm uses. Twofish needs to take the key and make key-dependent S-boxes and round subkeys. Blowfish, which needed to do the same thing, was slow in setting up a key, taking as long as 521 encryptions. Twofish is much faster; its key setup can be as fast as 1.5 encryptions.
Twofish has a variety of options. You can take longer for key setup and the encryption runs faster; this makes sense for encrypting large amounts of plaintext with the same key. You can setup the key quickly and encryption is slower; this makes sense for encrypting a series of short blocks with rapidly changing keys. Table 1 shows the performance of key setup and encryption, in clock cycles per block, for five keying options on both the Pentium II/Pentium Pro and Pentium, in assembly language.
Other processors are similar or better. In general, the Intel architecture is the most annoying, and the hardest to optimize. It is far easier to write code that meets these performance numbers on a more general architecture, say the UltraSparc, 68040, or G3.
All of these options interoperate; they are just different ways of implementing the same Twofish algorithm. Data can be encrypted using one option and decrypted with another.
On smart cards, Twofish also has a variety of trade-offs. Table 2 is based on code written for a 6805 CPU. The RAM estimates assume that the key must be stored in RAM. If the key can be stored in EEPROM, then the algorithm only needs 36 bytes of RAM to run. The code size includes both encryption and decryption code. If only encryption has to be implemented, the code size and speed numbers improve somewhat.
Key setup on this processor is about 1750 clocks per key, which can be cut considerably at the cost of two additional 512-byte ROM tables. And the 6805's lack of a second index register has a significant impact on the code size and performance of Twofish; a CPU with multiple index registers (the 6502, for instance) will be a better fit for the algorithm.
These estimates are for a 128-bit key. For larger keys, the extra code size is negligible: less than 100 bytes for a 192-bit key, and less than 200 bytes for a 256-bit key. The encryption time increases by less than 2600 clocks for a 192-bit key, and about 5200 clocks for a 256-bit key. Similarly, the key schedule precomputation increases to 2550 clocks for a 192-bit key, and to 3400 clocks for a 256-bit key.
It's possible to shrink Twofish even further, saving about 350 bytes of ROM while decreasing performance by a factor of 10 or more. This is only useful in limited situations, but it shows how flexible the algorithm really is.
Similar sorts of trade-offs exist when putting the algorithm into hardware: key setup speed, for example, versus encryption speed, or speed versus gate count.
Cryptanalysis of Twofish
We spent over 1000 man-hours cryptanalyzing Twofish. The detailed results are in the Twofish design document (http://www .counterpane.com/twofish.html), but here are the highlights.
Our best attack works against five rounds of Twofish, without the prewhitening and postwhitening. It requires 222.5 chosen plaintext pairs and 251 work. We expect further research and clever techniques will extend this attack a few more rounds, but don't believe that there are any attacks against more than nine or 10 rounds.
We also have a related-key attack. It's a partial chosen-key attack on 10 rounds of Twofish without the prewhitening and postwhitening. To mount the attack, we have a pair of related keys. We get to choose 20 of the 32 bytes of each key. We have complete control over those 20 bytes of both keys. We don't know the remaining 12 bytes of key, but we do know that they are the same for both keys. We end up trying about 264 chosen plaintexts under each key, and doing about 234 work, to recover the remaining unknown 12 bytes of key. No, it's not a terribly realistic attack, but it's the best we can do.
And we have reduced-round attacks on simplified variants: Twofish with fixed S-boxes, Twofish without the 1-bit rotations, and so on. We can't break full Twofish even with these simplifications, but our analysis helps us understand why those components are there and what they are doing.
Of course, with any encryption algorithm, it's "buyer beware." As a designer of Twofish, I am the least qualified to make pronouncements about its security. As the AES process continues, and other cryptographers start analyzing Twofish, we hope to collect evidence of its security. Until then, it's best to wait.
Conclusion
We feel that Twofish is the best choice among all the AES candidates because of its unique combination of speed, flexibility, and conservative design. Assuming it's secure (and only time will tell), Twofish is the fastest AES candidate across all CPUs. Twofish fits on smart cards, even those that only have a couple of registers, a few bytes of RAM, and little ROM. And it fits in hardware in few gates.
No other algorithm has the same flexibility in implementation: the ability to trade off key-setup time for encryption speed, and ROM and RAM for encryption speed. These options exist on 32-bit CPUs, 8-bit CPUs, and hardware.
And Twofish does this with a conservative design. We chose not to modify the basic Feistel network. We did not use data-dependent rotations, 32-bit multiplies, or any other poorly understood primitives. The key schedule is designed to resist even the nastiest of attacks. And we gave the cipher 16 rounds when we could only break five.
Reference code and executables that implement and test Twofish are available electronically (see "Resource Center," page 3). The files include platform-specific definitions, macros, and tables for Twofish internal structures, reference ANSI C source code, test code, an executable 32-bit console app of TST2FISH.C and TWOFISH.C, and the like.
The Twofish web site (http://www .counterpane.com/twofish.html) has the Twofish design document, free source code in a variety of languages for a variety of platforms, and any late-breaking news. Readers outside the U.S. and Canada can go to the web site to find pointers to Twofish code on servers outside the U.S.
DDJ
Copyright © 1998, Dr. Dobb's Journal