Jim writes computational geometry and encryption software, and can be contacted at [email protected]. David is a developer for Cyber FX Communications and can be contacted at [email protected].
A message authentication code (MAC) is a fixed length hash, H, constructed in such a way as to be collision free. That is, it is infeasible, given H(M) for a message M, to find another message M' such that H(M')=H(M) (or even to find any two messages A and A' such that H(A')=H(A), a substantially easier problem). MD5 is a popular message digest algorithm that computes a 128-bit hash or code value for any message of any length. That is, MD5 "smears" the infinite number of possible input messages to 2128 (~3×1038) possible digests. Another algorithm for generating MACs is the secure hash algorithm (SHA) from NIST and NSA. SHA produces 160-bit codes. MACs are typically used, as their name implies, to authenticate messages. The sender follows a message with its MAC and the recipient regenerates a MAC at his end using the same algorithm. If the transmitted and generated MACs match, because of the collision resistance designed into the MACs, it is extremely unlikely that the message was corrupted during transmission.
Of course, a malicious adversary could modify the message and the MAC before they reach the recipient, and retransmit the altered message together with a newly calculated matching MAC without the recipient being aware of the switch. This is where keyed MACs come in to play. In this scenario, the sender and receiver share a secret key that is used by the hashing algorithm so that an adversary can no longer generate matching message-MAC pairs.
In this article, we propose an alternative encryption method called "MMPC" (short for "Multi-Message Package Chaffing"). MMPC encrypts multiple messages in one transmission using Ron Rivest's chaffing and package transforms (see Chaffing and Winnowing: Confidentiality without Encryption at http://theory.lcs.mit.edu/~rivest/ chaffing.txt, and All-Or-Nothing Encryption and The Package Transform at http:// theory.lcs.mit.edu/ ~rivest/fusion.ps, both by Ronald Rivest). The C code we present here is optimized to minimize memory usage and at the same time make as few passes through the data as possible.
MMPC is an encryption technique that combines messages with keyed message authentication codes (HMACs). MMPC offers several unusual and useful advantages over typical encryption methods without sacrificing the security of the underlying algorithms. Because MMPC uses the package transform, the sender is ensured that the receiver cannot read any of the message without first recovering a key; this requires reading the entire message. That is, intended recipients cannot read any of their message unless they can read all of it.
Perhaps the most significant way that MMPC differs from other encryption algorithms is that multiple messages can be interwoven into one transmission, each with its own key, in such a way that only the sender knows all of the messages. In a simple case, two unrelated messages can be sent in one transmission to two different recipients, each with their own key. Each recipient will see only the information intended for them and neither will be aware that there is any other information in the message. A more complicated example could include several messages and keys with a large number of possibilities as to who can read what.
To further thwart an informed adversary (one who knows everything about the method, but nothing about the keys), and also to prevent one keyholder from subtracting his message to narrow in on any other messages, pseudorandom bytes are also interspersed with the messages. Rivest calls these extra bytes "chaff," so that separating a message from the remaining data is analogous to separating wheat from chaff, except that one person's wheat is another person's chaff and the pseudorandom data is everyone's chaff.
MMPC works by breaking a message into fixed length blocks and transmitting each block followed by its keyed MAC or HMAC (see Figure 1). Consider the case where the transmission consists of only one message. In this case, each block of the message is followed by its HMAC and the transmission consists of the series of block-HMAC pairs. There is no security here because an adversary can reconstruct the message by reading only the blocks and putting them together without using the key at all.
Three factors to consider in addition to security are the increase in size of the transmitted information necessitated by the encryption algorithm, and the computer memory and time required to encrypt and decrypt. In the simplest case where HMACs have been added, the message size increases by a constant factor, which is determined by the size of the HMAC relative to the block size. There is an end-effect inherent in blocking, in that padding information has to be added; this sometimes requires an additional block. This effect is O(1) and can safely be ignored compared to the HMAC size, which is O(n) for n blocks (see Bruce Schneier's Applied Cryptography, Second Edition, John Wiley & Sons, 1995). In our example, time is needed to calculate the HMACs proportional to the message size; for instance, O(n), but if a block is read, an HMAC calculated, and a block-HMAC pair written, the memory use is only O(1).
If the package transform is applied to the message as the block-HMAC pairs are created, there is still limited security, but even this small effort can make an adversary's job more than a little harder. And this is really what encryption is all about, namely leveraging the effort at scrambling the data so that each incremental increase in effort at the encrypting end results in a much bigger increase in decrypting effort at the other end. The combination of several easy-to-implement encryption techniques can result in devilishly complicated ciphertext. All the previous comments about time and memory hold with two additional ones: First, some time must be spent doing the package transform; this time is proportional to the message length. Also, an additional block and HMAC have to be calculated to hold the hidden key; this is simply another end effect. Regarding memory, the package transform can be carried out with only one additional accumulator block, which is used to hold the final key block.
From the adversary's standpoint, however, even though the adoption of the package transform has little cost in storage, time, and memory, his work is n times harder because he has to read every single block to gain access to the key that allows the plaintext to be recovered. The legitimate keyholder has to do the same thing in this simple case, but it will soon be clear that knowledge of the key provides a big advantage over an adversary, an advantage whose leverage can be adjusted by adding phony blocks! We should point out that there are two ways to decrypt the packaged blocks. All of the blocks can be stored, the key recovered from the last one, then each block can be decrypted in turn. An alternative is to make two passes through the blocks, the first to get the key and the second to use the key on each packaged block. The code we are about to present uses the latter technique to save on memory at little expense in time.
The Addition of Chaff
Now the fun starts, at least from the standpoint of the encrypter. Suppose that instead of writing the block-HMAC pairs out in order, some random bytes are interspersed in groups of the same sizes as the block-HMAC pairs. Again, this is a very simple process, taking little computer time or memory and only as much space as the additional pairs; see Figure 2. The intended recipient will need her key so that she can check each pair for legitimacy, keeping only those pairs that match. This work is proportional to the number of pairs, including the bogus ones. The random pairs throw a wrench into the plans of an adversary, however.
There are two brute-force ways to decrypt our transmission containing random bytes. An adversary could try an exhaustive key search, but there is a complication due to the random pairs. There could be as few as two actual message pairs (one for the data and one for the encrypted key) and they could be any two pairs, perhaps the last two. That means that each key would have to be applied to each block to produce an HMAC for comparison with the paired HMAC. If our key length is 64 bits, there are 264 possible keys; this number must be multiplied by the number of blocks. Do the multiplications and you will convince yourself that this approach is going to take a very determined adversary with a lot of time on his hands. But there is another way.
You could ignore the HMACs altogether and search for some combination of the blocks that results in a readable message. Remembering that the message could be any number of blocks from 2 to n, let us count the number of possible combinations that would have to be tested. (It is possible to relax the condition that the blocks appear in order, albeit with bogus pairs interspersed, but then permutations enter the picture and there are far more possible ways to arrange the pairs. This requires memory to store all of the blocks so they can be recombined or n passes through the message. We did not look into this case.) There are nCm ways to take m blocks from n, and if t is the time to process one block, it will take time 2*t*m*nCm to process all the m block combinations (the 2 comes from having to make two passes, one for the key and one to recover the plaintext) and time 2*t*(2*nC2+3*nC3 +...+n*nCn) for a complete brute-force attack. This can be simplified to 2*t*n*(2(n-1)-1).
The difficulty of this second type of attack depends on the total number of blocks and thus is under the sender's control. That is, chaff can always be added to increase the number of combinations and thus reduce the likelihood of finding the right combination of blocks to reveal the message. It is easy to show that when the total number of blocks is about the same as the number of bits in the key, it is as hard to find the right combination of blocks that make up the message as it is to find the message key itself. For example, for a 64-bit key length, if the message takes less than 64 blocks, extra random byte blocks can be injected to foil an attacker. Again, the random blocks take almost no effort on the sender's part, but they complicate the task of an adversary because he has no idea which blocks are the message ones and which are random.
Adding More Messages
Adding random blocks of data at random block positions increases the number of possible combinations of blocks that could comprise a message. This, in turn, makes it hard for an adversary to find the one correct combination. The reason it is not very hard for the legitimate recipient to find the right combination of blocks is that he is in possession of the key that lets him discard the random blocks from consideration. Now here is a final twist to add to this whole scenario.
The random pairs are random in the sense that their HMACs do not match their data blocks. (The chance of a match is 2(-n) where n is the number of bits in the HMAC, negligible even for n=64.) But suppose instead of being truly random, some of our extra pairs are really paired using a different key, say K2. To the holder of the first key, K1, the pairs for K2 may as well be truly random since they get discarded just like the true random pairs. But to the holder of K2 the pairs for K1 might as well be truly random -- they get thrown away because they mismatch using K2 as the key!
There are now three types of data-HMAC pairs:
1. Those that match using K1.
2. Those that match using K2.
3. Those that do not match using either K1 or K2.
With key K1, only those in category 1. will match. The pairs for categories 2. and 3. look like chaff. Likewise, with K2 as key, only the pairs from category 2. will match and those in 1. and 3. will look like chaff. And with any other key, in all likelihood all the pairs will look like chaff!
If another message is packaged in the pairs keyed with K2, the result is a single transmission containing two independent messages, as Figure 3 illustrates. Decrypting with K1 will expose (only) the first message, decrypting with K2 will expose the second message, and these are the only keys that result in meaningful messages.
But why stop with two keys? You could package another message using K3 and it would be revealed only to the holders of K3. In fact, the only disadvantage of combining too many messages in one is the remote possibility that with n messages, an adversary attempting an exhaustive key search is n times more likely to stumble upon one of the keys by chance than if there were only one message. For a 64-bit key, that means 28=256 messages could be interwoven, resulting in the same security, 56 bits, as DES.
Implementation
We have coded MMPC encryption/decryption in C and present the driver programs mmpc_snd for encrypting, and its companion mmpc_rec for decrypting. The supporting code for Blowfish, MD5, SHA, HMAC, the package transform, hashing, and the like, and a Makefile for building the programs is available electronically; see "Resource Center, page 5). This code uses an 8-byte data block size and either 16 bytes for an MD5 HMAC or 20 bytes for an SHA HMAC. The data block size is up to the user: We chose 8 bytes because this is the block size for Blowfish (see Schneier), the encryption used in the package transform.
The interface for encrypting with MMPC is straightforward. The user tells the program how many files he wants to encrypt and gives the name and a key for each file. Then the user enters the number of additional chaff blocks to add to the mix. Once a name for the output file is entered, the program does the packaging for each message and writes the encrypted output file including the chaff blocks.
The HMAC uses the algorithm proposed by M. Bellare et al. (see Keying Hash Functions and Message Authentication, http:// www.research.ibm.com/security/keyed-md5.html) and is described by H. Krawczyk et al. (HMAC: Keyed-Hashing for Message Authentication, ftp://ds.internic.net/ rfc/rfc2104.txt). This algorithm modifies a nonkeyed message digest. Our code can use either Rivest's 128-bit MD5 or SHA 160 bits, through a conditional compilation. (#define SHA for that message digest. If SHA is not defined, MD5 is used.)
The Encrypting Program
Once the user enters the number of files to encrypt, the encrypting program, mmpc_snd. (available electronically), seeds the random-number generator and initializes the Blowfish arrays through a call to package_one_time_set(). We have borrowed from Schneier a linear congruential pseudorandom number generator with a period greater than 1018. This generator takes two seeds, and we have simply used the time for one seed and hard-coded the other. You may want to have the user enter both seeds for increased security.
In the loop over the number of files, the function file_info() is called. This function serves a dual purpose, asking for and storing the name and key for each file in a structure of type FILE_INFO, and returning the number of blocks the file will require. There is an array of these structures with an element for each file. These elements contain all the file-specific data, which includes the Blowfish arrays used for packaging, and the accumulator for the last packaging block.
After the user supplies the number of chaff blocks to intersperse with the data blocks, the program checks and increases this number to 64, if necessary, to make sure there is enough random data to foil an adversary. Then it determines and stores the computer's byte order or endianness, as the message digests are sensitive to byte ordering.
The last step before actual encryption is to set up the random ordering of the output pairs -- the matching of output blocks with messages. This is a little tricky. First, memory is set aside for two arrays, ran_array_p and block_indices_p. Both arrays have as many elements as there will be pairs of blocks and HMACs in the encrypted file. ran_array_p is initialized with the numbers 0,1,2,..., total_blocks-1. block_indices_p is initialized with a constant value, namely the number of files being encrypted. This value is used as a flag. Those elements that will be used for message data (as opposed to chaff) will be reset to the index of the appropriate message. The tricky step is the one that reloads the elements of block_indices_p with the indices of the files to encrypt. The nested loops in Example 1(a) can be read as Example 1(b).
The get_ran() function returns a random element of the array ran_array_p and takes it out of circulation, so that all of the numbers from 0 to 1 less than the total number of blocks are used exactly once with no replication. get_ran() replaces the used element by the present last element, then, because the array size is passed by reference, it decrements the size of the array, ready for the next call. That this is done in a fair way with all numbers equally likely at every stage can be shown as follows: If there are n numbers initially, the random number generator picks any one with probability 1/n. At the next call to get_ran there are n-1 numbers to choose from -- the one picked on the first call has been removed. The chance of getting any of the available numbers on this second call is the joint probability that the number was not chosen the first time times the chance of getting it the second time or ((n-1)/n)*(1/(n-1))=1/n. The probability on the next call is ((n-1)/n)* ((n-2)/(n-1))*(1/(n-2))=1/n and so forth. Every number that is chosen is chosen with probability 1/n.
When the nested loops are complete, the block_indices_p array is populated with the indices of each of the files to be encrypted, with each index appearing in random positions and as many times as there will be block pairs for that file. Any leftover elements represent the position of the chaff pairs; these are the elements that did not get reset from their initial flag value.
The rest of the main program does the reading, in the specified (random) order, of the data blocks and the writing of the data-HMAC pairs. Example 2 is the pseudocode implementation of this.
If you look at mmpc_snd, you will see that care is required to take care of the padding bytes and the package transform. In particular, the next to last block for each file is a pad. If the number of bytes in the file is not divisible by 8 (the number in a data block) then the leftover bytes are in this last block and the last byte in the block is the number of bytes that are not data bytes. Using this same logic, if the number of bytes is divisible by 8, then an additional block is necessary whose only significant byte is its final one, 8, the number that are not data bytes.
The last data block for each file is simply the auxiliary file-specific packaging block, ms_p, which is accumulated, one for each file, as the data in the file is transformed. This last block is complete when the (preceding) padding block is transformed and so is always ready when the time comes to write the last block-HMAC pair for each file.
The Decrypting Program
The companion program used to decrypt the data is mmpc_rec (available electronically). Because both the encrypting and decrypting programs use the SHA macro to choose between the MD5 and SHA message digests, make sure to either #define SHA in both programs to use SHA, or not to define SHA in either program to use MD5. mmpc_rec expects two parameters on the command line: the key for the desired message in hexadecimal, and the name of the encrypted file, the one output from mmpc_snd. Entering the program name with no parameters produces the following reminder: Usage: mmpc_rec <HMAC key> <chaff file>.
The preliminaries are straightforward:
1. Open encrypted file for reading.
2. stat() the file to get its size.
3. Convert the key ASCII hex characters to bytes using the strtol() function.
4. Exit if the file is not an integral number of pairs.
5. Set up a hash to hold the indices of the "good" pairs.
6. Set up the inverse package transform.
We use a hash to hold the indices of any pairs that turn out to be keepers, that is, any block whose paired HMAC matches the one we calculate from the key. Another way to hold the indices would be to create an array with as many elements as there are pairs in the encrypted file, knowing that this is overkill, and then save the indices of the keepers in this array. Either way, this saves having to recalculate the HMACs for each data block on the second pass through the file.
The for loop with the comment "1st pass" serves two purposes. First, as each data block and its HMAC are read from the file, the HMAC of the data is calculated using the key. If the calculated and file HMACs match, this data block is part of the message and its index is saved for the next pass. The second purpose of this pass is to construct the key that was used in the package transform.
Because the message is package transformed, it cannot be read until its key is revealed, and that can only be done by reading all the blocks that comprise the transformed message. The key is built using the function rev_package_build_kp(). Care must be exercised here because the key is constructed using all the matching data blocks save for the last one, and the only way of knowing when you have reached the last one is when you have processed the entire encrypted file. We use two arrays, block_p and previous_block_p, and, starting with the second keeper, we process the previous block. If at the end of the first pass there are no keeper blocks, the program exits with a message that the key is invalid.
The second pass through the encrypted file is now straightforward; see Example 3. Listing One shows a sample run of these two programs, encrypting two files and adding chaff to the mix. Listing One (a) is the encryption run, while Listing One(b) is the decryption. The key "cafebabecafebabe" reveals the intended message "I believe that Colonel Mustard did it," the key "f00df00df00df00d" reveals the phony message "There is no question that the butler is the culprit," and any other key reveals no message.
Conclusion
MMPC breaks each message into data blocks of a fixed size and appends a keyed message authentication code to each block on transmission. So as not to expose any of the individual message blocks, and to complicate unauthorized decryption by nonkey holders, the data is first filtered using Rivest's package transform. Then, to further complicate decryption, bogus blocks (or chaff) consisting of random bytes are interspersed randomly with the actual data block-HMAC pairs. The security of MMPC hinges on the algorithm used for the keyed MAC.
DDJ
Listing One
(a)
felis:~/crypt/ddj$ mmpc_snd Using hmac_sha for digest. How many files do you want to package-chaff ? 2 Enter name of file 1: real.msg Data file "real.msg" (39 bytes) requires 6 (8) byte data blocks. Enter key (in hexadecimal) for real.msg : cafebabecafebabe Enter name of file 2: fake.msg Data file "fake.msg" (53 bytes) requires 8 (8) byte data blocks. Enter key (in hexadecimal) for fake.msg : f00df00df00df00d How many chaff blocks do you want to add ? 245 Total blocks = 259 Enter name of ciphertext file : chaffed (b) <pre>felis:~/crypt/ddj$ mmpc_rec dead12345678dead chaffed Expecting hmac_sha for digest. There are 259 blocks each containing 8 bytes of data and a 20 byte HMAC. Sorry, there is no match for "dead12345678dead" in file "chaffed". felis:~/crypt/ddj$ mmpc_rec cafebabecafebabe chaffed Expecting hmac_sha for digest. There are 259 blocks each containing 8 bytes of data and a 20 byte HMAC. I believe that Colonel Mustard did it. felis:~/crypt/ddj$ mmpc_rec f00df00df00df00d chaffed 2>/dev/null There is no question that the butler is the culprit.