William is a consultant, lecturer, and author of books on data communications and computer networking. His most recent book is Cryptography and Network Security: Principles and Practice, Second Edition (Prentice-Hall, 1998). He can be contacted at http://www.shore.net/~ws.
Message authentication is a procedure that allows communicating parties to verify that received messages are authentic. The two important aspects are verifying that the contents of the message have not been altered and that the source is authentic. The Message Authentication Code (MAC) is a widely used technique for performing message authentication. A variation on the MAC algorithm has emerged as an Internet standard for a wide variety of applications -- HMAC, short for "Keyed-Hashing for Message Authentication." HMAC was introduced in "Keying Hash Functions for Message Authentication," by M. Bellare, R. Canetti, and H. Krawczyk in Proceedings, CRYPTO '96 (Springer-Verlag,andathttp://wwwcse .ucsd.edu/users/mihir). HMAC is currently an Internet draft that has been distributed by the Internet Engineering Task Force as Request For Proposal (RFP) 2104.
MAC
MAC algorithms involve the use of a secret key to generate a small block of data, known as a "message authentication code," that is appended to the message. This technique assumes that two communicating parties, say A and B, share K as a secret key. When A has a message to send to B, it calculates the message authentication code as a function of the message and the key. The message and code are transmitted to the intended recipient. The recipient performs the same calculation on the received message, using the same secret key, to generate a new message authentication code. The received code is compared to the calculated code; see Figure 1. If you assume that only the receiver and the sender know the identity of the secret key, and if the received code matches the calculated code, then:
1. The receiver is assured that the message has not been altered. If an attacker alters the message but does not alter the code, then the receiver's calculation of the code will differ from the received code. Because the attacker is assumed not to know the secret key, the attacker cannot alter the code to correspond to the alterations in the message.
2. The receiver is assured that the message is from the alleged sender. Because no one else knows the secret key, no one else could prepare a message with the proper code.
3. If the message includes a sequence number (such as is used with X.25, HDLC, and TCP), then the receiver can be assured of the proper sequence, because an attacker cannot successfully alter the sequence number.
A number of algorithms could be used to generate the code. The National Bureau of Standards (NBS), in its publication DES Mode of Operation (http://www.nist .gov/itl/div897/pubs/fip113.htm), recommends the use of the DES algorithm. The DES algorithm is used to generate an encrypted version of the message, and the last bits of ciphertext are used as the code. A 16- or 32-bit code is typical. HMAC is a more efficient, and increasingly popular, alternative.
The process just described is similar to encryption. One difference is that the authentication algorithm does not need to be reversible, as it must for decryption. It turns out that because of the mathematical properties of the authentication function, it is less vulnerable than encryption to being broken.
HMAC
In recent years, there has been increased interest in developing a MAC derived from a cryptographic hash code, such as MD5, SHA-1, or RIPEMD-160. The motivations for this interest are:
- Cryptographic hash functions generally execute faster in software than symmetric block ciphers such as DES.
- Library code for cryptographic hash functions is widely available.
- There are no export restrictions for cryptographic hash functions, whereas symmetric block ciphers, even when used for MACs, are restricted.
A hash function such as MD5 was not designed for use as a MAC and cannot be used directly for that purpose because it does not rely on a secret key. There have been a number of proposals to incorporate a secret key into an existing hash algorithm. HMAC received the most support. HMAC has been chosen as the mandatory-to-implement MAC for IP Security, and is used in other Internet protocols, such as Transport Layer Security (TLS, soon to replace Secure Sockets Layer) and Secure Electronic Transaction (SET).
HMAC-Design Objectives
Design objectives that RFC 2104 lists for HMAC include:
- To use, without modifications, available hash functions. In particular, hash functions that perform well in software, and for which code is freely and widely available.
- To allow for easy replaceability of the embedded hash function in case faster or more secure hash functions are found or required.
- To preserve the original performance of the hash function without incurring a significant degradation.
- To use and handle keys in a simple way.
- To have a well understood cryptographic analysis of the strength of the authentication mechanism based on reasonable assumptions on the embedded hash function.
The first two objectives are important to the acceptability of HMAC. HMAC treats the hash function as a black box. This has two benefits. First, an existing implementation of a hash function can be used as a module in implementing HMAC. The bulk of the HMAC code is prepackaged and ready to use without modification. Second, to replace a given hash function in an HMAC implementation, you must simply remove the existing hash function module and drop in the new module. This could be done if a faster hash function were desired. More important, if the security of the embedded hash function were compromised, the security of HMAC could be retained simply by replacing the embedded hash function with a more secure one (replacing MD5 with SHA-1, for example).
The last design objective in the preceding list is, in fact, the main advantage of HMAC over other proposed hash-based schemes. HMAC can be proven secure provided that the embedded hash function has some reasonable cryptographic strengths.
The HMAC Algorithm
Figure 2 illustrates the overall operation of HMAC (see Table 1 for definition of the terms in Figure 2). HMAC can then be expressed; see Figure 3. In other words:
1. Append zeros to the left end of K to create a b-bit string K+ (for example, if K is of length 160 bits and b = 512, then K will be appended with 44 zero bytes 0x00).
2. XOR (bitwise exclusive OR) K+ with ipad to produce the b-bit block Si.
3. Append M to Si.
4. Apply H to the stream generated in Step 3.
5. XOR K+ with opad to produce the b-bit block So.
6. Append the hash result from Step 4 to So.
7. Apply H to the stream generated in Step 6 and output the result.
Note the XOR with ipad results in flipping one-half of the bits of K. Similarly, the XOR with opad results in flipping one-half of the bits of K, but a different set of bits. In effect, by passing Si and So through the compression function of the hash algorithm, you have pseudorandomly generated two keys from K. HMAC should execute in approximately the same time as the embedded hash function for long messages. HMAC adds three executions of the hash compression function (for Si, So, and the block produced from the inner hash).
HMAC Security
The security of any MAC function based on an embedded hash function depends in some way on the cryptographic strength of the underlying hash function. The appeal of HMAC is that its designers have been able to prove an exact relationship between the strength of the embedded hash function and the strength of HMAC. The security of a MAC function is generally expressed in terms of the probability of successful forgery with a given amount of time spent by the forger and a given number of message-MAC pairs created with the same key. In essence, it can be shown that, for a given level of effort (time, message-MAC pairs), on messages generated by legitimate users and seen by attackers, the probability of a successful attack on HMAC is equivalent to one of the following attacks on the embedded hash function:
1. Attackers are able to compute an output of the compression function even with an Initial Value (IV) that is random, secret, and unknown to attackers.
2. Attackers find collisions in the hash function even when the IV is random and secret.
In the first attack, you can view the compression function as equivalent to the hash function applied to a message consisting of a single b-bit block. For this attack, the IV of the hash function is replaced by a secret, random value of n bits. An attack on this hash function requires either a brute-force attack on the key, which is a level of effort on the order of 2n, or a birthday attack, which is a special case of the second attack.
In the second attack, attackers are looking for two messages, M and M',that produce the same hash: H(M)=H(M'). This requires a level of effort of 2n/2 for a hash length of n. On this basis, the security of MD5 is called into question, because a level of effort of 264 looks feasible with today's technology. Does this mean that a 128-bit hash function such as MD5 is unsuitable for HMAC? The answer is no. To attack MD5, attackers can choose any set of messages and work on these offline on a dedicated computing facility to find a collision. Because attackers know the hash algorithm and the default IV, attackers can generate the hash code for each of the messages that attackers generate. However, when attacking HMAC, attackers cannot generate message/code pairs offline because attackers do not know K. Therefore, attackers must observe a sequence of messages generated by HMAC under the same key and perform the attack on these known messages. For a hash code length of 128 bits, this requires 264 observed blocks (273 bits) generated using the same key. On a 1-Gbps link, you would need to observe a continuous stream of messages with no change in the key for about 250,000 years to succeed. Thus, if speed is a concern, it is fully acceptable to use MD5 rather than SHA-1 or RIPEMD-160 as the embedded hash function for HMAC.
Listing One, the appendix to RFC 2104, is sample code for the implementation of HMAC with MD5. Listing Two (also from RFC 2104) presents test vectors for Listing One (trailing '\0' of a character string not included).
DDJ
Listing One
/* Function: hmac_md5 */ void hmac_md5(text, text_len, key, key_len, digest) unsigned char* text; /* pointer to data stream */ int text_len; /* length of data stream */ unsigned char* key; /* pointer to authentication key */ int key_len; /* length of authentication key */ caddr_t digest; /* caller digest to be filled in */ { MD5_CTX context; unsigned char k_ipad[65]; /* inner padding - key XORd with ipad */ unsigned char k_opad[65]; /* outer padding - key XORd with opad */ unsigned char tk[16]; int i; /* if key is longer than 64 bytes reset it to key=MD5(key) */ if (key_len > 64) { MD5_CTX tctx; MD5Init(&tctx); MD5Update(&tctx, key, key_len); MD5Final(tk, &tctx); key = tk; key_len = 16; } /* the HMAC_MD5 transform looks like: * MD5(K XOR opad, MD5(K XOR ipad, text)) * where K is an n byte key * ipad is the byte 0x36 repeated 64 times * opad is the byte 0x5c repeated 64 times * and text is the data being protected */ /* start out by storing key in pads */ bzero( k_ipad, sizeof k_ipad); bzero( k_opad, sizeof k_opad); bcopy( key, k_ipad, key_len); bcopy( key, k_opad, key_len); /* XOR key with ipad and opad values */ for (i=0; i<64; i++) { k_ipad[i] ^= 0x36; k_opad[i] ^= 0x5c; } /* perform inner MD5 */ MD5Init(&context); /* init context for 1st pass */ MD5Update(&context, k_ipad, 64) /* start with inner pad */ MD5Update(&context, text, text_len); /* then text of datagram */ MD5Final(digest, &context); /* finish up 1st pass */ /* perform outer MD5 */ MD5Init(&context); /* init context for 2nd pass */ MD5Update(&context, k_opad, 64); /* start with outer pad */ MD5Update(&context, digest, 16); /* then results of 1st hash */ MD5Final(digest, &context); /* finish up 2nd pass */ }
Listing Two
key = 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b key_len = 16 bytes data = "Hi There" data_len = 8 bytes digest = 0x9294727a3638bb1c13f48ef8158bfc9d key = "Jefe" data = "what do ya want for nothing?" data_len = 28 bytes digest = 0x750c783e6ab0b503eaa86e310a5db738 key = 0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA key_len 16 bytes data = 0xDDDDDDDDDDDDDDDDDDDD... ..DDDDDDDDDDDDDDDDDDDD... ..DDDDDDDDDDDDDDDDDDDD... ..DDDDDDDDDDDDDDDDDDDD... ..DDDDDDDDDDDDDDDDDDDD data_len = 50 bytes digest = 0x56be34521d144c88dbb8c733f0e8b3f6
Copyright © 1999, Dr. Dobb's Journal