Brent's Technique
Brent's technique (Listing Two) is a method for optimizing the time complexity of successful LOOKUP operations.
INSERT IS address = H1(data) depth = 0 while table[address] isn't empty address = address + H2(data) increment(depth) end while if depth < 2 then table[address] = data else address = H1(data) for i = 0 to depth address2 = address + H2(table[address]) depth2 = 0 while depth2 <= i and table[address2] isn't empty address2 = address + H2(table[address]) increment(depth2) end while if depth2 <= i table[address2] = table[address] return end if address = address + H2(data) end for table[address] = data end if insert data in table[address]
For certain applications (notably compilers), the number of successful LOOKUPs is overwhelmingly larger than the number of INSERTs. Brent's technique works by spending more time inserting objects into the table in hopes of making the table better suited for LOOKUP. A C++ implementation is shown in Listing Three.
#include <stdio.h> #if !defined(_ZTC_) #include <= stream.h> // Bad old UNIX doesn't have ansi headers #define UINT_MAX ((unsigned)-1) typedef unsigned size_t; // should be correct most places. #else #include <= stream.hpp>= #include <= limits.h>= #endif #include "hashtab.hpp" // find_tabsize returns the next twin prime >= the desired table size static size_t find_tabsize(size_t tabsize) { // Each element of primes is the lower twin of a twin prime. The table // also has the property that primes[i] >= (primes[i]+100) to keep the // table size down. static unsigned primes[] = { 5, 107, 227, 347, 461, 569, 809, 1019, 1151, 1277, 1427, 1607, 1721, 1871, 1997, 2111, 2237, 2339, 2549, 2657, 2789, 2969, 3119, 3251, 3359, 3461, 3581, 3767, 3917, 4019, 4127, 4229, 4337, 4481, 4637, 4787, 4931, 5099, 5231, 5417, 5519, 5639, 5741, 5849, 6089, 6197, 6299, 6449, 6551, 6659, 6761, 6869, 7127, 7307, 7457, 7559, 7757, 7877, 8009, 8219, 8387, 8537, 8819, 8969, 9239, 9341, 9461, 9629, 9767, 9929, 10037, 10139, 10271, 10427, 10529, 10709, 10859, 11057, 11159, 11351, 11489, 11699, 11831, 11939,12041, 12161, 12377, 12539, 12821, 13001, 13217, 13337, 13679, 13829, 13931, 14081,14249, 14387, 14549, 14867, 15137, 15269, 15581, 15731, 15887, 16061, 16187, 16361,16631, 16829, 16979, 17189, 17291, 17417, 17579, 17681, 17789, 17909, 18041, 18251,18521, 18911, 19079, 19181, 19379, 19541, 19697, 19841, 19961, 20147, 20357, 20477,20639, 20747, 20897, 21011, 21191, 21317, 21491, 21599, 21737, 21839, 22037, 22157,22271, 22481, 22619, 22739, 22859, 22961, 23201, 23369, 23537, 23669, 23831, 24107,24371, 24917, 25031, 25169, 25301, 25409, 25577, 25799, 25931, 26111, 26249, 26681,26861, 27059, 27239, 27407, 27527, 27689, 27791, 27917, 28097, 28277, 28409, 28547,28661, 29021, 29129, 29387, 29567, 29669, 29879, 30011, 30137, 30269, 30389, 30491,30839, 31079, 31181, 31319, 31511, 31721, 31847, }; # define PTABSIZ (sizeof(primes)/sizeof(size_t)) int i; for(i = 0; i <= PTABSIZ; i++) if(tabsize <= = primes[i]) return primes[i] + 2; cerr < < "Table size too big!_n"; exit(-1); } #define empty ((void *) 0) #define deleted ((void *) -1) // constructor for the hashtab class hashtab::hashtab(size_t tablesize, size_t (*map)(void *element), int (*compare)(void *e1, void *e2)) { // round up table size to nearest twin prime this->tablesize = find_tabsize(tablesize); // allocate the table table = new void *[this->tablesize]; if(table == NULL) { cerr << "Out of memory in hashtab constructor"; exit(1); } // clear the table for(int i = 0; i < this->tablesize; i++) table[i] = empty; // store comparison and mapping functions this->map = map; this->compare = compare; // clear insertion count inserted = 0; } // insert an element into the hash table, using brent's technique. void hashtab::insert(void *element) { size_t mapped_val,h1,h2,cur,depth,i; size_t c2,cur2,depth2; // test predicate for finding an empty slot in the table. # define VACANT(x) (table[(x)] == empty || table[(x)] == deleted) // check for table full -- always leave at least // one empty slot in the table, or you'll search endlessly for // keys not in a full table! if(++inserted == tablesize) { cerr << "hash table full_n"; exit(1); } // map key to value and do primary hash mapped_val = map(element); h1 = mapped_val % tablesize; // secondary hash h2 = (mapped_val % (tablesize - 2)) + 1; // compute depth of collision list for(cur = h1, depth = 0; !VACANT(cur); cur = (cur + h2) % tablesize) { // if the element is already in the table then replace it and // return if(compare(element,table[cur]) == 0) { table[cur] = element; return; } depth++; } // if we aren't too deep, just jam the element here. if(depth < 2) { table[cur] = element; return; } for(i = 0,cur = h1; i < depth; i++,cur = (h1+h2) % tablesize) { // secondary hash of current element of collision list c2 = (map(table[cur]) % (tablesize - 2)) + 1; // see if this table element would be displaced fewer // positions than the object being inserted. for(depth2 = 0,cur2 = (cur + c2<=) % tablesize; !VACANT(cur2) && depth2 <= i; depth2++,cur2 = (cur2 + c2) % tablesize) ; // if we're still going to improve matters by inserting here if(depth2 <= i) { table[cur2] = table[cur]; break; } } // postcondition : cur is the index of an empty slot in which // to place element. table[cur] = element; } #define NOTFOUND (UINT_MAX) // lookup function private to hashtab class. size_t hashtab::_lookup(void *element) { size_t mapped_val,h1,h2,cur; // map key to value and do primary hash mapped_val = map(element); h1 = mapped_val % tablesize; // secondary hash. h2 = (mapped_val % (tablesize - 2)) + 1; for(cur = h1; table[cur] != empty && (table[cur] == deleted ? 1 : compare(element,table[cur] != 0); cur = cur + h2) % tablesize) ; return (table[cur] == empty ? NOTFOUND : cur); } // find an element in the table void *hashtab::lookup(void *element) { size_t index = _lookup(element); return (index == NOTFOUND ? empty : table[index]); } // remove an element from the table. void hashtab::remove(void *element) { int x; if(inserted-- == 0) { cerr << "deletion from empty hash table_n"; exit(1); } x = _lookup(element); if(x != NOTFOUND) table[x] = deleted; } #if defined(DEBUG_HASHTAB) && defined(_ZTC_) void hashtab::dump_tab(void (*display)(void *e)) { size_t i; for(i = 0; i < tablesize; i++) { if(table[i] == empty) cout << "empty"; else if(table[i] == deleted) cout << "deleted"; else display(table[i]); if (i && (i % 5 == 0)) cout.put('_n'); else cout.put(' '); } if (i % 5 == 0) cout.put('_n'); } #endif
First, compute the length of the collision list for the object being inserted. If the length is less than two, simply insert the object at the end of its collision list.
If the depth of the collision list is greater than two, look at each element in the list. If the first element would only need to be moved one place down the list, move it and replace it with the object to be inserted. If the second element would have to be moved fewer than two places down its collision list, replace it.
In general, if the nth element in an object's collision list would have to be moved fewer than n places down its own collision list, replace it. This increases the access time for the object being displaced but decreases the access time for the object being inserted, giving you a net improvement over the straight Algorithm D.
Since the worst-case performance of LOOKUP is directly proportional to the length of the longest collision list, the updated table can be searched more quickly because each insertion balances the collision list lengths as fairly as possible. The proof of the superiority of Brent's technique over Algorithm D is, as they say, left as an exercise for the reader.
If you are interested in hashing in general, read the chapter on hashing in Knuth's book. The treatment is complete without being tedious.