C/C++

Tables Within Tables: C++ and Brent's Technique

By Kent Williams, June 03, 2007

Brent's Technique

Brent's technique (Listing Two) is a method for optimizing the time complexity of successful LOOKUP operations.

INSERT IS
  address = H1(data)
  depth = 0
  while table[address] isn't empty
    address = address + H2(data)
    increment(depth)
  end while
  if depth < 2 then
    table[address] = data
  else
    address = H1(data)
    for i = 0 to depth
      address2 = address + H2(table[address])
      depth2 = 0
      while depth2 <= i and table[address2] isn't empty
        address2 = address + H2(table[address])
        increment(depth2)
      end while
      if depth2 <=  i
        table[address2] = table[address]
        return
      end if
      address = address + H2(data)
    end for
    table[address] = data
  end if
  insert data in table[address]

Listing Two: Pseudocode for Brent's technique.

For certain applications (notably compilers), the number of successful LOOKUPs is overwhelmingly larger than the number of INSERTs. Brent's technique works by spending more time inserting objects into the table in hopes of making the table better suited for LOOKUP. A C++ implementation is shown in Listing Three.

#include <stdio.h>
#if !defined(_ZTC_)
#include <= stream.h>
// Bad old UNIX doesn't have ansi headers
#define UINT_MAX ((unsigned)-1)
typedef unsigned size_t;  // should be correct most places.
#else
#include <= stream.hpp>= 
#include <= limits.h>= 
#endif
#include "hashtab.hpp"

// find_tabsize returns the next twin prime >= the desired table size
static size_t find_tabsize(size_t tabsize) {
// Each element of primes is the lower twin of a twin prime.  The table
// also has the property that primes[i] >= (primes[i]+100) to keep the
// table size down.
  static unsigned primes[] = {
  5, 107, 227, 347, 461, 569, 809, 1019, 1151, 1277, 1427, 1607, 1721,
  1871, 1997, 2111, 2237, 2339, 2549, 2657, 2789, 2969, 3119, 3251, 3359,
  3461, 3581, 3767, 3917, 4019, 4127, 4229, 4337, 4481, 4637, 4787, 4931,
  5099, 5231, 5417, 5519, 5639, 5741, 5849, 6089, 6197, 6299, 6449, 6551,
  6659, 6761, 6869, 7127, 7307, 7457, 7559, 7757, 7877, 8009, 8219, 8387,
  8537, 8819, 8969, 9239, 9341, 9461, 9629, 9767, 9929, 10037, 10139, 10271,
  10427, 10529, 10709, 10859, 11057, 11159, 11351, 11489, 11699, 11831, 11939,12041,
  12161, 12377, 12539, 12821, 13001, 13217, 13337, 13679, 13829, 13931, 14081,14249,
  14387, 14549, 14867, 15137, 15269, 15581, 15731, 15887, 16061, 16187, 16361,16631,
  16829, 16979, 17189, 17291, 17417, 17579, 17681, 17789, 17909, 18041, 18251,18521,
  18911, 19079, 19181, 19379, 19541, 19697, 19841, 19961, 20147, 20357, 20477,20639, 
  20747, 20897, 21011, 21191, 21317, 21491, 21599, 21737, 21839, 22037, 22157,22271,
  22481, 22619, 22739, 22859, 22961, 23201, 23369, 23537, 23669, 23831, 24107,24371,
  24917, 25031, 25169, 25301, 25409, 25577, 25799, 25931, 26111, 26249, 26681,26861,
  27059, 27239, 27407, 27527, 27689, 27791, 27917, 28097, 28277, 28409, 28547,28661,
  29021, 29129, 29387, 29567, 29669, 29879, 30011, 30137, 30269, 30389, 30491,30839,
  31079, 31181, 31319, 31511, 31721, 31847,
  };
#  define PTABSIZ (sizeof(primes)/sizeof(size_t))

  int i;
  for(i = 0; i <=  PTABSIZ; i++)
    if(tabsize <= = primes[i])
      return primes[i] + 2;
  cerr < < "Table size too big!_n";
  exit(-1);
}
#define empty ((void *) 0)
#define deleted ((void *) -1)

// constructor for the hashtab class
hashtab::hashtab(size_t tablesize,
    size_t (*map)(void *element),
    int (*compare)(void *e1, void *e2)) {
  // round up table size to nearest twin prime
  this->tablesize = find_tabsize(tablesize);
  // allocate the table
  table = new void *[this->tablesize];
  if(table == NULL) {
    cerr << "Out of memory in hashtab constructor";
    exit(1);
  }
  // clear the table
  for(int i = 0; i < this->tablesize; i++)
    table[i] = empty;
  // store comparison and mapping functions
  this->map = map;
  this->compare = compare;
  // clear insertion count
  inserted = 0;
}

// insert an element into the hash table, using brent's technique.
void hashtab::insert(void *element) {
  size_t mapped_val,h1,h2,cur,depth,i;
  size_t c2,cur2,depth2;

  // test predicate for finding an empty slot in the table.
#  define VACANT(x) (table[(x)] == empty || table[(x)] == deleted)
  // check for table full -- always leave at least
  // one empty slot in the table, or you'll search endlessly for
  // keys not in a full table!
  if(++inserted == tablesize) {
    cerr << "hash table full_n";
    exit(1);
  }

  // map key to value and do primary hash
  mapped_val = map(element);
  h1 = mapped_val % tablesize;
  // secondary hash
  h2 = (mapped_val % (tablesize - 2)) + 1;
  // compute depth of collision list
  for(cur = h1, depth = 0; !VACANT(cur); cur = (cur + h2) % tablesize) {
    // if the element is already in the table then replace it and
    // return
    if(compare(element,table[cur]) == 0) {
      table[cur] = element;
      return;
    }
    depth++;
  }
  // if we aren't too deep, just jam the element here.
  if(depth < 2) {
    table[cur] = element;
    return;
  }
  for(i = 0,cur = h1; i < depth; i++,cur = (h1+h2) % tablesize) {
    // secondary hash of current element of collision list
    c2 = (map(table[cur]) % (tablesize - 2)) + 1;
    // see if this table element would be displaced fewer
    // positions than the object being inserted.
    for(depth2 = 0,cur2 = (cur + c2<=) % tablesize;
      !VACANT(cur2) && depth2 <=  i;
      depth2++,cur2 = (cur2 + c2) % tablesize)
      ;
    // if we're still going to improve matters by inserting here
    if(depth2 <=  i) {
      table[cur2] = table[cur];
      break;
    }
  }
  // postcondition : cur is the index of an empty slot in which
  // to place element.
  table[cur] = element;
}

#define NOTFOUND (UINT_MAX)

// lookup function private to hashtab class.
size_t hashtab::_lookup(void *element) {
  size_t mapped_val,h1,h2,cur;
  // map key to value and do primary hash
  mapped_val = map(element);
  h1 = mapped_val % tablesize;
  // secondary hash.
  h2 = (mapped_val % (tablesize - 2)) + 1;
  for(cur = h1;
    table[cur] != empty &&
    (table[cur] == deleted ? 1 : compare(element,table[cur] != 0);
    cur = cur + h2) % tablesize)  
    ;
  return (table[cur] == empty ? NOTFOUND : cur);
}

// find an element in the table
void *hashtab::lookup(void *element) {
  size_t index = _lookup(element);
  return (index == NOTFOUND ? empty : table[index]);
}

// remove an element from the table.
void hashtab::remove(void *element) {
  int x;
  if(inserted-- == 0) {
    cerr << "deletion from empty hash table_n";
    exit(1);
  }
  x = _lookup(element);
  if(x != NOTFOUND)
    table[x] = deleted;
}

#if defined(DEBUG_HASHTAB) && defined(_ZTC_)
void hashtab::dump_tab(void (*display)(void *e)) {
  size_t i;
  for(i = 0; i < tablesize; i++) {
    if(table[i] == empty)
      cout << "empty";
    else if(table[i] == deleted)
      cout << "deleted";
    else
      display(table[i]);
    if (i && (i % 5 == 0))
      cout.put('_n');
    else
      cout.put(' ');
  }
  if (i % 5 == 0)
    cout.put('_n');
}
#endif

Listing Three: Brent.CPP.

First, compute the length of the collision list for the object being inserted. If the length is less than two, simply insert the object at the end of its collision list.

If the depth of the collision list is greater than two, look at each element in the list. If the first element would only need to be moved one place down the list, move it and replace it with the object to be inserted. If the second element would have to be moved fewer than two places down its collision list, replace it.

In general, if the n^th element in an object's collision list would have to be moved fewer than n places down its own collision list, replace it. This increases the access time for the object being displaced but decreases the access time for the object being inserted, giving you a net improvement over the straight Algorithm D.

Since the worst-case performance of LOOKUP is directly proportional to the length of the longest collision list, the updated table can be searched more quickly because each insertion balances the collision list lengths as fairly as possible. The proof of the superiority of Brent's technique over Algorithm D is, as they say, left as an exercise for the reader.

If you are interested in hashing in general, read the chapter on hashing in Knuth's book. The treatment is complete without being tedious.

Previous 1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

Tables Within Tables: C++ and Brent's Technique

Brent's Technique

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

Tables Within Tables: C++ and Brent's Technique

Brent's Technique

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content