C++ and Brent's Technique
The class mechanism of C++ is perfect for hiding the details of implementing an abstract data type. A class has public and private parts; put the data type's permissible operations in the public part and hide all the nasty details in the private part.
The ideal hash table class would be totally generic and manage arbitrary objects without regard for their contents. This isn't possible because C++ is not a pure object-oriented language. You can't pass types as parameters to a constructor. The hash table class needs to know some things about the objects it is trying to manage, such as what empty and deleted objects look like and how to map an object to an address and decide if two objects are equal.
The solution I found (Listing Four) is similar to the usual C technique of using function pointers to implement generic functions. The hash table contains pointers to objects instead of the objects themselves. A pointer to comparison and mapping functions must be passed to the constructor for the type.
#define DEBUG_HASHTAB // hashtab.hpp -- class definition for hash table class hashtab { size_t tablesize; // size of the table size_t inserted; // # of elements inserted void **table; // actual table size_t (*map)(void *element); // map a key to the table index type int (*compare)(void *e1,void *e2); // compare two elements a la strcmp size_t_lookup(void *element); // internal lookup function public: hashtab(size_t tablesize, size_t (*map)(void *element), int (*compare)(void *e1, void *e2)); ); #endif };
The constructor function for a hash table uses the function find_tab_size to round up the table size to a suitable twin prime. I generated the table of primes with a hacked-up version of the standard sieve of Eratosthenes program (it's useful for something besides benchmarking, after all!).
An example usage would be something like the one in HASHTEST.CPP (Listing Five).
// inttab class #include "hashtab.hpp" size_t imap(void *element) { return *((int *)element); } int icompare(void *e1,void *e2) { if(*(int *)e1 > *(int *)e2) return 1; else if(*(int *)e1 == *(int *)e2) return 0; else return --1; } class inthashtab : hashtab { public: inthashtab(size_t size) : (size,imap,icompare) {} int *lookup(int x) { return (int *)hashtab: :lookup((void *)&x); } void insert(int x) { int *ip; if(hashtab: :lookup(&x) != empty) return; ip = new int; *ip = x; hashtab: :insert((void *)ip); } void remove(int x) { int *ip = (int *)hashtab: :lookup((void *)&x); if(ip != NULL) { hashtab: :remove((void *)ip); delete ip; } } };
The member functions of the derived class inthashtab take different arguments from the parent class hashtab, so it is not necessary to make the members in the parent class virtual -- the members of inthashtab overload the names of the parent class functions instead of redefining them.
This approach works but has one potential problem. If you wanted to use a pointer or reference variable of the type hashtab, you wouldn't be able to call the samenamed member functions of the derived class. You might try to get around this by making the parent class functions virtual, but C++ won't allow you to redefine the type of a virtual function in a derived class.
So you can't indiscriminately assign pointers to derived hash table objects to hash table pointers and expect to use them. However, this restriction is not onerous since you generally don't need polymorphic access to instances of a hash table class. If you think of a reason to do this (besides perversity), I'd be interested in hearing it.
The inttab class is a trivial example. When the only data in an object is a key, the LOOKUP function degenerates into a Boolean function. You don't retrieve anything from the table; you just find out whether something was already there. Also, using new to create each table element is inefficient for small objects; ideally, you should use memory-allocation techniques. But inttab should give you some idea of what you can do with the hashtab class.
You aren't limited to using the hashtab class the way I did. Instead of deriving a new subclass from hashtab, use hashtab variables as private data for a class with an entirely different interface. Drop hashtab right into the associative array class discussed in Bjarne Stroustrup's The C++ Programming Language. You can even use multiple hashtab variables to look up objects in the table using more than one key!
C++ allows you to develop generic, reusable data types that can successfully hide most ugly implementation details. Since C++ is not a pure object-oriented language, we can't hide them all. But this isn't really a problem: our's is an impure world, and the mixed nature of C++ may be a better fit than pure object-oriented languages. It is sometimes nice to work at a low level -- something you can't do with a pure language like Smalltalk.
Brent's technique is a very good fit for certain applications. Having a general-purpose hash class in the toolbox leaves no excuse for using linear searches for lookups!