Searching the Skip List
All operations on a skip list rely on the search algorithm, so I discuss this first.
Earlier I provided a high-level explanation of this process, showing how a search for product H2345 would take place. The following explains how the code itself works. Here's the search algorithm broken out of the insert
method:
SkipNode<Key,Obj>* tmp = head; Key* cmpKey; // Figure out where new node goes for ( h = curHeight; h >= 1; h- ) { cmpKey = tmp->fwdNodes[h]->getKey(); while ( *cmpKey < *k ) { tmp = tmp->fwdNodes[h]; cmpKey = tmp->fwdNodes[h]->getKey(); } updateVec[h] = tmp; } tmp = tmp->fwdNodes[1];
The search starts at the topmost forward pointer in the head node, which is indicated by curHeight,
and controlled by the outermost for
loop. The head has no key, so the code always retrieves the value of the key in the next node at this level. It does so via the fwdNodes
attribute. If the key (k
) in the node being compared is less than the key (cmpKey
), then the next node becomes the current node, and the comparison key pointer is reset to the value of the key in the next forward node (handled in the while
loop). This process continues until the code finds a key that is less than the search key. (This explains why the tail was set to a sentinel value of Z9999). At this point, the search drops down a level in the current node, and the cmpKey
pointer is set to the key value of the next forward node.
Eventually, the search ends up at level 1, in the node just prior to the node less than or equal to the key being sought (because the search code is always looking at the key value for the next node). This is the reason for the final forward shift in tmp
following the for
loop. It is at this point where all add, delete, and insert operations take place.
Another important point about the search routine is that following the while
loop (or more precisely, just prior to dropping down a level in the current node), the search routine sets updateVec[h]
to the current node. This vector of SkipNode
pointers is used as a placeholder for pointers that must be moved when splicing the skip list back together after insert
or delete
operations. The updateVec
pointers always point to the most previous node prior to the insertion/deletion point at each level.
To retrieve an object, SkipList
contains a method called retrieve
, which takes the key as an argument, and returns the object (or NULL
if the object doesn't exist). Here's an example:
String *srch = new String("H2345"); productData* prod = sList->retrieve(srch);
Inserting Objects into the SkipList
Adding a node to the skip list is as simple as calling the insert
method with the new key and object. In my SkipList
class, new memory for the key will be allocated in the SkipNode
object, but productData
's memory will not be copied (so you should not delete that memory after calling insert
). Here's a sample insertion:
String* aKey = new String("Y8792"); productData* prod = new productData; aList->insert(aKey, prod);
The first thing SkipList
's insert
method does is use the search algorithm to select a location for a new node. After it locates the perfect spot, the insert
algorithm checks if the height of the new node is greater than any previously created node. If so, then insert
resets SkipList
's curHeight
to match the new level, and sets the updateVec
pointers for the previously unused levels to the head node. The search algorithm will use this information later to enable finding the new node from the head node. (Remember, the head points to the tail at all unused levels).
Once the new node is instantiated, it gets spliced into the list by re-pointing the previous node pointers to the new node. All pointers up to and including the new node's height are retargeted. The new node then acquires from updateVec
the forward pointers from its previous nodes for the levels it occupies.
A successful insert will return true
. If an insertion fails (most likely due to a duplicate key) then insert
will return false
. You can easily change the behavior of insert
to accept duplicate keys.
Removing Skip List Items
You can remove any item from the skip list by calling the remove
method. It takes an argument equal to the key you want to delete. As with insert
, the first thing to take place is a search for the existing key. The search routine also builds the updateVec
array containing pointers to the most previous nodes at each preceding level. When the node you want to remove is located, its forward pointers are copied into the forward pointers of the previous nodes. This effectively slices the node out of the list. The remove
method then performs a delete
on the SkipNode
object.
After the object is deleted, you should check whether the node was the tallest in the list. If it was, the head will once again have forward pointers to the tail. The code then uses this information to adjust curHeight
to the correct value. If the key value you passed to remove
doesn't exist, the routine returns false
. Otherwise remove
returns true
.
Dumping the List
You can still perform linear operations on the list by traversing the nodes at level 1 (as all nodes are linked at that level). I demonstrate this feature in the dump
method used to print out the entire skip list. This method accepts an ofstream
pointer and prints the value of the key for each node.
Skip List Optimization
Pugh suggests that for any given probability (p
), you can compute the optimal node height if you know the approximate number of nodes (n
) you'll need. The formula is: log1/p n
. If we arbitrarily chose 50 nodes and a probability of 0.5, the formula would yield an optimal height of 6. (5.64 is the actual result.)
Conclusion
With any data structure, the application should dictate the most appropriate choice. Skip lists are easy to code and understand, and therefore easy to implement and extend. As an added bonus, they offer average O(log n) performance, and the convenience of both binary and linear operations.
Reference
William Pugh's web site can be found at: www.cs.umd.edu/~pugh and contains a link to "Skip Lists: A Probabilistic Alternative to Balanced Trees," Communications of the ACM, June 1990.
Bill Whitney is an Application Consultant currently working for GTE Data Services in Tampa, Florida. He specializes in Object-Oriented design, Client/Server application architecture, and Telecommunications Management Networks.