.NET

The C5 Generic Collection Library

By Niels Kokholm and Peter Sestoft, June 08, 2007

Collection libraries provide functionality for storing and manipulating collections of related data items.

Unusual Functionality

In addition to what you'd expect from a good collection library, C5 offers some unusual functionality:

Slidable sublist views that reconcile array lists (fast access by index) with linked lists (fast insertion and deletion of single items), without exposing implementation details such as linked-list nodes. Sublist views make for good orthogonality because all operations that can be applied to lists can also be applied to sublist views. For instance, you need to provide only one Reverse() method, not both Reverse(int start) and Reverse(int start, int length)—the latter two can be performed as Reverse() on suitable sublist views. The same conceptual economy applies to numerous other operations such as several kinds of search, backwards search, tests, shuffling, and so on.
Listenable collection change events allow one to efficiently track changes to collections by attaching event handlers. This makes it easy to build composite collections, such as multidictionaries, using the given components.
Fast snapshots of tree-based collections let you have many almost-identical versions of a set or bag. This supports time- and space-efficient implementation of geometric algorithms.
Handles can be associated with priority queue items to permit fast access, update, or removal of items. In practice, a priority queue is pretty useless without this functionality, yet many collection library implementations do not provide it.

Using C5

To illustrate the use of C5, consider a simple index of a file—a sorted list of all words that appear in the file, and for each such word, a sorted list of the lines in which it appears. We use a dictionary to map each word to a set of line numbers; this avoids duplicate line numbers for each word. Using a tree dictionary makes sure the words come out sorted when the dictionary is printed, and similarly for the line numbers. For each word, the line numbers are represented by a TreeSet<int>, and hence the entire file index is represented by a TreeDictionary<String, TreeSet<int>>.

Let filename be the name of the file. Listing One(a) then builds the index in the tree-dictionary index. The foreach loop body is not optimal: It makes at least two searches in index for each string s; namely, one call to Contains and one or two calls to the indexer index[s]. Since this is a common pattern, dictionaries in the C5 library provide a method bool Find(K k, out V v) that combines the Contains(k) test with the lookup of the associated value v. Using Find, the number of searches is reduced to one or two; see Listing One(b).

(a)
IDictionary<String, TreeSet<int>> index
   = new TreeDictionary<String, TreeSet<int>>();
Regex delim = new Regex("[^a-zA-Z0-9]+");
using (TextReader rd = new StreamReader(filename)) {
   int lineno = 0;
   for (String line = rd.ReadLine(); line != null; line = rd.ReadLine()) {
      String[] res = delim.Split(line);
      lineno++;
      foreach (String s in res)
         if (s != "") {
            if (!index.Contains(s))
               index[s] = new TreeSet<int>();
            index[s].Add(lineno);
         }
   }
}

(b) 
foreach (String s in res)
   if (s != "") {
      TreeSet<int> theSet;
      if (!index.Find(s, out theSet))
         index[s] = theSet = new TreeSet<int>();
      theSet.Add(lineno);
   }

(c)
foreach (String word in index.Keys) {
   Console.Write("{0}: ", word);
   foreach (int ln in index[word])
      Console.Write("{0} ", ln);
   Console.WriteLine();
}

(d)
 ...
characterized: 150
characters: 229 230 319 321
checks: 78
circular: 348
class: 149 320 402 492 516
classes: 7 105 110 117 152 226 232 393
classic: 331
 ...

Listing One

The number of searches can be further reduced to exactly one for each string s by using the C5 method FindOrAdd, at the expense of creating a single empty TreeSet<int> in addition to those needed for all the words in the index. Regardless of how the index is built, it can be printed in alphabetical order using standard C# idioms; see Listing One(c). Listing One(d) is a fragment of the output.

A similar approach can be used to find all anagrams, such as "generate" and "teenager," in a text. From a word, you can compute the corresponding bag of characters, such as {a, e, e, e, g, n, r, t g}; this represents the anagram class of the word. You can create a dictionary mapping bags of characters to sets of words, and populate the dictionary in much the same way as the previous file index.

Previous 1 2 3 4 5 6 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

.NET

The C5 Generic Collection Library

Unusual Functionality

Using C5

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

.NET

The C5 Generic Collection Library

Unusual Functionality

Using C5

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content