Unusual Functionality
In addition to what you'd expect from a good collection library, C5 offers some unusual functionality:
- Slidable sublist views that reconcile array lists (fast access by index) with linked lists (fast insertion and deletion of single items), without exposing implementation details such as linked-list nodes. Sublist views make for good orthogonality because all operations that can be applied to lists can also be applied to sublist views. For instance, you need to provide only one Reverse() method, not both Reverse(int start) and Reverse(int start, int length)the latter two can be performed as Reverse() on suitable sublist views. The same conceptual economy applies to numerous other operations such as several kinds of search, backwards search, tests, shuffling, and so on.
- Listenable collection change events allow one to efficiently track changes to collections by attaching event handlers. This makes it easy to build composite collections, such as multidictionaries, using the given components.
- Fast snapshots of tree-based collections let you have many almost-identical versions of a set or bag. This supports time- and space-efficient implementation of geometric algorithms.
- Handles can be associated with priority queue items to permit fast access, update, or removal of items. In practice, a priority queue is pretty useless without this functionality, yet many collection library implementations do not provide it.
Using C5
To illustrate the use of C5, consider a simple index of a filea sorted list of all words that appear in the file, and for each such word, a sorted list of the lines in which it appears. We use a dictionary to map each word to a set of line numbers; this avoids duplicate line numbers for each word. Using a tree dictionary makes sure the words come out sorted when the dictionary is printed, and similarly for the line numbers. For each word, the line numbers are represented by a TreeSet<int>, and hence the entire file index is represented by a TreeDictionary<String, TreeSet<int>>.
Let filename be the name of the file. Listing One(a) then builds the index in the tree-dictionary index. The foreach loop body is not optimal: It makes at least two searches in index for each string s; namely, one call to Contains and one or two calls to the indexer index[s]. Since this is a common pattern, dictionaries in the C5 library provide a method bool Find(K k, out V v) that combines the Contains(k) test with the lookup of the associated value v. Using Find, the number of searches is reduced to one or two; see Listing One(b).
(a) IDictionary<String, TreeSet<int>> index = new TreeDictionary<String, TreeSet<int>>(); Regex delim = new Regex("[^a-zA-Z0-9]+"); using (TextReader rd = new StreamReader(filename)) { int lineno = 0; for (String line = rd.ReadLine(); line != null; line = rd.ReadLine()) { String[] res = delim.Split(line); lineno++; foreach (String s in res) if (s != "") { if (!index.Contains(s)) index[s] = new TreeSet<int>(); index[s].Add(lineno); } } } (b) foreach (String s in res) if (s != "") { TreeSet<int> theSet; if (!index.Find(s, out theSet)) index[s] = theSet = new TreeSet<int>(); theSet.Add(lineno); } (c) foreach (String word in index.Keys) { Console.Write("{0}: ", word); foreach (int ln in index[word]) Console.Write("{0} ", ln); Console.WriteLine(); } (d) ... characterized: 150 characters: 229 230 319 321 checks: 78 circular: 348 class: 149 320 402 492 516 classes: 7 105 110 117 152 226 232 393 classic: 331 ...
The number of searches can be further reduced to exactly one for each string s by using the C5 method FindOrAdd, at the expense of creating a single empty TreeSet<int> in addition to those needed for all the words in the index. Regardless of how the index is built, it can be printed in alphabetical order using standard C# idioms; see Listing One(c). Listing One(d) is a fragment of the output.
A similar approach can be used to find all anagrams, such as "generate" and "teenager," in a text. From a word, you can compute the corresponding bag of characters, such as {a, e, e, e, g, n, r, t g}; this represents the anagram class of the word. You can create a dictionary mapping bags of characters to sets of words, and populate the dictionary in much the same way as the previous file index.