Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

All Sorts of Sorts



I am amazed at how seldom these properties are spelled out for comparison functions. I suppose they are sufficiently obvious to most programmers that they are hard to see as important semantic constraints. It's only when you violate one or more and spend three days debugging a program that you are reminded of their importance.

A classic failure is to change ordering functions in the middle of the stream. For example, the Standard C library has a function called bsearch. You specify an array in sort, an item to look up, and a comparison function. bsearch returns a pointer to an array element that compares equal or a null pointer if it can find none. As the "b" in the name implies, bsearch speeds its search by performing a binary chop. Thus, it can search an array with 1,000 elements with at most 10 comparisons.

The binary chop algorithm assumes the array is sorted by the same criterion as is enforced by the comparison function you specify on the call to bsearch. You can use another Standard C library function called qsort to ensure this result. One argument to qsort is a comparison function that behaves much like the one for bsearch. With a bit of care, you can ensure that both functions use exactly the same comparison function.

What many people do, however, is initialize the search table statically, right in the C source code. They sort the initializers for each element by hand. Or they use a sort utility that comes with their development system of choice. Usually that tactic works fine. Probably, the sort utility and your comparison function specify the same ordering. The key is often just a text string that sorts in one obvious way. Right?

Move the code to a machine with a different character set, and you can be surprised. Character data in sort on the original host no longer sorts the same on the new target. The comparison function yields answers inconsistent with the ordering of the table. Disaster ensues.

I have learned to be wary of arrays whose order depends on character codes. I still initialize them statically, but I no longer make them constant. Instead, I sort such same comparison function I use for later searches. (I also comment each such table clearly so future maintainers retain this discipline.) My method saves all sorts of nasty surprises. You find similar problems among the UNIX utilities.

The sort command lets you specify all kinds of special options for the comparison function. You can specify multiple keys or ordering criteria. (Later keys are tested only when earlier ones compare equal.) Each key can specify a different subfield of the text lines to be sorted. Each subfield can be interpreted in different ways. The main thing wrong with all this flexibility is its complexity. It usually takes me between 10 minutes and an hour of experimentation to get all the keys right for a specialized sort.

Another thing is wrong, however. Several other UNIX utilities also require a comparison function. For example, you drop duplicate adjacent lines from a file by piping the sorted file through uniq. And you identify lines common to two files (or unique to just one file) by sorting both files and passing them through comm. uniq and comm have some notion of before and equal when comparing two text lines. If their findings differ from the notion imposed by the earlier sort, the utilities misbehave.

My knowledge of UNIX is admittedly out of date. But the last time I looked, the utilities uniq and comm lacked all the ordering options of sort. I have slammed into that incompatibility with annoying regularity over the years. I was inspired many years ago to write library functions for parsing sort keys and use them in comparisons. I still have versions of sort, uniq, and comm that can agree on fairly ornate their worth many times over.

The comparison functions you use with sorts and searches are critical pieces of code. They tailor standard algorithms for particular applications. Coded properly, comparison functions help you order data sets efficiently and robustly. Done wrong, they cause all sorts of pernicious errors.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.