
A New Road Map of Algorithm Design Techniques


April 2000

Anany Levitin is an associate professor of Computing Sciences at Villanova University. He can be contacted at [email protected].


When it comes to algorithms, the most challenging task a programmer can face is designing an algorithm for an unfamiliar problem. Though the best algorithms are often based on exploiting specific features of the problem at hand, no one would deny the value of general algorithm design techniques. Such techniques provide general directions for algorithmic problem solving and can therefore guide a programmer's thinking in designing an algorithm. As a sign of increased interest in general design techniques, I can point to several recent updates of books on algorithms -- including Fundamentals of Algorithmics, by G. Brassard and P. Bratley (Prentice-Hall, 1996); Computer Algorithms, by E. Horowitz, S. Sahni, and S. Rajasekaran (Computer Science Press, 1996); and Foundations of Algorithms, by R.E. Neapolitan and K. Naimipour (Jones and Bartlett, 1997) -- organized around such techniques. All these books use the familiar list of design techniques: divide-and-conquer, greedy approach, dynamic programming, backtracking, and branch-and-bound. (In addition, there are, of course, parallel, probabilistic, and approximation algorithms. However, these do not represent different techniques; rather, they represent different dimensions of algorithmic problem solving.)

The currently accepted classification of design techniques has serious shortcomings. First, it includes techniques of different levels of generality. For example, it seems obvious that divide-and-conquer is more general than, say, the greedy approach and branch-and-bound, which are applicable to optimization problems only. Second, it fails to include brute force and transform-and-conquer, which are legitimate general design techniques. Third, it does not distinguish divide-and-conquer and what I call decrease-and-conquer as two qualitatively different strategies. Fourth, its linear, as opposed to hierarchical, structure fails to reflect important special cases of the techniques. Finally, it fails to classify many classical algorithms (for example, Euclid's algorithm, heapsort, search trees, and hashing).

It is not difficult to rectify these shortcomings. But before outlining a new taxonomy, I'll review the four most general algorithm design techniques: brute force, divide-and-conquer, decrease-and-conquer, and transform-and-conquer.

Brute Force

Brute force is arguably the simplest algorithm design strategy. It can be defined as a straightforward approach to solving a problem, usually based directly on the problem's statement and definitions of the concepts involved. There is an interesting dichotomy in attitude toward this approach between computer scientists and programmers. The former have treated brute force with almost complete neglect, if not outright contempt. Thus, none of the recent textbooks mentioned previously has a chapter devoted to it. Practitioners, on the other hand, have been using this technique widely since the earliest days of computing.

It is easy to see the reasons behind the dismissive attitude of most academics toward brute force. By definition, this simple-minded approach yields simple-minded algorithms, which is by itself a definite turn-off for mathematically inclined computer scientists. Moreover, brute-force algorithms are usually inefficient and almost never optimal. These facts notwithstanding, the brute-force approach should not be overlooked as an important algorithm design technique.

The principal strength of the brute-force strategy is its very wide applicability. In fact, it seems to be the only general approach for which it is more difficult to point out problems it cannot tackle than problems it can. The other advantage of brute force is its simplicity. Brute-force algorithms are usually simple to design and implement. For programmers, those are obviously very desirable characteristics. As a result, a brute-force algorithm can be the algorithm of practical choice despite the existence of an asymptotically better alternative (for example, the definition-based matrix multiplication versus Strassen's algorithm or its descendants).
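
For instance, a minimal sketch of the definition-based algorithm might look as follows (this and the other Python fragments below are illustrative sketches with hypothetical function names, not production code):

# Definition-based matrix multiplication: the brute-force baseline
# contrasted above with Strassen's algorithm. C[i][j] is the dot
# product of row i of A and column j of B; the triple loop runs in
# O(n^3) time.
def matrix_multiply(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C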

As to the inefficiency of brute-force algorithms, one should keep the following points in mind. First, there are important algorithmic tasks (computing the sum of n numbers, finding the largest element in a list, and so on) for which the brute-force approach yields more efficient algorithms than those based on more sophisticated techniques, such as divide-and-conquer. Second, for some important problems (sorting, searching, matrix multiplication, and string matching), brute force yields reasonable algorithms of at least some practical value for all but very large instances. Third, the economics of computing -- with exponentially increasing computer speed and increasing (though, alas, not exponentially) compensation for a programmer's labor -- have been working in brute force's favor. A brute-force algorithm that was too slow for all but very small instances of a problem a few years ago may well be of practical interest (and an economically sound choice) for solving its medium-size instances today. Finally, brute-force algorithms can serve as a natural yardstick for more efficient alternatives for solving the same problem.
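
As a simple illustration of the second point, here is a sketch of brute-force string matching; its O(nm) worst case notwithstanding, it is perfectly serviceable for all but very large instances:

# Brute-force string matching: slide the pattern along the text and
# compare character by character.
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i          # index of the first occurrence
    return -1                 # pattern does not occur in the text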

Divide-and-Conquer

Divide-and-conquer is probably the best known general algorithm design technique. It is based on partitioning a problem into a number of smaller subproblems, usually of the same kind and ideally of about the same size. The subproblems are then solved (usually recursively or, if they are small enough, by a simpler algorithm) and their solutions are combined to get a solution to the original problem. Standard examples include mergesort, quicksort, multiplication of large integers, and Strassen's matrix multiplication. It might be useful to distinguish between two types of divide-and-conquer algorithms. For some of them, the bulk of the work is done while combining the solutions to smaller subproblems (such as in mergesort); for others, it is the processing done before partitioning that constitutes the heart of the algorithm (such as in quicksort). I call these two types of the divide-and-conquer technique "divide-before-processing" and "process-before-dividing," respectively.
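
A sketch of mergesort illustrates the divide-before-processing type: the split is trivial, and the real work happens while merging.

# Mergesort: split in half, sort each half recursively, then merge.
def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # combining step: the
        if left[i] <= right[j]:               # bulk of the work
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]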

Though most single-processor applications of divide-and-conquer divide a problem into two subproblems, other situations do arise (for example, the multiway mergesort; see Algorithms from P to NP, Volume I: Design & Efficiency, by B.M.E. Moret and H.D. Shapiro, Benjamin/Cummings Publishing, 1990). As to the case of a single subproblem, it is difficult to disagree with Brassard and Bratley that for such applications, "It is hard to justify calling the technique divide-and-conquer." Hence, though binary search is often cited as a quintessential divide-and-conquer algorithm, it actually represents a degenerate case of the strategy; binary search fits better in a separate category that I'll discuss shortly.

The fame of divide-and-conquer is well deserved. Many spectacular algorithms in computer science are based on this design technique. It does have limitations, however. First, it can be inapplicable to a problem (for example, computing the greatest common divisor or finding a function's antiderivative). Second, even if applicable, divide-and-conquer can be inefficient either because of the problem's nature (searching in an unsorted list, for example) or because a partition fails to divide an instance into about equal-size parts (for instance, quicksort on sorted arrays with the trivial pivot-selection strategy). While the latter phenomenon can often be alleviated, nothing can be done about the former.

Decrease-and-Conquer

Solving a problem by reducing its instance to a smaller one, solving the latter (recursively or otherwise), and then extending the obtained solution to get a solution to the original instance is, of course, a well-known design approach in its own right. For obvious reasons, I call it "decrease-and-conquer." [Brassard and Bratley use the term "simplification," which I will use shortly for a different design technique; Manber called it the "induction approach" in Introduction to Algorithms: A Creative Approach (Addison-Wesley, 1989) and the "incremental method" in "Designing Algorithms Incrementally" (DDJ, April 1999).] This approach has several important special cases. The first, and most frequently encountered, decreases the size of an instance by a constant. The canonical example here is insertion sort; many other examples can be found in Manber's book. Though in most algorithms of this kind the size is reduced by one, other situations do arise; for instance, algorithms that must distinguish between even and odd sizes of their input.
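
A sketch of insertion sort shows the decrease-by-one pattern: once the first n-1 elements are sorted, the solution is extended by inserting the n-th element into its proper place.

# Insertion sort as decrease-by-one (written iteratively).
def insertion_sort(a):
    for n in range(1, len(a)):
        v, i = a[n], n - 1
        while i >= 0 and a[i] > v:    # shift larger elements right
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = v                  # insert the n-th element
    return a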

The second special case of the decrease-and-conquer technique covers size reduction by a constant factor. Examples include binary search and exponentiation by squaring. Examples involving size reduction by a factor other than two are rarer, but they do occur; one is the Fibonacci search for locating the extremum of a unimodal function (see Numerical Recipes, by W.H. Press et al., Cambridge University Press, 1986).
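
Computing a^n by squaring makes the decrease-by-a-constant-factor idea explicit: each recursive call halves the exponent, so only O(log n) multiplications are needed.

# Decrease by a factor of two: a^n = (a^(n/2))^2, times a if n is odd.
def power(a, n):
    if n == 0:
        return 1
    half = power(a, n // 2)
    return half * half if n % 2 == 0 else half * half * a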

Finally, the third important special case of decrease-and-conquer covers more sophisticated situations of variable size reduction. Examples include Euclid's algorithm, interpolation search, and the partition-based algorithm for the selection problem.
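
Euclid's algorithm is the classic case: the instance shrinks on every step, but by an amount that depends on the input -- the hallmark of variable size reduction.

# Euclid's algorithm: gcd(m, n) = gcd(n, m mod n).
def gcd(m, n):
    while n != 0:
        m, n = n, m % n
    return m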

Though the decrease-and-conquer approach is well known, most authors of books on algorithms consider it either a special case of divide-and-conquer or vice versa. Still, it is more useful to consider divide-and-conquer and decrease-and-conquer as two distinct design techniques. The principal difference between the two lies in the number of subproblems that need to be solved: It is several for divide-and-conquer algorithms and one for decrease-and-conquer algorithms. What the two techniques do have in common is that both work by solving smaller instances of a given problem. It is also worth mentioning that both are recursive in their original formulations, though a final implementation of an algorithm based on either of them may well be iterative.

Transform-and-Conquer

The last general technique, missed by textbooks altogether, is based on the idea of transformation; it is natural to call it "transform-and-conquer." One can identify several flavors of this approach. The first one (I call it "instance simplification") solves a problem by first transforming a given instance into another instance of the same problem (and of the same size) with some special property that makes the problem easier to solve. Good examples include presorting (for example, for finding equal elements of a list), Gaussian elimination for solving systems of linear equations, and heapsort (if the heap is interpreted as an array with the special properties required of a heap).
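
Presorting can be sketched in a few lines: to find equal elements, first transform the list into a sorted one; any duplicates must then be adjacent, so a single linear scan finishes the job.

# Instance simplification: sort first, then scan adjacent pairs.
def has_duplicates(a):
    s = sorted(a)                     # the simplifying transformation
    return any(s[i] == s[i + 1] for i in range(len(s) - 1))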

The second variety (called "representation change") is based on a transformation of a problem's input to a different representation, which is more conducive to an efficient algorithmic solution. Examples include search trees, hashing, and heapsort if the heap is interpreted as a binary tree.
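
A small sketch makes the idea concrete: re-representing a list as a hash-based set turns each membership query from a linear scan into an expected constant-time lookup.

# Representation change: list -> hash table (Python's set is hash-based).
words = set(["ant", "bee", "cat"])
print("bee" in words)    # True, found by hashing rather than scanning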

Preconditioning (or preprocessing) can be considered yet another variety of the transformation strategy. The idea is to process a part of the input, or the entire input, to obtain some auxiliary information that speeds up solving the problem. Examples include the Knuth-Morris-Pratt and Boyer-Moore algorithms for string matching and the distribution counting algorithm for sorting.
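
A simplified sketch of distribution counting (assuming integer keys in a known range) shows the preprocessing at work: the frequency counts gathered in the first pass are exactly the auxiliary information that lets the second pass place every element directly.

# Preconditioning: one pass gathers counts, a second pass uses them.
def distribution_counting_sort(a, lo, hi):
    counts = [0] * (hi - lo + 1)
    for x in a:                         # preprocessing pass
        counts[x - lo] += 1
    out = []
    for key, c in enumerate(counts):    # placement pass
        out.extend([key + lo] * c)
    return out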

Finally, in the last and most drastic version of the transform-and-conquer approach, an instance of a problem is transformed into an instance of a different problem altogether. A simple example is computing the least common multiple by the formula lcm(u,v)=uv/gcd(u,v), with the greatest common divisor gcd(u,v) computed by, say, Euclid's algorithm. A group of less obvious examples can be found in Flows in Networks, by L.R. Ford and D.R. Fulkerson (Princeton University Press, 1962), where the authors solve several combinatorial and graph problems by transforming them into problems about network flows.
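
The lcm example fits in a few lines, with the reduction to gcd doing all the work (assuming positive integers):

# Problem reduction: lcm is computed via a different problem, gcd.
def lcm(u, v):
    def gcd(m, n):                # Euclid's algorithm
        while n:
            m, n = n, m % n
        return m
    return u * v // gcd(u, v)     # lcm(u, v) = uv / gcd(u, v)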

A New Classification Scheme of Algorithm Design Techniques

In my paper "Do We Teach the Right Algorithm Design Techniques?" (Proceedings of SIGCSE'99, March 1999), I suggested a new, hierarchical classification scheme of known algorithm design techniques. On its highest level, it is based on the following criterion for separating most general techniques from less general ones: To qualify for inclusion in the category of most general approaches, a technique must yield reasonable (though not necessarily optimal) algorithms for the problems of sorting and searching.

The choice of sorting and searching can be justified by the importance of these two problems and by the fact that it partitions the known techniques in a manner well supported by intuition. Specifically, only four techniques -- brute force, divide-and-conquer, decrease-and-conquer, and transform-and-conquer -- pass the test; the others -- the greedy approach, dynamic programming, backtracking, and branch-and-bound -- fail to qualify as most general techniques.

Further, special cases of the most general techniques just mentioned suggest a hierarchical nature of algorithm design strategies. This line of thinking can be applied to less general techniques as well. Thus, Moret and Shapiro have considered the greedy approach as one of two types of a more general approach to optimization problems:

Greedy methods build solutions piece by piece... Each step increases the size of the partial solution and is based on local optimization: The choice selected is that which produces the largest immediate gain while maintaining feasibility. Iterative methods start with any feasible solution and proceed to improve upon the solution by repeated applications of a simple step. The step typically involves a small, localized change which improves the value of the objective function.

Moret and Shapiro devote separate chapters of their book to illustrate each of these techniques; more examples can be found by following the references therein. I like this delineation, though the name "improvement methods" seems to be more descriptive of the second type than "iterative methods."

Further, one can point out two types of dynamic programming algorithms as well. The canonical version of this approach is bottom-up: A table of solutions to subproblems is filled starting with the problem's smallest subproblems. A solution to the original instance is then obtained from the constructed table, with many of the table's entries typically remaining unused for the instance in question. To overcome that shortcoming, a top-down version of dynamic programming has been developed, based on so-called "memory functions" (see Brassard and Bratley).
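
The two versions are easy to contrast on a toy problem such as the Fibonacci numbers (a standard illustration, not one from the texts cited): the bottom-up version fills the whole table, while the memory-function version fills only the entries it actually needs.

# Bottom-up dynamic programming: fill the table from the smallest
# subproblems up.
def fib_bottom_up(n):
    table = [0, 1] + [None] * (n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

# Top-down version with a "memory function": recurse as needed, but
# remember every computed value so no subproblem is solved twice.
def fib_top_down(n, memo={0: 0, 1: 1}):
    if n not in memo:
        memo[n] = fib_top_down(n - 1) + fib_top_down(n - 2)
    return memo[n]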

Finally, the two techniques -- backtracking and branch-and-bound -- both deal with combinatorial problems by constructing so-called "state-space trees." The difference between them lies in that backtracking is not limited to optimization problems, while branch-and-bound is not restricted to a specific way of traversing the problem's state-space tree. So, it would be natural to consider the two techniques as special cases of a more general approach. What should this approach be called? The name "exhaustive search" is often used in the literature. However, there are two problems with this term. First, both backtracking and branch-and-bound try, in fact, to avoid exhaustive search by pruning a problem's tree. Second, exhaustive search can itself be an alternative approach to solving a problem (albeit usually an inferior one) by generating and checking all the candidates in the problem's domain; it is therefore more natural to consider that approach a special case of the brute-force technique. Thus, instead of "exhaustive search," I use "state-space-tree techniques" to refer to the general design approach containing both backtracking and branch-and-bound as its special cases. I end up with the hierarchical taxonomy of major design techniques described in Table 1.
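
A minimal backtracking sketch (n-queens, a non-optimization problem) shows the state-space tree being pruned: a partial solution grows one queen per row and is abandoned as soon as it cannot be extended feasibly.

# Backtracking: depth-first traversal of the state-space tree with
# pruning of infeasible partial solutions.
def solve_n_queens(n, cols=()):
    row = len(cols)
    if row == n:
        return cols                        # a complete placement
    for c in range(n):
        if all(c != cc and abs(c - cc) != row - r
               for r, cc in enumerate(cols)):
            result = solve_n_queens(n, cols + (c,))
            if result is not None:
                return result
    return None                            # dead end: backtrack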

Conclusion

The currently accepted classification of algorithm design techniques has several important shortcomings. These shortcomings can be rectified by reclassifying, and in a few cases renaming, some of the known techniques. The new, hierarchical taxonomy puts four techniques -- brute force, divide-and-conquer, decrease-and-conquer, and transform-and-conquer -- in the category of most general techniques, leaving the rest -- local search techniques, dynamic programming, and state-space-tree techniques -- in the second, less general, category. Further, important special types of the major design techniques are also identified.

A few cautionary notes seem to be in order, however. First, no matter how many general design techniques are recognized, there will always be algorithms that cannot be naturally interpreted as an application of one of those techniques: Some algorithms are just based on insights specific to the problems they solve. On the other hand, some algorithms can be interpreted as applications of different techniques. For example, Horner's rule for evaluating a polynomial can be considered both as a decrease-and-conquer algorithm (see Manber's Introduction to Algorithms: A Creative Approach) and as an algorithm based on the transformation strategy. Finally, some algorithms may incorporate ideas of several techniques. For example, the fast Fourier transform takes advantage of both the transformation (representation change) and divide-and-conquer ideas.

These remarks notwithstanding, there are several advantages in the new classification. First, it improves the currently accepted taxonomy by eliminating the shortcomings enumerated earlier. Second, it better reflects the richness of algorithm design techniques and shows them on different levels of detail. Third, it allows the programmer to classify some important algorithms (Euclid's algorithm, heapsort, search trees, and hashing, for instance), which the currently accepted taxonomy is incapable of doing. Most importantly, the new taxonomy should serve as a better road map for designing algorithms.

DDJ

