Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

A Generic Iterator for Tree Traversal


Nov00: Algorithm Alley

Alexander works as a system architect at PricewaterhouseCoopers. He specializes in designing and developing object-oriented frameworks. He can be reached at [email protected].


Tree-like structures can be implemented in a variety of ways. For example, nodes (siblings) of one level of a tree can be linked using different types of collection, such as java.util.vector or java.util.list. These collections (or rather, their iterators) may provide different methods for collection traversal. Another way is that the link between a parent node and its child nodes can be one-way (from parent to children) or bidirectional. Bidirectional links simplify traversal logic.

The most notable difference among tree-like structures is in the public interfaces of tree nodes. For example, the javax.swing.tree.TreeNode interface (TreeNode is used for TreeModel, which is in turn used by JTree) provides the methods children() and isLeaf(). Node interfaces that employ the XML DOM tree (org.w3c.dom.Node; see http://sundev .natural.com/docs/java/oracle_xml_ parser2.0.2.4/org.w3c.dom.Node.html) define the methods getChildNodes() and hasChildNodes(), which essentially mean the same as children() and isLeaf() (they negate hasChildNodes()).

Normally, code performing tree traversal should be adjusted for each type of tree-like structure. Generic tree traversal logic, on the other hand, can be used with any type of tree-like structure or tree node, which lets you focus on the application logic rather than the internals of the tree structure organization.

Recursion is probably the best known and most frequently used way of traversing a tree. Recursive algorithms use the method call stack to keep the state of the traversal for every level of a tree. Listing One includes a class using recursion to traverse an XML DOM tree. Alternatively, traversing algorithms can utilize a separate stack object to store the tree level's state, as in Listing Two.

You can avoid using stack objects for tree-like structures that provide support for child-parent links. Once the child level has been processed, this link can be used to return back to the parent level; see Listing Three.

The important design decision for any generic algorithm is how to encapsulate its logic within a reusable class. There are two basic choices available for tree traversal -- visitor patterns and iterator patterns (see Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma et al., Addison-Wesley 1995).

Visitor patterns use two classes -- the visitor itself and the class performing the traversal. Traversal classes call the visitor for every node they traverse. The visitor applies its logic to a node it receives from the traversal class. Traversal classes are initialized with concrete visitor objects before the beginning of traversals. (This implementation of the visitor pattern is slightly different from the one described in Design Patterns, which assumes that the collection itself traverses its elements.)

The downside of the visitor pattern is that the visitor's interface is not extensible. It is fully controlled by the traversal class and takes only those parameters that the traversal class passes. So, if the logic of tree processing requires additional information not provided by a traversal class or a tree node itself, the visitor class needs to have state (member) objects that should be initialized before the traversal begins. Because the visitor interface is usually implemented by a separate class, encapsulation may potentially be weakened. The "one class for each distinct type of tree processing" requirement can also lead to the creation of numerous classes that are too granular.

The iterator pattern is simpler and its pervasive use in collection class libraries makes it one of the best-known design patterns. Tree iterators do not impose limitations on calling programs. The calling program only needs to call the iterator's next (sometimes called "advance") method.

Another advantage of the iterator pattern is that tree-like structures can be viewed simply as another type of collection, similar to the list or Java Vector class. The iterator pattern, not visitor, is the easiest way to traverse a collection. After all, a visitor is not normally used with lists, so why use it for a tree? Given that, iterator is a better choice of a design pattern for generic algorithms.

To make it even more consistent with how collections operate, the tree iterator can implement the abstract iterator interface provided by a collection class library. For Java, this is java.util.Iterator interface, which comes with the Java collection framework. This interface requires the implementation of three methods -- next, hasNext, and remove. remove has to be defined, but it doesn't need to be coded; instead, it can simply throw UnsupportedOperationException. Listing Four illustrates the use of the iterator for traversing XML DOM tree structure.

It is obvious that recursion can't be used with the iterator pattern. Using the child-parent link seems to be the most elegant alternative. This link effectively simulates stack structure, so there is no need for a separate stack object.

Another option is to introduce an auxiliary class, LevelIterator, which provides tree-level abstraction to the tree iterator. The task of traversing nodes of the same level is also delegated to this class. Another LevelIterator task is to maintain the link back to the parent level, which lets the tree iterator rely on the child-parent link even if it is not supported natively by the tree structure. Example 1 illustrates how the tree iterator's next method can be implemented using LevelIterator. The LevelIterator.toParent method simply returns the parent's LevelIterator object, while LevelIterator.nextLevel creates a new LevelIterator object for the child level. Creating a new object for every nextLevel call could be expensive for complex trees, so it is desirable to use object pooling to minimize the number of new objects. Object pool implementation for a tree-like structure is straightforward since the number of objects always equals the number of tree levels and the object is always returned to the pool before it can be allocated again. So a simple stack structure can be adapted for use as an object pool of LevelIterator objects. (The complete source code of my tree iterator and implementation of the object pool is available electronically; see "Resource Center," page 5). LevelIterator also runs some other chores, such as maintaining a node index within the level and a level index within the tree (depth).

After the public interface and implementation aspects of the algorithm are defined, you need to decide how the iterator deals with different kinds of tree nodes (org.w3c.dom.Node or javax.swing.tree .TreeNode, for example).

In situations where there are no common interfaces that can be used by an algorithm, the adapter pattern is handy. The adapter converts calls to its generic methods into the calls to methods used by the particular Node class. In fact, just two methods are needed to implement the adapter for the tree iterator -- getChildren and hasChildren (see Example 2).

The adapter class is usually trivial and doesn't do much. Listing Five is a trivial adapter for swing.tree.TreeNode. The simplicity of the adapter's code stems from TreeNode's native support of the enumeration interface for child nodes. When there is no such support, implementation is trickier. The adapter for org.w3c.dom.Node includes an additional nested adapter class for enumeration. This class uses the Node.getNextSibling() method to enumerate over the list of children; see Listing Six.

In Java, enumeration seems to be the natural interface for the collection of nodes, so additional enumeration adapters are rarely required. The DOM specification, being language independent, does not natively support enumeration; however, vendors of XML parsers usually extend the basic interface of org.w3c.dom by adding enumeration support. Therefore, it makes sense to use the DOM adapter in Listing Six only if there is a need to keep the adapter class generic so it can work with any XML parser. Otherwise, an additional enumeration adapter is not needed. Listing Seven contains the adapter's code for IBM's XML4J parser, which does support enumeration natively. This approach is also preferred from the performance standpoint, since creating new enumeration objects for every level of a tree adds some overhead, especially for deep trees with a low number of nodes at most of the levels. Because DOM trees usually fall under this category, their terminal levels frequently contain only one node.

Adapter objects should be passed to the tree iterator constructor along with the tree's root, for example:

Document xdoc = parser.readStream(new FileReader(fileName));

TIterator tIterator = new TIterator (xdoc,new DOMAdapter());

In addition to being generic, the implementation of the tree iterator presented here includes capabilities that allow for improving traversal performance in certain situations.

Limiting the Depth of Traversal

Tree traversal algorithms usually go over all nodes of a tree, assuming that full traversal is required by the calling code. However, the calling code occasionally needs to process only the subset of the tree's levels, meaning that it doesn't need to go all the way to the leaf level of a tree. In this case, full traversal carries significant overhead, since it traverses nodes that are ignored by the calling code. Tree traversal is a linear algorithm (O(n), where n is the number of nodes), so it is important to be able to limit the number of traversed nodes.

Consider the XML tree in Figure 1 (this example comes with IBM's XML4J parser). Suppose you want to process the IDs of all people in this file, but at the same time, are not interested in a person's name, e-mail, or other attribute. Full traversal is not what you want, since it traverses levels with name and e-mail, thus increasing the overall processing time.

Tree iterators should be able to stop at the second level and not go any deeper. To accomplish this, the iterator accepts the additional parameter maxDepth. This parameter is checked in the iterator's next method before moving to the children of the current node; see Example 3(a). So, for ID processing, the iterator should be created with the maxDepth parameter, as in Example 3(b). The only disadvantage of this approach is that the idLevel value will have to be changed if the structure (schema) of the XML document changes. To avoid this problem, the idLevel can be dynamically found by performing an additional loop, as in Listing Eight. This extra loop is lightweight, because the iterator only traverses the first node of each level and exits right after the required level is found.

Implementation of Selective Traversal

Oftentimes, there is a need to process nodes selectively based on conditions applied to one of the parent nodes. In other words, the condition is checked against the parent node (control node); if the condition is met, the entire branch that originates from this node should be processed. Otherwise, children of this node should be ignored.

It is expensive to use full traversal in situations like this, especially if the selectivity of the condition is high. In the extreme case, when search criteria contain a unique key (for example, "print all information about an employee with a given SSN"), the cost of full traversal could be prohibitive for large trees. Also, if the control node is located close to the root of the tree, the number of redundant steps for full traversal will be extremely high. In any event, full traversal overhead directly depends on the number of child nodes with control nodes that don't meet selection criteria.

On the other hand, it is desirable to be able to use generic traversal algorithms even when full traversal is not needed, since the rules of navigating through levels of a tree still apply. For example, suppose the traversing program needs to print all information about K.TAMURA. The easiest way to do this is to implement a nested loop that uses its own iterator. Maximum depth for the outer loop iterator should be limited to the level of the "person" node (second level); see Listing Nine.

The disadvantage of this approach is the need to have multiple nested loops and multiple iterators, which could lead to complicated code in cases of complex selection criteria.

You can do all processing in a single loop using the same iterator. But it requires additional intelligence on the iterator's side. One way to do it is to make the iterator implement a special skipChildren method. When this method is called, the iterator sets its skipChildren flag. The flag is reset after the iterator advances to the first child:

if (! skipChildren and node.has Children()) {

// move to child's level ...

}

skipChildren = false;

Listing Ten prints all nodes for the K.TAMURA control node, this time using the skipChildren method.

The most flexible approach for selective traversal is to make the handling of the maximum depth by the iterator more sophisticated. Indeed, selective traversal can be accomplished by dynamically changing the maximum depth for different nodes (branches). In the example with the node printing for K.TAMURA, the initial maximum depth is two (depth of idLevel). For selected nodes (ID=''K.TAMURA''), the depth should be adjusted to "unlimited," so the tree iterator can process children of the node. With this approach, maximum depth can't remain just a static parameter applied to the entire tree. Instead, the original depth should be restored when the iterator returns to the level of the parent node. In other words, depth becomes an attribute of a level and should be kept in the stack along with other level information (such as the current node). For the tree iterator I present here, an instance of LevelIterator class represents the tree level, so the current maximum depth value is kept there.

The calling program calls the setLevelDepth method of the iterator to change maximum depth value for the current node (control node). This method creates a new LevelIterator object for the child level and sets the new maximum depth on it. When next is called, the iterator navigates to the first node of the child level. The new maximum depth becomes current and is used for all child levels of the node -- unless setLevelDepth is called again. When all children are processed and the iterator returns to the control node level, the corresponding maximum depth is restored (that is, the control node's LevelIterator object becomes current). Listing Eleven prints the required nodes using the setLevelDepth method.

The logic for the traversal with the dynamically changed maximum depth is similar to the skipChildren flag approach. It is somewhat more elegant because it doesn't require using negative logic, but in terms of functionality, it does the same thing. The variable maximum depth approach, however, provides much greater flexibility when it comes to more complex selective processing. It allows for applying a different maximum depth to different branches of the tree. Some branches can be processed fully and some partially, up to the required depth. For example, you might need to process all levels for managers and only some levels for other entries in a personnel file.

This approach also allows for depth nesting. setLevelDepth can be called multiple times, first for the top-most control node, then for the node within the same branch that meets some additional criteria, and so on. In other words, coverage of the tree during traversal is defined by the combination of depths specified for the given set of criteria. Every criterion in the set can impose its own depth limitation on the tree. And no matter how complex these criteria are, the traversal is still performed within one loop. So the increased complexity of the tree iterator code (because of the handling of the maximum depth setting for each level) is justified by the gained flexibility in selective traversal.

DDJ

Listing One

package aan.treexamples;
import org.w3c.dom.*;
public class RecursiveTraversal implements ITraversal {
  /* Performs full tree traversal using recursion. */
  public void traverse( Node parentNode ) {
    // traverse all nodes that belong to the parent
    for(Node node=parentNode.getFirstChild(); node!=null; 
                                              node=node.getNextSibling()) {
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      // traverse children
      traverse(node);
    } 
  } 
} 

Back to Article

Listing Two

package aan.treexamples;
import org.w3c.dom.*;
import java.util.*;
public class StackTraversal implements ITraversal {
  /* Performs full tree traversal using stack. */
  public void traverse( Node rootNode ) {
    Stack stack = new Stack();
    // ignore root -- root acts as a container
    Node node=rootNode.getFirstChild();
    while (node!=null) {
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      if ( node.hasChildNodes()) {
        // store next sibling in stack. Return to it after all children 
        //                                                     are processed.
        if (node.getNextSibling()!=null)
          stack.push( node.getNextSibling() );
        node = node.getFirstChild();
      } 
      else {
        node = node.getNextSibling();
        if (node==null && !stack.isEmpty())
          // return to the parent's level.
          // some levels can be skipped if parent's node was the last one.
          node=(Node) stack.pop();
      } 
    } 
  } 
} 

Back to Article

Listing Three

package aan.treexamples;
import org.w3c.dom.*;
public class LinkTraversal implements ITraversal {
  /* Performs full tree traversal usingchild-parent link. */
  public void traverse( Node rootNode ) {
    // ignore root -- root acts as a container
    Node node=rootNode.getFirstChild();
    while (node!=null) {
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      if ( node.hasChildNodes()) {
        node = node.getFirstChild();
      } 
      else {    // leaf
        // find the parent level 
        while (node.getNextSibling()==null && node != rootNode)
            // use child-parent link to get to the parent level
            node=node.getParentNode();
        node = node.getNextSibling();
      } 
    } 
  } 
} 

Back to Article

Listing Four

package aan.treexamples;
import aan.tree.*;
import org.w3c.dom.*;
public class IterFullTraversal implements ITraversal {
  /* Performs full tree traversal using tree iterator. */
  public void traverse ( Node rootNode ) {
    // create iterator
    TIterator tIterator = new TIterator ( rootNode , new DOMAdapter() );
    Node node = null;
    while( (node=(Node)tIterator.next()) != null )  {
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
    } 
  } 
} 

Back to Article

Listing Five

package aan.tree;
import java.util.Enumeration;
import javax.swing.tree.TreeNode;
/* This adapter maps <code>javax.swing.tree.TreeNode</code> interface to
 * generic methods required by <code>TIterator</code>. */
public class SwingAdapter implements TNodeAdapter {
  public Enumeration getChildren( Object node ) {
    return ((TreeNode)node).children();
  } 
  public boolean hasChildren( Object node ) {
    return ! ((TreeNode)node).isLeaf();
  } 
} 

Back to Article

Listing Six

package aan.tree;
import java.util.Enumeration;
import org.w3c.dom.*;
/** This adapter maps <code>org.w3c.dom.Node</code> interface to generic 
 * methods required by <code>TIterator</code>. Enumeration provided by the 
 * nested class in order to conforom to w3c spec. */
public class DOMAdapter implements TNodeAdapter {
  public Enumeration getChildren( Object node ) {
    return new Enumerator( ((Node)node).getFirstChild() );
  } 
  public boolean hasChildren( Object node ) {
    return ((Node)node).hasChildNodes();
  } 
  /* Maps <code>org.w3c.dom.Node</code> methods to <code>Enumeration<code>.
   * Inner class <code>HeadNode</code> provides dummy impelmemtation for
   * <code>Node</code>, it is used as the head of the list. This is needed 
   * to advance to the first element after the <code>next()</code> */
  static public class Enumerator implements Enumeration {
    private Node node;
    Enumerator( Node node) {
      // empty node is the head of the list
      this.node = new HeadNode( node );
    } 
    public Object nextElement() {
      node = node.getNextSibling();
      return node;
    } 
    public boolean hasMoreElements() {
      return (node.getNextSibling() != null);
    } 
    /* Dummy implementation of the <code>Node</code>.
     * It returns its node after the <code>nextSibling</code> call */
    private class HeadNode  implements Node {
      private Node nextNode;
      /* Creates the node and initiates with the current node */
      private HeadNode( Node node){
        nextNode = node;
      } 
      /* Returns current node as the next. This is the only method that's 
       * used out of <code>Node</code> interface.  @return current node */
      public Node getNextSibling() {
        return nextNode;
      } 
      // all these methods are not used
      public String getNodeName(){ return null;} 
      public String getNodeValue() throws DOMException { return null;} 
      public void setNodeValue(String nodeValue) throws DOMException {} 
      public short getNodeType() {return 0;} 
      public Node getParentNode() {return null;} 
      public NodeList getChildNodes() {return null;} 
      public Node getFirstChild()  {return null;} 
      public Node getLastChild()  {return null;} 
      public Node getPreviousSibling()  {return null;} 
      public NamedNodeMap getAttributes(){return null;} 
      public Document getOwnerDocument(){return null;} 
      public Node insertBefore(Node newChild, Node refChild)
        throws DOMException {return null;} 
      public Node replaceChild(Node newChild, Node oldChild)
        throws DOMException {return null;} 
      public Node removeChild(Node oldChild)
        throws DOMException {return null;} 
      public Node appendChild(Node newChild)
        throws DOMException {return null;} 
      public boolean hasChildNodes() {return false;} 
      public Node cloneNode(boolean deep) {return null;} 
    }  // end of HeadNode
  }  // end of Enumerator
} 

Back to Article

Listing Seven

package aan.tree;
import java.util.Enumeration;
import org.w3c.dom.Node;
import com.ibm.xml.parser.Parent;
/* Maps <code>org.w3c.dom.Node</code> and 
 * <code>com.ibm.xml.parser.Parent</code> to generic methods required 
 * by <code>TIterator</code>. */
public class XML4JAdapter implements TNodeAdapter {
  public Enumeration getChildren( Object node ) {
    return ((Parent)node).elements();
  } 
  public boolean hasChildren( Object node ) {
    return ((Node)node).hasChildNodes();
  } 
} 

Back to Article

Listing Eight

package aan.treexamples;
import aan.tree.*;
import org.w3c.dom.*;
public class IterMaxDepthTraversal implements ITraversal {
  /* Traverses tree with traversal depth set to depth of the "person" level */
  public void traverse ( Node rootNode ) {
    // create iterator
    TIterator tIterator = new TIterator ( rootNode, new DOMAdapter() );
    // find the depth of the required level
    int targetDepth = -1;
    Node node = null;
    while( (node=(Node)tIterator.next()) != null )  
      if (node.getNodeName().equals("person")) {
        targetDepth = tIterator.getDepth();
        break;
      } 
    // we can reuse the same iterator
    tIterator.setRoot(rootNode);
    // limit the depth of traversal
    tIterator.setMaxDepth( targetDepth );
    while( (node=(Node)tIterator.next()) != null )  {
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
    } 
  } 
} 

Back to Article

Listing Nine

package aan.treexamples;
import aan.tree.*;
import org.w3c.dom.*;
public class SelectiveNestedTraversal implements ITraversal {
  /* Selectively traverses the tree ising nested loops. */
  public void traverse ( Node rootNode ) {
    // create 1st iterator. depth should be limited to ID's level
    DOMAdapter domAdapter = new DOMAdapter();
    TIterator tIterator = new TIterator ( rootNode, domAdapter, 2 );
    Node node = null;
    while( ( node=(Node)tIterator.next() )!=null ){
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      if  (node instanceof Element && ((Element)node).getAttribute( "id" ).
        equalsIgnoreCase( "K.TAMURA" )) {
        // create 2nd iterator, perform nested loop to process the branch.
        TIterator branchIterator =  new TIterator ( node, domAdapter);
        Node branchNode = null;
        while( (branchNode=(Node)branchIterator.next()) != null )
          // print node information
          System.out.println( branchNode.getNodeName()+"=" 			+branchNode.getNodeValue());
      } 
    } 
  } 
} 

Back to Article

Listing Ten

package aan.treexamples;
import aan.tree.*;
import org.w3c.dom.*;
public class SelectiveSkipTraversal implements ITraversal {
  /* Selectively traverses the tree ising skip flag. */
  public void traverse ( Node rootNode ) {
    // create an iterator.
    int idLevel = 2;
    TIterator tIterator = new TIterator ( rootNode, new DOMAdapter());
    Node node = null;
    while( ( node=(Node)tIterator.next() )!=null ){
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      if  ( tIterator.getDepth() == idLevel && ! (node instanceof Element &&
        ((Element)node).getAttribute( "id").equalsIgnoreCase("K.TAMURA" ))) {
        // allow parsing all children of the node -- applies  				only to the current node
        tIterator.skipChildren();
      } 
    } 
  } 
} 

Back to Article

Listing Eleven

package aan.treexamples;
import aan.tree.*;
import org.w3c.dom.*;
public class SelectiveSetDepthTraversal implements ITraversal {
  /* Selectively traverses the tree using <code>setLevelDepth</code> */
  public void traverse (Node rootNode) {
    // create iterator, initially it is limited to the ID level
    int idLevel =2;
    TIterator tIterator = new TIterator (rootNode,new DOMAdapter(),idLevel);
    Node node = null;
    while( ( node=(Node)tIterator.next() )!=null ){
      // print node information
      System.out.println( node.getNodeName()+"="+node.getNodeValue());
      if  ( tIterator.getDepth() == idLevel && (node instanceof Element &&
        ((Element)node).getAttribute( "id").equalsIgnoreCase("K.TAMURA" ))) {
        // this allows iterator to traverse children of the current  node
        tIterator.setLevelDepth( TIterator.DEPTH_UNLIMITED);
      } 
    } 
  } 
} 

Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.