Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

STL & Generic Programming: STL Container Iterators


April 2001/STL & Generic Programming


The std::map Example Explained

Before I get to the main topic of this month’s column, I owe you the solution to the little homework assignment I gave you. In my last column, I showed you how std::map could be used to easily and efficiently solve a time-honored problem: how to keep track of the number of times that each word occurs in a text. As an exercise, I encouraged you to use your STL book or documentation to fully understand the code, which is shown again in Listing 1. As promised, I will now explain the code in detail.

What needs explanation here is of course the one line that makes up the body of the while-loop. Before we analyze the code, a word about style is in order. I consider cramming several nested function calls into a single line like this to be poor programming style, typical of the it-was-hard-for-me-to-write-why-should-it-be-easy-for-others-to-understand school of software engineering. Another great example of an attitude that is deemed cool only by those who sport it, while everybody else thinks it’s pretty dumb. A better idea is to split up a line like this one into several lines, each of which defines a variable that is then used as a function argument in a subsequent line. Listing 2 shows what becomes of my one-liner when it is completely unraveled.

Without going into any details yet, if you compare Listing 2 to my original one-liner in Listing 1, you will probably agree that these two versions represent two extremes, neither of which is entirely satisfactory. Due to the abundance of class templates in the STL, defining intermediate variables often obfuscates the code more than it clarifies it. As we will see in just a moment, the STL actually encourages you to lean at least a little towards the one-line style. For a very interesting, almost funny pitfall in this context, I encourage you to read Herb Sutter’s “Guru of the Week #75” [1].

Let us now analyze Listing 2 step by step. In the first line, an std::pair is declared. std::pair is a simple struct that contains two public data members, first and second, whose types are given by template parameters. There are some extras that come with std::pair that you should look up in your documentation, but the essence of it is just this:

template<typename _T1, typename _T2>
struct pair
{
  pair(const _T1& f, const _T2& s) :
    first(f), second(s) {}
  _T1 first;
  _T2 second;
};

Notice how the STL violates one of the rules of orthodox OO (object-oriented) programming here by providing public data members. I recommend that you read Jim Hyslop’s and Herb Sutter’s thoughts on this interesting subject [2]. Also notice that I’m using a constructor here to generate the pair, whereas the original one-liner in Listing 1 used the function template std::make_pair. std::make_pair is a simple wrapper around the constructor:

template<typename _T1, typename _T2>
inline pair<_T1, _T2>
make_pair(const _T1& f, const _T2& s)
{return pair<_T1, _T2>(f, s); }

The point of having this function template is that it allows you to generate a temporary object of type std::pair without ever having to write the type of the object. If it weren’t for the function std::make_pair, the call

std::make_pair(strWord, 0)

in the one-liner in Listing 1 would have to be replaced by the constructor call

std::pair<std::string, int>(strWord, 0)

Using std::make_pair is more elegant because you let the compiler deduce the template arguments std::string and int. When using the STL, you will encounter many more of these “constructor wrappers.” Examples are std::bind1st, std::mem_fun, and std::ptr_fun, which I will discuss in future columns. All of these are meant to be used in the same way as I used std::make_pair, that is, to generate a temporary object that is directly supplied as a function argument, thus avoiding the definition of an intermediate named object. In the presence of complex template arguments, this technique can really improve code readability. So, as you can see, a certain dose of cram-it-all-into-one-line can be a good thing when working with the STL. Note also how the STL violates another supposedly ironclad rule of orthodox OO programming, by using non-member functions. I recommend Scott Meyers’ article [3] for an interesting angle on this subject.

At this point in our analysis of Listing 2, we have a pair consisting of the current word and the integer zero. In the second line, this pair is passed as a key-value pair to the insert method of our word counting map. This method works as follows: if a key-value pair with the same key (our current word in this case) already exists in the map, nothing happens. No new key-value pair is inserted, and the value of the existing pair is not modified. If no key-value pair with the same key exists in the map, the pair that was passed to insert is inserted into the map.

The return value of std::map::insert is a pair. The second element of that pair is a Boolean that is true if an actual insertion took place, and false otherwise. The first element of the pair is an iterator that points to the key-value pair in the map that has the given key, regardless of whether that pair has just been inserted or whether it was already there.

In our application, we don’t care if the current word already had an entry in the map or not. In either case, all that remains to be done is to increment the value (the word count) for the word’s entry. Therefore, we ignore the second component of insert’s return value. The line marked // 3 in Listing 2 extracts the first component of that return value and assigns it to a suitable iterator variable. The result of dereferencing an std::map::iterator is a reference to the key-value pair that the iterator points to. Therefore, the last line of Listing 2 increments the value of the map entry for our current word. If we wanted to be ridiculously verbose, we could have replaced this line with

std::pair<const std::string, int>&
  pairEntryInMap = *itEntryInMap;
++pairEntryInMap.second;

Note that the exact result type of dereferencing the iterator is a reference to

std::pair<const std::string, int>

and not to

std::pair<std::string, int>

although our original declaration of the map was

std::map<std::string, int>

Whenever you access a key-value pair in an std::map, you get a pair whose first component is const, even if you did not explicitly make the key type constant in your map declaration. This is necessary because the map is internally ordered by keys. If it were possible for you to modify keys, you would be corrupting the map, rendering it inconsistent and useless.

I want you to realize how important it is to understand precisely how STL methods, such as std::map::insert, work. Somebody who has a superficial understanding of std::map might well write map insertion as a two-step process: in the first step, he would use std::map’s find method (or, for some really bad code, the algorithm std::find) to check if the map already contained an entry with the key in question. If the answer was no, the insertion would be made in a second step. This is bad because it uses two independent map lookups where one would suffice. std::map::insert was written specifically to avoid this inefficiency.

In conclusion, I urge you to compare the ease and efficiency with which we just implemented the word counting mechanism to the effort that it takes to write something similar from scratch.

STL Container Iterators

As for the main topic of this column, STL container iterators, let me begin by saying that I will make iterators the topic of more columns in the future. The reason is that, in my experience, this is where the extensibility of the STL really starts to pay off. Writing your own iterators and using them creatively, especially in conjunction with STL algorithms, can work wonders for your code in terms of elegance, maintainability, efficiency, and productivity. But first things first. Your first exposure to iterators will undoubtedly come from using the iterators that the STL containers expose. This is where you build your understanding and your intuition of the iterator concept. As usual, reading my treatment of the subject is not a substitute for studying your STL documentation. My goal is to introduce the subject and to emphasize some points that are often overlooked in the heat of the coding battle.

The interfaces of the iterators that the STL containers expose emphasize the analogy to C-style pointers. All of these iterator classes define the following operators:

operator++ // prefix and postfix
operator-- // prefix and postfix
operator*
operator->

Moreover, these operators have the effect that you would expect if you viewed iterators as generalized pointers. No big surprises here. The operators

operator+
operator-

are used to advance or decrement an iterator by an arbitrary number of steps. These operators are provided by vector iterators and deque iterators, but not by list, map, or set iterators. Of course these operators could be implemented in terms of operator++ and operator-- for all container iterators. The point is that only vector and deque iterators can be advanced and decremented in amortized constant time, like actual C-style pointers. For all other STL containers, the cost of advancing or decrementing an iterator is proportional to the number of steps moved. The STL reminds you of this fact by not providing operator+ and operator- in these cases.

There is a non-member function template std::advance that operates on any iterator and has the same effect as operator+. The std::advance template determines intelligently (typically at compile time) what kind of iterator it has been given. It then uses the most efficient way to advance the given iterator. When using std::advance, bear in mind that the complexity of the operation you are performing depends heavily on the type of iterator that you have supplied. All this goes to show that the analogy between C-style pointers and STL container iterators, helpful as it is, will break down eventually, especially when it comes to efficiency considerations.

Const and Non-const Iterators

One of the hallmarks of good C++ programming style is const correctness. Objects that are not meant to be modified should be declared const, and member functions that do not modify the state of the object should be made const member functions. The STL container iterators support and encourage this principle, and again, they do so in analogy to C-style pointers. When you work with a C-style pointer and you know that the objects pointed to are not to be modified, then you declare your pointer as a pointer to const, as in

const int* ptr;

or, equivalently,

int const * ptr;

All STL containers expose two types of iterators, called iterator and const_iterator. As a general rule, plain iterators allow you to modify the container element that they currently point to, whereas const_iterators do not. The only exception to this rule is that the keys of a map or set can never be modified, because that would corrupt the ordering. Technically, iterators in general return a reference to the object they point to, whereas const_iterators return a const reference to the object they point to. Here’s a simple example:

std::vector<int> vectOfInts(42);

std::vector<int>::iterator it = vectOfInts.begin();
*it = 0; // OK

std::vector<int>::const_iterator cit = vectOfInts.begin();

// ERROR, attempt to modify *cit
// through a const_iterator
*cit = 1; 

// fine, no attempt to modify *cit
int i = *cit;

As you can see, an iterator behaves just like a pointer to non-const, whereas a const_iterator behaves like a pointer to const. In fact, it is quite likely that in your STL implementation, the iterators of the example above are typedefed as int* and const int*, respectively. It is important to understand the difference between a const_iterator and an iterator that is declared as const. The former corresponds to a pointer to const, whereas the latter corresponds to a const pointer:

std::vector<int> vectOfInts(42);

const std::vector<int>::iterator
   it = vectOfInts.begin();
*it = 0; // OK, *it is modifiable

// ERROR, calling non-const operator
// on const object.
++it; 

// ERROR, attempt to modify a const object
it = vectOfInts.end();

Const iterators like the one in this example are not a very common occurrence when working with STL containers and iterators. After all, an iterator’s job is to iterate, so it’s hardly any good if it can’t move. By contrast, const_iterators can and should be used quite frequently. A typical example is the for-loop at the end of Listing 1. Here, we iterate through a container for the sake of printing its elements to the console. Common const-correctness requires that we use const_iterators instead of iterators.

As far as conversions between const and non-const are concerned, iterators again behave like their cousins, the pointers. An iterator converts to a const_iterator, but not vice versa. Unfortunately, an early STL implementation that came with Microsoft Visual C++ had conversions both ways, but that has been fixed. Again, Listing 1 has an example for this conversion: mapWordCount.begin and mapWordCount.end return std::map::iterators, which get converted to std::map::const_iterators as they are being assigned to itCurrent and itEnd, respectively.

Now we know what the story is on constness and iterators. But what does it mean for a container to be const? First of all, you cannot of course call any non-const member functions on a const container:

const std::vector<int> constVect;

// ERROR, cannot modify a const object
constVect.resize(100);

The only way to create a const container object with anything in it is to put that something in it when the container is initialized:

std::vector<int> nonconstVect;
...
// put stuff in nonconstVect
...
const std::vector<int> constVect = nonconstVect;

The STL goes one step further and ensures that const container objects actually obey deep constness; that is, not only is it impossible to modify a const container object, it is equally impossible to modify any of the objects that it contains.

const std::vector<int> constVect(42);

// ERROR, constVect[0] is not modifiable
constVect[0] = 1;

In order to fully enforce this deep constness of containers, the STL allows you to obtain only const_iterators, but not iterators, from a const container object:

const std::vector<int> constVect(42);
// ERROR, cannot obtain non-const iterator
// for const container
std::vector<int>::iterator it = constVect.begin();
std::vector<int>::const_iterator
  cit = constVect.begin(); // OK

Technically, this is achieved by overloading the begin and end methods of the containers. Each of these has a non-const version, which returns an iterator, and a const version, which returns a const_iterator.

Iterator Validity

Designers of container and iterator classes must face an important question: what should happen to an iterator when elements are added or removed from the container it points into? For STL vectors and deques, the short answer is: when a vector or deque is modified in this manner, then all iterators into the vector or deque become invalid. The full truth is a little more subtle than that (for example, see the remarks on page 436 of Matt Austern’s book [4]). However, to avoid hard-to-find bugs, you’re best off if you just assume that after adding or removing elements from a vector or deque, previously used memory addresses are no longer good because of shifts and possible reallocations, and therefore all iterators are invalid. Any reasoning more subtle than that should be used only if there is a compelling need for it.

With lists, maps, and sets, the situation is considerably better. When elements are added to these containers, all iterators that point to existing elements remain valid. When an element is removed, only iterators that point to that element are invalidated. The technical reason why this guarantee can be made is that insertions and deletions never cause reallocations of existing elements, but only bending of pointers.

Listing 3 shows two loops that iterate through a container and erase all elements that satisfy some condition. The first loop works on an std::deque, the second one on an std::list. In the list example, we may fetch the end iterator once, assign it to a variable, and then use that variable in the while-condition. In the deque example, that would be a terrible mistake, because the end iterator is probably going to change as elements are being erased from the deque. In both examples, the iterator to the erased element becomes invalid. For all STL containers, the member function erase returns an iterator to the position one past the erased element. Therefore, this return value is used to advance to the next position in the loop.

References

[1] http://www.peerdirect.com/resources/gotw075.html

[2] Jim Hyslop and Herb Sutter. “Conversations: Access Restrictions,” C/C++ Users Journal C++ Experts Forum, February 2001, http://www.cuj.com/experts/1902/hyslop.html.

[3] Scott Meyers. “How Non-Member Functions Improve Encapsulation,” C/C++ Users Journal, February 2000.

[4] Matthew H. Austern. Generic Programming and the STL (Addison Wesley, 1998).

Thomas Becker works as a senior software engineer for Zephyr Associates, Inc. in Zephyr Cove, NV. He can be reached at [email protected].


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.