The std::map Example Explained
Before I get to the main topic of this months column, I owe you the solution to the little homework assignment I gave you. In my last column, I showed you how std::map could be used to easily and efficiently solve a time-honored problem: how to keep track of the number of times that each word occurs in a text. As an exercise, I encouraged you to use your STL book or documentation to fully understand the code, which is shown again in Listing 1. As promised, I will now explain the code in detail.
What needs explanation here is of course the one line that makes up the body of the while-loop. Before we analyze the code, a word about style is in order. I consider cramming several nested function calls into a single line like this to be poor programming style, typical of the it-was-hard-for-me-to-write-why-should-it-be-easy-for-others-to-understand school of software engineering. Another great example of an attitude that is deemed cool only by those who sport it, while everybody else thinks its pretty dumb. A better idea is to split up a line like this one into several lines, each of which defines a variable that is then used as a function argument in a subsequent line. Listing 2 shows what becomes of my one-liner when it is completely unraveled.
Without going into any details yet, if you compare Listing 2 to my original one-liner in Listing 1, you will probably agree that these two versions represent two extremes, neither of which is entirely satisfactory. Due to the abundance of class templates in the STL, defining intermediate variables often obfuscates the code more than it clarifies it. As we will see in just a moment, the STL actually encourages you to lean at least a little towards the one-line style. For a very interesting, almost funny pitfall in this context, I encourage you to read Herb Sutters Guru of the Week #75 [1].
Let us now analyze Listing 2 step by step. In the first line, an std::pair is declared. std::pair is a simple struct that contains two public data members, first and second, whose types are given by template parameters. There are some extras that come with std::pair that you should look up in your documentation, but the essence of it is just this:
template<typename _T1, typename _T2> struct pair { pair(const _T1& f, const _T2& s) : first(f), second(s) {} _T1 first; _T2 second; };
Notice how the STL violates one of the rules of orthodox OO (object-oriented) programming here by providing public data members. I recommend that you read Jim Hyslops and Herb Sutters thoughts on this interesting subject [2]. Also notice that Im using a constructor here to generate the pair, whereas the original one-liner in Listing 1 used the function template std::make_pair. std::make_pair is a simple wrapper around the constructor:
template<typename _T1, typename _T2> inline pair<_T1, _T2> make_pair(const _T1& f, const _T2& s) {return pair<_T1, _T2>(f, s); }
The point of having this function template is that it allows you to generate a temporary object of type std::pair without ever having to write the type of the object. If it werent for the function std::make_pair, the call
std::make_pair(strWord, 0)
in the one-liner in Listing 1 would have to be replaced by the constructor call
std::pair<std::string, int>(strWord, 0)
Using std::make_pair is more elegant because you let the compiler deduce the template arguments std::string and int. When using the STL, you will encounter many more of these constructor wrappers. Examples are std::bind1st, std::mem_fun, and std::ptr_fun, which I will discuss in future columns. All of these are meant to be used in the same way as I used std::make_pair, that is, to generate a temporary object that is directly supplied as a function argument, thus avoiding the definition of an intermediate named object. In the presence of complex template arguments, this technique can really improve code readability. So, as you can see, a certain dose of cram-it-all-into-one-line can be a good thing when working with the STL. Note also how the STL violates another supposedly ironclad rule of orthodox OO programming, by using non-member functions. I recommend Scott Meyers article [3] for an interesting angle on this subject.
At this point in our analysis of Listing 2, we have a pair consisting of the current word and the integer zero. In the second line, this pair is passed as a key-value pair to the insert method of our word counting map. This method works as follows: if a key-value pair with the same key (our current word in this case) already exists in the map, nothing happens. No new key-value pair is inserted, and the value of the existing pair is not modified. If no key-value pair with the same key exists in the map, the pair that was passed to insert is inserted into the map.
The return value of std::map::insert is a pair. The second element of that pair is a Boolean that is true if an actual insertion took place, and false otherwise. The first element of the pair is an iterator that points to the key-value pair in the map that has the given key, regardless of whether that pair has just been inserted or whether it was already there.
In our application, we dont care if the current word already had an entry in the map or not. In either case, all that remains to be done is to increment the value (the word count) for the words entry. Therefore, we ignore the second component of inserts return value. The line marked // 3 in Listing 2 extracts the first component of that return value and assigns it to a suitable iterator variable. The result of dereferencing an std::map::iterator is a reference to the key-value pair that the iterator points to. Therefore, the last line of Listing 2 increments the value of the map entry for our current word. If we wanted to be ridiculously verbose, we could have replaced this line with
std::pair<const std::string, int>& pairEntryInMap = *itEntryInMap; ++pairEntryInMap.second;
Note that the exact result type of dereferencing the iterator is a reference to
std::pair<const std::string, int>
and not to
std::pair<std::string, int>
although our original declaration of the map was
std::map<std::string, int>
Whenever you access a key-value pair in an std::map, you get a pair whose first component is const, even if you did not explicitly make the key type constant in your map declaration. This is necessary because the map is internally ordered by keys. If it were possible for you to modify keys, you would be corrupting the map, rendering it inconsistent and useless.
I want you to realize how important it is to understand precisely how STL methods, such as std::map::insert, work. Somebody who has a superficial understanding of std::map might well write map insertion as a two-step process: in the first step, he would use std::maps find method (or, for some really bad code, the algorithm std::find) to check if the map already contained an entry with the key in question. If the answer was no, the insertion would be made in a second step. This is bad because it uses two independent map lookups where one would suffice. std::map::insert was written specifically to avoid this inefficiency.
In conclusion, I urge you to compare the ease and efficiency with which we just implemented the word counting mechanism to the effort that it takes to write something similar from scratch.
STL Container Iterators
As for the main topic of this column, STL container iterators, let me begin by saying that I will make iterators the topic of more columns in the future. The reason is that, in my experience, this is where the extensibility of the STL really starts to pay off. Writing your own iterators and using them creatively, especially in conjunction with STL algorithms, can work wonders for your code in terms of elegance, maintainability, efficiency, and productivity. But first things first. Your first exposure to iterators will undoubtedly come from using the iterators that the STL containers expose. This is where you build your understanding and your intuition of the iterator concept. As usual, reading my treatment of the subject is not a substitute for studying your STL documentation. My goal is to introduce the subject and to emphasize some points that are often overlooked in the heat of the coding battle.
The interfaces of the iterators that the STL containers expose emphasize the analogy to C-style pointers. All of these iterator classes define the following operators:
operator++ // prefix and postfix operator-- // prefix and postfix operator* operator->
Moreover, these operators have the effect that you would expect if you viewed iterators as generalized pointers. No big surprises here. The operators
operator+ operator-
are used to advance or decrement an iterator by an arbitrary number of steps. These operators are provided by vector iterators and deque iterators, but not by list, map, or set iterators. Of course these operators could be implemented in terms of operator++ and operator-- for all container iterators. The point is that only vector and deque iterators can be advanced and decremented in amortized constant time, like actual C-style pointers. For all other STL containers, the cost of advancing or decrementing an iterator is proportional to the number of steps moved. The STL reminds you of this fact by not providing operator+ and operator- in these cases.
There is a non-member function template std::advance that operates on any iterator and has the same effect as operator+. The std::advance template determines intelligently (typically at compile time) what kind of iterator it has been given. It then uses the most efficient way to advance the given iterator. When using std::advance, bear in mind that the complexity of the operation you are performing depends heavily on the type of iterator that you have supplied. All this goes to show that the analogy between C-style pointers and STL container iterators, helpful as it is, will break down eventually, especially when it comes to efficiency considerations.
Const and Non-const Iterators
One of the hallmarks of good C++ programming style is const correctness. Objects that are not meant to be modified should be declared const, and member functions that do not modify the state of the object should be made const member functions. The STL container iterators support and encourage this principle, and again, they do so in analogy to C-style pointers. When you work with a C-style pointer and you know that the objects pointed to are not to be modified, then you declare your pointer as a pointer to const, as in
const int* ptr;
or, equivalently,
int const * ptr;
All STL containers expose two types of iterators, called iterator and const_iterator. As a general rule, plain iterators allow you to modify the container element that they currently point to, whereas const_iterators do not. The only exception to this rule is that the keys of a map or set can never be modified, because that would corrupt the ordering. Technically, iterators in general return a reference to the object they point to, whereas const_iterators return a const reference to the object they point to. Heres a simple example:
std::vector<int> vectOfInts(42); std::vector<int>::iterator it = vectOfInts.begin(); *it = 0; // OK std::vector<int>::const_iterator cit = vectOfInts.begin(); // ERROR, attempt to modify *cit // through a const_iterator *cit = 1; // fine, no attempt to modify *cit int i = *cit;
As you can see, an iterator behaves just like a pointer to non-const, whereas a const_iterator behaves like a pointer to const. In fact, it is quite likely that in your STL implementation, the iterators of the example above are typedefed as int* and const int*, respectively. It is important to understand the difference between a const_iterator and an iterator that is declared as const. The former corresponds to a pointer to const, whereas the latter corresponds to a const pointer:
std::vector<int> vectOfInts(42); const std::vector<int>::iterator it = vectOfInts.begin(); *it = 0; // OK, *it is modifiable // ERROR, calling non-const operator // on const object. ++it; // ERROR, attempt to modify a const object it = vectOfInts.end();
Const iterators like the one in this example are not a very common occurrence when working with STL containers and iterators. After all, an iterators job is to iterate, so its hardly any good if it cant move. By contrast, const_iterators can and should be used quite frequently. A typical example is the for-loop at the end of Listing 1. Here, we iterate through a container for the sake of printing its elements to the console. Common const-correctness requires that we use const_iterators instead of iterators.
As far as conversions between const and non-const are concerned, iterators again behave like their cousins, the pointers. An iterator converts to a const_iterator, but not vice versa. Unfortunately, an early STL implementation that came with Microsoft Visual C++ had conversions both ways, but that has been fixed. Again, Listing 1 has an example for this conversion: mapWordCount.begin and mapWordCount.end return std::map::iterators, which get converted to std::map::const_iterators as they are being assigned to itCurrent and itEnd, respectively.
Now we know what the story is on constness and iterators. But what does it mean for a container to be const? First of all, you cannot of course call any non-const member functions on a const container:
const std::vector<int> constVect; // ERROR, cannot modify a const object constVect.resize(100);
The only way to create a const container object with anything in it is to put that something in it when the container is initialized:
std::vector<int> nonconstVect; ... // put stuff in nonconstVect ... const std::vector<int> constVect = nonconstVect;
The STL goes one step further and ensures that const container objects actually obey deep constness; that is, not only is it impossible to modify a const container object, it is equally impossible to modify any of the objects that it contains.
const std::vector<int> constVect(42); // ERROR, constVect[0] is not modifiable constVect[0] = 1;
In order to fully enforce this deep constness of containers, the STL allows you to obtain only const_iterators, but not iterators, from a const container object:
const std::vector<int> constVect(42); // ERROR, cannot obtain non-const iterator // for const container std::vector<int>::iterator it = constVect.begin(); std::vector<int>::const_iterator cit = constVect.begin(); // OK
Technically, this is achieved by overloading the begin and end methods of the containers. Each of these has a non-const version, which returns an iterator, and a const version, which returns a const_iterator.
Iterator Validity
Designers of container and iterator classes must face an important question: what should happen to an iterator when elements are added or removed from the container it points into? For STL vectors and deques, the short answer is: when a vector or deque is modified in this manner, then all iterators into the vector or deque become invalid. The full truth is a little more subtle than that (for example, see the remarks on page 436 of Matt Austerns book [4]). However, to avoid hard-to-find bugs, youre best off if you just assume that after adding or removing elements from a vector or deque, previously used memory addresses are no longer good because of shifts and possible reallocations, and therefore all iterators are invalid. Any reasoning more subtle than that should be used only if there is a compelling need for it.
With lists, maps, and sets, the situation is considerably better. When elements are added to these containers, all iterators that point to existing elements remain valid. When an element is removed, only iterators that point to that element are invalidated. The technical reason why this guarantee can be made is that insertions and deletions never cause reallocations of existing elements, but only bending of pointers.
Listing 3 shows two loops that iterate through a container and erase all elements that satisfy some condition. The first loop works on an std::deque, the second one on an std::list. In the list example, we may fetch the end iterator once, assign it to a variable, and then use that variable in the while-condition. In the deque example, that would be a terrible mistake, because the end iterator is probably going to change as elements are being erased from the deque. In both examples, the iterator to the erased element becomes invalid. For all STL containers, the member function erase returns an iterator to the position one past the erased element. Therefore, this return value is used to advance to the next position in the loop.
References
[1] http://www.peerdirect.com/resources/gotw075.html
[2] Jim Hyslop and Herb Sutter. Conversations: Access Restrictions, C/C++ Users Journal C++ Experts Forum, February 2001, http://www.cuj.com/experts/1902/hyslop.html.
[3] Scott Meyers. How Non-Member Functions Improve Encapsulation, C/C++ Users Journal, February 2000.
[4] Matthew H. Austern. Generic Programming and the STL (Addison Wesley, 1998).
Thomas Becker works as a senior software engineer for Zephyr Associates, Inc. in Zephyr Cove, NV. He can be reached at [email protected].