Several algorithms in the standard library use a unary predicate to accomplish their tasks. Examples are the _if algorithms like count_if, find_if, remove_if, or replace_if, but also algorithms like partition. In this installment of the column, we take a closer look at unary predicates and what they may and must not do.
Let us see how the Standard defines a unary predicate. It is simply called predicate in the Standard [1].
UNARY PREDICATE.
The Predicate parameter is used whenever an algorithm expects a function object that when applied to the result of dereferencing the corresponding iterator returns a value testable as true. In other words, if an algorithm takes Predicate pred as its argument and first as its iterator argument, it should work correctly in the construct (pred(*first)){...}.
The function object pred shall not apply any non-constant function through the dereferenced iterator.
This function object may be a pointer to function, or an object of a type with an appropriate function call operator.
From this specification and the examination of the algorithms that use unary predicates (which we will look into later in this article), we can identify a number of properties typical of a unary predicate. We will discuss each of the properties in detail in this article. A unary predicate's properties are:
- A unary predicate must be callable.
- A unary predicate must accept one argument and return a value that is convertible to Boolean.
- A unary predicate need not be copyable.
- A unary predicate must not modify its argument.
- A unary predicate must not invalidate iterators or sequences that the algorithm works with.
- A unary predicate can have any side effect other than the ones described in 4 and 5.
- A unary predicate must be order-insensitive, which means that the effect of calling the predicate must be independent of the order in which sequence elements are supplied to it.
- A unary predicate need not yield the same result for different invocations with the same argument.
Basic Properties
Side-Effect Properties
Other Properties
Let us see why it makes sense that a predicate has exactly these properties.
Basic Properties 1, 2, and 3
A unary predicate must be callable, but need not necessarily be copyable, and must accept one argument and return a Boolean value.
These properties are more or less evident when we consider how the standard defines use of a unary predicate by an algorithm, namely that the predicate "should work correctly in the construct if (pred(*first)){...}". Here is a typical implementation of an algorithm that demonstrates the intended use of a unary predicate:
template <class InputIterator, class Predicate> typename iterator_traits<InputIterator>::difference_type count_if(InputIterator first, InputIterator last, Predicate pred) { typename iterator_traits<InputIterator>::difference_type n = 0; for ( ; first != last; ++first) if ( <B>pred(*first)</B> ) ++n; return n; }
In other words, a unary predicate is called like a function. The requirement of "callable" is met by a pointer to a function, but also by an object (or a reference to an object) of a type that has the function call operator defined (a so-called functor). One argument is supplied when the predicate is called. The argument is the result of dereferencing an iterator, that is, a reference to a sequence element. The return value is used as a conditional expression and must be convertible to type bool. And this fully describes the purpose of a unary predicate: it is called to produce a Boolean result for elements in a sequence.
In particular, there is no requirement regarding the copy semantics of a unary predicate. It need not even be copyable at all. As a general rule, an algorithm must not rely on any properties that are not required of the objects it uses. This includes that an algorithm must not copy the predicate, because a user is not required to provide any reasonable copy semantics for his predicates. It's just fine if we declare copy constructor and copy assignment as private members of our predicate type and pass around our predicate objects by reference, if that makes sense to us. It should not break any algorithm. In practice, you will find libraries that assume copyability of predicates, although they shouldn't. One of the surprising effects of such standard library implementations have been discussed in an article in C++ Report [2]. Meanwhile some library implementations have eliminated this restriction and behave as expected; for instance, see Metrowerks CodeWarrior 6.0. Taking into account the different library implementations, we would say that unary predicates with "interesting" or no copy semantics are best avoided.
In practice, most predicates have normal copy semantics. This is because usually we invoke an algorithm in a way that the predicate is passed by value. To do this, the predicate must be copyable. Non-copyable predicates can be useful, but they are unusual, because they must be passed by reference, which takes extra care and requires interesting-looking template syntax. We discussed some of the related issues like pass-by-reference of function objects in a previous article, how we can do it, and why we might want to do it [3]. We'll spare you any further discussion of the matter right here. Instead let's move on to the remaining predicate properties, which address the issue of side effects produced by a unary predicate.
Side-Effect Properties 4, 5, and 6
A unary predicate can have any side effects except modification of its argument and invalidation of iterators or sequences that the algorithm works with.
The Standard prohibits certain side effects, but allows others. Why? To understand this requirement, consider what happens inside an algorithm that uses a unary predicate. There are two side-effect-producing entities: the algorithm itself and the unary predicate. The algorithm iterates over the sequence of input elements, inspects the elements, supplies them as an argument to the predicate, modifies and copies them, and might produce other side effects. The unary predicate receives a reference to an element in the input sequence and might also inspect and modify the element and produce other side effects. Naturally, these activities might collide. For this reason, let us take a look at the side effects produced by a unary predicate and classify them regarding their potential for conflict into harmful, potentially harmful, and harmless side effects.
Harmful Side Effects
Harmful side effects invalidate any iterator or sequence that the algorithm operates on. (No function object should ever produce a harmful side effect. This is a general rule that applies to all function objects, not just to predicates. The Standard does not even explicitly prohibit these side-effects, probably because they are considered "common sense.")
An example of a harmful predicate would be a predicate that internally keeps a pointer or reference to the sequence that the algorithm operates on and uses the reference for erasing sequence elements. The removal of elements might invalidate iterators that were supplied to the algorithm in order to describe the input or output sequence, and under these circumstance, the algorithm is likely to produce a program crash.
Removal of sequence elements is a fairly obvious source of problems, but at times the invalidation of the sequence is less obvious. If for instance the algorithms relies on the sorting order of the sequence and the predicate intentionally or inadvertently modifies the sorting order when it is invoked, then this would lead to unpredictable results.
In any case, predicates with harmful side effects must be avoided under all circumstances. As a rule, never use function objects that invalidate any iterator or sequence that the algorithm operates on.
Potentially Harmful Side Effects
This category of side effects is explicitly prohibited by the Standard. All unary predicates that apply non-constant functionality to their arguments fall into this category, because they modify sequence elements. Let us refer to such predicates as mutating unary predicates. (Note the difference between "modifying the sequence" (harmful) and "modifying the sequence elements" (only potentially harmful): "modifying the sequence" means that elements are inserted or removed or shuffled around so that certain iterators or iterator ranges become invalid. "Modifying the sequence elements" means that sequence elements are accessed and their content is changed, but it does not invalidate any iterator.)
The potential harm of a mutating unary predicate stems from the fact that the predicate is not the only one that can access elements from the input sequence and change them. The algorithm itself might attempt to modify the same sequence of elements. In such a case, there are two side-effect-producing entities (the algorithm and the predicate), and their activities might collide.
When does such a collision happen? Not all algorithms modify elements from the input sequence, but some of them indeed do. Algorithms fall into several categories: non-modifying and mutating algorithms, and the mutating algorithms can be distinguished as in-place and copy algorithms. The non-modifying algorithms (e.g., count_if) only inspect sequence elements; they don't change anything. The mutating copy algorithms (e.g., replace_copy_if) do not modify elements in the input sequence, but copy them to the output sequence; they modify elements in the output sequence. The mutating in-place algorithms (e.g., replace_if) modify elements "in place," which means that they modify elements in the input sequence; they are the critical ones. Hence, the potential conflict between predicate and algorithm occurs for mutating in-place algorithms in conjunction with a mutating unary predicate.
Application of two modifications to the same sequence element raises a couple of questions such as: Which modification is performed first and potentially overwritten by the second? Is the result predictable at all? In order to avoid any such conflict between the algorithm and the predicate, the Standard requires that a unary predicate must not modify the elements in the input sequence that it receives as arguments. Note that the mutating side effect is disallowed for all unary predicates, not just for those that are supplied to mutating in-place algorithms.
This restriction is often reflected in the predicate's function signature: typically a unary predicate takes its argument either by value or by constant reference, in order to make sure that its argument (the referred-to input sequence element) cannot be modified.
Harmless Side Effects
Last, but not least, a predicate may have harmless side effects. All non-mutating access to sequence elements falls into this category. A predicate may apply constant functionality to its arguments; that is, it may inspect the sequence elements, but it must not modify them. In addition, a unary predicate may modify objects other than its argument. It may for instance have data members and change their values. Or it may refer to unrelated element sequences and change them. This is harmless as long as the sequences that are changed by the predicate are not simultaneously used by the algorithm.
Why Do We Care?
We've seen the different types of side effects that a predicate can produce and that many of the side effects are prohibited by the Standard. Why would we want to use predicates with side effects under these circumstances? Is a unary predicate with side effects a rare or a fairly common case?
Well, it depends. If you look at typical examples of predicates that are given in C++ text books, then you'll find unary predicates such as isEven defined as bool isEven(int elem) { return elem %2 == 0; } or expressions such as bind2nd(greater<int>(),4), which is a unary predicate created from a predefined binary predicate by means of a so-called binder. Even if you study unary predicates that are implemented as function object types (i.e., they are classes with an overloaded function call operator), they rarely have data members or do anything complicated, and they never have side effects.
In practice, this is slightly different. For instance, we care about efficiency. Obviously, it is faster to perform several things in one pass over the sequence instead of stepping through a lengthy sequence repeatedly. Consider a couple of examples.
Say, we have a container that represents our customer base. We need to determine the number of all frequent buyers for our internal statistics, and, as we go, we want to build up a mailing list because we intend to send promotional mail to the frequent buyers. However, the mailing list should not exceed a limit of 5,000 addressees. That's the task. How can we achieve this? One conceivable solution is a unary predicate that yields true for a frequent buyer and accumulates information for the mailing list as it goes. When supplied to the count_if algorithm, it would produce the desired count and would build up the mailing list as a side effect. Such a unary predicate strictly conforms to the rules. It accepts a constant reference to a client, inspects the client, and produces a harmless side effect, namely the mailing list.
Let us consider another, similar example. We need to remove all infrequent buyers from our customer base and update the records of the remaining frequent buyers regarding applicable discounts. Again, we try to be efficient and want to do both in one pass over the customer base. In comparison to the previous approach, we could try providing a unary predicate to remove_if that yields true for infrequent buyers (so that the algorithm removes them) and adds information about the discounts to the remaining buyers. In contrast to the earlier example, this is illegal, because adding information to elements in the input sequence is a prohibited side effect. Remember: The predicate must not modify its argument. But that's exactly what we intend to do: we want to update the remaining elements in the sequence. So, what do we do?
There is not much that we can do. A thorough study of the algorithms section of the Standard reveals that the for_each algorithm is the only algorithm that accepts a function object and allows the function object to modify its argument. (We discussed for_each in a previous article [4].) For this reason, we are very restricted in our choice of algorithms for a given task if that task includes modification of elements in the input sequence; for_each is the basically the only choice.
The consequence is that we must split the task into the non-modifying activities (implemented as a unary predicate for remove_if) and the mutating activities (implemented as a functor for for_each). This way an additional pass over the entire sequence cannot be avoided, including the inevitable loss in efficiency.
The alternatives are:
- a function object for use with for_each that removes and modifies elements (thus repeating the functionality of remove_if in our predicate, which is certainly not desirable)
- a user-defined version of the remove_if algorithm that allows modification of elements in the input sequence (which is doable; even a copy of the standard algorithm might do depending on how it is implemented)
- a manually coded algorithm (ignoring the standard algorithms altogether)
The bottom line is that unary predicates cannot be used if elements from the input sequence must be modified.
If such a modification is desired (e.g., for efficiency reasons), then we must split the task into modifying and non-modifying aspects and must accept multiple passes over the input sequence if we want to use the standard algorithms.
Property 7
A unary predicate must be order-insensitive; that is, its effects must be independent of the order of invocation.
In conjunction with side effects of unary predicates, two other aspects are relevant: order and number of invocation of the predicate. If a side effect is produced each time the predicate is applied to an element of the input sequence, then we would want to know how often and in which order the side effects are produced. For instance, in our example, we increment a count in order to determine the maximum size for the generated mailing list. Naturally, it makes a difference if the predicate is applied exactly once per element or maybe repeatedly. Depending on the nature of the side effect, the order and number of invocations plays a role.
The number of invocations is exactly specified: algorithms like count_if or remove_if apply the unary predicate exactly once per element in the input sequence. Order of invocation is different: none of the algorithms that take a unary predicate specifies the order in which it supplies sequence elements to the predicate. As a consequence, unary predicates must be independent of the order of invocation. If we use a predicate that depends on the order of invocation, the result is unpredictable.
Here is an example: a (order-sensitive) predicate that yields true for every nth element in the sequence:
class Nth { public: Nth(int n) : theN(n), theCnt(0) {} bool operator()(int) { return (++theCnt)%theN; } private: const int theN; int theCnt; };
If we supply a predicate such as Nth(3) to an algorithm like remove_copy_if, then one might expect that it would move every third element from the input sequence to the output sequence. But this is not guaranteed, because the sequence elements are not necessarily supplied in any definite order. All that we can reasonably rely on is that a third of all elements will be moved from the input sequence to the output sequence.
Why doesn't the Standard give a guarantee regarding the order of invocation of the unary predicates? This is because certain algorithms can be optimized for certain types of iterators. For instance, an algorithm might step through the sequence from beginning to end if the iterator is an input iterator, but could do random jumps for a random access iterator. Since the Standard does not want to restrict the potential for such optimizations, it does not give any guarantees for the order of invocation of the unary predicates.
The consequence for users of the STL is that all unary predicates must not depend on the order in which sequence elements are supplied to it. If we want to use order-sensitive predicates, then we must implement our user-defined algorithms that give us the guarantee we need regarding the order of invocation of the unary predicate.
Related to the order and number of invocations of the unary predicate, there is the following last observation.
Property 8
A unary predicate need not yield the same result for different invocations with the same argument.
This property might sound a little far-fetched. We include this into the list of properties because we've observed that occasionally it is assumed that there is an implied requirement that predicates exhibit "stable" behavior, in the sense that they yield the same result each time they are invoked with the same or an "equal" argument. This assumption is not justified; the Standard does not define any such requirement.
Why then is it occasionally assumed that "stable" behavior is required of a unary predicate? Because it would give the implementations a considerable amount of latitude. With "stable" behavior, it would not matter how often the predicate is called for the same sequence element, since the result would always be the same. It would also be irrelevant whether a reference to a sequence element is supplied to the algorithm or a temporary copy of the element (namely an "equal" element).
However, for unary predicates in the STL, there is no requirement of "stable" behavior. It perfectly makes sense to define an "instable" predicate. An example would be a predicate that yields true for all elements with a certain property until a limit is reached. It could be used to copy a maximum number of elements with a given property from an input sequence to an output sequence using remove_copy_if.
Summary
The following algorithms in the standard library use a unary predicate: replace_if, remove_if, partition, stable_partition, replace_copy_if, remove_copy_if, count_if, and find_if.
If a unary predicate has the properties listed below, then it can be provided to any of the algorithms listed above, and the result will be portable and predictable.
- A unary predicate must be callable and must accept one argument and return a Boolean value, but need not have any particular copy semantics. (Some library implementations have restrictions in this area because they require certain copy semantics.)
- A unary predicate must not modify its argument and must not invalidate iterators or sequences that the algorithm works with, but may have any other side effects.
- A unary predicate must be independent of the order of invocation and may yield different results for different invocations with the same argument.
References
[1] International Standard. Programming languages C++ ISO/IEC IS 14882:1998(E).
[2] Nicolai M. Josuttis. "Predicates vs. Function Objects," C++ Report, June 2000.
[3] Klaus Kreft and Angelika Langer. "Effective Standard C++ Library: Explicit Function Template Argument Specification and the STL," C/C++ Users Journal, December 2000, http://www.cuj.com/experts/1812/langer.html.
[4] Klaus Kreft and Angelika Langer. "Effective Standard C++ Library: for_each vs. transform," C/C++ Users Journal, February 2001. http://www.cuj.com/experts/1902/langer.html
Klaus Kreft is senior consultant at Siemens Business Services in Germany. He can be reached at [email protected].
Angelika Langer works as an independent instructor and mentor. She can be reached at [email protected].
They are authors of the book Standard C++ IOStreams and Locales.