If all these conditions are met, then a type can be copied using memcpy
rather than using a compiler-generated assignment operator. The type-traits library provides a class has_trivial_assign
, such that has_trivial_assign<T>::value
is true
only if T
has a trivial assignment operator. This class "just works" for scalar types, but has to be explicitly specialized for class/struct types that also happen to have a trivial assignment operator. In other words, if has_trivial_assign
gives the wrong answer, it will give the safe wrong answer -- that trivial assignment is not allowable. The code for an optimized version of copy that uses memcpy
where appropriate is given in Listing Four. The code begins by defining a template class copier
that takes a single Boolean template parameter, and has a static template member function do_copy
, which performs the generic version of copy (in other words the "slow but safe version"). Following that there is a specialization for copier<true>
. Again, this defines a static template member function do_copy
, but this version uses memcpy
to perform an optimized copy.
Listing Four
namespace detail{ template <bool b> struct copier { template<typename I1, typename I2> static I2 do_copy(I1 first, I1 last, I2 out); }; template <bool b> template<typename I1, typename I2> I2 copier<b>::do_copy(I1 first, I1 last, I2 out) { while(first != last) { *out = *first; ++out; ++first; } return out; } template <> struct copier<true> { template<typename I1, typename I2> static I2* do_copy(I1* first, I1* last, I2* out) { memcpy(out, first, (last-first)*sizeof(I2)); return out+(last-first); } }; } template<typename I1, typename I2> inline I2 copy(I1 first, I1 last, I2 out) { typedef typename boost::remove_cv< typename std::iterator_traits<I1> ::value_type>::type v1_t; typedef typename boost::remove_cv< typename std::iterator_traits<I2> ::value_type>::type v2_t; enum{ can_opt = boost::is_same<v1_t, v2_t>::value && boost::is_pointer<I1>::value && boost::is_pointer<I2>::value && boost:: has_trivial_assign<v1_t>::value }; return detail::copier<can_opt>:: do_copy(first, last, out); }
To complete the implementation, what we need now is a version of copy
that calls copier<true>::do_copy
if it is safe to use memcpy
, and otherwise calls copier<false>::do_copy
to do a generic copy. This is what the version in Listing Four does. To understand how the code works, look at the code for copy
and consider first the two typedef
s v1_t
and v2_t
. These use std::iterator_traits<Iter1>::value_type
to determine what type the two iterators point to, and then feed the result into another type-traits class remove_cv
that removes the top-level const
- or volatile
qualifiers: This will let copy
compare the two types without regard to const
- or volatile
-qualifiers. Next, copy
declares an enumerated value can_opt
that will become the template parameter to copier -- declaring this here as a constant is really just a convenience -- the value could be passed directly to class copier. The value of can_opt
is computed by verifying that all of the following are true:
- First, that the two iterators point to the same type by using a type-traits class
is_same
. - Then, that both iterators are real pointers, using the class
is_pointer
previously described. - Finally, that the pointed-to types have a trivial assignment operator using
has_trivial_assign
.
Finally we can use the value of can_opt
as the template argument to copier. This version of copy
will now adapt to whatever parameters are passed to it, if it's possible to use memcpy
, then it will do so; otherwise it will use a generic copy
.
Was It Worth It?
According to Donald Knuth, "premature optimization is the root of all evil" (ACM Computing Surveys, December, 1974). So the question must be asked: Was our optimization premature? To put this in perspective, Table 1 lists the timings for our version of copy
compared to a conventional generic copy.
(The test code is available as part of the boost utility library; see algo_opt_ examples.cpp. The code was compiled with gcc 2.95 with all optimizations turned on.
Table 1: Time taken to copy 1000 elements using copy<const T*, T*> (times in microseconds).
Clearly, the optimization makes a difference in this case; but, to be fair, the timings are loaded to exclude cache miss effects. Without this, accurate comparison between algorithms becomes difficult. However, perhaps we can add a couple of caveats to the premature optimization rule:
- If you use the right algorithm for the job in the first place then optimization will not be required; in some cases,
memcpy
is the right algorithm. - If a component is going to be reused in many places by many people, then optimizations may well be worthwhile where they would not be for a single case. In other words, the likelihood that the optimization will be absolutely necessary somewhere, sometime is much higher. Just as importantly, the perceived value of the stock implementation will be higher: There is no point standardizing an algorithm if users reject it on the grounds that there are better, more heavily optimized versions available.
Pair of References
The optimized copy
example shows how type traits can be used to perform optimization decisions at compile time. Another important use of type traits is to allow code to compile that otherwise would not do so unless excessive partial specialization were used. This is possible by delegating partial specialization to the type-traits classes. Our example for this form of usage is a pair that can hold references. (John Maddock and Howard Hinnant have submitted a compressed_pair library to Boost, which uses a technique similar to the one described here to hold references. Their pair also uses type traits to determine if any of the types are empty, and will derive to conserve space instead of contain; hence the name "compressed.") First, we'll examine the definition of std::pair
, omitting the comparison operators, default constructor, and template copy constructor for simplicity; see Listing Five.
Listing Five
template <typename T1, typename T2> struct pair { typedef T1 first_type; typedef T2 second_type; T1 first; T2 second; pair(const T1 & nfirst, const T2 & nsecond) :first(nfirst), second(nsecond) { } };
This pair cannot hold references as it currently stands, because the constructor would require taking a reference to a reference, which is currently illegal. (This is actually an issue with the C++ Core Language Working Group, issue #106, submitted by Bjarne Stroustrup. The tentative resolution is to allow a "reference to a reference to T" to mean the same thing as a "reference to T," but only in template instantiation, in a method similar to multiple cv-qualifiers.) Consider in Table 2 what the constructor's parameters would have to be to allow pair
to hold nonreference types, references, and constant references.
Table 2: Allowing pair to hold nonreference types, references, and constant references.
A little familiarity with the type-traits classes lets you construct a single mapping that allows you to determine the type of parameter from the type of the contained class. The type-traits classes provide a transformation add_reference (Table 3), which adds a reference to its type, unless it is already a reference.
Table 3: The transformation add_reference adds a reference to its type.
This lets you build a primary template definition for pair
that can contain nonreference types, reference types, and constant reference types; see Listing Six.
Listing Six
template <typename T1, typename T2> struct pair { typedef T1 first_type; typedef T2 second_type; T1 first; T2 second; pair(boost::add_reference<const T1>::type nfirst, boost::add_reference<const T2>::type nsecond) :first(nfirst), second(nsecond) { } };
Add back in the standard comparison operators, default constructor, and template copy constructor (which are all the same), and you have a std::pair
that can hold reference types. This same extension could have been done using partial template specialization of pair
, but to specialize pair
in this way would require three partial specializations, plus the primary template. Type traits let you define a single primary template that adjusts itself to any of these partial specializations, instead of a brute-force partial specialization approach. Using type traits in this fashion lets you delegate partial specialization to the type-traits classes, resulting in code that is easier to maintain and easier to understand.
Conclusion
Templates have enabled C++ users to take advantage of the code reuse that generic programming brings. However, generic programming does not have to sink to the lowest common denominator, and templates can be optimal as well as generic. Therein lies the value of type traits.
Acknowledgments
We would like to thank Beman Dawes and Howard Hinnant for their helpful comments when preparing this article.
John is an independent programmer in England with an interest in generic programming. Steve, who lives in Michigan, currently works on a variety of systems for Control Engineering Company. They can be contacted at john_maddock@compuserve .com and [email protected], respectively.