Anyone who has been programming for any length of time is familiar with Boolean option flags: treating a group of options as a single unit by packing them into one word, using a single bit for each choice. So, for example, to set the permissions on a Unix file, you might write something like the following:
chmod("my_file", S_IWUSR | S_IRUSR | S_IRGRP | S_IROTH);
Each of those constants corresponds to a single bit; by combining them with the bitwise-or operator, you can specify many options at once.
Packing multiple option bits into a single word is extremely common. This technique is used in many places in the Unix and Win32 APIs, in the ios_base format flags in the C++ Standard library, and some form of it tends to show up in most large programs. Collections of bits are important.
Its not hard to see why this technique is common: the alternative, an array or a struct with a different field for each option, would be clumsier and would be wasteful of memory. However, there are also times when this technique can be a nuisance. First, some operations can be awkward: setting a named bit is straightforward enough (flags |= S_IRGRP), but clearing a bit (flags &= ~S_IWGRP) is somewhat uglier. You can test whether a bit is set by using it as a mask, as in if (flags & S_IWUSR), but woe to you if you make the mistake of clarifying the test by writing it as if ((flags & S_IWUSR) == true), or, still worse, if (flags & S_IWUSR == true). Referring to a numbered bit, as opposed to a named one, is also awkward: you wind up with expressions like flags &= ~(1 << n), usually sprinkled with casts. Finally, this technique is hard to scale to cases where you have many options: once the number of flags exceeds the number of bits in a long, you need to do something else.
Because collections of bits are important, the C++ Standard library provides explicit support for them in fact, several kinds of support. Sometimes youll still want to use low-level bit manipulation operations (and sometimes youll have to, if youre interfacing with a C API), but in most cases the C++ librarys versions are better. They have a few minor quirks, but most of them are easily worked around.
bitset
The class std::bitset appears with associative containers in chapter 23 of the C++ Standard. Thats not really the right place for it, since bitset has nothing to do with the associative containers like set and map; it doesnt satisfy even the most basic requirements for an STL container. Its better to think of bitset as an integer, each of whose bits you can access individually except its an integer that isnt restricted to the size of a long. The size of a bitset is determined at compile time (the number of bits is a template parameter), but there is no upper limit: a bitset<32> is 32 bits long, and a bitset<1000> is 1000 bits.
All of the integer bitwise operations that youre already used to are still available with bitset, and, for the sake of convenience, a few more as well. So, for example, you can perform bitwise-xor by writing b1 ^ b2 (at least if b1 and b2 are the same length). There are two different interfaces for manipulating individual bits: you can set the nth bit with b.set(n), clear it with b.reset(n), and test it with if (b.test(n)); or, almost equivalently, you can think of a bitset as an array and write these same operations as b[n] = true, b[n] = false, and if (b[n]). (Almost, because there is one small difference: the array version isnt range checked, but the set/reset/test version is. If the argument you pass to set, reset, or test is too large, youll get an out_of_range exception.)
If your bitset is modestly sized, theres a very direct sense in which you can think of it as an integer: theres a constructor that creates a bitset from an unsigned long, and a member function, to_ulong, that extracts an unsigned long from a bitset. Of course, you cant directly use this constructor to initialize more bits than unsigned long supplies; similarly, you cant use to_ulong to extract more bits than will fit in an unsigned long. (If you try, and if any of the high-order bits beyond the range of unsigned long are set, then to_ulong will throw an exception.) If necessary, however, you can use shifting and masking to get around those restrictions:
const int N = sizeof(unsigned long) * CHAR_BIT; unsigned long high = 0x7B62; unsigned long low = 0x1430; std::bitset<2*N> b = (std::bitset<2*N>(high) << N) | std::bitset<2*N>(low); ... const std::bitset<2*N> mask((unsigned long)(-1)); low = (b & mask).to_ulong(); high = (b >> N).to_ulong();
The zeroth bit is defined to be the least significant bit, so, for example, if you write:
std::bitset<4> b(0xA);
the bits that are set are b[1] and b[3].
Its easy enough to use bitset as a replacement for the traditional version of option flags: just declare bitset objects in the header file, instead of integer constants. Weve already mentioned two advantages of using bitset: you can have more flags than will fit into a long, and you get easier and safer methods of manipulating individual bits. Another is that bitset gives you a conversion mechanism to and from a textual representation.
First, bitset provides the usual I/O operators. This program,
#include <bitset> #include <iostream> int main() { std::bitset<12> b(3432); std::cout << "3432 in binary is " << b << std::endl; }
gives the obvious result:
3432 in binary is 110101101000.
The input operator works the same way: it reads a string of ones and zeros, converting them to a bitset.
Second, you can convert bitsets to and from strings: theres a constructor that takes a string argument, and a bitset<>::to_string member function. Alas, while these conversions are useful, the details turn out to be extremely inconvenient. Both the string constructor and the to_string member function are member templates, because the library class std::basic_string is itself a template; the ordinary string class, std::string, is an alias for basic_string<char>.
The generality of these member templates interacts in unfortunate ways with some of C++s more obscure language rules. You have to write
std::bitset<6>(std::string("110101"));
as opposed to just
std::bitset<6>("110101");
The obvious version, where you just pass the string literal "110101", will give you a compile-time error because the compiler wont know which version of the member template to instantiate. Similarly, if b is a bitset, you cant just write
std::string s = b.to_string();.
You have to use this truly horrible form instead:
std::string s = b.template to_string<char, std::char_traits<char>, std::allocator<char> >();
(Yes, that funny looking template keyword in there really is necessary.)
In practice, of course, you shouldnt pollute your code with things like that. Unless you really need to work with alternate character types, you can encapsulate these ghastly syntactic details in helper functions:
template <std::size_t N> std::bitset<N> from_string(const std::string& s) { return std::bitset<N>(s); } template <std::size_t N> std::string to_string (const std::bitset<N>& b) { return b.template to_string<char, std::char_traits<char>, std::allocator<char> >(); } vector<bool>
A bitset does have one major limitation: it has a fixed length. You may have a bitset thats much larger than a long, but you still have to specify its size in advance. Thats fine for something like a set of option flags, but its less appropriate for other purposes. Suppose, for example, that youre processing a large collection of items in a complicated order, and you need to keep track of which ones youve already seen. This still calls for an array of Boolean values, and it still makes sense to use a packed array that uses a single bit for each element, but bitset is no longer an appropriate choice. The number of items youre processing may not be known until run time, and items may even be added or removed.
The C++ Standard librarys other mechanism for managing collections of bits is vector<bool>, a specialization of the vector<> template. In some ways vector<bool> is very much like bitset: it uses one bit for each element, and it lets you use array syntax (e.g., v[3] = true) to access individual bits. The difference is that while bitset has special mechanisms of its own, vector<bool> uses the ordinary STL interface. You can change the number of elements in a vector<bool> by using the resize member function, or add new elements by using push_back, just as you can with any other vector<T>.
Although vector<bool> has no special support for bitwise operations, you can still perform those operations using ordinary STL algorithms and function objects. Instead of writing v3 = v1 & v2, for example, you could write:
std::vector<bool> v3(v1.size()); std::transform(v1.begin(), v1.end(), v2.begin(), v3.begin(), std::logical_and<bool>());
Similarly, to write a vector<bool> in the same format that operator<< uses when writing a bitset, you can again use an STL algorithm:
std::copy(v.rbegin(), v.rend(), std::ostream_iterator<char> (std::cout));
(This code relies on the fact that, by default, bool output uses '1' and '0' rather than true and false. Note also that were using rbegin and rend so that we copy the vector<bool> in reverse order. Thats how bitset output works: the leftmost digit in the printed representation of a bitset is b[N-1], not b[0].)
You should always use bitset in preference to vector<bool> when possible: a fixed-size data structure is going to have better performance, both in space and in time, than a data structure that supports the general-purpose vector interface. (In one timing test that I performed, bitset was faster than vector<bool> by a factor of almost five.) You need to use vector<bool> if youre managing a collection of bits whose size cant be known in advance.
It may also seem that theres another time when you need to use vector<bool> instead of bitset: when its important to interoperate with STL algorithms. STL algorithms use iterators, and, while vector<bool> provides iterators (weve seen v.begin() and v.end()) in the examples above), bitset does not. You can access individual bits of a bitset with array syntax, but bitset does not have begin and end member functions.
However, you shouldnt let that lack stop you! While bitset doesnt have the STL container interface, its still a perfectly good (fixed-size) container. If it makes sense for you to use a bitset, and if you also need iterators, then you can define a simple index iterator adaptor that translates iterator notation like *i into array notation like b[n].
The implementation is straightforward: maintain an index and a pointer to a container. The details, most of which are boilerplate that we would need for any random access iterator, are shown in Listing 1. We also define some nonmember helper functions, begin and end, which take bitset arguments. (The iterator weve shown in Listing 1 isnt quite as general as it might be: if we were willing to accept a slightly more cumbersome interface, we could define a class that worked with arbitrary array-like types. A general purpose index iterator adaptor is often useful when dealing with pre-STL container classes, and sometimes even when dealing with STL containers like vector.)
Using bitset_iterator, bitset can now be made to interoperate with STL components: you can, for example, copy a bitset to a vector<bool> by writing:
std::bitset<10> b; ... std::vector<bool> b(begin(b), end(b));
If youve looked closely at Listing 1, however, you may have noticed one problem with bitset_iterator: the name is a lie, because bitset_iterator isnt really an iterator! If i is an iterator, then *i is supposed to return a reference to the thing that i points to. Thats not what bitset_iterator does: a const bitset_iterator returns bool, not const bool&, and a modifiable bitset_iterator returns a proxy reference of type bitset<>::reference, not bool&.
Since bits arent individually addressable, this is the best we can do; in fact, vector<bool>::iterator behaves in exactly the same way which, again, means that vector<bool> isnt really an STL container. Its not quite right to say that bitset_iterator or vector<bool>::iterator is an iterator, but, in both cases, theyre close enough to being iterators so that they can be used in many (not all!) places where iterators are expected.
Conclusion
Arrays of Boolean values are common in large programs, and the C++ Standard library provides several ways to represent such arrays. I havent exhausted all of the possibilities: you can use a valarray<bool>, for example; in some cases it might even be appropriate to represent a sparse bit vector as a set<size_t>.
Much of the time, however, the easiest solution is to use std::bitset. If you know at compile time how large your Boolean array is going to be, or if you can at least establish a reasonable upper bound, then bitset will be simpler and more efficient than a complicated dynamic data structure. There are a few annoying quirks in bitsets interface, but, with a few helper functions, theyre easy to work around.
Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committees library working group. He works at AT&T Labs Research and can be contacted at [email protected].