For simple things, C++ I/O is simple. To send a value to an output stream out, you write
out << "The value is " << x << std::endl;
To read a value, you write
in >> x;
I haven't said what x's type is, and I haven't said whether in and out are standard input and standard output or whether they connect to files. That's intentional; this code works the same way regardless of those details.
If your I/O needs are modest, you may never need to know much more than that. Once you know about the standard streams cin and cout, the >> operator for input, the << operator for output, and the endl manipulator to terminate a line, you're most of the way there. With a few more details (getline for line-oriented input; the setw, setprecision, and setbase manipulators to control how numbers get displayed; and the ifstream and ofstream classes to connect to files), you have something that begins to look like a complete I/O library. At this level, it's easy to fit a description of C++ I/O into less than a page.
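As a rough illustration of that capsule summary, here is a minimal sketch. The helper names format_row, to_hex, and first_line are hypothetical, and string streams stand in for files so that the sketch is self-contained; the same functions would work with an ifstream or ofstream.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: a left-aligned label followed by a right-aligned
// number with two digits after the decimal point.
std::string format_row(const std::string& label, double value)
{
    std::ostringstream out;
    out << std::left << std::setw(8) << label
        << std::right << std::setw(8)
        << std::fixed << std::setprecision(2) << value;
    return out.str();
}

// Hypothetical helper: formats an integer in base 16 using setbase.
std::string to_hex(int n)
{
    std::ostringstream out;
    out << std::setbase(16) << n;
    return out.str();
}

// Hypothetical helper: reads one whole line, spaces included.
std::string first_line(std::istream& in)
{
    std::string line;
    std::getline(in, line);
    return line;
}
```

With these definitions, to_hex(255) yields "ff", and first_line returns everything up to, but not including, the first newline.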
Of course, the C++ Standard takes many pages to describe the standard I/O library; sometimes you need to know more than the capsule summary in the last paragraph. For example, suppose you aren't working with high-level types like strings and integers, but with individual characters. If you were working in C, you would read a character with getc or fgetc. In C++ your first thought might be that you should read a character by writing
char c; in >> c;,
but that's subtly wrong, or maybe not so subtly. It's the moral equivalent of scanf(" %c", &c) (with a leading space in the format, so that whitespace is skipped), not of getc. The >> operator performs formatted input, so, if the input stream contains the characters "a b c", you'll never see the spaces. (Similarly, you'll never see tabs or newlines.) Formatted input skips whitespace.
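The whitespace skipping is easy to observe. In this small sketch the function name read_formatted is hypothetical; it collects every character that operator>> delivers.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collects characters one at a time with formatted input (operator>>).
// Because formatted input skips whitespace by default, spaces, tabs,
// and newlines never reach the result.
std::string read_formatted(std::istream& in)
{
    std::string result;
    char c;
    while (in >> c)
        result += c;
    return result;
}
```

Given a stream that contains "a b c", read_formatted returns "abc".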
If formatted input is the wrong choice, maybe you should try unformatted input: c = in.get();. That's not exactly wrong, but if you're expecting reasonable performance you'll probably be disappointed. It certainly isn't the equivalent of getc! While getc is a tiny function that does a little bit of pointer manipulation (it might even be a macro, rather than a function), istream::get is quite complicated: in the implementations I have looked at, it takes several dozen lines of code. If you write (say) a file-copying routine in terms of istream::get, you can expect it to be dreadfully slow. As a general rule, get does not belong in a tight loop.
But now there's a problem. If you shouldn't use formatted input and you shouldn't use unformatted input, then what's left? The istream class doesn't have any member functions that are any simpler than get.
If performance is important, you have to go beyond istream and ostream and learn a bit more about how the C++ I/O library is put together. The lower-level functions that get and getline and operator>> are built on top of aren't part of istream, but part of streambuf. The closest equivalent of getc in the C++ library is a somewhat formidable-looking construct: in.rdbuf()->sbumpc().
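To make that construct less formidable, here is a minimal sketch of a getc-style loop built on in.rdbuf()->sbumpc(). The function name read_all is hypothetical. The comparison against traits::eof() plays the role that the EOF test plays in C.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Reads every remaining character of `in` through its stream buffer,
// whitespace included.
std::string read_all(std::istream& in)
{
    typedef std::char_traits<char> traits;
    std::streambuf* buf = in.rdbuf();
    std::string result;
    for (traits::int_type ch = buf->sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf->sbumpc())
    {
        result += traits::to_char_type(ch);
    }
    return result;
}
```

Unlike a loop written with operator>>, this one sees every space, tab, and newline.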
Streambufs
The C++ I/O library is quite complicated, but the general ideas behind its architecture are simple. Formatting decisions are made by locale facets, while character buffering and the transport of characters to and from their ultimate source or destination are performed by stream buffers. The stream classes themselves, istream and ostream, are surprisingly unimportant. They're wrappers that tie locales to stream buffers; they contain user formatting flags and some rudimentary state information; and they provide a convenient syntax (the << and >> operators) for simple kinds of I/O.
The most innovative aspect of the C++ I/O library is that data formatting has been decoupled from character manipulation; understanding that decoupling is crucial for anything but basic use of the library.
A stream buffer class is a class that inherits from std::streambuf. It understands only one data type: the character [1]. Interpreting those characters is someone else's job. For example, a streambuf might tell you that the next three characters are '7', 'F', and ' '. If you want to interpret those characters as the hexadecimal number 7F (that is, 127) terminated by whitespace, you have to call another function. The stream classes do that automatically: when you write
int n; in >> n;,
the istream gets characters from a stream buffer and passes those characters to a locale facet so that they can be interpreted as an int.
Every stream buffer has the same interface, but manages a different kind of I/O or buffering. Thus std::istream contains a pointer of type streambuf* (you write in.rdbuf() to obtain the pointer), but, as usual with pointers to base classes, istream doesn't have to know the exact type of the object that it points to. For example, the rdbuf pointer might point to a std::filebuf, a stream buffer for file I/O; or to a std::stringbuf, a stream buffer that manages characters in a string; or to a user-defined networking stream buffer.
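Because the interface is the same for every stream buffer, code written against streambuf* does not care what sits behind the pointer. The following sketch assumes a hypothetical function count_chars; it accepts a standalone std::stringbuf, the buffer of a string stream, or the std::filebuf obtained from an ifstream's rdbuf().

```cpp
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Counts the characters left in any stream buffer, whatever its
// dynamic type happens to be.
std::size_t count_chars(std::streambuf* buf)
{
    std::size_t n = 0;
    while (buf->sbumpc() != std::char_traits<char>::eof())
        ++n;
    return n;
}
```

For a file, the call would look like count_chars(file.rdbuf()), where file is an open std::ifstream.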
This may seem complicated and inefficient. If we're using an interface that's based on reading and writing single characters, and if we're accessing that interface by a pointer to a polymorphic base class, does this mean we have to call a virtual function for every character? Fortunately, no. While std::streambuf is a base class, it's not an abstract base class: its functionality is cleverly divided between virtual and nonvirtual functions, so that streambuf's interface is simple (much simpler than the words in the Standard make it appear!) and also efficient.
If you've ever used the low-level functions in the C I/O library, the stream buffer interface should look familiar to you: you'll be working with member functions instead of global functions, and the names are slightly different, but the ideas are the same. If p is a streambuf*, then p->sputc(c) writes the single character c; it returns EOF if the write fails, and something else if it succeeds. Input is slightly more complicated, because there are more choices: p->sbumpc() returns the current character (again, it returns EOF if the read fails) and moves to the next read position, p->sgetc() returns the current character without moving to the next read position, and p->snextc() increments the read position and returns the next character. Finally, p->sputbackc(c) "unreads" a character and pushes it back onto the input.
The same type, streambuf, is used both for reading and writing. If a stream buffer is read-only, then writes will always fail; if a stream buffer is write-only, then reads will always fail.
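The following sketch exercises those member functions on an in-memory buffer. The function names read_tour and write_all are hypothetical, and read_tour assumes that the buffer holds at least the three characters "abc".

```cpp
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

typedef std::char_traits<char> traits;

// Records what each input call returns. Assumes the buffer holds "abc".
std::string read_tour(std::streambuf* p)
{
    std::string seen;
    seen += traits::to_char_type(p->sgetc());   // 'a': peek, position unchanged
    seen += traits::to_char_type(p->sbumpc());  // 'a': read, then advance
    seen += traits::to_char_type(p->sbumpc());  // 'b': read, then advance
    p->sputbackc('b');                          // unread: 'b' is current again
    seen += traits::to_char_type(p->snextc());  // 'c': advance past 'b', then peek
    return seen;
}

// Writes each character with sputc; returns false as soon as a write fails.
bool write_all(std::streambuf* p, const std::string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        if (traits::eq_int_type(p->sputc(s[i]), traits::eof()))
            return false;
    return true;
}
```

On a std::stringbuf that contains "abc", read_tour returns "aabc". Calling write_all on a std::stringbuf opened with std::ios_base::in alone returns false, which is the read-only case described above.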
The key point is that all of those functions are non-virtual; in fact, they're probably just a line or two long and they're probably declared to be inline. The read functions get characters from an internal buffer declared in the streambuf base class; only when all the characters in the buffer have been consumed does streambuf need to invoke any virtual functions. Similarly, sputc puts a character in an internal buffer and invokes the appropriate virtual function whenever that buffer needs to be flushed. We thus have both flexibility and efficiency.
All of streambuf's virtual member functions are protected. You need to know about them if you're planning to write your own stream buffer class, but not if all you're planning to do is use preexisting stream buffers. (I'm not even going to mention those virtual functions' names in this column!) To use stream buffers, you just need to know about the member functions I've already mentioned: sputc, sbumpc, sgetc, snextc, and sputbackc. For input, you have the choice of reading a character with sbumpc or else reading it with sgetc and then moving to the next with snextc. Which version you use is partly a matter of taste; I tend to find sbumpc more convenient, but the differences are small.
Are there any gotchas to watch out for when you're mixing low-level streambuf I/O with the high-level stream functions? Not really. You don't have to worry about buffers or positioning information getting out of sync, because streams have no such information on their own. The biggest issues deal with error reporting, and even those issues are smaller than they might appear. You might worry, for example, that if you work directly with a stream's underlying buffer, errors won't be reported back to the stream. Here there is a real concern (streams do keep track of error state), but it's a minor one. Suppose that in is an istream, and suppose that you keep calling in.rdbuf()->sbumpc() until you encounter EOF. It's certainly true that in won't have its end-of-file marker set, but that doesn't really matter: it'll get set anyway the next time you try reading from in again.
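The end-of-file behavior described above can be checked with a short sketch. The function names drain and notices_eof_on_next_read are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Consumes everything in `in` through its stream buffer, bypassing the
// stream's own member functions (and therefore its state flags).
void drain(std::istream& in)
{
    std::streambuf* buf = in.rdbuf();
    while (buf->sbumpc() != std::char_traits<char>::eof())
        ;  // discard each character
}

// Call after drain(). Returns true if the eof flag was still clear
// before a stream-level read and is set after it.
bool notices_eof_on_next_read(std::istream& in)
{
    const bool before = in.eof();  // the stream has not been told yet
    in.get();                      // any stream-level read now hits EOF
    return !before && in.eof();
}
```

After drain(in), the stream still reports good(); the first read made through the stream itself is what sets the end-of-file state.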
A slightly more serious concern is that the virtual functions in a class that's derived from streambuf might throw exceptions; perhaps, for example, a user-defined class for network I/O might throw an exception when the underlying connection has been lost. High-level istream and ostream functions will catch those exceptions and translate them into an error state within the stream (that's one of the reasons that seemingly innocent functions like istream::get are so complicated), but if you're working with stream buffers directly there's nobody to catch exceptions for you. If you work with stream buffers, you should make sure that your code is exception safe.
Streambuf Iterators
I haven't spent much time on code that uses streambufs directly. One reason is that there's very little new to say: you can use sputc and sbumpc in just the same way as you use putc and getc. Another reason, however, is that I haven't yet described one last library component that, in many real cases, is the easiest way of working with streambufs.
If you're reading a character, you're probably going to read more than one character: character input is mostly important in loops. But a loop where you read one value after another, doing things with each of those values, is a pattern that's dealt with elsewhere in the C++ Standard library: it's just what iterators are for. Accordingly, the standard library defines the types istreambuf_iterator and ostreambuf_iterator, which use streambufs to read and write characters. If i is an istreambuf_iterator, then *i returns the current character (just like sgetc), and ++i moves to the next character (just like sbumpc). Upon reaching end of file, an istreambuf_iterator becomes equal to a special end-of-stream iterator that you create with the default constructor. So, for example, you can process all of the characters that you read from a stream by writing a loop that looks something like this:
std::streambuf* p = in.rdbuf();
std::istreambuf_iterator<char> i(p);
std::istreambuf_iterator<char> eos;
while (i != eos) {
    char c = *i++;  // read the current character, then advance
    // Do something with c
}
(Actually, this is just slightly more verbose than necessary. We could pass in to istreambuf_iterator directly; there's a constructor that will call in.rdbuf() for us.)
But, of course, the real value of iterators isn't that you can use them in loops; it's that you can combine them with generic algorithms that operate on arbitrary iterator types. You can pass an istreambuf_iterator to any algorithm that accepts input iterators, and you can pass an ostreambuf_iterator to any algorithm that accepts output iterators. If you're lucky, you may not have to write any loops on your own: someone may already have written a generic algorithm that does what you want. Even if you're not quite that lucky, you can write your own generic algorithm, using the iterator formalism to separate out the data processing from the mechanics of I/O.
Combining streambuf iterators with other parts of the standard library makes it possible to do some surprisingly sophisticated things in just a few lines. For example, suppose that you need to read the entire contents of a file into memory. If you don't know the length of the file ahead of time, this can be messy: you have to use dynamic allocation, but you can't know how much memory to allocate until after you've read everything. The best strategy is to allocate a buffer of some arbitrary size, and then expand its size when necessary. But we shouldn't have to do that explicitly: vector can handle the tedium of memory management for us. We also don't need to write a loop to read characters: in terms of iterators, the characters we want begin with an iterator that points to the beginning of the file and continue up until the end-of-stream iterator. This is a range of iterators, and vector has a constructor that takes a range of iterators [2]. The code is shorter than the explanation:
std::ifstream in(fname);
std::istreambuf_iterator<char> i(in);
std::istreambuf_iterator<char> eos;
std::vector<char> v(i, eos);
Or, if the vector v already exists and you want to append the contents of a file to it, that's equally easy: just replace the last line with
v.insert(v.end(), i, eos);
or with
std::copy(i, eos, std::back_inserter(v));
Similarly, it's easy to copy one file to another, performing character-level transformations along the way: just combine istreambuf_iterator, ostreambuf_iterator, and the appropriate generic algorithm. This snippet, for example, creates a copy of a file in which every space character is replaced with a newline:
std::ifstream in_file(in_fname);
std::ofstream out_file(out_fname);
std::istreambuf_iterator<char> in(in_file);
std::istreambuf_iterator<char> eos;
std::ostreambuf_iterator<char> out(out_file);
std::replace_copy(in, eos, out, ' ', '\n');
Conclusions
The C++ Standard library segregates high-level and low-level I/O operations into different classes: high-level operations in the stream classes, low-level operations in streambuf and the classes that inherit from it. The low-level operations have sometimes had a tendency to get lost, since it's easy to think of streambuf as just an implementation detail. That's unfortunate: sputc and sbumpc are no more obscure or complicated than the C library functions putc and getc are. If you would ever consider writing part of your program in terms of putc and getc, you should also consider writing it in terms of stream buffers and streambuf iterators.
You should use stream classes, with the << and >> operators, when you're working with high-level data types. You should use stream buffers, either directly or through streambuf iterators, when you're performing character-by-character I/O and when performance matters. Discussions of C++ I/O rightly begin with istream and ostream, but ought not to end there.
Notes
[1] A character isn't necessarily the same as char. The C++ I/O library is templatized; streambuf isn't a class, but just a typedef for basic_streambuf<char, char_traits<char> >. The library also provides wstreambuf, which is a typedef for basic_streambuf<wchar_t, char_traits<wchar_t> >, and in principle you can use basic_streambuf for your own character type. For the purposes of this column, parameterization by character type doesn't matter.
[2] This constructor uses member templates. It's part of the C++ Standard, but you may find that member templates are poorly supported on some older compilers.
Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee's library working group. He works at AT&T Labs Research and can be contacted at [email protected].