There is more than one way to "string" an integer. Find your favorite here.
All animals are equal, but some animals are more equal than others.
George Orwell, Animal Farm [1]
Consider the following C code that uses sprintf to convert an integer value to a human-readable string representation, perhaps for output on a report or in a GUI window:
// Example 1:
// Stringizing some data in C,
// using sprintf().

// PrettyFormat() takes an integer,
// and formats it into the provided
// output buffer. For formatting
// purposes, the result must be at
// least 4 characters wide.
//
void PrettyFormat( int i, char* buf )
{
  // Here's the code, neat and simple:
  sprintf( buf, "%4d", i );
}
The $64,000 question [2] is: How would you do this kind of thing in C++? Well, all right, that's not quite the question because, after all, Example 1 is valid C++. The true $64,000 question is: Throwing off the shackles of the C90 Standard [3] on which the C++98 Standard [4] is based, if indeed they are shackles, isn't there a superior way to do this in C++ with its classes and templates and so forth?
That's where the question gets interesting, because Example 1 is the first of no fewer than four direct, distinct, and standard ways to accomplish this task. Each of the four ways offers a different tradeoff among clarity, type safety, run-time safety, and efficiency. Moreover, to paraphrase George Orwell's revisionist pigs, "All four choices are standard, but some are more standard than others," and, to add insult to injury, not all of them are from the same standard. They are, in the order I'll discuss them:
- sprintf
- snprintf
- std::stringstream
- std::strstream
Finally, as though that's not enough, there's a fifth not-yet-standard-but-liable-to-become-standard alternative for simple conversions that don't require special formatting:
- boost::lexical_cast [6]
Enough chat; let's dig in.
Option #1: The Joys and Sorrows of sprintf
The code in Example 1 is just one example of how we might use sprintf. I'm going to use Example 1 as a motivating case for discussion, but don't get too tied to this simple PrettyFormat one-liner. Keep in mind the larger picture: we're interested in looking at how we would normally choose to format non-string values as strings in the general case, perhaps in code that's more likely to change and grow over time than the simple case in Example 1.
I'm going to list the major issues involved by analyzing sprintf in more detail. sprintf has two major advantages and three distinct disadvantages. The two advantages are as follows:
Issue #1: Ease of use and clarity. Once you've learned the commonly used formatting flags and their combinations, using sprintf is succinct and obvious, not convoluted. It says directly and concisely what needs to be said. For this, the printf family is hard to beat in most text formatting work. (True, most of us still have to look up the more rarely used formatting flags, but they are after all used rarely.)
Issue #2: Maximum efficiency (ability to directly use existing buffers). By using sprintf to put the result directly into an already-provided buffer, PrettyFormat gets the job done without needing to perform any dynamic memory allocations or other extra off-to-the-side work. It's given an already-allocated place to put the output and puts the result directly there.
Caveat lector: of course, don't put too much weight on efficiency just yet; your application may well not notice the difference. Never optimize prematurely, but optimize only when timings show that you really need to do so. Write for clarity first, and for speed later if necessary. In this case, never forget that the efficiency comes at the price of memory management encapsulation: Issue #2 is phrased here as "you get to do your own memory management," but the flip side is "you have to do your own memory management"!
Alas, as most sprintf users know, the story doesn't end quite there. sprintf also has these significant drawbacks:
Issue #3: Length safety. Using sprintf is a common source of buffer overrun errors if the destination buffer isn't big enough for the whole output [7]. For example, consider this calling code:
char smallBuf[5];
int value = 42;
PrettyFormat( value, smallBuf ); // er, well, sort of okay
assert( value == 42 );
In the above case, the value 42 happens to be small enough so that the five-byte result "  42\0" happens to fit into smallBuf. But the day the code changes to:
char smallBuf[5];
int value = 12108642;
PrettyFormat( value, smallBuf ); // oops
assert( value == 12108642 );     // likely to fail
we'll start scribbling past the end of smallBuf, which may be into the bytes of value itself if the compiler chose a memory layout that put value immediately after smallBuf in memory.
We can't easily make Example 1 much safer, though. True, we could change Example 1 to take the length of the buffer and then check sprintf's return value, which will tell after the fact how many bytes sprintf ended up writing. This gives us something like:
// BAD: A not-at-all-improved PrettyFormat().
//
void PrettyFormat( int i, char* buf, int buflen )
{
  // This is no better:
  if( buflen <= sprintf( buf, "%4d", i ) )
  {
    // uh, what to do? by the time it's
    // detected, we've already corrupted
    // whatever we were going to corrupt
  }
}
That's no solution at all. By the time the error is detected, the overrun has already occurred, we'll already have scribbled on someone else's bytes, and in bad cases our execution may never even get to the error-reporting code [8].
Issue #4: Type safety. For sprintf, type errors are run-time errors, not compile-time errors, and they may not even manifest right away. The printf family uses C's variable argument lists, and C compilers generally don't check the parameter types for such lists [9]. Nearly every C programmer has had the joy of finding out in subtle and not-so-subtle ways that they got the format specifier wrong, and all too often such errors are found only after a pressure-filled late-night debugging session spent trying to duplicate a mysterious crash reported by a key customer.
Granted, the code in Example 1 is so trivial that it's likely easy enough to maintain now when we know we're just throwing a single int at sprintf, but even so, it's not difficult to go wrong if your finger happens to hit something other than d by mistake. For example, c happens to be right next to d on most keyboards; if we'd simply mistyped the sprintf call as:
sprintf( buf, "%4c", i ); // oops
then we'd probably see the mistake quite quickly when the output is some character instead of a number, because sprintf will silently reinterpret the low-order byte of i as a char value. Alternatively, s is also right next to d, and if we'd mistyped it as:
sprintf( buf, "%4s", i ); // oops again
then we'd probably also catch the error quite quickly because the program is likely to crash immediately or at least intermittently. In this case, sprintf will silently reinterpret the integer as a pointer to char and then happily attempt to follow that pointer into some random region of memory.
But here's a more subtle one: What if we'd instead mistyped d as ld?
sprintf( buf, "%4ld", i ); // a subtler error
In this case, the format string is telling sprintf to expect a long int, not just an int, as the first piece of data to be formatted. This too is bad C code; the trouble is that not only won't this be a compile-time error, but it might not even be a run-time error right away. On many popular platforms, the result will still be the same as before. Why? Because on many popular platforms ints happen to have the same size and layout as longs. You may not notice this error until you port the above code to a platform where int isn't the same size as long, and even then it might not always produce incorrect output or immediate crashes.
Finally, consider a related issue.
Issue #5: Templatability. It's very hard to use sprintf in a template. Consider:
template<typename T>
void PrettyFormat( T value, char* buf )
{
  sprintf( buf, "%/*what goes here?*/", value );
}
The best (worst?) you could do is declare the base template and then provide specializations for all the types that are compatible with sprintf:
// BAD: A kludgy templated PrettyFormat().
//
template<typename T>
void PrettyFormat( T value, char* buf ); // note: base template is not defined

template<> void PrettyFormat<int>( int value, char* buf )
{
  sprintf( buf, "%d", value );
}

template<> void PrettyFormat<char>( char value, char* buf )
{
  sprintf( buf, "%c", value );
}

// ... etc., ugh ...
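One way to shrink the per-type boilerplate a little, offered only as a sketch and not as a recommendation, is to let overloaded helper functions pick the conversion specifier. The FormatSpec name, the extra buflen parameter, and the use of snprintf (covered under Option #2) are assumptions made for illustration. It still needs one hand-written entry per supported type, so it illustrates the templatability problem rather than solving it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helpers (not from the article): one overload per supported
// type maps the argument's static type to a matching conversion specifier.
inline const char* FormatSpec( int )    { return "%4d";  }
inline const char* FormatSpec( long )   { return "%4ld"; }
inline const char* FormatSpec( double ) { return "%4g";  }

template<typename T>
void PrettyFormat( T value, char* buf, std::size_t buflen )
{
  // Overload resolution chooses the specifier from T. A type with no
  // viable (or no unambiguous) FormatSpec overload, such as const char*
  // or unsigned int, is rejected at compile time.
  std::snprintf( buf, buflen, FormatSpec( value ), value );
}
```

A call such as PrettyFormat( 42, buf, sizeof buf ) selects the int overload, while PrettyFormat( 42L, buf, sizeof buf ) selects the long one, so the specifier can no longer drift out of step with the argument type for the listed types.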
In summary, here's sprintf:
| | sprintf |
|---|---|
| Standard? | Yes |
| C90 | Yes |
| C++98 | Yes |
| C99 | Yes |
| Ease of use and clarity? | Yes |
| Efficient? | Yes |
| Length safe? | No |
| Type safe? | No |
| Usable in template? | No |
The other solutions we'll consider next choose different tradeoffs among these considerations.
Option #2: snprintf
Of the other choices, sprintf's closest relative is of course snprintf. snprintf adds only one new facility to sprintf, but it's an important one: the ability to specify the maximum length of the output buffer, thereby eliminating buffer overruns. Of course, if the buffer is too small, then the output will be truncated.
snprintf has long been a widely available nonstandard extension present on most major C implementations. With the advent of the C99 Standard [5], snprintf has come out and gone legit, now officially sanctioned as a standard facility. Until your own compiler is C99-compliant, though, you may have to use this under a vendor-specific extension name such as _snprintf.
Frankly, you should already have been using snprintf over sprintf anyway, even before snprintf was standard. Calls to length-unchecked functions like sprintf are banned in most good coding standards, and for good reason. The use of unchecked sprintf calls has long been a notoriously common problem causing program crashes in general, and security weaknesses in particular [10].
With snprintf, we can correctly write the length-checked version we were trying to create earlier:
// Example 2:
// Stringizing some data in C,
// using snprintf().
//
void PrettyFormat( int i, char* buf, int buflen )
{
  // Here's the code, neat and simple
  // and now a lot safer:
  snprintf( buf, buflen, "%4d", i );
}
Note that it's still possible for the caller to get the buffer length wrong. That means snprintf still isn't as 100-percent bulletproof for overflow-safety as the later alternatives that encapsulate their own resource management, but it's certainly lots safer and deserves a "Yes" under the "Length safe?" question. With sprintf, we have no good way to avoid for certain the possibility of buffer overflow; with snprintf we can ensure it doesn't happen.
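Because a too-small buffer now means silent truncation rather than an overrun, a caller may want to know when truncation happened. The following is a minimal sketch, assuming a C99-conforming snprintf whose return value is the length the complete output would have had; the name PrettyFormatChecked and the size_t parameter are hypothetical. Pre-standard vendor variants such as _snprintf may report truncation differently and may not null-terminate, so check your library's documentation before relying on this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Returns true if the whole formatted value fit in buf, and false if it
// was truncated or formatting failed. (Hypothetical name.)
bool PrettyFormatChecked( int i, char* buf, std::size_t buflen )
{
  // A conforming snprintf writes at most buflen-1 characters plus the
  // terminating null, and returns the untruncated length (excluding
  // the null), or a negative value on error.
  int needed = std::snprintf( buf, buflen, "%4d", i );
  return needed >= 0 && static_cast<std::size_t>( needed ) < buflen;
}
```

Unlike the sprintf-based check shown earlier, this test runs after a write that was already bounded, so a false result signals lost digits, not corrupted memory.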
In every other way, sprintf and snprintf are the same. In summary, here's how snprintf compares to sprintf:
| | snprintf | sprintf |
|---|---|---|
| Standard? | | |
| C90 | No (only a common extension) | Yes |
| C++98 | No (only a common extension) | Yes |
| C99 | Yes | Yes |
| C++0x (speculation) | Likely | Yes |
| Ease of use and clarity? | Yes | Yes |
| Efficient? | Yes | Yes |
| Length safe? | Yes | No |
| Type safe? | No | No |
| Usable in template? | No | No |
Guideline: Never use sprintf. If you decide to use C stdio facilities, always use length-checked calls like snprintf even if they're only available as a nonstandard extension on your current compiler. There's no drawback, and there's real benefit, to using snprintf instead.
When I presented this material as part of a talk at Software Development East in Boston this summer, I was shocked to discover that only about 10 percent of the class had heard of snprintf. However, one of those who had immediately put up his hand described how, on his current project, they'd recently discovered a few buffer-overrun bugs. They globally replaced sprintf with snprintf throughout the project and found during testing that not only were those bugs gone, but suddenly several other mysterious bugs had also disappeared: bugs that had been reported for years, but that the team hadn't been able to diagnose. As I was saying: never use sprintf.
Option #3: std::stringstream
The most common facility in C++ for stringizing data is the stringstream family. Here's what Example 1 would look like using an ostringstream instead of sprintf:
// Example 3:
// Stringizing some data in C++,
// using ostringstream.
//
void PrettyFormat( int i, string& s )
{
  // Not quite as neat and simple:
  ostringstream temp;
  temp << setw(4) << i;
  s = temp.str();
}
Using stringstream exchanges the advantages and disadvantages of sprintf. Where sprintf shines, stringstream does less well:
Issue #1: Ease of use and clarity. Not only has one line of code turned into three, but we've needed to introduce a temporary variable. This version of the code is superior in several ways, but code clarity isn't one of them. It's not that the manipulators are hard to learn (they're as easy to learn as the sprintf formatting flags), but that they're generally more clumsy and verbose. I find that code littered with long names like << setprecision(9) and << setw(14) is a bear to read (compared to, say, %14.9), even when all of the manipulators are arranged reasonably well in columns.
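To make the verbosity comparison concrete, here is a small side-by-side sketch. It assumes the intended conversion is fixed-point (%14.9f), which the text does not spell out, and the function names WithSnprintf and WithManipulators are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// printf-style: width 14, precision 9, fixed-point notation.
std::string WithSnprintf( double x )
{
  char buf[64];
  std::snprintf( buf, sizeof buf, "%14.9f", x );
  return buf;
}

// The same request spelled with stream manipulators.
std::string WithManipulators( double x )
{
  std::ostringstream temp;
  temp << std::fixed << std::setprecision(9) << std::setw(14) << x;
  return temp.str();
}
```

In the default "C" locale both functions should yield the same 14-character string for a value such as 3.14159; the difference lies in how much code it takes to say so.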
Issue #2: Efficiency (ability to directly use existing buffers). A stringstream does its work in an additional buffer off to the side, so it will usually have to perform extra allocations for that working buffer and for any other helper objects it uses. I tried the Example 3 code on two popular current compilers and instrumented ::operator new to count the allocations being performed. One platform performed two dynamic memory allocations, and the other performed three.
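A count of this kind can be reproduced along the following lines. This is a minimal sketch, assuming a C++14 or later compiler; the counter g_allocations and the helper AllocationsForOneCall are hypothetical names, and the article does not show the instrumentation that was actually used. The number reported depends on the library implementation (for example, on small-string optimizations), so no particular count is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iomanip>
#include <new>
#include <sstream>
#include <string>

// Running count of calls to the replaced global ::operator new.
static std::size_t g_allocations = 0;

// Replacement allocation function: count the call, then defer to malloc.
void* operator new( std::size_t size )
{
  ++g_allocations;
  if( void* p = std::malloc( size ? size : 1 ) )
    return p;
  throw std::bad_alloc();
}

// Matching replacements so memory obtained above is released with free.
void operator delete( void* p ) noexcept { std::free( p ); }
void operator delete( void* p, std::size_t ) noexcept { std::free( p ); }

// The code under measurement: Example 3's PrettyFormat.
void PrettyFormat( int i, std::string& s )
{
  std::ostringstream temp;
  temp << std::setw(4) << i;
  s = temp.str();
}

// Returns how many times ::operator new was called during one call.
std::size_t AllocationsForOneCall( int i, std::string& s )
{
  std::size_t before = g_allocations;
  PrettyFormat( i, s );
  return g_allocations - before;
}
```

Printing the value returned by AllocationsForOneCall( 42, s ) on each platform of interest gives figures comparable to the ones quoted above.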
Where sprintf breaks down, however, stringstream glitters:
Issue #3: Length safety. The stringstream's internal basic_stringbuf buffer automatically grows as needed to fit the value being stored.
Issue #4: Type safety. Using operator<< and overload resolution always gets the types right, even for user-defined types that provide their own stream insertion operators. No more obscure run-time errors because of type mismatches.
Issue #5: Templatability. Now that the right operator<< is automatically called, it's trivial to generalize PrettyFormat to operate on arbitrary data types:
template<typename T>
void PrettyFormat( T value, string& s )
{
  ostringstream temp;
  temp << setw(4) << value;
  s = temp.str();
}
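As an illustration of the type safety and templatability claims, the sketch below feeds the templated PrettyFormat both a built-in type and a user-defined one. The Money type and its insertion operator are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type; assumes a non-negative amount.
struct Money
{
  long cents;
};

// Its stream insertion operator. The text is built in a separate stream
// first so that a width set by the caller applies to the whole value
// rather than only to the leading '$'.
std::ostream& operator<<( std::ostream& os, const Money& m )
{
  std::ostringstream text;
  text << '$' << m.cents / 100 << '.'
       << std::setfill('0') << std::setw(2) << m.cents % 100;
  return os << text.str();
}

// The templated PrettyFormat from the text, with std:: qualification.
template<typename T>
void PrettyFormat( T value, std::string& s )
{
  std::ostringstream temp;
  temp << std::setw(4) << value;
  s = temp.str();
}
```

The same template body handles int and Money without any format specifier to get wrong; overload resolution finds the right operator<< at compile time.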
In summary, here's how stringstream compares to sprintf:
| | stringstream | sprintf |
|---|---|---|
| Standard? | | |
| C90 | | Yes |
| C++98 | Yes | Yes |
| C99 | | Yes |
| C++0x (speculation) | Yes | Yes |
| Ease of use and clarity? | No | Yes |
| Efficient? | No | Yes |
| Length safe? | Yes | No |
| Type safe? | Yes | No |
| Usable in template? | Yes | No |
Option #4: std::strstream
Fairly or not, strstream is something of a persecuted pariah. Because it has been deprecated in the C++98 Standard, the top C++ books at best cover it briefly [11], mostly ignore it [12], or even explicitly state they won't cover it because of its official second-string status [13]. Although deprecated because the standards committee felt it was superseded by stringstream, which better encapsulates memory management, strstream is still an official part of the standard that conforming C++ implementers must provide [14].
Because strstream is still standard, it deserves mention here for completeness. It also happens to provide a useful mix of strengths. Here's what Example 1 might look like using strstream:
// Example 4:
// Stringizing some data in C++,
// using ostrstream.
//
void PrettyFormat( int i, char* buf, int buflen )
{
  // Not too bad, just don't forget ends:
  ostrstream temp( buf, buflen );
  temp << setw(4) << i << ends;
}
Issue #1: Ease of use and clarity. strstream comes in slightly behind stringstream when it comes to ease of use and code clarity. Both require a temporary object to be constructed. With strstream, you have to remember to tack on an ends to terminate the string, which I dislike. If you forget to do this, then you are in danger of overrunning the end of the buffer when reading it afterwards if you're relying on its being terminated by a null character; even sprintf isn't this fragile and always tacks on the null. But at least using strstream in the manner shown in Example 4 doesn't require calling a str function to extract the result at the end. (Of course, alternatively, if you let strstream create its own buffer, the memory is only partly encapsulated; you will need not only a str call at the end to get the result out, but also a .freeze(false), or else the strstreambuf won't free the memory.)
Issue #2: Efficiency (ability to directly use existing buffers). By constructing the ostrstream object with a pointer to an existing buffer, no extra allocations at all need be performed; the ostrstream will store its result directly in the output buffer. This is an important divergence from stringstream, which offers no comparable facility for placing the result directly in an existing destination buffer and thereby avoiding extra allocation [15]. Of course, ostrstream can alternatively use its own dynamically allocated buffer if you don't have one handy already; just use ostrstream's default constructor instead [16]. Indeed, strstream is the only option covered here that gives you this choice.
Issue #3: Length safety. As used in Example 4, the ostrstream's internal strstreambuf buffer automatically checks its length to make sure it doesn't write beyond the end of the supplied buffer. If instead we had used a default-constructed ostrstream, its internal strstreambuf buffer would automatically grow as needed to fit the value being stored.
Issue #4: Type safety. Fully type-safe, just like stringstream.
Issue #5: Templatability. Fully templatable, just like stringstream. For example:
template<typename T>
void PrettyFormat( T value, char* buf, int buflen )
{
  ostrstream temp( buf, buflen );
  temp << setw(4) << value << ends;
}
In summary, here's how strstream compares to sprintf:
| | strstream | sprintf |
|---|---|---|
| Standard? | | |
| C90 | | Yes |
| C++98 | Yes, but deprecated | Yes |
| C99 | | Yes |
| C++0x (speculation) | Possible, probably still deprecated | Yes |
| Ease of use and clarity? | No | Yes |
| Efficient? | Yes | Yes |
| Length safe? | Yes | No |
| Type safe? | Yes | No |
| Usable in template? | Yes | No |
It's, um, slightly embarrassing that the deprecated facility shows so strongly in this side-by-side comparison, but that's life.
Option #5: boost::lexical_cast
If you haven't yet discovered Boost at <www.boost.org>, my advice is to discover it. It's a public library of C++ facilities that's written principally by C++ standards committee members. Not only is it good peer-reviewed code written by experts and in the style of the C++ Standard library, these facilities are explicitly intended as candidates for inclusion in the next C++ Standard and are therefore worth getting to know. Besides, you can freely use them today.
One of the facilities provided in the Boost libraries is boost::lexical_cast, which is a handy wrapper around stringstream. Indeed, Kevlin Henney's code is so concise and elegant that I can present it here in its entirety (minus workarounds for older compilers):
template<typename Target, typename Source>
Target lexical_cast(Source arg)
{
  std::stringstream interpreter;
  Target result;

  if(!(interpreter << arg) ||
     !(interpreter >> result) ||
     !(interpreter >> std::ws).eof())
    throw bad_lexical_cast();

  return result;
}
Note that lexical_cast is not intended to be a direct competitor for the more general string formatter sprintf. Instead, lexical_cast is for converting data from one streamable type to another, and it competes more directly with C's atoi et al. conversion functions as well as with the nonstandard but commonly available itoa et al. functions. It's close enough, however, that it definitely would be an omission not to mention it here.
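To show how this style of conversion behaves next to atoi, here is a self-contained sketch that mirrors the logic of the code above without depending on Boost. The names simple_cast, bad_conversion, and TryParseInt are hypothetical stand-ins; the real library's interface and exception type may differ in detail.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for the library's exception type, so that this sketch needs
// only the standard library.
struct bad_conversion : std::runtime_error
{
  bad_conversion() : std::runtime_error( "bad conversion" ) {}
};

// Same three checks as the code above: the source must insert cleanly,
// the target must extract cleanly, and nothing but whitespace may remain.
template<typename Target, typename Source>
Target simple_cast( const Source& arg )
{
  std::stringstream interpreter;
  Target result;
  if( !(interpreter << arg) ||
      !(interpreter >> result) ||
      !(interpreter >> std::ws).eof() )
    throw bad_conversion();
  return result;
}

// An atoi-like wrapper that reports failure instead of returning
// a misleading value. On failure, out is left unchanged.
bool TryParseInt( const std::string& text, int& out )
{
  try
  {
    out = simple_cast<int>( text );
    return true;
  }
  catch( const bad_conversion& )
  {
    return false;
  }
}
```

With these definitions, "42" converts to 42, while "42abc" is rejected because characters remain after the number; atoi would typically accept the latter and stop at the first non-digit.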
Here's what Example 1 would look like using lexical_cast, minus the at-least-four-character requirement:
// Example 5:
// Stringizing some data in C++,
// using boost::lexical_cast.
//
void PrettyFormat( int i, string& s )
{
  // Perhaps the neatest and simplest
  // yet, if it's all you need:
  s = lexical_cast<string>( i );
}
Issue #1: Ease of use and clarity. This code embodies the most direct expression of intent of any of these examples.
Issue #2: Efficiency (ability to directly use existing buffers). Since lexical_cast uses stringstream, it's no surprise that it needs at least as many allocations as stringstream. On one of the platforms I tried, Example 5 performed one more allocation than the plain stringstream version presented in Example 3; on the other platform, it performed no additional allocations over the plain stringstream version.
Like stringstream, in terms of length safety, type safety, and templatability, lexical_cast shows very strongly.
In summary, here's how lexical_cast compares to sprintf:
| | lexical_cast | sprintf |
|---|---|---|
| Standard? | | |
| C90 | | Yes |
| C++98 | No | Yes |
| C99 | | Yes |
| C++0x (speculation) | Possible | Yes |
| Ease of use and clarity? | Yes | Yes |
| Efficient? | No | Yes |
| Length safe? | Yes | No |
| Type safe? | Yes | No |
| Usable in template? | Yes | No |
Summary
There are issues we've not considered in detail. For example, all the string formatting herein has been to normal narrow char-based strings, not wide strings. We've also focused on the ability to gain efficiency by using existing buffers directly in the case of sprintf, snprintf, and strstream. However, the flip side to "you get to do your own memory management" is "you have to do your own memory management," and the better encapsulation of memory management offered by stringstream, strstream, and lexical_cast may matter to you. (No typo, strstream is in both lists; it depends how you want to use it.)
Putting it all together, we get the side-by-side comparison summarized in Table 1. Given the considerations we're using to judge the relative merits of each solution, there is no clear unique one-size-fits-all winner for all situations.
From Table 1, I'll extract the following guidelines, also summarized in Table 2:
- If all you're doing is converting a value to a string (or, for that matter, to anything else!): prefer using boost::lexical_cast by default.
- For simple formatting, or where you need wide string support or templatability: prefer using stringstream or strstream; the code will be more verbose and harder to grasp than it would be with snprintf, but for simple formatting it won't be too bad.
- For more complex formatting, and where you don't need wide string support or templatability: prefer using snprintf. Just because it's C doesn't mean it's off limits to C++ programmers!
- Only if actual performance measurements show that any of the above is really a bottleneck at a specific point in your code: in those isolated cases only, instead of the above consider using whichever one of the faster alternatives strstream or snprintf makes sense.
- Never use sprintf.
Finally, a last word about the pariah strstream: it offers an interesting combination of features, not the least of which being that it's the only option that allows you to choose whether to do your own memory management or to let the object encapsulate it. Its lone technical drawback is that it is somewhat fragile to use because of the ends issue and the memory management approach; its only other drawback is social stigma because it's been shunted aside and doesn't get invited to parties much any more. You should be aware that there's a slight possibility that both the standards committee and your compiler/library vendor may really take it away from you at some time in the future.
It's a bit strange to see a deprecated feature showing so well. Although a particular animal may have distinct merits, even in the standard some animals are more equal than others...
Acknowledgments
Thanks to Jim Hyslop and the participants in the ACCU discussion thread about sprintf that got me thinking about this topic, to Martin Sebor at Rogue Wave Software for the non-Windows timing results, and to Bjarne Stroustrup, Scott Meyers, Kevlin Henney, and Chuck Allison for their comments on drafts of this article.
References
[1] George Orwell. Animal Farm (Signet Classic, 1996).
[2] Remember that, when the phrase was coined, $64,000 was enough to buy several houses and/or retire.
[3] ISO/IEC 9899:1990(E), Programming Languages - C (ISO C90 and ANSI C89 Standard).
[4] ISO/IEC 14882:1998(E), Programming Languages - C++ (ISO and ANSI C++ Standard).
[5] ISO/IEC 9899:1999(E), Programming Languages - C (ISO and ANSI C99 Standard).
[6] Kevlin Henney. C++ BOOST lexical_cast, <www.boost.org/libs/conversion/lexical_cast.htm>.
[7] A common beginner's error is to rely on the width specifier, here 4, which doesn't work because the width specifier dictates a minimum width, not a maximum width.
[8] Note that in some cases you can mitigate the buffer length problem by creating your own formats at run time. As Bjarne Stroustrup puts it in [17], speaking of a similar case:
"The expert-level alternative is not one I'd care to explain to novices:
char fmt[10];
// create a format string: plain %s can overflow
sprintf(fmt,"%%%ds",max-1);
// read at most max-1 characters into name
scanf(fmt,name);"
[9] Using lint-like tools will help to catch this kind of error.
[10] For example, for years it was fashionable for malicious web servers to crash web browsers by sending them very long URLs that were likely to be longer than the web browser's internal URL string buffer. Browsers that didn't check the length before copying into the fixed-length buffer ended up writing past the end of the buffer, usually overwriting data, but in some cases overwriting code areas with malicious code that could then be executed. It's surprising just how much software out there was, and is, using unchecked calls.
[Later note: timing sure is weird sometimes, and everything old is new again. I wrote the above in mid-August, when the Code Red II worm was all over the press, but only later did I notice Microsoft's description of the security vulnerability Code Red II exploited: "A security vulnerability results because idq.dll contains an unchecked buffer in a section of code that handles input URLs. An attacker who could establish a web session with a server on which idq.dll is installed could conduct a buffer overrun attack and execute code on the web server." Haven't we learned this lesson yet? (<www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/MS01-033.asp>)]
[11] Nicolai Josuttis. The C++ Standard Library (Addison-Wesley, 1999), page 649.
[12] Bjarne Stroustrup. The C++ Programming Language, Special Edition (Addison-Wesley, 2000), page 656.
[13] Angelika Langer and Klaus Kreft. Standard C++ IOStreams and Locales (Addison-Wesley, 2000), page 587.
[14] What does "deprecated" mean, in theory and in practice? When it comes to standards, "deprecated" denotes a feature that the committee warns you may disappear anytime in the future, possibly as soon as the next revision of the Standard. To deprecate a feature amounts to normative discouragement; it's the strongest thing the committee can do to discourage you from using a feature without actually taking the feature away from you immediately. In practice, it's hard to remove even the worst deprecated features because, once the feature appears in a standard, people write code that depends on the feature and every standards body is loath to break backward compatibility. Even when a feature is removed, implementers often continue to supply it because they, too, are loath to break backward compatibility. Oftentimes, deprecated features never do disappear from the Standard. Standard Fortran, for example, still has features that have been deprecated for decades.
[15] stringstream does offer a constructor that takes a string&, but it simply takes a copy of the string's contents instead of directly using the supplied string as its work area.
[16] In Table 1's performance measurements, strstream shows unexpectedly poorly on two platforms, BC++ 5.5.1 and VC7 beta. The reason appears to be that on those implementations for some reason some allocations are always performed on each call to Example 4's PrettyFormat (although both implementations still actually do perform fewer allocations when given an existing buffer to work with, as is done in Example 4, than when the strstream has to make its own buffer). The other environments, as expected, perform no allocations.
[17] Bjarne Stroustrup. Learning Standard C++ as a New Language, C/C++ Users Journal, May 1999.
Herb Sutter (<www.gotw.ca>) is secretary of the ISO/ANSI C++ standards committee and author of the acclaimed books Exceptional C++ and More Exceptional C++ (available summer 2001). Herb is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).