A New String Class
In creating a new string class, I wanted to be able to:
- Avoid heap allocation and copying completely; just to set an internal pointer to some C string.
- Avoid heap allocation for the nonheap variables; just copy the value to the buffer inside the object.
- Force heap allocation and copying for any string object (as with a
std::string
implementation, minus reference counting/COW).
Since it supported the last two of these three requirements, I used SString
(see Thinking In C++, by Bruce Eckel; Prentice-Hall, 1995, ISBN 0-13-917709-4) as the basis for my new string class. I then implemented the fastest and most important mode, which doesn't do any allocation or copying. It also occurred to me that the string class features should be selectable by programmers who instantiate the string objects, not the string implementation. Consequently, it's up to users of this class (see Listing Three) to instantiate particular string objects in particular modes. The complete implementation of this new CGenStringT<>
class (again, Eckel's SString
was used as a basis) is in util.h (also available electronically in the source code file listed at the top of this article).
Listing Three
template class CGenStringT { public: CGenStringT ( const char *S = "", int bAlloc = 1 ) : m_s(m_buf) { if ( !bufsize ) ... private: char *m_s; int m_bAlloc; char m_buf[bufsize+1]; };
The mode of the string being created in Listing Three is set by the combination of two parameters: template parameter bufsize,
and constructor parameter bAlloc. CGenStringT
is a template with one template parameter, bufsize
, that governs the size of the internal buffer. In addition to specifying the size of the internal buffer, this parameter defines the mode of the string in conjunction with a bAlloc
constructor's parameter. The possible combinations are:
- Mode 1:
bufsize ==
0 &&bAlloc
== 0. No heap allocation, no copying, just setting the pointer (fastest). - Mode 2:
bufsize >
0. No heap allocation, copying into the internal buffer (slower). - Mode 3:
bufsize ==
0 &&bAlloc ==
1. Heap allocation and copying (slowest).
Once you understand the idea, the implementation is straightforward: Constructor, copy constructor, assignment operator, and destructor are doing the right thing for the three different modes (like allocation, copying, deleting), or none of them. For example, the copy constructor in Listing Four is a good example of handling all three modes (only ASCII null-terminated strings are currently supported).
Listing Four
... ... CGenStringT ( const CGenStringT& rv ): m_s(m_buf) { if ( !bufsize ) { if ( m_bAlloc = rv.m_bAlloc ) { // make on heap m_s = new char [ strlen(rv.m_s)+1 ]; strcpy ( m_s, rv.m_s ); } else // just set a pointer m_s = rv.m_s; } else { // make on stack m_buf[bufsize] = 0; strncpy ( m_s, rv.m_s, bufsize ); } #ifdef MEM_ALLOC_DEBUG printf ("CGenString copy constructor called: this=%x allocation=%s" " ptr=%x\n", this, (!bufsize && m_bAlloc ) ? "yes":"no", m_s ); #endif } ...
Right after the CGenStringT
class definition there's a bunch of useful typedef
s and a define
:
typedef CGenStringT<0> CGenString;
typedef CGenStringT<100> CStackString100;
#define FAST_STRING(id,str) CGenString id(str,0);
CGenString
is a Mode 3 string (allocation and copying), CStackString100
is a Mode 2 string (copying) with an internal buffer of 100 bytes, and FAST_STRING defines a Mode 1 string variable ID (no alloc
, no copying) and sets its value to str
.
The "stack" word is being used in comments throughout the CGenStringT
implementation and in CStackString100
to denounce Mode 2 strings (with an internal buffer). In general, this term is incorrect, because the internal buffer is on the stack only when the string variable itself is a stack variable. If this variable is created on the heap (using new()
), then the buffer is on the heap as well. So, in general, the buffer as a part of the string object is always allocated in the same class of memory as the string object that owns it.
Performance Testing
To gauge performance, I start by testing the slowest mode (forced heap allocation and copying) to see if the removal of reference counting helps. Teststring3.cpp (available electronically) tests one thread, while teststring4.cpp (also available electronically) tests 10 threads. They are the same, respectively, as teststring1.cpp and teststring2.cpp, except that std::string
is replaced with CGenString
.
Example 3(a) shows the result of running teststring3, and Example 3(b) the results for teststring4. Compared to std::string
, this isn't bad at all for the slowest versionit's 45 percent faster with one thread (59 versus 108) and it's more or less 100(!) times faster with 10 threads running (~100 versus ~10000). And since the heap is still being hit, it looks like std::string
reference counting hurts big time!
(a) Thread <1024>: Creating/destroying 100000 strings... Thread <1024>: total time=59 millisecs, average=0.000590 (b) Thread <1026>: Creating/destroying 100000 strings... Thread <9226>: Creating/destroying 100000 strings... Thread <9226>: total time=60 millisecs, average=0.000600 Thread <4101>: Creating/destroying 100000 strings... Thread <5126>: Creating/destroying 100000 strings... Thread <5126>: total time=65 millisecs, average=0.000650 Thread <4101>: total time=125 millisecs, average=0.001250 Thread <6151>: Creating/destroying 100000 strings... Thread <7176>: Creating/destroying 100000 strings... Thread <8201>: Creating/destroying 100000 strings... Thread <10251>: Creating/destroying 100000 strings... Thread <6151>: total time=130 millisecs, average=0.001300 Thread <1026>: total time=81 millisecs, average=0.000810 Thread <10251>: total time=94 millisecs, average=0.000940 Thread <7176>: total time=166 millisecs, average=0.001660 Thread <8201>: total time=195 millisecs, average=0.001950 Thread <3076>: Creating/destroying 100000 strings... Thread <3076>: total time=60 millisecs, average=0.000600 Thread <2051>: Creating/destroying 100000 strings... Thread <2051>: total time=60 millisecs, average=0.000600
Example 3: (a) The results of running teststring3.cpp; (b) results of running teststring4.cpp.
Moving to the faster Mode 2 strings (using automatic string variables so a buffer for every string is on the stack: no heap allocations, just copying), teststring5.cpp and teststring6.cpp (both available electronically) are the same tests as beforebut using CStackString100
.
Example 4(a) shows the results of running teststring5, while Example 4(b) shows the results of running teststring6. By eliminating the need to allocate on the heap, you get significantly faster performance: for one thread, 23 versus 59 (Mode 3) versus 108 (std::string
), and for 10 threads, ~40 versus 100 (Mode 3) versus 10,000 (std::string
).
(a) Thread <1024>: Creating/destroying 100000 strings... Thread <1024>: total time=23 millisecs, average=0.000230 (b) Thread <1026>: Creating/destroying 100000 strings... Thread <1026>: total time=23 millisecs, average=0.000230 Thread <3076>: Creating/destroying 100000 strings... Thread <4101>: Creating/destroying 100000 strings... Thread <3076>: total time=35 millisecs, average=0.000350 Thread <5126>: Creating/destroying 100000 strings... Thread <5126>: total time=23 millisecs, average=0.000230 Thread <4101>: total time=71 millisecs, average=0.000710 Thread <6151>: Creating/destroying 100000 strings... Thread <6151>: total time=24 millisecs, average=0.000240 Thread <7176>: Creating/destroying 100000 strings... Thread <8201>: Creating/destroying 100000 strings... Thread <9226>: Creating/destroying 100000 strings... Thread <9226>: total time=23 millisecs, average=0.000230 Thread <10251>: Creating/destroying 100000 strings... Thread <10251>: total time=23 millisecs, average=0.000230 Thread <7176>: total time=80 millisecs, average=0.000800 Thread <8201>: total time=80 millisecs, average=0.000800 Thread <2051>: Creating/destroying 100000 strings... Thread <2051>: total time=23 millisecs, average=0.000230
Example 4: (a) The results of running teststring5.cpp; (b) results of teststring6.cpp.