It's easy to imagine the reaction of seasoned C++ programmers upon seeing yet another string class implementation: "So why do we need another string class? We had too many in the pre-standard C++ Library world, and now that we finally have std::string, why not just use it?"
I agree, but only if the standard implementation doesn't introduce significant performance problems. In this article, I show how the standard string implementation isn't always good enough, particularly for multithreaded applications that create lots of temporary strings. That said, it is worth noting that there's no such thing as a single standard string implementation: they all differ. (For examples of the different implementations, see Item 15 in Scott Meyers's Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library, Addison-Wesley, 2001.) Here, I analyze one of the more popular implementations: the one that comes with g++ 2.96 on Red Hat Linux 7.3. To this end, I create a number of multithreaded tests showing that this implementation can have real performance problems for multithreaded applications that need to create lots of unrelated temporary string objects. To address these performance problems, I present a string class that gives you the speed and efficiency of automatic C strings, plus the convenience of C++ string objects that are STL friendly.
std::string Class Design
All string implementations, including the one in g++ 2.96, usually share the same design principles:
- Memory for the string value is always allocated on the heap. (Scott Meyers does describe one implementation that avoids allocating memory for very short strings.) Think about it: you have a string object on the stack, but there is always a heap allocation for the value of the string, unless the object is initialized through a copy constructor and the implementation is reference counted (which generates other problems).
- String values are reference counted; so if you assign string s1 to s2, their internal pointers point to the same location in memory. This is done to make copy constructors and assignment operators very fast: they don't need to allocate memory and copy the value, just set the pointer and increment a reference counter. The same logic makes the destructor very fast; in most cases, it just decrements the reference counter.
- Closely related to reference counting is the copy-on-write (COW) optimization, which allocates and copies only when one of the string objects pointing to the same value changes.
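To make the second and third items concrete, here is a minimal sketch of a reference-counted, copy-on-write string. It is an illustration of the design principles only, not the actual g++ 2.96 layout; the class name and the setChar accessor are invented for the example. Copying merely shares the buffer and bumps a counter, and a write detaches the value first.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Minimal reference-counted, copy-on-write string sketch.
// Illustrative only; not the real libstdc++ representation.
class RcString {
    struct Rep { int refs; char *data; };
    Rep *rep;
public:
    explicit RcString ( const char *s ) {
        rep = new Rep;
        rep->refs = 1;
        rep->data = strdup ( s );        // the only allocation on construction
    }
    RcString ( const RcString &other ) : rep ( other.rep ) {
        ++rep->refs;                     // copy = share pointer + count; no allocation
    }
    ~RcString () {
        if ( --rep->refs == 0 ) {        // last owner frees the value
            free ( rep->data );
            delete rep;
        }
    }
    void setChar ( int i, char c ) {     // write path: copy-on-write
        if ( rep->refs > 1 ) {           // value is shared, so detach first
            Rep *fresh = new Rep;
            fresh->refs = 1;
            fresh->data = strdup ( rep->data );
            --rep->refs;
            rep = fresh;
        }
        rep->data[i] = c;
    }
    const char *c_str () const { return rep->data; }
    int refCount () const { return rep->refs; }
private:
    RcString &operator= ( const RcString & ); // omitted for brevity
};
```

Note that the increments and decrements of refs above are exactly the operations that need serialization once two threads can touch the same Rep.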
The first item is usually a bad idea when you have lots of temporary strings as automatic variables in a multithreaded application. The heap is shared between threads, so some locking is inevitable; otherwise, the heap's internal data would be corrupted. This approach defeats the idea of fast, nonblocking creation/access of stack variables in multithreaded applications (every thread has its own nonshared stack). It looks deceptive to programmers, too. For instance, in Example 1, mystring is an automatic (stack) variable with no indication that memory for "Some C value" is allocated on the heap; unfortunately, that is exactly what happens.
int myfunc()
{
    std::string mystring ( "Some C value" );
    ...
}
Example 1: This defeats the notion of fast, nonblocking creation/access of stack variables in multithreaded applications.
Likewise, the second and third items may sound good until you realize they're a bad fit for multithreaded applications. The value and the reference counter are shared between multiple string variables, and potentially between multiple threads, which absolutely requires access serialization (locking) around the reference counter. Atomic increment/decrement/check can be used instead if these operations are supported by the CPU's instruction set; see "Optimizations That Aren't (In a Multithreaded World)," by Herb Sutter, C/C++ Users Journal, June 1999.
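The atomic alternative Sutter discusses can be sketched with C++11's std::atomic, a facility that postdates the g++ 2.96 era examined here but captures the idea: the counter is updated with single atomic instructions instead of a mutex. The struct and method names below are invented for the example.

```cpp
#include <atomic>
#include <cassert>

// Sketch of a lock-free reference counter, assuming the CPU
// supports atomic read-modify-write (std::atomic postdates
// the g++ 2.96 era discussed in the article).
struct AtomicRefCount {
    std::atomic<int> refs;
    AtomicRefCount () : refs ( 1 ) {}
    void acquire () {
        refs.fetch_add ( 1, std::memory_order_relaxed ); // no lock needed
    }
    // Returns true when the caller was the last owner and must free.
    bool release () {
        return refs.fetch_sub ( 1, std::memory_order_acq_rel ) == 1;
    }
};
```

Even lock free, the counter is still a shared cache line that every copying thread must write, so contention costs remain; the atomic version removes the mutex, not the sharing.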
Listing One (teststring1.cpp) creates 100,000 temporary std::string objects on the stack (most of the memory is actually allocated on the heap). It has only one thread running, but it was built as a multithreaded application, so it does all the required locking (refer to the Makefile, available at the top of this article). I ran this and subsequent tests on a 2-GHz Dell desktop with 1 GB of RAM under Red Hat Linux 7.3. Example 2(a) shows the results.
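The listings time their work with a utilGetTickCount helper from util.h, which is not reproduced in the article. A plausible implementation returning elapsed milliseconds, assuming POSIX gettimeofday, might look like this:

```cpp
#include <sys/time.h>
#include <cassert>

// Assumed implementation of the timing helper used by the listings;
// the real util.h is not shown in the article.  Returns a millisecond
// tick count suitable for computing elapsed intervals.
unsigned int utilGetTickCount ()
{
    struct timeval tv;
    gettimeofday ( &tv, 0 );
    return (unsigned int)( tv.tv_sec * 1000 + tv.tv_usec / 1000 );
}
```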
Listing One
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <pthread.h>
#include "util.h"

unsigned int createStrings ( int n )
{
    static char *samples[] = { "something1","something2","something3","something4" };
    printf ( "Thread <%d>: Creating/destroying %d strings...\n", pthread_self(), n );
    unsigned int start = utilGetTickCount ();
    for ( int i=0; i<n; i++ )
        std::string str ( samples[i%4] );
    unsigned int end = utilGetTickCount ();
    printf ( "Thread <%d>: total time=%d millisecs, average=%f\n",
             pthread_self(), end-start, (float)(end-start)/n );
    return ( end - start );
}

int main ( int argc, char **argv )
{
    if ( argc > 2 )
    {
        printf ( "teststring1 [<number of strings>]\n" );
        return 0;
    }
    int n = 100000;
    if ( argc > 1 )
        n = atoi ( argv[1] );
    createStrings ( n );
    exit(0);
}
(a)
Thread <1024>: Creating/destroying 100000 strings...
Thread <1024>: total time=108 millisecs, average=0.001080

(b)
Thread <1026>: Creating/destroying 100000 strings...
Thread <2051>: Creating/destroying 100000 strings...
Thread <3076>: Creating/destroying 100000 strings...
Thread <4101>: Creating/destroying 100000 strings...
Thread <5126>: Creating/destroying 100000 strings...
Thread <6151>: Creating/destroying 100000 strings...
Thread <7176>: Creating/destroying 100000 strings...
Thread <8201>: Creating/destroying 100000 strings...
Thread <9226>: Creating/destroying 100000 strings...
Thread <10251>: Creating/destroying 100000 strings...
Thread <1026>: total time=7804 millisecs, average=0.078040
Thread <2051>: total time=8749 millisecs, average=0.087490
Thread <10251>: total time=10455 millisecs, average=0.104550
Thread <6151>: total time=10643 millisecs, average=0.106430
Thread <8201>: total time=10718 millisecs, average=0.107180
Thread <9226>: total time=10724 millisecs, average=0.107240
Thread <7176>: total time=10755 millisecs, average=0.107550
Thread <5126>: total time=10760 millisecs, average=0.107600
Thread <4101>: total time=10765 millisecs, average=0.107650
Thread <3076>: total time=10867 millisecs, average=0.108670
Example 2: (a) The results of running teststring1.cpp; (b) results of running teststring2.cpp.
Listing Two (teststring2.cpp) is the same test, but with 10 threads running, each creating/destroying 100,000 automatic string objects. (Again, every one of them allocates memory on the heap and involves some locking and copying.) Example 2(b) presents the less-than-encouraging results of the teststring2.cpp test. With just 10 threads, the average time to handle (create/destroy) an automatic string object increased approximately 100 times (roughly 10,000 versus 108 milliseconds total per thread). Since these results seem unacceptable for high-load multithreaded applications, I decided to find out where the time is being spent, then create a new string class that doesn't have those deficiencies. Possible performance culprits include the locking introduced by the reference-counting optimization, heap allocations (and their locking), and copying of the value.
Listing Two
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <pthread.h>
#include "util.h"

void *createStrings ( void *n_void )
{
    int n = (int) n_void;
    static char *samples[] = { "something1","something2","something3","something4" };
    printf ( "Thread <%d>: Creating/destroying %d strings...\n", pthread_self(), n );
    unsigned int start = utilGetTickCount ();
    for ( int i=0; i<n; i++ )
        std::string str ( samples[i%4] );
    unsigned int end = utilGetTickCount ();
    printf ( "Thread <%d>: total time=%d millisecs, average=%f\n",
             pthread_self(), end-start, (float)(end-start)/n );
    return NULL;
}

int main ( int argc, char **argv )
{
    if ( argc > 3 )
    {
        printf ( "teststring2 [<number of strings>] [<number of threads>]\n" );
        return 0;
    }
    int n = 100000;
    int n_thr = 10;
    if ( argc > 1 )
        n = atoi ( argv[1] );
    if ( argc > 2 )
        n_thr = atoi ( argv[2] );
    pthread_attr_t attr;
    pthread_attr_init ( &attr );
    pthread_t *tid = new pthread_t[n_thr];
    int i;
    // create threads
    for ( i=0; i<n_thr; i++ )
    {
        if ( pthread_create ( &tid[i], &attr, createStrings, (void *)n ) )
        {
            printf ( "Failed to create thread <%d>\n", i );
            exit(-1);
        }
    }
    // wait for all threads to terminate
    for ( i=0; i<n_thr; i++ )
        pthread_join ( tid[i], NULL );
    delete [] tid;
    exit(0);
}
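As a preview of the direction the rest of the article takes, a string that keeps its value in a fixed-size buffer inside the object avoids both the heap and the shared reference counter entirely: an automatic instance lives wholly on the thread's own stack. Here is a minimal sketch under that assumption; the class name, capacity parameter, and truncation policy are illustrative choices, and the class presented later in the article is more complete and STL friendly.

```cpp
#include <cassert>
#include <cstring>

// Sketch of a fixed-capacity string whose value lives inside the
// object itself, so an automatic instance never touches the heap
// and needs no locking.  CAP and the truncation policy are
// illustrative, not the article's actual class.
template <int CAP>
class StackString {
    char buf[CAP + 1];
    int len;
public:
    StackString ( const char *s ) {
        len = (int) strlen ( s );
        if ( len > CAP )
            len = CAP;            // truncate rather than allocate
        memcpy ( buf, s, len );
        buf[len] = '\0';
    }
    const char *c_str () const { return buf; }
    int length () const { return len; }
};
```

The trade-off is a hard capacity limit (or a heap fallback for long values), which is exactly the kind of design decision the rest of the article works through.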