Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Smart Pointers Reloaded (IV): Finale


April 04: Smart Pointers Reloaded (IV): Finale Comparing the costs of design trade-offs

It's funny how the language we're using influences our way of thinking and, in particular, the kind of things we worry about. Kudos to Sapir and Whorf [3]. Fortunately, I'm past my "efficient C" days when I would reuse a stack variable if I could, in spite of the predictable maintenance nightmare; but I was young and sharpeyed. Still, I must admit that I do worry about an extra copy of an std::string — but I'm the same guy who would coldbloodedly allocate a slew of objects on the heap and copy them all over the place when programming in Java, JavaScript, or some shell scripting language. And if the JavaScript is to be downloaded — hey, that one extra character would cost (absent compression and at 10 Mbps) one millisecond to transport, which is a coffee break in processorspeak.

Such is human nature. "Microefficiency" remains an unconfessed, furtive mental preoccupation for many of us C++ programmers, in spite of the increasing evidence that, really, most apps are I/O bound, your design dictates performance much more than minute optimizations, and when you do have to micro-optimize, you only need to do so on a small (and hard to guess) portion of your code.

I'm mentioning that because there are still many people who are apprehensive of replacing the bug-radiating bald pointers with smart pointers, because "they are not efficient." To help this situation, Dave and I present some measurements that compare the costs of various design trade-offs when using smart pointers. The best testbed for such tests is nothing other than smart_ptr, with its easily mixable and matchable design choices pertaining to construction, ownership, and checking — definitely the most important aspects of a smart pointer's performance.

Mailcontainer

Following our last column on construction tracking [2], we received a number of e-mails with great observations. Herb Sutter mentions:

After the first code example, you correctly write: "Inside the catch block, our code can't tell which of b1_ and b2_ failed to initialize!" That's correct, but it's not the most fundamental point: Inside the catch block, neither b1_ nor b2_ exists (if b1_ was constructed, it has already been destroyed) and you can't even so much as mention the names of b1_ or b2_ because their names are not in scope.

(Many thanks are due to Norbert Riedlin, who sent in a comment on the same issue.) Let's pull that code again:

class A {
   int tracker_;
   B b1_;
   B b2_;
public:
   A()
   try
   : tracker_(0)
   , b1_((tracker_ = 1, "hello" ))
   , b2_((tracker_ = 2, "world" ))
   {
      assert(tracker_ == 2);
         ... constructor body ...
   }
   catch (...) {
      if (tracker_ == 0) { //can't
         ... none initalized ...
      } else {
         ... only b1_ initialized ...
      }
   }
};

Dave, myself, GCC, como, and Microsoft Visual C .NET 7.1 disagree with the part that "their names are not in scope." Unfortunately, that's true but uninteresting because, while the name tracker_ is lexically in scope, it has untouchable status as per 15.3 paragraph 10 of the C++ Standard, which mentions:

Referring to any nonstatic member or base class of an object in the handler for a function-try-block of a constructor or destructor for that object results in undefined behavior.

It's the worst of all worlds: You can't touch any member, yet three of today's compilers give you no compile-time diagnostic. Fortunately, our past article abandoned that bad path and took the approach of using a constructor parameter, which has a guardian angel in the form of section 15.3 paragraph 12, which proclaims:

The scope and lifetime of the parameters of a function or constructor extend into the handlers of a function-try-block.

But the philosophical point Herb raises remains valid, because he continues:

But, returning to #1, all of this is pointless, isn't it? I can't imagine what useful recovery code you could put into the catch block. [...] There's nothing you can usefully do with that information other than log the error or rethrow a different type of exception instead, whereas the point of your article was to perform some cleanup action, presumably based on b1_ and b2_ and not some global state (which wouldn't be reentrant, etc.).

There could be situations when you can do cleanup just by using the arguments passed to the constructors, but in our case (with the storage and ownership), the approach is not applicable. So what can I say. Touché. Good thing smart_ptr didn't choose this approach.

A Little Interesting Optimization

To practice terminology a little, remember that "external reference counting" is the reference-counting technique that keeps the counter separate from the pointed-to object, and "intrusive reference counting" means that the counter is embedded within the pointed-to object. The latter is faster and consumes less memory because there's no extra allocation, and the disadvantage is that it is, well, intrusive.

The speed difference between intrusive and external reference counting is quite sensible (as our measurements show), so we'd like to avoid that extra allocation if at all possible.

There's a little optimization that could improve things a tad. The idea is to defer allocating the external reference count until the first copy. The usual way of doing things is:

  • When first grabbing the resource, the pointer to the reference count (call it pCount_) is allocated from the heap and *pCount_ is initialized to 1.
  • Each copy just increments *pCount_.

  • The destructor decrements *pCount_. When *pCount_ transitions to 0, pCount_ is deallocated.

The alternative approach is:

  • When first grabbing the resource, assign pCount_ = NULL.
  • First copy allocates pCount_ and initializes *pCount_ to 2. Other copies just increment it.

  • In the destructor, the test of unique ownership is pCount_ == NULL or — *pCount_ == 0.

This is not a clever optimization, and we chose to not make use of it in our tests because it would have introduced false good results. The truth is that, more often than not, smart pointers will be copied, so optimizing the one hopeful case when a pointer stays unique just doesn't bring much benefit. Worse, each copy and each destructor must perform one extra test. But this little idea opens the door to a second one.

Two for the Price of One

We read so many books and articles about new and improved reference counting mechanisms back when the Nasdaq was 5000, that we readily admit that the following is hardly original. However, Google refuses to reveal the first author of the technique we present, and at least we might be the first to apply it within the policy-based smart_ptr.

A more beneficial optimization would avoid allocating a counter for up to two smart pointers instead of one, which is a case plausibly more frequent. For example, creating a smart pointer and sticking it into an std::vector requires two copies of the smart pointer for a short time.

It is possible to lay things out portably so that one and two pointers don't require any extra dynamic memory allocation. To explain how, take a look at the following simple smart-pointer layout. (We strip out templates and policies and all, focusing solely on fields and state for now.)

class smart_ptr {
   union {
      mutable smart_ptr* pOther_;
      mutable unsigned int count_;
   };
   T* pointee_;
   ... functions ...
};

On a 32- or 64-bit desktop system, this structure consists of two pointers. (Integers are seldom larger than pointers on today's systems. We don't rely on any relationship between the sizes of the two types, but the scheme works best when they are of the same size.) Now imagine the following reference counting strategy:

  • When a smart_ptr first acquires a T*, it initializes pOther_ to NULL (Figure 1).
  • When a smart_ptr is copy-constructed from another smart pointer that's the unique owner of the resource, the copy constructor makes them point at one another (Figure 2). So now what you really have is two smart pointers that both point to the same T object, and the pOther_ fields point to one another. Interesting.

  • When the third copy comes along, the strategy is changed entirely: The copy constructor allocates a new smart_ptr on the heap, assigns NULL to its pointee_, and uses its count_ field as the reference counter. All three smart_ptrs point to that newly allocated smart_ptr. (Interestingly, the same structure is used as the reference count and as the smart pointer itself.) From here on, it's reference-counted business as usual.

smart_ptr's copy constructor detects what state its argument is in and takes action appropriately. Similarly, during destruction, the smart pointer again figures out the state to properly unlink things. We now have a fat counter (two words in size) because the pointee_ field of the counter is "slack." So we reached a design with the following trade-offs:

  • The smart pointer size is about the same (two words) as in straight external reference counting. This is good because, presumably, there are more smart pointers than tracked objects.
  • The shared reference counter is about twice as large compared to the usual implementation. There are two factors that mitigate this waste: First, with many memory managers, an allocation request for 4 bytes might get you 8 or more bytes anyway; second, the number of reference counts is the same as the number of tracked objects, which, as said, is presumably smaller than the number of smart pointers. So the overhead is "in the right place."

  • Now, if you have knowledge about pointer layout, you can eliminate that overhead at the expense of portability. For example, with i386 systems, addresses are (as in this case) always a multiple of four; it follows that two bits are not used, a fact that can be exploited to distinguish between a pointer to smart_ptr versus a pointer to the shared counter. Many other interesting setups are conceivable.

  • Of course, the main advantage of the whole scheme is that it doesn't allocate a dynamic counter until the third copy comes along. You could conceive schemes that admit 3-cliques instead of the 2-cliques in Figure 2 (see Figure 3). We believe that the increased computational overhead would offset the added advantage; we haven't tried this idea.

  • The communication between smart pointers is rather intricate, so in a multithreaded system, we expect this approach to need locks and therefore perform poorly.

Measuring Smart Pointer Speed

When trying to make timing tests, it's easy to make a fool of yourself — good, relevant numbers are really hard to obtain and good simulations are hard to put together. But we bit the bullet and made some measurements that bring a few insights into the relative costs of various decisions related to smart pointer design.

Memory hierarchies (various levels of caches), compiler optimizations, speculative execution, superscalar processors — all these make it hard to devise proper simulations and speed tests. But we sure can try.

The first thing we did before measuring anything was to fragment the memory. You see, when starting with a clean slate, most (if not all) memory allocators initially perform well. It's only later when you can really see allocation costs either in the form of searching (for allocators that try to fill the holes produced by fragmentation) or paging (for allocators that avoid searching and prefer expansion).

So first we wrote a simple function that allocates many chunks of random sizes and then frees about half of them. That function runs before any actual tests are made.

Second, we wrote a baseline for our tests, which represents the overhead of looping, manipulating timers, calling a function, and so on. In essence, the baseline consists of all the additional costs that come with the work that we want to measure. Here, we had to fight the optimizer, which was all too eager to optimize out of existence loops that do nothing and functions that just return. A good solution for today's compilers is to just place the functions in different modules, in which case, the compiler (GCC in our case) doesn't have the opportunity to inline.

Third, we created a simple class that is supposed to represent a real-world class and, of course, there's no class more representative than the mythical Employee with the oh-so predictable fields name, social security number, and all those good things.

We timed the most important policy combinations that affect speed (see Table 1). The ownership policies we measured were reference counting, optimized reference counting (the technique just shown), and reference linking [1]. Also, we timed the cost of nullness checking, which smart_ptr supports via the checking policy. We compare all of these tests with the corresponding code written with bare pointers. This way, we try to not only give an image of the relative costs of various design trade-offs, but also an estimate of the true cost of the comfort that smart pointers bring.

The good news is, the costs are not prohibitive, so hopefully this might move a few skeptics' hearts. First off, the overhead incurred by initialization is very low. This is because Employee has a couple of std::string fields that already need to allocate memory, as is often the case in the real world; testing smart pointer initialization with ridiculously small and/or simple pointee classes would have yielded unrealistically pessimistic results. Notice how improved reference counting made the cost of creation practically immeasurable.

The second column shows the overhead of a "roundtrip" copy — that is, the construction and destruction of a copy of a smart_ptr. This is exactly what happens when, say, you pass a smart pointer as an argument to a function. Here the "improved" reference counting loses because of the complex logic in its copy constructor. By and large, the rule of thumb is that copying a smart pointer costs about 4.5 as much as copying a raw pointer.

Checking pessimizes dereference by about 30-40 percent, which should be more than acceptable for a large range of applications. Plus, with smart_ptr, you can always replace the checking policy with an unsafe one in a few select cases.

The unlikely star of these tribulations is the reference-linked policy. In that design, each smart pointer keeps prev_ and next_ pointers to complete a doubly linked list. So the smart pointer is bigger, but there's no dynamic allocation and (due to a few tricks) no tests during the construction and destruction. A good lesson Dave and I learned was that linear execution is significantly faster than even the simplest tests and branches. So for smart pointers, it looks like simple policies are good. As Dave peremptorily put it: "ref_linked may be fat, but it's stupid, and that's why it's fast." Now try to explain to your spouse how that sentence actually makes sense in computer engineering.

Conclusion

We hope you enjoyed the smart pointers tetralogy, and that this article could be one more argument to your unwisely efficiency-preoccupied manager to replace bald pointers with smart pointers. This concludes our treatment of smart pointers. Dave and I are still excited and full of ideas on improving reference-counting policies, so we'll let you know if we have interesting results. Until then, fear not: There are way too many exciting things to do and share. It's now time to rest our smart pontes [4] a little and close the subject for now. May you be referenced for a long, long time.

Acknowledgments

Thanks to Herb Sutter and Norbert Riedlin for their feedback. Also, thanks to the Boost community for helping out on smart_ptr's design, and to Jon Turkanis who got smart_ptr working on Intel 7.1, 8.0, Metrowerks 8.0, Como 4.3.3 (VC 7.1 backend), GCC 3.2, and bcc 5.5.1 and 5.6.4. That's nine compilers!

References

[1] Andrei Alexandrescu. Modern C++ Design: Generic Programming and Design Patterns Applied, Addison-Wesley Longman, 2001. ISBN 0-201-70431-5.

[2] Andrei Alexandrescu and David B. Held. "Generic<Programming>, Smart Pointers Reloaded (III): Construction Tracking." C/C++ Users Journal, February 2004.

[3] Sapir, Edward. The Status of Linguistics as a Science in Language, 1929.

[4] Not a typo. The pontes are nerve fibers in the brain.


Andrei is a graduate student in Computer Science at the University of Washington and author of Modern C++ Design. David is a consultant specializing in custom software development. They can be contacted at [email protected] and [email protected], respectively.



Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.