Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

From Mechanism to Method: A Fair Share, Part 1


October 2002 C++ Experts Forum/From Mechanism to Method


List processing has a long and venerable history stretching back from the STL through a range of object-oriented collection libraries of the past and, ultimately, to Lisp, the original list processing language. The multitude of interfaces across this history almost hides the similarities. In most cases, though, the popular array type is put to shame.

I still have a deep fondness for the Lisp model. It is simple, elegant, and something with which all developers should have an infatuation at least once in their programming life. When I first worked with C, it was the Lisp model I turned to for the design of my linked lists. When I started working with C++, it was the Lisp model that I wrapped up in a class. Slowly the influence of other libraries affected and evolved my approach to writing and using lists. I noticed a creeping featurism that started to populate the interface of mine and others' classes, normally in the name of some nebulous concept of generality or mistaken pursuit of reuse, which seemed to displace the pragmatic reality of use [1].

And then came the STL. This was a breath of fresh air that blew away cobwebs and overweight interfaces. It articulated a minimal but comprehensive and extensible model of containers that included, unsurprisingly, lists. Its orthogonal design allowed a range of expression broader than could be achieved with the broad interfaces of more ambitious classes, all thanks to the combination of a few containers, iterators, algorithms, and function objects. (We will pass politely and quietly over the failed experiment in policy-based design that constitutes allocators.)

But the STL is not the end of history. Last year I was leafing through Koenig and Moo's excellent but often overlooked Ruminations on C++ [2] relocating a particular passage when I stumbled across a section on lists:

    The USL list class has 70 members and one friend function.... The implementation is more than 900 lines of source code.... All this is nice, but it sometimes leaves one wishing for less. If we can have lightweight processes and reduced instruction set computer (RISC) chips, why not a RISC (reduced instruction set container) class? For inspiration, I went back more than a third of a century and looked at Lisp.

Although we're all STL now, this inspiration was an inspiration. It was time to get reinfatuated.

STL with a Lisp

If you have not come across Lisp lists, here is a quick overview:

  • A list is a singly linked sequence of nodes.
  • Lists may share representation with other lists, but, because lists and their elements are immutable, this sharing is transparent.
  • Memory management is automatic.
  • The empty list is known as nil.
  • The head of a list (i.e., its initial element) is returned using the car function.
  • The tail of a list (i.e., all the elements excluding its head) is returned using the cdr function. nil is returned for a single-element list.
  • A new list is constructed using the cons function, which prepends a new element to a list. To create a list of one element, you cons that element with nil. Once you have that, you're under way: you can cons up another list with that first list as the tail, and so on.

There's a bit more to it than this — in Lisp, lists are not just data structures: they are also the program — but that's enough for what we want from them in C++.

A Basic Wrapper

Leaving aside a couple of the features, it is easy to get a feel for Lisp-like programming in C++ by writing some functions around the std::list template:

enum nil_type
{
   nil
};

template<typename type>
std::list<type> cons(
   const type &head, const std::list<type> &tail)
{
   std::list<type> result = tail;
   result.push_front(head);
   return result;
}

template<typename type>
std::list<type> cons(const type &head, nil_type)
{
   return cons(head, std::list<type>());
}

template<typename type>
type car(const std::list<type> &list)
{
   return list.front();
}

template<typename type>
std::list<type> cdr(const std::list<type> &list)
{
   return std::list<type>(++list.begin(), list.end());
}

The dummy nil_type is introduced to distinguish a convenience overload for cons. The difference in convenience is clear when you want to start building a list:

std::list<char> letters = cons('a', std::list<char>());

Versus:

std::list<char> letters = cons('a', nil);

To compose a list of two items, you apply cons again:

std::list<char> letters = cons('a', cons('e', nil));

And repeatedly for more:

std::list<char> letters =
   cons('a', cons('e', cons('i', cons('o', cons('u', nil)))));

Applied to this last example, car(letters) returns 'a' and cdr(letters) returns the list 'e', 'i', 'o', and 'u'.

Adding Some Convenience

You can determine whether a std::list is empty by using its empty member function. However, this cannot be uniformly applied to an empty list that is actually expressed as nil rather than a std::list with no elements. An implementation of the standard Lisp null function provides this uniformity:

template<typename type>
bool null(const std::list<type> &list)
{
   return list.empty();
}

bool null(nil_type)
{
   return true;
}

Similar reasoning applies to the use of the std::list equality operator and the need for a consistent equality function that works with nil:

template<typename type>
bool equal(
   const std::list<type> &lhs, const std::list<type> &rhs)
{
   return lhs == rhs;
}

template<typename type>
bool equal(const std::list<type> &list, nil_type)
{
   return list.empty();
}

template<typename type>
bool equal(nil_type, const std::list<type> &list)
{
   return list.empty();
}

bool equal(nil_type, nil_type)
{
   return true;
}

When translated into C++, the Lisp approach more naturally favors global functions over member functions for uniform access.

Using cons is marginally more convenient than calling push_back or push_front:

std::list<char> letters;
letters.push_back('a');
letters.push_back('e');
letters.push_back('i');
letters.push_back('o');
letters.push_back('u');

Had the STL chosen to support the idiomatic use of chaining, the economy of cons would have been even more marginal:

std::list<char> letters;
letters.push_back('a').push_back('e').push_back('i').
 push_back('o').push_back('u');

Lisp addresses this issue of convenience in a number of ways. The most appropriate to translate into C++ is the list function that effectively allows a list to be consed up like an array initializer. We could emulate this capability by introducing another overload dummy and overloading the comma operator (not a practice that is generally recommended, but certainly one that offers unrivaled opportunities for you to encrypt your code if you think there is a danger of anyone understanding it...). The usage syntax would actually be truer to Lisp than the C++ function-call style used for the functions implemented so far:

std::list<char> letters =
   (list, 'a', 'e', 'i', 'o', 'u');

This is a reasonably discreet use of the comma operator, but it may be more appropriate to resort to something closer to an existing C++ idiom, streaming, to which you can add parentheses to taste, should you wish:

std::list<char> letters = 
   list << 'a' << 'e' << 'i' << 'o' << 'u';

The implementation is straightforward:

enum list_type
{
   list
};

template<typename type>
std::list<type> operator<<(list_type, const type &head)
{
   return cons(head, nil);
}

template<typename type>
std::list<type> operator<<(
   const std::list<type> &list, const type &last)
{
   std::list<type> result = list;
   result.push_back(last);
   return result;
}

Opportunities for Further Generalization

How far can you take this? Well, a quick look at the whole of Common Lisp [3] suggests a very long way indeed — compared to the original Lisp 1.5, the standardized Common Lisp does not epitomize sufficiency in quite the same way.

You can keep on adding convenience functions for list construction and extraction. You could choose to pursue the comma-based syntax for all functions so that your Lispish C++ looked more like Lisp.

If you really feel that you want to recapture Lisp's more carefree approach to typing, you can use a variant type and constrain your wrapper functions to something like std::list<boost::any> [4, 5], or a suitable typedef. Alternatively, if you want to keep the typing strict but more open, you could free the function templates from working with std::list. A little bit of extra templating and some metaprogramming will support this liberalization. As an aside, you will find that many C++ compile-time techniques recall Lisp [6, 7].

Share and Share Alike

In spite of all the possibilities, I want to pursue a different course, away from the wrappers and back to STL. For all the attractions of Lispish, when I am working in C++, I want to work as closely to the model and idioms of that language as I can, importing stylistic idioms from elsewhere only when they clearly offer something better. The same thinking applies with any language — C++ is not Java, Java is not Smalltalk, etc. — which is why idioms are idioms and not universals. It's fun to do as an exercise, but unless you really are getting something that is novel but not just novelty, that's probably about as far as it goes. However, that does not stop me importing design ideas from elsewhere and representing them in a C++ style.

I would like to use the Lisp list model, but riddling my code with Lispisms is more of an affectation than a necessity. The model does not imply the syntax. And, as you may have noticed, all we have done so far is ape the syntax not the underlying model. std::list is not really up to the job, and not just because it's doubly linked rather than singly linked. If you were to use the common — but not yet standard — slist [8], you would encounter the same problems. Certainly, the memory management is automatic, but only as a result of excessive copying and temporary object creation. For anything but the most trivial of applications, cons and cdr might as well be spelt bottle and neck.

A std::list fully contains its internal node structure, and the closest it gets to sharing is the exclusive exchange of its contents through swap and splice member functions. True representation sharing would be inappropriate because it would not necessarily be transparent. If you were to cons one list from another and then use iterators to erase all the elements from the newly built list, what would be the effect on the old list? "Unfortunate and surprising" would be a modest understatement.

Immutability and Sharing

In contrast to the modern financial markets, the shares we want here shouldn't change. We want certainty and satisfaction of the principle of least astonishment, and we don't want performance to crash. I'm going to suggest a design similar to one that I previously outlined for sharing string representation [9]. We need a list type that holds immutable elements and supports only a limited range of modification operations that are guaranteed not to break the transparency of the sharing.

Let's clarify how far this transparency should go before we examine the interface of such a list abstraction. const suggests everything that we want, but does not necessarily guarantee it. Thanks to mutable, const_cast, and the freedom to change anything at a level of indirection without a cast in a const member function, a const-qualified object is not actually guaranteed to leave the physical state of an object immutable. In a multithreaded environment, an unsynchronized or non-atomic change may corrupt the state of an object; no matter how discreet or well intentioned such a change would be in a single-threaded context.

A working definition of immutability is that snapshots of an object obj's full state at one point in time, call it mem_t0, and another, call it mem_t1, should always yield memcmp(mem_t0, mem_t1, sizeof(obj)) == 0, and that this should apply recursively through any level of indirection or any non-member state used to supplement the object's own representation.

In the absence of threads or interrupts, or with synchronized and atomic changes, a less strict rule is to require logical or conceptual const-ness of the elements in the list. This means that for all intents and purposes any physically altered state makes no impression on the externally observed functional behavior of an object. If such an object has a mutable member, it should be used for caching or some other optimization, but otherwise it should have no externally observable effect.

These two levels of requirement are not too draconian: they cover the range of how you should be using const and mutable anyway. Therefore, if we stipulate that the elements of the list are const qualified — conforming to either true immutability or just conceptual const-ness, depending on the context of use — then the only types that will break the requirements of our data structure are the ones that are broken in the first place.

The const requirement as stated means you can freely share an element between two lists, eliminating the cost of copying when creating a new list by prepending an existing one. It can only work with singly linked lists: a change downstream will not affect existing lists that share elements, but a change upstream would. The only way that you can detect the sharing is if you start taking the address of elements in different lists and comparing them. However, that is not a concern: as with other STL containers, the proposed list is value based, and identity is an uninteresting property for values [5].

An Interface

So let's describe the interface to this class template, lispt, parameterized on a value type that will be treated as const so that lispt<type> is effectively the same as lispt<const type>.

Allocators are fortunately not required in order to conform to container requirements and, as they serve little useful role, we can omit them easily. This simplification means that we can concentrate on the mandatory container requirements, the mandatory sequence requirements, and the appropriate optional sequence requirements that would make the type a bona fide STL sequence [10]. Leaving aside all the typedefs required of a container and the superfluous max_size operation, but including a couple of convenience operations, such as assign, gives the following interface:

template<typename type>
class lispt
{
public: // structors

    lispt();
    lispt(const lispt &);
    lispt(const type &, const lispt &);
    template<typename input_iterator>
      lispt(input_iterator, input_iterator);
    explicit lispt(std::size_t, const type & = type());
    ~lispt();

public: // assignment

    lispt &operator=(const lispt &);
    template<typename input_iterator>
      void assign(input_iterator, input_iterator end);
    void assign(std::size_t, const type &);
    void swap(lispt &);

public: // iteration

    class const_iterator;
    typedef const_iterator iterator;
    const_iterator begin() const;
    const_iterator end() const;

public: // access

    const type &front() const;
    void pop_front();
    void push_front(const type &);

public: // capacity

    bool empty() const;
    std::size_t size() const;
    void clear();
    void reserve();

private: // representation
    ...
};

// comparison
template<typename type>
  bool operator==(
    const lispt<type> &, const lispt<type> &);
template<typename type>
  bool operator!=(
    const lispt<type> &, const lispt<type> &);
template<typename type>
  bool operator<(
    const lispt<type> &, const lispt<type> &);
template<typename type>
  bool operator<=(
    const lispt<type> &, const lispt<type> &);
template<typename type>
  bool operator>(
    const lispt<type> &, const lispt<type> &);
template<typename type>
  bool operator>=(
    const lispt<type> &, const lispt<type> &);

In addition to the self-explanatory parts of the interface, there are some things to note:

  • Because of its singly linked nature, a lispt is not a reversible container. Its iterators are forward iterators only, so there is no support for reverse iteration or even a reverse operation.
  • The immutability of each element is assured. The front operation returns a const reference and does not exist in an overloaded non-const form. Because only the capability for const_iterator is defined — like many implementations of std::set, iterator and const_iterator are the same type — only browsing access is provided for iteration.
  • A sleight of hand allows the insert and erase operations to be omitted: they would have to be operations on a const_iterator type, for which insert and erase are not required. Their inclusion would successfully torpedo any transparent shareability.
  • Only the push_front and pop_front dispensing operations are supported. These can be supported in constant time and without violating the shareability of the rest of the list.
  • Operations that work on the list as a whole, such as swap and assignment, are the only other modifiers supported. These can be expressed without violating any sharing constraints because they operate on the aggregating object rather than the individual elements.
  • The reserve operation unshares a copy of the elements for a lispt.
  • The extra constructor, taking a value and a lispt, is effectively cons.

A First Implementation

The implementation is mostly straightforward. Starting with the internal representation:

template<typename type>
class lispt
{
    ...
private:
    struct link
    {
        link(const type &value, link *next = 0)
          : value(value), next(next)
        {
        }
        const type value;
        link *next;
    };
    link *head;
    std::size_t length;
};

lispt uses a fairly conventional link, holding a pointer to the head of a chain of links plus a count of the chain length. Many of the operations are quite simple:

template<typename type>
class lispt
{
    ...
    void swap(lispt &other)
    {
        std::swap(head, other.head);
        std::swap(length, other.length);
    }
    const type &front() const
    {
        return head->value;
    }
    void pop_front()
    {
        --length;
        head = head->next;
    }
    void push_front(const type &new_front)
    {
        head = new link(new_front, head);
        ++length;
    }
    bool empty() const
    {
        return length == 0;
    }
    std::size_t size() const
    {
        return length;
    }
    void clear()
    {
        head = 0;
        length = 0;
    }
    ...
};

Because they are aliasing, operations such as copy construction and assignment can be generated by the compiler. However, there is one quite involved constructor:

template<typename type>
class lispt
{
    ...
    template<typename input_iterator>
    lispt(input_iterator begin, input_iterator end)
      : head(0), length(0)
    {
        if(begin != end)
        {
            head = new link(*begin++);
            ++length;

            for(link *current = head; begin != end;
                ++begin)
            {
                current->next = new link(*begin);
                current = current->next;
                ++length;
            }
        }
    }
    ...
};

Many other operations can be defined in terms of the existing operations:

template<typename type>
class lispt
{
    ...
    lispt(const type &front, const lispt &tail)
      : head(tail.head), length(tail.length)
    {
        push_front(front);
    }
    explicit lispt(
        std::size_t initial_size, 
        const type &fill = type())
      : head(0), length(0)
    {
        while(initial_size-- > 0)
            push_front(fill);
    }
    template<typename input_iterator>
    void assign(
        input_iterator begin, input_iterator end)
    {
        lispt copy(begin, end);
        swap(copy);
    }
    void assign(
        std::size_t new_size, const type &fill)
    {
        lispt copy(new_size, fill);

        swap(copy);
    }
    ...
};

All of the remaining operations depend on iterators, so let's take a look at the nested const_iterator class:

template<typename type>
class lispt
{
    ...
    class const_iterator :
        public std::iterator<
            std::forward_iterator_tag, type, 
            std::ptrdiff_t, const type *, 
            const type &>
    {
    public:
        explicit const_iterator(const link *start = 0)
          : current(start)
        {
        }
        const type &operator*() const
        {
            return current->value;
        }
        const type *operator->() const
        {
            return ¤t->value;
        }
        const_iterator &operator++()
        {
            current = current->next;
            return *this;
        }
        const_iterator operator++(int)
        {
            const_iterator former_self = *this;
            ++*this;
            return former_self;
        }
        bool operator==(
            const const_iterator &rhs) const
        {
            return current == rhs.current;
        }
        bool operator!=(
            const const_iterator &rhs) const
        {
            return current != rhs.current;
        }
    private:
        const link *current;
    };
    ...
};

This lightweight iterator type is used by the following member functions:

template<typename type>
class lispt
{
    ...
    const_iterator begin() const
    {
        return const_iterator(head);
    }
    const_iterator end() const
    {
        return const_iterator();
    }
    void reserve()
    {
        lispt copy(begin(), end());
        swap(copy);
    }
    ...
};

Nothing unusual to note, except perhaps that a default constructed iterator plays the role of the end marker. Unlike doubly linked lists, which require a dummy element at the end to allow reverse iteration, a singly linked list can make do with a simple and traditional null to terminate the list.

The only remaining operations are the non-member relational operators:

template<typename type>
bool operator==(
    const lispt<type> &lhs, const lispt<type> &rhs)
{
    return lhs.begin() == rhs.begin() ||
           (lhs.size() == rhs.size() &&
            std::equal(
                lhs.begin(), lhs.end(), rhs.begin()));
}

template<typename type>
bool operator<(
    const lispt<type> &lhs, const lispt<type> &rhs)
{
    return std::lexicographical_compare(
                lhs.begin(), lhs.end(), 
                rhs.begin(), rhs.end());
}

The equality and less-than operator demonstrate the essential anatomy; the others can be defined in terms of them and also make use of the equality shortcut based on identity.

It is left as an exercise for the reader to go back and rewrite cons, car, cdr, etc. in terms of lispt instead of std::list.

Conclusion

Readers with an eye for running code through the virtual machine in their head will spot a surprising omission from the code so far: memory management. Sure, lots of things are created, but what about tidying up? Where are the delete expressions to counterbalance those new expressions? If alarm bells were not ringing in your head, get a fresh coffee and look back through the code.

I don't normally like to program or demonstrate in terms of broken code without reaching a resolution in the same session. This time I will break with that. Just comfort yourself with the knowledge that (1) I will deal with that next time, (2) there is often more to effective memory management than meets the eye, and (3) the orthogonality of the STL and the minimalism of the Lisp list model make a lovely couple.

References

[1] Kevlin Henney. "The Imperial Clothing Crisis," Overload 47, February 2002, <www.curbralan.com>.

[2] Andrew Koenig and Barbara Moo. Ruminations on C++ (Addison-Wesley, 1996).

[3] Guy L. Steele. Common Lisp: the Language, 2nd edition (Digital Press, 1990), <www-2.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/cltl2.html>.

[4] Boost C++ Libraries, <http://boost.org>.

[5] Kevlin Henney. "Valued Conversions." C++ Report, July-August 2000, <www.curbralan.com>.

[6] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

[7] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative Programming (Addison-Wesley, 2000).

[8] Matthew H. Austern. Generic Programming and the STL (Addison-Wesley, 1999).

[9] Kevlin Henney. "From Mechanism to Method: Distinctly Qualified," C/C++ Users Journal C++ Experts Forum, May 2001, <www.cuj.com/experts/1905/henney.htm>.

[10] International Standard: Programming Language - C++, ISO/IEC 14882:1998(E), 1998.

Kevlin Henney is an independent consultant and trainer specializing in C++, Java, OO design, patterns, and software architecture. He can be reached at [email protected].


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.