Wherein the readers rise up, though hardly as one, and smite the messenger severely about the head and shoulders.
Copyright © 1997 Robert H. Schmidt
As I write, it is the week of Christmas, and while the Jolly One may get the proverbial ton [1] of mail, he's got nothing on my virtual mailbag. You-all just seem to love telling me about October's Hungarian Notation column (and its spin off in December). To sate your collective appetite, I present more Hungarian pontification, and dedicate the resulting column in finest PBS tradition to Diligent Readers Like You.
Respecting Privacy Revisited
Perhaps the most insistent mail came as response to a quasi-sidebar I included in December's column, where I recommended that you declare all data private. Brad Vender (whose name I couldn't recall at the time) rejoined that, ala MFC, he declared some data protected. In response, I mentioned almost in afterthought that you could effectively promote protected data to public via an access declaration:
class base { protected: int i; }; class derived : public base { public: base::i; // access declaration }; int main() { base b; b.i = 0; // error, 'i' is protected derived d; d.i = 0; // OK, 'i' is public return 0; }
thereby removing the protection implied by the protected keyword. protected really requires good faith from derived classes. Much as a protected-mode OS like Windows NT has to trust its privileged processes to play nicely, protected members have to trust that privileged (derived) classes won't abuse them. Such members are truly protected only to the extent derived classes choose to honor the protection contract. This is why I wrote that protected is a hint by the base class writer, not a mandate.
What I hadn't expected is that so many of you would dash off to try the above example, only to report back that your compiler doesn't support it. My surprise came twofold. First, I was reminded of the old Marx Brothers script:
Patient: "Doc, it hurts when I do
this."
Doctor: "Then don't do that!"
That is, if someone showed me a horrid example that could lead to my program malfunctioning and my company's stock price plummeting, I don't know if the first thing I'd do is run off and try it out.
Second, because the code is in alignment with the C++ Working Paper, I didn't dream that this example would fail to compile on so many systems. Some background ...
Last October, I gave a talk on C++ namespaces at the Software Development Conference. In that talk I discussed using declarations, which allow select members of a namespace to act as if they were declared elsewhere. Such declarations are also semantic replacements for access declarations, since the C++ Standard considers the latter deprecated [2] . Reflecting this equivalence, the above example becomes
class base { protected: int i; }; class derived : public base { public: // using declaration using base::i; };
Here's where life gets tricky. Because the C++ Standard considers access declarations deprecated, it does not bother to document them fully. Instead, the Standard refers you to the new-and-improved using declarations, and the documentation therein. So, while the Standard does not specifically address my access declaration example, it indirectly addresses it through an example of using declarations [3] :
class A { private: void f(char); public: void f(int); protected: void g(); }; class B : public A { using A::f; // error: A::f(char) // is inaccessible public: using A::g; // B::g is a public // synonym for A::g };
Examples in the Standard are not normative; that is, they are not gospel, and you cannot directly infer rules from them. However, examples are meant to illustrate the Standard's intent. From this and the lack of explicit counter-prose, I deduce protected-to-public using declarations are allowed. Given that, and given the equivalence of using declarations access declarations, I am guessing protected-to-public access declarations are also allowed.
"Allowed" is certainly not the same as "encouraged." I do not know the Committees' intent here. After talking with Dan "be const-correct but don't use const to pass objects by value" Saks, I'm persuaded the Standard is libertarian that not expressly forbidden is implicitly permitted [4] .
Not Compiled Here
Dave Abrahams muses that my example should be wrong because the ARM [5]
says so in section 11.3:
An access declaration may not be used to restrict access to a member that is accessible in the base class, nor may it be used to enable access to a member that is not accessible in the base class.
Before I went spelunking in the Standard, I believed the same as Dave for the same reasons. It was only while researching my SD talk that I discovered the dissonance between the Standard and the ARM. In my view (and writing), when the two disagree, the Standard trumps.
Steven G. Morgan, David L. Moore, Paul E. Russell, and Rich Slabbekoorn wrote that my access declaration example does not compile on their systems. Among them, they tried implementations from Borland, GNU, Sun, Tandem, and Watcom.
Sorry guys, but unless I am completely misreading the Standard, my example should work on ANSI-conforming implementations. However, as my example most definitely will not work on ARM-conforming implementations, I concede these compilers may be correct assuming they intentionally hew to the ARM.
David also shows that you can do an end run by redeclaring the class members with different access in different translation units:
// // translation unit 1 // class base { protected: int i; }; extern void f(base &); int main() { base b; f(b); return 0; } // // translation unit 2 // class base { public: int i; // redefine access }; void f(base &b) { b.i = 0; // OK since 'i' is public }
While such code is not well-formed, I suspect many (most?) translators will not catch it, at least at compile time. David opines that
"This is the price you pay for using a language without modules. Anybody who does this in real code, of course, should be taken out and gradually shot."
Variations on a Theme
As several readers pointed out, you can also add thin wrapper functions to get around the protection:
class base { protected: int i; }; class derived : public base { public: int get_i() const { return i; void set_i(int i) { this -> i = i; } }; int derived::get_i() const { return i; } void derived::set_i(int i) { this->i = i; } int main() { derived d; d.set_i(0); int i = d.get_i(); return 0; }
With a little tweak, this example becomes the streamlined
class base { protected: int i; }; class derived : public base { public: int &i() { return base::i; // must qualify by :: } }; int main() { derived d; d.i() = 0; int i = d.i(); return 0; }
This is the usage syntax from the original access declaration example, with the addition of () after each i reference. I am not necessarily promoting this method I offer it mainly as an alternative to the more common get/set spin pair. In later months, I'll show how this technique can overcome some problems with static data members.
I'd like to bring up a twist here. So far, my examples have shown a derived class increasing the visibility of a base-class member. But what about the other way around, so that a derived class decreases the member's visibility? Consider
class base { public: int i; }; class derived : public base { protected: base::i; };
As it turns out, even this usage is not foolproof! I'll explore the ramifications in more detail when I resume my series on abstraction, but ponder this meantime:
int main() { derived *d = new derived; d->i = 0; // error ((base *) d)->i = 0; // OK (!?) return 0; }
Warts and All
Several of you have sent opinions on and variants of Hungarian notation. While I don't have room to share all of them, I would like to offer you a cross section of what others had to suggest.
Dave Abrahams, whom I cited above, had several other comments. Among them:
"While you'll get fist-pounding agreement from me on nearly all of your arguments against warts (and absolutely all of your arguments against Microsoft-style hungarian), I have to concur with Steven Lee on the use of a tiny wart for member variables."
Dave's thesis is that my suggestion of this-> to prefix member data doesn't work in a member initializer list:
class c { public: c(int); private: int i; }; c::c(int i) : this->i(i) // error { }
Dave's right, of course; this is an angle I hadn't considered. Possible solutions include putting a wart on the member data, such as
int m_i; // Microsoft style
or
int i_; // my typical style
Conversely, you can put the wart on the constructor's parameter:
c::c(int p_i) : i(p_i) // OK { }
Globals
The preceding example brings up an interesting anomaly in C++ naming rules: you can't explicitly qualify (via ::) an object at block scope. Consider the following:
int x; // global scope, called ::x namespace n { // namespace scope, called n::x int x; class c { // class scope, called n::c::x int x; }; } // block scope, no :: name void f(int x) { for (;;) { // 2nd inner block scope int x; // // How do we refer to parameter 'x'? // } }
Both John Love-Jensen and C. Keith Ray use a g or g_ prefix to designate global objects (such as ::x above). I personally don't use this style for a couple of reasons:
- I have few or no global names in my C++ code, preferring instead to put declarations in a named scope (class or namespace). About my only exceptions lately are for library classes (such as MFC) that require certain names to be declared globally.
- Global names are really considered members of the unnamed global namespace, allowing them to be qualified with a null "name" before the ::. Thus, to be consistent, if you give other namespace data members a wart, you should give global names the same wart.
When I modify code that already has global objects, I throw all those declarations into a class or namespace called hack or something equally descriptive. This lets me and other readers know the code is in transition, and that these names will not stay as they are for long.
lpcsz-EIEIO
A couple of readers disagreed with my interpretation of MFC's wart lpcsz. I claimed it meant "CString implemented as a zero-terminated ASCII char array." Both Keith D. Alexander and Lee Dohm wrote that lpcsz actually means "long pointer to a constant zero-terminated ASCII char array." They suggest my confusion comes from CString turning itself into a char const *, which is typedefed in Microsoftese as LPCSZ.
I think Keith and Lee are correct in their read of lpcsz. However, I must tell you, my interpretation was given to me by a couple of fellow programmers on the last contract gig I did at Microsoft [6] .
Other Standards
Moving from Microsoft's standards to IBM's, readers Alex L. Bangs and the prolific Dave Abrahams recommend Taligent's Guide to Designing Programs. By the time you read this, I should have obtained this guide, and hope to have sentient comment later this year.
Dave Walker offered some standards he culled from John Lakos' Large Scale C++ Software Design, which Dave says are used at Mentor Graphics:
- d_ prefix for member data.
- s_ prefix for "static variables" (I don't know if this means static members or internal linkage).
- _p suffix for pointers.
Finally, Alan Stern suggests that what we call Hungarian actually predates Simonyi's C usage "provided we take the wider viewpoint, according to which Hungarian notation simply refers to a scheme for indicating a variable's type as part of its name." He cites Microsoft's original BASIC, which had various non-alphanumeric characters (like a trailing !, $, or %) to designate a variable's primitive type. Further back still, FORTRAN by default reserved identifiers starting with I through N as integers.
Counterpoint
Not everyone agreed with my approach in the October column. Most vociferous was probably Paul W. Baim, who wrote in part:
"I have a deep seated distrust of purely proscriptive articles ('I won't tell you what's good, just what's bad...'). Unfortunately, the variant of Hungarian notation you suggest as the correct variety is in fact Hungarian gone mad. I refer you to Microsoft's own Steve McConnell, who, in Code Complete (page 206), clearly warns against using Hungarian "warts" for builtin data types and suggests that this all-too-common use produces much work and little value. Your suggestion that reading 'warty code' is like 'wading through waist-deep mud' confuses effect for cause. It's probably bad code to begin with. Your readers would have been better served by at least a passing reference to the fact that Hungarian notation does not make up for poor coding practice, and that there are reasoned arguments in favor of it such as McConnell's."
Interesting perspective. In hindsight, I probably was so proscriptive because my overall solution to a Hungarian-laden program could well require a substantial code rewrite and redesign. That is, while it was relatively easy for me to say "don't do this," to have turned around and said "do this instead" would have taken months. In a sense, that's what my abstraction column series is about.
Both Paul and Chris Conover made reference to Steve McConnell's book. I've owned that book for a couple years now, and found it a quite informative (if sometimes long) read. I still haven't fully unpacked from my move, and don't know where my copy is. Once I find it, I'll reread parts to see how it fits in to this discussion.
From Hungarian to German
Hermann Steib from Wien Austria writes that I goofed the spelling of my German allusion last year. Where I had written
char *sturm; int drung; sturm += drung;
I should have written
char *sturm; int drang; // "u" changed to "a" sturm += drang; // ditto
Hermann notes that this spelling makes for correct "German Notation."
For the curious: the name of a late 18th-century German play, "Sturm und Drung" literally, "Storm and Stress" describes romantic literature of high emotion, pitting passionate genius against stolid convention. Goethe and Schiller are representative authors; both influenced Ludwig van Beethoven, whose music (and personality) fairly oozes this style. Not to be confused with today's narcissism posing as tragic passion (e.g., a character on "Friends" having a bad hair day).
Odds and Ends
Alessandro Vesely brought up three points I hadn't considered:
- warts for member
- warts for template typedefs
- the root of the problem in ASCII
On the first point, I admit I've never really thought much about giving a wart to any functions, class members or otherwise. I'll have to mull this over more and start paying attention to my future designs, but I'm guessing I've never found any confusion while mixing calls to member and non-member functions. Further, warts are so unaesthetic, and the use of member functions (relative to use of member data) so widespread, that I don't want to pollute my class interfaces this way.
Ale suggests the common prefix get, as in
class String { public: // ... size_t get_length() const; };
is a form of Hungarian too. That is, the get tells you the function is a class member (and further, gives some hint of how that class member is used).
I have to disagree some here, since some variation of get has been used for aeons in C APIs and libraries; as an example, crack open Microsoft's Windows SDK docs sometime and wade through the torrent of API functions starting with Get.
Hungarian evolved before Ale's point 2, so I don't know if I've ever seen this addressed. He mentions an example such as
struct object { // ... }; typedef std::vector<object> v_object; void f() { object x; v_object v_x; }
This gets a bit subtle, since the notion of what a v_object is/does changes depending on your perspective. You could say it is a vector of objects, with all the properties an STL vector implies. Or, at a more abstract level, you could say it is just an ordered collection of objects, with its underlying vector nature an unimportant artifact. Indeed, from this latter perspective, I'd almost be inclined to write
typedef std::vector<object> objects;
My experience with collection classes is still sufficiently new that I haven't developed a mature style on this yet. Once I get knee-deep into my MAPI example later this year, I'll have cause to define some container classes; by then, perhaps I'll have a better grip on the naming conventions.
On his last point, Ale writes (slightly edited):
"Finally, I wonder if the question isn't rooted in the limitations of the ASCII character set. I've done a whole course in mathematics without ever feeling the need to use more than one letter for a name. Now that we have multibytes, why don't we use Greek letters, tilde, overscore, bold, etc.?"
Classic C was written from an American/ASCII perspective. Standard C and C++ genuflect towards larger target character sets, but essentially keep the ASCII basis for source characters [7] . The C/C++-ish Java supports Unicode escapes (of the form \uXXXX) directly in the source; this is a little friendlier, but these escape sequences still manifest in the code editor as ASCII.
A solution: Standard C and C++ source metacharacters that don't affect the token stream seen by the compiler, but do manifest in an aware editor as colors, fonts, non-English, hyperlinks, and so on. We currently have editors that hack around the current source limitations with syntax-driven coloring and fonts or, as P.J. Plauger wrote in January, by jumping to links in comments. However, these solutions are non-portable and incomplete: they are inherent to the implementation, not to the source itself.
In that same Editor's Forum, Plauger went on to say "I can't imagine writing documentation ... in any form but HTML these days." Someday I'd like to say "I can't imagine writing program source in any form but HTML (or another suitable meta-language)."
Syntax Coloration
Speaking of syntax coloration, in December I mentioned ObjectMaster as a syntax-coloring editor. At the time, I was told it was available on the Mac. Now Alex L. Bangs (one of the Taligent advocates) and Andy Evans say it is available for Windows (they don't say if it's for Win16 or Win32). According to Alex, the vendor of ObjectMaster is ACI, and can be reached as http://www.acius.com.
Caveat emptor: I have not run this program or spelunked this web site. I am only a messenger for what your CUJ brethren and sisteren have told me.
Closing Chords
Unless I receive spectacularly scintillating commentary on this column, this is the last I'll be writing about Hungarian for a while. Next month, I resume my exploration of C and C++ abstraction via the wonderful world of MAPI.
Notes
[1] Or "tonne" for those readers whose money still features the original QE II.
[2] Standard C++ Working Paper (Committee Draft 2), page 178, section 11.3, paragraph 1, footnote 98, hut-hut-hut.
[3] Same Draft, page 119, section 7.3.3, paragraph 16. Memorize these numbers. With all the mail I've received on this, I envision bar bets springing up worldwide.
[4] Or, to paraphrase Article IX of the U.S. Bill of Rights: The powers not delegated to the C++ language by the Standard, nor prohibited by it to the implementors, are reserved to the implementors respectively, or to the users.
[5] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley 1995).
[6] I also think lpcsz may be an abberant abbreviation for Lipizzaner, as in the stallions that would have to drag me to use this wart.
[7] More strictly speaking, the basis is the ISO 646 subset of ASCII, which excludes a handful of ASCII characters.
Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as [email protected].