Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.




WebReview.com: Cuisinarts, E-Commerce, and ... Controlled Vocabularies

My friend related this interesting e-commerce experience to me recently.

He wanted to buy a food processor. Being the sensible guy that he is, he wanted quality but didn't want to pay an arm and a leg for it. A used Cuisinart would fit the bill quite nicely. So my pal fired up his browser and went on his happy way to eBay, where he proceeded to enter a search for exactly what he thought he was looking for:

"Cusinart."

Now, my friend is a pretty bright guy. He has a PhD and a host of publications to his credit. He's been a successful entrepreneur and academic. Nowadays he's helping the federal government get up to speed on e-commerce. So misspelling Cuisinart is obviously independent of intelligence—something that could happen to anyone.

It gets better. He submits his search and, voila, eBay has a "Cusinart" available. He bids successfully, and the misspelled "Cusinart" shows up on his doorstep a few days later.

You think that's the end of the story. It's not. My friend eventually realizes his mistake and, out of curiosity, goes back to eBay. This time he searches for the correctly spelled "Cuisinart" and finds lots of them listed, as well as lots of bids pending. More bidders mean higher prices. The upshot is that if my friend had spelled the word correctly in the first place, he probably would have spent roughly triple the amount for a "Cuisinart" rather than his bargain-priced "Cusinart."

So what's my point? Well, it's not so much about how to make the most of your eBay experience (besides, there are already whole books on playing eBay to your advantage). Instead, it's just another illustration of how using controlled vocabularies can be the difference between a productive, efficient site and ... well, giving away your old Cuisinart at a third of its worth.

Controlled vocabularies and the guessing game

Whether you realize it or not, you're already familiar with controlled vocabularies. The Library of Congress subject headings and Yahoo's search criteria are a couple of examples. So, as you've probably guessed by now, controlled vocabularies are predetermined sets of terms that fit together to describe a specific domain such as kitchen appliances, nuclear engineering, or dirt biking.

The terms are standardized because language is ambiguous. People use different terms to say the same thing all the time. Or, worse yet, the same terms can mean different things. Sometimes folks just honestly screw up—like my friend did.

By predetermining the terms that make up a controlled vocabulary, and using those terms to describe your site's content, you can minimize the negative effects that variants, synonyms, and various other annoyances can have on your site and its users.

Here's an example. Let's say you're a webmaster at AT&T. Your site describes a huge host of products; one of them is the One Rate plan. There are many pages in your site that deal with One Rate. The problem is that there isn't a standardized spelling for it. So, some pages are about the "One Rate" plan, others describe it as "1 Rate" and on and on. Here are some of the possible references:

  • One Rate Plan
  • OneRate Plan
  • 1 Rate Plan
  • The One Rate Plan

In this case, a patient user might eventually guess the right variant. But, as we all know, patience is a rare commodity on the Web. Without an effective controlled vocabulary strategy, users who enter the wrong term would find nothing. Consequently AT&T would lose a number of new business opportunities. This is where e-commerce degrades into e-guessing.
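One way to take the guessing out of variants is to map each known spelling to a single preferred term. The following is a minimal sketch, not a description of any real AT&T system: the table `VARIANTS` and the functions `normalize` and `preferred_term` are hypothetical names, and the entries simply mirror the list above.

```python
import re
from typing import Optional

# Hypothetical variant table: each key is a normalized variant spelling,
# each value is the single preferred term it stands for.
PREFERRED = "One Rate Plan"
VARIANTS = {
    "one rate plan": PREFERRED,
    "onerate plan": PREFERRED,
    "1 rate plan": PREFERRED,
    "the one rate plan": PREFERRED,
}

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text.strip().lower())

def preferred_term(query: str) -> Optional[str]:
    """Return the preferred term for a query, or None if the query is unknown."""
    return VARIANTS.get(normalize(query))
```

With this table, `preferred_term("  1  Rate plan ")` and `preferred_term("OneRate Plan")` both return "One Rate Plan", while an unlisted term returns `None`, which is a useful signal that the vocabulary may need a new entry.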

The guessing gets a lot more difficult with synonyms. Let's say a user visits a financial services site to find information on "income dividends." Would he or she have guessed that much of the content was listed under the following synonymous terms?

  • Dividend income
  • Income returns
  • Investment income

It's unrealistic to expect users to even bother trying to make guesses. Which, if you've invested any amount of time or money into building your web site, is really bad news.

Users who are browsing a site benefit from controlled vocabularies because they will find the information they need in one place under one heading. It also makes your life as a webmaster easier since you don't have to make up a new label for each piece of information you want to add to your site—just choose from the controlled vocabulary.

An even better approach (though more work) is to consider expanding your controlled vocabulary into a thesaurus that includes variant, related, broader, and narrower terms, as well as glossary definitions. That way, a user browsing your site could, for example, look for "investment income" and be directed to content indexed under the standard term, "income dividends." Searching can be similarly improved—the user's query, "1Rate," would be enriched to automatically include "One Rate" and the variants that have already been mapped to "One Rate," terms that the user hadn't considered.
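The query enrichment just described can be sketched in a few lines. This is an illustration under stated assumptions: the `THESAURUS` structure, its entries, and the `expand_query` function are hypothetical, and a real thesaurus would also carry broader, narrower, and related terms.

```python
# Hypothetical thesaurus: preferred term -> set of variant terms.
THESAURUS = {
    "income dividends": {"dividend income", "income returns", "investment income"},
    "one rate": {"onerate", "1 rate", "1rate"},
}

# Reverse index: any term (preferred or variant) -> its preferred term.
TO_PREFERRED = {}
for preferred, variants in THESAURUS.items():
    TO_PREFERRED[preferred] = preferred
    for variant in variants:
        TO_PREFERRED[variant] = preferred

def expand_query(query):
    """Return the preferred term plus all of its variants for a known query.

    Unknown queries are returned unchanged (after normalization), so the
    search still runs on what the user typed.
    """
    key = " ".join(query.lower().split())
    preferred = TO_PREFERRED.get(key)
    if preferred is None:
        return [key]
    return sorted({preferred} | THESAURUS[preferred])
```

Here `expand_query("1Rate")` returns `['1 rate', '1rate', 'one rate', 'onerate']`, so a search on any one spelling reaches content labeled with the others.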

Resources

If you are going to explore using a controlled vocabulary, here are a few resources. Visit the American Society of Indexers web site, which includes a listing of web-based resources. You might even consider joining ASI, or hiring its members. You might also read Peter Morville's Web Architect column on developing a thesaurus, which he defines as "a controlled vocabulary that leverages synonymous, hierarchical, and associative relationships among terms to help users find the information they need."

Tips

  • Consider using multiple controlled vocabularies to describe the same content. Organize each around a theme like product names, subjects, processes, audience types, and so on. Avoid combining these themes in a single vocabulary: that's mixing apples and oranges, which makes the vocabulary confusing for users and more difficult for you to maintain.
  • Balance the desire to use multiple vocabularies with the overhead involved in applying and maintaining each. Just as the domain changes over time, so should the terms used to describe it.
  • Plagiarize (I mean, borrow from) good vocabularies that are already found in similar web sites, not to mention the examples you'll find linked from the ASI site. You might even find some useful candidates in your organization's printed materials.
  • As described above, automatically enriching a search query behind the scenes can improve retrieval. But that doesn't mean it should be invisible to users, who might not understand what's happening. Always provide at least a basic explanation of how your site's searching system works.
  • Finally, remember that a good controlled vocabulary not only describes a domain's content, but should also reflect the language of users. If you don't have a good feel for the kinds of words your site's users commonly use, then analyze your search engine's query log.
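The last tip, analyzing the query log, can start very simply. The sketch below assumes the log has already been reduced to one raw query string per entry, which real search engine logs rarely are. The function names and the sample data are hypothetical.

```python
from collections import Counter

def count_queries(log_lines):
    """Count queries after lowercasing and collapsing whitespace."""
    return Counter(
        " ".join(line.lower().split()) for line in log_lines if line.strip()
    )

def top_queries(log_lines, n=3):
    """Return the n most frequent queries as (query, count) pairs."""
    return count_queries(log_lines).most_common(n)

def uncovered_queries(log_lines, known_terms):
    """Return counts for queries that match no term in the vocabulary."""
    counts = count_queries(log_lines)
    return {q: c for q, c in counts.items() if q not in known_terms}

# Hypothetical sample log, one query per entry.
sample_log = ["Cuisinart", "cusinart", "food processor",
              "cuisinart ", "Cusinart", "cuisinart"]
```

For the sample above, `top_queries(sample_log, 2)` returns `[('cuisinart', 3), ('cusinart', 2)]`, and `uncovered_queries(sample_log, {"cuisinart", "food processor"})` returns `{'cusinart': 2}`: a misspelling common enough to be worth adding as a variant.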

Junk in, junk out

Look around the Web and you'll find a wasteland of chaotic vocabularies. It's almost amusing: with all the hype about e-commerce these days (somehow I remember commercial web sites existing long before I ever heard that term), webmasters are going nuts for taxonomies to describe their products and services. But you don't hear too much about what terms should be used to actually populate those taxonomies.

Same thing goes for the corporate portal and the Yahoo-ized intranet. Vendors of portal software, XML-based approaches, and other products tout metadata as the solution to the challenges of searching and browsing. Again, their solutions are only halfway there. They provide you with descriptive metadata fields, but you'll still need standardized terms to enter into those fields. Otherwise, it's simply another case of junk in, junk out.
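To make the point concrete, here is a minimal sketch of checking metadata values against a controlled vocabulary before they go into those fields. The field names, the approved terms in `ALLOWED`, and the `invalid_fields` function are all hypothetical and not drawn from any particular portal product.

```python
# Hypothetical controlled vocabularies, one per metadata field.
ALLOWED = {
    "product": {"One Rate Plan"},
    "subject": {"income dividends", "kitchen appliances"},
}

def invalid_fields(record):
    """Return {field: value} for each controlled field with an unapproved value.

    Fields that have no controlled vocabulary are not checked.
    """
    problems = {}
    for field, value in record.items():
        allowed = ALLOWED.get(field)
        if allowed is not None and value not in allowed:
            problems[field] = value
    return problems
```

A record tagged `{"product": "1 Rate Plan", "subject": "income dividends"}` comes back with `{'product': '1 Rate Plan'}` flagged, so the junk is caught on the way in rather than discovered by a frustrated searcher.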

