Going Global (Web Techniques, Sep 2000)



Going Global

Hungry for New Markets

By Howard Schwartz

By now the thought has probably crossed your mind. Maybe your Web site has posted double- and triple-digit growth in hits every month. Maybe you're taking orders like there's no tomorrow. And maybe you're serving so many free music downloads that your servers are straining. But you've probably realized that this growth won't continue unless your Web site can speak to visitors who don't speak English.

If globalization hasn't crossed your mind, it will. Today more than 50 percent of Web users are from outside the U.S., and they have money to spend. IDC predicts that non-U.S. Internet commerce will explode from 26 percent of worldwide spending to 46 percent by 2003. Shortly thereafter, non-U.S. spending will overtake U.S. spending, meaning that if your site is available only in English, you'll effectively be ignoring more than half of the market. Of course, many people living outside the U.S. speak English, but countries in which English is the native language represent only 8 percent of the world's population. Forrester found that visitors spend twice as long on a site, and are three times more likely to buy from it, when its information is in their native language. That finding is relevant to the U.S. market as well, where companies are increasingly seeking new markets.

There are many details to consider, however, when you decide to make your Web site, e-business, intranet, or extranet multilingual.

More Than Translation

Creating a Web presence that addresses the needs of users from around the world is complex, but not impossible. It might help to start with a few definitions; the first is globalization. Globalization is the broad term that encompasses everything you need to do to make your e-business, including Web sites, capable of meeting the needs of users in different countries. You'll consider different languages, currencies, logistics, and even specialized support issues. Attention to these elements and others will make the experience seem as if it were designed with each user in mind.

Internationalization (commonly shortened to I18N because of the 18 letters between the I and the N) and localization (similarly, L10N) both fall under globalization. Internationalization refers to the process of reengineering software so it can recognize and process any language. It also involves making changes in the code so the software can understand other potential differences, such as multiple currencies and local date formats. Once a product has been internationalized, it's easy to translate the interface into the target language, with minimal reengineering.
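To make "multiple currencies" concrete, here is a minimal Python sketch of locale-dependent amount formatting. The locale table, its separator and symbol choices, and the `format_amount` helper are illustrative assumptions rather than a standard API; production code would normally draw on a maintained locale database (such as the Unicode CLDR data used by ICU) instead of a hand-written table.

```python
# Hypothetical per-locale conventions:
# (group separator, decimal separator, currency symbol, symbol before number?, decimals)
LOCALE_NUMBER_FORMATS = {
    "en_US": (",", ".", "$", True, 2),
    "de_DE": (".", ",", "€", False, 2),
    "fr_FR": (" ", ",", "€", False, 2),  # French groups with a (narrow) space; plain space here
    "ja_JP": (",", ".", "¥", True, 0),   # yen amounts are written without minor units
}

def format_amount(value: float, locale: str) -> str:
    """Format a currency amount using the (assumed) conventions of `locale`."""
    group, decimal, symbol, prefix, decimals = LOCALE_NUMBER_FORMATS[locale]
    # Format with neutral separators first, e.g. "1,234,567.89".
    neutral = f"{value:,.{decimals}f}"
    # Swap in the locale's separators in one pass so "," and "." cannot collide.
    localized = neutral.translate(str.maketrans({",": group, ".": decimal}))
    return f"{symbol}{localized}" if prefix else f"{localized} {symbol}"
```

The same value 1234567.891 comes out as `$1,234,567.89`, `1.234.567,89 €`, or `¥1,234,568` depending only on the locale argument, which is the point: the code never hard-codes a separator or symbol.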

Localization refers to the process of translating and culturally adapting software user interfaces, Web pages, help files, documentation, and other content for one specific language or locale. The concept of a locale is important—Canadian French is distinct from the type of French spoken in Paris, just as Latin American Spanish differs from the language spoken in Madrid. From a software perspective, localization may involve some minimal reengineering, such as supporting local tax rules. In terms of Web content, types of images, colors (in some cultures the cool black background found on so many sites has a sinister connotation), legal claims, and fonts are all fair game for localization.

Internationalization

Before embarking on a globalization project, identify all the back-end systems that interact with your Web site. It will do you little good if your site takes orders in Chinese but the database you're using for fulfillment can't read the characters. This means evaluating all of the application servers, databases, Java applications, content management systems, and tools. Make sure that they're all internationalized and capable of handling different language character sets, currencies, and other potential problems.

For instance, ASCII is a 7-bit code that represents the Latin alphabet used for languages like English. Because ASCII and similar encoding schemes use only 7 or 8 bits, they can represent at most 128 or 256 characters. Unicode was originally designed as a 16-bit encoding, enough for 65,536 characters, which is sufficient for languages with large character sets such as Japanese and Korean. (Its code space has since been extended to more than a million code points, so in UTF-16 some characters take two 16-bit units.) International software must be able to handle 16-bit and other multibyte encodings, as well as other conventions for international computing.
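The difference is easy to see by encoding the same text several ways. This short Python sketch (the sample strings are arbitrary) shows that ASCII cannot represent Japanese at all, that UTF-8 and UTF-16 can, and that a character outside the original 16-bit range needs two 16-bit units in UTF-16.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in several encodings (None if unencodable)."""
    sizes = {}
    for encoding in ("ascii", "utf-8", "utf-16-le"):  # "-le" avoids a byte-order mark
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            # ASCII defines only 128 characters, so non-Latin text fails here.
            sizes[encoding] = None
    return sizes
```

For "Hello" every encoding works (5, 5, and 10 bytes). For "日本語" ASCII fails, UTF-8 needs 9 bytes, and UTF-16 needs 6. For the emoji U+1F600, which lies beyond 65,535, UTF-16 needs 4 bytes: a surrogate pair.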

Internationalizing code means ensuring that it's free of hard-coded, locale-specific references. Example 1 is a simple piece of code that needs to be internationalized. The English string "Hello World" is hard-coded into the software. Instead, this string should be externalized in a separate file and then translated. The software must be able to recognize the encoding of the newly translated string and know how to process that language. Note that the date format is hard-coded as well. The month, day, year order is a U.S. convention; many other locales write day, month, year or year, month, day.
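Example 1 itself is not reproduced here, but the idea can be sketched in Python. Instead of embedding "Hello World" and a month/day/year pattern in the code, the program looks both up in per-locale resources. The `MESSAGES` and `DATE_PATTERNS` tables are hypothetical stand-ins for external resource files, and the German and Japanese strings are illustrative translations.

```python
from datetime import date

# Stand-ins for externalized resource files, one entry per locale.
MESSAGES = {
    "en_US": {"greeting": "Hello World"},
    "de_DE": {"greeting": "Hallo Welt"},
    "ja_JP": {"greeting": "こんにちは世界"},
}

# Field order differs by locale: month/day/year, day.month.year, year/month/day.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",
    "de_DE": "%d.%m.%Y",
    "ja_JP": "%Y/%m/%d",
}

def message(key: str, locale: str) -> str:
    """Look up a user-visible string instead of hard-coding it."""
    return MESSAGES[locale][key]

def format_date(d: date, locale: str) -> str:
    """Format a date in the locale's conventional field order (numeric fields only)."""
    return d.strftime(DATE_PATTERNS[locale])
```

With this structure, supporting a new locale means adding resource entries, not editing code: September 1, 2000 prints as 09/01/2000, 01.09.2000, or 2000/09/01.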

Example 2 shows the code rewritten to work in Japan, using a national language support (NLS) library. The new code is customized to the locale: the language, territory, code page in use, and sorting method, each of which can vary by location. Internationalized software lets the user specify a locale at installation. On the Web, the site can detect the user's language from browser preferences or from a previously placed cookie, or it can ask the user to select a language. Once the user has designated the proper locale, the software should run correctly with the appropriate language, data, currencies, and collation.
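Browser preferences arrive in the HTTP `Accept-Language` header, a comma-separated list of language tags with optional `q` weights (for example, `fr-CA,fr;q=0.9,en;q=0.8`). The Python sketch below picks a supported language, giving a previously set cookie priority. The function name, the `SUPPORTED` list, and the cookie-first policy are assumptions for illustration. It ignores wildcards and other details of the full header grammar, and it maps regional tags such as `fr-CA` to their base language, which a site with a separate Canadian French version would not want.

```python
from typing import Optional

SUPPORTED = ["en", "fr", "de", "ja"]  # hypothetical languages the site offers
DEFAULT = "en"

def choose_language(accept_language: str, cookie_lang: Optional[str] = None) -> str:
    """Pick a site language from a cookie or the Accept-Language header."""
    # An explicit earlier choice stored in a cookie wins over browser settings.
    if cookie_lang in SUPPORTED:
        return cookie_lang

    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # weight defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort key: highest weight first, then order of appearance.
        weighted.append((-q, position, tag))

    for neg_q, _, tag in sorted(weighted):
        if neg_q >= 0:  # q=0 means "not acceptable"; everything after is q<=0 too
            break
        primary = tag.split("-")[0]  # "fr-ca" -> "fr"
        if primary in SUPPORTED:
            return primary
    return DEFAULT
```

A visitor sending `es-MX,es;q=0.9,ja;q=0.5,en;q=0.3` gets Japanese, the highest-weighted language the site supports, unless a cookie records a different choice.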

Now, you're certainly not responsible for internationalizing software from third-party vendors. But today nearly every company is a software company by default, having written custom code to tie systems together or developed homegrown back-end systems. Because of the complexity of this problem and the lack of in-house experience and resources for internationalizing Web applications, many companies turn to an outside specialist. A word of advice: be selective. Only a handful of companies have experience internationalizing complex n-tier systems like those that drive Internet businesses. Look for brand-name or high-profile customers, a history of complex projects, and a company that can also offer localization.

Localization

Now that the back end is ready, what about content? After all, it's the site's content that will tell people what you do and keep them coming back. It's relatively easy to translate content. The real trick is to maintain a multilingual site as you make content changes. In addition, you need to think about a host of other questions. How much of the site should be translated? How much content should be the same across a region? How much should be created locally for a specific audience? These are all critical questions.

Managing a single-language site can be complicated enough. Now imagine the complexity multiplied several times as you support that site in French, German, Spanish, Portuguese, Japanese, and Chinese, to name only some of the more common languages that are appearing on the Web. How will you track those constant updates to the English site and ripple them out into multiple languages quickly and accurately? This content churn means you need an efficient way to translate and localize content on the fly, as well as technology to integrate with your Web site and make changes automatically.
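One way to track those constant updates is to fingerprint each source-language segment and record which fingerprint each translation was made from; a segment whose current hash no longer matches is stale and needs retranslation. This Python sketch uses SHA-256 from the standard library. The segment IDs, the dictionary layout, and both function names are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of a source segment's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_segments(source: dict, translated_from: dict) -> list:
    """Return IDs of segments that are new or changed since they were translated.

    source:          segment ID -> current English text
    translated_from: segment ID -> fingerprint of the English text that the
                     existing translation was based on (missing if never translated)
    """
    return sorted(
        seg_id
        for seg_id, text in source.items()
        if translated_from.get(seg_id) != fingerprint(text)
    )
```

Run per target language, this yields exactly the list of segments to send out, so an edit to one paragraph of the English site does not trigger retranslation of the whole page.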

Machine vs. Human Translation

You can approach translation in two ways, and each has its place. You may have visited a Web site such as AltaVista's Babel Fish, typed in a foreign phrase or pasted in a non-English email, and received an instant translation. It may not have been entirely accurate (machine translation has been known to produce quite comical errors), but you probably got the gist of the message. And that's what machine translation is best for: quickly getting the gist of foreign-language text. Consider such solutions for realtime translation needs that don't require complete accuracy, such as message boards, email, and even chat.

If you're talking about most Web content, however, you want every sentence and word to be precise and targeted to a specific country's visitor. It's pointless to spend hours, weeks, or even months coming up with the perfect words for your site only to have them mangled for the other 92 percent of the planet. Grammatically accurate localized content requires human intervention.

Although technology can't always perform the translation accurately, it can make creating and maintaining localized content easier, faster, and less costly. Repositories of multilingual content (translation memories), for example, let you save and reuse previously localized content so translators don't have to retranslate repeated text. This enables faster turnaround on updates that change a few passages but leave much of the page the same. At Uniscape, we've seen cases where translators using our hosted (application service provider, or ASP) localization platform have reused 90 percent or more of the content. Since you usually pay for translation by the word, reuse can save you a good deal of money.
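At its simplest, a translation memory is a lookup from previously translated source segments to their approved translations. The Python sketch below reuses exact matches and counts how many words still need a translator, since that is what per-word pricing is based on. Commercial tools also propose "fuzzy" matches for similar sentences; this one handles only exact matches, and the function name and per-word rate are hypothetical.

```python
def plan_translation(segments, memory, rate_per_word):
    """Reuse exact translation-memory matches and price the remaining words.

    segments:      source-language sentences on the updated page
    memory:        dict of previously translated sentence -> approved translation
    rate_per_word: hypothetical per-word translation price
    Returns (reused, new, estimated_cost, reuse_ratio).
    """
    reused, new = {}, []
    for seg in segments:
        if seg in memory:
            reused[seg] = memory[seg]  # exact match: the stored translation is reused
        else:
            new.append(seg)            # no match: must go to a translator
    total_words = sum(len(seg.split()) for seg in segments)
    new_words = sum(len(seg.split()) for seg in new)
    # Share of the page's words that did not have to be paid for again.
    reuse_ratio = 1 - new_words / total_words if total_words else 0.0
    return reused, new, new_words * rate_per_word, reuse_ratio
```

If a page of 10 words has 7 already in memory, only 3 are billed, a reuse ratio of 70 percent; at the 90 percent reuse mentioned above, only a tenth of the words would be paid for.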

Content Management

Once you or your marketing department has decided which regions are top priority, it's a good idea to separate your content into categories. A common way to organize is by global, regional, and local content.

You'll find that there's some global content that will be pushed out to all markets and may not need much localization. This is material that may be translated, but not necessarily adapted for local markets. These are items like logos, trademarked names, company history and mission, and other content that you may want to keep consistent worldwide.

The second category is regional content—content relevant to a particular group of countries or a region, such as product information, certain marketing materials, and the site interface. This content is typically written once and then localized for each market's language and culture.

Finally, there's local content, which is locale specific. As you move forward, you'll have local content that is written from scratch for a particular market, rather than being adapted from other sources. This includes local office details, in-country promotions, management information, and other content written just for that market. Make sure you get a strong handle on who manages this content, how it'll be updated on the site, how much of it is generated, and how often it's refreshed.

You'll need to keep these types of content in mind when you start to localize. Not all content needs to be localized, and there will be some additional locale-specific content that you may never see in English.
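The three categories can be recorded as metadata on each content item so that tooling knows what to do with it for a given market. This Python sketch encodes one reading of the rules above: global items are translated but not adapted, regional items are localized, and local items are written in-market and published only there. The `Scope` names, the item fields, and the action strings are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    GLOBAL = "global"      # logos, trademarks, mission: consistent worldwide
    REGIONAL = "regional"  # product info, interface: written once, localized per market
    LOCAL = "local"        # office details, in-country promotions: written in-market

@dataclass
class ContentItem:
    item_id: str
    scope: Scope
    markets: tuple = ()  # for LOCAL items: the markets the item was written for

def action_for(item: ContentItem, market: str) -> str:
    """Decide how a content item should be handled for one target market."""
    if item.scope is Scope.GLOBAL:
        return "translate"  # keep the message identical; no cultural adaptation
    if item.scope is Scope.REGIONAL:
        return "localize"   # translate and adapt to the market
    # LOCAL content is authored in its own market and not adapted elsewhere.
    return "publish as-is" if market in item.markets else "skip"
```

Tagging content this way up front makes it straightforward to estimate the localization workload for each new market before any translation starts.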

Now you're ready to start the actual content localization. As I mentioned, localization encompasses more than just translation. It involves tailoring your message to the tastes of each market, addressing specific market needs in a way that appeals to residents of that country, checking for country-specific legal issues, adapting appearance, and other factors. For this reason it's often best to use in-country localization vendors as opposed to in-house translators or freelancers that live in your home country. In-house translators, even those with perfect fluency in a language, may be out of touch with trends in your target country's market, may not be up on current slang, or may not be aware of the technical jargon used in that country. For example, a translator fluent in Spanish who hasn't lived in Spain for 15 years has missed an entire generation of language. Think of how many new English words are in regular use now that you hadn't even heard of 15 years ago, including nearly every Internet-related word.

But using in-country localization vendors adds complexity to the process. First you must identify which content has changed, and then send it to translators, QA, legal reviewers, cultural and marketing experts, country marketing managers (who may be responsible for the local site), and any other parties who must ensure that the content is accurate and appropriate. Finally, you have to put the new content into the proper place on your site. Managing all of this content and figuring out where it's supposed to go can be a nightmare.
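The hand-offs described above form a pipeline that each changed piece of content must pass through before it goes back on the site. A minimal Python sketch of that routing follows. The stage names come from the paragraph; the `Job` class and its methods are hypothetical, and a real workflow system would also handle rejections, parallel reviews, and notifications.

```python
from dataclasses import dataclass, field

# Stages named in the text, in the order a changed segment moves through them.
STAGES = ["translation", "QA", "legal review", "cultural/marketing review",
          "country manager approval", "publish"]

@dataclass
class Job:
    segment_id: str
    locale: str
    stage_index: int = 0
    history: list = field(default_factory=list)  # (stage, who signed off) pairs

    @property
    def stage(self) -> str:
        """Current stage name, or "done" once every stage has signed off."""
        return STAGES[self.stage_index] if self.stage_index < len(STAGES) else "done"

    def advance(self, approved_by: str) -> str:
        """Record sign-off for the current stage and move to the next one."""
        if self.stage == "done":
            raise ValueError("job already completed")
        self.history.append((self.stage, approved_by))
        self.stage_index += 1
        return self.stage
```

Because each job carries its own history, anyone can see where a given French or Japanese update is stuck and who approved it, which is the tracking problem the paragraph describes.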

The most common way companies have tried to solve this problem is by using a localization house that farms the work out to freelancers around the world. Here, some trade-offs arise between using in-house translators and a translation house. For one, while a translation house may offer one point of contact for all—or a large part—of your content, it's very difficult to track the project, monitor the quality of work, manage the process, and maintain consistency among translators. In addition, the process may not be efficient enough to keep up with the constant need for new content. It's also difficult to take advantage of translation memory, or past translation work, since the multiple in-country localization vendors may use different software tools. Because the localization house may not use the same translators for each job, it's difficult to ensure consistent translation of key concepts and words into the target language.

Using in-house translators can offer the advantage of control and visibility. It's easier to manage your brand and get consistent translation, and you can track the progress of jobs with less effort. But you may lose the benefit of in-country translation, and you'll have to manage the trail of documents yourself.

An alternative solution is an ASP approach that moves the translation repository onto the Web so that vendors around the world can share the same translation memory database. This can be combined with workflow and management technologies that route documents to the correct people automatically. A server-side application can then automatically track changes in a Web site, send out the appropriate multilingual content for localization, and insert it back into the site for you. This kind of Web-based approach lets companies centralize control while enabling distributed localization.

No matter which route you choose, consider how you will integrate your multilingual content with your Web site's content management software, such as Interwoven's TeamSite or Vignette V/5 Content Management Server. If you're using an automated technology to help manage the localization process, it should transfer content to and from your content management software seamlessly.

Get Going

The most important thing is to get started. Begin by evaluating the site's main objectives, then address how globalizing the site helps meet those objectives. This will help you determine exactly what kind of site you want to present to international visitors. The next thing to consider is the extent of the changes necessary to globalize your site. Is it an e-commerce site? Does the back end need to be internationalized? Do you just need to translate text? Are you asking people to register in their native languages? Can your database support this?

There are also many cultural and logistic issues to consider, such as appropriate colors, country-specific units of measure, the preferred methods of payment in your target countries, and obtaining local domains (for example, www.yourcompany.co.jp). You'll have to address these for each market you're targeting before you design the site.

Finally, set up a regular process that can manage the changes in your local site and ensure that the multilingual sites can be updated at the same speed. Partner with a company that has the technology and expertise to help guide you through the process, can work at realtime speed, and has the skills to provide quality globalization services—both internationalization and localization, if possible.

(Get the source code for this article here.)


Howard is vice president of marketing at Uniscape, which provides a Web-based solution for creating and maintaining multilingual Web sites. You can reach him at [email protected].

