Jared is a researcher at IBM Almaden Research Center. He can be reached at [email protected].
If you are writing software for other people to use, whether it's an application or the driving technology behind web-delivered content or services, you can't ignore internationalization. In fact, the word "internationalization" has become so common among developers that it has been abbreviated in a way only they could appreciate: i18n (an "i," 18 more letters, then an "n," for the 20 letters in the word "internationalization").
The latest estimates from a global survey of web usage show how diverse the World Wide Web has become. Current figures indicate that of the 400 million people with access to the Internet, fewer than half are English speakers. Projections indicate that by 2003, only 29 percent of the 800 million people online will be native English speakers. That's a compelling argument for internationalization.
Fortunately for Java developers, a lot of the grunt work behind the internationalization of an application, applet, or web-delivered service has already been done and is available in the Java SDKs. Unicode support and the introduction of resource bundles demonstrate the commitment of Sun and its partners to provide support to programmers wishing to take their products to the global market.
Understanding Resource Bundles
Resource bundles provide the means for utilizing locale-specific information without having to maintain multiple versions of code for the various locales. They index this information so that it can be retrieved through simple lookups. Normally, these lookups involve text presented to the end user. For instance, a resource for greeting users could be found using the lookup key greeting. The bundle would associate phrases such as Hello, Guten Tag, Mabuhay, and so on with this key, so the Java code need concern itself with only one String, the key, leaving the translation to the ResourceBundle code.
There are two methods for defining resource bundles available in the standard release of the Java SDK, PropertyResourceBundle and ListResourceBundle. The first is a collection of text files stored in a manner parsable by the java.util.Properties class, and the second is a collection of Java class files. Both methods make use of a tree-based naming technique that exactly matches the method of defining locales in Java.
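As a minimal sketch of the two forms, the following builds one bundle of each kind in memory and looks up the same key in both. The class name BundleKinds and the single greeting entry are illustrative assumptions, and a StringReader stands in for a weather.properties file on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleKinds {
    // Class-based form: keys and values are returned from getContents().
    static ResourceBundle listBundle() {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] { { "greeting", "Hello" } };
            }
        };
    }

    // Text-based form: the same pair in java.util.Properties syntax.
    // The in-memory string is a stand-in for a .properties file (assumption).
    static ResourceBundle propertyBundle() {
        try {
            return new PropertyResourceBundle(new StringReader("greeting=Hello\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both bundles answer the same lookup through the same API.
        System.out.println(listBundle().getString("greeting"));
        System.out.println(propertyBundle().getString("greeting"));
    }
}
```

Either way, calling code sees only the ResourceBundle API, so a project can switch between the two forms without touching its lookups.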
To create a resource bundle in either of these two methods, a base class file must first be created that contains all of the resource keys paired with default translations in some language, usually the one most familiar to the development team. The base class file for displaying weather information, for instance, could be named "weather.properties" or "weather.java," depending on the type of resource bundle being worked with. To add more language or locale support, new files are created in the same directory as the base class file. These files are named like the base class file, but with an underscore and a locale encoding appended before the file extension. The locale encoding is done in typical Java fashion, using ISO standard abbreviations for languages and countries in the form <language>[_<country>[_<variant>]]. Figure 1 is an example of these files in a directory and their corresponding tree representation. Figure 2 shows sample contents of a few of these files.
While the base class file must contain a value for each resource key used in the code, the other files are not similarly restricted. Instead, resource bundles use a fallback mechanism similar to the one used by Cascading Style Sheets (CSS) to find resource values if the key does not exist for a particular locale. For instance, if the locale were set to Canadian English (en_CA) and the resource bundles of Figure 2 were used to look up the translation corresponding to greeting2, the resource bundle would first look for the resource in the en_CA file. Not finding the resource, it would then check the en file, then finally rely on the value found in the base class file. This fallback mechanism adds a considerable amount of flexibility to resource bundles and makes them an attractive and easy-to-implement feature.
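The lookup order described above can be inspected directly. The sketch below asks the default ResourceBundle.Control which bundle names it would consult for a locale; the class name FallbackOrder and the base name "weather" are assumptions for illustration. Note that ResourceBundle.getBundle() may additionally try the virtual machine's default locale before settling on the base file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class FallbackOrder {
    // Returns the bundle names consulted for a locale, most specific first.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // The root locale maps back to the plain base name.
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Locale.CANADA is Canadian English (en_CA).
        System.out.println(candidateNames("weather", Locale.CANADA));
        // prints [weather_en_CA, weather_en, weather]
    }
}
```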
Accessing the translations from within Java code is simple. A single class consisting entirely of static methods for specifying locales and retrieving resource translations is all that is needed. Listing One is an example of such a class. Unless otherwise specified, the class uses the locale specified by the virtual machine as the default locale. It is also possible to set the locale using the setLocale() method to reflect a locale specified by users, obtained from a web client, or otherwise determined.
Resources are retrieved using the getTranslation() method. In most cases this method merely provides a string lookup into a hashtable of resources. Thus, calling getTranslation("greeting2") would return the String, "Welcome to the weather report."
Contextual Translations
While this translation method will most likely handle the majority of localization cases, problems may be encountered for text that is content driven. Suppose the greeting must contain the name of the person being greeted. It would be inappropriate to just append or prepend that name to the retrieved String, as different locales may require different behavior.
The java.text package, introduced in Java 1.1 and incorporating many methods designed to aid localization efforts, provides a technique for utilizing contextual, localized text. Resource values stored in the resource bundles are augmented with {#} text, where # represents a consecutive integer unique to the resource. When the resource is called for, the contextual information is substituted for the information contained in the curly brackets.
An example using the figures cited earlier will help demonstrate this technique. Notice that the resource greeting1 has two curly bracketed numbers. The first represents a name and the second represents extra greeting text to be taken from greeting2. The overloaded getTranslation() method can still be used, this time passing not just the resource key but an array of Strings to use in substitution. Hence, the two lines of code:
String array[] = {"Jane", Translator.getTranslation("greeting2")};
System.out.println(Translator.getTranslation("greeting1", array));
would produce from the base class:
Hello, Jane. Welcome to the weather report.
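The substitution itself is done by java.text.MessageFormat, and can be tried without any resource files. In this sketch the pattern strings are assumptions: BASE_PATTERN is a guess at what the base file holds for greeting1, and REORDERED_PATTERN is a hypothetical translation that places the name last, which is the case simple concatenation cannot handle.

```java
import java.text.MessageFormat;

class GreetingDemo {
    // Assumed pattern for greeting1 in the base file.
    static final String BASE_PATTERN = "Hello, {0}. {1}";
    // Hypothetical pattern for a locale that places the name last.
    static final String REORDERED_PATTERN = "{1} Greetings, {0}!";

    // Substitutes the name for {0} and the extra text for {1}.
    static String greet(String pattern, String name, String extra) {
        return MessageFormat.format(pattern, name, extra);
    }

    public static void main(String[] args) {
        String extra = "Welcome to the weather report.";
        System.out.println(greet(BASE_PATTERN, "Jane", extra));
        // prints Hello, Jane. Welcome to the weather report.
        System.out.println(greet(REORDERED_PATTERN, "Jane", extra));
        // prints Welcome to the weather report. Greetings, Jane!
    }
}
```

One caution when writing patterns: MessageFormat treats the apostrophe as a quoting character, so a literal apostrophe in a translation must be doubled.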
Managing Resource Bundles
The simplicity of the text format of resource bundles often acts as a double-edged sword for development teams. While the format is easy to use and understand, it can also be quite difficult to manage for large or even midsized projects.
Consider the example resource bundle files in Figure 1. Suppose the resource bundle made up of these files is being sent to an external company to create files in order to support five more languages. How is this company to determine the context of resources? Is the temperature always expressed in Fahrenheit? What exactly do the lookup terms "{0}" and "{1}" refer to? Is there a constraint on the size of the translated text? None of this information is readily available from the resource bundle, and as a result, translation of the files requires constant interaction between developers and translators, tying up valuable development time.
Translators and developers who have experience working with these files know that much of the work in developing and maintaining a resource bundle involves repetitive, error-prone tasks. Typically, these files are edited in a traditional text editor, which cannot check that resources are properly distributed across the bundle, that the resources are properly defined, or that resource keys are not duplicated within a file.
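To make the missing checks concrete, here is a rough sketch of two of them: finding keys defined twice in one file, and finding keys that appear in a locale file but not in the base file. The class BundleLint and its method names are hypothetical, and the parser is deliberately simplified: it understands only "key=value" lines and comment lines, not the colon separators, escapes, or line continuations that java.util.Properties also accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BundleLint {
    // Returns every key in file order, repeats included.
    static List<String> parseKeys(String contents) {
        List<String> keys = new ArrayList<>();
        for (String line : contents.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and the two comment styles of .properties files.
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("!")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq > 0) {
                keys.add(trimmed.substring(0, eq).trim());
            }
        }
        return keys;
    }

    // Keys that occur more than once in a single file.
    static List<String> duplicateKeys(String contents) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String key : parseKeys(contents)) {
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    // Keys present in a locale file but absent from the base file.
    static Set<String> orphanKeys(String base, String localized) {
        Set<String> orphans = new LinkedHashSet<>(parseKeys(localized));
        orphans.removeAll(parseKeys(base));
        return orphans;
    }

    public static void main(String[] args) {
        // Sample file contents are invented for the demonstration.
        String base = "greeting1=Hello, {0}. {1}\ngreeting2=Welcome to the weather report.\n";
        String german = "greeting1=Guten Tag, {0}. {1}\ngreeting1=Hallo, {0}. {1}\ntemperatur=Temperatur\n";
        System.out.println(duplicateKeys(german)); // prints [greeting1]
        System.out.println(orphanKeys(base, german)); // prints [temperatur]
    }
}
```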
Fortunately, software tools that address these concerns are available, and they are free. Sun Microsystems provides a free suite of applications known as the Java Internationalization and Localization Toolkit (available at http://java.sun.com/). Together these tools let you scan through existing code, looking for instances where localization code should be substituted for existing code, and building up a resource bundle as the process proceeds. The toolkit helps avoid some of the repetitive tasks and errors that would normally creep into resource bundle files and provides a good means of associating resources with existing code. Unfortunately, it still does not give the translator contextual information about each translation.
Another tool, freely available from IBM, overcomes this problem by associating metadata seen only by developers and translators with each resource. This tool, called "Resource Bundle Manager" or "RBManager" (available at http://www.alphaworks.ibm.com/), provides, through a cross-platform GUI, a means for creating and managing resource bundles that reduces development and translation time. The tool associates developer comments, context of resource usage, creation and modification data, as well as a flag for each resource as to whether it has been translated or not. It also provides a means of grouping resources in a hierarchical structure that is not provided in standard resource bundle files, the ability to import and export to other internationalization formats, and reporting tools for monitoring translation progress.
Detecting Locales
Each Java Virtual Machine defines a set of locales for which it provides native support. In addition to these locales, developers may create their own locale by specifying language, country, and variant encodings. By creating locales not supported natively, you take on the responsibility of defining the locale-specific behavior that will be used by the code. In and of themselves, resource bundles do not require any special behavioral definitions, as resources are looked up merely by the encodings associated with the locale.
Finding the default locale for the virtual machine is simple; see the initBundle() method in Listing One. If a list of all of the virtual machine-supported locales is desired instead, a static method call to Locale.getAvailableLocales() returns an array of supported locales. This method can be useful in allowing users to select a locale, but restricting them to the locales that don't require any extra programming work.
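A short sketch of that restriction follows. The helper isSupported() and the class name are assumptions; only Locale.getDefault() and Locale.getAvailableLocales() come from the text above. The set of locales reported varies from one Java runtime to another.

```java
import java.util.Arrays;
import java.util.Locale;

class SupportedLocales {
    // True if the virtual machine lists this locale among its available locales.
    static boolean isSupported(Locale wanted) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(wanted);
    }

    public static void main(String[] args) {
        System.out.println("Default locale: " + Locale.getDefault());
        System.out.println("Available locales: " + Locale.getAvailableLocales().length);
        // Offer users only the choices that need no extra programming work.
        Locale[] choices = { Locale.US, Locale.CANADA, Locale.GERMANY };
        for (Locale choice : choices) {
            System.out.println(choice + " (" + choice.getDisplayName() + "): "
                + (isSupported(choice) ? "supported" : "not supported"));
        }
    }
}
```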
If you are interested in presenting locale-specific content across the Web, the HTTP header value Accept-Language contains a list of the locale encodings that the user has specified for his or her browser. The first locale listed represents the preferred locale setting for the client. Listing Two is an example of defining a locale from an HttpServletRequest.
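On Java 8 and later, the header can also be parsed with Locale.LanguageRange, an API that postdates the listing. The sketch below is an alternative under that assumption, not the listing's method; the class and method names are hypothetical. It takes the header string directly, so it runs without a servlet container.

```java
import java.util.List;
import java.util.Locale;

class AcceptLanguage {
    // Returns the highest-weighted locale named in an Accept-Language value,
    // or the fallback if the header is missing or malformed.
    static Locale preferredLocale(String header, Locale fallback) {
        if (header == null || header.trim().isEmpty()) {
            return fallback;
        }
        try {
            // parse() sorts the ranges by descending quality weight.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
            for (Locale.LanguageRange range : ranges) {
                if (!range.getRange().equals("*")) {
                    return Locale.forLanguageTag(range.getRange());
                }
            }
        } catch (IllegalArgumentException e) {
            // Malformed header: fall through to the fallback.
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(preferredLocale("en-CA,en;q=0.8,fr;q=0.5", Locale.ROOT));
        // prints en_CA
    }
}
```

Unlike taking the first entry in the list, this honors the quality values, so a header such as "fr;q=0.5,de;q=0.9" yields German.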
Internationalization and XML
Of course, a standard exists for defining locale translations in a single XML file. Developed by the LISA organization (http://www.lisa.org/), the TMX specification is an XML format for defining multilingual dictionaries. These dictionaries do not correspond exactly to resource bundles but they do share similar characteristics.
The advantage TMX offers is that resources from all languages supported by a resource bundle can be packaged up into one file and shared with relative ease among other programs. The number of applications currently supporting TMX is small, but growing. Hardy programmers will not find it too difficult to write their own Java class extending the ResourceBundle class that would support TMX files. Alternatively, the RBManager application can be used to import and export resource bundles to TMX format.
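As a rough sketch of such a class, the following ResourceBundle subclass reads translation units from TMX text and serves the segments for one language. Several things here are assumptions: the class name, the use of the optional tuid attribute as the resource key, and the trimmed SAMPLE fragment, which omits the header element that real TMX files carry. It also does no fallback between languages.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.ResourceBundle;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TmxResourceBundle extends ResourceBundle {
    // Trimmed TMX fragment for demonstration (assumption, not a complete file).
    static final String SAMPLE =
        "<tmx version=\"1.4\"><body><tu tuid=\"greeting\">"
        + "<tuv xml:lang=\"en\"><seg>Hello</seg></tuv>"
        + "<tuv xml:lang=\"de\"><seg>Guten Tag</seg></tuv>"
        + "</tu></body></tmx>";

    private final Map<String, String> values = new HashMap<>();

    // Keeps, for each <tu>, the <seg> text of the <tuv> whose xml:lang matches.
    TmxResourceBundle(String tmx, String lang) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(tmx)));
            NodeList units = doc.getElementsByTagName("tu");
            for (int i = 0; i < units.getLength(); i++) {
                Element tu = (Element) units.item(i);
                NodeList variants = tu.getElementsByTagName("tuv");
                for (int j = 0; j < variants.getLength(); j++) {
                    Element tuv = (Element) variants.item(j);
                    NodeList segs = tuv.getElementsByTagName("seg");
                    if (lang.equalsIgnoreCase(tuv.getAttribute("xml:lang")) && segs.getLength() > 0) {
                        values.put(tu.getAttribute("tuid"), segs.item(0).getTextContent());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse TMX input", e);
        }
    }

    @Override
    protected Object handleGetObject(String key) {
        // Returning null lets ResourceBundle report the key as missing.
        return values.get(key);
    }

    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(values.keySet());
    }

    public static void main(String[] args) {
        System.out.println(new TmxResourceBundle(SAMPLE, "de").getString("greeting"));
        // prints Guten Tag
    }
}
```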
In addition to standardized forms for defining resources, XML provides a rich method for communication across both networks and applications. In web development, for instance, a common means for generating pages dynamically is to create an XML document containing dynamic content and then transform that XML document into HTML that can be rendered by a standard web browser. If you're using means similar to this, you'll find internationalization to be an easy ally to work with.
Localization through XML transformations can be accomplished in a number of ways. Arguably the most powerful method is through XSLT, the XSL Transformations language. Figure 3 shows the process of creating HTML from an arbitrary data source using XML and XSLT transformations.
Some of the dynamic information stored in the XML may need to be localized. Because the server does the XML generation, the localization can be done in exactly the same manner as it would be done in an application using Java method calls. Some of the displayed HTML, however, may need to be defined in the XSL files, which are not dynamically generated and, like source code, should be independent of the locale.
Fortunately, many XSLT implementations allow for method callbacks into the original Java code. An example of such an XSLT implementation is Xalan, a tool available for free from the Apache project (http://www.apache.org/). Listing Three is an example of how callbacks would be used from an XSL file meant to be converted using Xalan to provide localized HTML generated from XML. To fully understand the listing, some familiarity with XSLT is necessary. Notice the namespace declaration at the top of the listing and that namespace's subsequent usage throughout.
Conclusion
Internationalization is a subject serious developers can no longer ignore. Fortunately, the tools and libraries to aid development teams are readily available and convenient to use. A bit of forethought and planning can produce global projects with far broader scope than their limited counterparts, at little extra effort.
DDJ
Listing One
package com.myCompany.myProject;

import java.text.MessageFormat;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class Translator {
    // Base name of the bundle: the 'weather' files in the 'Resources'
    // directory. getBundle() expects '.' as the separator, not File.separator.
    private static final String BASE_NAME = "Resources.weather";
    // The Resource Bundle super class
    private static ResourceBundle resource_bundle;
    // The current Locale from which to access translations
    private static Locale locale;

    // Initializes the values of the resource bundle and locale; by default
    // the locale is defined by the virtual machine.
    private static void initBundle() {
        try {
            if (Translator.locale == null) Translator.locale = Locale.getDefault();
            Translator.resource_bundle = ResourceBundle.getBundle(BASE_NAME, locale);
        } catch (MissingResourceException mre) {
            // Bundle stays unset; getTranslation() then returns the key itself
        }
    }

    // Also provided is a method for changing locale from within a program.
    // The resource bundle location is unchanged.
    public static void setLocale(Locale locale) {
        try {
            Translator.locale = locale;
            Translator.resource_bundle = ResourceBundle.getBundle(BASE_NAME, locale);
        } catch (MissingResourceException mre) {
            // Keep the previously loaded bundle, if any
        }
    }

    // A method for retrieving translations from the resource bundle
    public static String getTranslation(String key) {
        if (key == null) return "";
        if (Translator.resource_bundle == null) initBundle();
        try {
            return Translator.resource_bundle.getString(key);
        } catch (Exception e) {
            return key;
        }
    }

    // Returns contextually specific translations
    public static String getTranslation(String key, String[] lookups) {
        if (key == null) return "";
        if (resource_bundle == null) initBundle();
        try {
            String retStr = resource_bundle.getString(key);
            return MessageFormat.format(retStr, lookups);
        } catch (Exception e) {
            return key;
        }
    }
}
Listing Two
// Assumes java.util.Locale is imported by the enclosing class
public java.util.Locale getLocaleFromBrowser(javax.servlet.http.HttpServletRequest request) {
    // Define the default locale
    Locale DEFAULT_LOCALE = Locale.getDefault();
    // Error check
    if (request == null) return DEFAULT_LOCALE;
    // Retrieve the 'Accept-Language' HTTP header item
    String str_loc = request.getHeader("Accept-Language");
    if (str_loc == null) return DEFAULT_LOCALE;
    // We are interested in the first language accepted;
    // entries in the header are separated by commas
    String new_loc = str_loc;
    int index = str_loc.indexOf(",");
    if (index >= 0) new_loc = str_loc.substring(0, index);
    // Drop any quality value that follows a semicolon, such as ";q=0.8"
    index = new_loc.indexOf(";");
    if (index >= 0) new_loc = new_loc.substring(0, index);
    new_loc = new_loc.trim();
    String language = "";
    String country = "";
    // Browsers send tags such as "en-US"; accept "en_US" as well
    index = new_loc.indexOf("-");
    if (index < 0) index = new_loc.indexOf("_");
    if (index >= 0) {
        language = new_loc.substring(0, index);
        country = new_loc.substring(index + 1);
    } else {
        language = new_loc;
    }
    // Create and return the locale
    return new Locale(language, country);
}
Listing Three
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:translator="xalan://com.myCompany.myProject.Translator">
  <xsl:output method="html"/>
  <!-- Precomputed localized text -->
  <xsl:variable name="greeting"
      select="translator:getTranslation(string('greeting2'))"/>
  <xsl:template match="/">
    <HTML><BODY>
      <H1><xsl:value-of select="$greeting"/></H1>
    </BODY></HTML>
  </xsl:template>
</xsl:stylesheet>