27 Feb 2007

Localization

#1 L10N overview

Internationalization (i18n) and localization (l10n) are means of adapting products such as publications, hardware and software to non-native environments, especially other nations and cultures.

Once you implement i18n, l10n soon comes knocking at your door. If you allow users from different countries and cultures to use your software, they will expect not only translated text but also data presented in the way they are used to. This expectation is reasonable, because countries use different standards for displaying dates, currencies, units and so on. For example, users in the US are used to distances in miles, while users in India are familiar with kilometres. The more cultures you serve, the more variety there is in how information is communicated and displayed.

#1.1 Locale and formats

Before implementing l10n we first need a proper understanding of the terms locale and format. A locale represents a whole culture: it carries information about how to display dates, how to show currencies, which measurement units to use and so on. For example, for the Indian locale it looks like this:
  • Date display: DD-MM-YYYY
  • Currency (digit grouping): 1,11,111
  • Unit to measure distance: Km
For the US it looks like this:
  • Date display: MM-DD-YYYY
  • Currency (digit grouping): 111,111
  • Unit to measure distance: Mile
Hence a locale should be seen in a broad sense, as it represents a set of localized items. Sometimes users want more customization than the locale alone provides; this is where the term format comes in, adding one more level of customization. For example, a US user might prefer the date 02-01-2007 to be shown as February 1st 2007. In short, a locale includes the language, default formats, glyphs and other conventions of a particular culture, while a format is just one of several representations of the same value. A locale can therefore have more than one format.
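
The two digit-grouping styles listed above can be sketched in PHP. This is only a minimal illustration: the function names groupWestern() and groupIndian() are hypothetical, and the sketch handles whole, non-negative numbers only.

```php
<?php
// Hypothetical helpers illustrating the two digit-grouping styles.

// Western grouping: groups of three digits (111,111).
function groupWestern(int $n): string
{
    return number_format($n, 0, '.', ',');
}

// Indian grouping: the last three digits form one group,
// the remaining digits are grouped in twos (1,11,111).
function groupIndian(int $n): string
{
    $digits = (string) $n;
    if (strlen($digits) <= 3) {
        return $digits;
    }
    $last3 = substr($digits, -3);
    $rest  = substr($digits, 0, -3);
    // Insert a comma before every pair of digits, counting from the right.
    $rest = preg_replace('/\B(?=(\d{2})+$)/', ',', $rest);
    return $rest . ',' . $last3;
}

echo groupWestern(111111), "\n"; // 111,111
echo groupIndian(111111), "\n";  // 1,11,111
```

The value is the same in both cases; only its representation changes with the locale.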

#2 How to implement it?

When implementing l10n in software, the administrator or the team first needs to decide how many locales and formats to support. The chosen locales and formats can be stored in the file system or in a database. Once that is decided, l10n can be implemented at two levels.

#2.1 Backend

Most software has an administrative area from which the whole application is managed. This area should be used to select the formats that are available for each locale.

From the selected formats, one default format should be chosen for each l10n entity (date, number, currency and so on). This default applies to the whole client area until it is overridden at user level.

For certain entities, such as number and currency formats, only one format should be set, and a user-level option may not be offered. Keep in mind, too, that the formats of one locale should not be used in another locale.

For software such as FS and Flog, where clients are registered from the administrative area, user-level locales and formats can be selected there directly.

#2.2 Frontend

In the client area, if the user is not given the option to set his or her own locale and formats, or if the option exists but the user is not logged in, the locale and formats set as defaults in the administrative area should be used.
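
The fallback rule above could look roughly like the following sketch. The array keys, the example values and the function name effectiveSettings() are assumptions made for illustration; real settings would be loaded from the file or database chosen earlier.

```php
<?php
// Hypothetical admin-level defaults, as they might be loaded from a file or database.
$adminDefaults = [
    'locale'      => 'en_US',
    'date_format' => '%m-%d-%Y',
];

// Returns the settings to use for the current request.
// $userSettings is null for guests; for logged-in users it holds
// only the entries the user has overridden.
function effectiveSettings(array $adminDefaults, ?array $userSettings): array
{
    if ($userSettings === null) {
        return $adminDefaults;
    }
    // User-level values win; anything not overridden falls back to the default.
    return array_merge($adminDefaults, $userSettings);
}

$guest  = effectiveSettings($adminDefaults, null);
$member = effectiveSettings($adminDefaults, ['date_format' => '%B %d %Y']);

echo $guest['date_format'], "\n";  // %m-%d-%Y
echo $member['date_format'], "\n"; // %B %d %Y
echo $member['locale'], "\n";      // en_US
```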

#2.2.1 Setting locale

When implementing l10n in PHP-based software, developers need to set the locale first. The locale can be derived from the selected language. For example, if the user selects English, the locale should be set to en_US; for Finnish it should be set to fi_FI. To set the locale in PHP, use the function setlocale(). The locale can be set per category, for example for monetary values, dates or messages. Please refer to the PHP manual for details on how to use this function.

Various PHP functions behave differently depending on the locale, among them strcoll(), strftime() and localeconv(). Note that date() is not one of them: it always produces English names regardless of the locale. Note also that strftime() is deprecated as of PHP 8.1.
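
As a rough sketch of this step: setlocale() returns false when the requested locale is not installed, so it is worth checking the result. The language-to-locale table and the function name applyLocaleForLanguage() below are hypothetical, and the locale names that actually exist vary from system to system.

```php
<?php
// Hypothetical mapping from a user-selected language to candidate locale names.
// Several spellings are listed because installed names differ between systems.
$localeCandidates = [
    'en' => ['en_US.UTF-8', 'en_US.utf8', 'en_US'],
    'fi' => ['fi_FI.UTF-8', 'fi_FI.utf8', 'fi_FI'],
];

// Tries each candidate in turn and returns the locale name that was applied,
// or false if the language is unknown or no candidate is installed.
function applyLocaleForLanguage(string $language, array $candidates)
{
    if (!isset($candidates[$language])) {
        return false;
    }
    // setlocale() accepts an array of names and uses the first one that works.
    return setlocale(LC_ALL, $candidates[$language]);
}

$applied = applyLocaleForLanguage('fi', $localeCandidates);
if ($applied === false) {
    // Keep the current locale rather than fail.
    echo "Finnish locale is not installed, keeping the current locale\n";
} else {
    echo "Locale set to $applied\n";
}
```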

#2.2.2 Displaying data in various formats based upon locales

Once the locale is set for a particular language, locale-aware functions behave differently. For example, the code below displays the day name in a different language for each locale. The code stays the same, but the output changes.

// Displays “Wednesday” for English language.
setlocale(LC_TIME,'C');
echo strftime('%A');

// Displays “keskiviikko” for Finnish language.
setlocale(LC_TIME,'fi_FI');
echo strftime('%A');

// Displays “mercredi” for French language.
setlocale(LC_TIME,'fr_FR');
echo strftime('%A');

// Displays “Mittwoch” for German language.
setlocale(LC_TIME,'de_DE');
echo strftime('%A');

// Displays “बुधवार” for Hindi language.
setlocale(LC_TIME,'hi_IN');
echo strftime('%A');

Locales can be set in the same way for other entities such as currency and number format. To set the locale for all entities at once, use the constant LC_ALL.

At code level, implementing different formats can be a problem, because the default format differs from locale to locale. The code above therefore does not fully serve our purpose. See the example below.
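
To illustrate working with a single category instead of LC_ALL, the sketch below reads the number conventions of a locale through localeconv() and then restores the previous setting. The function name numberConventions() is hypothetical, and the values returned depend on which locales are installed.

```php
<?php
// Reads the number conventions of a locale without permanently changing
// the LC_NUMERIC setting of the running script.
function numberConventions(string $locale)
{
    // Passing "0" leaves the locale unchanged and returns the current setting.
    $previous = setlocale(LC_NUMERIC, '0');
    if (setlocale(LC_NUMERIC, $locale) === false) {
        return false; // locale not installed
    }
    $info = localeconv();
    setlocale(LC_NUMERIC, $previous); // restore the earlier setting
    return [
        'decimal_point' => $info['decimal_point'],
        'thousands_sep' => $info['thousands_sep'],
    ];
}

// The "C" locale is always available and uses "." as the decimal point.
$c = numberConventions('C');
echo $c['decimal_point'], "\n";

// Other locales may or may not be installed, so the result is checked.
$fi = numberConventions('fi_FI');
echo $fi === false ? "fi_FI is not installed\n" : $fi['decimal_point'] . "\n";
```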

// Displays 'Friday December 22 1978' in English.
setlocale(LC_ALL, 'en_US');
echo strftime('%A %B %d %Y', mktime(0, 0, 0, 12, 22, 1978))."\n";

// Displays 'perjantai 22 joulukuu 1978' in Finnish.
setlocale(LC_ALL, 'fi_FI');
echo strftime('%A %d %B %Y', mktime(0, 0, 0, 12, 22, 1978))."\n";

// Displays 'vendredi 22 décembre 1978' in French.
setlocale(LC_ALL, 'fr_FR');
echo strftime('%A %d %B %Y', mktime(0, 0, 0, 12, 22, 1978))."\n";

// Displays 'Freitag 22 Dezember 1978' in German.
setlocale(LC_ALL, 'de_DE');
echo strftime('%A %d %B %Y', mktime(0, 0, 0, 12, 22, 1978))."\n";

// Displays '22 दिसम्बर शुक्रवार 1978' in Hindi.
setlocale(LC_ALL, 'hi_IN');
echo strftime('%d %B %A %Y', mktime(0, 0, 0, 12, 22, 1978))."\n";

In this example each locale needs a different format. To keep the implementation simple at code level, store the conversion specifier in a database or file and pass it directly to the function. For example, for the Finnish locale the conversion specifier %A %d %B %Y would be stored as a string and passed to strftime() as above. Conversion specifiers of this kind can be stored for all formats of all entities.
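
A minimal sketch of this idea follows, with a plain array standing in for the database or file. The names $dateSpecifiers, dateSpecifierFor() and formatDateFor(), and the fallback specifier, are assumptions made for illustration.

```php
<?php
// Hypothetical table of date conversion specifiers, one per locale,
// as it might be loaded from a database or configuration file.
$dateSpecifiers = [
    'en_US' => '%A %B %d %Y',
    'fi_FI' => '%A %d %B %Y',
    'fr_FR' => '%A %d %B %Y',
    'de_DE' => '%A %d %B %Y',
    'hi_IN' => '%d %B %A %Y',
];

// Returns the stored specifier for a locale, or a fallback if none is stored.
function dateSpecifierFor(string $locale, array $specifiers, string $fallback = '%Y-%m-%d'): string
{
    return $specifiers[$locale] ?? $fallback;
}

// Formats a timestamp for a locale using the stored specifier.
// Day and month names are translated only if the locale is installed.
// strftime() is used to match the examples above; it is deprecated as of PHP 8.1.
function formatDateFor(string $locale, int $timestamp, array $specifiers): string
{
    setlocale(LC_TIME, $locale);
    return strftime(dateSpecifierFor($locale, $specifiers), $timestamp);
}

echo dateSpecifierFor('fi_FI', $dateSpecifiers), "\n"; // %A %d %B %Y
echo dateSpecifierFor('ja_JP', $dateSpecifiers), "\n"; // %Y-%m-%d
```

With the specifier stored as data, the calling code is identical for every locale: formatDateFor('fi_FI', mktime(0, 0, 0, 12, 22, 1978), $dateSpecifiers).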

#3 Limitations of l10n

Native l10n support in a scripting language or database is limited to displaying information in different formats and glyphs. It does not convert values. For example, if the price of an item is stored in US dollars ($), it will not automatically be shown in his or her own currency (i.e. €) to a user who has selected the Finnish language (or locale). This is because conversion rates between currencies change constantly.

For such cases, l10n has to be implemented in a customized way, with unit conversion functions that are applied according to the chosen format. The information itself should be stored in the database in one format only and converted only when it is displayed to users.

There is one exception: date and time can be displayed with different values if time zone functions are used. Software normally logs date and time in its own (server) time zone, but the user may be located in a different country where the local date and time differ from server time. In such cases the software should offer a time zone setting so that dates and times are displayed with localized values. This option is essential for software that provides email services.
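
One possible shape for such a conversion function is sketched below, using the distance example from the overview. The unit table, the constant name and displayDistance() are hypothetical; the point is only that the stored value stays in one unit (kilometres here) and is converted at display time.

```php
<?php
// Distances are stored in kilometres only; conversion happens at display time.
const KM_PER_MILE = 1.609344;

// Hypothetical table: which distance unit each locale displays.
$distanceUnits = [
    'en_US' => 'mi',
    'hi_IN' => 'km',
    'fi_FI' => 'km',
];

// Converts a stored distance to the unit of the given locale and formats it
// with one decimal place. Unknown locales fall back to kilometres.
function displayDistance(float $km, string $locale, array $units): string
{
    $unit  = $units[$locale] ?? 'km';
    $value = ($unit === 'mi') ? $km / KM_PER_MILE : $km;
    return number_format($value, 1) . ' ' . $unit;
}

echo displayDistance(100.0, 'en_US', $distanceUnits), "\n"; // 62.1 mi
echo displayDistance(100.0, 'hi_IN', $distanceUnits), "\n"; // 100.0 km
```

Currency would follow the same pattern, except that the conversion rate cannot be a constant and has to be supplied from a current source.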
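
A time zone option of this kind could be sketched with PHP's DateTimeImmutable and DateTimeZone classes. The sketch assumes timestamps are stored in UTC as 'Y-m-d H:i:s' strings; the function name displayInTimezone() is hypothetical.

```php
<?php
// Timestamps are stored in one zone (UTC here) and converted only for display.
function displayInTimezone(string $storedUtc, string $userTimezone): string
{
    $utc   = new DateTimeImmutable($storedUtc, new DateTimeZone('UTC'));
    $local = $utc->setTimezone(new DateTimeZone($userTimezone));
    return $local->format('Y-m-d H:i');
}

echo displayInTimezone('2007-02-27 12:00:00', 'Europe/Helsinki'), "\n";  // 2007-02-27 14:00
echo displayInTimezone('2007-02-27 12:00:00', 'Asia/Kolkata'), "\n";     // 2007-02-27 17:30
echo displayInTimezone('2007-02-27 12:00:00', 'America/New_York'), "\n"; // 2007-02-27 07:00
```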

#4 Links

http://en.wikipedia.org/wiki/l10n
http://www.useit.com/alertbox/9608.html
