About internationalization
Access to a Web for All has been a fundamental concern and goal of the World Wide Web Consortium since the beginning, and is a natural requirement for Web-based applications, given that they can be accessed by people around the world. Unfortunately, it is easy to overlook the needs of people in cultures different to your own, or who use different languages or writing systems. If you do, you will build applications and content that, in fact, present barriers to the use of your technology or content by many people around the world.
What is Internationalization?
Translation and localization are NOT what we mean by 'internationalization'. Surprised? Let me explain.
If you internationalize, you design or develop your content, application, specification, and so on, in a way that ensures it will work well for, or can be easily adapted for, users from any culture, region, or language. This is where you address the first set of barriers: not the fact that your user can't read or relate to your product, but the barriers that make it difficult to adapt your product so that they can.
It's essentially a Quality approach: one that sees you taking action early in the development cycle so that you avoid costly and sometimes prohibitive obstacles when it comes to rolling out your product to new marketplaces.
A universal code base
Fundamental to internationalization is ensuring that your product supports text in any writing system of the world. You should ensure that your product is built on the universal character set, Unicode. This means not only the HTML page that you serve to your user, but all the backend databases, content management systems, scripts, and so forth. There are plenty of examples of beautiful user interfaces that deftly handle any language you need, but that return gobbledegook after the data has been processed behind the scenes.
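A minimal sketch of how that gobbledegook can arise, assuming a hypothetical back-end step that decodes UTF-8 bytes with the wrong encoding (Latin-1 is chosen here purely for illustration):

```python
# Text as the user typed it, in three different scripts.
original = "Zoë 日本語 العربية"

# The front end sends the text as UTF-8 bytes.
wire_bytes = original.encode("utf-8")

# Assumption: a hypothetical back-end component wrongly treats the bytes as Latin-1.
# Latin-1 maps every byte to some character, so no error is raised; the text is
# silently corrupted instead.
mangled = wire_bytes.decode("latin-1")

# A back end that uses UTF-8 at every step returns the text intact.
round_tripped = wire_bytes.decode("utf-8")

print(mangled == original)        # False
print(round_tripped == original)  # True
```

Because the wrong decoding does not fail loudly, this kind of problem tends to surface only when real multilingual data passes through the whole pipeline.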
You'll also want to make it easy to swap in translations of any natural language text that will be read by humans (including error messages, JSON strings, etc.), and to carry metadata about the language and direction of that text. The language metadata is important for getting the fonts right, and for supporting the different typographic styles used around the world (for things such as line-breaking, text justification, emphasis or other text decorations, text selection and units, etc.).
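One way to keep translations swappable while carrying language and direction metadata is sketched below. The `LocalizedString` type, the `CATALOG` table, and the `lookup` helper are hypothetical names invented for this illustration, not part of any W3C specification or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalizedString:
    value: str      # the human-readable text
    lang: str       # language tag, e.g. "en" or "ar"
    direction: str  # base direction of the text: "ltr" or "rtl"


# Hypothetical message catalog: message id -> language tag -> string with metadata.
CATALOG = {
    "file_not_found": {
        "en": LocalizedString("File not found", "en", "ltr"),
        "ar": LocalizedString("الملف غير موجود", "ar", "rtl"),
    },
}


def lookup(message_id: str, lang: str, fallback: str = "en") -> LocalizedString:
    """Return the translation for lang, or the fallback language if it is missing."""
    translations = CATALOG[message_id]
    return translations.get(lang, translations[fallback])


message = lookup("file_not_found", "ar")
print(message.lang, message.direction)  # ar rtl
```

Because the metadata travels with the string, a fallback to English is still labelled as English rather than inheriting the language of the surrounding page.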
It's advisable to clearly separate semantics (markup) from styling (CSS), and to avoid hard-coding content that assumes a particular order of text, or a particular set of punctuation marks, etc.
Text direction
Did you know that Arabic is one of the most widely used writing systems in the world after the Latin script? The script is used for many languages, often with variations in the way that vowels are represented, or with slightly different repertoires. But what all these languages have in common is that they are read predominantly from right to left. This also has implications for layout: things such as table columns, spreadsheets, graphs, cascading menus, and even web page layout are normally mirror images of content produced in English. So instead of using values like 'left' and 'right' in your style sheet, you should use logical values such as 'start' and 'end': that way, when the direction of a page changes, the mirroring happens automatically and without the need for the translator to mess with your code.
Actually, it's even more complicated than that. Arabic mixes right-to-left and left-to-right text on the same line, and it is important to be able to control the direction of the surrounding context for that to work properly. It's also important to handle data strings in a way that preserves information about their base direction, so that when they are used on the user interface they don't look mangled.
And it's not just Arabic. Right-to-left writing is used for Hebrew, for South Asian languages such as Dhivehi (Maldivian) and Rohingya, and for fast-growing African scripts such as Adlam and N'Ko.
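When a data string arrives without direction metadata, a common fallback is to guess the base direction from the first strongly directional character. The sketch below is a simplified illustration of that heuristic using Python's standard `unicodedata` module; the function name is hypothetical, and this is not a full implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (e.g. Hebrew, Arabic)
            return "rtl"
    # Digits, punctuation, and spaces are not strong; fall back to the default.
    return default


print(first_strong_direction("مرحبا W3C"))  # rtl
print(first_strong_direction("W3C مرحبا"))  # ltr
```

The second call shows the limit of the heuristic: an Arabic sentence that happens to begin with a Latin-script word is guessed as left-to-right, which is why storing the base direction alongside the string is more reliable than guessing it.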
Names, addresses, and such
If you are dealing with HTML forms or creating databases for information such as people's names and addresses, you will need to consider how to handle the many different approaches to formatting data that exist around the world.
In some countries people use only a single name, or write their name in family-name-first order. They may have single-letter names, or very long names. Street addresses in Japanese go from the general (country or prefecture) to the specific (house location) from top to bottom, and there are plenty of variations on that theme. (In fact, Japanese homes typically don't have house numbers at all.)
You'll need to consider how you'll cope with acquiring and storing this kind of data (and many others, with region-specific approaches). The more you can make your system flexible up-front, the easier a time you'll have when you want to support people in a new locale.
Oh, and by the way, many of these people don't speak or write in English, and they tend to sort their data in very different ways, so you'll also need to figure out whether that's going to cause a problem for your back-end storage or back office, and put plans in place to address it as you localize.
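As one illustration of the up-front flexibility discussed in this section, the sketch below stores a name as a single free-form field rather than assuming separate first and last names. The `PersonName` type and its field names are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    # The name exactly as the person writes it: any script, any length,
    # any order of family and given names, or a single name.
    full_name: str
    # Optional form of address, because it cannot be derived reliably from full_name.
    informal_name: Optional[str] = None
    # Optional string to sort by, for writing systems or conventions where
    # the displayed form does not sort as local users expect.
    sort_key: Optional[str] = None


def greeting(person: PersonName) -> str:
    """Address the person by their preferred informal name if one was given."""
    return f"Hello, {person.informal_name or person.full_name}"


print(greeting(PersonName("Sukarno")))                      # Hello, Sukarno
print(greeting(PersonName("山田 太郎", informal_name="太郎")))  # Hello, 太郎
```

The design avoids splitting or reordering the name in code, so a single name, a family-name-first name, and a very long name all pass through unchanged.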
Time zones, currencies, dates, etc.
You will usually want to store data internally in one standard form, but display it in ways that look natural to local users. As well as the names and addresses already mentioned, does the person working with your app or content expect to see periods or commas for decimal points? How about the order of day, month, and year, or even which day begins the week in a calendar?
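A minimal sketch of storing a number in one standard form and formatting it only at display time. The `SEPARATORS` table is a hand-written, hypothetical stand-in covering two locales; a real application would normally take this information from locale data rather than hard-coding it.

```python
from decimal import Decimal

# Hypothetical lookup table: locale tag -> (grouping separator, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def format_number(value: Decimal, locale_tag: str) -> str:
    """Format a stored value with two decimal places for the given locale."""
    group_sep, decimal_sep = SEPARATORS[locale_tag]
    # Produce a neutral form first: "," for grouping and "." for decimals.
    neutral = f"{value:,.2f}"
    # translate() maps both characters in a single pass, so the two
    # substitutions cannot interfere with each other.
    return neutral.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


stored = Decimal("1234567.891")          # one internal representation
print(format_number(stored, "en-US"))    # 1,234,567.89
print(format_number(stored, "de-DE"))    # 1.234.567,89
```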
You may also need to support alternative calendars, time zones, and daylight saving time, in both native and transliterated forms. Did you know that there are numerous countries around the world that have local calendars, and use them on a regular basis? Birth dates are typically recorded in the Imperial calendar in Japan, and newspapers in Thailand usually carry the date in the Buddhist calendar (the Western year 2022 is 2565 in Thailand). Any app you create needs to be able to adjust information for the appropriate time zone.
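The sketch below shows one way a timestamp stored in UTC might be displayed for a user in Thailand, adjusting both the time zone and the calendar year. It rests on stated assumptions: a fixed UTC+7 offset stands in for a proper time zone database, the 543-year offset is the one implied by the 2022/2565 example above, and the day/month/year output layout is chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Offset between the Western year and the Thai Buddhist year (2022 -> 2565).
BUDDHIST_ERA_OFFSET = 543

# Assumption: a fixed UTC+7 offset. A production system would normally use
# a time zone database so that rule changes and daylight saving are handled.
BANGKOK = timezone(timedelta(hours=7), "ICT")


def thai_display_date(stored_utc: datetime) -> str:
    """Convert a stored UTC timestamp to local time with a Buddhist-era year."""
    local = stored_utc.astimezone(BANGKOK)
    buddhist_year = local.year + BUDDHIST_ERA_OFFSET
    return f"{local.day}/{local.month}/{buddhist_year} {local:%H:%M}"


stored = datetime(2022, 12, 31, 20, 30, tzinfo=timezone.utc)
print(thai_display_date(stored))  # 1/1/2566 03:30
```

Note that the time zone adjustment is applied before the calendar conversion: the stored instant falls in 2022 in UTC but in the following year for the local user.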
If working with monetization, you'll need to consider how to handle users who work with a range of currencies. In addition to deciding how to format and represent monetary data when displayed to the user, you also should consider how to put in place mechanisms to manage diverse currency systems. How will you develop pricing models for different countries, which may have large variations in standard of living? How will you convert subscriptions and payments from one currency to another?
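Currencies differ in how many decimal places they use, so one common approach is to store amounts as integer minor units together with the currency code. The sketch below assumes a small hand-copied subset of ISO 4217 minor-unit values; the function names are hypothetical.

```python
from decimal import Decimal

# Decimal places per currency, for a small illustrative subset of ISO 4217 codes.
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Convert a decimal amount such as "12.34" to an integer number of minor units."""
    scaled = Decimal(amount).scaleb(MINOR_UNITS[currency])
    if scaled != scaled.to_integral_value():
        # Reject amounts with more precision than the currency supports,
        # rather than rounding silently.
        raise ValueError(f"{amount} has too many decimal places for {currency}")
    return int(scaled)


def from_minor_units(minor: int, currency: str) -> Decimal:
    """Convert stored minor units back to a decimal amount for display."""
    return Decimal(minor).scaleb(-MINOR_UNITS[currency])


print(to_minor_units("12.34", "USD"))  # 1234
print(to_minor_units("1200", "JPY"))   # 1200
print(from_minor_units(1234, "KWD"))   # 1.234
```

Keeping the currency code next to the integer amount matters: the same stored value 1234 means 12.34 US dollars but 1.234 Kuwaiti dinars.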
Cultural norms & expectations
You'll also want to do some homework in advance about the cultural preferences and habits of the marketplaces where you want your application to be used, and choose flexible content design technologies and processes so that you can later support others.
For example, symbolism can be culture-specific. The check mark means correct or OK in many countries, but in some countries, such as Japan, it can be used to mean that something is incorrect. Japanese localizers may need to convert check marks to circles (their symbol for 'correct') as part of the localization process.
If you want your product to appeal to users, you'll need to use content management systems that give you the ability to flex colors, layout, and information structures, as well as introducing local color. But you'll also need to ensure that you are not hard-coding graphics or images that offend or alienate users in another region.
And then there are quite fundamental questions for monetization applications. Is the community one that is familiar with credit card transactions? Does the population you want to reach have access to sufficient bandwidth (or even to the internet at all) when they need to use your application? Do the banking or other systems that your application interacts with support the language of the user? And remember that a large majority of users these days interface with the Web via mobile devices.
And have you taken into account local regulatory and legal considerations in the various territories your application will reach?
And then localize
The things we have discussed so far all need some attention and preparation while you are planning and building your application. Otherwise, you could instead be building barriers for yourself when it comes to the exciting phase where you translate and adapt your product for various local languages and markets.
The localization phase is where you actually adapt for different users. You change the language via translation; you change the graphics and colors, where appropriate; you flick that text direction switch; you make available alternative data collection forms and processes; you write locally-relevant content, and so on.
Internationalization means foreseeing and planning for that phase from the earliest possible moment, so that not only are you ready when the time comes, but you can avoid digging yourself into holes that may be costly to get out of later on.
What does the W3C Internationalization Activity do?
The W3C Internationalization (I18n) Activity works with W3C working groups and liaises with other organizations to make it possible to use Web technologies around the world, regardless of language, writing system, or culture.
The work covers three main areas:
- Language enablement. The W3C needs to make sure that the text layout and typographic needs of scripts and languages around the world are built in to technologies such as HTML, CSS, SVG, etc. so that Web pages and eBooks can look and behave as people expect. We encourage experts from around the world to explain requirements and document gaps between what is needed and what is currently supported in browsers and ebook readers. Get more details. See examples of the work we do.
- Developer support. An important part of the mission of the Internationalization Activity at the W3C is to support developers by sharing advice and reviewing documents. The developers in question mostly include specification writers and browser implementers. (Content developers are mostly catered for by the Education & outreach work we do.) Get more details. See examples of the work we do.
- Education & outreach. The W3C Internationalization Activity creates materials for content authors and developers that provide advice on how to create Web pages and applications that are internationalized. For guidelines related to spec developers and browser implementers, see Developer Support. Get more details. See examples of the work we do.
The Project Radar provides an overview of projects the Internationalization Working Group is currently working on.
Additional information about the work we are doing and the resources we make available can all be accessed via the Internationalization Activity home page. If you are new to internationalization, you may find our Getting Started page a useful place to begin. See also the list of groups below.
Where can I learn about internationalization?
This site offers a number of resources to help content authors and developers understand internationalization requirements and techniques, and build those into their work.
A good starting point is the Learn to internationalize page. For more general reference, go to the Activity home page and follow links from there.
Groups
Active groups
-
Internationalization Working Group
Overseeing the language enablement work, reviewing specifications, providing internationalization guidance to Working Groups, & creating educational materials for content authors.
Home page
GitHub: w3c/bp-i18n-specdev • w3c/charmod-norm • w3c/i18n-actions • w3c/i18n-activity • w3c/i18n-checker • w3c/i18n-discuss • w3c/i18n-drafts • w3c/i18n-glossary • w3c/i18n-issues • w3c/i18n-request • w3c/i18n-tests • w3c/i18n-translations • w3c/its2req • w3c/localizable-manifests • w3c/ltli • w3c/mlw-metadata-us-impl • w3c/predefined-counter-styles • w3c/string-meta • w3c/string-search • w3c/timezone • w3c/type-samples • w3c/typography • w3c/unicode-xml
Notification list: www-international
Other lists: public-i18n-core • public-i18n-translation
Group-only list: member-i18n-core*
-
Internationalization Interest Group
Group membership is based on mailing list participation. Most of the traffic is composed of notifications about changes to GitHub issues, which is where the technical discussions take place. The IG is also the parent group for all the task forces listed below.
Home page
GitHub: w3c/character_phrase_tests • w3c/klreq • w3c/line_paragraph_tests • w3c/text_direction_tests
Notification lists: www-international, public-i18n-its-ig
-
African Language Enablement
Identify and address barriers to use of the Web in any African language or script.
Home page
Discussion threads
Currently pending questions: Adlam • N’Ko • Ajami
GitHub: w3c/afrlreq
Notification list (public-i18n-africa): archive • subscribe
Group-only list: public-afrlreq-admin
-
Americas Language Enablement
Identify and address barriers to use of the Web for languages of the Americas.
Home page
Discussion threads
Currently pending questions: Cherokee • Inuktitut • Cree • Osage
GitHub: w3c/amlreq
Notification list (public-i18n-americas): archive • subscribe!
-
Arabic Language Enablement
Identify and address barriers to use of the Web for Arabic & Persian languages.
Home page
Discussion threads
Currently pending questions: Arabic • Kashmiri • Persian • Uighur • Urdu
GitHub: w3c/alreq
Notification list (public-i18n-arabic): archive • subscribe!
Group-only list: public-alreq-admin
Related list (public-i18n-bidi): public-i18n-bidi
-
Chinese Language Enablement
Identify and address barriers to use of the Web for Simplified & Traditional Chinese.
Home page
Discussion threads
Currently pending questions: Chinese
GitHub: w3c/clreq
Notification list (public-i18n-chinese): archive • subscribe!
Group-only list: public-clreq-admin
-
Ethiopic Language Enablement
Identify and address barriers to use of the Web for Ethiopic-script languages.
Home page
Discussion threads
Currently pending questions: Ethiopic
GitHub: w3c/elreq
Notification list (public-i18n-ethiopic): archive • subscribe!
Group-only list: public-elreq-admin
-
European Language Enablement
Identify and address barriers to use of the Web for European languages.
Home page
Discussion threads
Currently pending questions: Dutch • French • Georgian • German • Greek • Hungarian
GitHub: w3c/eurlreq
Notification list (public-i18n-europe): archive • subscribe!
Group-only list: public-eurlreq-admin
-
Hebrew Language Enablement
Identify and address barriers to use of the Web for Hebrew.
Home page
Discussion threads
Currently pending questions: Hebrew
GitHub: w3c/hlreq
Notification list (public-i18n-hebrew): archive • subscribe!
Group-only list: public-hlreq-admin
Related list (public-i18n-bidi): public-i18n-bidi
-
Indian Language Enablement
Identify and address barriers to use of the Web for languages of India.
Home page
Discussion threads
Currently pending questions: Bengali • Hindi • Gujarati • Punjabi • Tamil
GitHub: w3c/iip
Notification list (public-i18n-indic): archive • subscribe!
Group-only list: public-ilreq-admin
-
Japanese Language Enablement
Identify and address barriers to use of the Web for Japanese.
Home page
Discussion threads
Currently pending questions: Japanese
GitHub: w3c/jlreq • w3c/jlreq-d • w3c/simple-ruby • w3c/ruby-t2s-req
Notification list (public-i18n-japanese): archive • subscribe!
Group-only list: public-jlreq-admin
-
Mongolian Language Enablement
Identify and address barriers to use of the Web for the Traditional Mongolian script.
Home page
Discussion threads
Currently pending questions: Traditional Mongolian
GitHub: w3c/mlreq
Notification list (public-i18n-mongolian): archive • subscribe!
Group-only list: public-mlreq-admin
-
Southeast Asian Language Enablement
Identify and address barriers to use of the Web for SE Asian languages & scripts.
Home page
Discussion threads
Currently pending questions: Balinese • Javanese • Khmer • Lao • Myanmar • Sundanese • Thai
GitHub: w3c/sealreq
Notification list (public-i18n-sea): archive • subscribe!
Group-only list: public-sealreq-admin
-
Tibetan Language Enablement
Identify and address barriers to use of the Web for Tibetan.
Home page
Discussion threads
Currently pending questions: Tibetan
GitHub: w3c/tlreq
Notification list (public-i18n-tibetan): archive • subscribe!
Group-only list: public-tlreq-admin
Former groups
-
ITS (Internationalization Tag Set) Interest Group Home page • List: public-i18n-its-ig. The mailing list is still open, but now operates under the Internationalization Interest Group.
-
Japanese Layout Multi-Group Task Force Home page • Lists: public-i18n-cjk, member-japanese-layout-en*, member-japanese-layout-ja*
-
MLW-LT (MultilingualWeb Language Technology) Working Group defined the Internationalization Tag Set (ITS) 2.0. This delivers metadata for web content (mainly HTML5) and "deep Web" content (for example a CMS or XML file from which HTML pages are generated). The metadata facilitates interaction with multilingual technologies and localization processes. The group also produced reference implementations. The group was closed on 17 January 2014, having successfully published the Internationalization Tag Set (ITS) 2.0 specification as a Recommendation. The Working Group has started discussing ITS 2.0 best practices topics within the Internationalization Tag Set Interest Group. This is an open forum aiming to generate discussion around possible future work in this area. To participate, contribute to the ITS IG wiki and the ITS IG mailing list. [Home page] [Charter]
-
Internationalization GEO Working Group worked to make the internationalization aspects of W3C technology better understood and more widely and consistently used through guidelines, education and outreach. This WG was closed when the work was merged into that of the Internationalization Working Group in 2007. [Home page] [Charter]
Contacts
- Addison Phillips (addisoni18n @ gmail.com), I18n Core Working Group Chair
- Fuqiao Xue (xfq @ w3.org), Activity Lead, Staff Contact for Core Working Group
- Richard Ishida (ishida @ w3.org), Staff Contact for Core Working Group
- Martin Dürst (duerst @ it.aoyama.ac.jp), Interest Group Chair
- Liaisons (member-only link)