Internationalized Domain Name (IDN) is an attractive feature for Internet users, especially non-English speakers. The World Wide Web, being the most popular platform on the Internet, is naturally the first target for the use of internationalized domain names. While the IETF is working on a standard for IDN, Mozilla is getting ready for it. This paper presents the challenges and design issues facing the implementation of IDN support in Mozilla.
Mozilla is an open-source web browser, designed for standards compliance, performance and portability. It is based on the first developer release of the source code to Netscape Communicator by Netscape Communications Corp. The project is coordinated by The Mozilla Organization.
The ability to use characters outside the ASCII repertoire to identify Internet resources is essential. Internationalized Domain Name (IDN) is the natural extension to the existing Domain Name System (DNS) used for addressing hostnames. The IDN working group at the Internet Engineering Task Force (IETF) has been working on a standard for IDN. After about two years, the IDN working group has agreed on the IDNA proposal, along with the companion drafts Nameprep and AMC-ACE-Z. Essentially, IDN hostnames can be used and can inter-operate on the Internet by having the applications (instead of the network layer) process them in the manner specified in IDNA. IDNA defines a deterministic process -- ToASCII (which includes Nameprep and AMC-ACE-Z as steps) -- to convert an IDN hostname to a legal DNS name. The reverse operation -- ToUnicode -- is also specified in the document.
The diagram above shows that IDNA is applied between the application and the
operating system resolver. Encoding an internationalized domain name with
AMC-ACE-Z gives a result such as the following (example taken from the
AMC-ACE-Z draft):
Chinese (simplified):
Code points: u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
AMC-Z: ihqwcrb4cv8a8dqg056pqjye
Domain name: 他们为什么不说中文.com
ToASCII result: zq--ihqwcrb4cv8a8dqg056pqjye.com
The "zq--" is an ACE prefix, used to distinguish IDN domain names from normal ASCII domain names. The exact string to be adopted will be announced in the future.
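The example above can be reproduced with a short sketch. It assumes that Python's built-in "punycode" codec, which implements RFC 3492 (the name under which AMC-ACE-Z was later published), produces the same output as the draft for this input. The Nameprep step of ToASCII is omitted, the "zq--" prefix is the provisional one quoted above, and the function names are illustrative only.

```python
# Sketch only: Nameprep is skipped and "zq--" is the provisional prefix
# quoted in the text, not a final value.
ACE_PREFIX = "zq--"


def encode_label(label: str) -> str:
    """Return an ASCII label unchanged; otherwise return its ACE form."""
    if label.isascii():
        return label
    # The "punycode" codec implements RFC 3492, the successor of AMC-ACE-Z.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_ascii(hostname: str) -> str:
    """Apply the per-label conversion to every label of a hostname."""
    return ".".join(encode_label(label) for label in hostname.split("."))


# Code points from the example in the text.
code_points = [0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48,
               0x4E0D, 0x8BF4, 0x4E2D, 0x6587]
name = "".join(chr(cp) for cp in code_points) + ".com"
print(to_ascii(name))  # zq--ihqwcrb4cv8a8dqg056pqjye.com
```

ASCII labels such as "com" pass through untouched, which is why only the first label carries the prefix.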
The WWW and E-mail are two of the most important applications on the Internet, and Mozilla supports both. IDN would help many non-English speakers get onto the Internet by offering domain names in their own languages. Mozilla is therefore a natural target for early IDN implementation.
i-DNS.net has been working with the Mozilla team to prepare the Mozilla browser for the imminent IDN standard from the IETF. The goal is to enable Mozilla to support and use IDN as second nature. Currently, Mozilla is among the first popular Internet applications to support IDN. Some important work has been done on IDN support, but many issues remain unresolved. At the time of this writing, Mozilla already supports IDN for web browsing but not for E-mail and other protocols.
Mozilla is a huge application. In order to incorporate IDN handling logic into Mozilla, it is essential to understand how the relevant parts work together.
The above diagram illustrates how Mozilla loads a URL.
It is worth noting that this is only a simple use case. There are many other sources of URL input, such as HTML anchor references, bookmarks, HTTP redirects, etc.
Mozilla already has extensive support for Unicode, and uses double-byte characters wherever possible. However, some components (e.g. the DNS resolver) expect single-byte characters as input and do not provide any mechanism for indicating the character set. This is a direct consequence of the existing DNS framework, which has no built-in internationalization. Because of this, IDNA suggests that IDN be dealt with at a layer higher than the network so as not to disturb the existing DNS infrastructure.
After some analysis of the application architecture, the work to be done falls into three aspects:
The above diagram shows the locations where modifications need to be done for each of the aspects.
Perhaps the most common place in which IDNs appear is the URL. It is therefore worth looking closely at the way URLs are stored. In Mozilla, URL segments (including the hostname) are stored as URL-escaped single-byte strings. URLs may be passed as function arguments in two forms: as an object or as a string.
According to [STD13], hostname labels are restricted to contain only Letters, Digits, and Hyphens (LDH). An IDN hostname after the ToASCII operation must conform to [STD13], making it suitable for transport through non-IDN-aware applications.
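As an illustration of the LDH restriction, the following minimal sketch checks a hostname label by label. The 63-octet label limit and the rule against leading or trailing hyphens come from RFC 1035 and RFC 1123 rather than from the discussion above, and the function names are hypothetical.

```python
import string

# Letters, Digits, Hyphen: the only characters allowed in a label.
LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only LDH characters and is well formed."""
    return (
        0 < len(label) <= 63                      # RFC 1035 length limit
        and all(ch in LDH_CHARS for ch in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )


def is_ldh_hostname(hostname: str) -> bool:
    """True if every dot-separated label passes the LDH check."""
    return all(is_ldh_label(label) for label in hostname.split("."))


print(is_ldh_hostname("www.mozilla.org"))                    # True
print(is_ldh_hostname("zq--ihqwcrb4cv8a8dqg056pqjye.com"))  # True
print(is_ldh_hostname("\u4e2d\u6587.com"))                   # False
```

The second call shows why the ToASCII result can travel through software that knows nothing about IDN: it is an ordinary LDH name.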
In its original behavior, Mozilla does not enforce the LDH restriction and does not do any special processing of the hostname. So, when an IDN URL is encountered, the IDN hostname is passed on to the OS resolver and is usually rejected immediately. Even if the name is resolved correctly, the behavior would not be in accordance with the proposed standards.
[IDN-URI] specifies that an IDN in a URL should be represented in UTF-8 and
then URL-escaped. In order to handle IDN correctly, storage is the first issue
to consider. There are several options for storing the IDN:
1. Convert the hostname to Unicode, and store it.
2. Store both the raw hostname, and the original character encoding.
3. Perform the IDNA ToASCII operation on the hostname, and store the AMC-ACE-Z
version.
4. Store both 1 and 3 above.
The decision needs careful thought, since each of these options has its pros and cons. The solution adopted at the time of this writing is a modified option 1. An IDN hostname that is encountered is converted from the original character set to UTF-8 and stored in URL-escaped form. The reason for this is to minimize changes to the URL object, a crucial component of Mozilla.
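The adopted storage form (original character set to UTF-8, then URL-escaped) can be sketched as below. This is an illustration in Python rather than Mozilla's actual code; the function names and the choice of GB2312 as the source encoding are assumptions made for the example.

```python
from urllib.parse import quote, unquote


def store_hostname(raw: bytes, charset: str) -> str:
    """Convert a hostname from its original charset to URL-escaped UTF-8."""
    text = raw.decode(charset)          # original charset -> Unicode
    return quote(text.encode("utf-8"))  # Unicode -> UTF-8 -> %XX escapes


def load_hostname(stored: str) -> str:
    """Recover the Unicode hostname from the stored form."""
    return unquote(stored, encoding="utf-8")


# Hypothetical input: a hostname as it might arrive from a GB2312 page.
raw = "\u4e2d\u6587.com".encode("gb2312")
stored = store_hostname(raw, "gb2312")
print(stored)  # %E4%B8%AD%E6%96%87.com
```

Because the stored form is pure ASCII, it fits the existing single-byte URL storage without changes to the URL object itself.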
IDNA suggests that applications should avoid displaying the AMC-ACE-Z encoded hostname to the user. As far as possible, IDN hostnames should be rendered correctly. Places in which IDNs may appear are the URL and status bar.
Before IDN, there was no need to render non-ASCII characters in the URL bar. Fortunately, the Mozilla URL bar (a text widget) is Unicode-aware. So, in order to display the IDN hostname, the Unicode value must be used, and the user must have the appropriate fonts installed.
Currently, Mozilla can only display the AMC-ACE-Z encoding in the URL bar. The problem is being worked on, and a solution is expected to be available soon. The hurdle lies in certain architectural decisions, among other factors.
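The conversion that display code needs, turning an ACE label back into Unicode, can be sketched as follows. It is not the ToUnicode operation as IDNA specifies it in full: it only strips the provisional "zq--" prefix and decodes the rest with Python's "punycode" codec (RFC 3492), and falling back to the ACE form on malformed input is an assumption of this sketch.

```python
ACE_PREFIX = "zq--"  # provisional prefix quoted in the text


def decode_label(label: str) -> str:
    """Return the Unicode form of an ACE label, or the label unchanged."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Assumption: show the ACE form rather than fail on bad input.
        return label


def to_display(hostname: str) -> str:
    """Decode every ACE label of a hostname for presentation to the user."""
    return ".".join(decode_label(label) for label in hostname.split("."))


print(to_display("zq--ihqwcrb4cv8a8dqg056pqjye.com"))  # 他们为什么不说中文.com
```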
URLs are a common object in Mozilla used by many components. Some components do not use the hostname; they must not be affected by any changes we make. However, the behavior of other components needs to be modified in order to handle IDNs within the URL properly. These components include the DNS and network protocols, display code, etc. The IDN component of a URL needs to be treated differently from the other components, so it is a special case to be taken into consideration. Generally, IDN is subject to:
Transformation -- converting strings from one character encoding to another. Most of the time, it is useful to transform from a native encoding to a Unicode-based encoding and vice versa. Mozilla is already equipped with that.
IDNA ToASCII -- This is the most important IDN operation. However, IDNA and its companion documents are still work in progress and are expected to change. This part is not built into Mozilla. See 3.4 for further information.
URL escaping and unescaping -- URLs need to conform to the [URI] syntax. This means that escaping and unescaping need to be done correctly.
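The three operations listed above can be chained as in the sketch below, which takes a URL-escaped hostname in a native encoding and produces the form handed to the resolver. The function name and the GB2312 input are assumptions, Nameprep is omitted, and Python's "punycode" codec (RFC 3492) stands in for AMC-ACE-Z.

```python
from urllib.parse import quote, unquote_to_bytes

ACE_PREFIX = "zq--"  # provisional prefix quoted in the text


def to_resolver_form(escaped_host: str, charset: str) -> str:
    """Unescape, transform to Unicode, then apply a simplified ToASCII."""
    # 1. URL unescaping: "%XX" sequences back to raw bytes.
    raw = unquote_to_bytes(escaped_host)
    # 2. Transformation: native encoding to Unicode.
    text = raw.decode(charset)
    # 3. ToASCII (without Nameprep): only non-ASCII labels are rewritten.
    labels = []
    for label in text.split("."):
        if label.isascii():
            labels.append(label)
        else:
            ace = label.encode("punycode").decode("ascii")
            labels.append(ACE_PREFIX + ace)
    return ".".join(labels)


# Hypothetical input: the example hostname, escaped from GB2312 bytes.
hostname = "\u4ed6\u4eec\u4e3a\u4ec0\u4e48\u4e0d\u8bf4\u4e2d\u6587.com"
escaped = quote(hostname.encode("gb2312"))
print(to_resolver_form(escaped, "gb2312"))  # zq--ihqwcrb4cv8a8dqg056pqjye.com
```

Keeping the three steps separate mirrors the point made above: only the hostname takes this path, while the rest of the URL is left alone.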
The requirement is for Mozilla to handle IDN properly, from the UI to the core. However, for ASCII domain names it should behave exactly as before, with little or no performance degradation. It is also a requirement that IDN resolution be an option for the user.
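One way to meet these requirements is a guard in front of the conversion, sketched below. The function and parameter names are hypothetical, and passing a non-ASCII name through unchanged when the option is off is an assumption modeled on the original behavior described earlier.

```python
from typing import Callable


def hostname_for_dns(hostname: str, idn_enabled: bool,
                     to_ascii: Callable[[str], str]) -> str:
    """Choose the name to hand to the resolver."""
    if hostname.isascii():
        # Fast path: ASCII names skip all IDN processing.
        return hostname
    if not idn_enabled:
        # Assumption: with the option off, keep the original behavior.
        return hostname
    return to_ascii(hostname)


def demo_to_ascii(hostname: str) -> str:
    """Stand-in converter using the provisional "zq--" prefix."""
    return ".".join(
        label if label.isascii()
        else "zq--" + label.encode("punycode").decode("ascii")
        for label in hostname.split(".")
    )


print(hostname_for_dns("www.mozilla.org", True, demo_to_ascii))
print(hostname_for_dns("\u4e2d\u6587.com", True, demo_to_ascii))
```

The only extra cost for an ASCII hostname is a single scan of the string, which keeps the impact on existing behavior small.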
The first goal was to make IDN resolution work, at the network level. In the first patch,
These modifications have made the use of IDN generally possible. However, since IDN support is still a fairly experimental feature, it occasionally breaks and bugs are reported.
As explained above, an API is defined but not implemented within Mozilla. i-DNS.net has provided a downloadable component library that implements the IDNA functionality, called XPIDN.
The Mozilla IDN Initiative home page <http://playground.i-dns.net/mozilla/>
contains up-to-date instructions. Therefore, readers are advised to consult
the web site before proceeding below.
In order to test the IDN-enabled Mozilla:
The Mozilla IDN Initiative is an ongoing effort to maintain and develop the IDN support in Mozilla. The web site -- http://playground.i-dns.net/mozilla/ -- serves as an open forum for the discussion of technical design and development issues. Anyone with an interest in the topic is invited to join the group in its mission.
This project has provided many insights into the IDN specifications and the Mozilla application. So far, the modifications to Mozilla have been careful, minor and made in small steps. Performance and stability are the two properties most strictly observed when new patches are made. There are many issues yet to be resolved, and it is hoped that the implementation experience can serve as feedback to the IETF on the effectiveness of the specifications.
There is interest in developing a new layered approach to solving the IDN problem, which involves inserting a directory layer above the DNS ([DNS-SEARCH]). The Internet Resource Name Search Service (IRNSS) BOF may well be established as an official IETF working group before long. This is closely related to IDN, and will therefore be kept in view.
The author would like to thank James Seng from i-DNS.net for his guidance and advice, and Brendan Eich from Mozilla.org and the Netscape team, especially Naoki Hotta, Darin Fisher, Frank Tang and Bob Jung, for their engineering advice.
[AMC-ACE-Z] Adam Costello, "AMC-ACE-Z version 0.3.1", draft-ietf-idn-amc-ace-z.
[DNS-SEARCH] Klensin, J., "A Search-based access model for the DNS", work in progress, draft-klensin-dns-search.
[IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names in Applications (IDNA)", Internet Draft, draft-ietf-idn-idna.
[IDN-URI] Martin Duerst, "Internationalized Domain Names in URIs and IRIs", draft-ietf-idn-uri.
[IRI] L. Masinter, M. Duerst, "Internationalized Resource Identifiers (IRI)", Internet Draft, November 2001, draft-masinter-url-i18n, work in progress.
[Nameprep] P. Hoffman, M. Blanchet, "Stringprep Profile for Internationalized Host Names", Internet Draft, draft-ietf-idn-nameprep.
[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC 1034) and "Domain names - implementation and specification" (RFC 1035), STD 13, November 1987.
[URI] T. Berners-Lee, R. Fielding, L. Masinter. RFC2396, "Uniform Resource Identifiers (URI): Generic Syntax." August 1998.