Standardization of Internationalized Domain Name at IETF
24 Jan 2002
Yoshiro YONEYA
JPNIC
24 Jan 2002 APAN2002 Conference 2
What is IDN?
• Internationalized Domain Name.
  – Current domain names are represented with ASCII alphanumeric characters and hyphens.
  – IDN is the technical challenge of representing domain names with non-ASCII characters as well as ASCII.
What is Internationalization?
• Framework to extend the character repertoire for domain names.
• Must be a global standard so that global communication is not lost.
• The IETF IDN (Internationalized Domain Name) WG is doing the work.
• Some confusion arises from the word ‘Multilingualization’.
  – Characters are just one component of a language.
  – A multilingual domain name is a service-level aspect.
Internationalized Domain Names
华人.公司.cn
華人.商業.tw
高島屋.会社.jp
삼성.회사.kr
三星.회사.kr
م. االهرام
viagénie.qc.ca
קום.ישראל
ทีเอชนิค.พาณิชย์.ไทย
現代.com
ヤフー.com
http://www.jdna.jp/activities/event/jdn-tutorial/IDNSDK.pdf
Why IDN?
• Increase in Internet users who are not familiar with English.
  – Easy to memorize, type in, etc.
• Drastic changes in how domain names are used.
  – A domain name is now used not only as a host name but also as a signboard.
• Creates new business opportunities.
  – Many ventures have begun services.
Drawback of IDN
• Loses global acceptability at the end-user interface.
  – Hard to type in or display non-ASCII characters without appropriate I/O devices and/or software.
• Impacts operation.
  – Requires software updates and/or additional processing.
  – Deployment issue.
History of IDN WG
• Established in Jan 2000.
  – Discussion is mainly done on the mailing list.
• Held its 1st meeting at the 47th IETF in Adelaide.
  – Since then, it has met at every IETF.
• Decided the WG’s solution at the last (52nd) IETF.
  – IDNA, NAMEPREP and Punycode (formerly known as AMC-ACE-Z).
  – Waiting for WG last call.
Scope and priority of IDN WG
• Provide a standard.
  – So as not to divide the global connectivity and communication of the Internet.
• Backward compatibility.
  – Compatibility with the current DNS and application protocols, to work with the current Internet infrastructure.
• No localization.
  – Independent of particular regions, countries and/or languages.
  – Refer to existing universal standards.
  – Common framework essential to internationalization.
IDNA (Internationalizing Domain Names In Applications)
draft-ietf-idn-idna-06.txt
• An architecture that describes how to process IDNs.
  – Use Unicode, which is upward compatible with ASCII, as the character code set.
  – Normalize characters that have multiple code point representations (upper/lower case, full-width/half-width, composed characters) into a single internal representation so that matching does not fail.
  – Represent non-ASCII characters that are input or displayed at the user interface as an ASCII Compatible Encoding (ACE) string on the network.
  – These processes are performed in application software.
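A minimal sketch of the application-side conversion described above, using the `idna` codec from Python's standard library. That codec implements the IDNA specification as later published (RFC 3490), so it emits the `xn--` prefix rather than the draft-era identifiers that appear in these slides. The function names and the host name `bücher.example` are illustrative assumptions.

```python
# Sketch: the conversions an application performs between the user
# interface (Unicode) and the network (ACE). Names are hypothetical.

def to_network_form(host: str) -> str:
    """Convert a name as typed at the user interface into its ACE form."""
    # The codec applies NAMEPREP and the ACE conversion label by label.
    return host.encode("idna").decode("ascii")


def to_display_form(ace_host: str) -> str:
    """Convert an ACE name received from the network back to Unicode."""
    return ace_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_network_form("BÜCHER.example")
    print(ace)                   # xn--bcher-kva.example
    print(to_display_form(ace))  # bücher.example
```

An all-ASCII name passes through unchanged, which matches the point that ASCII domain names look the same at both layers.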
Important point of IDNA
• The representations at the user interface layer and at the network layer are different.
  – Though they are the same for ASCII domain names.
• Application solution.
  – Least impact on the Internet infrastructure.
Image of the IDNA
[Figure: an end system (User, UI, Application, API, Resolver) with local and international internal representations, connected to application servers and DNS servers. NAMEPREP, To/From Unicode and To/From ACE conversions are marked on the paths.]
NAMEPREP (Stringprep Profile for Internationalized Host Names)
draft-ietf-idn-nameprep-07.txt
• Profile for STRINGPREP (Preparation of Internationalized Strings).
  – draft-hoffman-stringprep-00.txt
• Some scripts, such as alphabets, have multiple representations for a character.
  – Domain names are case insensitive.
• Normalization process that unifies strings that are the same in meaning or display into a single representation.
  – Case (upper/lower)
  – Compatibility characters (full/half width)
  – Composed characters
Important point of NAMEPREP
• Normalize the representation of an internationalized domain name string so that it matches correctly.
  – ‘a’ vs ‘A’
  – ‘u’+‘¨’ vs ‘ü’
  – ‘ア’ vs ‘ｱ’
Processes in NAMEPREP
1. map
  • Case folding of upper/lower case characters (UTR#21)
2. normalize
  • Normalize the representation of the string (UAX#15 NFKC)
3. prohibit
  • Check for characters that are inappropriate in a domain name.
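The three steps above can be sketched with Python's standard `stringprep` and `unicodedata` modules. This is only an illustration under stated assumptions: the tables in `stringprep` follow the later published RFC 3454 rather than the drafts named here, the prohibit step checks only a small subset of tables (spaces and control characters), and `nameprep_sketch` is a hypothetical name.

```python
import stringprep
import unicodedata


def nameprep_sketch(label: str) -> str:
    """Apply a simplified map / normalize / prohibit sequence to one label."""
    # 1. map: drop characters that are "mapped to nothing", case-fold the rest.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # 2. normalize: Unicode normalization form KC (compatibility composition).
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. prohibit: reject spaces and control characters (subset of the tables).
    for ch in normalized:
        if stringprep.in_table_c11_c12(ch) or stringprep.in_table_c21_c22(ch):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


if __name__ == "__main__":
    print(nameprep_sketch("A") == "a")                  # case
    print(nameprep_sketch("u\u0308") == "\u00fc")       # composed character
    print(nameprep_sketch("\uff71") == "\u30a2")        # half-width katakana
```

Each of the three comparison pairs (case, composition, width) collapses to one representation, which is what lets two spellings of the same name match.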
ACE (ASCII Compatible Encoding)
• Represent non-ASCII characters by ASCII characters.
  – Easy to apply to the current DNS.
  – Least impact on current applications.
• Decreases the maximum number of characters in each label.
  – Penalty of using only 5 bits to represent 8-bit data.
  – Requires some sort of compression algorithm.
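A rough worked calculation of the length penalty. The assumptions are: the 63-octet DNS label limit, a 4-character ACE identifier, an encoding that carries 5 bits per ASCII character, no compression, and 16 bits per original character. The function name and parameters are hypothetical.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def max_chars(prefix_len: int = 4,
              bits_per_ascii_char: int = 5,
              bits_per_char: int = 16) -> int:
    """Upper bound on characters per label without any compression."""
    payload_chars = MAX_LABEL_OCTETS - prefix_len        # room after the ACE-ID
    payload_bits = payload_chars * bits_per_ascii_char   # bits that can be carried
    return payload_bits // bits_per_char


if __name__ == "__main__":
    print(max_chars())                 # 18 characters at 16 bits each
    print(max_chars(bits_per_char=8))  # 36 characters at 8 bits each
```

Under these assumptions a label holds only 18 uncompressed 16-bit characters, which is why a compression step matters.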
ACE Identifier
• Requires an explicit ACE identifier (ACE-ID).
  – For reverse conversion.
  – The choice of ACE-ID is a political issue.
• The ACE-ID itself is an ASCII string, so as soon as an ACE-ID is proposed, it gets registered as an ASCII domain name.
• This actually happened in gTLDs.
• IANA will assign the ACE-ID.
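A small sketch of why the identifier is needed for reverse conversion: without it, a decoder cannot tell an encoded label from an ordinary ASCII label. The prefix `xn--` is the one used in the later published IDNA specification, not the draft-era identifiers shown in these slides. The function name is hypothetical, and the decoding uses the `punycode` codec from Python's standard library.

```python
ACE_PREFIX = "xn--"  # assumption: prefix from the later published standard


def to_unicode_label(label: str) -> str:
    """Decode a label only if it carries the ACE identifier."""
    if label.lower().startswith(ACE_PREFIX):
        encoded = label[len(ACE_PREFIX):]
        return encoded.encode("ascii").decode("punycode")
    # No identifier: treat the label as a plain ASCII name.
    return label


if __name__ == "__main__":
    print(to_unicode_label("xn--bcher-kva"))  # bücher
    print(to_unicode_label("example"))        # example
```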
Criteria of ACE selection
• Simple algorithm.
  – For ease of implementation.
  – Interoperability.
• Effective compression for practical IDNs.
  – To accommodate as many characters as possible.
• One-to-one correspondence between encoding and decoding.
  – To avoid alternative encoded representations of one IDN.
  – Security consideration.
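The one-to-one criterion can be spot-checked with a round trip. This sketch uses the `punycode` codec from Python's standard library, which follows the later published Punycode specification and may differ in detail from the draft discussed here. The function name and sample labels are illustrative assumptions.

```python
def round_trips(label: str) -> bool:
    """Check that encoding yields pure ASCII and decodes back to the input."""
    encoded = label.encode("punycode")
    is_ascii = all(byte < 128 for byte in encoded)
    return is_ascii and encoded.decode("punycode") == label


if __name__ == "__main__":
    samples = ["日本語ドメイン名試験", "bücher", "文字列例"]
    print(all(round_trips(s) for s in samples))
    # Distinct labels should never share an encoded form.
    print(len({s.encode("punycode") for s in samples}) == len(samples))
```

A round trip on a few samples does not prove uniqueness, but a failure would show that the criterion is violated.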
Comparison of ACE proposals
Encoding sample of ‘日本語ドメイン名試験.JP’
RACE BQ--3BS6KZZMRKPDBSJQ4EYKIMHTKQGYUZU2CM.JP
Punycode ZQ--ECKWD4C7C777U7MWO4BOV4JIOAU09J.JP
[Figure: evaluation result from existing Japanese JP domain names]
Punycode draft-ietf-idn-punycode-00.txt
• The ACE selected by the IDN WG.
• Compression algorithm.
  – Extract characters in ascending order of code point.
  – Encode the difference from the previously processed character’s code point, together with the position, into an integer.
  – Extract letters, digits and hyphens as the bootstring.
• ASCII conversion algorithm.
  – Introduces a new concept named ‘Generalized variable-length integers’.
  – BASE36 (A-Z, 0-9).
Compression process of Punycode (simplified for understanding)
• “文字列例”
• Compression.
1. Code points by position: 1:U+6587 2:U+5B57 3:U+5217 4:U+4F8B
2. Sort, diff: 4:0x4F8B 3:0x28C 2:0x940 1:0xA30
3. To integer (diff*chars+position): 0x13E30 0xA33 0x2502 0x28C1
Generalized variable-length integers of Punycode
• 12345 in decimal is represented as 1*10^4+2*10^3+3*10^2+4*10^1+5*10^0.
• The digits in every place are 0-9, so within the sequence 12345 one cannot tell whether it means 123 and 45, or 1234 and 5.
• Furthermore, 012345 and 12345 are the same value with different representations.
• GVLI (Generalized variable-length integers) is an idea to solve this problem.
• It defines a threshold for each place, and a digit below the threshold is recognized as the delimiter.
• The threshold is an appropriate number smaller than the base.
Encoding process of Punycode (simplified for understanding)
• Assign A-Z, 0-9 to the GVLI digits.
  – Assume base 36 and thresholds 10, 18, 25, 25.
  – In this example digit values 0-9 are written 0-9 and 10-35 are written A-Z.
  – Place weights: 1, 26 (=1*(36-10)), 468 (=26*(36-18)), 5148 (=468*(36-25)).
1. 0x13E30 0xA33 0x2502 0x28C1
2. OIUD = 24*1+18*26+30*468+13*5148
3. BS4 = 11*1+28*26+4*468
4. AMJ = 10*1+22*26+19*468
5. XML = 33*1+22*26+21*468
• “文字列例” => “OIUDBS4AMJXML”.
  – Real Punycode generates “FSQW5D78MBSK”.
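The simplified compression and GVLI encoding walked through above can be sketched as follows. This follows the simplified slide procedure (base 36, thresholds 10, 18, 25, 25), not the real Punycode algorithm. The function names are hypothetical, and the digit alphabet (0-9 for values 0-9, A-Z for 10-35) is inferred from the worked digits. Computed directly from the code points, the third difference is 0x5B57 - 0x5217 = 0x940, so the sketch yields the integers 0x13E30, 0xA33, 0x2502, 0x28C1 and the string OIUDBS4AMJXML.

```python
BASE = 36
THRESHOLDS = [10, 18, 25, 25]  # the last threshold is reused for later places
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def compress(text: str) -> list[int]:
    """Sort by code point, take differences, fold in the 1-based position."""
    ordered = sorted((ord(ch), pos) for pos, ch in enumerate(text, start=1))
    values, previous = [], 0
    for code_point, position in ordered:
        values.append((code_point - previous) * len(text) + position)
        previous = code_point
    return values


def gvli_encode(value: int) -> str:
    """Emit little-endian digits; a digit below the threshold ends the integer."""
    out, place = [], 0
    while True:
        t = THRESHOLDS[min(place, len(THRESHOLDS) - 1)]
        if value < t:
            out.append(DIGITS[value])
            return "".join(out)
        out.append(DIGITS[t + (value - t) % (BASE - t)])
        value = (value - t) // (BASE - t)
        place += 1


def gvli_decode_all(text: str) -> list[int]:
    """Split a concatenated digit string back into integers, no separators needed."""
    values, value, weight, place = [], 0, 1, 0
    for ch in text:
        digit = DIGITS.index(ch)
        t = THRESHOLDS[min(place, len(THRESHOLDS) - 1)]
        value += digit * weight
        if digit < t:  # delimiter digit: this integer is complete
            values.append(value)
            value, weight, place = 0, 1, 0
        else:
            weight *= BASE - t
            place += 1
    return values


if __name__ == "__main__":
    integers = compress("文字列例")
    encoded = "".join(gvli_encode(v) for v in integers)
    print([hex(v) for v in integers])
    print(encoded)
    print(gvli_decode_all(encoded) == integers)
```

The decoder recovers all four integers from one unbroken string, which is the property the threshold rule is meant to provide.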
Standardization of IDN is just the starting point of utilization
• End users use IDNs through application software.
  – Web, Mail, etc.
• IDNA requires support from applications.
• Must define how to handle IDNs in application protocols.
Standardization of IDN does not mean it is ready to use. It is just a starting point for applications to incorporate the new features.
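HTTP Request (DNS resolve only)
[Figure: Web user, DNS and Web server]
• User enters: http://ジェーピーニック.JP/
• DNS query: ZQ--HCKQZ9BZB1CYRB.JP, answered with the Web server’s IP address.
• Request sent to the Web server:
  GET http://ジェーピーニック.JP/ HTTP/1.1
  Host: ジェーピーニック.JP
  Referer: http://ジェーピーニック.JP/
• Response: Error!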
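HTTP Request (ACE in HTTP header)
[Figure: Web user, DNS and Web server]
• User enters: http://ジェーピーニック.JP/
• DNS query: ZQ--HCKQZ9BZB1CYRB.JP, answered with the Web server’s IP address.
• Request sent to the Web server:
  GET http://ZQ--HCKQZ9BZB1CYRB.JP/ HTTP/1.1
  Host: ZQ--HCKQZ9BZB1CYRB.JP
  Referer: http://ZQ--HCKQZ9BZB1CYRB.JP/
• Response: Contents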
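A minimal sketch of the second case, where the client converts the host name to ACE before building the request headers. It uses the `idna` codec from Python's standard library, which follows the later published standard and therefore produces an `xn--` label rather than the draft-era `ZQ--` form shown on the slide. `build_request` is a hypothetical helper, and the header set simply mirrors the slide.

```python
def build_request(host: str, path: str = "/") -> str:
    """Build HTTP/1.1 request text with the host name in ACE form."""
    ace_host = host.encode("idna").decode("ascii")  # NAMEPREP + ACE conversion
    url = f"http://{ace_host}{path}"
    return (
        f"GET {url} HTTP/1.1\r\n"
        f"Host: {ace_host}\r\n"
        f"Referer: {url}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    request = build_request("ジェーピーニック.jp")
    print(request)
    # Every byte on the wire is ASCII, so existing servers can parse it.
    print(all(ord(ch) < 128 for ch in request))
```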
References
• IETF IDN WG Web page
  – http://www.i-d-n.net/
• Unicode Consortium
  – http://www.unicode.org/
Acknowledgement
• Telecommunications Advancement Organization of Japan (TAO).
  – JPNIC’s research on the security of IDN is part of TAO’s research.
  – http://www.shiba.tao.go.jp/
IDN Compliant clients & implementations
• Mozilla http://playground.i-dns.net/mozilla/index.html
  – Plug-in for Mozilla, resolution using RACE
• Opera http://www.opera.com/
  – Native, resolution using RACE
• Internet Explorer 5 or higher http://www.microsoft.com/windows/ie/default.asp
  – Uses a keyword search engine as the RACE converter
• mDNkit http://www.nic.ad.jp/jp/research/idn/mdnkit/download/
  – Open-source toolkit for developing IDN-compliant software