Unicode 3.0.1 (Mark Davis)

Unicode 3.0.1 Mark Davis www.macchiato.com


Slide 1. Unicode 3.0.1

  • Mark Davis
  • www.macchiato.com

Slide 2. Unicode 3.0: New 3.0 Characters

  Category                  V 2.1     V 3.0
  Alphabetics, Symbols      6,511     10,236
  CJK Ideographs            21,204    27,786
  Hangul Syllables          11,172    11,172
  Assigned characters       38,887    49,194
  Unassigned code values    18,134    7,827

  • Synced with ISO/IEC 10646, 2nd edition

Slide 3. Unicode 3.0: New 3.0 Blocks

  Size     Block
  80       Syriac
  192      Thaana
  128      Sinhala
  160      Myanmar
  384      Ethiopic
  96       Cherokee
  640      U.C. Ab. Syl.
  32       Ogham
  96       Runic
  128      Khmer
  176      Mongolian
  256      Braille
  128      CJK Rad. Sup.
  224      Kangxi Rad.
  16       Ideo. Desc.
  32       Bopomofo Ext.
  6,582    CJK Ideo. A
  1,168    Yi Syllables
  64       Yi Radicals

Slide 4. Unicode 3.0: Property Updates (1)

  • Bidirectional properties
  • Byte order mark
  • Capital letters with iota adscript
  • Case
  • Combining classes
  • Decompositions

Slide 5. Unicode 3.0: Property Updates (2)

  • Identifier Syntax
  • Layout controls
  • Linebreak properties
  • East-Asian width properties
  • Misc. Characters: Figure Space, Tilde, Ligature Control
  • Unassigned Code Points

Slide 6. Unicode 3.0: Conformance

  • Unicode Transformation Formats: UTF-16 BE, UTF-16 LE, UTF-16, UTF-8
  • Unicode Bidirectional Behavior
  • Other normative character property values
  • Clause numbering maintained!
  • Stability Policies
  • Clarification of noncharacters
  • Normalization Conformance Test

Slide 7. Unicode 3.0: Unicode Standard Annexes (UAX)

  • Integral part of 3.0.1 Standard
  • UAX #09: BIDI
  • UAX #11: East Asian Width
  • UAX #13: Newline Guidelines
  • UAX #14: Line Breaking
  • UAX #15: Normalization
  • Included in any reference to version 3.0 or later

Slide 8. Unicode 3.0: Unicode Technical Standards (UTS)

  • UTS #06: Compression
      • IANA name: SCSU
  • UTS #10: Collation
      • Note: defined over all Unicode code points
      • Values will be updated soon for better ordering

Slide 9. Unicode 3.0: Technical Reports

  • UTR #07: Language Tags
  • UTR #16: UTF-EBCDIC
  • UTR #17: Character Encoding Model
  • UTR #18: Regular Expressions
  • UTR #19: UTF-32
  • UTR #21: Case Mappings

Slide 10. Unicode 3.0: Draft Technical Reports

  • UTR #20: Unicode in XML
  • UTR #22: Character Mapping Tables
  • UTR #24: Script Names
  • Open for public comment

Slide 11. Unicode 3.0: Unicode Character Database

  • More Documentation, More Data
  • UnicodeData
  • Blocks
  • ArabicShaping
  • Jamo
  • CompositionExclusions
  • SpecialCasing
  • EastAsianWidth
  • LineBreak
  • Unihan
  • BidiMirroring
  • CaseFolding
  • NormalizationTest

Slide 12. Unicode 3.0: Website changes

  • New Look & Feel
  • New Navigation
  • Enhanced FAQ
  • Glossary
  • What is Unicode?
  • Where is my character?

Slide 13. Unicode 3.0: Beyond 3.0

  • Characters: CJK characters, symbols, music systems, ancient scripts, extra characters, etc.
  • First allocated surrogate pairs
  • Properties essential for Unicode enablement

Slide 14. Unicode 3.0: Major new version

  • Over 10,000 new characters
  • Enhanced character data for implementations
  • Reorganized text for better reference
  • The version for normalization
  • Unicode Character Database 3.0.0
  • Available now!
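
The encoding forms named on the Conformance slide (UTF-16 BE, UTF-16 LE, UTF-16, UTF-8) and the surrogate pairs mentioned under "Beyond 3.0" can be illustrated with a short Python sketch. It is not part of the slides: the sample code point U+1D11E and the helper name `encode_all` are assumptions chosen only for illustration.

```python
import codecs

def encode_all(text):
    """Encode `text` in each form listed on the slide (hypothetical helper)."""
    return {
        "UTF-8": text.encode("utf-8"),
        "UTF-16BE": text.encode("utf-16-be"),  # big-endian, no byte order mark
        "UTF-16LE": text.encode("utf-16-le"),  # little-endian, no byte order mark
        "UTF-16": text.encode("utf-16"),       # byte order mark, then platform byte order
    }

# Assumed sample: one code point outside the 16-bit range,
# so UTF-16 has to represent it as a surrogate pair.
sample = "\U0001D11E"
forms = encode_all(sample)

for name, data in forms.items():
    print(name, data.hex())

# The unmarked UTF-16 form starts with a byte order mark in one of two orders.
has_bom = forms["UTF-16"][:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
print("UTF-16 starts with a BOM:", has_bom)
```

In UTF-16BE the sample becomes the surrogate pair D834 DD1E; UTF-16LE carries the same two code units with the bytes of each unit swapped, and UTF-8 uses four bytes.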
Slide 15. Unicode 3.0: Q & A

Slide 16. Unicode 3.0: Backup Slides

Slide 17. Unicode 3.0: ICU: Paid Advertisement

  • Open Source Unicode Enablement Library
  • ICU: C/C++ and Java Versions
  • IBM Public License
  • Friday, 10:00, Helena Shih
  • http://oss.software.ibm.com/icu

Slide 18. Unicode 3.0: Enumerated Versions

  • Unicode 1.0.0, Unicode 1.0.1
  • Unicode 1.1.0, Unicode 1.1.5
  • Unicode 2.0.0
  • Unicode 2.1.2, Unicode 2.1.5, Unicode 2.1.8, Unicode 2.1.9
  • Unicode 3.0.0
  • www.unicode.org

Slide 19. Unicode 3.0: Editorial Committee

  • Joan Aliprand
  • Julie Allen (editor)
  • Joe Becker
  • Mark Davis
  • Asmus Freytag
  • John Jenkins
  • Mike Ksar
  • Rick McGowan
  • Lisa Moore
  • Ken Whistler

Slide 20. Unicode 3.0: New Characters (2)

  Category                  V 2.1     V 3.0
  Private Use               6,400     6,400
  Surrogates                2,048     2,048
  Controls                  65        65
  Not Characters            2         2
  Assigned code values      47,402    57,709
  Unassigned code values    18,134    7,827

Slide 21. Unicode 3.0: Reference to Versions

  • Open repertoire, but backwards compatible
  • Characters only added, not removed
  • Two early exceptions: ISO sync. & Korean
  • Don't overspecify the version: Version 2.1.0 vs. Version 2.1 vs. Version 2 or later
  • Includes Technical Reports!!

Slide 22. Unicode 3.0: Versions of the Standard

  • major - significant additions
      • published as a book
  • minor - character additions or more significant normative changes
      • published as a Technical Report
  • update - any other changes
      • on the website in /standard/versions/
  • Example: 2.1.9

Slide 23. Unicode 3.0

  • Versioning
  • Characters
  • Properties
  • Conformance
  • Technical Reports
  • Unicode Character Database
  • Future

Slide 24. Unicode 3.0: Reorganized Text

  • 6: Punctuation
  • 7: European Alphabetics
  • 8: Middle Eastern
  • 9: South Asian
  • 10: East Asian
  • 11: Other (Mongolian, etc.)
  • 12: Symbols
  • 13: Formatting, Controls, Specials

Slide 25. Unicode 3.0: Additionally

  • Shift-JIS Index
  • Full Radical Stroke Index
  • CJK split in several blocks
  • Improved Charts
      • Especially for CJK Ideographs
  • Improved Implementation Guidelines
  • General Clarifications
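
The figures on the two "New Characters" slides (slides 2 and 20) are consistent with each other, which a few lines of Python can confirm. The sketch below only re-adds the numbers given on those slides; the variable names and the `column_sum` helper are assumptions introduced for illustration.

```python
# Tallies copied from the "New 3.0 Characters" and "New Characters (2)" slides.
# Each tuple is (Version 2.1, Version 3.0).
characters = {
    "Alphabetics, Symbols": (6511, 10236),
    "CJK Ideographs": (21204, 27786),
    "Hangul Syllables": (11172, 11172),
}
other_code_values = {
    "Private Use": (6400, 6400),
    "Surrogates": (2048, 2048),
    "Controls": (65, 65),
    "Not Characters": (2, 2),
}
unassigned = (18134, 7827)

def column_sum(rows, col):
    """Sum one version column across all rows (hypothetical helper)."""
    return sum(values[col] for values in rows.values())

for col, version in enumerate(("2.1", "3.0")):
    assigned_characters = column_sum(characters, col)
    assigned_code_values = assigned_characters + column_sum(other_code_values, col)
    total = assigned_code_values + unassigned[col]
    print(version, assigned_characters, assigned_code_values, total)

# Prints:
# 2.1 38887 47402 65536
# 3.0 49194 57709 65536
```

Both columns add up to 65,536 code values, the size of the 16-bit code space, and the difference between the two "Assigned characters" figures is 10,307, which matches the "Over 10,000 new characters" claim on the summary slide.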