ARCH-12 Broaden Your Potential Customer Base Using Unicode

Page 1: ARCH-12 Broaden Your Potential Customer Base Using Unicode

ARCH-12 Broaden Your Potential Customer Base Using Unicode

David Lund, Sr. Training Program Manager, Progress

Page 2

2 © 2005 Progress Software Corporation ARCH-12, Unicode

Broaden Your Potential Customer Base Using Unicode

Unicode is the best way to support multiple languages

A number of recent OpenEdge™ enhancements facilitate Unicode

OpenEdge tools simplify the task

Page 3

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 4

Unicode Essentials

Unicode is the foundation for creating internationalized and localized applications

Unicode provides a unique number for every character

Lossless round-tripping
– Mapping any Unicode coded character sequence S to a sequence of bytes and back produces S again
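
The round-tripping property can be checked directly. This Python sketch is a language-neutral illustration, not OpenEdge 4GL code, and the sample string is an arbitrary assumption. It encodes the text with each UTF and confirms that decoding returns the original.

```python
# Language-neutral illustration of lossless round-tripping (not OpenEdge code).
# Sample text (arbitrary): Latin with diaeresis, French, Cyrillic, CJK, and an emoji.
SAMPLE = "Zo\u00eb, \u00e9cole, \u041c\u043e\u0441\u043a\u0432\u0430, \u6771\u4eac, \U0001F600"


def round_trips(text: str, encoding: str) -> bool:
    """Encode text to bytes with the given UTF, decode it again, and compare."""
    return text.encode(encoding).decode(encoding) == text


results = {enc: round_trips(SAMPLE, enc) for enc in ("utf-8", "utf-16", "utf-32")}
print(results)  # {'utf-8': True, 'utf-16': True, 'utf-32': True}
```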

Page 5

Unicode Essentials

UTF (Unicode Transformation Format)
– Algorithm for mapping (encoding) each Unicode scalar value to a unique sequence of code units
– Three formats (mappings): UTF-8, UTF-16, UTF-32
– The formats differ in how they perform the mapping, which affects access, storage, and performance
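
To make the storage trade-off concrete, this Python sketch (an illustration, not OpenEdge code; the sample characters are arbitrary) counts the bytes each format uses per character. UTF-8 takes 1 to 4 bytes, UTF-16 takes 2 or 4, and UTF-32 always takes 4.

```python
# Illustration (not OpenEdge code): bytes needed per character in each UTF.
SAMPLES = {
    "A": "A",               # U+0041, ASCII
    "e-acute": "\u00e9",    # U+00E9, Latin-1 range
    "CJK": "\u6771",        # U+6771, Basic Multilingual Plane
    "emoji": "\U0001F600",  # U+1F600, outside the BMP
}


def byte_lengths(ch: str) -> dict:
    # The "-le" codecs are used so that no byte order mark (BOM) is counted.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for name, ch in SAMPLES.items():
    print(f"{name:8} {byte_lengths(ch)}")
```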

Page 6

Unicode Essentials

Code page: a table that assigns a numeric value to each character
– Letters, numbers, punctuation, control codes, etc.
– prolang\list-cp.p lists the code pages in CONVMAP

Sample code page: IBM850 (partial)
– Character ‘2’ is hex 32 (column 30, row 2)

       20    30    40
  0    SP    0     @
  1    !     1     A
  2    "     2     B
  3    #     3     C
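
Python ships a cp850 codec, so the sample IBM850 lookup can be reproduced in a few lines. This sketch is an illustration, not an OpenEdge facility, and code_value is a hypothetical helper name. Each character's value is its column base plus its row number, so ‘2’ is 0x30 + 2 = 0x32.

```python
# Illustration (not OpenEdge code) using Python's built-in "cp850" codec.
def code_value(ch: str, codepage: str = "cp850") -> int:
    """Return the byte value that a single-byte code page assigns to ch."""
    encoded = ch.encode(codepage)
    if len(encoded) != 1:
        raise ValueError("expected a single-byte code page")
    return encoded[0]


# Rebuild the partial table: rows 0-3 added to column bases 0x20, 0x30, 0x40.
for row in range(4):
    cells = [bytes([col + row]).decode("cp850") for col in (0x20, 0x30, 0x40)]
    print(row, cells)

print(hex(code_value("2")))  # 0x32
```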

Page 7

Progress I18N Essentials

Undefined code page
– Tells Progress not to do any conversions when reading or writing data
– For example, the Sports database uses the undefined code page, so it can be used with any character set

I18N (Internationalization)

Page 8

Progress I18N Essentials

Startup parameters
– -cpinternal: code page used for internal data processing
– -cpstream: code page used for stream files
– Parameter file prolang\UTF\UTF-8.pf:

-cpinternal utf-8 -cpstream utf-8

I18N (Internationalization)

Page 9

Progress I18N Essentials

Performing code page conversions
– Progress provides a character set management facility
– It automatically converts data between the code pages of different data sources and targets
– The code pages involved must be in the CONVMAP file

Targets for code page conversion
– Memory (-cpinternal)
– Streams (-cpstream)
– Databases
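
The conversion facility can be sketched in Python to show the two essential steps: decode from the source code page, then encode into the target, refusing any code page that is not in the conversion map. This is a hypothetical illustration, not the OpenEdge API; PY_CODEC stands in for CONVMAP, and the OpenEdge-style names in it are assumptions.

```python
# Sketch only: PY_CODEC is a hypothetical stand-in for the CONVMAP table,
# mapping OpenEdge-style code page names to Python codec names.
PY_CODEC = {"iso8859-1": "latin-1", "ibm850": "cp850", "utf-8": "utf-8"}


def convert(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Convert bytes between code pages, refusing pages missing from the table."""
    for cp in (source_cp, target_cp):
        if cp not in PY_CODEC:
            raise LookupError(f"code page {cp!r} is not in the conversion map")
    text = data.decode(PY_CODEC[source_cp])  # source bytes -> characters
    return text.encode(PY_CODEC[target_cp])  # characters -> target bytes


# A stream file written in IBM850 (-cpstream) read into a UTF-8 session (-cpinternal).
stream_bytes = b"\x82cole"  # "école" in IBM850
print(convert(stream_bytes, "ibm850", "utf-8"))  # b'\xc3\xa9cole'
```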

Page 10

Progress I18N Essentials

Referenced code pages must be in CONVMAP

Modifying CONVMAP
– Edit convmap.dat
– Compile CONVMAP:

proutil <dbname> -C CODEPAGE-COMPILER convmap.dat convmap.cp

– Make convmap.cp available to the session, using one of:
  the Progress installation directory
  the PROCONV environment variable
  the -convmap startup parameter

Page 11

Progress I18N Essentials

Converting characters or strings in memory
– Specify the code page in the functions ASC, CHR, and CODEPAGE-CONVERT

Converting input and output data
– Specify the code page in the statements
  INPUT FROM (input source to memory target)
  OUTPUT TO (memory source to output target)
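
Functions such as ASC and CHR take a code page argument because the same character has different numeric values in different code pages. This Python sketch (not OpenEdge code) illustrates the idea. asc_in and chr_in are hypothetical helper names, and combining a multi-byte sequence into one big-endian integer is an assumption made for illustration, not a statement about how the 4GL functions compute their results.

```python
# Sketch (not OpenEdge code): code-page-aware character values.
# asc_in and chr_in are hypothetical helpers, not 4GL functions.
def asc_in(ch: str, codepage: str) -> int:
    """Value of ch's byte sequence in codepage, multi-byte combined big-endian."""
    return int.from_bytes(ch.encode(codepage), "big")


def chr_in(value: int, codepage: str) -> str:
    """Inverse of asc_in: turn an integer back into a character."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode(codepage)


print(asc_in("\u00e9", "latin-1"))  # 233 (0xE9)
print(asc_in("\u00e9", "utf-8"))    # 50089 (0xC3A9)
print(chr_in(50089, "utf-8"))       # é
```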

Page 12

Fonts for Unicode

Locating fonts on Windows
– C:\WINDOWS\Fonts
– Control Panel, select the Fonts icon
– Unicode fonts may need to be purchased

Setting Unicode fonts for Progress
– Progress.ini
– Use ini2reg.exe to place the settings in the registry

Page 13

System Resources

Page 14

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 15

Migrating a Database to Unicode

Two ways to migrate a database to Unicode
– Dump and load
– Convert the database without doing a dump and load

Start an OpenEdge session with the startup parameters

-cpinternal UTF-8 -cpstream UTF-8

Page 16

Migrating a Database to Unicode

Cautions:
– Back up your database
– Dump definitions and data
– Do not do a binary dump and load: binary data is not converted to the code page of the database when it is loaded
– Always use the Data Administration tool, which performs the code page conversion automatically

Using dump and load 1 of 3

Page 17

Migrating a Database to Unicode

Create an empty UTF-8 database
– In the Data Administration tool, select Database > Create Database
– In the Create Database dialog, select the radio-set option to create a copy of some other database
– Select an empty database from prolang/UTF-8, for example empty4.db

Using dump and load 2 of 3

Page 18

Migrating a Database to Unicode

Load the definitions
– The load converts them to UTF-8 automatically

Load the data
– Data is automatically converted from the dumped code page to UTF-8 when it is loaded

Using dump and load 3 of 3
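
The warning against a binary dump and load can be illustrated with a Python analogy (not OpenEdge code; the IBM850 source code page is an assumption). Copying the bytes unchanged leaves IBM850 bytes that are not valid UTF-8, while decoding from the dump's code page and re-encoding produces correct UTF-8.

```python
# Analogy (not OpenEdge code): binary copy versus converting load.
original = "\u00e9cole"              # "école"
dumped = original.encode("cp850")    # dumped from an IBM850 database (assumption)

# Binary-style load: bytes copied unchanged into a UTF-8 database.
binary_loaded = dumped
# Converting load: decode from the dump's code page, then re-encode as UTF-8.
converted_loaded = dumped.decode("cp850").encode("utf-8")

# 0x82 is not a valid UTF-8 lead byte, so it decodes to U+FFFD (replacement char).
print(binary_loaded.decode("utf-8", errors="replace"))
print(converted_loaded.decode("utf-8"))  # école
```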

Page 19

Migrating a Database to Unicode

– Back up your database
– Use proutil to convert the database:

proutil <db-name> -C convchar convert UTF-8

– Load the UTF-8 collation table: prolang/UTF/_tran.df
– Assign word-break rules to the database
– Rebuild the indexes:

proutil <db-name> -C idxbuild

Converting without a dump and load

Page 20

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 21

Benefits of GUI Unicode Client

Multilingual
– Able to use data from multiple languages in the same session
– Fully enables AppBuilder to build multilingual UTF-8 applications

Easier deployment
– Lower costs, higher ROI
– No need for separate configurations with language-specific settings

Increased competitive advantage
– No (or very few) changes required to existing applications to take advantage of the GUI Unicode client

Added in the OpenEdge 10.0A release

Page 22

Unicode Editor

RichEdit editor in OpenEdge 10
– Supports Unicode

Selecting an editor
– Modify UseSourceEditor in progress.ini
– Default, SlickEdit:
  UseSourceEditor=yes
– For Unicode, use RichEdit:
  UseSourceEditor=no

Page 23

Demonstration

GUI Unicode Client

Multiple Languages

Page 24

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 25

Linguistic Sorting

Language-sensitive collations
– Tailored to the expectations of a locale: language and country

Easy to use
– Functions just like any other collation in the 4GL

The goal …

Page 26

Unicode Sorting

OpenEdge 10.0A supports binary sorting
– Basic collation support
– Sorts by value in the code page
– User-defined sorting is possible

OpenEdge 10.0B also supports linguistic sorting
– Supports ICU (International Components for Unicode) collations
– OpenEdge does not support multiple collations in the database

Page 27

Binary versus Linguistic Sorting: A Visual

Binary sort    Linguistic sort, English (ICU-en)
beet           beet
carrot         carrot
entry          çedilla
trust          école
zoom           entry
école          trust
çedilla        zoom
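
The two columns can be approximated in Python. Plain code-point order is a binary sort; it also places both accented words after zoom, although by raw value ç (U+00E7) comes before é (U+00E9). The second key is a crude accent-folding stand-in (decompose, drop combining marks, lowercase), not ICU's collation algorithm; for this word list it happens to reproduce the ICU-en column. This is an illustration, not OpenEdge code.

```python
import unicodedata

WORDS = ["beet", "carrot", "\u00e7edilla", "entry", "\u00e9cole", "zoom", "trust"]


def folded(word: str) -> str:
    """Crude sort key (not ICU): decompose (NFD), drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


binary = sorted(WORDS)              # by code point, like a binary sort
approx = sorted(WORDS, key=folded)  # accent-folded approximation of a linguistic sort

print(binary)
print(approx)
```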

Page 28

Linguistic Sorting

Progress uses collations for:
– The -cpcoll session startup parameter
– Database collation
– Collation of a database CLOB column
– Argument to the COMPARE function
– COLLATE option of the BY phrase

Page 29

Linguistic Sorting: Supported Collations

OpenEdge supports all ICU collations in the icui18n library
– Beyond icui18n, one additional collation is supported: Japanese Hiragana Quaternary, as case-sensitive

Page 30

Linguistic Sorting

4GL usage: reference a collation by name
– For example, "ICU-fr" for French

Specify using
– -cpcoll <table name>
  Identifies the collation table to use with the code page in memory at session startup
  <table name> is the name of a collation table in convmap.cp or the name of an ICU collation
– The 4GL functions COMPARE and COLLATE

Page 31

Linguistic Sorting

/* French collation */
DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")) FORMAT "x(16)".
/* Spanish collation */
DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")) FORMAT "x(16)".

ICU-fr = yes
ICU-es = no

Sort order depends on the selected collation

Output of the above statements
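
One mechanism that can produce this difference is the traditional French rule of comparing accents from the end of the word ("backward secondary" ordering), while most other languages compare them from the start. The Python toy model below is an assumption-laden sketch, not ICU and not OpenEdge code: letters decide first, then the accents attached to each letter, and accents are ranked simply by code point.

```python
import unicodedata


def split_levels(word: str):
    """Return (base letters, accent marks attached to each base letter)."""
    bases, marks = [], []
    for c in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(c) and marks:
            marks[-1] += c  # attach the mark to the preceding base letter
        else:
            bases.append(c)
            marks.append("")
    return "".join(bases), marks


def less_than(a: str, b: str, backward_accents: bool) -> bool:
    base_a, marks_a = split_levels(a)
    base_b, marks_b = split_levels(b)
    if base_a != base_b:            # primary level: letters decide
        return base_a < base_b
    if backward_accents:            # French-style: compare accents from the end
        marks_a, marks_b = marks_a[::-1], marks_b[::-1]
    return marks_a < marks_b        # secondary level: accents decide (toy ranking)


print(less_than("c\u00f4te", "cot\u00e9", backward_accents=True))   # True, like ICU-fr
print(less_than("c\u00f4te", "cot\u00e9", backward_accents=False))  # False, like ICU-es
```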

Page 32

Linguistic Sorting: Examples 1 of 4

Examples
– UTF-8 database with "basic" collation
– Names: beet, carrot, çedilla, entry, école, zoom, trust

FOR EACH words WHERE name < "t":
    DISPLAY name.
END.

beet
carrot
entry

Output result

Page 33

Linguistic Sorting: Examples 2 of 4

FOR EACH words WHERE name >= "t":
    DISPLAY name.
END.

trust
zoom
école
çedilla

Output result

Page 34

Linguistic Sorting: Examples 3 of 4

FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
    DISPLAY name.
END.

beet
carrot
entry
école
çedilla

Output result

Page 35

Linguistic Sorting: Examples 4 of 4

FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
    BY COLLATE(name, "case-insensitive", "ICU-en"):
    DISPLAY name.
END.

beet
carrot
çedilla
école
entry

Output result

Page 36

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 37

Unicode Normalization

Why is this needed?
– Puts text in NFC form, as expected by XML (and other W3C specifications)
– Best way to prepare text for conversion from Unicode to other code pages
– Useful for tasks such as making comparisons

Page 38

Unicode Normalization

Unicode has different ways of expressing the same character
– Base letter plus combining marks (accents) as separate code points
  Á = decomposed: U+0041 (Latin Capital Letter A) + U+0301 (Combining Acute Accent)
– Base letter and accent as one code point
  Á = precomposed: U+00C1 (Latin Capital Letter A with Acute)

What is normalization?
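
The two spellings of Á compare as different strings until they are normalized. This Python sketch (not OpenEdge code) uses the standard unicodedata module to show that NFC composes the pair into U+00C1 and NFD splits U+00C1 back into U+0041 U+0301.

```python
import unicodedata

decomposed = "A\u0301"   # U+0041 + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00c1"   # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE

print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", precomposed)])
# ['U+0041', 'U+0301']
```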

Page 39

Unicode Normalization

NORMALIZE
– 4GL function, new in OpenEdge 10.0B
– Returns either CHARACTER or LONGCHAR, matching the type of the source string
  CHARACTER: must be UTF-8
  LONGCHAR: any form of Unicode (UTF-8, UTF-16, UTF-32)

result-string = NORMALIZE(source-string, normalization-mode)

Page 40

Normalization Modes Supported

NFD – Canonical Decomposition
NFC – Canonical Decomposition, followed by Canonical Composition
NFKD – Compatibility Decomposition
NFKC – Compatibility Decomposition, followed by Canonical Composition
None – No change to the source string; turns off normalization when normalization-mode is a variable
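
Python's unicodedata module implements the same four modes, which makes the difference between canonical and compatibility forms easy to see. In this sketch (not OpenEdge code; the sample strings are arbitrary), NFC and NFD change only how the accented A is encoded, while NFKD and NFKC also replace the compatibility characters (the fi ligature and superscript two) with plain f, i, and 2. Python has no "None" mode; leaving the string unchanged is the equivalent.

```python
import unicodedata

SAMPLES = [
    "\u00c1",   # LATIN CAPITAL LETTER A WITH ACUTE (precomposed)
    "\ufb01",   # LATIN SMALL LIGATURE FI (compatibility character)
    "x\u00b2",  # x followed by SUPERSCRIPT TWO (compatibility character)
]


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


for mode in ("NFD", "NFC", "NFKD", "NFKC"):
    print(mode, [code_points(unicodedata.normalize(mode, s)) for s in SAMPLES])
```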

Page 41

Agenda - Implementing Unicode

Essentials

Migrating a Database

Unicode Client

Sorting

Normalization

Other Areas to Consider

Page 42

Bidi Support

Bidirectional (bidi)
– Ability of individual widgets and/or the complete window to run right to left or left to right

Supported
– Fill-in widget: can type right to left or left to right

Not supported
– Whole frame: cannot switch labels from the left side to the right side

Page 43

GB18030 Code Page Support

Added in OpenEdge 10.0B
– New Chinese code page
– Required for all new software sold in mainland China as of Jan. 1, 2001
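
GB18030 is a multi-byte encoding that covers all of Unicode: ASCII characters take one byte, GBK-compatible characters two bytes, and the remaining characters four bytes. Python ships a gb18030 codec, so the widths can be checked directly (illustration only, not OpenEdge code; the sample characters are arbitrary).

```python
# Illustration (not OpenEdge code) using Python's built-in "gb18030" codec.
SAMPLES = {
    "A": "A",               # ASCII: 1 byte
    "zhong": "\u4e2d",      # U+4E2D, also in GB2312/GBK: 2 bytes
    "emoji": "\U0001F600",  # U+1F600, outside the BMP: 4 bytes
}

encoded = {name: ch.encode("gb18030") for name, ch in SAMPLES.items()}

for name, ch in SAMPLES.items():
    data = encoded[name]
    assert data.decode("gb18030") == ch  # lossless round trip
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    print(f"U+{ord(ch):04X} {name:6} {hex_bytes}")
```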

Page 44

Broaden Your Potential Customer Base Using Unicode

Unicode is the best way to support multiple languages

A number of recent OpenEdge™ enhancements facilitate Unicode

OpenEdge tools simplify the task

In summary

Page 45

Documentation

OpenEdge Development: Internationalizing Applications

Page 46

Unicode Resources

Unicode home page
– http://www.unicode.org
– Unicode Standard, Unicode Consortium

International Components for Unicode
– http://www-124.ibm.com/icu/docs/
– http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/

Page 47

System Resources

Viewing keyboard layouts
– http://www.microsoft.com/globaldev/reference/keyboards.aspx
– Select the language, and the keyboard layout is displayed
– Use the Shift key to toggle between lowercase and uppercase characters
– Use MS Internet Explorer to display the layouts

Page 48

Questions?

Page 49

Thank you for your time!
