Internationalization and Translatability for Beginners
AGIS 09, University of Limerick, 21-September-2009
Ultan Ó Broin
© Ultan Ó Broin September 2009
About
• Ultan Ó Broin
  • Microsoft and Oracle localization and internationalization
  • Oracle Applications User Experience
  • Localization World, Multilingual Web Site, Internationalization Roundtable Advisory Boards
  • Editorial Board Multilingual Magazine
  • Blogos
  • Social media use by disabled research (TCD)
• Caveats about this presentation
  • Personal perspective and opinion
  • Not those of Oracle Corporation
  • Don’t rush out and buy/sell ORCL stock as a result
• Copyright and usage
  • Share-alike non-attribution non-commercial please
  • Screenshots and images remain the copyright of respective owners
  • Products and services may be trademarks of their respective owners
  • Reproduction for promotional work for non-profit use is fine, but play nice and say where you got the information and from whom (@ultan)
© Ultan Ó Broin 2009
Agenda: Internationalization and Translatability
• Definitions
• Organization and process
• Internationalization issues
  • Character Processing
  • International Variables
  • Translatability
• Tools and environments
• What makes sense for the little guy?
• Resources
© Oracle Corporation 2009
Presentation Objectives
• Software and documentation-centric
• Learn about internationalization (I18n) process and responsibilities
• Understand core I18n issues
• Identify key I18n considerations for content development
• Consider what makes sense for you
• Obtain resources for further investigation
• Global user experience
Internationalization Definitions
• “Internationalization is the process of designing a product so that it can be easily localized without the need for redesign… it is the process of designing and implementing a product which is as culturally and technically “neutral” as possible, and which can therefore easily be localized for a specific culture or cultures.”
Localization Industry Standards Association (http://www.lisa.org, accessed 15 April 2007)
Internationalization Definitions
• “Internationalization is the process of re-engineering any information product so that it can be easily localized for export to any country in the world. An internationalized information product consists of two components: core information and international variables.”
Nancy Hoft, International Technical Communication, 1995, p. 19
• Core information: Same code used by the same product in different environments
• International variables: political, economic, social, religious, educational, linguistic, technological – the localizable, cultural, user experience elements
Internationalization Definitions
• “The process of making information technology flexible enough to be used in different cultural and linguistic environments without changing source code.”
• “Allows choice of language and locale (collection of language and cultural preferences).”
• “Internationalized and localized products provide equivalent functionality to users in their own language while observing their cultural conventions.”
Oracle University, Introduction to Product Globalization, 2001
Internationalization Definitions
• “Internationalization: The process of developing a program core whose feature design and code design don’t make assumptions based on a single language or locale and whose source code base simplifies the creation of different language editions of a program.”
Nadine Kano, Developing International Software, 1995, p. 4
Forget This
English Is Just Another Language
• Why assume it’s always written in English anyway?
Questions
• Can you translate any software?
• Should you?
Internationalization: Why?
• The user experience
• Communicate efficiently globally using language, customs, symbols, conventions
• Facilitates cultural adaptation – localization (L10n), customization
• Eliminates cultural bias
• Minimizes management
• Correct market functionality
• Allows for process efficiencies in development and localization - scalability
I18n Costs
• Fix once at source during development
• Very costly to fix later
© LingoPort / Multilingual Magazine 2009
Internationalization: Why?
• Rationale
  • You must localize
  • Globalization
  • Competition
  • Market share
  • Revenue
  • Internet
  • SimShip
  • Legal requirements
  • User experience, engagement, communication
© Salesforce.com 2007
I18n Myths Exploded
• “They all speak English”
• “Once it’s in the reader’s language, it’s OK”
• “Only the stuff they see needs attention”
• “Won’t be translated anyway, so no need”
• “Costs too much”
• “Fix it later - if we have to”
• “Wrote it in Japanese, so it’s fine”
• “It’s open source, whatever”
• “It’s Java”
• “We’re giving it away for free”
• “The user can translate it”
Example
Questions
• Global information sharing
• Can you afford not to internationalize?
• To sell? To communicate? To share information?
• Language matters, but it’s NOT enough.
Internationalization Process
• Internationalization is a development responsibility
• Designed
• Core practice
• Modularization
• Integrate process
• Build, test environments
• No linguistic expertise required
Internationalization Standards
• Some standards/guidelines for free
  • Unicode, XML, Java, HTML and so on
  • Ken Lunde's CJKV Information Processing is a great overview
• Not enough
  • Application of standards
  • Educate developers
  • Provide tools
  • Enforce standards, audit
  • Set priorities
  • Determine warnings versus failures
  • Common sense
Internationalization Standards Examples
Questions
• Which is better: forced or voluntary i18n?
• How might you sell i18n to developers?
• No lectures, slogans, linguistics
Internationalization Issues
• Character processing
• International variables
• Translatability
Character Processing
Character Sets
A character set is a collection of letters, numbers, punctuation marks and signs which are needed to support the creation of text in a language or languages
[Figure: a set of characters (letters, digits, punctuation marks, and signs) grouped into Character Set A and Character Set B]
Character Processing
Character Encoding
Encoding is the process of mapping a character to a bit sequence used to represent the data on the computer
H  e  l  l  o  !
48 65 6c 6c 6f 21
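As a minimal sketch of the mapping described on this slide, the following Python snippet encodes the string "Hello!" as ASCII and lists the byte value for each character. The variable names are illustrative only and are not part of the original presentation.

```python
# Map each character of a string to the byte value that represents it.
text = "Hello!"
encoded = text.encode("ascii")  # ASCII uses one byte per character

for char, byte in zip(text, encoded):
    print(f"{char!r} -> {byte:02x}")

# The same bytes as one space-separated hex string.
hex_bytes = encoded.hex(" ")
print(hex_bytes)
```

Decoding the bytes with the same encoding returns the original text, which is why both sides of a data exchange need to agree on the encoding in use.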
Character Processing
Single-Byte Character Sets
The English language can be represented in 7 bits (ASCII)
To accommodate Western European languages, an extra bit is required (8 bits, 256 characters)
Category      Count   Characters
Alphabet      52      A to Z, a to z
Numbers       10      0 to 9
Punctuation   33      such as ! " # $ % ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Character Processing
Multi-Byte Characters
To represent more than 256 characters, more than one byte is used
Sample: Traditional Chinese Big5
[Figure: byte values of sample Traditional Chinese Big5 text, in which Chinese characters take two bytes each and ASCII characters such as ! (21) take one]
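To make the character-versus-byte distinction concrete, here is a small Python sketch that encodes a short Traditional Chinese string with the Big5 codec from the standard library. The sample string is an assumption chosen for illustration; it is not the text shown on the original slide.

```python
# Two Chinese characters followed by one ASCII character (illustrative sample).
text = "中文!"
big5_bytes = text.encode("big5")

char_count = len(text)        # number of characters the user sees
byte_count = len(big5_bytes)  # number of bytes stored or transmitted

print(char_count, byte_count, big5_bytes.hex(" "))
```

The string has three characters but needs five bytes, because each Chinese character takes two bytes in Big5 while the exclamation mark takes one.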
Character Processing
• “Native” Character Sets
  • ISO 8859-1, Windows-1250, CP852, Shift JIS, Big5, EUC-KR ...
• Different code support
  • Conflicts
  • Gaps
  • Multiple-tier / differences
Character Processing
Unicode
• Multilingual character encoding standard
• Consistent way of encoding multilingual text data internationally
• Foundation for global software
[Figure: Unicode as the common foundation for Arabic, Chinese, French, English, Japanese, and German]
Character Processing
Unicode encoding UTF-16 and UTF-8
• UTF-16 is a variable-length encoding scheme: two bytes for most characters, four bytes for supplementary characters
• UTF-8 is a variable-length encoding scheme: one to four bytes per character

Character              US-ASCII   Latin-1   UTF-8      UTF-16
A  (English alphabet)  41         41        41         00 41
Ç  (French alphabet)   N/A        c7        c3 87      00 c7
あ (Japanese)          N/A        N/A       e3 81 82   30 42
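The byte values in a comparison like this can be reproduced with a short Python sketch. The helper name `hex_or_na` is hypothetical, and the three sample characters (A, Ç, あ) are assumptions matching the code points shown on the slide. The last line also checks a character outside the Basic Multilingual Plane, which needs four bytes in UTF-16.

```python
def hex_or_na(char, encoding):
    """Return the encoded bytes as hex, or "N/A" if the character set lacks the character."""
    try:
        return char.encode(encoding).hex(" ")
    except UnicodeEncodeError:
        return "N/A"

ENCODINGS = ["ascii", "latin-1", "utf-8", "utf-16-be"]

for char in ["A", "Ç", "あ"]:
    row = [hex_or_na(char, encoding) for encoding in ENCODINGS]
    print(char, row)

# A supplementary character is encoded as a surrogate pair in UTF-16.
supplementary_length = len("😀".encode("utf-16-be"))
```

The big-endian variant `utf-16-be` is used here so that no byte order mark is added to the output.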
Character Processing Impact
Impact of Multi-Byte Character Sets
• Database column widths specified in bytes but HTML form input fields specified in characters
• Committing mismatched data = error: Inserted value too large for the column

Table column size: ID NUMBER(3), NAME VARCHAR2(5)
Input text field size: 5 characters (aaaaa)
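One way to catch this mismatch before the database rejects the value is to validate the encoded byte length rather than the character count. The sketch below assumes a UTF-8 database and a column limit expressed in bytes; the function names are hypothetical.

```python
def fits_in_byte_column(value, max_bytes, encoding="utf-8"):
    """True if the encoded value fits a column whose width is given in bytes."""
    return len(value.encode(encoding)) <= max_bytes

def truncate_to_bytes(value, max_bytes, encoding="utf-8"):
    """Drop whole characters from the end until the encoded value fits."""
    while len(value.encode(encoding)) > max_bytes:
        value = value[:-1]
    return value

ascii_name = "aaaaa"         # 5 characters, 5 bytes in UTF-8
japanese_name = "あああああ"  # 5 characters, 15 bytes in UTF-8

print(fits_in_byte_column(ascii_name, 5))     # fits a 5-byte column
print(fits_in_byte_column(japanese_name, 5))  # does not fit
```

Truncating by characters rather than by raw bytes avoids cutting a multi-byte character in half, although silently truncating user input is rarely the right answer; sizing the column correctly is usually better.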
Character Processing Impact
• Code conversion takes place during transfer of data between tiers that use different encoding methods
• Code conversion required if different character sets exist in a single system
• Is code conversion the number one issue?
[Figure: a client using ISO8859-6 sends data to a UTF8 server (converted from ISO8859-6 to UTF8) and receives data from the server (converted from UTF8 to ISO8859-6)]
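The round trip between tiers can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `convert` helper and the Arabic sample word are hypothetical, and a real system would normally let the database driver or web framework perform the conversion.

```python
def convert(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes from one character set and re-encode them in another."""
    return data.decode(source_encoding).encode(target_encoding)

original_text = "مرحبا"  # illustrative Arabic sample, 5 characters

# Client tier: ISO 8859-6 uses one byte per Arabic letter.
client_bytes = original_text.encode("iso8859_6")

# Sent to the server: converted to UTF-8, two bytes per Arabic letter.
server_bytes = convert(client_bytes, "iso8859_6", "utf-8")

# Received from the server: converted back for the client.
reply_bytes = convert(server_bytes, "utf-8", "iso8859_6")

# Decoding with the wrong character set does not fail; it silently corrupts the text.
garbled = client_bytes.decode("latin-1")
```

The last line shows why conversion problems are easy to miss: the mismatch produces no error, only unreadable characters.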
Character Processing Impact
Use Unicode support for:
• Storing
• Inserting
• Editing
• Sorting
• Deleting
• Searching
• Wrapping
• Shaping
• Rendering
• ….
Text field length versus table column: VARCHAR2(5) becomes VARCHAR2(15)
Solution: enlarge the UTF-8 database column to three times its original size
What Character Set to Use?
• Ah, Unicode
• But…
• Which one?
• And what about legacy content?
• Moving native to Unicode?
• Remember data conversion!
International Variables - Locale
• Numbers
• Dates
• Currencies
• Time and time zone
• Address format
• Name format
• Telephone number format
• Statutory compliance
• Language
• More…
Dates
• Different date formats for different locales
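A minimal sketch of why a date cannot be hard-coded in one format: the same date rendered with different locale patterns. The `DATE_PATTERNS` table is a hypothetical, hand-written stand-in for illustration; production code would take these patterns from the operating system or a library such as ICU instead of maintaining its own table.

```python
from datetime import date

# Hypothetical pattern table for illustration only.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",  # month first
    "en_GB": "%d/%m/%Y",  # day first
    "de_DE": "%d.%m.%Y",  # day first, dot separators
    "ja_JP": "%Y/%m/%d",  # year first
}

def format_date(value, locale_code):
    """Render a date using the pattern registered for the given locale."""
    return value.strftime(DATE_PATTERNS[locale_code])

conference_day = date(2009, 9, 21)
for locale_code in DATE_PATTERNS:
    print(locale_code, format_date(conference_day, locale_code))
```

A date such as 04/05/2009 reads as April 5 in the US and as 4 May in the UK, so storing dates in a neutral form and formatting only at display time avoids ambiguity.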
Numbers
• Different number formats

Country   Number format
US        1,234,567.89
Finland   1 234 567,89
Korea     1,234,567.89
Germany   1.234.567,89
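The idea of keeping the number separate from its display format can be sketched as follows. The `SEPARATORS` table and `format_number` function are hypothetical and cover only two locales; they illustrate the principle and are not a substitute for the locale data provided by the operating system or ICU.

```python
# Hypothetical separator table: (grouping separator, decimal separator).
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
}

def format_number(value, locale_code):
    """Format a number with two decimals using the locale's separators."""
    group, decimal = SEPARATORS[locale_code]
    us_style = f"{value:,.2f}"  # for example "1,234,567.89"
    # Swap separators through a placeholder so the two replacements do not collide.
    return (
        us_style.replace(",", "\x00")
        .replace(".", decimal)
        .replace("\x00", group)
    )

amount = 1234567.89
print(format_number(amount, "en_US"))
print(format_number(amount, "de_DE"))
```

The stored value stays the same in every locale; only the rendering changes.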
Currency
• Different currencies worldwide
Others
• Salutation, Telephone, Address formats
• Sort orders (linguistic v logical,…)
• Units of Measure (Imperial, Metric)
• Calendar (Gregorian, Japanese, Islamic…)
• Business, legal rules (HRM, Financials, Manufacturing)
  • VAT
  • Tax
  • GAAP
  • Statutory compliance
  • SSN, PPS, IDs
  • Sarbanes-Oxley
  • Data protection, Privacy
  • …
And of Course: Language
Source: Wikipedia.org entry on Internationalization and localization, accessed 15 April 2007
Locale Variables: Solutions
• Don’t hard-code
• Store independently (MLS)
• Rely on O/S language and country/region settings, ICU, NLS class libraries to display
• Auto-detect, then allow user to select
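The "auto-detect, then allow user to select" approach might look like the sketch below for a web application. It assumes the browser sends a standard HTTP Accept-Language header; the supported-language list, the default, and both function names are hypothetical.

```python
SUPPORTED = ["en", "fr", "ja"]  # hypothetical list of shipped languages
DEFAULT = "en"

def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    ranked = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without a q value has the highest preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranked.append((quality, tag.strip().lower()))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in ranked]

def choose_language(user_choice, accept_language):
    """An explicit user choice wins; otherwise fall back to auto-detection."""
    if user_choice in SUPPORTED:
        return user_choice
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]
        if primary in SUPPORTED:
            return primary
    return DEFAULT

print(choose_language(None, "fr-CA,fr;q=0.9,en;q=0.8"))
print(choose_language("ja", "fr-CA,fr;q=0.9"))
```

Letting the stored user preference override detection matters because the browser or operating system language is not always the language the user wants to read.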
Question
• Any other international variables you can think of?
• Reading Writing Direction (BiDi, Vertical)
  • HTML DIR="RTL"
  • CSS direction
  • unicode-bidi property
Translatability
Translatability means that the product can be translated easily into other languages using an efficient, common sense, scalable process
Translatability
<INPUT name=ExchangeRateType type=radio value=USER <% if((currencyRadioButton==null)||currencyRadioButton.equals("")||currencyRadioButton.equals("USER")) out.print("CHECKED"); %>> User rates specified in the table below </TD>
Externalization - separating strings from software code makes translation safer and easier
Tokens
Context
MESSAGE_CODE  MESSAGE_TEXT  LANGUAGE
------------  ------------  --------
HELLO         Hello         EN
HELLO                       JA
HELLO         Bonjour       F
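A minimal sketch of externalized strings, modeled on the message table on this slide: the code asks for a message by code and language instead of embedding the text. The in-memory dictionary stands in for a real message store, and falling back to the source language when a translation is missing is an assumption made for this example.

```python
# (message code, language) -> text, mirroring MESSAGE_CODE / LANGUAGE / MESSAGE_TEXT.
MESSAGES = {
    ("HELLO", "EN"): "Hello",
    ("HELLO", "F"): "Bonjour",
}
SOURCE_LANGUAGE = "EN"

def get_message(code, language):
    """Look up a message, falling back to the source language if untranslated."""
    translated = MESSAGES.get((code, language))
    if translated is not None:
        return translated
    return MESSAGES[(code, SOURCE_LANGUAGE)]

print(get_message("HELLO", "F"))   # translated text
print(get_message("HELLO", "JA"))  # no entry here, so the source text is shown
```

Because the code refers only to `HELLO`, translators can add or change text without touching the program logic.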
Translatability
• Separation of structure and rendering
  • Ready-made in some cases
  • XML
  • XSLT
  • CSS
• File formats
  • Minimize
  • XLIFF
  • Standardize where you can
• Context versus Preview
• Text formats
<trans-unit id="HcmPayBalTop_60FB2908EE5DCCCAE040D30A68810384V000">
<source>You can view a single balance (the accumulated result of a payroll calculation) and groups of balances. Review balance results to confirm that the payroll run has completed successfully, to verify that a worker has the correct pay and amount of tax deducted, and to check a balance before and after adjusting it.</source>
<note>Product feature: “HRM: Workforce Deployment”</note>
<note>Page title: "Search: Balances By Country"</note>
</trans-unit>
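As a rough illustration of how a tool might read a translation unit like the one above, the following sketch parses a shortened fragment with Python's standard XML library and collects the source text together with its context notes. The fragment, the `unit1` identifier, and the function name are assumptions; a complete XLIFF file also declares an XML namespace, which the element lookups would need to include.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical fragment in the shape of the trans-unit shown above.
SAMPLE = """<trans-unit id="unit1">
  <source>You can view a single balance.</source>
  <note>Product feature: HRM: Workforce Deployment</note>
  <note>Page title: Search: Balances By Country</note>
</trans-unit>"""

def read_trans_unit(xml_text):
    """Extract the id, source text, and context notes from one trans-unit."""
    unit = ET.fromstring(xml_text)
    return {
        "id": unit.get("id"),
        "source": unit.findtext("source"),
        "notes": [note.text for note in unit.findall("note")],
    }

unit = read_trans_unit(SAMPLE)
print(unit["id"], unit["source"])
for note in unit["notes"]:
    print("context:", note)
```

The notes travel with the string, so a translator sees where the text appears even without a preview of the running product.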
Translatability
Ensure quality translation - allow for text expansion:
• 200-300% for less than 20 characters in US English
• 150% for 21-50 characters
• 130% for over 50 characters
• Technologies that allow expansion
• Context description
© Oracle Corporation 2001
© Richard Ishida, 1999, Xerox, Designing International User Interfaces
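The expansion guidelines can be turned into a simple length budget for user interface strings. This sketch makes two assumptions that the slide does not settle: each percentage is read as the total translated length relative to the English source, and the upper bound of 300% is applied to strings of 20 characters or fewer. The function names are hypothetical.

```python
def expansion_percent(source_length):
    """Allowed translated length as a percentage of the English source length."""
    if source_length <= 20:
        return 300  # upper end of the 200-300% guideline
    if source_length <= 50:
        return 150
    return 130

def max_translated_length(source_text):
    """Character budget to reserve for a translation, rounded up."""
    length = len(source_text)
    # Integer arithmetic avoids floating-point rounding surprises.
    return (length * expansion_percent(length) + 99) // 100

for label in ["OK", "Save your work and continue.", "x" * 60]:
    print(len(label), max_translated_length(label))
```

Short labels get the largest allowance because a two-letter English word can easily become a much longer word in another language.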
Translatability
Tokens
• Replaceable run-time variables
• Efficient programming technique
• Dangerous to use for translated words and verbs
  Correct: This purchase order must be approved.
  Incorrect: This purchase order must be &ACTION.
• Use for non-translatables: file or server names, dates, currency amounts, system user names, or numbers
• Shortcut can prove costly in long run
Problems with concatenation
• <string1>+<string2>=<string3>
  Terminal is operational -> Terminal est opérationnel
  Terminal is not operational -> Terminal n’est pas opérationnel
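The concatenation problem can be shown with a short sketch that contrasts assembling a sentence from fragments with storing each complete sentence as its own message. The terminal messages come from the slide; the `FILE_SAVED` message, the message keys, and the function names are hypothetical additions that illustrate a safe use of tokens for a non-translatable value.

```python
MESSAGES = {
    "en": {
        "TERMINAL_UP": "Terminal is operational",
        "TERMINAL_DOWN": "Terminal is not operational",
        "FILE_SAVED": "File {file_name} saved.",
    },
    "fr": {
        "TERMINAL_UP": "Terminal est opérationnel",
        "TERMINAL_DOWN": "Terminal n'est pas opérationnel",
        "FILE_SAVED": "Fichier {file_name} enregistré.",
    },
}

def naive_status(language, is_up):
    """Anti-pattern: builds the sentence from fragments in English word order."""
    fragments = {
        "en": ("Terminal is ", "not ", "operational"),
        "fr": ("Terminal est ", "pas ", "opérationnel"),
    }
    start, negation, end = fragments[language]
    return start + ("" if is_up else negation) + end

def status_message(language, is_up):
    """Preferred: one complete, separately translated sentence per state."""
    key = "TERMINAL_UP" if is_up else "TERMINAL_DOWN"
    return MESSAGES[language][key]

def file_saved(language, file_name):
    """Tokens are fine for non-translatable values such as file names."""
    return MESSAGES[language]["FILE_SAVED"].format(file_name=file_name)

print(naive_status("fr", False))    # ungrammatical French
print(status_message("fr", False))  # correct French
```

French negation wraps around the verb, so inserting a single word where English puts "not" cannot produce a correct sentence.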
Translatability
• Sorting Orders
  • Don’t hard-code
  • Manually Expensive
  • O/S, DB collation or generate w/ XSL
• Multilingual files
  • Separate files
  • Bilingual translation memories
  • Easier maintenance
  • Faster turnaround
• Translation kit structure
  • Common, then folder for each language
  • Reflect storage
Translatability
• Identifiers
  • Use for leveraging, context security
  • Scale
  • Development, storage efficiencies too
<p><!--BOLOC intro1019756-->Traditional performance measurement systems typically do not provide top managers with a comprehensive view of the organization. The Balanced Scorecard is a performance measurement methodology, developed by Kaplan and Norton, that exceeds the typical scope of traditional performance measurement systems. The Balanced Scorecard methodology links the financial goals of an enterprise with the drivers that determine future success.<!--EOLOC intro1019756--></p>
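Identifier comments of the BOLOC/EOLOC kind make it straightforward to pull translatable segments out of a page by ID. The sketch below is one possible approach using a regular expression; the function name and the shortened sample page are assumptions, and real HTML processing would normally use a proper parser.

```python
import re

# Matches <!--BOLOC id-->text<!--EOLOC id--> and requires both IDs to agree.
SEGMENT = re.compile(r"<!--BOLOC (\S+?)-->(.*?)<!--EOLOC \1-->", re.DOTALL)

def extract_segments(html):
    """Return a mapping of segment ID to the text enclosed by its markers."""
    return {
        match.group(1): match.group(2).strip()
        for match in SEGMENT.finditer(html)
    }

# Shortened, hypothetical page in the same shape as the example above.
page = (
    "<p><!--BOLOC intro1019756-->The Balanced Scorecard is a performance "
    "measurement methodology.<!--EOLOC intro1019756--></p>"
)

segments = extract_segments(page)
print(segments)
```

Stable IDs let a translation tool recognize unchanged segments between releases and reuse earlier translations.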
Translatability
• Source Content
  • Approved terminology
  • Glossary and style guide
  • Write for the intended audience
  • Care with language, cultural references, humor…
  • Active voice?
  • Eliminate wordiness (cost, time, user experience)
  • Care with symbols, characters, acronyms, and other “shortcuts”
  • Avoid temporary or placeholder text
Translatability
Cost Control
• Common sense
• Basic writing standards
• Examples
  Wordy: You must enter the username that you want to log on with.
  Solution: Not needed. (12 words saved)
  Wordy: Click the Apply button to save your work, and then click the Continue button.
  Better: Save your work and continue. (9 words saved)
Translatability
Graphics
• Nonlocalizable if possible
• If not, externalize text (SVG, XLIFF, and so on)
• Store separately
• Single tool for authoring/L10n if possible
• Unicode fonts
• Allow resizing
• Care with images:
  • Hands, Body Parts
  • Flags, Maps
  • People
  • Directionality
  • Color when associated with objects
Sound
• Nonlocalizable if possible
• Audio -> Recording timing, transitions
Translatability
import java.util.ListResourceBundle;

public class OEXBundle extends ListResourceBundle {
    // Key/value pairs: translators change only the values, never the keys.
    static final Object[][] contents = {
        {"EDITDETAILS", "Edit Details"},
        {"ADDTOCART", "Add to Cart"},
    };

    @Override
    public Object[][] getContents() {
        return contents;
    }
}
Images for buttons and labels generated from a translated text file at run-time
Questions
• Can you think of other problematic graphics?
• Can you see what’s wrong with this file? What are the solutions?
Tools and Environments
• Development Tools “Baked-In” I18N
• Information Quality authoring (e.g., Acrolinx IQ Suite)
  • Enforces terminology and style standards, reuse
• Write your own scripts for text processing
• Pseudotranslation tools
  • Externalization
  • Hard-coding
  • Text expansion
  • Tokens
  • O/S, run-time exes
  • Character set support
  • Translation tool synergies
• Test with translated data, localized O/S and environments
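A homemade pseudotranslation script can be very small. The sketch below is one possible approach and is not the behavior of any particular tool: it swaps vowels for accented characters to exercise character set support, pads the string to simulate text expansion, wraps it in brackets so truncation is visible, and leaves tokens such as &ACTION or {name} untouched. The token syntax and the 30% default are assumptions.

```python
import re

ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")
# Run-time tokens that must survive translation unchanged (assumed syntax).
TOKEN = re.compile(r"(&[A-Z_]+|\{[^}]*\})")

def pseudotranslate(text, expansion_percent=30):
    """Return an accented, padded, bracketed version of the text."""
    parts = TOKEN.split(text)
    # With one capturing group, odd positions hold the tokens themselves.
    converted = [
        part if index % 2 else part.translate(ACCENTED)
        for index, part in enumerate(parts)
    ]
    padding = "~" * ((len(text) * expansion_percent + 99) // 100)
    return "[" + "".join(converted) + padding + "]"

print(pseudotranslate("Edit Details"))
print(pseudotranslate("Save &FILE now", expansion_percent=0))
```

Any string that appears on screen without brackets and accents was probably hard-coded, and any string missing its closing bracket was truncated by the layout.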
Globalyzer Tool
© LingoPort, 2005, 2007
Pseudotranslated Environments
For the Little Guy
• Don’t be intimidated by the “GILT Industry”
• Who can afford to pay for conferences, reports, tools?
• Use common sense
  • Leverage what’s provided by technology for free
  • Write well in English (no translation guidelines)
  • Prioritize (graphics 5%? Don’t sweat it)
  • Obtain pseudotranslations using Google Translate (AR, F, JA)
  • Pseudotranslate using your translation tool
  • Visually inspect on different browsers, platforms
  • Make your own checklist, write your own tools
  • Discount usability and I18n testing – black-team
  • Social media
  • Engage the community, volunteers
  • Engineer for user participation and input
  • Beg, borrow, steal ideas and tools
  • If it works for you, go for it, but architect for expansion and scale
Resources
• Web
  • www.w3c.org, www.xliff.org
  • www.multilingual.com (plus guides)
  • www.i18nguy.com
  • www.globalyzer.com
  • www.opentag.com
• Social Media
  • Blogos
  • LinkedIn (groups)
  • @r12a, @localization, #agis09, #i18n on Twitter
• Publications
  • Multilingual Magazine
  • Lunde, K. 2008. CJKV Information Processing
  • Hall, B. 2004. Globalization Handbook for the .NET Platform
  • Savourel, Y. 2001. XML Internationalization and Localization
  • Graham, T. 2000. Unicode: A Primer
  • Apple Computer Inc. 1992. Guide to Macintosh Software Localization
Summary
• Definitions
• Organizational and Process
• Internationalization Issues
• Tools, Environments
• … References
Contact Information
• Information
  • [email protected]
  • http://www.multilingualblog.com
  • @localization
• Thank You
• Questions?