L10N Standards Warszawa 2014

Post on 18-Dec-2015

L10N Standards

Warszawa 2014

Why Standards?

Why have Standards?

L10N Standards

What are we going to cover:

1. Why L10N standards are important
2. The role XML has to play
3. Key L10N data standards
4. How to leverage L10N standards
5. Creating a totally data-driven automated L10N process
6. Interoperability

Why have Standards?

Current State of Art

L10N Typical Workflow

What you need is a better crane!???

Localization without Standards

[Workflow diagram: Customer → source text → extract → extracted text (+ TM) → process → prepared text → translate → translated text → merge → target text → QA]

True Cost of Translation

Standards = Uniform Data

ISO Standard

Standards = Efficiency

Standards = Lower Costs

Standards = Safe to Implement

Standards = Greater Interoperability

Standards: Unforeseen Benefits

Standards: Misuse

Standards: Abuse

Standards: Sabotage

• Sabotaged Standards:
• Proprietary extensions
• Bad implementations

The importance of XML

Everything is now XML
• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• Open Office
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL Open Architecture for XML Authoring and Localization

The power of XML

Any electronic format not in XML can be converted to XML
• FrameMaker
• RTF
• Microsoft Office pre 2007
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.

And then converted back into the original format

Benefits of XML for L10N

• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization

The significance of XML

• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization

Benefits of XML for L10N

Why use XML for Localization?
• Most localizable documents are now in XML
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Open Standards based
• You can use XML to assist its own localization
• One extraction + TM + SMT engine

Core L10N Standards

• W3C ITS Document Rules

• ETSI LIS SRX

• ETSI LIS xml:tm

• ETSI LIS TMX

• ETSI LIS TBX

• ETSI LIS GMX

• OASIS XLIFF

• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)

• Linport Interoperability: TIPP XLIFF:doc

ITS

• Internationalization and Localization Tag Set
– http://www.w3.org/International/its
• Internationalization Tag Set – Document Rules for a given XML vocabulary:
– Inline elements (within text)
– Sub flows
– Non-translatable
– Translatable attributes
• Guidelines for localizing XML documents
• Internationalization and Localization Markup Requirements
• Version 1.0, 2008
• Version 2.0, 2013
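To make the idea of document rules concrete, here is a minimal Python sketch of how a tool might apply ITS `translateRule` entries to decide which elements hold translatable text. The sample rules and document are invented for illustration, and the sketch makes simplifying assumptions: it only handles selectors of the form `//name` (by prefixing them with `.` for the standard library's limited XPath support) and it ignores the inheritance of the translate flag by child elements.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rules and document, invented for this illustration.
RULES = f"""
<its:rules xmlns:its="{ITS_NS}" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//para" translate="yes"/>
</its:rules>
"""

DOC = """
<doc>
  <para>Press the button.</para>
  <code>int x = 1;</code>
  <para>Then wait.</para>
</doc>
"""


def translatable_texts(doc_xml, rules_xml):
    """Return the text of elements that the rules leave translatable."""
    doc = ET.fromstring(doc_xml)
    rules = ET.fromstring(rules_xml)
    # ITS default: element content is translatable unless a rule says otherwise.
    flags = {el: True for el in doc.iter()}
    # Rules are applied in document order, so a later rule overrides an earlier one.
    for rule in rules.findall(f"{{{ITS_NS}}}translateRule"):
        selector = "." + rule.get("selector")  # "//code" becomes ".//code"
        translate = rule.get("translate") == "yes"
        for el in doc.findall(selector):
            flags[el] = translate
    return [
        el.text.strip()
        for el in doc.iter()
        if flags[el] and el.text and el.text.strip()
    ]


print(translatable_texts(DOC, RULES))
```

A production implementation would need a full XPath engine and the remaining ITS data categories; the point here is only that one small rules file can describe translatability for a whole XML vocabulary.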

TMX

• http://www.etsi.org/deliver/etsi_gs/lis/001_099/002/01.04.02_60/gs_lis002v010402p.pdf
• Translation Memory Exchange
• Current version 1.4b, 2.0 undergoing review
• Allows for the interchange of translation memories between different vendor systems
– No translation vendor lock-in
– Free exchange of translation assets
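As an illustration of why an exchange format removes vendor lock-in, the following Python sketch reads source/target pairs from a small TMX 1.4 document using only the standard library. The sample memory, the header values and the function name `read_pairs` are assumptions made up for this example; they do not come from any particular vendor's export.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample memory, invented for this illustration.
TMX_SAMPLE = """
<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="demo" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="pl"><seg>Otwórz plik.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_pairs(tmx_xml, src_lang, tgt_lang):
    """Return (source, target) text pairs for the requested language pair."""
    root = ET.fromstring(tmx_xml)
    pairs = []
    for tu in root.iter("tu"):
        # itertext() flattens any inline markup inside <seg> to plain text.
        by_lang = {
            tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
            for tuv in tu.findall("tuv")
        }
        if src_lang in by_lang and tgt_lang in by_lang:
            pairs.append((by_lang[src_lang], by_lang[tgt_lang]))
    return pairs


print(read_pairs(TMX_SAMPLE, "en", "pl"))
```

Because the structure is the same whichever tool wrote the file, the same few lines can load a memory exported from any conforming system.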

TMX History

• First LISA OSCAR Standard
– Version 1.1 1998
– Version 1.2 1999
– Version 1.3 2001
– Version 1.4b 2002
• Moved to ETSI/LIS 2012
– Version 2.0 2014?
• Two levels of implementation:
– Level 1 (Plain Text Only)
– Level 2 (Content Markup)

SRX

• http://www.gala-global.org/oscarStandards/srx/srx20.html
• Segmentation Rules Exchange
• Current version 2.0 2008
• How sentences are segmented
• Allows for the exchange of segmentation rules using regular expressions
• Complements TMX standard
• Quoted XLIFF, TMX and xml:tm

SRX Key Concepts

• Unicode regular expression syntax defined
• Meta characters – Unicode regular expressions: "\X", "\s", "\S" etc.
• Operators – "*", "|", "?", "+" etc.
• Defines:
– Language rules: segmentation rules
– Map rules: how to apply the segmentation rules
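The rule model can be sketched in a few lines of Python. Each rule pairs a "before break" and an "after break" regular expression with a break/no-break flag, and the first rule that matches at a position decides. The rules below are invented examples, not taken from any published SRX file, and Python's `re` module is used as a stand-in for the Unicode regular expression syntax that SRX specifies, so treat this as an approximation.

```python
import re

# Hypothetical rules: (break?, before-break regex, after-break regex).
# Exceptions (no-break rules) come first so that they win over the general rule.
RULES = [
    (False, r"\b(?:Mr|Dr|etc)\.", r"\s"),  # no break after common abbreviations
    (True, r"[.?!]+", r"\s+"),             # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    breaks = []
    for pos in range(1, len(text)):
        for do_break, before, after in rules:
            ends_before = re.search(f"(?:{before})$", text[:pos])
            starts_after = re.match(after, text[pos:])
            if ends_before and starts_after:
                if do_break:
                    breaks.append(pos)
                break  # the first matching rule decides for this position
    segments, start = [], 0
    for pos in breaks + [len(text)]:
        piece = text[start:pos].strip()
        if piece:
            segments.append(piece)
        start = pos
    return segments


print(segment("Dr. Smith arrived. He sat down."))
```

Keeping the rules as data rather than code is what makes them exchangeable: two tools that load the same rule set should segment the same text identically, which is what lets translation memories match across systems.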

GMX

http://docbox.etsi.org/ISG/Open/ISGLIS/GMX-V/GMX-V/GMX-V-2.0.html

• Global Information Management Metrics eXchange

• GMX/V Approved LISA OSCAR Standard February 2007

• Tripartite
– GMX-V : Volume, published for public comment
– GMX-C : Complexity, initial specification
– GMX-Q : Quality

• Standard for defining an L10N job

• Allows for quantifying job complexity

• GMX/V 2.0 Approved ETSI LIS

– added support for CJK word counts

– overall character count including white space characters

GMX-V

• GIM Metrics eXchange – Volume
• Objectives:
– Unambiguous and verifiable definition of word and character counts
– A method of exchanging counts within an XML framework
• Two types of count:
– Verifiable, based on electronic documents
– Non-verifiable
• Canonical form: XLIFF based
• Word boundaries: Unicode TR29
• Unicode character encoding
• Minimum conformance
– Total Character Count
– Total Word Count
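To show why an agreed definition matters, here is a rough Python sketch of the two minimum-conformance counts. It is not an implementation of GMX-V: the standard counts on an XLIFF-based canonical form using Unicode TR29 word boundaries, whereas this sketch swaps in a simple regular expression as an approximation. The function name and the dictionary keys are hypothetical.

```python
import re

# Approximation only: a "word" is a run of word characters, optionally joined
# by an apostrophe or hyphen. Real TR29 segmentation has many more cases.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def rough_counts(text):
    """Return approximate word and character counts for a plain-text string."""
    return {
        "words": len(WORD.findall(text)),
        "characters": sum(1 for ch in text if not ch.isspace()),
        "characters_with_whitespace": len(text),
    }


print(rough_counts("It's a well-known fact."))
```

Changing the regular expression, for example by treating a hyphen as a word separator, changes the word count for the same sentence. That kind of tool-dependent variation is what a shared, verifiable counting definition is meant to eliminate.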

XLIFF

• http://www.oasis-open.org/committees/xliff
• XLIFF – XML Localization Interchange File Format
• Current status
– XLIFF 1.1 Committee Specification (31 Oct 2003)
– XLIFF 1.2 Approved as an OASIS Standard 2008
  • Segmentation support
  • (X)HTML XLIFF 1.1 Representation Guide
  • PO / POT XLIFF 1.1 Representation Guide
  • Java / Windows / .Net Representation Guide
– XLIFF 2.0 currently out for public comment (not backwards compatible)

XLIFF

• Single format for exchanging L10N data from disparate sources
• Loss-less
• Tool-neutral
• Formalized as an XML vocabulary
• Can embed skeleton file
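A short Python sketch illustrates the "single format" point: whatever the original file type, a tool only needs to walk the `trans-unit` elements of an XLIFF 1.2 file. The sample file content and the function name `units` are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample file, invented for this illustration.
XLIFF_SAMPLE = """
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="pl"
        datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello</source>
        <target>Cześć</target>
      </trans-unit>
      <trans-unit id="farewell">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def units(xliff_xml):
    """Return (id, source, target) tuples; target is None if untranslated."""
    root = ET.fromstring(xliff_xml)
    result = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        source = "".join(tu.find("x:source", NS).itertext())
        target_el = tu.find("x:target", NS)
        target = "".join(target_el.itertext()) if target_el is not None else None
        result.append((tu.get("id"), source, target))
    return result


for unit in units(XLIFF_SAMPLE):
    print(unit)
```

The untranslated unit comes back with `None` as its target, which is how a downstream step could tell what is still left to translate.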

xml:tm

http://www.xtm-intl.com/manuals/xml-tm/xml-tm2.0.html

• XML based Text Memory
– Radical rethink of how to handle Translation Memory
– Donated by XML INTL to LISA OSCAR
– OSCAR Standard Feb 2007
– Adopted by ETSI LIS, version 2.0 ready for adoption
• Takes the DITA reuse principle down to sentence level
– Author Memory
– Translation Memory

xml:tm - Namespace

• Namespace is a major feature of XML
• Allows the mapping of different ontological entities onto the same representation
• Allows different ways to look at the same data
• Namespaces can be made transparent

xml:tm

• XML based text memory
• Revolutionary approach to translating XML documents
• First significant advance in translation memory technology
• Uses XML namespace to transparently embed contextual information
• The one ring that binds them all

xml:tm namespace

Example of the use of tm namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
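The transparency of the namespace can be shown with a small Python sketch that reads the example in two ways: once as text units, and once as ordinary paragraphs with the tm markup ignored. The closing tags in the sample are added here as an assumption so that the fragment parses, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"

# The slide's example, with closing tags added (an assumption) so that it parses.
DOC = """
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
"""


def text_units(doc_xml):
    """tm namespace view: one entry per <tm:tu> text unit."""
    root = ET.fromstring(doc_xml)
    return [tu.text.strip() for tu in root.iter(f"{{{TM_NS}}}tu")]


def plain_paragraphs(doc_xml):
    """Document view: paragraph text with the tm markup ignored."""
    root = ET.fromstring(doc_xml)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("para")]


print(text_units(DOC))
print(plain_paragraphs(DOC))
```

Both functions read the same file. A tool that does not know about the tm namespace still sees a normal paragraph, while a translation tool sees individually addressable sentences.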

xml:tm namespace

[Diagram: the same document tree shown in two views. Source document tm namespace view: doc → title, section → para → tm → te → tu (sentence). Source document view: doc → title, section → para → text.]

xml:tm Text Memory

• Author memory
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
• Translation memory
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs

xml:tm DOM differencing

[Diagram: DOM differencing. The Original Source Document contains tu id="1" to tu id="6"; the Updated Source Document contains tu id="1", "2", "3", "4", "7", "6" and "8", with text units labelled deleted, modified and new.]
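The classification of text units into unchanged, modified, new and deleted can be sketched in Python. This is not the xml:tm algorithm, which works on the DOM and tracks text unit identifiers; as a stand-in, the sketch compares two flat lists of sentences with the standard library's `difflib`. The function name and the sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def diff_units(old, new):
    """Classify text units by comparing the old and new sentence lists."""
    changes = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes += [("unchanged", s) for s in new[j1:j2]]
        elif tag == "delete":
            changes += [("deleted", s) for s in old[i1:i2]]
        elif tag == "insert":
            changes += [("new", s) for s in new[j1:j2]]
        else:  # "replace": pair units up, treat any surplus as deleted or new
            olds, news = old[i1:i2], new[j1:j2]
            paired = min(len(olds), len(news))
            changes += [("modified", s) for s in news[:paired]]
            changes += [("deleted", s) for s in olds[paired:]]
            changes += [("new", s) for s in news[paired:]]
    return changes


original = ["A.", "B.", "C.", "D."]
updated = ["A.", "B changed.", "D.", "E."]
for status, sentence in diff_units(original, updated):
    print(status, sentence)
```

Only the units reported as modified or new need to go to a translator; unchanged units can keep their existing translations, which is where the cost saving comes from.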

xml:tm translated document in Polish

[Diagram: the translated document tree shown in two views. Translated document tm namespace view: doc → title, section → para → tm → te → tu (zdanie). Translated document view: doc → title, section → para → tekst.]

Putting It All Together

• Open Architecture for XML Authoring and Localization (OAXAL)

– http://wiki.oasis-open.org/oaxal/FrontPage

OAXAL 2.0

OAXAL Benefits

• SOA (Service Oriented Architecture) Open Architecture

• Open Standards - Open APIs

• Easy Exchange

• Modular design

• Interoperability

• Very high level of automation

Interoperability Now!/Linport

Interoperability Now!
http://www.interoperability-now.org/
• Born out of frustration and necessity
• Early 2012
• Members
– Bioloom Group
– Kilgray
– Medtronic
– Ontram
– Spartan Software
– XTM-INTL
• The goal:
– True 100% roundtrip interoperability between TMS/CAT tools
• Now part of Linport

Interoperability Now!/Linport

Linport
http://www.linport.org/
• Language INteroperability Portfolio
• Created in 2012 by the merging of two initiatives:
– Multilingual Electronic Dossier
– The Container Project
• Sponsored by:
– the European Union DG Translation
– JIAMCATT (http://jiamcatt.org/) - Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology

OAXAL in Action

Translating English Soccer Articles into Arabic 24x7

Browser-Based Workbench

OAXAL In Action

• Contact details:
• Andrzej Zydroń
• azydron@xtm-intl.com
• http://www.xtm-intl.com