Folt - Open TMS - A presentation for universities
FOLT Overview, status as of 03.07.2009; Dr. Klemens Waldhör
Dr. Klemens Waldhör, [email protected]
Overview
• Open TMS Overview
• Architecture
• Implementation
• Current Status
Overview
Goals
• Development of the open source translation memory system OpenTMS
Three translation memory systems for one and the same process? Software investments that make translation costs shoot through the roof? Exchange formats that put the brakes on productivity? FOLT (Forum Open Language Tools) is concerned with the entire process of producing multilingual documentation. From the creation of the source text to production in foreign languages, we analyze our processes for weaknesses and a lack of standardisation.
Primary objectives:
- Sharing experiences of processes using standard industry software
- Sharing experiences of the use of Open Source software
- Standardisation of interchange formats
- Testing new Open Source technologies and improving existing technologies in the translation market
- Public support for non-proprietary software and software development
- Publication of aims and results
www.folt.de
OpenTMS Requirements
• Software
  - Web based application
  - Server / client architecture
  - Thin client
  - No installation
  - No proprietary run time components
  - Open source software preferred
  - Modular software approach
• Operating system independent
  - Windows, Linux, Mac …
• Standard hardware
• Interfaces
  - Integration into CMS
  - Workflow management should be supported
• Open source database
  - Basically all SQL databases should be supported
• Scalability
  - Single and multi user requirement
Architecture
Example Work Flow
• Seamless integration of different tools in the translation / localisation workflow
[Diagram labels: CMS; Converter; Segmenter; Translation Memory; Machine Translation; Terminology Translation; OpenTMS Editor; Back Converter; XLIFF; steps 1. to 3.]
Architecture based on Standards
• XLIFF
• TMX
• TBX
• SRX
• …
In general XML
OpenTMS System Architecture
[Diagram labels: Application Model; User Model; Data Model; Document Model; Process Model; Security Model; GUI Model; Interface Model; OpenTMS Core Library]
For details see Waldhör, K. (2008). OPENTMS SOFTWARE ARCHITECTURE.
Software Structure
• Hierarchy of functions and processes
• Common functions / methods stored in a core library
• Method calls should be transparent
  - Running on server or user machine
• Scripting language
[Diagram labels: OpenTMS primitive procedure; OpenTMS Process; OpenTMS Network Process; OpenTMS core library]
Modelling Language
[Diagram labels: Monolingual Object and Multilingual Object (N:1), both inheriting from General Linguistic Object; Linguistic Property (N:1); mapping to Data Source (Terminology, Translation Memory)]
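One possible reading of the modelling diagram, sketched in Java. The class and method names are hypothetical and are not taken from the OpenTMS source; the sketch assumes that several monolingual objects belong to one multilingual object (N:1), that both inherit from a general linguistic object, and that linguistic properties attach to that common base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout: an illustration of the diagram, not the OpenTMS classes.
class LinguisticProperty {
    final String name;
    final String value;

    LinguisticProperty(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Common base: any linguistic object can carry several properties (N:1).
abstract class GeneralLinguisticObject {
    private final List<LinguisticProperty> properties = new ArrayList<>();

    void addProperty(String name, String value) {
        properties.add(new LinguisticProperty(name, value));
    }

    List<LinguisticProperty> getProperties() {
        return Collections.unmodifiableList(properties);
    }
}

// One text in one language, e.g. a segment or a term.
class MonolingualObject extends GeneralLinguisticObject {
    final String language;
    final String text;

    MonolingualObject(String language, String text) {
        this.language = language;
        this.text = text;
    }
}

// Groups the monolingual objects that are translations of each other (N:1).
class MultilingualObject extends GeneralLinguisticObject {
    private final List<MonolingualObject> members = new ArrayList<>();

    void add(MonolingualObject mol) {
        members.add(mol);
    }

    // Returns the first member in the given language, or null if there is none.
    MonolingualObject forLanguage(String language) {
        for (MonolingualObject mol : members) {
            if (mol.language.equals(language)) {
                return mol;
            }
        }
        return null;
    }
}
```

Under this reading, a translation memory entry and a terminology entry would both be stored as one multilingual object with one monolingual object per language.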
OpenTMS Processes
[Diagram labels: Pre Translation; Converter; Back Converter; Segmenter; OpenTMS Translation Editor; Interactive Terminology Translation; Interactive Translation Memory; Translation Memory; Data Source; Human Initiated Interactions; OpenTMS Initiated Interactions]
Implementation
Programming Language et al
• Java
  - Java Coding Standards
  - Java Documentation Standard
  - Delivered as jar files
• Eclipse
• Data Sources
  - SQL DB: Hibernate based
• Documentation UML
  - Generated ESS Model
Data Sources
• Language related data are represented as "data sources"
• Idea
  - Make the data access interface independent from the data itself
  - Not being restricted to SQL databases only
  - Also flat data or XML files: TMX, XLIFF files as a data source, …
  - Machine translation (MT) as data source
  - Spread sheets, e.g. Excel as terminology lists
  - Object oriented databases
  - DMS systems
  - "Web Sites" (http based interfaces)
• Define a common interface for all access functions
  - Allows adaption to individual data source properties, e.g. read only data sources like MT
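The idea of a common access interface with source specific properties could look roughly like the Java sketch below. All names (`LanguageDataSource`, `lookup`, `store`, `isReadOnly`) are hypothetical and chosen for illustration; they are not the actual OpenTMS data source interface. The read only wrapper stands in for a source such as MT that can be queried but not written to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: illustrative names, not the OpenTMS API.
interface LanguageDataSource {
    // Returns all stored target texts for an exact source text (empty list if none).
    List<String> lookup(String sourceText);

    // Stores a source/target pair; read only sources refuse this call.
    void store(String sourceText, String targetText);

    boolean isReadOnly();
}

// A writable source backed by a map; a file or SQL backed source would implement the same interface.
class InMemoryDataSource implements LanguageDataSource {
    private final Map<String, List<String>> entries = new HashMap<>();

    public List<String> lookup(String sourceText) {
        List<String> hits = entries.get(sourceText);
        return hits == null ? new ArrayList<String>() : new ArrayList<String>(hits);
    }

    public void store(String sourceText, String targetText) {
        entries.computeIfAbsent(sourceText, k -> new ArrayList<>()).add(targetText);
    }

    public boolean isReadOnly() {
        return false;
    }
}

// Wraps any source and blocks writes, as one would for an MT service.
class ReadOnlyDataSource implements LanguageDataSource {
    private final LanguageDataSource delegate;

    ReadOnlyDataSource(LanguageDataSource delegate) {
        this.delegate = delegate;
    }

    public List<String> lookup(String sourceText) {
        return delegate.lookup(sourceText);
    }

    public void store(String sourceText, String targetText) {
        throw new UnsupportedOperationException("data source is read only");
    }

    public boolean isReadOnly() {
        return true;
    }
}
```

Calling code only sees `LanguageDataSource`, so it can check `isReadOnly()` before attempting a write instead of knowing which kind of source it is talking to.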
Data Sources
[Diagram labels: OpenTMS software; access to data sources through standardised interface; OpenTMS Data Source Layer; maps the OpenTMS access functions to the specific data component; data type specific access functions; various data components like files etc.]
Core Data Model Status
• Data Source methods defined
  - Are extended depending on needs and requirements
• SQL
  - Access optimisation
  - Hibernate based
  - First version finished
• Other open source databases …
• OODBS
  - DB4O partially implemented for testing purposes
• Other data sources
  - TMX files
  - XLIFF files
  - MT: Google & Microsoft Translator
Data Source Core Functions
• Data Sources
  - Create
  - Delete
  - Import TMX, XLIFF file
  - Export TMX, XLIFF file
  - Copy between data sources
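As an illustration of what a TMX import has to do, the following Java sketch reads the translation units of a TMX document into language/text maps. It is an assumption-laden example: it uses the JDK's built-in DOM parser rather than the parser used in OpenTMS, and the class and method names (`TmxImportSketch`, `readUnits`) are hypothetical. Only the TMX element names (`tu`, `tuv`, `seg`) and the `xml:lang` attribute come from the TMX format itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of OpenTMS.
class TmxImportSketch {

    // Returns one map per <tu>, keyed by language code, holding the text of the first <seg> of each <tuv>.
    static List<Map<String, String>> readUnits(String tmx) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tmx)));
            List<Map<String, String>> units = new ArrayList<>();
            NodeList tus = doc.getElementsByTagName("tu");
            for (int i = 0; i < tus.getLength(); i++) {
                Element tu = (Element) tus.item(i);
                Map<String, String> variants = new LinkedHashMap<>();
                NodeList tuvs = tu.getElementsByTagName("tuv");
                for (int j = 0; j < tuvs.getLength(); j++) {
                    Element tuv = (Element) tuvs.item(j);
                    NodeList segs = tuv.getElementsByTagName("seg");
                    if (segs.getLength() > 0) {
                        variants.put(tuv.getAttribute("xml:lang"), segs.item(0).getTextContent());
                    }
                }
                units.add(variants);
            }
            return units;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable TMX document", e);
        }
    }
}
```

Each returned map could then be written to any target data source, which is also the core of a copy between two data sources.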
Fuzzy Search – Core Function of TM
• Step 1: Search in KD-TREE
  - Restricts the number of strings to search
  - Finds possible matching strings
• Step 2: Levenshtein Similarity
  - Compare the candidates from step 1 to determine the real similarity
• Step 3: Get source and target MOLs / MUL
  - Create translation (alt-trans)
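The two-stage idea (cheap candidate restriction, then exact similarity) can be sketched in Java as follows. This is not the OpenTMS implementation: the KD-tree of step 1 is replaced here by a much simpler length filter, the similarity formula (edit distance normalised by the longer string, as a percentage) is an assumption, and all names are hypothetical. The threshold plays the role of a match quality setting such as 80.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the length filter is a stand-in for the KD-tree search of step 1.
class FuzzySearchSketch {

    // Classic dynamic programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 100 for identical strings, rounded down to a whole percent.
    static int similarityPercent(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100;
        }
        return 100 * (maxLen - levenshtein(a, b)) / maxLen;
    }

    static List<String> search(String query, List<String> segments, int minQuality) {
        List<String> matches = new ArrayList<>();
        for (String candidate : segments) {
            // Step 1 stand-in: the edit distance is at least the length difference,
            // so this bound can never reject a string that step 2 would accept.
            int maxLen = Math.max(query.length(), candidate.length());
            int lengthGap = Math.abs(query.length() - candidate.length());
            if (maxLen > 0 && 100 * (maxLen - lengthGap) / maxLen < minQuality) {
                continue;
            }
            // Step 2: exact similarity on the remaining candidates.
            if (similarityPercent(query, candidate) >= minQuality) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

Step 3 would then fetch the source and target entries for each match and attach them to the segment as translation proposals.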
Data Source Configuration
• SQL data source contained in hibernate directory
• Existing data sources contained in database directory
Translation Core Functions
• Convert (to and from XLIFF)
  - Currently done externally by Araya
  - Complex document formats like WinWord etc. through OpenOffice converters
• Segment
  - Currently done externally by Araya
• Translate
Current Data Source Interface
Security
Managing Security
Security Levels
• Level 0
  - No security procedures are applied; data are transferred as they are.
• Level 1
  - The communication channel is secured using standard secure protocols.
• Level 2
  - Encoding for security is done at the data level. Basically this means that strings are encrypted when they are sent through a communication channel or are written to or retrieved from a database. This also involves encrypted XLIFF files (or parts of them).
• Level 4
  - GUI level related security
Security and Files
• Protection of parts of the document
  - Encrypt specific parts of the XML documents
• Additional security when transferring files
  - Even if a file gets into the wrong hands, the file cannot be read.
• Secure XLIFF
  - Source
  - Target
• Secure TBX
• Secure TMX
  - TU …
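A minimal Java sketch of encrypting a single string, such as the text of one source or target element, before it is written into a file or database. The slides do not name a cipher; AES in GCM mode from the standard `javax.crypto` API is used here purely as an assumption, and the class name, method names, and the choice to pack the IV in front of the ciphertext are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; cipher choice and packing format are assumptions, not OpenTMS behaviour.
class SegmentCipherSketch {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES not available", e);
        }
    }

    // Returns Base64 text (IV followed by ciphertext) that is safe to place inside an XML element.
    static String encrypt(String plainText, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] packed = new byte[iv.length + cipherText.length];
            System.arraycopy(iv, 0, packed, 0, iv.length);
            System.arraycopy(cipherText, 0, packed, iv.length, cipherText.length);
            return Base64.getEncoder().encodeToString(packed);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] packed = Base64.getDecoder().decode(encoded);
            byte[] iv = Arrays.copyOfRange(packed, 0, IV_BYTES);
            byte[] cipherText = Arrays.copyOfRange(packed, IV_BYTES, packed.length);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Because only element content is replaced, the surrounding XML structure stays readable by tools while the text itself cannot be read without the key.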
Security
Eclipse
Eclipse Core Methods
Eclipse RPC Server & Utility Methods
Eclipse GUI Methods
XML-RPC Interface
• openTMX.xml contains access functions

rem %1 = data source name, %2 = source document
call env.bat
call java -Xmx1024m %OPENTMSJAVABASE% ^
  de.folt.rpc.client.OpenTMSClient ^
  message=TranslateDocument ^
  sourceDocument=%2 ^
  sklDocument=%2.skl ^
  xliffDocument=%2.xlf ^
  segDocument=%2.seg.xlf ^
  translatedDocument=%2.trans.xlf ^
  paragraphBasesSegmentation=yes ^
  segmentBreakOnCrLf=1 ^
  dataSourceName=%1 ^
  dataSourceMatchQuality=80 ^
  sourceDocumentLanguage=de ^
  targetDocumentLanguage=en ^
  sourceDocumentEncoding=UTF-8 ^
  targetDocumentEncoding=UTF-8 ^
  inputDocumentType=FILE ^
  dataSourceType=sql
Current Implementation
• openTMS.jar
  - Contains compiled classes and source code
• arayaserver-opentms.jar
  - Conversion functions
  - Compiled classes
• External.jar
  - External classes for Araya (parser etc.)
• Hibernate directory
  - Hibernate jar files
• Database jdbc driver
  - Database driver jar files
Integration Araya XLIFF Editor
Ubuntu VM Distribution
Data Source Editor
Edit MOL/MOL Properties
Search Functions
Delete & Save Functions
Language Specific Segments
Downloads
• http://sourceforge.net/projects/open-tms
  - Ubuntu version
  - Windows version: www.heartsome.de/arayatest/opentmsserver.exe
  - In the XLIFF Editor: www.heartsome.de/arayatest/araya-freeversion.exe
• YourKit Java Profiler for performance measurements
Possible Contributions
• XML Parser!
  - Generalise OpenTMS XML interfaces to support any kind of XML parser (currently jdom)
  - Faster XML parser?!
• Logging package
  - Optimised, line numbers, class names
• Exception handling
  - Improvement
  - Localisation approach / string handling
• Test environment
• XLIFF / TMX package improvements
• TBX reader
• SRX segmentation
Possible Contributions Converters
• Document Converters
  - XML
  - OpenOffice as central converter for txt, rtf, doc, xls, ppt …
  - MIF
  - …
• Data Model Converter
  - Trados
  - Star
  - Across
  - …
Contact
Heartsome Europe GmbH
Friedrichstr. 17
D-90574 Roßtal
www.heartsome.de
Dr. Klemens Waldhör
T: +49 9127 579001
F: +49 9127 951178
[email protected]