Bilingual Terminology Extraction


Transcript of Bilingual Terminology Extraction

Page 1: Bilingual Terminology Extraction

Bilingual terminology extraction

The bilingual terminology extractor is an easy-to-use tool that lets you produce TBX files quickly and conveniently from the bilingual translation pairs in a TMX file. The automatic terminology extraction produces term pair suggestions, each weighted with a quality estimate. A very high probability of 1.0 means that the two terms are translations of each other. Both single-word and multi-word terms can be extracted.
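
The brochure does not say which statistic produces the quality estimate. As a minimal sketch, assuming a Dice co-occurrence score over aligned segment pairs (an assumption for illustration, not Araya's documented method), single-word candidate pairs could be weighted as follows. The function name `score_pairs` and the sample segments are hypothetical.

```python
from collections import Counter
from itertools import product


def score_pairs(segments):
    """Score source/target word pairs with the Dice coefficient.

    Assumption: Dice is used only for illustration; the brochure does not
    name the statistic behind the quality estimate.

    `segments` is a list of (source_sentence, target_sentence) tuples.
    Dice = 2 * joint / (src_count + tgt_count), a value between 0 and 1,
    where each count is the number of segments containing the word(s).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for source, target in segments:
        src_words = set(source.lower().split())
        tgt_words = set(target.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word of the segment.
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


segments = [
    ("open file", "Datei öffnen"),
    ("close file", "Datei schließen"),
    ("open window", "Fenster öffnen"),
]
scores = score_pairs(segments)
print(scores[("file", "datei")])   # 1.0
print(scores[("file", "öffnen")])  # 0.5
```

In this toy data "file" and "Datei" always appear together, so the pair scores 1.0, while "file" and "öffnen" share only one of their segments and score 0.5. A real extractor would also need to handle multi-word candidates and much larger corpora.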

The extraction tool is also an ideal add-on for the Araya translation toolset. It works best with the TMX files produced by the Araya XLIFF or TMX editor, and it also supports the TMX variants and formats of other vendors.

ARAYA BILINGUAL TERMINOLOGY EXTRACTION

CONTACT:

Heartsome Europe GmbH
Friedrichstr. 17
D-90574 Rosstal
T: +49 (0) 9127 579001
F: +49 (0) 9127/951178
[email protected]
www.heartsome.de

FN 9098 Amtsgericht Fürth
UID: DE225881142

Management

Dr. Klemens Waldhör
Managing director
[email protected]

Highlights

• Simple-to-use editor interface.
• Each suggested pair of terms is weighted with a quality measure (quality criterion).
• Colored marking of the individual term pairs depending on quality.
• Export to different formats, e.g. TBX and CSV.
• Term pairs can be marked as valid; optionally, only the validated entries are exported.
• Well-known terminology can be excluded from the extraction.
• TMX support.
• Various parameters control the extraction, such as the frequency of the terms, the number of translations that can be extracted, the number of words in a term, and upper/lower case.
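
The parameter and export features listed above can be pictured with a small sketch. It assumes a hypothetical `TermPair` record and hypothetical parameter names (`min_frequency`, `max_words`, `validated_only`); none of these come from the tool itself, and the CSV layout is likewise only an illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TermPair:
    """Hypothetical record for one suggested term pair."""
    source: str
    target: str
    quality: float      # 0.0 .. 1.0, higher is better
    frequency: int      # number of segments containing the pair
    validated: bool = False


def select_pairs(pairs, min_frequency=2, max_words=3, validated_only=False):
    """Keep pairs that satisfy the (hypothetical) extraction parameters."""
    return [
        p for p in pairs
        if p.frequency >= min_frequency
        and len(p.source.split()) <= max_words
        and len(p.target.split()) <= max_words
        and (p.validated or not validated_only)
    ]


def to_csv(pairs):
    """Serialize pairs as CSV text, best quality first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "quality"])
    for p in sorted(pairs, key=lambda p: p.quality, reverse=True):
        writer.writerow([p.source, p.target, f"{p.quality:.2f}"])
    return buffer.getvalue()


pairs = [
    TermPair("translation memory", "Übersetzungsspeicher", 0.95, 12, validated=True),
    TermPair("file", "Datei", 0.80, 7),
    TermPair("the", "öffnen", 0.10, 1),
]
print(to_csv(select_pairs(pairs, validated_only=True)))
```

With `validated_only=True` only the validated "translation memory" pair reaches the CSV output; with the defaults, the low-frequency pair is dropped and the other two are kept.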

TMX stands for Translation Memory eXchange. It is a vendor-independent, open XML standard for storing and exchanging the translation memories created by CAT (computer-aided translation) tools. TMX allows translation memory data to be exchanged between programs and translators without losing data in the process. TMX was developed on the initiative of the OSCAR (Open Standards for Container/Content Allowing Re-use) committee, a special interest group of LISA (Localization Industry Standards Association).
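
To make the format concrete, here is a minimal sketch of reading aligned segments from a TMX document with Python's standard library. The sample document and the helper name `read_segments` are made up for illustration; real TMX files carry more metadata, and older files may use a plain `lang` attribute instead of `xml:lang`, which the sketch tolerates.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample document in the TMX 1.4 layout: tu = translation unit,
# tuv = one language variant of the unit, seg = the segment text.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="de"><seg>Öffnen Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_segments(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for every translation unit."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                by_lang[lang] = "".join(seg.itertext())
        if source_lang in by_lang and target_lang in by_lang:
            pairs.append((by_lang[source_lang], by_lang[target_lang]))
    return pairs


print(read_segments(SAMPLE_TMX, "en", "de"))
```

Units that lack one of the two requested languages are skipped, so asking for a language that is not in the file simply yields an empty list.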

About Heartsome Europe

Heartsome Europe GmbH was founded in 2002. Its founder and director, Dr. Klemens Waldhör, has long experience with translation technology and CAT tools. This experience stems from his time in the research labs of TA Triumph Adler and as founder and director of EP Electronic Publish Partners GmbH. Under his guidance the translation support system EURAMIS was developed for the translation service of the European Commission. This development was later used by Sun Microsystems as SunTrans. Based on this experience he developed the translation support system Araya.

The core competence of Heartsome is the customized adaptation of Araya to customer needs. In an intensive consulting phase the customer's requirements are determined and refined, and Araya is configured and integrated into the customer's processes.

Terminology extraction as a service

Consistent terminology is very important in translation projects. This terminology must be maintained, corrected and enriched, and in particular compared with new terminology.

Our terminology extraction service offers automatic extraction of bilingual terms from TMX files, based on statistical methods.

The quality of the translations found depends on the number of entries in your TMX file: the more entries it contains, the more and the better the results.

You will receive the extraction result as a TBX or CSV file containing the bilingual terms extracted from your TMX file. Other formats are available on request.

The terminology extraction is very fast, and in most cases we can provide you with the extracted terms within a day. If necessary, we remove from the term lists those terms that are already stored in your terminology system.

We offer you a fast and simple method for extracting terminology from your translations. By using our service you streamline and accelerate your terminology work and free yourself from routine tasks and time-consuming manual scanning of your translations.

Translation memory (TM): a translation technology that reuses existing translations of segments (sentences, paragraphs or phrases) from previously translated documents, using fuzzy search to find matching segments.

XLIFF (XML Localization Interchange File Format) is an open, XML-based standard developed to support the exchange of localization information, in particular across the document formats of different vendors.

Page 2: Bilingual Terminology Extraction

System requirements

• Java™-based application.
• Software requirements: Java >= 1.5.
• Operating systems: Windows™ | Linux | Solaris™ | Mac™.

Solaris, Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the US, other countries, or both. UNIX is a registered trademark of The Open Group in the US and other countries. Windows and WinWord are registered trademarks of Microsoft. Mac is a registered trademark of Apple Computer, Inc. Oracle is a registered trademark of Oracle Corporation. MySQL is a registered trademark of MySQL AB. Other company, product or service names may be trademarks of others.

A simple table-oriented user interface with colored entries representing different extraction qualities. The last column shows whether the entry has been validated.

Only a few mouse clicks are needed to extract terms and their translations from the TMX file.

Prices and licenses

• Single-user license: € 800 + VAT.
• Multi-user licenses: on request.
• Terminology extraction service: on request.

Order Form

I hereby order __ license(s) of the Araya Bilingual Extraction Tool at the price of € 800 + VAT per license (single-user licenses).

Company:

Name:

Street:

City:

E-mail:

Signature:

Please send your order to:

• Fax: +49 9127 95 11 78
• or

Heartsome Europe GmbH
Dr. Klemens Waldhör
Friedrichstr. 17
D-90574 Roßtal