Terminology as a Service – a model for collaborative terminology management

EAFT Terminology Summit, Barcelona, 27-28 November 2014

Klaus-Dirk Schmitz, Cologne University of Applied Sciences, [email protected]

Tatiana Gornostay, Tilde, Riga, [email protected]

K.-D. Schmitz, IIM, FH Köln

Interchange or Collaboration?

Collaborative terminology management

Collaborative: several individuals are involved in creating terminological entries

Different levels of terminological competence require well-elaborated user profiles with specific rights and views, e.g. read/write access, or access to certain languages or data categories (datCats) only

Well-defined workflow and quality assurance procedures are needed (supported by tools such as QuickTerm)

Metadata (datCats) for normative and workflow status are needed (preferred/admitted/deprecated, draft/under discussion/final, …)
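The points above (user profiles with rights, plus status data categories on each term) can be sketched as a small data model. This is only an illustration: the class names, field names, and the permission rule are assumptions made for this example and are not taken from QuickTerm or any other tool.

```python
from dataclasses import dataclass
from enum import Enum


class NormativeStatus(Enum):
    PREFERRED = "preferred"
    ADMITTED = "admitted"
    DEPRECATED = "deprecated"


class WorkflowStatus(Enum):
    DRAFT = "draft"
    UNDER_DISCUSSION = "under discussion"
    FINAL = "final"


@dataclass
class Term:
    text: str
    language: str
    # Status data categories stored with every term (hypothetical defaults).
    normative_status: NormativeStatus = NormativeStatus.ADMITTED
    workflow_status: WorkflowStatus = WorkflowStatus.DRAFT


@dataclass
class UserProfile:
    name: str
    can_write: bool = False
    languages: frozenset = frozenset()  # languages this user may see

    def can_read(self, term: Term) -> bool:
        return term.language in self.languages

    def can_edit(self, term: Term) -> bool:
        return self.can_write and term.language in self.languages


def set_workflow_status(user: UserProfile, term: Term, status: WorkflowStatus) -> None:
    """Change the workflow status only if the profile allows editing this term."""
    if not user.can_edit(term):
        raise PermissionError(f"{user.name} may not edit '{term.text}'")
    term.workflow_status = status
```

A terminologist with write access for German could move a German term from draft to final, while a read-only profile would be rejected.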

Cloud-based terminology management

Since terminology work is “expensive”, why not involve the Crowd to create and validate terminology?

You need a tool for managing terminology in the Cloud!

Examples: Wikipedia (www.wikipedia.org), TermWiki (www.termwiki.com)

This approach differs from web interfaces for terminology management systems (TMS), e.g. MultiTerm-Web, and from web-based TMS, e.g. TermWeb

Cloud-/Crowd-based terminology work

The main questions:

How can you motivate the Crowd?

Hidden business model? Free services? Sharing data? Do you want to have your data in the Cloud?

Can you apply established terminological principles (meta model, datCats, concept orientation)?

How can you ensure correctness, completeness, consistency, and reliability?

The TaaS Project

A new approach as an example: TaaS (Terminology as a Service), a cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data

The project has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.

Partners:

Tilde, Latvia (coordinator)

TAUS, The Netherlands

Kilgray, Hungary

Fachhochschule Köln, Germany

University of Sheffield, UK

Time: 1 June 2012 to 31 May 2014

Languages: all European languages + Russian

www.taas-project.eu

Basic Services of TaaS

Automatic extraction of monolingual term candidates from user-uploaded documents using state-of-the-art terminology extraction techniques

Automatic retrieval of translation equivalents for the extracted terms, in user-defined target language(s), from different public and industry terminology databases

Acquisition of translation candidates from parallel web data for terms not found in term banks, using state-of-the-art terminology extraction and bilingual terminology alignment methods

Facilities for users to clean up automatically acquired terminological data

Data sharing and integration facilities: APIs and export tools for sharing the resulting terminological data with major term banks and for using it in different applications
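The order of the basic services (extract term candidates, look them up in term banks, pass the rest on to web-based acquisition) can be illustrated with a toy pipeline. This is a sketch under strong assumptions and not the TaaS implementation: the extraction is a plain frequency count over word n-grams, the term bank is an in-memory dictionary, and all function names and the stopword list are invented for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "by"}


def extract_term_candidates(text, min_freq=2, max_len=2):
    """Toy monolingual extraction: frequent n-grams that do not start or end with a stopword."""
    tokens = re.findall(r"[a-zA-Z][a-zA-Z-]*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [term for term, freq in counts.most_common() if freq >= min_freq]


def look_up_equivalents(candidates, term_bank):
    """Split candidates into those with known equivalents and those still missing."""
    found, missing = {}, []
    for candidate in candidates:
        if candidate in term_bank:
            found[candidate] = term_bank[candidate]
        else:
            missing.append(candidate)
    return found, missing


text = "The solar cell converts light. A solar cell is a device. Light is energy."
term_bank = {"solar cell": ["Solarzelle"], "light": ["Licht"]}  # hypothetical EN-DE data

candidates = extract_term_candidates(text)
found, missing = look_up_equivalents(candidates, term_bank)
# 'missing' holds the candidates that would go on to translation candidate
# acquisition from parallel web data, followed by clean-up by the user.
```

The noise in the result (single words such as "solar" and "cell" next to "solar cell") shows why a user clean-up step follows automatic acquisition.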

TaaS architecture

Example: term extraction via TaaS

Go to https://term.tilde.com

Either search directly for terms and equivalents

Or log in / sign up for further services

Create a term extraction project

Upload the text(s) for extraction

Define the extraction settings

Start the extraction

Check and complete the extraction results

Visualization

Some evaluation results

Evaluation in April (and June) 2014

4 test documents

Type: online article, white paper, dissertation

Domain: energy, economics, IT, astronomy

Languages: DE-EN, DE-FR, EN-FR

Gold standard: human term extraction, 7-10 candidates per document; problem: subjectivity

Gold Standard

Example (astronomy): 36×1 + 26×2 + 63×3 = 277
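The astronomy figure can be checked as a weighted sum. The slide text gives only the arithmetic, so the reading used here (three groups of term candidates carrying the weights 1, 2, and 3) is an assumption, and the function name is made up for this sketch.

```python
def weighted_total(groups):
    """Sum count * weight over (count, weight) pairs."""
    return sum(count * weight for count, weight in groups)


# (number of term candidates, weight) as given in the astronomy example
astronomy = [(36, 1), (26, 2), (63, 3)]
total = weighted_total(astronomy)                      # 36 + 52 + 189
distinct_candidates = sum(count for count, _ in astronomy)
```

Under this reading, the gold standard for the astronomy text holds 125 distinct candidates with a weighted total of 277.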

Calculation of Recall and Precision

Recall:

relevant term candidates (TC) found / all relevant TC

Were all relevant TC found?

Precision:

relevant TC found / all TC found

Are all TC found relevant?
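The two ratios can be computed directly from two sets: the term candidates a tool found and the relevant candidates in the gold standard. A minimal sketch, assuming exact string matching between tool output and gold standard; the function name and the sample terms are illustrative only.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for found term candidates against a gold standard."""
    found, relevant = set(found), set(relevant)
    hits = found & relevant  # relevant TC that were actually found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(found) if found else 0.0
    return recall, precision


tool_output = {"solar cell", "light", "device", "energy"}
gold_standard = {"solar cell", "light", "photon"}
recall, precision = recall_precision(tool_output, gold_standard)
```

Here two of the three gold-standard terms are found (recall 2/3), and two of the four extracted candidates are relevant (precision 1/2).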

Results of the TaaS evaluation

Test with Kilgray (statistical):

Test with TWSC and Term Normalizer (linguistic):

Test results and the gold standard

Improvement of TaaS

Second (short) evaluation in June 2014, after the end of the project:

Comparison TaaS – human – MT-Extract

T1: Terminologist with the best Recall and Precision values

T4: Terminologist with the worst Recall values

Ü1: Translator with the worst Precision values

MT: MultiTerm Extract (statistical) with different Silence/Noise values

TaaS: CAT Tool Integration

⇒ Auto-lookup

⇒ Manual lookup

⇒ Adding and editing terms

⇒ Transferring term extraction lists
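Auto-lookup in a CAT tool usually means that terms from the term base are recognised in the current source segment and shown with their equivalents. The sketch below shows the idea with exact, case-insensitive matching. It does not reflect how any particular CAT tool or the TaaS API works; the function name and the sample term base are assumptions.

```python
import re


def auto_lookup(segment, term_base):
    """Return (source term, target terms) pairs for term-base entries found in the segment."""
    hits = []
    # Check longer terms first so multi-word terms are listed before their parts.
    for source in sorted(term_base, key=len, reverse=True):
        pattern = r"\b" + re.escape(source) + r"\b"
        if re.search(pattern, segment, flags=re.IGNORECASE):
            hits.append((source, term_base[source]))
    return hits


term_base = {  # hypothetical EN-DE entries
    "term base": ["Termdatenbank"],
    "term": ["Benennung"],
    "corpus": ["Korpus"],
}
hits = auto_lookup("Open the Term Base and add a term.", term_base)
```

Exact matching misses inflected forms, so a real integration would need some form of linguistic normalisation.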

TaaS: (statistical) Machine Translation

Data acquisition from SMT systems

Export of multilingual terminology for reuse in MT systems

Figure (labels only): Online Terminology Services; Training; Translation; parallel corpus; monolingual corpus; bilingual term collections; monolingual term extraction; bilingual term extraction; SMT system training and adaptation; trained SMT model; input text for translation; online translation service; translated text

Conclusion

TaaS offers free-of-charge services for terminology extraction, retrieval, management, and sharing

The term extraction results are excellent for languages for which the linguistic algorithms are available

Companies are very cautious about TaaS

But the free services offered by TaaS may attract language workers to use TaaS for terminology management, to share (validated) terminology, and to collaborate with others.

Thank you for your attention

Prof. Dr. Klaus-Dirk Schmitz
Cologne University of Applied Sciences
Faculty 03 - ITMK/IIM
Ubierring 48
D-50678 Köln
Germany
[email protected]