2. Constantin Orasan (UoW) EXPERT Introduction

Introduction to EXPERT Constantin Orasan University of Wolverhampton, UK


Page 1: 2. Constantin Orasan (UoW) EXPERT Introduction

Introduction to EXPERT

Constantin Orasan University of Wolverhampton, UK

Page 2: 2. Constantin Orasan (UoW) EXPERT Introduction

Structure

- What are Marie (Skłodowska) Curie ITN actions?
- The EXPERT project
- Objectives of the project
- Work packages
- Individual projects
- Consortium

Page 3: 2. Constantin Orasan (UoW) EXPERT Introduction

What are Marie Curie ITN actions?

Initial Training Networks (ITN):
- Offer early-stage researchers the opportunity to improve their research skills, join established research teams and enhance their career prospects
- Are run by consortia made up of universities, research centres and companies
- Recruit researchers who are in the first five years of their career for initial training: either working towards a research-level degree (PhD or equivalent) or doing initial post-doctoral research

Page 4: 2. Constantin Orasan (UoW) EXPERT Introduction

EXPERT: EXPloiting Empirical appRoaches to Translation

The project:
- Proposes the creation of an Initial Training Network to train young researchers on ways to improve current data-driven translation technologies: translation memories (TM), statistical machine translation (SMT) and example-based machine translation (EBMT)
- Supports the young researchers of the network during the whole research and development cycle, providing guidance and training in core and complementary skills, and evaluating the resulting technologies
- Aims for these young researchers to become future leaders in this area

Page 5: 2. Constantin Orasan (UoW) EXPERT Introduction

EXPERT

Advocates that there is no clear boundary between fully automatic and semi-automatic translation, and that both are tools that can help human translators.

Aims to:
- Improve existing corpus-based TM and MT technologies
- Create hybrid technologies
- Exploit the strengths of the existing technologies and address their main limitations
- Consider the needs of the users when proposing new technologies

Page 6: 2. Constantin Orasan (UoW) EXPERT Introduction

Training objectives

EXPERT has five main training objectives:
- Training through research based on the set of sub-programmes.
- Creating a large and diverse research community focused on a common goal.
- Exploiting intersectoral and transnational mobility via secondments and shorter visits to both industrial and academic partners.
- Local training in core research and complementary skills within both academic and industrial environments.
- Network-wide training in core research areas and complementary skills.

Page 7: 2. Constantin Orasan (UoW) EXPERT Introduction

Objectives of the project

Topic: User perspective
State of the art and limitations: MT systems force the users to change their working style.
EXPERT solutions: Consider the real needs of translators, involving them in the development of technologies, and providing training to prepare them with new skills.

Topic: Data collection and preparation
State of the art and limitations: Existing TM, EBMT and SMT approaches have particular data constraints.
EXPERT solutions: Investigate how data repositories can be built automatically in a way that makes them useful to multiple corpus-based approaches to translation.

Page 8: 2. Constantin Orasan (UoW) EXPERT Introduction

Objectives of the project (2)

Topic: Improve matching and retrieval with linguistic processing
State of the art and limitations: The lack of linguistic processing constrains the retrieval of previous translations.
EXPERT solutions: Investigate matching algorithms which rely on lexical, syntactic and semantic variations of texts, including the use of automatically acquired domain ontologies and terminology databases.

Topic: Hybrid approaches for translation
State of the art and limitations: Hybrid corpus-based solutions consider each approach individually as a tool, not fully exploiting integration possibilities.
EXPERT solutions: Fully integrate corpus-based approaches to improve translation quality and minimize translation effort and cost.

Page 9: 2. Constantin Orasan (UoW) EXPERT Introduction

Objectives of the project (3)

Topic: Human translator in the loop: informing users and learning from user feedback
State of the art and limitations: In interactive workflows where humans post-edit/complete system translations, translators are not informed about the quality of the translations. The translators’ choice is at best saved for future use.
EXPERT solutions: Generate confidence and quality estimation mechanisms to allow these choices to be based on the quality of the TM/MT output. Make use of translators’ feedback as produced at translation time to improve the system on the fly.

Page 10: 2. Constantin Orasan (UoW) EXPERT Introduction

Work packages

- WP1: Management (UoW)
- WP7: Training (UvA)
- WP8: Dissemination (Pangeanic)
- WP2: User perspective (UMA)
- WP3: Data collection (Translated)
- WP4: Language technology, domain ontologies and terminologies (USAAR)
- WP5: Learning from and informing translators (USFD)
- WP6: Hybrid corpus-based approaches (DCU)

Page 11: 2. Constantin Orasan (UoW) EXPERT Introduction

Projects

- ESR1: Investigation of translators’ requirements from translation technologies (UMA, WP2)
- ESR2: Investigation of an ideal translation workflow for hybrid translation approaches (USAAR, WP2)
- ESR3: Collection and preparation of multilingual data for multiple corpus-based approaches to translation (UMA, WP3)
- ESR4: Use of language technology to improve matching and retrieval in translation memories (UoW, WP4)

Page 12: 2. Constantin Orasan (UoW) EXPERT Introduction

Projects (2)

- ESR5: Use of terminologies and ontologies to improve corpus-based approaches to translation (USAAR, WP4)
- ESR6: Learning from human feedback on the quality of the translations (USFD, WP5)
- ESR7: Estimating the confidence of corpus-based approaches to translation and the quality of the translated texts (USFD, WP5)
- ESR8: Investigation of how the individual corpus-based translation approaches (TM, EBMT and SMT) can benefit from each other (DCU, WP6)

Page 13: 2. Constantin Orasan (UoW) EXPERT Introduction

Projects (3)

- ESR9: Investigation of the ideal infrastructure for computer-aided translation: a pipeline with NLP tools for pre/post-processing, SMT, EBMT and TM techniques (a hybrid CAT tool) (DCU, WP6)
- ESR10: Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation (UvA, WP6)
- ESR11: Exploiting hierarchical alignments for a semantically-enriched SMT system that offers an extension to existing TMs to allow incremental, recursive partial match of the input using hierarchical constructions containing variables (UvA, WP6)
- ESR12: Investigation of methodologies to evaluate the improved SMT, EBMT and TM prototypes and new hybrid computer-aided translation technology proposed in EXPERT (UoW, WP6)

Page 14: 2. Constantin Orasan (UoW) EXPERT Introduction

Projects (4)

- ER1: Investigation of automatic methods for collection and preparation of multilingual data (Translated, WP3)
- ER2: Implementation and evaluation (including user aspects) of the improved SMT, EBMT and TM prototypes proposed in EXPERT (Hermes, WP6)
- ER3: Implementation and evaluation of the new hybrid computer-aided translation technology proposed in EXPERT (Pangeanic, WP6)

Page 15: 2. Constantin Orasan (UoW) EXPERT Introduction

Consortium

Academic partners:
- University of Wolverhampton, UK (coordinator)
- Universidad de Malaga, Spain
- University of Sheffield, UK
- Universitaet des Saarlandes, Germany
- Dublin City University, Ireland
- Universiteit van Amsterdam, Netherlands

Private sector:
- Pangeanic, Spain
- Translated SRL, Italy
- Hermes, Spain

Associated partners:
- Celer Soluciones S.L., Spain
- Wordfast, France