2. Constantin Orasan (UoW) EXPERT Introduction
Introduction to EXPERT
Constantin Orasan, University of Wolverhampton, UK
What are Marie (Skłodowska) Curie ITN actions?
The EXPERT project
Objectives of the project
Work packages
Individual projects
Consortium
Structure
Initial Training Networks (ITN):
Offer early-stage researchers the opportunity to improve their research skills, join established research teams and enhance their career prospects.
Are run by consortia made up of universities, research centres and companies.
Recruit researchers who are in the first five years of their career for initial training: those working towards a research-level degree (PhD or equivalent) or doing initial post-doctoral research.
What are Marie Curie ITN actions?
Proposes the creation of an Initial Training Network to train young researchers on ways to improve current data-driven MT technologies (TM, SMT and EBMT)
Supports young researchers of the network during the whole research and development cycle, providing guidance and training in core and complementary skills, and evaluating the resulting technologies
Prepares young researchers to become future leaders in this area
EXPERT: EXPloiting Empirical appRoaches to Translation
Argues that there is no clear boundary between fully automatic and semi-automatic translation, and that both are tools that can help human translators
Aims to:
improve existing corpus-based TM and MT technologies
create hybrid technologies
exploit the strengths of the existing technologies and address their main limitations
consider the needs of the users when proposing new technologies
EXPERT
EXPERT has five main training objectives:
Training through research based on the set of sub-programmes.
Creating a large and diverse research community focused on a common goal.
Exploiting intersectoral and transnational mobility via secondments and shorter visits to both industrial and academic partners.
Local training in core research and complementary skills within both academic and industrial environments.
Network-wide training in core research areas and complementary skills.
Training objectives
Topic: User perspective
State of the art and limitations: MT systems force the users to change their working style.
EXPERT solution: Consider the real needs of translators, involving them in the development of technologies, and providing training to prepare them with new skills.
Topic: Data collection and preparation
State of the art and limitations: Existing TM, EBMT and SMT approaches have particular data constraints.
EXPERT solution: Investigate how data repositories can be built automatically in a way that makes them useful to multiple corpus-based approaches to translation.
Objectives of the project
Topic: Improve matching and retrieval with linguistic processing
State of the art and limitations: The lack of linguistic processing constrains the retrieval of previous translations.
EXPERT solution: Investigate matching algorithms which rely on lexical, syntactic and semantic variations of texts, including the use of automatically acquired domain ontologies and terminology databases.
Topic: Hybrid approaches for translation
State of the art and limitations: Hybrid corpus-based solutions consider each approach individually as a tool, not fully exploiting integration possibilities.
EXPERT solution: Fully integrate corpus-based approaches to improve translation quality and minimize translation effort and cost.
Objectives of the project (2)
Topic: Human translator in the loop: informing users and learning from user feedback
State of the art and limitations: In interactive workflows where humans post-edit/complete system translations, translators are not informed about the quality of the translations. The translators’ choice is at best saved for future use.
EXPERT solution: Generate confidence and quality estimation mechanisms to allow these choices to be based on the quality of the TM/MT output. Make use of translators’ feedback as produced at translation time to improve the system on the fly.
Objectives of the project (3)
WP1: Management (UoW)
WP7: Training (UvA)
WP8: Dissemination (Pangeanic)
WP2: User perspective (UMA)
WP3: Data collection (Translated)
WP4: Language technology, domain ontologies and terminologies (USAAR)
WP5: Learning from and informing translators (USFD)
WP6: Hybrid corpus-based approaches (DCU)
Work packages
Projects
ESR1 Investigation of translators’ requirements from translation technologies UMA WP2
ESR2 Investigation of an ideal translation workflow for hybrid translation approaches USAAR WP2
ESR3 Collection and preparation of multilingual data for multiple corpus-based approaches to translation UMA WP3
ESR4 Use of language technology to improve matching & retrieval in translation memories UoW WP4
ESR5 Use of terminologies and ontologies to improve corpus-based approaches to translation USAAR WP4
ESR6 Learning from human feedback on the quality of the translations USFD WP5
ESR7 Estimating the confidence of corpus-based approaches to translation and the quality of the translated texts USFD WP5
ESR8 Investigation of how each individual corpus-based translation approach (TM, EBMT and SMT) can benefit from each other DCU WP6
Projects (2)
ESR9 Investigation of the ideal infrastructure for computer-aided translation: pipeline with NLP tools for pre/post-processing, SMT, EBMT and TM techniques – a hybrid CAT tool DCU WP6
ESR10 Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation UvA WP6
ESR11 Exploiting hierarchical alignments for a semantically-enriched SMT system that offers an extension to existing TMs to allow incremental, recursive partial match of the input using hierarchical constructions containing variables UvA WP6
ESR12 Investigation of methodologies to evaluate the improved SMT, EBMT and TM prototypes and new hybrid computer-aided translation technology proposed in EXPERT UoW WP6
Projects (3)
ER1 Investigation of automatic methods for collection & preparation of multilingual data Translated WP3
ER2 Implementation and evaluation (including user aspects) of the improved SMT, EBMT and TM prototypes proposed in EXPERT Hermes WP6
ER3 Implementation and evaluation of the new hybrid computer-aided translation technology proposed in EXPERT Pangeanic WP6
Projects (4)
Academic partners:
University of Wolverhampton, UK (coordinator)
Universidad de Malaga, Spain
University of Sheffield, UK
Universitaet des Saarlandes, Germany
Dublin City University, Ireland
Universiteit van Amsterdam, Netherlands
Private sector:
Pangeanic, Spain
Translated SRL, Italy
Hermes, Spain
Associated partners:
Celer Soluciones S.L., Spain
Wordfast, France
Consortium