This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644753 (KConnect).
Medical-domain Machine Translation in KConnect
Pavel Pecina
Charles University, Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Czech Republic
Apr 4th, 2017 – QT21 workshop, Valencia, Spain
Outline
● Context of the project (Khresmoi)
● Project details, goals and objectives
● Role of MT in the project
● Industry requirements/constraints
● Solutions and tools
● Prototypes/Demos
● What is still needed
Khresmoi
● „Collect and make sense of biomedical information, then make it freely and easily available in several languages.“
● FP7-ICT, No. 257528, Collaborative project
● Total cost: EUR ~10M, 2010/09-2014/08
● Topic: ICT-2009.4.3 - Intelligent Information Management
● Coordinator: Henning Müller, University of Applied Sciences Western Switzerland, Sierre
● Consortium: 12 institutions
● http://www.khresmoi.eu/
Khresmoi objectives
● Effective automated information extraction from (unstructured) biomedical documents
● Linking information extracted from unstructured biomedical texts/images to structured information in knowledge bases
● Support of cross-language search, including multi-lingual queries, and returning machine-translated pertinent excerpts
● Adaptive user interfaces to assist in formulating queries and display search results via ergonomic/interactive visualizations
● Automated analysis and indexing for medical images
Khresmoi results (MT related)
● MT component to allow cross-lingual search and access
● Based on Moses and domain-adaptation techniques
● Deployed as (cloud-based) web-service
● Translation in two „modes“:
– Translation of search queries from the user's language to the document languages (query translation)
– Translation of sentences from automatically created summaries of medical documents (summary translation)
● Languages: Czech, German, French ↔ English
KConnect – a follow-up of Khresmoi
● „Development and commercialization of cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications.“
● H2020 project, No. 644753, Innovation action
● Total cost: EUR ~4M, 2015/02–2017/07
● Topic: ICT-15-2014 Big data and open data innovation and take-up
● Coordinator: Allan Hanbury, Technical University in Vienna
● Consortium: 10 institutions (5 from Khresmoi)
● http://www.kconnect.eu
Consortium
● Academia:
– Technische Universitaet Wien (Austria) – coordination
– University of Sheffield (United Kingdom)
– King’s College London (United Kingdom)
– Charles University, Prague (Czech Republic)
● Industry:
– Findwise AB (Sweden)
– Precognox Informatikai Kft (Hungary)
– Ontotext AD (Bulgaria)
– Trip Database Ltd (United Kingdom)
– Health on the Net Foundation (Switzerland)
– Jönköpings län (Sweden)
KConnect objectives
● Productisation of the multilingual medical text processing tools developed in Khresmoi.
● Creating a professional services community of companies trained to build solutions based on the KConnect Services.
● Development of toolkits for straightforward adaptation of the commercialised services to new languages.
● Adapting the services to Electronic Health Records processing, which is particularly challenging due to misspellings, neologisms, organisation-specific acronyms, etc.
● Languages: Hungarian, Polish, Spanish, Swedish ↔ English
MT Application Scenarios
1. Query translation
– Translation of medical/health-related search queries from a user language to the document language(s)
– Queries are usually non-grammatical, short sequences of terms
– Lay-people queries vs. expert queries
2. Summary translation
– Sentences taken from automatically created abstracts of medical documents, translated back to the user language
– Usually longer, highly informative sentences
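The two scenarios together form a cross-lingual search loop, which can be sketched as follows. This is a minimal illustration only: the function names, signatures, and toy stand-ins are assumptions made for the sketch, not the project's actual interfaces.

```python
from typing import Callable, List

# Both signatures are assumptions made for this sketch.
Translate = Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
Search = Callable[[str], List[str]]          # query -> summary sentences


def cross_lingual_search(query: str, user_lang: str, doc_lang: str,
                         translate_query: Translate,
                         translate_summary: Translate,
                         search: Search) -> List[str]:
    """Translate the query, search, then translate the summaries back."""
    # Scenario 1: short, often non-grammatical query -> document language.
    doc_query = translate_query(query, user_lang, doc_lang)
    # Retrieval runs entirely in the document language.
    summaries = search(doc_query)
    # Scenario 2: longer summary sentences -> back to the user language.
    return [translate_summary(s, doc_lang, user_lang) for s in summaries]


# Toy stand-ins so the sketch runs without any MT system or search index.
def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    return f"[{source_lang}->{target_lang}] {text}"


def toy_search(query: str) -> List[str]:
    return [f"summary for '{query}'"]


results = cross_lingual_search("bolest hlavy", "cs", "en",
                               toy_translate, toy_translate, toy_search)
```

Passing the query translator and the summary translator as separate arguments reflects that the two kinds of input differ (short term sequences vs. full sentences) and may be served by differently tuned systems.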
Requirements, constraints
● Requirements
– Cloud-based solution, easily accessible as a webservice
– Local installation (hospitals)
– Instant response, scalable
– Low computational resources (local installations)
– Easily (re)trainable
● Constraints
– No (or very limited) domain-specific in-house training data
Solutions and tools
● Moses (phrase-based, domain-adapted)
● MT Monkey – MT webservice architecture
● Eman Lite – MT training pipeline
● Manually translated dev/test sets for medical domain
● Training data collected and made available for WMT 17
MT Monkey
● Webservice architecture
● Developed at CUNI within Khresmoi
● Actively extended and maintained within KConnect
● Scalable (see Tamchyna et al., 2013 for evaluation)
● Recently Dockerized
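A client of such a translation webservice typically sends a JSON request and reads the translation from a JSON reply. The sketch below shows that pattern without making any network call. The endpoint URL and the field names (`action`, `sourceLang`, `targetLang`, `text`, `errorCode`, `translation`, `translated`) are assumptions for illustration and should be checked against the MT Monkey documentation before use.

```python
import json
import urllib.request
from typing import List


def build_request(endpoint: str, text: str,
                  source_lang: str, target_lang: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one translation."""
    # Field names are assumed for this sketch, not taken from the slides.
    payload = {
        "action": "translate",
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "text": text,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_translations(response_body: str) -> List[str]:
    """Return the best hypothesis for each translated sentence."""
    reply = json.loads(response_body)
    if reply.get("errorCode", 0) != 0:
        raise RuntimeError(f"translation failed: {reply}")
    # Assumed reply shape: one entry per sentence, hypotheses best-first.
    return [sentence["translated"][0]["text"]
            for sentence in reply.get("translation", [])
            if sentence.get("translated")]
```

To send the request, pass the object returned by `build_request` to `urllib.request.urlopen` and feed the decoded response body to `extract_translations`.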
Eman Lite
● fully automated MT system training
● command-line and web-based interface
Prototypes/demos
● Trip database search
– https://www.tripdatabase.com
– Search in medical articles (clinical trials, research papers ...)
● Health-on-the-Net Search
– http://everyone.khresmoi.eu/
– Health-focused web-search engine
– Readability and trustability prediction
● Demos
– http://quest.ms.mff.cuni.cz/khresmoi/demo/
– http://quest.ms.mff.cuni.cz/khresmoi/client/
Trip Database Search
HON Search
HON Search (new version)
Prototypes/demos
● Trip database search
– https://www.tripdatabase.com
– Search in medical articles (clinical trials, research papers ...)
● Health-on-the-Net Search
– http://everyone.khresmoi.eu/
– http://jupiter.honservices.org/beta/
– Health-focused web-search engine
– Readability and trustability prediction
● Demos
– http://quest.ms.mff.cuni.cz/khresmoi/demo/
Issues
● Availability of (in-domain) training data
● Training data licences not clear (UMLS, MeSH, SNOMED CT)
● Translation quality for some languages (e.g. Hungarian)
● Lay-people language vs. expert language