Infrastructures and plans boosting Language Technology Research and Innovation
Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119.
Stelios Piperidis, Athena RC, Greece
Multilingual Europe
http://www.meta-net.eu
Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.
While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years.
All stakeholders – researchers, LT user and provider industries, language communities, funding programmes, policy makers – should team up for a major dedicated push.
Objectives
META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.
Four EU-Funded Projects
Initial project: T4ME (FP7; 13 partners, 10 countries)
Three ICT-PSP consortia since Feb. 2011: CESAR, METANET4U, META-NORD
All EU member states and several non-member states covered.
META-NET in Nov. 2012: 60 members in 34 countries.
http://www.meta-net.eu/members
Language White Paper Series
META-VISION
Language White Paper Series
Reports on the state of our languages in the digital age and the level of support through language technology.
Series covers 30 languages.
Key communication instruments to address decision makers and journalists.
Inform about societal and technological problems and challenges as well as economic opportunities.
>2 years in the making.
>200 national experts as contributors.
>8,000 copies printed and distributed to politicians and journalists.
30 Languages Covered
Basque, Bulgarian*, Catalan, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Croatian
* = Official EU language
Cross-Lingual Ranking
In four application areas, each language is assigned to one of five clusters, ranging from excellent LT support to weak/no support:
1. Machine Translation
2. Speech Processing
3. Text Analysis
4. Resources
Results finalised at a meeting in Berlin with representatives of all 30 languages (October 21/22, 2011).
Machine Translation
excellent: (none)
good: English
moderate: French, Spanish
fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish
Speech Processing
excellent: (none)
good: English
moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian
Text Analysis
excellent: (none)
good: English
moderate: Dutch, French, German, Italian, Spanish
fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian
Resources
excellent: (none)
good: English
moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese
Europe’s Languages and LT
Figure: overall LT support for Europe’s languages, from “good support through Language Technology” (first group) to “weak or no support” (last group):
- English
- Dutch, French, German, Italian, Spanish
- Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
- Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
- Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian
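To make the link between the four per-area rankings and this overall picture concrete, here is a small Python sketch. It gives each cluster an assumed score (excellent = 4 down to weak or no support = 0) and sums each language’s scores over the four areas. The scale and the names `CLUSTER_SCORE`, `RANKING` and `total_scores` are our own assumptions, not META-NET’s documented method; under these assumptions, ranking languages by their totals places the groups in the same order as the figure.

```python
from collections import defaultdict

# Assumed numeric scale for the five clusters (illustrative, not META-NET's method).
CLUSTER_SCORE = {"excellent": 4, "good": 3, "moderate": 2, "fragmentary": 1, "weak": 0}

# Cross-lingual ranking per area; the "excellent" cluster is empty in every area.
RANKING = {
    "Machine Translation": {
        "good": "English",
        "moderate": "French Spanish",
        "fragmentary": "Catalan Dutch German Hungarian Italian Polish Romanian",
        "weak": ("Basque Bulgarian Croatian Czech Danish Estonian Finnish Galician Greek "
                 "Icelandic Irish Latvian Lithuanian Maltese Norwegian Portuguese Serbian "
                 "Slovak Slovene Swedish"),
    },
    "Speech Processing": {
        "good": "English",
        "moderate": "Czech Dutch Finnish French German Italian Portuguese Spanish",
        "fragmentary": ("Basque Bulgarian Catalan Danish Estonian Galician Greek Hungarian "
                        "Irish Norwegian Polish Serbian Slovak Slovene Swedish"),
        "weak": "Croatian Icelandic Latvian Lithuanian Maltese Romanian",
    },
    "Text Analysis": {
        "good": "English",
        "moderate": "Dutch French German Italian Spanish",
        "fragmentary": ("Basque Bulgarian Catalan Czech Danish Finnish Galician Greek "
                        "Hungarian Norwegian Polish Portuguese Romanian Slovak Slovene Swedish"),
        "weak": "Croatian Estonian Icelandic Irish Latvian Lithuanian Maltese Serbian",
    },
    "Resources": {
        "good": "English",
        "moderate": "Czech Dutch French German Hungarian Italian Polish Spanish Swedish",
        "fragmentary": ("Basque Bulgarian Catalan Croatian Danish Estonian Finnish Galician "
                        "Greek Norwegian Portuguese Romanian Serbian Slovak Slovene"),
        "weak": "Icelandic Irish Latvian Lithuanian Maltese",
    },
}


def total_scores(ranking):
    """Sum each language's assumed cluster scores over all areas."""
    totals = defaultdict(int)
    for clusters in ranking.values():
        for cluster, names in clusters.items():
            for lang in names.split():
                totals[lang] += CLUSTER_SCORE[cluster]
    return dict(totals)


if __name__ == "__main__":
    totals = total_scores(RANKING)
    # Highest total first; ties broken alphabetically.
    for lang, score in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{score:2d}  {lang}")
```

The summed scale is only one plausible aggregation; a different weighting of areas could reorder languages near group boundaries.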
Key Observations
When it comes to Language Technology support, there are massive differences between Europe’s languages and technology areas.
LT support for English is ahead of that for any other language.
Even support for English is far from perfect.
The gap between English and the other languages keeps widening!
Several languages (Icelandic, Latvian, Lithuanian, Maltese) receive the weakest score in all four areas!
At least 21 European languages are in danger of digital extinction! (Languages put into the “weak or no support” category at least once.)
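Both counts in these observations can be checked against the cross-lingual ranking. The Python sketch below (variable names are our own) lists, per area, the languages in the “weak or no support” cluster, then takes the union (in that cluster at least once) and the intersection (in it in all four areas).

```python
# Languages in the "weak or no support" cluster, per application area.
WEAK_OR_NO_SUPPORT = {
    "Machine Translation": (
        "Basque Bulgarian Croatian Czech Danish Estonian Finnish Galician Greek "
        "Icelandic Irish Latvian Lithuanian Maltese Norwegian Portuguese Serbian "
        "Slovak Slovene Swedish"
    ),
    "Speech Processing": "Croatian Icelandic Latvian Lithuanian Maltese Romanian",
    "Text Analysis": "Croatian Estonian Icelandic Irish Latvian Lithuanian Maltese Serbian",
    "Resources": "Icelandic Irish Latvian Lithuanian Maltese",
}

weak_sets = [set(names.split()) for names in WEAK_OR_NO_SUPPORT.values()]

# In the weakest cluster in at least one area ("in danger of digital extinction").
at_least_once = set().union(*weak_sets)
# In the weakest cluster in all four areas.
in_all_areas = set.intersection(*weak_sets)

print(len(at_least_once))    # 21
print(sorted(in_all_areas))  # ['Icelandic', 'Latvian', 'Lithuanian', 'Maltese']
```

Machine translation alone contributes 20 of the 21 languages; speech processing adds Romanian.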
Strategic Research Agenda
META-VISION
Three Ingredients
Appropriate Programme: Vision & Agenda
Appropriate Actors: Research & Commercialisation
Appropriate Support: Funding
Strategic Research Agenda
META-NET Strategic Research Agenda for Multilingual Europe 2020.
Addresses the problems we found during the white paper study.
Three priority research themes and application/innovation scenarios.
Can put Europe ahead of its competitors in this technology area.
190+ contributors.
Final version ready today!
SRA will be presented to the EC and national bodies.
Strategic Research Agenda
Priority Themes: 3 + 2
Three priority research themes:
- Translation Cloud
- Social Intelligence and e-Participation
- Socially-Aware Interactive Assistant
Two additional themes:
- European Language Technology Platform
- Core Technologies for Language Analysis and Production
Open Resource Infrastructure
META-SHARE
The power of data
Scientific data has the potential to transform and drastically improve our lives
Evidence from many domains (geo & earth sciences, biotechnology) shows that data and tools become valuable through opening and sharing, both for research and technology development & evaluation and for supporting innovative applications.
Making the results of the Human Genome Project accessible turned a ~€3 billion R&D investment into ~€500 billion of economic activity.
“Alzheimer’s researchers recently pooled genetic data and discovered 5 new genes and important evidence about the disease”
“Data is too valuable to be locked away”
Strategic Research Agenda
LRs in the SRA
LRs Discovery? Availability?
According to past and recent studies, only a portion of language resources (LRs) is known / announced / shared / traded / ...
… despite the fact that data collection, cleaning, annotation, curation and maintenance are a very costly business.
To make any progress and to enable the development of useful applications, we need all the scientific, technical, legal, organisational and societal mechanisms that allow the necessary resources to be shared, recycled and repurposed.
META-SHARE rationale
Language resources (data and tools) are dynamic, living entities:
- they evolve over time in various dimensions (quantity, annotation levels, conversion to new formats, addition of new languages)
- they are usually the product of collaborative work
- they may come with varying restrictions, ...
We need solutions that enable every language resource provider, at any granularity level (individual/lab/organisation), to:
- create their own repository of LRs
- describe, document and update LR descriptions
- link to a network of repositories of other providers
- keep track of the use of their LRs, trade LRs, …
We need solutions that enable every language resource consumer to:
- discover which LRs suitable for their purposes exist
- get information about them and download / acquire them
META-SHARE: what it is
http://www.meta-net.eu 25
META-SHARE tries to match LR providers’ and consumers’ needs and expectations by enhancing the visibility, documentation, identification, availability and preservation of language data and (basic language processing) tools.
It launches a long-term multidimensional endeavour by which language resources will contribute to boosting research, technology and innovation through wide availability, pooling, openness and sharing
META-SHARE architecture
[Figure: local LR repositories, each with its own inventory, feed several META-SHARE inventories through metadata harvesting; mappings connect to external repositories. The META-SHARE portal offers resource provision services and user-oriented and support services: registration, authentication and authorisation; search / browse; download; licences; billing / payment; reporting and statistics; recommenders.]
META-SHARE provider side
All facilities for creating your own META-SHARE-compliant repository and linking it to the META-SHARE network:
- open source repository software
- functionalities for documenting LRs, updating their descriptions, and storing/linking LRs
- provider support services (helpdesks, forum, knowledge base)
Each repository maintains an inventory with the metadata (MD) of all its LRs and exports this MD for harvesting.
Harvested MD are stored in synchronised central servers.
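The harvesting flow just described (each local inventory exports metadata; central servers store and synchronise it) can be sketched in Python. Everything here is hypothetical: `MetadataRecord`, `LocalRepository`, `harvest` and `synchronise` are illustrative names, the record has only four fields, and “newest version wins” is our assumed conflict rule. The real META-SHARE software uses its own, much richer metadata schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified LR description (not the META-SHARE schema)."""
    resource_id: str
    name: str
    version: int
    repository: str


@dataclass
class LocalRepository:
    """A provider's repository with its own inventory of LR metadata."""
    name: str
    inventory: Dict[str, MetadataRecord] = field(default_factory=dict)

    def describe(self, resource_id: str, name: str, version: int) -> None:
        # Create or update the description of one of this provider's LRs.
        self.inventory[resource_id] = MetadataRecord(resource_id, name, version, self.name)

    def export_metadata(self) -> List[MetadataRecord]:
        # Expose every metadata record in the inventory for harvesting.
        return list(self.inventory.values())


def _merge(target: Dict[str, MetadataRecord], records: Iterable[MetadataRecord]) -> None:
    # Keep only the newest version of each resource description.
    for record in records:
        current = target.get(record.resource_id)
        if current is None or record.version > current.version:
            target[record.resource_id] = record


def harvest(central: Dict[str, MetadataRecord], repositories: Iterable[LocalRepository]) -> None:
    """Pull the exported metadata of each repository into one central inventory."""
    for repo in repositories:
        _merge(central, repo.export_metadata())


def synchronise(servers: List[Dict[str, MetadataRecord]]) -> None:
    """Give every central server the same newest-version view of all harvested records."""
    merged: Dict[str, MetadataRecord] = {}
    for server in servers:
        _merge(merged, server.values())
    for server in servers:
        server.clear()
        server.update(merged)


if __name__ == "__main__":
    repo_a, repo_b = LocalRepository("repo-a"), LocalRepository("repo-b")
    repo_a.describe("lr-1", "news corpus", version=1)
    repo_b.describe("lr-2", "lexicon", version=1)
    server_1, server_2 = {}, {}
    harvest(server_1, [repo_a])
    harvest(server_2, [repo_b])
    synchronise([server_1, server_2])
    print(sorted(server_1))  # ['lr-1', 'lr-2']
```

Only metadata travels to the central servers in this sketch; the resources themselves stay in the provider’s repository, which matches the user-side flow of downloading from the respective repository.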
META-SHARE user side
Users (LR consumers) can:
- search the central inventory
- browse it using multiple facets
- access the actual resources by visiting the respective repositories, where they get legally interoperable licence(s) to download and use them
- get support through an online user forum and helpdesks dedicated to technical, metadata and legal issues
- access a knowledge base
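Searching the central inventory and browsing it by facets comes down to filtering metadata records on field values and counting the values of each facet. A minimal Python sketch, assuming flat records with hypothetical `language`, `type` and `licence` fields; the example records are invented and the real META-SHARE schema is far richer.

```python
from collections import Counter

# Hypothetical entries of the central inventory; field names are illustrative only.
INVENTORY = [
    {"id": "lr-1", "name": "Parallel news corpus", "language": "Greek",
     "type": "corpus", "licence": "CC-BY"},
    {"id": "lr-2", "name": "Morphological lexicon", "language": "Greek",
     "type": "lexicon", "licence": "CC0"},
    {"id": "lr-3", "name": "Broadcast speech corpus", "language": "Maltese",
     "type": "corpus", "licence": "CC-BY-NC"},
    {"id": "lr-4", "name": "Part-of-speech tagger", "language": "Greek",
     "type": "tool", "licence": "BSD"},
]


def search(records, text="", **facets):
    """Keep records whose name contains `text` and that match every chosen facet value."""
    text = text.lower()
    return [
        r for r in records
        if text in r["name"].lower() and all(r.get(k) == v for k, v in facets.items())
    ]


def facet_counts(records, facet):
    """Number of records per value of one facet, as shown next to a browse filter."""
    return Counter(r[facet] for r in records)


if __name__ == "__main__":
    greek = search(INVENTORY, language="Greek")
    print([r["id"] for r in greek])             # ['lr-1', 'lr-2', 'lr-4']
    print(dict(facet_counts(greek, "type")))    # {'corpus': 1, 'lexicon': 1, 'tool': 1}
    print([r["id"] for r in search(INVENTORY, text="corpus")])  # ['lr-1', 'lr-3']
```

Recomputing the facet counts on the current result set is what lets a user narrow a search step by step, e.g. first by language, then by resource type.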
Join META-SHARE as ...
Core and User Support Service Providers
Hosting (non-local) repositories
Local repositories
Depositing-only Members
Associate members
Third Party Consumers
Repository Service Providers
Legal provisions for LR sharing
Language Resources Sharing Charter: high-level principles
Memorandum of Understanding (a.k.a. membership agreement)
Licensing templates and deposition agreements: an inclusive mix of open and openness-inspired models:
- Creative Commons licences (starting with Creative Commons Zero (CC-0) and all possible combinations along the CC differentiation of rights of use)
- META-SHARE Commons licences: a fully developed, CC-based licensing tool that allows META-SHARE members to make their resources available inside the network only
- META-SHARE “No Redistribution” licences, allowing use and exploitation of the resources while letting the LR owner keep full control over their distribution
- Software tools and web services are provided either under one of the standard open source licences or under a custom commercial licence.
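To show how these licence families differ in practice, here is a Python sketch that answers whether a given use is allowed. It is a deliberate simplification: the `Licence` class, the family names and `permitted` are our own; it models only the CC NonCommercial (NC) and NoDerivatives (ND) elements, treats attribution and ShareAlike as conditions that never block a use, and assumes “No Redistribution” forbids passing on the resource or adapted versions of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Simplified licence model (illustrative assumption, not the licence texts)."""
    family: str                  # "CC", "MS-Commons" or "MS-NoRedistribution"
    noncommercial: bool = False  # CC "NC" element
    noderivatives: bool = False  # CC "ND" element


def permitted(lic, *, is_member, commercial=False, share_adaptation=False, redistribute=False):
    """Return True if the described use is allowed under this simplified model."""
    # META-SHARE Commons: resources are available inside the network only.
    if lic.family == "MS-Commons" and not is_member:
        return False
    # "No Redistribution": use is allowed, distribution stays with the LR owner.
    if lic.family == "MS-NoRedistribution" and (redistribute or share_adaptation):
        return False
    # NC forbids commercial use; ND forbids sharing adapted versions.
    if lic.noncommercial and commercial:
        return False
    if lic.noderivatives and share_adaptation:
        return False
    return True


if __name__ == "__main__":
    cc0 = Licence("CC")  # e.g. CC0: no NC/ND restrictions
    ms_nc = Licence("MS-Commons", noncommercial=True)
    print(permitted(cc0, is_member=False, commercial=True, redistribute=True))        # True
    print(permitted(ms_nc, is_member=False))                                         # False
    print(permitted(ms_nc, is_member=True, commercial=True))                         # False
    print(permitted(Licence("MS-NoRedistribution"), is_member=True, commercial=True))  # True
```

Keeping the family (who may access) separate from the CC-style elements (what they may do) mirrors how META-SHARE Commons licences reuse the CC differentiation of rights while adding the members-only restriction.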
META-SHARE today…
A network of 24 language resource repositories in 19 EU countries, with >1,550 LRs
META-SHARE software: open source, under a permissive licence (BSD), for setting up a language resource repository
Legal instruments catering for a range of uses
Software-based services for both LR providers and LR consumers
User support services: user forum, helpdesks
Mapping services to big resource inventories (CLARIN, OLAC, …)
In the immediate future…
More META-SHARE nodes and their language resources will be integrated, including ELRA-supported initiatives, the LRE Map and the Language Library
Adoption of the META-SHARE platform and framework by ELRA
Full deployment of the services of META-SHARE members – from software availability, maintenance and technical assistance to language resources storage and preservation as well as support related to metadata and legal issues
Coordination with upcoming initiatives (iCordi, Research Data Alliance, …)
Official launch: 25 January 2013
Conclusions
META-NET
Conclusions
Our white paper press campaign shows that Europe is extremely interested in and passionate about its languages.
Two parliamentary questions on the “digital extinction of languages” have been tabled in the European Parliament.
Now is the time to move forward with a continent-wide, systematic push and to invest in strategic research.
A modest investment is required.
This push will generate countless opportunities.
Horizon 2020 and the Connecting Europe Facility can provide sufficient resources to make our visions for Europe’s citizens and economy a reality.
Thank you very much!
http://www.meta-net.eu
http://www.facebook.com/META.Alliance
Q/A