LKR2004, Tokyo March 8+9 2004
steven.krauwer@elsnet.org
The European Resources Landscape
Steven Krauwer
ELSNET / Utrecht University
The Netherlands
Overview
• About ELSNET
• Main characteristics of the European scene
• Impact of EU funding policies
• Bottom-up resources infrastructure actions
• Concluding remarks
What is ELSNET
• European Network in Human Language Technologies (ca 145 academic and industrial member organisations)
• Funded by the European Commission
• Created in 1991 as one network out of (eventually) ca 25, covering all subfields of ICT
• Objectives
– bringing together the language and speech communities
– bringing together academia and industry
– facilitating R&D in language and speech technology
• Info: elsnet@elsnet.org http://www.elsnet.org
What we do
• Spreading knowledge, e.g.:
– Training (e.g. annual summer schools, curriculum development)
– Information dissemination (newsletter, website, etc.)
– Knowledge transfer (directories, workshops)
• Creating common foundations:
– language resources
– common standards and evaluation methods
• Roadmapping:
– Establishing a broadly supported common vision of where the language and speech field is going
Main characteristics of the European Landscape
• Multilinguality: coping with many languages and crossing language boundaries
• Fragmentation of all R&D efforts over national funding schemes and policies
• Unbalanced efforts over languages, even though all languages are equally hard
Languages in Europe
• European Union has
– 15 member states, with 11 official languages (plus quite a few ‘unofficial languages’)
– 10 new member states with (at least) 10 new official languages joining May 1st 2004
– 3 applicant countries in the waiting room with at least 3 extra languages
• Europe has
– 17 other countries, with quite a few additional languages (think of Russia!)
Languages in the world
The Ethnologue (http://www.ethnologue.org):
• Europe: 230 languages
• The Americas: 1013 languages
• The Pacific: 1311 languages
• Africa: 2058 languages
• Asia: 2197 languages
Languages in Japan
• Just one language: Japanese …
• But even in Japan multilinguality is a factor, e.g.:
– Export market requires localized products (e.g. user interfaces)
– Users require documentation in their own language
– Business to business communication crosses language boundaries
– Immigrants
Resources in Europe
• Language resources collection started in most countries as a cultural or political activity
• Most activities in larger countries with bigger funding programmes
• Adoption or creation of resources for industrial application started much later
• Most of them addressing commercially interesting languages
• Result: very uneven coverage
Impact of the EU
• During the 70s and 80s the EU became a major funder of technology programmes
• For smaller languages the EU became the main funding source
• The political requirement of multinational consortia and balanced participation across member states gave a strong boost to resources development for smaller languages
Recent EU policies
• EU focus shifting to activities with a more direct commercial impact
• EU focus shifting from spreading excellence to boosting excellence: only invest in sectors where Europe can maintain or strengthen world leadership (over e.g. US and Japan)
• EU moves from many small projects (up to 5 million euro) to few big projects (up to 50 million)
• Language and speech technology have disappeared from the agenda, and Interfaces and Knowledge Systems have taken their place
Result of new policies
• Strong emphasis on the commercially interesting languages
• Language and speech will only appear as embedded technologies
• Creation of language resources in EU projects only if needed for the main objectives of the project, i.e. never as a goal per se
• Fragmentation of language and speech technology activities over many projects
Impact on infrastructures
• Creation and distribution of resources, standards, and evaluation are infrastructural in nature (as opposed to research and development)
• They require continuity and active industrial involvement
• Very hard to accomplish in EU funding context because of short duration of projects and requirement that industries contribute 50% of their costs themselves
• Resources actions now mostly at national level
Overall picture …
• … not very good: very little to expect from EU as far as improvement of the language resources situation is concerned for the duration of the present Framework Programme (2003-2007)
• But there are some signs that the situation will improve in the next Framework Programme
• And there are still a number of bottom-up activities (emerging from the community, with or without EU support)
Ongoing resources infrastructure actions
• ELSNET: still running (since 1991, hopefully secured until summer 2005; funded by the EU as a series of independent 2-3 year projects), still supporting resources and evaluation, now focusing on the roadmap for language and speech technology and for language and speech resources
• ELRA/ELDA: Resources Association and Agency; European counterpart (although not twin sister) of LDC
Ongoing actions, continued
• ENABLER:
– Network aiming at coordination of national resources activities; EU funding has ended, but it remains active.
– Surveys and other useful material on website (www.enabler-network.org)
– Involved in resources roadmap and landscape (see later)
– Asian and US participation
Cocosda
• International committee for the coordination and standardisation of speech databases and assessment techniques
• International, not just European – also active Asian involvement
• Not funded, but alive
ICCWLRE
• International coordination committee for written language resources and evaluation.
• Written language counterpart of Cocosda
• Goal is to join forces with Cocosda
• To be launched at LREC 2004 in Lisbon
• International, active Asian participation
LREC
• Biennial international conference on resources and evaluation
• Initiated in 1998, very successful, and truly international
• Only conference on this topic and only conference bringing together language and speech communities
Ongoing actions, continued
• The Language Resources Roadmap:
– Joint activity of ELSNET/ENABLER/ELRA
– Aimed at creating a broadly supported common vision of where the field is going, and what the implications are for language resources
– Workshops (www.elsnet.org/roadmap.html)
– Graphical representation at elsnet.dfki.de
Ongoing actions, continued
• The Resources Landscape:
– Joint project by ELSNET/ENABLER
– Aimed at creation and continued maintenance of a full landscape of the world of language resources (actors, actions, projects, events, resources, etc.)
– Still under construction
– See www.enabler-network.org
EAGLES/ISLE/WordNet
• EAGLES (and its successor ISLE) were EU-funded projects aimed at standards in language and speech processing
• Projects have ended, but there are still some ongoing activities, such as MILE (the Multilingual ISLE Lexical Entry)
• WordNet has had a number of European spin-offs, such as EuroWordNet, BalkaNet and local instantiations for other languages
Ongoing actions: BLARK
• Define (in a language-independent way) the minimal set of language resources that is necessary to do any precompetitive R&D and education at all for a language (the Basic Language Resource Kit or BLARK)
• Determine for each language which components are already available (survey)
• Make for each language a priority plan to complete the BLARK (and to get funding)
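The three steps above (define the kit, survey what exists, plan what is missing) amount to a set difference per language. A minimal sketch follows, assuming invented component names and survey data; the slides do not list the actual BLARK contents, and ranking languages by the size of their gap is only one possible way to prioritise.

```python
# Hypothetical component names: the slides do not enumerate the BLARK contents.
BLARK = {
    "monolingual_lexicon",
    "annotated_text_corpus",
    "speech_corpus",
    "evaluation_test_suite",
}


def missing_components(blark, available):
    """Return the BLARK components not yet available for a language, sorted by name."""
    return sorted(blark - set(available))


def priority_plan(blark, survey):
    """Map each language to its missing components, largest gaps first.

    Ordering by gap size is an assumption made for this sketch; a real plan
    would also weigh cost, demand and funding opportunities.
    """
    gaps = {lang: missing_components(blark, have) for lang, have in survey.items()}
    return dict(sorted(gaps.items(), key=lambda item: (-len(item[1]), item[0])))


# Invented survey results, for illustration only.
survey = {
    "language_A": {"monolingual_lexicon", "annotated_text_corpus", "speech_corpus"},
    "language_B": {"monolingual_lexicon"},
}

plan = priority_plan(BLARK, survey)
for lang, gaps in plan.items():
    print(lang, "still needs:", ", ".join(gaps))
```

Because the kit is defined once and independently of any language, the same definition can be checked against every survey.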
New initiatives
• Proposal to create BLARKnet: rejected by the EU because language and speech are not core objectives
• In France the successful launch of the new national programme TechnoLangue, explicitly addressing resources and evaluation
• In Europe the initiative towards LangNet, a network aimed at coordination of national language and speech technology programmes (including resources and evaluation)
• Some of the new EU projects will address resources problems, but project info has not been released yet
Concluding remarks
• We have seen some problems that are inherent to the situation in Europe and that will not go away: linguistic fragmentation and uneven balance in distribution of R&D efforts over languages
• We have seen self-imposed problems (EU funding schemes and policies); they may go away if and when the funders change their minds
• But we have also seen that there is still room for a variety of resources-related initiatives in Europe, many of which could benefit from collaboration with e.g. Japan