Cocosda 2001 ELRA/ELDA
Brief Overview of recent activities in Europe
Khalid CHOUKRI, ELRA/ELDA
55 Rue Brillat-Savarin, F-75013 Paris, France
Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30
Email: [email protected]
http://www.elda.fr/
Outline
•ELRA …
•European vs National activities
•Speech resources collections
•Other projects (Enabler, EuroMap, etc.)
•Evaluation
•LREC2002
European Language Resources Association: an improved infrastructure for data sharing
Centralized not-for-profit organization for the collection, distribution, and validation of speech, text, and terminology resources and tools.
Extension to:
•Multimodal/Multimedia Resources
•Evaluation.
European Language Resources Association: an improved infrastructure for data sharing
A Repository Center:
•Technical & logistic issues
•Commercial issues (prices, fees, royalties)
•Legal issues (licensing, IPR)
•Information dissemination
An Association of users of Language Resources
Brief Overview of recent activities in Europe: European Union Level
European R&D Framework Programmes (FP): dating back to the early Eighties
On-going Actions
•FP5, with a thematic programme on Information Society Technologies
•MLIS (Multi-Lingual Information Society)
•INCO (International Cooperation)
•E-Content
Brief Overview of recent activities in Europe: European Union Level
European R&D Framework Programmes (FP): dating back to the early Eighties
On-going Actions
•E-Content: "Promoting European Digital Content on the Global Networks"
Action line 1: "Improving access to and expanding use of public sector information"
Action line 2: "Enhancing content production in a multilingual and multicultural environment"
Brief Overview of recent activities in Europe: European Union Level
Some projects within FP5 and previous FPs related to Cocosda concerns
•Resources production: SpeechDat family
•Specifications of new types of resources: Natural Interaction and Multimodality, within the ISLE (International Standards for Language Engineering) project
•Dialog & Evaluation: Seneca
•Evaluation: CLASS
•Standards: EAGLES and its extension, the EU/US collaborative project ISLE
•Networks: ELSNET, ENABLER
•Information gathering & dissemination: EUROMAP and its follow-up HOPE
SpeechDat Family
•SpeechDat(M): fixed telephone network, 1K speakers
•SpeechDat-II: fixed and mobile networks, 1-5K speakers
•SpeechDat-II Speaker Verification
•SpeechDat-E (CEE: Polish, Czech, Slovak, Russian, Hungarian)
•SALA (Speech Across Latin America), and now SALA-II
•SpeechDat-Car (incl. cellular)
•SpeeCon (consumer products)
•OrienTel
SpeeCon Project
Participant number | Participant name | Participant short name | Country
1 | Siemens Aktiengesellschaft | Siemens | Germany
2 | Ericsson Eurolab Deutschland GmbH | EEDN | Germany
3 | IBM Deutschland Entwicklung GmbH | IBM | Germany
4 | Lernout & Hauspie Speech Products NV | L&H | Belgium
5 | Matra Nortel Communications | Matra | France
6 | Nokia Corporation | Nokia | Finland
7 | Philips Speech Processing Aachen, Zweigniederlassung der Philips GmbH | Philips | Germany
8 | Sony International (Europe) GmbH | Sony | Germany
9 | TEMIC TELEFUNKEN microelectronic GmbH | TEMIC | Germany
10 | DaimlerChrysler AG | DCAG | Germany
SpeeCon Project
Dialectal zone | Language | Region | Remarks
Esl_ES | Spanish | Spain | excluding Latin America
Rus_RU 1) | Russian | Russia |
Ita_IT | Italian | Italy |
Sve_SE_FI | Swedish | Sweden and Finland |
Deu_DE_AT | German | Germany and Austria | excluding e.g. Belgium, Luxembourg, Switzerland
Eng_GB | English | United Kingdom |
Dan_DK | Danish | Denmark |
Dut_BE | Dutch | Belgium |
Fra_CA | French | Canada |
Fra_FR | French | France | excluding e.g. Belgium, Luxembourg, Switzerland
Fin_FI | Finnish | Finland |
Zho_CN_HK | Mandarin | P. R. China (incl. Hongkong) | excluding e.g. Taiwan
Dut_NL | Dutch | The Netherlands |
Jpn_JP | Japanese | Japan |
Pol_PL | Polish | Poland |
Por_PT | Portuguese | Portugal | excluding Brazil
Deu_CH | German | Switzerland |
Eng_US | English | USA | excluding e.g. Canada
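The dialectal zone codes in the SpeeCon table appear to combine a three-letter language tag with one or more two-letter country tags, joined by underscores. That reading is inferred from the table itself, not from a SpeeCon specification, and the helper name `parse_zone_code` is hypothetical. A minimal sketch of splitting such a code:

```python
def parse_zone_code(code):
    """Split a zone code such as 'Sve_SE_FI' into (language tag, [country tags]).

    Assumption: the first field is a three-letter language tag and every
    following field is a country tag, as the codes on the slide suggest.
    """
    language, *countries = code.split("_")
    if len(language) != 3 or not countries:
        raise ValueError(f"unexpected zone code: {code!r}")
    return language, countries


if __name__ == "__main__":
    for code in ("Esl_ES", "Sve_SE_FI", "Zho_CN_HK"):
        print(code, parse_zone_code(code))
```

Codes with two country tags, such as Sve_SE_FI or Deu_DE_AT, correspond to the rows whose region spans two countries.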
SpeechDat Family: OrienTel
Multilingual access to interactive communication services for the Mediterranean and the Middle East
•7 linguistic regions
•10 OrienTel countries
•23 databases
SpeechDat Family: OrienTel
Linguistic affiliation | OrienTel countries | Languages covered
Maghreb Arabic (excluding Algeria and parts of Libya) | Morocco | Standard Arabic, Colloquial Moroccan Arabic, French
Maghreb Arabic (excluding Algeria and parts of Libya) | Tunisia | Standard Arabic, Colloquial Tunisian Arabic, French
Egyptian Arabic (excluding parts of Libya) | Egypt | Standard Arabic, Colloquial Egyptian Arabic, English
Levantine Arabic (excluding Syria, Lebanon and Jordan) | Israel and Palestine Authorities | Hebrew, Standard Arabic, Coll. South Levantine Arabic
Gulf Arabic (excluding Kuwait, Bahrain, Qatar, Oman and Yemen) | United Arab Emirates | Standard Arabic, Colloquial Gulf Arabic, English
Gulf Arabic (excluding Kuwait, Bahrain, Qatar, Oman and Yemen) | Saudi Arabia | Standard Arabic, Colloquial Gulf Arabic, English
Cypriote Greek | Cyprus | Greek, English
Hebrew | Israel | Hebrew
Turkish | Turkey, Germany for German | Turkish, German
SpeechDat Family: SALA
Phase I: Fixed Network
•Mexico
•Argentina
•Chile
•Brazil
•Colombia
•Venezuela
SpeechDat Family: SALA-II
Phase II: Cellular/Mobile Network
Latin America: Mexico, Argentina, Chile*, Brazil, Colombia, Venezuela, Costa Rica*, Peru*
US and Canada: US English North East, US Spanish East, US English South West or US Spanish West, US English North West, US English South East, Canadian American English
US and Canada variants:
•US English North West
•US English South West
•US English North East
•US English South East
•US Spanish East (Caribbean variant)
•US Spanish West (Mexican variant)
•Canadian British English
•Canadian American English
•Canadian French
Other resource-oriented projects
C-Oral-Rom: Conversational Speech
•Romance languages: French, Italian, Spanish, Portuguese
•"Comparable" data
Brief Overview of recent activities in Europe: European Union Level
A major project within MLIS related to Cocosda concerns
NETWORK-DC: Network of international & regional Data Centers
Partners: ELRA, SPEX & LDC
Others (GSK,…) welcome to join
Brief Overview of recent activities in Europe: National Projects/Programs
Netherlands & Belgium:
Spoken Dutch Corpus (coming presentation): data available via ELRA, release of April 2001
Over nine national projects:
Germany:
From Verbmobil (data available via ELRA) to SmartKom
France:
Réseau National de Recherche en Télécommunications (RNRT)
Others: RIAM, RNTL; coming: evaluation program
Italy, Greece, Czech Republic, …
Brief Overview of recent activities in Europe: National Projects/Programs, Dutch & Flemish
Release 1 (March 2000)
•62 hours of speech samples, orthographically transcribed (615,000 words); 90,000 words enriched with Part-of-Speech tags
•annotation CD with the first version of PRAAT (annotation tool) and the first version of the documentation (in Dutch), including relevant information on the speakers (e.g. gender, age, socio-economic class) and samples (e.g. recording conditions, equipment); speaker information is in anonymous form
Release 2 (October 2000)
•over 150 hours of speech samples, orthographically transcribed (over 1,500,000 words); approximately 750,000 words enriched with Part-of-Speech tags
•annotation CD with annotation protocols and relevant information on the speakers (e.g. gender, age, socio-economic class) and samples (e.g. recording conditions, equipment); speaker information is in anonymous form
Release 3 (April 2001)
•more orthographically transcribed data enriched with Part-of-Speech tags
•the first broad phonetic transcriptions, word alignments, syntactic annotations, and lexicon link-up will be available
•annotation CD with documentation, including relevant information on the speakers (e.g. gender, age, socio-economic class) and samples (e.g. recording conditions, equipment)
•this release encompasses the first version of Corex, the exploitation tool
MARKET ANALYSIS
First objective:
•To get hard facts about the needs/requirements
•To get reliable figures about the market
Second objective:
•To reinforce/confirm our knowledge and assessments
MARKET ANALYSIS (Worldwide Market of LR - Commercial Use)
[Bar chart: market size in million EUR (axis 0 to 60) for the total, telephony, office, and consumer segments, 1998 vs. 2003. Courtesy of Siemens]
Speech Recognition: Market Segmentation
Implications for the Language Resource Market 1998-2003
Market segment | Office | Telephony | Consumer | Total market
# of customers | 4-8 | 10-20 | 10-30 | 24-58
# of databases* | 200-400 | 1000-2000 | 1000-3000 | 2200-5400
Market size (M€)** | 6-12 | 30-60 | 30-90 | 66-162
50 languages; 30K€ per LR
Telephony: 2 databases/language (fixed and mobile network)
Consumer: 2 databases/language (car and public environment)
* all databases needed by all providers of speech recognition technology
** estimated accumulated market from 1998 until 2003 (in M€)
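The figures in the segmentation table follow from the stated assumptions: databases = customers × languages × databases per language, and market size = databases × 30K€. The sketch below reproduces that arithmetic. The slide gives 2 databases per language for telephony and consumer; the value of 1 database per language for the office segment is an assumption inferred from the table (4-8 customers giving 200-400 databases), and the names `SEGMENTS`, `estimate`, and `market_table` are illustrative only.

```python
LANGUAGES = 50          # from the slide
PRICE_KEUR_PER_LR = 30  # 30 K€ per language resource, from the slide

# segment -> (min customers, max customers, databases per language)
SEGMENTS = {
    "Office": (4, 8, 1),       # 1 database/language is inferred, not stated
    "Telephony": (10, 20, 2),  # fixed and mobile network
    "Consumer": (10, 30, 2),   # car and public environment
}


def estimate(customers, dbs_per_language):
    """Return (number of databases, market size in M€) for a customer count."""
    databases = customers * LANGUAGES * dbs_per_language
    market_meur = databases * PRICE_KEUR_PER_LR / 1000  # K€ -> M€
    return databases, market_meur


def market_table():
    """Return {segment: ((db_lo, meur_lo), (db_hi, meur_hi))} plus a Total row."""
    rows = {}
    for name, (lo, hi, dbs) in SEGMENTS.items():
        rows[name] = (estimate(lo, dbs), estimate(hi, dbs))
    total_lo = tuple(sum(r[0][i] for r in rows.values()) for i in range(2))
    total_hi = tuple(sum(r[1][i] for r in rows.values()) for i in range(2))
    rows["Total"] = (total_lo, total_hi)
    return rows


if __name__ == "__main__":
    for name, ((db_lo, m_lo), (db_hi, m_hi)) in market_table().items():
        print(f"{name}: {db_lo}-{db_hi} databases, {m_lo:g}-{m_hi:g} M€")
```

Under these assumptions the totals come out at 2200-5400 databases and 66-162 M€, matching the last column of the table.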
EVALUATION ACTIVITIES
Distribution Activities of Language Resources for Evaluation (via ELRA)
Distribution Activities: Language Resources for Evaluation
•AURORA (Distributed Speech Recognition)
•AMARYLLIS (multilingual/parallel corpora)
•CLEF (Cross-Language Evaluation Forum)
•ARCADE/ROMANSEVAL
Distribution Activities: Language Resources for Evaluation
AURORA (Distributed Speech Recognition)
Set up to establish a worldwide standard for the feature extraction software in a DSR (Distributed Speech Recognition) system:
(i) Evaluation of front-end feature extraction algorithms in background noise
(ii) Evaluation and comparison of the performance of noise-robust speech recognition algorithms.
European Language Resources Association & Evaluation
•Language Resources for Evaluation (production/commissioning; distribution)
•Methodologies for Evaluation
•Management of Evaluation Campaigns
•Evaluation of Language Resources (validation)
ENABLER: European National Activities for Basic Language Engineering & Resources -- A new Initiative
•Survey of existing national activities
•Fostering common research and compatibility of LR
•Suggestion for and contribution to international cooperation
•Identification of existing resources (Universal Catalogue)
•The Basics (e.g. standards, tools, evaluation procedures, …)
•Extension foreseen/planned
EUROMAP - HOPE
HOPE is a knowledge building and dissemination project whose main goal is to raise awareness about the market readiness and potential benefits of Human Language Technologies (HLT) among appropriate market players in the information society.
EUROMAP - HOPE
Partner | Short name | Country
Center for Sprogteknologi | CST | DK
VDI/VDE Technologiezentrum Informationstechnik GmbH | VDI/VDE-IT | DE
VIKOP Verein für Internationale Forschungs-Technologie und Bildungskooperation | BIT | AT
Instituto Cervantes | IC | ES
Scientific Computing Ltd. | CSC | FI
Consorzio Pisa Ricerche | CPR | IT
Arax Limited | Arax | UK
European Language Resources Distribution Agency | ELDA | FR
University of Brighton | ITRI | UK
Institute for Language and Speech | ILSP | GR
Nederlandse Taalunie | NTU | NL
Central Laboratory for Parallel Processing | CLPP | BG
LREC-2002
Issues in the design, construction and use of Language Resources (LR)
Issues in Human Language Technologies evaluation
General issues (National and international activities and projects, Cooperations,…)
Conference: 29-30-31 MAY 2002
Pre Conference Workshops: 27-28 MAY 2002
Post Conference Workshop: 1-2 JUNE 2002
LREC-2002: IMPORTANT DATES
Submission of proposals for oral and poster papers, referenced demos, panels and workshops: 20 NOVEMBER 2001
Notification of acceptance of workshop and panel proposals: 10 DECEMBER 2001
Notification of acceptance of papers, posters, referenced demos: 2 FEBRUARY 2002
Final versions: 2 APRIL 2002