Mediaeval 2013 Spoken Web Search results slides
Spoken Web Search at Mediaeval 2013
Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes
Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
Given a spoken query, we search for instances of it at the lexical level within spoken documents.
It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013) but…
• Queries are spoken
• Different speakers
• Different acoustic conditions
• No prior knowledge of the language(s) might be available
SWS history in Mediaeval
• SWS 2011 had 5 finishing participants and focused on 4 Indian languages
• SWS 2012 had 9 finishing participants and focused on 4 African languages
• SWS 2013 has 13 finishing (18 registered) participants and contains 9 languages
[Figure: number of teams (#teams) and database size per year, 2011 to 2013]
SWS 2013 evaluation setup
• 1 single search corpus with ~20 hours of data, collected from contributions of 9 languages
– No transcription or language information is given to participants
• 500 queries for dev and 500 queries for eval
– For each query, participants need to return all instances of that query in the search corpus
Mediaeval SWS 2013
• 9 languages in different acoustic contexts: 4 African languages (isixhosa, isizulu, sepedi, setswana), Albanian, Basque, Czech, non-native English, Romanian
Set | #utts | Time (h:mm:ss) | Avg. length/utt.
Search corpus | 10762 | 19:57:55 | 6.67 s
Dev queries | 505 | 0:11:26 | 1.35 s
Extended dev* | 1046 | 0:08:42 | 0.49 s
Eval queries | 503 | 0:11:37 | 1.38 s
Extended eval* | 1037 | 0:08:57 | 0.51 s
Total | 13853 | 20:38:37 |
* Only Basque (3x) and Czech (10x) queries have extended versions
Database distribution per language
Language | Number of utterances / total duration | Number of queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
African - isixhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - isizulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV Broadcast news, 16 kHz | Studio, read speech
Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech
Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech
SWS 2013 participants
Team name | Country
Dto. Electricidad y Electrónica, Universidad del País Vasco | Spain
Speech@FIT, Brno University of Technology | Czech Republic
Telefonica Research | Spain
University Politehnica of Bucharest | Romania
School of Electrical and Computer Engineering, Georgia Institute of Technology | USA
L2F - INESC-ID | Portugal
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València | Spain
Audiolab, University of Zilina | Slovakia
LIA, University of Avignon | France
Technical University of Kosice | Slovakia
Universitat Pompeu Fabra | Spain
DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong
International Institute of Information Technology - Hyderabad | India
IAIS, Fraunhofer Institute | Germany
TATA Consultancy Services Ltd. | India
Indian Statistical Institute | India
Northwestern Polytechnical University of Xi’an | China
Toyota Technological Institute at Chicago | USA
[Row-group labels on the slide: organizers, non-finishing]
Possible approaches to QbE-STD
• Pattern based: Dynamic Time Warping
• Lattice based: Acoustic Keyword Spotting
• Word-based: Full ASR
[Diagram labels: Language spoken + Acoustic models + Language models]
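As an illustration of the pattern-based family, the following is a minimal sketch of subsequence DTW, in which the query may start and end anywhere in the document. It is not any participant's system: the function names, the Euclidean frame distance, the three-way step pattern and the length normalization are all assumptions chosen for brevity.

```python
import math


def euclidean(a, b):
    """Frame-to-frame distance (an assumed choice; systems differ here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, document, dist=euclidean):
    """Find the best match of `query` inside `document`.

    Both arguments are sequences of feature vectors. Returns
    (cost, start, end): the accumulated cost divided by the query
    length, and the first and last matched document frame indices.
    """
    n, m = len(query), len(document)
    inf = float("inf")
    # acc[i][j]: best accumulated cost of aligning query[0..i]
    # with a document segment that ends at frame j.
    acc = [[inf] * m for _ in range(n)]
    # start[i][j]: document frame where that best path began.
    start = [[0] * m for _ in range(n)]

    # Free start: the first query frame may align with any document frame.
    for j in range(m):
        acc[0][j] = dist(query[0], document[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(acc[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((acc[i][j - 1], start[i][j - 1]))
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            acc[i][j] = best_cost + dist(query[i], document[j])
            start[i][j] = best_start

    # Free end: pick the document frame where the full query ends best.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end
```

For example, `subsequence_dtw([[0.0], [1.0], [2.0]], [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]])` locates the query at document frames 2 to 4. A real system would run this over acoustic features or posteriorgrams and turn the cost into a detection score.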
Followed approaches
Team name | DTW-like | AKWS
Dto. Electricidad y Electrónica, Universidad del País Vasco
Speech@FIT, Brno University of Technology
Telefonica Research
University Politehnica of Bucharest
School of Electrical and Computer Engineering, Georgia Institute of Technology
L2F - INESC-ID
Dept. de Sistemes Informàtics i Computació, Universitat Politècnica de València
Audiolab, University of Zilina
LIA, University of Avignon
Technical University of Kosice
Universitat Pompeu Fabra
DSP-STL, Dept. of EE, The Chinese University of Hong Kong
International Institute of Information Technology - Hyderabad
[The per-team marks in the DTW-like and AKWS columns are not preserved in this transcript]
Scoring metrics
• PRIMARY: Actual Term Weighted Value (ATWV) / Maximum Term Weighted Value (MTWV)
• Actual/minimum Cnxe
• Real-time factor
• Memory usage
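For readers unfamiliar with the primary metric, here is a minimal sketch of Term Weighted Value, assuming the NIST STD 2006 definition: one minus the per-term average of P_miss + beta * P_FA, with terms that have no true occurrence left out of the average. The function names and data layout are hypothetical, and `beta` is left as a parameter because the value used in SWS 2013 is not stated on these slides. ATWV is the value at the system's own decision threshold; MTWV is the maximum over all thresholds.

```python
def term_weighted_value(counts, speech_seconds, beta):
    """TWV for hard decisions already made.

    counts: dict mapping term -> (n_true, n_correct, n_spurious).
    speech_seconds: total duration of the search corpus in seconds.
    beta: weight of false alarms relative to misses (a parameter here).
    """
    losses = []
    for n_true, n_correct, n_spurious in counts.values():
        if n_true == 0:
            continue  # terms with no true occurrence are not averaged
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: one per second of speech, minus the true hits.
        p_fa = n_spurious / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)


def max_term_weighted_value(detections, n_true, speech_seconds, beta):
    """MTWV: the best TWV over all global score thresholds.

    detections: dict mapping term -> list of (score, is_correct).
    n_true: dict mapping term -> number of true occurrences.
    """
    thresholds = sorted({s for dets in detections.values() for s, _ in dets})
    best = 0.0  # a threshold above every score rejects everything: TWV = 0
    for th in thresholds:
        counts = {}
        for term, nt in n_true.items():
            kept = [ok for s, ok in detections.get(term, []) if s >= th]
            n_correct = sum(1 for ok in kept if ok)
            counts[term] = (nt, n_correct, len(kept) - n_correct)
        best = max(best, term_weighted_value(counts, speech_seconds, beta))
    return best
```

A system that finds every occurrence with no false alarms scores 1, one that returns nothing scores 0, and false alarms can push the value below 0.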
Primary metric (dev)
Primary metric (eval)
Per-language results: average for the 10-best systems
Per-language results: African (eval)
Per-language results: Albanian (eval)
Per-language results: Basque (eval)
Per-language results: Czech (eval)
Per-language results: Non-native English (eval)
Per-language results: Romanian (eval)
DET dev
DET eval
Cnxe metric
Extended Queries
• 4 teams submitted 4 extended systems, making use of 3 repetitions of Basque queries and 10 repetitions of Czech queries available
– TID: computes each query individually and then puts together all results
– GTTS: DTW-aligns all queries above a minimum duration and searches with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one instance
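The simplest of these strategies, searching with each spoken instance separately and then pooling the results, can be sketched as follows. The slides do not say how the results are put together, so the merging rule here (overlapping detections in the same utterance are fused and keep the highest score) is purely an assumption, as are the function name and the tuple layout.

```python
def merge_detections(per_instance_results):
    """Pool the detections obtained with several spoken instances of a query.

    per_instance_results: one list per query instance, each containing
    (utt_id, start_sec, end_sec, score) tuples.
    Overlapping detections within an utterance are fused into one that
    spans both and keeps the higher score (an assumed rule).
    """
    pooled = sorted(d for result in per_instance_results for d in result)
    merged = []
    for utt, start, end, score in pooled:
        if merged and merged[-1][0] == utt and start <= merged[-1][2]:
            _, prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (utt, prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((utt, start, end, score))
    return merged
```

Other combination rules, such as averaging scores or voting across instances, would fit the same interface.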
Extended systems
Real-Time Factor versus Memory usage
Real-Time Factor versus Memory usage (partial)
Take home messages
• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
– CUHK MTWV-12 = 0.74 (on 2012 data)
• It is possible to do QbE-STD on unknown/low-resource data
New things to watch out for in the posters session
• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization
• IIIT:
– Articulatory Bottleneck features
• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA
• L2F:
– DTW candidate pre-selection
• GTTS:
– Distance matrix normalization in DTW
• GeorgiaTech:
– Low-resource speech modeling using EHMM Models
• LIA:
– Use of I-vectors in SWS
• ARF:
– DTW string matching algorithm with a novel scoring
Poster session
System presentations
• 16:30-16:45 "GTTS Systems for the SWS Task at MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE, Universidad del País Vasco
• 16:45-17:00 "The L2F Spoken Web Search system for Mediaeval 2013", Alberto Abad, L2F, INESC-ID
• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL APPROACH", Lucas Ondel, Speech@BUT, Brno University of Technology
• 17:15-17:30 "The CMTECH Spoken Web Search System for MediaEval 2013", Ciro Gracia, UPF
• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier Anguera