Posted on 31-Dec-2015
Behrooz Chitsaz, Microsoft Research; Lorrie Apple Johnson, U.S. Department of Energy
Multimedia Research
• Speech search
• Face identification
• Object recognition
• Video browsing
• Semantic extraction
• (3D) Segmentation
• (3D) Image search
• Speech as interface
• Speech as 1st class content
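Speech Applications
Speech recognition
• Spectral analysis: speech input (e.g., "Hello World") is converted into acoustic observations $o_1 \ldots o_T$
• Matching (decoding): time alignment to find the most likely hypothesis $\hat{W} = (\hat{w}_1 \ldots \hat{w}_N)$

$$\hat{W} = \arg\max_{w_1 \ldots w_N} p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$$

• Acoustic models: $p(o_1 \ldots o_T \mid \text{phoneme})$
• Dictionary: $P(\text{phonemes} \mid w)$
• Grammar (language model): $P(w_1 \ldots w_N)$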
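To make the decoding rule concrete, here is a toy Python sketch that scores a few candidate word sequences by adding an acoustic log-likelihood to a language-model log-probability and keeps the argmax (taking logs turns the product in the formula into a sum without changing the winner). The bigram probabilities, candidate list, and acoustic scores are invented for illustration; a real decoder searches a far larger hypothesis space and derives $p(o_1 \ldots o_T \mid w_1 \ldots w_N)$ from the acoustic models and dictionary instead of receiving it as input.

```python
import math

# Hypothetical bigram language model P(w_i | w_{i-1}); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "hello"): 0.6, ("<s>", "yellow"): 0.4,
    ("hello", "world"): 0.7, ("hello", "word"): 0.3,
    ("yellow", "world"): 0.2, ("yellow", "word"): 0.8,
}


def lm_log_prob(words):
    """log P(w1..wN) under the toy bigram model."""
    total = 0.0
    prev = "<s>"
    for w in words:
        total += math.log(BIGRAM[(prev, w)])
        prev = w
    return total


def decode(acoustic_log_lik):
    """Return the hypothesis maximizing log p(o|W) + log P(W).

    acoustic_log_lik maps each candidate word tuple W to an (assumed, given)
    acoustic log-likelihood log p(o_1..o_T | W).
    """
    return max(acoustic_log_lik,
               key=lambda w: acoustic_log_lik[w] + lm_log_prob(w))


# Invented acoustic scores for four competing hypotheses.
scores = {
    ("hello", "world"): -10.0,
    ("yellow", "world"): -9.5,
    ("hello", "word"): -10.2,
    ("yellow", "word"): -11.0,
}
print(decode(scores))  # ('hello', 'world')
```

Here the acoustic score alone slightly favors "yellow world", but the language-model term tips the decision to "hello world".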
MAVIS technology
• Indexing automatic transcripts as text
  – Automatic transcription accuracy is only 50-80%
• MAVIS techniques
  – Word-level lattice indexing
    • Index word alternatives: robust to recognizer errors
    • 50-140% accuracy improvement
    • Index timing: navigate to the exact point in the video
  – Vocabulary adaptation
    • Use NLP and Bing Search to expand the word dictionary
  – Automatic keywords to expose to search engines
    • Enables discovery of speech content through search engines
    • By-product of vocabulary adaptation
  – See http://research.microsoft.com/mavis
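Word-level lattice indexing keeps the recognizer's alternative word hypotheses, with their timings, rather than only the single best transcript. The Python sketch below illustrates the idea with a deliberately simplified lattice: a flat list of word arcs carrying start/end times and posterior scores. This structure, the `Arc`, `build_index`, and `search` names, and the 0.1 posterior cutoff are hypothetical and are not MAVIS's actual data format or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One word hypothesis in a (simplified, hypothetical) recognition lattice."""
    word: str
    start: float      # seconds from the start of the media
    end: float
    posterior: float  # recognizer's confidence that this word was spoken here


def build_index(lattices):
    """Map each word to (media_id, start, posterior) for every lattice arc."""
    index = defaultdict(list)
    for media_id, arcs in lattices.items():
        for arc in arcs:
            index[arc.word.lower()].append((media_id, arc.start, arc.posterior))
    return index


def search(index, term, min_posterior=0.1):
    """Return hits for term above the cutoff, most confident first."""
    hits = [h for h in index.get(term.lower(), []) if h[2] >= min_posterior]
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Two competing hypotheses for the same time span, plus one confident word.
lattices = {
    "video1": [
        Arc("fusion", 12.0, 12.6, 0.55),
        Arc("fission", 12.0, 12.6, 0.40),
        Arc("reactor", 30.5, 31.1, 0.90),
    ]
}
index = build_index(lattices)
print(search(index, "Fission"))  # [('video1', 12.0, 0.4)]
```

A 1-best transcript would keep only "fusion" at 12.0 s, so a search for "fission" would miss it; the lattice index still returns the hit together with its time offset, which is what lets a player jump to the exact point in the video.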
MAVIS Architecture
SQL Server(s)
1. Submit audio/video RSS
2. Retrieve AIB
3. Import AIB into SQL
Web server(s)
4. Search/retrieve results
• Store content to be processed in temporary Azure storage
• Do vocabulary adaptation using Bing
• Run recognition engine on content
• Store results of the recognition process (audio index blob, AIB)
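Steps 3 and 4 of the architecture (importing the AIB into SQL and serving search results from the web tier) can be sketched with Python's built-in `sqlite3` module standing in for SQL Server. The AIB's internal format is not described in these slides, so the sketch assumes it has already been decoded into hypothetical `(word, start_seconds, score)` rows; the `speech_index` table, the function names, and the example.org URLs are illustrative only.

```python
import sqlite3


def import_aib(conn, media_url, entries):
    """Step 3 (sketch): load decoded index entries into a SQL table.

    `entries` is an assumed stand-in for an AIB's contents:
    (word, start_seconds, score) tuples.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS speech_index ("
        "media_url TEXT, word TEXT, start_sec REAL, score REAL)"
    )
    conn.executemany(
        "INSERT INTO speech_index VALUES (?, ?, ?, ?)",
        [(media_url, w.lower(), t, s) for w, t, s in entries],
    )
    conn.commit()


def search_results(conn, term):
    """Step 4 (sketch): the query a web server might run for a search term."""
    rows = conn.execute(
        "SELECT media_url, start_sec, score FROM speech_index "
        "WHERE word = ? ORDER BY score DESC, start_sec",
        (term.lower(),),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
import_aib(conn, "http://example.org/talk1.mp4",
           [("Hydrogen", 42.5, 0.8), ("storage", 43.1, 0.7)])
import_aib(conn, "http://example.org/talk2.mp4", [("hydrogen", 5.0, 0.6)])
print(search_results(conn, "hydrogen"))
```

Each result carries the media URL and the time offset, so the web tier can send the user straight to where the term was spoken.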
U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission
• DOE invests > $10 billion/year in basic sciences, clean energy technology, and nuclear research.
• The immediate output from this investment is information, knowledge, and R&D results
• OSTI’s mission is to accelerate scientific progress by accelerating access to this information.
OSTI’s Core Products
• Information Bridge
• Science Accelerator
• Science.gov
• WorldWideScience.org
Emerging Forms of Scientific Information Require New Tools
• Numeric data, multimedia, and social media are emerging forms of scientific information
• Each form presents special opportunities and challenges
Search and Retrieval Challenges with Multimedia Science Information
• Lack of written transcripts, i.e., no "full text" to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often an hour or more
OSTI and Microsoft Research Partnership
• Video files collected from DOE's National Laboratories
• RSS feeds with metadata and URLs sent to Microsoft Research
• Audio indexing performed via MAVIS
• Audio index blob (AIB) returned to OSTI and integrated with SQL servers
• Users can search for a precise term within the video and be directed to the exact point in the video where the term was spoken
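The partnership workflow begins with RSS feeds carrying metadata and media URLs and ends with users landing at the moment a term was spoken. A minimal Python sketch of both ends follows. It assumes standard RSS 2.0 `<enclosure>` elements hold the media URL (the actual OSTI feed layout is not described here) and that the video player honors the W3C Media Fragments temporal syntax `#t=<seconds>` (an assumption about the player). The sample feed, URLs, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed in RSS 2.0 form.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Lab videos</title>
<item><title>Battery research seminar</title>
<enclosure url="http://example.org/videos/battery.mp4" type="video/mp4" length="0"/>
</item>
</channel></rss>"""


def media_items(rss_text):
    """Yield (title, media_url) for each RSS <item> that has an <enclosure>."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            yield item.findtext("title", default=""), enclosure.get("url")


def deep_link(media_url, start_sec):
    """Link to a time offset using a Media Fragments '#t=' fragment."""
    return f"{media_url}#t={start_sec:g}"


for title, url in media_items(SAMPLE_RSS):
    # 754.0 stands in for a start time returned by the speech index.
    print(title, deep_link(url, 754.0))
```

The offset passed to `deep_link` would come from the search index built from the AIB.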
Looking to the Future
• Additional content from DOE researchers
• Integration of multimedia searches into WorldWideScience.org by June
• High-quality automatic closed captions
• Multilingual translation capabilities