Artificial Intelligence: Automatic Speech Recognition
NALIT
Boise, Idaho 2019
Presented by:
Nic Côté, Sliq Media Technologies
Agenda:
• Evolution of automatic speech recognition
• Current state of ASR
• Prediction: What to Expect
• Real world results
Automatic Speech Recognition
How does it work and what’s new?
How Speech Recognition Works
The Basics - Spectral analysis
The characteristic formant frequencies (F1 and F2) for the English vowels a, e, i, o and u are: 850 Hz and 1610 Hz (a); 390 Hz and 2300 Hz (e); 240 Hz and 2400 Hz (i); 360 Hz and 640 Hz (o); and 250 Hz and 595 Hz (u).
Image: Charles McLellan
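To make the formant idea concrete, here is a minimal sketch that labels a measured (F1, F2) pair with the closest vowel from the values listed above. The function name `nearest_vowel` and the nearest-neighbour approach are assumptions for illustration only; production recognizers use far richer acoustic features than two formants.

```python
import math

# (F1, F2) in Hz for each vowel, taken from the figures quoted on the slide.
FORMANTS = {
    "a": (850, 1610),
    "e": (390, 2300),
    "i": (240, 2400),
    "o": (360, 640),
    "u": (250, 595),
}


def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) pair is closest in Euclidean distance.

    Hypothetical helper: a toy nearest-neighbour lookup, not a real
    acoustic model.
    """
    return min(FORMANTS, key=lambda vowel: math.dist((f1, f2), FORMANTS[vowel]))


if __name__ == "__main__":
    # A measurement near 800 Hz / 1500 Hz sits closest to the "a" entry.
    print(nearest_vowel(800, 1500))
```

Note how close the "o" and "u" entries are to each other; small measurement errors could flip the label, which hints at why formants alone are not enough.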
Traditional Technology Hidden Markov Model – Probability Based
Evolution of Speech Technology
New Technology Artificial Intelligence Models
Google Speech Recognition Error Rate - 2017
Why Artificial Intelligence?
Contextual accuracy
Commercial Applications / Virtual Assistants
• Nuance – Dragon NaturallySpeaking
• Automotive Virtual Assistants
• Cortana
• Siri
• Google Now/Home
• Amazon Alexa
Legislative Applications
• Closed Captions/Subtitles
• Transcripts
Benefits
• Cost
• Improved accessibility
• Processing and production time
• Mitigate stenographer and transcriber availability
Challenges
• Accuracy/Acceptable error rate
• Consistency
• Recording Quality
• Processing delays for live captions
• Multilingual capabilities
• Dialects / Accents / Slang / Context
Prediction: What to Expect
• Massive improvements year over year
• Short-term (<5 years) progress will make accurate ASR very accessible
• Consumer demand drives development; ASR will be ubiquitous
• Great progress expected regarding noisy environments and imprecise grammar
• Decreased processing time to optimize real time captions
• Affordability of outsourced captions/transcripts (currently $1.50 to $4 per minute)
Human Brain
• Your brain contains 100 billion neurons and 10,000 times as many connections
• There are more than 125 trillion synapses in the cerebral cortex alone
• 86 million equivalent AI neurons
• 150 billion equivalent AI synapses
Artificial Intelligence
Model Training
Better Training for Better Intelligence
Facial Recognition Example
Query Image
Vehicle Recognition Example
Real World Results
Competing Automatic Speech Recognition Engines
ASR Battle
Can multiple Automatic Speech Recognition engines do a better job than just one?
YES! (mostly)
Legislative Recordings – Word Error Rates
                ASR1      ASR2      Combined
Total words     2210      2210      2210
Total errors    273       178       148
WER             12.33%    8.05%     6.70%
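For reference, word error rate is conventionally the number of word-level errors (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The sketch below shows that calculation under the assumption that this is the definition used here; the slides do not say how errors were counted, and the function names are hypothetical. With the counts above, the ASR2 and Combined columns reproduce 8.05% and 6.70%, while 273 / 2210 works out to roughly 12.35%, slightly off the 12.33% shown for ASR1.

```python
def wer_from_counts(errors, total_words):
    """WER as a percentage, given pre-counted errors and reference words."""
    return 100.0 * errors / total_words


def word_error_rate(reference, hypothesis):
    """WER as a fraction, via word-level edit distance.

    Counts the minimum number of substitutions, deletions and insertions
    needed to turn the hypothesis into the reference, divided by the
    number of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    for name, errors in [("ASR1", 273), ("ASR2", 178), ("Combined", 148)]:
        print(name, round(wer_from_counts(errors, 2210), 2))
```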
Sample Legislative ASR – Failure
Sample Legislative ASR – Missed Word
Sample Legislative ASR – Removed Duplicate
Sample Legislative ASR – Almost Success
Sample Legislative ASR - Success
Legislative Recordings
Multi-ASR Error Rate, Different Sources
House of Commons Budget Speech 4.0%
House of Commons QP1 6.1%
House of Commons QP2 13.2%
House of Commons Statements by Members 18.9%
Arkansas House of Representatives 8.9%
Oklahoma House of Representatives 11.4%
City of Fredericton English 9.3%
City of Fredericton French 6.4%
Improving Odds – Statistical Probability Model
“Bad Poetry Filter”
Accurate
"member for Saanich Gulf islands" - 3,260 results
"and I thank my friend from" - 8 results
"Okanagan Similkameen Nicola" - 29,700 results
"particularly for his advocacy" - 18,000 results
"those what really agricultural" - 0 results
Inaccurate
"member for Senate sköll islands" - 0 results
"night I thank my friend for" - 0 results
"Okanagan smell me Nicola" - 0 results
"particular phrase advocacy" - 0 results
"those really agricultural" - 1 result
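One way to read the "Bad Poetry Filter" is as a tie-breaker between competing transcriptions: prefer the phrase that appears more often in a large body of text. The sketch below illustrates that idea only. The `HIT_COUNTS` table is seeded with the result counts listed above, and `pick_more_plausible` is a hypothetical helper; the slides do not describe the actual search source or scoring.

```python
# Result counts copied from the slide. A real system would query some
# phrase-frequency source instead of a fixed table (an assumption here).
HIT_COUNTS = {
    "member for Saanich Gulf islands": 3260,
    "member for Senate sköll islands": 0,
    "and I thank my friend from": 8,
    "night I thank my friend for": 0,
    "Okanagan Similkameen Nicola": 29700,
    "Okanagan smell me Nicola": 0,
    "particularly for his advocacy": 18000,
    "particular phrase advocacy": 0,
    "those what really agricultural": 0,
    "those really agricultural": 1,
}


def pick_more_plausible(candidates, hit_counts=HIT_COUNTS):
    """Return the candidate phrase with the most recorded results.

    Unknown phrases count as zero; on a tie the earliest candidate wins.
    """
    return max(candidates, key=lambda phrase: hit_counts.get(phrase, 0))


if __name__ == "__main__":
    print(pick_more_plausible([
        "Okanagan smell me Nicola",
        "Okanagan Similkameen Nicola",
    ]))
```

The last pair in the list is a useful caution: the phrase labelled accurate has 0 results and the inaccurate one has 1, so a pure frequency vote would pick the wrong candidate there.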
Live Examples
Arkansas House of Representatives – ASR Meeting
British Columbia – Closed Caption Search
Conclusion
• We are very close to 90% accuracy with many legislative recordings
• Expect to be >95% within a few years
Problems:
• Speaker coherence
• Multilingual recordings