Transcript of "Voice Separation with tiny ML on the edge" · 2020-03-13
Voice Separation with tiny ML on the edge
Tiny ML Summit 2020
Main collaborators:
Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)
Prof. Tuomas Virtanen (University of Tampere, Finland)
Gaurav Naithani (University of Tampere, Finland)
Niels H. Pontoppidan, PhD
Research Area Manager, Augmented Hearing Science
Additional acknowledgements and references
• Thomas “Tom” Barker
• Giambattista Parascandolo
• Joonas Nikunen
• Rikke Rossing
• Atefeh Hafez
• Marianna Vatti
• Umaer Hanif
• Christian Grant
• Christian Hansen
• Bramsløw, L., Naithani, G., Hafez, A., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2018). Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, 144(1), 172–185.
• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Low latency sound source separation using convolutional recurrent neural networks. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 71–75.
• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Evaluation of the benefit of neural network based speech separation algorithms with hearing impaired listeners. Proceedings of the 1st International Conference on Challenges in Hearing Assistive Technology. CHAT-17, Stockholm, Sweden.
• Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2016). Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 272–276.
• Pontoppidan, N. H., Vatti, M., Rossing, R., Barker, T., & Virtanen, T. (2016). Separating known competing voices for people with hearing loss. Proceedings of the Speech Processing in Realistic Environments Workshop, SPIRE Workshop.
• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2016). Hearing device comprising a low-latency sound source separation unit.
• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2014). Hearing device comprising a low-latency sound source separation unit (Patent No. US Patent App. 14/874,641).
• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2015). Low-latency sound-source-separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, 241–245.
Facts and stats about hearing aids and market
Market
• 15+ million units sold per year
• Global wholesale market of USD 4+ billion per year
• The six largest manufacturers hold a combined market share of over 90%
• Main market: OECD countries
• 4-6% yearly unit growth, mainly due to demographic development
• Growing aging population and increasing life expectancy
Hearing-aid users
• 10% of the population in OECD countries suffer from hearing loss
• Only 20% of people suffering from hearing loss use a hearing aid
• 35-40% of the population aged 65+ suffer from a hearing loss
• Average age of first-time user is 69 years (USA)
• Average age of all users is 72 years (USA)
Hearing devices
• Hearing devices help people communicate in simple and complex listening situations – also in sound environments where people with normal hearing give up using phones and headsets
• Some rely on hearing devices for a few hours a day in specific situations, while many use them during all waking hours
• Power: about 1 mA from zinc-air batteries replaced every week, or Li-ion batteries recharged every night
• Hardware design employs many low voltage and low clock-frequency methods
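The power bullet above can be turned into a back-of-envelope budget. The ~1 mA draw and weekly replacement come from the slide; the 180 mAh capacity and 1.3 V nominal voltage are typical size-312 zinc-air figures assumed here for illustration:

```python
# Back-of-envelope hearing-aid power budget.
# From the slide: ~1 mA average draw, batteries replaced weekly.
# Assumed (typical size-312 zinc-air cell): 180 mAh, 1.3 V nominal.
capacity_mah = 180.0   # assumed cell capacity
current_ma = 1.0       # average draw quoted on the slide
voltage_v = 1.3        # assumed nominal cell voltage

runtime_h = capacity_mah / current_ma   # ~180 h, i.e. about a week
power_mw = current_ma * voltage_v       # ~1.3 mW for the whole device

print(f"runtime: {runtime_h:.0f} h (~{runtime_h / 24:.1f} days)")
print(f"power budget: {power_mw:.1f} mW total, ML inference included")
```

The weekly-replacement claim checks out at this draw, and it makes clear why any on-device ML must fit inside a milliwatt-scale envelope.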
Enhancing segregation by transforming “mono” to “stereo”
History
1953
• Cocktail Party Problem coined by Colin Cherry
• Cherry proposes mono-to-stereo conversion to solve the problem
2000
• Sam Roweis: One Microphone Source Separation at NIPS shows separation of known voices
2018
• Bramsløw et al: First time algorithms improve segregation of known voices for people with hearing loss
2020’s
• When will Tiny ML enable enhanced voice segregation in a hearing device?
Spatial augmentation
• The algorithm separates the voices into two (or more) channels
• The hearing devices increase the spatial difference cues, i.e. reposition the sound sources further apart
• In case of spatial audio-visual cue conflicts, visual cues are expected to override the auditory cues, just like with ventriloquists
[Figure: a mono input rendered as artificial stereo]
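One way to realize the spatial augmentation described above is to remix the two separated channels with an exaggerated interaural level difference (ILD). This is a hypothetical sketch of the mono-to-artificial-stereo idea, not the device's actual rendering, which would also shape timing (ITD) cues:

```python
import numpy as np

def spatialize(voice_a, voice_b, ild_db=10.0):
    """Render two separated voices to stereo, pushing them apart
    with an interaural level difference of `ild_db` decibels.
    Illustrative only: real devices would also adjust ITD cues."""
    g = 10.0 ** (ild_db / 20.0)          # linear gain for the ILD
    left = g * voice_a + voice_b / g     # voice A leans to the left ear
    right = voice_a / g + g * voice_b    # voice B leans to the right ear
    return left, right
```

Feeding in one voice and silence shows the effect: that voice arrives roughly 20 dB louder at one ear than the other, which is the repositioning cue the slide describes.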
[Flowchart: DNN training]
[Flowchart: processing]
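The training/processing split in the flowcharts can be sketched with time-frequency masks, a common formulation for this kind of low-latency separation (the exact targets used in the cited papers may differ). During training the network learns to predict masks from the mixture; during processing the predicted mask scales the mixture spectrogram:

```python
import numpy as np

def ideal_ratio_masks(mag_a, mag_b, eps=1e-8):
    """Training targets: ideal ratio masks for two known voices,
    computed from the magnitude spectrograms of the clean signals."""
    denom = mag_a + mag_b + eps
    return mag_a / denom, mag_b / denom

def apply_mask(mixture_stft, mask):
    """Processing: the DNN-estimated mask scales the mixture STFT;
    an inverse STFT of the result yields one separated voice."""
    return mask * mixture_stft
```

The two masks sum to one in every time-frequency bin, so the separated channels always recombine to the original mixture.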
Enhanced segregation for people with mild/moderate hearing loss
• DNN processing
• 4 million weights for FDNNs (not optimized)
• 250 Hz audio frame processing rate
Bramsløw et al, JASA, 2018
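The two figures above fix the compute rate. Assuming one multiply-accumulate per weight per frame, as in a dense feedforward network:

```python
weights = 4_000_000     # FDNN size from the slide (not optimized)
frame_rate_hz = 250     # audio frame processing rate from the slide

# One MAC per weight per frame for a dense feedforward pass
macs_per_s = weights * frame_rate_hz
print(f"{macs_per_s / 1e9:.1f} GMAC/s")   # 1.0 GMAC/s
```

One GMAC/s inside a milliwatt-scale power budget is the gap that the network-size and hardware work on the next slides has to close.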
Unprocessed
Ideal
How listeners with normal hearing hear two competing voices
How listeners with impaired hearing hear the two voices (this example could be harder to segregate)
How it sounds when the two voices are separated out
Focusing only on the female voice
Focusing only on the male voice
Next steps
Feature performance
Increasing robustness to additional noise and reverberation
Increasing robustness to personal voice changes
Break reliance on training on specific voices (transfer learning)
Further decrease network sizes from 4 million weights
Hardware performance
See Zuzana Jelčicová’s poster:
• Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments
• From float to fixed point
• Parallel MACS
• Two-step scaling
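The float-to-fixed bullet can be illustrated with a basic Q15 (int16) weight quantizer using a power-of-two scale. This is a generic sketch of the idea only; the poster's actual two-step scaling scheme is not detailed here:

```python
import numpy as np

def quantize_q15(w):
    """Float-to-fixed sketch: map float weights into Q15 (int16)
    with a per-tensor power-of-two scale, so float MACs can be
    replaced by integer MACs on the DSP. Illustrative only."""
    max_abs = np.max(np.abs(w))
    shift = int(np.ceil(np.log2(max_abs))) if max_abs > 0 else 0
    scale = 2.0 ** shift                  # power-of-two scale factor
    q = np.clip(np.round(w / scale * 32767), -32768, 32767)
    return q.astype(np.int16), scale

w = np.array([0.5, -0.25, 0.8])
q, scale = quantize_q15(w)
w_hat = q.astype(np.float32) * scale / 32767   # dequantize to check
```

Dequantizing recovers the weights to within Q15 precision, which is typically sufficient for a feedforward separation network of this size.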