The Fully Networked Car Geneva, 4-5 March 2009 1 Automotive Speech Enhancement of Today:...
-
Upload
zachary-stone -
Category
Documents
-
view
216 -
download
3
Transcript of The Fully Networked Car Geneva, 4-5 March 2009 1 Automotive Speech Enhancement of Today:...
The Fully Networked Car Geneva, 4-5 March 2009
1Automotive Speech Enhancement of Today: Applications, Challenges and Solutions
Tim HaulickHarman/Becker Automotive Systems
The Fully Networked Car Geneva, 4-5 March 2009
2Automotive Speech Enhancement Applications
Phone
Speechdialogsystem
Send side processing:beamforming, echo
and noise suppression
Receive side processing:gain control
In-car communication:feedback suppression,
automatic gain adjustment
Communication channel to Communication channel to vehiclevehicle
Communication channel from Communication channel from vehiclevehicle
Communication channel in Communication channel in vehiclevehicle
The Fully Networked Car Geneva, 4-5 March 2009
3Extended Speech Enhancement
Mobile Phone
Speechdialogsystem
Send side processing:echo & noise suppression, wind-noise suppression speech reconstruction, beamforming/postfilter
Receive side processing:gain control,
bandwidth extension, adaptive equalization
Speaker independentspeech knowledge
Speaker (in-)dependent
speech knowledge
The Fully Networked Car Geneva, 4-5 March 2009
4
Bandwidth extension for wideband speech signals (bandwidth 7 kHz, e.g. AMR wideband codec G.722.2) – extension of high frequency components up to 11kHz.
Narrowband connection:
Wideband connection:
Bandwidth extension for narrowband speech signals (bandwidth 3.4 …3.8 kHz) – extension of low frequency components and extension of high frequency components up to 5.5 or 8 kHz.
Wideband input
Wideband output
Narrowband output
Narrowband input
Bandwidth Extension – Examples
The Fully Networked Car Geneva, 4-5 March 2009
5Bandwidth Extension - Enhancements
o Adaptation to different transmission channels (variable bandwidth, different signal-to-noise ratios)
o Adaptation to different sound amplifiers
o Adaptation depending on the background noise in the vehicle
The Fully Networked Car Geneva, 4-5 March 2009
6Speech Reconstruction
Motivation:
At medium and high driving speed low frequency speech components are often masked by the background noise. However, standard noise suppression methods of today often fail in these situations. As a result the processed speech signal sounds thin and distorted.
For further improvementof the speech qualitya reconstruction approachis an alternative.
However, speech recon-struction starts whereconventional noise reductionfails …
The Fully Networked Car Geneva, 4-5 March 2009
7
Phone(downlink)
Phone(uplink)
Loud-speaker
Micro-phone
Receive side processing
Speaker (in-)dependent speech knowledge
Analysisfilter bank
Analysisfilter bank
Residual echoand noise
suppression
Mixer
Echocancellation
Synthesisfilter bank
Speech reconstruction
Speech Reconstruction – Algorithmic Overview
The Fully Networked Car Geneva, 4-5 March 2009
8
Microphonesignal
Conven-tional
Recon-structed
Time in seconds
Fre
qu
en
cy
in
Hz
Time in seconds
Before CDMA coding
Time in seconds
Fre
qu
en
cy
in
Hz
Time in seconds
Fre
qu
en
cy
in
Hz
Fre
qu
en
cy
in
Hz
Microphonesignal
Conven-tional
Recon-structed
Conven-tional
Recon-structed
Conven-tional
Recon-structed
After CDMA coding
120
km/h
160
km/h
Speech Reconstruction – Audio Examples
The Fully Networked Car Geneva, 4-5 March 2009
9Speech Reconstruction – Evaluation (CDMA)
o A subjective evaluation was performed by 10 trained listeners. The signals have been coded and decoded with the CDMA enhanced variable rate codec (EVRC) prior to listening.
o The test set comprised 20 different test scenarios with SNRs ranging from 3 dB to 12 dB.
The Fully Networked Car Geneva, 4-5 March 2009
10Beamformer/Postfilter
o Beamformers perform a directional filtering: signals from the desired speaker direction are passed while signals arriving from other directions are attenuated.
o The achievable noise reduction is dependent on the spatio-temporal properties of the soundfield and the number of microphones.
o By extending the beamformer with a spatial postfilter a high directionality can even be achieved with a small number of microphones.
Spatialpostfilter
Ratiocomputat.
MAPapproxi-mation
Fixed beamf.
Micro-phone
spectra
Blocking matrix
Interferencecanceller
Noisepower
estimation
Outputspectrum
|…|²
GSC
Beamformer Postfilter
The Fully Networked Car Geneva, 4-5 March 2009
11
© 2008 Harman International Industries, Incorporated. All rights reserved. Page 11
Signal of the first microphone
Beamformer/Postfilter – Audio Example
2-channel processing (beamformer/postfilter)
Time
Fre
qu
en
cy
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
500
1000
1500
2000
2500
3000
3500
4000
Time
Fre
qu
en
cy
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
500
1000
1500
2000
2500
3000
3500
4000
Audio example: 2-channel microphone array, 120 km/h, driver and passenger are talking
The Fully Networked Car Geneva, 4-5 March 2009
12
By extending the adaptive beamformer with a spatial postfilter the voice recognition performance can be improved significantly without impairing the speech quality.
Reference system: 2-channel adaptive beamformer (GSC) without postfilter
Beamformer/Postfilter – Evaluation Results
120 km/h 100 km/hwindow 1/4 open
120 km/hdouble-talk
0
5
10
15
20
25
30
rela
tiv
e e
nh
an
ce
me
nt
of
WE
R [
%] Voice Recognition Performance
120 km/h 100 km/hwindow 1/4 open
120 km/hdouble-talk
0
1
2
3
4
5
6
7
Lo
g-S
pe
ctr
al
Dis
tan
ce
Speech Distortion
Adaptive Beamformer with PostfilterAdaptive Beamformer
The Fully Networked Car Geneva, 4-5 March 2009
13Wind Noise Suppression
o Problem: Due to design reasons and lack of space the standard wind shield of hands-free microphones is often insufficient. For this reason the microphone signal is often impaired by wind noise caused by the fan or an open top of a convertible.
o Solution: Suppression of wind noise by algorithmic means taking advantage of the statistical properties of the noise
Microphone Signal
2 channel processed
output signal
The Fully Networked Car Geneva, 4-5 March 2009
14In-Car Communication (ICC)
Current Situation:o Communication between passengers is
difficult, because of the acoustic loss (especially front to back).
o Front passengers have to speak louder than normal – longer conversations will be tiring.
o Driver turns around – road safety is reduced.
Solution:o Improve the speech quality and intelligibility
by means of an intercom system.
Application:o Mid and high class automobiles, which are
already equipped with the necessary audio and signal processing components.
o Minibuses, Vans, etc. (cars more than 2 rows of seats).
Passenger compartment
*Acoustic loss (referred to the
ear of the driver)
-5…-15dB*
The Fully Networked Car Geneva, 4-5 March 2009
15Configurations
One-Way Systemo 2-4 microphoneso 2-4 loudspeakers
Two-Way Systemo 4-8 microphoneso 6-8 loudspeakers
ICC System
ICCSystem
The Fully Networked Car Geneva, 4-5 March 2009
16Subjective Evaluation
Driving Scenarios o 0 km/h beside motorwayo 130 km/h on motorway
o Prerecorded speech sentences with different Lombard levels were played back via an artificial mouth.
o Binaural recordings were made by means of a HEAD acoustics NoiseBook on the seat behind the driver.
The Fully Networked Car Geneva, 4-5 March 2009
17Results of the Comparison Mean Opinion Score Test
0 km/h, vehicle parked close to a motorway:
o 19.7 % prefer the system to be switched off
o 29.7 % have no preferenceo 50.6 % prefer an activated
system
130 km/h, motorway:o 4.3 % prefer the system to be
switched offo 7.1 % have no preferenceo 88.6 % prefer an activated
system
(25 signal pairs for each driving situation / 15 listeners per scenario):
The Fully Networked Car Geneva, 4-5 March 2009
18Results of Modified Rhyme Tests (MRT)
0 km/h, vehicle parked close to a motorway:o No significant difference (95.2 % system off versus 95.0 % system on)o Due to the automatic gain adjustment the intercom system operates
with only very small gain at these noise levels
130 km/h, motorway:o Significant improvement of
the MRT error rateo Nearly 50 % error reduction
(85.4 % correct answers increased to 92.2 % correct answers)
(48 utterances were presented to each listener per driving situation):