The Fully Networked Car Geneva, 4-5 March 2009 1 Automotive Speech Enhancement of Today:...

The Fully Networked Car Geneva, 4-5 March 2009

1Automotive Speech Enhancement of Today: Applications, Challenges and Solutions

Tim HaulickHarman/Becker Automotive Systems


2Automotive Speech Enhancement Applications

Phone

Speechdialogsystem

Send side processing:beamforming, echo

and noise suppression

Receive side processing:gain control

In-car communication:feedback suppression,

automatic gain adjustment

Communication channel to Communication channel to vehiclevehicle

Communication channel from Communication channel from vehiclevehicle

Communication channel in Communication channel in vehiclevehicle


3Extended Speech Enhancement

Mobile Phone

Speechdialogsystem

Send side processing:echo & noise suppression, wind-noise suppression speech reconstruction, beamforming/postfilter

Receive side processing:gain control,

bandwidth extension, adaptive equalization

Speaker independentspeech knowledge

Speaker (in-)dependent

speech knowledge


4

Bandwidth extension for wideband speech signals (bandwidth 7 kHz, e.g. AMR wideband codec G.722.2) – extension of high frequency components up to 11kHz.

Narrowband connection:

Wideband connection:

Bandwidth extension for narrowband speech signals (bandwidth 3.4 …3.8 kHz) – extension of low frequency components and extension of high frequency components up to 5.5 or 8 kHz.

Wideband input

Wideband output

Narrowband output

Narrowband input

Bandwidth Extension – Examples


5Bandwidth Extension - Enhancements

o Adaptation to different transmission channels (variable bandwidth, different signal-to-noise ratios)

o Adaptation to different sound amplifiers

o Adaptation depending on the background noise in the vehicle


6Speech Reconstruction

Motivation:

At medium and high driving speed low frequency speech components are often masked by the background noise. However, standard noise suppression methods of today often fail in these situations. As a result the processed speech signal sounds thin and distorted.

For further improvementof the speech qualitya reconstruction approachis an alternative.

However, speech recon-struction starts whereconventional noise reductionfails …


7

Phone(downlink)

Phone(uplink)

Loud-speaker

Micro-phone

Receive side processing

Speaker (in-)dependent speech knowledge

Analysisfilter bank

Analysisfilter bank

Residual echoand noise

suppression

Mixer

Echocancellation

Synthesisfilter bank

Speech reconstruction

Speech Reconstruction – Algorithmic Overview


8

Microphonesignal

Conven-tional

Recon-structed

Time in seconds

Fre

qu

en

cy

in

Hz

Time in seconds

Before CDMA coding

Time in seconds

Fre

qu

en

cy

in

Hz

Time in seconds

Fre

qu

en

cy

in

Hz

Fre

qu

en

cy

in

Hz

Microphonesignal

Conven-tional

Recon-structed

Conven-tional

Recon-structed

Conven-tional

Recon-structed

After CDMA coding

120

km/h

160

km/h

Speech Reconstruction – Audio Examples


9Speech Reconstruction – Evaluation (CDMA)

o A subjective evaluation was performed by 10 trained listeners. The signals have been coded and decoded with the CDMA enhanced variable rate codec (EVRC) prior to listening.

o The test set comprised 20 different test scenarios with SNRs ranging from 3 dB to 12 dB.


10Beamformer/Postfilter

o Beamformers perform a directional filtering: signals from the desired speaker direction are passed while signals arriving from other directions are attenuated.

o The achievable noise reduction is dependent on the spatio-temporal properties of the soundfield and the number of microphones.

o By extending the beamformer with a spatial postfilter a high directionality can even be achieved with a small number of microphones.

Spatialpostfilter

Ratiocomputat.

MAPapproxi-mation

Fixed beamf.

Micro-phone

spectra

Blocking matrix

Interferencecanceller

Noisepower

estimation

Outputspectrum

|…|²

GSC

Beamformer Postfilter


11

© 2008 Harman International Industries, Incorporated. All rights reserved.

Signal of the first microphone

Beamformer/Postfilter – Audio Example

2-channel processing (beamformer/postfilter)

Time

Fre

qu

en

cy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

500

1000

1500

2000

2500

3000

3500

4000

Time

Fre

qu

en

cy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

500

1000

1500

2000

2500

3000

3500

4000

Audio example: 2-channel microphone array, 120 km/h, driver and passenger are talking


12

By extending the adaptive beamformer with a spatial postfilter the voice recognition performance can be improved significantly without impairing the speech quality.

Reference system: 2-channel adaptive beamformer (GSC) without postfilter

Beamformer/Postfilter – Evaluation Results

120 km/h 100 km/hwindow 1/4 open

120 km/hdouble-talk

0

5

10

15

20

25

30

rela

tiv

e e

nh

an

ce

me

nt

of

WE

R [

%] Voice Recognition Performance

120 km/h 100 km/hwindow 1/4 open

120 km/hdouble-talk

0

1

2

3

4

5

6

7

Lo

g-S

pe

ctr

al

Dis

tan

ce

Speech Distortion

Adaptive Beamformer with PostfilterAdaptive Beamformer


13Wind Noise Suppression

o Problem: Due to design reasons and lack of space the standard wind shield of hands-free microphones is often insufficient. For this reason the microphone signal is often impaired by wind noise caused by the fan or an open top of a convertible.

o Solution: Suppression of wind noise by algorithmic means taking advantage of the statistical properties of the noise

Microphone Signal

2 channel processed

output signal


14In-Car Communication (ICC)

Current Situation:o Communication between passengers is

difficult, because of the acoustic loss (especially front to back).

o Front passengers have to speak louder than normal – longer conversations will be tiring.

o Driver turns around – road safety is reduced.

Solution:o Improve the speech quality and intelligibility

by means of an intercom system.

Application:o Mid and high class automobiles, which are

already equipped with the necessary audio and signal processing components.

o Minibuses, Vans, etc. (cars more than 2 rows of seats).

Passenger compartment

*Acoustic loss (referred to the

ear of the driver)

-5…-15dB*


15Configurations

One-Way Systemo 2-4 microphoneso 2-4 loudspeakers

Two-Way Systemo 4-8 microphoneso 6-8 loudspeakers

ICC System

ICCSystem


16Subjective Evaluation

Driving Scenarios o 0 km/h beside motorwayo 130 km/h on motorway

o Prerecorded speech sentences with different Lombard levels were played back via an artificial mouth.

o Binaural recordings were made by means of a HEAD acoustics NoiseBook on the seat behind the driver.


17Results of the Comparison Mean Opinion Score Test

0 km/h, vehicle parked close to a motorway:

o 19.7 % prefer the system to be switched off

o 29.7 % have no preferenceo 50.6 % prefer an activated

system

130 km/h, motorway:o 4.3 % prefer the system to be

switched offo 7.1 % have no preferenceo 88.6 % prefer an activated

system

(25 signal pairs for each driving situation / 15 listeners per scenario):


18Results of Modified Rhyme Tests (MRT)

0 km/h, vehicle parked close to a motorway:o No significant difference (95.2 % system off versus 95.0 % system on)o Due to the automatic gain adjustment the intercom system operates

with only very small gain at these noise levels

130 km/h, motorway:o Significant improvement of

the MRT error rateo Nearly 50 % error reduction

(85.4 % correct answers increased to 92.2 % correct answers)

(48 utterances were presented to each listener per driving situation):

The Fully Networked Car Geneva, 4-5 March 2009 1 Automotive Speech Enhancement of Today:...

Documents

Transcript of The Fully Networked Car Geneva, 4-5 March 2009 1 Automotive Speech Enhancement of Today:...