Page 1: HIWIRE meeting ITC-irst Activity report Marco Matassoni, Piergiorgio Svaizer March 9.-10. 2006 Torino.

HIWIRE meeting

ITC-irst

Activity report

Marco Matassoni, Piergiorgio Svaizer

March 9-10, 2006, Torino

Outline

• Beamforming and Adaptive Noise Cancellation

• Environmental Acoustics Estimation

• Audio-Video data collection

• Multi-channel pitch estimation

• Fixed-platform prototype acquisition module

Beamforming: D&S

The availability of multi-channel signals makes it possible to selectively capture the desired source:

$$ \tilde{s}(t) = \frac{1}{M} \sum_{i=1}^{M} s_i(t - \tau_i) $$

Issues:

• estimation of reliable TDOAs;

Method:

• CSP (Crosspower Spectrum Phase) analysis over multiple frames

Advantages:

• robustness

• low computational cost
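The D&S scheme with CSP-based delay estimation can be sketched as follows. This is a minimal illustration, not the ITC-irst implementation: the function names `csp_delay` and `delay_and_sum`, the frame length, the integer-sample alignment, and the sign convention (each channel is advanced by its estimated arrival delay relative to a reference channel) are all assumptions.

```python
import numpy as np

def csp_delay(ref, sig, max_lag, frame_len=1024):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    The CSP function (phase-only cross-correlation) is accumulated
    over consecutive frames before the peak is picked.
    """
    acc = np.zeros(2 * max_lag + 1)
    n = 2 * frame_len                      # zero-padding avoids circular wrap-around
    usable = min(len(ref), len(sig))
    for start in range(0, usable - frame_len + 1, frame_len):
        R = np.fft.rfft(ref[start:start + frame_len], n)
        S = np.fft.rfft(sig[start:start + frame_len], n)
        cross = S * np.conj(R)
        cross /= np.abs(cross) + 1e-12     # discard magnitude, keep phase
        csp = np.fft.irfft(cross, n)
        # Reorder so that index 0 corresponds to lag -max_lag
        acc += np.concatenate((csp[-max_lag:], csp[:max_lag + 1]))
    return int(np.argmax(acc)) - max_lag

def delay_and_sum(channels, delays):
    """Average the channels after advancing each one by its delay.

    channels: array of shape (M, N); delays: M integer delays in samples.
    np.roll wraps around at the edges, which is acceptable for a sketch.
    """
    out = np.zeros(channels.shape[1])
    for x, d in zip(channels, delays):
        out += np.roll(x, -d)
    return out / channels.shape[0]
```

Accumulating the CSP function over several frames before peak picking is one way to obtain more reliable TDOAs than a single-frame estimate.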

D&S with MarkIII

Test set:

• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels

• clean models, trained on original TIDIGITS

Results (WRR [%]):

Configuration   WRR [%]
C_1             38.5
C_32            50.8
DS_C8           79.9
DS_C16          83.0
DS_C32          85.3
DS_C64          85.4

Adaptive Noise Cancellation

A remote microphone can be used as a reference for noise estimation:

[Figure: block diagram of the adaptive noise canceller. Labels: (beamformed) speech, (cockpit) noise, equivalent noise path filter, noisy speech, adaptive filter, filtered noise, denoised speech.]

NLMS

The tested algorithm is Normalized Least Mean Squares (NLMS): it iteratively estimates an FIR filter that minimizes the difference between the primary channel and the filtered reference.

We implemented two algorithms:

• time domain

• frequency domain (subband)
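A minimal sketch of the time-domain variant is given below. It is an illustration under stated assumptions, not the tested implementation: the function name `nlms_cancel`, the filter length `taps`, the step size `mu` and the regularization constant `eps` are hypothetical choices.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=32, mu=0.5, eps=1e-8):
    """Time-domain NLMS noise canceller.

    primary   : noisy speech (speech plus noise seen through an unknown path)
    reference : noise picked up by the remote microphone
    Returns (denoised, w): the error signal e[n] = primary[n] - w.x[n]
    and the final FIR coefficients.
    """
    w = np.zeros(taps)
    x = np.zeros(taps)                 # latest reference samples, newest first
    denoised = np.zeros(len(primary))
    for n in range(len(primary)):
        x = np.roll(x, 1)
        x[0] = reference[n]
        y = w @ x                      # filtered noise estimate
        e = primary[n] - y             # denoised output sample
        w += mu * e * x / (x @ x + eps)  # step normalized by reference energy
        denoised[n] = e
    return denoised, w
```

The normalization by the reference energy keeps the adaptation speed roughly independent of the noise level, which is the usual motivation for NLMS over plain LMS.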

D&S + ANC

Test set:

• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels

• clean models, trained on original TIDIGITS

Results (WRR [%]); T = time-domain NLMS, F = frequency-domain (subband) NLMS:

Configuration   WRR [%]
C_32 (T)        64.7
C_32 (F)        72.4
DS_C64 (T)      81.8
DS_C64 (F)      88.4

Acoustics estimation

Idea:

Realistically simulate an environment (and its noise)

Method:

• Measure several impulse responses in the environment with multi-channel equipment (by reproducing chirp signals), preserving relative amplitudes and mutual delays;

• Generate appropriate noisy signals starting from clean data;

The acoustic models derived in this way perform better in the given environment, also on real data.
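The contamination step can be sketched as below, assuming the impulse responses have already been measured. The function name `simulate_channels`, its arguments, and the use of a single noise gain derived from a target SNR are assumptions made for illustration.

```python
import numpy as np

def simulate_channels(clean, speech_irs, noise, noise_irs, snr_db):
    """Generate noisy multi-channel signals from clean speech.

    speech_irs, noise_irs : arrays of shape (M, L), one measured impulse
                            response per microphone
    noise                 : noise recording, at least as long as `clean`
    One common gain is applied to the noise on all channels, so the
    relative amplitudes and mutual delays carried by the impulse
    responses are preserved.
    """
    n = len(clean)
    speech = np.stack([np.convolve(clean, h)[:n] for h in speech_irs])
    diffuse = np.stack([np.convolve(noise, h)[:n] for h in noise_irs])
    # Gain that yields the requested SNR on average over all channels
    target_ratio = 10 ** (snr_db / 10)
    gain = np.sqrt(np.mean(speech ** 2) / (np.mean(diffuse ** 2) * target_ratio))
    return speech + gain * diffuse
```

The resulting signals can then be used to train or adapt acoustic models for the target environment.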

Audio–Video Data Collection

Idea:

In a noisy environment exploit additional features from video data

(collaboration with NTUA and TUC)

Design of AV corpus:

• Task: English connected digits, HIWIRE commands/keywords

• Channels: 4 audio, 3 video

• Environment: acoustically-treated room + noise diffusion

Audio–Video Setup

[Figure: recording setup, with loudspeakers diffusing cockpit noise and a distance of 70-80 cm marked.]

Audio–Video Setup

Audio

4 omnidirectional PZM Shure microphones, 16 kHz/16 bits

background noise diffused by 2 loudspeakers

Video

Webcam: 640x480, 30 fps (color), Unix timestamps

Stereoscopic camera pair: 640x480, 30 fps (b/w) or 15 fps (color), perfectly synchronous

Current data sets

• 8 speakers / connected digits

• 2 speakers / HIWIRE keyword lists

Fixed prototype acquisition device

Hardware platform:

8 Shure microphones + RME Hammerfall

Software environment:

Linux, ALSA driver

Acquisition module:

• acquires synchronously multiple channels (8);

• writes (to its standard output or to a file) the enhanced signal + additional information/features (start/end-of-speech hypotheses, voiced/unvoiced, pitch, …)

Multi-channel pitch analysis

The basic principle is that we can exploit many observations of the same speech process.

Once the speaker has been located, we can take into account the different propagation times to the microphones and time-align the signals.

Pitch analysis can be performed using adjacent time intervals extracted from different microphone signals.

Basic correlation techniques: AMDF (average magnitude difference function), AUTOC (autocorrelation), WAUTOC (weighted autocorrelation)

A Multichannel WAUTOC Method

WAUTOC is computed for each channel, and summed over the M channels.

For a given frame:

$$ f(\tau) = \sum_{i=1}^{M} w_i \, \mathrm{wautoc}_i(\tau) $$

The pitch lag is taken at the maximum of f:

$$ \max_{\tau} f(\tau) $$

Issues:

• Weights w_i may represent the channel reliability;

• Possible use of intraframe smoothing of the resulting fundamental frequency contour, which could improve the overall accuracy.
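The weighted sum over channels can be sketched as follows. Several points are assumptions rather than details taken from the slides: WAUTOC is taken here as the autocorrelation divided by the AMDF plus a constant `k`, the weights default to uniform, the frames are assumed to be already time-aligned, and the names `wautoc` and `multichannel_pitch` are hypothetical.

```python
import numpy as np

def wautoc(frame, max_lag, k=1.0):
    """Autocorrelation divided by (AMDF + k), for lags 0..max_lag."""
    n = len(frame) - max_lag           # same number of samples for every lag
    base = frame[:n]
    out = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        shifted = frame[lag:lag + n]
        autoc = np.mean(base * shifted)
        amdf = np.mean(np.abs(base - shifted))
        out[lag] = autoc / (amdf + k)
    return out

def multichannel_pitch(frames, fs, f_min=60.0, f_max=400.0, weights=None):
    """Estimate f0 (Hz) from time-aligned frames of shape (M, N).

    The per-channel WAUTOC functions are combined with weights w_i and
    the lag of the maximum within the search range is converted to Hz.
    """
    m = frames.shape[0]
    weights = np.ones(m) / m if weights is None else np.asarray(weights)
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    f = sum(w * wautoc(x, lag_max) for w, x in zip(weights, frames))
    best = lag_min + int(np.argmax(f[lag_min:lag_max + 1]))
    return fs / best
```

Because multiples of the true period also produce peaks, narrowing the search range `f_min`..`f_max` is a simple way to limit octave errors in this sketch.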

Video example: distant-talking speech recognition

Video example: multi-channel pitch estimation

Forthcoming activities

• more effective combination of beamforming and ANC;

• also test ANC applied before D&S beamforming;

• test post-filtering after D&S;

• audio-video collection: an improved audio/video synchronization would be advisable;

• audio-video collection: select the best balance between quality and frame rate;

• acoustically characterize the target environment (prototype);

• integrate the selected features in the multi-channel front-end.