Leakage Problems in Array Speech Processing
Julien Bourgeois
Martigny - September 2003
Context of the work
Several simultaneous speakers (sources), spatially located: s1(t), s2(t)
Road noise, spatially diffuse
Microphone array: picks up mixtures of the sources and noise, x1(t), ..., x4(t)
Array processor: recovers clean individual speech flows by separating and denoising the sources
Beamforming
Beamforming: Minimization of output power with unit gain at the direction (DOA) of the target
+ robust against noise; sources do not have to be active
- array geometry and target location must be known; far-field propagation is assumed
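The minimization stated on this slide is commonly written as the minimum-variance distortionless-response (MVDR) problem. The block below is a sketch in assumed notation that does not appear on the slides: w is the weight vector at one frequency, R_xx the covariance matrix of the microphone signals, and d the far-field steering vector of the target DOA.

```latex
\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}_{xx} \mathbf{w}
\quad \text{subject to} \quad \mathbf{w}^{H} \mathbf{d} = 1,
\qquad
\mathbf{w}_{\mathrm{opt}}
  = \frac{\mathbf{R}_{xx}^{-1} \mathbf{d}}
         {\mathbf{d}^{H} \mathbf{R}_{xx}^{-1} \mathbf{d}}
```

The closed form follows from a Lagrange multiplier on the unit-gain constraint. It makes both listed properties visible: only R_xx is needed (no source activity assumption), but d must be known.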
Leakage Problem (Beamforming)
With an echo or a source location error, the source signal arrives from a direction other than the constrained DOA.
The beamformer can then produce a zero output for the target while still satisfying the constraint, and it does so because this minimizes the output power.
In a reverberant environment, or with a target location error, beamforming can cancel the target signal.
[Figure: microphones x1 ... xN; labels +1, -1 (constraint) and 0 (output)]
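The cancellation can be reproduced numerically. The sketch below assumes a closed-form minimum-power beamformer with a unit-gain constraint, a uniform linear array, and a single frequency. The geometry, frequency, and power values are illustrative and are not taken from the slides. The target actually arrives from 10 degrees while the constraint is set at 0 degrees.

```python
import numpy as np

def steering_vector(num_mics, spacing_m, doa_deg, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def min_power_weights(R, d):
    """Weights minimizing w^H R w subject to w^H d = 1."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

M, spacing, freq = 4, 0.05, 1000.0                   # illustrative values
d_assumed = steering_vector(M, spacing, 0.0, freq)   # constrained DOA
d_actual = steering_vector(M, spacing, 10.0, freq)   # true DOA of the target

# Covariance measured while the target is active: target plus weak sensor noise.
R = np.outer(d_actual, d_actual.conj()) + 0.01 * np.eye(M)

w = min_power_weights(R, d_assumed)
gain_constraint = abs(w.conj() @ d_assumed)   # equals 1 by construction
gain_target = abs(w.conj() @ d_actual)        # well below 1: the target is cancelled

print(f"gain at constrained DOA: {gain_constraint:.3f}")
print(f"gain at actual DOA:      {gain_target:.3f}")
```

The constraint is met exactly, yet the beamformer places a near-null on the direction the target really comes from, because that is what minimizes the output power.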
Solution to the Leakage Problem
Do not adapt the beamformer when the target is active (the speaker is speaking).
With the constraint, good behavior should be preserved for the target.
When the target is off, minimizing the output power cancels the noise sources.
[Figure: microphones x1 ... xN; labels 0, +1 (constraint) and "Do not speak"]
A beamformer needs a voice activity detector (VAD) to control its adaptation.
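A minimal sketch of VAD-controlled adaptation, under several assumptions that are not on the slides: exact per-frame covariances instead of estimates, ideal VAD decisions, a uniform linear array, one frequency, and illustrative DOAs. The covariance used to compute the weights is averaged either over all frames or only over frames where the target is silent.

```python
import numpy as np

def steering_vector(num_mics, spacing_m, doa_deg, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def min_power_weights(R, d):
    """Weights minimizing w^H R w subject to w^H d = 1."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def adapt_covariance(frame_covs, target_active, use_vad):
    """Average the frame covariances, skipping target-active frames if use_vad."""
    used = [R for R, active in zip(frame_covs, target_active)
            if not (use_vad and active)]
    return sum(used) / len(used)

M, spacing, freq, noise = 4, 0.05, 1000.0, 1e-3      # illustrative values
d_assumed = steering_vector(M, spacing, 0.0, freq)   # constrained DOA
d_actual = steering_vector(M, spacing, 10.0, freq)   # true target DOA
d_jammer = steering_vector(M, spacing, 40.0, freq)

R_noise = np.outer(d_jammer, d_jammer.conj()) + noise * np.eye(M)  # target silent
R_talk = R_noise + np.outer(d_actual, d_actual.conj())             # target speaking
frame_covs = [R_noise] * 50 + [R_talk] * 50
target_active = [False] * 50 + [True] * 50   # ideal VAD decisions (assumed known)

gains = {}
for use_vad in (False, True):
    R = adapt_covariance(frame_covs, target_active, use_vad)
    w = min_power_weights(R, d_assumed)
    gains[use_vad] = abs(w.conj() @ d_actual)   # gain seen by the real target

print(f"target gain, always adapting:  {gains[False]:.3f}")
print(f"target gain, frozen by VAD:    {gains[True]:.3f}")
```

With the adaptation frozen during speech, the weights only learn the jammer, so the target is largely preserved despite the DOA error. The price is that noise changes occurring during speech are not tracked.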
Estimate the target power P_T with a delay-sum beam
Estimate the noise power P_N with M-1 orthogonal beams
Voice Activity Detector: VAD(t) = P_N(t) / P_T(t) (frame-wise)
M = 4 microphones
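The ratio above can be sketched for one frequency bin as follows. Several choices here are assumptions rather than the author's method: the orthogonal beams are taken as an orthonormal basis of the complement of the steering vector (obtained from an SVD), the delay-sum beam is scaled to unit norm so that both powers are comparable, P_N is averaged over the M-1 beams, and the test signals are deterministic complex exponentials.

```python
import numpy as np

def steering_vector(num_mics, spacing_m, doa_deg, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def beam_basis(d):
    """Unit-norm delay-sum beam toward d, plus M-1 beams orthogonal to it."""
    U, _, _ = np.linalg.svd(d.reshape(-1, 1), full_matrices=True)
    return U[:, 0], U[:, 1:]

def vad_db(frames, d, eps=1e-12):
    """frames: (num_frames, frame_len, M) snapshots. Returns P_N/P_T in dB per frame."""
    w_target, W_noise = beam_basis(d)
    y_target = frames @ w_target.conj()     # (num_frames, frame_len)
    y_noise = frames @ W_noise.conj()       # (num_frames, frame_len, M-1)
    p_target = np.mean(np.abs(y_target) ** 2, axis=1)
    p_noise = np.mean(np.abs(y_noise) ** 2, axis=(1, 2))
    return 10.0 * np.log10((p_noise + eps) / (p_target + eps))

M, spacing, freq, frame_len = 4, 0.05, 1000.0, 64   # illustrative values
d_target = steering_vector(M, spacing, 0.0, freq)
d_jammer = steering_vector(M, spacing, 60.0, freq)

n = np.arange(frame_len)
s_target = np.exp(2j * np.pi * 0.10 * n)   # deterministic stand-ins for the
s_jammer = np.exp(2j * np.pi * 0.23 * n)   # narrowband source signals

frames = np.stack([
    np.outer(s_target, d_target),                                  # target only
    np.outer(s_jammer, d_jammer),                                  # jammer only
    np.outer(s_target, d_target) + np.outer(s_jammer, d_jammer),   # double talk
])
ratios = vad_db(frames, d_target)
for name, r in zip(("target", "jammer", "double talk"), ratios):
    print(f"{name:12s} VAD = {r:7.1f} dB")
```

A low ratio indicates target activity, so adaptation would be frozen below some threshold (the threshold itself is not specified on the slides). At this low frequency the jammer-only and double-talk frames end up only a few dB apart, since the delay-sum beam is wide.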
VAD in an unknown noise field
[Figure: VAD(t) in dB at 586 Hz]
Realistic scenario (road noise always present)
Prior: DOA of the target speaker
It can be difficult to discriminate between double-talk and talk situations.
Noisy speech (freeze), noisy jammer (adapt), noisy double talk (freeze)
Leakage Problem (Beamforming)
Is caused by:
  an echoic environment (such as a car)
  target location error
  calibration error
  a wrong propagation model (far-field)
A solution: no adaptation during target activity (speech)
  requires a voice activity detector
  is a trade-off between noise tracking and robustness
Blind Source Separation
Blind Source Separation: Minimization of a dependence measure
+ only a statistical assumption on the sources (independence)
+ no prior on the array geometry or source locations
- ambiguities: permutation and scaling at each frequency
- not robust against noise; needs all sources to be active
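The permutation and scaling ambiguities can be checked on a toy case. The sketch below assumes a real-valued 2x2 mixing matrix A at one frequency (hypothetical values) and a separator applied as s = W^T x. Any reordering and rescaling of a valid separator still yields decorrelated outputs, so a dependence measure alone cannot tell them apart.

```python
import numpy as np

A = np.array([[1.0, 0.6], [0.5, 1.0]])    # hypothetical mixing matrix
W = np.linalg.inv(A).T                     # one exact separator, s = W^T x
P = np.array([[0.0, 1.0], [1.0, 0.0]])     # swaps the two outputs
D = np.diag([3.0, -0.5])                   # rescales each output
W_alt = W @ P @ D                          # permuted and rescaled separator

def is_diagonal(C, tol=1e-12):
    """True if all off-diagonal entries of C are numerically zero."""
    return np.allclose(C, np.diag(np.diag(C)), atol=tol)

R_ss = np.diag([1.0, 0.2])                 # independent sources: diagonal covariance
R_xx = A @ R_ss @ A.T                      # covariance of the mixtures

both_separate = (is_diagonal(W.T @ R_xx @ W)
                 and is_diagonal(W_alt.T @ R_xx @ W_alt))
print("both separators decorrelate the outputs:", both_separate)
```

Since the ambiguity arises independently in every frequency bin, the outputs of different bins may be ordered and scaled inconsistently unless additional information is used.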
Robust Blind Source Separation: Multiple Decorrelation
Find W such that the components of s = W^H x are decorrelated at several times t_1, ..., t_K,
i.e. such that R_ss(t_k) = W^H R_xx(t_k) W is diagonal for k = 1, ..., K.
W is found by gradient descent and is constrained to unit gain.
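A minimal sketch of the joint diagonalization by gradient descent, for one frequency. It is not the author's implementation and rests on assumptions: the cost is the sum of squared off-diagonal entries of W^H R_xx(t_k) W, the unit-gain constraint is interpreted as diag(W) = 1, the step size is chosen by simple backtracking, and the covariances are exact and built from a hypothetical 2x2 mixing matrix with source powers that change over time.

```python
import numpy as np

def offdiag(C):
    """Return C with its diagonal set to zero."""
    return C - np.diag(np.diag(C))

def cost(W, covs):
    """Sum over k of the squared off-diagonal entries of W^H R_k W."""
    return sum(np.linalg.norm(offdiag(W.conj().T @ R @ W)) ** 2 for R in covs)

def multiple_decorrelation(covs, num_iter=1000, step=0.2):
    """Gradient descent from W = identity, keeping diag(W) = 1 (assumed constraint)."""
    W = np.eye(covs[0].shape[0], dtype=covs[0].dtype)
    for _ in range(num_iter):
        # Descent direction of the cost, up to a constant factor.
        G = sum(R @ W @ offdiag(W.conj().T @ R @ W) for R in covs)
        G = offdiag(G)                       # do not move the diagonal of W
        mu = step
        # Backtracking: halve the step until the cost does not increase.
        while mu > 1e-12 and cost(W - mu * G, covs) > cost(W, covs):
            mu *= 0.5
        W = W - mu * G
    return W

# Hypothetical mixing matrix and non-stationary source powers at K = 3 times.
A = np.array([[1.0, 0.6], [0.5, 1.0]])
powers = [(1.0, 0.2), (0.3, 1.0), (0.8, 0.9)]
covs = [A @ np.diag(p) @ A.T for p in powers]

initial_cost = cost(np.eye(2), covs)
W = multiple_decorrelation(covs)
final_cost = cost(W, covs)

print("initial cost:", initial_cost)
print("final cost:  ", final_cost)
print("W =\n", W)
```

The changing source powers are what make the problem solvable: a single covariance matrix can be diagonalized by many W, whereas several matrices with different power ratios pin W down up to permutation and scaling.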
[Figures: 2 microphones and 3 microphones; W initialized to identity]
Leakage Problem (BSS)
[Figures: 4 microphones and 8 microphones; W initialized to identity]
Leakage Problem (BSS)
A solution with a prior on source locations
[Figure: 8 microphones; W initialized to delay-sum beams at the source locations]
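One way to build such an initialization is sketched below for a single frequency. The uniform linear array, the far-field model, the DOAs, and the helper names are assumptions made for illustration; the slides do not specify the geometry. Each column of the initial W is a unit-gain delay-sum beam steered at one assumed source location.

```python
import numpy as np

def steering_vector(num_mics, spacing_m, doa_deg, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def delay_sum_init(num_mics, spacing_m, source_doas_deg, freq_hz):
    """Columns are unit-gain delay-sum beams, one per assumed source DOA."""
    columns = [steering_vector(num_mics, spacing_m, doa, freq_hz) / num_mics
               for doa in source_doas_deg]
    return np.stack(columns, axis=1)

M, spacing, freq = 8, 0.05, 1000.0   # illustrative values
doas = [-30.0, 20.0]                 # hypothetical source locations
W0 = delay_sum_init(M, spacing, doas, freq)

# Response of each initial beam (rows) to each source direction (columns).
D = np.stack([steering_vector(M, spacing, doa, freq) for doa in doas], axis=1)
response = np.abs(W0.conj().T @ D)
print(np.round(response, 3))
```

Each output starts with unit gain toward its own source, which ties every output to one source from the first iteration instead of starting from the unsteered identity.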
Conclusion & Future Plans
Leakage Problem
Beamformers need to detect who speaks and when (VAD).
Double talk is difficult to detect because of low directivity at low frequencies, where speech has more power.
For source separation, an unbiased spatial prior (source locations) prevents the separator from converging to zero.
Future Work
1. Set a spatial constraint at low frequencies, where location errors have little effect.
2. Estimate the location of the source at higher frequencies.
3. Is it possible to use the early reflections constructively (multiple beamforming, matched filtering)?