Reduction of non-stationary noise using a non-negative latent variable decomposition
Mikkel N. Schmidt and Jan Larsen
DTU Informatics, Technical University of Denmark
www.imm.dtu.dk
-
28/10/2008 · DTU Informatics, Technical University of Denmark
Jan Larsen
Noise Reduction
• Single channel recording
• Unknown speaker / signal of interest
• Focus on modeling noise

(Block diagram: noise reduction system)
-
Single channel source separation is hard
• There is no spatial information, hence beamforming and independent component analysis are not feasible
• Maybe higher-level cognitive capabilities such as context detection could help
• We will use a data-driven approach to learn a good noise representation
-
The spectrum of alternative methods
• Wiener filter (Wiener, 1949)
• Spectral subtraction (Boll, 1979; Berouti et al., 1979)
• AR codebook-based spectral subtraction (Kuropatwinski & Kleijn, 2001)
• Minimum statistics (Martin et al., 1994, 2001, 2005)
• Masking techniques (Wang; Weiss & Ellis, 2006)
• Factorial models (Roweis, 2000, 2003)
• Non-negative sparse coding (Casey & Westner, 2000; Wang & Plumbley, 2006; Schmidt & Olsson, 2006; Schmidt, Larsen & Hsiao, 2007)

Several of these methods require a VAD, and they largely fail for fast-changing non-stationary noise.
-
Our approach in brief
(Block diagram: the signal passes through a signal representation into the latent variable model. A learning phase yields the noise representation; signal reconstruction then gives the extracted signal of interest. Reconstruction is not in focus in this work.)
-
Signal Representation
• Exponentiated magnitude spectrogram
– γ = 2: power spectrogram
– γ = 1: magnitude spectrogram
– γ = 0.67: cube root compression (Stevens' power law: perceived intensity)
• Any other representation could be used: wavelets, perceptually weighted, etc.
• Ignores phase information; reconstruct by re-filtering
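The exponentiated representation can be written compactly; a sketch with notation chosen here, where x is the time signal and X_{ft} the value at frequency bin f and frame t:

```latex
X_{ft} = \bigl|\,\mathrm{STFT}\{x\}_{ft}\,\bigr|^{\gamma},
\qquad \gamma \in \{2,\; 1,\; 0.67\}.
```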
-
Non-negative latent variable model
• Speech
• Noise
• Noisy speech

(Equations shown as images in the original slides: each signal is modeled by a non-negative basis times non-negative weights; a binary activation gates the speech part, and the noisy mixture includes a residual term.)
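The slide's equations are images; the following is one plausible reconstruction, with symbols chosen here (B: non-negative bases, C: non-negative weights, s_t ∈ {0,1}: binary speech activation per frame, R: residual):

```latex
X^{s} \approx B^{s} C^{s}, \qquad
X^{n} \approx B^{n} C^{n}, \qquad
X \approx B^{s} C^{s}\,\mathrm{diag}(s) + B^{n} C^{n} + R,
\qquad s_t \in \{0,1\}.
```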
-
Non-negative latent variable model
• Use a probabilistic Bayesian setting
• Exponential priors (sparsity) on B and C
• Gaussian residual R

Goals:
• Posterior of all parameters A, B, C, S, N and the noise variance
• Marginalized mean estimate of the signal component
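With the slide's symbols, the priors and residual model might be sketched as follows; λ_B and λ_C are the sparsity hyper-parameters named on the experimental-setup slide, and the exact factorization is an assumption here:

```latex
p(B_{fk}) \propto e^{-\lambda_B B_{fk}},\; B_{fk} \ge 0; \qquad
p(C_{kt}) \propto e^{-\lambda_C C_{kt}},\; C_{kt} \ge 0; \qquad
R_{ft} \sim \mathcal{N}(0, \sigma^2).
```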
-
A three-step simple approximate learning procedure for speech
1. Compute speech activation using a state-of-the-art voice activity detector (Qualcomm-ICSI-OGI)
2. Compute the noise basis representation using non-speech signal frames
3. Jointly compute noise weights, speech basis, and speech weights, and reconstruct the speech signal
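Step 2 above can be sketched as a sparse non-negative factorization of the noise-only frames. This is a minimal stand-in, not the talk's actual inference: plain multiplicative NMF updates for the Euclidean cost with an L1 penalty on the weights approximating the exponential sparsity prior; the function name and defaults are made up for illustration.

```python
import numpy as np

def learn_noise_basis(V, n_bases=8, sparsity=0.1, n_iter=200, seed=0):
    """Factor a non-negative noise spectrogram V (freq x time) as V ~ B @ C.

    Multiplicative updates for ||V - BC||_F^2 + sparsity * sum(C),
    keeping B and C non-negative throughout.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    B = rng.random((F, n_bases)) + 1e-3   # non-negative basis spectra
    C = rng.random((n_bases, T)) + 1e-3   # non-negative weights
    eps = 1e-9                            # guards against division by zero
    for _ in range(n_iter):
        C *= (B.T @ V) / (B.T @ B @ C + sparsity + eps)
        B *= (V @ C.T) / (B @ C @ C.T + eps)
    return B, C
```

In the talk's setting, V would hold the frames the VAD marks as non-speech, and B would then be kept fixed while speech bases and all weights are fitted on the noisy signal.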
-
Experimental setup
• Four different noise types: machine gun, string quartet, restaurant noise, traffic noise
• Mixed with 100 sentences from the TIMIT database at SNRs in the range -9 dB to 6 dB
• Signal represented by STFT using 64 ms, 50%-overlapping Hann-windowed frames and mapped onto 32 mel frequency bins [20 Hz; 4 kHz]
• 256 bases for signal and noise
• Optimal sparsity (hyper-parameters): λB = 0.1, λC = 0
• Qualcomm-ICSI-OGI voice activity detector (VAD)
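The front end of this setup (64 ms Hann windows at 50% overlap, exponentiated magnitudes) can be sketched as below. The 8 kHz sampling rate is an assumption implied by the 4 kHz upper band edge; the mel mapping onto 32 bins would follow as an extra matrix multiply and is omitted here.

```python
import numpy as np

def magnitude_spectrogram(x, fs=8000, frame_ms=64, overlap=0.5, gamma=1.0):
    """Exponentiated magnitude spectrogram of a 1-D signal x.

    Frames of frame_ms milliseconds, Hann-windowed, with the given
    fractional overlap; returns an array of shape (n_freqs, n_frames).
    """
    n = int(fs * frame_ms / 1000)        # 512 samples per 64 ms frame at 8 kHz
    hop = int(n * (1 - overlap))         # 256-sample hop for 50% overlap
    w = np.hanning(n)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop : i * hop + n] * w for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** gamma
```

With gamma=0.67 this yields the cube-root-compressed representation from the signal-representation slide.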
-
Quality Measure
• Signal-to-noise ratio
– Simple measure; has only an indirect relation to perceived quality
• Representation-based metrics
– In systems based on time-frequency masking, evaluate the masks
• Perceptual models
– Promising to use PEAQ or PESQ
• High-level attributes
– For example, word error rate in a speech recognition setup
• Listening tests
– Expensive, time-consuming; many aspects (comfort, intelligibility)
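The first measure on this list is simple enough to state directly; a minimal sketch, assuming a clean reference signal is available:

```python
import numpy as np

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB: signal power over residual-error power."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

For example, an estimate with a constant offset of 0.1 on a unit-amplitude reference gives 20 dB, since the error power is 1% of the signal power.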
-
Example: Bursts of machine gun shots
-
Results
• Highly non-stationary noise
– Spectral subtraction breaks down due to its stationarity assumption
• Almost stationary noise
– The proposed method works equally well as or better than spectral subtraction
(Figure: output SNR (0 to 15 dB) versus input signal-to-noise ratio (-9 to 6 dB) for the proposed method and spectral subtraction, in four panels ordered from non-stationary to stationary noise: machine gun, string quartet, restaurant, traffic.)
More sound examples at
www.mikkelschmidt.dk
-
Potential ways ahead
• Prior model for the speech activity pattern, e.g. using an HMM
• Harmonic prior for the speech basis
• Full Bayesian inference in the model (working on a Gibbs sampling approach)
• Better residual models by including phase uncertainty (Rayleigh distribution) (Parry and Essa, 2007)
• Advanced post-processing (weighted, thresholded spectral subtraction, and smoothing) can help
-
Conclusion
• A probabilistic non-negative latent variable decomposition method was presented
• Full Bayesian inference is possible, but we resorted to a simple step-wise procedure
• The method has potential over classical methods for handling very non-stationary, strong noise conditions
• As an essential output of the method is a good noise estimate, it can also be used as an integral part of other methods that are based on noise estimation
Thank you for your attention!