Communications & Multimedia Signal Processing Meeting 7 Esfandiar Zavarehei Department of Electronic...
Communications & Multimedia Signal Processing
Meeting 7
Esfandiar Zavarehei
Department of Electronic and Computer Engineering
Brunel University
23 November, 2005
Contents
• Kalman Filter: Speech and noise tracking
• HNM Model: The degree of “Harmonicity”
• Bandwidth extension
• Future work: noise reduction using HNM
Kalman Filter: Speech and noise tracking
• Previous method: Modelling speech with an AR model
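State-space form (X: noisy signal, S: speech, D: noise):

$$\mathbf{S}_r(n) = \mathbf{F}_r(n)\,\mathbf{S}_r(n-1) + \mathbf{G}\,e_r(n)$$

$$X_r(n) = \mathbf{H}\,\mathbf{S}_r(n) + D_r(n)$$

$$\mathbf{S}_r(n) = \left[\,S_r(n)\ \cdots\ S_r(n-N+1)\,\right]^T$$

$$\mathbf{F}_r(n) = \begin{bmatrix} a_{r1}(n) & a_{r2}(n) & \cdots & a_{r,N-1}(n) & a_{rN}(n) \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

AR model of the speech:

$$S_r(n) = \sum_{k=1}^{N} a_{rk}(n)\,S_r(n-k) + e_r(n)$$

Kalman update and gain, with $\sigma_{D_r}^2(n)$ the noise variance:

$$\hat{\mathbf{S}}_r(n) = \hat{\mathbf{S}}_r(n|n-1) + \mathbf{K}_r(n)\left[X_r(n) - \mathbf{H}\,\hat{\mathbf{S}}_r(n|n-1)\right]$$

$$\mathbf{K}_r(n) = \mathbf{P}_r(n|n-1)\,\mathbf{H}^T\left[\mathbf{H}\,\mathbf{P}_r(n|n-1)\,\mathbf{H}^T + \sigma_{D_r}^2(n)\right]^{-1}$$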
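A minimal sketch of this kind of filter, assuming a real-valued trajectory and AR coefficients, excitation variance and noise variance that are known and fixed (the slides do not say how they are estimated). The function names `companion` and `kalman_ar`, and the choice of G and H as selectors of the newest sample, are illustrative assumptions.

```python
import numpy as np

def companion(a):
    """Companion matrix with the AR coefficients a_1..a_N in the top row."""
    N = len(a)
    F = np.zeros((N, N))
    F[0, :] = a
    F[1:, :-1] = np.eye(N - 1)
    return F

def kalman_ar(x, a, var_e, var_d):
    """Filter a noisy trajectory x using a fixed AR model of the speech.

    a: AR coefficients, var_e: excitation variance, var_d: noise variance.
    All three are assumed known here.
    """
    N = len(a)
    F = companion(a)
    G = np.zeros((N, 1))
    G[0, 0] = 1.0              # excitation drives the newest sample (assumption)
    H = G.T                    # observation picks the newest sample (assumption)
    s = np.zeros((N, 1))       # state estimate
    P = np.eye(N)              # error covariance
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        # Prediction step
        s = F @ s
        P = F @ P @ F.T + var_e * (G @ G.T)
        # Gain: K = P H^T (H P H^T + noise variance)^-1
        K = P @ H.T / (H @ P @ H.T + var_d)
        # Update step
        s = s + K * (xn - (H @ s).item())
        P = (np.eye(N) - K @ H) @ P
        out[n] = s[0, 0]
    return out
```

The noise enters only through the scalar `var_d`, which is why this form treats the noise as white.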
Kalman Filter: Speech and noise tracking (cont.)
• New method: Modelling speech AND noise with AR models
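AR model of the noise:

$$D_r(n) = \sum_{k=1}^{M} b_{rk}(n)\,D_r(n-k) + g_r(n)$$

Combined state-space form:

$$\mathbf{X}_r(n) = \mathbf{A}_r(n)\,\mathbf{X}_r(n-1) + \mathbf{G}_c\,\mathbf{E}_r(n)$$

$$X_r(n) = \mathbf{H}_c\,\mathbf{X}_r(n)$$

$$\mathbf{X}_r(n) = \left[\,\mathbf{S}_r^T(n)\ \ \mathbf{D}_r^T(n)\,\right]^T$$

$$\mathbf{A}_r(n) = \begin{bmatrix} \mathbf{F}_r(n) & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_r(n) \end{bmatrix}$$

$$\mathbf{B}_r(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ b_{rM}(n) & b_{r,M-1}(n) & b_{r,M-2}(n) & \cdots & b_{r1}(n) \end{bmatrix}$$

Kalman update and gain:

$$\hat{\mathbf{X}}_r(n) = \hat{\mathbf{X}}_r(n|n-1) + \mathbf{K}_{rc}(n)\left[X_r(n) - \mathbf{H}_c\,\hat{\mathbf{X}}_r(n|n-1)\right]$$

$$\mathbf{K}_{rc}(n) = \mathbf{P}_{rc}(n|n-1)\,\mathbf{H}_c^T\left[\mathbf{H}_c\,\mathbf{P}_{rc}(n|n-1)\,\mathbf{H}_c^T\right]^{-1}$$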
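A sketch of the combined speech-and-noise filter, assuming real-valued signals and known, fixed AR coefficients and excitation variances. For simplicity both blocks of A use the same top-row companion layout, which is equivalent to other companion layouts up to a reordering of the state. The names `companion` and `kalman_speech_noise` are illustrative, not taken from the slides.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix with the AR coefficients in the top row."""
    p = len(coeffs)
    C = np.zeros((p, p))
    C[0, :] = coeffs
    C[1:, :-1] = np.eye(p - 1)
    return C

def kalman_speech_noise(x, a, b, var_e, var_g):
    """Jointly track speech (AR coefficients a) and noise (AR coefficients b).

    var_e, var_g: excitation variances of the two AR models (assumed known).
    Returns the speech and noise estimates.
    """
    N, M = len(a), len(b)
    # Block-diagonal transition matrix A = diag(F, B)
    A = np.zeros((N + M, N + M))
    A[:N, :N] = companion(a)
    A[N:, N:] = companion(b)
    # Process noise enters at the newest speech and newest noise sample
    Q = np.zeros((N + M, N + M))
    Q[0, 0] = var_e
    Q[N, N] = var_g
    # Observation = newest speech sample + newest noise sample
    H = np.zeros((1, N + M))
    H[0, 0] = 1.0
    H[0, N] = 1.0
    z = np.zeros((N + M, 1))
    P = np.eye(N + M)
    I = np.eye(N + M)
    speech = np.empty(len(x))
    noise = np.empty(len(x))
    for n, xn in enumerate(x):
        # Prediction step
        z = A @ z
        P = A @ P @ A.T + Q
        # Gain has no separate noise-variance term: the noise is in the state
        K = P @ H.T / (H @ P @ H.T)
        # Update step
        z = z + K * (xn - (H @ z).item())
        P = (I - K @ H) @ P
        speech[n] = z[0, 0]
        noise[n] = z[N, 0]
    return speech, noise
```

Because the observation equation has no additive term, the two estimates sum exactly to the observation at every step; the filter only decides how to split it between speech and noise.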
Kalman Filter: Speech and noise tracking (cont.)
• The noise AR models are obtained from noise-only periods.
• Results from the new model sound more natural
Samples: Clean Speech, Noisy Speech, Kalman Old, Kalman New

Noise   Method    SNR (dB)
                  -5      0       5       10
Car     DFTKUN    2.41    2.80    3.13    3.43
        DFTKCN    2.51    2.90    3.20    3.49
        MMSE      2.39    2.75    3.10    3.38
        Wiener    2.36    2.74    3.10    3.36
        PSS       2.44    2.79    3.08    3.28
Train   DFTKUN    1.81    2.22    2.62    2.98
        DFTKCN    1.90    2.30    2.69    3.05
        MMSE      1.78    2.20    2.58    2.89
        Wiener    1.48    1.99    2.45    2.82
        PSS       1.65    2.12    2.51    2.84
WGN     DFTKUN    1.90    2.29    2.64    2.92
        DFTKCN    1.99    2.35    2.68    3.02
        MMSE      1.90    2.22    2.58    2.90
        Wiener    1.85    2.21    2.61    2.91
        PSS       1.95    2.26    2.58    2.84
HNM Model
• Harmonic sub-bands are modelled as the sum of a Gaussian and some random noise
[Figure: the Gaussian window W_H plotted from -60 to 60 Hz, Energy = 1]
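For the k-th harmonic at frequency $f_k$, with sub-band $f_k - \Delta f$ to $f_k + \Delta f$:

$$A_k^2 = \sum_{f=f_k-\Delta f}^{f_k+\Delta f} X^2(f)$$

$$V_k = 1 - \frac{\sum_{f=f_k-\Delta f}^{f_k+\Delta f}\left[X(f) - A_k\,W_H(f-f_k)\right]^2}{A_k^2}$$

$$X(f) = A_k\left[V_k\,W_H(f-f_k) + (1-V_k)\,R(f-f_k)\right]$$

$W_H(f)$: Gaussian window, an exponential in $f^2$ with constants 2.2 and 60, normalised to unit energy, $\sum W_H^2 = 1$
R: Random noise with Rayleigh distribution, normalised to unit energy, $\sum R^2 = 1$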
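A sketch of how the amplitude and harmonicity of one sub-band could be computed, reading A_k as the square root of the sub-band energy and V_k as one minus the normalised squared error between the sub-band and the scaled Gaussian. The window width `width_hz`, the function names and the frequency grid in the usage example are assumptions for illustration only.

```python
import numpy as np

def gaussian_window(offsets_hz, width_hz=20.0):
    """Unit-energy Gaussian over the given frequency offsets.

    width_hz is a hypothetical value, not the constant used on the slide.
    """
    w = np.exp(-(offsets_hz / width_hz) ** 2)
    return w / np.sqrt(np.sum(w ** 2))

def harmonic_params(mag, freqs, fk, half_bw):
    """Amplitude A_k and harmonicity V_k of the sub-band centred on fk.

    mag: magnitude spectrum, freqs: bin frequencies in Hz,
    half_bw: half-width of the sub-band in Hz.
    """
    band = np.abs(freqs - fk) <= half_bw
    X = mag[band]
    W = gaussian_window(freqs[band] - fk)
    A = np.sqrt(np.sum(X ** 2))                      # sub-band energy
    V = 1.0 - np.sum((X - A * W) ** 2) / A ** 2      # 1 for a pure Gaussian peak
    return A, V

# Usage: a flat (noise-like) sub-band scores lower than a Gaussian-shaped one
freqs = np.arange(0.0, 400.0, 5.0)
A_flat, V_flat = harmonic_params(np.ones_like(freqs), freqs, 200.0, 60.0)
```

With this definition a sub-band that matches the window exactly gives V_k = 1, and the value drops as the sub-band becomes more noise-like.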
HNM Model
• Sample Reconstructed Frame
[Figure: spectrum of a sample frame, dB vs. Hz (0 to 8000 Hz). Curves: Original, Reconstructed]
HNM Model
[Figure, three panels: Ak vs. harmonic index (0 to 50); Vk vs. harmonic index (0 to 50); spectrum vs. Hz (0 to 8000 Hz). Curves: Original; Synthesized, PESQ: 3.91]
HNM Model
• Noise severely affects the amplitudes Ak
• Pitch, harmonicity and harmonic frequencies are much less distorted by noise
• Simple analysis/synthesis of noisy speech improves its quality at low SNR (SNR < 10 dB)
[Figure, two panels (Clean Pitch, Noisy Pitch): PESQ vs. SNR (dB), from -5 to 20 dB. Curves: All Noisy, Noisy Amp, Noisy Track, Noisy Harmonicity, Clean All, Actual Noisy Signal]
LP-HNM
• Decompose the signal into an LP model (AR or LSF) and an HNM model of the residual (fk, Ak, Vk)
• The amplitudes can be assumed to be equal (the residual is whitened by inverse modelling)
• The frequencies may also be assumed to be multiples of the fundamental frequency (later displaced slightly by LP modelling)
LP-HNM synthesized, PESQ: 3.50
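A sketch of the LP half of such a decomposition, assuming the autocorrelation method for the LP coefficients; the slides do not say which LP estimator is used. The residual returned by `lp_residual` is what the HNM parameters would then be extracted from. Function names are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """LP coefficients a_1..a_p such that s[n] is predicted by sum_k a_k s[n-k].

    Uses the autocorrelation method and a direct solve of the normal equations.
    """
    L = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[L - 1:L + order]
    # Toeplitz autocorrelation matrix
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(frame, a):
    """Inverse-filter the frame: e[n] = s[n] - sum_k a_k s[n-k]."""
    inverse_filter = np.concatenate(([1.0], -np.asarray(a)))
    return np.convolve(frame, inverse_filter)[:len(frame)]
```

Inverse filtering flattens the spectral envelope, which is what makes the equal-amplitude assumption for the residual harmonics plausible.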
Bandwidth Extension
• One application of the model is bandwidth extension: obtaining 16 kHz speech quality from 8 kHz speech
[Block diagram. Blocks: 8 kHz Speech Signal, LP-HNM Analysis, Trained LP-HNM Model, LP-HNM Analysis, 16 kHz Speech Signal]
Bandwidth Extension
• Codebook mapping is used to obtain the higher LSF coefficients from the lower LSF coefficients extracted from the 8 kHz signal
• A similar method is used to obtain the harmonicity degree of the higher sub-bands
[Diagram: Codebook (LSF 1 ... LSF 12) paired with Shadow Codebook (LSF 1 ... LSF 24, or LSF 13 ... LSF 24)]
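A sketch of the lookup step of codebook mapping, assuming a Euclidean nearest-neighbour search and a shadow codebook whose rows are paired one-to-one with the codebook rows. The function name and the toy two-entry codebooks are made up for illustration; how the codebooks are trained is not covered here.

```python
import numpy as np

def map_through_codebook(x, codebook, shadow):
    """Return the shadow entry paired with the codebook entry nearest to x.

    codebook: (entries, d_in) array of narrowband vectors.
    shadow:   (entries, d_out) array, row i paired with codebook row i.
    """
    distances = np.sum((codebook - x) ** 2, axis=1)   # squared Euclidean distance
    return shadow[int(np.argmin(distances))]

# Usage with toy values (not real LSF data)
codebook = np.array([[0.1, 0.2], [0.5, 0.6]])
shadow = np.array([[0.7], [0.9]])
high_band = map_through_codebook(np.array([0.45, 0.62]), codebook, shadow)
```

The same lookup applies whether the shadow entries hold the full wideband vector or only its upper part; only the width of the `shadow` rows changes.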
Bandwidth Extension
• A shadow codebook for LP gain ratio (G8/G16) is used for gain mapping
• Phase is extrapolated assuming a linear phase for the harmonics; some random noise is added to unvoiced sub-bands
[Block diagram. Blocks: Extract LSF1-12, G8 and Excitation; Harmonicity degree of Excitation Harmonics; Pitch Extraction; Codebook Mapping (three instances); Phase Reconstruction; HNM Magnitude Synthesis; LP Magnitude Synthesis; multiplier (x); LPF; HPF; adder (+). Signals: L1-12, Vk8, G8, L1-24, Vk16, G16, Phase 16 kHz]
• The performance of the system deteriorates in noise
Future Work
• Track the HNM parameters using a Kalman filter. In other words, rather than tracking the DFT trajectory in each fixed frequency bin, it might be better to track only the harmonic bins (reduced computational complexity) along the harmonic frequencies (intuitively makes more sense!)
[Figure: frequency in Hz (0 to 8000) vs. frame index (0 to 1000)]
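A small sketch of the complexity argument, assuming the harmonics sit at exact multiples of the pitch and that each one is tracked through its nearest DFT bin. The function name, sampling rate and FFT size are illustrative choices, not values from the slides.

```python
import numpy as np

def harmonic_bins(f0, fs, n_fft):
    """Indices of the DFT bins nearest to the harmonics k*f0 up to fs/2."""
    k = np.arange(1, int((fs / 2) // f0) + 1)
    return np.round(k * f0 * n_fft / fs).astype(int)

# Usage: with a 200 Hz pitch only a fraction of the bins would need a tracker
bins = harmonic_bins(200.0, 16000.0, 512)
n_all_bins = 512 // 2 + 1
```

Since the pitch changes from frame to frame, the selected bins move with it, which is what tracking along the harmonic frequencies amounts to in this picture.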
Future Work
• Some harmonics (e.g. 1-3) proved very difficult to recover from noise. Investigate a model-based approach, similar to the BWE method, for estimating the parameters of those harmonics. The harmonicity of the sub-bands and the reciprocal of the noise level at those frequencies may be used as weights in the mapping process.
Samples: Clean Speech, De-noised Speech