Communications & Multimedia Signal Processing Refinement in FTLP-HNM system for Speech Enhancement...

Post on 21-Dec-2015


Communications & Multimedia Signal Processing

Refinement in FTLP-HNM system for Speech Enhancement

Qin Yan

Communication & Multimedia Signal Processing Group

School of Engineering and Design, Brunel University

23 Nov, 2005

Outline

• Review of the FTLP-HNM system

• Parameter estimation for the HNM (including pitch and harmonic tracking in noise)

• Objective results for pitch tracking, harmonic tracking and the FTLP-HNM system

• Demo of enhanced speech from old archive recordings

Overview of FTLP-HNM Speech Enhancement System

[Block diagram. Block labels: Noisy Speech; Pre-cleaning; LP Model Decomposition; Formant Estimation; Kalman Filters; HNM of Residual; Pitch Estimation; Voiced/Unvoiced Classification; Kalman Filters; Synthesized LP Model; LP Model Re-composition; Enhanced Speech. Branch labels: Formant estimation; HNM estimation.]

Harmonic plus Noise Model

In the HNM, speech is decomposed into two parts: a harmonic part and a noise part.

Synthesized speech:

$$\hat{S}(t) = S_h(t) + S_n(t)$$

Harmonic:

$$S_h(t) = \sum_{k=-L(t)}^{L(t)} A_k(t)\, e^{j k \omega_0(t) t}$$

where L(t) denotes the number of harmonics included in the harmonic part and ω0 denotes the pitch frequency.

Noise:

$$S_n(t) = e(t)\,[h(\tau, t) * b(t)]$$

where h is a time-varying autoregressive (AR) model and b is white Gaussian noise.
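The decomposition above can be illustrated with a minimal sketch for a single short frame. Several things here are assumptions made only for illustration and are not taken from the slides: the pitch and harmonic amplitudes are held constant within the frame, harmonic phases are zero, the AR filter is fixed rather than time-varying, and the function names (`harmonic_part`, `noise_part`), the 8 kHz sampling rate and all numeric values are hypothetical.

```python
import math
import random

def harmonic_part(amplitudes, f0_hz, fs_hz, n_samples):
    """Real-valued harmonic part: a sum of cosines at multiples of f0.

    With real, symmetric amplitudes the two-sided sum over k = -L..L of
    A_k * exp(j*k*w0*n) reduces to a cosine sum; the factor 2 is assumed
    to be folded into `amplitudes` (one entry per harmonic k = 1..L).
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # pitch in radians per sample
    return [
        sum(a * math.cos(k * w0 * n) for k, a in enumerate(amplitudes, start=1))
        for n in range(n_samples)
    ]

def noise_part(ar_coeffs, envelope, rng):
    """White Gaussian noise b shaped by an all-pole (AR) filter, then scaled by e(n).

    The AR recursion is y[n] = b[n] - sum_i a_i * y[n - i]; a fixed filter is an
    assumption standing in for the time-varying model on the slide.
    """
    shaped = []
    for n in range(len(envelope)):
        y = rng.gauss(0.0, 1.0)
        for i, a in enumerate(ar_coeffs, start=1):
            if n - i >= 0:
                y -= a * shaped[n - i]
        shaped.append(y)
    return [env * y for env, y in zip(envelope, shaped)]

# Illustrative frame: 20 ms at 8 kHz, 100 Hz pitch, three harmonics.
rng = random.Random(0)
n = 160
s_h = harmonic_part([1.0, 0.5, 0.25], f0_hz=100.0, fs_hz=8000.0, n_samples=n)
s_n = noise_part([-0.5], envelope=[0.05] * n, rng=rng)
s_hat = [h + v for h, v in zip(s_h, s_n)]  # synthesized frame = harmonic + noise
```

In a frame-based system the amplitudes, pitch and AR coefficients would be re-estimated for every frame, which is how the time variation in the model would enter.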

HNM - Pitch Tracking

• Error function in the frequency domain:

$$E(F_0) = \sum_{k=1}^{MaxF/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} \log X(l)$$

• In noisy conditions, the error function is modified to include SNR-dependent weights:

$$E(F_0) = \sum_{k=1}^{MaxF/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} W(l) \log X(l)$$

The weighting function W(l) is SNR-dependent and is given by

$$W(l) = \frac{SNR(l)}{1 + SNR(l)}$$

NOTE:

• The input speech frame is bandpass filtered to eliminate the parts that do not contain explicit harmonics.

• For each speech frame, the method outputs several pitch candidates (N=3); the Viterbi algorithm then generates the final pitch tracks.

• It might be useful to combine candidates from this method with candidates from the traditional autocorrelation method.
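The pitch-tracking steps on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the presented implementation: frequencies are expressed in DFT bin indices, candidates are taken to be the F0 values with the largest summed log-magnitude (the slide calls E an error function, so the sign convention is an assumption), SNR(l) is a linear power ratio, and the Viterbi transition cost (an absolute pitch-jump penalty) and all function names are hypothetical.

```python
import math

def wiener_weight(snr):
    """W(l) = SNR(l) / (1 + SNR(l)); snr is a linear power ratio, not dB."""
    return snr / (1.0 + snr)

def harmonic_score(mag, f0_bin, half_width, max_bin, weights=None):
    """Sum of (optionally weighted) log-magnitudes in bands of +/- half_width
    bins around each multiple k * f0_bin, for k = 1 .. max_bin // f0_bin."""
    total = 0.0
    for k in range(1, max_bin // f0_bin + 1):
        for l in range(k * f0_bin - half_width, k * f0_bin + half_width + 1):
            if 0 <= l < len(mag):
                w = 1.0 if weights is None else weights[l]
                total += w * math.log(mag[l] + 1e-12)  # small floor avoids log(0)
    return total

def top_candidates(mag, f0_bins, half_width, max_bin, weights=None, n_best=3):
    """Return the n_best (f0_bin, score) pairs, highest score first."""
    scored = [(f0, harmonic_score(mag, f0, half_width, max_bin, weights))
              for f0 in f0_bins]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n_best]

def viterbi_track(candidates, jump_penalty=1.0):
    """Pick one candidate per frame, maximising the summed score minus
    jump_penalty * |f0 change| between consecutive frames.

    candidates: one list of (f0, score) pairs per frame.
    """
    best = [score for _, score in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for f0, score in candidates[t]:
            options = [best[i] - jump_penalty * abs(f0 - prev_f0)
                       for i, (prev_f0, _) in enumerate(candidates[t - 1])]
            i_max = max(range(len(options)), key=options.__getitem__)
            new_best.append(options[i_max] + score)
            pointers.append(i_max)
        best = new_best
        back.append(pointers)
    i = max(range(len(best)), key=best.__getitem__)
    path = [i]
    for pointers in reversed(back):  # trace the best path backwards
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]

# Toy magnitude spectrum with peaks at every 10th bin (true pitch = 10 bins).
mag = [0.5] * 128
for peak in range(10, 128, 10):
    mag[peak] = 100.0
best = top_candidates(mag, range(4, 31), half_width=1, max_bin=100)
```

Passing `weights=[wiener_weight(s) for s in snr_per_bin]` (with `snr_per_bin` a hypothetical per-bin SNR estimate) gives the weighted variant, which down-weights bins dominated by noise.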

Results of Pitch Tracking

[Two plots, (a) and (b): error (%) versus SNR (dB), 0 to 20 dB. Curves: improved method with weights, improved method without weights, Griffin's method.]

Figure - Comparison of the performance of different pitch tracking methods for speech in (a) train noise and (b) car noise, from 0 dB SNR to clean.

HNM - Harmonic Tracking

[Block diagram. Block labels: Noisy Speech; VAD; Noise model; FFT; Peak picking; Pitch Tracking; Frequency bin tracks; Harmonic Track Candidates; Harmonic Tracking; Smoothed Harmonic Magnitude by Kalman filter.]

• The data structure for harmonic track candidates has been improved, which speeds up the whole system.
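The smoothing of a harmonic magnitude track by a Kalman filter can be illustrated with a scalar sketch. The slides do not give the state model, so this assumes a random-walk model for the magnitude of one harmonic across frames; the function name and the variance values are hypothetical.

```python
def kalman_filter_track(observations, process_var=1e-3, meas_var=1e-1):
    """Scalar Kalman filter over one harmonic's magnitude track.

    Assumed model (not from the slides):
        state:        x[t] = x[t-1] + w,  w ~ N(0, process_var)
        observation:  y[t] = x[t] + v,    v ~ N(0, meas_var)
    """
    x = observations[0]  # initialise the state with the first observation
    p = meas_var         # and its uncertainty with the measurement variance
    smoothed = [x]
    for y in observations[1:]:
        p = p + process_var        # predict: uncertainty grows
        gain = p / (p + meas_var)  # Kalman gain
        x = x + gain * (y - x)     # update with the new observation
        p = (1.0 - gain) * p
        smoothed.append(x)
    return smoothed

# Toy track: a constant magnitude of 2.0 with alternating +/- 1 disturbance.
noisy_track = [2.0 + (-1.0) ** n for n in range(20)]
smooth_track = kalman_filter_track(noisy_track)
```

A smaller `process_var` relative to `meas_var` yields a smoother track at the cost of slower response to genuine changes in the harmonic magnitude.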

Results of Harmonic Tracking in Clean Speech

Figure - An illustration of the pitch tracks of a speech segment at a sampling frequency of 8 kHz.

Results of Harmonic Tracking in Noisy Speech

Pitch recovery

Harmonic Recovery

Synthesis of Excitation by HNM

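Voiced excitation:

$$\hat{e}(m) = e_h(m) + e_n(m) = \sum_{k=-L(m)}^{L(m)} A_k(m)\, e^{j k \omega_0(m) m} + b(m) \cdot \mathrm{std}(e(m)) \cdot \exp(j\alpha)$$

Unvoiced excitation:

$$\hat{e}(m) = e_n(m) = b(m) \cdot \mathrm{std}(e(m)) \cdot \exp(j\alpha)$$

where b(m) is unit-variance white Gaussian noise, e(m) is the original excitation and α is the phase of the original excitation.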
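The two excitation formulas can be transcribed almost literally, as in the sketch below. The interpretation is an assumption: b and the phase are treated as per-index sequences, std(e) is the population standard deviation of the original excitation frame, the harmonic amplitudes are real with A_{-k} = A_k and A_0 = 0, and the pitch is constant within the frame. All function names are hypothetical.

```python
import cmath
import math
import statistics

def noise_excitation(b, e, alpha):
    """e_n(m) = b(m) * std(e) * exp(j * alpha(m)), evaluated element by element."""
    scale = statistics.pstdev(e)  # standard deviation of the original excitation
    return [bm * scale * cmath.exp(1j * am) for bm, am in zip(b, alpha)]

def harmonic_excitation(amps, w0, n_samples):
    """e_h(m) = sum over k = -L..L of A_k * exp(j*k*w0*m).

    Assumes real amplitudes with A_{-k} = A_k and A_0 = 0;
    amps[k - 1] holds A_k for k = 1..L.
    """
    n_harm = len(amps)
    out = []
    for m in range(n_samples):
        total = 0j
        for k in range(-n_harm, n_harm + 1):
            if k != 0:
                total += amps[abs(k) - 1] * cmath.exp(1j * k * w0 * m)
        out.append(total)
    return out

def synthesize_excitation(voiced, b, e, alpha, amps, w0):
    """Voiced frames: harmonic part plus noise part; unvoiced frames: noise part only."""
    e_n = noise_excitation(b, e, alpha)
    if not voiced:
        return e_n
    e_h = harmonic_excitation(amps, w0, len(e_n))
    return [h + n for h, n in zip(e_h, e_n)]

# Tiny illustrative call: two samples, one harmonic at a quarter of the sampling rate.
demo = synthesize_excitation(True, [1.0, 1.0], [1, -1, 1, -1], [0.0, 0.0],
                             [1.0], math.pi / 2)
```

Scaling the unit noise by the standard deviation of the original excitation keeps the energy of the synthesized noise part comparable to that of the residual it replaces.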

Results of Speech Enhancement

Enhanced speech is synthesized by inverse filtering the HNM residual with the cleaned LP shape.

$$Harmonicity = 10 \log_{10} \left( \frac{1}{NH \cdot N_{frames}} \sum_{frames} \sum_{k=1}^{NH} \frac{P_k + P_{k+1}}{2 P_{k,k+1}} \right)$$

[Plot: harmonicity versus SNR (dB), 0 to 20 dB. Curves: noisy, MMSE, FTLP-HNM.]

Figure - Comparison of the harmonicity of the MMSE and FTLP-HNM systems on speech in train noise at different SNRs.

[Plot: PESQ versus SNR (dB), 0 to 20 dB. Curves: noisy, MMSE, FTLP-HNM.]

Figure - Performance of MMSE and FTLP-HNM on speech in train noise at different SNR levels.
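A harmonicity measure of this kind could be computed as in the sketch below. The slide does not define its symbols, so the reading used here is an assumption: P_k is the power at the k-th harmonic, P_{k,k+1} is the power between harmonics k and k+1, the ratio (P_k + P_{k+1}) / (2 P_{k,k+1}) is averaged over NH harmonic pairs and over all frames, and the result is expressed in dB. The function name and the input layout are hypothetical.

```python
import math

def harmonicity_db(frames):
    """Average harmonic-to-valley power ratio over all frames, in dB.

    frames: one (harmonic_powers, between_powers) pair per frame, where
    harmonic_powers holds P_1 .. P_{NH+1} and between_powers holds the NH
    values P_{k,k+1} (assumed meaning: power between harmonics k and k+1).
    """
    total = 0.0
    count = 0
    for harmonic_powers, between_powers in frames:
        for k in range(len(between_powers)):
            peak_mean = (harmonic_powers[k] + harmonic_powers[k + 1]) / 2.0
            total += peak_mean / between_powers[k]
            count += 1
    return 10.0 * math.log10(total / count)

# Two toy frames whose harmonic peaks are 100 times stronger than the valleys.
toy_frames = [([100.0, 100.0, 100.0], [1.0, 1.0]),
              ([100.0, 100.0, 100.0], [1.0, 1.0])]
value = harmonicity_db(toy_frames)
```

Under this reading, noise that fills the valleys between harmonics lowers the measure, and an enhancement method that restores the harmonic structure raises it.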

Demo (1)

Persian speech for Iranian King Mozaffareddin Shah

Original speech Enhanced speech

Demo (2)

Florence Nightingale, 1890

Original speech Enhanced speech
