Signal Analysis Using Autoregressive Models of Amplitude Modulation

Sriram Ganapathy
Advisor: Hynek Hermansky
11-18-2011

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Introduction

Sub-band speech and audio signals can be written as the product of a smooth modulation with a fine carrier:

$x(t) = m(t)\,\cos\big(\omega_o t + \varphi(t)\big)$

Here $m(t)$ is the AM component and the cosine carrier is the FM component. This decomposition is non-unique.

Desired Properties of AM

• Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$
• Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$
• Harmonicity: $\cos(\omega_o t) \rightarrow 1$

These properties are uniquely satisfied by the analytic signal.

[Block diagram: $x(t)$ is passed through the Hilbert transform $\mathcal{H}$, multiplied by $j$, and added to $x(t)$; the magnitude of the result gives $m(t)$, and the instantaneous frequency is $\omega(t) = \omega_o + \frac{d\varphi(t)}{dt}$.]

$\mathcal{H}$ - Hilbert transform
$x_a(t)$ - analytic signal
$|x_a(t)|^2$ - Hilbert envelope

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals.

This motivates modeling the Hilbert envelope without explicitly computing the Hilbert transform.
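For reference, the non-parametric Hilbert envelope that the AR model is meant to approximate can be sketched with a DFT-based analytic signal. This is only an illustration: the function names and the test signal (an AM tone with an integer number of cycles per window, so that no edge artifacts appear) are assumptions, not part of the talk.

```python
import numpy as np

def analytic_signal(x):
    """Discrete-time analytic signal of a real sequence, built in the DFT domain."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist bin once
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)   # negative frequencies are zeroed

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Hypothetical test signal: smooth modulation m[n] times a fine carrier.
n = np.arange(1024)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * n / 1024)
x = m * np.cos(2 * np.pi * 128 * n / 1024)
env = hilbert_envelope(x)                  # should match m**2 for this signal
```

For this periodic test signal the envelope equals the squared modulation; for signals that do not fit the window so neatly, the finite-length artifacts mentioned on the slide appear near the edges.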


AR Model of Hilbert Envelopes

Discrete-time analytic signal: let $x[n]$, $n = 0, \ldots, N-1$, be a signal with zero mean in the time and frequency domains, with spectrum $X[k]$. Its analytic signal has the spectrum

$X_a[k] = 2X[k]$ for $k < N/2$, and $X_a[k] = 0$ for $k \ge N/2$.

Let $q[n]$ be the even-symmetrized version of $x[n]$, with spectrum $Q[k] = 2\,\mathrm{Re}\{X[k]\}$. Its analytic signal $q_a[n]$ has the spectrum

$Q_a[k] = 2Q[k]$ for $k < N$, and $Q_a[k] = 0$ for $k \ge N$.

Let $y[k]$ be the N-point DCT zero-padded with N zeros:

$y[k] = 4\,\mathrm{Re}\{X[k]\}$ for $k < N$, and $y[k] = 0$ for $k \ge N$.

We have shown:

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$
(spectrum of the even-symmetrized analytic signal = zero-padded DCT sequence)

In the conventional case, the signal spectrum gives the power spectrum, which forms a Fourier pair with the autocorrelation. Analogously:

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$
(spectrum of the Hilbert envelope of the even-symmetrized signal = autocorrelation of the DCT sequence)

Applying linear prediction (LP) to the autocorrelation of the DCT sequence therefore yields an AR model of the Hilbert envelope.
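The Fourier-pair relation between the Hilbert envelope and the autocorrelation of the zero-padded sequence can be checked numerically. The sketch below is an illustration under assumptions: a random real sequence stands in for the DCT coefficients, and the `1/(2N)` factor follows NumPy's DFT scaling rather than any convention from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32

# Stand-in for an N-point DCT sequence, zero-padded with N zeros.
y = np.concatenate([rng.standard_normal(N), np.zeros(N)])

# Treat y as a one-sided spectrum: its inverse DFT plays the role of q_a[n].
q_a = np.fft.ifft(y)

# Spectrum of the squared magnitude (the Hilbert envelope).
env_spectrum = np.fft.fft(np.abs(q_a) ** 2)

# Linear autocorrelation of the non-zero part of y for lags 0..N-1.
r_y = np.array([np.dot(y[lag:N], y[:N - lag]) for lag in range(N)]) / (2 * N)

# The zero padding keeps the circular correlation from wrapping around,
# so the first N bins agree with the linear autocorrelation.
match = np.allclose(env_spectrum[:N], r_y)
```

The zero padding is what makes the comparison exact: without it, the DFT would return a circular autocorrelation in which positive and negative lags overlap.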

LP in Time and Frequency

Duality:
• LP on the time-domain signal gives an AR model of the power spectrum.
• LP on the DCT of the signal gives an AR model of the Hilbert envelope.

FDLP

Frequency-domain linear prediction (FDLP): linear prediction on the cosine transform of the signal.

[Figure: speech signal, its Hilbert envelope, and the FDLP envelope obtained by applying a DCT followed by LP.]
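A minimal sketch of the DCT-then-LP recipe, assuming the autocorrelation method of linear prediction and an unnormalized DCT-II; the function names, the model order, and the single-click test signal are hypothetical choices for illustration, not the configuration used in the talk.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = np.arange(x.size)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * x.size))  # rows k, cols n
    return basis @ x

def lp_autocorr(seq, order):
    """Autocorrelation-method LP. Returns (a, gain) with a[0] = 1."""
    r = np.array([np.dot(seq[lag:], seq[:seq.size - lag]) for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(r[idx], -r[1:])      # Toeplitz normal equations
    gain = r[0] + np.dot(coeffs, r[1:])           # minimum residual energy
    return np.concatenate(([1.0], coeffs)), gain

def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope from LP on the DCT of x."""
    x = np.asarray(x, dtype=float)
    a, gain = lp_autocorr(dct_ii(x), order)
    # Angle theta in (0, pi) of the all-pole response maps to time within the window.
    theta = np.pi * (np.arange(x.size) + 0.5) / x.size
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# A single click in an otherwise silent window: the envelope should peak near it.
N, n0 = 512, 150
x = np.zeros(N)
x[n0] = 1.0
env = fdlp_envelope(x, order=2)
peak = int(np.argmax(env))
```

The roles of time and frequency are swapped relative to conventional LP: the DCT of a click is a sinusoid along the frequency index, so the model places a pole pair at the angle corresponding to the click's position in time.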

FDLP for Speech Representation

[Figure: a DCT followed by LP yields the FDLP spectrogram (axes: time, frequency), shown next to the spectrogram from conventional approaches.]

FDLP versus Mel Spectrogram

[Figure: FDLP spectrogram and mel spectrogram.]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Resolution of FDLP Analysis

[Figure: test signals and their FDLP envelopes, comparing FDLP and mel analysis.]

$\mathrm{Res} = (\text{Critical Width})^{-1}$

Properties of FDLP Analysis

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion
• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted by a convolutive distortion such as room reverberation:

clean speech ∗ room response = reverberant speech (∗ denotes convolution)

In the long-term DFT domain, this translates to a multiplication:

clean DFT × response DFT = reverberant DFT

In narrow sub-bands, the response is slowly varying.

The FDLP envelope of a band is given by its all-pole parameters together with a gain. When the sub-band signal is multiplied by the response, the gain is modified.

Normalization against convolutive distortions is achieved by reconstructing the FDLP envelope with the gain set to unity.
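The effect of gain normalization can be sketched numerically. The example assumes the simplest possible case, in which the narrow-band response acts as a constant scale factor on the sub-band DCT coefficients; the random sequence, the factor 0.3, and the model order 8 are all illustrative assumptions.

```python
import numpy as np

def lp_autocorr(seq, order):
    """Autocorrelation-method LP. Returns (a, gain) with a[0] = 1."""
    r = np.array([np.dot(seq[lag:], seq[:seq.size - lag]) for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(r[idx], -r[1:])
    return np.concatenate(([1.0], coeffs)), r[0] + np.dot(coeffs, r[1:])

def all_pole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on a grid of angles in (0, pi)."""
    theta = np.pi * (np.arange(n_points) + 0.5) / n_points
    A = np.exp(-1j * np.outer(theta, np.arange(a.size))) @ a
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(1)
clean = rng.standard_normal(256)     # stand-in for sub-band DCT coefficients
distorted = 0.3 * clean              # response modeled as a constant factor

a_clean, g_clean = lp_autocorr(clean, order=8)
a_dist, g_dist = lp_autocorr(distorted, order=8)

# Envelopes reconstructed with the gain fixed to 1 (gain normalization).
env_clean = all_pole_envelope(a_clean, 1.0, 100)
env_dist = all_pole_envelope(a_dist, 1.0, 100)
```

Under this assumption the all-pole coefficients are unchanged and only the gain scales (by the square of the factor), so the two gain-normalized envelopes coincide.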

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization.]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Outline of Applications

[Block diagram: the input signal passes through sub-band decomposition and FDLP, producing AM and FM components. Gain normalization and quantization blocks lead to three applications: short-term features for speaker & speech recognition, modulation features for phoneme recognition, and wide-band speech & audio coding.]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: input → DCT → sub-band window → FDLP → gain normalization → energy integration → log + DCT → features]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).
• Integration along the frequency axis converts the bands to the mel scale.
• Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.
• Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
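The steps above can be sketched as follows. This is an assumption-laden illustration: the envelopes are random placeholders that are taken to be already on a mel-like band layout (so the frequency integration step is skipped), 24 bands and 13 cepstra are hypothetical values chosen so that the result has 39 dimensions, and deltas are approximated with `np.gradient` rather than any particular regression window.

```python
import numpy as np

def integrate_envelopes(env, fs, win_ms=25.0, hop_ms=10.0):
    """Sum sub-band envelopes over short windows: (bands, samples) -> (frames, bands)."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (env.shape[1] - win) // hop
    frames = [env[:, i * hop:i * hop + win].sum(axis=1) for i in range(n_frames)]
    return np.array(frames)

def cepstra(energies, n_ceps=13):
    """Log followed by a DCT-II along the frequency (band) axis."""
    n_bands = energies.shape[1]
    k = np.arange(n_ceps)[:, None]
    b = np.arange(n_bands)
    basis = np.cos(np.pi * (2 * b + 1) * k / (2 * n_bands))   # (n_ceps, n_bands)
    return np.log(energies) @ basis.T

def deltas(feat):
    """Simple time derivative along the frame axis."""
    return np.gradient(feat, axis=0)

fs = 8000
rng = np.random.default_rng(2)
env = rng.random((24, fs)) + 0.1           # placeholder envelopes: 24 bands, 1 second

c = cepstra(integrate_envelopes(env, fs))
d = deltas(c)
features = np.hstack([c, d, deltas(d)])    # static + delta + acceleration
```

With these assumed sizes, one second of 8 kHz input gives 98 frames of 39 coefficients each.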

Speech Recognition

TIDIGITS database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features, 39 dim

[Bar chart: WER for PLP-CMS, LTLSS, and FDLP-S on clean and reverberant test data.]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function: DCF = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
• Mel frequency cepstral coefficients (MFCCs)
• FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S in telephone, microphone, and cross-domain conditions.]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
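As a small worked example of the detection cost function quoted above, the weights 0.99 and 0.1 can be written in terms of a miss cost, a false-alarm cost, and a target prior. The default values below (10, 1, and 0.01) are an assumption chosen to reproduce those weights, and the error rates in the example are made up.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm probabilities.

    With the default (assumed) parameters this reduces to
    0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Hypothetical operating point: 20% misses, 1% false alarms.
cost = detection_cost(p_miss=0.2, p_fa=0.01)
```

Because false alarms carry the larger weight, a system tuned for this cost tends to operate at a low false-alarm rate.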


Modulation Feature Extraction

[Block diagram: input → DCT → critical-band window → FDLP → static and dynamic compression (two parallel paths) → DCT of each compressed envelope over 200 ms → sub-band features]

• Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope.
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].
• Compressed sub-band envelopes are DCT-transformed to obtain modulation frequency components.
• With 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands, the feature has 476 dimensions.
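The dimension count (14 + 14 components in each of 17 bands, giving 476) can be illustrated with a short sketch. Everything beyond the numbers on the slide is an assumption: the envelope sampling rate of 400 Hz, the random placeholder envelopes, and in particular `dynamic_compress`, which is a single simplified divider-plus-low-pass loop rather than the compression loops of the cited model.

```python
import numpy as np

def dct_rows(x, n_keep):
    """First n_keep DCT-II coefficients of each row of x."""
    n = x.shape[1]
    k = np.arange(n_keep)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))   # (n_keep, n)
    return x @ basis.T

def dynamic_compress(env, fs, tau=0.05, floor=1e-5):
    """Simplified adaptation loop: divide by a low-pass filtered copy of the output."""
    alpha = np.exp(-1.0 / (tau * fs))
    state = 1.0
    out = np.empty_like(env)
    for i, value in enumerate(env):
        out[i] = max(value, floor) / state
        state = alpha * state + (1.0 - alpha) * out[i]
    return out

fs_env = 400                         # assumed envelope sampling rate (Hz)
n_bands, n_mod = 17, 14
seg_len = round(0.2 * fs_env)        # 200 ms segment

rng = np.random.default_rng(3)
env = rng.random((n_bands, seg_len)) + 0.1      # placeholder sub-band envelopes

static = dct_rows(np.log(env), n_mod)
dynamic = dct_rows(np.array([dynamic_compress(e, fs_env) for e in env]), n_mod)
feature = np.concatenate([static, dynamic], axis=1).ravel()
```

As a consistency check on the stated range, DCT coefficient k of a 200 ms segment corresponds to a modulation frequency of about k × 2.5 Hz, so the first 14 coefficients span 0 to 32.5 Hz, in line with the 0-35 Hz figure.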

Phoneme Recognition

TIMIT database (8 kHz)
• Clean training data; test data can be clean, corrupted by additive noise, reverberated, or passed through a telephone channel

Multi-layer perceptron (MLP) based system
• MLPs estimate phoneme posteriors
• Hidden Markov model (HMM) – MLP hybrid model
• Performance in terms of phoneme error rate (PER)

Features
• Perceptual linear prediction (PLP), 9-frame context
• Advanced ETSI standard [ETSI 2002], 9-frame context
• FDLP modulation (FDLP-M) features, 476 dim

[Bar chart: PER for PLP-9, ETSI-9, and FDLP-M under clean, additive noise, reverberant, and telephone conditions.]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Audio Coding

[Block diagram: input → QMF analysis (bands 1 to 32) → FDLP in each band, giving an envelope (Env) and a carrier (Carr). The envelope is quantized (Q); the carrier is transformed with an MDCT and quantized (Q). After inverse quantization (Q⁻¹) and an IMDCT, envelope and carrier are multiplied (Mul) and passed to QMF synthesis (bands 1 to 32) to produce the output.]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores (0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC.]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Summary

• AR modeling is employed for estimating amplitude modulations.
• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum.
• The approach provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals.

Our Contributions

• Simple mathematical analysis for the AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP: improvements in reverberant speech recognition
• Modulation feature extraction: phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel – Jason Pelecanos, Mohamed Omar
• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

Physiological evidence
• Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

The current sample is expressed as a linear combination of past samples.

[Diagram: sample n predicted from samples n-1, n-2, and n-3 with weights a1, a2, and a3.]

Model parameters are solved by minimizing the residual sum of squares.

AR Model of Power Spectrum

Filter interpretation [Makhoul 1975]: using Parseval's theorem, the residual sum of squares is expressed in the frequency domain, and the parameters are solved by minimizing that expression.

The solution of the linear prediction yields an all-pole model of the power spectrum. The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).
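The equations for these slides did not survive in the transcript. A standard textbook form of the argument, consistent with the statements above, is sketched below; the symbols ($a_k$, $p$, $e[n]$, $A$, $P$) and the sign convention are assumptions and may differ from those used in the talk.

```latex
% Prediction residual and its energy (time domain)
e[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k], \qquad
E = \sum_{n} e^{2}[n]

% Filter interpretation: e[n] is x[n] passed through
% A(e^{j\omega}) = 1 + \sum_{k=1}^{p} a_k e^{-j\omega k}.
% By Parseval's theorem, with P(\omega) = |X(e^{j\omega})|^{2}:
E = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \bigl|A(e^{j\omega})\bigr|^{2}\, d\omega

% Minimizing E over a_1, \ldots, a_p gives the all-pole model
\hat{P}(\omega) = \frac{G}{\bigl|A(e^{j\omega})\bigr|^{2}}, \qquad
G = \min_{a_1, \ldots, a_p} E
```

Applying the same steps to the DCT sequence instead of the time-domain signal swaps the roles of the two domains, so the all-pole model then approximates the Hilbert envelope rather than the power spectrum.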

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j\,\mathcal{H}\{x(t)\}$

where $\mathcal{H}$ denotes the Hilbert transform.

The Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$.

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope.]

LP in Time and Frequency

[Diagram: duality between LP and FDLP.]

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component.]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms.]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization:
• CMS assumes the distortion in neighboring frames to be similar, and so suppresses short-term artifacts.
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].
• Gain normalization deals with short- and long-term distortions within a single long-term frame.

Feature Comparison

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression.]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: the noisy signal (signal + noise) is passed through a DCT and critical-band windowing; an IDFT and squared magnitude give the non-parametric Hilbert envelope, and a DFT followed by linear prediction gives the FDLP envelope. In the compensated version, a Wiener filtering stage driven by a VAD is added.]

• When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).
• A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise.
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope.

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.
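The envelope subtraction idea can be sketched in a few lines. This is a deliberately simplified stand-in: it is not the Wiener filtering stage from the block diagram, the VAD decision is supplied as a ready-made boolean mask, and the function name, the flooring rule, and the toy envelope are assumptions for illustration only.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise envelope estimated from non-speech regions.

    noisy_env   : temporal envelope of the noisy sub-band signal
    speech_mask : boolean array, True where the (assumed) VAD detects speech
    floor       : fraction of the noisy envelope kept as a lower bound
    """
    noise_level = noisy_env[~speech_mask].mean()      # noise envelope estimate
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)     # avoid negative envelopes

# Toy example: a flat speech burst of height 2.0 on a constant noise floor of 0.5.
speech_mask = np.zeros(100, dtype=bool)
speech_mask[40:60] = True
noisy_env = np.where(speech_mask, 2.0, 0.0) + 0.5
cleaned = subtract_noise_envelope(noisy_env, speech_mask)
```

The subtraction relies on the additivity stated above: if signal and noise were correlated, their envelopes would not simply add and the noise estimate could not be removed this way.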

Introduction

• Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms).
• The spectrum is sampled at a preset rate before further modeling/processing stages.
• Contextual information is typically processed with time-series models such as HMMs.

[Figure: spectrogram (axes: time, frequency).]

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction ndash Time Domain
  • Linear Prediction ndash Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 2: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

=

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

=

>
>
>

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Non-Unique

AM

FM

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Non-Unique

AM

FM

119909 (119905 )=119898 (119905 )lowast cos 120596119900119905+120593 (119905 )

Desired Properties of AM Linearity

Continuity

Harmonicity

120572 119909 (119905 )120572119898 (119905 )

119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )

cos (120596119900119905)1

Uniquely satisfied by the analytic signal

Desired Properties of AM

119909 (119905) H 119895+iquest iquestoriquest 119898(119905)

120596 (119905)120596119900+120593 (119905)

119889119889119905

- Hilbert transform

119909119886 (119905)

- analytic signal ndash Hilbert envelope

However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Desired Properties of AM

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

119883 [119896] 119883 119886 [119896]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N

N-point DCT

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with

N-zeros

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech signal -> DCT -> LP -> FDLP envelope, shown together with the Hilbert envelope]
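A minimal sketch of the FDLP idea (DCT of the signal, then linear prediction on the DCT sequence) might look as follows. It assumes the autocorrelation method of LP and SciPy's `dct` and `solve_toeplitz`; the function name, model order and test signal are illustrative choices, not the settings used in the talk.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order=20, n_points=None):
    """All-pole estimate of the temporal envelope of x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n_points = n_points or len(x)
    c = dct(x, type=2, norm="ortho")                 # cosine transform of the signal
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[k:], c[:len(c) - k]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])             # predictor: c[k] ~ sum_i a_i c[k-i]
    gain = r[0] - np.dot(a, r[1:])                   # minimum residual energy
    poly = np.concatenate(([1.0], -a))               # A(z) = 1 - sum_i a_i z^-i
    # Evaluate A on angles pi*n/n_points; in FDLP this axis corresponds to time.
    A = np.fft.rfft(poly, 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(1000)
x[300:400] += np.hanning(100) * rng.standard_normal(100)   # energy burst near sample 350
env = fdlp_envelope(x, order=20)
print(np.argmax(env))  # expected to fall inside or near the burst
```

Raising `order` makes the envelope follow finer temporal detail, while lowering it gives a smoother summary.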

FDLP for Speech Representation

[Figure: DCT -> LP -> FDLP spectrogram (axes: time, frequency), shown against conventional approaches (axes: time, frequency)]

FDLP versus Mel Spectrogram

[Figure: FDLP spectrogram and mel spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and its FDLP envelope (FDLP Env), compared for FDLP and mel analysis]

Res = (Critical Width)$^{-1}$

Properties of FDLP Analysis

[Figure: FDLP and mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

clean speech * room response = reverberant speech

In the long-term DFT domain this translates to a multiplication:

clean DFT x response DFT = reverberant DFT

In narrow sub-bands the response DFT is slowly varying, so within a band it acts approximately as a constant factor.

The FDLP envelope of a band with all-pole parameters $\{a_1, \dots, a_p\}$ is given by the gain $G$ divided by the squared magnitude of the all-pole polynomial evaluated along time.

When the sub-band signal is multiplied by such a constant, the gain $G$ is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with $G = 1$.

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.
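The effect of gain normalization can be illustrated with a small sketch. It assumes an autocorrelation-method LP on the DCT of a sub-band signal and models the channel in a narrow band as a constant factor; the function names and the factor 3 are hypothetical, and fixing the gain to 1 reflects the usual description of the technique rather than code from the talk.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_poles_and_gain(x, order=10):
    """LP on the DCT of x: returns predictor coefficients and the model gain."""
    c = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    r = np.array([np.dot(c[k:], c[:len(c) - k]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])
    gain = r[0] - np.dot(a, r[1:])
    return a, gain

def envelope(a, gain, n_points):
    """All-pole envelope gain / |A|^2 on n_points time positions."""
    poly = np.concatenate(([1.0], -a))
    A = np.fft.rfft(poly, 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(1)
x = (0.1 + np.hanning(512)) * rng.standard_normal(512)
a1, g1 = fdlp_poles_and_gain(x)
a2, g2 = fdlp_poles_and_gain(3.0 * x)        # same band seen through a constant gain of 3

# The predictor coefficients are unchanged; only the gain scales (by 3**2).
env_clean = envelope(a1, 1.0, 512)           # gain-normalized envelopes
env_scaled = envelope(a2, 1.0, 512)
```

With the gain fixed, the two envelopes coincide, which is the invariance the slide relies on.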

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Outline of Applications

[Block diagram: input signal -> sub-band decomposition -> FDLP (AM and FM components) -> gain norm / quant -> short-term features for speaker & speech recognition, modulation features for phoneme recognition, wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: input -> DCT -> sub-band window -> FDLP -> gain norm -> energy int. -> log + DCT -> feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
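The steps above can be sketched roughly as follows. This is only an outline under assumptions: the sub-band envelopes are taken as given (one row per band, already on a mel-like band layout, so the frequency integration step is skipped), 13 cepstra are kept, and `np.gradient` stands in for the regression-based delta computation of a real front end. All function names are hypothetical.

```python
import numpy as np
from scipy.fft import dct

def frame_energies(env, fs, win_ms=25, hop_ms=10):
    """Integrate each band's envelope over 25 ms windows with a 10 ms shift."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + (env.shape[1] - win) // hop
    frames = [env[:, i * hop:i * hop + win].sum(axis=1) for i in range(n_frames)]
    return np.stack(frames, axis=1)              # shape: (bands, frames)

def short_term_features(env, fs, n_ceps=13):
    """Log + DCT across bands, then append delta and acceleration coefficients."""
    energies = frame_energies(env, fs)
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=0)[:n_ceps]
    delta = np.gradient(ceps, axis=1)            # simple stand-in for delta features
    accel = np.gradient(delta, axis=1)
    return np.vstack([ceps, delta, accel])       # shape: (3 * n_ceps, frames)

rng = np.random.default_rng(0)
env = np.abs(rng.standard_normal((20, 8000))) + 1e-3   # 20 bands, 1 s at 8 kHz
feats = short_term_features(env, fs=8000)
print(feats.shape)
```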

Speech Recognition

TIDIGITS database (8 kHz)
  Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  Whole-word HMM models trained on clean speech
  Performance in terms of word error rate (WER)

Features
  PLP features with cepstral mean subtraction (CMS)
  Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  FDLP short-term (FDLP-S) features, 39 dim

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data; axis ticks 0, 10, 20]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE)
  Has telephone speech and far-field speech

GMM-UBM system
  Trained on a large set of development speakers
  Adapted on the enrollment data from the target speaker
  Nuisance attribute projection (NAP) on supervectors
  Detection cost function (DCF) = 0.99 $P_{fa}$ + 0.1 $P_{miss}$

Features with warping [Pelecanos 2001]
  Mel frequency cepstral coefficients (MFCCs)
  FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S in telephone (Tel), microphone (Mic) and cross-domain conditions; axis ticks 10, 20, 30]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
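As a small worked example of the detection cost function quoted above, the sketch below computes the miss and false-alarm rates at a fixed threshold and combines them with the weights 0.99 and 0.1. The scores, the threshold and the function name are made up for illustration.

```python
import numpy as np

def detection_cost(target_scores, nontarget_scores, threshold):
    """DCF = 0.99 * P_fa + 0.1 * P_miss at a given decision threshold."""
    p_miss = np.mean(np.asarray(target_scores) < threshold)       # targets rejected
    p_fa = np.mean(np.asarray(nontarget_scores) >= threshold)     # impostors accepted
    return 0.99 * p_fa + 0.1 * p_miss

# One of four targets is missed and one of four non-targets is accepted.
dcf = detection_cost([2.0, 1.5, 0.2, 3.1], [-1.0, 0.4, -0.3, 0.1], threshold=0.3)
print(dcf)  # 0.99 * 0.25 + 0.1 * 0.25
```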

Modulation Feature Extraction

[Block diagram: input -> DCT -> critical-band window -> FDLP -> static and dynamic compression -> DCT over 200 ms (one per branch) -> sub-band feat]

Static compression is a logarithm; it reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim
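A rough sketch of this feature computation is given below. It is a simplification under assumptions: the sub-band envelopes over a 200 ms segment are taken as input, the dynamic compression is a bare-bones chain of divide-by-low-pass loops, and the loop time constants and envelope sampling rate are illustrative values that do not come from the slides. The function names are hypothetical.

```python
import numpy as np
from scipy.fft import dct

def adaptation_loops(env, fs, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Simplified dynamic compression: each loop divides its input by a
    low-pass filtered copy of its own output."""
    out = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (fs * tau))        # one-pole low-pass coefficient
        state = 1.0
        stage = np.empty_like(out)
        for n, value in enumerate(out):
            stage[n] = value / state             # divider
            state = max(alpha * state + (1.0 - alpha) * stage[n], floor)
        out = stage
    return out

def modulation_features(envelopes, fs, n_coef=14):
    """Static (log) and dynamic compression per band, then DCT, keeping n_coef terms."""
    feats = []
    for env in envelopes:
        static = dct(np.log(np.maximum(env, 1e-5)), type=2, norm="ortho")[:n_coef]
        dynamic = dct(adaptation_loops(env, fs), type=2, norm="ortho")[:n_coef]
        feats.append(np.concatenate([static, dynamic]))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
envelopes = np.abs(rng.standard_normal((17, 80))) + 1e-3   # 17 bands, 200 ms at 400 Hz
feat = modulation_features(envelopes, fs=400)
print(feat.shape)  # 17 bands x (14 + 14) coefficients
```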

Phoneme Recognition

TIMIT database (8 kHz)
  Clean training data; test data can be clean, with additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  MLPs estimate phoneme posteriors
  Hidden Markov model (HMM) - MLP hybrid model
  Performance in phoneme error rate (PER)

Features
  Perceptual linear prediction (PLP), 9 frame context
  Advanced ETSI standard [ETSI 2002], 9 frame context
  FDLP modulation (FDLP-M) features, 476 dim

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M on clean, additive noise, reverberant and telephone test data; axis ticks 30, 45, 60, 75]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

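Audio Coding

[Block diagram: input -> QMF analysis (bands 1 to 32) -> FDLP -> envelope (Env) and carrier (Carr), with MDCT and quantization (Q) -> inverse quantization (Q$^{-1}$) and IMDCT -> multiplication (Mul) -> QMF synthesis (bands 1 to 32) -> output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.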

Subjective Evaluations

[Bar chart: subjective scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Summary

AR modeling is employed for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

It provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences
  Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences
  Perceptual importance of modulation frequencies [Drullman et al. 1994]
  Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:
  Speech intelligibility [Houtgast et al. 1980]
  RASTA processing [Hermansky et al. 1994]
  Speech recognition [Kingsbury et al. 1998]
  AM-FM decomposition [Kumaresan et al. 1999]
  Sound texture modeling [Athineos et al. 2003]
  Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
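As an illustration of time-domain linear prediction, the sketch below estimates the predictor coefficients of a synthetic AR(2) signal by solving the normal equations built from its autocorrelation. The autocorrelation method, the synthetic coefficients and the function name are assumptions made for this example; the slide equations themselves were not preserved.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(x, order):
    """Predictor x[n] ~ sum_i a_i x[n-i], minimizing the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[k:], x[:len(x) - k]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])         # normal (Yule-Walker) equations
    residual_energy = r[0] - np.dot(a, r[1:])    # minimum residual sum of squares
    return a, residual_energy

rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.7])                   # x[n] = 1.5 x[n-1] - 0.7 x[n-2] + e[n]
x = lfilter([1.0], np.concatenate(([1.0], -true_a)), rng.standard_normal(20000))
a_hat, err = lp_coefficients(x, order=2)
print(a_hat)  # should be close to true_a
```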

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual energy can be written in the frequency domain

Thus the parameters are solved by minimizing this residual energy

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

[Figure: AR model of power spectrum]
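The Parseval step can be checked numerically: the energy of the prediction residual in time equals the average over DFT bins of the signal power spectrum multiplied by the squared magnitude of the prediction-error filter. The sketch below uses an arbitrary signal and arbitrary coefficients; the function name and the sign convention for the filter polynomial are assumptions.

```python
import numpy as np

def residual_energy_two_ways(x, a):
    """Residual energy computed in the time domain and in the frequency domain."""
    x = np.asarray(x, dtype=float)
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # A(z) = 1 - sum a_i z^-i
    e = np.convolve(poly, x)                  # prediction residual (full convolution)
    time_energy = np.sum(e ** 2)
    M = len(e)                                # long enough to avoid circular wrap-around
    P = np.abs(np.fft.fft(x, M)) ** 2         # power spectrum of the signal
    A2 = np.abs(np.fft.fft(poly, M)) ** 2     # |A|^2 on the same frequency grid
    freq_energy = np.sum(P * A2) / M          # Parseval's theorem for the DFT
    return time_energy, freq_energy

rng = np.random.default_rng(0)
t_energy, f_energy = residual_energy_two_ways(rng.standard_normal(256), [1.5, -0.7])
print(t_energy, f_energy)
```

Minimizing the time-domain residual therefore fits the inverse filter magnitude to the power spectrum, which is the filter interpretation referred to above.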

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j\,\mathcal{H}\{x(t)\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]
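For a concrete case, the sketch below builds an amplitude-modulated cosine and computes its Hilbert envelope with `scipy.signal.hilbert`, which returns the analytic signal. The sampling rate, carrier and modulator frequencies are arbitrary choices, picked so that every component completes a whole number of cycles in the analysis window.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000                                     # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                        # one second of samples
m = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)     # slow modulation
x = m * np.cos(2 * np.pi * 1000 * t)          # modulated 1 kHz carrier

x_a = hilbert(x)                              # analytic signal x + j H{x}
env = np.abs(x_a) ** 2                        # Hilbert envelope (squared magnitude)
print(np.max(np.abs(env - m ** 2)))           # envelope recovers m(t)^2
```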

LP in Time and Frequency

Duality: LP and FDLP

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]
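A minimal sketch of an AM-FM split is shown below. As a stand-in it uses the Hilbert envelope for the AM part, whereas the talk takes the AM component from the smoother FDLP envelope; the function name and the test signal are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def am_fm_decompose(x):
    """Split x into an amplitude part and a unit-amplitude carrier, x = am * fm."""
    x_a = hilbert(np.asarray(x, dtype=float))
    am = np.abs(x_a)                          # amplitude modulation (magnitude envelope)
    fm = np.cos(np.angle(x_a))                # carrier that holds the fine structure
    return am, fm

fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.3 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
am, fm = am_fm_decompose(x)
print(np.max(np.abs(am * fm - x)))            # product of the two parts gives back x
```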

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes the distortion in neighboring frames to be similar; it suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: signal + noise -> DCT -> critical-band window -> IDFT -> | |^2 -> DFT -> linear prediction -> FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: input -> DCT -> critical-band window -> IDFT -> | |^2 -> Wiener filtering (with VAD) -> DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
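The subtraction step can be sketched as below. This is a plain subtraction with a floor, not the Wiener filtering block named in the diagram, and it assumes a constant noise envelope estimated as the mean over the non-speech samples flagged by a VAD; the function name, the floor value and the toy numbers are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    noise_level = noisy_env[~speech_mask].mean()      # noise estimate from VAD gaps
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)     # keep the envelope positive

noisy_env = np.array([1.0, 1.0, 5.0, 5.0, 1.0])       # toy sub-band envelope
speech_mask = np.array([False, False, True, True, False])
clean_env = subtract_noise_envelope(noisy_env, speech_mask)
print(clean_env)
```

The floor keeps the result strictly positive, which matters when a logarithm or an all-pole fit follows.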

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram; axes: time, frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.

14 modulation spectral components (0-35 Hz) each for the static and dynamic streams, over 17 sub-bands, give a feature vector of 476 dimensions (14 × 2 × 17).
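The compression and DCT steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the envelope sampling rate, the floors, and the five loop time constants (values commonly quoted for adaptation-loop models) are assumptions, and `static_compress`, `dynamic_compress` and `modulation_features` are hypothetical names. For a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of k / (2 × 0.2 s) = 2.5k Hz, so the first 14 coefficients span 0 to 32.5 Hz, roughly the 0-35 Hz range quoted on the slide.

```python
import numpy as np
from scipy.fft import dct


def static_compress(env, floor=1e-8):
    """Static compression: logarithm of the (floored) sub-band envelope."""
    return np.log(np.maximum(env, floor))


def dynamic_compress(env, fs_env, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Cascade of adaptation loops, each a divider plus a one-pole low-pass filter.

    Each loop divides its input by a low-pass-filtered copy of its own output.
    The time constants in `taus` (seconds) are assumed values.
    """
    out = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * fs_env))  # low-pass filter coefficient
        state = np.sqrt(out[0])                # steady state for a constant input
        stage = np.empty_like(out)
        for n, value in enumerate(out):
            stage[n] = value / state           # divider
            state = max(alpha * state + (1.0 - alpha) * stage[n], floor)  # low-pass
        out = stage
    return out


def modulation_features(envelopes, fs_env, n_coef=14):
    """Concatenate the first n_coef DCT coefficients of both compressed streams."""
    feats = []
    for env in envelopes:
        for compressed in (static_compress(env), dynamic_compress(env, fs_env)):
            feats.append(dct(compressed, type=2, norm="ortho")[:n_coef])
    return np.concatenate(feats)


# 17 hypothetical sub-band envelopes, 200 ms long at an assumed 400 Hz envelope rate.
rng = np.random.default_rng(0)
envelopes = rng.random((17, 80)) + 0.1
features = modulation_features(envelopes, fs_env=400.0)
```

The resulting vector has 17 × 2 × 14 = 476 entries, matching the dimensionality stated above.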

Phoneme Recognition

• TIMIT database (8 kHz)
  – Clean training data; test data can be clean, or corrupted by additive noise, reverberation or a telephone channel
• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) – MLP hybrid model
  – Performance in phoneme error rate (PER)
• Features
  – Perceptual linear prediction (PLP) – 9 frame context
  – Advanced ETSI standard [ETSI2002] – 9 frame context
  – FDLP modulation (FDLP-M) features – 476 dim

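Phoneme Recognition

[Figure: bar chart of PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9 and FDLP-M features under Clean, Additive Noise, Reverb and Telephone conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.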


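Audio Coding

[Block diagram: input, QMF analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr) branches, MDCT, quantizers (Q), inverse quantizers (Q-1), IMDCT, multiplier (Mul), QMF synthesis (bands 1 to 32), output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.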

Subjective Evaluations

[Figure: bar chart of listening-test scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Processing, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• AR modeling is employed for estimating amplitude modulations.
• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum.
• FDLP provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals.

Our Contributions

• Simple mathematical analysis for the AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP – improvements in reverberant speech recognition
• Modulation feature extraction – phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel – Jason Pelecanos, Mohamed Omar
• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

• Physiological evidence
  – Spectro-temporal receptive fields [Shamma et al. 2001]
• Psycho-physical evidence
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

• The current sample is expressed as a linear combination of past samples.
• Model parameters are solved by minimizing the residual sum of squares.

[Figure: samples n-3, n-2 and n-1, weighted by a3, a2 and a1, predict sample n]
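Time-domain linear prediction as described above can be sketched with the autocorrelation method. This is one common way to minimize the residual sum of squares and is not necessarily the exact procedure used in the talk; the function name `lpc`, the model order and the synthetic test signal are assumptions. Note the sign convention: the code returns the coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the predictor weights on past samples are the negatives of a_1 ... a_p.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lpc(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, gain) where a = [1, a_1, ..., a_p] defines the prediction-error
    filter A(z) and gain is the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations: sum_k a_k r[|i - k|] = -r[i], i = 1..order.
    coeffs = solve_toeplitz(r[:order], -r[1:])
    gain = r[0] + np.dot(coeffs, r[1:])
    return np.concatenate(([1.0], coeffs)), gain


# Synthetic AR(2) process: x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n].
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
signal = lfilter([1.0], [1.0, -1.5, 0.8], excitation)
a_hat, gain = lpc(signal, order=2)
```

On this synthetic signal the estimated coefficients should land close to [1, -1.5, 0.8], the filter that generated it.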

AR Model of Power Spectrum

• Filter interpretation [Makhoul 1975]: [equation]
• From Parseval's theorem: [equation]
• By definition: [equation]. Let: [equation]
• Thus the parameters are solved by minimizing: [equation]
• The solution of the linear prediction yields an all-pole model of the power spectrum: [equation]
• The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).
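The equations on these slides did not survive extraction. As a hedged reminder, the standard form of the argument in [Makhoul 1975] can be written as below; the symbols e[n], A(z), P(ω) and the sign convention are assumptions and may differ from the notation on the original slides.

```latex
\begin{align}
e[n] &= x[n] + \sum_{k=1}^{p} a_k\, x[n-k]
  \quad\Longleftrightarrow\quad
  E(z) = A(z)\,X(z), \qquad A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} \\
\sum_{n} e^2[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl|E(e^{j\omega})\bigr|^2\, d\omega
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\bigl|A(e^{j\omega})\bigr|^2\, d\omega,
  \qquad P(\omega) = \bigl|X(e^{j\omega})\bigr|^2 \\
\hat{P}(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2},
  \qquad G = \min_{a_1,\dots,a_p} \sum_{n} e^2[n]
\end{align}
```

Minimizing the residual energy in the time domain is therefore equivalent to fitting the all-pole model to the power spectrum P(ω).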

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component:

$$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$$

where $\mathcal{H}$ denotes the Hilbert transform.

The Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$.
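The definition above can be checked numerically with SciPy's discrete analytic-signal routine. This is only an illustrative sketch; the function name `hilbert_envelope` and the test signal (a slow modulation on a carrier with an integer number of cycles) are assumptions.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal x + j * H{x}."""
    analytic = hilbert(x)  # scipy returns the analytic signal, not only H{x}
    return np.abs(analytic) ** 2


# A slowly varying modulation m(t) applied to a cosine carrier.
n = np.arange(1000)
carrier = np.cos(2 * np.pi * 50 * n / 1000)
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / 1000)
envelope = hilbert_envelope(modulation * carrier)
```

For this narrow-band example the envelope equals the squared modulation, and for the unmodulated carrier it is constant at 1.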

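Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]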

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes the distortion in neighboring frames to be similar – suppresses short-term artifacts.
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].
• Gain normalization deals with short- and long-term distortions within a single long-term frame.
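To make the first item concrete, here is a minimal sketch of cepstral mean subtraction. It relies on the usual assumption that a channel shorter than the analysis window adds a roughly constant offset to every cepstral frame; the function name and the random test data are hypothetical.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over time.

    cepstra has shape (n_frames, n_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# A constant offset in the cepstral domain (a stand-in for a short channel) is removed.
rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))
channel_offset = rng.standard_normal(13)
normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(clean + channel_offset)
```

The two normalized outputs coincide, which is why CMS handles short-term channel effects but not reverberation whose response extends over many frames.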

Feature Comparison


Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: signal + noise, DCT, critical-band window, IDFT, |·|², DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).

Noise Compensation in FDLP

[Block diagram: input, DCT, critical-band window, IDFT, |·|², DFT, Wiener filtering, VAD]

• A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise.
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope.
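The two steps above can be sketched as a simple envelope-domain subtraction. This is a simplified stand-in and not the Wiener filtering shown in the block diagram: the noise envelope is taken as the mean over frames that a VAD marks as non-speech, and a hypothetical `floor` parameter keeps the result positive. All names and values are illustrative assumptions.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env   : sub-band temporal envelope of noisy speech
    speech_mask : boolean array, True where the VAD detects speech
    floor       : fraction of the noisy envelope kept as a lower bound (assumed)
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if not (~speech_mask).any():
        return noisy_env.copy()  # no non-speech region to estimate the noise from
    noise_level = noisy_env[~speech_mask].mean()
    return np.maximum(noisy_env - noise_level, floor * noisy_env)


# Hypothetical envelope: a speech burst of height 2.0 on a constant noise floor of 0.5.
speech_env = np.zeros(100)
speech_env[40:60] = 2.0
noisy = speech_env + 0.5
cleaned = subtract_noise_envelope(noisy, speech_mask=speech_env > 0)
```

The subtraction is meaningful because, as noted on the previous slide, uncorrelated noise is additive in the Hilbert envelope domain.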

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

• Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms).
• The spectrum is sampled at a preset rate before further modeling/processing stages.
• Contextual information is typically processed with time-series models such as HMMs.

[Figure: time-frequency plane (axes: Time, Frequency)]


Introduction

Sub-band speech and audio signals can be viewed as the product of a smooth modulation (AM) with a fine carrier (FM). This decomposition is non-unique.

$$x(t) = m(t)\,\cos\bigl(\omega_o t + \varphi(t)\bigr)$$

[Figure: a sub-band signal shown as the product of its AM and FM components]

Desired Properties of AM

• Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$
• Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$
• Harmonicity: $\cos(\omega_o t) \rightarrow 1$

These properties are uniquely satisfied by the analytic signal.

[Block diagram: $x(t)$ and $j$ times its Hilbert transform are summed to form the analytic signal $x_a(t)$, from which the Hilbert envelope $m(t)$ and the instantaneous frequency $\omega(t) = \omega_o + \frac{d\varphi(t)}{dt}$ are obtained. Legend: $\mathcal{H}$ - Hilbert transform; $x_a(t)$ - analytic signal]

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals.

Hence the need for modeling the Hilbert envelope without explicit computation of the Hilbert transform.


AR Model of Hilbert Envelopes

Discrete-time analytic signal: let $x[n]$, $n = 0 \ldots N-1$, be a signal with zero mean in the time and frequency domains, with spectrum $X[k]$. Its analytic signal has the spectrum

$$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$$

Let $q[n]$ be the even-symmetrized version of $x[n]$, with spectrum $Q[k] = 2\,\mathrm{Re}\{X[k]\}$. Its analytic signal $q_a[n]$ has the spectrum

$$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$$

The N-point DCT of the signal, zero-padded with N zeros, is

$$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$$

We have shown:

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

(spectrum of the even-symmetric analytic signal = zero-padded DCT sequence)

Just as the signal and its spectrum, and the auto-correlation and the power spectrum, form Fourier transform pairs:

$$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$$

(spectrum of the Hilbert envelope of the even-symmetric signal = auto-correlation of the DCT sequence)

Linear prediction (LP) applied to the auto-correlation of the DCT sequence therefore gives an AR model of the Hilbert envelope.
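The relation between the zero-padded DCT and the analytic signal of the even-symmetrized signal can be checked numerically. The sketch below uses SciPy's unnormalized DCT-II and a half-sample-symmetric extension q = [x, reversed x]; under that particular convention (an assumption, not taken from the slides) a phase term exp(jπk/2N) appears, which corresponds to a half-sample shift and is absent from the simplified statement above. The function name is hypothetical.

```python
import numpy as np
from scipy.fft import dct, ifft
from scipy.signal import hilbert


def analytic_from_dct(x):
    """Analytic signal of the even-symmetrized x, built from the DCT of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct(x, type=2)                       # unnormalized DCT-II, length N
    k = np.arange(n)
    spectrum = np.zeros(2 * n, dtype=complex)            # zero-padding with N zeros
    spectrum[:n] = 2.0 * np.exp(1j * np.pi * k / (2 * n)) * c
    spectrum[0] = c[0]                       # DC bin is not doubled
    return ifft(spectrum)


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
q = np.concatenate([x, x[::-1]])             # even-symmetrized signal, length 2N
q_analytic = analytic_from_dct(x)
reference = hilbert(q)                       # analytic signal computed directly
```

The squared magnitude of `q_analytic` is then the Hilbert envelope of the even-symmetrized signal, obtained without ever forming the Hilbert transform explicitly.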

LP in Time and Frequency

Duality:
• Time-domain signal → LP → AR model of the power spectrum
• DCT of the signal → LP → AR model of the Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal.

[Figure: speech → DCT → LP → FDLP envelope, shown together with the Hilbert envelope]
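A minimal full-band sketch of FDLP as stated above: take the DCT of the signal, run linear prediction on the DCT sequence, and read the all-pole "spectrum" of that model as a function of time. The autocorrelation method, the model order, the name `fdlp_envelope` and the synthetic test signal are assumptions; the sub-band windowing used elsewhere in the talk is omitted.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20):
    """All-pole estimate of the temporal envelope of x (one value per sample)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = dct(x, type=2, norm="ortho")                     # cosine transform of the signal
    # Autocorrelation of the DCT sequence at lags 0..order.
    r = np.array([np.dot(y[: n - k], y[k:]) for k in range(order + 1)])
    coeffs = solve_toeplitz(r[:order], -r[1:])           # LP normal equations
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + np.dot(coeffs, r[1:])
    # Evaluate G / |A|^2 on n angles in [0, pi); the angle axis maps to time.
    response = np.fft.rfft(a, 2 * n)[:n]
    return gain / np.abs(response) ** 2


# Noise carrier with an energy burst centred at sample 700 (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(1000)
modulation = 0.05 + np.exp(-0.5 * ((t - 700) / 40.0) ** 2)
signal = modulation * rng.standard_normal(1000)
envelope = fdlp_envelope(signal, order=20)
peak_location = int(np.argmax(envelope))
```

The peak of the model envelope falls near the burst, and lowering `order` gives a smoother envelope, in line with the role of the model order described in the slides.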

FDLP for Speech Representation

[Figure: DCT → LP → FDLP spectrogram (time vs. frequency), shown alongside the spectrogram from conventional approaches (time vs. frequency)]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.


Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels; FDLP and Mel spectrograms]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion
    • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

    clean speech * room response = reverberant speech

In the long-term DFT domain this translates to a multiplication:

    clean DFT × response DFT = reverberant DFT

In the sub-band: [equation]

In narrow bands, the response DFT is slowly varying.

Gain Normalization in FDLP

• The FDLP envelope of a band is given by its all-pole parameters and the gain of the AR model: [equation]
• When the sub-band signal is multiplied by the (approximately constant) response term, only the gain is modified.
• Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain set to unity.

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.
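The gain-normalization argument can be illustrated numerically: scaling a (sub-band) signal by a constant leaves the all-pole coefficients untouched and changes only the gain, so an envelope rebuilt with unit gain is unaffected. The sketch models the narrow-band room response as a single real constant, which is a simplifying assumption, and the function names, model order and test data are hypothetical.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_parameters(x, order=10):
    """Return (a, gain) of the AR model fitted to the DCT of x."""
    y = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    n = len(y)
    r = np.array([np.dot(y[: n - k], y[k:]) for k in range(order + 1)])
    coeffs = solve_toeplitz(r[:order], -r[1:])
    gain = r[0] + np.dot(coeffs, r[1:])
    return np.concatenate(([1.0], coeffs)), gain


def gain_normalized_envelope(a, n_points):
    """Envelope reconstructed from the all-pole coefficients with the gain set to 1."""
    return 1.0 / np.abs(np.fft.rfft(a, 2 * n_points)[:n_points]) ** 2


rng = np.random.default_rng(0)
t = np.arange(512)
clean = (1.0 + np.sin(2 * np.pi * t / 128)) * rng.standard_normal(512)
scaled = 3.0 * clean                          # constant stand-in for the band response

a_clean, g_clean = fdlp_parameters(clean)
a_scaled, g_scaled = fdlp_parameters(scaled)
env_clean = gain_normalized_envelope(a_clean, 512)
env_scaled = gain_normalized_envelope(a_scaled, 512)
```

Here the gain grows by the square of the scale factor while the coefficients, and hence the gain-normalized envelopes, stay the same.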


Outline of Applications

[Block diagram: input signal, sub-band decomposition, FDLP (AM and FM components), gain normalization, quantization; outputs: short-term features for speaker & speech recognition, modulation features for phoneme recognition, wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.


Short-term Features

[Block diagram: input → DCT → sub-band window → FDLP → gain normalization → energy integration → log + DCT → features]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).
• Integration along the frequency axis converts the bands to the mel scale.
• Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.
• Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
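The feature pipeline above can be sketched as follows, starting from sub-band envelopes that are assumed to be already available and already spaced on the mel scale (so the frequency integration step is skipped). The choice of 13 cepstral coefficients (13 × 3 = 39), the simple gradient-based deltas, and all names and sizes are assumptions for illustration.

```python
import numpy as np
from scipy.fft import dct


def short_term_features(envelopes, fs_env, win=0.025, hop=0.010, n_ceps=13):
    """Cepstral features with delta and acceleration from sub-band envelopes.

    envelopes : array of shape (n_bands, n_samples), non-negative
    fs_env    : sampling rate of the envelopes in Hz
    """
    envelopes = np.asarray(envelopes, dtype=float)
    win_len = int(round(win * fs_env))
    hop_len = int(round(hop * fs_env))
    n_frames = 1 + (envelopes.shape[1] - win_len) // hop_len
    # Integrate each band's envelope over 25 ms windows shifted by 10 ms.
    energies = np.stack([
        envelopes[:, i * hop_len : i * hop_len + win_len].sum(axis=1)
        for i in range(n_frames)
    ])                                                   # (n_frames, n_bands)
    # Log and DCT along the frequency axis give cepstral coefficients.
    ceps = dct(np.log(energies + 1e-10), type=2, norm="ortho", axis=1)[:, :n_ceps]
    delta = np.gradient(ceps, axis=0)                    # simple delta (assumed form)
    accel = np.gradient(delta, axis=0)                   # acceleration
    return np.hstack([ceps, delta, accel])


# One second of hypothetical envelopes in 24 bands at an 8 kHz envelope rate.
rng = np.random.default_rng(0)
envelopes = rng.random((24, 8000)) + 0.1
features = short_term_features(envelopes, fs_env=8000.0)
```

With these settings one second of input gives 98 frames of 39-dimensional features.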


Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Features

SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009

a Signalb Hilb Envc FDLP Envd Log compe Dyn comp

Noise Compensation in FDLP

Critical-band

Window

LinearPred

Signal

+Noise

DCT IDFT | |2 DFTFDLP

Env

When speech is corrupted with additive noise

y The noise component is additive in the non-parametric Hilbert envelope

domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Critical-band

Window

InputDCT IDFT | |2 DFTWiener

Filtering

VAD

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope

Noise Compensation in FDLP

SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Introduction

Time

Freq

uenc

y

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction ndash Time Domain
  • Linear Prediction ndash Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 5: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction

Sub-band speech and audio signals - product of smooth modulation with a fine carrier

[Figure: sub-band signal = smooth modulation (AM) × fine carrier (FM); the decomposition is non-unique]

$x(t) = m(t)\,\cos(\omega_o t + \varphi(t))$

Desired Properties of AM

Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

Desired Properties of AM

[Diagram: $x(t)$ → Hilbert transform $\mathcal{H}$ → $x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$ → magnitude → $m(t)$]

$\omega(t) = \omega_o + \frac{d}{dt}\varphi(t)$

$\mathcal{H}$ - Hilbert transform, $x_a(t)$ - analytic signal, $|x_a(t)|^2$ – Hilbert envelope

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain, for $n = 0, \ldots, N-1$

$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$

[Figure: $X[k]$ and $X_a[k]$]
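The definition above can be tried out numerically. The following is a minimal sketch, not the author's implementation: it uses a naive DFT from the standard library, and the function names (`dft`, `idft`, `analytic_signal`, `hilbert_envelope`) and the test signal are illustrative assumptions. It keeps the lower half of the spectrum (doubled), zeroes the upper half, and checks that the squared magnitude of the result recovers the squared modulation of an amplitude-modulated cosine.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """Double the DFT for k < N/2 and zero it for k >= N/2.

    Assumes even N and a signal whose DFT vanishes at k = 0 and k = N/2,
    in line with the zero-mean assumption on the slide.
    """
    n = len(x)
    X = dft(x)
    Xa = [2 * X[k] if k < n // 2 else 0 for k in range(n)]
    return idft(Xa)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(v) ** 2 for v in analytic_signal(x)]


# Illustrative signal: slow modulation m[t] times a carrier at DFT bin 8.
N = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
x = [m[t] * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
env = hilbert_envelope(x)
max_err = max(abs(env[t] - m[t] ** 2) for t in range(N))
```

For this band-limited example the envelope matches $m^2$ up to rounding error; for real speech the match is only approximate.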

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$

$N$-point DCT, zero-padded with $N$ zeros:

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$

AR Model of Hilbert Envelopes

We have shown: $Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

Spectrum of the even-symmetrized analytic signal = zero-padded DCT sequence

Just as a signal and its spectrum are related by the Fourier transform $\mathcal{F}$, so are the power spectrum and the auto-correlation. Hence

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Spectrum of the Hilbert envelope of the even-symmetrized signal = auto-correlation of the DCT sequence

LP on the auto-correlation of the DCT sequence gives an AR model of the Hilbert envelope

LP in Time and Frequency

Duality

LP on the time-domain signal → AR model of the power spectrum

LP on the DCT of the signal → AR model of the Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP; panels show the speech signal, its Hilbert envelope (Hilb Env), and the FDLP envelope (FDLP Env)]
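The recipe "linear prediction on the cosine transform of the signal" can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the reference FDLP code: it uses an unnormalized DCT-II, the autocorrelation method with a Levinson-Durbin recursion, and evaluates the all-pole model at the angles $\pi(2t+1)/(2N)$ that the DCT-II pairs with sample $t$. The helper names and the test signal are hypothetical. The test signal is a single click, the time-domain dual of a sinusoid: its DCT is a cosine in $k$, so LP places a pole pair at the matching angle and the envelope peaks at the click.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II: y[k] = sum_n x[n] cos(pi (2n + 1) k / (2N))."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n)) for k in range(n)]


def autocorr(y, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(y[k] * y[k + lag] for k in range(len(y) - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns ([1, a_1..a_p], residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole envelope gain / |A|^2 evaluated at the N time instants."""
    n = len(x)
    a, gain = levinson(autocorr(dct2(x), order), order)
    env = []
    for t in range(n):
        theta = math.pi * (2 * t + 1) / (2 * n)   # angle paired with sample t
        A = sum(a[p] * cmath.exp(-1j * p * theta) for p in range(order + 1))
        env.append(gain / abs(A) ** 2)
    return env, a, gain


# Illustrative input: one click at n0 in a 64-sample segment.
N, n0 = 64, 20
x = [1.0 if t == n0 else 0.0 for t in range(N)]
env, a, gain = fdlp_envelope(x, order=2)
peak = max(range(N), key=lambda t: env[t])   # expected at or next to n0
```

A higher model order follows more detail of the envelope; a lower order gives a smoother one.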

FDLP for Speech Representation

[Figure: DCT → LP in each sub-band gives the FDLP spectrogram (time vs. frequency), shown alongside conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Comparison of Modulation Frequency Features for Speech Recognition,” ICASSP 2010.

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env); FDLP versus Mel]

$\text{Res} = (\text{Critical Width})^{-1}$

Properties of FDLP Analysis

[Figure: FDLP versus Mel]

  • Summarizing the gross temporal variation with a few parameters
    – Model order of FDLP controls the degree of smoothness
    – AR model captures perceptually important high-energy regions of the signal
  • Suppressing reverberation artifacts
    – Reverberation is a long-term convolutive distortion
  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean speech ∗ Room response = Reverberant speech

In the long-term DFT domain, this translates to a multiplication in each sub-band:

Clean DFT × Response DFT = Reverberant DFT

Gain Normalization in FDLP

In narrow bands, the room response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and gain

When the sub-band signal is multiplied by a constant, only the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain

[Figure: FDLP envelopes without and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, “Recognition of Reverberant Speech Using FDLP,” IEEE Signal Proc. Letters, 2008.
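The formulas on these slides did not survive extraction. The block below is a hedged reconstruction of the argument in standard notation. All symbols are assumptions rather than the slides' own: $s$ for the clean sub-band signal, $h$ for the room response, $H_k$ for its roughly constant value in band $k$, $a_i$ and $G$ for the all-pole parameters and gain, and $\theta_t \in [0, \pi]$ for the angle that maps the analysis window onto the half-circle.

```latex
% Convolution in time is multiplication in the long-term DFT domain
r(t) = s(t) * h(t) \quad\Longleftrightarrow\quad R(\omega) = S(\omega)\,H(\omega)

% In a narrow band k the response is nearly constant
R_k(\omega) \approx H_k\, S_k(\omega)

% FDLP envelope of band k from all-pole parameters a_1, \ldots, a_p and gain G
E_k(t) = \frac{G}{\left| 1 + \sum_{i=1}^{p} a_i\, e^{-j i \theta_t} \right|^{2}}

% Scaling the input to LP by H_k scales its auto-correlation by |H_k|^2:
% the normal equations, and hence the a_i, are unchanged, while
G \;\rightarrow\; |H_k|^{2}\, G

% Reconstructing with G = 1 therefore removes the dependence on H_k
\tilde{E}_k(t) = \frac{1}{\left| 1 + \sum_{i=1}^{p} a_i\, e^{-j i \theta_t} \right|^{2}}
```

The step from a constant scale factor to unchanged predictor coefficients is a general property of autocorrelation-method linear prediction. The narrow-band approximation is the modeling assumption that makes it applicable to reverberation.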

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Outline of Applications

[Block diagram: Input signal → sub-band decomposition → FDLP → AM and FM components, with gain normalization (Gain Norm) and quantization (Quant) stages]

  • Short-term features for speaker & speech recognition
  • Modulation features for phoneme recognition
  • Wide-band speech & audio coding

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, “Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation,” IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input → DCT → sub-band windowing → FDLP → gain normalization → energy integration → log + DCT → features]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

Integration along the frequency axis to convert to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
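The integration and cepstral steps above can be sketched as follows. This is a simplified illustration, not the system used in the experiments: the mel-scale integration across frequency and the delta/acceleration coefficients are left out, the envelopes are synthetic constants, and the function names and the small constant added before the logarithm are assumptions.

```python
import math


def integrate_envelope(envelope, rate_hz, win_ms=25.0, shift_ms=10.0):
    """Sum an envelope over sliding windows (25 ms long, 10 ms apart)."""
    win = int(round(rate_hz * win_ms / 1000.0))
    shift = int(round(rate_hz * shift_ms / 1000.0))
    return [sum(envelope[s:s + win])
            for s in range(0, len(envelope) - win + 1, shift)]


def dct2(v):
    """Unnormalized DCT-II of a short vector."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)) for k in range(n)]


def short_term_cepstra(band_envelopes, rate_hz, n_ceps):
    """Per frame: log of the integrated band energies, then DCT across bands."""
    energies = [integrate_envelope(e, rate_hz) for e in band_envelopes]
    n_frames = min(len(e) for e in energies)
    feats = []
    for t in range(n_frames):
        log_e = [math.log(band[t] + 1e-12) for band in energies]
        feats.append(dct2(log_e)[:n_ceps])
    return feats


# Illustrative input: 1 s of flat envelopes in 4 bands at 8 kHz.
rate = 8000
bands = [[1.0 + 0.1 * b] * rate for b in range(4)]
feats = short_term_cepstra(bands, rate, n_ceps=3)
```

With a 25 ms window and a 10 ms shift, one second of signal at 8 kHz yields 98 frames, the same frame rate as conventional short-term features.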

Speech Recognition

  • TIDIGITS database (8 kHz)
    – Clean training data; test data can be clean or naturally reverberated
  • HMM-GMM system
    – Whole-word HMM models trained on clean speech
    – Performance in terms of word error rate (WER)
  • Features
    – PLP features with cepstral mean subtraction (CMS)
    – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
    – FDLP short-term (FDLP-S) features – 39 dim

Speech Recognition

[Figure: bar chart of WER (axis 0 to 20) for PLP-CMS, LTLSS, and FDLP-S under clean and reverberant test conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, “Recognition of Reverberant Speech Using FDLP,” IEEE Signal Proc. Letters, 2008.

Speaker Verification

  • NIST 2008 Speaker recognition evaluation (SRE)
    – Has telephone speech and far-field speech
  • GMM-UBM system
    – Trained on a large set of development speakers
    – Adapted on the enrollment data from the target speaker
    – Nuisance attribute projection (NAP) on supervectors
    – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
  • Features with warping [Pelecanos 2001]
    – Mel Frequency Cepstral Coefficients (MFCCs)
    – FDLP short-term (FDLP-S) features

Speaker Verification

[Figure: bar chart of DCF (axis 10 to 30) for MFCC and FDLP-S under Tel, Mic, and cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, “Feature Normalization for Speaker Verification in Room Reverberation,” ICASSP 2011.


Modulation Feature Extraction

[Block diagram: Input → DCT → critical-band windowing → FDLP → static and dynamic compression → DCT over 200 ms segments → sub-band features]

Static compression is a logarithm – reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions
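The numbers on this slide fit together, as the short sketch below shows. It covers only the static (logarithmic) stream; the dynamic compression loops are not implemented. The 400 Hz envelope rate, the function names, and the test envelope are assumptions. Reading "0-35 Hz" as 14 DCT bins of 2.5 Hz each is an interpretation based on the DCT-II bin spacing of 1/(2T) for a segment of T = 200 ms, not something the slides state.

```python
import math


def dct2(v):
    """Unnormalized DCT-II of a short vector."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)) for k in range(n)]


def static_modulation_features(envelope, n_coeffs=14):
    """Log-compress a sub-band envelope and keep the lowest DCT coefficients."""
    log_env = [math.log(e + 1e-12) for e in envelope]
    return dct2(log_env)[:n_coeffs]


segment_s = 0.2                        # 200 ms segment, from the slide
n_coeffs = 14                          # coefficients kept per stream
bin_hz = 1.0 / (2.0 * segment_s)       # DCT-II bin spacing: 2.5 Hz
top_hz = n_coeffs * bin_hz             # upper edge of the kept range: 35 Hz
n_bands, n_streams = 17, 2             # 17 sub-bands, static + dynamic
feature_dim = n_coeffs * n_streams * n_bands

# Illustrative envelope with a 5 Hz modulation (one cycle in 200 ms),
# sampled at an assumed 400 Hz envelope rate (80 samples).
n = 80
env = [1.0 + 0.5 * math.cos(2 * math.pi * (i + 0.5) / n) for i in range(n)]
coeffs = static_modulation_features(env)
dominant = max(range(1, n_coeffs), key=lambda k: abs(coeffs[k]))
dominant_hz = dominant * bin_hz        # lands on the 5 Hz bin
```

The 5 Hz modulation shows up in DCT bin 2, and two streams of 14 coefficients over 17 bands account for the 476 dimensions.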

Phoneme Recognition

  • TIMIT database (8 kHz)
    – Clean training data; test data can be clean, with additive noise, reverberated, or telephone channel
  • Multi-layer perceptron (MLP) based system
    – MLPs estimate phoneme posteriors
    – Hidden Markov model (HMM) – MLP hybrid model
    – Performance in phoneme error rate (PER)
  • Features
    – Perceptual linear prediction (PLP) - 9 frame context
    – Advanced ETSI standard [ETSI 2002] – 9 frame context
    – FDLP modulation (FDLP-M) features – 476 dim

Phoneme Recognition

[Figure: bar chart of PER (axis 30 to 75) for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, “Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum,” JASA, 2010.


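Audio Coding

[Block diagram: Input → QMF analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr); MDCT of the carrier; quantization (Q); inverse quantization (Q⁻¹); IMDCT; multiplication (Mul); QMF synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, “Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding,” AES 124th Convention, Audio Engineering Society, May 2008.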

Subjective Evaluations

[Figure: listening-test scores (scale 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, “AR Models of Amplitude Modulation in Audio Compression,” IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP – improvements in reverberant speech recognition

Modulation feature extraction – phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval’s theorem and the definition of the power spectrum, the residual sum of squares can be written in the frequency domain, and the parameters are solved by minimizing it

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

[Figure: AR model of power spectrum]
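The equations on these backup slides were lost in extraction. The block below restates the standard linear-prediction relations the text refers to, following the sign convention of Makhoul (1975). The symbols ($x[n]$, $a_k$, $p$, $e[n]$, $A(\omega)$) are assumed notation and may differ from the original slides.

```latex
% Prediction of the current sample from p past samples, and the residual
\hat{x}[n] = -\sum_{k=1}^{p} a_k\, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k]

% Filter interpretation: e[n] is x[n] passed through the inverse filter A
A(\omega) = 1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k}

% Parseval's theorem: the residual sum of squares in the frequency domain
E = \sum_{n} e^{2}[n]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(\omega)|^{2}\, |A(\omega)|^{2}\, d\omega

% Minimizing E over a_1, \ldots, a_p gives the all-pole (AR) model
\hat{P}(\omega) = \frac{G}{|A(\omega)|^{2}}, \qquad G = E_{\min}
```

FDLP applies the same relations with the roles of time and frequency exchanged: the input to LP is the DCT sequence, and $\hat{P}$ then models the Hilbert envelope instead of the power spectrum.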

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: $x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal: $|x_a(t)|^2$

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

LP in Time and Frequency

Duality: LP ↔ FDLP

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Comparison of Modulation Frequency Features for Speech Recognition,” ICASSP 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Modulation Frequency Features for Phoneme Recognition in Noisy Speech,” JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → critical-band windowing → IDFT → |·|² → DFT → linear prediction → FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input → DCT → critical-band windowing → IDFT → |·|² → Wiener filtering (driven by a VAD) → DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, “Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum,” IEEE ASRU, 2009.
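The subtraction idea can be illustrated with a deliberately simple sketch. It is not the Wiener-filtering stage named in the block diagram. It assumes a constant noise level, estimated as the mean of the envelope samples that a VAD has marked as non-speech, and a relative floor that keeps the result positive. The function name, the `floor` parameter, and the toy data are all hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=1e-3):
    """Subtract a noise-envelope estimate taken from non-speech samples.

    noisy_env : envelope samples of the noisy sub-band signal
    is_speech : VAD decisions, one boolean per envelope sample
    floor     : fraction of the noisy value kept as a lower bound, so that
                a later logarithm stays defined
    """
    noise = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise:
        return list(noisy_env)          # nothing to estimate the noise from
    noise_level = sum(noise) / len(noise)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Toy example: noise level 1.0, speech in the two middle samples.
noisy = [1.0, 1.0, 5.0, 6.0, 1.0]
vad = [False, False, True, True, False]
clean = subtract_noise_envelope(noisy, vad)
```

Subtraction in the envelope domain relies on the additivity noted above, which holds only when signal and noise are uncorrelated.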

Introduction

Conventional signal analysis – starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram, time vs. frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction – Time Domain
  • Linear Prediction – Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 6: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction

Sub-band speech and audio signals: product of a smooth modulation with a fine carrier

[Figure: a sub-band signal shown as the product of a smooth modulation (AM) and a fine carrier (FM)]

The decomposition is non-unique

$x(t) = m(t)\,\cos(\omega_o t + \varphi(t))$

Desired Properties of AM

Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

Desired Properties of AM

[Block diagram: $x(t)$ is passed through $H$ and multiplied by $j$, the result is added to $x(t)$ to give $x_a(t)$, and the magnitude $|\cdot|$ gives $m(t)$]

$\omega(t) = \omega_o + \frac{d}{dt}\varphi(t)$

$H$ - Hilbert transform

$x_a(t)$ - analytic signal

$|x_a(t)|^2$ – Hilbert envelope

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain, for $n = 0, \ldots, N-1$

$X_a[k] = 2X[k]$ for $k < N/2$, and $X_a[k] = 0$ for $k \ge N/2$

AR Model of Hilbert Envelopes

Let $q[n]$ be the even-symmetrized version of $x[n]$

$Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$Q_a[k] = 2Q[k]$ for $k < N$, and $Q_a[k] = 0$ for $k \ge N$

N-point DCT, zero-padded with N zeros:

$y[k] = 4\,\mathrm{Re}\{X[k]\}$ for $k < N$, and $y[k] = 0$ for $k \ge N$

AR Model of Hilbert Envelopes

We have shown -

$Q_a[k] = F\{q_a[n]\} = y[k]$

Spectrum of even-sym analytic signal = zero-padded DCT sequence

Signal ↔ Spectrum (related by F); Power Spectrum ↔ Auto-corr (related by F)

AR Model of Hilbert Envelopes

$F\{|q_a[n]|^2\} = r_y[\tau]$

Spectrum of Hilbert env for even-sym signal = auto-correlation of DCT sequence

Hilb env of even-symm signal ↔ Auto-corr of DCT (related by F)

Auto-corr of DCT → LP → AR model of Hilb env
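The relation between the envelope and the autocorrelation of the spectrum can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a plain DFT rather than the even-symmetrized signal and zero-padded DCT of the slides, and the helper names `analytic_signal` and `spectrum_autocorrelation` are hypothetical. It builds the analytic signal from a one-sided spectrum and verifies that the DFT of its squared magnitude equals the circular autocorrelation of its spectrum.

```python
import numpy as np

def analytic_signal(x):
    """Discrete-time analytic signal from a one-sided spectrum.

    Uses X_a[k] = 2 X[k] for k < N/2 and 0 otherwise, which assumes
    a zero-mean input so that X[0] = 0.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    keep = np.arange(N) < N / 2
    Xa = np.zeros(N, dtype=complex)
    Xa[keep] = 2.0 * X[keep]
    return np.fft.ifft(Xa)

def spectrum_autocorrelation(Q):
    """Circular autocorrelation of a spectrum, scaled by 1/N."""
    N = len(Q)
    return np.array([np.sum(Q * np.conj(np.roll(Q, m))) for m in range(N)]) / N

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N)        # zero-mean test tone, 5 cycles

xa = analytic_signal(x)
envelope = np.abs(xa) ** 2               # Hilbert envelope (squared magnitude)

lhs = np.fft.fft(envelope)               # spectrum of the envelope
rhs = spectrum_autocorrelation(np.fft.fft(xa))
```

For the pure tone used here the envelope is flat, which matches the harmonicity property, and `lhs` and `rhs` agree to numerical precision.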

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech signal, its Hilbert envelope (Hilb Env) and its FDLP envelope (FDLP Env), obtained by DCT followed by LP]
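As a rough illustration of linear prediction on the cosine transform, the sketch below computes an FDLP-style envelope with NumPy only. It is a minimal sketch, not the authors' implementation: the function names, the unnormalized DCT-II, the autocorrelation-method solver and the model order of 2 are all assumptions chosen to keep the example small.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II computed with an explicit cosine matrix."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x

def lp_autocorr(y, order):
    """Autocorrelation-method LP. Returns (a, gain) with a[0] = 1."""
    r = np.array([np.dot(y[: len(y) - m], y[m:]) for m in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(r[idx], -r[1:])      # Toeplitz normal equations
    gain = r[0] + np.dot(coeffs, r[1:])           # minimum residual energy
    return np.concatenate(([1.0], coeffs)), gain

def fdlp_envelope(x, order):
    """All-pole temporal envelope from LP applied to the DCT of x."""
    N = len(x)
    a, gain = lp_autocorr(dct2(x), order)
    theta = np.pi * (2 * np.arange(N) + 1) / (2 * N)   # one angle per time sample
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# Test signal: a short burst centred at sample 300 (hypothetical values).
N, n0 = 512, 300
n = np.arange(N)
x = np.exp(-0.5 * ((n - n0) / 10.0) ** 2) * np.cos(0.3 * np.pi * n)

env = fdlp_envelope(x, order=2)
peak = int(np.argmax(env))
```

The all-pole model is evaluated over angles that map to time instead of frequency, so the peak of `env` falls at the location of the burst. A practical system would use longer windows, sub-band decomposition and a higher model order.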

FDLP for Speech Representation

[Figure: DCT followed by LP gives the FDLP spectrogram (time vs. frequency), shown next to conventional approaches (time vs. frequency)]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: FDLP and mel analysis of a test signal (Sig) and the resulting FDLP envelope (FDLP Env)]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT

In narrow sub-bands, the room response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and the gain of the AR model

When the sub-band signal is multiplied by a constant, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
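The effect of a constant multiplier on the AR model can be sketched numerically. The example below is an assumption-laden illustration rather than the authors' code: `lp_autocorr` is a hypothetical helper, the random sequence stands in for the DCT of a sub-band signal, and fixing the gain to 1 is one possible normalization.

```python
import numpy as np

def lp_autocorr(y, order):
    """Autocorrelation-method LP. Returns (a, gain) with a[0] = 1."""
    r = np.array([np.dot(y[: len(y) - m], y[m:]) for m in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(r[idx], -r[1:])
    gain = r[0] + np.dot(coeffs, r[1:])
    return np.concatenate(([1.0], coeffs)), gain

def unit_gain_envelope(a, n_points):
    """All-pole envelope reconstructed with the gain fixed to 1 (assumed normalization)."""
    theta = np.pi * (2 * np.arange(n_points) + 1) / (2 * n_points)
    A = np.exp(-1j * np.outer(theta, np.arange(len(a)))) @ a
    return 1.0 / np.abs(A) ** 2

rng = np.random.default_rng(0)
y = rng.standard_normal(400)   # stand-in for the DCT of a clean sub-band signal
c = 3.7                        # hypothetical constant channel gain within the band

a_clean, g_clean = lp_autocorr(y, order=8)
a_dist, g_dist = lp_autocorr(c * y, order=8)

env_clean = unit_gain_envelope(a_clean, 200)
env_dist = unit_gain_envelope(a_dist, 200)
```

Scaling the input leaves the predictor coefficients unchanged and scales the gain by `c ** 2`, so envelopes rebuilt with a fixed gain coincide for the clean and the scaled input.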

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram with blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

Integration along the frequency axis to convert to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
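The feature pipeline described above can be sketched as follows. This is a minimal sketch under several assumptions that are not in the slides: the envelopes are taken to be already arranged in 24 mel-like bands, 13 static cepstra are kept (so that 13 x 3 = 39), and `np.gradient` stands in for whatever delta computation was actually used. All function names are hypothetical.

```python
import numpy as np

def integrate_envelope(env, fs, win_ms=25, shift_ms=10):
    """Sum each band's envelope over 25 ms windows with a 10 ms shift."""
    win = int(fs * win_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + (env.shape[-1] - win) // shift
    frames = [env[:, i * shift : i * shift + win].sum(axis=-1)
              for i in range(n_frames)]
    return np.stack(frames, axis=0)                  # (frames, bands)

def cepstra(energies, n_ceps=13):
    """Log followed by a DCT along the frequency axis."""
    n_bands = energies.shape[1]
    n = np.arange(n_bands)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_bands))
    return np.log(energies) @ basis.T                # (frames, n_ceps)

def add_deltas(c):
    """Append delta and acceleration coefficients (simple gradients)."""
    d = np.gradient(c, axis=0)
    dd = np.gradient(d, axis=0)
    return np.hstack([c, d, dd])

fs = 8000
rng = np.random.default_rng(0)
envelopes = rng.random((24, fs)) + 1e-3   # 24 hypothetical bands, 1 s of envelope
feats = add_deltas(cepstra(integrate_envelope(envelopes, fs)))
```

With one second of envelope at 8 kHz this gives 98 frames of 39 dimensions each.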

Speech Recognition

• TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)

• Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features – 39 dim

Speech Recognition

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
  – Has telephone speech and far-field speech

• GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
  – Mel Frequency Cepstral Coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

Speaker Verification

[Bar chart: DCF for MFCC and FDLP-S in telephone (Tel), microphone (Mic) and cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
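For reference, the detection cost quoted on the speaker verification slide can be evaluated from trial scores as in the sketch below. The scores, the threshold and the function names are made up for illustration; only the weights 0.99 and 0.1 come from the slide.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_fa, p_miss

def detection_cost(p_fa, p_miss):
    """DCF = 0.99 Pfa + 0.1 Pmiss, as stated on the slide."""
    return 0.99 * p_fa + 0.1 * p_miss

# Hypothetical trial scores.
targets = [2.0, 1.5, 0.2, 3.1]
impostors = [-1.0, 0.5, 0.1, -2.0, 1.7]

p_fa, p_miss = error_rates(targets, impostors, threshold=1.0)
cost = detection_cost(p_fa, p_miss)
```

Here one of four target trials is missed and one of five impostor trials is accepted, so the cost is 0.99 x 0.2 + 0.1 x 0.25 = 0.223.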

Outline of Applications

Modulation Features

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT over 200 ms for each, Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
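The feature dimension can be traced with a small sketch. Over a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span roughly 0 to 35 Hz. Everything else below is an assumption for illustration: the 400 Hz envelope sampling rate, the function names, and in particular `adaptive_loop`, which is a single divider-plus-low-pass loop and a heavily simplified stand-in for the compression loops cited on the slide.

```python
import numpy as np

N_BANDS, N_COEFFS = 17, 14      # values quoted on the slide
FS_ENV = 400                    # assumed envelope sampling rate in Hz
SEG = FS_ENV // 5               # 200 ms segment -> 80 samples

def dct_coeffs(segment, n_coeffs):
    """First n_coeffs DCT-II coefficients of a segment."""
    L = len(segment)
    n = np.arange(L)
    k = np.arange(n_coeffs)[:, None]
    return np.cos(np.pi * (2 * n + 1) * k / (2 * L)) @ segment

def adaptive_loop(env, tau_samples, floor=1e-5):
    """One divider with a low-pass filtered feedback state (simplified)."""
    out = np.empty_like(env)
    state = 1.0
    alpha = np.exp(-1.0 / tau_samples)
    for i, v in enumerate(env):
        out[i] = max(v, floor) / state            # divide by the slow state
        state = alpha * state + (1 - alpha) * out[i]
    return out

def modulation_features(envelopes):
    feats = []
    for env in envelopes:
        static = np.log(env)                                   # static compression
        dynamic = adaptive_loop(env, tau_samples=0.05 * FS_ENV)  # dynamic compression
        feats.append(dct_coeffs(static, N_COEFFS))
        feats.append(dct_coeffs(dynamic, N_COEFFS))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
envelopes = rng.random((N_BANDS, SEG)) + 1e-3   # hypothetical positive envelopes
feat = modulation_features(envelopes)
```

The concatenated vector has 17 x (14 + 14) = 476 entries, matching the dimension stated on the slide.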

Phoneme Recognition

• TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, additive noise, reverberated or telephone channel

• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) – MLP hybrid model
  – Performance in phoneme error rate (PER)

• Features
  – Perceptual linear prediction (PLP) – 9 frame context
  – Advanced ETSI standard [ETSI 2002] – 9 frame context
  – FDLP modulation (FDLP-M) features – 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Phoneme Recognition

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M on clean, additive-noise (Add Noise), reverberant (Reverb) and telephone (Tel) test data]

Outline of Applications

Audio Coding

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Audio Coding

[Block diagram with blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env, Carr, MDCT, Q, Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• Employing AR modeling for estimating amplitude modulations

• Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

• Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes

• Investigating the resolution properties of FDLP

• Gain normalization of FDLP envelopes

Our Contributions

• Short-term feature extraction using FDLP – improvements in reverb speech recog

• Modulation feature extraction – phoneme recognition in noisy speech

• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences
  – Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidences
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
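A minimal sketch of this least-squares formulation is given below. The helper name `lp_least_squares`, the order of 2 and the test coefficients are assumptions for illustration. The current sample is regressed on the past samples and the residual sum of squares is minimized directly.

```python
import numpy as np

def lp_least_squares(x, order):
    """Predict x[n] from x[n-1..n-order] by minimizing the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    # Column m-1 holds x[n-m] for n = order .. len(x)-1.
    past = np.column_stack([x[order - m : len(x) - m] for m in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ coeffs
    return coeffs, float(np.sum(residual ** 2))

# Noise-free second-order recursion with hypothetical coefficients.
true_a = np.array([1.2, -0.5])
x = np.zeros(60)
x[0] = 1.0
x[1] = true_a[0] * x[0]
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2]

a_hat, rss = lp_least_squares(x, order=2)
```

Because the test sequence obeys the recursion exactly, the estimated weights match `true_a` and the residual sum of squares is numerically zero.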

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
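The equations for these slides did not survive extraction. As a reading aid, the standard textbook form of the argument is sketched below. The notation ($a_k$, $p$, $A$, $P$, $\hat{P}$) is an assumption and may differ from what the slides used.

```latex
% Prediction error for an order-p predictor
e[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k]
\quad\Longleftrightarrow\quad
E(e^{j\omega}) = A(e^{j\omega})\, X(e^{j\omega}),
\qquad A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}

% Parseval's theorem: residual energy in the frequency domain
\sum_{n} e^2[n]
= \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \bigl|A(e^{j\omega})\bigr|^2 \, d\omega,
\qquad P(\omega) = \bigl|X(e^{j\omega})\bigr|^2

% All-pole model of the power spectrum with gain G
\hat{P}(\omega) = \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
\quad\Longrightarrow\quad
\sum_{n} e^2[n] = \frac{G}{2\pi} \int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
```

Minimizing the residual sum of squares over the $a_k$ is therefore equivalent to minimizing the integrated ratio of the signal power spectrum to its all-pole model, which is why the fit favors the high-energy regions of the spectrum.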

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,H[x(t)]$

where $H$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short-term and long-term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT over 200 ms for each, Sub-band Feat]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |^2, VAD, Wiener Filtering, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
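The envelope subtraction idea can be sketched in a few lines. This is a simplified illustration and not the Wiener filtering shown in the block diagram: the noise envelope is assumed constant, the VAD labels are idealised, and the flooring rule and the name `subtract_noise_envelope` are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise envelope estimated from non-speech regions.

    The result is floored at a small fraction of the noisy envelope so
    that it stays positive.
    """
    noise_level = noisy_env[~is_speech].mean()
    cleaned = np.maximum(noisy_env - noise_level, floor * noisy_env)
    return cleaned, noise_level

t = np.arange(100)
speech_env = np.where((t >= 40) & (t < 60), 5.0, 0.0)   # toy speech envelope
noisy_env = speech_env + 1.0                            # additive noise envelope
vad = speech_env > 0                                    # idealised VAD decisions

cleaned, noise_level = subtract_noise_envelope(noisy_env, vad)
```

In this toy case the noise level is recovered exactly, the speech region returns to its clean value of 5.0, and the non-speech region is held at the floor.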

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, time vs. frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 7: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

=

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

=

>
>
>

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Non-Unique

AM

FM

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Non-Unique

AM

FM

119909 (119905 )=119898 (119905 )lowast cos 120596119900119905+120593 (119905 )

Desired Properties of AM Linearity

Continuity

Harmonicity

120572 119909 (119905 )120572119898 (119905 )

119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )

cos (120596119900119905)1

Uniquely satisfied by the analytic signal

Desired Properties of AM

119909 (119905) H 119895+iquest iquestoriquest 119898(119905)

120596 (119905)120596119900+120593 (119905)

119889119889119905

- Hilbert transform

119909119886 (119905)

- analytic signal ndash Hilbert envelope

However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Desired Properties of AM

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

119883 [119896] 119883 119886 [119896]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N

N-point DCT

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with

N-zeros

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

Figure: DCT -> LP -> FDLP Spectrogram (axes: Time, Freq), shown alongside Conventional Approaches (axes: Time, Freq)

FDLP versus Mel Spectrogram

Figure panels: FDLP, Mel

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

Figure: FDLP and Mel representations compared; panels show Sig and FDLP Env

$\text{Res} = (\text{Critical Width})^{-1}$

Properties of FDLP Analysis

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech (where * denotes convolution)

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT

The same product holds in each sub-band. In narrow bands, the room-response term is slowly varying.

The FDLP envelope of a band is given by an all-pole expression in the band's all-pole parameters, with the gain in the numerator.

When the sub-band signal is multiplied by the room-response term, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain.
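The effect of a constant multiplicative factor on the all-pole fit can be checked numerically. The sketch below assumes NumPy, a random sequence standing in for the sub-band DCT coefficients, and a hypothetical constant factor `h` for the room response in that band; fixing the gain to 1 is one possible normalization and is an assumption here, since the exact expression on the slide was not recovered.

```python
import numpy as np

def lp_autocorr(seq, order):
    """Autocorrelation-method LP: returns a = [1, a1..ap] and the gain."""
    full = np.correlate(seq, seq, mode="full")
    r = full[len(seq) - 1 : len(seq) + order] / len(seq)    # lags 0..order
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], -r[1 : order + 1])          # Yule-Walker equations
    gain = r[0] + a @ r[1 : order + 1]
    return np.concatenate(([1.0], a)), gain

def all_pole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points angles between 0 and pi."""
    A = np.fft.fft(a, 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(1)
band_dct = rng.standard_normal(400)   # hypothetical sub-band DCT coefficients
h = 0.3                               # hypothetical constant room-response factor

a_clean, g_clean = lp_autocorr(band_dct, 8)
a_rev, g_rev = lp_autocorr(h * band_dct, 8)

# The all-pole coefficients are unchanged; only the gain is scaled (by h**2)
print(np.allclose(a_clean, a_rev), np.isclose(g_rev, h ** 2 * g_clean))

# With the gain fixed (here to 1, an assumed choice) the two envelopes coincide
env_clean = all_pole_envelope(a_clean, 1.0, 200)
env_rev = all_pole_envelope(a_rev, 1.0, 200)
```

This only covers the idealized case of a factor that is constant within the band, which is the narrow-band approximation stated in the slide.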

Gain Normalization in FDLP

Figure panels: Without gain norm, With gain norm

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

Block diagram (blocks): Input Signal; Sub-band Decomposition; FDLP; AM and FM outputs; Gain Norm; Quant; Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

Block diagram: Input -> DCT -> Sub-band Window -> FDLP -> Gain Norm -> Energy Int -> Log + DCT -> Feat

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
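The steps above can be sketched as follows, assuming NumPy. The function name, the choice of 13 static cepstra (so that 13 x 3 = 39), the use of `np.gradient` as a simple stand-in for regression-based deltas, and the random input envelopes are all assumptions made for illustration; the mel-scale integration is assumed to have been applied already to the band layout.

```python
import numpy as np

def short_term_features(band_env, fs, n_ceps=13, win=0.025, hop=0.010):
    """band_env: (n_bands, n_samples) sub-band envelopes on a mel-like band layout."""
    band_env = np.asarray(band_env, dtype=float)
    n_bands, n_samples = band_env.shape
    w, s = int(round(win * fs)), int(round(hop * fs))
    starts = np.arange(0, n_samples - w + 1, s)
    # Integrate each envelope over 25 ms windows shifted by 10 ms
    energy = np.stack([band_env[:, t : t + w].sum(axis=1) for t in starts])
    log_e = np.log(energy + 1e-12)                       # (frames, bands)
    # DCT-II along the frequency (band) axis gives cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    b = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * (b + 0.5) * k / n_bands)
    ceps = log_e @ basis.T                               # (frames, n_ceps)
    delta = np.gradient(ceps, axis=0)                    # simple difference
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
envelopes = rng.random((20, 8000)) + 0.1   # hypothetical: 20 bands, 1 s at 8 kHz
feats = short_term_features(envelopes, fs=8000)
print(feats.shape)
```

Each row is one 10 ms frame; the three column groups are the static, delta and acceleration coefficients.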

Speech Recognition

• TIDIGITS Database (8 kHz)
  - Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
  - Whole-word HMM models trained on clean speech
  - Performance in terms of word error rate (WER)

• Features
  - PLP features with cepstral mean subtraction (CMS)
  - Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  - FDLP short-term (FDLP-S) features - 39 dim

Bar chart: WER for PLP-CMS, LTLSS and FDLP-S under Clean and Reverb test conditions (axis ticks 0, 10, 20)

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
  - Has telephone speech and far-field speech

• GMM-UBM system
  - Trained on a large set of development speakers
  - Adapted on the enrollment data from the target speaker
  - Nuisance attribute projection (NAP) on supervectors
  - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
  - Mel Frequency Cepstral Coefficients (MFCCs)
  - FDLP short-term (FDLP-S) features

Bar chart: DCF for MFCC and FDLP-S in Tel, Mic and Cross domain conditions (axis ticks 10, 20, 30)

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
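As a worked example of the detection cost function, the sketch below writes the cost in its usual weighted form. The function name and the default cost and prior values are assumptions, chosen so that the weights reduce to 0.99 Pfa + 0.1 Pmiss as on the slide; the error rates in the usage line are made up.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm rates.

    With the assumed defaults this equals 0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Hypothetical operating point: 20% misses, 5% false alarms
cost = detection_cost(p_miss=0.20, p_fa=0.05)
print(round(cost, 4))
```

Because the false-alarm weight is about ten times the miss weight, operating points with few false alarms are favoured under this cost.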

Modulation Feature Extraction

Block diagram (blocks): Input; DCT; Critical-band Window; FDLP; Static and Dynamic compression branches (200 ms); DCT on each branch; Sub-band Feat

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectra (0-35 Hz) for each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17 = 476)
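The feature layout can be sketched as follows, assuming NumPy. Only the static (logarithmic) branch is implemented; the dynamic branch is left as a placeholder because the adaptation loops are not specified in the transcript. The function name, the segment length in samples and the random envelopes are illustrative assumptions.

```python
import numpy as np

def modulation_features(static_env, dynamic_env, n_mod=14):
    """static_env, dynamic_env: (n_bands, n_samples) compressed envelopes of one segment."""
    def first_dct_coeffs(env):
        n = env.shape[1]
        k = np.arange(n_mod)[:, None]
        t = np.arange(n)[None, :]
        basis = np.cos(np.pi * (t + 0.5) * k / n)   # first n_mod DCT-II basis vectors
        return env @ basis.T                         # (n_bands, n_mod)
    return np.concatenate(
        [first_dct_coeffs(static_env).ravel(), first_dct_coeffs(dynamic_env).ravel()]
    )

rng = np.random.default_rng(0)
env = rng.random((17, 80)) + 0.1   # hypothetical envelopes: 17 bands, 80 samples
static = np.log(env)               # static compression: logarithm
dynamic = env                      # placeholder only: adaptation loops not implemented
feat = modulation_features(static, dynamic)
print(feat.shape)
```

If the DCT is taken over a 200 ms segment, DCT-II index k corresponds to roughly k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span about 0 to 32.5 Hz, which is consistent with the stated 0-35 Hz range.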

Phoneme Recognition

• TIMIT Database (8 kHz)
  - Clean training data; test data can be clean, additive noise, reverberated or telephone channel

• Multi-layer perceptron (MLP) based system
  - MLPs estimate phoneme posteriors
  - Hidden Markov model (HMM) - MLP hybrid model
  - Performance in phoneme error rate (PER)

• Features
  - Perceptual linear prediction (PLP) - 9 frame context
  - Advanced ETSI standard [ETSI 2002] - 9 frame context
  - FDLP modulation (FDLP-M) features - 476 dim

Bar chart: PER for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions (axis ticks 30, 45, 60, 75)

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Audio Coding

Block diagram: Input -> QMF Analysis (bands 1 to 32) -> FDLP -> Env and Carr; Env -> Q -> Q^{-1}; Carr -> MDCT -> Q -> Q^{-1} -> IMDCT; Mul -> QMF Synthesis (bands 1 to 32) -> Output

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

Chart: listening-test scores (axis ticks 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences
  - Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidences
  - Perceptual importance of modulation frequencies [Drullman et al. 1994]
  - Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

Diagram: samples n-3, n-2 and n-1, weighted by a3, a2 and a1, predict sample n

Model parameters are solved by minimizing the residual sum of squares
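A minimal sketch of time-domain linear prediction with the autocorrelation method, assuming NumPy. The function name and the synthetic AR(2) test signal are illustrative; the coefficients returned here are the prediction weights, so the predictor is x[n] ~ a1 x[n-1] + ... + ap x[n-p].

```python
import numpy as np

def linear_prediction(x, order):
    """Return prediction weights a1..ap that minimize the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]   # lags 0..order
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1 : order + 1])                    # normal equations

# Hypothetical test signal: a stable AR(2) process driven by white noise
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a = linear_prediction(x, order=2)
print(a)   # estimates of the generating weights 1.5 and -0.8
```

Setting the derivative of the residual sum of squares to zero gives the linear system solved above, whose matrix is built from the signal's autocorrelation.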

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
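A minimal sketch of this definition for a finite discrete-time signal, assuming NumPy: the analytic signal is formed by doubling the positive-frequency DFT bins and zeroing the negative ones, and the envelope is its squared magnitude. The function name and the cosine test signal are illustrative assumptions.

```python
import numpy as np

def hilbert_envelope(x):
    """Squared magnitude of the discrete-time analytic signal of a real sequence."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    Xa = np.zeros(n, dtype=complex)
    Xa[0] = X[0]                                   # DC bin kept as is
    if n % 2 == 0:
        Xa[n // 2] = X[n // 2]                     # Nyquist bin kept as is
        Xa[1 : n // 2] = 2 * X[1 : n // 2]         # positive frequencies doubled
    else:
        Xa[1 : (n + 1) // 2] = 2 * X[1 : (n + 1) // 2]
    return np.abs(np.fft.ifft(Xa)) ** 2            # negative frequencies stay zero

# Hypothetical test: a cosine of amplitude 3 on an exact DFT bin
t = np.arange(256)
x = 3.0 * np.cos(2 * np.pi * 16 * t / 256)
env = hilbert_envelope(x)
print(np.allclose(env, 9.0))
```

For a constant-amplitude cosine the envelope is flat and equal to the squared amplitude, which is the harmonic-correspondence property expected of an amplitude-modulation estimate.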

Hilbert Envelope

Figure panels: a. Speech, b. Hilb Env, c. FDLP Env

LP in Time and Frequency

Duality: LP and FDLP

AM-FM Decomposition

Figure panels: a. Signal, b. Hilb Env, c. FDLP Env, d. AM comp, e. FM comp

Spectrogram Comparison

Figure panels: FDLP, PLP

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

Figure panels: a. Signal, b. Hilb Env, c. FDLP Env, d. Log comp, e. Dyn comp

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

Block diagram (blocks): Signal + Noise; DCT; Critical-band Window; IDFT; | |^2; DFT; Linear Pred; FDLP Env

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Block diagram (blocks): Input; DCT; Critical-band Window; IDFT; | |^2; DFT; Wiener Filtering; VAD

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

Figure: spectrogram (axes: Time, Frequency)

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Sub-band speech and audio signals - product of smooth modulation with a fine carrier

The decomposition into AM and FM components is non-unique

$x(t) = m(t) \cos(\omega_o t + \varphi(t))$

Desired Properties of AM

Linearity: $\alpha x(t) \rightarrow \alpha m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, $m(t) = |x_a(t)|$, $\omega(t) = \omega_o + \frac{d}{dt}\varphi(t)$

$\mathcal{H}$ - Hilbert transform, $x_a(t)$ - analytic signal, $|x_a(t)|^2$ - Hilbert envelope

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain for $n = 0 \ldots N-1$

$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$

Let $q[n]$ be the even-symmetrized version of $x[n]$

$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$

N-point DCT: $y[k] = 4\,\mathrm{Re}\{X[k]\}$ for $k < N$

DCT zero-padded with N zeros: $y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$


Audio Coding

[Block diagram of the codec. Blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env and Carr branches, MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for the AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence: spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence:
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples:

$\hat{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k]$

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares:

$E = \sum_{n} \left( x[n] - \hat{x}[n] \right)^2$
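
A minimal sketch of time-domain linear prediction by the autocorrelation method is given below. The function name, the sign convention (prediction as a positive weighted sum of past samples) and the synthetic AR(2) test signal are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_autocorr(x, order):
    """Return predictor weights a_1..a_p and the minimum residual energy."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with R the symmetric Toeplitz matrix of r[0:order]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    gain = r[0] - np.dot(a, r[1:order + 1])   # minimum residual sum of squares
    return a, gain

# Synthetic AR(2) process: x[n] = 1.5 x[n-1] - 0.7 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = lfilter([1.0], [1.0, -1.5, 0.7], e)

a, gain = lpc_autocorr(x, order=2)
print(a, gain)
```

The estimated weights should land close to the generating values 1.5 and -0.7, and the returned gain plays the role of the numerator G of the all-pole model.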

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform.

The Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$.
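
The definition can be checked numerically with a short sketch. The test signal (a 1 kHz carrier with a 20 Hz modulation at an 8 kHz sampling rate) is an assumption chosen so that every component completes a whole number of cycles in the window; `scipy.signal.hilbert` returns the analytic signal, not the Hilbert transform alone.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(800) / fs                      # 100 ms of signal
m = 1.0 + 0.5 * np.cos(2 * np.pi * 20 * t)   # smooth modulation (AM)
x = m * np.cos(2 * np.pi * 1000 * t)         # modulated carrier

xa = hilbert(x)                 # analytic signal x + j H[x]
hilbert_env = np.abs(xa) ** 2   # Hilbert envelope as defined on the slide

print(np.max(np.abs(hilbert_env - m ** 2)))
```

For this idealized signal the Hilbert envelope equals the squared modulation; with real speech the match is only approximate, and edge effects appear for finite-length segments.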

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilb Env, (c) FDLP Env]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS), and gain normalization

CMS assumes the distortion in neighboring frames to be similar; it suppresses short-term artifacts

Long-term subtraction deals with reverberation by assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
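
The first of these techniques, cepstral mean subtraction, can be sketched in a few lines. The random cepstra and the fixed per-coefficient offset standing in for a short, time-invariant channel are assumptions for illustration.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames; cepstra is (n_frames, n_ceps)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))   # stand-in cepstral features
channel = rng.standard_normal(13)        # same offset in every frame
distorted = clean + channel

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)
print(np.max(np.abs(cms_clean - cms_distorted)))
```

The offset cancels only because it is identical in every frame, which is the CMS assumption stated above; a long reverberation tail spreads across frames and does not reduce to such an offset.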

Feature Comparison

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP (200 ms), Static and Dynamic compression, DCT (one per compression branch), Sub-band Feat]

Modulation Features

[Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, |·|², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).

Noise Compensation in FDLP

[Block diagram. Blocks: Input, DCT, Critical-band Window, IDFT, |·|², Wiener Filtering, VAD, DFT]

A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise.

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope.

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

The spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMMs

Page 9: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction

Sub-band speech and audio signals can be expressed as the product of a smooth modulation (AM) with a fine carrier (FM):

$x(t) = m(t)\,\cos\left(\omega_o t + \varphi(t)\right)$

[Figure: sub-band signal = AM component × FM component]

This decomposition is non-unique.

Desired Properties of AM

Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

Desired Properties of AM

[Block diagram: $x(t)$ and $j\,\mathcal{H}[x(t)]$ are summed to form the analytic signal $x_a(t)$, from which the Hilbert envelope $m(t)$ is obtained]

Instantaneous frequency: $\omega(t) = \omega_o + \frac{d\varphi(t)}{dt}$

Here $\mathcal{H}$ is the Hilbert transform, $x_a(t)$ is the analytic signal, and $m(t)$ is the Hilbert envelope.

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals.

This creates the need for modeling the Hilbert envelope without explicit computation of the Hilbert transform.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Let $x[n]$, $n = 0, \ldots, N-1$, be a signal with zero mean in the time and frequency domains, with spectrum $X[k]$:

$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$

[Figure: $X[k]$ and $X_a[k]$]

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$:

$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$

N-point DCT: $y[k] = 4\,\mathrm{Re}\{X[k]\}$ for $k < N$

DCT zero-padded with N zeros:

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$

We have shown:

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

(spectrum of the even-symmetrized analytic signal = zero-padded DCT sequence)

Just as the power spectrum of a signal and its autocorrelation form a Fourier pair, so do:

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

(spectrum of the Hilbert envelope of the even-symmetrized signal = autocorrelation of the DCT sequence)

Applying LP to the autocorrelation of the DCT sequence therefore gives an AR model of the Hilbert envelope.
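
The key step of this derivation (the spectrum of the Hilbert envelope equals the autocorrelation of the spectrum of the analytic signal) can be checked numerically. The sketch below uses the plain DFT with circular autocorrelation rather than the even-symmetrized DCT form on the slides, so it illustrates the same duality under a simplified, assumed setup; the function names are hypothetical.

```python
import numpy as np

def analytic_spectrum(X):
    """X_a[k] = 2 X[k] for k < N/2 and 0 otherwise (the definition on the slide)."""
    N = len(X)
    Xa = np.zeros(N, dtype=complex)
    Xa[:N // 2] = 2.0 * X[:N // 2]
    return Xa

def circular_autocorr(Z):
    """r[m] = sum_k Z[k] * conj(Z[(k - m) mod N])."""
    N = len(Z)
    return np.array([np.sum(Z * np.conj(np.roll(Z, m))) for m in range(N)])

rng = np.random.default_rng(0)
N = 64
X = np.fft.fft(rng.standard_normal(N))
X[0] = 0.0         # zero mean in time
X[N // 2] = 0.0    # no Nyquist component
x = np.fft.ifft(X).real

Xa = analytic_spectrum(X)
xa = np.fft.ifft(Xa)                    # discrete-time analytic signal
hilbert_env = np.abs(xa) ** 2           # Hilbert envelope

env_spectrum = np.fft.fft(hilbert_env)  # spectrum of the Hilbert envelope
autocorr = circular_autocorr(Xa) / N    # autocorrelation of the analytic spectrum

print(np.max(np.abs(env_spectrum - autocorr)))
```

Because linear prediction only needs autocorrelation values, this identity is what lets an AR model of the envelope be fitted directly from the frequency-domain sequence.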

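LP in Time and Frequency

Duality

• LP on the time-domain signal gives an AR model of the power spectrum (Time → LP → Power Spec)
• LP on the DCT of the signal gives an AR model of the Hilbert envelope (DCT → LP → Hilb Env)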

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP → FDLP Env, shown together with the Hilb Env]
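
A minimal full-band sketch of FDLP (a DCT followed by linear prediction on the DCT sequence) is shown below. The function name, the model order and the synthetic burst signal are assumptions; the approach in the talk operates on sub-band windows of the DCT, which this sketch omits.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

def fdlp_envelope(x, order):
    """All-pole (AR) estimate of the temporal envelope of x via LP on its DCT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = dct(x, type=2, norm="ortho")                 # cosine transform of the signal
    # Autocorrelation of the DCT sequence at lags 0..order
    r = np.array([np.dot(y[:n - k], y[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:order + 1])    # predictor weights
    gain = r[0] - np.dot(a, r[1:order + 1])          # AR model gain G
    # Evaluate G / |A|^2 on n points of the half circle; point i maps to time index i
    _, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=n)
    return gain * np.abs(h) ** 2

# Synthetic test signal: a carrier with a smooth energy burst centred at sample 300
idx = np.arange(1000)
am = 0.05 + np.exp(-0.5 * ((idx - 300) / 30.0) ** 2)
x = am * np.cos(2 * np.pi * 0.2 * idx)

env = fdlp_envelope(x, order=10)
peak = int(np.argmax(env))
print(peak)
```

The all-pole envelope is expected to peak near the burst, since AR models emphasize the high-energy regions; a higher model order follows finer temporal detail.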

FDLP for Speech Representation

[Figure: DCT and LP applied to build the FDLP spectrogram (Time vs. Freq), compared with conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels, with FDLP and Mel spectrograms]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication, which also holds within each sub-band:

Clean DFT × Response DFT = Revb DFT

In narrow bands the response is slowly varying.

The FDLP envelope of a band is given by its all-pole parameters and the gain G of the AR model.

When the sub-band signal is multiplied by the (approximately constant) response, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain set to unity (G = 1).
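
The effect of a constant multiplicative factor on the AR model can be illustrated with a short sketch. It idealizes the slowly varying narrow-band response as a single constant c, which is an assumption; the function name and test signal are likewise illustrative.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_model(x, order):
    """Return the all-pole weights and the gain of an FDLP model of x."""
    y = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    n = len(y)
    r = np.array([np.dot(y[:n - k], y[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:order + 1])
    gain = r[0] - np.dot(a, r[1:order + 1])
    return a, gain

rng = np.random.default_rng(0)
idx = np.arange(400)
x = (0.1 + np.exp(-0.5 * ((idx - 150) / 40.0) ** 2)) * rng.standard_normal(400)

c = 0.3                                   # idealized constant channel factor
a_ref, g_ref = fdlp_model(x, order=8)
a_scaled, g_scaled = fdlp_model(c * x, order=8)

print(np.max(np.abs(a_ref - a_scaled)), g_scaled / g_ref)
```

The pole parameters are unchanged and only the gain scales (by c squared), so an envelope rebuilt with a fixed gain no longer depends on the factor.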

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

Integration along the frequency axis converts the sub-band energies to the mel scale

Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
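
The steps above can be sketched as follows. Several details are assumptions: 13 static cepstra (39 / 3), envelopes already on 24 mel-spaced bands at an 8 kHz rate, and simple gradient-based deltas; the random envelopes are stand-ins for gain-normalized FDLP envelopes.

```python
import numpy as np
from scipy.fft import dct

def frame_energies(env, fs, win_ms=25, hop_ms=10):
    """Integrate each band envelope over 25 ms windows shifted by 10 ms."""
    win = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    n_frames = 1 + (env.shape[1] - win) // hop
    frames = [env[:, i * hop:i * hop + win].sum(axis=1) for i in range(n_frames)]
    return np.stack(frames)                       # (n_frames, n_bands)

def short_term_features(env, fs, n_ceps=13):
    energies = frame_energies(env, fs)
    # Log and DCT along the frequency axis give cepstral coefficients
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=1)[:, :n_ceps]
    delta = np.gradient(ceps, axis=0)             # simple delta estimate
    accel = np.gradient(delta, axis=0)            # simple acceleration estimate
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
fs = 8000
env = rng.random((24, fs)) + 1e-3   # 1 s of stand-in envelopes in 24 bands
feats = short_term_features(env, fs)
print(feats.shape)
```

Each row is one 39-dimensional frame, so the output can be fed to a recognizer in the same way as conventional MFCC features.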

Speech Recognition

TIDIGITS database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features, 39 dim

Speech Recognition

[Figure: WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS, and FDLP-S features in the Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
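A minimal sketch of cepstral mean subtraction, assuming the channel is short enough that it adds the same constant offset to every frame in the cepstral domain. The array shapes and values are made up for illustration.

```python
import numpy as np


def cepstral_mean_subtraction(ceps):
    """Subtract the per-coefficient mean over time; ceps has shape (frames, coefficients)."""
    return ceps - ceps.mean(axis=0, keepdims=True)


# Assumed toy data: 50 frames of 13 cepstral coefficients.
frames = np.arange(50)[:, None]
coeffs = np.arange(13)[None, :]
clean = np.sin(0.1 * frames * (coeffs + 1))

# A fixed channel shows up as a constant vector added to every frame.
channel_offset = np.linspace(-1.0, 1.0, 13)
distorted = clean + channel_offset

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)
```

After CMS the clean and distorted features coincide, which is why the method handles short-term convolutive artifacts but not responses that extend over many frames.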

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input -> DCT -> critical-band window -> FDLP -> static and dynamic compression -> DCT (200 ms) -> sub-band features]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise -> DCT -> critical-band window -> IDFT -> | |^2 -> DFT -> linear prediction -> FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram blocks: Input, DCT, critical-band window, IDFT, | |^2, Wiener filtering driven by a VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
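The subtraction step can be sketched as follows. This is a simplified stand-in, not the Wiener filter from the block diagram: it assumes a stationary noise envelope estimated as the mean over VAD non-speech samples, plus a hypothetical spectral-subtraction style floor to keep the result positive.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract the mean non-speech envelope level; floor the result at floor * noisy_env."""
    noise_level = noisy_env[~is_speech].mean()
    return np.maximum(noisy_env - noise_level, floor * noisy_env)


# Assumed toy envelopes: speech active on samples 30..59, constant noise level 0.5.
speech_env = np.zeros(100)
speech_env[30:60] = 2.0
noise_env = np.full(100, 0.5)
noisy_env = speech_env + noise_env          # additive in the envelope domain
is_speech = speech_env > 0                  # idealized VAD decision

cleaned = subtract_noise_envelope(noisy_env, is_speech)
```

In this idealized case the speech region is restored to its clean level and the non-speech region drops to the floor.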

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, time versus frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 10: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction

Sub-band speech and audio signals - product of smooth modulation with a fine carrier

The AM-FM decomposition is non-unique:

$x(t) = m(t)\,\cos(\omega_o t + \varphi(t))$
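A small numerical illustration of the non-uniqueness, using made-up signals: the same samples are written once as a smooth modulation times a 20 Hz carrier, and once as the envelope `|x|` times a carrier whose phase only takes the values 0 and pi.

```python
import math

N = 200
t = [n / N for n in range(N)]

# Decomposition 1: smooth AM on a 20 Hz cosine carrier.
m1 = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * tt) for tt in t]
c1 = [math.cos(2 * math.pi * 20 * tt) for tt in t]
x = [m * c for m, c in zip(m1, c1)]

# Decomposition 2: AM = |x|, carrier = cos(phase) with phase in {0, pi}.
m2 = [abs(v) for v in x]
c2 = [math.cos(0.0 if v >= 0 else math.pi) for v in x]
x2 = [m * c for m, c in zip(m2, c2)]

max_signal_diff = max(abs(u - v) for u, v in zip(x, x2))
max_am_diff = max(abs(u - v) for u, v in zip(m1, m2))
print(max_signal_diff, max_am_diff)
```

Both pairs reproduce the same signal although the two AM components differ substantially, which is why additional constraints on the AM are needed.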

Desired Properties of AM

Linearity: $\alpha x(t) \rightarrow \alpha m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

Desired Properties of AM

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, $m(t) = |x_a(t)|$, $\omega(t) = \omega_o + \frac{d\varphi(t)}{dt}$

$\mathcal{H}$ - Hilbert transform
$x_a(t)$ - analytic signal
$|x_a(t)|^2$ - Hilbert envelope

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain for $n = 0 \ldots N-1$

$X_a[k] = \begin{cases} 2X[k] & \text{for } k < N/2 \\ 0 & \text{for } k \ge N/2 \end{cases}$

Let $q[n]$ - even-symmetrized version of $x[n]$

$Q_a[k] = \begin{cases} 2Q[k] & k < N \\ 0 & k \ge N \end{cases}$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$

N-point DCT, zero-padded with N zeros:

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\} & k < N \\ 0 & k \ge N \end{cases}$

We have shown -

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

Spectrum of even-symmetrized analytic signal = zero-padded DCT sequence

Signal <-> spectrum and auto-correlation <-> power spectrum are Fourier transform pairs

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Spectrum of Hilbert envelope of the even-symmetrized signal = auto-correlation of the DCT sequence

Auto-correlation of DCT -> LP -> AR model of Hilbert envelope
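The last identity is the frequency-domain dual of the Wiener-Khinchin relation and can be checked numerically. The sketch below uses an arbitrary real sequence `y` as a stand-in for an N-point DCT, zero-pads it with N zeros, and compares the DFT of the squared magnitude of its inverse transform with the auto-correlation of `y`. The 1/M scale factor comes from NumPy's DFT convention and is an artifact of this example, not something stated on the slides.

```python
import numpy as np

N = 8
y = np.arange(1.0, N + 1)                  # stand-in for an N-point DCT sequence
Q_a = np.concatenate([y, np.zeros(N)])     # zero-padded with N zeros
M = len(Q_a)

q_a = np.fft.ifft(Q_a)                     # time-domain (analytic) signal
env_spectrum = np.fft.fft(np.abs(q_a) ** 2)

# Linear auto-correlation of y for lags 0..N-1.
r_y = np.correlate(y, y, mode="full")[N - 1:]

print(np.max(np.abs(env_spectrum[:N] - r_y / M)))
```

Because the upper half of `Q_a` is zero, the circular auto-correlation that the DFT produces coincides with the linear auto-correlation for lags below N, so linear prediction on `y` fits an AR model to the envelope.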

LP in Time and Frequency

Duality
• Time -> LP -> Power spectrum
• DCT -> LP -> Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech -> DCT -> LP -> FDLP envelope, shown against the Hilbert envelope]
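A compact sketch of the FDLP chain (DCT, then linear prediction, then all-pole envelope), under stated assumptions: an unnormalized DCT-II, the autocorrelation method solved directly from the normal equations, and the AR model evaluated at the angles $\pi(2n+1)/(2N)$ that correspond to time index $n$. Function names and the single-click test signal are hypothetical.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II computed from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x


def lp_coeffs(c, order):
    """Autocorrelation-method LP: returns predictor a and gain (residual energy)."""
    r = np.array([np.dot(c[:len(c) - lag], c[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])


def fdlp_envelope(x, order):
    """AR model of the temporal envelope from LP on the DCT of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a, gain = lp_coeffs(dct_ii(x), order)
    theta = np.pi * (2 * np.arange(n) + 1) / (2 * n)   # angle matching time index n
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2


# Assumed test signal: a single click at sample 80 of a 256-sample frame.
x = np.zeros(256)
x[80] = 1.0
env = fdlp_envelope(x, order=2)
peak = int(np.argmax(env))
print(peak)
```

The DCT of a click is a cosine over the DCT index, so the fitted pole pair sits at the angle that maps back to the click position, and the envelope peaks there.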

FDLP for Speech Representation

[Figure: DCT of the signal -> LP in each sub-band -> FDLP spectrogram (time versus frequency), compared with the time-frequency representation of conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Resolution of FDLP Analysis

[Figure: signal and FDLP envelope; FDLP versus mel]

Res = (Critical Width)^{-1}

Properties of FDLP Analysis

[Figure: FDLP versus mel]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion
• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean speech (convolved with) room response = reverberant speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x response DFT = reverberant DFT

In narrow sub-bands, the response DFT is slowly varying

FDLP envelope of a band is given by an all-pole model, defined by its all-pole parameters and a gain

When the sub-band signal is multiplied by this slowly varying response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain normalized
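The effect of a constant multiplier on the AR model can be sketched numerically. The example assumes the idealized case in which the sub-band response is exactly constant, so the DCT sequence is simply scaled; the sequence `c`, the scale factor, and the function name `lp_fit` are made up for illustration.

```python
import numpy as np


def lp_fit(c, order):
    """Autocorrelation-method LP: returns predictor coefficients and gain."""
    r = np.array([np.dot(c[:len(c) - lag], c[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])


# Assumed sub-band DCT sequence and a constant sub-band response of 0.25.
k = np.arange(64)
c = 0.9 ** k * np.cos(0.5 * k)

a_clean, g_clean = lp_fit(c, order=2)
a_scaled, g_scaled = lp_fit(0.25 * c, order=2)

print(a_clean, a_scaled)        # all-pole parameters are unchanged
print(g_scaled / g_clean)       # only the gain changes, by the squared scale
```

Since only the gain carries the multiplier, an envelope rebuilt from the all-pole parameters with a fixed gain is unaffected by it in this idealized setting.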

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Outline of Applications

[Block diagram blocks: input signal, sub-band decomposition, FDLP (AM and FM outputs), gain normalization, quantization; outputs: short-term features for speaker & speech recognition, modulation features for phoneme recognition, wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input -> DCT -> sub-band window -> FDLP -> gain norm -> energy integration -> log + DCT -> features]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
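The steps above can be sketched as follows, with several assumptions: the sub-band envelopes are sampled at 8 kHz, the bands are already on a mel-like scale (the frequency integration is omitted), 13 cepstral coefficients are kept, and deltas are simple numerical gradients. All names and the toy envelopes are hypothetical.

```python
import numpy as np


def dct_matrix(n_out, n_in):
    """First n_out rows of an unnormalized DCT-II basis for inputs of length n_in."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n_in))


def short_term_features(env, fs=8000, win_ms=25, shift_ms=10, n_ceps=13):
    """env: (bands, samples) sub-band envelopes -> (frames, 3 * n_ceps) features."""
    win = fs * win_ms // 1000
    shift = fs * shift_ms // 1000
    n_bands, n_samp = env.shape
    n_frames = 1 + (n_samp - win) // shift
    # Integrate each band's envelope over 25 ms windows with a 10 ms shift.
    energies = np.stack([env[:, i * shift:i * shift + win].sum(axis=1)
                         for i in range(n_frames)])
    # Log and DCT along the frequency axis give cepstral coefficients.
    ceps = np.log(energies + 1e-12) @ dct_matrix(n_ceps, n_bands).T
    delta = np.gradient(ceps, axis=0)        # simplified delta (assumption)
    accel = np.gradient(delta, axis=0)       # simplified acceleration (assumption)
    return np.hstack([ceps, delta, accel])


# Assumed toy input: 24 bands, one second of positive envelopes.
t = np.arange(8000) / 8000.0
env = np.stack([1.0 + 0.5 * np.sin(2 * np.pi * (b + 1) * t) for b in range(24)])
feats = short_term_features(env)
print(feats.shape)
```

With 13 static coefficients plus deltas and accelerations, each frame has 39 dimensions, matching the feature size on the slide.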

Speech Recognition

• TIDIGITS Database (8 kHz)
  - Clean training data; test data can be clean or naturally reverberated
• HMM-GMM system
  - Whole-word HMM models trained on clean speech
  - Performance in terms of word error rate (WER)
• Features
  - PLP features with cepstral mean subtraction (CMS)
  - Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  - FDLP short-term (FDLP-S) features - 39 dim

Speech Recognition

[Bar chart: WER (axis 0-20) for PLP-CMS, LTLSS, and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
  - Has telephone speech and far-field speech
• GMM-UBM system
  - Trained on a large set of development speakers
  - Adapted on the enrollment data from the target speaker
  - Nuisance attribute projection (NAP) on supervectors
  - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
• Features with warping [Pelecanos 2001]
  - Mel Frequency Cepstral Coefficients (MFCCs)
  - FDLP short-term (FDLP-S) features
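A short sketch of how the detection cost is evaluated at a given threshold. The cost parameters below (miss cost 10, false-alarm cost 1, target prior 0.01) are assumed values chosen so that the weights reduce to 0.1 and 0.99; the scores and the threshold are made up.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm probabilities when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = c_miss * p_target * Pmiss + c_fa * (1 - p_target) * Pfa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


p_miss, p_fa = error_rates([2.1, 1.4, 0.2, 3.0],
                           [-1.0, 0.5, -0.3, 1.6, -2.2],
                           threshold=1.0)
dcf = detection_cost(p_miss, p_fa)
print(p_miss, p_fa, dcf)
```

Because the false-alarm weight is roughly ten times the miss weight, operating points with few false acceptances are favored.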

Speaker Verification

[Bar chart: DCF (axis 10-30) for MFCC and FDLP-S in telephone, microphone, and cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

Modulation Features

Modulation Feature Extraction

[Block diagram: Input -> DCT -> critical-band window -> FDLP -> static and dynamic compression -> DCT (200 ms) -> sub-band features]

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands give a feature of 476 dim
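The static branch and the dimension count can be sketched as follows. The envelope sampling rate of 400 Hz and the toy envelopes are assumptions, and the dynamic compression loops are not implemented here; the dynamic stream is only accounted for in the dimension arithmetic.

```python
import numpy as np


def modulation_spectrum(env_segment, n_coeffs=14):
    """Log-compress a sub-band envelope segment and keep the first DCT-II coefficients."""
    log_env = np.log(env_segment + 1e-12)
    n = len(log_env)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ log_env


fs_env = 400                         # assumed envelope sampling rate in Hz
seg_len = fs_env * 200 // 1000       # 200 ms segment -> 80 samples
n_bands = 17

# Assumed toy envelopes, one per sub-band.
t = np.arange(seg_len) / fs_env
env_bands = np.stack([1.0 + 0.5 * np.cos(2 * np.pi * (b + 1) * t) for b in range(n_bands)])

static_feats = np.stack([modulation_spectrum(band) for band in env_bands])

# DCT-II coefficient k of a 200 ms segment corresponds to k / (2 * 0.2 s) Hz.
mod_freqs = np.arange(14) / (2 * 0.2)
total_dim = 2 * static_feats.size    # static + dynamic streams
print(static_feats.shape, mod_freqs[-1], total_dim)
```

Fourteen coefficients of a 200 ms segment span 0 to 32.5 Hz, consistent with the 0-35 Hz range, and 17 bands times 14 coefficients times two streams gives 476 dimensions.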

Phoneme Recognition

• TIMIT Database (8 kHz)
  - Clean training data; test data can be clean, additive noise, reverberated, or telephone channel
• Multi-layer perceptron (MLP) based system
  - MLPs estimate phoneme posteriors
  - Hidden Markov model (HMM) - MLP hybrid model
  - Performance in phoneme error rate (PER)
• Features
  - Perceptual linear prediction (PLP) - 9 frame context
  - Advanced ETSI standard [ETSI2002] - 9 frame context
  - FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

[Bar chart: PER (axis 30-75) for PLP-9, ETSI-9, and FDLP-M on clean, additive noise, reverberant, and telephone test data]

Audio Coding

[Block diagram blocks: Input, QMF analysis (bands 1-32), FDLP with envelope (Env) and carrier (Carr) outputs, MDCT, quantizers Q and Q^{-1}, IMDCT, multiplier, QMF synthesis (bands 1-32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores (axis 20-100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidences
- Perceptual importance of modulation frequencies [Drullman et al 1994]
- Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Page 11: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Introduction Sub-band speech and audio signals - product of

smooth modulation with a fine carrier

Non-Unique

AM

FM

119909 (119905 )=119898 (119905 )lowast cos 120596119900119905+120593 (119905 )

Desired Properties of AM

Linearity: $\alpha\,x(t) \rightarrow \alpha\,m(t)$

Continuity: $x(t) + \delta x(t) \rightarrow m(t) + \delta m(t)$

Harmonicity: $\cos(\omega_o t) \rightarrow 1$

Uniquely satisfied by the analytic signal

Desired Properties of AM

$\mathcal{H}$ - Hilbert transform

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$ - analytic signal

$m(t)$, obtained from the magnitude of $x_a(t)$ - Hilbert envelope

$\omega(t) = \omega_o + \frac{d}{dt}\varphi(t)$

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
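As a hedged illustration of the explicit route that the slide wants to avoid, the sketch below builds a discrete analytic signal by doubling the lower half of the DFT and zeroing the upper half, then takes its magnitude. The function names are hypothetical, the naive DFT is only meant for short signals, and the magnitude (rather than its square) is returned for simplicity. It is a minimal sketch assuming a zero-mean input, not the author's implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; fine for short illustrative signals."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n_pts = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def analytic_signal(x):
    """Double bins k < N/2 and zero bins k >= N/2.

    Assumes a zero-mean input, so doubling the DC bin has no effect.
    """
    n_pts = len(x)
    spec = dft(x)
    spec_a = [2 * spec[k] if k < n_pts // 2 else 0.0 for k in range(n_pts)]
    return idft(spec_a)


def hilbert_envelope(x):
    """Magnitude of the analytic signal."""
    return [abs(v) for v in analytic_signal(x)]


# Harmonicity check: a pure cosine should have a flat envelope.
N = 32
carrier = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
envelope = hilbert_envelope(carrier)
```

For the cosine above, the envelope stays at 1 for every sample, and scaling the input scales the envelope by the same factor, which matches the harmonicity and linearity properties listed earlier.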

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain, for $n = 0 \ldots N-1$

$X_a[k] = \begin{cases} 2X[k] & k < N/2 \\ 0 & k \ge N/2 \end{cases}$

Let $q[n]$ - even-symmetrized version of $x[n]$

$Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$Q_a[k] = \begin{cases} 2Q[k] & k < N \\ 0 & k \ge N \end{cases}$

N-point DCT, zero-padded with N zeros

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\} & k < N \\ 0 & k \ge N \end{cases}$

We have shown - $Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

Spectrum of even-sym analytic signal = zero-padded DCT sequence

Signal spectrum → power spectrum ↔ ($\mathcal{F}$) auto-corr

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Hilb env of even-symm signal ↔ ($\mathcal{F}$) auto-corr of DCT sequence

LP on the auto-corr of DCT → AR model of Hilb env

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

Speech → DCT → LP → FDLP Env

[Figure: Speech, Hilb Env and FDLP Env]
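The DCT followed by LP chain can be sketched in a few lines. This is a minimal, standard-library illustration under stated assumptions: an unnormalised DCT-II, the autocorrelation method with a Levinson-Durbin recursion, and hypothetical function names (`dct2`, `autocorr`, `levinson`, `fdlp_envelope`). It is not the author's implementation, and the naive transforms are only suitable for short signals.

```python
import cmath
import math


def dct2(x):
    """Unnormalised DCT-II: y[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]


def autocorr(y, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] (autocorrelation method)."""
    return [sum(y[k] * y[k + lag] for k in range(len(y) - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, gain) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x via LP on its DCT."""
    n_pts = len(x)
    a, gain = levinson(autocorr(dct2(x), order), order)
    env = []
    for n in range(n_pts):
        # The DCT-II maps time index n to the angle pi (2n+1) / (2N) in [0, pi].
        w = math.pi * (2 * n + 1) / (2 * n_pts)
        denom = abs(sum(a[j] * cmath.exp(-1j * w * j) for j in range(order + 1)))
        env.append(gain / denom ** 2)
    return env


# A single click at n = 20 should give an envelope that peaks near n = 20.
N = 64
click = [0.0] * N
click[20] = 1.0
env = fdlp_envelope(click, order=2)
peak = max(range(N), key=lambda n: env[n])
```

The click becomes a cosine along the DCT index, so the all-pole model places a pole pair at the matching angle and the envelope peaks close to the click position. Raising `order` lets the model follow more detail in the envelope.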

FDLP for Speech Representation

DCT → LP → FDLP Spectrogram

[Figure: time-frequency planes (axes Time, Freq) for the FDLP Spectrogram and for Conventional Approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Resolution of FDLP Analysis

[Figure: FDLP and Mel panels, each showing Sig and FDLP Env]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT

In narrow sub-bands, the response is slowly varying

The FDLP envelope of each band is determined by its all-pole parameters … and the gain of the AR model

When the sub-band signal is multiplied by the response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.
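The effect of gain normalization can be checked numerically with a small sketch. It assumes that, within a narrow band, the room response acts as a single constant factor on the sub-band DCT coefficients. The coefficient values, the `channel_gain` value and the helper names are hypothetical. Scaling the coefficients leaves the LP coefficients unchanged and only scales the AR gain, so an envelope rebuilt with a fixed gain (for example 1) no longer depends on the channel factor.

```python
def autocorr(y, max_lag):
    """Unnormalised autocorrelation r[0..max_lag]."""
    return [sum(y[k] * y[k + lag] for k in range(len(y) - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, gain) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err


def lp_model(y, order):
    """LP coefficients and gain of the sequence y."""
    return levinson(autocorr(y, order), order)


# Hypothetical DCT coefficients of one narrow sub-band and a constant channel factor.
band = [0.9, -0.4, 0.7, 0.1, -0.6, 0.3, 0.5, -0.2, 0.05, -0.35]
channel_gain = 0.25
distorted = [channel_gain * v for v in band]

a_clean, g_clean = lp_model(band, order=3)
a_dist, g_dist = lp_model(distorted, order=3)
gain_ratio = g_dist / g_clean  # expected to equal channel_gain squared
```

Because `a_clean` and `a_dist` agree, the two envelopes differ only through the gain term, which is the quantity that gain normalization discards.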

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Outline of Applications

Input Signal → Sub-band Decomposition → FDLP → AM, FM

Processing blocks: Gain Norm, Quant

• Short-term Features for Speaker & Speech Recog
• Modulation Features for Phoneme Recog
• Wide-band Speech & Audio Coding

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.


Short-term Features

Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
• Integration along the frequency axis to convert to mel scale
• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis
• Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
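The integration and cepstral steps listed above can be sketched as follows. This is a minimal illustration assuming the sub-band envelope is sampled at the signal rate (8 kHz here, as in the TIDIGITS setup) and an unnormalised DCT-II across bands. The function names and the flat test envelope are hypothetical, and the mel integration and delta features are left out.

```python
import math


def integrate_envelope(env, fs, win_s=0.025, hop_s=0.010):
    """Sum the envelope over 25 ms windows taken every 10 ms."""
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    return [sum(env[s:s + win]) for s in range(0, len(env) - win + 1, hop)]


def cepstra(band_energies, n_ceps):
    """Log followed by an unnormalised DCT-II along the frequency (band) axis."""
    n_bands = len(band_energies)
    logs = [math.log(e) for e in band_energies]
    return [
        sum(logs[b] * math.cos(math.pi * (2 * b + 1) * k / (2 * n_bands)) for b in range(n_bands))
        for k in range(n_ceps)
    ]


fs = 8000                      # Hz; envelope assumed to be at the signal rate
flat_env = [2.0] * 1000        # hypothetical constant sub-band envelope
frames = integrate_envelope(flat_env, fs)

# One frame with identical energy in 8 hypothetical bands.
coeffs = cepstra([400.0] * 8, n_ceps=4)
```

With a flat spectrum across bands, only the zeroth cepstral coefficient is non-zero, which is a quick sanity check on the log plus DCT step.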

Speech Recognition

TIDIGITS Database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features - 39 dim

[Figure: WER for PLP-CMS, LTLSS and FDLP-S under Clean and Reverb conditions; axis ticks 0-20]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
• Mel Frequency Cepstral Coefficients (MFCCs)
• FDLP short-term (FDLP-S) features

[Figure: DCF for MFCC and FDLP-S under Tel, Mic and Cross domain conditions; axis ticks 10-30]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
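As a small worked example of the detection cost function quoted above, the sketch below evaluates DCF = 0.99 Pfa + 0.1 Pmiss at one operating point. The false-alarm and miss rates are hypothetical values chosen only for the arithmetic, and the function name is an assumption.

```python
def detection_cost(p_fa, p_miss, w_fa=0.99, w_miss=0.1):
    """Weighted sum of false-alarm and miss probabilities."""
    return w_fa * p_fa + w_miss * p_miss


# Hypothetical operating point: 2% false alarms, 10% misses.
cost = detection_cost(p_fa=0.02, p_miss=0.10)
```

Here the cost works out to 0.99 x 0.02 + 0.1 x 0.10 = 0.0298, which shows how strongly the weighting penalizes false alarms relative to misses.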


Modulation Feature Extraction

Input → DCT → Critical-band Window → FDLP → Static / Dynamic compression → DCT (200 ms) → Sub-band Feat

• Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
• 14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands give a feature of 476 dim
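The static branch and the feature-dimension arithmetic can be sketched as below. This is a hedged illustration: the envelope sampling rate of 400 Hz, the test envelope and the function name are assumptions not given in the slides, the DCT is an unnormalised DCT-II, and the dynamic compression loops are not modeled.

```python
import math


def modulation_spectrum(envelope, n_coeffs=14):
    """Log-compress an envelope segment, then keep the first DCT-II coefficients."""
    n_pts = len(envelope)
    logs = [math.log(v) for v in envelope]
    return [
        sum(logs[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_coeffs)
    ]


ENV_RATE_HZ = 400   # assumed envelope sampling rate (not stated in the slides)
SEGMENT_S = 0.2     # 200 ms segment from the block diagram

# Hypothetical positive envelope with a 5 Hz modulation.
n_samples = int(round(ENV_RATE_HZ * SEGMENT_S))
segment = [1.0 + 0.5 * math.cos(2 * math.pi * 5.0 * n / ENV_RATE_HZ) for n in range(n_samples)]
static_feats = modulation_spectrum(segment)

N_BANDS = 17        # sub-bands
N_STREAMS = 2       # static and dynamic compression
feature_dim = len(static_feats) * N_STREAMS * N_BANDS

# DCT-II coefficient k of a T-second segment corresponds to k / (2T) Hz.
max_mod_freq = (len(static_feats) - 1) / (2 * SEGMENT_S)
```

With 14 coefficients over a 200 ms segment the highest modulation frequency is 32.5 Hz, in line with the 0-35 Hz range quoted above, and 14 x 2 x 17 gives the 476-dim feature.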

Phoneme Recognition

TIMIT Database (8 kHz)
• Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
• MLPs estimate phoneme posteriors
• Hidden Markov model (HMM) - MLP hybrid model
• Performance in phoneme error rate (PER)

Features
• Perceptual linear prediction (PLP) - 9 frame context
• Advanced ETSI standard [ETSI 2002] - 9 frame context
• FDLP modulation (FDLP-M) features - 476 dim

[Figure: PER for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions; axis ticks 30-75]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.


Audio Coding

Encoder: Input → QMF Analysis (bands 1 … 32) → FDLP → Env → Q; Carr → MDCT → Q

Decoder: Q-1 (Env); Q-1 → IMDCT (Carr); Mul → QMF Synthesis (bands 1 … 32) → Output

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: subjective scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals


Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes


Our Contributions

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences
• Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
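A minimal time-domain example of this idea, assuming the autocorrelation method with a Levinson-Durbin recursion (one common way to minimize the residual sum of squares, not necessarily the one used by the author). The test signal and function names are hypothetical. With the sign convention below, the prediction of x[n] is minus the sum of a[k] x[n-k].

```python
def autocorr(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag]."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, residual_energy) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err


def lp_coefficients(x, order):
    """LP coefficients of x by the autocorrelation method."""
    return levinson(autocorr(x, order), order)


# Hypothetical decaying signal in which each sample is 0.9 times the previous one.
x = [0.9 ** n for n in range(200)]
a, residual = lp_coefficients(x, order=1)
```

For this signal the first-order predictor recovers the decay factor: `a[1]` comes out very close to -0.9, so the predicted sample is about 0.9 times the previous one.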

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]: from Parseval's theorem, the prediction parameters are solved by minimizing the residual energy expressed in the frequency domain

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
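The equations on these slides did not survive extraction. A standard form of the argument, following the usual textbook treatment of linear prediction, is sketched below. The symbols $e[n]$, $A[k]$, $P_x[k]$ and the DFT length $N$ are assumptions of this sketch and may differ from the notation on the original slides.

```latex
\begin{align}
  e[n] &= x[n] + \sum_{i=1}^{p} a_i\, x[n-i]
       && \text{(prediction residual)} \\
  A[k] &= 1 + \sum_{i=1}^{p} a_i\, e^{-j 2\pi i k / N}
       && \text{(inverse filter)} \\
  \sum_{n} e[n]^2 &= \frac{1}{N} \sum_{k=0}^{N-1} P_x[k]\, |A[k]|^2,
       \qquad P_x[k] = |X[k]|^2
       && \text{(Parseval)} \\
  \hat{P}_x[k] &= \frac{G}{|A[k]|^2}
       && \text{(all-pole model, gain } G\text{)}
\end{align}
```

Minimizing the residual energy over the $a_i$ is therefore equivalent to fitting the all-pole model to the power spectrum, with $G$ equal to the minimum residual energy.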

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: $x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

Duality

LP in Time and Frequency: LP, FDLP

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
• Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Input → DCT → Critical-band Window → FDLP → Static / Dynamic compression → DCT (200 ms) → Sub-band Feat

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

Signal + Noise → DCT → Critical-band Window → IDFT → | |^2 → DFT → Linear Pred → FDLP Env

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Input → DCT → Critical-band Window → IDFT → | |^2 → Wiener Filtering (using VAD) → DFT

• Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
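The subtraction step can be illustrated with a deliberately simple sketch. It assumes a constant noise envelope estimated as the mean over the frames that a VAD marks as non-speech, and it applies a floor so that the result stays positive. The slides mention Wiener filtering, which this sketch does not implement. The function name, the floor value and the sample data are hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech frames.

    Requires at least one non-speech frame; otherwise the mean is undefined.
    """
    noise_vals = [e for e, speech in zip(noisy_env, is_speech) if not speech]
    noise_level = sum(noise_vals) / len(noise_vals)
    # Keep a small fraction of the noisy envelope as a floor to avoid
    # zero or negative values.
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Hypothetical noisy envelope: noise at 0.2 throughout, speech in the middle.
noisy_env = [0.2, 0.2, 1.2, 2.2, 1.2, 0.2]
vad = [False, False, True, True, True, False]
clean_est = subtract_noise_envelope(noisy_env, vad)
```

In the speech frames the estimate is the noisy envelope minus 0.2, and in the non-speech frames it falls to the floor. This relies on the additivity of uncorrelated signal and noise envelopes stated above.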

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 12: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Desired Properties of AM Linearity

Continuity

Harmonicity

120572 119909 (119905 )120572119898 (119905 )

119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )

cos (120596119900119905)1

Uniquely satisfied by the analytic signal

Desired Properties of AM

119909 (119905) H 119895+iquest iquestoriquest 119898(119905)

120596 (119905)120596119900+120593 (119905)

119889119889119905

- Hilbert transform

119909119886 (119905)

- analytic signal ndash Hilbert envelope

However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Desired Properties of AM

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

119883 [119896] 119883 119886 [119896]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N

N-point DCT

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with

N-zeros

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram with stages: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; applications: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009

Short-term Features

[Block diagram with stages: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)
• Integration along the frequency axis converts to the mel scale
• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis
• Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
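The steps above can be sketched in a few lines of pure Python. This is a rough illustration only: the mel-scale integration is skipped, the sub-band envelopes are synthetic stand-ins for FDLP envelopes, the choice of 15 bands and 13 cepstral coefficients is an assumption (13 static + 13 delta + 13 acceleration = 39), and all function names are hypothetical.

```python
import math


def integrate_envelope(env, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Sum a sub-band envelope over sliding windows (one energy per frame)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    return [sum(env[start:start + win])
            for start in range(0, len(env) - win + 1, shift)]


def cepstra(band_energies, n_ceps=13, floor=1e-10):
    """Log followed by a DCT-II across the band (frequency) axis."""
    logs = [math.log(max(e, floor)) for e in band_energies]
    n_bands = len(logs)
    return [sum(logs[b] * math.cos(math.pi * (b + 0.5) * c / n_bands)
                for b in range(n_bands))
            for c in range(n_ceps)]


def add_deltas(frames):
    """Append delta and acceleration terms (simple central differences)."""
    def diff(seq):
        last = len(seq) - 1
        return [[(seq[min(t + 1, last)][i] - seq[max(t - 1, 0)][i]) / 2.0
                 for i in range(len(seq[t]))]
                for t in range(len(seq))]
    d1 = diff(frames)
    d2 = diff(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


sample_rate = 8000
n_bands = 15
# Hypothetical one-second sub-band envelopes (always positive).
envelopes = [[1.0 + 0.5 * math.sin(2 * math.pi * (b + 1) * t / sample_rate)
              for t in range(sample_rate)]
             for b in range(n_bands)]
energies = [integrate_envelope(env, sample_rate) for env in envelopes]
n_frames = len(energies[0])
static = [cepstra([energies[b][t] for b in range(n_bands)])
          for t in range(n_frames)]
features = add_deltas(static)  # one 39-dimensional vector per 10 ms frame
```

At 8 kHz the 25 ms window is 200 samples and the 10 ms shift is 80 samples, so one second of envelope gives 98 frames.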

Speech Recognition

• TIDIGITS Database (8 kHz)
– Clean training data; test data can be clean or naturally reverberated
• HMM-GMM system
– Whole-word HMM models trained on clean speech
– Performance in terms of word error rate (WER)
• Features
– PLP features with cepstral mean subtraction (CMS)
– Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
– FDLP short-term (FDLP-S) features – 39 dim

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S on the Clean and Reverb test conditions]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Speaker Verification

• NIST 2008 Speaker Recognition Evaluation (SRE)
– Has telephone speech and far-field speech
• GMM-UBM system
– Trained on a large set of development speakers
– Adapted on the enrollment data from the target speaker
– Nuisance attribute projection (NAP) on supervectors
– Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
• Features with warping [Pelecanos 2001]
– Mel Frequency Cepstral Coefficients (MFCCs)
– FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S on the Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011
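As a worked example of the detection cost function quoted above, the following sketch computes the miss and false-alarm rates at a fixed threshold and combines them with the weights 0.99 and 0.1. The scores and the threshold are made-up illustrative values, and the function names are hypothetical.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa


def detection_cost(p_miss, p_fa):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss, as stated on the slide."""
    return 0.99 * p_fa + 0.1 * p_miss


# Hypothetical verification scores.
targets = [2.0, 1.5, 0.3, 2.2]
impostors = [-1.0, 0.5, -0.2, 0.1, 1.8]
p_miss, p_fa = error_rates(targets, impostors, threshold=1.0)
dcf = detection_cost(p_miss, p_fa)  # 0.99 * 0.2 + 0.1 * 0.25
```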

Modulation Feature Extraction

[Block diagram with stages: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms segments), Sub-band Feat]

• Static compression is a logarithm – it reduces the huge dynamic range in the sub-band envelope
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]
• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
• 14 static and 14 dynamic modulation spectral components (0–35 Hz) in each of 17 sub-bands give a feature of 14 × 2 × 17 = 476 dimensions
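The two compression paths and the resulting 476-dimensional feature can be sketched as follows. This is a simplified illustration, not the published front end: the dynamic path uses a single divider/low-pass loop with an arbitrary smoothing constant `alpha`, the envelopes are synthetic, and the envelope sampling rate (80 samples per 200 ms segment) and all function names are assumptions.

```python
import math


def static_compress(env, floor=1e-10):
    """Static compression: logarithm of the sub-band envelope."""
    return [math.log(max(v, floor)) for v in env]


def adaptation_loop(env, alpha=0.9, floor=1e-10):
    """One divider + first-order low-pass loop (simplified dynamic path)."""
    state = 1.0
    out = []
    for v in env:
        y = v / max(state, floor)                  # divider
        state = alpha * state + (1.0 - alpha) * y  # low-pass of the output
        out.append(y)
    return out


def modulation_spectrum(compressed, n_coeffs=14):
    """First n_coeffs DCT-II coefficients of a compressed envelope segment."""
    n = len(compressed)
    return [sum(compressed[t] * math.cos(math.pi * (t + 0.5) * k / n)
                for t in range(n))
            for k in range(n_coeffs)]


def subband_features(env):
    """14 static + 14 dynamic modulation components for one sub-band."""
    return (modulation_spectrum(static_compress(env))
            + modulation_spectrum(adaptation_loop(env)))


n_bands = 17
seg_len = 80  # hypothetical: 200 ms of envelope sampled at 400 Hz
envelopes = [[1.0 + 0.9 * math.cos(2 * math.pi * (b + 1) * t / seg_len)
              for t in range(seg_len)]
             for b in range(n_bands)]
feature = [v for env in envelopes for v in subband_features(env)]
steady = adaptation_loop([4.0] * 300)  # stationary input settles near sqrt(4)
```

For a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of about 2.5·k Hz, so coefficients 0 to 13 span roughly 0 to 32.5 Hz, in line with the 0–35 Hz range quoted on the slide. A stationary input to the single loop settles at its square root, which is the compressive behaviour the loop is meant to show.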

Phoneme Recognition

• TIMIT Database (8 kHz)
– Clean training data; test data can be clean, additive noise, reverberated or telephone channel
• Multi-layer perceptron (MLP) based system
– MLPs estimate phoneme posteriors
– Hidden Markov model (HMM) – MLP hybrid model
– Performance in phoneme error rate (PER)
• Features
– Perceptual linear prediction (PLP) – 9 frame context
– Advanced ETSI standard [ETSI 2002] – 9 frame context
– FDLP modulation (FDLP-M) features – 476 dim

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M on the Clean, Add Noise, Reverb and Tel test conditions]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010

Audio Coding

[Block diagram with stages: Input, QMF Analysis (bands 1 to 32), FDLP (Env and Carr outputs), MDCT, Q, Q⁻¹, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

[Bar chart: subjective scores on a 0–100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• Employing AR modeling for estimating amplitude modulations
• Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP – improvements in reverberant speech recognition
• Modulation feature extraction – phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel – Jason Pelecanos, Mohamed Omar
• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences
– Spectro-temporal receptive fields [Shamma et al. 2001]
• Psycho-physical evidences
– Perceptual importance of modulation frequencies [Drullman et al. 1994]
– Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples:

$x[n] \approx \sum_{i=1}^{p} a_i \, x[n-i]$

[Diagram: samples n-3, n-2 and n-1 weighted by a3, a2 and a1 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares.
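A small worked example may help here: for an order-2 predictor the least-squares problem reduces to a 2×2 system of normal equations. The sketch below assumes the sign convention x[n] ≈ a1·x[n-1] + a2·x[n-2] and sums only over samples whose two predecessors exist; the function name and the test signal are illustrative choices.

```python
import math


def lp_order2(x):
    """Least-squares predictor x[n] ~ a1*x[n-1] + a2*x[n-2]."""
    c11 = c12 = c22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        c11 += x[n - 1] * x[n - 1]
        c12 += x[n - 1] * x[n - 2]
        c22 += x[n - 2] * x[n - 2]
        b1 += x[n] * x[n - 1]
        b2 += x[n] * x[n - 2]
    # Solve [[c11, c12], [c12, c22]] @ [a1, a2] = [b1, b2] by Cramer's rule.
    det = c11 * c22 - c12 * c12
    a1 = (b1 * c22 - b2 * c12) / det
    a2 = (b2 * c11 - b1 * c12) / det
    return a1, a2


# A pure cosine obeys x[n] = 2*cos(omega)*x[n-1] - x[n-2] exactly.
omega = 0.3
x = [math.cos(omega * n) for n in range(50)]
a1, a2 = lp_order2(x)
residual = sum((x[n] - a1 * x[n - 1] - a2 * x[n - 2]) ** 2
               for n in range(2, len(x)))
```

For this signal the fitted coefficients recover the recurrence of the cosine and the residual sum of squares is numerically zero.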

AR Model of Power Spectrum

Filter interpretation [Makhoul 1975]: from Parseval's theorem, the residual sum of squares can also be written in the frequency domain, so the model parameters are solved by minimizing a frequency-domain error.

Solution of the linear prediction yields an all-pole model of the power spectrum.

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).

Hilbert Envelope – Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j \, H[x(t)]$

where H denotes the Hilbert transform.

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$.
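As an illustration of this definition, the sketch below builds a discrete-time analytic signal by doubling the first half of the DFT and zeroing the second half, then takes the squared magnitude. It assumes a signal with no DC and no Nyquist component, uses a direct O(N²) DFT for clarity, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spec):
    """Direct inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)) / n
            for m in range(n)]


def hilbert_envelope(x):
    """Squared magnitude of the discrete-time analytic signal of x."""
    n = len(x)
    spec = dft(x)
    # Keep and double the first half of the spectrum, zero the second half.
    analytic_spec = [2.0 * spec[k] if k < n / 2 else 0.0 for k in range(n)]
    return [abs(v) ** 2 for v in idft(analytic_spec)]


# Amplitude-modulated cosine: slow modulation times a faster carrier.
n_samples = 32
modulation = [1.0 + 0.5 * math.cos(2 * math.pi * n / n_samples)
              for n in range(n_samples)]
signal = [modulation[n] * math.cos(2 * math.pi * 8 * n / n_samples)
          for n in range(n_samples)]
envelope = hilbert_envelope(signal)
```

For this test signal the envelope equals the squared modulation, so the carrier is removed and only the slow amplitude variation remains.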

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

LP in Time and Frequency

[Diagram: duality between LP and FDLP]

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
• Gain normalization deals with short- and long-term distortions within a single long-term frame
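The CMS assumption can be made concrete with a toy example: a distortion that is the same in every frame adds a constant vector in the cepstral domain, and subtracting the per-utterance mean removes it. The feature values and the channel offset below are made up for illustration, and the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the mean over all frames from every feature dimension."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n_frames for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in frames]


# Hypothetical clean cepstra and a fixed channel offset added to each frame.
clean = [[1.0, 0.5], [2.0, -0.5], [0.0, 1.5]]
channel = [0.7, -0.3]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)  # matches cms_clean
```

The same idea fails when the distortion changes within the averaging span or is longer than the analysis frame, which is the case the long-term methods on this slide address.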

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

[Block diagram with stages: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).

[Block diagram with stages: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering with VAD, DFT]

• Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009
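A toy version of the envelope subtraction step is sketched below. It is a deliberately simplified stand-in: the noise envelope is taken as a single average level over the frames a VAD marks as non-speech, plain subtraction with a small floor replaces the Wiener filtering shown in the diagram, and the envelope values, VAD flags and function names are all made up for illustration.

```python
def estimate_noise_level(noisy_env, speech_flags):
    """Average envelope over samples the VAD marks as non-speech."""
    noise = [v for v, is_speech in zip(noisy_env, speech_flags)
             if not is_speech]
    return sum(noise) / len(noise) if noise else 0.0


def subtract_noise(noisy_env, speech_flags, floor=1e-3):
    """Subtract the estimated noise level, keeping the envelope positive."""
    level = estimate_noise_level(noisy_env, speech_flags)
    return [max(v - level, floor * v) for v in noisy_env]


# Hypothetical clean envelope plus a constant noise envelope of 1.0.
clean_env = [0.0, 0.0, 4.0, 9.0, 4.0, 0.0, 0.0]
noisy_env = [v + 1.0 for v in clean_env]
speech_flags = [False, False, True, True, True, False, False]
compensated = subtract_noise(noisy_env, speech_flags)
```

The floor keeps the compensated envelope strictly positive, which matters when a logarithm or an all-pole fit follows.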

Introduction

• Conventional signal analysis starts with the estimation of the short-term spectrum (10–40 ms)
• Spectrum is sampled at a preset rate before further modeling/processing stages
• Contextual information is typically processed with time-series models such as HMM

[Figure: time–frequency plane with axes Time and Frequency]

Desired Properties of AM

Uniquely satisfied by the analytic signal

[Diagram: x(t) passes through H, is multiplied by j and added to x(t) to give x_a(t), whose magnitude yields m(t)]

• H – Hilbert transform
• x_a(t) – analytic signal
• m(t) – Hilbert envelope
• $\omega(t) = \omega_o + \frac{d}{dt}\varphi(t)$

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals.

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal: for a signal $x[n]$, $n = 0, \ldots, N-1$, with zero mean in the time and frequency domains,

$$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$$

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$. Then

$$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$$

The N-point DCT, zero-padded with N zeros, is

$$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$$

We have shown:

$$Q_a[k] = F\{q_a[n]\} = y[k]$$

that is, the spectrum of the even-symmetrized analytic signal equals the zero-padded DCT sequence.

Just as a signal and its spectrum form a Fourier pair (F), and the power spectrum and the auto-correlation form a Fourier pair,

$$F\{|q_a[n]|^2\} = r_y[\tau]$$

that is, the spectrum of the Hilbert envelope of the even-symmetrized signal equals the auto-correlation of the DCT sequence.

• Hilbert envelope of the even-symmetrized signal and auto-correlation of the DCT are related by F
• Auto-correlation of the DCT followed by LP gives the AR model of the Hilbert envelope

LP in Time and Frequency

[Diagram: duality. LP on the time signal gives a model of the power spectrum (Time → LP → Power Spec); LP on the DCT gives a model of the Hilbert envelope (DCT → LP → Hilb Env)]

FDLP

Linear prediction on the cosine transform of the signal

[Diagram: Speech → DCT → LP → FDLP Env, shown together with the Hilbert envelope (Hilb Env)]
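The duality behind FDLP can be seen in a tiny worked example: a unit impulse at time n0 has a DCT that is a cosine in k, and an order-2 all-pole fit to that cosine peaks at the angle corresponding to n0. The sketch below assumes an unnormalised DCT-II, the autocorrelation method with closed-form order-2 coefficients, and arbitrary illustrative values for `N` and `n0`.

```python
import cmath
import math

N = 64   # segment length (illustrative)
n0 = 20  # position of the impulse in time (illustrative)

# DCT-II of a unit impulse at n0: a cosine in the DCT index k.
y = [math.cos(math.pi * (n0 + 0.5) * k / N) for k in range(N)]

# Autocorrelation of the DCT sequence at lags 0, 1 and 2.
r = [sum(y[k] * y[k + lag] for k in range(N - lag)) for lag in range(3)]
rho1, rho2 = r[1] / r[0], r[2] / r[0]

# Closed-form order-2 predictor y[k] ~ a1*y[k-1] + a2*y[k-2].
a1 = rho1 * (1.0 - rho2) / (1.0 - rho1 ** 2)
a2 = (rho2 - rho1 ** 2) / (1.0 - rho1 ** 2)


def envelope(n):
    """All-pole model evaluated at the angle that corresponds to time n."""
    z = cmath.exp(-1j * math.pi * (n + 0.5) / N)
    return 1.0 / abs(1.0 - a1 * z - a2 * z * z) ** 2


peak = max(range(N), key=envelope)  # time index where the model peaks
```

The model places its peak next to the impulse position, mirroring how time-domain LP places spectral peaks at the frequencies of sinusoids.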

FDLP for Speech Representation

[Diagram: DCT → LP applied in sub-bands gives the FDLP spectrogram (axes Time and Freq), shown against conventional approaches (axes Time and Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels, with FDLP and Mel analyses compared]

Res = (Critical Width)⁻¹

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with coefficients a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
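As a rough illustration of the least-squares solution described on this slide, the sketch below uses the autocorrelation method. The function name `lpc`, the sign convention (error filter with a[0] = 1, so the predictor weights are -a[1:]) and the synthetic AR(2) test signal are assumptions made for this example, not taken from the slides.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lpc(x, order):
    """Autocorrelation-method linear prediction (illustrative helper).

    Returns (a, gain), where a[0] = 1 and the prediction error is
    e[n] = sum_k a[k] * x[n - k]; gain is the minimum residual energy.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Normal equations: Toeplitz(r[0..order-1]) @ coeffs = -r[1..order]
    coeffs = solve_toeplitz(r[:order], -r[1:order + 1])
    gain = r[0] + np.dot(coeffs, r[1:order + 1])
    return np.concatenate(([1.0], coeffs)), gain


# Synthetic AR(2) signal: x[n] = 1.5 x[n-1] - 0.7 x[n-2] + e[n]
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
x = lfilter([1.0], [1.0, -1.5, 0.7], excitation)

a, gain = lpc(x, order=2)
print(a)  # close to the generating filter [1, -1.5, 0.7]
```

With enough samples the estimated coefficients approach those of the generating filter, which is one simple way to sanity-check an LP implementation.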

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual sum of squares equals the integral of the residual power spectrum, so the parameters are solved by minimizing this quantity in the frequency domain

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
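The equations on these slides did not survive extraction. The block below is a sketch of the standard filter interpretation of linear prediction, written under assumed notation (error filter A with a_0 = 1, signal power spectrum P, model spectrum \hat{P}); the symbols are chosen for this illustration and may differ from those on the original slides.

```latex
\begin{align}
  % Prediction residual and error filter (a_0 = 1 is an assumed convention)
  e[n] &= \sum_{k=0}^{p} a_k\, x[n-k], &
  A(\omega) &= \sum_{k=0}^{p} a_k\, e^{-j\omega k} \\
  % Parseval's theorem: residual energy in the frequency domain
  E = \sum_{n} e[n]^2 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |A(\omega)|^2\, P(\omega)\, d\omega, &
  P(\omega) &= |X(\omega)|^2 \\
  % All-pole model of the power spectrum with gain G
  \hat{P}(\omega) &= \frac{G}{|A(\omega)|^2}, &
  G &= E_{\min}
\end{align}
```

Minimizing E over the coefficients a_1, ..., a_p gives the same normal equations as the time-domain least-squares problem.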

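Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

\[ x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\} \]

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

\[ |x_a[n]|^2 = x[n]^2 + \mathcal{H}\{x[n]\}^2 \]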
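The definition can be checked numerically. The sketch below builds an amplitude-modulated cosine (the modulator, carrier frequency and sampling rate are arbitrary values picked for this example) and computes the squared magnitude of the analytic signal with `scipy.signal.hilbert`, which returns the analytic signal rather than the Hilbert transform alone.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000                      # sampling rate in Hz (assumed for the example)
t = np.arange(fs) / fs         # one second of samples

# Slow modulator (4 Hz) on a fine carrier (1 kHz)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = m * np.cos(2 * np.pi * 1000 * t)

analytic = hilbert(x)              # x[n] + j * H{x[n]}
envelope = np.abs(analytic) ** 2   # Hilbert envelope as defined above

# For this band-limited example the envelope equals the squared modulator
print(np.max(np.abs(envelope - m ** 2)))
```

Here every component falls exactly on an FFT bin, so the recovered envelope matches the squared modulator up to rounding; for general finite-length signals edge artifacts can appear.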

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

LP in Time and Frequency

[Figure: duality between LP and FDLP]

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, critical-band window, FDLP, static and dynamic compression, DCT over 200 ms, sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, critical-band window, IDFT, | |^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input, DCT, critical-band window, IDFT, | |^2, Wiener filtering with VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009
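The subtraction step can be pictured with a very small sketch. It assumes a sub-band envelope and a boolean VAD mask are already available, estimates the noise envelope as the mean over non-speech samples, and floors the result to keep it positive. The function name, the flooring rule and the toy data are assumptions for illustration; the slides do not give the exact estimator.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env   : temporal envelope of the noisy sub-band signal
    speech_mask : boolean array, True where the (hypothetical) VAD marks speech
    floor       : fraction of the noisy envelope kept as a lower bound
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated only from the regions the VAD marks as non-speech
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Flooring keeps the envelope positive after subtraction
    return np.maximum(cleaned, floor * noisy_env)


# Toy example: a speech burst of envelope 10 on a constant noise envelope of 2
speech_env = np.zeros(100)
speech_env[40:60] = 10.0
noisy_env = speech_env + 2.0
speech_mask = speech_env > 0

cleaned = subtract_noise_envelope(noisy_env, speech_mask)
```

The additivity of the noise in the envelope domain, stated on the slide, is what makes a plain subtraction a reasonable first approximation.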

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with time and frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 14: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Desired Properties of AM

However, the Hilbert transform filter is infinitely long and can cause artifacts for finite-length signals

Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain for $n = 0 \ldots N-1$

\[ X_a[k] = \begin{cases} 2X[k] & \text{for } k < N/2 \\ 0 & \text{for } k \ge N/2 \end{cases} \]

Let $q[n]$ be the even-symmetrized version of $x[n]$

\[ Q_a[k] = \begin{cases} 2Q[k] & k < N \\ 0 & k \ge N \end{cases}, \qquad Q[k] = 2\,\mathrm{Re}\{X[k]\} \]

N-point DCT, zero-padded with N zeros

\[ y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\} & k < N \\ 0 & k \ge N \end{cases} \]

AR Model of Hilbert Envelopes

We have shown -

\[ Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k] \]

The spectrum of the even-symmetric analytic signal is the zero-padded DCT sequence

Just as the power spectrum and the auto-correlation of a signal form a Fourier transform pair,

\[ \mathcal{F}\{|q_a[n]|^2\} = r_y[\tau] \]

The spectrum of the Hilbert envelope of the even-symmetric signal is the auto-correlation of the DCT sequence

Hence LP on the DCT sequence yields an AR model of the Hilbert envelope
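Since the auto-correlation of the DCT sequence plays the role that the signal auto-correlation plays in ordinary LP, the whole chain can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: an orthonormal DCT-II, the autocorrelation method for LP, no sub-band windowing, and a hypothetical function name `fdlp_envelope`. It is not the exact implementation behind the slides.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20, n_points=None):
    """All-pole (AR) approximation of the temporal envelope of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_points = n if n_points is None else n_points
    # Step 1: cosine transform of the whole segment
    y = dct(x, type=2, norm="ortho")
    # Step 2: auto-correlation of the DCT sequence, lags 0..order
    r = np.array([np.dot(y[:n - k], y[k:]) for k in range(order + 1)])
    # Step 3: linear prediction on the DCT sequence (normal equations)
    coeffs = solve_toeplitz(r[:order], -r[1:order + 1])
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + np.dot(coeffs, r[1:order + 1])
    # Step 4: evaluate G / |A|^2; the angle axis of the AR model maps to time,
    # with sample index i at angle pi * (i + 0.5) / n_points
    theta = np.pi * (np.arange(n_points) + 0.5) / n_points
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2


# A noise burst centred at sample 300: the envelope should peak near there
rng = np.random.default_rng(0)
n = np.arange(1000)
burst = np.exp(-0.5 * ((n - 300) / 20.0) ** 2) * rng.standard_normal(1000)
env = fdlp_envelope(burst, order=10)
peak = int(np.argmax(env))
```

The model order trades smoothness for detail: a low order gives a broad bump around the burst, a higher order follows finer fluctuations of the envelope.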

LP in Time and Frequency

Duality

LP on the time signal models the power spectrum

LP on the DCT of the signal models the Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech signal, its Hilbert envelope, and the FDLP envelope obtained by DCT followed by LP]

FDLP for Speech Representation

[Figure: DCT followed by LP yields the FDLP spectrogram (time vs. frequency), shown alongside conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: signal and FDLP envelope, compared for FDLP and Mel analysis]

Res = (Critical Width)^{-1}

Properties of FDLP Analysis

  • Summarizing the gross temporal variation with a few parameters
      - Model order of FDLP controls the degree of smoothness
      - AR model captures perceptually important high energy regions of the signal
  • Suppressing reverberation artifacts
      - Reverberation is a long-term convolutive distortion
  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Revb Speech (* denotes convolution)

In the long-term DFT domain this translates to a multiplication

Clean DFT x Response DFT = Revb DFT

In narrow sub-bands, the response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and gain

When the sub-band signal is multiplied by the response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
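The effect of a constant sub-band scaling on the AR model can be verified directly. The sketch below assumes the room response in a narrow band acts as a single scale factor (here 3.0, an arbitrary value) and uses a hypothetical helper `fdlp_params`; it shows that only the gain changes, so fixing the gain (unit gain is assumed here as the normalization) removes the scaling.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_params(x, order):
    """Return (a, gain) of the AR model fitted to the DCT of x."""
    y = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    n = len(y)
    r = np.array([np.dot(y[:n - k], y[k:]) for k in range(order + 1)])
    coeffs = solve_toeplitz(r[:order], -r[1:order + 1])
    gain = r[0] + np.dot(coeffs, r[1:order + 1])
    return np.concatenate(([1.0], coeffs)), gain


rng = np.random.default_rng(1)
band = rng.standard_normal(2000) * np.hanning(2000)   # stand-in sub-band signal

a_clean, g_clean = fdlp_params(band, order=8)
a_scaled, g_scaled = fdlp_params(3.0 * band, order=8)  # same band, scaled

# The pole locations (a) are unchanged; only the gain picks up the factor 3^2
print(np.max(np.abs(a_clean - a_scaled)), g_scaled / g_clean)
```

Reconstructing both envelopes as 1 / |A|^2 instead of G / |A|^2 would therefore give identical results for the clean and the scaled band.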

Gain Normalization in FDLP

[Figure: FDLP envelopes without and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters 2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram: Input signal, sub-band decomposition, FDLP giving AM and FM components, gain normalization and quantization, feeding three applications]

  • Short-term features for speaker & speech recognition
  • Modulation features for phoneme recognition
  • Wide-band speech & audio coding

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct 2009

Short-term Features

[Block diagram: Input, DCT, sub-band window, FDLP, gain normalization, energy integration, log + DCT, features]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
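The integration, log and DCT steps can be sketched as follows. The example assumes the sub-band envelopes are already on the final frequency scale (the mel integration step is omitted), keeps 13 cepstral coefficients (an assumption consistent with 39 dimensions after deltas and accelerations), and leaves out the delta computation. The function name and the toy input are illustrative only.

```python
import numpy as np
from scipy.fft import dct


def short_term_cepstra(envelopes, fs, win_ms=25.0, shift_ms=10.0, n_ceps=13):
    """Cepstra from sub-band envelopes of shape (n_bands, n_samples)."""
    envelopes = np.asarray(envelopes, dtype=float)
    win = int(round(fs * win_ms / 1000.0))      # 25 ms integration window
    shift = int(round(fs * shift_ms / 1000.0))  # 10 ms frame shift
    n_frames = 1 + (envelopes.shape[1] - win) // shift
    # Integrate each band's envelope over every window -> (n_frames, n_bands)
    energies = np.stack([
        envelopes[:, i * shift:i * shift + win].sum(axis=1)
        for i in range(n_frames)
    ])
    # Log compression, then DCT along the frequency (band) axis
    log_energies = np.log(energies + 1e-12)
    return dct(log_energies, type=2, norm="ortho", axis=1)[:, :n_ceps]


# Toy input: 20 flat sub-band envelopes, one second at 8 kHz
envelopes = np.ones((20, 8000))
ceps = short_term_cepstra(envelopes, fs=8000)
print(ceps.shape)
```

With flat, identical envelopes in every band only the zeroth cepstral coefficient is non-zero, which makes a convenient sanity check.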

Speech Recognition

TIDIGITS Database (8 kHz)
  • Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  • Whole-word HMM models trained on clean speech
  • Performance in terms of word error rate (WER)

Features
  • PLP features with cepstral mean subtraction (CMS)
  • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  • FDLP short-term (FDLP-S) features - 39 dim

[Bar chart: WER of PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data; axis ticks 0, 10, 20]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters 2008

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)
  • Has telephone speech and far-field speech

GMM-UBM system
  • Trained on a large set of development speakers
  • Adapted on the enrollment data from the target speaker
  • Nuisance attribute projection (NAP) on supervectors
  • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  • Mel Frequency Cepstral Coefficients (MFCCs)
  • FDLP short-term (FDLP-S) features

[Bar chart: DCF of MFCC and FDLP-S for telephone, microphone and cross-domain conditions; axis ticks 10, 20, 30]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011
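For concreteness, the detection cost can be computed from trial scores as sketched below, reading the slide's weights as 0.99 on the false-alarm rate and 0.1 on the miss rate. The scores, the threshold and the function names are made up for the example.

```python
import numpy as np


def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates at a fixed decision threshold."""
    p_miss = float(np.mean(np.asarray(target_scores) < threshold))
    p_fa = float(np.mean(np.asarray(impostor_scores) >= threshold))
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, w_miss=0.1, w_fa=0.99):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss with the default weights."""
    return w_fa * p_fa + w_miss * p_miss


# Hypothetical scores for four target trials and five impostor trials
target_scores = [2.0, 1.5, 0.2, 3.0]
impostor_scores = [-1.0, 0.5, -0.3, 0.1, -2.0]

p_miss, p_fa = error_rates(target_scores, impostor_scores, threshold=0.4)
dcf = detection_cost(p_miss, p_fa)
print(p_miss, p_fa, dcf)
```

In this toy case one of four targets is missed and one of five impostors is accepted, so the cost is 0.99 * 0.2 + 0.1 * 0.25.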

Modulation Feature Extraction

[Block diagram: Input, DCT, critical-band window, FDLP, static and dynamic compression, DCT over 200 ms, sub-band features]

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands give a feature of 476 dim
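The dimension count and the modulation-frequency range can be reproduced with a short sketch. Only the static (logarithmic) branch is implemented; the dynamic compression loops are not. The envelope sampling rate of 400 Hz (80 samples per 200 ms) and the random stand-in envelopes are assumptions for the example.

```python
import numpy as np
from scipy.fft import dct

N_BANDS = 17      # sub-bands
N_COEFFS = 14     # modulation components kept per stream
N_STREAMS = 2     # static and dynamic compression
WINDOW_S = 0.2    # 200 ms analysis window


def static_modulation_features(envelope, n_coeffs=N_COEFFS):
    """Log-compress a sub-band envelope and keep its first DCT coefficients."""
    log_env = np.log(np.asarray(envelope, dtype=float) + 1e-12)
    return dct(log_env, type=2, norm="ortho")[:n_coeffs]


# Stand-in envelopes: 17 bands, 80 samples each (200 ms at an assumed 400 Hz)
rng = np.random.default_rng(0)
envs = rng.uniform(0.1, 1.0, size=(N_BANDS, 80))

static = np.concatenate([static_modulation_features(e) for e in envs])

# DCT-II coefficient k over a window of T seconds sits at k / (2T) Hz
mod_freqs = np.arange(N_COEFFS) / (2 * WINDOW_S)

feature_dim = N_COEFFS * N_STREAMS * N_BANDS
print(static.shape, mod_freqs[-1], feature_dim)
```

With a 200 ms window the 14 retained coefficients span 0 to 32.5 Hz, in line with the 0-35 Hz range on the slide, and 14 x 2 x 17 gives the 476 dimensions.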

Phoneme Recognition

TIMIT Database (8 kHz)
  • Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) - MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) - 9 frame context
  • Advanced ETSI standard [ETSI 2002] - 9 frame context
  • FDLP modulation (FDLP-M) features - 476 dim

[Bar chart: PER of PLP-9, ETSI-9 and FDLP-M for clean, additive noise, reverberant and telephone conditions; axis ticks 30, 45, 60, 75]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA 2010

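Audio Coding

[Block diagram labels: Input, QMF analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr), MDCT, quantizers (Q), inverse quantizers (Q^-1), IMDCT, multiplier (Mul), QMF synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008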

Subjective Evaluations

[Bar chart: listening test scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc. 2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Page 15: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain, for $n = 0, \ldots, N-1$, with spectrum $X[k]$:

$$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$$

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$:

$$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$$

N-point DCT: $y[k] = 4\,\mathrm{Re}\{X[k]\}$ for $k < N$

DCT zero-padded with N zeros:

$$\hat{y}[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$$

We have shown -

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = \hat{y}[k]$$

Spectrum of even-sym analytic signal = zero-padded DCT sequence

Signal → F → Spectrum

Auto-corr → F → Power Spectrum

$$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$$

Spectrum of Hilbert env for even-sym signal = auto-correlation of DCT sequence

Auto-corr of DCT → F → Hilb env of even-symm signal

Auto-corr of DCT → LP → AR model of Hilb env
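The chain above (DCT of the signal, autocorrelation of the DCT sequence, linear prediction, all-pole envelope) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the author's implementation: the function names, the unnormalized DCT-II, the autocorrelation method with Levinson-Durbin recursion, and the half-sample time grid `(t + 0.5) / N` are all assumptions made for the example.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence (naive O(N^2) form)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n)) for t in range(n))
            for k in range(n)]


def autocorr(y, max_lag):
    """Autocorrelation of y for lags 0..max_lag (autocorrelation method)."""
    n = len(y)
    return [sum(y[k] * y[k + lag] for k in range(n - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: returns ([1, a1..ap], residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x, one value per sample."""
    n = len(x)
    a, gain = levinson(autocorr(dct2(x), order), order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n  # time index mapped onto the unit circle
        denom = sum(a[k] * cmath.exp(-1j * theta * k) for k in range(order + 1))
        env.append(gain / abs(denom) ** 2)
    return env


# Illustrative input: a single click at sample 20 of a 64-sample segment.
x = [0.0] * 64
x[20] = 1.0
env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=lambda t: env[t])
```

The model order plays the role it does for spectral LP: a low order gives a smooth envelope with few peaks, a higher order follows more temporal detail.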

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

Speech → DCT → LP → FDLP Env

[Figure panels: Speech, Hilb Env, FDLP Env]

FDLP for Speech Representation

DCT → LP → FDLP Spectrogram

[Figure: FDLP Spectrogram and Conventional Approaches; axes: Time, Freq]

FDLP versus Mel Spectrogram

[Figure panels: FDLP, Mel]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure panels: Sig, FDLP Env; FDLP versus Mel]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure panels: FDLP, Mel]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain, this translates to a multiplication in the sub-band:

Clean DFT × Response DFT = Revb DFT

In narrow bands, the room response term is slowly varying.

The FDLP envelope of a band is expressed through its all-pole parameters … and a gain term.

When the sub-band signal is multiplied by the room response term, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a fixed, normalized gain.
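The effect of a sub-band scale factor on the all-pole fit can be checked numerically. The sketch below is an illustration under stated assumptions, not the published method: it treats the slowly varying room response within one narrow band as a single constant `c`, uses a made-up test sequence in place of a real sub-band DCT, and assumes that "normalizing" means replacing the gain by 1 when the envelope is rebuilt.

```python
import math


def autocorr(y, max_lag):
    """Autocorrelation of y for lags 0..max_lag."""
    n = len(y)
    return [sum(y[k] * y[k + lag] for k in range(n - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: returns ([1, a1..ap], gain)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def all_pole_fit(y, order):
    return levinson(autocorr(y, order), order)


# Hypothetical sub-band DCT sequence (clean) and the same band scaled by c.
y_clean = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) + 0.1 * ((7 * k) % 5 - 2)
           for k in range(60)]
c = 3.0
y_revb = [c * v for v in y_clean]

a_clean, g_clean = all_pole_fit(y_clean, 4)
a_revb, g_revb = all_pole_fit(y_revb, 4)

# The pole coefficients agree; only the gain differs (by c squared).
# Rebuilding both envelopes with the same fixed gain removes the difference.
normalized_gain = 1.0
```

Under this constant-scale assumption the distortion is absorbed entirely by the gain term, which is why discarding the estimated gain makes the two envelopes coincide.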

Gain Normalization in FDLP

[Figure panels: Without gain norm, With gain norm]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram: Input Signal → Sub-band Decomposition → FDLP → AM and FM. Gain Norm → Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog. Quant → Wide-band Speech & Audio Coding.]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration along the frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
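The last two stages of the short-term feature pipeline (energy integration over 25 ms windows with a 10 ms shift, then log and DCT across bands) can be sketched as follows. This is an illustrative outline only: the function names, the rectangular integration window, the small floor inside the logarithm, and the unnormalized DCT-II are assumptions, and the mel-scale integration and delta/acceleration steps are left out.

```python
import math


def integrate_envelope(env, fs, win_ms=25.0, shift_ms=10.0):
    """Sum a sub-band envelope over sliding windows; one energy per frame."""
    win = int(round(fs * win_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    frames = []
    start = 0
    while start + win <= len(env):
        frames.append(sum(env[start:start + win]))
        start += shift
    return frames


def cepstra(band_energies, n_ceps):
    """Log of the band energies of one frame, followed by a DCT-II across bands."""
    logs = [math.log(max(e, 1e-12)) for e in band_energies]  # floor avoids log(0)
    nb = len(logs)
    return [sum(logs[b] * math.cos(math.pi * k * (2 * b + 1) / (2 * nb)) for b in range(nb))
            for k in range(n_ceps)]


# One second of a flat envelope at 8 kHz, as a stand-in for an FDLP envelope.
frames = integrate_envelope([1.0] * 8000, fs=8000)

# Five hypothetical bands with equal energy in one frame.
ceps = cepstra([200.0] * 5, n_ceps=3)
```

With equal energy in every band, all cepstral coefficients other than the zeroth vanish, which is a quick sanity check on the transform.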

Speech Recognition

TIDIGITS Database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features - 39 dim

[Bar chart: WER for PLP-CMS, LTLSS, and FDLP-S under Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
• Mel Frequency Cepstral Coefficients (MFCCs)
• FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S under Tel, Mic, and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
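For reference, the detection cost quoted for this evaluation is a weighted sum of the two error rates. The snippet below assumes the weights read as 0.99 on the false-alarm rate and 0.1 on the miss rate; the function name and the example operating point are made up for illustration.

```python
def detection_cost(p_fa, p_miss):
    """Weighted sum of false-alarm and miss probabilities (assumed weights)."""
    return 0.99 * p_fa + 0.1 * p_miss


# Hypothetical operating point: 1% false alarms, 10% misses.
cost = detection_cost(p_fa=0.01, p_miss=0.10)
```

A lower cost is better; because of the weights, one percentage point of false alarms costs roughly ten times as much as one percentage point of misses.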

Modulation Feature Extraction

Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms) → DCT → Sub-band Feat

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim
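The static branch and the feature-size arithmetic can be illustrated briefly. This sketch makes several assumptions: the function names are hypothetical, the DCT is an unnormalized DCT-II taken over one envelope segment, a small floor is applied before the logarithm, and the dynamic branch (adaptation loops) is not implemented. If the DCT spans a 200 ms segment, as the diagram label suggests, coefficient index k corresponds to a modulation frequency of k / (2 x 0.2 s), so index 14 sits at 35 Hz.

```python
import math


def modulation_spectrum(env, n_coeffs=14):
    """Log-compress a sub-band envelope and keep the first DCT-II coefficients."""
    logs = [math.log(max(v, 1e-12)) for v in env]  # static (logarithmic) compression
    n = len(logs)
    return [sum(logs[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n)) for t in range(n))
            for k in range(n_coeffs)]


def feature_dim(n_coeffs, n_bands, n_streams=2):
    """Coefficients per stream x streams (static, dynamic) x sub-bands."""
    return n_coeffs * n_streams * n_bands


# A flat envelope of ones has log 0 everywhere, so every coefficient is zero.
flat = modulation_spectrum([1.0] * 40)
dim = feature_dim(n_coeffs=14, n_bands=17)
```

The dimension works out as 14 x 2 x 17 = 476, matching the figure quoted for the FDLP-M features.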

Phoneme Recognition

TIMIT Database (8 kHz)
• Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
• MLPs estimate phoneme posteriors
• Hidden Markov model (HMM) - MLP hybrid model
• Performance in phoneme error rate (PER)

Features
• Perceptual linear prediction (PLP) - 9 frame context
• Advanced ETSI standard [ETSI 2002] - 9 frame context
• FDLP modulation (FDLP-M) features - 476 dim

[Bar chart: PER for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Audio Coding

[Block diagram: Input → QMF Analysis (bands 1-32) → FDLP → Env and Carr; Carr → MDCT; Q → Q^-1; IMDCT → Mul → QMF Synthesis (bands 1-32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: scores (axis 0-100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - improvements in reverb speech recog

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n-3, n-2, n-1 weighted by a3, a2, a1 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
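The equations on these slides did not survive extraction. As a hedged reminder of the standard formulation they appear to follow (the autocorrelation-method derivation in [Makhoul 1975]), the relations can be written as below. The symbols $p$, $e[n]$, $E_p$, $A$, and $P_x$, and the sign convention on $a_k$, are assumptions chosen for this sketch rather than the slides' own notation.

```latex
\begin{align}
\hat{x}[n] &= -\sum_{k=1}^{p} a_k\, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k] \\
A(e^{j\omega}) &= 1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \qquad
P_x(\omega) = \left| X(e^{j\omega}) \right|^2 \\
E_p &= \sum_{n} e^2[n]
     = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| E(e^{j\omega}) \right|^2 d\omega
     = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_x(\omega)\, \left| A(e^{j\omega}) \right|^2 d\omega \\
\hat{P}_x(\omega) &= \frac{G}{\left| A(e^{j\omega}) \right|^2}, \qquad G = \min_{a_1, \ldots, a_p} E_p
\end{align}
```

The second equality for $E_p$ is Parseval's theorem applied to the residual, whose spectrum is the signal spectrum passed through the inverse filter $A$.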

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: $x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$, where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a[n]|^2$
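The definition can be tried on a toy amplitude-modulated signal. This is a minimal sketch using a naive DFT from the standard library; the function names are hypothetical, and the handling of the DC and Nyquist bins (kept unscaled) follows the common discrete-time convention, which coincides with the one-sided doubling rule when the signal has zero mean.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """Double the positive-frequency bins and zero the negative-frequency bins."""
    n = len(x)
    spec = dft(x)
    one_sided = [0j] * n
    one_sided[0] = spec[0]
    for k in range(1, (n + 1) // 2):
        one_sided[k] = 2 * spec[k]
    if n % 2 == 0:
        one_sided[n // 2] = spec[n // 2]
    return idft(one_sided)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(v) ** 2 for v in analytic_signal(x)]


# Toy AM signal: slow modulation m[t] multiplied by a faster cosine carrier.
N = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
x = [m[t] * math.cos(2 * math.pi * 10 * t / N) for t in range(N)]
env = hilbert_envelope(x)
```

For this band-limited example the envelope recovers the squared modulation exactly (up to rounding), which is the AM component the FDLP model then approximates with a few poles.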

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilb Env, (c) FDLP Env]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure panels: FDLP, PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

[Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

Signal + Noise → DCT → Critical-band Window → IDFT → | |^2 → DFT → Linear Pred → FDLP Env

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Input → DCT → Critical-band Window → IDFT → | |^2 → Wiener Filtering (with VAD) → DFT

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Page 16: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119883 119886 [119896] =

Signal with zero mean in time and frequency domain for n = 0hellipN-1

0 for k geN 22 119883 [119896] for k lt N 2

119883 [119896] 119883 119886 [119896]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N

N-point DCT

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with

N-zeros

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean speech * Room response = Reverberant speech

In the long-term DFT domain, this translates to a multiplication:

Clean DFT × Response DFT = Reverberant DFT

In narrow sub-bands, the response DFT is slowly varying.

The FDLP envelope of a band is given by an all-pole expression in the all-pole parameters, with the gain G of the AR model in the numerator.

When the sub-band signal is multiplied by a slowly varying response, the gain G is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with G = 1.

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.
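A minimal numerical sketch of why gain normalization helps, assuming the narrow-band response can be treated as a single constant `h`. The helper `lpc_autocorr`, the random stand-in data and the model order are illustrative choices, not taken from the slides.

```python
import numpy as np

def lpc_autocorr(c, order):
    """Autocorrelation-method LP: returns predictor weights a and gain G."""
    c = np.asarray(c, dtype=float)
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])      # normal equations R a = r[1:]
    G = r[0] - np.dot(a, r[1:])              # minimum residual sum of squares
    return a, G

rng = np.random.default_rng(1)
c = rng.standard_normal(400)   # stand-in for one band's DCT coefficients (made-up data)
h = 0.3                        # stand-in for the near-constant response in a narrow band

a_clean, G_clean = lpc_autocorr(c, order=8)
a_rev, G_rev = lpc_autocorr(h * c, order=8)

print(np.allclose(a_clean, a_rev))           # compares the all-pole parameters
print(np.isclose(G_rev, h ** 2 * G_clean))   # compares the gains up to the factor h^2
```

Under this constant-response assumption the scaling only reaches the gain, so an envelope rebuilt with a fixed gain is the same for the clean and the scaled band.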

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; applications: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat]

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

• Integration in frequency axis to convert to mel scale

• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

• Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
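The last stages of this pipeline can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the bands are taken to be already mel-spaced (so the frequency integration step is skipped), 13 cepstra are assumed so that 13 × 3 = 39, and plain finite differences stand in for the usual delta regression. The function names and the random envelopes are hypothetical.

```python
import numpy as np

def dct2_matrix(n_out, n_in):
    """First n_out rows of the (unnormalized) DCT-II basis for length n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def short_term_features(env, fs, n_ceps=13, win=0.025, shift=0.010):
    """env: (n_bands, n_samples) gain-normalized sub-band envelopes."""
    wlen, hop = int(round(win * fs)), int(round(shift * fs))
    n_frames = 1 + (env.shape[1] - wlen) // hop
    # Integrate each band's envelope over 25 ms windows, every 10 ms
    energy = np.stack([env[:, i * hop : i * hop + wlen].sum(axis=1)
                       for i in range(n_frames)])
    # Log and DCT along the frequency (band) axis give cepstral coefficients
    ceps = np.log(energy + 1e-12) @ dct2_matrix(n_ceps, env.shape[0]).T
    # Finite differences stand in for delta / acceleration coefficients
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
env = rng.uniform(0.1, 1.0, size=(20, 8000))   # made-up envelopes: 20 bands, 1 s at 8 kHz
feats = short_term_features(env, fs=8000)
print(feats.shape)
```

Each row of `feats` is one 10 ms frame with 39 values.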

Speech Recognition

• TIDIGITS Database (8 kHz)
  - Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
  - Whole-word HMM models trained on clean speech
  - Performance in terms of word error rate (WER)

• Features
  - PLP features with cepstral mean subtraction (CMS)
  - Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  - FDLP short-term (FDLP-S) features - 39 dim

Speech Recognition

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
  - Has telephone speech and far-field speech

• GMM-UBM system
  - Trained on a large set of development speakers
  - Adapted on the enrollment data from the target speaker
  - Nuisance attribute projection (NAP) on supervectors
  - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
  - Mel Frequency Cepstral Coefficients (MFCCs)
  - FDLP short-term (FDLP-S) features
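As a small worked example of the detection cost function quoted above, the sketch below computes Pfa and Pmiss at a fixed threshold and combines them with the weights 0.99 and 0.1. The scores and the threshold are made up for illustration, and the function names are hypothetical.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss rate on target trials and false-alarm rate on non-target trials."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_fa, p_miss

def detection_cost(p_fa, p_miss):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss."""
    return 0.99 * p_fa + 0.1 * p_miss

# Made-up verification scores (higher means "more likely the target speaker")
targets = [2.1, 1.7, 0.4, 3.0]
nontargets = [-1.0, 0.2, 0.9, -0.5, -2.0]

p_fa, p_miss = error_rates(targets, nontargets, threshold=0.5)
dcf = detection_cost(p_fa, p_miss)
print(p_fa, p_miss, dcf)
```

Here one of four targets is missed (Pmiss = 0.25) and one of five non-targets is accepted (Pfa = 0.2), so the cost is 0.99 × 0.2 + 0.1 × 0.25 = 0.223.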

Speaker Verification

[Bar chart: DCF for MFCC and FDLP-S features in telephone (Tel), microphone (Mic) and cross-domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms) → DCT → Sub-band Feat]

• Static compression is a logarithm: it reduces the huge dynamic range in the sub-band envelope

• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

• 14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands give a feature of 476 dim (14 × 2 × 17)
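The feature layout can be sketched as follows. This is only an illustration of the dimension count under assumptions: the envelope sampling rate (400 Hz), the loop time constants and the simple divider-plus-low-pass loop are hypothetical choices, not the configuration behind the reported results.

```python
import numpy as np

def dct2_matrix(n_out, n_in):
    """First n_out rows of the (unnormalized) DCT-II basis for length n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def static_compress(env):
    """Logarithmic compression of the envelope."""
    return np.log(env + 1e-12)

def dynamic_compress(env, fs, time_constants=(0.005, 0.05, 0.129, 0.253, 0.5)):
    """Chain of loops: each divides its input by a low-pass filtered copy of its output."""
    y = np.asarray(env, dtype=float).copy()
    for tau in time_constants:                 # illustrative time constants (seconds)
        alpha = np.exp(-1.0 / (tau * fs))
        state = 1.0
        for i in range(len(y)):
            y[i] = y[i] / max(state, 1e-6)     # divider
            state = alpha * state + (1.0 - alpha) * y[i]   # first-order low-pass
    return y

def modulation_features(env, fs, n_mod=14):
    """env: (n_bands, n_samples) sub-band envelopes of one 200 ms segment."""
    D = dct2_matrix(n_mod, env.shape[1])
    stat = static_compress(env) @ D.T
    dyn = np.stack([dynamic_compress(band, fs) for band in env]) @ D.T
    return np.concatenate([stat, dyn], axis=1).ravel()

rng = np.random.default_rng(0)
env = rng.uniform(0.1, 1.0, size=(17, 80))     # 17 bands, 200 ms at an assumed 400 Hz
feat = modulation_features(env, fs=400)
print(feat.shape)
```

With a 200 ms segment the DCT bins are spaced 2.5 Hz apart, so the first 14 coefficients cover roughly 0 to 32.5 Hz, which is consistent with the 0-35 Hz range on the slide.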

Phoneme Recognition

• TIMIT Database (8 kHz)
  - Clean training data; test data can be clean, additive noise, reverberated or telephone channel

• Multi-layer perceptron (MLP) based system
  - MLPs estimate phoneme posteriors
  - Hidden Markov model (HMM) - MLP hybrid model
  - Performance in phoneme error rate (PER)

• Features
  - Perceptual linear prediction (PLP) - 9 frame context
  - Advanced ETSI standard [ETSI 2002] - 9 frame context
  - FDLP modulation (FDLP-M) features - 476 dim

Phoneme Recognition

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M on clean, additive noise, reverberant and telephone test data]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

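Audio Coding

[Block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr); MDCT; quantization (Q) and inverse quantization (Q^-1); IMDCT; Mul; QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.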

Subjective Evaluations

[Bar chart: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidences
  - Perceptual importance of modulation frequencies [Drullman et al. 1994]
  - Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
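A minimal sketch of time-domain linear prediction using the autocorrelation method, one common way of minimizing the residual sum of squares. The function name `lpc_autocorr`, the toy AR(2) signal and its coefficients are assumptions made for illustration.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Returns (a, G): weights with x[n] ~ sum_k a[k-1] * x[n-k], and residual energy G."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags r[0..order]
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with R[i, j] = r[|i - j|]
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    G = r[0] - np.dot(a, r[1:])
    return a, G

# Toy AR(2) signal: x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a, G = lpc_autocorr(x, order=2)
print(a)   # estimates of the two generating weights
```

With enough samples the estimated weights land close to the values used to generate the signal, and `G` is the energy left in the prediction residual.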

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual sum of squares can be written in the frequency domain, and the parameters are solved by minimizing it

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

[Figure: AR model of the power spectrum]

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component,

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame
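To make the CMS assumption concrete, the sketch below models a distortion that is the same in every frame as a constant offset added to each cepstral frame, then removes it by subtracting the per-coefficient mean. The data are random stand-ins, and the additive-offset model is the idealized case in which the distortion is short compared with the analysis window.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """ceps: (n_frames, n_ceps). Removes the per-coefficient mean over time."""
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))   # made-up cepstral frames
channel = rng.standard_normal(13)        # same offset in every frame (idealized distortion)
distorted = clean + channel

same = np.allclose(cepstral_mean_subtraction(clean),
                   cepstral_mean_subtraction(distorted))
print(same)
```

Reverberation is longer than a short-term frame, so this constant-offset model no longer holds, which is the motivation for the long-term alternatives listed above.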

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → Critical-band Window → IDFT → | |^2 → DFT → Linear Pred → FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input → DCT → Critical-band Window → IDFT → | |^2 → Wiener Filtering (driven by VAD) → DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
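A simplified sketch of envelope-domain noise subtraction. It swaps the Wiener filtering stage of the diagram for a plain mean subtraction with a floor, and assumes a given boolean VAD mask. The function name, the floor value and the toy envelopes are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_active, floor=0.01):
    """noisy_env: (n_samples,) Hilbert envelope; speech_active: boolean VAD mask."""
    # Noise envelope estimated from the regions the VAD marks as non-speech
    noise_est = noisy_env[~speech_active].mean()
    cleaned = noisy_env - noise_est
    # Floor keeps the result positive so later log / LP stages stay well defined
    return np.maximum(cleaned, floor * noisy_env)

# Toy example: a speech burst on top of a constant noise envelope
speech_env = np.zeros(100)
speech_env[40:60] = 1.0
noisy_env = speech_env + 0.2
vad = speech_env > 0                       # idealized VAD decisions

cleaned = subtract_noise_envelope(noisy_env, vad)
print(cleaned[50], cleaned[10])
```

In the speech region the subtraction recovers the clean envelope, while in the non-speech region the floor takes over.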

Introduction

• Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

• Spectrum is sampled at a preset rate before further modeling/processing stages

• Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 17: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in time and frequency domain, for $n = 0, \ldots, N-1$

$$X_a[k] = \begin{cases} 2X[k] & \text{for } k < N/2 \\ 0 & \text{for } k \ge N/2 \end{cases}$$

[Figure: $X[k]$ and $X_a[k]$]
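The DFT rule for the discrete-time analytic signal can be checked numerically. The sketch below assumes an even length N and a test signal with no energy at k = 0 or k = N/2; the function name and the AM test signal are made up for the illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via X_a[k] = 2 X[k] for k < N/2 and 0 for k >= N/2.

    Assumes even N and X[0] = X[N/2] = 0 (zero mean in time and frequency).
    """
    N = len(x)
    X = np.fft.fft(x)
    Xa = np.zeros(N, dtype=complex)
    Xa[: N // 2] = 2 * X[: N // 2]
    return np.fft.ifft(Xa)

N = 256
n = np.arange(N)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / N)   # slow modulation (hypothetical)
x = m * np.cos(2 * np.pi * 40 * n / N)           # carrier at DFT bin 40

xa = analytic_signal(x)
hilbert_env = np.abs(xa) ** 2                    # squared magnitude of the analytic signal
print(np.allclose(xa.real, x), np.allclose(hilbert_env, m ** 2))
```

For this band-limited example the real part of the analytic signal is the input itself, and the Hilbert envelope equals the squared modulation.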

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$$Q_a[k] = \begin{cases} 2Q[k] & k < N \\ 0 & k \ge N \end{cases}$$

N-point DCT, zero-padded with N zeros:

$$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\} & k < N \\ 0 & k \ge N \end{cases}$$

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

AR Model of Hilbert Envelopes

We have shown:

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

Spectrum of the even-symmetrized analytic signal = zero-padded DCT sequence

Just as the power spectrum of a signal and its auto-correlation are related by $\mathcal{F}$:

$$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$$

Spectrum of the Hilbert envelope of the even-symmetrized signal = auto-correlation of the DCT sequence

LP on the auto-correlation of the DCT therefore gives an AR model of the Hilbert envelope
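The relation between the squared magnitude of an analytic sequence and the auto-correlation of its spectrum can be verified numerically. The sketch below uses a random real sequence as a stand-in for the zero-padded DCT sequence, and includes the 1/(2N) factor that comes from NumPy's DFT convention. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32

# Spectrum that is zero for k >= N, like the zero-padded DCT sequence (length 2N)
Qa = np.zeros(2 * N, dtype=complex)
Qa[:N] = rng.standard_normal(N)
qa = np.fft.ifft(Qa)                      # the corresponding analytic sequence

# Left side: DFT of the squared magnitude |q_a[n]|^2
lhs = np.fft.fft(np.abs(qa) ** 2)

# Right side: auto-correlation of the spectrum at each lag tau
M = len(Qa)
r = np.array([np.sum(Qa * np.conj(np.roll(Qa, tau))) for tau in range(M)]) / M

print(np.allclose(lhs, r))
```

Because the upper half of the spectrum is zero, the circular auto-correlation computed here coincides with the ordinary one for lags below N.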

LP in Time and Frequency

Duality

- LP on the time signal gives an AR model of the power spectrum
- LP on the DCT gives an AR model of the Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP → FDLP Env, shown together with the Hilbert envelope (Hilb Env)]
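The recipe on this slide (DCT followed by LP) can be sketched end to end as below. This is a minimal illustration, not the authors' implementation: the unnormalized DCT-II, the autocorrelation-method LP, the mapping of angle to time index and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II straight from its definition (O(N^2), fine for a demo)."""
    N = len(x)
    n = np.arange(N)
    return np.cos(np.pi * n[:, None] * (2 * n[None, :] + 1) / (2 * N)) @ x

def lpc_autocorr(c, order):
    """Autocorrelation-method LP: returns predictor weights a and gain G."""
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    return a, r[0] - np.dot(a, r[1:])

def fdlp_envelope(x, order):
    """All-pole approximation of the temporal envelope of x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    a, G = lpc_autocorr(dct2(x), order)            # LP on the cosine transform
    A = np.concatenate(([1.0], -a))                # A(z) = 1 - sum_k a_k z^-k
    theta = np.pi * (np.arange(N) + 0.5) / N       # angle in [0, pi] maps to time index
    E = np.exp(-1j * np.outer(theta, np.arange(order + 1)))
    return G / np.abs(E @ A) ** 2

# Synthetic test: a burst centred at sample 300 plus a small noise floor
rng = np.random.default_rng(0)
n = np.arange(512)
x = np.exp(-0.5 * ((n - 300) / 15.0) ** 2) * np.cos(0.9 * n)
x = x + 0.01 * rng.standard_normal(512)

env = fdlp_envelope(x, order=10)
print(int(np.argmax(env)))   # location of the envelope peak, in samples
```

The model order plays the role described earlier in the talk: a lower order gives a smoother envelope, a higher order follows more temporal detail.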

FDLP for Speech Representation

[Figure: DCT → LP → FDLP spectrogram (Time vs. Freq), compared with conventional approaches (Time vs. Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

Speech intelligibility [Houtgast et al. 1980]
RASTA processing [Hermansky et al. 1994]
Speech recognition [Kingsbury et al. 1998]
AM-FM decomposition [Kumaresan et al. 1999]
Sound texture modeling [Athineos et al. 2003]
Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n-3, n-2 and n-1, weighted by a3, a2 and a1, predict sample n]

Model parameters are solved by minimizing the residual sum of squares
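The least-squares step above can be sketched in a few lines. This is a minimal illustration, assuming the autocorrelation method and the sign convention of [Makhoul 1975] (predictor x_hat[n] = -sum_k a_k x[n-k]); the function name `lp_autocorrelation` and the test signal are hypothetical and not taken from the slides.

```python
import numpy as np

def lp_autocorrelation(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, gain) with a = [1, a_1, ..., a_p]; the predictor is
    x_hat[n] = -sum_k a_k * x[n-k] and gain is the minimum residual energy.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: n - lag], x[lag:]) for lag in range(order + 1)])
    # Normal equations R a = -r[1:], with R the Toeplitz matrix built from r
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    coeffs = np.linalg.solve(r[idx], -r[1:])
    a = np.concatenate(([1.0], coeffs))
    gain = float(np.dot(a, r))  # r[0] + sum_k a_k r[k]
    return a, gain

# Hypothetical test signal: impulse response of a known two-pole resonator,
# x[n] = 1.6 x[n-1] - 0.81 x[n-2] + delta[n]
true_a = np.array([1.0, -1.6, 0.81])
x = np.zeros(400)
x[0] = 1.0
for i in range(1, len(x)):
    x[i] = 1.6 * x[i - 1] - (0.81 * x[i - 2] if i >= 2 else 0.0)

a, gain = lp_autocorrelation(x, order=2)
```

Because the test signal is itself an all-pole impulse response, the recovered coefficients should match the generating filter closely, which makes it a convenient sanity check.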

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
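The equations on these slides did not survive in the transcript. As a hedged reminder of the standard argument in [Makhoul 1975], and not a reconstruction of the exact slide content, the filter interpretation, the Parseval step and the resulting all-pole model can be written as follows. The symbols e[n], A(\omega), P(\omega) and \hat{P}(\omega) are assumed notation.

```latex
% Filter interpretation: the residual is the signal passed through A
e[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k]
\quad\Longleftrightarrow\quad
E(\omega) = A(\omega)\, X(\omega),
\qquad A(\omega) = 1 + \sum_{k=1}^{p} a_k e^{-j\omega k}

% Parseval's theorem, with P(\omega) = |X(\omega)|^2 the power spectrum
\sum_{n} e^{2}[n]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} |E(\omega)|^{2}\, d\omega
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, |A(\omega)|^{2}\, d\omega

% All-pole model of the power spectrum, with gain G
\hat{P}(\omega) = \frac{G}{\left| 1 + \sum_{k=1}^{p} a_k e^{-j\omega k} \right|^{2}}
```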

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a[n]|^2$

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
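Of the three techniques, CMS is the simplest to illustrate. The sketch below assumes the distortion adds the same offset to every cepstral frame, which is the textbook CMS assumption; the function name and the random data are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over time from (frames, coeffs) cepstra."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))           # stand-in for clean cepstral frames
channel_offset = rng.normal(size=(1, 13))   # fixed distortion, same in every frame
distorted = clean + channel_offset

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)
```

After mean subtraction the two feature streams coincide, which is why CMS only helps when the distortion really is constant across the frames being averaged.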

Feature Comparison

Modulation Feature Extraction

[Block diagram: input -> DCT -> critical-band window -> FDLP -> static and dynamic compression -> DCT over 200 ms -> sub-band features]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

[Block diagram, blocks: signal + noise, DCT, critical-band window, IDFT, | |^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram, blocks: input, DCT, critical-band window, IDFT, | |^2, DFT, Wiener filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009
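The envelope subtraction idea can be sketched as below. This is a simplified stand-in, not the Wiener filtering shown in the block diagram: it assumes a single constant noise level per band, estimated as the mean envelope over VAD-flagged non-speech samples, and a spectral-subtraction-style floor. The function name, the `floor` parameter and the toy envelope are all hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=1e-3):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env   : sub-band temporal envelope of the noisy signal
    speech_mask : boolean array, True where the VAD marks speech
    floor       : fraction of the peak used as a lower bound (assumption)
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    noise_level = noisy_env[~speech_mask].mean()   # estimate from non-speech samples
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env.max())

noisy_env = np.array([1.0, 1.0, 5.0, 9.0, 5.0, 1.0, 1.0])
speech_mask = np.array([False, False, True, True, True, False, False])
compensated = subtract_noise_envelope(noisy_env, speech_mask)
```

The floor keeps the compensated envelope strictly positive, which matters if a logarithm or an AR fit is applied afterwards.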

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: time-frequency plot]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM


AR Model of Hilbert Envelopes

Discrete-time analytic signal

Signal $x[n]$ with zero mean in the time and frequency domains, for $n = 0, \dots, N-1$

$X_a[k] = \begin{cases} 2X[k], & k < N/2 \\ 0, & k \ge N/2 \end{cases}$

[Figure: $X[k]$ and $X_a[k]$]
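The construction on this slide (double the first half of the DFT, zero the second half) can be tried numerically. The sketch assumes an even length N and a test signal whose DC and Nyquist bins are zero, matching the zero-mean condition on the slide; the function name and the AM test signal are hypothetical.

```python
import numpy as np

def analytic_signal_zero_mean(x):
    """Analytic signal via X_a[k] = 2 X[k] for k < N/2 and 0 otherwise.

    Assumes even N and a signal with empty DC and Nyquist bins.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    Xa = np.zeros(n, dtype=complex)
    Xa[: n // 2] = 2.0 * X[: n // 2]
    return np.fft.ifft(Xa)

n = np.arange(256)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / 256)   # smooth modulation
x = m * np.cos(2 * np.pi * 32 * n / 256)          # modulation times a fine carrier
xa = analytic_signal_zero_mean(x)
hilbert_env = np.abs(xa) ** 2                     # squared magnitude
```

For this narrow-band example the real part of the analytic signal returns the input, and the Hilbert envelope equals the squared modulation.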

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Let $q[n]$ be the even-symmetrized version of $x[n]$, with

$Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$

N-point DCT, zero-padded with N zeros:

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

AR Model of Hilbert Envelopes

We have shown -

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

Spectrum of the even-symmetrized analytic signal = zero-padded DCT sequence

As in the time-domain case, where the Fourier transform relates the power spectrum of a signal to its autocorrelation:

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Spectrum of the Hilbert envelope of the even-symmetrized signal = autocorrelation of the DCT sequence

LP on the autocorrelation of the DCT gives an AR model of the Hilbert envelope

LP in Time and Frequency

Duality

Time signal -> LP -> power spectrum

DCT -> LP -> Hilbert envelope

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech -> DCT -> LP -> FDLP envelope, shown with the Hilbert envelope]
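A minimal full-band sketch of the DCT-then-LP chain is given below. It is an illustration under stated assumptions rather than the author's implementation: an unnormalized DCT-II, autocorrelation-method LP with a light diagonal loading, and the envelope evaluated at the angles pi(2t+1)/(2N) that the DCT-II basis associates with time sample t. All function names, the model order and the test burst are hypothetical.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II computed directly from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * t + 1) * k / (2 * n)) @ x

def fdlp_envelope(x, order):
    """AR model of the temporal envelope: LP applied to the DCT of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct_ii(x)
    # Autocorrelation of the DCT sequence at lags 0..order
    r = np.array([np.dot(c[: n - lag], c[lag:]) for lag in range(order + 1)])
    r[0] *= 1.0 + 1e-6  # light diagonal loading (assumption, for conditioning)
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    coeffs = np.linalg.solve(r[idx], -r[1:])
    a = np.concatenate(([1.0], coeffs))
    gain = float(np.dot(a, r))
    # Evaluate G / |A|^2 on the grid of angles that corresponds to time samples
    theta = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# Hypothetical test signal: a Gaussian-shaped burst centered at sample 100
t = np.arange(256)
x = np.exp(-0.5 * ((t - 100) / 8.0) ** 2) * np.cos(0.3 * np.pi * t)
env = fdlp_envelope(x, order=8)
peak = int(np.argmax(env))
```

The model order plays the role described on the slides: a low order gives a smooth envelope that follows only the high-energy region of the burst, while a higher order tracks finer temporal detail.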

FDLP for Speech Representation

[Figure: DCT -> LP gives the FDLP spectrogram (time vs. frequency), compared with conventional approaches (time vs. frequency)]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Resolution of FDLP Analysis

[Figure: signal and FDLP envelope; panels labeled FDLP and Mel]

Res = (Critical Width)^-1

Properties of FDLP Analysis

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean speech * Room response = Reverberant speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Reverberant DFT
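The convolution-to-multiplication step can be checked numerically. The sketch uses random stand-ins for the clean speech and the room response (both hypothetical) and a DFT long enough to hold the full linear convolution, which is the sense in which a long-term analysis window makes the relation exact.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(size=400)                          # stand-in for clean speech
room = 0.7 ** np.arange(50) * rng.normal(size=50)     # hypothetical decaying room response
reverberant = np.convolve(clean, room)                # linear convolution, length 449

# DFT length covers the whole convolved signal, so no circular wrap-around occurs
n_fft = len(reverberant)
lhs = np.fft.rfft(reverberant, n_fft)
rhs = np.fft.rfft(clean, n_fft) * np.fft.rfft(room, n_fft)
```

With a short analysis window the response would spill across frames and the product form would only hold approximately.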

Reverberation

In the DFT domain the convolutive distortion translates to a multiplication. In narrow sub-bands, the response is slowly varying

Gain Normalization in FDLP

The FDLP envelope of a band is given by its all-pole parameters and the gain G

When the sub-band signal is multiplied by the band response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain fixed at G = 1

[Figure: FDLP envelopes without and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008
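The effect of gain normalization can be illustrated with a toy example. It assumes the narrow-band response acts as a single constant scale factor on the sub-band DCT coefficients and that normalization means rebuilding the envelope with G = 1; both are simplifying assumptions, and the function names and random data are hypothetical.

```python
import numpy as np

def lp_from_sequence(c, order):
    """Autocorrelation-method LP; returns a = [1, a_1..a_p] and the gain G."""
    n = len(c)
    r = np.array([np.dot(c[: n - lag], c[lag:]) for lag in range(order + 1)])
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    coeffs = np.linalg.solve(r[idx], -r[1:])
    a = np.concatenate(([1.0], coeffs))
    return a, float(np.dot(a, r))

def all_pole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points angles in (0, pi)."""
    theta = np.pi * (2 * np.arange(n_points) + 1) / (2 * n_points)
    A = np.exp(-1j * np.outer(theta, np.arange(len(a)))) @ a
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(1)
c_clean = rng.normal(size=300)   # stand-in for the DCT coefficients of one sub-band
c_scaled = 0.2 * c_clean         # same band after a constant multiplicative response

a_clean, g_clean = lp_from_sequence(c_clean, order=10)
a_scaled, g_scaled = lp_from_sequence(c_scaled, order=10)

# Envelopes rebuilt with the gain fixed at 1 instead of the estimated gain
env_clean = all_pole_envelope(a_clean, 1.0, 128)
env_scaled = all_pole_envelope(a_scaled, 1.0, 128)
```

The scale factor changes only the gain (by its square), while the all-pole coefficients stay the same, so the normalized envelopes coincide.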

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Outline of Applications

[Block diagram: input signal -> sub-band decomposition -> FDLP -> AM and FM components, with gain normalization and quantization. Outputs: short-term features for speaker & speech recognition; modulation features for phoneme recognition; wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009

Short-term Features

[Block diagram: input -> DCT -> sub-band window -> FDLP -> gain normalization -> energy integration -> log + DCT -> features]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration along the frequency axis converts to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
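The steps above can be sketched as follows. The sketch assumes the sub-band envelopes are already available and already on a mel-like band layout (the mel integration step is omitted), keeps 13 cepstral coefficients, and uses simple finite differences as a stand-in for the usual delta regression. The function name, the band count and the random envelopes are hypothetical.

```python
import numpy as np

def short_term_features(envelopes, fs, n_ceps=13, win_s=0.025, hop_s=0.010):
    """envelopes: (bands, samples) array of positive sub-band envelopes."""
    envelopes = np.asarray(envelopes, dtype=float)
    n_bands, n_samples = envelopes.shape
    win, hop = int(round(win_s * fs)), int(round(hop_s * fs))
    n_frames = 1 + (n_samples - win) // hop
    # Integrate each envelope over 25 ms windows shifted by 10 ms
    energies = np.stack(
        [envelopes[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )  # shape (frames, bands)
    log_energies = np.log(energies)
    # DCT-II along the frequency axis gives cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    b = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * (2 * b + 1) * k / (2 * n_bands))
    ceps = log_energies @ basis.T
    # Finite differences as a simple stand-in for delta and acceleration
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
envelopes = rng.uniform(0.1, 1.0, size=(24, 8000))   # 1 s at 8 kHz, 24 hypothetical bands
features = short_term_features(envelopes, fs=8000)
```

With 13 static coefficients plus deltas and accelerations, each frame carries 39 values, matching the dimensionality quoted on the slide.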

Speech Recognition

TIDIGITS database (8 kHz)
  Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  Whole-word HMM models trained on clean speech
  Performance in terms of word error rate (WER)

Features
  PLP features with cepstral mean subtraction (CMS)
  Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  FDLP short-term (FDLP-S) features - 39 dim

[Figure: WER for PLP-CMS, LTLSS and FDLP-S features on clean and reverberant test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE)
  Has telephone speech and far-field speech

GMM-UBM system
  Trained on a large set of development speakers
  Adapted on the enrollment data from the target speaker
  Nuisance attribute projection (NAP) on supervectors
  Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  Mel frequency cepstral coefficients (MFCCs)
  FDLP short-term (FDLP-S) features

[Figure: DCF for MFCC and FDLP-S features in telephone, microphone and cross-domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011

Modulation Feature Extraction

[Block diagram: input -> DCT -> critical-band window -> FDLP -> static and dynamic compression -> DCT over 200 ms -> sub-band features]

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature
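The feature dimensionality (2 x 14 x 17 = 476) and the two compression paths can be sketched as below. The dynamic path here is a deliberately simplified single adaptation loop (a divider fed by a low-passed copy of its own output), standing in for the multi-stage loops of [Kollmeier 1999]; the envelope sampling rate, the loop constants and all function names are hypothetical.

```python
import numpy as np

def dct_ii_rows(x, n_coeffs):
    """First n_coeffs DCT-II coefficients of each row of x."""
    n = x.shape[1]
    k = np.arange(n_coeffs)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    return x @ basis.T

def adaptive_compression(env, alpha=0.99, floor=1e-3):
    """Simplified adaptation loop: divide by a low-pass filtered copy of the output."""
    out = np.empty_like(env)
    state = np.full(env.shape[0], np.sqrt(floor))
    for t in range(env.shape[1]):
        out[:, t] = env[:, t] / np.maximum(state, floor)       # divider
        state = alpha * state + (1.0 - alpha) * out[:, t]      # one-pole low-pass
    return out

def modulation_features(envelopes, n_mod=14):
    """Static (log) and dynamic (adaptive) paths, each reduced by a DCT."""
    static = np.log(envelopes)
    dynamic = adaptive_compression(envelopes)
    per_band = np.concatenate(
        [dct_ii_rows(static, n_mod), dct_ii_rows(dynamic, n_mod)], axis=1
    )  # shape (bands, 2 * n_mod)
    return per_band.ravel()

rng = np.random.default_rng(0)
# 17 sub-bands; 80 samples stands for a 200 ms segment at a hypothetical 400 Hz envelope rate
envelopes = rng.uniform(0.1, 1.0, size=(17, 80))
features = modulation_features(envelopes)
```

Over a 200 ms segment the DCT coefficients are spaced 2.5 Hz apart, so the first 14 span roughly the 0-35 Hz modulation range quoted on the slide.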

Phoneme Recognition

TIMIT database (8 kHz)
  Clean training data; test data can be clean, with additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  MLPs estimate phoneme posteriors
  Hidden Markov model (HMM) - MLP hybrid model
  Performance in phoneme error rate (PER)

Features
  Perceptual linear prediction (PLP) - 9 frame context
  Advanced ETSI standard [ETSI 2002] - 9 frame context
  FDLP modulation (FDLP-M) features - 476 dim

[Figure: PER for PLP-9, ETSI-9 and FDLP-M features on clean, additive-noise, reverberant and telephone test data]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA 2010

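Audio Coding

[Block diagram: input -> QMF analysis (bands 1 to 32) -> FDLP -> envelope (Env) and carrier (Carr); MDCT, quantization (Q), inverse quantization (Q^-1) and IMDCT; multiplication (Mul) -> QMF synthesis (bands 1 to 32) -> output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008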
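The envelope/carrier split and the multiplication at the decoder can be illustrated on one sub-band. The sketch uses the non-parametric Hilbert envelope as a stand-in for the FDLP envelope and leaves out the QMF bank, the MDCT and quantization entirely, so it shows only the AM-FM factorization; the function name and the test signal are hypothetical.

```python
import numpy as np

def am_fm_split(x):
    """Split a zero-mean sub-band signal into an envelope and a carrier."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    Xa = np.zeros(n, dtype=complex)
    Xa[: n // 2] = 2.0 * X[: n // 2]     # analytic signal (even n, empty DC/Nyquist)
    xa = np.fft.ifft(Xa)
    envelope = np.abs(xa)                # AM component (magnitude, not squared)
    carrier = np.cos(np.angle(xa))       # FM component, bounded by 1 in magnitude
    return envelope, carrier

t = np.arange(512)
x = (1.0 + 0.6 * np.sin(2 * np.pi * 3 * t / 512)) * np.cos(2 * np.pi * 60 * t / 512)
envelope, carrier = am_fm_split(x)
reconstructed = envelope * carrier       # the "Mul" step at the decoder
```

Without quantization the product returns the input exactly; in a codec the two factors would be quantized separately before being multiplied back.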

Subjective Evaluations

[Figure: listening-test scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
Page 19: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

Discrete-time analytic signal

Let $q[n]$ be the even-symmetrized version of $x[n]$, with

$$Q[k] = 2\,\mathrm{Re}\{X[k]\}$$

$$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$$

N-point DCT, zero-padded with N zeros:

$$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$$

We have shown:

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

Spectrum of even-sym analytic signal = zero-padded DCT sequence

Signal ↔ Spectrum; Power Spectrum ↔ Auto-corr (related by F)

$$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$$

Spectrum of Hilbert env for even-sym signal = auto-correlation of DCT sequence

Hilb env of even-symm signal ↔ Auto-corr of DCT (related by F)

Auto-corr of DCT → LP → AR model of Hilb env

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

Speech → DCT → LP → FDLP Env

[Figure: speech signal, its Hilbert envelope (Hilb Env) and the FDLP envelope (FDLP Env)]
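The DCT → LP chain above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes a full-band signal (no sub-band windowing), the autocorrelation method for LP, and hypothetical helper names (`dct2`, `lp_coefficients`, `fdlp_envelope`). The all-pole model fitted to the DCT sequence is evaluated on N points, which are read as time samples.

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II via an explicit cosine matrix (fine for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def lp_coefficients(seq, order):
    """Autocorrelation-method linear prediction; returns (a, gain)."""
    r = np.array([np.dot(seq[: len(seq) - lag], seq[lag:]) for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1 : order + 1])   # Toeplitz normal equations
    gain = r[0] - np.dot(a, r[1 : order + 1])       # minimum residual energy
    return a, gain

def fdlp_envelope(x, order=20):
    """All-pole model of the DCT sequence, evaluated at len(x) points (time axis)."""
    x = np.asarray(x, dtype=float)
    a, gain = lp_coefficients(dct2(x), order)
    n = len(x)
    denom = np.fft.fft(np.concatenate(([1.0], -a)), 2 * n)[:n]
    return gain / np.abs(denom) ** 2
```

The model order plays the role it has in ordinary LP: a higher order follows finer temporal detail, a lower order gives a smoother envelope.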

FDLP for Speech Representation

DCT → LP → FDLP Spectrogram

[Figure: FDLP spectrogram (axes: Time, Freq) next to the spectrogram from conventional approaches (axes: Time, Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP spectrogram and Mel spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels, compared for FDLP and Mel]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

• Summarizing the gross temporal variation with a few parameters
- Model order of FDLP controls the degree of smoothness
- AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
- Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain, this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT

In narrow sub-bands, the room response is slowly varying.

The FDLP envelope of a band is given by its all-pole parameters and the gain of the AR model.

When the sub-band signal is multiplied by the response, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain.

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.
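A small numerical check of why gain normalization helps, under a strong simplifying assumption: the room response inside one narrow band is treated as a single real constant `h` (hypothetical value below), rather than merely slowly varying. Scaling the sub-band DCT sequence by `h` leaves the LP coefficients unchanged and only scales the gain by `h**2`, so envelopes rebuilt with a fixed gain coincide. Function names are illustrative.

```python
import numpy as np

def lp_autocorr(seq, order):
    """Autocorrelation-method linear prediction; returns (a, gain)."""
    r = np.array([np.dot(seq[: len(seq) - lag], seq[lag:]) for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])
    return a, gain

def all_pole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points samples of the half unit circle."""
    denom = np.fft.fft(np.concatenate(([1.0], -a)), 2 * n_points)[:n_points]
    return gain / np.abs(denom) ** 2

rng = np.random.default_rng(1)
band_dct = rng.standard_normal(256)   # stand-in for one sub-band's DCT coefficients
h = 0.3                               # assumed constant channel gain in this band

a_clean, g_clean = lp_autocorr(band_dct, order=8)
a_rev, g_rev = lp_autocorr(h * band_dct, order=8)

# Gain-normalized envelopes: rebuild both with the gain fixed to 1.
env_clean_norm = all_pole_envelope(a_clean, 1.0, 256)
env_rev_norm = all_pole_envelope(a_rev, 1.0, 256)
```

With a real room response the band gain is not exactly constant, so the match is approximate rather than exact.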

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Outline of Applications

[Block diagram: Input Signal → Sub-band Decomposition → FDLP → AM and FM components, with Gain Norm and Quant stages, feeding three applications: short-term features for speaker & speech recog, modulation features for phoneme recog, and wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

• Integration along the frequency axis to convert to mel scale

• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

• Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
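The steps of this feature pipeline can be sketched as follows. This is an illustration under stated assumptions, not the exact front end: the input is taken to be an array of already mel-spaced sub-band envelopes (so the frequency integration step is skipped), 13 static cepstra are assumed (13 x 3 = 39), and the delta computation is a simple central difference. All function names are hypothetical.

```python
import numpy as np

def frame_energies(envelopes, fs, win_s=0.025, hop_s=0.010):
    """Integrate each band's envelope over 25 ms windows with a 10 ms shift."""
    win, hop = int(round(win_s * fs)), int(round(hop_s * fs))
    n_frames = 1 + (envelopes.shape[1] - win) // hop
    frames = [envelopes[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    return np.stack(frames, axis=1)                 # (n_bands, n_frames)

def dct2_matrix(n_out, n_in):
    """First n_out rows of an (unnormalized) DCT-II matrix for length n_in."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n_in))

def deltas(feat):
    """Central difference along time, with edge padding."""
    padded = np.pad(feat, ((0, 0), (1, 1)), mode="edge")
    return (padded[:, 2:] - padded[:, :-2]) / 2.0

def short_term_features(envelopes, fs, n_ceps=13):
    energies = frame_energies(np.asarray(envelopes, dtype=float), fs)
    # Log and DCT along the frequency (band) axis give cepstral coefficients.
    ceps = dct2_matrix(n_ceps, energies.shape[0]) @ np.log(energies + 1e-12)
    d1 = deltas(ceps)
    d2 = deltas(d1)
    return np.vstack([ceps, d1, d2])                # (3 * n_ceps, n_frames)
```

For one second of 24-band envelopes sampled at 8 kHz, this yields a 39 x 98 feature matrix.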

Speech Recognition

• TIDIGITS Database (8 kHz)
- Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
- Whole-word HMM models trained on clean speech
- Performance in terms of word error rate (WER)

• Features
- PLP features with cepstral mean subtraction (CMS)
- Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
- FDLP short-term (FDLP-S) features - 39 dim

[Bar chart: WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker Recognition Evaluation (SRE)
- Has telephone speech and far-field speech

• GMM-UBM system
- Trained on a large set of development speakers
- Adapted on the enrollment data from the target speaker
- Nuisance attribute projection (NAP) on supervectors
- Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
- Mel Frequency Cepstral Coefficients (MFCCs)
- FDLP short-term (FDLP-S) features

[Bar chart: DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S on Tel, Mic and Cross-domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
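For readers unfamiliar with the detection cost function, the weighting of false alarms and misses used in this evaluation can be written as a short helper. The score lists and the threshold below are made-up illustration values, and the function names are hypothetical.

```python
import numpy as np

def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm probabilities at a given decision threshold."""
    p_miss = float(np.mean(np.asarray(target_scores) < threshold))
    p_fa = float(np.mean(np.asarray(impostor_scores) >= threshold))
    return p_miss, p_fa

def detection_cost(p_miss, p_fa, w_fa=0.99, w_miss=0.1):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss."""
    return w_fa * p_fa + w_miss * p_miss

# Illustrative scores only (not evaluation data).
targets = [2.0, 1.5, 0.2, 3.0]
impostors = [-1.0, 0.5, 1.2, -0.3]
p_miss, p_fa = error_rates(targets, impostors, threshold=1.0)
cost = detection_cost(p_miss, p_fa)
```

Because false alarms are weighted almost ten times more than misses, the cost-minimizing threshold sits well above the equal-error point.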


Modulation Feature Extraction

Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression → DCT (200 ms) on each stream → Sub-band Feat

• Static compression is a logarithm - it reduces the huge dynamic range of the sub-band envelope

• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (28 x 17)
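A rough sketch of how the 476-dimensional vector comes together for one 200 ms segment. Several things here are assumptions made only for illustration: the envelope sampling rate (400 Hz, so 80 samples per segment), the time constants of the divider/low-pass loops, and all function names. The loop below is a generic divide-and-smooth stage, not the specific model of the cited reference.

```python
import numpy as np

def dct2_matrix(n_out, n_in):
    """First n_out rows of an (unnormalized) DCT-II matrix for length n_in."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n_in))

def adaptation_loops(env, fs, time_constants=(0.005, 0.05, 0.5)):
    """Chain of divider + first-order low-pass stages (hypothetical time constants)."""
    out = np.asarray(env, dtype=float)
    for tau in time_constants:
        alpha = np.exp(-1.0 / (tau * fs))
        state = 1.0
        stage = np.empty_like(out)
        for n, value in enumerate(out):
            stage[n] = value / state                       # divider
            state = alpha * state + (1.0 - alpha) * stage[n]  # low-pass of the output
        out = stage
    return out

def modulation_features(envelopes, fs, n_mod=14):
    """envelopes: (n_bands, n_samples) positive envelopes for one 200 ms segment."""
    envelopes = np.asarray(envelopes, dtype=float)
    basis = dct2_matrix(n_mod, envelopes.shape[1])
    static = np.log(envelopes + 1e-12) @ basis.T           # (n_bands, n_mod)
    dynamic = np.stack([adaptation_loops(e, fs) for e in envelopes]) @ basis.T
    return np.hstack([static, dynamic]).ravel()            # n_bands * 2 * n_mod values
```

With a 200 ms segment, DCT index k corresponds to a modulation frequency of k x 2.5 Hz, so the first 14 coefficients span roughly 0 to 32.5 Hz, in line with the 0-35 Hz range on the slide.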

Phoneme Recognition

• TIMIT Database (8 kHz)
- Clean training data; test data can be clean, with additive noise, reverberated, or telephone channel

• Multi-layer perceptron (MLP) based system
- MLPs estimate phoneme posteriors
- Hidden Markov model (HMM) - MLP hybrid model
- Performance in phoneme error rate (PER)

• Features
- Perceptual linear prediction (PLP) - 9 frame context
- Advanced ETSI standard [ETSI 2002] - 9 frame context
- FDLP modulation (FDLP-M) features - 476 dim

[Bar chart: PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9 and FDLP-M on Clean, Add. Noise, Reverb and Tel conditions]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.


Audio Coding

[Codec block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr); Env → Q → Q^-1; Carr → MDCT → Q → Q^-1 → IMDCT; Mul → QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: listening-test scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary


Summary

• Employing AR modeling for estimating amplitude modulations

• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes

• Investigating the resolution properties of FDLP

• Gain normalization of FDLP envelopes

• Short-term feature extraction using FDLP - improvements in reverberant speech recognition

• Modulation feature extraction - phoneme recognition in noisy speech

• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

• Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

• IBM personnel - Jason Pelecanos, Mohamed Omar

• Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidence
- Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidence
- Perceptual importance of modulation frequencies [Drullman et al. 1994]
- Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

• Current sample expressed as a linear combination of past samples

[Diagram: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

• Model parameters are solved by minimizing the residual sum of squares
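As a concrete instance of time-domain linear prediction, the sketch below solves for the predictor weights with the autocorrelation method (one common way to minimize the residual sum of squares; the slide does not say which solver was used). The function name and the synthetic second-order test signal are illustrative.

```python
import numpy as np

def linear_prediction(x, order):
    """Return predictor weights a (x[n] ~ sum_k a[k-1] * x[n-k]) and the residual."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - lag], x[lag:]) for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1 : order + 1])          # normal equations
    # Residual e[n] = x[n] - sum_k a_k x[n-k], via the prediction-error filter.
    residual = np.convolve(x, np.concatenate(([1.0], -a)))[: len(x)]
    return a, residual

# Synthetic AR(2) signal with known weights, to check the estimate.
rng = np.random.default_rng(4)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + e[n]

a_hat, res = linear_prediction(x, order=2)
```

On this signal the estimated weights come out close to the true values 1.5 and -0.7, and the residual has far less energy than the signal itself.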

AR model of Power Spectrum

• Filter interpretation [Makhoul 1975]

• From Parseval's theorem: [equation]

• By definition: [equation]

• Thus, parameters are solved by minimizing: [equation]

• Solution of the linear prediction yields an all-pole model of the power spectrum

• Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
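For reference, the standard textbook form of the predictor and of the all-pole spectral model is given below. The notation is an assumption: the order $p$, the weights $a_k$ and the symbol $\hat{P}(\omega)$ are chosen here for illustration, with $G$ the gain named on the slide.

```latex
x[n] \approx \sum_{k=1}^{p} a_k \, x[n-k],
\qquad
\hat{P}(\omega) = \frac{G}{\left| 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \right|^{2}}
```

The peaks of $\hat{P}(\omega)$ sit where the prediction-error filter has zeros close to the unit circle, which is why the model follows high-energy regions more closely than valleys.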

Hilbert Envelope - Definition

• Analytic signal is the sum of the signal and its quadrature component, $x_a(t) = x(t) + j\,\mathcal{H}\{x(t)\}$, where $\mathcal{H}$ denotes the Hilbert transform

• Hilbert envelope is the squared magnitude of the analytic signal
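The definition can be checked numerically with an FFT-based analytic signal: zero the negative-frequency bins, double the positive ones, and take the squared magnitude. This is a minimal sketch with hypothetical function names; the test signal (a slow modulation times a cosine carrier) is chosen so that the true envelope is known.

```python
import numpy as np

def analytic_signal(x):
    """Discrete-time analytic signal obtained by one-sided spectrum weighting."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

t = np.arange(1024) / 1024.0
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)      # smooth modulation
x = m * np.cos(2 * np.pi * 100 * t)            # fine carrier
env = hilbert_envelope(x)
```

Here the envelope recovers the squared modulation `m**2`, because the modulation and carrier occupy disjoint frequency ranges.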

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

LP ↔ FDLP

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP spectrogram and PLP spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization:

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame
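Of the three techniques, CMS is the simplest to illustrate. A channel that is short relative to the analysis frame shows up as a constant offset added to every cepstral frame, and subtracting the per-utterance mean removes it. The sketch below uses random stand-in cepstra and a hypothetical offset vector.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """ceps: (n_frames, n_coeffs). Subtract the mean of each coefficient over time."""
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(5)
clean = rng.standard_normal((100, 13))       # stand-in cepstral frames
channel_offset = rng.standard_normal(13)     # assumed constant channel term
distorted = clean + channel_offset

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)
```

After CMS the clean and distorted features coincide. This no longer holds when the response is longer than the frame, which is the case the long-term methods above are meant to address.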

Feature Comparison

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

Signal + Noise → DCT → Critical-band Window → IDFT → | |^2 → DFT → Linear Pred → FDLP Env

• When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Input → DCT → Critical-band Window → IDFT → | |^2 → Wiener Filtering (driven by VAD) → DFT

• Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

• Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, axes Time and Frequency]

• Spectrum is sampled at a preset rate before further modeling/processing stages

• Contextual information is typically processed with time-series models such as HMM

Page 20: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N

N-point DCT

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with

N-zeros

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

Discrete-time analytic signal

AR Model of Hilbert Envelopes

119876119886 [119896] =

Let - even-symmetrized version of

0 k geN2119876[119896 ] k lt N

119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT

119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S features under Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
– Has telephone speech and far-field speech

GMM-UBM system
– Trained on a large set of development speakers
– Adapted on the enrollment data from the target speaker
– Nuisance attribute projection (NAP) on supervectors
– Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
– Mel Frequency Cepstral Coefficients (MFCCs)
– FDLP short-term (FDLP-S) features
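The detection cost function quoted above can be written out as a small helper. This is a sketch under the assumption that the weights 0.99 and 0.1 arise from a miss cost of 10, a false-alarm cost of 1 and a target prior of 0.01; the function and parameter names are hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_target * P_miss + C_fa * (1 - P_target) * P_fa.

    With the assumed defaults this reduces to 0.1 * P_miss + 0.99 * P_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

For example, `detection_cost(0.2, 0.05)` evaluates 0.1 x 0.2 + 0.99 x 0.05.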

Speaker Verification

[Bar chart: DCF for MFCC and FDLP-S features under Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

Modulation Features

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → 200 ms sub-band envelope → Static and Dynamic compression → DCT on each → Sub-band Feat]

Static compression is a logarithm – it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
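A chain of divider and low-pass stages of this kind can be sketched as below. This is a simplified illustration only: the five time constants, the one-pole low-pass form, the input floor and the steady-state initialization are assumptions chosen for the example, not values taken from the slides.

```python
import numpy as np

def adaptation_loops(env, fs, taus=(0.005, 0.050, 0.129, 0.253, 0.500),
                     floor=1e-5):
    """Pass an envelope through cascaded divider / low-pass loops.

    env: non-negative envelope samples; fs: envelope sampling rate in Hz.
    taus: assumed low-pass time constants in seconds (one per loop).
    """
    out = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * fs))      # one-pole low-pass coefficient
        state = np.sqrt(out[0])                # steady state for first sample
        y = np.empty_like(out)
        for n, v in enumerate(out):
            y[n] = v / state                   # divider
            # the low-pass filtered output feeds back into the divider
            state = alpha * state + (1.0 - alpha) * y[n]
        out = y
    return out
```

For a stationary input each loop settles to a square root, so five loops approach a near-logarithmic compression, while fast onsets pass through almost linearly before the loops adapt.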

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (28 x 17 = 476)
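The final DCT step and the resulting dimensionality can be illustrated with a short sketch. The function names and the unnormalized DCT-II are assumptions; the compressed envelopes are taken as given. For a 200 ms window, DCT-II coefficient k sits at about k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span roughly 0-35 Hz.

```python
import numpy as np

def modulation_spectrum(compressed_env, n_coeffs=14):
    """First n_coeffs DCT-II coefficients of one compressed sub-band envelope."""
    x = np.asarray(compressed_env, dtype=float)
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    return basis @ x

def modulation_features(static_envs, dynamic_envs, n_coeffs=14):
    """Concatenate static and dynamic modulation spectra over all sub-bands."""
    feats = []
    for s, d in zip(static_envs, dynamic_envs):
        feats.append(modulation_spectrum(s, n_coeffs))
        feats.append(modulation_spectrum(d, n_coeffs))
    return np.concatenate(feats)
```

With 17 sub-bands and 14 coefficients per compression stream, the concatenated vector has 17 x 2 x 14 = 476 entries.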

Phoneme Recognition

TIMIT Database (8 kHz)
– Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
– MLPs estimate phoneme posteriors
– Hidden Markov model (HMM) – MLP hybrid model
– Performance in phoneme error rate (PER)

Features
– Perceptual linear prediction (PLP) – 9 frame context
– Advanced ETSI standard [ETSI 2002] – 9 frame context
– FDLP modulation (FDLP-M) features – 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Phoneme Recognition

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M features under Clean, Add Noise, Reverb and Tel conditions]

Audio Coding

[Block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → Env and Carr; MDCT on the carrier; Q and Q-1 on both paths; IMDCT; Mul → QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP – improvements in reverberant speech recognition

Modulation feature extraction – phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence
– Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
– Perceptual importance of modulation frequencies [Drullman et al. 1994]
– Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
– Speech intelligibility [Houtgast et al. 1980]
– RASTA processing [Hermansky et al. 1994]
– Speech recognition [Kingsbury et al. 1998]
– AM-FM decomposition [Kumaresan et al. 1999]
– Sound texture modeling [Athineos et al. 2003]
– Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
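Minimizing the residual sum of squares leads to a set of linear normal equations in the autocorrelation of the signal. A minimal sketch of this autocorrelation method follows; the function name and the direct matrix solve (rather than a Levinson-Durbin recursion) are choices made for the illustration.

```python
import numpy as np

def linear_prediction(x, order):
    """Autocorrelation-method LP.

    Returns (a, gain) with x[n] ~ sum_k a[k-1] * x[n-k] and
    gain equal to the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([x[:n - lag] @ x[lag:] for lag in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    gain = r[0] - a @ r[1:]
    return a, gain
```

For a decaying exponential such as `0.5 ** np.arange(50)`, a first-order predictor recovers the decay factor 0.5, and the residual energy is essentially that of the first sample.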

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing
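As a hedged reconstruction of the steps named on these slides, following the spectral-matching view in [Makhoul 1975], the chain might read as below. The symbols x[n], a_k, p, A, P, the estimate P-hat and the sign convention of A(z) are assumptions, since the original notation is not shown here.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e^{2}[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\lvert A(e^{j\omega})\rvert^{2}\, d\omega,
  \qquad P(\omega) = \lvert X(e^{j\omega})\rvert^{2} \\
\hat{P}(\omega) &= \frac{G}{\lvert A(e^{j\omega})\rvert^{2}}
  \;\Longrightarrow\;
  E = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
\end{align}
```

Read this way, minimizing the prediction error is the same as minimizing the integrated ratio of the signal power spectrum to its all-pole approximation.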

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
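Once the predictor coefficients and gain are known, the all-pole model of the power spectrum can be evaluated directly. The sketch below assumes the predictor convention x[n] ~ sum_k a_k x[n-k]; the function name and the frequency grid are hypothetical.

```python
import numpy as np

def all_pole_spectrum(a, gain, n_freqs=256):
    """Evaluate gain / |A(e^{jw})|^2 on n_freqs points from 0 to pi.

    a: predictor coefficients a_1..a_p, so A(z) = 1 - sum_k a_k z^{-k}.
    """
    a = np.asarray(a, dtype=float)
    poly = np.concatenate(([1.0], -a))
    omega = np.linspace(0.0, np.pi, n_freqs)
    resp = np.exp(-1j * np.outer(omega, np.arange(len(poly)))) @ poly
    return omega, gain / np.abs(resp) ** 2
```

For a single coefficient a_1 = 0.5 and unit gain, the model equals 1 / 0.5^2 = 4 at zero frequency and 1 / 1.5^2 at pi, a simple low-pass shape.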

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
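A discrete-time version of this definition can be sketched with the FFT: zero the negative-frequency bins, double the positive ones, and invert. The function names are hypothetical; the construction is the standard one for a discrete analytic signal.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal via one-sided spectrum doubling."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep Nyquist once
        weights[1:n // 2] = 2.0            # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2
```

For a slowly modulated carrier m[n] cos(w n) with the modulation well below the carrier frequency, the envelope recovers m[n] squared.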

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

LP in Time and Frequency

[Figure: duality between LP and FDLP]

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short and long term distortions within a single long-term frame
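Of the three techniques, CMS is the simplest to illustrate: a distortion that is the same in every frame shows up as a constant offset in the cepstral domain, so subtracting the per-utterance mean removes it. The sketch below is a generic illustration with a hypothetical function name, not the specific front end used in the experiments.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """Subtract the mean over time from each cepstral coefficient.

    ceps: array (n_frames, n_coeffs).
    """
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)
```

Adding the same offset vector to every frame leaves the output unchanged, which is the invariance that CMS relies on.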

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering, VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope
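The subtraction step can be sketched as follows. This is a deliberately simplified stand-in: it assumes a stationary noise envelope summarized by a single mean level, takes the VAD decision as a given boolean mask, and uses a plain subtraction with a floor rather than the Wiener filtering shown in the diagram. All names are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate from a noisy sub-band envelope.

    noisy_env: non-negative envelope samples.
    speech_mask: True where the (assumed) VAD marks speech.
    floor: fraction of the noisy envelope kept as a lower bound.
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated from the non-speech regions only.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Flooring keeps the envelope positive for later log compression.
    return np.maximum(cleaned, floor * noisy_env)
```

The floor is a design choice for the sketch; without it the subtracted envelope could become zero or negative in noise-only regions.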

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plane with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction – Time Domain
  • Linear Prediction – Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)

Discrete-time analytic signal

AR Model of Hilbert Envelopes

Let $q[n]$ - even-symmetrized version of $x[n]$

$Q[k] = 2\,\mathrm{Re}\{X[k]\}$

$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$

Zero-padded DCT (DCT zero-padded with N zeros)

$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$

AR Model of Hilbert Envelopes

We have shown -

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

Spectrum of even-sym analytic signal = Zero-padded DCT sequence

Signal Spectrum → Power Spectrum ↔ Auto-corr (related by F)

$\mathcal{F}\{\lvert q_a[n]\rvert^{2}\} = r_y[\tau]$

Spectrum of Hilbert env for even-sym signal = Auto-correlation of DCT sequence

Hilb env of even-symm signal ↔ Auto-corr of DCT (related by F)

Auto-corr of DCT → LP → AR model of Hilb env

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP → FDLP Env, shown together with the Hilb Env]
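Linear prediction on the cosine transform can be sketched in a few lines: take the DCT of the signal, fit an all-pole model to the DCT sequence with the autocorrelation method, and evaluate the model along the time axis. This is a minimal full-band illustration with hypothetical names and an explicit DCT-II matrix; sub-band windowing of the DCT is omitted.

```python
import numpy as np

def fdlp_envelope(x, order=20):
    """All-pole (FDLP) approximation of the temporal envelope of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # DCT-II of the signal (explicit matrix, fine for short segments).
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x
    # Autocorrelation of the DCT sequence and Toeplitz normal equations.
    r = np.array([c[:n - lag] @ c[lag:] for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    gain = r[0] - a @ r[1:]
    poly = np.concatenate(([1.0], -a))
    # The angle paired with time sample m under the DCT-II is
    # pi * (2m + 1) / (2n), so the model is evaluated on that grid.
    theta = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    resp = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ poly
    return gain / np.abs(resp) ** 2
```

For a single click in an otherwise silent segment, the DCT is a pure cosine, a second-order model places its pole pair at the corresponding angle, and the envelope peaks at the time of the click.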

FDLP for Speech Representation

[Figure: DCT followed by LP gives the FDLP Spectrogram (axes: Time, Freq), shown next to the spectrogram from Conventional Approaches (axes: Time, Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) pairs; FDLP and Mel panels]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

• Summarizing the gross temporal variation with a few parameters
– Model order of FDLP controls the degree of smoothness
– AR model captures perceptually important high energy regions of the signal

• Suppressing reverberation artifacts
– Reverberation is a long-term convolutive distortion
– Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication

Clean DFT x Response DFT = Revb DFT

In the sub-band

In narrow bands, the response is slowly varying

FDLP envelope of band using all-pole parameters … is given by

When the sub-band signal is multiplied by the response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
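The effect described here can be illustrated numerically. The sketch below assumes that the normalization amounts to reconstructing the envelope with unit gain (G = 1), which is an assumption made for this example, and it models the narrow-band distortion as a constant scale factor. All function names are hypothetical.

```python
import numpy as np

def fdlp_all_pole(x, order):
    """Return (A polynomial, gain) of an all-pole fit to the DCT of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x      # DCT-II
    r = np.array([c[:n - lag] @ c[lag:] for lag in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    gain = r[0] - a @ r[1:]
    return np.concatenate(([1.0], -a)), gain

def envelope_from_model(poly, gain, n):
    """Evaluate gain / |A|^2 on the time grid of an n-sample segment."""
    theta = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    resp = np.exp(-1j * np.outer(theta, np.arange(len(poly)))) @ poly
    return gain / np.abs(resp) ** 2

def gain_normalized_envelope(x, order=10):
    """Envelope reconstructed with the gain forced to 1 (assumed G = 1)."""
    poly, _ = fdlp_all_pole(x, order)
    return envelope_from_model(poly, 1.0, len(x))
```

Scaling the input by a constant changes only the gain term, by the square of the constant, while the all-pole coefficients and therefore the unit-gain envelope stay the same.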

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame
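To make the CMS assumption concrete, here is a minimal sketch: if the channel adds the same offset to every frame's cepstrum, subtracting the per-coefficient mean over frames removes it. The function name and the random "cepstra" are illustrative only.

```python
import numpy as np

def cepstral_mean_subtraction(cep):
    """Subtract the mean over frames from each cepstral coefficient.

    cep: array of shape (num_frames, num_coeffs).
    """
    cep = np.asarray(cep, dtype=float)
    return cep - cep.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean_cep = rng.standard_normal((200, 13))   # stand-in for clean cepstra
channel_offset = rng.standard_normal(13)     # fixed channel: constant offset
distorted_cep = clean_cep + channel_offset

cms_clean = cepstral_mean_subtraction(clean_cep)
cms_distorted = cepstral_mean_subtraction(distorted_cep)
```

After CMS the clean and distorted features coincide, which holds only as long as the distortion really is constant across the frames being averaged.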

Feature Comparison

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Modulation Frequency Features for Phoneme Recognition in Noisy Speech,” JASA Express Letters, 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram. Blocks: Input, DCT, Critical-band Window, IDFT, | |², DFT, VAD, Wiener Filtering]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, “Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum,” IEEE ASRU 2009
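A rough sketch of the envelope subtraction idea, under simplifying assumptions that are not from the slides: the noise envelope is taken to be a constant level estimated from the non-speech samples, the VAD decisions are given as a boolean mask, and the result is floored to stay positive. The function name and `floor_ratio` parameter are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, is_speech, floor_ratio=0.01):
    """Subtract a noise-envelope estimate from a noisy sub-band envelope.

    noisy_env : 1-D temporal envelope of the noisy sub-band signal
    is_speech : 1-D boolean mask from a (hypothetical) VAD, True for speech
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    is_speech = np.asarray(is_speech, dtype=bool)
    # Noise level estimated as the average envelope over non-speech samples
    # (assumes roughly stationary noise within the analysis window).
    noise_level = noisy_env[~is_speech].mean()
    cleaned = noisy_env - noise_level
    # Floor the result so the envelope stays strictly positive.
    return np.maximum(cleaned, floor_ratio * noise_level)

n = np.arange(1000)
speech_env = np.where((n > 300) & (n < 600), 5.0, 0.0)  # toy speech envelope
noisy_env = speech_env + 1.0                            # additive noise envelope
vad = speech_env > 0                                    # ideal VAD for the toy case
cleaned_env = subtract_noise_envelope(noisy_env, vad)
```

The block diagram also names Wiener filtering, which this sketch does not implement.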

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction – Time Domain
  • Linear Prediction – Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 22: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

Let $q[n]$ be the even-symmetrized version of $x[n]$, with $Q[k] = 2\,\mathrm{Re}\{X[k]\}$

Discrete-time analytic signal:

$$Q_a[k] = \begin{cases} 2Q[k], & k < N \\ 0, & k \ge N \end{cases}$$

Zero-padded DCT:

$$y[k] = \begin{cases} 4\,\mathrm{Re}\{X[k]\}, & k < N \\ 0, & k \ge N \end{cases}$$

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

AR Model of Hilbert Envelopes

We have shown:

$$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$$

(spectrum of even-symmetrized analytic signal = zero-padded DCT sequence)

[Diagram: Signal Spectrum → Power Spectrum ↔ (F) Auto-corr]

$$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$$

(spectrum of Hilbert envelope for even-symmetrized signal = auto-correlation of DCT sequence)

[Diagram: Hilb env of even-symm signal ↔ (F) Auto-corr of DCT]

[Diagram: Auto-corr of DCT → (LP) → AR model of Hilb env]

LP in Time and Frequency

Duality

[Diagram: Time → (LP) → Power Spec; DCT → (LP) → Hilb Env]

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP → FDLP Env, shown alongside Hilb Env]
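The DCT-then-LP recipe can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses an orthonormal DCT-II, the autocorrelation method for LP, and evaluates the all-pole model on a uniform grid of `num_points` time positions (ignoring the half-sample offset of the DCT-II). The name `fdlp_envelope` is hypothetical.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order, num_points=None):
    """Frequency-domain linear prediction envelope (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    num_points = n if num_points is None else num_points
    c = dct(x, type=2, norm="ortho")           # cosine transform of the signal
    # Autocorrelation of the DCT sequence at lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])       # LP across frequency
    gain = r[0] - np.dot(a, r[1:])             # minimum residual energy
    # Evaluate 1 - sum_k a_k e^{-j k theta} for theta in [0, pi);
    # by duality theta plays the role of time.
    poly = np.concatenate(([1.0], -a))
    A = np.fft.rfft(poly, 2 * num_points)[:num_points]
    env = gain / np.abs(A) ** 2                # all-pole model of the envelope
    return env, a, gain

# Toy signal: a noise burst centred at sample 700 of a 1000-sample frame.
rng = np.random.default_rng(0)
N = 1000
n_idx = np.arange(N)
burst = np.exp(-0.5 * ((n_idx - 700) / 30.0) ** 2)
x = burst * rng.standard_normal(N)
env, a, gain = fdlp_envelope(x, order=8)
```

The envelope is a smooth function of time whose peak should fall near the burst, and a higher `order` tracks finer temporal detail.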

FDLP for Speech Representation

[Figure: DCT → LP → FDLP Spectrogram (Time vs. Freq), compared with Conventional Approaches (Time vs. Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Comparison of Modulation Frequency Features for Speech Recognition,” ICASSP 2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: Sig and FDLP Env panels for FDLP and Mel analysis]

$\text{Res} = (\text{Critical Width})^{-1}$

Properties of FDLP Analysis

[Figure: FDLP and Mel]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT × Response DFT = Revb DFT

In the sub-band [equation not recovered]

In narrow bands [symbol not recovered] is slowly varying

FDLP envelope of band [symbol not recovered] using all-pole parameters … is given by [equation not recovered]

When the sub-band signal is multiplied by [symbol not recovered], the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with [equation not recovered]

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy, and H. Hermansky, “Recognition of Reverberant Speech Using FDLP,” IEEE Signal Proc. Letters, 2008
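The gain argument can be checked numerically. The sketch below assumes the narrow-band distortion acts as a constant multiplier on the sub-band signal and that normalization means reconstructing the envelope with the gain fixed to 1; that value is an assumption here, since the expression on the slide was lost. The helper names are hypothetical.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_params(x, order):
    """LP coefficients and gain from the DCT of x (autocorrelation method)."""
    c = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])
    return a, r[0] - np.dot(a, r[1:])

def envelope(a, gain, num_points):
    """All-pole envelope gain / |A|^2 on a uniform time grid."""
    A = np.fft.rfft(np.concatenate(([1.0], -a)), 2 * num_points)[:num_points]
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(1)
clean = rng.standard_normal(800) * np.hanning(800)   # toy sub-band signal
scaled = 0.25 * clean        # distortion modelled as a constant multiplier

a_c, g_c = fdlp_params(clean, 12)
a_s, g_s = fdlp_params(scaled, 12)

# Gain-normalized envelopes: gain set to 1 (assumed normalization).
env_c_norm = envelope(a_c, 1.0, 800)
env_s_norm = envelope(a_s, 1.0, 800)
```

The multiplier changes only the gain (by its square), while the pole parameters, and hence the gain-normalized envelope, stay the same.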

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, “Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation,” IEEE WASPAA, Oct. 2009

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

• Integration in frequency axis to convert to mel scale

• Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

• Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
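A hedged sketch of the short-term feature steps listed above. It assumes the sub-band envelopes are already available on a mel-like set of bands (so the frequency integration step is skipped), keeps 13 cepstral coefficients so that static plus delta plus acceleration gives 39 dimensions, and uses `np.gradient` as a simple stand-in for the usual regression-based deltas. All names and defaults are illustrative.

```python
import numpy as np
from scipy.fft import dct

def short_term_features(band_envs, fs, win_ms=25, shift_ms=10, num_ceps=13):
    """band_envs: (num_bands, num_samples) sub-band envelopes sampled at fs."""
    band_envs = np.asarray(band_envs, dtype=float)
    win = int(round(fs * win_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    num_frames = 1 + (band_envs.shape[1] - win) // shift
    # Integrate each envelope over 25 ms windows with a 10 ms shift.
    energies = np.stack(
        [band_envs[:, i * shift : i * shift + win].sum(axis=1)
         for i in range(num_frames)]
    )                                            # (num_frames, num_bands)
    # Log and DCT along the frequency axis give cepstral coefficients.
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=1)
    ceps = ceps[:, :num_ceps]
    delta = np.gradient(ceps, axis=0)            # simple delta approximation
    accel = np.gradient(delta, axis=0)           # simple acceleration
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
envs = rng.random((20, 8000)) + 0.1   # 20 toy band envelopes, 1 s at 8 kHz
feats = short_term_features(envs, fs=8000)
```

With these toy inputs `feats` has one 39-dimensional row per 10 ms frame.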

Speech Recognition

• TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)

• Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features – 39 dim

[Bar chart: WER for PLP-CMS, LTLSS, and FDLP-S under Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, “Recognition of Reverberant Speech Using FDLP,” IEEE Signal Proc. Letters, 2008

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
  – Has telephone speech and far-field speech

• GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
  – Mel Frequency Cepstral Coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S under Tel, Mic, and Cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, “Feature Normalization for Speaker Verification in Room Reverberation,” ICASSP 2011
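For reference, the detection cost can be computed from trial scores as sketched below. The weights 0.99 and 0.1 follow the DCF expression on the slide; the threshold sweep, the function names, and the toy scores are illustrative assumptions.

```python
import numpy as np

def detection_cost(p_miss, p_fa):
    """DCF with the slide's weights: 0.99 * Pfa + 0.1 * Pmiss."""
    return 0.99 * p_fa + 0.1 * p_miss

def min_dcf(target_scores, nontarget_scores):
    """Minimum DCF over thresholds taken from the observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.concatenate((target_scores, nontarget_scores, [np.inf]))
    costs = []
    for th in thresholds:
        p_miss = np.mean(target_scores < th)       # targets rejected
        p_fa = np.mean(nontarget_scores >= th)     # non-targets accepted
        costs.append(detection_cost(p_miss, p_fa))
    return min(costs)

tar = np.array([2.0, 1.5, 0.2, 3.1])     # toy target-trial scores
non = np.array([-1.0, 0.5, -0.3, 0.1])   # toy non-target-trial scores
best = min_dcf(tar, non)
```

Because false alarms are weighted about ten times more heavily than misses, the best threshold in this toy case rejects one target rather than accept any non-target.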

Modulation Features

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat]

• Static compression is a logarithm – reduces the huge dynamic range in the sub-band envelope

• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 × 2 × 17)
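A sketch of the static stream only, under assumptions not stated on the slides: an orthonormal DCT-II over one 200 ms envelope segment, 14 retained coefficients, and an arbitrary envelope sampling rate. The dynamic stream (adaptive compression loops) is not implemented here. For a DCT-II over T seconds, coefficient k corresponds to a modulation frequency of k / (2T), which is how 14 coefficients over 200 ms span roughly the 0-35 Hz range.

```python
import numpy as np
from scipy.fft import dct

def static_modulation_features(band_envs, num_mod=14):
    """band_envs: (num_bands, num_samples) envelopes of one 200 ms segment."""
    band_envs = np.asarray(band_envs, dtype=float)
    compressed = np.log(band_envs + 1e-12)     # static (log) compression
    # DCT along time gives modulation frequency components per band.
    spec = dct(compressed, type=2, norm="ortho", axis=1)
    return spec[:, :num_mod]

segment_sec = 0.2
mod_freqs = np.arange(14) / (2 * segment_sec)  # Hz, one per DCT coefficient

rng = np.random.default_rng(0)
envs = rng.random((17, 80)) + 0.1   # 17 toy sub-band envelopes, 200 ms each
static = static_modulation_features(envs)

# Static and dynamic streams together: 2 streams x 17 bands x 14 components.
total_dim = 2 * static.shape[0] * static.shape[1]
```

The dimension count matches the 476 quoted above once a dynamic stream of the same shape is appended.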

Phoneme Recognition

• TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) – MLP hybrid model
  – Performance in phoneme error rate (PER)

• Features
  – Perceptual linear prediction (PLP) – 9 frame context
  – Advanced ETSI standard [ETSI 2002] – 9 frame context
  – FDLP modulation (FDLP-M) features – 476 dim

[Bar chart: PER for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, “Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum,” JASA, 2010

Audio Coding

[Block diagram of the codec. Blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env, Carr, MDCT, Q, Q⁻¹, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, “Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding,” AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

[Chart: scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, “AR Models of Amplitude Modulation in Audio Compression,” IEEE Transactions on Audio, Speech, and Language Proc., 2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

• Employing AR modeling for estimating amplitude modulations

• Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes

• Investigating the resolution properties of FDLP

• Gain normalization of FDLP envelopes

• Short-term feature extraction using FDLP – improvements in reverb speech recog

• Modulation feature extraction – phoneme recognition in noisy speech

• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences
• Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidences
• Perceptual importance of modulation frequencies [Drullman et al 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Page 23: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

analytic signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

SignalSpectrum

PowerSpectrum

Auto-corr

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]

Spectrum ofeven-sym

Analytic Signal

Zero-paddedDCT sequence

We have shown -

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

[Figure: FDLP and Mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
- Model order of FDLP controls the degree of smoothness
- AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
- Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands



Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT
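A small numerical check of this relation, as a sketch only: the function names are hypothetical, and the DFT length is chosen long enough (signal length plus response length minus one) for the product to match linear convolution.

```python
import numpy as np


def reverberate(clean, room_response):
    """Time domain: reverberant speech is the clean speech convolved with the room response."""
    return np.convolve(clean, room_response)


def reverberate_via_dft(clean, room_response):
    """Long-term DFT domain: the same operation as a product of DFTs."""
    n = len(clean) + len(room_response) - 1  # long enough to avoid circular wrap-around
    clean_dft = np.fft.rfft(clean, n)
    response_dft = np.fft.rfft(room_response, n)
    return np.fft.irfft(clean_dft * response_dft, n)
```

The two functions agree up to floating-point error, which is the property the reverberation analysis relies on.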


Reverberation

When speech is corrupted with convolutive distortion like room reverberation, the distortion becomes a multiplication in the DFT domain, and therefore also within each sub-band

In narrow bands the room response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and its gain

When the sub-band signal is multiplied by the (approximately constant) response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
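The effect of a constant multiplier on the all-pole model can be illustrated as follows. This is a sketch under assumptions: the helper names, the autocorrelation method, and the choice of unit gain for the normalized envelope are illustrative and not taken from the slides.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lp_autocorr(seq, order):
    """Autocorrelation-method LP: returns A(z) coefficients (leading 1) and the gain."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    r = np.array([np.dot(seq[: n - k], seq[k:]) for k in range(order + 1)]) / n
    coeffs = solve_toeplitz(r[:order], -r[1:])
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + np.dot(coeffs, r[1:])
    return a, gain


def allpole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 at n_points angles on the upper half of the unit circle."""
    denom = np.abs(np.fft.rfft(a, 2 * n_points)[:n_points]) ** 2
    return gain / denom
```

Scaling the input sequence by a constant c leaves the all-pole coefficients unchanged and scales the gain by c squared, so an envelope rebuilt with a fixed gain (here 1.0, an assumed value) is the same for the scaled and unscaled sequences.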

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.


Short-term Features

[Block diagram: Input -> DCT -> Sub-band Window -> FDLP -> Gain Norm -> Energy Int -> Log + DCT -> Feat]

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

• Integration in frequency axis to convert to mel scale

• Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

• Delta and acceleration coefficients are appended to obtain 39-dim features similar to conventional MFCC features
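The steps above can be sketched as follows. Several details are assumptions made for the example and are not stated on the slides: the sub-band envelopes are taken to be already mel-spaced, 13 cepstral coefficients are kept (so that 13 x 3 = 39), and delta and acceleration are computed with simple finite differences.

```python
import numpy as np
from scipy.fft import dct


def short_term_features(band_envelopes, env_rate, n_ceps=13, win_s=0.025, hop_s=0.010):
    """band_envelopes: array (n_bands, n_samples) of sub-band envelopes (hypothetical layout)."""
    env = np.asarray(band_envelopes, dtype=float)
    win = int(round(win_s * env_rate))  # 25 ms integration window
    hop = int(round(hop_s * env_rate))  # 10 ms shift
    n_frames = 1 + (env.shape[1] - win) // hop
    # Integrate each envelope along time to get one energy per band and frame.
    energies = np.stack(
        [env[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )
    # Log and DCT along the frequency axis give cepstral coefficients.
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=1)[:, :n_ceps]
    # Finite-difference delta and acceleration (an assumed, simplified choice).
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])
```

For one second of envelopes sampled at 8 kHz this yields one 39-dimensional vector every 10 ms.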

Speech Recognition

• TIDIGITS Database (8 kHz)
- Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
- Whole-word HMM models trained on clean speech
- Performance in terms of word error rate (WER)

• Features
- PLP features with cepstral mean subtraction (CMS)
- Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
- FDLP short-term (FDLP-S) features - 39 dim

Speech Recognition

[Bar chart: WER (axis 0-20) for PLP-CMS, LTLSS, and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker recognition evaluation (SRE)
- Has telephone speech and far-field speech

• GMM-UBM system
- Trained on a large set of development speakers
- Adapted on the enrollment data from the target speaker
- Nuisance attribute projection (NAP) on supervectors
- Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
- Mel Frequency Cepstral Coefficients (MFCCs)
- FDLP short-term (FDLP-S) features
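The detection cost function can be computed from trial scores as in the sketch below. The function name, the score arrays, and the threshold are hypothetical; only the weights 0.99 and 0.1 come from the slide.

```python
import numpy as np


def detection_cost(target_scores, nontarget_scores, threshold, w_fa=0.99, w_miss=0.1):
    """DCF = w_fa * Pfa + w_miss * Pmiss at a given decision threshold."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Miss: a target trial scored below the threshold.
    p_miss = np.mean(target_scores < threshold)
    # False alarm: a non-target trial scored at or above the threshold.
    p_fa = np.mean(nontarget_scores >= threshold)
    return w_fa * p_fa + w_miss * p_miss
```

With these weights a false alarm costs roughly ten times as much as a miss.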

Speaker Verification

[Bar chart: DCF (axis 10-30) for MFCC and FDLP-S on Tel, Mic, and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.


Modulation Feature Extraction

[Block diagram: Input -> DCT -> Critical-band Window -> FDLP -> Static and Dynamic compression -> DCT (200 ms) -> Sub-band Feat]

• Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
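The feature dimension and the two compression paths can be sketched as below. This is a simplified illustration: a single divider and low-pass loop stands in for the full chain of dynamic compression loops, and the 400 Hz envelope rate, the 50 ms time constant, and the floor value are assumptions, not values from the slides.

```python
import numpy as np
from scipy.fft import dct


def adaptation_loop(env, env_rate, tau_s=0.05, floor=1e-5):
    """One divider + first-order low-pass loop (simplified stand-in for dynamic compression)."""
    alpha = np.exp(-1.0 / (tau_s * env_rate))
    state = np.sqrt(floor)
    out = np.empty(len(env))
    for i, value in enumerate(env):
        out[i] = max(value, floor) / state  # divider
        state = max(alpha * state + (1.0 - alpha) * out[i], floor)  # low-pass feedback
    return out


def modulation_features(band_envelopes, env_rate=400, n_mod=14):
    """band_envelopes: (n_bands, n_samples) envelopes of one 200 ms segment."""
    feats = []
    for env in np.asarray(band_envelopes, dtype=float):
        static = np.log(np.maximum(env, 1e-5))  # static (logarithmic) compression
        dynamic = adaptation_loop(env, env_rate)  # dynamic compression
        feats.append(dct(static, type=2, norm="ortho")[:n_mod])
        feats.append(dct(dynamic, type=2, norm="ortho")[:n_mod])
    return np.concatenate(feats)
```

With 80 envelope samples per 200 ms segment, DCT coefficients are spaced 2.5 Hz apart, so the first 14 cover modulation frequencies from 0 to 32.5 Hz, and 17 bands produce 17 x 2 x 14 = 476 values.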

Phoneme Recognition

• TIMIT Database (8 kHz)
- Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

• Multi-layer perceptron (MLP) based system
- MLPs estimate phoneme posteriors
- Hidden Markov model (HMM) - MLP hybrid model
- Performance in phoneme error rate (PER)

• Features
- Perceptual linear prediction (PLP) - 9 frame context
- Advanced ETSI standard [ETSI 2002] - 9 frame context
- FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Phoneme Recognition

[Bar chart: PER (axis 30-75) for PLP-9, ETSI-9, and FDLP-M on Clean, Add Noise, Reverb, and Tel test conditions]


Audio Coding

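[Block diagram: Input -> QMF Analysis (bands 1 to 32) -> FDLP -> envelope (Env) and carrier (Carr); Env -> Q; Carr -> MDCT -> Q; decoder: Q^-1 for each stream, IMDCT for the carrier, Mul, QMF Synthesis (bands 1 to 32) -> Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.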

Subjective Evaluations

[Bar chart: subjective scores (axis 0-100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals


Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes


Our Contributions

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP


Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences
- Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidences
- Perceptual importance of modulation frequencies [Drullman et al. 1994]
- Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]


Applications

Modulation spectra have been used in the past:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n-3, n-2, n-1 weighted by a3, a2, a1 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
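A minimal sketch of time-domain linear prediction, assuming the autocorrelation method (the slides do not say which solver is used). The function name and the sign convention, where the prediction is the sum of a_k times x[n-k], are choices made for this example.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def linear_prediction(x, order):
    """Return predictor weights a_1..a_p and the minimum mean squared residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    # Normal equations from minimizing the residual sum of squares.
    a = solve_toeplitz(r[:order], r[1:])
    residual_power = r[0] - np.dot(a, r[1:])
    return a, residual_power


# Usage: recover the weights of a synthetic second-order AR process.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
signal = lfilter([1.0], [1.0, -1.5, 0.8], excitation)
weights, residual_power = linear_prediction(signal, order=2)
```

For this synthetic process the estimated weights come out close to 1.5 and -0.8, and the residual power close to the excitation variance.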

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)


Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a[n]|^2$
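A short sketch of this definition using `scipy.signal.hilbert`, which returns the analytic signal directly. The function name `hilbert_envelope` and the test signal below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal of x."""
    analytic = hilbert(np.asarray(x, dtype=float))  # x + j * HilbertTransform(x)
    return np.abs(analytic) ** 2


# Usage: a slow modulation multiplied by a fine carrier.
n = 1024
t = np.arange(n)
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t / n)
carrier = np.cos(2 * np.pi * 128 * t / n)
envelope = hilbert_envelope(modulation * carrier)
```

For this band-limited example the Hilbert envelope equals the squared modulation, which is the AM component the talk sets out to model.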

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP shown as duals]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP spectrogram and PLP spectrogram]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame


Feature Comparison


Modulation Features

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise -> DCT -> Critical-band Window -> IDFT -> | |^2 -> DFT -> Linear Pred -> FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering, VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
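The subtraction idea can be sketched as below. This is a plain envelope subtraction with a floor, not the Wiener filtering shown in the block diagram; the function name, the boolean VAD mask, and the 1% floor are assumptions made for the example.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from the non-speech regions."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)  # True where the VAD marks speech
    # Noise envelope estimate: mean envelope over non-speech samples.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Keep a small fraction of the noisy envelope so the result stays positive.
    return np.maximum(cleaned, floor * noisy_env)
```

The floor matters because later stages such as a logarithm or linear prediction expect a strictly positive envelope.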

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

• Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, Time vs. Frequency]

• Spectrum is sampled at a preset rate before further modeling/processing stages

• Contextual information is typically processed with time-series models such as HMM


AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al 1994]

Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past:

  • Speech intelligibility [Houtgast et al 1980]
  • RASTA processing [Hermansky et al 1994]
  • Speech recognition [Kingsbury et al 1998]
  • AM-FM decomposition [Kumaresan et al 1999]
  • Sound texture modeling [Athineos et al 2003]
  • Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by coefficients a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
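The least-squares fit described on this slide can be sketched with the autocorrelation method. This is a minimal illustration, not the author's implementation: the function name `lp_coefficients` and the choice of solving the Toeplitz normal equations directly with NumPy are assumptions made for the example.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method linear prediction (illustrative sketch).

    Returns (a, residual_energy) such that x[n] is approximated by
    sum_{k=1..order} a[k-1] * x[n-k].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Unnormalised autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|]
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.linalg.solve(r[idx], r[1:])
    # Minimum residual sum of squares for this order
    residual_energy = r[0] - np.dot(a, r[1:])
    return a, residual_energy

# Usage: a decaying exponential is predicted by a single past sample
x = 0.9 ** np.arange(200)
a, e = lp_coefficients(x, order=1)
```

For long signals or high orders a Levinson-Durbin recursion is the usual choice; the direct solve is used here only to keep the sketch short.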

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem [equation]

By definition [equation]. Let [equation]

Thus parameters are solved by minimizing [equation]

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
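As a rough illustration of the all-pole model, the sketch below evaluates G / |A(e^{jw})|^2 on a frequency grid. The function name `ar_power_spectrum`, the grid size, and the sign convention A(z) = 1 - sum_k a_k z^{-k} (predictor coefficients a_k) are assumptions for this example; the slide's own notation was lost in extraction.

```python
import numpy as np

def ar_power_spectrum(a, gain, n_points=512):
    """All-pole spectrum gain / |1 - sum_k a[k-1] e^{-jwk}|^2 on [0, pi]."""
    a = np.asarray(a, dtype=float)
    omega = np.linspace(0.0, np.pi, n_points)
    k = np.arange(1, len(a) + 1)
    # Prediction-error filter A(e^{jw}) evaluated at every grid frequency
    A = 1.0 - np.exp(-1j * np.outer(omega, k)) @ a
    return omega, gain / np.abs(A) ** 2

# Usage: a single pole at 0.9 gives a low-pass shaped spectrum
omega, P = ar_power_spectrum([0.9], gain=1.0, n_points=5)
```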

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a[n]|^2$
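The definition above can be checked numerically. The sketch below builds the analytic signal with the standard FFT construction (zero the negative frequencies, double the positive ones) and squares its magnitude; the function names are illustrative and only NumPy is assumed.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x}, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep Nyquist
        h[1 : n // 2] = 2.0         # double positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Usage: a slow modulation m[n] on a faster carrier
n = np.arange(64)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / 64)
x = m * np.cos(2 * np.pi * 16 * n / 64)
env = hilbert_envelope(x)           # recovers m[n]**2 for this signal
```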

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP versus FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP spectrogram versus PLP spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
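As a small illustration of the first technique on this slide, cepstral mean subtraction removes the per-utterance mean of each cepstral coefficient. A channel that is constant across frames adds a constant vector in the cepstral domain, so it cancels. The function name and the (frames x coefficients) array layout are assumptions for this sketch.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the mean over frames from each cepstral coefficient.

    cepstra: array of shape (n_frames, n_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Usage: a fixed channel offset is removed by CMS
rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))
channel = rng.standard_normal(13)        # constant across frames
same = np.allclose(cepstral_mean_subtraction(clean),
                   cepstral_mean_subtraction(clean + channel))
```

The cancellation holds only while the distortion really is constant over the frames being averaged, which is the assumption the slide attributes to CMS.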

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band window → FDLP → Static and Dynamic compression → DCT (200 ms) → Sub-band features]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → Critical-band window → IDFT → | |^2 → DFT → Linear prediction → FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input, DCT, Critical-band window, IDFT, | |^2, Wiener filtering driven by a VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009
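The envelope subtraction idea on these slides can be sketched as follows. This is a simplified stand-in, not the Wiener filter shown in the block diagram: it assumes a stationary noise envelope, takes a hypothetical boolean `speech_mask` in place of a real VAD, and adds a small spectral-floor style lower bound (the `floor` parameter) that is not in the source.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env   : temporal envelope of the noisy sub-band signal
    speech_mask : True where the (hypothetical) VAD marks speech;
                  must contain at least one non-speech sample
    floor       : fraction of the noisy envelope kept as a lower bound
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated from the samples the VAD marks as non-speech
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Keep the envelope strictly positive so a logarithm can follow
    return np.maximum(cleaned, floor * noisy_env)

# Usage: speech envelope [0, 0, 4, 9, 0] plus a constant noise level of 1
noisy = np.array([1.0, 1.0, 5.0, 10.0, 1.0])
mask = np.array([False, False, True, True, False])
cleaned = subtract_noise_envelope(noisy, mask)
```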

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, time versus frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM


AR Model of Hilbert Envelopes

Recall that the power spectrum of a signal and its auto-correlation form a Fourier transform pair.

We have shown -

$Q_a[k] = \mathcal{F}\{q_a[n]\} = y[k]$

The spectrum of the even-symmetric analytic signal is the zero-padded DCT sequence.

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

The spectrum of the Hilbert envelope of the even-symmetric signal is the auto-correlation of the DCT sequence, so the two form a Fourier transform pair.

Applying LP to the auto-correlation of the DCT therefore gives an AR model of the Hilbert envelope.
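The link between an even-symmetric signal and the DCT can be checked numerically. The sketch below uses one common construction (mirror the signal to length 2N and take the DFT), for which the first N DFT bins equal the unnormalised DCT-II up to a factor 2 and a linear phase. The slides may define the symmetric extension differently so that the phase term disappears; the function names and this particular extension are assumptions.

```python
import numpy as np

def dct2(x):
    """Unnormalised DCT-II: y[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x

def even_symmetric_spectrum(x):
    """DFT of the length-2N mirror extension [x, reversed x]."""
    x = np.asarray(x, dtype=float)
    q = np.concatenate([x, x[::-1]])
    return np.fft.fft(q)

# Usage: compare the two representations on a random test signal
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
N = len(x)
k = np.arange(N)
Q = even_symmetric_spectrum(x)
y = dct2(x)
match = np.allclose(Q[:N], 2 * np.exp(1j * np.pi * k / (2 * N)) * y)
```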

LP in Time and Frequency

Duality

Time → LP → Power Spec

DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech → DCT → LP → FDLP envelope, shown with the Hilbert envelope]
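A minimal full-band sketch of the procedure named on this slide: take the DCT of the signal, fit an AR model to the DCT sequence, and read the model's "spectrum" as a temporal envelope. The function names, the direct Toeplitz solve, and the mapping of time index n to the angle pi(2n+1)/(2N) (which follows from the DCT-II definition used here) are choices made for the illustration, not the author's code.

```python
import numpy as np

def dct2(x):
    """Unnormalised DCT-II of x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    return np.cos(np.pi * (2 * n + 1) * n[:, None] / (2 * N)) @ x

def fdlp_envelope(x, order):
    """AR model of the temporal envelope via LP on the DCT sequence."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    y = dct2(x)
    # Autocorrelation of the DCT sequence, lags 0..order
    r = np.array([np.dot(y[: N - k], y[k:]) for k in range(order + 1)])
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.linalg.solve(r[idx], r[1:])          # predictor coefficients
    gain = r[0] - np.dot(a, r[1:])              # minimum residual energy
    # Each time index n corresponds to the angle theta_n of the DCT-II basis
    theta = np.pi * (2 * np.arange(N) + 1) / (2 * N)
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2

# Usage: a single click in the middle of the frame gives a peaked envelope
x = np.zeros(256)
x[128] = 1.0
env = fdlp_envelope(x, order=2)
peak = int(np.argmax(env))                      # close to sample 128
```

In the sub-band systems described in this deck the DCT sequence would first be windowed into bands; that step is omitted here.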

FDLP for Speech Representation

[Figure: speech → DCT → LP → FDLP spectrogram (time versus frequency), shown alongside the spectrogram from conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP spectrogram versus mel spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Resolution of FDLP Analysis

[Figure: FDLP and mel spectrograms, with signal and FDLP envelope panels]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP versus mel spectrogram]

  • Summarizing the gross temporal variation with a few parameters
    - Model order of FDLP controls the degree of smoothness
    - AR model captures perceptually important high energy regions of the signal
  • Suppressing reverberation artifacts
    - Reverberation is a long-term convolutive distortion
    - Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication

Clean DFT x Response DFT = Revb DFT

In the sub-band [equation]

In narrow bands the response is slowly varying

FDLP envelope of band using all-pole parameters … is given by [equation]

When the sub-band signal is multiplied by the response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain
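The effect of gain normalization can be illustrated on a toy case. If a sub-band DCT sequence is scaled by a constant (standing in for a slowly varying response that is roughly constant within a narrow band), the LP coefficients stay the same and only the gain changes, so rebuilding the envelope with unit gain removes the scaling. The helper names and the reading of "normalization" as setting G = 1 are assumptions for this sketch.

```python
import numpy as np

def lp_fit(y, order):
    """Autocorrelation-method LP; returns (predictor coeffs, gain)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    r = np.array([np.dot(y[: n - k], y[k:]) for k in range(order + 1)])
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.linalg.solve(r[idx], r[1:])
    return a, r[0] - np.dot(a, r[1:])

def ar_envelope(a, gain, n_points):
    """Envelope gain / |A|^2 on n_points angles in (0, pi)."""
    theta = np.pi * (2 * np.arange(n_points) + 1) / (2 * n_points)
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2

# Usage: y stands in for a sub-band DCT sequence, 3*y for its distorted copy
rng = np.random.default_rng(0)
y = rng.standard_normal(128)
a1, g1 = lp_fit(y, 8)
a2, g2 = lp_fit(3.0 * y, 8)
env_clean = ar_envelope(a1, 1.0, 128)     # unit-gain (normalized) envelopes
env_dist = ar_envelope(a2, 1.0, 128)
```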

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Outline of Applications

[Block diagram: Input signal → Sub-band decomposition → FDLP → AM and FM components; Gain norm and Quant blocks feed three applications: short-term features for speaker and speech recognition, modulation features for phoneme recognition, and wide-band speech and audio coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct 2009

Short-term Features

[Block diagram: Input → DCT → Sub-band window → FDLP → Gain norm → Energy integration → Log + DCT → Features]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features similar to conventional MFCC features
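The steps listed on this slide can be sketched as below. Several details are assumptions made for the example: the band energies are taken as already mel-integrated, 13 cepstral coefficients are kept (so that 13 x 3 = 39), and simple finite differences from `np.gradient` stand in for the regression-based delta computation usually used with MFCCs.

```python
import numpy as np

def integrate_envelope(env, fs, win_ms=25.0, shift_ms=10.0):
    """Sum a sub-band envelope in 25 ms windows with a 10 ms shift."""
    env = np.asarray(env, dtype=float)
    win = int(round(fs * win_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    n_frames = 1 + (len(env) - win) // shift
    return np.array([env[i * shift : i * shift + win].sum()
                     for i in range(n_frames)])

def cepstra(band_energies, n_ceps=13):
    """Log followed by a DCT along the frequency (band) axis."""
    log_e = np.log(np.asarray(band_energies, dtype=float) + 1e-10)
    n_bands = log_e.shape[1]
    n = np.arange(n_bands)
    basis = np.cos(np.pi * (2 * n + 1) * np.arange(n_ceps)[:, None]
                   / (2 * n_bands))
    return log_e @ basis.T

def add_deltas(c):
    """Append first and second differences along the frame axis."""
    d = np.gradient(c, axis=0)
    dd = np.gradient(d, axis=0)
    return np.hstack([c, d, dd])

# Usage: 1 s of envelope at 8 kHz, then 20 hypothetical band energies
frames = integrate_envelope(np.ones(8000), fs=8000)
rng = np.random.default_rng(0)
energies = rng.random((len(frames), 20)) + 0.1
feats = add_deltas(cepstra(energies))
```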

Speech Recognition

  • TIDIGITS Database (8 kHz)
    - Clean training data; test data can be clean or naturally reverberated
  • HMM-GMM system
    - Whole-word HMM models trained on clean speech
    - Performance in terms of word error rate (WER)
  • Features
    - PLP features with cepstral mean subtraction (CMS)
    - Long term log spectral subtraction (LTLSS) [Gelbart 2002]
    - FDLP short-term (FDLP-S) features - 39 dim

[Figure: bar chart of WER for PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Speaker Verification

  • NIST 2008 Speaker recognition evaluation (SRE)
    - Has telephone speech and far-field speech
  • GMM-UBM system
    - Trained on a large set of development speakers
    - Adapted on the enrollment data from the target speaker
    - Nuisance attribute projection (NAP) on supervectors
    - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
  • Features with warping [Pelecanos 2001]
    - Mel Frequency Cepstral Coefficients (MFCCs)
    - FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF for MFCC and FDLP-S under Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011
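The detection cost function quoted on this slide is a weighted sum of the false-alarm and miss probabilities. The sketch below reads the weights as 0.99 and 0.1, on the assumption that the decimal points were lost in extraction; the function name is illustrative.

```python
def detection_cost(p_fa, p_miss, w_fa=0.99, w_miss=0.1):
    """Weighted detection cost: w_fa * P_fa + w_miss * P_miss."""
    return w_fa * p_fa + w_miss * p_miss

# Usage: 1% false alarms and 10% misses
cost = detection_cost(p_fa=0.01, p_miss=0.10)
```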

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band window → FDLP → Static and Dynamic compression → DCT (200 ms) → Sub-band features]

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
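A rough sketch of how the 476-dimensional vector on this slide is assembled. The static path is a logarithm followed by a DCT. The dynamic path here is a single simplified adaptation loop (a divider with low-pass feedback), a stand-in for the multi-loop scheme cited on the slide. The time constant, the floor value, the envelope sampling rate and all function names are assumptions.

```python
import numpy as np

def dct_coeffs(seg, n_coeffs):
    """First n_coeffs unnormalised DCT-II coefficients of a segment."""
    seg = np.asarray(seg, dtype=float)
    N = len(seg)
    n = np.arange(N)
    basis = np.cos(np.pi * (2 * n + 1) * np.arange(n_coeffs)[:, None]
                   / (2 * N))
    return basis @ seg

def static_compress(env):
    """Logarithmic compression of the envelope."""
    return np.log(np.asarray(env, dtype=float) + 1e-10)

def adaptive_compress(env, fs, tau=0.05, floor=1e-5):
    """One adaptation loop: divide by a low-pass filtered copy of the output."""
    alpha = np.exp(-1.0 / (tau * fs))
    state = np.sqrt(floor)
    out = np.empty(len(env))
    for i, v in enumerate(env):
        out[i] = max(v, floor) / state
        state = alpha * state + (1.0 - alpha) * out[i]
    return out

def modulation_features(band_envs, fs, n_mod=14):
    """Concatenate static and dynamic modulation spectra of every band."""
    feats = []
    for env in band_envs:
        feats.append(dct_coeffs(static_compress(env), n_mod))
        feats.append(dct_coeffs(adaptive_compress(env, fs), n_mod))
    return np.concatenate(feats)

# Usage: 17 bands, 200 ms segments at a hypothetical 400 Hz envelope rate
rng = np.random.default_rng(0)
bands = [rng.random(80) + 0.1 for _ in range(17)]
vec = modulation_features(bands, fs=400)
```

With a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of k x 2.5 Hz, so 14 coefficients span roughly the 0-35 Hz range quoted on the slide.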

Phoneme Recognition

  • TIMIT Database (8 kHz)
    - Clean training data; test data can be clean, additive noise, reverberated or telephone channel
  • Multi-layer perceptron (MLP) based system
    - MLPs estimate phoneme posteriors
    - Hidden Markov model (HMM) - MLP hybrid model
    - Performance in phoneme error rate (PER)
  • Features
    - Perceptual linear prediction (PLP) - 9 frame context
    - Advanced ETSI standard [ETSI 2002] - 9 frame context
    - FDLP modulation (FDLP-M) features - 476 dim

[Figure: bar chart of PER for PLP-9, ETSI-9 and FDLP-M on clean, additive noise, reverberant and telephone test data]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010

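Audio Coding

[Block diagram: Input → QMF analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr); Env → Q; Carr → MDCT → Q; decoder: Q^-1 for the envelope, Q^-1 → IMDCT for the carrier; Mul → QMF synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008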
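The envelope/carrier split and the final multiplication in the codec diagram can be illustrated on a single band. This sketch uses the non-parametric Hilbert envelope rather than the FDLP envelope, skips quantization entirely, and uses illustrative function names; it only shows that envelope times carrier gives back the sub-band signal.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (negative frequencies zeroed)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def am_fm_decompose(x, eps=1e-12):
    """Split x into an AM part (root of the Hilbert envelope) and a carrier."""
    x = np.asarray(x, dtype=float)
    hilbert_env = np.abs(analytic_signal(x)) ** 2   # squared magnitude
    am = np.sqrt(hilbert_env)
    carrier = x / np.maximum(am, eps)               # guard against zeros
    return am, carrier

# Usage: decompose a modulated tone and multiply the parts back together
n = np.arange(64)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / 64)
x = m * np.cos(2 * np.pi * 16 * n / 64)
am, carrier = am_fm_decompose(x)
x_rec = am * carrier
```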

Subjective Evaluations

[Figure: subjective evaluation scores (scale 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Page 26: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

We have shown -

$Q_a[k] = \mathcal{F}\{q_a[n]\} = \hat{y}[k]$

Spectrum of even-sym. analytic signal = zero-padded DCT sequence

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Spectrum of Hilbert env. for even-sym. signal = auto-correlation of DCT sequence

Hilb. env. of even-symm. signal <-> auto-corr. of DCT (related by the Fourier transform F)

AR model of Hilb. env. <-> auto-corr. of DCT (related by LP)

LP in Time and Frequency

Duality

Time -> LP -> Power Spec

DCT -> LP -> Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

Speech -> DCT -> LP -> FDLP Env

[Figure: Speech, Hilb Env, FDLP Env]
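The chain above (DCT, then LP, then envelope) can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the unnormalised type-II DCT, the autocorrelation method with a Levinson-Durbin recursion, the model order, and all function names are assumptions chosen for readability.

```python
import math

def dct_ii(x):
    """Unnormalised type-II DCT, written as an O(N^2) loop for clarity."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorrelation(y, max_lag):
    """Autocorrelation r[0..max_lag] of y (autocorrelation method, no scaling)."""
    return [sum(y[i] * y[i + lag] for i in range(len(y) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order):
    """All-pole (AR) estimate of the temporal envelope, one value per sample."""
    n = len(x)
    y = dct_ii(x)  # cosine transform of the signal
    a, gain = levinson_durbin(autocorrelation(y, order), order)
    env = []
    for t in range(n):
        w = math.pi * (t + 0.5) / n  # position on the all-pole model's axis = time t
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(gain / (re * re + im * im))
    return env

# Usage: a single click at sample 40 of a 128-sample frame (hypothetical input).
x = [0.0] * 128
x[40] = 1.0
env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=env.__getitem__)
```

For the click, the DCT is a cosine along the frequency index, so the fitted poles sit near the matching angle and the envelope peaks at or next to sample 40. A larger `order` gives a more detailed envelope, a smaller one a smoother envelope.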

FDLP for Speech Representation

Signal -> DCT -> LP -> FDLP Spectrogram

[Figure: FDLP Spectrogram and Conventional Approaches; axes: Time, Freq]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

Res = (Critical Width)^{-1}

[Figure: Sig and FDLP Env; FDLP and Mel]

Properties of FDLP Analysis

[Figure: FDLP and Mel]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication, also within each sub-band:

Clean DFT x Response DFT = Revb DFT

In narrow bands, the room response term is slowly varying

The FDLP envelope of a band is given by an all-pole model whose numerator is the gain of the AR model

When the sub-band signal is multiplied by the room response term, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a fixed (unit) gain
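A small numerical check of the gain argument, under a strong simplification: the narrow-band room response is treated as one constant factor, and a first-order predictor stands in for the full all-pole model. The coefficient values and the name `lp_order1` are hypothetical.

```python
def lp_order1(y):
    """First-order linear prediction of y by the autocorrelation method.

    Returns (a1, gain) for the all-pole model gain / |1 + a1 * z^-1|^2.
    """
    r0 = sum(v * v for v in y)
    r1 = sum(y[i] * y[i + 1] for i in range(len(y) - 1))
    a1 = -r1 / r0
    gain = r0 + a1 * r1  # minimum residual energy, r0 - r1^2 / r0
    return a1, gain

# Hypothetical sub-band DCT coefficients of a clean signal.
clean = [1.0, 0.8, 0.5, 0.1, -0.3, -0.4]
# Idealised narrow-band distortion: a single constant factor.
response = 3.0
distorted = [response * v for v in clean]

a_clean, g_clean = lp_order1(clean)
a_dist, g_dist = lp_order1(distorted)
```

The predictor coefficient is unchanged by the scaling, while the gain grows by `response ** 2`. Rebuilding the envelope with a fixed gain therefore removes the factor, which is the idea behind gain normalization.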

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram: Input Signal -> Sub-band Decomposition -> FDLP -> AM and FM; further blocks: Gain Norm, Quant]

Short-term Features for Speaker & Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech & Audio Coding

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

Input -> DCT -> Sub-band Window -> FDLP -> Gain Norm -> Energy Int -> Log + DCT -> Feat

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
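The time integration and the log-plus-DCT step can be sketched as follows. This is an illustrative outline only: it assumes the envelope is sampled at the signal rate (8 kHz here), uses a rectangular window, and skips the mel-scale integration and the delta features. The function names are hypothetical.

```python
import math

def integrate_envelope(env, fs, win_ms=25.0, hop_ms=10.0):
    """Sum a sub-band envelope over sliding windows (one value per frame)."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    return [sum(env[s:s + win]) for s in range(0, len(env) - win + 1, hop)]

def cepstrum(band_energies, n_ceps):
    """Log followed by a type-II DCT across the bands of one frame."""
    logs = [math.log(e) for e in band_energies]
    nb = len(logs)
    return [sum(logs[b] * math.cos(math.pi * (b + 0.5) * k / nb) for b in range(nb))
            for k in range(n_ceps)]

# Usage: one second of a flat envelope, and one frame with four equal band energies.
fs = 8000
flat_env = [1.0] * fs
frames = integrate_envelope(flat_env, fs)
ceps = cepstrum([2.0, 2.0, 2.0, 2.0], n_ceps=3)
```

With equal band energies only the zeroth cepstral coefficient is non-zero, which is a quick sanity check that the DCT runs along the frequency axis.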

Speech Recognition

TIDIGITS Database (8 kHz)
  - Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  - Whole-word HMM models trained on clean speech
  - Performance in terms of word error rate (WER)

Features
  - PLP features with cepstral mean subtraction (CMS)
  - Long term log spectral subtraction (LTLSS) [Gelbart 2002]
  - FDLP short-term (FDLP-S) features - 39 dim

[Figure: WER (axis 0 to 20) for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)
  - Has telephone speech and far-field speech

GMM-UBM system
  - Trained on a large set of development speakers
  - Adapted on the enrollment data from the target speaker
  - Nuisance attribute projection (NAP) on supervectors
  - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  - Mel Frequency Cepstral Coefficients (MFCCs)
  - FDLP short-term (FDLP-S) features

[Figure: DCF (axis 10 to 30) for MFCC and FDLP-S in Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011.
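The detection cost function quoted above is a weighted sum of the two error rates. A one-function sketch, with hypothetical error rates as input:

```python
def detection_cost(p_fa, p_miss, w_fa=0.99, w_miss=0.1):
    """Weighted sum of false-alarm and miss probabilities (lower is better)."""
    return w_fa * p_fa + w_miss * p_miss

# Hypothetical operating point: 2% false alarms, 10% misses.
cost = detection_cost(p_fa=0.02, p_miss=0.10)
```

Because a false alarm is weighted about ten times more heavily than a miss, operating points with very few false alarms are favoured.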

Modulation Feature Extraction

Input -> DCT -> Critical-band Window -> FDLP -> Static and Dynamic compression -> DCT (200 ms) -> Sub-band Feat

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 14 x 2 x 17 = 476 dim
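A rough sketch of the last stage, assuming that "200 ms" is the segment length fed to a type-II DCT and that the first 14 coefficients are kept. Both are readings of the diagram, not confirmed details; the 400 Hz envelope rate and the function names are hypothetical.

```python
import math

def modulation_spectrum(compressed_env, n_coeffs=14):
    """First n_coeffs type-II DCT coefficients of one compressed envelope segment."""
    n = len(compressed_env)
    return [sum(compressed_env[t] * math.cos(math.pi * (t + 0.5) * k / n)
                for t in range(n))
            for k in range(n_coeffs)]

def dct_bin_frequency_hz(k, segment_seconds):
    """Modulation frequency of DCT-II basis k over a segment of the given length."""
    return k / (2.0 * segment_seconds)

# Usage: an 80-sample segment (200 ms at 400 Hz) shaped like DCT basis 4,
# which corresponds to a 10 Hz modulation.
segment = [math.cos(math.pi * (t + 0.5) * 4 / 80) for t in range(80)]
feats = modulation_spectrum(segment)
strongest = max(range(len(feats)), key=lambda k: abs(feats[k]))

# Feature size: 14 coefficients x 2 compression streams x 17 sub-bands.
total_dim = 14 * 2 * 17
```

Under these assumptions the 14 coefficients are spaced 2.5 Hz apart and reach 32.5 Hz, which fits the 0-35 Hz range stated on the slide.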

Phoneme Recognition

TIMIT Database (8 kHz)
  - Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  - MLPs estimate phoneme posteriors
  - Hidden Markov model (HMM) - MLP hybrid model
  - Performance in phoneme error rate (PER)

Features
  - Perceptual linear prediction (PLP) - 9 frame context
  - Advanced ETSI standard [ETSI 2002] - 9 frame context
  - FDLP modulation (FDLP-M) features - 476 dim

[Figure: PER (axis 30 to 75) for PLP-9, ETSI-9 and FDLP-M on Clean, Add Noise, Reverb and Tel test data]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Audio Coding

[Block diagram: Input -> QMF Analysis (bands 1 to 32) -> FDLP -> Env and Carr; Env -> Q; Carr -> MDCT -> Q; decoder: Q-1 and Q-1 -> IMDCT, Mul, QMF Synthesis (bands 1 to 32) -> Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences
  - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences
  - Perceptual importance of modulation frequencies [Drullman et al. 1994]
  - Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
  - Speech intelligibility [Houtgast et al. 1980]
  - RASTA processing [Hermansky et al. 1994]
  - Speech recognition [Kingsbury et al. 1998]
  - AM-FM decomposition [Kumaresan et al. 1999]
  - Sound texture modeling [Athineos et al. 2003]
  - Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
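To make the least-squares step concrete, here is a minimal second-order example. It fits the two weights by solving the 2 x 2 normal equations directly (covariance-style summation over the available samples); the order, the test signal and the name `lp_order2` are choices made for illustration.

```python
import math

def lp_order2(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2]; returns (a1, a2, residual)."""
    c11 = c12 = c22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        c11 += x[n - 1] * x[n - 1]
        c12 += x[n - 1] * x[n - 2]
        c22 += x[n - 2] * x[n - 2]
        b1 += x[n] * x[n - 1]
        b2 += x[n] * x[n - 2]
    # Solve [[c11, c12], [c12, c22]] @ [a1, a2] = [b1, b2] by Cramer's rule.
    det = c11 * c22 - c12 * c12
    a1 = (b1 * c22 - b2 * c12) / det
    a2 = (b2 * c11 - b1 * c12) / det
    residual = sum((x[n] - a1 * x[n - 1] - a2 * x[n - 2]) ** 2
                   for n in range(2, len(x)))
    return a1, a2, residual

# Usage: a pure sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2] exactly.
x = [math.cos(0.3 * n) for n in range(50)]
a1, a2, residual = lp_order2(x)
```

For the sinusoid the fit recovers `a1` close to `2*cos(0.3)` and `a2` close to `-1`, with a residual that is zero up to rounding.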

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

[Figure: AR model of power spectrum]
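The equations on these slides did not survive extraction. The block below is the standard textbook formulation the text appears to follow, written with the convention that the current sample is predicted as $\sum_k a_k x[n-k]$. The symbols $e[n]$, $A(\omega)$, $P_x(\omega) = |X(\omega)|^2$ and the model order $p$ are notational assumptions rather than the slides' own.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(\omega) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k} \\
\sum_{n} e^2[n] &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |E(\omega)|^2 \, d\omega
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_x(\omega)\, |A(\omega)|^2 \, d\omega \\
\hat{P}_x(\omega) &= \frac{G}{|A(\omega)|^2},
  \qquad G = \min_{a_1, \dots, a_p} \sum_{n} e^2[n]
\end{align}
```

The first line is the filter interpretation (the residual is the signal passed through $A$), the second is Parseval's theorem applied to the residual, and the third is the resulting all-pole model with the gain in the numerator.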

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
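The definition can be tried out with a DFT-based analytic signal. This sketch uses a direct O(N^2) DFT to stay dependency-free; the frequency-domain construction (zero the negative frequencies, double the positive ones) is the usual discrete recipe, and the test signal is hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """x + j*H{x}: zero the negative frequencies and double the positive ones."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    spectrum = dft(x)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]

# Usage: a slow modulator m multiplying a faster cosine carrier.
n = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
x = [m[t] * math.cos(2 * math.pi * 8 * t / n) for t in range(n)]
env = hilbert_envelope(x)
```

Because the modulator varies much more slowly than the carrier, the envelope recovers the squared modulator `m[t] ** 2` and discards the carrier.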

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

LP in Time and Frequency

Duality: LP <-> FDLP

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Input -> DCT -> Critical-band Window -> FDLP -> Static and Dynamic compression -> DCT (200 ms) -> Sub-band Feat

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise -> DCT -> Critical-band Window -> IDFT -> | |^2 -> DFT -> Linear Pred -> FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input -> DCT -> Critical-band Window -> IDFT -> | |^2 -> Wiener Filtering (with VAD) -> DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 27: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Spectrum ofHilbert env for

even-sym signal

Auto-correlation of

DCT sequence

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

Hilb env ofeven-symm

signal

Auto-corr ofDCT

F

AR Model of Hilbert Envelopes

119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -

F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]

AR model ofHilb env

Auto-corr ofDCT

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Our Contributions

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
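The minimization above can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a minimal pure-Python sketch, not the implementation behind the slides; the function names `autocorr` and `levinson` and the sign convention (x[n] approximated by the sum of a[k]·x[n-k]) are assumptions made for illustration.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation values r.

    Returns (a, err): weights a[0..order-1] for lags 1..order, and the
    minimum residual sum of squares reached at the final order.
    """
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


# A decaying exponential is predicted almost perfectly by one past sample.
x = [0.9 ** n for n in range(200)]
weights, residual = levinson(autocorr(x, 2), 2)
```

For this test signal the first weight comes out close to 0.9 and the second close to zero, as expected for a first-order recursion.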

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
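The steps named on these slides can be summarized in a standard textbook form, following [Makhoul 1975]. The symbols $x[n]$, $a_k$, $p$, $e[n]$, $A$, and $P(\omega)$ are notation assumed here for illustration and may differ from the notation on the original slides.

```latex
% Prediction residual and inverse filter
e[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}

% Residual energy in the frequency domain (Parseval), with P(\omega) = |X(e^{j\omega})|^2
E = \sum_{n} e^2[n]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \bigl|A(e^{j\omega})\bigr|^2 \, d\omega

% All-pole model of the power spectrum, with gain G equal to the minimum of E
\hat{P}(\omega) = \frac{G}{\bigl|A(e^{j\omega})\bigr|^2},
\qquad
G = \min_{a_1,\dots,a_p} E
```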


AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
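A minimal sketch of this definition, assuming the usual discrete construction of the analytic signal (zero the negative-frequency DFT bins and double the positive ones). The helper names `dft` and `hilbert_envelope` are hypothetical, and the direct O(N²) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a sequence; fine for short inputs."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of a real sequence."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0                       # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep Nyquist as is
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0                   # double positive frequencies
    spectrum = dft(x)
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(v) ** 2 for v in analytic]


# A pure cosine has a flat envelope of 1.
tone = [math.cos(2 * math.pi * 4 * n / 64) for n in range(64)]
envelope = hilbert_envelope(tone)
```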

Hilbert Envelope

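[Figure panels: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Diagram labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]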

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar; it suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
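Of the three techniques listed, cepstral mean subtraction is the simplest to illustrate. The sketch below assumes a fixed channel shows up as a constant offset added to every cepstral frame; the function name and the toy frame values are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames.

    frames: list of equal-length lists of cepstral coefficients.
    """
    count = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]


# Toy example: the same frames with and without a constant channel offset.
clean = [[1.0, 2.0], [3.0, 5.0], [2.0, 0.0]]
offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the two sets of frames coincide, which is the sense in which CMS removes a distortion that is similar across neighboring frames.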

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (one per compression path), 200 ms, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, VAD, Wiener Filtering]

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
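The subtraction step can be sketched as follows. This is a simplified illustration under stated assumptions, not the Wiener filtering shown in the block diagram: the noise envelope is taken as the mean envelope value over VAD-flagged non-speech samples, and a small spectral-subtraction-style floor (the hypothetical `floor` parameter) keeps the result positive.

```python
def subtract_noise_envelope(envelope, is_speech, floor=0.01):
    """Subtract a constant noise-envelope estimate from a sub-band envelope.

    envelope:  list of non-negative envelope samples
    is_speech: list of booleans from a voice activity detector
    floor:     fraction of the noisy value kept as a lower bound (assumed)
    """
    noise = [e for e, speech in zip(envelope, is_speech) if not speech]
    if not noise:
        return list(envelope)          # nothing to estimate the noise from
    noise_level = sum(noise) / len(noise)
    return [max(e - noise_level, floor * e) for e in envelope]


noisy = [1.0, 1.0, 5.0, 9.0, 1.0]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
```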

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plane; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 28: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

We have shown:

$Q_a[k] = \mathcal{F}\{q_a[n]\} = \hat{y}[k]$

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

[Diagram: Hilbert envelope of even-symmetric signal and auto-correlation of DCT are related by F; LP on the auto-correlation of the DCT gives the AR model of the Hilbert envelope]
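The relation between a squared-magnitude envelope and an autocorrelation in the transform domain can be checked numerically. The sketch below is an illustration under assumptions: it uses an arbitrary complex test sequence rather than the even-symmetric analytic signal of the slides, and the 1/N factor comes from the unnormalized DFT convention chosen here. The helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct unnormalized DFT; adequate for a short test sequence."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n)
                for m in range(n)) for k in range(n)]


def circular_autocorr(spec):
    """Circular autocorrelation of a complex sequence, scaled by 1/N."""
    n = len(spec)
    return [sum(spec[m] * spec[(m - lag) % n].conjugate()
                for m in range(n)) / n for lag in range(n)]


q = [complex(math.sin(0.3 * n), math.cos(0.7 * n)) for n in range(16)]

# Left side: transform of the squared magnitude of the sequence.
lhs = dft([abs(v) ** 2 for v in q])
# Right side: autocorrelation of the transform of the sequence.
rhs = circular_autocorr(dft(q))
```

The two lists agree to numerical precision, which is why linear prediction applied in the transform domain models the envelope.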

LP in Time and Frequency

[Diagram, duality: LP on the time signal gives an AR model of the power spectrum; LP on the DCT gives an AR model of the Hilbert envelope]

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech, DCT, LP; panels: speech, Hilbert envelope (Hilb Env), FDLP envelope (FDLP Env)]
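FDLP as described on this slide (linear prediction on the cosine transform of the signal) can be sketched in a few lines. This is a minimal pure-Python sketch under assumptions: an unnormalized DCT-II, the autocorrelation method for LP, and the envelope read off the all-pole model at angles π(t + 0.5)/N for time index t. The names `dct2`, `levinson`, and `fdlp_envelope` are hypothetical.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence (direct form)."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (m + 0.5) / n)
                for m, v in enumerate(x)) for k in range(n)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (weights for lags 1..order, gain)."""
    a = [0.0] * (order + 1)
    gain = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / gain
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        gain *= 1.0 - k * k
    return a[1:], gain


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x."""
    n = len(x)
    y = dct2(x)
    r = [sum(y[i] * y[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, gain = levinson(r, order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n      # time index mapped to an angle
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                          for k in range(order))
        env.append(gain / abs(denom) ** 2)
    return env


# A click at sample 48 should give an envelope that peaks near sample 48.
click = [0.0] * 64
click[48] = 1.0
env = fdlp_envelope(click, 2)
peak = max(range(64), key=env.__getitem__)
```

The model order plays the role of a smoothness control: a low order gives a broad peak around the click, a higher order a sharper one.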

FDLP for Speech Representation

[Figure: DCT and LP applied to the speech signal give the FDLP Spectrogram (axes: Time, Freq), shown next to Conventional Approaches (axes: Time, Freq)]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Resolution of FDLP Analysis

[Figure: FDLP and mel spectrograms; panels: signal (Sig), FDLP envelope (FDLP Env)]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and mel spectrograms]

  • Summarizing the gross temporal variation with a few parameters
    - Model order of FDLP controls the degree of smoothness
    - AR model captures perceptually important high-energy regions of the signal

  • Suppressing reverberation artifacts
    - Reverberation is a long-term convolutive distortion

  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

[Diagram: Clean Speech * Room Response = Revb Speech]

In the long-term DFT domain this translates to a multiplication

[Diagram: Clean DFT x Response DFT = Revb DFT]

In narrow sub-bands, the room response term is slowly varying

FDLP envelope of a band is given by its all-pole parameters and the gain G

When the sub-band signal is multiplied by the response term, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain (G = 1)
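The effect of a constant multiplier on the all-pole model can be checked directly. In the sketch below (hypothetical names, autocorrelation-method LP, a made-up test sequence standing in for a sub-band DCT sequence), scaling the input leaves the predictor weights unchanged and scales only the gain, so rebuilding the envelope with the gain fixed to 1 removes the multiplier.

```python
import math


def lp_fit(y, order):
    """Autocorrelation-method LP; returns (weights for lags 1..order, gain)."""
    n = len(y)
    r = [sum(y[i] * y[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)
    gain = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / gain
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        gain *= 1.0 - k * k
    return a[1:], gain


# Stand-in for the DCT sequence of one sub-band (values are arbitrary).
y = [math.cos(0.4 * k) + 0.3 * math.sin(1.3 * k) for k in range(50)]
scale = 2.5                                   # constant channel multiplier

a_clean, g_clean = lp_fit(y, 3)
a_scaled, g_scaled = lp_fit([scale * v for v in y], 3)
```

Here `a_scaled` matches `a_clean`, while `g_scaled` is `scale ** 2` times `g_clean`; discarding the gain therefore discards the multiplier.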

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP, AM and FM components, Gain Norm, Quant; outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration along the frequency axis converts to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
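The time integration step (25 ms windows with a 10 ms shift) can be sketched as below. The envelope sampling rate `rate_hz` and the function name are assumptions for illustration; the slides do not state the rate at which the FDLP envelope is sampled.

```python
def integrate_envelope(envelope, rate_hz, win_ms=25, shift_ms=10):
    """Sum a sub-band envelope over overlapping short-term windows.

    envelope: list of envelope samples for one sub-band
    rate_hz:  envelope sampling rate in Hz (assumed, integer)
    Returns one energy value per short-term frame.
    """
    win = rate_hz * win_ms // 1000        # window length in samples
    shift = rate_hz * shift_ms // 1000    # frame shift in samples
    return [sum(envelope[start:start + win])
            for start in range(0, len(envelope) - win + 1, shift)]


# 100 ms of a flat envelope sampled at 1 kHz gives 8 frames of 25 samples.
energies = integrate_envelope([1.0] * 100, 1000)
```

Each list of frame energies would then go through the mel integration, log, and DCT steps named above.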

Speech Recognition

  • TIDIGITS Database (8 kHz)
    - Clean training data; test data can be clean or naturally reverberated

  • HMM-GMM system
    - Whole-word HMM models trained on clean speech
    - Performance in terms of word error rate (WER)

  • Features
    - PLP features with cepstral mean subtraction (CMS)
    - Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
    - FDLP short-term (FDLP-S) features, 39 dim

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

[Bar chart: WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS, and FDLP-S on Clean and Reverb test data]

Speaker Verification

  • NIST 2008 Speaker recognition evaluation (SRE)
    - Has telephone speech and far-field speech

  • GMM-UBM system
    - Trained on a large set of development speakers
    - Adapted on the enrollment data from the target speaker
    - Nuisance attribute projection (NAP) on supervectors
    - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

  • Features with warping [Pelecanos 2001]
    - Mel Frequency Cepstral Coefficients (MFCCs)
    - FDLP short-term (FDLP-S) features

[Bar chart: DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S under Tel, Mic, and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
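The detection cost on this slide, read as 0.99·Pfa + 0.1·Pmiss, can be computed from trial scores as sketched below. The function names, the decision rule (accept when the score is at or above the threshold), and the toy scores are assumptions for illustration.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """False-alarm and miss probabilities at a given decision threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_fa, p_miss


def detection_cost(p_fa, p_miss):
    """Detection cost function with the weights quoted on the slide."""
    return 0.99 * p_fa + 0.1 * p_miss


# Toy trial scores: one of four targets is missed, one of five impostors passes.
targets = [2.0, 1.5, 0.2, 3.0]
impostors = [-1.0, 0.5, -0.3, 0.1, 1.2]
p_fa, p_miss = error_rates(targets, impostors, threshold=1.0)
dcf = detection_cost(p_fa, p_miss)
```

With these weights a false alarm costs roughly ten times as much as a miss, so the operating threshold is pushed toward rejecting trials.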

Outline of Applications

Modulation Features

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (one per compression path), 200 ms, Sub-band Feat]

Static compression is a logarithm; it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
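The feature dimension follows from stacking 14 DCT coefficients for each of the two compression paths in each of 17 sub-bands. The sketch below illustrates only that bookkeeping; the envelope values are placeholders, the segment length of 80 samples is arbitrary, and the names are hypothetical. Under the assumption that the DCT is taken over a 200 ms segment, DCT-II coefficient k corresponds to roughly k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span about 0 to 32.5 Hz.

```python
import math


def dct2(x, num_coeffs):
    """First num_coeffs unnormalized DCT-II coefficients of x."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (m + 0.5) / n)
                for m, v in enumerate(x)) for k in range(num_coeffs)]


def modulation_features(static_envs, dynamic_envs, num_coeffs=14):
    """Stack modulation components of both compression paths over all bands."""
    feature = []
    for static_env, dynamic_env in zip(static_envs, dynamic_envs):
        feature.extend(dct2(static_env, num_coeffs))
        feature.extend(dct2(dynamic_env, num_coeffs))
    return feature


# Placeholder compressed envelopes for 17 sub-bands, 80 samples each.
static = [[math.log(1.0 + band + t) for t in range(80)] for band in range(17)]
dynamic = [[math.sin(0.1 * t + band) for t in range(80)] for band in range(17)]
feature = modulation_features(static, dynamic)
```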

Phoneme Recognition

  • TIMIT Database (8 kHz)
    - Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

  • Multi-layer perceptron (MLP) based system
    - MLPs estimate phoneme posteriors
    - Hidden Markov model (HMM) - MLP hybrid model
    - Performance in phoneme error rate (PER)

  • Features
    - Perceptual linear prediction (PLP), 9 frame context
    - Advanced ETSI standard [ETSI 2002], 9 frame context
    - FDLP modulation (FDLP-M) features, 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

[Bar chart: PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

Page 29: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR Model of Hilbert Envelopes

We have shown:

$Q_a[k] = \mathcal{F}\{q_a[n]\} = [\ldots][k]$

$\mathcal{F}\{|q_a[n]|^2\} = r_y[\tau]$

Auto-corr of DCT → LP → AR model of Hilb env
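The relation between an envelope's spectrum and an autocorrelation can be checked numerically. This is a minimal sketch, assuming NumPy and SciPy and a random test signal, none of which come from the slides. It verifies that the DFT of the squared magnitude of an analytic signal equals the circular autocorrelation of that signal's spectrum, up to a 1/N factor. In FDLP the spectrum is represented by the DCT of the signal, which is why linear prediction on the DCT models the Hilbert envelope.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)          # arbitrary test signal (assumption)

# Analytic signal and its one-sided spectrum.
x_a = hilbert(x)
X_a = np.fft.fft(x_a)

# Left side: DFT of the Hilbert envelope |x_a[n]|^2.
env_spectrum = np.fft.fft(np.abs(x_a) ** 2)

# Right side: circular autocorrelation of the spectrum,
# r[m] = (1/N) * sum_k X_a[k] * conj(X_a[k - m]).
autocorr = np.array(
    [np.sum(X_a * np.conj(np.roll(X_a, m))) for m in range(N)]
) / N

max_error = np.max(np.abs(env_spectrum - autocorr))
print(f"max |difference| = {max_error:.2e}")
```

The check uses a plain DFT rather than the even-symmetrized DCT construction of the talk, so it illustrates the principle rather than the exact derivation.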

LP in Time and Frequency

Duality: Time → LP → Power Spec; DCT → LP → Hilb Env

FDLP

Linear prediction on the cosine transform of the signal

Figure panels: Speech, Hilb Env, FDLP Env (Speech → DCT → LP → FDLP Env)
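The FDLP step can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy; the function name `fdlp_envelope`, the model order, and the synthetic test signal are all hypothetical choices, not taken from the talk. It applies autocorrelation-method linear prediction to the DCT of the signal and evaluates the resulting all-pole model, whose "frequency" axis maps back to time.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

def fdlp_envelope(x, order=20, n_points=None):
    """Return (envelope, a, gain) for an all-pole fit to the DCT of x."""
    n_points = n_points or len(x)
    y = dct(x, type=2, norm="ortho")            # cosine transform of the signal
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(y[: len(y) - k], y[k:]) for k in range(order + 1)])
    # Normal equations of linear prediction (symmetric Toeplitz system).
    coeffs = solve_toeplitz(r[:order], r[1 : order + 1])
    a = np.concatenate(([1.0], -coeffs))        # A(z) = 1 - sum_k a_k z^-k
    gain = r[0] - np.dot(coeffs, r[1 : order + 1])  # minimum residual energy
    # Evaluate G / |A|^2 on a uniform grid; the grid index maps to time.
    _, h = freqz([1.0], a, worN=n_points)
    envelope = gain * np.abs(h) ** 2
    return envelope, a, gain

# Synthetic example: noise carrier with a smooth bump centred at sample 300.
rng = np.random.default_rng(0)
n = np.arange(1000)
modulation = 0.1 + np.exp(-0.5 * ((n - 300) / 30.0) ** 2)
x = modulation * rng.standard_normal(1000)

envelope, a, gain = fdlp_envelope(x, order=20)
peak = int(np.argmax(envelope))
print("envelope peak near sample", peak)
```

A lower `order` gives a smoother envelope; a higher one follows finer temporal detail.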

FDLP for Speech Representation

Figure: DCT → LP → FDLP Spectrogram (axes: Time, Freq), compared with Conventional Approaches

FDLP versus Mel Spectrogram

Figure panels: FDLP, Mel

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Resolution of FDLP Analysis

Figure panels: FDLP, Mel; Sig, FDLP Env

Res = (Critical Width)⁻¹

Properties of FDLP Analysis

Figure panels: FDLP, Mel

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion
  – Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication, which also holds in each sub-band:

Clean DFT × Response DFT = Revb DFT

In narrow bands, the response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters together with the gain of the model

When the sub-band signal is multiplied by this slowly varying response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
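The effect of a band response on the all-pole model can be sketched as follows. This is an idealized illustration, assuming the response is exactly constant within the band (the slides only say "slowly varying"); the helper `lp_autocorr`, the constant `c`, and the random stand-in for a sub-band DCT sequence are hypothetical. Under that assumption the predictor coefficients are unchanged and only the gain is scaled, so rebuilding the envelope with a fixed gain (for example 1, which is an assumption here) removes the distortion.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lp_autocorr(y, order):
    """Autocorrelation-method LP: returns predictor coefficients and gain."""
    r = np.array([np.dot(y[: len(y) - k], y[k:]) for k in range(order + 1)])
    coeffs = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(coeffs, r[1 : order + 1])
    return coeffs, gain

rng = np.random.default_rng(1)
y_clean = rng.standard_normal(512)   # stand-in for a sub-band DCT sequence
c = 0.3                              # hypothetical constant band response
y_revb = c * y_clean

a_clean, g_clean = lp_autocorr(y_clean, order=8)
a_revb, g_revb = lp_autocorr(y_revb, order=8)

same_coeffs = np.allclose(a_clean, a_revb)           # shape of envelope kept
gain_scaled = np.isclose(g_revb, c**2 * g_clean)     # only the gain changes
print(same_coeffs, gain_scaled)
```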

Gain Normalization in FDLP

Figure panels: Without gain norm, With gain norm

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Outline of Applications

Block diagram: Input Signal → Sub-band Decomposition → FDLP → AM and FM components; Gain Norm and Quant stages feed three applications:
  – Short-term Features for Speaker & Speech Recog
  – Modulation Features for Phoneme Recog
  – Wide-band Speech & Audio Coding

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

Pipeline: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration along the frequency axis to convert to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features similar to conventional MFCC features
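The last stages of this pipeline can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy; the function name, the envelope sampling rate, the 24 bands, the choice of 13 cepstral coefficients, and the use of `np.gradient` for the delta features are all assumptions, and the input is taken to be already integrated to the mel scale.

```python
import numpy as np
from scipy.fft import dct

def short_term_features(band_envelopes, env_rate, win_s=0.025, hop_s=0.010, n_ceps=13):
    """band_envelopes: (n_bands, n_samples) array of sub-band envelopes."""
    win = int(round(win_s * env_rate))
    hop = int(round(hop_s * env_rate))
    n_frames = 1 + (band_envelopes.shape[1] - win) // hop
    # Integrate each envelope over 25 ms windows shifted by 10 ms.
    energies = np.stack(
        [band_envelopes[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )                                                   # (n_frames, n_bands)
    # Log and DCT along the band axis give cepstral coefficients.
    ceps = dct(np.log(energies + 1e-10), type=2, norm="ortho", axis=1)[:, :n_ceps]
    delta = np.gradient(ceps, axis=0)                   # simple difference-based deltas
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])              # (n_frames, 3 * n_ceps)

# Hypothetical input: 24 bands, 1 s of envelope sampled at 400 Hz.
rng = np.random.default_rng(0)
envelopes = rng.random((24, 400)) + 0.1
features = short_term_features(envelopes, env_rate=400)
print(features.shape)
```

With 13 cepstral coefficients, appending delta and acceleration terms gives the 39 dimensions quoted on the slide.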

Speech Recognition

TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated
HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)
Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features, 39 dim

Figure: WER (axis 0 to 20) for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
  – Has telephone speech and far-field speech
GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
Features with warping [Pelecanos 2001]
  – Mel Frequency Cepstral Coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

Figure: DCF (axis 10 to 30) for MFCC and FDLP-S under Tel, Mic and Cross-domain conditions

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
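The detection cost function on the slide is a weighted sum of the two error rates. A small sketch, assuming the weights arise from a miss cost of 10, a false-alarm cost of 1 and a target prior of 0.01 (these three values are an assumption that reproduces the slide's 0.99 and 0.1 weights); the operating point below is hypothetical and not a result from the talk.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted error cost; the defaults give 0.1 * P_miss + 0.99 * P_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Hypothetical operating point: 20% misses, 1% false alarms.
dcf = detection_cost(p_miss=0.20, p_fa=0.01)
print(round(dcf, 4))
```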

Modulation Feature Extraction

Pipeline: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms) → DCT → Sub-band Feat

Static compression is a logarithm: it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (2 × 14 × 17)
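The static branch and the feature dimension can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy; the function name, the 400 Hz envelope rate, and the random envelope are hypothetical, and the dynamic compression loops are not implemented here.

```python
import numpy as np
from scipy.fft import dct

def static_modulation_features(envelope, n_coeffs=14):
    """Log-compress one sub-band envelope segment and keep the first DCT terms."""
    compressed = np.log(envelope + 1e-10)        # static (logarithmic) compression
    return dct(compressed, type=2, norm="ortho")[:n_coeffs]

# Hypothetical 200 ms envelope segment sampled at 400 Hz (80 samples).
rng = np.random.default_rng(0)
segment = rng.random(80) + 0.1
static_feat = static_modulation_features(segment)

# Feature dimension: sub-bands x (static + dynamic) x components per stream.
n_bands, n_streams, n_coeffs = 17, 2, 14
feature_dim = n_bands * n_streams * n_coeffs
print(static_feat.shape, feature_dim)
```

For a segment of 0.2 s, DCT-II component k sits near k / (2 × 0.2 s) = 2.5k Hz, so 14 components span roughly the 0-35 Hz range quoted on the slide.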

Phoneme Recognition

TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, additive noise, reverberated or telephone channel
Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM)-MLP hybrid model
  – Performance in phoneme error rate (PER)
Features
  – Perceptual linear prediction (PLP), 9 frame context
  – Advanced ETSI standard [ETSI 2002], 9 frame context
  – FDLP modulation (FDLP-M) features, 476 dim

Figure: PER (axis 30 to 75) for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Audio Coding

Block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → Env and Carr; Carr → MDCT; Q and Q⁻¹ stages on both paths; IMDCT → Mul → QMF Synthesis (bands 1 to 32) → Output

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

Figure: listening-test scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP – Improvements in reverb speech recog

Modulation feature extraction – Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence
  – Spectro-temporal receptive fields [Shamma et al. 2001]
Psycho-physical evidence
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:
  – Speech intelligibility [Houtgast et al. 1980]
  – RASTA processing [Hermansky et al. 1994]
  – Speech recognition [Kingsbury et al. 1998]
  – AM-FM decomposition [Kumaresan et al. 1999]
  – Sound texture modeling [Athineos et al. 2003]
  – Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

Figure: samples n-3, n-2, n-1 weighted by a3, a2, a1 to predict sample n

Model parameters are solved by minimizing the residual sum of squares
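Minimizing the residual sum of squares leads to a set of linear normal equations in the autocorrelation of the signal. A minimal sketch of this autocorrelation method, assuming NumPy and SciPy; the function name and the synthetic AR(2) test process are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def linear_prediction(x, order):
    """Return predictor coefficients a_1..a_p and the minimum residual energy."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1 : order + 1])   # normal equations R a = r
    residual_energy = r[0] - np.dot(a, r[1 : order + 1])
    return a, residual_energy

# Synthetic AR(2) process: x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = lfilter([1.0], [1.0, -1.5, 0.8], e)

a_hat, err = linear_prediction(x, order=2)
print(a_hat)   # estimates should lie close to [1.5, -0.8]
```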

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual sum of squares is expressed in the frequency domain; the parameters are thus solved by minimizing this frequency-domain error

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
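The equations for these slides did not survive in the transcript. The standard form of the argument, following the filter interpretation of linear prediction, can be written as below; the symbols $e[n]$, $A(z)$, $P_x(\omega)$ and $\hat{P}_x(\omega)$ are notation chosen here and may differ from the original slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e^2[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\,\bigl|A(e^{j\omega})\bigr|^2 \, d\omega,
  \qquad P_x(\omega) = \bigl|X(e^{j\omega})\bigr|^2 \\
\hat{P}_x(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2},
  \qquad G = E_{\min}
\end{align}
```

The second line is Parseval's theorem applied to the residual, whose spectrum is $A(e^{j\omega})X(e^{j\omega})$; minimizing $E$ over the $a_k$ therefore fits the all-pole spectrum $\hat{P}_x$ to $P_x$.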

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component,

$x_a[n] = x[n] + j\,\mathcal{H}\{x[n]\}$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a[n]|^2$
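The definition can be tried on a synthetic amplitude-modulated tone. A minimal sketch, assuming NumPy and SciPy; the sampling rate, carrier and modulation frequencies are arbitrary illustrative values chosen so that every component is periodic in the analysis window.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(800) / fs                       # 0.1 s of signal
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 20 * t)
x = modulator * np.cos(2 * np.pi * 1000 * t)  # smooth modulation times fine carrier

analytic = hilbert(x)                 # x[n] + j * Hilbert transform of x[n]
envelope = np.abs(analytic) ** 2      # Hilbert envelope (squared magnitude)

max_dev = np.max(np.abs(envelope - modulator**2))
print(f"max deviation from modulator^2: {max_dev:.2e}")
```

Note that `scipy.signal.hilbert` returns the analytic signal itself, not only the Hilbert transform.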

Hilbert Envelope

Figure panels: (a) Speech, (b) Hilb Env, (c) FDLP Env

Duality

Figure panels: LP, FDLP

LP in Time and Frequency

AM-FM Decomposition

Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp

Spectrogram Comparison

Figure panels: FDLP, PLP

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization
  – CMS assumes distortion in neighboring frames to be similar; it suppresses short-term artifacts
  – Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
  – Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Pipeline: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms) → DCT → Sub-band Feat

Modulation Features

Figure panels: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

Block diagram (blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env)

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Block diagram with compensation (blocks: Input, DCT, Critical-band Window, IDFT, | |², DFT, Wiener Filtering, VAD)

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
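The envelope subtraction idea can be sketched in a very reduced form. This is a simplification, assuming NumPy only: the noise envelope is modeled as a single constant level estimated from non-speech samples, the VAD decisions are supplied as a boolean mask, and a relative floor keeps the result positive. It is plain subtraction, not the Wiener filtering shown in the diagram; the function name and the `floor` parameter are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech samples."""
    noise_level = noisy_env[~is_speech].mean()      # estimate from VAD "off" regions
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)   # keep the envelope positive

# Toy example: a rectangular speech envelope on top of a constant noise floor.
speech_env = np.zeros(100)
speech_env[40:60] = 2.0
noisy_env = speech_env + 0.5
is_speech = speech_env > 0                          # idealized VAD decisions

cleaned = subtract_noise_envelope(noisy_env, is_speech)
print(cleaned[50], cleaned[0])
```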

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

Figure: spectrogram (axes: Time, Frequency)

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 30: Signal Analysis Using Autoregressive Models of Amplitude Modulation

LP in Time and Frequency

Duality

TimePowerSpec

LP

LP in Time and Frequency

Duality

TimePowerSpec

LP

Duality

DCTHilbEnv

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
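The least-squares fit described on this slide can be sketched with the autocorrelation method. This is a minimal NumPy illustration under stated assumptions, not the presenter's implementation: the function name `lp_coeffs` and the convention that the prediction is x̂[n] = Σ a_k x[n-k] are choices made here for illustration.

```python
import numpy as np

def lp_coeffs(x, order):
    """Autocorrelation-method linear prediction (illustrative sketch).

    Returns (a, err): a[k-1] weights x[n-k] in the prediction of x[n],
    and err is the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation lags r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])
    return a, err

# A decaying exponential obeys x[n] = 0.9 * x[n-1] for every n >= 1
x = 0.9 ** np.arange(200)
a, err = lp_coeffs(x, order=1)
```

For this toy signal the single coefficient comes out at about 0.9, and the residual energy is the energy of the first sample, which no past sample can predict.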

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
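As an illustration of the all-pole form, the sketch below evaluates G / |1 - Σ a_k e^{-jωk}|² on a frequency grid. The function name `allpole_spectrum`, the grid size, and the convention that a_k are prediction coefficients (so the denominator is 1 minus the predictor) are assumptions made for this example.

```python
import numpy as np

def allpole_spectrum(a, gain, n_points=256):
    """Evaluate gain / |1 - sum_k a[k-1] exp(-j w k)|^2 for w in [0, pi]."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_points)
    k = np.arange(1, len(a) + 1)
    # Inverse-filter response A(w) = 1 - sum_k a_k e^{-j w k}
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return w, gain / np.abs(A) ** 2

# One-pole example: a strong low-frequency peak
w, P = allpole_spectrum([0.9], gain=1.0)
```

With a single coefficient of 0.9 the model peaks at ω = 0 (value 1 / 0.1² = 100) and falls to 1 / 1.9² at ω = π, showing how a few parameters describe a smooth spectral shape.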

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
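The definition can be checked numerically. The sketch below builds the analytic signal with the FFT (zeroing negative frequencies and doubling positive ones) and squares its magnitude. The function name `hilbert_envelope` and the test signal are assumptions made for illustration.

```python
import numpy as np

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep Nyquist once
        h[1:n // 2] = 2.0           # double positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)   # negative frequencies are zeroed
    return np.abs(analytic) ** 2

# Smooth modulation m(t) on a fine carrier
n = 1024
t = np.arange(n)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t / n)
x = m * np.cos(2 * np.pi * 64 * t / n)
env = hilbert_envelope(x)
```

For this band-limited example the envelope recovers the squared modulation m(t)², independent of the carrier.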

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

FDLP

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

PLP

Dealing with Convolutive Distortions

Three approaches: cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS), and gain normalization.

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts.

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].

Gain normalization deals with short- and long-term distortions within a single long-term frame.
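To make the CMS idea concrete: if a channel adds the same offset to the cepstrum of every frame, subtracting the per-coefficient mean over time removes it. The sketch below is a generic illustration with made-up data; the function name and array layout (frames by coefficients) are assumptions.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-coefficient mean over time.

    cepstra: array of shape (num_frames, num_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical cepstra and a fixed channel offset shared by all frames
clean = np.sin(np.arange(60, dtype=float)).reshape(20, 3)
channel_offset = np.array([0.5, -1.0, 2.0])
distorted = clean + channel_offset

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

The two normalized arrays coincide, which is the sense in which CMS is insensitive to a distortion that stays the same across frames.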

Feature Comparison

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms) on each branch, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, |·|², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, |·|², DFT, Wiener Filtering, VAD]

A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise.

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope.
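A rough sketch of envelope subtraction is given below. It assumes a VAD mask is already available, approximates the noise envelope by its mean over non-speech samples, and applies a small floor to keep the result positive. The function name, the constant-noise approximation, and the `floor` parameter are all assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=1e-3):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env:   sub-band temporal envelope of the noisy signal
    speech_mask: boolean array, True where the (hypothetical) VAD marks speech;
                 at least one sample must be marked as non-speech
    floor:       fraction of the noisy envelope kept as a lower bound
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated only where the VAD reports no speech
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Flooring avoids zero or negative envelope values
    return np.maximum(cleaned, floor * noisy_env)

noisy = np.array([2.0, 2.0, 7.0, 12.0, 2.0])
mask = np.array([False, False, True, True, False])
cleaned = subtract_noise_envelope(noisy, mask)
```

In the toy example the noise level is 2, so the speech samples become 5 and 10 while the non-speech samples drop to the floor.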

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 31: Signal Analysis Using Autoregressive Models of Amplitude Modulation

LP in Time and Frequency

Duality

[Diagram: LP on the time signal gives the power spectrum; LP on the DCT gives the Hilbert envelope]

FDLP

Linear prediction on the cosine transform of the signal

[Figure: Speech → DCT → LP → FDLP envelope, shown together with the Hilbert envelope]
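The recipe on this slide (DCT of the signal, then linear prediction on the DCT coefficients) can be sketched as follows. This is a minimal illustration under assumptions, not the author's code: `lp_coeffs` and `fdlp_envelope` are names chosen here, the autocorrelation method is one of several ways to solve the LP problem, and the all-pole model is evaluated at the angles π(2t+1)/(2N), where the DCT-II basis of time index t oscillates.

```python
import numpy as np
from scipy.fft import dct

def lp_coeffs(seq, order):
    """Autocorrelation-method LP: returns predictor coefficients and gain."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    r = np.array([np.dot(seq[:n - k], seq[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])

def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x (one value per sample)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct(x, type=2, norm="ortho")          # cosine transform of the signal
    a, gain = lp_coeffs(c, order)             # LP along the DCT axis
    # An impulse at time t appears in the DCT as a cosine of this angle
    w = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return gain / np.abs(A) ** 2

# A single click at sample 150: the envelope should peak near it
x = np.zeros(512)
x[150] = 1.0
env = fdlp_envelope(x, order=2)
```

The duality with time-domain LP shows up directly: a pole pair that would model a spectral peak in ordinary LP here models a peak in time. Higher model orders give a more detailed envelope.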

FDLP for Speech Representation

[Figure: DCT → LP → FDLP Spectrogram (axes: Time, Freq), compared with Conventional Approaches (axes: Time, Freq)]

FDLP versus Mel Spectrogram

FDLP

Mel

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels, with FDLP and Mel plots]

Res = (Critical Width)^{-1}

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

  • Summarizing the gross temporal variation with a few parameters
    - Model order of FDLP controls the degree of smoothness
    - AR model captures perceptually important high energy regions of the signal

  • Suppressing reverberation artifacts
    - Reverberation is a long-term convolutive distortion

  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication

Clean DFT × Response DFT = Revb DFT

In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters … is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
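The effect described here can be checked numerically: scaling a sub-band sequence by a constant leaves the LP coefficients unchanged and only scales the gain. The sketch below assumes that gain normalization means rebuilding the envelope with unit gain; the transcript does not preserve the value, so that is an assumption. The helper names, the model order, and the random stand-in for sub-band DCT coefficients are also illustrative.

```python
import numpy as np

def lp_coeffs(seq, order):
    """Autocorrelation-method LP: returns predictor coefficients and gain."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    r = np.array([np.dot(seq[:n - k], seq[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])

def allpole_envelope(a, gain, n_points):
    """gain / |A|^2 evaluated at n_points angles in (0, pi)."""
    w = np.pi * (2 * np.arange(n_points) + 1) / (2 * n_points)
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(0)
band_dct = rng.standard_normal(256)        # stand-in for sub-band DCT coefficients
a_clean, g_clean = lp_coeffs(band_dct, order=8)
a_dist, g_dist = lp_coeffs(3.0 * band_dct, order=8)   # constant response of 3

# Assumed normalization: rebuild both envelopes with unit gain
env_clean = allpole_envelope(a_clean, 1.0, 256)
env_dist = allpole_envelope(a_dist, 1.0, 256)
```

The distorted gain is nine times the clean gain (the square of the scale factor), while the unit-gain envelopes are identical.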

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP, AM and FM outputs, Gain Norm, Quant; leading to three applications: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms).

Integration along the frequency axis converts to the mel scale.

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis.

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
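The steps above can be sketched as follows. This is an illustrative outline only: it assumes the sub-band envelopes are already on mel-spaced bands (the frequency integration step is omitted), uses 13 cepstral coefficients so that 13 × 3 = 39, and approximates delta and acceleration coefficients with `np.gradient` rather than any particular regression formula. All names and defaults are assumptions.

```python
import numpy as np
from scipy.fft import dct

def short_term_features(band_env, env_rate, n_ceps=13, win=0.025, hop=0.010):
    """band_env: (num_bands, num_samples) sub-band envelopes at env_rate Hz."""
    band_env = np.asarray(band_env, dtype=float)
    wlen = int(round(win * env_rate))
    step = int(round(hop * env_rate))
    n_frames = 1 + (band_env.shape[1] - wlen) // step
    # Integrate each band's envelope over 25 ms windows shifted by 10 ms
    energies = np.stack([
        band_env[:, i * step:i * step + wlen].sum(axis=1)
        for i in range(n_frames)
    ])
    # Log and DCT along the frequency (band) axis give cepstral coefficients
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=1)[:, :n_ceps]
    # Simple finite-difference stand-ins for delta and acceleration
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

# One second of constant envelopes in 20 bands, sampled at 8 kHz
feats = short_term_features(np.ones((20, 8000)), env_rate=8000)
```

With one second of input this yields 98 frames of 39 values each; because the toy envelopes are constant, the delta and acceleration parts are zero.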

Speech Recognition

TIDIGITS Database (8 kHz)
  • Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  • Whole-word HMM models trained on clean speech
  • Performance in terms of word error rate (WER)

Features
  • PLP features with cepstral mean subtraction (CMS)
  • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  • FDLP short-term (FDLP-S) features - 39 dim

Speech Recognition

[Figure: bar chart of WER (axis 0 to 20) for PLP-CMS, LTLSS, and FDLP-S in Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
  • Has telephone speech and far-field speech

GMM-UBM system
  • Trained on a large set of development speakers
  • Adapted on the enrollment data from the target speaker
  • Nuisance attribute projection (NAP) on supervectors
  • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  • Mel Frequency Cepstral Coefficients (MFCCs)
  • FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S in Tel, Mic, and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
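As a worked example of the detection cost function, the sketch below reads the weights as 0.99 for false alarms and 0.1 for misses and computes the cost at a fixed decision threshold. The function name, the score lists, and the threshold are hypothetical values chosen for illustration.

```python
def detection_cost(target_scores, impostor_scores, threshold,
                   w_fa=0.99, w_miss=0.1):
    """DCF = w_fa * P_fa + w_miss * P_miss at a given threshold."""
    # A miss is a target trial scored below the threshold
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # A false alarm is an impostor trial scored at or above the threshold
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return w_fa * p_fa + w_miss * p_miss

targets = [2.0, 1.0, 0.2, 3.0]        # hypothetical target-trial scores
impostors = [-1.0, 0.5, -2.0, -0.3]   # hypothetical impostor-trial scores
dcf = detection_cost(targets, impostors, threshold=0.4)
```

Here one of four targets is missed and one of four impostors is accepted, so the cost is 0.99 × 0.25 + 0.1 × 0.25 = 0.2725. The large false-alarm weight means a false acceptance costs far more than a miss.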

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms) on each branch, Sub-band Feat]

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope.

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.

14 static and 14 dynamic modulation spectral components (0-35 Hz) over 17 sub-bands give a feature of 476 dim.
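The dimension count (14 × 2 × 17 = 476) can be traced in a short sketch. It takes already-compressed static and dynamic envelopes as inputs, since the adaptation loops are not reproduced here; the function name, the stand-in data, and the number of envelope samples per 200 ms segment are assumptions.

```python
import numpy as np
from scipy.fft import dct

def modulation_features(static_env, dynamic_env, n_mod=14):
    """Stack the first n_mod DCT coefficients of each compressed envelope.

    static_env, dynamic_env: arrays of shape (num_bands, num_samples),
    each covering one 200 ms segment per sub-band.
    """
    parts = []
    for env in (static_env, dynamic_env):
        env = np.asarray(env, dtype=float)
        # DCT along time gives modulation frequency components per band
        parts.append(dct(env, type=2, norm="ortho", axis=1)[:, :n_mod])
    return np.concatenate(parts, axis=1).ravel()

# Stand-in envelopes: 17 sub-bands, 80 envelope samples per 200 ms segment
raw = 1.0 + np.arange(17 * 80, dtype=float).reshape(17, 80)
static = np.log(raw)        # static (logarithmic) compression
dynamic = np.sqrt(raw)      # placeholder only, not the adaptation loops
feats = modulation_features(static, dynamic)
```

DCT-II coefficient k over a window of T seconds oscillates at k/(2T) Hz, so with T = 0.2 s the first 14 coefficients cover roughly 0 to 35 Hz, in line with the range quoted on the slide.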

Phoneme Recognition

TIMIT Database (8 kHz)
  • Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) - MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) - 9 frame context
  • Advanced ETSI standard [ETSI 2002] - 9 frame context
  • FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Phoneme Recognition

[Figure: bar chart of PER (axis 30 to 75) for PLP-9, ETSI-9, and FDLP-M in Clean, Add Noise, Reverb, and Tel conditions]

Audio Coding

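[Block diagram with blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env and Carr branches, MDCT, quantizers Q, inverse quantizers Q-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]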

Page 32: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Resolution of FDLP Analysis

Res = (Critical Width)^-1

[Figure: FDLP and Mel panels; traces labeled Sig and FDLP Env]

Properties of FDLP Analysis

[Figure panels: FDLP, Mel]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high energy regions of the signal

• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication:

Clean DFT x Response DFT = Revb DFT

In narrow sub-bands, the response is slowly varying.

The FDLP envelope of a band is given by its all-pole parameters and a gain.

When the sub-band signal is multiplied by the response, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain.

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.
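Why discarding the gain helps can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it treats the slowly varying sub-band response as a single constant `h` (hypothetical), fits a linear predictor to the DCT of a signal before and after scaling, and compares the results. The helpers `lp_on_dct` and `envelope_from_lp`, and the choice of unit gain for the normalized envelope, are assumptions.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def lp_on_dct(x, order=8):
    """Autocorrelation-method LP on the DCT of x; returns (coeffs, gain)."""
    c = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    n = len(c)
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])
    return a, gain


def envelope_from_lp(a, gain, n):
    """All-pole envelope on n time points; pass gain=1.0 to normalize."""
    w = np.pi * (np.arange(n) + 0.5) / n
    A = 1.0 - np.exp(-1j * np.outer(w, np.arange(1, len(a) + 1))) @ a
    return gain / np.abs(A) ** 2


rng = np.random.default_rng(1)
x = rng.standard_normal(512) * np.hanning(512)
h = 0.3  # hypothetical constant sub-band response

a_clean, g_clean = lp_on_dct(x)
a_rev, g_rev = lp_on_dct(h * x)

# Envelopes rebuilt with the gain fixed to 1 (the assumed normalization).
env_clean = envelope_from_lp(a_clean, 1.0, len(x))
env_rev = envelope_from_lp(a_rev, 1.0, len(x))
```

Scaling the sub-band signal leaves the all-pole coefficients unchanged and multiplies the gain by `h**2`, so the two gain-normalized envelopes coincide. Real reverberation is only approximately a constant per narrow band, so the match in practice is approximate.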

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Outline of Applications

[Block diagram with blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration along the frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
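The integration, log, DCT, and delta steps can be sketched as follows. This is a rough illustration, not the authors' front end: it assumes the envelopes have already been mapped to mel bands, takes 13 static cepstra so that statics plus deltas plus accelerations give 39 dimensions, and uses `np.gradient` as a simple stand-in for a regression-based delta. The name `short_term_features` and the envelope sampling rate are hypothetical.

```python
import numpy as np
from scipy.fft import dct


def short_term_features(env, fs_env, n_ceps=13, win_s=0.025, hop_s=0.010):
    """env: (n_bands, n_samples) band envelopes sampled at fs_env Hz."""
    win = int(round(win_s * fs_env))
    hop = int(round(hop_s * fs_env))
    n_frames = 1 + (env.shape[1] - win) // hop
    # Integrate each band envelope over 25 ms windows with a 10 ms shift.
    energies = np.stack(
        [env[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )  # shape (n_frames, n_bands)
    # Log and DCT along the frequency axis give cepstral coefficients.
    ceps = dct(np.log(energies + 1e-12), type=2, norm="ortho", axis=1)[:, :n_ceps]
    # Delta and acceleration along the time axis (simple finite differences).
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])


rng = np.random.default_rng(0)
env = np.abs(rng.standard_normal((24, 400))) + 1e-3  # toy envelopes, 1 s at 400 Hz
feats = short_term_features(env, fs_env=400)
```

With these toy settings the result has one 39-dimensional vector per 10 ms frame, the same layout a conventional MFCC front end would produce.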

Speech Recognition

TIDIGITS Database (8 kHz)
– Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
– Whole-word HMM models trained on clean speech
– Performance in terms of word error rate (WER)

Features
– PLP features with cepstral mean subtraction (CMS)
– Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
– FDLP short-term (FDLP-S) features – 39 dim

Speech Recognition

[Bar chart: WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS, and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
– Has telephone speech and far-field speech

GMM-UBM system
– Trained on a large set of development speakers
– Adapted on the enrollment data from the target speaker
– Nuisance attribute projection (NAP) on supervectors
– Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
– Mel Frequency Cepstral Coefficients (MFCCs)
– FDLP short-term (FDLP-S) features

Speaker Verification

[Bar chart: DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S on Tel, Mic, and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
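The detection cost function quoted for this evaluation, read here as 0.99 Pfa + 0.1 Pmiss (the weights that follow from the NIST SRE 2008 cost parameters), can be computed from trial scores as in the sketch below. The function name, the toy scores, and the fixed threshold are assumptions for illustration; an evaluation would typically sweep the threshold.

```python
import numpy as np


def detection_cost(target_scores, nontarget_scores, threshold):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss at a given decision threshold."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    p_miss = np.mean(target_scores < threshold)     # target trials rejected
    p_fa = np.mean(nontarget_scores >= threshold)   # impostor trials accepted
    return 0.99 * p_fa + 0.1 * p_miss


# Toy scores (hypothetical): one miss and one false alarm out of four each.
cost = detection_cost([2.0, 3.0, 0.5, 4.0], [0.0, 1.0, -1.0, 2.5], threshold=1.5)
```

Because false alarms are weighted almost ten times more heavily than misses, the operating point that minimizes this cost sits at a low false-alarm rate.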


Modulation Feature Extraction

Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression → 200 ms → DCT → Sub-band Feat

Static compression is a logarithm – it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 14 x 2 x 17 = 476 dim
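A rough sketch of the two compression paths and the final DCT follows. It is a simplification under stated assumptions rather than the authors' system: the loop time constants, the envelope sampling rate, the floor value, and the function names are assumptions, and the loops are run on a single 200 ms segment for brevity instead of on the full envelope.

```python
import numpy as np
from scipy.fft import dct


def adaptation_loops(env, fs, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Simplified dynamic compression: cascaded divide-by-lowpass loops."""
    y = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * fs))
        state = np.sqrt(y[0])  # steady state for a constant input y[0]
        out = np.empty_like(y)
        for n in range(len(y)):
            out[n] = y[n] / state                         # divider
            state = alpha * state + (1 - alpha) * out[n]  # low-pass filter
        y = out
    return y


def modulation_features(env, fs, n_mod=14):
    """env: (n_bands, n_samples) sub-band envelopes for one 200 ms segment."""
    feats = []
    for band in env:
        static = np.log(np.maximum(band, 1e-5))   # static compression
        dynamic = adaptation_loops(band, fs)      # dynamic compression
        feats.append(dct(static, type=2, norm="ortho")[:n_mod])
        feats.append(dct(dynamic, type=2, norm="ortho")[:n_mod])
    return np.concatenate(feats)


fs_env = 400                   # assumed envelope sampling rate (Hz)
seg_len = int(0.2 * fs_env)    # 200 ms segment
rng = np.random.default_rng(2)
env = np.abs(rng.standard_normal((17, seg_len))) + 1e-3  # toy envelopes
features = modulation_features(env, fs_env)
```

For a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of about 2.5k Hz, so keeping 14 coefficients covers roughly the 0-35 Hz range quoted on the slide, and 14 x 2 x 17 gives the 476 dimensions.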

Phoneme Recognition

TIMIT Database (8 kHz)
– Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
– MLPs estimate phoneme posteriors
– Hidden Markov model (HMM) – MLP hybrid model
– Performance in phoneme error rate (PER)

Features
– Perceptual linear prediction (PLP) – 9 frame context
– Advanced ETSI standard [ETSI 2002] – 9 frame context
– FDLP modulation (FDLP-M) features – 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Phoneme Recognition

[Bar chart: PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9, and FDLP-M on Clean, Add Noise, Reverb, and Tel test data]


Audio Coding

[Block diagram with blocks, in order: Input, QMF Analysis (bands 1 to 32), FDLP (Env and Carr outputs), MDCT, Q, Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores (axis ticks 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Proc., 2010.

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Our Contributions

Short-term feature extraction using FDLP – improvements in reverberant speech recognition

Modulation feature extraction – phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences
– Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences
– Perceptual importance of modulation frequencies [Drullman et al. 1994]
– Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
– Speech intelligibility [Houtgast et al. 1980]
– RASTA processing [Hermansky et al. 1994]
– Speech recognition [Kingsbury et al. 1998]
– AM-FM decomposition [Kumaresan et al. 1999]
– Sound texture modeling [Athineos et al. 2003]
– Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
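A minimal sketch of time-domain linear prediction solved by least squares is given below. The function name and the toy second-order signal are assumptions for illustration; the coefficients are found by minimizing the residual sum of squares, as the slide states.

```python
import numpy as np


def linear_prediction(x, order):
    """Least-squares predictor: x[n] is approximated by sum_k a[k] * x[n-1-k]."""
    x = np.asarray(x, dtype=float)
    # Each row holds the `order` samples that precede the target sample.
    past = np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )
    target = x[order:]
    a, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ a
    return a, np.sum(residual ** 2)


# Toy signal (hypothetical): impulse response of a known two-coefficient recursion.
true_a = np.array([1.2, -0.8])
x = np.zeros(50)
x[0] = 1.0
x[1] = true_a[0] * x[0]
for n in range(2, 50):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2]

a, rss = linear_prediction(x, order=2)
```

Because the toy signal obeys the recursion exactly, the fit recovers the generating coefficients and the residual sum of squares is numerically zero; on real speech the residual is the part of the signal the predictor cannot explain.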

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition

Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
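The steps named on these slides can be written out as follows. This is a standard form of the argument in the spirit of [Makhoul 1975]; the symbols e[n], A, P, and \hat{P} are assumptions, since the slide equations are not reproduced in this transcript.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \\
\sum_{n} e^2[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,
\bigl|A(e^{j\omega})\bigr|^2 \, d\omega, \qquad
P(\omega) = \bigl|X(e^{j\omega})\bigr|^2 \\
\hat{P}(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
\;\Rightarrow\;
\sum_{n} e^2[n] = \frac{G}{2\pi}\int_{-\pi}^{\pi}
\frac{P(\omega)}{\hat{P}(\omega)} \, d\omega
\end{align}
```

Minimizing the residual energy over the coefficients a_k therefore fits the all-pole model \hat{P} to the power spectrum P, and the minimum residual energy gives the gain G.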

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]
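The definition can be tried on a synthetic amplitude-modulated tone. The sketch below uses `scipy.signal.hilbert`, which returns the analytic signal; the sampling rate and the modulator and carrier frequencies are assumptions chosen so that every component falls on an exact DFT bin.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal x + j*H[x]."""
    analytic = hilbert(np.asarray(x, dtype=float))
    return np.abs(analytic) ** 2


fs = 1000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)   # smooth modulation, 4 Hz
carrier = np.cos(2 * np.pi * 100 * t)               # fine carrier, 100 Hz
env = hilbert_envelope(modulator * carrier)
```

For this product of a smooth modulation and a fine carrier, the Hilbert envelope equals the squared modulator, so the carrier is removed and only the amplitude modulation remains.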

Duality

[Figure: duality between LP (time domain) and FDLP (frequency domain)]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
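The CMS assumption can be illustrated with a toy example. A distortion shorter than the analysis window acts approximately as a constant offset added to every cepstral frame, which subtracting the mean over frames removes. The function name, the random cepstra, and the offset vector are hypothetical; this sketch covers CMS only, not LTLSS or gain normalization.

```python
import numpy as np


def cepstral_mean_subtraction(ceps):
    """ceps: (n_frames, n_ceps). Remove the per-coefficient mean over frames."""
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)


rng = np.random.default_rng(3)
clean = rng.standard_normal((200, 13))   # toy cepstral frames
channel = rng.standard_normal(13)        # hypothetical fixed channel offset
distorted = clean + channel              # same offset in every frame

out_clean = cepstral_mean_subtraction(clean)
out_distorted = cepstral_mean_subtraction(distorted)
```

The two outputs are identical, which is the short-term case CMS handles. Reverberation longer than the analysis window does not reduce to a fixed per-frame offset, which is the situation the long-term methods on this slide address.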

Feature Comparison


Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.
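The envelope subtraction idea can be sketched as below. This is a simplified illustration, not the Wiener filtering stage from the block diagram: it estimates a constant noise envelope from the frames a VAD marks as non-speech and subtracts it with a floor. The function name, the floor value, and the toy envelopes are assumptions.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """noisy_env: 1-D envelope; speech_mask: True where the VAD flags speech."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise envelope estimated from the non-speech regions only.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Keep the result positive so a later log or AR fit stays well defined.
    return np.maximum(cleaned, floor * noisy_env)


speech_env = np.zeros(100)
speech_env[40:60] = 5.0                  # toy speech envelope
noise_env = np.full(100, 1.0)            # toy stationary noise envelope
mask = speech_env > 0                    # stand-in for a real VAD decision
cleaned = subtract_noise_envelope(speech_env + noise_env, mask)
```

The subtraction relies on the additivity of uncorrelated signal and noise in the envelope domain; the floor prevents negative values when the noise estimate exceeds the local envelope.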

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: short-term spectrum; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 33: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

DCT

LP

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
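The predictor described above can be sketched in a few lines. This is a minimal illustration that assumes the autocorrelation method with the Levinson-Durbin recursion; the slides do not say which solver was used, and the function names `autocorrelation` and `levinson_durbin` are illustrative only.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err) where x_hat[n] = sum_k a[k-1] * x[n-k], k = 1..order,
    and err is the minimum residual sum of squares."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # nothing left to predict (or an all-zero signal)
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


# Example: a decaying exponential x[n] = 0.9**n obeys x[n] = 0.9 * x[n-1],
# so the first coefficient should come out close to 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
```

The returned `err` plays the role of the gain G that appears in the all-pole model of the power spectrum.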

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
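For reference, the all-pole model that the slide text describes is commonly written in the form below. The slide equations are not preserved in this transcript, so the symbols p (model order), a_k (prediction coefficients), e[n] (residual) and \hat{P}(\omega) (model power spectrum), as well as the sign convention for a_k, are assumptions made here for illustration.

```latex
\hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n]

\hat{P}(\omega) = \frac{G}{\left| 1 - \sum_{k=1}^{p} a_k \, e^{-j \omega k} \right|^{2}}, \qquad
G = \min_{a_1, \dots, a_p} \sum_{n} e^{2}[n]
```

With the opposite sign convention for a_k, the minus sign in the denominator becomes a plus; the model itself is unchanged.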

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
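A small numerical sketch of this definition follows. It assumes the usual discrete-time construction of the analytic signal (double the positive-frequency DFT bins, zero the negative ones) and uses a naive DFT so that it needs no external libraries; the function names are illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (fine for short examples)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal of a real sequence x."""
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue               # Nyquist bin is left unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2.0     # positive frequencies are doubled
        else:
            spectrum[k] = 0.0      # negative frequencies are removed
    analytic = dft(spectrum, inverse=True)
    return [abs(z) ** 2 for z in analytic]


# Example: a pure cosine carrier has a flat envelope equal to 1.
carrier = [math.cos(2 * math.pi * 4 * m / 64) for m in range(64)]
envelope = hilbert_envelope(carrier)
```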

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP shown side by side]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
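Of the three techniques listed above, cepstral mean subtraction is the simplest to illustrate. The sketch below assumes the common per-utterance form, in which the mean of each cepstral dimension over all frames is removed; a distortion that is the same in every frame then cancels out. The function name and the toy input are illustrative.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of one utterance.

    frames: list of equal-length cepstral vectors, one per frame.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Example: two 2-dimensional frames; the mean vector is [2.0, 3.0].
normalized = cepstral_mean_subtraction([[1.0, 2.0], [3.0, 4.0]])
```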

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, Critical-band window, IDFT, | |^2, DFT, Linear pred, FDLP env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input, DCT, Critical-band window, IDFT, | |^2, DFT, Wiener filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction removes the estimated noise envelope from the noisy speech envelope
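The subtraction step can be sketched as follows. This is a simplified illustration, not the Wiener filtering shown in the block diagram: it assumes the VAD decisions are already available as one boolean per envelope sample, models the noise envelope as a single constant level, and applies a small floor (an assumption added here) so the result stays positive.

```python
def subtract_noise_envelope(noisy_env, speech_flags, floor=0.01):
    """Subtract a noise level estimated from non-speech samples.

    noisy_env:    envelope samples of the noisy sub-band signal
    speech_flags: hypothetical VAD output, True where speech is present
    floor:        fraction of the noisy envelope kept as a lower bound
    """
    noise_samples = [e for e, is_speech in zip(noisy_env, speech_flags)
                     if not is_speech]
    if not noise_samples:
        return list(noisy_env)  # no non-speech region to estimate from
    noise_level = sum(noise_samples) / len(noise_samples)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Example: noise level 1.0 around a speech burst of level 5.0.
cleaned = subtract_noise_envelope(
    [1.0, 1.0, 5.0, 5.0, 1.0],
    [False, False, True, True, False],
)
```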

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram, time versus frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 34: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLP

Linear prediction on the cosine transform of the signal

[Figure: speech signal, DCT and LP stages, with the Hilbert envelope and the FDLP envelope of the speech]
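The idea on this slide, linear prediction applied to the cosine transform of the signal, can be sketched as below. It is a minimal illustration under stated assumptions: an unnormalized DCT-II, the autocorrelation method with Levinson-Durbin, full-band analysis of one short frame (no sub-band windowing), and a mapping of time index t to the angle pi * (t + 0.5) / N that follows from the DCT-II basis. All function names are illustrative.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
                for m in range(n)) for k in range(n)]


def linear_prediction(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, gain)."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, gain = [0.0] * order, r[0]
    for i in range(order):
        if gain <= 0.0:
            break
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / gain
        a = [a[j] - k * a[i - 1 - j] if j < i else (k if j == i else 0.0)
             for j in range(order)]
        gain *= 1.0 - k * k
    return a, gain


def fdlp_envelope(signal, order):
    """All-pole estimate of the temporal envelope of one frame."""
    n = len(signal)
    a, gain = linear_prediction(dct2(signal), order)
    envelope = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n  # time index mapped to (0, pi)
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                          for k in range(order))
        envelope.append(gain / abs(denom) ** 2)
    return envelope


# Example: a single click at sample 20 of a 64-sample frame.
frame = [0.0] * 64
frame[20] = 1.0
env = fdlp_envelope(frame, order=2)
peak = max(range(len(env)), key=env.__getitem__)
```

The model order controls how many peaks the envelope can follow; with order 2 it can track one dominant energy burst, which is why the click example peaks near sample 20.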

FDLP for Speech Representation

[Figure: DCT and LP stages producing the FDLP spectrogram (time versus frequency), shown next to the spectrogram from conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Resolution of FDLP Analysis

[Figure: FDLP and mel representations of a signal (Sig) together with its FDLP envelope (FDLP Env)]

Res = (Critical Width)^(-1)

Properties of FDLP Analysis

[Figure: FDLP and mel spectrograms]

  • Summarizing the gross temporal variation with a few parameters
    - Model order of FDLP controls the degree of smoothness
    - AR model captures perceptually important high energy regions of the signal

  • Suppressing reverberation artifacts
    - Reverberation is a long-term convolutive distortion

  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Reverberant Speech

In the long-term DFT domain this translates to a multiplication, which also applies within each sub-band:

Clean DFT x Response DFT = Reverberant DFT

In narrow bands, the room response term is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and gain

When the sub-band signal is multiplied by the response term, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain normalized

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008
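The reasoning behind gain normalization can be checked numerically. The sketch below assumes the idealized narrow-band case in which the room response reduces to one constant factor on the sub-band DCT coefficients; in that case linear prediction returns the same all-pole coefficients and only the gain changes, so rebuilding the envelope with a fixed gain removes the factor. Setting the gain to 1 is the choice assumed here, and the test sequence and scale factor are made up for illustration.

```python
import cmath
import math


def linear_prediction(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, gain)."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, gain = [0.0] * order, r[0]
    for i in range(order):
        if gain <= 0.0:
            break
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / gain
        a = [a[j] - k * a[i - 1 - j] if j < i else (k if j == i else 0.0)
             for j in range(order)]
        gain *= 1.0 - k * k
    return a, gain


def allpole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points angles in (0, pi)."""
    env = []
    for t in range(n_points):
        theta = math.pi * (t + 0.5) / n_points
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                          for k in range(len(a)))
        env.append(gain / abs(denom) ** 2)
    return env


# Hypothetical sub-band DCT coefficients and a constant band factor of 3.
clean = [math.cos(0.7 * k) + 0.5 * math.cos(1.9 * k) for k in range(64)]
distorted = [3.0 * c for c in clean]

a_clean, g_clean = linear_prediction(clean, 4)
a_dist, g_dist = linear_prediction(distorted, 4)

# Gain-normalized envelopes: the gain is replaced by 1 in both cases.
env_clean = allpole_envelope(a_clean, 1.0, 64)
env_dist = allpole_envelope(a_dist, 1.0, 64)
```

In this idealized case `g_dist` is 9 times `g_clean` (the square of the factor) while the two normalized envelopes coincide. A real room response is only approximately constant within a band, so the cancellation is approximate in practice.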

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Outline of Applications

[Block diagram: Input signal, Sub-band decomposition, FDLP (AM and FM outputs), Gain norm, Quant; outputs are short-term features for speaker & speech recognition, modulation features for phoneme recognition, and wide-band speech & audio coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct 2009

Short-term Features

[Block diagram: Input, DCT, Sub-band window, FDLP, Gain norm, Energy int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim features, similar to conventional MFCC features
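The time integration and the log plus DCT steps described above can be sketched as follows. The envelope sampling rate of 1000 Hz, the number of bands and the function names are assumptions for illustration; the mel-scale integration and the delta and acceleration coefficients are left out.

```python
import math


def integrate_envelope(env, env_rate_hz, win_s=0.025, shift_s=0.010):
    """Sum an envelope over 25 ms windows taken every 10 ms."""
    win = int(round(win_s * env_rate_hz))
    shift = int(round(shift_s * env_rate_hz))
    return [sum(env[start:start + win])
            for start in range(0, len(env) - win + 1, shift)]


def log_dct(band_energies, n_ceps):
    """Cepstral coefficients of one frame: log, then DCT-II across bands."""
    logs = [math.log(e) for e in band_energies]
    n = len(logs)
    return [sum(logs[b] * math.cos(math.pi * k * (2 * b + 1) / (2 * n))
                for b in range(n)) for k in range(n_ceps)]


# One second of a flat envelope sampled at a hypothetical 1000 Hz.
frames = integrate_envelope([2.0] * 1000, env_rate_hz=1000)

# Eight bands with identical energy: only the zeroth cepstrum is non-zero.
ceps = log_dct([math.e] * 8, n_ceps=3)
```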

Speech Recognition

TIDIGITS Database (8 kHz)
Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
Whole-word HMM models trained on clean speech
Performance in terms of word error rate (WER)

Features
PLP features with cepstral mean subtraction (CMS)
Long term log spectral subtraction (LTLSS) [Gelbart 2002]
FDLP short-term (FDLP-S) features - 39 dim

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

[Figure: bar chart of WER for PLP-CMS, LTLSS and FDLP-S in the Clean and Reverb conditions]

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech

GMM-UBM system
Trained on a large set of development speakers
Adapted on the enrollment data from the target speaker
Nuisance attribute projection (NAP) on supervectors
Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
Mel Frequency Cepstral Coefficients (MFCCs)
FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF for MFCC and FDLP-S in the Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011
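The detection cost function quoted on the Speaker Verification slide is a weighted sum of the two error rates. The sketch below writes it in the general form with a miss cost, a false-alarm cost and a target prior; the default values 10, 1 and 0.01 are assumed here because they reproduce the weights 0.1 and 0.99 shown on the slide. The error rates in the example are made up.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa.

    With the defaults this equals 0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Example with hypothetical error rates: 10% misses, 2% false alarms.
cost = detection_cost(p_miss=0.10, p_fa=0.02)
```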

Modulation Features

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band feat]

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands give a feature of 476 dim
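The numbers on this slide can be checked with a little arithmetic. The sketch below assumes that the "200ms" label in the block diagram is the length of the window on which the DCT is taken; under that assumption, DCT-II coefficient k corresponds to a modulation frequency of k / (2 * window length), so 14 coefficients span 0 to 32.5 Hz, in line with the quoted 0-35 Hz range. The function names are illustrative.

```python
def dct_bin_frequencies_hz(n_coeffs, window_s):
    """Modulation frequency of DCT-II coefficient k over a window of
    window_s seconds: k / (2 * window_s)."""
    return [k / (2.0 * window_s) for k in range(n_coeffs)]


def feature_dimension(n_coeffs, n_compressions, n_bands):
    """Coefficients per stream x compression streams x sub-bands."""
    return n_coeffs * n_compressions * n_bands


freqs = dct_bin_frequencies_hz(n_coeffs=14, window_s=0.2)

# 14 coefficients, 2 streams (static and dynamic), 17 sub-bands.
dim = feature_dimension(n_coeffs=14, n_compressions=2, n_bands=17)
```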

Phoneme Recognition

TIMIT Database (8 kHz)
Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors
Hidden Markov model (HMM) - MLP hybrid model
Performance in phoneme error rate (PER)

Features
Perceptual linear prediction (PLP) - 9 frame context
Advanced ETSI standard [ETSI 2002] - 9 frame context
FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA 2010

[Figure: bar chart of PER for PLP-9, ETSI-9 and FDLP-M in the Clean, Add Noise, Reverb and Tel conditions]

Audio Coding

[Block diagram: Input, QMF analysis (bands 1 to 32), FDLP giving envelope (Env) and carrier (Carr), MDCT, quantizers Q and inverse quantizers Q^(-1), IMDCT, Mul, QMF synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

[Figure: bar chart of subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Page 35: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLPLinear prediction on the cosine transform of the signal

Hilb Env

FDLP Env

Speech

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain norm and with gain norm]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)
• Integration along the frequency axis converts the bands to the mel scale
• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis
• Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
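The steps above can be sketched as follows. Several details are assumptions made for this illustration and are not stated on the slide: a rectangular integration window, 13 cepstral coefficients (so that 13 × 3 = 39), a ±2 frame regression for the deltas, and 23 input bands that are taken to be on a mel-like scale already, so the frequency integration step is omitted. Random positive numbers stand in for the gain-normalized FDLP envelopes.

```python
import numpy as np

def integrate_envelopes(env, fs, win_ms=25, shift_ms=10):
    """env: (n_bands, n_samples) envelopes -> (n_frames, n_bands) band energies."""
    win = int(fs * win_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + (env.shape[1] - win) // shift
    return np.stack([env[:, i * shift:i * shift + win].sum(axis=1)
                     for i in range(n_frames)])

def cepstra(energies, n_ceps=13):
    """Log followed by a DCT-II along the frequency (band) axis."""
    n_bands = energies.shape[1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_bands))
    return np.log(energies + 1e-10) @ basis.T

def deltas(feat, width=2):
    """Regression-based time derivative over +/- width frames."""
    n = len(feat)
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:width + k + n] - padded[width - k:width - k + n])
              for k in range(1, width + 1))
    return num / (2.0 * sum(k * k for k in range(1, width + 1)))

def short_term_features(env, fs):
    c = cepstra(integrate_envelopes(env, fs))
    d = deltas(c)
    return np.hstack([c, d, deltas(d)])   # static + delta + acceleration

rng = np.random.default_rng(0)
envelopes = rng.random((23, 8000)) + 0.1   # 1 s of stand-in envelopes at 8 kHz
features = short_term_features(envelopes, fs=8000)
```

Each row of `features` is one 10 ms frame, in the same layout as a conventional MFCC front end, which is what allows these features to be dropped into an existing HMM-GMM recognizer.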

Speech Recognition

• TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated
• HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)
• Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features – 39 dim

[Figure: bar chart of WER (axis ticks 0 to 20) for PLP-CMS, LTLSS and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker Recognition Evaluation (SRE)
  – Has telephone speech and far-field speech
• GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
• Features with warping [Pelecanos 2001]
  – Mel frequency cepstral coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis ticks 10 to 30) for MFCC and FDLP-S in Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
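As a small worked example of the detection cost function, the sketch below computes miss and false-alarm rates at a fixed threshold and combines them with weights of 0.99 and 0.1, which is how the slide's "099 Pfa + 01 Pmiss" is read here (decimal points lost in extraction). The scores and the threshold are made-up values for illustration, and the function names are hypothetical.

```python
import numpy as np

def error_rates(target_scores, impostor_scores, threshold):
    """Miss rate on target trials and false-alarm rate on impostor trials."""
    p_miss = np.mean(np.asarray(target_scores) < threshold)
    p_fa = np.mean(np.asarray(impostor_scores) >= threshold)
    return p_miss, p_fa

def detection_cost(p_miss, p_fa, w_fa=0.99, w_miss=0.1):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss, as reconstructed from the slide."""
    return w_fa * p_fa + w_miss * p_miss

# Made-up scores: one of four targets is missed, two of five impostors accepted.
targets = [2.0, 1.0, -0.5, 3.0]
impostors = [-1.0, 0.5, -2.0, -3.0, 1.5]
p_miss, p_fa = error_rates(targets, impostors, threshold=0.0)
dcf = detection_cost(p_miss, p_fa)
```

Because false alarms carry roughly ten times the weight of misses, operating points that keep Pfa low dominate this cost.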


Modulation Feature Extraction

[Block diagram blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat]

• Static compression is a logarithm – it reduces the huge dynamic range in the sub-band envelope
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]
• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature
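The dimensionality works out as 17 sub-bands × 2 compression streams × 14 DCT coefficients = 476. The sketch below reproduces that bookkeeping. It is only an approximation of the pipeline: the dynamic compression here is a single, simplified divide-and-low-pass loop standing in for the cascade of adaptation loops cited on the slide, and the envelope sampling rate of 400 Hz, the 50 ms time constant and all function names are assumptions made for illustration.

```python
import numpy as np

def dct2(x, n_coeffs):
    """First n_coeffs DCT-II coefficients of a 1-D array."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def static_compress(env):
    """Logarithmic compression of the envelope."""
    return np.log(env + 1e-10)

def dynamic_compress(env, fs_env, tau=0.05):
    """One simplified adaptation loop: divide by a low-pass filtered copy of the output."""
    alpha = np.exp(-1.0 / (tau * fs_env))
    state = np.sqrt(max(float(env.min()), 1e-10))
    out = np.empty_like(env, dtype=float)
    for i, v in enumerate(env):
        out[i] = v / max(state, 1e-5)                    # divider
        state = alpha * state + (1 - alpha) * out[i]     # first-order low-pass
    return out

def modulation_features(envs, fs_env, seg_ms=200, n_mod=14):
    """envs: (n_bands, n_samples) envelopes -> one feature vector for a 200 ms segment."""
    seg = int(fs_env * seg_ms / 1000)
    feats = []
    for env in envs:
        segment = env[:seg]
        feats.append(dct2(static_compress(segment), n_mod))
        feats.append(dct2(dynamic_compress(segment, fs_env), n_mod))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
band_envelopes = rng.random((17, 400)) + 0.1   # stand-in envelopes, 17 bands
feature = modulation_features(band_envelopes, fs_env=400)
```

Over a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of roughly 2.5k Hz, so keeping 14 coefficients covers about 0 to 32.5 Hz, in line with the 0-35 Hz range quoted above.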

Phoneme Recognition

• TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, additive noise, reverberated or telephone channel
• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) – MLP hybrid model
  – Performance in phoneme error rate (PER)
• Features
  – Perceptual linear prediction (PLP) – 9 frame context
  – Advanced ETSI standard [ETSI 2002] – 9 frame context
  – FDLP modulation (FDLP-M) features – 476 dim

[Figure: bar chart of PER (axis ticks 30 to 75) for PLP-9, ETSI-9 and FDLP-M in Clean, Add Noise, Reverb and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.


Audio Coding

[Block diagram blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env, Carr, MDCT, Q, Q, Q^-1, Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: bar chart of listening-test scores (axis ticks 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• Employing AR modeling for estimating amplitude modulations
• Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP – improvements in reverberant speech recognition
• Modulation feature extraction – phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel – Jason Pelecanos, Mohamed Omar
• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences – Spectro-temporal receptive fields [Shamma et al. 2001]
• Psycho-physical evidences
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

• Current sample expressed as a linear combination of past samples

[Figure: samples n, n-1, n-2, n-3 with prediction coefficients a1, a2, a3]

• Model parameters are solved by minimizing the residual sum of squares
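The slide's equations did not survive extraction. For reference, the standard textbook form of the two statements above is sketched below; the sign convention and the symbols (order p, residual e[n], autocorrelation r) are assumptions and may differ from the original slide.

```latex
% Prediction of the current sample from p past samples (sign convention assumed)
\hat{x}[n] = -\sum_{k=1}^{p} a_k \, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k \, x[n-k]

% Parameters minimize the residual sum of squares
E = \sum_{n} e^2[n]

% Setting \partial E / \partial a_i = 0 gives the normal equations,
% with r(\tau) = \sum_n x[n] \, x[n-\tau] the autocorrelation of x
\sum_{k=1}^{p} a_k \, r(|i-k|) = -r(i), \qquad i = 1, \dots, p
```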

AR Model of Power Spectrum

• Filter interpretation [Makhoul 1975]
• From Parseval's theorem: [equation]
• By definition: [equation]. Let: [equation]
• Thus parameters are solved by minimizing: [equation]
• Solution of the linear prediction yields an all-pole model of the power spectrum
• Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
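The equations on these slides were lost in extraction. A hedged reconstruction of the standard argument from the linear prediction literature is given below; the notation (A, P_x, G) is assumed here and is not taken from the slides.

```latex
% Inverse-filter interpretation: the residual is x[n] filtered by A(z)
A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}, \qquad
E(e^{j\omega}) = A(e^{j\omega}) \, X(e^{j\omega})

% Parseval's theorem moves the residual energy to the frequency domain
E = \sum_{n} e^2[n]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_x(\omega) \, \bigl| A(e^{j\omega}) \bigr|^2 \, d\omega,
\qquad P_x(\omega) = \bigl| X(e^{j\omega}) \bigr|^2

% Minimizing E over a_1, ..., a_p yields the all-pole model of the power spectrum
\hat{P}_x(\omega) = \frac{G}{\bigl| A(e^{j\omega}) \bigr|^2}, \qquad G = E_{\min}
```

Applying the same argument to DCT coefficients instead of time samples swaps the roles of time and frequency, which is why FDLP yields an all-pole model of the temporal envelope.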

Hilbert Envelope – Definition

• Analytic signal is the sum of the signal and its quadrature component: [equation], where H denotes the Hilbert transform
• Hilbert envelope is the squared magnitude of the analytic signal
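A minimal sketch of this definition, using the common FFT construction of the analytic signal (zero the negative frequencies and double the positive ones). The function names and the test tone are illustrative choices, not part of the original slides.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], built by suppressing negative frequencies."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# A pure tone with an integer number of cycles has a flat envelope of 1.
t = np.arange(1000) / 1000.0
tone = np.cos(2 * np.pi * 50 * t)
tone_envelope = hilbert_envelope(tone)
```

The real part of the analytic signal is the original signal and the imaginary part is its quadrature component.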

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]
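A small sketch of an AM-FM split of this kind, under simplifying assumptions: the AM component is taken as the square root of the Hilbert envelope (in the FDLP setting the smoother all-pole envelope would be used instead), and the FM component is what remains after dividing the signal by it. The synthetic modulator, carrier and function names are chosen here for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def am_fm_decompose(x, eps=1e-12):
    """Return (am, fm) with x == am * fm; am is the square root of the Hilbert envelope."""
    am = np.abs(analytic_signal(x))
    fm = x / np.maximum(am, eps)
    return am, fm

# Slow 4 Hz modulation on a 100 Hz carrier, 1 s sampled at 1 kHz.
t = np.arange(1000) / 1000.0
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
signal = modulator * np.cos(2 * np.pi * 100 * t)
am, fm = am_fm_decompose(signal)
```

For this synthetic case the recovered `am` matches the modulator and `fm` is the unit-amplitude carrier, which is the product form of sub-band signals described in the introduction.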

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
• Gain normalization deals with short- and long-term distortions within a single long-term frame
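Of the three techniques, CMS is the simplest to illustrate. The sketch below assumes the idealized case in which the channel adds the same constant vector to every cepstral frame, which is the assumption the first bullet refers to; the array sizes and names are arbitrary.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """Subtract the per-coefficient mean over time from (n_frames, n_ceps) cepstra."""
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean_ceps = rng.standard_normal((100, 13))   # stand-in cepstral frames
channel = rng.standard_normal(13)             # constant offset from a short channel
distorted_ceps = clean_ceps + channel

normalized_clean = cepstral_mean_subtraction(clean_ceps)
normalized_distorted = cepstral_mean_subtraction(distorted_ceps)
```

Reverberation violates this assumption because the room response is longer than a short-term frame, which is the motivation for the long-term approaches in the other two bullets.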

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

• When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram blocks: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering, VAD, DFT]

• Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.
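A minimal sketch of envelope-domain noise subtraction, under assumptions made for this example: the VAD decision is supplied as a boolean mask, the noise envelope is taken to be stationary so that a single mean level is estimated, and a small spectral-subtraction-style floor keeps the result positive. The Wiener filtering block of the diagram is not modeled, and the function name and numbers are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise level estimated from non-speech samples of the envelope."""
    noise_level = noisy_env[~speech_mask].mean()   # estimate from VAD non-speech regions
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)  # floor avoids negative envelopes

# Toy envelope: speech energy of 2.0 in the middle, constant noise of 0.5 everywhere.
speech_mask = np.zeros(300, dtype=bool)
speech_mask[100:200] = True
speech_env = np.where(speech_mask, 2.0, 0.0)
noisy_env = speech_env + 0.5
cleaned_env = subtract_noise_envelope(noisy_env, speech_mask)
```

The additivity stated in the first bullet is what justifies a plain subtraction here; with correlated signal and noise a cross term would remain.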

Introduction

• Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

• Spectrum is sampled at a preset rate before further modeling/processing stages
• Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction – Time Domain
  • Linear Prediction – Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 36: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLP for Speech Representation

[Figure: build-up of the FDLP Spectrogram from DCT and LP blocks, shown against Conventional Approaches; both spectrograms have Time and Freq axes]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.


Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Introduction

Time

Freq

uenc

y

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM


FDLP for Speech Representation

[Figure: block diagram built up over several slides. A DCT is followed by linear prediction (LP), and the resulting envelopes form the FDLP spectrogram (axes: time, frequency). The final slide sets the FDLP spectrogram beside conventional approaches on the same time and frequency axes.]
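The chain on these slides (a DCT followed by linear prediction over the DCT coefficients) can be sketched in a few lines. This is a minimal full-band illustration under stated assumptions, not the authors' implementation: the helper names `dct2`, `lpc` and `fdlp_envelope`, the model order and the test signal are hypothetical, and the sub-band windowing of the DCT coefficients is left out.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def lpc(seq, order):
    """Autocorrelation-method linear prediction via the normal equations."""
    r = np.correlate(seq, seq, mode="full")[len(seq) - 1:len(seq) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])   # predictor coefficients a_1..a_p
    gain = r[0] - a @ r[1:order + 1]         # minimum residual sum of squares
    return a, gain

def fdlp_envelope(x, order=10):
    """All-pole estimate of the temporal envelope of x (illustrative only)."""
    n = len(x)
    a, gain = lpc(dct2(x), order)
    # Prediction ran along frequency, so the all-pole "spectrum" is indexed by time.
    theta = np.pi * (np.arange(n) + 0.5) / n
    k = np.arange(1, order + 1)
    A = 1 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2

# Hypothetical test signal: a tone burst centred on sample 300, plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(512)
x = np.exp(-0.5 * ((t - 300) / 25.0) ** 2) * np.cos(0.3 * np.pi * t)
x += 0.01 * rng.standard_normal(512)
env = fdlp_envelope(x)
peak = int(np.argmax(env))   # expected to fall near the burst centre
```

Raising `order` lets the envelope follow finer temporal detail; lowering it gives a smoother summary of the same segment.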

FDLP versus Mel Spectrogram

[Figure: FDLP spectrogram compared with mel spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: a signal (Sig) and its FDLP envelope (FDLP Env), shown for FDLP and mel analysis]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP compared with mel analysis]

  • Summarizing the gross temporal variation with a few parameters
      – Model order of FDLP controls the degree of smoothness
      – AR model captures perceptually important high-energy regions of the signal

  • Suppressing reverberation artifacts
      – Reverberation is a long-term convolutive distortion

  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Reverberant Speech

In the long-term DFT domain this translates to a multiplication in the sub-band

Clean DFT x Response DFT = Reverberant DFT

In narrow bands the room response is slowly varying

FDLP envelope of a band is given by the all-pole model with parameters a_1, …, a_p and gain G

When the sub-band signal is multiplied by the room response, the gain G is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain (G = 1)
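The effect of a band-wise constant on the all-pole model can be checked numerically. The sketch below assumes the room response reduces to a single real constant `h` within a narrow band and uses random numbers as a stand-in for the sub-band DCT coefficients; `lpc`, `allpole_envelope` and all values are hypothetical. Scaling the sequence changes only the gain, so envelopes rebuilt with unit gain coincide.

```python
import numpy as np

def lpc(seq, order):
    """Autocorrelation-method linear prediction via the normal equations."""
    r = np.correlate(seq, seq, mode="full")[len(seq) - 1:len(seq) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    gain = r[0] - a @ r[1:order + 1]      # minimum residual sum of squares
    return a, gain

def allpole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points between 0 and pi."""
    theta = np.pi * (np.arange(n_points) + 0.5) / n_points
    k = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2

rng = np.random.default_rng(0)
band = rng.standard_normal(400)   # stand-in for one band of DCT coefficients
h = 0.3                           # assumed constant room response in this band

a_clean, g_clean = lpc(band, 8)
a_rev, g_rev = lpc(h * band, 8)

# Gain normalization: rebuild both envelopes with the gain forced to 1.
env_clean = allpole_envelope(a_clean, 1.0, 200)
env_rev = allpole_envelope(a_rev, 1.0, 200)
```

Here `g_rev` equals `h**2 * g_clean` while the predictor coefficients are unchanged, which is the property the normalization relies on.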

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Figure: block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (with AM and FM outputs), Gain Norm and Quant. Applications: Short-term Features for Speaker & Speech Recog., Modulation Features for Phoneme Recog., and Wide-band Speech & Audio Coding.]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Figure: processing chain. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int., Log + DCT, Feat.]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

Integration along the frequency axis converts the bands to the mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
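The steps above can be sketched as follows. This is an illustration under assumptions rather than the authors' front end: the envelope sampling rate, the 24 input bands (taken to be already on a mel-like scale, so the frequency integration step is skipped), the choice of 13 static coefficients and the use of `np.gradient` for deltas are all hypothetical.

```python
import numpy as np

def short_term_energies(env, fs_env, win_s=0.025, hop_s=0.010):
    """Integrate each band envelope over 25 ms windows with a 10 ms shift.

    env has shape (n_bands, n_samples) and is sampled at fs_env Hz.
    """
    win = int(round(win_s * fs_env))
    hop = int(round(hop_s * fs_env))
    n_frames = 1 + (env.shape[1] - win) // hop
    frames = [env[:, i * hop:i * hop + win].sum(axis=1) for i in range(n_frames)]
    return np.stack(frames, axis=1)               # (n_bands, n_frames)

def cepstra(energies, n_ceps=13):
    """Log followed by a DCT-II along the frequency (band) axis."""
    n_bands = energies.shape[0]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_bands))
    return basis @ np.log(energies)               # (n_ceps, n_frames)

def add_deltas(c):
    """Append simple delta and acceleration estimates along time."""
    d = np.gradient(c, axis=1)
    dd = np.gradient(d, axis=1)
    return np.vstack([c, d, dd])

rng = np.random.default_rng(0)
env = rng.random((24, 8000)) + 1e-3   # one second of stand-in envelopes at 8 kHz
feats = add_deltas(cepstra(short_term_energies(env, fs_env=8000)))
# feats has 13 static + 13 delta + 13 acceleration rows, one column per frame
```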

Speech Recognition

TIDIGITS database (8 kHz). Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  • Whole-word HMM models trained on clean speech
  • Performance in terms of word error rate (WER)

Features
  • PLP features with cepstral mean subtraction (CMS)
  • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  • FDLP short-term (FDLP-S) features – 39 dim

[Figure: bar chart of WER (axis 0 to 20) for PLP-CMS, LTLSS and FDLP-S under clean and reverberant test conditions]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE): has telephone speech and far-field speech

GMM-UBM system
  • Trained on a large set of development speakers
  • Adapted on the enrollment data from the target speaker
  • Nuisance attribute projection (NAP) on supervectors
  • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  • Mel frequency cepstral coefficients (MFCCs)
  • FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis 10 to 30) for MFCC and FDLP-S under telephone (Tel), microphone (Mic) and cross-domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011.
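The detection cost used on this slide follows the usual weighted form. A minimal sketch, assuming a miss cost of 10, a false-alarm cost of 1 and a target prior of 0.01, which together give weights of 0.1 on the miss rate and 0.99 on the false-alarm rate; the function name is hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm rates (defaults are assumptions)."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

only_misses = detection_cost(p_miss=1.0, p_fa=0.0)         # weight on Pmiss
only_false_alarms = detection_cost(p_miss=0.0, p_fa=1.0)   # weight on Pfa
```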

Modulation Features

[Figure: the Outline of Applications block diagram, repeated]

Modulation Feature Extraction

[Figure: processing chain. Blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat.]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions (2 x 14 x 17)
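The feature layout can be sketched as below. Everything specific here is an assumption made for illustration: the envelope sampling rate of 400 Hz, the 200 ms segment length, the loop time constants and the helper names. The adaptation loops are a simplified divider and low-pass cascade, not a reproduction of the cited model. With a 200 ms segment, DCT bin k sits at k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span roughly the 0-35 Hz range quoted above.

```python
import numpy as np

def adaptation_loops(env, fs, taus=(0.005, 0.05, 0.129, 0.253, 0.5)):
    """Cascade of divider / low-pass loops (time constants are assumptions)."""
    out = np.asarray(env, dtype=float)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * fs))      # one-pole low-pass coefficient
        state = np.sqrt(out[0])                # start at the steady-state value
        y = np.empty_like(out)
        for i, v in enumerate(out):
            y[i] = v / state                   # divider
            state = alpha * state + (1 - alpha) * y[i]   # low-pass of the output
        out = y
    return out

def dct_coeffs(seq, n_coeffs):
    """First n_coeffs DCT-II coefficients of seq."""
    n = len(seq)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ seq

def modulation_features(envelopes, fs, n_coeffs=14):
    """Static (log) and dynamic modulation spectra, concatenated over bands."""
    feats = []
    for env in envelopes:
        static = dct_coeffs(np.log(env), n_coeffs)
        dynamic = dct_coeffs(adaptation_loops(env, fs), n_coeffs)
        feats.append(np.concatenate([static, dynamic]))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
fs_env = 400                                  # assumed envelope sampling rate
envelopes = rng.random((17, 80)) + 0.1        # 17 bands, 200 ms each
features = modulation_features(envelopes, fs_env)
```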

Phoneme Recognition

TIMIT database (8 kHz). Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) – MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) – 9 frame context
  • Advanced ETSI standard [ETSI 2002] – 9 frame context
  • FDLP modulation (FDLP-M) features – 476 dim

[Figure: bar chart of PER (axis 30 to 75) for PLP-9, ETSI-9 and FDLP-M under clean, additive noise, reverberant and telephone conditions]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Audio Coding

[Figure: codec block diagram. Encoder blocks: Input, QMF Analysis (bands 1 to 32), FDLP with envelope (Env) and carrier (Carr) outputs, MDCT and quantizers (Q). Decoder blocks: inverse quantizers (Q^-1), IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output.]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: bar chart of subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP – improvements in reverberant speech recognition

Modulation feature extraction – phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence – spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
  • Perceptual importance of modulation frequencies [Drullman et al. 1994]
  • Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
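A small numerical check of this idea: the sketch below estimates predictor coefficients with the autocorrelation method and recovers the coefficients of a synthetic second-order process. The function name, the process coefficients and the sample count are hypothetical choices for illustration.

```python
import numpy as np

def linear_prediction(x, order):
    """Minimize the residual sum of squares via the normal equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])          # weights on past samples
    residual_energy = r[0] - a @ r[1:order + 1]     # minimum residual sum of squares
    return a, residual_energy

# Synthetic signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n], a stable AR(2) process.
rng = np.random.default_rng(1)
true_a = np.array([1.3, -0.6])
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

a, err = linear_prediction(x, order=2)   # a should be close to true_a
```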

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

[Figure: AR model of power spectrum]
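A standard form of this argument, following Makhoul (1975), is sketched here. The notation is an assumption chosen to match the surrounding text: x[n] is the signal, a_k are the predictor coefficients, e[n] is the residual, P(ω) is the power spectrum of x and P̂(ω) is the all-pole model with gain G.

```latex
\begin{aligned}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k},\\
\sum_{n} e^2[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl|E(e^{j\omega})\bigr|^2\, d\omega
 = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\bigl|A(e^{j\omega})\bigr|^2\, d\omega
 \quad \text{(Parseval)},\\
\hat{P}(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
 \;\Longrightarrow\;
 \sum_{n} e^2[n] = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega .
\end{aligned}
```

Minimizing the residual energy over the a_k therefore minimizes the integrated ratio of the true spectrum to the model spectrum, which is why the fit favors spectral peaks. Setting G to the minimum residual energy makes the average of that ratio equal to one.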

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
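The definition can be checked on a synthetic AM signal. The sketch below builds the discrete analytic signal by zeroing negative-frequency DFT bins; the function names and the test signal (a slow modulation on a carrier, both periodic in the analysis window) are hypothetical choices for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal: keep DC, double positive bins, drop negatives."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1
    if n % 2 == 0:
        weights[n // 2] = 1          # Nyquist bin is kept once
        weights[1:n // 2] = 2
    else:
        weights[1:(n + 1) // 2] = 2
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

n = np.arange(1024)
m = 1 + 0.5 * np.cos(2 * np.pi * 4 * n / 1024)      # slow modulation
x = m * np.cos(2 * np.pi * 100 * n / 1024)          # fine carrier
env = hilbert_envelope(x)                           # matches m**2 for this signal
```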

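Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP spectrogram compared with PLP spectrogram]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010.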

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
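The assumption behind CMS can be illustrated with a toy example. The sketch assumes the distortion is short enough to act as a fixed additive offset on every cepstral frame; the random "cepstra", the offset and the helper name are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-coefficient mean over frames (rows are frames)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 13))    # 100 frames of stand-in cepstra
channel = rng.standard_normal(13)         # fixed offset from a short channel
distorted = clean + channel               # convolution becomes addition here

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)   # offset is removed
```

When the distortion is longer than the analysis frame, as with reverberation, it no longer reduces to a single per-frame offset, which is the case the long-term methods on this slide address.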

Feature Comparison

Modulation Feature Extraction

[Figure: processing chain. Blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat.]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Figure: processing chain. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred., FDLP Env.]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Figure: processing chain. Blocks: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering (driven by a VAD), DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
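The subtraction step can be sketched on a toy envelope. This is plain subtraction with a floor, used here as a stand-in and not the Wiener filtering shown in the diagram; the function name, the floor value, the synthetic envelopes and the oracle VAD mask are all hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_active, floor=0.01):
    """Subtract a noise envelope estimated from non-speech samples."""
    noise_level = noisy_env[~speech_active].mean()   # from VAD non-speech regions
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)    # keep the envelope positive

t = np.arange(200)
speech_env = np.where((t >= 80) & (t < 120), 4.0, 0.0)   # toy speech envelope
noise_env = np.full(200, 0.5)                            # stationary noise envelope
noisy_env = speech_env + noise_env                       # additive in this domain
vad = speech_env > 0                                     # oracle VAD, illustration only
cleaned = subtract_noise_envelope(noisy_env, vad)
```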

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram with time and frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM


FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCT

LP

FDLP for Speech Representation

DCTLP

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

[Block diagram: Input Signal → Sub-band Decomposition → FDLP → AM and FM components. Blocks: Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Modulation Features

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression → DCT over 200 ms (one per stream) → Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim
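A rough sketch of the two compression streams and the resulting feature size (2 × 14 × 17 = 476). The dynamic stage below is a single divider plus first-order low-pass loop. This is a simplification assumed for illustration; the smoothing constant and all names are hypothetical rather than taken from [Kollmeier 1999].

```python
import math

N_MOD = 14    # modulation coefficients kept per compression stream
N_BANDS = 17  # number of sub-bands

def dct2(values, n_out):
    """Unnormalized DCT-II of `values`, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def static_compress(envelope, floor=1e-10):
    """Logarithmic compression of the sub-band envelope."""
    return [math.log(max(v, floor)) for v in envelope]

def dynamic_compress(envelope, smooth=0.9, floor=1e-5):
    """One adaptation loop: divide by a low-pass filtered copy of the output."""
    state = 1.0
    out = []
    for v in envelope:
        y = v / max(state, floor)                    # divider
        state = smooth * state + (1.0 - smooth) * y  # low-pass filter on the output
        out.append(y)
    return out

def band_modulation_features(envelope):
    """14 static + 14 dynamic modulation components for one sub-band."""
    return (dct2(static_compress(envelope), N_MOD)
            + dct2(dynamic_compress(envelope), N_MOD))

def modulation_features(band_envelopes):
    """Concatenate the per-band features: 17 bands x 28 = 476 values."""
    feats = []
    for env in band_envelopes:
        feats.extend(band_modulation_features(env))
    return feats
```

With a 200 ms segment, DCT index k corresponds to a modulation frequency of k × 2.5 Hz, so 14 coefficients reach 32.5 Hz, which is consistent with the 0-35 Hz range quoted on the slide. For a constant input the single loop settles at the square root of the input, which shows how it compresses steady portions while passing onsets.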

Phoneme Recognition

TIMIT database (8 kHz)
– Clean training data; test data can be clean, with additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
– MLPs estimate phoneme posteriors
– Hidden Markov model (HMM) – MLP hybrid model
– Performance in phoneme error rate (PER)

Features
– Perceptual linear prediction (PLP) – 9 frame context
– Advanced ETSI standard [ETSI 2002] – 9 frame context
– FDLP modulation (FDLP-M) features – 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

[Bar chart: PER for the Clean, Add Noise, Reverb and Tel conditions; systems PLP-9, ETSI-9 and FDLP-M; axis ticks 30 to 75]

Audio Coding

[Codec block diagram. Encoder: Input → QMF Analysis (bands 1 to 32) → FDLP → Env and Carr, MDCT, Q. Decoder: Q⁻¹, IMDCT, Mul, QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

[Bar chart: subjective scores on a 0 to 100 axis for Hidden Ref, LPF7k, MP3, FDLP and AAC]

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP – improvements in reverberant speech recognition

Modulation feature extraction – phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel – Jason Pelecanos, Mohamed Omar

Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

Physiological evidence
– Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
– Perceptual importance of modulation frequencies [Drullman et al. 1994]
– Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for
– Speech intelligibility [Houtgast et al. 1980]
– RASTA processing [Hermansky et al. 1994]
– Speech recognition [Kingsbury et al. 1998]
– AM-FM decomposition [Kumaresan et al. 1999]
– Sound texture modeling [Athineos et al. 2003]
– Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n, n-1, n-2, n-3 with prediction coefficients a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
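One common way to carry out this minimization is the autocorrelation method with the Levinson-Durbin recursion. The sketch below assumes that method; the slides do not say which solver was used, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err) where the predictor is
        x_hat[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[order-1]*x[n-order]
    and err is the minimum residual sum of squares.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        prev = a[:]
        a[m] = k_m
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        err *= (1.0 - k_m * k_m)
    return a, err
```

For a sequence that decays as 0.5^n, the recursion recovers a first coefficient close to 0.5 and a second coefficient close to zero.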

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition

Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
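The equations for these slides did not survive extraction. For orientation, the standard argument in [Makhoul 1975] can be written as follows. The notation is assumed here, not taken from the slides: x[n] is the signal, a_k the predictor coefficients, e[n] the residual, P(ω) the power spectrum of x.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \\
\mathcal{E} &= \sum_{n} e^{2}[n]
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl| E(e^{j\omega}) \bigr|^{2}\, d\omega
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\, \bigl| A(e^{j\omega}) \bigr|^{2}\, d\omega, \\
\hat{P}(\omega) &= \frac{G}{\bigl| A(e^{j\omega}) \bigr|^{2}},
\qquad
G = \min_{a_1,\dots,a_p} \mathcal{E}.
\end{align}
```

The second line uses Parseval's theorem together with E(e^{jω}) = A(e^{jω}) X(e^{jω}) and P(ω) = |X(e^{jω})|². The last line is the all-pole model, with the gain G equal to the minimum residual sum of squares, as stated on the slide.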

Hilbert Envelope – Definition

Analytic signal is the sum of the signal and its quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
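A small sketch of this definition for a discrete signal, assuming the analytic signal x_a[n] = x[n] + j H{x[n]} is formed by zeroing the negative-frequency half of the DFT. The plain O(N²) DFT keeps the example dependency-free; the function names are illustrative.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) DFT (or inverse DFT) of a real or complex sequence."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def analytic_signal(x):
    """x[n] + j * H{x[n]}: keep DC, double positive frequencies, zero the rest."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = dft(x)
    return dft([w * s for w, s in zip(weights, spectrum)], inverse=True)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]
```

For a slow modulator m[n] multiplying a faster cosine carrier, `hilbert_envelope` returns m[n]², which is the amplitude modulation that the AR model is then fitted to.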

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
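To make the CMS assumption concrete: if the distortion is the same in every frame, it shows up as a constant offset on each cepstral coefficient, and subtracting the per-coefficient mean removes it. A minimal sketch, with an illustrative function name and an utterance-level mean assumed:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance.

    frames[t][k] is cepstral coefficient k of frame t.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]
```

Applying it to clean cepstra and to the same cepstra plus a fixed channel offset gives the same output. When the response is longer than the analysis frame, as with reverberation, the constant-offset assumption no longer holds, which is the case the long-term methods on this slide address.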

Feature Comparison

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering driven by VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
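A simplified sketch of the subtraction step, assuming the VAD decisions are already available as one boolean per envelope sample. Plain subtraction with a floor is used here as a stand-in for the Wiener filtering block in the diagram. The floor ratio and function names are hypothetical, and this is not the authors' exact procedure.

```python
def estimate_noise_envelope(envelope, speech_flags):
    """Average the envelope over samples the VAD marked as non-speech."""
    noise = [e for e, is_speech in zip(envelope, speech_flags) if not is_speech]
    return sum(noise) / len(noise) if noise else 0.0

def subtract_noise_envelope(envelope, speech_flags, floor_ratio=0.01):
    """Subtract the noise estimate, flooring at a small fraction of the input.

    The floor keeps the compensated envelope positive so that a log or an
    AR model can still be applied to it.
    """
    noise_level = estimate_noise_envelope(envelope, speech_flags)
    return [max(e - noise_level, floor_ratio * e) for e in envelope]
```

The subtraction relies on the additivity noted above: with uncorrelated speech and noise, the noisy envelope is approximately the sum of the two envelopes.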

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

FDLP for Speech Representation

[Figure: DCT followed by LP yields the FDLP spectrogram (axes Time and Freq), shown alongside conventional approaches]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env); FDLP compared with Mel]

Res = (Critical Width)⁻¹

Properties of FDLP Analysis

• Summarizing the gross temporal variation with a few parameters
– Model order of FDLP controls the degree of smoothness
– AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
– Reverberation is a long-term convolutive distortion
• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech ∗ Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication

Clean DFT × Response DFT = Revb DFT

In the sub-band

In narrow bands, the room response term is slowly varying

FDLP envelope of band using all-pole parameters … is given by

When the sub-band signal is multiplied by the response term, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
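The equations for this argument were lost in extraction, so the sketch below rests on stated assumptions: the all-pole envelope has the form G / |A|², the response term in a narrow band is approximated by a constant scale factor, and "normalized gain" is taken to mean G = 1. Under those assumptions, scaling the LP input changes only G, so the unit-gain envelope is unaffected. All names are illustrative.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): predictor coefficients and minimum residual energy."""
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        k_m = (r[m + 1] - sum(a[k] * r[m - k] for k in range(m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        err *= (1.0 - k_m * k_m)
    return a, err

def ar_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points angles in [0, pi)."""
    env = []
    for i in range(n_points):
        theta = math.pi * i / n_points
        denom = 1.0 - sum(c * cmath.exp(-1j * theta * (k + 1))
                          for k, c in enumerate(a))
        env.append(gain / abs(denom) ** 2)
    return env

def fdlp_envelope(dct_coeffs, order, n_points, normalize_gain=False):
    """LP on sub-band DCT coefficients; optionally rebuild with unit gain."""
    a, gain = levinson_durbin(autocorrelation(dct_coeffs, order), order)
    return ar_envelope(a, 1.0 if normalize_gain else gain, n_points)
```

Scaling the DCT coefficients by 3 multiplies the unnormalized envelope by 9, while the envelope rebuilt with `normalize_gain=True` stays the same.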

Gain Normalization in FDLP

[Figure panels: without gain normalization; with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Outline of Applications

[Block diagram: Input Signal → Sub-band Decomposition → FDLP → AM and FM components. Blocks: Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Features

SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009

a Signalb Hilb Envc FDLP Envd Log compe Dyn comp

Noise Compensation in FDLP

Critical-band

Window

LinearPred

Signal

+Noise

DCT IDFT | |2 DFTFDLP

Env

When speech is corrupted with additive noise

y The noise component is additive in the non-parametric Hilbert envelope

domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Critical-band

Window

InputDCT IDFT | |2 DFTWiener

Filtering

VAD

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope

Noise Compensation in FDLP

SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Introduction

Time

Freq

uenc

y

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM


FDLP for Speech Representation

[Figure: DCT followed by LP gives the FDLP spectrogram (time vs. frequency), shown next to conventional approaches.]

FDLP versus Mel Spectrogram

[Figure: FDLP and mel spectrograms.]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: FDLP and mel spectrograms, with signal and FDLP envelope panels.]

$\text{Res} = (\text{Critical Width})^{-1}$

Properties of FDLP Analysis

[Figure: FDLP and mel spectrograms.]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion
  - Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean Speech * Room Response = Reverberant Speech

In the long-term DFT domain, this translates to a multiplication, which also holds in each sub-band:

Clean DFT × Response DFT = Reverberant DFT

In narrow bands, the response DFT is slowly varying.
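The relations on this slide can be written out as below. The equations themselves did not survive extraction, so the symbols are assumptions chosen for illustration: $x$ is the clean speech, $h$ the room response, $r$ the reverberant speech, capital letters their long-term DFTs, and $k$ a sub-band index. The last line expresses the narrow-band approximation that the response is roughly constant within a band.

```latex
\begin{align}
  r(t) &= x(t) * h(t) \\
  R(\omega) &= X(\omega)\, H(\omega) \\
  R_k(\omega) &\approx X_k(\omega)\, H_k
    \qquad \text{(narrow band $k$, with $H(\omega) \approx H_k$)}
\end{align}
```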

Gain Normalization in FDLP

• The FDLP envelope of a band is expressed through its all-pole parameters … and a gain term.
• When the sub-band signal is multiplied by the slowly varying response, the gain is modified.
• Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain set to unity.

[Figure: FDLP envelopes without and with gain normalization.]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.
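A small sketch may help make the gain argument concrete. It is an illustration under stated assumptions, not the authors' code: it runs full-band (no critical-band windowing), uses an unnormalized DCT-II, and fits the all-pole model with the autocorrelation method via a Levinson-Durbin recursion. The function names `dct2`, `levinson` and `fdlp`, and the `unit_gain` flag, are hypothetical. Scaling the input by a constant leaves the all-pole coefficients unchanged and scales only the gain, so reconstructing the envelope with unit gain removes the constant.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n)) for k in range(n)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, gain) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def fdlp(x, order, unit_gain=False):
    """All-pole model of the temporal envelope of x via LP on its DCT."""
    c = dct2(x)
    n = len(c)
    # Autocorrelation of the DCT sequence (autocorrelation method of LP).
    r = [sum(c[i] * c[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, gain = levinson(r, order)
    g = 1.0 if unit_gain else gain
    env = []
    for t in range(n):
        # Time sample t corresponds to angle pi * (2t + 1) / (2n) for a DCT-II.
        w = math.pi * (2 * t + 1) / (2 * n)
        re = sum(a[j] * math.cos(w * j) for j in range(order + 1))
        im = -sum(a[j] * math.sin(w * j) for j in range(order + 1))
        env.append(g / (re * re + im * im))
    return a, gain, env


n = 64
x = [0.0] * n
x[20] = 1.0  # a single burst of energy at sample 20
a, gain, env = fdlp(x, order=2)
peak = env.index(max(env))

x_scaled = [3.0 * v for v in x]  # constant multiplier, as in a narrow band
a_s, gain_s, env_s = fdlp(x_scaled, order=2)
_, _, env_norm = fdlp(x, order=2, unit_gain=True)
_, _, env_s_norm = fdlp(x_scaled, order=2, unit_gain=True)
```

Here `peak` lands at or next to the burst position, `gain_s` is about nine times `gain`, and the two unit-gain envelopes coincide.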

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recognition; Modulation Features for Phoneme Recognition; Wide-band Speech & Audio Coding.]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat.]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).
• Integration along the frequency axis converts the bands to the mel scale.
• Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.
• Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
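The steps above can be sketched in a few lines. This is a simplified illustration under several assumptions that are not in the slides: the envelopes are synthetic, the bands are taken to be on a mel-like scale already (the frequency integration step is skipped), the envelope sampling rate `fs` is made up, 13 cepstral coefficients are kept so that static, delta and acceleration parts add up to 39, and the deltas are plain central differences. All function names are hypothetical.

```python
import math


def integrate_envelope(env, fs, win_s=0.025, hop_s=0.010):
    """Sum envelope samples in 25 ms windows shifted by 10 ms."""
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    return [sum(env[s:s + win]) for s in range(0, len(env) - win + 1, hop)]


def cepstra(band_energies, n_ceps):
    """Log followed by a DCT-II along the frequency (band) axis."""
    logs = [math.log(e) for e in band_energies]
    n = len(logs)
    return [sum(logs[b] * math.cos(math.pi * (2 * b + 1) * k / (2 * n))
                for b in range(n)) for k in range(n_ceps)]


def deltas(frames):
    """Central difference across frames, repeating the edge frames."""
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, len(frames) - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


fs = 1000      # hypothetical envelope sampling rate in Hz
n_bands = 20   # hypothetical number of mel-like bands
# Synthetic, strictly positive envelopes: one second per band.
envelopes = [[1.0 + b + (t % 7) for t in range(fs)] for b in range(n_bands)]

energies = [integrate_envelope(env, fs) for env in envelopes]
n_frames = len(energies[0])
static = [cepstra([energies[b][t] for b in range(n_bands)], 13)
          for t in range(n_frames)]
delta = deltas(static)
accel = deltas(delta)
features = [s + d + a for s, d, a in zip(static, delta, accel)]
```

With one second of envelope, a 25 ms window and a 10 ms shift, this gives 98 frames of 39 values each.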

Speech Recognition

• TIDIGITS database (8 kHz)
  - Clean training data; test data can be clean or naturally reverberated
• HMM-GMM system
  - Whole-word HMM models trained on clean speech
  - Performance in terms of word error rate (WER)
• Features
  - PLP features with cepstral mean subtraction (CMS)
  - Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  - FDLP short-term (FDLP-S) features, 39 dim

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S under Clean and Reverb conditions.]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 speaker recognition evaluation (SRE)
  - Has telephone speech and far-field speech
• GMM-UBM system
  - Trained on a large set of development speakers
  - Adapted on the enrollment data from the target speaker
  - Nuisance attribute projection (NAP) on supervectors
  - Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
• Features with warping [Pelecanos 2001]
  - Mel frequency cepstral coefficients (MFCCs)
  - FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S under Tel, Mic and Cross-domain conditions.]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
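The detection cost function quoted above is a weighted sum of the false-alarm and miss probabilities. The sketch below assumes the weights come from a miss cost of 10, a false-alarm cost of 1 and a target prior of 0.01, which reproduces the 0.99 and 0.1 factors; these parameter values and the function name `detection_cost` are assumptions and are not stated on the slide.

```python
def detection_cost(p_fa, p_miss, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted detection cost; defaults give 0.99 * p_fa + 0.1 * p_miss."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Example operating point: 2% false alarms, 10% misses.
cost = detection_cost(p_fa=0.02, p_miss=0.10)
```

Because non-target trials dominate under this prior, false alarms weigh roughly ten times more than misses in the score.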


Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat.]

• Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope.
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].
• Compressed sub-band envelopes are DCT-transformed to obtain modulation frequency components.
• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature.
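The dimension count and the compress-then-DCT flow can be sketched as follows. The envelopes are synthetic, the envelope sampling rate is invented for the example, and the dynamic compression is reduced to a single divide-by-low-pass feedback loop as a stand-in for the adaptation loops cited on the slide. The names `dct2`, `static_compress`, `dynamic_compress` and the constant `alpha` are hypothetical.

```python
import math

N_BANDS = 17  # sub-bands, from the slide
N_MOD = 14    # modulation components kept per compression stream


def dct2(x, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n)) for k in range(n_out)]


def static_compress(env):
    """Static compression: a logarithm of the envelope."""
    return [math.log(e) for e in env]


def dynamic_compress(env, alpha=0.9):
    """One feedback loop: divide the input by a low-pass filtered output."""
    state = 1.0
    out = []
    for e in env:
        y = e / state
        out.append(y)
        # First-order low-pass of the output drives the divider.
        state = max(alpha * state + (1.0 - alpha) * y, 1e-6)
    return out


env_rate = 400                            # hypothetical envelope rate in Hz
n_samples = int(round(0.2 * env_rate))    # 200 ms segment
# Synthetic, strictly positive envelope for each band.
envelopes = [[1.0 + 0.5 * math.sin(2 * math.pi * 4 * t / env_rate + b)
              for t in range(n_samples)] for b in range(N_BANDS)]

features = []
for env in envelopes:
    features += dct2(static_compress(env), N_MOD)
    features += dct2(dynamic_compress(env), N_MOD)
```

Each band contributes 14 static and 14 dynamic components, so 17 bands give 17 × 28 = 476 values.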

Phoneme Recognition

• TIMIT database (8 kHz)
  - Clean training data; test data can be clean, additive noise, reverberated or telephone channel
• Multi-layer perceptron (MLP) based system
  - MLPs estimate phoneme posteriors
  - Hidden Markov model (HMM) - MLP hybrid model
  - Performance in phoneme error rate (PER)
• Features
  - Perceptual linear prediction (PLP), 9-frame context
  - Advanced ETSI standard [ETSI2002], 9-frame context
  - FDLP modulation (FDLP-M) features, 476 dim

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions.]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.


Audio Coding

[Block diagram. Encoder blocks: Input, QMF Analysis (bands 1 to 32), FDLP, Env and Carr outputs, MDCT, Q. Decoder blocks: Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output.]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC.]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• AR modeling is employed for estimating amplitude modulations.
• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum.
• The approach provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals.

Our Contributions

• Simple mathematical analysis for the AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP: improvements in reverberant speech recognition
• Modulation feature extraction: phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel: Jason Pelecanos, Mohamed Omar
• Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

• Physiological evidence
  - Spectro-temporal receptive fields [Shamma et al. 2001]
• Psycho-physical evidence
  - Perceptual importance of modulation frequencies [Drullman et al. 1994]
  - Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

• The current sample is expressed as a linear combination of past samples.
• Model parameters are solved by minimizing the residual sum of squares.

[Figure: samples n-3, n-2 and n-1, weighted by a3, a2 and a1, predict sample n.]
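The equations for this slide were lost in extraction. A standard way to write the two statements is given below, as a hedged reconstruction rather than the slide's own notation: the model order $p$, the sign convention on the coefficients $a_k$, and the symbols $\hat{x}$, $e$ and $\mathcal{E}$ are assumptions.

```latex
\begin{align}
  \hat{x}[n] &= -\sum_{k=1}^{p} a_k\, x[n-k] \\
  e[n] &= x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k] \\
  \mathcal{E} &= \sum_{n} e^2[n]
    \qquad \text{(minimized over $a_1, \dots, a_p$)}
\end{align}
```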

AR Model of Power Spectrum

• Filter interpretation [Makhoul 1975]: by Parseval's theorem, the residual sum of squares can be written in the frequency domain, so the parameters are solved by minimizing a spectral error measure.
• The solution of the linear prediction yields an all-pole model of the power spectrum.
• The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).
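Since the slide equations did not survive, the usual form of this argument is sketched below. It is a reconstruction under assumptions: $A(\omega)$ is the inverse filter with $a_0 = 1$, $P(\omega) = |X(\omega)|^2$ is the signal power spectrum, $E(\omega)$ is the spectrum of the residual $e[n]$, and $\mathcal{E}$ is the residual sum of squares.

```latex
\begin{align}
  E(\omega) &= A(\omega)\, X(\omega),
    \qquad A(\omega) = 1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k} \\
  \mathcal{E} &= \sum_{n} e^2[n]
    = \frac{1}{2\pi} \int_{-\pi}^{\pi} |E(\omega)|^2 \, d\omega
    = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, |A(\omega)|^2 \, d\omega \\
  \hat{P}(\omega) &= \frac{G}{|A(\omega)|^2},
    \qquad G = \min_{a_1, \dots, a_p} \mathcal{E}
\end{align}
```

Minimizing $\mathcal{E}$ therefore fits $\hat{P}(\omega)$ to $P(\omega)$, with the largest penalty where $P(\omega)$ exceeds the model, which is why the fit follows spectral peaks closely.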

Hilbert Envelope - Definition

• The analytic signal is the sum of the signal and its quadrature component: $x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, where $\mathcal{H}$ denotes the Hilbert transform.
• The Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$.
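The definition can be checked numerically with a short sketch. It builds the discrete analytic signal by zeroing the negative-frequency DFT bins and doubling the positive ones, then takes the squared magnitude. A naive O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT. The function names `dft` and `hilbert_envelope` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT) of a sequence."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of x."""
    n = len(x)
    spec = dft(x)
    # Keep DC (and Nyquist for even n), double positive bins, drop negative.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spec, weights)], inverse=True)
    return [abs(z) ** 2 for z in analytic]


n = 32
carrier = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
envelope = hilbert_envelope(carrier)
```

For a pure cosine with a whole number of cycles the analytic signal is a complex exponential, so the envelope is flat at 1 even though the waveform oscillates.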

Hilbert Envelope

[Figure panels: (a) speech, (b) Hilbert envelope, (c) FDLP envelope.]

Duality

[Figure: LP and FDLP side by side.]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component.]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms.]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Three techniques are compared: cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization.

• CMS assumes the distortion in neighboring frames to be similar; it suppresses short-term artifacts.
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].
• Gain normalization deals with short- and long-term distortions within a single long-term frame.


Page 41: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLP Spectrogram

FDLP for Speech Representation

Time

Freq

FDLP Spectrogram

Conventional Approaches

FDLP for Speech Representation

Time

Freq

Time

Freq

FDLP versus Mel Spectrogram

FDLP

Mel

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence: spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence:
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
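A minimal sketch of time-domain linear prediction is given below, assuming the autocorrelation method with the Levinson-Durbin recursion as the solver (the slides do not say which solver is used). The function names and the toy decaying signal are illustrative choices.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where x[n] is predicted as sum(a[k-1] * x[n-k])
    and err is the minimum residual sum of squares.
    """
    a, err = [], r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))) / err
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err

# Toy signal obeying x[n] = 0.9 * x[n-1]: the predictor should recover 0.9.
x = [0.9 ** n for n in range(200)]
coeffs, residual = levinson(autocorr(x, 2), 2)
print(coeffs, residual)
```

The second coefficient comes out close to zero here because a first-order predictor already explains the toy signal.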

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition

Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
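For reference, the standard all-pole formulation that these slides appear to follow [Makhoul 1975] can be written as below. The slide equations did not survive extraction, so the symbols $a_k$, $p$, $A(z)$, $P(\omega)$ and $\hat{P}(\omega)$ are notation assumed here rather than the author's own.

```latex
% Prediction and residual
\hat{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k], \qquad e[n] = x[n] - \hat{x}[n]

% Residual energy via Parseval's theorem, with P(\omega) = |X(e^{j\omega})|^2
E = \sum_{n} e^2[n]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \bigl| A(e^{j\omega}) \bigr|^2 \, d\omega,
\qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}

% All-pole model of the power spectrum, G being the minimum of E
\hat{P}(\omega) = \frac{G}{\bigl| A(e^{j\omega}) \bigr|^2}
```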

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
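The definition can be checked numerically with a short sketch. It assumes the usual discrete construction of the analytic signal (zero the negative-frequency DFT bins and double the positive ones) and uses a plain O(N^2) DFT so that it needs only the standard library; the helper names are illustrative.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal (even length assumed)."""
    n = len(x)
    spec = dft(x)
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    analytic = dft([s * w for s, w in zip(spec, weights)], inverse=True)
    return [abs(v) ** 2 for v in analytic]

# Smooth modulation m(t) on a fine carrier: the envelope should equal m(t)^2.
N = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
x = [m[t] * math.cos(2 * math.pi * 16 * t / N) for t in range(N)]
env = hilbert_envelope(x)
print(max(abs(env[t] - m[t] ** 2) for t in range(N)))
```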

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes the distortion in neighboring frames to be similar, so it suppresses short-term artifacts

Long-term subtraction deals with reverberation by assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT (one per branch, 200 ms), Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram. Blocks: Input, DCT, Critical-band Window, IDFT, | |^2, VAD, Wiener Filtering, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.
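The envelope-subtraction idea can be sketched in a few lines. This is a simplified stand-in, not the Wiener filtering block shown in the diagram: it assumes the noise envelope is a constant estimated as the mean over frames that a VAD marks as non-speech, and it floors the result so the envelope stays positive. The function name, the `floor` parameter and the toy values are hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=1e-3):
    """Subtract a VAD-based noise estimate from a noisy sub-band envelope.

    noisy_env : envelope samples of the noisy signal (non-negative)
    is_speech : VAD decisions, one boolean per envelope sample
    floor     : fraction of the noisy envelope kept as a lower bound
    """
    noise_vals = [e for e, s in zip(noisy_env, is_speech) if not s]
    noise_level = sum(noise_vals) / len(noise_vals)
    return [max(e - noise_level, floor * e) for e in noisy_env]

# Toy envelope: noise-only samples at level 1, speech samples at 5 and 6.
noisy = [1.0, 1.0, 5.0, 6.0, 1.0]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
print(cleaned)
```

The floor keeps the compensated envelope strictly positive, which matters if a logarithm or an AR fit is applied afterwards.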

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 42: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLP for Speech Representation

[Figure: two spectrograms with Time and Freq axes, labelled Conventional Approaches and FDLP Spectrogram]

FDLP versus Mel Spectrogram

[Figure: FDLP and Mel spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Resolution of FDLP Analysis

[Figure: panels labelled FDLP, Mel, Sig and FDLP Env]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

Clean Speech * Room Response = Revb Speech

In the long-term DFT domain this translates to a multiplication, which also holds in each sub-band

Clean DFT x Response DFT = Revb DFT

In narrow bands the room response is slowly varying

FDLP envelope of a band using all-pole parameters … is given by

When the sub-band signal is multiplied by the room response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
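The effect of gain normalization can be illustrated with a compact FDLP sketch: take the DCT of the signal, run linear prediction on the DCT coefficients, and read the all-pole response along time as the envelope. The sketch assumes full-band analysis (no sub-band windowing), the autocorrelation method, and a unit gain as the normalized value; these choices, the evaluation angles and the function names are assumptions for illustration rather than the authors' exact configuration.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def lpc(seq, order):
    """Autocorrelation-method linear prediction; returns (coeffs, gain)."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = [], r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))) / err
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err

def fdlp_envelope(x, order, normalize_gain=False):
    """All-pole temporal envelope G / |A|^2, evaluated once per sample."""
    coeffs, gain = lpc(dct2(x), order)
    if normalize_gain:
        gain = 1.0  # assumed normalized value
    n = len(x)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n
        re = 1.0 - sum(c * math.cos(theta * (k + 1)) for k, c in enumerate(coeffs))
        im = sum(c * math.sin(theta * (k + 1)) for k, c in enumerate(coeffs))
        env.append(gain / (re * re + im * im))
    return env

# Toy signal: a single click at sample 60, and the same click scaled by 3.
x = [0.0] * 128
x[60] = 1.0
scaled = [3.0 * v for v in x]

env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=lambda t: env[t])
env_norm = fdlp_envelope(x, order=2, normalize_gain=True)
env_norm_scaled = fdlp_envelope(scaled, order=2, normalize_gain=True)
print(peak)
```

A constant scaling of the signal changes only the gain term, so the two gain-normalized envelopes coincide while the unnormalized ones differ by the square of the scale factor.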

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain a 39-dim feature similar to conventional MFCC features
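The steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: the envelopes are sampled at 400 Hz, there are 20 bands already on the target frequency scale (the mel integration step is skipped), 13 cepstral coefficients are kept, and deltas are simple central differences. All names and toy values are hypothetical.

```python
import math

def frame_energies(envelope, rate_hz, win_ms=25, shift_ms=10):
    """Integrate an envelope over 25 ms windows with a 10 ms shift."""
    win = int(rate_hz * win_ms / 1000)
    shift = int(rate_hz * shift_ms / 1000)
    return [sum(envelope[s:s + win])
            for s in range(0, len(envelope) - win + 1, shift)]

def cepstra(band_energies, num_ceps=13):
    """Log followed by a DCT-II along the frequency (band) axis."""
    logs = [math.log(e) for e in band_energies]
    n = len(logs)
    return [sum(logs[b] * math.cos(math.pi * (b + 0.5) * k / n) for b in range(n))
            for k in range(num_ceps)]

def deltas(frames):
    """Central difference along time with edge replication."""
    last = len(frames) - 1
    return [[(frames[min(t + 1, last)][i] - frames[max(t - 1, 0)][i]) / 2.0
             for i in range(len(frames[t]))]
            for t in range(len(frames))]

# Toy data: one second of positive envelopes in 20 bands (assumed values).
RATE, NUM_BANDS = 400, 20
bands = [[1.0 + 0.5 * math.sin(0.05 * t * (b + 1)) for t in range(RATE)]
         for b in range(NUM_BANDS)]

energies = [frame_energies(env, RATE) for env in bands]  # bands x frames
num_frames = len(energies[0])
static = [cepstra([energies[b][t] for b in range(NUM_BANDS)])
          for t in range(num_frames)]
delta = deltas(static)
accel = deltas(delta)
features = [s + d + a for s, d, a in zip(static, delta, accel)]
print(num_frames, len(features[0]))
```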

Speech Recognition

TIDIGITS database (8 kHz): clean training data; test data can be clean or naturally reverberated

HMM-GMM system:
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features:
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features, 39 dim

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

[Figure: word error rate (WER) for PLP-CMS, LTLSS and FDLP-S features on Clean and Reverb test data; axis ticks 0, 10, 20]

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE): has telephone speech and far-field speech

GMM-UBM system:
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]:
• Mel frequency cepstral coefficients (MFCCs)
• FDLP short-term (FDLP-S) features
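The detection cost function quoted above can be evaluated with a one-line helper. The decomposition of the weights into a miss cost of 10, a false-alarm cost of 1 and a target prior of 0.01 is an assumption that reproduces the 0.99 and 0.1 factors; the function name and the example error rates are hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm probabilities.

    With the default (assumed) parameters this equals
    0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Example operating point: 10% misses and 2% false alarms.
dcf = detection_cost(p_miss=0.10, p_fa=0.02)
print(dcf)
```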

[Figure: DCF for MFCC and FDLP-S features under Tel, Mic and Cross-domain conditions; axis ticks 10, 20, 30]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

Modulation Features

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT (one per branch, 200 ms), Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature (14 x 2 x 17 = 476)

Page 43: Signal Analysis Using Autoregressive Models of Amplitude Modulation

FDLP versus Mel Spectrogram

[Figure: spectrogram panels labeled FDLP and Mel]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

[Figure: FDLP and Mel spectrograms, with signal (Sig) and FDLP envelope (FDLP Env) panels]

Res = (Critical Width)^(-1)

Properties of FDLP Analysis

[Figure: FDLP and Mel spectrograms]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

• When speech is corrupted with a convolutive distortion like room reverberation:
  Clean Speech * Room Response = Reverberant Speech (convolution in time)

• In the long-term DFT domain this translates to a multiplication:
  Clean DFT × Response DFT = Reverberant DFT

• The same multiplicative relation holds in each sub-band

• In narrow bands, the room response is slowly varying and can be approximated by a constant

• The FDLP envelope of a band is an all-pole function of time whose numerator is the gain G of the AR model

• When the sub-band signal is multiplied by this constant, only the gain is modified

• Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain (G = 1)
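The effect of gain normalization can be illustrated with a minimal NumPy sketch. It assumes the FDLP envelope of a band is obtained by autocorrelation-method linear prediction on that band's DCT coefficients, and that gain normalization means evaluating the all-pole envelope with the gain set to 1. The function names, the model order, and the number of envelope points are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (adequate for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ np.asarray(x, dtype=float))

def fdlp_envelope(c_band, order=10, n_points=200, gain_normalize=False):
    """All-pole temporal envelope from the DCT coefficients of one sub-band.

    Hypothetical helper: predictor convention is c_hat[k] = sum_i a_i * c[k - i].
    """
    c = np.asarray(c_band, dtype=float)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    # Normal equations R a = r with a symmetric Toeplitz matrix R.
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    gain = r[0] - np.dot(a, r[1:])          # minimum residual energy
    poly = np.concatenate(([1.0], -a))      # A(z) = 1 - sum_i a_i z^-i
    denom = np.abs(np.fft.rfft(poly, n=2 * n_points)[:n_points]) ** 2
    return (1.0 if gain_normalize else gain) / denom
```

Scaling the sub-band DCT coefficients by a constant (a stand-in for a flat room response within a narrow band) leaves the predictor coefficients unchanged and changes only the gain, so the gain-normalized envelopes of the clean and scaled bands coincide.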

Gain Normalization in FDLP

[Figure: FDLP spectrograms without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram. Blocks: Input Signal; Sub-band Decomposition; FDLP (AM and FM outputs); Gain Norm; Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat]

• Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

• Integration along the frequency axis converts the sub-bands to the mel scale

• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis

• Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
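The short-term feature steps (energy integration, log, DCT along frequency, delta and acceleration) can be sketched as follows. This is a rough illustration under stated assumptions: the input is a hypothetical array of gain-normalized sub-band envelopes already arranged on mel-spaced bands, 13 cepstral coefficients are kept so that the final dimension is 39, and deltas are approximated with `np.gradient` rather than any particular regression window.

```python
import numpy as np

def short_term_features(band_env, fs_env, n_ceps=13, win_s=0.025, hop_s=0.010):
    """band_env: (n_bands, n_samples) sub-band envelopes sampled at fs_env Hz."""
    band_env = np.asarray(band_env, dtype=float)
    win = int(round(win_s * fs_env))   # 25 ms integration window
    hop = int(round(hop_s * fs_env))   # 10 ms shift
    n_bands, n_samples = band_env.shape
    n_frames = 1 + (n_samples - win) // hop
    # Integrate each envelope over the window to get sub-band energies per frame.
    energies = np.stack(
        [band_env[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )
    log_e = np.log(energies + 1e-12)
    # DCT-II along the frequency axis gives cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n_bands))
    ceps = log_e @ basis.T                 # (n_frames, n_ceps)
    delta = np.gradient(ceps, axis=0)      # first-order dynamics (assumed estimator)
    accel = np.gradient(delta, axis=0)     # second-order dynamics
    return np.hstack([ceps, delta, accel])
```

With 13 cepstra per frame the output has 13 + 13 + 13 = 39 columns, matching the dimension quoted on the slide.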

Speech Recognition

• TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated

• HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)

• Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features, 39 dim

Speech Recognition

[Figure: bar chart of WER for PLP-CMS, LTLSS, and FDLP-S in the Clean and Reverb conditions; y-axis ticks at 0, 10, 20]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Speaker Verification

• NIST 2008 Speaker Recognition Evaluation (SRE)
  – Has telephone speech and far-field speech

• GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

• Features with warping [Pelecanos 2001]
  – Mel Frequency Cepstral Coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

Speaker Verification

[Figure: bar chart of DCF for MFCC and FDLP-S in the Tel, Mic, and Cross-domain conditions; y-axis ticks at 10, 20, 30]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
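The detection cost function weights false alarms by 0.99 and misses by 0.1. A small sketch of how such a cost could be computed from trial scores is given below; the cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) are assumptions chosen because they reproduce those two weights, and the helper names are hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa.

    With the assumed defaults this equals 0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when trials scoring >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa
```

Sweeping the threshold and taking the smallest cost would give a minimum-DCF style figure; whether the plotted values were computed that way is not stated in the slides.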

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms segments) → DCT on each branch → Sub-band Feat]

• Static compression is a logarithm, which reduces the huge dynamic range in the sub-band envelope

• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

• Compressed sub-band envelopes are DCT-transformed to obtain modulation frequency components

• 14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a 476-dim feature (2 × 14 × 17 = 476)
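The modulation feature pipeline can be sketched roughly as below. The sketch assumes 17 sub-band envelopes for one 200 ms segment, a logarithm for static compression, and a simple chain of divide-by-low-pass-filtered-output loops for dynamic compression. The loop time constants, the floor value, and all function names are illustrative assumptions and are not taken from the slides.

```python
import numpy as np

def dct_keep(x, n_keep):
    """First n_keep DCT-II coefficients of a 1-D sequence (unnormalized)."""
    n = x.shape[-1]
    k = np.arange(n_keep)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    return x @ basis.T

def adaptation_loops(env, fs_env, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Cascade of feedback loops: each divides its input by a low-passed copy of its output."""
    y = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * fs_env))   # one-pole low-pass coefficient
        state = np.sqrt(y[0])                   # steady state for a constant input
        out = np.empty_like(y)
        for n, v in enumerate(y):
            out[n] = v / state                  # divider
            state = max(alpha * state + (1 - alpha) * out[n], floor)  # low-pass filter
        y = out
    return y

def modulation_features(band_env, fs_env, n_mod=14):
    """band_env: (n_bands, n_samples) envelopes covering one 200 ms segment."""
    feats = []
    for env in np.asarray(band_env, dtype=float):
        static = np.log(np.maximum(env, 1e-5))      # static (log) compression
        dynamic = adaptation_loops(env, fs_env)     # dynamic compression
        feats.append(dct_keep(static, n_mod))
        feats.append(dct_keep(dynamic, n_mod))
    return np.concatenate(feats)
```

For a 200 ms segment, DCT coefficient k corresponds to a modulation frequency of k / (2 × 0.2 s) = 2.5k Hz, so 14 coefficients span 0 to 32.5 Hz, in line with the 0-35 Hz range quoted above. With 17 bands the output has 17 × 2 × 14 = 476 values.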

Phoneme Recognition

• TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, corrupted by additive noise, reverberated, or passed through a telephone channel

• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) / MLP hybrid model
  – Performance in phoneme error rate (PER)

• Features
  – Perceptual linear prediction (PLP), 9-frame context
  – Advanced ETSI standard [ETSI 2002], 9-frame context
  – FDLP modulation (FDLP-M) features, 476 dim

Phoneme Recognition

[Figure: bar chart of PER for PLP-9, ETSI-9, and FDLP-M in the Clean, Add. Noise, Reverb, and Tel conditions; y-axis ticks at 30, 45, 60, 75]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

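Audio Coding

[Block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr). Env → Q → Q^(-1); Carr → MDCT → Q → Q^(-1) → IMDCT. The decoded envelope and carrier are multiplied (Mul) → QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.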

Subjective Evaluations

[Figure: bar chart of subjective listening scores (scale 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

• Employing AR modeling for estimating amplitude modulations

• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes

• Investigating the resolution properties of FDLP

• Gain normalization of FDLP envelopes

• Short-term feature extraction using FDLP: improvements in reverberant speech recognition

• Modulation feature extraction: phoneme recognition in noisy speech

• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss

• IBM personnel – Jason Pelecanos, Mohamed Omar

• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

• Physiological evidence
  – Spectro-temporal receptive fields [Shamma et al. 2001]

• Psycho-physical evidence
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

[Figure: past samples n-3, n-2, n-1 weighted by a3, a2, a1 to predict sample n]

• Current sample expressed as a linear combination of past samples

• Model parameters are solved by minimizing the residual sum of squares
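Time-domain linear prediction can be illustrated with a short least-squares sketch. It assumes the predictor convention x_hat[n] = a_1 x[n-1] + ... + a_p x[n-p] and solves for the coefficients with a generic least-squares routine rather than the Levinson recursion; the function name is hypothetical.

```python
import numpy as np

def linear_prediction(x, order):
    """Least-squares predictor coefficients and the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    # Column k-1 holds x[n-k] for every predicted sample n = order .. len(x)-1.
    past = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ coeffs
    return coeffs, float(np.sum(residual ** 2))
```

For a signal generated exactly by a second-order recursion, the fit recovers the recursion coefficients and the residual sum of squares is essentially zero.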

AR model of Power Spectrum

• Filter interpretation [Makhoul 1975]: the prediction residual is the output of the inverse filter applied to the signal

• From Parseval's theorem, the residual energy can be computed in the frequency domain

• Thus the parameters are solved by minimizing the integrated ratio of the signal power spectrum to the model power spectrum

• The solution of linear prediction yields an all-pole model of the power spectrum

• The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
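The relation between the time-domain residual and the all-pole model of the power spectrum can be written out as follows. The notation (x[n], a_k, A, P, and the hat on the model spectrum) is an assumption made for this sketch, following the standard treatment in [Makhoul 1975]; the slides' own symbols were not recoverable.

```latex
\begin{align}
  e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad
  A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-jk\omega} \\
  E &= \sum_{n} e^2[n]
     = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \bigl|A(e^{j\omega})\bigr|^2 \, d\omega
     \qquad \text{(Parseval's theorem, } P(\omega) = |X(e^{j\omega})|^2\text{)} \\
  \hat{P}(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
  \quad\Longrightarrow\quad
  E = \frac{G}{2\pi} \int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
\end{align}
```

Minimizing E over the coefficients a_k therefore fits the all-pole spectrum to the signal power spectrum, and at the minimum the gain G equals the residual sum of squares.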

Hilbert Envelope - Definition

• The analytic signal is the sum of the signal and its quadrature component:
  x_a(t) = x(t) + j H[x(t)]
  where H denotes the Hilbert transform

• The Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
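A discrete-time version of this definition can be sketched with the FFT: the analytic signal is formed by zeroing the negative-frequency bins and doubling the positive ones, and the Hilbert envelope is its squared magnitude. The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal: keep DC (and Nyquist), double positive frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2
```

For a pure cosine with an integer number of cycles in the frame, the quadrature component is the matching sine, so the envelope is constant at 1.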

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP panels]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes the distortion in neighboring frames to be similar, so it suppresses short-term artifacts

• Long-term subtraction deals with reverberation by assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → Critical-band Window → IDFT → |·|^2 → DFT → Linear Pred → FDLP Env]

• When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input → DCT → Critical-band Window → IDFT → |·|^2 → Wiener Filtering (driven by a VAD) → DFT]

• A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
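A minimal sketch of envelope-domain noise subtraction is given below. It assumes the noise envelope is approximated by the mean of the noisy envelope over the frames a VAD marks as non-speech, and that the result is floored at a small fraction of the noisy envelope to keep it positive. The function name, the constant noise estimate, and the floor value are illustrative assumptions; the Wiener filtering stage in the block diagram is not modeled here.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """noisy_env: temporal envelope of one band; speech_mask: True where the VAD detects speech."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise envelope estimate from the non-speech regions only.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Floor the result so the compensated envelope stays strictly positive.
    return np.maximum(cleaned, floor * noisy_env)
```

The subtraction relies on the additivity of uncorrelated signal and noise in the Hilbert envelope domain noted above.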

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

[Figure: spectrogram with Time and Frequency axes]

• Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

• The spectrum is sampled at a preset rate before further modeling/processing stages

• Contextual information is typically processed with time-series models such as HMMs

Page 44: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]: from Parseval's theorem, the residual sum of squares can be written in the frequency domain, so the model parameters are solved by minimizing a frequency-domain error

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
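The equations on these slides did not survive in the transcript. The standard form of the argument in [Makhoul 1975] is sketched below; the symbols $a_k$, $p$, $A$, $P_x$ and $\hat{P}_x$ are notation assumed here, not necessarily the notation used on the slides.

```latex
\begin{align}
e[n] &= x[n] + \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e^2[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl|E(e^{j\omega})\bigr|^2 \, d\omega
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\,\bigl|A(e^{j\omega})\bigr|^2 \, d\omega,
  \qquad P_x(\omega) = \bigl|X(e^{j\omega})\bigr|^2 \\
\hat{P}_x(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2},
  \qquad G = E_{\min} \\
E &= \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P_x(\omega)}{\hat{P}_x(\omega)} \, d\omega
\end{align}
```

The second line is Parseval's theorem applied to the residual, and the last line shows that minimizing $E$ amounts to matching the all-pole model $\hat{P}_x$ to the power spectrum $P_x$ in a ratio sense.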

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,H[x(t)]$

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
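The definition can be tried out on a discrete signal by building the analytic signal in the DFT domain: keep DC, double the positive frequencies and zero the negative ones. The sketch below uses a plain O(N^2) DFT from the standard library so that it stays self-contained; the function names and the toy AM signal are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Direct O(N^2) inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of x."""
    n = len(x)
    spec = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep DC once
    half = n // 2
    if n % 2 == 0:
        weights[half] = 1.0               # keep the Nyquist bin once
        upper = half
    else:
        upper = half + 1
    for k in range(1, upper):
        weights[k] = 2.0                  # double the positive frequencies
    analytic = idft([s * w for s, w in zip(spec, weights)])
    return [abs(z) ** 2 for z in analytic]


# Toy AM signal (assumption): slow modulator times a faster carrier.
n = 64
modulator = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
signal = [m * math.cos(2 * math.pi * 16 * t / n) for t, m in enumerate(modulator)]
envelope = hilbert_envelope(signal)
```

Because the modulator here is band-limited well below the carrier, the computed envelope matches the squared modulator; for real sub-band speech the match is only approximate.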

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency
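The duality is that linear prediction on time samples gives an all-pole model of the power spectrum, while the same prediction on DCT coefficients gives an all-pole model of the temporal envelope. A minimal FDLP sketch under that reading follows. It is not the authors' implementation: the helper names, the orthonormal DCT-II, the model order and the evaluation grid (angle pi*t/N for time index t) are all assumptions.

```python
import math


def dct2(x):
    """Orthonormal DCT-II computed directly (O(N^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Returns (a, gain) with a[0] = 1 and gain = minimum residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole envelope of x obtained by LP on its DCT coefficients."""
    coeffs = dct2(x)
    a, gain = levinson(autocorr(coeffs, order), order)
    n = len(x)
    env = []
    for t in range(n):
        w = math.pi * t / n               # time index mapped to an angle
        re = sum(ak * math.cos(w * k) for k, ak in enumerate(a))
        im = sum(ak * math.sin(w * k) for k, ak in enumerate(a))
        env.append(gain / (re * re + im * im))
    return env


# Toy input (assumption): a single click at sample 40 of a 128-sample frame.
frame = [0.0] * 128
frame[40] = 1.0
env = fdlp_envelope(frame, 2)
peak = max(range(len(env)), key=env.__getitem__)
```

The envelope peaks near the position of the click, which is the time-domain counterpart of a spectral LP model peaking at a formant. Raising the order lets the model follow more peaks at the cost of a less smooth envelope.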

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
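Of the three techniques, CMS is the simplest to write down. A channel that is short compared with the analysis window adds a roughly constant vector to every cepstral frame, so subtracting the per-utterance mean removes it. The sketch below illustrates only that idea; the function name and the toy numbers are assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the mean over all frames from each cepstral dimension.

    frames is a list of equal-length cepstral vectors (one per frame).
    """
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy data (assumption): three 2-dimensional cepstral frames and a
# fixed channel offset added to every frame.
clean = [[1.0, 2.0], [3.0, 5.0], [5.0, -1.0]]
channel = [0.5, -2.0]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the clean and distorted features coincide. Reverberation violates the short-channel assumption, which is why the slide lists long-term subtraction and gain normalization as alternatives.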

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression (200 ms), DCT on each branch, Sub-band Feat]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)
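The additivity claim follows from expanding the squared magnitude of the analytic signal. The symbols below ($s$, $n$, $y$ and the subscript $a$ for analytic signals) are assumed notation, since the slide equation is not preserved in the transcript.

```latex
\begin{align}
y(t) &= s(t) + n(t) \;\;\Rightarrow\;\; y_a(t) = s_a(t) + n_a(t) \\
|y_a(t)|^2 &= |s_a(t)|^2 + |n_a(t)|^2 + 2\,\mathrm{Re}\!\left\{ s_a(t)\, n_a^{*}(t) \right\} \\
\mathbb{E}\!\left[\,|y_a(t)|^2\right] &= \mathbb{E}\!\left[\,|s_a(t)|^2\right] + \mathbb{E}\!\left[\,|n_a(t)|^2\right]
  \quad \text{when } \mathbb{E}\!\left[ s_a(t)\, n_a^{*}(t) \right] = 0
\end{align}
```

The first line uses the linearity of the Hilbert transform. The cross term vanishes only on average, so the additivity holds for the expected envelope rather than sample by sample.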

Noise Compensation in FDLP

[Block diagram: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering, VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
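A rough sketch of envelope-domain noise subtraction is given below. It is a plain subtraction with a floor, not the Wiener filtering named in the block diagram, whose details are not given in the transcript. The function name, the `floor` parameter and the toy numbers are assumptions.

```python
def subtract_noise_envelope(envelope, is_speech, floor=1e-3):
    """Subtract a noise-envelope estimate from a sub-band envelope.

    envelope  : list of non-negative envelope samples
    is_speech : list of booleans from a VAD, one per envelope sample
    floor     : fraction of the noisy envelope kept as a lower bound
                (hypothetical parameter, avoids negative values)
    """
    noise_samples = [e for e, speech in zip(envelope, is_speech) if not speech]
    if not noise_samples:
        return list(envelope)             # no non-speech region to learn from
    noise_level = sum(noise_samples) / len(noise_samples)
    return [max(e - noise_level, floor * e) for e in envelope]


# Toy data (assumption): noise floor of 1.0 with a speech burst in the middle.
noisy = [1.0, 1.0, 5.0, 9.0, 1.0]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
```

The floor keeps the result strictly positive, which matters because the envelope is later compressed with a logarithm.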

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis – starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction – Time Domain
  • Linear Prediction – Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Resolution of FDLP Analysis

[Figure: signal (Sig) and FDLP envelope (FDLP Env) panels, shown for FDLP and Mel analysis]

Res = (Critical Width)^-1

Properties of FDLP Analysis

[Figure: FDLP and Mel panels]

  • Summarizing the gross temporal variation with a few parameters
    – Model order of FDLP controls the degree of smoothness
    – AR model captures perceptually important high-energy regions of the signal

  • Suppressing reverberation artifacts
    – Reverberation is a long-term convolutive distortion

  • Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Reverberant Speech (convolution in time)

In the long-term DFT domain this translates to a multiplication, which also holds within each sub-band:

Clean DFT x Response DFT = Reverberant DFT

In narrow bands, the room response is slowly varying

The FDLP envelope of a band is given by its all-pole parameters and gain

When the sub-band signal is multiplied by the response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain
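The gain argument can be checked numerically. If the DCT coefficients of a narrow band are all multiplied by roughly the same factor h, the autocorrelation scales by h^2, so the all-pole coefficients are unchanged and only the gain moves. The sketch below treats the response as exactly constant within the band and uses unit gain as the normalized value; both are assumptions, as are the helper names and the toy coefficients.

```python
import math


def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Returns (a, gain) with a[0] = 1 and gain = minimum residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def ar_envelope(a, gain, npoints):
    """Evaluate gain / |A|^2 on npoints angles between 0 and pi."""
    env = []
    for t in range(npoints):
        w = math.pi * t / npoints
        re = sum(ak * math.cos(w * k) for k, ak in enumerate(a))
        im = sum(ak * math.sin(w * k) for k, ak in enumerate(a))
        env.append(gain / (re * re + im * im))
    return env


order = 2
# Toy sub-band DCT coefficients (assumption) and a constant band response h.
coeffs = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(64)]
h = 0.25
a_clean, g_clean = levinson(autocorr(coeffs, order), order)
a_dist, g_dist = levinson(autocorr([h * c for c in coeffs], order), order)

# Gain-normalized envelopes: reconstruct with gain 1 instead of the LP gain.
env_clean = ar_envelope(a_clean, 1.0, 32)
env_dist = ar_envelope(a_dist, 1.0, 32)
```

The two normalized envelopes coincide, while the unnormalized gains differ by h^2. In a real room the response is only approximately constant within a band, so the invariance is approximate.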

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct 2009

Short-term Features

[Block diagram: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features similar to conventional MFCC features
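The steps on the Short-term Features slides can be sketched end to end. The code below assumes the sub-band envelopes are already on a mel-like band layout and sampled at 400 Hz, so that 25 ms is 10 samples and 10 ms is 4 samples. It also assumes 13 cepstral coefficients and a simple two-point delta. None of these values are stated in the transcript apart from the 25 ms / 10 ms windowing and the 39-dimensional result.

```python
import math


def frame_energies(envelope, win, shift):
    """Integrate the envelope over sliding windows of win samples."""
    return [sum(envelope[s:s + win])
            for s in range(0, len(envelope) - win + 1, shift)]


def dct2(x, ncoef):
    """First ncoef coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(ncoef)]


def deltas(feats):
    """Half the difference between the next and previous frame (edges repeated)."""
    last = len(feats) - 1
    out = []
    for i in range(len(feats)):
        prev, nxt = feats[max(i - 1, 0)], feats[min(i + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def short_term_features(band_envelopes, win=10, shift=4, ncep=13):
    """Log band energies -> DCT across bands -> append delta and acceleration."""
    energies = [frame_energies(env, win, shift) for env in band_envelopes]
    nframes = len(energies[0])
    cepstra = []
    for f in range(nframes):
        log_energies = [math.log(band[f] + 1e-12) for band in energies]
        cepstra.append(dct2(log_energies, ncep))
    d1 = deltas(cepstra)
    d2 = deltas(d1)
    return [c + v + a for c, v, a in zip(cepstra, d1, d2)]


# Toy input (assumption): 20 bands, 100 envelope samples each (250 ms at 400 Hz).
bands = [[1.0 + 0.5 * math.sin(0.1 * t + b) for t in range(100)] for b in range(20)]
features = short_term_features(bands)
```

Each output frame holds 13 static, 13 delta and 13 acceleration values, giving the 39 dimensions quoted on the slide.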

Speech Recognition

TIDIGITS database (8 kHz)
  • Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  • Whole-word HMM models trained on clean speech
  • Performance in terms of word error rate (WER)

Features
  • PLP features with cepstral mean subtraction (CMS)
  • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  • FDLP short-term (FDLP-S) features – 39 dim

[Figure: bar chart of WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE)
  • Has telephone speech and far-field speech

GMM-UBM system
  • Trained on a large set of development speakers
  • Adapted on the enrollment data from the target speaker
  • Nuisance attribute projection (NAP) on supervectors
  • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  • Mel frequency cepstral coefficients (MFCCs)
  • FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S on Tel, Mic and Cross-domain conditions]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP 2011
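The detection cost function on the Speaker Verification slide is a weighted sum of the false-alarm and miss probabilities. The sketch below writes it in the usual cost-and-prior form; the default values (miss cost 10, false-alarm cost 1, target prior 0.01) are assumed here because they reproduce the weights of 0.1 and 0.99, and the function name is hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost: c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa.

    With the default (assumed) parameters this equals
    0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Toy operating point (assumption): 20% misses and 5% false alarms.
cost = detection_cost(p_miss=0.20, p_fa=0.05)
```

Because false alarms carry almost ten times the weight of misses, systems tuned for this cost operate at low false-alarm rates.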

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression (200 ms), DCT on each branch, Sub-band Feat]

Static compression is a logarithm – reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions
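The dimension count is 14 x 2 x 17 = 476. The sketch below walks one 200 ms segment through the two compression branches and the DCT. The dynamic branch is reduced to a single divider and low-pass loop, a simplification rather than the loops of [Kollmeier 1999]. The envelope rate of 400 Hz (80 samples per 200 ms), the loop constant `alpha` and all function names are assumptions.

```python
import math


def dct2(x, ncoef):
    """First ncoef coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(ncoef)]


def adaptation_loop(envelope, alpha=0.9, floor=1e-5):
    """One divider / low-pass loop: the input is divided by a smoothed
    copy of the loop output, so steady inputs are compressed towards
    their square root while onsets pass through less compressed."""
    state = 1.0
    out = []
    for e in envelope:
        y = max(e, floor) / state
        state = max(alpha * state + (1.0 - alpha) * y, floor)
        out.append(y)
    return out


def modulation_features(band_envelopes, ncoef=14):
    """Static (log) and dynamic (loop) modulation spectra for every band."""
    feats = []
    for env in band_envelopes:
        static = [math.log(e + 1e-12) for e in env]
        dynamic = adaptation_loop(env)
        feats.extend(dct2(static, ncoef))
        feats.extend(dct2(dynamic, ncoef))
    return feats


# Toy input (assumption): 17 bands, 80 envelope samples each.
bands = [[1.0 + 0.5 * math.cos(0.2 * t + b) for t in range(80)] for b in range(17)]
features = modulation_features(bands)
```

For a DCT-II over a 200 ms segment, coefficient k corresponds to a modulation frequency of about 2.5k Hz, so 14 coefficients span roughly 0 to 32.5 Hz, in line with the 0-35 Hz range on the slide.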

Phoneme Recognition

TIMIT database (8 kHz)
  • Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) – MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) – 9 frame context
  • Advanced ETSI standard [ETSI 2002] – 9 frame context
  • FDLP modulation (FDLP-M) features – 476 dim

[Figure: bar chart of PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9 and FDLP-M on Clean, Add Noise, Reverb and Tel conditions]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010

Audio Coding

[Block diagram: Input, QMF Analysis (bands 1 to 32), FDLP with envelope (Env) and carrier (Carr) outputs, MDCT, quantizers Q, inverse quantizers Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

[Figure: subjective evaluation scores (axis ticks 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Page 46: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Resolution of FDLP Analysis

FDLP

Mel

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Res = (Critical Width)-1

Sig

FDLP Env

Sig

FDLP Env

Resolution of FDLP Analysis

FDLP

Mel

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Properties of FDLP Analysis

FDLP

Mel

bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of

smoothness ndash AR model captures perceptually important high energy

regions of the signal

bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion

bull Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features - 39 dim

[Bar chart: WER (axis 0 to 20) for PLP-CMS, LTLSS, and FDLP-S under Clean and Reverb conditions]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
• Mel frequency cepstral coefficients (MFCCs)
• FDLP short-term (FDLP-S) features

[Bar chart: DCF (axis 10 to 30) for MFCC and FDLP-S under Tel, Mic, and Cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
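The detection cost function quoted for the speaker verification experiments is a weighted sum of the false-alarm and miss rates. The sketch below shows how such a cost could be computed from trial scores. The function names and the decomposition of the weights into `c_miss=10`, `c_fa=1`, `p_target=0.01` are assumptions for illustration; the slide states only the resulting weights, 0.99 and 0.1.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted detection cost; the defaults give 0.99 * Pfa + 0.1 * Pmiss."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when trials scoring >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa
```

In practice the threshold would be swept over the score range and the operating point reported; that choice is outside this sketch.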

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression → DCT on each stream (200 ms) → Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT-transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions (14 × 2 × 17 = 476)
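The two compression streams and the dimension count can be sketched as below. This is a simplified illustration, not the published feature extractor: the adaptation loop is a bare divider with a first-order low-pass in the feedback path, the time constants in `taus` and the `floor` value are assumed, and all function names are hypothetical. With a 200 ms window, DCT-II coefficient k corresponds to a modulation frequency of k / (2 × 0.2 s) = 2.5k Hz, so 14 coefficients cover roughly the 0-35 Hz range mentioned above.

```python
import math

def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n_out)]

def static_compress(env, floor=1e-10):
    """Logarithmic (static) compression of a sub-band envelope."""
    return [math.log(max(e, floor)) for e in env]

def adaptation_loop(env, rate, tau_s, floor=1e-5):
    """One loop: divide the input by a low-pass filtered copy of the output."""
    alpha = math.exp(-1.0 / (tau_s * rate))
    state = math.sqrt(floor)
    out = []
    for e in env:
        y = max(e, floor) / state                  # divider
        state = alpha * state + (1.0 - alpha) * y  # low-pass filter in feedback
        out.append(y)
    return out

def dynamic_compress(env, rate, taus=(0.005, 0.050, 0.129, 0.253, 0.500)):
    """Chain of adaptation loops; the time constants are assumed values."""
    for tau in taus:
        env = adaptation_loop(env, rate, tau)
    return env

def modulation_features(band_envs, rate, n_mod=14):
    """Concatenate static and dynamic modulation spectra of every sub-band."""
    feats = []
    for env in band_envs:
        feats.extend(dct_ii(static_compress(env), n_mod))
        feats.extend(dct_ii(dynamic_compress(env, rate), n_mod))
    return feats
```

For 17 sub-band envelopes the returned vector has 17 × 2 × 14 = 476 entries.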

Phoneme Recognition

TIMIT database (8 kHz)
• Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
• MLPs estimate phoneme posteriors
• Hidden Markov model (HMM) - MLP hybrid model
• Performance in phoneme error rate (PER)

Features
• Perceptual linear prediction (PLP) - 9 frame context
• Advanced ETSI standard [ETSI 2002] - 9 frame context
• FDLP modulation (FDLP-M) features - 476 dim

[Bar chart: PER (axis 30 to 75) for PLP-9, ETSI-9, and FDLP-M under Clean, Additive Noise, Reverb, and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

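Audio Coding

[Block diagram of the codec. Encoder: Input → QMF Analysis (sub-bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr), with an MDCT stage, each quantized (Q). Decoder: inverse quantization (Q⁻¹) → IMDCT → Mul → QMF Synthesis (sub-bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.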

Subjective Evaluations

[Bar chart: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
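Minimizing the residual sum of squares leads to a set of normal equations in the autocorrelation of the signal, which the Levinson-Durbin recursion solves efficiently. The sketch below is a generic textbook version using the autocorrelation method; the function names are hypothetical and nothing here is taken from the slides beyond the prediction model itself.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with x_hat[n] = sum_k a[k-1] * x[n-k].

    err is the minimum residual sum of squares for the given order.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / err
        # Update the lower-order coefficients and append the new one.
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err
```

As a usage example, for the decaying sequence `x[n] = 0.9 ** n` a second-order fit recovers a first coefficient close to 0.9 and a second coefficient close to zero.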

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
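The equations on these slides did not survive extraction. For orientation, the standard spectral-domain statement of linear prediction can be written as follows. The notation (prediction error e[n], inverse filter A(z), signal power spectrum P_x, model spectrum with a hat) is assumed here and may differ from the symbols used on the original slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e^2[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\,\bigl|A(e^{j\omega})\bigr|^2\, d\omega
   \qquad \text{(Parseval)} \\
\hat{P}_x(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
   \quad\Longrightarrow\quad
E = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P_x(\omega)}{\hat{P}_x(\omega)}\, d\omega
\end{align}
```

Here P_x(ω) = |X(e^{jω})|² is the power spectrum of the signal. Minimizing E over the coefficients a_k therefore fits the all-pole spectrum to P_x, and choosing G equal to the minimum of E matches the total energy of the model to that of the signal.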

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
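For a sampled signal, the analytic signal is commonly obtained in the DFT domain by zeroing the negative-frequency bins and doubling the positive ones. The sketch below follows that recipe with a direct O(N²) DFT so that it needs only the standard library; the function names are hypothetical, and a practical implementation would use an FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a list of real or complex values."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            spec[k] *= 2.0
        elif k > n / 2:
            spec[k] = 0.0
    return dft(spec, inverse=True)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]
```

For a pure cosine that completes a whole number of cycles in the window, the envelope computed this way is constant and equal to the squared amplitude.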

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency
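The duality between time-domain and frequency-domain linear prediction can be made concrete with a small sketch: linear prediction applied to the DCT of a signal gives an all-pole model whose "spectrum" is read along the time axis as a smooth envelope. The code below is an illustrative full-band version under assumptions, not the authors' implementation: function names are hypothetical, there is no sub-band windowing, and the DCT is computed directly in O(N²).

```python
import cmath
import math

def dct_ii(x):
    """Unnormalized DCT-II of the whole signal."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def lpc(seq, order):
    """Autocorrelation-method LP; A(z) = 1 - sum_k a[k-1] z^-k, plus gain."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = [], r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order):
    """All-pole temporal envelope from LP on the DCT sequence."""
    n = len(x)
    a, gain = lpc(dct_ii(x), order)
    env = []
    for t in range(n):
        # The angle of the all-pole model maps to the time index t.
        theta = math.pi * (2 * t + 1) / (2 * n)
        A = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                      for k in range(order))
        env.append(gain / abs(A) ** 2)
    return env
```

For a short burst in an otherwise silent window, the envelope returned by `fdlp_envelope` peaks near the burst location, and a higher `order` follows finer temporal detail.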

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
• Gain normalization deals with short- and long-term distortions within a single long-term frame
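Of the three techniques, cepstral mean subtraction is the simplest to illustrate. If the distortion is the same in every frame, it adds a constant vector in the cepstral domain, so subtracting the mean over frames removes it. The following is a minimal sketch with a hypothetical function name, operating on plain lists of cepstral vectors.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames.

    frames: list of cepstral vectors (one list of floats per frame).
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

The same idea applied to log spectra over much longer windows gives long-term log spectral subtraction; the window lengths involved are not sketched here.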

Feature Comparison

Modulation Features

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → Critical-band Window → IDFT → | |² → DFT → Linear Pred → FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram with compensation. Blocks: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering, VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope
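The envelope subtraction idea can be sketched as below. This is a simplified illustration under assumptions, not the method of the cited paper: the noise envelope is taken to be constant and estimated as the mean over VAD non-speech samples, the floor that prevents negative envelope values is an assumed safeguard, the Wiener filtering block named in the diagram is not reproduced, and the function name is hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor_ratio=0.01):
    """Subtract a noise-envelope estimate from a noisy Hilbert envelope.

    noisy_env: envelope samples of one sub-band (non-negative floats).
    is_speech: VAD decisions, True where speech is present.
    """
    noise = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise:
        return list(noisy_env)  # no non-speech region to estimate from
    noise_level = sum(noise) / len(noise)
    # Keep a small fraction of the input so the envelope stays positive.
    return [max(e - noise_level, floor_ratio * e) for e in noisy_env]
```

The compensated envelope would then be passed on to the linear prediction stage in place of the noisy one.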

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMMs

Page 47: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Resolution of FDLP Analysis

Res = (Critical Width)^-1

[Figure panels: signal (Sig) and FDLP envelope (FDLP Env), shown for two cases; FDLP versus Mel comparison]

Properties of FDLP Analysis

[Figure: FDLP versus Mel]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal
• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion
• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with convolutive distortion like room reverberation:

Clean Speech * Room Response = Reverberant Speech (convolution)

In the long-term DFT domain this translates to a multiplication:

Clean DFT × Response DFT = Reverberant DFT

In the sub-band DFT domain, the same multiplicative relation applies
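The statement that convolution in time becomes multiplication in the DFT domain can be checked numerically. The sketch below compares a direct linear convolution with the product of zero-padded DFTs; the function names are hypothetical, the DFT is computed directly in O(N²) to stay within the standard library, and the short sequences stand in for speech and a room response.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a list of real or complex values."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def linear_convolve(x, h):
    """Direct time-domain convolution of two lists."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv
    return out

def convolve_via_dft(x, h):
    """Zero-pad both lists, multiply their DFTs, and transform back."""
    n = len(x) + len(h) - 1
    X = dft(x + [0.0] * (n - len(x)))
    H = dft(h + [0.0] * (n - len(h)))
    y = dft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real for v in y]
```

The zero-padding to length `len(x) + len(h) - 1` is what makes the circular convolution implied by the DFT equal to the linear one.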

In narrow bands, the room response is slowly varying

FDLP envelope of a band using all-pole parameters … is given by an all-pole expression whose numerator is the gain G

When the sub-band signal is multiplied by the room response, the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with unit gain (G = 1)
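The effect of gain normalization can be illustrated with a small sketch. If the sub-band DCT sequence is scaled by a constant, which is the idealized narrow-band picture of a room response, the all-pole coefficients stay the same and only the gain changes, so rebuilding the envelope with the gain fixed to 1 removes the scaling. The function names are hypothetical and the constant multiplier is an assumption made for illustration.

```python
import cmath
import math

def lpc(seq, order):
    """Autocorrelation-method LP; A(z) = 1 - sum_k a[k-1] z^-k, plus gain."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = [], r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

def ar_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 on n_points angles in (0, pi)."""
    env = []
    for t in range(n_points):
        theta = math.pi * (2 * t + 1) / (2 * n_points)
        A = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                      for k in range(len(a)))
        env.append(gain / abs(A) ** 2)
    return env

def gain_normalized_envelope(seq, order, n_points):
    """Envelope rebuilt from the all-pole coefficients with the gain set to 1."""
    a, _ = lpc(seq, order)
    return ar_envelope(a, 1.0, n_points)
```

Calling `gain_normalized_envelope` on a sequence and on a scaled copy of it yields the same envelope, whereas the unnormalized gains differ by the square of the scale factor.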

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Features

SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009

a Signalb Hilb Envc FDLP Envd Log compe Dyn comp

Noise Compensation in FDLP

Critical-band

Window

LinearPred

Signal

+Noise

DCT IDFT | |2 DFTFDLP

Env

When speech is corrupted with additive noise

y The noise component is additive in the non-parametric Hilbert envelope

domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Critical-band

Window

InputDCT IDFT | |2 DFTWiener

Filtering

VAD

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope

Noise Compensation in FDLP

SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Introduction

Time

Freq

uenc

y

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM


Resolution of FDLP Analysis

[Figure panels: FDLP, Mel]

Properties of FDLP Analysis

[Figure panels: FDLP, Mel]

• Summarizing the gross temporal variation with a few parameters
  - Model order of FDLP controls the degree of smoothness
  - AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  - Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean Speech * Room Response = Reverberant Speech (convolution in time)

In the long-term DFT domain, this translates to a multiplication:

Clean DFT x Response DFT = Reverberant DFT

The same multiplicative relation applies in each sub-band. In narrow bands, the response term is slowly varying.

The FDLP envelope of a band is determined by its all-pole parameters and its gain.

When the sub-band signal is multiplied by the response term, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with a normalized gain.
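The gain argument above can be checked numerically with a small sketch. Everything here is an assumption made for illustration: a single full-band frame stands in for one narrow sub-band, the slowly varying response is replaced by a constant scale factor, the model order is arbitrary, and the normalized gain is taken to be 1. The helper names are hypothetical.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II computed from its definition (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def fdlp_envelope(x, order=8, n_points=64, gain_normalize=True):
    """All-pole temporal envelope from linear prediction on the DCT of x."""
    c = dct2(np.asarray(x, dtype=float))
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([c[: len(c) - k] @ c[k:] for k in range(order + 1)])
    # Normal equations R a = r[1:], with R the Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - a @ r[1:]  # minimum residual sum of squares, G
    # Evaluate A = 1 - sum_k a_k e^{-jwk}; for FDLP this axis indexes time.
    w = np.pi * np.arange(n_points) / n_points
    A = 1.0 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a
    g = 1.0 if gain_normalize else gain
    return g / np.abs(A) ** 2

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
same = np.allclose(fdlp_envelope(frame), fdlp_envelope(3.0 * frame))
print("gain-normalized envelope unchanged by scaling:", same)
```

Scaling the input changes only the autocorrelation scale, so the all-pole coefficients stay the same and the whole effect lands in G. Fixing G therefore removes a constant multiplicative distortion from the reconstructed envelope.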

Gain Normalization in FDLP

[Figure panels: without gain normalization, with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

[Block diagram: Input Signal -> Sub-band Decomposition -> FDLP (AM, FM) -> Gain Norm / Quant -> Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input -> DCT -> Sub-band Window -> FDLP -> Gain Norm -> Energy Int -> Log + DCT -> Feat]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).

Integration along the frequency axis converts the sub-bands to the mel scale.

Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
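A rough sketch of the last stages of this pipeline is given below. It assumes the sub-band envelopes are already available on a mel-like band layout, uses 13 cepstral coefficients so that the appended deltas and accelerations give 39 dimensions, and uses a simple gradient for the deltas. The function names and those three choices are assumptions, not details from the slides.

```python
import numpy as np

def short_term_cepstra(env, fs, n_ceps=13, win_s=0.025, hop_s=0.010):
    """env: (n_bands, n_samples) sub-band envelopes; returns (n_frames, n_ceps)."""
    n_bands, n_samples = env.shape
    win, hop = int(round(win_s * fs)), int(round(hop_s * fs))
    n_frames = 1 + (n_samples - win) // hop
    # Integrate each band's envelope over 25 ms windows shifted by 10 ms.
    energy = np.stack([env[:, i * hop : i * hop + win].sum(axis=1)
                       for i in range(n_frames)])        # (n_frames, n_bands)
    log_e = np.log(energy + 1e-10)
    # DCT-II along the frequency (band) axis gives cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    b = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * (b + 0.5) * k / n_bands)      # (n_ceps, n_bands)
    return log_e @ basis.T

def add_deltas(ceps):
    """Append first and second time differences (simple gradient, an assumption)."""
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

rng = np.random.default_rng(0)
envelopes = rng.random((24, 8000)) + 0.1   # 1 s of toy envelopes at 8 kHz
features = add_deltas(short_term_cepstra(envelopes, fs=8000))
print(features.shape)
```

With 13 static coefficients per frame, the stacked output has 13 x 3 = 39 columns, matching the feature size quoted on the slide.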

Speech Recognition

TIDIGITS Database (8 kHz)
• Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
• Whole-word HMM models trained on clean speech
• Performance in terms of word error rate (WER)

Features
• PLP features with cepstral mean subtraction (CMS)
• Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
• FDLP short-term (FDLP-S) features - 39 dim

[Bar chart: WER for PLP-CMS, LTLSS, and FDLP-S on the Clean and Reverb conditions; WER axis ticks 0, 10, 20]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Processing Letters, 2008.

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
• Has telephone speech and far-field speech

GMM-UBM system
• Trained on a large set of development speakers
• Adapted on the enrollment data from the target speaker
• Nuisance attribute projection (NAP) on supervectors
• Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
• Mel frequency cepstral coefficients (MFCCs)
• FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S on the Tel, Mic, and Cross-domain conditions; DCF axis ticks 10, 20, 30]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
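As a small worked example of the detection cost used above, the sketch below reads the slide's weights as 0.99 and 0.1, which is what the NIST SRE 2008 cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) produce. The function name and the example error rates are illustrative assumptions.

```python
def detection_cost(p_fa, p_miss, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_target * P_miss + C_fa * (1 - P_target) * P_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# With the default parameters this reduces to 0.1 * P_miss + 0.99 * P_fa.
print(detection_cost(p_fa=0.02, p_miss=0.30))
```

Because the target prior is small, false alarms are weighted almost ten times as heavily as misses, which is why a system is usually operated at a threshold with low Pfa under this measure.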

Modulation Feature Extraction

[Block diagram: Input -> DCT -> Critical-band Window -> FDLP -> Static and Dynamic compression -> DCT (200 ms) -> Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope.

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 14 x 2 x 17 = 476 dimensions.
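The dimension count can be made concrete with a short sketch. It assumes the statically and dynamically compressed envelopes of one 200 ms segment are already computed (the compression loops themselves are not implemented here), and the function names are hypothetical.

```python
import numpy as np

N_BANDS, N_MOD = 17, 14   # from the slide: 17 sub-bands, 14 modulation components

def modulation_spectrum(compressed_env, n_mod=N_MOD):
    """First n_mod DCT-II coefficients of one compressed sub-band envelope segment."""
    n = len(compressed_env)
    k = np.arange(n_mod)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ compressed_env

def modulation_features(static_env, dynamic_env):
    """Inputs: (N_BANDS, n_samples) compressed envelopes of a 200 ms segment."""
    per_band = [np.concatenate([modulation_spectrum(s), modulation_spectrum(d)])
                for s, d in zip(static_env, dynamic_env)]
    return np.concatenate(per_band)

rng = np.random.default_rng(0)
static = np.log(rng.random((N_BANDS, 80)) + 0.1)   # toy log-compressed envelopes
dynamic = rng.random((N_BANDS, 80))                # stand-in for the loop output
print(modulation_features(static, dynamic).shape)
```

For a segment of duration T, DCT-II coefficient k corresponds to a modulation frequency of k / (2T), so with T = 200 ms the 14 coefficients k = 0..13 span 0 to 32.5 Hz, consistent with the 0-35 Hz range quoted above.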

Phoneme Recognition

TIMIT Database (8 kHz)
• Clean training data; test data can be clean, with additive noise, reverberated, or from a telephone channel

Multi-layer perceptron (MLP) based system
• MLPs estimate phoneme posteriors
• Hidden Markov model (HMM) - MLP hybrid model
• Performance in phoneme error rate (PER)

Features
• Perceptual linear prediction (PLP) - 9 frame context
• Advanced ETSI standard [ETSI 2002] - 9 frame context
• FDLP modulation (FDLP-M) features - 476 dim

[Bar chart: PER for PLP-9, ETSI-9, and FDLP-M on the Clean, Add Noise, Reverb, and Tel conditions; PER axis ticks 30, 45, 60, 75]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Audio Coding

[Block diagram: Input -> QMF Analysis (sub-bands 1 to 32) -> FDLP -> Env and Carr branches -> MDCT, Q -> Q^-1, IMDCT -> Mul -> QMF Synthesis (sub-bands 1 to 32) -> Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
• Perceptual importance of modulation frequencies [Drullman et al. 1994]
• Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

The current sample is expressed as a linear combination of past samples.

[Figure: samples n-3, n-2, n-1 weighted by a3, a2, a1 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares.
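One common way to carry out this minimization is the autocorrelation method, sketched below. The slides do not say which solver was used, so the method, the function name, and the sign convention (prediction as a weighted sum of past samples) are assumptions.

```python
import numpy as np

def linear_prediction(x, order):
    """Autocorrelation-method LP: x[n] is approximated by sum_k a[k] * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation for lags 0..order.
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)])
    # Setting the gradient of the residual sum of squares to zero gives R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    residual_energy = r[0] - a @ r[1:]
    return a, residual_energy

# Toy check on a synthetic second-order autoregressive signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + noise[n]
coeffs, energy = linear_prediction(x, order=2)
print(coeffs)
```

On this synthetic signal the estimated coefficients should land close to the generating values 1.5 and -0.7, and the returned residual energy is the minimum residual sum of squares that the later slides call the gain G.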

AR Model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual sum of squares can be expressed in the frequency domain; thus the parameters are solved by minimizing a frequency-domain error.

The solution of the linear prediction yields an all-pole model of the power spectrum.

The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).
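The slide equations did not survive in this transcript, so the following is a standard form of the argument in the spirit of [Makhoul 1975], written with assumed notation: x[n] is the signal, a_k the predictor coefficients, p the model order, and P_x the power spectrum.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \\
A(\omega) &= 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \qquad E(\omega) = A(\omega)\, X(\omega), \\
\sum_{n} e^2[n] &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |A(\omega)|^2\, P_x(\omega)\, d\omega,
  \qquad P_x(\omega) = |X(\omega)|^2, \\
\hat{P}_x(\omega) &= \frac{G}{|A(\omega)|^2}, \qquad
  G = \min_{a_1, \dots, a_p} \sum_{n} e^2[n].
\end{align}
```

The third line is the Parseval step: minimizing the time-domain residual is the same as minimizing the integrated ratio of the true power spectrum to the all-pole model, which is why the fit favors spectral peaks.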

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component,

x_a(t) = x(t) + j H[x(t)],

where H denotes the Hilbert transform.

The Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2.
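A minimal sketch of this definition for a discrete signal is shown below, building the analytic signal in the DFT domain by zeroing the negative frequencies. The function name and the test signal are assumptions chosen for illustration.

```python
import numpy as np

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal, built in the DFT domain."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero the rest.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return np.abs(analytic) ** 2

# A slow modulation m(t) on a fast carrier: the envelope should recover m(t)^2.
t = np.arange(1024) / 1024.0
m = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
x = m * np.cos(2 * np.pi * 100 * t)
print(np.allclose(hilbert_envelope(x), m ** 2))
```

The example uses a modulation and a carrier that are both periodic in the analysis window, so the recovery of m(t)^2 is exact up to rounding. For real sub-band signals the envelope is only an estimate of the squared modulation.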

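Hilbert Envelope

[Figure panels: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]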

Spectrogram Comparison

[Figure panels: FDLP, PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

• CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

• Gain normalization deals with short- and long-term distortions within a single long-term frame
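Of the three techniques listed, CMS is the simplest to illustrate. The sketch below assumes utterance-level mean removal on a matrix of cepstral frames and models the channel as a constant cepstral offset. Both are simplifying assumptions, and the function name is hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """ceps: (n_frames, n_ceps). Removes the per-coefficient mean over all frames."""
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))
channel = rng.standard_normal(13)      # fixed distortion, additive in the cepstrum
distorted = clean + channel
print(np.allclose(cepstral_mean_subtraction(distorted),
                  cepstral_mean_subtraction(clean)))
```

A distortion that is the same in every frame is removed exactly. Reverberation that is longer than the analysis frame does not stay constant across frames in this way, which is the limitation that long-term subtraction and gain normalization address.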



Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Features

SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009

a Signalb Hilb Envc FDLP Envd Log compe Dyn comp

Noise Compensation in FDLP

Critical-band

Window

LinearPred

Signal

+Noise

DCT IDFT | |2 DFTFDLP

Env

When speech is corrupted with additive noise

y The noise component is additive in the non-parametric Hilbert envelope

domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Critical-band

Window

InputDCT IDFT | |2 DFTWiener

Filtering

VAD

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope

Noise Compensation in FDLP

SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Introduction

Time

Freq

uenc

y

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Introduction

Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modelingprocessing stages

Contextual information is typically processed with time-series models such as HMM

Page 50: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Properties of FDLP Analysis

[Figure panels: FDLP, Mel]

• Summarizing the gross temporal variation with a few parameters
  – Model order of FDLP controls the degree of smoothness
  – AR model captures perceptually important high-energy regions of the signal

• Suppressing reverberation artifacts
  – Reverberation is a long-term convolutive distortion

• Analysis in long-term windows and narrow sub-bands

Reverberation

When speech is corrupted with a convolutive distortion like room reverberation:

Clean speech * Room response = Reverberant speech

In the long-term DFT domain, this translates to a multiplication:

Clean DFT × Response DFT = Reverberant DFT

In narrow sub-bands, the response DFT is slowly varying.

The FDLP envelope of a band is an all-pole function of its AR parameters, scaled by a gain.

When the sub-band signal is multiplied by the response, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain set to unity.
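The equations on these slides did not survive transcription. A hedged reconstruction of the argument is sketched below. All symbols are assumptions introduced here, not the slides' own notation: s for clean speech, h for the room response, r for the reverberant speech, k for the sub-band index, A_k for the prediction-error (all-pole denominator) response of band k and G_k for its gain.

```latex
\begin{align*}
r(t) &= s(t) * h(t)
  && \text{(reverberation as a convolution)} \\
R(\omega) &= S(\omega)\, H(\omega)
  && \text{(long-term DFT domain)} \\
R_k(\omega) &\approx S_k(\omega)\, H_k
  && \text{($H$ nearly constant within narrow band $k$)} \\
\hat{E}_k(t) &= \frac{G_k}{\left| A_k(t) \right|^2}
  && \text{(FDLP envelope from the all-pole parameters)} \\
G_k &\;\longrightarrow\; |H_k|^2\, G_k
  && \text{(effect of the response; $A_k$ is unchanged)} \\
\hat{E}_k^{\mathrm{norm}}(t) &= \frac{1}{\left| A_k(t) \right|^2}
  && \text{(gain normalization, $G_k = 1$)}
\end{align*}
```

Under these assumptions, a response that is constant within the band only rescales the envelope, so dropping the gain removes it.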

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.
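The effect of gain normalization can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation. It assumes the all-pole fit is done with the autocorrelation method and the Levinson-Durbin recursion, and it treats the list `coeffs` as a stand-in for the DCT coefficients of one sub-band; all function names are hypothetical. Scaling the coefficients by a constant (a flat response inside a narrow band) changes only the gain, so the unit-gain envelopes coincide.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, gain): prediction-error filter [1, a1, ..., ap] and
    the minimum residual energy, from autocorrelation values r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def all_pole_fit(coeffs, order):
    """All-pole fit to a coefficient sequence (assumed sub-band DCT)."""
    return levinson_durbin(autocorrelation(coeffs, order), order)


def all_pole_envelope(a, gain, n_points):
    """Evaluate gain / |A|^2 at n_points angles in [0, pi)."""
    env = []
    for t in range(n_points):
        theta = math.pi * t / n_points
        resp = sum(c * cmath.exp(-1j * theta * i) for i, c in enumerate(a))
        env.append(gain / abs(resp) ** 2)
    return env


# Hypothetical sub-band coefficients and a distorted copy scaled by 0.5.
coeffs = [1.0, 0.6, -0.2, 0.4, 0.1, -0.5, 0.3, 0.2]
a_clean, g_clean = all_pole_fit(coeffs, 2)
a_dist, g_dist = all_pole_fit([0.5 * c for c in coeffs], 2)

# Gain-normalized envelopes: reconstruct with gain = 1 instead of g.
env_clean = all_pole_envelope(a_clean, 1.0, 16)
env_dist = all_pole_envelope(a_dist, 1.0, 16)
```

Here `g_dist` is one quarter of `g_clean` while the AR coefficients match, which is why the two normalized envelopes agree.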

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP with AM and FM outputs, Gain Norm, Quant; application branches: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int → Log + DCT → Feat]

• Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
• Integration along the frequency axis to convert to mel scale
• Sub-band energies are converted to cepstral coefficients by applying log and DCT along the frequency axis
• Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
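The integration and cepstral steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `env_rate_hz` (the sampling rate of the envelope), the choice of 13 cepstra (so that 13 static + 13 delta + 13 acceleration gives 39) and all function names are assumptions. The mel-scale integration across bands is omitted.

```python
import math


def integrate_envelope(envelope, env_rate_hz, win_ms=25.0, shift_ms=10.0):
    """Sum a sub-band envelope in 25 ms windows shifted by 10 ms."""
    win = int(round(env_rate_hz * win_ms / 1000.0))
    shift = int(round(env_rate_hz * shift_ms / 1000.0))
    return [sum(envelope[s:s + win])
            for s in range(0, len(envelope) - win + 1, shift)]


def dct2(x):
    """Unnormalized DCT-II of a list."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n)
                for i in range(n))
            for k in range(n)]


def cepstra(band_energies, n_ceps=13):
    """Log followed by DCT along the frequency (band) axis for one frame."""
    return dct2([math.log(e) for e in band_energies])[:n_ceps]


# A flat envelope sampled at an assumed 400 Hz: 10-sample windows, 4-sample shift.
frames = integrate_envelope([1.0] * 100, 400)
# One frame of 20 equal band energies: only the zeroth cepstrum is non-zero.
ceps = cepstra([2.0] * 20)
```

Applying `cepstra` to the band energies of every frame gives the static stream; delta and acceleration streams would then be computed across frames.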

Speech Recognition

• TIDIGITS Database (8 kHz)
  – Clean training data; test data can be clean or naturally reverberated
• HMM-GMM system
  – Whole-word HMM models trained on clean speech
  – Performance in terms of word error rate (WER)
• Features
  – PLP features with cepstral mean subtraction (CMS)
  – Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  – FDLP short-term (FDLP-S) features – 39 dim

[Bar chart: WER for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data; axis ticks 0, 10, 20]

S. Thomas, S. Ganapathy and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008.

Speaker Verification

• NIST 2008 Speaker Recognition Evaluation (SRE)
  – Has telephone speech and far-field speech
• GMM-UBM system
  – Trained on a large set of development speakers
  – Adapted on the enrollment data from the target speaker
  – Nuisance attribute projection (NAP) on supervectors
  – Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
• Features with warping [Pelecanos 2001]
  – Mel frequency cepstral coefficients (MFCCs)
  – FDLP short-term (FDLP-S) features

[Bar chart: DCF for MFCC and FDLP-S on Tel, Mic and Cross-domain conditions; axis ticks 10, 20, 30]

S. Ganapathy, J. Pelecanos and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011.
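As a small worked example of the detection cost function, the sketch below computes miss and false-alarm rates at a threshold and combines them. The cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) are assumptions chosen so that the weights come out as 0.1 and 0.99; the function names and the scores are hypothetical.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm probabilities at a decision threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Hypothetical trial scores, for illustration only.
p_miss, p_fa = error_rates([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.6, 0.05], 0.5)
dcf = detection_cost(p_miss, p_fa)
```

With these defaults a false alarm is weighted almost ten times more heavily than a miss, so the operating point is pushed toward few false acceptances.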

Modulation Features

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression → DCT (200 ms) → Sub-band Feat]

• Static compression is a logarithm – it reduces the huge dynamic range in the sub-band envelope
• Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]
• Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
• 14 static and 14 dynamic modulation coefficients (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim
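The static branch and the dimension count above can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the DCT is an unnormalized DCT-II, and the dynamic compression loops are not implemented. For a 200 ms segment, DCT-II coefficient k corresponds to a modulation frequency of k / (2 × 0.2 s) = 2.5k Hz, so 14 coefficients span 0 to 32.5 Hz, consistent with the 0-35 Hz range quoted.

```python
import math


def dct2(x, n_coef):
    """First n_coef coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n)
                for i in range(n))
            for k in range(n_coef)]


def static_modulation(envelope, n_coef=14, floor=1e-10):
    """Log-compress a sub-band envelope segment, then take its DCT."""
    log_env = [math.log(max(e, floor)) for e in envelope]
    return dct2(log_env, n_coef)


def dct_bin_hz(k, segment_s=0.2):
    """Modulation frequency of DCT-II coefficient k for a segment length."""
    return k / (2.0 * segment_s)


def feature_dim(n_bands=17, n_static=14, n_dynamic=14):
    """Total dimension when static and dynamic streams are stacked per band."""
    return n_bands * (n_static + n_dynamic)


# Hypothetical 200 ms envelope segment with a slow modulation.
segment = [1.0 + 0.5 * math.sin(0.3 * i) for i in range(80)]
static_feat = static_modulation(segment)
```

A dynamic stream of the same length per band would be stacked with `static_feat` to reach the 476 dimensions.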

Phoneme Recognition

• TIMIT Database (8 kHz)
  – Clean training data; test data can be clean, additive noise, reverberated or telephone channel
• Multi-layer perceptron (MLP) based system
  – MLPs estimate phoneme posteriors
  – Hidden Markov model (HMM) – MLP hybrid model
  – Performance in phoneme error rate (PER)
• Features
  – Perceptual linear prediction (PLP) – 9 frame context
  – Advanced ETSI standard [ETSI 2002] – 9 frame context
  – FDLP modulation (FDLP-M) features – 476 dim

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M on Clean, Add Noise, Reverb and Tel conditions; axis ticks 30, 45, 60, 75]

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

Audio Coding

[Block diagram: Input, QMF Analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr) paths, MDCT, Q, Q⁻¹, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Bar chart: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

• Introduction
• AR Model of Hilbert Envelopes
• FDLP and its Properties
• Applications
• Summary

Summary

• Employing AR modeling for estimating amplitude modulations
• Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum
• Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

• Simple mathematical analysis for AR model of Hilbert envelopes
• Investigating the resolution properties of FDLP
• Gain normalization of FDLP envelopes
• Short-term feature extraction using FDLP – improvements in reverberant speech recognition
• Modulation feature extraction – phoneme recognition in noisy speech
• Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

• Lab Buddies – Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
• Idiap personnel – Petr Motlicek, Joel Pinto, Mathew Doss
• IBM personnel – Jason Pelecanos, Mohamed Omar
• Others – Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

• Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]
• Psycho-physical evidences
  – Perceptual importance of modulation frequencies [Drullman et al. 1994]
  – Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:
• Speech intelligibility [Houtgast et al. 1980]
• RASTA processing [Hermansky et al. 1994]
• Speech recognition [Kingsbury et al. 1998]
• AM-FM decomposition [Kumaresan et al. 1999]
• Sound texture modeling [Athineos et al. 2003]
• Sound source separation [King et al. 2010]

Linear Prediction – Time Domain

• Current sample expressed as a linear combination of past samples

[Figure: samples n, n-1, n-2, n-3 weighted by coefficients a1, a2, a3]

• Model parameters are solved by minimizing the residual sum of squares
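The least-squares step can be illustrated with a short sketch. It assumes the autocorrelation method, in which minimizing the residual sum of squares leads to the normal equations R a = r, solved here by plain Gaussian elimination. The function names and the test signal are assumptions for illustration, and the sign convention is x_hat[n] = sum of a[k] x[n-k].

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    sol = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][j] * sol[j] for j in range(row + 1, n))
        sol[row] = acc / aug[row][row]
    return sol


def linear_prediction(x, order):
    """Return predictor coefficients a[0..order-1] (for lags 1..order)
    and the minimum residual sum of squares."""
    r = autocorrelation(x, order)
    toeplitz = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    a = solve(toeplitz, r[1:])
    residual = r[0] - sum(a[k] * r[k + 1] for k in range(order))
    return a, residual


# A decaying exponential is predicted from one past sample: x[n] = 0.9 x[n-1].
signal = [0.9 ** n for n in range(200)]
coeffs1, res1 = linear_prediction(signal, 1)
coeffs2, res2 = linear_prediction(signal, 2)
```

For this signal the first coefficient comes out close to 0.9, the second-order fit adds a coefficient close to zero, and the residual energy is close to 1 (the energy of the first sample, which has no past to predict it).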

AR model of Power Spectrum

• Filter interpretation [Makhoul 1975]: the prediction residual is the output of an inverse filter applied to the signal
• From Parseval's theorem, the residual sum of squares equals the integral of the residual power spectrum
• By definition, the residual power spectrum is the signal power spectrum multiplied by the squared magnitude response of the inverse filter; thus the parameters are solved by minimizing this integral
• Solution of the linear prediction yields an all-pole model of the power spectrum
• The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
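The equations for this derivation are missing from the transcript. A hedged sketch of the standard argument is given below. The notation is assumed, not taken from the slides: x(n) is the signal, a_k the predictor coefficients, p the model order, e(n) the residual with transform E(e^{jω}), and the calligraphic E its energy.

```latex
\begin{align*}
e(n) &= x(n) - \sum_{k=1}^{p} a_k\, x(n-k) \\
\mathcal{E} &= \sum_{n} e^2(n)
   = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| E(e^{j\omega}) \right|^2 d\omega
   && \text{(Parseval)} \\
P(\omega) &= \left| X(e^{j\omega}) \right|^2, \qquad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k} \\
\mathcal{E} &= \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)
   \left| A(e^{j\omega}) \right|^2 d\omega \\
\hat{P}(\omega) &= \frac{G}{\left| A(e^{j\omega}) \right|^2}, \qquad
G = \min_{a_1, \dots, a_p} \mathcal{E}
\end{align*}
```

FDLP applies the same argument with the roles of time and frequency exchanged, so that the all-pole model approximates the Hilbert envelope instead of the power spectrum.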

Hilbert Envelope - Definition

• Analytic signal is the sum of the signal and its quadrature component: x_a(t) = x(t) + j H[x(t)], where H denotes the Hilbert transform
• Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]
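The definition above can be tried on a toy signal. The sketch below is a minimal illustration that assumes the discrete analytic signal is built by zeroing the negative-frequency half of the DFT and doubling the positive half. A direct O(N²) DFT is used to keep the example dependency-free, and all function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a short sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Signal plus j times its Hilbert transform, via the DFT."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                 # keep DC and Nyquist as they are
        if k < (n + 1) // 2:
            spec[k] *= 2             # double positive frequencies
        else:
            spec[k] = 0              # remove negative frequencies
    return dft(spec, inverse=True)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(v) ** 2 for v in analytic_signal(x)]


# An amplitude-modulated tone: slow modulation m(t) times a faster carrier.
n = 64
modulation = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
am_tone = [m * math.cos(2 * math.pi * 8 * t / n)
           for t, m in enumerate(modulation)]
envelope = hilbert_envelope(am_tone)
```

For this signal the envelope equals the squared modulation, which is the sense in which the Hilbert envelope recovers the AM component of a sub-band signal.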

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure panels: FDLP, PLP]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

• Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization
• CMS assumes distortion in neighboring frames to be similar – suppresses short-term artifacts
• Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]
• Gain normalization deals with short- and long-term distortions within a single long-term frame
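As a small illustration of the first technique, the sketch below shows cepstral mean subtraction on a toy feature matrix. It assumes the usual reasoning that a distortion shorter than the analysis frame adds a constant offset to every cepstral frame; the function name and the numbers are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of an utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy cepstral frames, and the same frames with a constant channel offset.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
offset = [0.7, -0.3]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)
```

The two normalized matrices agree, which is the case CMS handles well. Reverberation longer than the frame does not reduce to a per-frame constant, which motivates the long-term alternatives listed above.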

Feature Comparison

Modulation Features

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, Critical-band Window, IDFT, | |², DFT, Linear Pred, FDLP Env]

• When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering driven by a VAD, DFT]

• A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise
• Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum", IEEE ASRU, 2009.
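The subtraction idea can be sketched in a few lines. This is a simplified illustration and not the method of the cited paper: it replaces the Wiener filtering stage with plain subtraction and a spectral-subtraction-style floor, assumes a stationary noise envelope, and takes the VAD decisions as given. The function names, the floor value and the numbers are hypothetical.

```python
def estimate_noise_envelope(envelope, is_speech):
    """Average the envelope over samples the VAD marks as non-speech."""
    noise = [e for e, s in zip(envelope, is_speech) if not s]
    return sum(noise) / len(noise)


def subtract_noise(envelope, noise_level, floor=0.01):
    """Subtract the noise estimate, keeping at least floor * envelope."""
    return [max(e - noise_level, floor * e) for e in envelope]


# Toy sub-band envelope: noise level 0.2, with speech in the middle.
noisy_env = [0.2, 0.2, 1.2, 2.2, 0.2]
vad = [False, False, True, True, False]

noise_level = estimate_noise_envelope(noisy_env, vad)
clean_env = subtract_noise(noisy_env, noise_level)
```

The floor keeps the compensated envelope positive, which matters because later stages take a logarithm or fit an all-pole model to it.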

Introduction

• Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: axes Time and Frequency]

• Spectrum is sampled at a preset rate before further modeling/processing stages
• Contextual information is typically processed with time-series models such as HMM

Page 51: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

CleanSpeech

RoomResponse = Revb

Speech

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the long-term DFT domain this translates

CleanSpeech

RoomResponse = Revb

Speech

xCleanDFT

ResponseDFT = Revb

DFT

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

[Block diagram: Input → DCT → Sub-band Window → FDLP → Gain Norm → Energy Int. → Log + DCT → Feat]

  • Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).
  • Integration along the frequency axis converts the sub-bands to the mel scale.
  • Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.
  • Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
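
The steps above can be sketched as follows. This is a minimal illustration that assumes the sub-band envelopes are already available on mel-spaced bands and sampled at the signal rate. The choice of 13 cepstral coefficients (so that static, delta and acceleration give 39 dimensions), the simple two-point delta, and the random stand-in envelopes are assumptions made for the example.

```python
import numpy as np

def integrate_envelopes(env, fs, win_ms=25.0, shift_ms=10.0):
    """Sum each band envelope over 25 ms windows with a 10 ms shift.

    env: array of shape (n_bands, n_samples). Returns (n_frames, n_bands).
    """
    win = int(round(fs * win_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    n_frames = 1 + (env.shape[1] - win) // shift
    return np.stack([env[:, i * shift : i * shift + win].sum(axis=1)
                     for i in range(n_frames)])

def cepstra(energies, n_ceps=13):
    """Log followed by a DCT along the frequency (band) axis."""
    n_bands = energies.shape[1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_bands))
    return np.log(energies + 1e-10) @ basis.T

def deltas(feat):
    """Two-point central difference along time, with edge frames repeated."""
    padded = np.pad(feat, ((1, 1), (0, 0)), mode="edge")
    return 0.5 * (padded[2:] - padded[:-2])

def short_term_features(env, fs, n_ceps=13):
    static = cepstra(integrate_envelopes(env, fs), n_ceps)
    delta = deltas(static)
    accel = deltas(delta)
    return np.hstack([static, delta, accel])

# Stand-in data: 24 hypothetical band envelopes, 1 s at 8 kHz.
rng = np.random.default_rng(0)
band_envelopes = rng.random((24, 8000)) + 0.1
features = short_term_features(band_envelopes, fs=8000)
```

Each row of `features` is one 10 ms frame; the columns hold the static cepstra followed by their delta and acceleration coefficients.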

Speech Recognition

  • TIDIGITS database (8 kHz): clean training data; test data can be clean or naturally reverberated.
  • HMM-GMM system: whole-word HMM models trained on clean speech; performance in terms of word error rate (WER).
  • Features:
      • PLP features with cepstral mean subtraction (CMS)
      • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
      • FDLP short-term (FDLP-S) features, 39 dim

Speech Recognition

[Figure: bar chart of WER for PLP-CMS, LTLSS and FDLP-S features on clean and reverberant test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

  • NIST 2008 speaker recognition evaluation (SRE): has telephone speech and far-field speech.
  • GMM-UBM system:
      • Trained on a large set of development speakers
      • Adapted on the enrollment data from the target speaker
      • Nuisance attribute projection (NAP) on supervectors
      • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss
  • Features with warping [Pelecanos 2001]:
      • Mel frequency cepstral coefficients (MFCCs)
      • FDLP short-term (FDLP-S) features
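
As a small worked example of the detection cost function quoted above, the sketch below evaluates 0.99 Pfa + 0.1 Pmiss and sweeps a decision threshold over a toy set of trial scores. The helper names and the scores are made up for illustration; this is not the official NIST scoring tool.

```python
def detection_cost(p_fa, p_miss):
    """Detection cost with the weights given on the slide."""
    return 0.99 * p_fa + 0.1 * p_miss

def min_dcf(target_scores, nontarget_scores):
    """Smallest cost over all thresholds; a trial is accepted when score >= threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # reject-everything operating point
    best = float("inf")
    for th in thresholds:
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        p_fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        best = min(best, detection_cost(p_fa, p_miss))
    return best

# Toy scores (hypothetical): higher means "more likely the target speaker".
targets = [2.0, 1.5, 0.2]
nontargets = [-1.0, 0.5, -0.3, 0.1]
cost = min_dcf(targets, nontargets)
```

With these weights a false alarm is far more costly than a miss, so the minimum tends to sit at a threshold that rejects borderline trials.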

Speaker Verification

[Figure: bar chart of DCF for MFCC and FDLP-S features under telephone (Tel), microphone (Mic) and cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

Modulation Feature Extraction

[Block diagram: Input → DCT → Critical-band Window → FDLP → Static and Dynamic compression (200 ms segments) → DCT → Sub-band Feat]

  • Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope.
  • Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].
  • Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.
  • 14 static and 14 dynamic modulation spectral components (0-35 Hz) from each of 17 sub-bands give a feature vector of 476 dimensions.
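
The pipeline above can be sketched as follows. The dynamic compression here is a simplified divider plus low-pass loop in the spirit of [Kollmeier 1999]; the five time constants, the envelope sampling rate of 400 Hz, and the random stand-in envelopes are assumptions made for illustration rather than values taken from the slides.

```python
import numpy as np

def dct_coefficients(x, n_coef):
    """First n_coef DCT-II coefficients of x (unnormalized)."""
    n = len(x)
    k = np.arange(n_coef)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def static_compress(env, floor=1e-8):
    """Static compression: logarithm of the envelope."""
    return np.log(np.maximum(env, floor))

def dynamic_compress(env, fs, time_constants=(0.005, 0.05, 0.129, 0.253, 0.5),
                     floor=1e-5):
    """Cascade of loops: divide by a low-pass filtered copy of the loop output."""
    out = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in time_constants:
        alpha = np.exp(-1.0 / (tau * fs))   # one-pole low-pass coefficient
        state = np.sqrt(out[0])             # steady state for a constant input
        stage = np.empty_like(out)
        for i, value in enumerate(out):
            stage[i] = value / state        # divider
            state = max(alpha * state + (1.0 - alpha) * stage[i], floor)
        out = np.maximum(stage, floor)
    return out

def modulation_features(band_envelopes, fs, n_coef=14):
    """Stack static and dynamic modulation components for every band."""
    feats = []
    for env in band_envelopes:
        feats.append(dct_coefficients(static_compress(env), n_coef))
        feats.append(dct_coefficients(dynamic_compress(env, fs), n_coef))
    return np.concatenate(feats)

# Stand-in data: 17 bands, one 200 ms segment at an assumed 400 Hz envelope rate.
rng = np.random.default_rng(0)
segment = rng.random((17, 80)) + 0.1
features = modulation_features(segment, fs=400)
```

The output has 17 × (14 + 14) = 476 entries, matching the dimensionality quoted above. For a constant input each loop settles at the square root of its input, which is what makes the loops behave like a compressor for slowly varying envelopes.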

Phoneme Recognition

  • TIMIT database (8 kHz): clean training data; test data can be clean, additive noise, reverberated or telephone channel.
  • Multi-layer perceptron (MLP) based system:
      • MLPs estimate phoneme posteriors
      • Hidden Markov model (HMM) - MLP hybrid model
      • Performance in phoneme error rate (PER)
  • Features:
      • Perceptual linear prediction (PLP), 9 frame context
      • Advanced ETSI standard [ETSI 2002], 9 frame context
      • FDLP modulation (FDLP-M) features, 476 dim

Phoneme Recognition

[Figure: bar chart of PER for PLP-9, ETSI-9 and FDLP-M features on clean, additive-noise, reverberant and telephone test data]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

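Audio Coding

[Block diagram: Input → QMF Analysis (bands 1 to 32) → FDLP → envelope (Env) and carrier (Carr); carrier → MDCT; quantization (Q) of both streams; inverse quantization (Q⁻¹); IMDCT of the carrier; multiplication (Mul) of envelope and carrier → QMF Synthesis (bands 1 to 32) → Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.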

Subjective Evaluations

[Figure: subjective listening scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

  • AR modeling is employed for estimating amplitude modulations.
  • Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum.
  • It provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals.

Our Contributions

  • Simple mathematical analysis for the AR model of Hilbert envelopes
  • Investigation of the resolution properties of FDLP
  • Gain normalization of FDLP envelopes
  • Short-term feature extraction using FDLP: improvements in reverberant speech recognition
  • Modulation feature extraction: phoneme recognition in noisy speech
  • Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

  • Lab buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
  • Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss
  • IBM personnel: Jason Pelecanos, Mohamed Omar
  • Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

  • Physiological evidence: spectro-temporal receptive fields [Shamma et al. 2001]
  • Psycho-physical evidence:
      • Perceptual importance of modulation frequencies [Drullman et al. 1994]
      • Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

  • The current sample is expressed as a linear combination of past samples.
  • Model parameters are solved by minimizing the residual sum of squares.

[Figure: sample n predicted from past samples n-1, n-2, n-3 with weights a1, a2, a3]
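
A minimal sketch of time-domain linear prediction follows. It solves for the prediction weights by ordinary least squares, which minimizes the residual sum of squares directly. The function name, the synthetic second-order process and its coefficients are assumptions for the example; with the prediction-error convention used elsewhere (leading coefficient 1), the AR parameters would be the negated weights.

```python
import numpy as np

def linear_prediction(x, order):
    """Least-squares weights w such that x[n] is approximated by sum_k w[k-1] * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # Column k-1 holds the past samples x[n-k] for every n >= order.
    past = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    target = x[order:]
    weights, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ weights
    return weights, float(residual @ residual)

# Synthetic AR(2) process with known weights 1.5 and -0.7, driven by unit-variance noise.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
signal = np.zeros(20000)
for n in range(2, 20000):
    signal[n] = 1.5 * signal[n - 1] - 0.7 * signal[n - 2] + excitation[n]

weights, rss = linear_prediction(signal, order=2)
```

On this synthetic signal the estimated weights land close to the generating values, and the residual sum of squares per sample approaches the variance of the excitation.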

AR model of Power Spectrum

  • Filter interpretation [Makhoul 1975]: the prediction residual is the output of an inverse (prediction-error) filter applied to the signal.
  • From Parseval's theorem, the residual sum of squares can be written in the frequency domain, so the parameters are solved by minimizing the integrated ratio of the signal power spectrum to the model power spectrum.
  • Solution of the linear prediction yields an all-pole model of the power spectrum.
  • Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).

[Figure: AR model of power spectrum]
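
The chain of steps above can be written out as a short derivation. This is a sketch following the usual presentation in [Makhoul 1975]; the symbols $x[n]$, $e[n]$, $a_k$, $A$, $P(\omega)$ and $\hat{P}(\omega)$ are notation assumed here.

```latex
\begin{align}
e[n] &= x[n] + \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(e^{j\omega}) = 1 + \sum_{k=1}^{p} a_k e^{-j\omega k} \\
E_p = \sum_{n} e^2[n]
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \lvert E(\omega)\rvert^2 \, d\omega
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\, \lvert A(e^{j\omega})\rvert^2 \, d\omega \\
\hat{P}(\omega) &= \frac{G}{\lvert A(e^{j\omega})\rvert^2}
  \quad\Longrightarrow\quad
  E_p = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega \\
G &= \min_{a_1,\ldots,a_p} E_p
\end{align}
```

Here $P(\omega) = \lvert X(\omega)\rvert^2$ is the power spectrum of the signal. The first line is the filter interpretation, the second applies Parseval's theorem, the third substitutes the all-pole model, and the last line is the gain stated on the slide.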

Hilbert Envelope - Definition

  • The analytic signal is the sum of the signal and its quadrature component: $x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, where $\mathcal{H}$ denotes the Hilbert transform.
  • The Hilbert envelope is the squared magnitude of the analytic signal: $|x_a(t)|^2$.
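
The definition above can be checked numerically with a minimal sketch. It builds the analytic signal with the standard FFT construction (negative frequencies zeroed, positive frequencies doubled) and applies it to a synthetic amplitude-modulated tone; the modulation and carrier frequencies are arbitrary choices for the example.

```python
import numpy as np

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal, computed through the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC
    weights[1 : (n + 1) // 2] = 2.0       # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin for even lengths
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    return np.abs(analytic) ** 2

# A 1 kHz carrier modulated by a slow 5 Hz envelope, 1 s at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)
x = modulation * np.cos(2 * np.pi * 1000 * t)

envelope = hilbert_envelope(x)
```

Because every spectral component of this test signal falls on an exact FFT bin, the computed envelope matches the square of the modulation to numerical precision.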

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms obtained from FDLP and from PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization:

  • CMS assumes the distortion in neighboring frames to be similar, so it suppresses short-term artifacts.
  • Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].
  • Gain normalization deals with short- and long-term distortions within a single long-term frame.
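
As an illustration of the first item, here is a minimal sketch of cepstral mean subtraction. It relies on the usual assumption that a channel much shorter than the analysis frame adds the same constant vector to every frame in the cepstral domain; the random cepstra and channel offset are stand-in data.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames; cepstra is (n_frames, n_ceps)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Stand-in cepstra and a fixed channel offset that is identical in every frame.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))
channel_offset = rng.standard_normal(13)
distorted = clean + channel_offset

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

The offset disappears after normalization. When the room response is longer than the frame, the distortion is no longer a constant offset per frame, which is the case the long-term methods listed above are meant to address.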

Feature Comparison

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

Noise Compensation in FDLP

[Block diagram: Signal + Noise → DCT → Critical-band Window → IDFT → | |² → DFT → Linear Pred → FDLP Env]

  • When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated).

[Block diagram blocks: Input, DCT, Critical-band Window, IDFT, | |², Wiener Filtering (with VAD), DFT]

  • A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise.
  • Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope.

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum," IEEE ASRU, 2009.
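
The envelope subtraction idea can be sketched as follows. This is a simplified illustration and not the Wiener-filtering scheme of the cited paper: the VAD decision is replaced by a known non-speech mask, the noise envelope is taken as a single average level, and a small floor keeps the result positive. The signal, noise level and helper names are assumptions for the example.

```python
import numpy as np

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal, computed through the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1 : (n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights)) ** 2

def subtract_noise_envelope(noisy_env, nonspeech_mask, floor=1e-3):
    """Estimate the noise envelope level in non-speech regions and subtract it."""
    noise_level = noisy_env[nonspeech_mask].mean()
    cleaned = np.maximum(noisy_env - noise_level, floor * noisy_env)
    return cleaned, noise_level

# A tone burst stands in for speech; white noise is added over the whole signal.
rng = np.random.default_rng(0)
n = 8000
t = np.arange(n)
speech = np.zeros(n)
speech[2000:6000] = np.sin(2 * np.pi * 0.125 * t[2000:6000])
noise = 0.3 * rng.standard_normal(n)

clean_env = hilbert_envelope(speech)
noisy_env = hilbert_envelope(speech + noise)

nonspeech = np.ones(n, dtype=bool)   # idealized VAD output (assumed known here)
nonspeech[2000:6000] = False
cleaned_env, noise_level = subtract_noise_envelope(noisy_env, nonspeech)

# Compare average envelope levels well inside the speech region.
interior = slice(2200, 5800)
err_before = abs(noisy_env[interior].mean() - clean_env[interior].mean())
err_after = abs(cleaned_env[interior].mean() - clean_env[interior].mean())
```

Since signal and noise are uncorrelated, the cross term in the noisy envelope averages out, so removing the estimated noise level brings the average envelope in the speech region closer to the clean one.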

Introduction

[Figure: time-frequency plane (axes: Time, Frequency)]

  • Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms).
  • The spectrum is sampled at a preset rate before further modeling/processing stages.
  • Contextual information is typically processed with time-series models such as HMMs.

Page 52: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Reverberation

When speech is corrupted with convolutive distortion like room reverberation, the distortion in the long-term DFT domain translates to a multiplication.

[Figure: Clean Speech (convolved with) Room Response = Reverberant Speech; Clean DFT × Response DFT = Reverberant DFT]


Reverberation

When speech is corrupted with a convolutive distortion such as room reverberation, the distortion becomes a multiplication in the DFT domain, and the same holds within each sub-band.

In narrow bands, the frequency response of the distortion is slowly varying.

The FDLP envelope of a band is given by its all-pole parameters and a gain term.

When the sub-band signal is multiplied by such a slowly varying response, the gain is modified.

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with the gain normalized.
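A minimal sketch of the gain-normalization idea, assuming one common FDLP formulation (linear prediction applied to the DCT of the signal). The function names, the model order of 10, the diagonal loading, and the full-band (rather than sub-band) analysis are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

def dct2(x):
    """Plain DCT-II via a cosine matrix (O(N^2), fine for a short sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def fdlp_envelope(x, order=10, normalize_gain=True):
    """All-pole (FDLP) approximation of the temporal envelope of x.

    Linear prediction is run on the DCT of the signal, so the all-pole
    "spectrum" is a function of time. Returns (envelope, gain).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct2(x)
    # Biased autocorrelation of the DCT sequence, lags 0..order.
    r = np.array([c[: n - k] @ c[k:] for k in range(order + 1)])
    # Normal equations with light diagonal loading for numerical stability.
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx] + 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - a @ r[1:]              # minimum residual energy G
    # Evaluate 1 / |A|^2 at theta = pi * (t + 0.5) / n, one point per sample.
    theta = np.pi * (np.arange(n) + 0.5) / n
    lags = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, lags)) @ a
    env = 1.0 / np.abs(A) ** 2
    return (env if normalize_gain else gain * env), gain

# A click at sample 100, and the same click scaled by a constant factor.
x = np.zeros(400)
x[100] = 1.0
env_a, gain_a = fdlp_envelope(x)
env_b, gain_b = fdlp_envelope(4.0 * x)
```

In this toy case the constant scaling changes only the gain (by the square of the factor), while the gain-normalized envelope is unchanged; a slowly varying sub-band response is the approximate analogue of this constant.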

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction, AR Model of Hilbert Envelopes, FDLP and its Properties, Applications, Summary

Outline of Applications

[Block diagram blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Application branches: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms).

Integration along the frequency axis converts the bands to the mel scale.

Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis.

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features.
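The steps above can be sketched as follows. This is a rough illustration, assuming the sub-band envelopes are already available on the desired frequency scale; the function names, the choice of 13 cepstral coefficients (so that 13 x 3 = 39), and the simple gradient-based deltas are assumptions, and the mel integration step is omitted.

```python
import numpy as np

def short_term_features(envelopes, fs, win_ms=25.0, hop_ms=10.0, n_ceps=13):
    """envelopes: array (n_bands, n_samples) of sub-band temporal envelopes."""
    envelopes = np.asarray(envelopes, dtype=float)
    n_bands, n_samples = envelopes.shape
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (n_samples - win) // hop
    # Integrate each envelope over 25 ms windows shifted by 10 ms.
    energies = np.stack(
        [envelopes[:, i * hop : i * hop + win].sum(axis=1) for i in range(n_frames)]
    )                                         # (n_frames, n_bands)
    log_e = np.log(energies + 1e-12)
    # DCT-II along the frequency (band) axis gives cepstral coefficients.
    b = np.arange(n_bands)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), b + 0.5) / n_bands)
    return log_e @ basis.T                    # (n_frames, n_ceps)

def add_deltas(ceps):
    """Append simple first and second differences (delta and acceleration)."""
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])

# One second of flat envelopes in 20 bands at 8 kHz.
env = np.ones((20, 8000))
feats = add_deltas(short_term_features(env, fs=8000))
```

With flat envelopes only the zeroth cepstral coefficient is non-zero, which is a quick sanity check on the log and DCT steps.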

Speech Recognition

TIDIGITS database (8 kHz). Clean training data; test data can be clean or naturally reverberated.

HMM-GMM system: whole-word HMM models trained on clean speech. Performance is reported in terms of word error rate (WER).

Features: PLP features with cepstral mean subtraction (CMS); long-term log spectral subtraction (LTLSS) [Gelbart 2002]; FDLP short-term (FDLP-S) features, 39 dim.

[Bar chart: WER for PLP-CMS, LTLSS, and FDLP-S on clean and reverberant test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE), which has telephone speech and far-field speech.

GMM-UBM system: trained on a large set of development speakers; adapted on the enrollment data from the target speaker; nuisance attribute projection (NAP) on supervectors. Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss.

Features with warping [Pelecanos 2001]: mel frequency cepstral coefficients (MFCCs); FDLP short-term (FDLP-S) features.

[Bar chart: DCF for MFCC and FDLP-S in telephone (Tel), microphone (Mic), and cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
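As a small worked example of the detection cost function quoted above, the sketch below computes miss and false-alarm rates at a threshold and combines them. The cost parameters (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are assumed here because they reduce to the weights 0.1 and 0.99; the function names and the toy scores are hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_target * P_miss + C_fa * (1 - P_target) * P_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def error_rates(target_scores, impostor_scores, threshold):
    """Miss rate on target trials and false-alarm rate on impostor trials."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa

# Toy scores: one target falls below the threshold, one impostor above it.
p_miss, p_fa = error_rates([2.0, 1.0, -0.5, 3.0], [-1.0, 0.5, -2.0, -3.0], 0.0)
dcf = detection_cost(p_miss, p_fa)
```

With these weights a false alarm costs roughly ten times as much as a miss, so the cost is dominated by the false-alarm rate.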

Modulation Features

Modulation Feature Extraction

[Block diagram blocks: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope.

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions (2 x 14 x 17).
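The feature assembly above can be sketched as below. This is an illustration only: the function names and the envelope sampling rate of 400 Hz (80 samples per 200 ms segment) are assumptions, and the dynamic compression loops are not implemented, so the dynamic path simply takes envelopes that are assumed to be compressed already.

```python
import numpy as np

def first_dct_terms(x, n_coeffs=14):
    """First n_coeffs DCT-II terms of a 1-D segment."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n) + 0.5) / n)
    return basis @ x

def modulation_features(static_in, dynamic_in, n_coeffs=14):
    """Stack static and dynamic modulation terms over all sub-bands.

    static_in  : (n_bands, n_samples) raw envelopes for one 200 ms segment
    dynamic_in : (n_bands, n_samples) envelopes after a dynamic compression
                 stage that is assumed to exist elsewhere
    """
    feats = []
    for s, d in zip(static_in, dynamic_in):
        feats.append(first_dct_terms(np.log(s + 1e-12), n_coeffs))  # static (log)
        feats.append(first_dct_terms(d, n_coeffs))                  # dynamic
    return np.concatenate(feats)

rng = np.random.default_rng(0)
static_env = rng.random((17, 80)) + 0.1     # 17 sub-bands, 200 ms at 400 Hz
dynamic_env = rng.random((17, 80)) + 0.1
feat = modulation_features(static_env, dynamic_env)
```

For a 200 ms segment the DCT-II basis functions are spaced 1 / (2 x 0.2 s) = 2.5 Hz apart, so 14 terms span 0 to 32.5 Hz, which is consistent with the 0-35 Hz range quoted on the slide.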

Phoneme Recognition

TIMIT database (8 kHz). Clean training data; test data can be clean, corrupted by additive noise, reverberated, or passed through a telephone channel.

Multi-layer perceptron (MLP) based system: MLPs estimate phoneme posteriors in a hidden Markov model (HMM) - MLP hybrid model. Performance is reported in phoneme error rate (PER).

Features: perceptual linear prediction (PLP) with 9-frame context; advanced ETSI standard [ETSI 2002] with 9-frame context; FDLP modulation (FDLP-M) features, 476 dim.

[Bar chart: PER for PLP-9, ETSI-9, and FDLP-M on clean, additive-noise, reverberant, and telephone test data]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

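Audio Coding

[Block diagram blocks: Input, QMF Analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr) paths, MDCT, quantizers Q, inverse quantizers Q^-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.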

Subjective Evaluations

[Bar chart: subjective scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Overview

Introduction, AR Model of Hilbert Envelopes, FDLP and its Properties, Applications, Summary

Summary

AR modeling is employed for estimating amplitude modulations.

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum.

It provides an AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals.

Our Contributions

Simple mathematical analysis for the AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence: spectro-temporal receptive fields [Shamma et al. 2001].

Psycho-physical evidence: perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995].

Applications

Modulation spectra have been used in the past for:
Speech intelligibility [Houtgast et al. 1980]
RASTA processing [Hermansky et al. 1994]
Speech recognition [Kingsbury et al. 1998]
AM-FM decomposition [Kumaresan et al. 1999]
Sound texture modeling [Athineos et al. 2003]
Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

The current sample is expressed as a linear combination of past samples.

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares.
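A minimal sketch of time-domain linear prediction as a least-squares problem. The function name `linear_prediction` and the covariance-style formulation (predicting only samples that have a full set of past samples) are choices made here for illustration; the slides do not say which LP variant is used.

```python
import numpy as np

def linear_prediction(x, order):
    """Least-squares LP: x[n] ~ sum_k a[k] * x[n-k], k = 1..order."""
    x = np.asarray(x, dtype=float)
    # Row for sample n holds the previous samples x[n-1], ..., x[n-order].
    past = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    target = x[order:]
    a, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ a
    return a, float(residual @ residual)

# Noiseless second-order recursion: x[n] = 1.5 x[n-1] - 0.8 x[n-2].
x = [1.0, 0.5]
for _ in range(48):
    x.append(1.5 * x[-1] - 0.8 * x[-2])
coeffs, err = linear_prediction(x, order=2)
```

Because the toy signal follows an exact second-order recursion, the recovered weights match the recursion coefficients and the residual sum of squares is essentially zero.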

AR Model of Power Spectrum

Filter interpretation [Makhoul 1975].

From Parseval's theorem, the residual energy can be expressed in the frequency domain.

By definition of the power spectrum and of the all-pole model, the parameters are thus solved by minimizing this frequency-domain error.

Solution of the linear prediction yields an all-pole model of the power spectrum.

The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares).
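The equations on these slides did not survive extraction. As a hedged reconstruction, the standard spectral-matching form of linear prediction found in the literature is sketched below; the symbols $e[n]$, $A(\omega)$, $P(\omega)$, and $\hat{P}(\omega)$ are notation assumed here, not recovered from the slides.

```latex
\begin{align}
  e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
  \qquad A(\omega) = 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \\
  \sum_{n} e^2[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \lvert E(\omega)\rvert^2 \, d\omega
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\lvert A(\omega)\rvert^2 \, d\omega,
  \qquad P(\omega) = \lvert X(\omega)\rvert^2 \\
  \hat{P}(\omega) &= \frac{G}{\lvert A(\omega)\rvert^2}
  \quad\Longrightarrow\quad
  \sum_{n} e^2[n] = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega
\end{align}
```

Under this form, minimizing the residual sum of squares is equivalent to minimizing the integrated ratio of the signal power spectrum to its all-pole model.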

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component, where the quadrature component is obtained with the Hilbert transform H.

The Hilbert envelope is the squared magnitude of the analytic signal.
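The definition above can be sketched with a DFT-based analytic signal. The function names are hypothetical, and the discrete construction (doubling positive frequencies and zeroing negative ones) is the usual numerical stand-in for the continuous Hilbert transform rather than something stated on the slide.

```python
import numpy as np

def analytic_signal(x):
    """x + j*H[x], built by zeroing the negative-frequency half of the DFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spec * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Slow modulation m(t) times a fast carrier, both periodic in the window.
t = np.arange(1000) / 1000.0
m = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
x = m * np.cos(2 * np.pi * 100 * t)
env = hilbert_envelope(x)
```

For this AM signal the Hilbert envelope equals the squared modulation, which is the quantity that FDLP then approximates with an all-pole model.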

Hilbert Envelope

[Figure: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS), and gain normalization.

CMS assumes the distortion in neighboring frames to be similar; it suppresses short-term artifacts.

Long-term subtraction deals with reverberation by assuming the same response over a window of long-term frames [Gelbart 2002].

Gain normalization deals with short- and long-term distortions within a single long-term frame.
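To illustrate the CMS assumption named above, the sketch below shows that a distortion which adds the same offset to every cepstral frame is removed by subtracting the per-coefficient mean. The function name, array shapes, and random toy data are assumptions for illustration only.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """Subtract the per-coefficient mean over frames; ceps is (n_frames, n_ceps)."""
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean_ceps = rng.standard_normal((50, 13))
# A fixed short channel shows up as a constant offset in the cepstral domain.
channel_offset = rng.standard_normal(13)
distorted_ceps = clean_ceps + channel_offset

cms_clean = cepstral_mean_subtraction(clean_ceps)
cms_distorted = cepstral_mean_subtraction(distorted_ceps)
```

The two normalized feature sets coincide here because the offset is identical in every frame; when the distortion differs across frames, as with long reverberation, this equality no longer holds.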

Feature Comparison

Page 54: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Reverberation

When speech is corrupted with convolutive distortion like room reverberation

In the DFT domain this translates to a multiplication In the sub-band

In narrow bands is slowly varying

FDLP envelope of band using all-pole parameters hellip is given by

When the sub-band signal is multiplied by the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with

Gain Normalization in FDLP

Gain Normalization in FDLPWithout gain norm

With gain norm

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
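The definition above can be sketched numerically as follows. The analytic signal is built in the DFT domain by zeroing negative frequencies and doubling positive ones, and the envelope is its squared magnitude. The function names and the amplitude-modulated test tone are illustrative assumptions, not material from the slides.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], computed with the DFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1 : n // 2] = 2.0         # double positive frequencies
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Hypothetical test signal: slow modulation m(t) on a 1 kHz carrier
fs = 8000
t = np.arange(fs) / fs
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = m * np.cos(2 * np.pi * 1000 * t)

env = hilbert_envelope(x)
print(np.max(np.abs(env - m ** 2)))
```

For this test tone the modulation and carrier are well separated in frequency, so the envelope recovers the squared modulation.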

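Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP, linear prediction in time and in frequency]

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.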

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
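To make the CMS idea concrete, here is a minimal sketch. It assumes the channel is short relative to the analysis frame, so that the convolutive distortion appears as a constant additive offset in every cepstral frame. The function name and the random test data are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames (rows are frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical data: 200 frames of 13 cepstral coefficients
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))
channel_offset = rng.standard_normal(13)   # fixed channel seen as an additive cepstral offset
distorted = clean + channel_offset

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
print(np.max(np.abs(normalized_clean - normalized_distorted)))
```

Under that assumption the offset cancels exactly. Long reverberation violates it, which is the case the long-term subtraction and gain normalization items address.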

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT, DCT, 200 ms, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.
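The envelope subtraction step can be sketched as follows. This is a simplified stand-in: it uses plain subtraction with a floor rather than the Wiener filtering block shown in the diagram, it assumes a stationary noise envelope, and it takes the VAD decision as a given boolean mask. The function name, the `floor` parameter, and the synthetic envelopes are assumptions made for illustration.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=1e-3):
    """Estimate the noise envelope from non-speech samples and subtract it.

    speech_mask is True where the (assumed) VAD marks speech.
    Returns (cleaned envelope, noise estimate).
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    noise_est = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_est
    # Floor keeps the envelope positive, which later log compression needs
    return np.maximum(cleaned, floor * noisy_env), noise_est

# Hypothetical sub-band envelope: speech in the middle, constant noise everywhere
n = 100
speech_mask = np.zeros(n, dtype=bool)
speech_mask[30:70] = True
speech_env = np.where(speech_mask, 1.0 + 0.5 * np.sin(np.linspace(0, np.pi, n)), 0.0)
noisy_env = speech_env + 0.2

cleaned, noise_est = subtract_noise_envelope(noisy_env, speech_mask)
print(noise_est)
```

The additivity of noise in the envelope domain is what makes a subtraction of this kind reasonable when signal and noise are uncorrelated.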

Introduction

Conventional signal analysis starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 55: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Gain Normalization in FDLP

FDLP envelope of a band using all-pole parameters ... is given by ...

When the sub-band signal is multiplied by ..., the gain is modified

Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with ...

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.
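The following sketch illustrates the idea under stated assumptions. The envelope is modeled as G / |A|^2, where the all-pole parameters come from linear prediction applied to the DCT of the signal, and gain normalization is taken here to mean reconstructing the envelope with G = 1. That choice, the use of an unnormalized DCT-II, the autocorrelation method, and all function names are assumptions of this example, since the slide formulas are not preserved in the transcript.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II (explicit matrix, adequate for short segments)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def lp_autocorr(c, order):
    """Linear prediction on the sequence c via autocorrelation normal equations."""
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])
    return a, gain

def fdlp_envelope(x, order=10, n_points=200, gain_normalize=False):
    """All-pole temporal envelope from LP on DCT coefficients."""
    c = dct_ii(np.asarray(x, dtype=float))
    a, gain = lp_autocorr(c, order)
    theta = np.pi * np.arange(n_points) / n_points   # 0..pi spans the segment in time
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, k)) @ a
    g = 1.0 if gain_normalize else gain              # assumption: normalization sets G = 1
    return g / np.abs(A) ** 2

# Hypothetical test signal: a noise burst, then the same signal scaled by 4
rng = np.random.default_rng(0)
n = 400
burst = 0.05 + np.exp(-0.5 * ((np.arange(n) - 280) / 20.0) ** 2)
x = burst * rng.standard_normal(n)

env = fdlp_envelope(x)
env_scaled = fdlp_envelope(4.0 * x)
env_norm = fdlp_envelope(x, gain_normalize=True)
env_scaled_norm = fdlp_envelope(4.0 * x, gain_normalize=True)
```

Scaling the input changes only the gain term, so the normalized envelopes of `x` and `4.0 * x` coincide while the unnormalized ones differ by the squared scale factor.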

Overview

Introduction

AR Model of Hilbert Envelopes

FDLP and its Properties

Applications

Summary

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; application branches: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39-dim features, similar to conventional MFCC features
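The last two steps can be sketched as below. The split of the 39 dimensions into 13 static, 13 delta and 13 acceleration coefficients, the 24 input bands, the regression-style delta over two frames on each side, and all function names are assumptions chosen to mirror common MFCC practice, not details given on the slides.

```python
import numpy as np

def dct_ii_matrix(n_out, n_in):
    """First n_out rows of an unnormalized DCT-II matrix for length-n_in input."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n_in))

def deltas(feat, width=2):
    """Regression-based time derivative over +/- width frames (edges repeated)."""
    n_frames = len(feat)
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    num = sum(
        w * (padded[width + w : width + w + n_frames]
             - padded[width - w : width - w + n_frames])
        for w in range(1, width + 1)
    )
    den = 2 * sum(w * w for w in range(1, width + 1))
    return num / den

def short_term_features(band_energies, n_ceps=13):
    """Log + DCT along frequency, then append delta and acceleration."""
    log_e = np.log(np.maximum(band_energies, 1e-10))
    ceps = log_e @ dct_ii_matrix(n_ceps, band_energies.shape[1]).T
    d = deltas(ceps)
    dd = deltas(d)
    return np.hstack([ceps, d, dd])

# Hypothetical input: 100 frames of 24 integrated sub-band energies
rng = np.random.default_rng(0)
band_energies = rng.random((100, 24)) + 0.1
features = short_term_features(band_energies)
print(features.shape)

# Sanity check of the delta operator on a linear ramp with slope 2
ramp_delta = deltas(np.arange(10, dtype=float)[:, None] * 2.0)
```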

Speech Recognition

TIDIGITS Database (8 kHz)

Clean training data; test data can be clean or naturally reverberated

HMM-GMM system

Whole-word HMM models trained on clean speech

Performance in terms of word error rate (WER)

Features

PLP features with cepstral mean subtraction (CMS)

Long term log spectral subtraction (LTLSS) [Gelbart 2002]

FDLP short-term (FDLP-S) features - 39 dim

[Figure: bar chart of WER (axis ticks 0, 10, 20) for Clean and Reverb test data; PLP-CMS, LTLSS, FDLP-S]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system

Trained on a large set of development speakers

Adapted on the enrollment data from the target speaker

Nuisance attribute projection (NAP) on supervectors

Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]

Mel Frequency Cepstral Coefficients (MFCCs)

FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis ticks 10, 20, 30) for Tel, Mic and Cross domain conditions; MFCC, FDLP-S]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
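The detection cost function quoted above can be evaluated as in this small sketch. The weights 0.99 and 0.1 are reproduced by the assumed cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01. Those parameter values, the function names, and the toy scores are assumptions for illustration and are not evaluation results.

```python
import numpy as np

def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates at a given decision threshold."""
    p_miss = np.mean(np.asarray(target_scores) < threshold)
    p_fa = np.mean(np.asarray(nontarget_scores) >= threshold)
    return p_miss, p_fa

def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_target * P_miss + C_fa * (1 - P_target) * P_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Toy scores (hypothetical)
target_scores = [2.0, 1.5, 0.2, 3.0]
nontarget_scores = [-1.0, 0.5, -0.3, 1.8, -2.0]

p_miss, p_fa = error_rates(target_scores, nontarget_scores, threshold=1.0)
dcf = detection_cost(p_miss, p_fa)
print(p_miss, p_fa, dcf)
```

With these weights a false alarm costs roughly ten times as much as a miss at equal error rates, so the operating point favors low false-alarm rates.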

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT, DCT, 200 ms, Sub-band Feat]

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
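A rough sketch of this feature assembly is given below. The dynamic branch is reduced to a single divider plus first-order low-pass loop, which is a simplification of the multi-loop scheme cited on the slide. The envelope sampling rate (400 Hz, giving 80 samples per 200 ms), the loop constant `alpha`, and all function names are assumptions. With a 200 ms window, DCT-II bin k sits near 2.5k Hz, so the first 14 bins span roughly 0 to 35 Hz.

```python
import numpy as np

def dct_ii(x, n_out):
    """First n_out unnormalized DCT-II coefficients of x."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def static_compression(env, floor=1e-8):
    """Logarithmic compression of the envelope."""
    return np.log(np.maximum(env, floor))

def dynamic_compression(env, alpha=0.9, floor=1e-8):
    """One adaptation loop: divide by a low-pass filtered copy of the output."""
    out = np.empty(len(env))
    state = max(np.sqrt(env[0]), floor)
    for i, value in enumerate(env):
        out[i] = value / state
        state = max(alpha * state + (1.0 - alpha) * out[i], floor)
    return out

def modulation_features(envelopes, n_mod=14):
    """Concatenate static and dynamic modulation spectra over all sub-bands."""
    parts = []
    for env in envelopes:
        parts.append(dct_ii(static_compression(env), n_mod))
        parts.append(dct_ii(dynamic_compression(env), n_mod))
    return np.concatenate(parts)

# Hypothetical input: 17 sub-band envelopes, 80 samples each (200 ms at 400 Hz)
rng = np.random.default_rng(0)
envelopes = rng.random((17, 80)) + 0.1
feat = modulation_features(envelopes)
print(feat.shape)
```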

Phoneme Recognition

TIMIT Database (8 kHz)

Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors

Hidden Markov model (HMM) - MLP hybrid model

Performance in phoneme error rate (PER)

Features

Perceptual linear prediction (PLP) - 9 frame context

Advanced ETSI standard [ETSI 2002] - 9 frame context

FDLP modulation (FDLP-M) features - 476 dim

[Figure: bar chart of PER (axis ticks 30, 45, 60, 75) for Clean, Add Noise, Reverb and Tel conditions; PLP-9, ETSI-9, FDLP-M]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

Audio Coding

[Block diagram: Input, QMF Analysis (bands 1 to 32), FDLP, Env and Carr, MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech and Language Proc., 2010.

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Page 56: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Gain Normalization in FDLP

[Figure: FDLP envelopes without gain normalization and with gain normalization]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP: improvements in reverberant speech recognition

Modulation feature extraction: phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies: Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel: Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel: Jason Pelecanos, Mohamed Omar

Others: Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

Physiological evidence: spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence: perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past

Speech intelligibility [Houtgast et al. 1980]

RASTA processing [Hermansky et al. 1994]

Speech recognition [Kingsbury et al. 1998]

AM-FM decomposition [Kumaresan et al. 1999]

Sound texture modeling [Athineos et al. 2003]

Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n-1, n-2 and n-3, weighted by a1, a2 and a3, predict sample n]

Model parameters are solved by minimizing the residual sum of squares
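A minimal sketch of time-domain linear prediction, assuming the autocorrelation method with the Levinson-Durbin recursion (the slides do not say which solver was used). The function names and the test signal are illustrative only; the standard library is used to keep the block self-contained.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor x_hat[n] = sum_k a[k] * x[n-k].

    Returns (a, err): coefficients a[1..order] and the minimum residual
    sum of squares.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


# Illustrative signal: a decaying exponential, which one past sample predicts well.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
coeffs, residual = levinson_durbin(r, 2)
```

For this signal the first coefficient comes out close to 0.9 and the second close to zero, since each sample is 0.9 times the previous one.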

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
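The equations on these slides did not survive extraction. As a hedged reconstruction in the standard notation of linear prediction (the symbols p, a_k, e[n], E and the hat notation are assumptions, not taken from the slides), the predictor, its residual and the resulting all-pole model of the power spectrum can be written as:

```latex
\begin{aligned}
\hat{x}[n] &= \sum_{k=1}^{p} a_k\, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n], \qquad
E = \sum_{n} e[n]^2 \\
\hat{P}(\omega) &= \frac{G}{\left| 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k} \right|^{2}}, \qquad
G = \min_{a_1,\dots,a_p} E
\end{aligned}
```

Here G is the minimum residual sum of squares, matching the description of the numerator given above.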

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
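The definition can be illustrated with a small sketch that builds the analytic signal by zeroing the negative-frequency DFT bins and then takes the squared magnitude. This is an assumption about implementation, not the author's code; a plain O(N^2) DFT is used so that only the standard library is needed, and the test signal (a slow modulator times a cosine carrier) is illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)), forward or inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x):
    """Return x + j*H[x] by suppressing negative-frequency bins."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue          # DC and Nyquist bins are kept unchanged
        elif k < (n + 1) // 2:
            spec[k] *= 2      # positive frequencies are doubled
        else:
            spec[k] = 0       # negative frequencies are removed
    return dft(spec, inverse=True)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]


# Illustrative AM signal: slow modulator (bin 2) on a cosine carrier (bin 16).
N = 64
modulator = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
signal = [m * math.cos(2 * math.pi * 16 * n / N) for n, m in enumerate(modulator)]
envelope = hilbert_envelope(signal)
```

For this band-limited example the envelope equals the squared modulator, which is the sense in which the Hilbert envelope recovers the amplitude modulation.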

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar; it suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
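As a small illustration of the first technique only, the sketch below applies cepstral mean subtraction to a list of cepstral frames. It assumes a fixed channel shows up as the same additive offset in every frame's cepstrum; the frame values and the offset are made up for the example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from each frame."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]


# Illustrative cepstra (3 frames, 2 coefficients) and a fixed channel offset.
clean = [[1.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)
```

Both outputs agree, because an offset that is constant across frames is removed together with the mean.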

Feature Comparison

Modulation Feature Extraction

[Block diagram. Blocks: Input; DCT; critical-band window; FDLP; static and dynamic compression; DCT over 200 ms; sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log-compressed envelope, (e) dynamically compressed envelope]

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise; DCT; critical-band window; IDFT; | |^2; DFT; linear prediction; FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram. Blocks: Input; DCT; critical-band window; IDFT; | |^2; DFT; Wiener filtering; VAD]

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope
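A rough sketch of the idea, under stated assumptions: the noise envelope is taken as the mean of the envelope samples that a VAD marks as non-speech, and it is subtracted with a floor to keep the result positive. This is a plain subtraction, not the Wiener filtering named in the diagram, and the function name, the `floor` parameter and the example values are hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech samples.

    noisy_env : temporal envelope samples of noisy speech (non-negative)
    is_speech : VAD decisions, one boolean per envelope sample (assumed given)
    floor     : fraction of the noisy envelope kept as a lower bound
    """
    noise = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise:
        return list(noisy_env)  # no non-speech region: nothing to estimate
    noise_level = sum(noise) / len(noise)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Illustrative envelope: noise level 1.0, with speech in the middle.
noisy = [1.0, 1.0, 5.0, 9.0, 1.0]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
```

The floor avoids zero or negative envelope values, which would be a problem for any later logarithm.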

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plot (axes: Time, Frequency)]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Outline of Applications

[Block diagram. Blocks: Input Signal; Sub-band Decomposition; FDLP (AM and FM outputs); Gain Norm; Quant. Outputs: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek, and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation," IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram. Blocks: Input; DCT; sub-band window; FDLP; gain norm; energy integration; log + DCT; features]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
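The steps above can be sketched as follows. Several details are assumptions made for illustration and are not stated on the slides: the envelope sampling rate (400 Hz), the number of bands (20), 13 cepstral coefficients, and a simple two-point difference for the delta and acceleration terms. The mel-scale integration step is skipped; the bands are taken as already being on the target scale.

```python
import math


def integrate_envelope(env, rate_hz, win_s=0.025, shift_s=0.010):
    """Sum the envelope over 25 ms windows taken every 10 ms."""
    win = int(round(win_s * rate_hz))
    shift = int(round(shift_s * rate_hz))
    return [sum(env[s:s + win]) for s in range(0, len(env) - win + 1, shift)]


def dct_ii(v, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def deltas(frames):
    """Two-point difference across neighboring frames, edges replicated."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [[(nxt - prv) / 2.0 for prv, nxt in zip(padded[t], padded[t + 2])]
            for t in range(len(frames))]


# Illustrative sub-band envelopes: 20 bands, 1 second at an assumed 400 Hz.
n_bands, rate_hz = 20, 400
envelopes = [[1.0 + b + 0.5 * math.sin(0.05 * n) for n in range(400)]
             for b in range(n_bands)]

energies = [integrate_envelope(env, rate_hz) for env in envelopes]  # [band][frame]
frames = [list(col) for col in zip(*energies)]                      # [frame][band]
static = [dct_ii([math.log(e) for e in frame], 13) for frame in frames]
d1 = deltas(static)
d2 = deltas(d1)
features = [s + a + b for s, a, b in zip(static, d1, d2)]
```

With 13 static coefficients plus delta and acceleration terms, each frame carries 39 values, matching the dimension quoted above.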

Speech Recognition

TIDIGITS Database (8 kHz)

Clean training data; test data can be clean or naturally reverberated

HMM-GMM system

Whole-word HMM models trained on clean speech

Performance in terms of word error rate (WER)

Features

PLP features with cepstral mean subtraction (CMS)

Long term log spectral subtraction (LTLSS) [Gelbart 2002]

FDLP short-term (FDLP-S) features, 39 dim

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

[Figure: bar chart of WER (axis 0 to 20) for PLP-CMS, LTLSS and FDLP-S features under Clean and Reverb conditions]
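For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The sketch below follows that common definition; the scoring tool actually used for these experiments is not named in the slides, and the digit strings are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
wer = word_error_rate("one two three four", "one too three")
```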

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system

Trained on a large set of development speakers

Adapted on the enrollment data from the target speaker

Nuisance attribute projection (NAP) on supervectors

Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]

Mel Frequency Cepstral Coefficients (MFCCs)

FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis 10 to 30) for MFCC and FDLP-S features under Tel, Mic and Cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.
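The detection cost function on this slide is a weighted sum of the miss and false-alarm rates. The sketch below assumes the usual decomposition into a miss cost, a false-alarm cost and a target prior; the default values (10, 1 and 0.01) are assumptions chosen because they reproduce the weights 0.1 and 0.99, and the error rates in the example are made up.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa.

    With the default (assumed) parameters this is 0.1 * p_miss + 0.99 * p_fa.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# Hypothetical operating point: 20% misses, 1% false alarms.
dcf = detection_cost(p_miss=0.20, p_fa=0.01)
```

The weighting makes a false alarm roughly ten times as costly as a miss at equal rates, so the operating threshold is pushed toward few false acceptances.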

Modulation Feature Extraction

[Block diagram. Blocks: Input; DCT; critical-band window; FDLP; static and dynamic compression; DCT over 200 ms; sub-band features]

Static compression is a logarithm: it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (28 x 17)
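A hedged sketch of how the 476-dimensional vector could be assembled. The envelope rate (400 Hz), the single simplified divider-and-low-pass loop standing in for the dynamic compression stage, its smoothing constant `alpha`, and the synthetic envelopes are all assumptions for illustration; the cited loops are more elaborate. With a 200 ms segment, DCT-II coefficient k corresponds to k / (2 x 0.2 s) = 2.5k Hz, so 14 coefficients span roughly the 0-35 Hz range quoted above.

```python
import math


def dct_ii(v, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def adaptation_loop(x, alpha=0.9, floor=1e-5):
    """One simplified loop: divide by a low-pass filtered copy of the output.

    For a constant input the output settles at the square root of the input.
    """
    state = math.sqrt(max(x[0], floor))
    out = []
    for sample in x:
        y = max(sample, floor) / state                # divider
        state = alpha * state + (1.0 - alpha) * y     # first-order low-pass
        out.append(y)
    return out


def modulation_features(envelopes, n_mod=14):
    """Concatenate static (log) and dynamic modulation spectra per sub-band."""
    feat = []
    for env in envelopes:
        static = [math.log(max(e, 1e-5)) for e in env]
        dynamic = adaptation_loop(env)
        feat += dct_ii(static, n_mod) + dct_ii(dynamic, n_mod)
    return feat


# Illustrative input: 17 sub-band envelopes, 200 ms each at an assumed 400 Hz.
rate_hz = 400
seg_len = int(round(0.2 * rate_hz))
envelopes = [[1.0 + 0.5 * math.sin(2 * math.pi * 4 * n / rate_hz + b)
              for n in range(seg_len)] for b in range(17)]
features = modulation_features(envelopes)
```

The feature length is 17 x (14 + 14) = 476 regardless of the envelope rate, since only the first 14 DCT coefficients of each stream are kept.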

Phoneme Recognition

TIMIT Database (8 kHz)

Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors

Hidden Markov model (HMM) - MLP hybrid model

Performance in phoneme error rate (PER)

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009

FM

AM

Short-term Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Short-term Features

Sub-band

Window

InputDCT FDLP Gain

Norm

Log+

DCT

FeatEnergyInt

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat

Speech Recognition

TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally

reverberated HMM-GMM system

Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)

Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim

Speech Recognition

SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008

Clean Reverb0

10

20PLP-CMSLTLSSFDLP-SW

ER

Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)

Has telephone speech and far-field speech

GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss

Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features

Tel Mic Cross domain

10

20

30

MFCCFDLP-SD

CF

Speaker Verification

SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Features

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP (200 ms), Static and Dynamic compression, DCT on each branch, Sub-band Feat]

Static compression is a logarithm - it reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions
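The two compression branches and the final DCT can be sketched as below. Only the overall structure (logarithm, divider plus low-pass loops, DCT, 14 components per branch, 17 bands) comes from the slides. The number of loops, the smoothing constant, the floor values and the initial loop state are placeholder assumptions, not values from [Kollmeier 1999].

```python
import math

def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first `num_coeffs` terms."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def static_compress(envelope):
    """Static branch: logarithm of the sub-band envelope."""
    return [math.log(max(e, 1e-10)) for e in envelope]

def dynamic_compress(envelope, num_loops=5, smooth=0.9, floor=1e-5):
    """Dynamic branch: a chain of loops, each dividing its input by a
    low-pass filtered copy of its own output (assumed constants)."""
    out = [max(e, floor) for e in envelope]
    for _ in range(num_loops):
        state = 1.0                      # assumed initial divisor
        stage = []
        for value in out:
            y = value / state            # divider
            state = max(smooth * state + (1.0 - smooth) * y, floor)  # low-pass
            stage.append(y)
        out = stage
    return out

def modulation_features(sub_band_envelopes, num_mod=14):
    """Concatenate static and dynamic modulation components of every band."""
    feat = []
    for env in sub_band_envelopes:
        feat += dct2(static_compress(env), num_mod)
        feat += dct2(dynamic_compress(env), num_mod)
    return feat
```

With 17 sub-band envelopes and `num_mod=14`, the returned vector has 17 x 2 x 14 = 476 entries, matching the dimension quoted above.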

Phoneme Recognition

TIMIT Database (8 kHz): clean training data; test data can be clean, with additive noise, reverberated or from a telephone channel

Multi-layer perceptron (MLP) based system:
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) - MLP hybrid model
  • Performance in phoneme error rate (PER)

Features:
  • Perceptual linear prediction (PLP) - 9 frame context
  • Advanced ETSI standard [ETSI 2002] - 9 frame context
  • FDLP modulation (FDLP-M) features - 476 dim

Phoneme Recognition

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.

[Figure: bar chart of PER (axis 30 to 75) for PLP-9, ETSI-9 and FDLP-M features under Clean, Add Noise, Reverb and Tel conditions]

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

Audio Coding

[Block diagram of the codec. Blocks: Input, QMF Analysis (sub-bands 1 to 32), FDLP, Env, Carr, MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (sub-bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

[Figure: subjective evaluation scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidence

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
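One common way to minimize the residual sum of squares is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which solver was used, so the sketch below is an assumption. It adopts the predictor convention x[n] ≈ a1 x[n-1] + ... + ap x[n-p].

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns ([a1..ap], residual energy)."""
    a = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err                                  # reflection coefficient
        a = [a[k] - refl * a[m - 2 - k] for k in range(m - 1)] + [refl]
        err *= 1.0 - refl * refl                          # updated residual energy
    return a, err

def linear_prediction(x, order):
    """Predictor coefficients and minimum residual energy of `x`."""
    return levinson_durbin(autocorrelation(x, order), order)
```

For a decaying exponential such as `[0.9 ** n for n in range(200)]`, the first coefficient comes out very close to 0.9 and the second close to zero, as expected for a first-order process.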

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]: [equation]

From Parseval's theorem: [equation]

By definition: [equation]

Thus, parameters are solved by minimizing: [equation]

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

[Figure: AR model of power spectrum]
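The formulas for these slides are not in the transcript. The following is a sketch of the standard spectral interpretation of linear prediction in the spirit of [Makhoul 1975]. The symbols e[n], A(ω), P_x(ω) and the model order p are notation chosen here, not taken from the slides.

```latex
% Prediction error and the corresponding inverse filter
e[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad
A(\omega) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}

% Parseval's theorem, with P_x(\omega) = |X(\omega)|^2
\sum_{n} e^2[n]
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} |E(\omega)|^2 \, d\omega
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\, |A(\omega)|^2 \, d\omega

% All-pole model with gain G, and the quantity being minimized
\hat{P}_x(\omega) = \frac{G}{|A(\omega)|^2}
\quad\Longrightarrow\quad
\sum_{n} e^2[n]
  = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P_x(\omega)}{\hat{P}_x(\omega)} \, d\omega
```

Read this way, minimizing the residual energy over the coefficients a_k fits the all-pole model to the power spectrum, with G equal to the minimum residual energy.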

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$, where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
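For a sampled signal, the definition above can be illustrated with a discrete analytic signal built in the DFT domain: keep the DC bin, double the positive-frequency bins, zero the negative ones. The sketch uses a plain O(N^2) DFT so it needs only the standard library. It is meant for short test signals, and the function names are illustrative.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a sequence."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
               for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Discrete analytic signal: one-sided spectrum, then inverse DFT."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0                 # Nyquist bin kept once
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = dft(x)
    return dft([w * s for w, s in zip(weights, spectrum)], inverse=True)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]
```

For a pure cosine that completes a whole number of cycles in the window, the envelope is flat at 1, which is a convenient sanity check.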

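Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]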
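The duality between LP and FDLP can be made concrete with a small sketch: linear prediction applied to the DCT of a signal yields an all-pole function of time that approximates its temporal envelope. The code below is a toy version under several assumptions (unnormalised DCT-II over the full band, no sub-band windowing, autocorrelation method, time index t mapped to angle π(t + 0.5)/N). It is not the authors' implementation.

```python
import cmath
import math

def dct2(x):
    """Unnormalised DCT-II of the whole sequence."""
    n = len(x)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(x))
            for k in range(n)]

def lp_coeffs(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, residual)."""
    r = [sum(seq[n] * seq[n + k] for n in range(len(seq) - k))
         for k in range(order + 1)]
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err
        a = [a[k] - refl * a[m - 2 - k] for k in range(m - 1)] + [refl]
        err *= 1.0 - refl * refl
    return a, err

def fdlp_envelope(x, order, normalize_gain=False):
    """All-pole envelope G / |A|^2 evaluated at one angle per time sample."""
    n = len(x)
    a, err = lp_coeffs(dct2(x), order)
    gain = 1.0 if normalize_gain else err     # gain normalization sets G = 1
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n       # time index mapped onto [0, pi]
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1))
                          for k in range(order))
        env.append(gain / abs(denom) ** 2)
    return env
```

As a quick check, a single click at sample 60 of a 128-sample frame produces an order-2 envelope that peaks at or next to sample 60: a pole pair in the DCT domain corresponds to a peak in time, just as a pole pair in the time domain corresponds to a spectral peak.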

Spectrogram Comparison

[Figure: spectrograms obtained from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
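Of the three techniques listed, CMS is the simplest to illustrate. If the channel adds the same offset to the cepstrum of every frame, which is the assumption stated above, subtracting the per-utterance mean removes it. A minimal sketch with hypothetical names:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of an utterance."""
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]
```

Applying the function to a clean utterance and to a copy with a constant vector added to every frame gives the same result up to rounding.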

Feature Comparison

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP (200 ms), Static and Dynamic compression, DCT on each branch, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram. Blocks: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering driven by a VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
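The subtraction step can be illustrated with a deliberately simplified sketch. It models the noise envelope as a single constant level, estimated as the mean of the samples the VAD marks as non-speech, and floors the result so the envelope stays positive. The constant-level model, the flooring rule and all names are assumptions made for illustration. The Wiener filtering block in the diagram is not reproduced.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=1e-3):
    """Subtract a VAD-based noise level from a sub-band envelope.

    noisy_env : envelope samples of the noisy signal
    is_speech : one boolean per sample, True where the VAD detects speech
    floor     : fraction of the noisy envelope kept as a lower bound
    """
    noise = [e for e, speech in zip(noisy_env, is_speech) if not speech]
    noise_level = sum(noise) / len(noise) if noise else 0.0
    return [max(e - noise_level, floor * e) for e in noisy_env]
```

For example, with an envelope of `[0.2, 0.2, 1.2, 2.2, 0.2]` and speech flagged only on the third and fourth samples, the speech samples drop by the estimated noise level of about 0.2 and the non-speech samples are held at the floor.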

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plane with Time and Frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 59: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog, Modulation Features for Phoneme Recog, Wide-band Speech & Audio Coding]

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky, "Applications of Signal Analysis Using Autoregressive Models of Amplitude Modulation", IEEE WASPAA, Oct. 2009.

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale
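The time integration of a sub-band envelope (25 ms windows with a 10 ms shift) can be sketched as a sliding sum. The rectangular window and the assumption that the envelope is sampled at the 8 kHz signal rate are illustrative choices. The slides do not state the window shape.

```python
def integrate_envelope(envelope, rate_hz=8000, win_ms=25, shift_ms=10):
    """Sum the envelope over overlapping windows to get short-term energies."""
    win = rate_hz * win_ms // 1000        # 200 samples at 8 kHz
    shift = rate_hz * shift_ms // 1000    # 80 samples at 8 kHz
    return [sum(envelope[start:start + win])
            for start in range(0, len(envelope) - win + 1, shift)]
```

A 1000-sample envelope at 8 kHz yields 11 frames with these settings. Repeating this for every band and then pooling bands on a mel scale gives the sub-band energies used for the cepstral features.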

Page 60: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Short-term Features

[Figure: block diagram. Input Signal; Sub-band Decomposition; FDLP; AM; FM; Gain Norm; Quant; Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Short-term Features

[Figure: processing chain. Input; DCT; Sub-band Window; FDLP; Gain Norm; Energy Int; Log + DCT; Feat]

Envelopes in each band are integrated along time (25 ms windows with a shift of 10 ms)

Integration along the frequency axis converts to the mel scale

Sub-band energies are converted to cepstral coefficients by applying a log and a DCT along the frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
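The steps above can be sketched in Python. This is a minimal illustration, not the author's implementation: it assumes the sub-band envelopes are already available on mel-spaced bands (so the mel integration step is skipped), and the names `short_term_features`, `add_deltas`, `env_rate` and the choice of 13 static coefficients (39 / 3) are assumptions made here.

```python
import numpy as np
from scipy.fft import dct


def short_term_features(envelopes, env_rate, win_s=0.025, hop_s=0.010, n_ceps=13):
    """Cepstra from sub-band envelopes.

    envelopes: array (n_bands, n_samples) of non-negative envelopes,
               assumed to lie on mel-spaced bands already.
    env_rate:  sampling rate of the envelopes in Hz (assumed parameter).
    """
    win = int(round(win_s * env_rate))
    hop = int(round(hop_s * env_rate))
    n_frames = 1 + (envelopes.shape[1] - win) // hop
    # Integrate each band's envelope over 25 ms windows with a 10 ms shift.
    energies = np.stack(
        [envelopes[:, i * hop: i * hop + win].sum(axis=1) for i in range(n_frames)]
    )  # shape (n_frames, n_bands)
    # Log followed by a DCT along the frequency (band) axis.
    log_energies = np.log(energies + 1e-10)
    return dct(log_energies, type=2, norm="ortho", axis=1)[:, :n_ceps]


def add_deltas(ceps):
    """Append delta and acceleration coefficients (simple finite differences)."""
    delta = np.gradient(ceps, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])
```

With 13 static coefficients, appending deltas and accelerations gives 13 × 3 = 39 dimensions per frame.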

Speech Recognition

TIDIGITS Database (8 kHz): clean training data; test data can be clean or naturally reverberated

HMM-GMM system: whole-word HMM models trained on clean speech; performance in terms of word error rate (WER)

Features: PLP features with cepstral mean subtraction (CMS); long-term log spectral subtraction (LTLSS) [Gelbart 2002]; FDLP short-term (FDLP-S) features - 39 dim

[Figure: bar chart of WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS and FDLP-S on Clean and Reverb test data]

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP", IEEE Signal Proc. Letters, 2008

Speaker Verification

NIST 2008 Speaker recognition evaluation (SRE): has telephone speech and far-field speech

GMM-UBM system: trained on a large set of development speakers; adapted on the enrollment data from the target speaker; nuisance attribute projection (NAP) on supervectors; detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]: Mel frequency cepstral coefficients (MFCCs); FDLP short-term (FDLP-S) features

[Figure: bar chart of DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S under Tel, Mic and Cross domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation", ICASSP, 2011
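A small sketch of how a detection cost of this form can be evaluated at a given decision threshold. The cost parameters below (miss cost 10, false-alarm cost 1, target prior 0.01) are one choice that reproduces the weights 0.99 and 0.1 quoted above; they, and the function names, are assumptions of this example rather than details stated on the slides.

```python
import numpy as np


def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    p_miss = float(np.mean(target_scores < threshold))
    p_fa = float(np.mean(impostor_scores >= threshold))
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted cost; the defaults give 0.1 * p_miss + 0.99 * p_fa."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

Sweeping the threshold and taking the smallest cost gives a minimum-DCF style summary of a score set.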

Outline of Applications

[Figure: block diagram. Input Signal; Sub-band Decomposition; FDLP; AM; FM; Gain Norm; Quant; Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Modulation Features

[Figure: the same block diagram as on the Outline of Applications slide]

Modulation Feature Extraction

[Figure: processing chain. Input; DCT; Critical-band Window; FDLP; Static; Dynamic; DCT; DCT; 200ms; Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions (14 × 2 × 17 = 476)
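A rough sketch of the two compression paths and the DCT step follows. It is not the author's implementation: the adaptation loop is a simplified divider plus first-order low-pass stand-in for the loops of [Kollmeier 1999], the time constants are illustrative values chosen here, and the names `adaptation_loops`, `modulation_features` and the envelope rate `rate` are assumptions. For a DCT-II taken over 200 ms, coefficient k sits near k / (2 × 0.2 s) = 2.5k Hz, so 14 coefficients cover roughly 0 to 32.5 Hz.

```python
import numpy as np
from scipy.fft import dct


def adaptation_loops(env, rate, time_constants=(0.005, 0.050, 0.129, 0.253, 0.500)):
    """Simplified dynamic compression: each loop divides its input by a
    low-pass filtered copy of its own output (illustrative stand-in)."""
    out = np.maximum(np.asarray(env, dtype=float), 1e-5)
    for tau in time_constants:
        alpha = np.exp(-1.0 / (tau * rate))  # first-order low-pass coefficient
        state = 1.0
        y = np.empty_like(out)
        for n, x in enumerate(out):
            y[n] = x / state                                # divider
            state = alpha * state + (1.0 - alpha) * y[n]    # low-pass filter
        out = y
    return out


def modulation_features(band_envs, rate, seg_s=0.2, n_mod=14):
    """Static (log) and dynamic modulation spectra for each sub-band envelope."""
    seg = int(round(seg_s * rate))
    feats = []
    for env in band_envs:
        env = np.asarray(env, dtype=float)[:seg]
        static = np.log(np.maximum(env, 1e-10))   # static compression
        dynamic = adaptation_loops(env, rate)     # dynamic compression
        feats.append(dct(static, type=2, norm="ortho")[:n_mod])
        feats.append(dct(dynamic, type=2, norm="ortho")[:n_mod])
    return np.concatenate(feats)
```

With 17 sub-band envelopes the returned vector has 17 × 2 × 14 = 476 entries.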

Phoneme Recognition

TIMIT Database (8 kHz): clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system: MLPs estimate phoneme posteriors; hidden Markov model (HMM) - MLP hybrid model; performance in phoneme error rate (PER)

Features: Perceptual linear prediction (PLP) - 9 frame context; Advanced ETSI standard [ETSI 2002] - 9 frame context; FDLP modulation (FDLP-M) features - 476 dim

[Figure: bar chart of PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010

Outline of Applications

[Figure: block diagram. Input Signal; Sub-band Decomposition; FDLP; AM; FM; Gain Norm; Quant; Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Audio Coding

[Figure: the same block diagram as on the Outline of Applications slide]

Audio Coding

[Figure: codec block diagram. Input; QMF Analysis (bands 1 to 32); FDLP; Env; Carr; MDCT; Q; Q-1; IMDCT; Mul; QMF Synthesis (bands 1 to 32); Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

[Figure: bar chart of subjective scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for the AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past: speech intelligibility [Houtgast et al. 1980]; RASTA processing [Hermansky et al. 1994]; speech recognition [Kingsbury et al. 1998]; AM-FM decomposition [Kumaresan et al. 1999]; sound texture modeling [Athineos et al. 2003]; sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: the sample at n predicted from the samples at n-1, n-2 and n-3 with weights a1, a2 and a3]

Model parameters are solved by minimizing the residual sum of squares
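As an illustration of the least-squares solution, the sketch below uses the autocorrelation method, one common way of minimizing the residual sum of squares; the slides do not say which solver was used. The convention assumed here is that the prediction of x[n] is the sum over k of a_k x[n-k], and the function name `lpc_autocorr` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lpc_autocorr(x, order):
    """Linear prediction coefficients by the autocorrelation method.

    Returns (a, gain), where x[n] is predicted as sum_k a[k-1] * x[n-k]
    and gain is the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations: Toeplitz(r[0..p-1]) a = r[1..p].
    a = solve_toeplitz(r[:order], r[1: order + 1])
    gain = r[0] - np.dot(a, r[1: order + 1])
    return a, gain
```

On a long realization of a stable AR process, the recovered coefficients should be close to the generating ones.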

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition

Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
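A sketch of the standard argument behind these statements, following the filter interpretation in [Makhoul 1975]. The symbols $x[n]$, $a_k$, $p$, $E(\omega)$, $A(\omega)$, $P(\omega)$ and $\hat{P}(\omega)$ are notation assumed here; they are not taken from the slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(\omega) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \qquad
E(\omega) = A(\omega)\, X(\omega) \\
\sum_{n} e^2[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi} |E(\omega)|^2 \, d\omega
 = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\, |A(\omega)|^2 \, d\omega,
 \qquad P(\omega) = |X(\omega)|^2 \\
\hat{P}(\omega) &= \frac{G}{|A(\omega)|^2}
 \;\Longrightarrow\;
 \sum_{n} e^2[n] = \frac{G}{2\pi}\int_{-\pi}^{\pi}
 \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega
\end{align}
```

Under this reading, minimizing the residual sum of squares over the $a_k$ is the same as minimizing the integrated ratio of the power spectrum to its all-pole model.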

AR model of power spectrum

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and its quadrature component,

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

The Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
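The definition can be tried out numerically with SciPy, whose `scipy.signal.hilbert` returns the discrete analytic signal (the input plus j times its Hilbert transform). The function name `hilbert_envelope` is a label chosen for this sketch.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal of x."""
    analytic = hilbert(np.asarray(x, dtype=float))  # x + j * H[x]
    return np.abs(analytic) ** 2
```

For an amplitude-modulated tone m(t) cos(ωt) with a slowly varying, positive m(t), the result is close to m(t) squared.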

Hilbert Envelope

[Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env]

Duality

LP

FDLP

LP in Time and Frequency
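The duality can be illustrated with a short sketch: linear prediction applied to the DCT of a signal, rather than to the signal itself, yields an all-pole curve that follows the temporal envelope instead of the power spectrum. This is a hedged illustration, not the author's code; it uses the autocorrelation method directly on the DCT coefficients, and the name `fdlp_envelope` and its parameters are assumptions.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order, n_points=None):
    """All-pole estimate of the temporal envelope of x via LP on its DCT."""
    x = np.asarray(x, dtype=float)
    n_points = n_points or len(x)
    c = dct(x, type=2, norm="ortho")  # frequency-domain sequence
    # Autocorrelation of the DCT sequence at lags 0..order.
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1: order + 1])
    gain = r[0] - np.dot(a, r[1: order + 1])
    # The AR "spectrum" on [0, pi) maps onto time across the analysis window.
    _, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=n_points)
    return gain * np.abs(h) ** 2
```

For a noise burst placed late in the window, the peak of the returned curve should fall near the burst position.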

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
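A minimal sketch of the first of these techniques, cepstral mean subtraction. A fixed channel that is short relative to the analysis window multiplies the short-term spectrum, so it appears as a constant offset in the log-spectral and cepstral domains, and subtracting the per-utterance mean removes it. The function name is chosen for this example.

```python
import numpy as np


def cepstral_mean_subtraction(ceps):
    """Remove the per-coefficient mean over time from (n_frames, n_ceps) features."""
    ceps = np.asarray(ceps, dtype=float)
    return ceps - ceps.mean(axis=0, keepdims=True)
```

Features from a clean utterance and from the same utterance with a constant cepstral offset added come out identical after this step.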

Feature Comparison

Modulation Feature Extraction

[Figure: processing chain. Input; DCT; Critical-band Window; FDLP; Static; Dynamic; DCT; DCT; 200ms; Sub-band Feat]

Modulation Features

[Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

[Figure: processing chain. Signal + Noise; DCT; Critical-band Window; IDFT; | |^2; DFT; Linear Pred; FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Figure: processing chain. Input; DCT; Critical-band Window; IDFT; | |^2; DFT; Wiener Filtering; VAD]

A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
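The subtraction step can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited work: the noise is taken to be stationary, its envelope is estimated as the mean of the noisy envelope over the frames the VAD marks as non-speech, and a small floor keeps the result positive. The names `subtract_noise_envelope`, `speech_mask` and `floor` are hypothetical.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env:   non-negative temporal envelope of the noisy sub-band signal.
    speech_mask: boolean array, True where the VAD reports speech.
    floor:       fraction of the noisy envelope kept as a lower bound.
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    noise_level = noisy_env[~speech_mask].mean()  # stationary-noise assumption
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)
```

The subtraction is meaningful because, as stated above, uncorrelated noise is additive in the Hilbert envelope domain.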

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum", IEEE ASRU, 2009

Introduction

[Figure: time-frequency plane with axes Time and Frequency]

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

The spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMMs

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 62: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Short-term Features

[Block diagram. Blocks: Input, DCT, Sub-band Window, FDLP, Gain Norm, Energy Int, Log + DCT, Feat]

Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)

Integration in frequency axis to convert to mel scale

Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis

Delta and acceleration coefficients are appended to obtain 39-dimensional features, similar to conventional MFCC features
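The log, DCT and delta steps above can be sketched in plain Python. This is only an illustration: the choice of 13 cepstral coefficients per frame (13 x 3 = 39), the regression width of 2 frames, and all function names are assumptions, not details given on the slides.

```python
import math

def dct2(values):
    """Unnormalized DCT-II of a list of floats."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def cepstra(band_energies, num_ceps=13, floor=1e-10):
    """Log of the sub-band energies followed by a DCT along the frequency axis."""
    log_e = [math.log(max(e, floor)) for e in band_energies]
    return dct2(log_e)[:num_ceps]

def deltas(frames, width=2):
    """Regression-based time derivatives over +/- width frames (edges clamped)."""
    denom = 2 * sum(d * d for d in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for c in range(len(frames[t])):
            num = sum(d * (frames[min(t + d, last)][c] - frames[max(t - d, 0)][c])
                      for d in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out

def short_term_features(energy_frames, num_ceps=13):
    """Static cepstra plus delta and acceleration coefficients for each frame."""
    static = [cepstra(f, num_ceps) for f in energy_frames]
    delta = deltas(static)
    accel = deltas(delta)
    return [s + d + a for s, d, a in zip(static, delta, accel)]
```

Each input frame is a list of integrated sub-band energies (one value per band, one frame every 10 ms); each output frame has 3 x num_ceps values.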

Speech Recognition

TIDIGITS Database (8 kHz)
  • Clean training data; test data can be clean or naturally reverberated

HMM-GMM system
  • Whole-word HMM models trained on clean speech
  • Performance in terms of word error rate (WER)

Features
  • PLP features with cepstral mean subtraction (CMS)
  • Long-term log spectral subtraction (LTLSS) [Gelbart 2002]
  • FDLP short-term (FDLP-S) features - 39 dim

Speech Recognition

S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of Reverberant Speech Using FDLP," IEEE Signal Proc. Letters, 2008.

[Bar chart: WER (axis ticks 0-20) for PLP-CMS, LTLSS, and FDLP-S under Clean and Reverb conditions]

Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE)
  • Has telephone speech and far-field speech

GMM-UBM system
  • Trained on a large set of development speakers
  • Adapted on the enrollment data from the target speaker
  • Nuisance attribute projection (NAP) on supervectors
  • Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]
  • Mel frequency cepstral coefficients (MFCCs)
  • FDLP short-term (FDLP-S) features
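The detection cost function quoted above can be evaluated from trial scores as in the following sketch. The weights 0.99 and 0.1 come from the slide; the function names, the score lists, and the threshold sweep used for the minimum DCF are illustrative assumptions.

```python
def detection_cost(p_miss, p_fa, w_miss=0.1, w_fa=0.99):
    """DCF = 0.99 * Pfa + 0.1 * Pmiss, with the weights given on the slide."""
    return w_fa * p_fa + w_miss * p_miss

def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when trials scoring >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa

def min_dcf(target_scores, impostor_scores):
    """Smallest DCF over all thresholds placed at observed score values."""
    candidates = sorted(set(target_scores) | set(impostor_scores))
    candidates.append(candidates[-1] + 1.0)  # a threshold that rejects every trial
    return min(detection_cost(*error_rates(target_scores, impostor_scores, t))
               for t in candidates)
```

Because false alarms are weighted almost ten times more heavily than misses, the operating point that minimizes this cost sits at a fairly strict threshold.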

Speaker Verification

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

[Bar chart: DCF (axis ticks 10-30) for MFCC and FDLP-S under Tel, Mic, and Cross-domain conditions]

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static compression, Dynamic compression, DCT (one per compression branch, 200 ms), Sub-band Feat]

Static compression is a logarithm: it reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
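A divider-plus-low-pass loop of this kind can be sketched as below. The slide does not give the number of loops, their time constants, or the initial state; the five time constants used here are values commonly quoted for this style of adaptation model and should be read as assumptions, as should the function name and the first-order low-pass form.

```python
import math

def adaptation_loops(envelope, sample_rate,
                     time_constants=(0.005, 0.05, 0.129, 0.253, 0.5),
                     floor=1e-5):
    """Cascade of feedback loops; each divides its input by a low-passed copy of its own output.

    time_constants (seconds) and the initial state of 1.0 are assumed values.
    """
    out = list(envelope)
    for tau in time_constants:
        alpha = math.exp(-1.0 / (tau * sample_rate))  # first-order low-pass coefficient
        state = 1.0                                   # assumed initial divisor
        stage = []
        for x in out:
            y = max(x, floor) / state                 # divider
            state = alpha * state + (1.0 - alpha) * y # low-pass filter on the output
            stage.append(y)
        out = stage
    return out
```

For a constant input each loop settles where the output equals the divisor, i.e. at the square root of its input, so five loops approach the 32nd root (close to logarithmic compression). Fast changes pass through before the divisors adapt, which is what emphasizes onsets.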

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature (2 x 14 x 17 = 476)
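The dimension count can be checked with a small sketch that applies a DCT to each compressed envelope and keeps the first 14 coefficients. The function names and the envelope length are assumptions; only the counts (14 coefficients, two compression streams, 17 sub-bands) and the 200 ms window come from the slides.

```python
import math

def dct2(values, num_coeffs):
    """First num_coeffs coefficients of an unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def modulation_features(static_envs, dynamic_envs, num_coeffs=14):
    """Concatenate modulation spectra of both compression streams over all sub-bands."""
    feats = []
    for static_env, dynamic_env in zip(static_envs, dynamic_envs):
        feats.extend(dct2(static_env, num_coeffs))
        feats.extend(dct2(dynamic_env, num_coeffs))
    return feats
```

With a 200 ms window the k-th DCT-II basis function completes k/2 cycles, so coefficients are spaced 2.5 Hz apart and 14 of them span 0 to 32.5 Hz, consistent with the 0-35 Hz range stated above.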

Phoneme Recognition

TIMIT Database (8 kHz)
  • Clean training data; test data can be clean, additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) - MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) - 9 frame context
  • Advanced ETSI standard [ETSI 2002] - 9 frame context
  • FDLP modulation (FDLP-M) features - 476 dim

Phoneme Recognition

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

[Bar chart: PER (axis ticks 30-75) for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

Audio Coding

[Block diagram. Encoder blocks: Input, QMF Analysis (bands 1-32), FDLP (Env and Carr outputs), MDCT, Q, Q. Decoder blocks: Q-1, Q-1, IMDCT, Mul, QMF Synthesis (bands 1-32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Proc., 2010.

[Bar chart: subjective scores (axis 0-100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence
  • Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence
  • Perceptual importance of modulation frequencies [Drullman et al. 1994]
  • Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past
  • Speech intelligibility [Houtgast et al. 1980]
  • RASTA processing [Hermansky et al. 1994]
  • Speech recognition [Kingsbury et al. 1998]
  • AM-FM decomposition [Kumaresan et al. 1999]
  • Sound texture modeling [Athineos et al. 2003]
  • Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
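Minimizing the residual sum of squares leads to a set of normal equations in the autocorrelation of the signal. One common way to solve them is the Levinson-Durbin recursion, sketched below; the slides do not say which solver was used, so the autocorrelation method and the function names here are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) where the prediction is x_hat[n] = sum_k a[k-1] * x[n-k]
    and err is the minimum residual energy.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        prev = a[:]
        a[m] = k_m
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        err *= (1.0 - k_m * k_m)
    return a, err
```

For an autocorrelation sequence r = [1, 0.5, 0.25], which is what a first-order process with coefficient 0.5 produces, a second-order fit recovers a1 = 0.5 and a2 = 0 with residual energy 0.75.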

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
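The equations on these slides did not survive extraction. As a reminder of the standard result they refer to, the block below restates the textbook formulation from [Makhoul 1975]; the symbols (e, a_k, A, P, G, p) and the sign convention are chosen here for illustration and may differ from the original slides.

```latex
% Prediction residual and inverse filter (order p)
e(n) = x(n) - \sum_{k=1}^{p} a_k \, x(n-k), \qquad
A(z) = 1 - \sum_{k=1}^{p} a_k \, z^{-k}

% Residual energy in the frequency domain (Parseval), with P(\omega) = |X(e^{j\omega})|^2
E = \sum_{n} e^2(n)
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) \, \bigl| A(e^{j\omega}) \bigr|^2 \, d\omega

% All-pole model of the power spectrum, with gain G equal to the minimum of E
\hat{P}(\omega) = \frac{G}{\bigl| A(e^{j\omega}) \bigr|^2}
```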

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal: |x_a(t)|^2
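For a discrete signal the analytic signal is usually obtained by zeroing the negative-frequency half of the DFT and doubling the positive half. The sketch below does this with a direct O(N^2) DFT so that it needs no external libraries; it is meant for short illustrative signals only, and the function names are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of a sequence; O(N^2), for illustration only."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Keep DC (and Nyquist), double positive frequencies, zero negative ones."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                  # DC and Nyquist bins stay unchanged
        if k < (n + 1) // 2:
            spec[k] *= 2.0            # positive frequencies
        else:
            spec[k] = 0.0             # negative frequencies
    return dft(spec, inverse=True)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]
```

For a pure cosine of amplitude 3 with a whole number of cycles in the window, the envelope is flat at 9, the squared amplitude.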

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure panels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram. Blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram. Blocks: Input, DCT, Critical-band Window, IDFT, | |^2, Wiener Filtering, VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction removes the estimated noise envelope from the noisy speech envelope
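A minimal sketch of VAD-guided envelope subtraction follows. It assumes the noise envelope is roughly constant and estimates it as the mean over non-speech samples, then subtracts with a floor to keep the result positive. The block diagram mentions Wiener filtering; this simpler subtraction, the floor ratio, and the function names are stand-in assumptions rather than the method used in the cited work.

```python
def estimate_noise_envelope(envelope, is_speech):
    """Mean envelope value over samples the VAD marks as non-speech."""
    noise = [e for e, speech in zip(envelope, is_speech) if not speech]
    return sum(noise) / len(noise) if noise else 0.0

def subtract_noise_envelope(envelope, is_speech, floor_ratio=0.01):
    """Subtract the noise estimate, never going below a fraction of the input."""
    noise_level = estimate_noise_envelope(envelope, is_speech)
    return [max(e - noise_level, floor_ratio * e) for e in envelope]
```

The floor matters because the cleaned envelope is later modeled by linear prediction and compressed, both of which require strictly positive values.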

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: time-frequency plot; axes: Time, Frequency]

  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 64: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Speech Recognition

TIDIGITS database (8 kHz): clean training data; test data can be clean or naturally reverberated

HMM-GMM system: whole-word HMM models trained on clean speech; performance in terms of word error rate (WER)

Features: PLP features with cepstral mean subtraction (CMS); long-term log spectral subtraction (LTLSS) [Gelbart 2002]; FDLP short-term (FDLP-S) features, 39 dim

Speech Recognition

S. Thomas, S. Ganapathy, and H. Hermansky, “Recognition of Reverberant Speech Using FDLP,” IEEE Signal Proc. Letters, 2008.

[Bar chart: WER (axis ticks 0, 10, 20) for PLP-CMS, LTLSS, and FDLP-S under Clean and Reverb conditions]

Speaker Verification

NIST 2008 speaker recognition evaluation (SRE): has telephone speech and far-field speech

GMM-UBM system: trained on a large set of development speakers; adapted on the enrollment data from the target speaker; nuisance attribute projection (NAP) on supervectors

Detection cost function (DCF) = 0.99 P_fa + 0.1 P_miss

Features with warping [Pelecanos 2001]: Mel frequency cepstral coefficients (MFCCs); FDLP short-term (FDLP-S) features

Speaker Verification

S. Ganapathy, J. Pelecanos, and M. Omar, “Feature Normalization for Speaker Verification in Room Reverberation,” ICASSP, 2011.

[Bar chart: DCF (axis ticks 10, 20, 30) for MFCC and FDLP-S under Tel, Mic, and Cross-domain conditions]
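As a small worked example of the detection cost function used in this evaluation, the sketch below computes the false-alarm and miss rates at a fixed threshold and combines them with the slide's weights, read here as 0.99 and 0.1. The scores, the threshold, and the helper names `error_rates` and `detection_cost` are made up for illustration.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (P_fa, P_miss) when accepting every trial with score >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_fa, p_miss


def detection_cost(p_fa, p_miss, w_fa=0.99, w_miss=0.1):
    """Weighted detection cost: w_fa * P_fa + w_miss * P_miss."""
    return w_fa * p_fa + w_miss * p_miss


# Hypothetical verification scores (higher means "more likely the target").
targets = [2.1, 1.7, 0.4, 3.0]
impostors = [-1.0, 0.6, -0.3, -2.2, 0.1]

p_fa, p_miss = error_rates(targets, impostors, threshold=0.5)
dcf = detection_cost(p_fa, p_miss)  # 0.99 * 0.2 + 0.1 * 0.25
```

With these weights a false alarm costs roughly ten times as much as a miss, so operating points with a low false-alarm rate are favoured.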

Outline of Applications

[Block diagram. Blocks: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Applications: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding]

Modulation Features

Modulation Feature Extraction

[Block diagram. Blocks: Input, DCT, Critical-band Window, FDLP, Static compression, Dynamic compression, DCT (one per compression stream), Sub-band Feat; label: 200 ms]

Static compression is a logarithm, which reduces the huge dynamic range of the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dimensions
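The static branch of this pipeline and the 476-dimension count can be illustrated with a short sketch. Everything below is an assumption made for the example: the toy envelope, the 400 Hz envelope sampling rate, the helper names `dct_ii` and `modulation_features`, and the reading of "200 ms" as the segment length over which the DCT is taken. The dynamic (adaptive loop) compression is not reproduced.

```python
import math


def dct_ii(x):
    """Unnormalised DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n))
            for k in range(n)]


def modulation_features(envelope, num_coeffs=14, eps=1e-10):
    """Log-compress a sub-band envelope and keep the first DCT coefficients."""
    log_env = [math.log(e + eps) for e in envelope]
    return dct_ii(log_env)[:num_coeffs]


# Toy sub-band envelope: 200 ms at an assumed 400 Hz rate (80 samples),
# carrying a 10 Hz modulation.
rate_hz, duration_s, n = 400, 0.2, 80
env = [1.0 + 0.5 * math.cos(2 * math.pi * 10 * t / rate_hz) for t in range(n)]
feats = modulation_features(env)

# DCT bin k of a T-second segment corresponds to a modulation frequency k/(2T),
# so 14 bins of a 200 ms segment cover 0, 2.5, ..., 32.5 Hz.
mod_freqs = [k / (2 * duration_s) for k in range(14)]

# Feature dimension: 14 coefficients x 2 compression streams x 17 sub-bands.
dim = 14 * 2 * 17
```

Under this reading, the 2.5 Hz bin spacing is what makes 14 coefficients line up with the 0-35 Hz range quoted on the slide.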

Phoneme Recognition

TIMIT database (8 kHz): clean training data; test data can be clean, with additive noise, reverberated, or telephone channel

Multi-layer perceptron (MLP) based system: MLPs estimate phoneme posteriors; hidden Markov model (HMM) - MLP hybrid model; performance in phoneme error rate (PER)

Features: perceptual linear prediction (PLP), 9-frame context; Advanced ETSI standard [ETSI 2002], 9-frame context; FDLP modulation (FDLP-M) features, 476 dim

Phoneme Recognition

S. Ganapathy, S. Thomas, and H. Hermansky, “Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum,” JASA, 2010.

[Bar chart: PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9, and FDLP-M under Clean, Add Noise, Reverb, and Tel conditions]

Audio Coding

[Block diagram of the codec. Blocks: Input, QMF Analysis (sub-bands 1 to 32), FDLP, Env, Carr, MDCT, Q, Q⁻¹, IMDCT, Mul, QMF Synthesis (sub-bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, “Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding,” AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek, and H. Hermansky, “AR Models of Amplitude Modulation in Audio Compression,” IEEE Transactions on Audio, Speech and Language Proc., 2010.

[Bar chart: listening-test scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past: speech intelligibility [Houtgast et al. 1980]; RASTA processing [Hermansky et al. 1994]; speech recognition [Kingsbury et al. 1998]; AM-FM decomposition [Kumaresan et al. 1999]; sound texture modeling [Athineos et al. 2003]; sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
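A minimal sketch of time-domain linear prediction, assuming the autocorrelation method solved with the Levinson-Durbin recursion (one standard way of minimizing the residual sum of squares; the slides do not say which solver was used). The function names and the toy signal are illustrative, and the sign convention puts the coefficients in the error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a[0] = 1, so that the prediction is
    x_hat[n] = -(a[1] * x[n-1] + ... + a[order] * x[n-order])
    and err is the minimum residual sum of squares.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Toy signal: a decaying exponential, which a first-order predictor captures.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 2), order=2)
# a[1] is close to -0.9 (x[n] is predicted as 0.9 * x[n-1]); a[2] is close to 0.
```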

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition

Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
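The equations on these slides did not survive in the transcript. The standard form of the argument, in the style of [Makhoul 1975], is sketched below; the symbols (x[n], e[n], a_k, p, P(ω), A(z)) are assumed notation and may differ from the original slides.

```latex
% Assumed notation: x[n] signal, e[n] prediction residual,
% a_k predictor coefficients, p model order.
e[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}

% Parseval's theorem: residual energy expressed in the frequency domain.
\sum_{n} e^{2}[n]
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\bigl|A(e^{j\omega})\bigr|^{2}\, d\omega,
\qquad P(\omega) = |X(\omega)|^{2}

% Define the all-pole model spectrum; the error becomes a ratio of spectra.
\hat{P}(\omega) = \frac{G}{\bigl|A(e^{j\omega})\bigr|^{2}}
\;\Longrightarrow\;
\sum_{n} e^{2}[n]
  = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
```

Minimizing the residual energy over the a_k therefore fits the all-pole spectrum to the power spectrum P(ω), with G equal to the minimum residual energy.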

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: x_a(t) = x(t) + j H[x(t)], where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
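The definition above can be checked numerically. The sketch below builds the discrete analytic signal by zeroing the negative-frequency half of the spectrum (a common discrete-time construction, assumed here) and takes its squared magnitude. A naive O(N²) DFT keeps the example dependency-free; the function names and the test signal are illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of a real sequence."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(z) ** 2 for z in analytic]


# AM test signal: slow modulation m[t] on a faster cosine carrier.
n = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
x = [m[t] * math.cos(2 * math.pi * 16 * t / n) for t in range(n)]
env = hilbert_envelope(x)   # matches m[t] ** 2 for this band-limited example
```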

Hilbert Envelope

[Figure panels: (a) speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency
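The duality can be illustrated with a toy example: applying linear prediction to the DCT of a signal yields an all-pole model whose "spectrum" is read along the time axis, so it peaks where the signal has energy in time. This is a sketch under assumptions (unnormalised DCT-II, autocorrelation-method LP, model order 2, a single impulse as input); it is not the authors' implementation, and the function names are hypothetical.

```python
import cmath
import math


def dct_ii(x):
    """Unnormalised DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n))
            for k in range(n)]


def lp_coefficients(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, gain)."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope from LP on the DCT of x."""
    n = len(x)
    a, gain = lp_coefficients(dct_ii(x), order)
    env = []
    for t in range(n):
        # Time index t maps to the angle at which the DCT basis is sampled.
        w = math.pi * (2 * t + 1) / (2 * n)
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        env.append(gain / abs(denom) ** 2)
    return env


# A single click at sample 20: its DCT is a cosine along the frequency index,
# so the all-pole model places a peak at the click position in time.
x = [0.0] * 64
x[20] = 1.0
env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=env.__getitem__)
```

Higher model orders let the envelope follow several temporal peaks, in the same way that a higher-order time-domain LP model follows more spectral peaks.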

AM-FM Decomposition

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Comparison of Modulation Frequency Features for Speech Recognition,” ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame

Feature Comparison

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, “Modulation Frequency Features for Phoneme Recognition in Noisy Speech,” JASA Express Letters, 2009.

[Figure panels: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]


Speaker Verification

NIST 2008 Speaker Recognition Evaluation (SRE): has telephone speech and far-field speech.

GMM-UBM system: trained on a large set of development speakers; adapted on the enrollment data from the target speaker; nuisance attribute projection (NAP) on supervectors.

Detection cost function (DCF) = 0.99 Pfa + 0.1 Pmiss

Features with warping [Pelecanos 2001]: Mel frequency cepstral coefficients (MFCCs) and FDLP short-term (FDLP-S) features.
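The detection cost function above is a weighted sum of the false-alarm and miss rates. A minimal sketch of how it could be evaluated at one decision threshold is shown below; the function names, the score lists, and the "accept when score >= threshold" convention are assumptions for illustration.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (P_fa, P_miss) when trials with score >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_fa, p_miss

def detection_cost(p_fa, p_miss):
    """DCF with the weights quoted on the slide: 0.99 * P_fa + 0.1 * P_miss."""
    return 0.99 * p_fa + 0.1 * p_miss

# Hypothetical scores, used only to exercise the functions.
targets = [2.0, 1.5, 0.2, 3.0]
impostors = [-1.0, 0.5, 1.0, -2.0, 0.1]
p_fa, p_miss = error_rates(targets, impostors, threshold=0.4)
print(detection_cost(p_fa, p_miss))
```

Because the false-alarm weight is roughly ten times the miss weight, this cost favours operating points with few false acceptances.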

[Figure: bar chart of DCF for MFCC and FDLP-S features under Tel, Mic, and Cross-domain conditions]

S. Ganapathy, J. Pelecanos, and M. Omar, "Feature Normalization for Speaker Verification in Room Reverberation," ICASSP, 2011.

Outline of Applications

[Block diagram: Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant; application branches: Short-term Features for Speaker & Speech Recognition, Modulation Features for Phoneme Recognition, Wide-band Speech & Audio Coding]

Modulation Feature Extraction

[Block diagram: Input, DCT, Critical-band Window, FDLP, Static and Dynamic compression, DCT (one per compression path, 200 ms), Sub-band Feat]

Static compression is a logarithm, which reduces the huge dynamic range in the sub-band envelope.

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low-pass filters [Kollmeier 1999].

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components.

14 static and 14 dynamic modulation components (0-35 Hz) in each of 17 sub-bands give a 476-dimensional feature (2 × 14 × 17 = 476).
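The feature layout described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the adaptation loops are reduced to a divider followed by a first-order low-pass filter, the time constants are placeholder values, the envelope sampling rate is assumed to be 400 Hz (80 samples per 200 ms), and all function names are hypothetical.

```python
import numpy as np

N_BANDS = 17   # sub-bands (from the slide)
N_MOD = 14     # modulation components kept per compression path (from the slide)

def dct2(x):
    """Unnormalized DCT-II computed directly from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x

def static_compress(env, floor=1e-8):
    """Static compression: logarithm of the (floored) envelope."""
    return np.log(np.maximum(env, floor))

def dynamic_compress(env, fs_env, time_consts=(0.005, 0.05, 0.5), floor=1e-5):
    """Simplified adaptation loops; the time constants here are assumed values."""
    out = np.maximum(np.asarray(env, dtype=float), floor)
    for tau in time_consts:
        alpha = np.exp(-1.0 / (tau * fs_env))
        state = np.sqrt(out[0])
        stage = np.empty_like(out)
        for i, value in enumerate(out):
            stage[i] = value / state                          # divider
            state = alpha * state + (1 - alpha) * stage[i]    # low-pass filter in the feedback path
        out = np.maximum(stage, floor)
    return out

def modulation_features(envelopes, fs_env):
    """Stack N_MOD static and N_MOD dynamic DCT components for every sub-band."""
    feats = []
    for env in envelopes:
        for compressed in (static_compress(env), dynamic_compress(env, fs_env)):
            feats.append(dct2(compressed)[:N_MOD])
    return np.concatenate(feats)

rng = np.random.default_rng(0)
envelopes = rng.random((N_BANDS, 80)) + 0.1   # hypothetical 200 ms sub-band envelopes
print(modulation_features(envelopes, fs_env=400.0).shape)
```

With a 200 ms segment, consecutive DCT components are spaced 2.5 Hz apart, so the first 14 components cover roughly the 0-35 Hz modulation range quoted on the slide.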

Phoneme Recognition

TIMIT database (8 kHz): clean training data; test data can be clean, additive noise, reverberated, or telephone channel.

Multi-layer perceptron (MLP) based system: MLPs estimate phoneme posteriors; hidden Markov model (HMM) - MLP hybrid model; performance in phoneme error rate (PER).

Features: perceptual linear prediction (PLP) with 9-frame context; Advanced ETSI standard [ETSI 2002] with 9-frame context; FDLP modulation (FDLP-M) features, 476 dim.

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010.

[Figure: bar chart of PER for PLP-9, ETSI-9, and FDLP-M features under Clean, Additive Noise, Reverb, and Tel conditions]


Audio Coding

[Block diagram: Input, QMF Analysis (sub-bands 1 to 32), FDLP (Env and Carr outputs), MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (sub-bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

[Figure: subjective listening scores on a 0-100 scale for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

Overview

Introduction, AR Model of Hilbert Envelopes, FDLP and its Properties, Applications, Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

Speech intelligibility [Houtgast et al. 1980]

RASTA processing [Hermansky et al. 1994]

Speech recognition [Kingsbury et al. 1998]

AM-FM decomposition [Kumaresan et al. 1999]

Sound texture modeling [Athineos et al. 2003]

Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples at n-1, n-2, n-3 weighted by a1, a2, a3 to predict the sample at n]

Model parameters are solved by minimizing the residual sum of squares
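A minimal sketch of time-domain linear prediction is given below, assuming the autocorrelation method and the convention that the predictor is the sum of a_k x[n-k] (the slides do not state which solver or sign convention was used). The function name and the synthetic AR(2) test signal are hypothetical.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Solve the normal equations of linear prediction (autocorrelation method).

    Returns the predictor coefficients a_1..a_p and the minimum residual energy.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation lags r[0..order].
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz system R a = r[1:], from setting the gradient of the squared residual to zero.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    gain = r[0] - np.dot(a, r[1:order + 1])
    return a, gain

# Synthetic AR(2) process with known coefficients 1.5 and -0.7.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for i in range(2, len(e)):
    x[i] = 1.5 * x[i - 1] - 0.7 * x[i - 2] + e[i]

coeffs, gain = lpc_autocorr(x, order=2)
print(coeffs, gain)
```

On a long realization the estimated coefficients should land close to the generating ones, which is a quick sanity check on the normal equations.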

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
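The equations on these slides did not survive extraction. As a reminder of the standard textbook argument in the spirit of [Makhoul 1975], and not a recovery of the slide content, the steps can be written as below. The notation is assumed: $x[n]$ is the signal, $a_k$ the predictor coefficients of order $p$, $e[n]$ the residual, $P_x(\omega)$ the power spectrum of the signal, and the sign convention of $A(z)$ is one of several in use.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e^2[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl|E(e^{j\omega})\bigr|^2 \, d\omega
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\, \bigl|A(e^{j\omega})\bigr|^2 \, d\omega \\
\hat{P}_x(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
\quad\Longrightarrow\quad
E = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P_x(\omega)}{\hat{P}_x(\omega)}\, d\omega
\end{align}
```

Minimizing $E$ over the $a_k$ therefore fits the all-pole spectrum $\hat{P}_x(\omega)$ to $P_x(\omega)$, with $G$ equal to the minimum residual energy.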

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

$$|x_a(t)|^2 = x^2(t) + \mathcal{H}[x(t)]^2$$
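The definition above can be tried out on a discrete signal. The sketch below builds the analytic signal with an FFT (an assumed, common discrete approximation of the Hilbert transform) and checks it on an amplitude-modulated tone whose modulation is known; function names and test parameters are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """x + j*H[x], obtained by zeroing negative-frequency FFT bins and doubling positive ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# AM tone: slow modulation m[n] on a carrier that completes 32 cycles in the frame.
n = 512
idx = np.arange(n)
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * idx / n)
signal = modulation * np.cos(2 * np.pi * 32 * idx / n)

envelope = hilbert_envelope(signal)
print(np.max(np.abs(envelope - modulation ** 2)))
```

For this signal the modulation is much slower than the carrier, so the Hilbert envelope coincides with the squared modulation.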

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP panels]

LP in Time and Frequency
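The duality can be illustrated with a small experiment: linear prediction applied to the DCT of a signal gives an all-pole curve that follows the temporal envelope, just as linear prediction on the time samples gives an all-pole model of the power spectrum. The sketch below is an assumed, simplified full-band version (no sub-band windowing, autocorrelation-method LP, unnormalized DCT-II); the function names, model order, and test signal are hypothetical.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II computed directly from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x

def lpc_autocorr(seq, order):
    """Autocorrelation-method linear prediction; returns coefficients and residual energy."""
    n = len(seq)
    r = np.array([np.dot(seq[:n - k], seq[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return a, r[0] - np.dot(a, r[1:order + 1])

def fdlp_envelope(x, order):
    """All-pole model of the temporal envelope from LP on the DCT coefficients."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a, gain = lpc_autocorr(dct2(x), order)
    # The model "frequency" axis maps to time: sample i corresponds to pi*(2i+1)/(2n).
    omega = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    lags = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(omega, lags)) @ a
    return gain / np.abs(A) ** 2

# Noise burst centred at sample 300 on a weak background.
rng = np.random.default_rng(0)
n = 1000
idx = np.arange(n)
modulation = 0.1 + np.exp(-0.5 * ((idx - 300) / 30.0) ** 2)
env = fdlp_envelope(modulation * rng.standard_normal(n), order=10)
print(int(np.argmax(env)))
```

The model order controls temporal resolution in the same way it controls spectral resolution in conventional LP: a higher order lets the envelope follow finer temporal detail.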

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
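Of the three techniques, cepstral mean subtraction is the simplest to sketch. Under the assumption that the channel adds the same offset to every frame in the log-spectral or cepstral domain, removing the per-coefficient mean over frames cancels it. The function name and the random test features below are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(features):
    """Subtract the mean over frames (axis 0) from each cepstral coefficient."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))      # 50 frames, 13 coefficients (assumed sizes)
channel_offset = rng.standard_normal(13)   # constant offset standing in for a fixed channel

# The same normalized features result with or without the channel offset.
print(np.allclose(cepstral_mean_subtraction(clean + channel_offset),
                  cepstral_mean_subtraction(clean)))
```

The cancellation relies on the offset being identical across frames, which is the "distortion in neighboring frames is similar" assumption stated above; it breaks down when the room response is longer than the analysis window.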

Feature Comparison

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]





Outline of Applications

Figure: block diagram. Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding.

Modulation Features

Figure: block diagram. Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding.

Modulation Feature Extraction

Figure: block diagram. Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT, DCT, 200ms, Sub-band Feat.

Static compression is a logarithm - reduces the huge dynamic range in the sub-band envelope

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17)
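The static stream and the dimension count can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the function names, the 400 Hz envelope sampling rate, the floor constant and the random stand-in envelopes are all hypothetical, and the dynamic compression loops are not implemented (the second stream is only counted).

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II of a 1-D array, via an explicit cosine matrix."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

def static_modulation_features(envelope, n_coeffs=14, floor=1e-8):
    """Log-compress one sub-band envelope segment and keep the low DCT terms."""
    compressed = np.log(np.maximum(envelope, floor))
    return dct2(compressed)[:n_coeffs]

# Hypothetical setup: 17 sub-bands, 200 ms segments at an assumed 400 Hz envelope rate.
n_bands, n_samples = 17, 80
rng = np.random.default_rng(0)
envelopes = rng.uniform(0.1, 1.0, size=(n_bands, n_samples))  # stand-in envelopes

static = np.stack([static_modulation_features(e) for e in envelopes])

# A dynamic stream of the same shape would double the feature size.
feature_dim = 2 * static.size
```

With a 200 ms segment, DCT-II bin k corresponds to a modulation frequency of about k / (2 x 0.2 s) = 2.5k Hz, so 14 bins span roughly 0 to 32.5 Hz, in line with the 0-35 Hz range quoted on the slide.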

Phoneme Recognition

TIMIT Database (8 kHz)

Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system; MLPs estimate phoneme posteriors

Hidden Markov model (HMM) - MLP hybrid model

Performance in phoneme error rate (PER)

Features

Perceptual linear prediction (PLP) - 9 frame context

Advanced ETSI standard [ETSI2002] - 9 frame context

FDLP modulation (FDLP-M) features - 476 dim

Phoneme Recognition

Figure: bar chart of PER (axis ticks 30 to 75) for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions.

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010


Audio Coding

Figure: block diagram. Input Signal, Sub-band Decomposition, FDLP (AM and FM outputs), Gain Norm, Quant. Outputs: Short-term Features for Speaker & Speech Recog; Modulation Features for Phoneme Recog; Wide-band Speech & Audio Coding.

Audio Coding

Figure: codec block diagram. Input, QMF Analysis (bands 1 to 32), FDLP, Env and Carr paths, MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (bands 1 to 32), Output.

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008

Subjective Evaluations

Figure: chart of scores (axis 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC.

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010

Overview

Introduction; AR Model of Hilbert Envelopes; FDLP and its Properties; Applications; Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals


Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes


Our Contributions

Short-term feature extraction using FDLP - Improvements in reverb speech recog

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP


Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al 1994]

Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past:

Speech intelligibility [Houtgast et al 1980]

RASTA processing [Hermansky et al 1994]

Speech recognition [Kingsbury et al 1998]

AM-FM decomposition [Kumaresan et al 1999]

Sound texture modeling [Athineos et al 2003]

Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

Figure: past samples n-1, n-2, n-3 weighted by coefficients a1, a2, a3 to predict sample n.

Model parameters are solved by minimizing the residual sum of squares
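A minimal sketch of time-domain linear prediction, assuming the autocorrelation method as the way to minimize the residual sum of squares (the slides do not say which solver is used). The function name `lpc_autocorr` and the direct solve of the Toeplitz normal equations are illustrative choices.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Linear prediction by the autocorrelation method.

    Returns (a, gain): x[n] is predicted as sum_k a[k-1] * x[n-k] for
    k = 1..order, and gain is the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags 0..order of the finite sequence.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])
    return a, gain
```

For a long realization of a stable AR process, the returned coefficients should approach the coefficients that generated it.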

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

By definition Let

Thus parameters are solved by minimizing

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
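One standard way to write this argument, following the filter interpretation in [Makhoul 1975], is sketched below. The symbols x[n], a_k, p, A(ω), X(ω) and \hat{P}_x(ω) are notation assumed here, since the slide equations are not reproduced in the text, and may differ from the slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
E = \sum_{n} e[n]^2 \\
A(\omega) &= 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \qquad
E(\omega) = A(\omega)\, X(\omega) \\
E &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(\omega)|^2\, |A(\omega)|^2 \, d\omega
\quad \text{(Parseval)} \\
\hat{P}_x(\omega) &= \frac{G}{|A(\omega)|^2}, \qquad
G = \min_{a_1,\dots,a_p} E
\end{align}
```

Minimizing E over the coefficients therefore fits the all-pole form to the power spectrum |X(ω)|^2, with the numerator G equal to the minimum residual sum of squares.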

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component, where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
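The definition can be sketched in a few lines, assuming the usual form of the analytic signal, x + jH[x], computed by zeroing the negative-frequency half of the DFT. The function names are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], built by zeroing negative frequencies."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep Nyquist once
        h[1 : n // 2] = 2.0         # double positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Example: a slow modulation m multiplying a faster carrier.
n = 256
t = np.arange(n)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * t / n)
x = m * np.cos(2 * np.pi * 32 * t / n)
env = hilbert_envelope(x)           # recovers m**2 for this signal
```

In this example the modulation is band-limited well below the carrier, so the squared magnitude of the analytic signal equals the squared modulation.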

Hilbert Envelope

Figure: (a) Speech, (b) Hilb Env, (c) FDLP Env.

Duality

Figure: panels labeled LP and FDLP.

LP in Time and Frequency
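A minimal sketch of the duality, assuming the usual FDLP recipe: linear prediction applied to the DCT of a signal gives an all-pole model that is read along the time axis, i.e. an AR approximation of the temporal envelope. The function names, the explicit DCT matrix, the autocorrelation-method solver and the angle-to-time mapping pi*(t+0.5)/N are choices made for this illustration, not details taken from the slides.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II of a 1-D array, via an explicit cosine matrix."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

def fdlp_envelope(x, order):
    """AR estimate of the temporal envelope via LP on the DCT of x."""
    n = len(x)
    c = dct2(np.asarray(x, dtype=float))
    # Autocorrelation method applied to the DCT sequence instead of the signal.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])
    # Evaluate gain / |A|^2 on a grid where angle pi*(t+0.5)/n maps to time t.
    w = np.pi * (np.arange(n) + 0.5) / n
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return gain / np.abs(A) ** 2

# Example: a single click at sample 300 should give an envelope peak near 300.
x = np.zeros(512)
x[300] = 1.0
env = fdlp_envelope(x, order=2)
peak = int(np.argmax(env))
```

Where time-domain LP places poles at spectral peaks, the same procedure on DCT coefficients places them at peaks of the temporal envelope.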

AM-FM Decomposition

Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) AM comp, (e) FM comp.

Spectrogram Comparison

Figure: spectrograms labeled FDLP and PLP.

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
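To make the CMS assumption concrete: a channel that is short relative to the analysis window adds an approximately constant offset to every cepstral frame, so subtracting the mean over frames removes it. The sketch below uses synthetic frames; the array shapes and names are illustrative assumptions, and it does not cover LTLSS or gain normalization.

```python
import numpy as np

def cepstral_mean_subtraction(features):
    """Subtract the per-dimension mean over frames (rows are frames)."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 13))     # hypothetical cepstral frames
channel = rng.normal(size=(1, 13))     # fixed offset from a short channel
distorted = clean + channel

# After CMS, the constant channel offset no longer separates the two.
cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)
```

The same cancellation does not hold when the channel response is longer than the analysis window, which is the case the long-term methods on this slide address.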

Feature Comparison

Modulation Feature Extraction

Figure: block diagram. Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT, DCT, 200ms, Sub-band Feat.

Modulation Features

Figure: (a) Signal, (b) Hilb Env, (c) FDLP Env, (d) Log comp, (e) Dyn comp.

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

Noise Compensation in FDLP

Figure: block diagram. Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env.

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Figure: block diagram. Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD.

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope
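A rough sketch of envelope subtraction under simplifying assumptions: the noise envelope is taken to be a constant level, estimated as the mean of the noisy envelope over the samples a VAD marks as non-speech, and the result is floored to stay positive. The function name, the constant-level noise model and the floor factor are hypothetical; the Wiener filtering named in the block diagram is not reproduced here.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech samples."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated only where the VAD reports no speech.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Floor at a small fraction of the input so the envelope stays positive.
    return np.maximum(cleaned, floor * noisy_env)

# Example: a rectangular "speech" envelope on top of a constant noise level.
speech_env = np.zeros(100)
speech_env[40:60] = 1.0
mask = speech_env > 0                  # stand-in for a VAD decision
noisy = speech_env + 0.2
cleaned = subtract_noise_envelope(noisy, mask)
```

The flooring keeps the compensated envelope usable for a later logarithm or linear prediction step, at the cost of a small residual in non-speech regions.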

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

Figure: time-frequency plot (axes Time, Frequency).

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
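For contrast with the long-term analysis in this talk, the conventional front end can be sketched as below. The 25 ms window, 10 ms hop and Hamming window are assumed typical values within the 10-40 ms range quoted above, not settings taken from the slides, and the function name is illustrative.

```python
import numpy as np

def short_term_power_spectrum(x, fs, win_ms=25.0, hop_ms=10.0):
    """Frame x, apply a Hamming window, and return per-frame power spectra."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

# Example: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
spec = short_term_power_spectrum(np.sin(2 * np.pi * 1000 * t), fs)
```

Each row is one short-term spectrum; the hop fixes the preset rate (here 100 frames per second) at which the spectrum is sampled before later modeling stages.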


Page 70: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope

Dynamic compression is implemented by compression loops consisting of dividers and low-pass filters [Kollmeier 1999]
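A compression loop of this kind can be sketched as a divider whose divisor is a low-pass filtered copy of its own output. The code below is only an illustration under assumptions, not the implementation used in the talk: the function name `dynamic_compression`, the time constants, the one-pole low-pass filter and the `floor` value are placeholders chosen for the example. For a constant input each loop settles to the square root of its input, so a cascade compresses stationary segments strongly while fast changes pass through with little compression.

```python
import math


def dynamic_compression(envelope, fs,
                        time_constants=(0.005, 0.05, 0.129, 0.253, 0.5),
                        floor=1e-5):
    """Cascade of divider / low-pass feedback loops (illustrative sketch).

    envelope       -- iterable of non-negative envelope samples
    fs             -- envelope sampling rate in Hz
    time_constants -- low-pass time constants in seconds (assumed values)
    floor          -- smallest input value allowed into the first divider
    """
    # One-pole low-pass coefficient for every loop.
    coeffs = [math.exp(-1.0 / (fs * tau)) for tau in time_constants]

    # Start every divider at its steady state for a constant input of `floor`.
    states = []
    level = floor
    for _ in time_constants:
        level = math.sqrt(level)
        states.append(level)

    out = []
    for sample in envelope:
        value = max(sample, floor)
        for k, c in enumerate(coeffs):
            value = value / states[k]                       # divider
            states[k] = c * states[k] + (1.0 - c) * value   # low-pass of the loop output
        out.append(value)
    return out


# A constant input of 4.0 through a single loop settles at sqrt(4.0).
settled = dynamic_compression([4.0] * 2000, fs=1000, time_constants=(0.005,))
print(round(settled[-1], 6))
```

The strong overshoot at signal onsets is a property of this simple loop; practical systems usually limit it, which this sketch does not attempt.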

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature vector of 476 dimensions (14 x 2 x 17)
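The feature assembly described on these slides can be sketched as follows. This is a minimal illustration under assumptions: the function names `dct_ii` and `modulation_features`, the 400 Hz envelope sampling rate (so that 200 ms is 80 samples) and the square-root stand-in for the dynamically compressed envelopes are all hypothetical. With a 200 ms segment, DCT index k corresponds to k / (2 x 0.2 s) = 2.5k Hz, so the first 14 coefficients sit at 0 to 32.5 Hz, which is consistent with the quoted 0-35 Hz range.

```python
import numpy as np


def dct_ii(x, num_coeffs):
    """First num_coeffs unnormalized DCT-II coefficients of a 1-D array."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(num_coeffs)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * k * (t + 0.5) / n) @ x


def modulation_features(envelopes, dynamic_envelopes, num_coeffs=14, eps=1e-10):
    """Stack DCTs of log-compressed and dynamically compressed sub-band envelopes.

    envelopes         -- array (bands, samples) of sub-band temporal envelopes
    dynamic_envelopes -- array (bands, samples), already dynamically compressed
    """
    feats = []
    for env, dyn in zip(envelopes, dynamic_envelopes):
        static = np.log(np.asarray(env, dtype=float) + eps)  # static (log) compression
        feats.append(dct_ii(static, num_coeffs))
        feats.append(dct_ii(dyn, num_coeffs))
    return np.concatenate(feats)


# 17 sub-bands, 200 ms at an assumed 400 Hz envelope rate = 80 samples per band.
rng = np.random.default_rng(0)
envelopes = rng.uniform(0.1, 1.0, size=(17, 80))
dynamic = np.sqrt(envelopes)  # placeholder for the output of the compression loops
features = modulation_features(envelopes, dynamic)
print(features.shape)  # (476,)
```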

Phoneme Recognition

TIMIT Database (8 kHz)
  • Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system
  • MLPs estimate phoneme posteriors
  • Hidden Markov model (HMM) - MLP hybrid model
  • Performance in phoneme error rate (PER)

Features
  • Perceptual linear prediction (PLP) - 9 frame context
  • Advanced ETSI standard [ETSI 2002] - 9 frame context
  • FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.
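Phoneme error rate is conventionally computed from the edit distance between the reference and recognized phoneme sequences. The sketch below shows that usual definition; the function name `phoneme_error_rate` and the toy sequences are assumptions for illustration, and the scoring setup actually used for the TIMIT experiments is not given on the slide.

```python
def phoneme_error_rate(reference, hypothesis):
    """PER in percent: (substitutions + deletions + insertions) / len(reference)."""
    # Dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,                        # deletion
                           cur[j - 1] + 1,                     # insertion
                           prev[j - 1] + (ref_ph != hyp_ph)))  # match or substitution
        prev = cur
    return 100.0 * prev[-1] / len(reference)


reference = ["sil", "k", "ae", "t", "sil"]
hypothesis = ["sil", "k", "ah", "t"]
# One substitution (ae -> ah) and one deletion (final sil) over 5 reference phonemes.
print(phoneme_error_rate(reference, hypothesis))  # 40.0
```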

[Bar chart: PER for PLP-9, ETSI-9 and FDLP-M features under Clean, Add Noise, Reverb and Tel conditions; PER axis from 30 to 75]

Outline of Applications

[Block diagram: Input signal, sub-band decomposition, FDLP (AM and FM outputs), gain normalization, quantization. Applications: short-term features for speaker and speech recognition; modulation features for phoneme recognition; wide-band speech and audio coding]

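Audio Coding

[Codec block diagram: Input, QMF analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr), MDCT, quantization (Q), inverse quantization (Q-1), IMDCT, multiplication (Mul), QMF synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.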
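The envelope/carrier split and the final multiplication in the codec diagram can be illustrated for a single sub-band. This is a heavily simplified sketch under assumptions, not the codec itself: it uses the non-parametric Hilbert envelope in place of the FDLP envelope, quantizes the carrier samples directly with a uniform quantizer instead of quantizing its MDCT, leaves the envelope unquantized, and omits the QMF bank. The names `encode_band`, `decode_band` and `step` are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0   # keep and double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0       # Nyquist bin for even lengths
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights)) ** 2


def encode_band(x, step, eps=1e-8):
    """Split a sub-band signal into an envelope and a quantized carrier."""
    x = np.asarray(x, dtype=float)
    env = hilbert_envelope(x) + eps
    carrier = x / np.sqrt(env)                    # carrier has roughly unit amplitude
    return env, np.round(carrier / step).astype(int)


def decode_band(env, carrier_idx, step):
    """Dequantize the carrier and multiply it back with the envelope."""
    return carrier_idx * step * np.sqrt(env)


n = np.arange(256)
band = (1.0 + 0.5 * np.cos(2 * np.pi * n / 256)) * np.cos(2 * np.pi * 32 * n / 256)
env, idx = encode_band(band, step=0.05)
decoded = decode_band(env, idx, step=0.05)
print(float(np.max(np.abs(decoded - band))))
```

Because the carrier is rescaled by the envelope at the decoder, the quantization error in each sample is bounded by half a step times the local envelope amplitude.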

Subjective Evaluations

S. Ganapathy, P. Motlicek and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

[Bar chart: subjective scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP and AAC]

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

  • Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen
  • Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss
  • IBM personnel - Jason Pelecanos, Mohamed Omar
  • Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidence
  • Perceptual importance of modulation frequencies [Drullman et al 1994]
  • Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past for
  • Speech intelligibility [Houtgast et al 1980]
  • RASTA processing [Hermansky et al 1994]
  • Speech recognition [Kingsbury et al 1998]
  • AM-FM decomposition [Kumaresan et al 1999]
  • Sound texture modeling [Athineos et al 2003]
  • Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with coefficients a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
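Minimizing the residual sum of squares leads to a set of linear normal equations in the autocorrelation of the signal. The sketch below uses the autocorrelation method with a direct linear solve; the function name `linear_prediction` and the test signal are assumptions for illustration, and a Levinson-Durbin recursion would normally replace the generic solver.

```python
import numpy as np


def linear_prediction(x, order):
    """Autocorrelation-method LP.

    Returns the predictor coefficients a[0..order-1], where the prediction is
    x_hat[n] = sum_k a[k-1] * x[n-k], and the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation lags r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with R Toeplitz in r[0..order-1].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])  # minimum residual sum of squares
    return a, gain


# Impulse response of a one-pole filter with pole at 0.9.
signal = 0.9 ** np.arange(200)
coeffs, gain = linear_prediction(signal, order=1)
print(coeffs, gain)
```

For this test signal a first-order predictor recovers the pole (a coefficient close to 0.9), and the residual energy is close to 1, the energy of the single excitation sample.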

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem, the residual sum of squares can be written in the frequency domain

Thus the parameters are solved by minimizing this frequency-domain error

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)

[Figure: AR model of power spectrum]
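For reference, the standard form of this argument, following the filter interpretation in [Makhoul 1975], can be written as below. The notation ($x[n]$, $a_k$, $A$, $P_x$, $\hat{P}_x$, model order $p$) is an assumption chosen for this sketch and may differ from the notation used on the slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad
E(e^{j\omega}) = A(e^{j\omega})\, X(e^{j\omega}),
\quad
A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}
\\
\sum_{n} e^2[n]
 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| E(e^{j\omega}) \right|^2 d\omega
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_x(\omega)\, \left| A(e^{j\omega}) \right|^2 d\omega,
\qquad
P_x(\omega) = \left| X(e^{j\omega}) \right|^2
\\
\hat{P}_x(\omega) &= \frac{G}{\left| A(e^{j\omega}) \right|^2}
\quad\Longrightarrow\quad
\sum_{n} e^2[n] = \frac{G}{2\pi} \int_{-\pi}^{\pi} \frac{P_x(\omega)}{\hat{P}_x(\omega)}\, d\omega
\end{align}
```

Minimizing the time-domain residual is therefore equivalent to minimizing the integrated ratio of the signal power spectrum to the all-pole model spectrum, which is why the fit follows spectral peaks more closely than spectral valleys.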

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
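The definition can be checked numerically with an FFT-based analytic signal: zero the negative frequencies, double the positive ones, and take the squared magnitude. The function names and the test signal below are assumptions for illustration; for a cosine carrier with a slow modulation $m(t)$, the Hilbert envelope comes out as $m^2(t)$.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                  # DC is kept once
    weights[1:(n + 1) // 2] = 2.0     # positive frequencies are doubled
    if n % 2 == 0:
        weights[n // 2] = 1.0         # Nyquist bin for even lengths
    return np.fft.ifft(np.fft.fft(x) * weights)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2


n = np.arange(64)
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * n / 64)     # slow AM
signal = modulation * np.cos(2 * np.pi * 8 * n / 64)    # fine carrier
print(np.allclose(hilbert_envelope(signal), modulation ** 2))  # True
```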

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency
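The duality can be sketched in code: applying linear prediction to the DCT of a signal, rather than to the signal itself, gives an all-pole model of its temporal envelope instead of its power spectrum. The block below is a minimal illustration under assumptions (unnormalized DCT-II over the full segment, autocorrelation-method LP, no sub-band windowing); the function name `fdlp_envelope` and the impulse test signal are hypothetical.

```python
import numpy as np


def fdlp_envelope(x, order):
    """All-pole temporal envelope obtained by LP on the DCT of x (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # DCT-II of the whole segment (unnormalized).
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.cos(np.pi * k * (t + 0.5) / n) @ x

    # Autocorrelation-method linear prediction on the DCT sequence.
    r = np.array([np.dot(c[: n - m], c[m:]) for m in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])

    # Evaluate G / |A|^2 at theta = pi (t + 0.5) / n: one point per time sample.
    theta = np.pi * (np.arange(n) + 0.5) / n
    lags = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, lags)) @ a
    return gain / np.abs(A) ** 2


# A single click at sample 100 produces an envelope peak at that time.
click = np.zeros(256)
click[100] = 1.0
envelope = fdlp_envelope(click, order=2)
print(int(np.argmax(envelope)))
```

An impulse in time becomes a cosine across the DCT index, so the predictor places a pole pair at the corresponding angle, and the model "spectrum" read along the time axis peaks at the position of the click.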

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
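The idea behind CMS can be shown in a few lines. Assuming the channel response is short compared with the analysis window, a fixed convolutive distortion appears as an approximately constant offset in every cepstral frame, so subtracting the per-utterance mean removes it. The function name and the random test data below are assumptions for illustration; the sketch does not cover LTLSS or gain normalization.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the mean over time from each cepstral coefficient.

    cepstra -- array of shape (frames, coefficients)
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))     # stand-in cepstral frames
channel = rng.normal(size=13)         # constant offset caused by a fixed channel
distorted = clean + channel
print(np.allclose(cepstral_mean_subtraction(distorted),
                  cepstral_mean_subtraction(clean)))  # True
```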

Feature Comparison

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, critical-band window, IDFT, squared magnitude, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise (noisy signal y), the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

[Block diagram: Input, DCT, critical-band window, IDFT, squared magnitude, Wiener filtering with VAD, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.
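The subtraction step can be sketched as follows. This is a simplified illustration, not the Wiener-filtering scheme in the diagram: it assumes stationary noise, so the noise envelope is a single level averaged over the frames that the VAD marks as non-speech, and it floors the result to keep the envelope positive. The function name, the `floor` parameter and the toy envelope are hypothetical.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate from a noisy sub-band envelope.

    noisy_env   -- non-parametric (Hilbert) envelope of the noisy signal
    speech_mask -- boolean VAD decisions, True where speech is present
    floor       -- fraction of the noisy envelope kept as a lower bound
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated from the non-speech regions only.
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Flooring avoids zero or negative envelope values.
    return np.maximum(cleaned, floor * noisy_env)


speech_env = np.zeros(100)
speech_env[40:60] = 5.0
noisy = speech_env + 2.0          # additive noise envelope of constant level 2
vad = speech_env > 0              # ideal VAD for the toy example
cleaned = subtract_noise_envelope(noisy, vad)
print(cleaned[50], cleaned[0])
```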

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram, Time versus Frequency]

The spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMMs
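The conventional front end described here can be sketched as a framed power spectrum. The 25 ms window, 10 ms frame shift and Hamming window below are common choices assumed for the example rather than values stated on the slide, and the function name is hypothetical.

```python
import numpy as np


def short_term_power_spectrum(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Power spectrum of overlapping short-term frames (one row per frame)."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hamming(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


fs = 8000
t = np.arange(fs) / fs                      # one second of signal
tone = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone
spectrum = short_term_power_spectrum(tone, fs)
print(spectrum.shape)  # (98, 101)
```

Each row is one sample of the spectrum at the preset frame rate (100 frames per second here); anything longer than a frame has to be captured by the models that follow.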

Page 72: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]

Modulation Feature Extraction

Critical-band

Window

InputDCT FDLP

Static

Dynamic

DCT

DCT

200ms

Sub-bandFeat

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past:

Speech intelligibility [Houtgast et al 1980]

RASTA processing [Hermansky et al 1994]

Speech recognition [Kingsbury et al 1998]

AM-FM decomposition [Kumaresan et al 1999]

Sound texture modeling [Athineos et al 2003]

Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from past samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
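
The least-squares fit described on this slide is commonly solved with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a minimal pure-Python illustration under that assumption, not the presenter's code. The function names `autocorr` and `levinson`, and the convention that x[n] is predicted as a1 x[n-1] + ... + ap x[n-p], are choices made for this example.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + m] for i in range(n - m)) for m in range(max_lag + 1)]


def levinson(r, order):
    """Solve the LP normal equations with the Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the weight on x[n-k] in the prediction of
    x[n], and err is the minimum residual sum of squares.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        # Update the lower-order weights, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


# Hypothetical test signal: a decaying exponential, which a one-tap predictor fits.
x = [0.9 ** n for n in range(200)]
weights, residual = levinson(autocorr(x, 2), 2)
print(weights, residual)
```

For this test signal the first weight comes out close to 0.9 and the second close to zero, since each sample is 0.9 times the previous one.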

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
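
The slide equations did not survive extraction, so the block below writes out the standard form of the all-pole model as a reference sketch. The symbols are assumptions chosen for this note (x[n] for the signal, a_k for the predictor weights, p for the model order, e[n] for the residual); the form follows the usual statement of the result cited as [Makhoul 1975].

```latex
\hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k], \qquad
e[n] = x[n] - \hat{x}[n]

\hat{P}_x(\omega) = \frac{G}{\left| 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \right|^{2}}, \qquad
G = \min_{a_1,\dots,a_p} \sum_{n} e^{2}[n]
```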

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and j times its quadrature component: x_a(t) = x(t) + j H[x(t)], where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
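
The definition above can be checked numerically. The sketch below builds a discrete analytic signal by zeroing the negative-frequency DFT bins and then takes the squared magnitude. It uses a naive O(N^2) DFT so that it runs with the standard library only; the function names are illustrative, and a practical implementation would more likely use an FFT library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short examples)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x):
    """Analytic signal x + j*H[x], built by removing the negative frequencies."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue            # DC and Nyquist bins are kept unchanged
        if k < (n + 1) // 2:
            spec[k] *= 2.0      # positive frequencies are doubled
        else:
            spec[k] = 0.0       # negative frequencies are removed
    return dft(spec, inverse=True)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]


# Hypothetical test signal: slow modulation m(t) times a faster cosine carrier.
N = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
x = [m[t] * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
env = hilbert_envelope(x)
```

For this band-limited test signal the envelope matches the squared modulation m(t)^2, which is the product model of the Introduction slides in its simplest form.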

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: duality between LP (time domain) and FDLP (frequency domain)]
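
The duality can be illustrated numerically: applying linear prediction to the DCT of a signal, rather than to the signal itself, gives an all-pole model whose response along the unit circle follows the temporal envelope. The sketch below is a minimal stdlib-only illustration of that idea and is not the presenter's implementation. The helper names, the unnormalized DCT-II and the mapping of sample t to the angle pi (t + 0.5) / N are assumptions of this example.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def lp_coeffs(seq, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + m] for i in range(n - m)) for m in range(order + 1)]
    a, err = [], r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x at each sample index."""
    n = len(x)
    a, gain = lp_coeffs(dct2(x), order)   # LP runs along the frequency axis
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n   # angle on the unit circle assigned to sample t
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * theta * (k + 1)) for k in range(order))
        env.append(gain / abs(denom) ** 2)
    return env


# Hypothetical test signal: a single click at sample 20.
x = [0.0] * 64
x[20] = 1.0
env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=lambda t: env[t])
print(peak)
```

The click turns into a cosine along the DCT axis, and the two-pole model places a resonance at the matching angle, so the envelope peaks at or next to the click position. This mirrors the way time-domain LP places spectral peaks at sinusoid frequencies.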

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
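
As a small illustration of the first technique listed above: a fixed convolutive distortion appears as a constant offset on every cepstral frame, so subtracting the mean over the frames removes it. The sketch below assumes features arrive as a list of equal-length frames; the function name and the toy numbers are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy cepstral frames, and the same frames with a constant channel offset added.
clean = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
offset = [0.7, -0.3]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the clean and distorted versions agree up to rounding, which is the property CMS relies on when the distortion is the same across the frames being averaged.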

Feature Comparison

Modulation Feature Extraction

[Figure: modulation feature extraction block diagram. Block labels: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT (twice), 200 ms, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log comp, (e) Dyn comp]

Noise Compensation in FDLP

[Figure: FDLP block diagram. Block labels: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Figure: noise compensation block diagram. Block labels: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
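
The two steps above can be sketched as follows. This is a deliberately simplified stand-in: it estimates the noise envelope as one average level over the samples the VAD marks as non-speech and subtracts it with a floor, whereas the slide's block diagram names Wiener filtering. The function name, the `floor` parameter and its default value are assumptions of this example.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate from a noisy sub-band envelope.

    noisy_env : envelope samples (non-negative floats)
    is_speech : VAD decisions, one bool per envelope sample
    floor     : fraction of the noisy envelope kept as a lower bound (assumed value)
    """
    noise = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise:
        return list(noisy_env)          # no non-speech samples: nothing to estimate
    noise_level = sum(noise) / len(noise)
    # Flooring keeps the compensated envelope strictly positive.
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Toy envelope: noise level 1.0 everywhere, speech present in the middle.
noisy = [1.0, 1.0, 5.0, 6.0, 1.0]
vad = [False, False, True, True, False]
compensated = subtract_noise_envelope(noisy, vad)
```

The floor matters because later stages take a logarithm or fit an AR model to the envelope, and both need positive values.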

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plane; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 73: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Modulation Feature Extraction

[Figure: modulation feature extraction block diagram. Block labels: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT (twice), 200 ms, Sub-band Feat]

Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components

14 static and 14 dynamic modulation spectral components (0-35 Hz) in each of 17 sub-bands give a feature of 476 dim (14 x 2 x 17 = 476)
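
The feature dimension and the modulation-frequency range can be checked with a short sketch. It assumes that the static and dynamic streams each contribute the first 14 DCT-II coefficients of a 200 ms envelope segment, and that DCT bin k of a window of length T corresponds to k / (2T) Hz. The envelope length of 80 samples per window and all function names are hypothetical.

```python
import math


def dct2(x, n_coeffs):
    """First n_coeffs unnormalized DCT-II coefficients of x."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n_coeffs)]


def modulation_features(static_envs, dynamic_envs, n_coeffs=14):
    """Concatenate DCT coefficients of the static and dynamic envelopes per sub-band."""
    feats = []
    for stat, dyn in zip(static_envs, dynamic_envs):
        feats.extend(dct2(stat, n_coeffs))
        feats.extend(dct2(dyn, n_coeffs))
    return feats


WINDOW_S = 0.2                                          # 200 ms analysis window
mod_freqs = [k / (2.0 * WINDOW_S) for k in range(14)]   # DCT bin k -> k / (2T) Hz

# Placeholder envelopes: 17 sub-bands, 80 envelope samples per window (assumed).
static_envs = [[1.0] * 80 for _ in range(17)]
dynamic_envs = [[0.5] * 80 for _ in range(17)]
feats = modulation_features(static_envs, dynamic_envs)
print(len(feats), mod_freqs[-1])
```

Under these assumptions the 14 bins are spaced 2.5 Hz apart and reach 32.5 Hz, which agrees with the 0-35 Hz range quoted on the slide.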

Phoneme Recognition

TIMIT Database (8 kHz)

Clean training data; test data can be clean, additive noise, reverberated or telephone channel

Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors

Hidden Markov model (HMM) - MLP hybrid model

Performance in phoneme error rate (PER)

Features

Perceptual linear prediction (PLP) - 9 frame context

Advanced ETSI standard [ETSI 2002] - 9 frame context

FDLP modulation (FDLP-M) features - 476 dim

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum," JASA, 2010

[Figure: bar chart of PER (axis ticks 30, 45, 60, 75) for PLP-9, ETSI-9 and FDLP-M under Clean, Add Noise, Reverb and Tel conditions]

Page 74: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Phoneme Recognition

TIMIT Database (8 kHz) Clean training data test data can be clean additive noise

reverberated or telephone channel Multi-layer perceptron (MLP) based system

MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)

Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim

SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010

Clean Add Noise

Reverb Tel 30

45

60

75

PLP-9ETSI-9FDLP-M

PE

R

Phoneme Recognition

Outline of Applications

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

Sub-bandDecomposition FDLP

GainNorm

Quant

Short-term Features for Speaker amp

Speech Recog

Modulation Features for Phoneme Recog

Wide-band Speech amp Audio Coding

Input

Signal

FM

AM

Audio Coding

QMFAnalysis

FDLPInput

1

32

MDCT

Env

Carr

Q

Q

Q-1

Q-1 IMDCT

MulQMF

Synthesis

1

32

Output

SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame
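As an illustration of the first technique only, the sketch below shows cepstral mean subtraction on a list of feature frames. It assumes the distortion adds the same offset to every frame in the cepstral (log-spectral) domain, which is the assumption stated above; the function name `cepstral_mean_subtraction` is illustrative, and LTLSS and gain normalization are not covered.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over all frames."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]
```

Because a fixed channel shifts every frame by the same vector, the normalized features of the clean and distorted versions coincide.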

Feature Comparison

Modulation Feature Extraction

[Block diagram labels: Input, DCT, critical-band window, FDLP, static and dynamic compression, DCT (200 ms), sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram labels: Signal + Noise, DCT, critical-band window, IDFT, | |^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram labels: Input, DCT, critical-band window, IDFT, | |^2, Wiener filtering (with VAD), DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope
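The subtraction step can be sketched as below. This is a simplified illustration, assuming the noise envelope is approximated by a single average level taken over the samples the VAD marks as non-speech, and assuming a small relative floor to keep the result positive. The function name `subtract_noise_envelope` and the `floor` parameter are hypothetical; the Wiener filtering shown in the diagram is not reproduced.

```python
def subtract_noise_envelope(envelope, is_speech, floor=1e-3):
    """Subtract an average noise level, estimated from non-speech samples."""
    noise = [e for e, speech in zip(envelope, is_speech) if not speech]
    if not noise:
        raise ValueError("VAD marked no non-speech samples")
    noise_level = sum(noise) / len(noise)
    # Floor at a fraction of the noisy value so the envelope stays positive.
    return [max(e - noise_level, floor * e) for e in envelope]
```

A positive floor matters here because later stages typically take a logarithm or fit an AR model to the compensated envelope.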

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

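Phoneme Recognition

[Figure: phoneme error rate (PER, axis ticks 30 to 75) for PLP-9, ETSI-9 and FDLP-M features under clean, additive noise, reverberant and telephone conditions]

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Compensation for Robust Phoneme Recognition Using Modulation Spectrum", JASA, 2010.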

Outline of Applications

[Block diagram labels: Input signal, sub-band decomposition, FDLP, AM and FM components, gain normalization, quantization]

Short-term Features for Speaker & Speech Recognition

Modulation Features for Phoneme Recognition

Wide-band Speech & Audio Coding

Audio Coding

[Block diagram labels: Input, QMF analysis (bands 1 to 32), FDLP, envelope (Env) and carrier (Carr), MDCT, quantization (Q), inverse quantization (Q-1), IMDCT, multiplication (Mul), QMF synthesis (bands 1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding", AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression", IEEE Transactions on Audio, Speech and Language Proc., 2010.

[Figure: subjective evaluation scores (axis ticks 0 to 100) for Hidden Ref, LPF7k, MP3, FDLP and AAC]

Overview

Introduction

AR Model of Hilbert Envelopes

FDLP and its Properties

Applications

Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to the conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP envelopes

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al. 1994]

Syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Audio Coding

[Figure: outline of applications. Block labels: Input Signal, Sub-band Decomposition, FDLP, AM, FM, Gain Norm, Quant. Application labels: Short-term Features for Speaker & Speech Recognition; Modulation Features for Phoneme Recognition; Wide-band Speech & Audio Coding]

Audio Coding

[Figure: codec block diagram. Labels: Input, QMF Analysis (1 to 32), FDLP, Env, Carr, MDCT, Q, Q-1, IMDCT, Mul, QMF Synthesis (1 to 32), Output]

Sriram Ganapathy, Petr Motlicek, and H. Hermansky, "Autoregressive Modeling of Hilbert Envelopes for Wide-band Audio Coding," AES 124th Convention, Audio Engineering Society, May 2008.

Subjective Evaluations

S. Ganapathy, P. Motlicek, and H. Hermansky, "AR Models of Amplitude Modulation in Audio Compression," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

[Figure: subjective evaluation scores on a 0 to 100 scale for Hidden Ref, LPF7k, MP3, FDLP, and AAC]

Overview

Introduction
AR Model of Hilbert Envelopes
FDLP and its Properties
Applications
Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies [Drullman et al 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past:
Speech intelligibility [Houtgast et al 1980]
RASTA processing [Hermansky et al 1994]
Speech recognition [Kingsbury et al 1998]
AM-FM decomposition [Kumaresan et al 1999]
Sound texture modeling [Athineos et al 2003]
Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: samples n, n-1, n-2, n-3 with prediction coefficients a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
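A minimal sketch of the minimization above, assuming the autocorrelation method with the Levinson-Durbin recursion (the slide does not say which solver is used). The function names, the sign convention for the coefficients, and the toy signal are illustrative assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[0..order] with a[0] = 1.

    Assumed convention: residual e[n] = x[n] + sum_k a[k] * x[n-k].
    Returns (a, err), where err is the minimum residual sum of squares.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


# Toy signal: a decaying exponential x[n] = 0.9**n, which an
# order-1 predictor x[n] ~ 0.9 * x[n-1] fits almost exactly.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 1), 1)
predictor = [-c for c in a[1:]]   # weights applied to past samples
```

Here `predictor` holds the weights on past samples, and `err` is the residual energy left after prediction (for this toy signal, essentially the energy of the first sample, which has no past to predict it from).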

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
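The chain of steps on these slides (filter interpretation, Parseval's theorem, all-pole model with gain G) can be sketched as follows. This is the standard argument in the style of Makhoul (1975); the symbols x[n], a_k, p, A, P and the sign convention are assumptions made here, not notation recovered from the slides.

```latex
% Assumed notation: x[n] is the signal, A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k},
% and P(\omega) = |X(e^{j\omega})|^2 is the signal power spectrum.
\begin{align}
e[n] &= x[n] + \sum_{k=1}^{p} a_k\, x[n-k]
  && \text{(residual as output of the filter } A(z)\text{)} \\
E &= \sum_{n} e^2[n]
   = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,\bigl|A(e^{j\omega})\bigr|^2\, d\omega
  && \text{(Parseval's theorem)} \\
\hat{P}(\omega) &= \frac{G}{\bigl|A(e^{j\omega})\bigr|^2}
  && \text{(all-pole model with gain } G\text{)} \\
E &= \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
  && \text{(quantity minimized over the } a_k\text{)}
\end{align}
```

The last line follows by substituting the third into the second, which is why minimizing the time-domain residual also fits the all-pole model to the power spectrum.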

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
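A minimal sketch of this definition for a finite discrete signal, assuming the usual DFT construction of the analytic signal (zero the negative frequencies, double the positive ones). A naive DFT is used so that the example needs only the standard library; the function names and the test signal are illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustration."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of a real sequence."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        # DC (and Nyquist for even n) are kept unchanged.
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue
        # Double positive frequencies, zero negative frequencies.
        spec[k] = 2 * spec[k] if k < (n + 1) // 2 else 0
    analytic = dft(spec, inverse=True)
    return [abs(v) ** 2 for v in analytic]


# Amplitude-modulated tone: slow modulator times a carrier at DFT bin 16.
N = 64
mod = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
x = [mod[t] * math.cos(2 * math.pi * 16 * t / N) for t in range(N)]
env = hilbert_envelope(x)
max_err = max(abs(env[t] - mod[t] ** 2) for t in range(N))
```

For this band-limited example the envelope equals the squared modulator up to rounding error, which is the sense in which the Hilbert envelope recovers the amplitude modulation.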

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency
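A minimal sketch of the duality, under stated assumptions: linear prediction applied to time samples models the power spectrum, while the same prediction applied to the DCT of the signal (FDLP) gives an all-pole model of the temporal envelope. The DCT scaling, the mapping from time index to angle, and all function names are choices made for this example, not details taken from the slides.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for t in range(n)) for k in range(n)]


def lp_coefficients(c, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, gain)."""
    n = len(c)
    r = [sum(c[i] * c[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        k_m = -(r[m] + sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


def fdlp_envelope(x, order):
    """All-pole temporal envelope: LP applied to the DCT of x."""
    n = len(x)
    a, gain = lp_coefficients(dct2(x), order)
    env = []
    for t in range(n):
        theta = math.pi * (2 * t + 1) / (2 * n)   # time index mapped to angle
        re = sum(a[k] * math.cos(k * theta) for k in range(order + 1))
        im = sum(a[k] * math.sin(k * theta) for k in range(order + 1))
        env.append(gain / (re * re + im * im))    # gain / |A|^2
    return env


# A single click at sample 20 of a 64-sample frame.
x = [0.0] * 64
x[20] = 1.0
env = fdlp_envelope(x, order=2)
peak = max(range(len(env)), key=env.__getitem__)
```

A click in time becomes a cosine across DCT coefficients, so a low-order predictor places a pole pair at the matching angle and the envelope peaks at the click position. That is the mirror image of conventional LP placing spectral peaks at a sinusoid's frequency.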

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from PLP and FDLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
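A minimal sketch of the CMS idea above, assuming the channel contributes roughly the same additive offset to every frame in the cepstral domain. The feature values and the offset are made-up numbers, and the function name is illustrative.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy cepstra: 4 frames x 3 coefficients (made-up numbers).
clean = [[1.0, 0.5, -0.2], [0.8, 0.1, 0.0], [1.2, -0.3, 0.4], [0.6, 0.7, -0.1]]

# Assumption: a fixed channel that is short relative to the frame adds
# (approximately) the same offset to every frame in the cepstral domain.
channel_offset = [0.3, -0.1, 0.25]
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_dist = cepstral_mean_subtraction(distorted)
max_diff = max(abs(a - b) for fa, fb in zip(norm_clean, norm_dist)
               for a, b in zip(fa, fb))
```

After mean subtraction the clean and distorted features coincide up to rounding. The constant-offset assumption fails when the room response is longer than the analysis frame, which is the case the long-term methods on this slide address.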

Feature Comparison

Modulation Feature Extraction

[Figure: block diagram. Labels: Input, DCT, Critical-band Window, FDLP, 200ms, Static, Dynamic, DCT, DCT, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Noise Compensation in FDLP

[Figure: block diagram. Labels: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Figure: block diagram. Labels: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
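A minimal sketch of the envelope subtraction described above, under simplifying assumptions: the noise envelope is taken as a single average level over the non-speech samples, and plain subtraction with a floor stands in for the Wiener filtering block of the diagram. The function name, the `floor` parameter, and the toy data are hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract an average noise level estimated from non-speech samples.

    noisy_env : temporal envelope values of the noisy signal in one sub-band
    is_speech : VAD decisions (True = speech), aligned with noisy_env
    floor     : fraction of the noisy envelope kept as a lower bound
                (hypothetical parameter; avoids negative envelope values)
    """
    noise_vals = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise_vals:
        return list(noisy_env)          # no non-speech region to estimate from
    noise_level = sum(noise_vals) / len(noise_vals)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Toy envelope: constant noise level 2.0 plus a speech burst in the middle.
speech_env = [0.0, 0.0, 5.0, 9.0, 4.0, 0.0, 0.0]
noisy_env = [s + 2.0 for s in speech_env]
vad = [s > 0 for s in speech_env]       # oracle VAD, for illustration only
cleaned = subtract_noise_envelope(noisy_env, vad)
```

In this toy case the speech burst is recovered exactly and the non-speech samples fall to the floor. With a real VAD and non-stationary noise the estimate would only be approximate.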

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plot. Axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 79: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Subjective Evaluations

SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010

Series10

20

40

60

80

100

Hidden RefLPF7kMP3FDLPAAC

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Overview

Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Our Contributions

Short-term feature extraction using FDLP ndashImprovements in reverb speech recog

Modulation feature extraction ndash Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella

Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen

Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss

IBM personnel ndash Jason Pelecanos Mohamed Omar

Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri

Thank You

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames

to be similar ndash suppresses short-term artifacts

Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short and long term distortions within a single long-term frame

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, critical-band window, FDLP, static and dynamic compression, DCT (200 ms), sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]
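As a rough sketch of the static branch of this pipeline, the example below log-compresses one segment of a sub-band envelope and keeps the first few DCT coefficients as modulation features. The slides give the segment length as 200 ms; the envelope sampling rate, the number of coefficients kept, the small `eps` guard and the function names are assumptions, and the dynamic compression branch is not shown.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                for t in range(n))
            for k in range(n)]


def modulation_features(envelope_segment, num_coeffs, eps=1e-12):
    """Log-compress an envelope segment and keep its lowest DCT coefficients.

    eps guards against log(0); it is an assumption, not part of the slides.
    """
    log_envelope = [math.log(value + eps) for value in envelope_segment]
    return dct2(log_envelope)[:num_coeffs]


# Usage: a flat envelope has energy only in the zeroth modulation coefficient.
flat_segment = [math.e] * 16
features = modulation_features(flat_segment, 4)
```

Low-order coefficients capture the slow modulations of the segment, so truncating the DCT acts as a smoothing step on the envelope trajectory.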

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, critical-band window, IDFT, |·|², DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input, DCT, critical-band window, IDFT, |·|², DFT, Wiener filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
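The two steps above can be sketched as follows: the noise level is estimated from the envelope samples that the VAD marks as non-speech and is then subtracted from the whole envelope. The per-sample boolean VAD decisions, the single averaged noise level, the `floor` parameter and the function name are all assumptions for illustration; the Wiener filtering stage named in the block diagram is not reproduced here.

```python
def subtract_noise_envelope(envelope, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    envelope:  temporal envelope samples of the noisy sub-band signal
    is_speech: VAD decisions, True where speech is present
    floor:     fraction of the noisy envelope kept as a lower bound so the
               result stays positive (hypothetical parameter)
    """
    noise_samples = [e for e, speech in zip(envelope, is_speech) if not speech]
    if not noise_samples:
        # No non-speech region available, so nothing can be estimated.
        return list(envelope)
    noise_level = sum(noise_samples) / len(noise_samples)
    return [max(e - noise_level, floor * e) for e in envelope]


# Usage: the speech region keeps its shape while the noise floor is removed.
noisy = [1.0, 1.0, 5.0, 9.0, 1.0]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
```

The flooring step keeps the envelope strictly positive, which matters if a logarithm or an AR fit is applied afterwards.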

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plot, axes Time and Frequency]

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)

Overview

  • Introduction
  • AR Model of Hilbert Envelopes
  • FDLP and its Properties
  • Applications
  • Summary

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as a unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past

  • Speech intelligibility [Houtgast et al 1980]
  • RASTA processing [Hermansky et al 1994]
  • Speech recognition [Kingsbury et al 1998]
  • AM-FM decomposition [Kumaresan et al 1999]
  • Sound texture modeling [Athineos et al 2003]
  • Sound source separation [King et al 2010]

Page 82: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Summary

Employing AR modeling for estimating amplitude modulations

Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum

Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

Speech intelligibility [Houtgast et al. 1980]

RASTA processing [Hermansky et al. 1994]

Speech recognition [Kingsbury et al. 1998]

AM-FM decomposition [Kumaresan et al. 1999]

Sound texture modeling [Athineos et al. 2003]

Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
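
The time-domain case can be sketched in a few lines. This is a minimal illustration, assuming the autocorrelation method solved with the Levinson-Durbin recursion; the slides do not say which solver is used, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence (unnormalized)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def linear_prediction(x, order):
    """Autocorrelation-method LP solved with the Levinson-Durbin recursion.

    Returns (a, gain): a[k-1] is the weight on x[n-k] in the prediction
    x_hat[n] = sum_k a[k-1] * x[n-k], and gain is the minimum residual
    sum of squares.
    """
    r = autocorrelation(x, order)
    poly = [1.0] + [0.0] * order      # prediction-error filter 1, c1, ..., cp
    gain = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(poly[j] * r[i - j] for j in range(1, i))
        k = -acc / gain               # reflection coefficient
        updated = poly[:]
        for j in range(1, i):
            updated[j] = poly[j] + k * poly[i - j]
        updated[i] = k
        poly = updated
        gain *= 1.0 - k * k
    # Flip signs so the result matches the "linear combination of past samples" form.
    return [-c for c in poly[1:]], gain


# Example: a decaying exponential is predictable from a single past sample.
signal = [0.9 ** n for n in range(200)]
coeffs, gain = linear_prediction(signal, order=2)
```

For this example the first weight comes out close to 0.9 and the second close to zero, since each sample is 0.9 times the previous one.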

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
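
The equations on these slides did not survive in the transcript. The block below restates the standard relations from [Makhoul 1975] as a sketch; the symbols (order p, coefficients a_k, residual e[n], error filter A) are assumed notation and may differ from the original slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \\
A(\omega) &= 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}, \qquad E(\omega) = A(\omega)\, X(\omega), \\
\sum_{n} e^{2}[n] &= \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(\omega)|^{2}\, |A(\omega)|^{2}\, d\omega, \\
\hat{P}_x(\omega) &= \frac{G}{\left| 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k} \right|^{2}} .
\end{align}
```

The third line is the Parseval step: minimizing the residual energy in time is the same as minimizing the integral of the power spectrum weighted by |A|^2, which is why the solution gives an all-pole fit to the power spectrum.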

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and j times its quadrature component: x_a(t) = x(t) + j H[x(t)], where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal
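
A minimal sketch of this definition for a finite discrete signal, assuming the usual DFT construction of the analytic signal (zero the negative-frequency bins, double the positive ones). The naive O(N^2) DFT and the function names are illustrative choices, not taken from the slides.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """x[n] + j*H{x}[n], built by zeroing the negative-frequency DFT bins."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0                         # DC is kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0                # Nyquist bin is kept once
        for k in range(1, n // 2):
            weights[k] = 2.0                 # positive frequencies are doubled
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return dft([s * w for s, w in zip(dft(x), weights)], inverse=True)


def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return [abs(z) ** 2 for z in analytic_signal(x)]


# Example: a slow modulation m[n] on a faster cosine carrier.
N = 64
modulation = [1.0 + 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
signal = [modulation[t] * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
envelope = hilbert_envelope(signal)
```

In this example the envelope recovers the squared modulation, because the modulation is much slower than the carrier and every component is periodic in the analysis window.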

Hilbert Envelope

[Figure: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency
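
The duality can be sketched as follows: linear prediction is applied to the DCT of the signal instead of the signal itself, and the resulting all-pole function is read along the time axis as an envelope. This is a rough illustration under assumptions: an orthonormal DCT-II, the autocorrelation method, and a time grid where angle pi*(t+0.5)/N corresponds to sample t. None of these details are confirmed by the transcript, and the function names are hypothetical.

```python
import cmath
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence (naive O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n)) for t in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def lp_polynomial(seq, order):
    """Levinson-Durbin on the autocorrelation of seq.

    Returns (poly, gain): poly = [1, c1, ..., cp] is the prediction-error
    filter and gain is the minimum residual energy.
    """
    n = len(seq)
    r = [sum(seq[i] * seq[i - k] for i in range(k, n)) for k in range(order + 1)]
    poly, gain = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(poly[j] * r[i - j] for j in range(1, i))
        k = -acc / gain
        poly = ([1.0] + [poly[j] + k * poly[i - j] for j in range(1, i)]
                + [k] + [0.0] * (order - i))
        gain *= 1.0 - k * k
    return poly, gain


def fdlp_envelope(x, order):
    """All-pole approximation of the temporal envelope of x."""
    n = len(x)
    poly, gain = lp_polynomial(dct2(x), order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n      # 0..pi spans the analysis window
        a = sum(c * cmath.exp(-1j * theta * k) for k, c in enumerate(poly))
        env.append(gain / abs(a) ** 2)
    return env


# Example: energy concentrated at one instant gives a peak at that instant.
x = [0.0] * 64
x[20] = 1.0
env = fdlp_envelope(x, order=2)
peak_index = env.index(max(env))
```

An impulse in time becomes a sinusoid across the DCT coefficients, so the fitted poles sit at the angle that maps back to the impulse position, mirroring how time-domain LP places poles at spectral peaks.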

AM-FM Decomposition

[Figure: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
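
As a small illustration of the first technique only, the sketch below applies cepstral mean subtraction to a list of feature frames. It assumes a fixed channel shows up as a constant additive offset in the cepstral domain; the function name and the toy numbers are hypothetical, and LTLSS and gain normalization are not covered here.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over all frames."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[c] for f in frames) / n_frames for c in range(n_coeffs)]
    return [[f[c] - means[c] for c in range(n_coeffs)] for f in frames]


# Toy cepstra for three frames, and the same frames after a fixed channel
# has added a constant offset to every frame.
clean = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]]
channel_offset = [0.7, -0.3]
distorted = [[v + o for v, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

Both inputs normalize to the same values, which is the sense in which CMS suppresses a distortion that stays similar across neighboring frames.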

Feature Comparison

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT, DCT, 200 ms, Sub-band Feat]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

[Figure: (a) Signal, (b) Hilb. Env., (c) FDLP Env., (d) Log comp., (e) Dyn. comp.]

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, Critical-band Window, IDFT, | |^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope
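
The two steps above can be sketched as follows. This is a simplified illustration, not the system on the slide: it assumes the noise envelope is a single constant estimated as the mean over non-speech samples, and it uses plain subtraction with a floor rather than the Wiener filtering named in the block diagram. The function name and the `floor` parameter are hypothetical.

```python
def subtract_noise_envelope(noisy_env, is_speech, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env: temporal envelope samples of one sub-band (non-negative).
    is_speech: VAD decisions, one boolean per envelope sample.
    floor:     fraction of the noisy envelope kept as a lower bound so the
               result never becomes zero or negative.
    """
    noise_samples = [e for e, s in zip(noisy_env, is_speech) if not s]
    if not noise_samples:
        return list(noisy_env)            # no non-speech region to estimate from
    noise_level = sum(noise_samples) / len(noise_samples)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Example: constant noise floor of 1.0 with two speech samples on top of it.
noisy_env = [1.0, 1.0, 5.0, 9.0, 1.0]
is_speech = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy_env, is_speech)
```

The flooring is a design choice made here to keep the envelope positive for later log compression or AR modeling.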

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency plot; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
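
For contrast with the long-term analysis in the rest of the talk, the conventional front end can be sketched as below. The 25 ms frame, 10 ms hop, Hamming window and 8 kHz sampling rate are assumed values chosen for illustration (the slide only gives the 10-40 ms range), and the naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def short_term_power_spectra(x, frame_len, hop):
    """Hamming-windowed power spectra of overlapping frames (bins 0..frame_len//2)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff) ** 2)
        spectra.append(spectrum)
    return spectra


# Example: 100 ms of a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
frame_len = int(0.025 * fs)   # 25 ms -> 200 samples
hop = int(0.010 * fs)         # 10 ms -> 80 samples
spectra = short_term_power_spectra(tone, frame_len, hop)
peak_bins = [s.index(max(s)) for s in spectra]
```

Each row of `spectra` is one sample of the spectrum at the preset frame rate; a sequence model such as an HMM would then operate on these rows.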

Page 85: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Our Contributions

Simple mathematical analysis for AR model of Hilbert envelopes

Investigating the resolution properties of FDLP

Gain normalization of FDLP Envelopes

Our Contributions

Short-term feature extraction using FDLP - improvements in reverberant speech recognition

Modulation feature extraction - phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - spectro-temporal receptive fields [Shamma et al 2001]

Psycho-physical evidence - perceptual importance of modulation frequencies [Drullman et al 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra have been used in the past for:

Speech intelligibility [Houtgast et al 1980]

RASTA processing [Hermansky et al 1994]

Speech recognition [Kingsbury et al 1998]

AM-FM decomposition [Kumaresan et al 1999]

Sound texture modeling [Athineos et al 2003]

Sound source separation [King et al 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by coefficients a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
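A minimal sketch of how the predictor coefficients could be obtained, assuming the autocorrelation method with the Levinson-Durbin recursion (the slides do not say which solver is used). The function names and the sign convention, where x[n] is predicted as the sum of a[k] * x[n - k], are illustrative choices rather than anything taken from the deck.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (zero outside its support)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[1..order].

    Assumed convention: x[n] is predicted as sum_k a[k] * x[n - k].
    Returns (a, err), where err is the minimum residual sum of squares.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:], err


# A decaying exponential is predictable from a single past sample,
# so a second-order model should find a1 close to 0.9 and a2 close to 0.
x = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(x, 2), 2)
```

The returned `residual` is the quantity the slide calls the residual sum of squares; for this toy input it is the energy of the single unpredictable first sample.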

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing
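The equations for these two slides are not reproduced in this transcript. A sketch of the standard argument from [Makhoul 1975] follows; the notation (x[n], a_k, p, A, P_x, and the calligraphic E for the residual energy) is assumed here and may differ from the slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k]
  && \text{(prediction residual)} \\
\mathcal{E} &= \sum_{n} e^2[n]
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl\lvert E(e^{j\omega}) \bigr\rvert^2 \, d\omega
  && \text{(Parseval's theorem)} \\
E(e^{j\omega}) &= X(e^{j\omega})\, A(e^{j\omega}),
  \qquad A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}
  && \text{(filter interpretation)} \\
P_x(\omega) &= \bigl\lvert X(e^{j\omega}) \bigr\rvert^2
  \;\Longrightarrow\;
  \mathcal{E} = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_x(\omega)\, \bigl\lvert A(e^{j\omega}) \bigr\rvert^2 \, d\omega
\end{align}
```

Minimizing the last expression over the coefficients a_k is the same problem as minimizing the time-domain residual sum of squares, which is why the solution can be read as a fit to the power spectrum.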

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
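As a small illustration of the all-pole form G / |A(w)|^2, the sketch below evaluates the model on a frequency grid. The function name, the grid from 0 to pi, and the sign convention A(w) = 1 - sum_k a_k exp(-jwk) are assumptions made for this example.

```python
import cmath
import math


def all_pole_spectrum(a, gain, n_points):
    """Evaluate gain / |A(w)|^2 at n_points frequencies spread over [0, pi].

    Assumed convention: A(w) = 1 - sum_k a[k-1] * exp(-j * w * k).
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(gain / abs(A) ** 2)
    return spectrum


# One pole at radius 0.9 on the positive real axis: a low-pass shaped spectrum.
spec = all_pole_spectrum([0.9], 1.0, 5)
```

With a single coefficient of 0.9 the model peaks at w = 0 and falls monotonically towards w = pi.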

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,H[x(t)]$

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
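A minimal discrete-time sketch of this definition, assuming the usual DFT construction of the analytic signal (negative frequencies zeroed, positive frequencies doubled). It uses a plain O(N^2) DFT from the standard library to stay self-contained; the helper names and the test signal are illustrative and not from the slides.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (the inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Squared magnitude of the discrete analytic signal of a real, even-length sequence."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC and Nyquist, double positive frequencies, zero negative frequencies.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    analytic = dft([w * s for w, s in zip(weights, spectrum)], inverse=True)
    return [abs(v) ** 2 for v in analytic]


# Slow modulation m[i] on a faster cosine carrier, as in the AM model x = m * cos(...).
n = 64
m = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * i / n) for i in range(n)]
x = [m[i] * math.cos(2 * math.pi * 8 * i / n) for i in range(n)]
envelope = hilbert_envelope(x)
```

Because the modulation here is band-limited well below the carrier, the envelope recovers the square of m[i] and discards the carrier.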

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: panels labelled LP and FDLP]

LP in Time and Frequency
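One way to make the duality concrete is to apply linear prediction to the DCT of a signal, so that the resulting all-pole model describes the temporal envelope instead of the power spectrum. The sketch below assumes an unnormalized DCT-II, the autocorrelation method, and an evaluation grid matched to the DCT sample positions; these are illustrative choices and not necessarily the configuration used in the work presented here.

```python
import cmath
import math


def dct(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def lp(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a[1..order], gain)."""
    n = len(seq)
    r = [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(order + 1)]
    a, err = [0.0] * (order + 1), r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:], err


def fdlp_envelope(x, order):
    """All-pole model fitted to the DCT of x, evaluated at one point per time sample."""
    n = len(x)
    a, gain = lp(dct(x), order)
    env = []
    for i in range(n):
        theta = math.pi * (2 * i + 1) / (2 * n)  # angle corresponding to time index i
        A = 1.0 - sum(ak * cmath.exp(-1j * theta * (k + 1)) for k, ak in enumerate(a))
        env.append(gain / abs(A) ** 2)
    return env


# A single burst in time becomes a cosine across DCT coefficients,
# which a low-order AR model captures as a peak near the burst location.
n, burst = 64, 20
signal = [1.0 if i == burst else 0.0 for i in range(n)]
envelope = fdlp_envelope(signal, order=2)
peak = max(range(n), key=lambda i: envelope[i])
```

The roles of time and frequency are swapped relative to conventional LP: poles of the model mark energetic instants in time instead of spectral resonances.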

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: FDLP and PLP spectrograms]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
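To illustrate the first of these techniques: a fixed channel that is short relative to the analysis window shows up as a constant offset on every cepstral frame, so subtracting the per-coefficient mean removes it. The sketch below shows only CMS, with made-up feature values and a hypothetical `channel` offset; it does not implement LTLSS or gain normalization.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from each feature vector."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy cepstral frames (values are illustrative only).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel = [0.5, -1.5]  # hypothetical constant channel offset in the cepstral domain
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the clean and distorted features coincide, which is the sense in which CMS suppresses a stationary convolutive distortion.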

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input, DCT, critical-band window, FDLP, static and dynamic compression, DCT over 200 ms, sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise, DCT, critical-band window, IDFT, | |^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: Input, DCT, critical-band window, IDFT, | |^2, DFT, Wiener filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimated noise envelope from the noisy speech envelope
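A rough sketch of the subtraction step, assuming a stationary noise envelope estimated as the mean over VAD-marked non-speech samples and a simple floor to keep the result positive. This is a plain subtraction rule, not the Wiener filter shown in the block diagram, and the function name, the `floor` parameter and the sample values are all hypothetical.

```python
def subtract_noise_envelope(noisy_env, speech_flags, floor=0.01):
    """Subtract a constant noise-envelope estimate from a noisy temporal envelope.

    noisy_env:    envelope samples of the noisy sub-band signal
    speech_flags: VAD decisions, True where speech is present
    floor:        fraction of the noisy envelope kept as a lower bound (assumed)
    """
    noise_samples = [e for e, is_speech in zip(noisy_env, speech_flags) if not is_speech]
    noise_level = sum(noise_samples) / len(noise_samples)
    return [max(e - noise_level, floor * e) for e in noisy_env]


# Toy envelope: noise level 0.2 everywhere, speech adds energy in the middle.
noisy = [0.2, 0.2, 1.2, 2.2, 0.2]
vad = [False, False, True, True, False]
cleaned = subtract_noise_envelope(noisy, vad)
```

The floor avoids zero or negative envelope values in non-speech regions, which would be a problem for any later log compression.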

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction - Time Domain
  • Linear Prediction - Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 88: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Our Contributions

Short-term feature extraction using FDLP - Improvements in reverberant speech recognition

Modulation feature extraction - Phoneme recognition in noisy speech

Speech and audio codec development using AM-FM signals from FDLP

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past:

Speech intelligibility [Houtgast et al. 1980]
RASTA processing [Hermansky et al. 1994]
Speech recognition [Kingsbury et al. 1998]
AM-FM decomposition [Kumaresan et al. 1999]
Sound texture modeling [Athineos et al. 2003]
Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: sample n predicted from samples n-1, n-2, n-3 with weights a1, a2, a3]

Model parameters are solved by minimizing the residual sum of squares
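
A minimal sketch of this least-squares fit, assuming the autocorrelation method and the sign convention x_hat[n] = a1 x[n-1] + ... + ap x[n-p]. The function name `lp_coefficients` and the synthetic AR(2) test signal are illustrative choices, not taken from the slides.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, residual) where x_hat[n] = sum_k a[k-1] * x[n-k]
    and residual is the minimum residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags r[0..order] of the finite-length sequence
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Normal equations: Toeplitz(r[0..order-1]) @ a = r[1..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    residual = r[0] - np.dot(a, r[1:])
    return a, residual

# Synthetic AR(2) signal: x[n] = 1.5 x[n-1] - 0.7 x[n-2] + w[n]
rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for n in range(2, len(w)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + w[n]

a, residual = lp_coefficients(x, order=2)
print("estimated weights:", a)
```

On a long enough signal the estimated weights should land close to the generating values 1.5 and -0.7. A Levinson-Durbin recursion would solve the same Toeplitz system faster; a direct solve keeps the sketch short.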

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing
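
The equations on these slides did not survive extraction. The block below is a sketch of the standard spectral-domain argument from Makhoul (1975), written under assumed notation: x[n] is the signal, a_k are the predictor weights, e[n] is the prediction residual, P(ω) = |X(ω)|² is the signal power spectrum and P̂(ω) is the all-pole model with gain G. It may differ in notation from the original slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k], \qquad
A(\omega) = 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \\
\sum_{n} e[n]^2 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} |E(\omega)|^2 \, d\omega
 = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\,|A(\omega)|^2 \, d\omega \\
\hat{P}(\omega) &= \frac{G}{|A(\omega)|^2}
\quad\Longrightarrow\quad
\sum_{n} e[n]^2 = \frac{G}{2\pi}\int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega
\end{align}
```

Under this reading, minimizing the residual sum of squares in time is the same as minimizing the integrated ratio of the signal power spectrum to its all-pole model.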

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
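
As an illustration, the all-pole model G / |A(ω)|² can be evaluated directly from the predictor weights. This is a sketch under the convention x_hat[n] = a1 x[n-1] + ... + ap x[n-p]; the function name `all_pole_spectrum` and the example weights are assumptions made for the demonstration.

```python
import numpy as np

def all_pole_spectrum(a, gain, n_freq=256):
    """Evaluate gain / |A(e^{jw})|^2 on n_freq points in [0, pi).

    a holds the predictor weights a1..ap, so A(w) = 1 - sum_k a_k e^{-jwk}.
    """
    a = np.asarray(a, dtype=float)
    w = np.pi * np.arange(n_freq) / n_freq
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return w, gain / np.abs(A) ** 2

# Example: a resonant second-order model with unit gain
w, spec = all_pole_spectrum([1.5, -0.7], gain=1.0)
print("spectral peak at", w[np.argmax(spec)], "rad/sample")
```

The peak of the model sits near the angle of the pole pair, which is how a low-order AR model captures the dominant resonances of the power spectrum.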

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component: x_a(t) = x(t) + j H[x(t)]

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
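
A short sketch of this definition for a sampled signal, assuming the usual FFT construction of the discrete analytic signal (negative frequencies zeroed, positive frequencies doubled). The function names and the amplitude-modulated test tone are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal: zero negative frequencies, double positive ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # Nyquist bin is kept once
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Test tone: 4 Hz modulation on a 200 Hz carrier, 1 s at 2 kHz sampling
t = np.arange(2000) / 2000.0
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = m * np.cos(2 * np.pi * 200 * t)
env = hilbert_envelope(x)
print("max deviation from m(t)^2:", np.max(np.abs(env - m ** 2)))
```

Because the modulation is slow relative to the carrier and both fit an integer number of cycles in the window, the envelope recovers the squared modulation m(t)² up to numerical precision.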

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure panels: LP, FDLP]

LP in Time and Frequency
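
The duality between LP in time and LP in frequency can be sketched in code. The sketch assumes the common FDLP recipe: take a DCT of the segment, run linear prediction on the DCT coefficients, and read the resulting all-pole model as a function of time instead of frequency. The helper names, the model order of 20 and the synthetic noise burst are illustrative assumptions, not values from the slides.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (fine for short segments)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

def lp_coefficients(x, order):
    """Autocorrelation-method LP: weights a1..ap and minimum residual energy."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])

def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope from LP on DCT coefficients."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a, gain = lp_coefficients(dct_ii(x), order)
    # A evaluated at angles pi*k/n, k = 0..n-1, which map to time samples 0..n-1
    A = np.fft.fft(np.concatenate(([1.0], -a)), 2 * n)[:n]
    return gain / np.abs(A) ** 2

# Noise burst centered at sample 300 of a 512-sample segment
rng = np.random.default_rng(0)
n = np.arange(512)
burst = np.exp(-0.5 * ((n - 300) / 10.0) ** 2) * rng.standard_normal(512)
env = fdlp_envelope(burst, order=20)
print("envelope peak near sample", int(np.argmax(env)))
```

Where time-domain LP places poles at spectral peaks, the same machinery applied to DCT coefficients places poles at temporal peaks, so the model tracks the energy burst in time.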

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure panels: FDLP, PLP]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP 2010

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
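
A minimal sketch of the CMS step, assuming the channel is short relative to the analysis frame so that it appears as a constant additive offset on every frame in the cepstral (or log-spectral) domain. The function name and the random feature matrix are illustrative.

```python
import numpy as np

def cepstral_mean_subtraction(feats):
    """Subtract the per-coefficient mean over frames.

    feats: array of shape (n_frames, n_coeffs) holding cepstral features.
    """
    feats = np.asarray(feats, dtype=float)
    return feats - feats.mean(axis=0, keepdims=True)

# A fixed channel adds the same offset vector to every frame
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 13))
channel_offset = rng.standard_normal(13)
distorted = clean + channel_offset

same = np.allclose(cepstral_mean_subtraction(clean),
                   cepstral_mean_subtraction(distorted))
print("offset removed:", same)
```

The offset cancels only when the distortion is the same in every frame, which is the assumption that breaks down for long reverberant responses and motivates the longer-window alternatives listed above.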

Feature Comparison

Modulation Feature Extraction

[Block diagram labels: Input, DCT, Critical-band Window, FDLP, Static, Dynamic, DCT, DCT, 200 ms, Sub-band Feat]

Modulation Features

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters 2009

Noise Compensation in FDLP

[Block diagram labels: Signal + Noise, DCT, Critical-band Window, IDFT, | |^2, DFT, Linear Pred, FDLP Env]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram labels: Input, DCT, Critical-band Window, IDFT, | |^2, VAD, Wiener Filtering, DFT]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
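
The subtraction step can be sketched as follows, assuming stationary noise so that a single average level estimated from the non-speech samples stands in for the noise envelope. The function name, the `floor` parameter and the toy envelope are assumptions made for illustration; this is plain subtraction with a floor, not the Wiener filtering shown in the block diagram.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=1e-3):
    """Subtract a noise-envelope estimate taken from non-speech regions.

    noisy_env:   temporal envelope of the noisy sub-band signal
    speech_mask: boolean array from a VAD, True where speech is present
    floor:       fraction of the noisy envelope kept as a lower bound
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if not np.any(~speech_mask):
        return noisy_env.copy()      # no non-speech region to estimate from
    noise_level = noisy_env[~speech_mask].mean()
    # Flooring keeps the envelope positive for later all-pole modeling
    return np.maximum(noisy_env - noise_level, floor * noisy_env)

# Toy example: flat noise envelope of 1.0 plus a speech segment of height 5.0
speech_env = np.zeros(300)
speech_env[100:200] = 5.0
noisy_env = speech_env + 1.0
mask = speech_env > 0
cleaned = subtract_noise_envelope(noisy_env, mask)
print("speech region level after subtraction:", cleaned[150])
```

The additivity of the noise in the envelope domain is what justifies a simple subtraction here; with non-stationary noise the single-level estimate would need to be replaced by a time-varying one.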

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: time-frequency plot, axes Time and Frequency]

Page 91: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Acknowledgements

Lab Buddies - Samuel Thomas, Sivaram Garimella, Padmanbhan Rajan, Harish Mallidi, Vijay Peddinti, Thomas Janu, Aren Jansen

Idiap personnel - Petr Motlicek, Joel Pinto, Mathew Doss

IBM personnel - Jason Pelecanos, Mohamed Omar

Others - Xinhui Zhou, Daniel Romero, Marios Athineos, David Gelbart, Harinath Garudadri

Thank You

Evidences

Physiological evidence - Spectro-temporal receptive fields [Shamma et al. 2001]

Psycho-physical evidence - Perceptual importance of modulation frequencies [Drullman et al. 1994]; syllable recognition from temporal modulations with minimal spectral cues [Shannon et al. 1995]

Applications

Modulation spectra have been used in the past for:

Speech intelligibility [Houtgast et al. 1980]

RASTA processing [Hermansky et al. 1994]

Speech recognition [Kingsbury et al. 1998]

AM-FM decomposition [Kumaresan et al. 1999]

Sound texture modeling [Athineos et al. 2003]

Sound source separation [King et al. 2010]

Linear Prediction - Time Domain

Current sample expressed as a linear combination of past samples

[Figure: past samples n-1, n-2, n-3 weighted by coefficients a1, a2, a3 to predict sample n]

Model parameters are solved by minimizing the residual sum of squares
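The slide's equations did not survive extraction. As a minimal sketch of time-domain linear prediction, the snippet below uses the autocorrelation method and solves the normal equations directly. The function name `lp_coefficients`, the sign convention (x[n] predicted as the sum of a_k x[n-k]) and the synthetic AR(2) test signal are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, err): x[n] is predicted as sum_k a[k-1] * x[n-k], and err is
    the minimum residual sum of squares for the (implicitly windowed) signal.
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags r[0..order]
    r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(order + 1)])
    # Normal equations R a = r[1:], where R is a symmetric Toeplitz matrix
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])
    return a, err

# Synthetic AR(2) signal: x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a, err = lp_coefficients(x, order=2)
print(a)  # estimates of the generating coefficients (1.5, -0.8)
```

A production implementation would typically use the Levinson-Durbin recursion instead of a general linear solve; the direct solve is kept here only for brevity.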

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing
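The equations on these slides were lost in extraction. The following is a sketch of the standard argument in the spirit of [Makhoul 1975]; the symbols (e[n], a_k, A, P, G and the model order p) are assumed notation, not recovered from the slides.

```latex
\begin{align}
e[n] &= x[n] - \sum_{k=1}^{p} a_k\, x[n-k],
\qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \\
E &= \sum_{n} e[n]^2
   = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \left|A(e^{j\omega})\right|^2 d\omega,
\qquad P(\omega) = \left|X(e^{j\omega})\right|^2 \\
\hat{P}(\omega) &= \frac{G}{\left|A(e^{j\omega})\right|^2}
\;\Rightarrow\;
E = \frac{G}{2\pi} \int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega
\end{align}
```

Under this reading, minimizing the residual energy E over the coefficients a_k amounts to fitting the all-pole spectrum to the signal power spectrum through their ratio.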

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

The numerator G denotes the gain of the AR model (equal to the minimum residual sum of squares)
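A minimal sketch of the all-pole model of the power spectrum, evaluated as G / |A(e^{jw})|^2 from the linear prediction solution. The helper names `lp_autocorr` and `allpole_spectrum`, the frequency grid and the decaying-resonance test signal are assumptions made for illustration.

```python
import numpy as np

def lp_autocorr(x, order):
    """Autocorrelation-method LP; returns coefficients and gain G."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])  # minimum residual sum of squares
    return a, gain

def allpole_spectrum(a, gain, n_freq=512):
    """Evaluate G / |A(e^{jw})|^2 on n_freq points in [0, pi)."""
    w = np.pi * np.arange(n_freq) / n_freq
    k = np.arange(1, len(a) + 1)
    # A(e^{jw}) = 1 - sum_k a_k e^{-jwk}
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return w, gain / np.abs(A) ** 2

# Test input: a decaying resonance at 0.3*pi rad/sample
n = np.arange(2048)
x = (0.99 ** n) * np.cos(0.3 * np.pi * n)
a, gain = lp_autocorr(x, order=2)
w, p_hat = allpole_spectrum(a, gain)
peak = w[np.argmax(p_hat)]  # frequency of the spectral peak of the model
```

With a second-order model the single pole pair sits close to the resonance, so the peak of the model spectrum lands near 0.3*pi.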

AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component

$x_a(t) = x(t) + j\,\mathcal{H}[x(t)]$

where $\mathcal{H}$ denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
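The definition above can be sketched numerically by building the analytic signal in the DFT domain (zeroing the negative frequencies) and taking its squared magnitude. The function names and the AM test signal are assumptions for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], obtained by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep the Nyquist bin once
        h[1 : n // 2] = 2.0         # double the positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# AM test signal: slow modulation m times a fast carrier, periodic in the frame
n = np.arange(1024)
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * n / 1024)
x = m * np.cos(2 * np.pi * 128 * n / 1024)
env = hilbert_envelope(x)  # recovers m**2 for this band-limited example
```

For this example the modulation is band-limited well below the carrier, so the Hilbert envelope matches the squared modulation.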

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: panels labeled LP and FDLP]
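The duality can be sketched in code: where time-domain LP fits an all-pole model to the power spectrum, applying the same LP to the DCT of a signal fits an all-pole model to its temporal envelope. This is a minimal full-band sketch under stated assumptions (an unnormalized DCT-II, the autocorrelation method, no sub-band windowing); it is not the exact configuration used in the talk, and all names are hypothetical.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def lp_autocorr(seq, order):
    """Autocorrelation-method LP on an arbitrary real sequence."""
    r = np.array([np.dot(seq[: len(seq) - m], seq[m:]) for m in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    gain = r[0] - np.dot(a, r[1:])
    return a, gain

def fdlp_envelope(x, order):
    """AR estimate of the temporal envelope: LP applied to the DCT of x."""
    n = len(x)
    c = dct_ii(np.asarray(x, dtype=float))
    a, gain = lp_autocorr(c, order)
    # The DCT basis ties time index t to the angle pi*(t+0.5)/n, so evaluating
    # the all-pole model at those angles gives one envelope value per sample.
    theta = np.pi * (np.arange(n) + 0.5) / n
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(theta, k)) @ a
    return gain / np.abs(A) ** 2

# Test signal: a carrier gated by a Gaussian burst centered at sample 600,
# plus a small constant floor so the envelope never vanishes
n = np.arange(1000)
burst = np.exp(-0.5 * ((n - 600) / 30.0) ** 2)
x = (burst + 0.05) * np.cos(0.3 * np.pi * n)
env = fdlp_envelope(x, order=10)
peak_time = int(np.argmax(env))  # expected to fall inside the burst
```

The model order plays the same role as in time-domain LP: a higher order follows finer temporal detail of the envelope, a lower order gives a smoother estimate.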

LP in Time and Frequency

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms computed with FDLP and PLP]

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition", ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) and gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
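As a minimal illustration of the first technique, the sketch below applies cepstral mean subtraction to a matrix of cepstral features. It assumes the usual approximation that a short, stationary channel response adds the same offset to every frame in the cepstral domain; the feature matrix and the channel offset are synthetic placeholders.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames (rows are frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Synthetic features: 200 frames of 13 cepstral coefficients
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 13))
channel = rng.standard_normal(13)   # constant offset modeling the channel
distorted = clean + channel

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)  # offset is removed
```

The constant offset cancels exactly here because it is identical in every frame; with reverberation longer than the analysis window this assumption no longer holds, which is the case the long-term methods above address.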

Feature Comparison

Modulation Feature Extraction

[Block diagram with blocks: Input, DCT, critical-band window, FDLP, static compression, dynamic compression, DCT (two branches), 200 ms, sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram with blocks: Signal + Noise, DCT, critical-band window, IDFT, |.|^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: Input, DCT, critical-band window, IDFT, |.|^2, DFT, Wiener filtering, VAD]

A voice activity detector (VAD) identifies the non-speech regions, which are used to estimate the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
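The two steps above can be sketched as follows. This is a simplified stand-in that assumes a constant noise envelope and a plain subtraction with a floor, not the Wiener filtering shown in the block diagram; the function name, the `floor` parameter and the boolean mask standing in for VAD decisions are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise envelope estimated from non-speech regions."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise estimate: mean envelope over samples marked as non-speech
    noise_level = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_level
    # Floor at a small fraction of the noisy envelope to keep it positive
    return np.maximum(cleaned, floor * noisy_env)

# Toy envelopes: speech active in the middle, constant additive noise level 1.0
speech_env = np.concatenate([np.zeros(50), np.full(100, 4.0), np.zeros(50)])
noisy_env = speech_env + 1.0
mask = speech_env > 0               # stand-in for VAD decisions
cleaned = subtract_noise_envelope(noisy_env, mask)
```

The floor matters because the envelope is later modeled by linear prediction, which needs a strictly positive target.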

Noise Compensation in FDLP

S. Ganapathy, S. Thomas and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: time-frequency plot with axes Time and Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
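For contrast with FDLP, the conventional front end described above can be sketched as a framed short-term power spectrum. The 25 ms window, 10 ms hop, Hamming window and 8 kHz test tone are assumed values chosen from within the 10-40 ms range quoted on the slide.

```python
import numpy as np

def short_term_spectrum(x, fs, win_ms=25.0, hop_ms=10.0):
    """Power spectrum of overlapping Hamming-windowed frames (rows are frames)."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)
    frames = np.stack(
        [x[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# One second of a 1 kHz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
spec = short_term_spectrum(x, fs)   # frame rate is fixed by the hop size
```

The hop size fixes the rate at which the spectrum is sampled in time, which is the preset rate the slide refers to.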

Page 94: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Evidences

Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]

Psycho-physical evidences - Perceptual importance of modulation frequencies

[Drullman et al 1994] Syllable recognition from temporal modulations with

minimal spectral cues [Shannon et al 1995]

Applications

Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

nn-1n-2n-3

a3a2

a1

Linear Prediction ndash Time Domain

Current sample expressed as a linear combination of past samples

Model parameters are solved by minimizing the residual sum of squares

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parsevalrsquos theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)

AR model of power spectrum

Hilbert Envelope - Definition Analytic signal is the sum of the signal and its

quadrature component

where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal

Hilbert Envelope

c FDLP Env

b Hilb Env

a Speech

Duality

LP

FDLP

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
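Of the three techniques listed above, CMS is the simplest to sketch. Assuming the distortion is short relative to the analysis frame, it appears as a roughly constant offset in the cepstral domain, so subtracting the per-coefficient mean over frames removes it. The array layout (rows are frames), the helper name, and the synthetic data below are assumptions for illustration.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames (rows are frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Synthetic check: a fixed channel offset added to every frame.
rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))   # 50 frames, 13 coefficients
channel = np.linspace(-1.0, 1.0, 13)    # constant offset per coefficient
distorted = clean + channel

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the clean and distorted features coincide, which is the sense in which CMS suppresses a distortion that is the same across neighboring frames.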

Feature Comparison

Modulation Feature Extraction

[Block diagram: input, DCT, critical-band window, FDLP, static and dynamic compression, DCT over 200 ms segments, sub-band features]
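The last stages of the block diagram (compression of a sub-band envelope followed by a DCT over 200 ms segments) might be sketched as below. Only the static (log) branch is shown. The envelope sampling rate, the non-overlapping segmentation, the number of retained coefficients, and the function name are assumptions made here, not values taken from the slides.

```python
import numpy as np
from scipy.fft import dct


def modulation_features(envelope, env_rate_hz, seg_ms=200, n_coeffs=14):
    """Log-compress a sub-band envelope and take a DCT per segment."""
    envelope = np.asarray(envelope, dtype=float)
    seg_len = int(round(env_rate_hz * seg_ms / 1000.0))
    n_seg = len(envelope) // seg_len
    # Non-overlapping segments (an assumed simplification).
    segments = envelope[: n_seg * seg_len].reshape(n_seg, seg_len)
    log_env = np.log(segments + 1e-12)  # static (log) compression
    # The low-order DCT coefficients describe the slow modulations.
    return dct(log_env, type=2, norm="ortho", axis=1)[:, :n_coeffs]


# Flat envelope at an assumed 400 Hz rate: no modulation content expected.
feats = modulation_features(np.full(800, 2.0), env_rate_hz=400)
```

A flat envelope has no modulation, so every coefficient beyond the first should be numerically zero.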

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech", JASA Express Letters, 2009

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram: signal + noise, DCT, critical-band window, IDFT, |.|^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram: input, DCT, critical-band window, IDFT, |.|^2, DFT, Wiener filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimate of the noise envelope from the noisy speech envelope
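A minimal sketch of this idea follows, assuming the VAD decision is already available as a boolean mask over envelope samples. It uses plain subtraction with a floor rather than the Wiener filtering named in the block diagram; the floor value, the mean-based noise estimate, and the function name are assumptions for illustration.

```python
import numpy as np


def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech regions."""
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise envelope estimated as the mean level where the VAD says "no speech".
    noise_est = noisy_env[~speech_mask].mean()
    cleaned = noisy_env - noise_est
    # Floor the result so the envelope stays strictly positive.
    return np.maximum(cleaned, floor * noisy_env)


# Toy envelopes: silence for 50 samples, then speech, plus constant noise.
speech_env = np.concatenate((np.zeros(50), np.full(50, 2.0)))
noisy_env = speech_env + 0.5
speech_mask = speech_env > 0            # stands in for a VAD decision

cleaned_env = subtract_noise_envelope(noisy_env, speech_mask)
```

The subtraction relies on the additivity of uncorrelated noise in the envelope domain; with a real VAD the noise estimate would be imperfect and the floor matters more.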

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum", IEEE ASRU 2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with time and frequency axes]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM

Page 98: Signal Analysis Using Autoregressive Models of Amplitude Modulation

AR model of Power Spectrum

Filter interpretation [Makhoul 1975]

From Parseval's theorem

AR model of Power Spectrum

By definition Let

Thus parameters are solved by minimizing
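For reference, the steps named on these slides (filter interpretation, Parseval's theorem, minimization) correspond to the standard derivation in Makhoul (1975), sketched below. The notation (x[n], a_k, model order p, A, E, X, P) is assumed here for illustration and is not taken from the slides.

```latex
% Prediction error as the output of the inverse filter A
e[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k]
\quad\Longleftrightarrow\quad
E(e^{j\omega}) = A(e^{j\omega})\, X(e^{j\omega}),
\qquad
A(e^{j\omega}) = 1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k}

% Parseval's theorem: residual energy expressed in the frequency domain
\sum_{n} e^{2}[n]
= \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| E(e^{j\omega}) \right|^{2} d\omega
= \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, \left| A(e^{j\omega}) \right|^{2} d\omega,
\qquad
P(\omega) = \left| X(e^{j\omega}) \right|^{2}

% Minimizing over a_1, ..., a_p gives the all-pole approximation
\hat{P}(\omega) = \frac{G}{\left| A(e^{j\omega}) \right|^{2}}
```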

AR model of Power Spectrum

Solution of the linear prediction yields an all-pole model of the power spectrum

Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
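A minimal sketch of how the all-pole model and its gain can be computed, assuming the autocorrelation method of linear prediction and NumPy. The function names `lp_autocorr` and `ar_power_spectrum` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def lp_autocorr(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, gain) where a[0] == 1 and gain is the minimum
    residual sum of squares of the prediction error.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = -r[1:], with R a symmetric Toeplitz matrix
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + np.dot(coeffs, r[1:])
    return a, gain

def ar_power_spectrum(a, gain, n_points=512):
    """Evaluate G / |A(e^{jw})|^2 at n_points frequencies in [0, pi]."""
    A = np.fft.rfft(a, 2 * (n_points - 1))
    return gain / np.abs(A) ** 2
```

Under this method the returned gain equals the energy of the prediction error obtained by passing the (zero-padded) signal through the inverse filter, which is one way to check the implementation.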

AR model of power spectrum

Hilbert Envelope - Definition

The analytic signal is the sum of the signal and j times its quadrature component: x_a(t) = x(t) + j H[x(t)], where H denotes the Hilbert transform

The Hilbert envelope is the squared magnitude of the analytic signal, |x_a(t)|^2
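The definition above can be sketched in a few lines, assuming a discrete-time signal and an FFT-based construction of the analytic signal (positive frequencies doubled, negative frequencies zeroed). The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal x + j*H[x], built in the frequency domain."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                       # keep DC as is
    if n % 2 == 0:
        h[n // 2] = 1.0              # keep the Nyquist bin as is
        h[1 : n // 2] = 2.0          # double the positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)        # negative frequencies are zeroed

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2
```

For a slowly varying modulation m(t) multiplying a much faster cosine carrier, the result is m(t)^2, which is the amplitude-modulation component that the rest of the talk models.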

Hilbert Envelope

[Figure: (a) speech signal, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure: LP and FDLP]

LP in Time and Frequency
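The duality named on these slides is that linear prediction applied to a time signal gives an all-pole model of its power spectrum, while linear prediction applied to the DCT of the signal (FDLP) gives an all-pole model of its temporal envelope. A minimal full-band sketch follows, assuming NumPy, an unnormalized DCT-II, and the autocorrelation method. The helper names, the `normalize_gain` flag, and the half-sample angle mapping are choices made for this illustration, not details confirmed by the slides.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II computed with an explicit cosine matrix."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x

def fdlp_envelope(x, order, normalize_gain=False):
    """AR estimate of the temporal envelope of x from LP on its DCT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct_ii(x)
    # Autocorrelation of the DCT sequence for lags 0..order
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + np.dot(coeffs, r[1:])
    # Time index t is mapped to the angle pi * (t + 0.5) / n
    theta = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    env = 1.0 / np.abs(A) ** 2
    # With normalize_gain=True the gain G is set to 1
    return env if normalize_gain else gain * env
```

Poles of the model correspond to peaks of the envelope in time, so a click at sample 250 of a 400-sample frame produces an envelope peak at about that sample. With `normalize_gain=True`, rescaling the input leaves the envelope unchanged.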

AM-FM Decomposition

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure: spectrograms from FDLP and PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS), and gain normalization

CMS assumes the distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation by assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
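To illustrate the first technique, here is a minimal sketch of cepstral mean subtraction, assuming features are stored as a NumPy array with one row per frame. A convolutive distortion that is the same in every frame appears as a constant offset in the cepstral domain, so subtracting the per-coefficient mean removes it. The function name is hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames.

    cepstra: array of shape (n_frames, n_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

The sketch relies on the offset being identical across the frames that enter the mean. A response longer than the analysis frame breaks that assumption, which is the case the long-term methods listed above are meant to handle.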

Feature Comparison

Modulation Feature Extraction

[Block diagram: input, DCT, critical-band window, FDLP; static and dynamic compression, each followed by a DCT over 200 ms; sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure: (a) signal, (b) Hilbert envelope, (c) FDLP envelope, (d) log compression, (e) dynamic compression]

Noise Compensation in FDLP

[Block diagram with blocks: signal + noise, DCT, critical-band window, IDFT, |.|^2, DFT, linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

[Block diagram with blocks: input, DCT, critical-band window, IDFT, |.|^2, Wiener filtering, VAD, DFT]

A voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate of the noise envelope from the noisy speech envelope
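The two steps above can be sketched as follows, assuming the sub-band Hilbert envelope and a boolean VAD decision per envelope sample are already available. This is a simplified stand-in, not the method from the slides: it uses one constant noise level per band and a flooring rule, whereas the block diagram names Wiener filtering. The function name and the `floor` parameter are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a noise-envelope estimate taken from non-speech samples.

    noisy_env:   non-negative temporal envelope of the noisy sub-band signal.
    speech_mask: boolean array, True where the VAD detects speech.
    floor:       fraction of the noisy envelope kept as a lower bound.
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Noise level estimated from the regions the VAD marks as non-speech
    noise_level = noisy_env[~speech_mask].mean()
    # Floor the result so the envelope never drops below a small
    # fraction of the noisy envelope
    return np.maximum(noisy_env - noise_level, floor * noisy_env)
```

The flooring keeps the compensated envelope positive wherever the noisy envelope is positive, which matters if it is later passed to a log or to linear prediction. The sketch assumes the mask contains at least one non-speech sample.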

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: spectrogram with axes Time and Frequency]

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
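As a concrete reference point for the conventional front end described above, here is a minimal sketch of short-term power spectrum estimation, assuming NumPy, a Hamming window, a 25 ms frame, and a 10 ms hop. These particular values are common choices within the 10-40 ms range quoted on the slide, not values taken from it, and the function name is hypothetical.

```python
import numpy as np

def short_term_power_spectra(x, fs, win_ms=25.0, hop_ms=10.0):
    """Power spectra of overlapping short-term frames.

    Returns an array of shape (n_frames, win // 2 + 1). The hop sets the
    preset rate at which the spectrum is sampled in time.
    Assumes len(x) is at least one window long.
    """
    x = np.asarray(x, dtype=float)
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)
    frames = np.stack(
        [x[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

Each row is one short-term spectrum. A time-series model such as an HMM would then be applied across rows to capture context.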

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
AR model of power spectrum

Hilbert Envelope - Definition

Analytic signal is the sum of the signal and its quadrature component:

$x_a(t) = x(t) + j\,H[x(t)]$, where H denotes the Hilbert transform

Hilbert envelope is the squared magnitude of the analytic signal, $|x_a(t)|^2$
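
As a rough illustration of the definition above, the following Python sketch builds the analytic signal with an FFT (keeping DC, doubling positive frequencies, zeroing negative ones) and takes its squared magnitude. The function names, the test tone, and the use of NumPy are assumptions made for illustration; none of them come from the slides.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H[x] by suppressing the negative-frequency half of the spectrum."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def hilbert_envelope(x):
    """Squared magnitude of the analytic signal."""
    return np.abs(analytic_signal(x)) ** 2

# Assumed test signal: a slow modulator multiplying a fast carrier.
n = 1024
t = np.arange(n) / n
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
carrier = np.cos(2 * np.pi * 128 * t)
envelope = hilbert_envelope(modulator * carrier)
```

For this particular tone the envelope should match the squared modulator, since the modulation is slow relative to the carrier and both are periodic in the analysis window.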

Hilbert Envelope

[Figure panels: (a) Speech, (b) Hilbert envelope, (c) FDLP envelope]

Duality

[Figure labels: LP, FDLP]

LP in Time and Frequency

AM-FM Decomposition

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) AM component, (e) FM component]

Spectrogram Comparison

[Figure panels: FDLP, PLP]

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Comparison of Modulation Frequency Features for Speech Recognition," ICASSP, 2010.

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes distortion in neighboring frames to be similar - suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short- and long-term distortions within a single long-term frame
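
A minimal sketch of the mean-subtraction idea behind CMS, assuming the channel is fixed over the frames being averaged and shows up as an additive offset in the log-spectral or cepstral domain. The feature matrices are synthetic and `mean_subtract` is a hypothetical helper, not code from the talk; LTLSS and gain normalization are not shown.

```python
import numpy as np

def mean_subtract(features):
    """Remove the per-dimension mean over frames (rows are frames)."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))       # assumed: 50 frames of 13 cepstral coefficients
channel = rng.normal(size=(1, 13))      # assumed: one fixed channel offset per coefficient
distorted = clean + channel             # convolution becomes addition in the cepstral domain

normalized_clean = mean_subtract(clean)
normalized_distorted = mean_subtract(distorted)
```

Under these assumptions the two normalized matrices coincide, which is why the method only helps when the distortion is shorter than the analysis window and constant across the averaged frames.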

Feature Comparison

Modulation Feature Extraction

[Block diagram: Input -> DCT -> Critical-band Window -> FDLP -> Static and Dynamic compression -> DCT (200 ms) -> Sub-band Features]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]

Noise Compensation in FDLP

[Block diagram: Signal + Noise -> DCT -> Critical-band Window -> IDFT -> |.|^2 -> DFT -> Linear Prediction -> FDLP Envelope]

When speech is corrupted with additive noise, the noisy signal y is the sum of the speech and the noise.

The noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)
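
The FDLP envelope estimation named on this slide can be sketched in a few lines, under several assumptions: the whole signal is treated as a single band (no critical-band windowing), the autocorrelation of the DCT coefficients is computed directly rather than through an IDFT, squared magnitude, and DFT, and the model order of 10 is arbitrary. `dct_ii` and `fdlp_envelope` are hypothetical names, not the authors' code.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (adequate for short signals)."""
    n = x.size
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def fdlp_envelope(x, order=10):
    """All-pole fit to the DCT sequence; its 'spectrum' approximates the temporal envelope."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = dct_ii(x)
    # Autocorrelation of the DCT coefficients for lags 0..order.
    r = np.array([np.dot(c[: n - lag], c[lag:]) for lag in range(order + 1)])
    r[0] *= 1.0 + 1e-9                      # tiny ridge for numerical conditioning
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])     # Yule-Walker normal equations
    gain = r[0] - np.dot(a, r[1:])          # prediction error energy
    # Evaluate 1/|A|^2 on n points over [0, pi); this axis corresponds to time.
    error_filter = np.concatenate(([1.0], -a))
    response = np.fft.fft(error_filter, 2 * n)[:n]
    return gain / np.abs(response) ** 2

# Assumed test signal: a noise burst centered at sample 160 over a weak noise floor.
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
burst = np.exp(-0.5 * ((t - 160) / 10.0) ** 2)
x = burst * rng.standard_normal(n) + 0.01 * rng.standard_normal(n)
env = fdlp_envelope(x, order=10)
```

The point of the sketch is the duality: linear prediction applied to DCT coefficients yields a smooth model of the envelope in time, just as linear prediction on time samples yields a smooth model of the power spectrum.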

Noise Compensation in FDLP

[Block diagram labels: Input, DCT, Critical-band Window, IDFT, |.|^2, DFT, Wiener Filtering, VAD]

Voice activity detector (VAD) provides information about the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction subtracts the estimated noise envelope from the noisy speech envelope
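
A simplified sketch of envelope-domain noise subtraction, assuming the VAD decision is already available as a boolean mask and the noise envelope is roughly constant over the segment. It uses plain subtraction with a floor instead of the Wiener filtering named in the diagram; `subtract_noise_envelope`, the floor value, and the toy envelopes are all illustrative assumptions.

```python
import numpy as np

def subtract_noise_envelope(noisy_envelope, speech_mask, floor=1e-3):
    """Subtract the mean envelope of non-speech samples and floor the result."""
    noisy_envelope = np.asarray(noisy_envelope, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if speech_mask.all():
        raise ValueError("no non-speech samples available for the noise estimate")
    noise_level = noisy_envelope[~speech_mask].mean()   # noise estimate from non-speech regions
    cleaned = noisy_envelope - noise_level
    # Keep the output positive so later log or AR modeling stages stay well defined.
    return np.maximum(cleaned, floor * noisy_envelope)

# Toy example: speech envelope active on samples 40..59, constant noise envelope of 0.2.
speech_envelope = np.zeros(100)
speech_envelope[40:60] = 1.0
speech_mask = speech_envelope > 0          # stands in for a VAD output
noisy_envelope = speech_envelope + 0.2     # additive in the envelope domain
cleaned_envelope = subtract_noise_envelope(noisy_envelope, speech_mask)
```

The floor is a common safeguard in subtraction schemes; without it, estimation errors can push the envelope to zero or below.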

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: time-frequency representation; axes: Time, Frequency]

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
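
For contrast with the long-term FDLP analysis, here is a minimal sketch of the conventional short-term front end described above. The 25 ms window falls in the 10-40 ms range from the slide; the 10 ms frame shift, the Hamming window, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def short_term_power_spectrum(x, sample_rate, window_ms=25.0, hop_ms=10.0):
    """Frame the signal, apply a Hamming window, and return one power spectrum per frame."""
    x = np.asarray(x, dtype=float)
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if x.size < window_len:
        raise ValueError("signal shorter than one analysis window")
    num_frames = 1 + (x.size - window_len) // hop_len
    window = np.hamming(window_len)
    frames = np.stack([
        x[i * hop_len : i * hop_len + window_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Assumed input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
tone = np.cos(2 * np.pi * 1000 * time)
spectrogram = short_term_power_spectrum(tone, sample_rate)
```

Each row is one short-term spectrum, and the rows are spaced at the preset frame rate; any longer temporal context has to be recovered by a later model such as an HMM.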

  • Signal Analysis Using Autoregressive Models of Amplitude Modula
  • Overview
  • Overview (2)
  • Introduction
  • Introduction (2)
  • Introduction (3)
  • Introduction (4)
  • Introduction (5)
  • Introduction (6)
  • Introduction (7)
  • Introduction (8)
  • Desired Properties of AM
  • Desired Properties of AM (2)
  • Desired Properties of AM (3)
  • Overview (3)
  • Overview (4)
  • AR Model of Hilbert Envelopes
  • AR Model of Hilbert Envelopes (2)
  • AR Model of Hilbert Envelopes (3)
  • AR Model of Hilbert Envelopes (4)
  • AR Model of Hilbert Envelopes (5)
  • AR Model of Hilbert Envelopes (6)
  • AR Model of Hilbert Envelopes (7)
  • AR Model of Hilbert Envelopes (8)
  • AR Model of Hilbert Envelopes (9)
  • AR Model of Hilbert Envelopes (10)
  • AR Model of Hilbert Envelopes (11)
  • AR Model of Hilbert Envelopes (12)
  • AR Model of Hilbert Envelopes (13)
  • LP in Time and Frequency
  • LP in Time and Frequency (2)
  • FDLP
  • FDLP (2)
  • FDLP (3)
  • FDLP (4)
  • FDLP for Speech Representation
  • FDLP for Speech Representation (2)
  • FDLP for Speech Representation (3)
  • FDLP for Speech Representation (4)
  • FDLP for Speech Representation (5)
  • FDLP for Speech Representation (6)
  • FDLP for Speech Representation (7)
  • FDLP versus Mel Spectrogram
  • Overview (5)
  • Overview (6)
  • Resolution of FDLP Analysis
  • Resolution of FDLP Analysis (2)
  • Resolution of FDLP Analysis (3)
  • Properties of FDLP Analysis
  • Properties of FDLP Analysis (2)
  • Reverberation
  • Reverberation (2)
  • Reverberation (3)
  • Reverberation (4)
  • Gain Normalization in FDLP
  • Gain Normalization in FDLP (2)
  • Overview (7)
  • Overview (8)
  • Outline of Applications
  • Short-term Features
  • Short-term Features (2)
  • Short-term Features (3)
  • Short-term Features (4)
  • Speech Recognition
  • Speech Recognition (2)
  • Speaker Verification
  • Speaker Verification (2)
  • Outline of Applications (2)
  • Modulation Features
  • Modulation Feature Extraction
  • Modulation Feature Extraction (2)
  • Modulation Feature Extraction (3)
  • Modulation Feature Extraction (4)
  • Phoneme Recognition
  • Phoneme Recognition (2)
  • Outline of Applications (3)
  • Audio Coding
  • Audio Coding (2)
  • Subjective Evaluations
  • Overview (9)
  • Overview (10)
  • Summary
  • Summary (2)
  • Summary (3)
  • Our Contributions
  • Our Contributions (2)
  • Our Contributions (3)
  • Our Contributions (4)
  • Our Contributions (5)
  • Our Contributions (6)
  • Acknowledgements
  • Slide 92
  • Evidences
  • Evidences (2)
  • Applications
  • Linear Prediction ndash Time Domain
  • Linear Prediction ndash Time Domain (2)
  • AR model of Power Spectrum
  • AR model of Power Spectrum (2)
  • AR model of Power Spectrum (3)
  • AR model of Power Spectrum (4)
  • AR model of power spectrum
  • Hilbert Envelope - Definition
  • Hilbert Envelope
  • Duality
  • LP in Time and Frequency (3)
  • AM-FM Decomposition
  • Spectrogram Comparison
  • Dealing with Convolutive Distortions
  • Dealing with Convolutive Distortions (2)
  • Dealing with Convolutive Distortions (3)
  • Dealing with Convolutive Distortions (4)
  • Feature Comparison
  • Modulation Feature Extraction (5)
  • Modulation Features (2)
  • Noise Compensation in FDLP
  • Noise Compensation in FDLP (2)
  • Noise Compensation in FDLP (3)
  • Introduction (9)
  • Introduction (10)
  • Introduction (11)
Page 106: Signal Analysis Using Autoregressive Models of Amplitude Modulation

LP in Time and Frequency

AM-FM Decomposition

a Signalb Hilb Envc FDLP Envd AM compe FM comp

Spectrogram Comparison

FDLP

SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010

PLP

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS) & gain normalization

CMS assumes the distortion in neighboring frames to be similar, so it suppresses short-term artifacts

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002]

Gain normalization deals with short-term and long-term distortions within a single long-term frame
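The CMS idea above can be sketched in a few lines. This is a minimal illustration, assuming the features are stored as a (frames x coefficients) array of cepstra; the function name `cepstral_mean_subtraction` is hypothetical. A fixed convolutive distortion shows up as a constant offset in the cepstral domain, so subtracting the per-coefficient mean over the frames removes it.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the mean of each cepstral coefficient over all frames.

    cepstra: array of shape (num_frames, num_coeffs) (assumed layout).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Applying the function to clean cepstra and to the same cepstra plus a constant channel offset gives the same result, which is the invariance the slide relies on.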

Feature Comparison

Modulation Feature Extraction

[Block diagram; blocks: Input, DCT, Critical-band window, FDLP, Static and Dynamic compression, DCT (200 ms), Sub-band features]

Modulation Features

Sriram Ganapathy, Samuel Thomas, and H. Hermansky, "Modulation Frequency Features for Phoneme Recognition in Noisy Speech," JASA Express Letters, 2009.

[Figure panels: (a) Signal, (b) Hilbert envelope, (c) FDLP envelope, (d) Log compression, (e) Dynamic compression]
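The static branch of the modulation feature extraction (log compression followed by a DCT over 200 ms segments) might be sketched as follows. This is only an illustration under stated assumptions: the envelope sampling rate of 400 Hz, the 14 retained DCT coefficients, and the names `dct_basis` and `modulation_features` are hypothetical and not taken from the slides, and the dynamic compression branch is omitted.

```python
import numpy as np

def dct_basis(seg_len, num_coeffs):
    """Unnormalized DCT-II basis, shape (num_coeffs, seg_len)."""
    k = np.arange(num_coeffs)[:, None]
    t = np.arange(seg_len)[None, :]
    return np.cos(np.pi * k * (2 * t + 1) / (2 * seg_len))

def modulation_features(envelope, env_rate_hz=400, segment_ms=200, num_coeffs=14):
    """Log-compress a sub-band envelope and take a DCT of each segment.

    env_rate_hz and num_coeffs are assumed values for illustration only.
    Returns an array of shape (num_segments, num_coeffs).
    """
    envelope = np.asarray(envelope, dtype=float)
    seg_len = int(round(env_rate_hz * segment_ms / 1000))
    log_env = np.log(envelope + 1e-10)          # static (log) compression
    num_segs = len(log_env) // seg_len          # drop any trailing partial segment
    segments = log_env[:num_segs * seg_len].reshape(num_segs, seg_len)
    return segments @ dct_basis(seg_len, num_coeffs).T
```

Non-overlapping segments keep the sketch short; a practical front end would typically use overlapping windows.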

Noise Compensation in FDLP

[Block diagram; blocks: Signal + Noise, DCT, Critical-band window, IDFT, |·|², DFT, Linear prediction, FDLP envelope]

When speech is corrupted with additive noise, the noise component is additive in the non-parametric Hilbert envelope domain (assuming the signal and noise are uncorrelated)
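The additivity claim can be made explicit with a short derivation. The notation is an assumption introduced here for illustration: $x$ is the clean sub-band signal, $n$ the noise, $y$ their sum, and the subscript $a$ marks the corresponding analytic signal.

```latex
\begin{align}
y(t) &= x(t) + n(t), \qquad y_a(t) = x_a(t) + n_a(t) \\
|y_a(t)|^2 &= |x_a(t)|^2 + |n_a(t)|^2 + 2\,\mathrm{Re}\{x_a(t)\,n_a^*(t)\} \\
\mathbb{E}\,|y_a(t)|^2 &= \mathbb{E}\,|x_a(t)|^2 + \mathbb{E}\,|n_a(t)|^2
\quad \text{if } \mathbb{E}\{x_a(t)\,n_a^*(t)\} = 0
\end{align}
```

The cross term vanishes only on average, so the additivity holds in expectation rather than sample by sample.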

Noise Compensation in FDLP

[Block diagram; blocks: Input, DCT, Critical-band window, IDFT, |·|², Wiener filtering (with VAD), DFT]

A voice activity detector (VAD) identifies the non-speech regions, which are used for estimating the temporal envelope of the noise

Noise subtraction removes the estimated noise envelope from the noisy speech envelope
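A minimal sketch of the envelope subtraction step is given below. It is a simplified stand-in, not the Wiener filtering named in the block diagram: the noise envelope is assumed to be a constant level estimated as the mean of the envelope over the samples the VAD marks as non-speech, and the result is floored at a small fraction of the noisy envelope to keep it positive. The function name `subtract_noise_envelope` and the `floor` parameter are hypothetical.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, speech_mask, floor=0.01):
    """Subtract a VAD-based noise estimate from a sub-band temporal envelope.

    noisy_env:   envelope of the noisy signal (non-negative values).
    speech_mask: boolean array, True where the VAD detects speech.
    floor:       fraction of the noisy envelope kept as a lower bound (assumed).
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if speech_mask.all():
        raise ValueError("need at least one non-speech sample to estimate the noise")
    noise_level = noisy_env[~speech_mask].mean()   # noise estimate from non-speech regions
    cleaned = noisy_env - noise_level
    return np.maximum(cleaned, floor * noisy_env)  # avoid zero or negative envelope values
```

The floor matters because the cleaned envelope is later modeled by linear prediction, which assumes a positive envelope.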

Noise Compensation in FDLP

S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal Envelope Subtraction for Robust Speech Recognition using Modulation Spectrum," IEEE ASRU, 2009.

Introduction

Conventional signal analysis starts with the estimation of the short-term spectrum (10-40 ms)

[Figure: spectrogram; axes: Time, Frequency]

The spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMMs
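The conventional front end described above can be sketched as a framed power spectrum. The 25 ms Hamming window and 10 ms hop are assumed, commonly used values within the 10-40 ms range on the slide, and the function name `short_term_power_spectrum` is hypothetical.

```python
import numpy as np

def short_term_power_spectrum(signal, sample_rate, window_ms=25, hop_ms=10):
    """Return the power spectrum of overlapping short-term frames.

    Output shape: (num_frames, win_len // 2 + 1). The hop size sets the
    preset rate at which the spectrum is sampled (100 frames/s for 10 ms).
    """
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(sample_rate * window_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(win_len)
    num_frames = 1 + (len(signal) - win_len) // hop_len
    frames = np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

Each row is one short-term spectral estimate; a time-series model such as an HMM would then operate on the sequence of rows.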

Page 111: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Dealing with Convolutive Distortions

Cepstral mean subtraction (CMS), long-term log spectral subtraction (LTLSS), and gain normalization.

CMS assumes the distortion in neighboring frames to be similar and suppresses short-term artifacts.

Long-term subtraction deals with reverberation, assuming the same response over a window of long-term frames [Gelbart 2002].

Gain normalization deals with short- and long-term distortions within a single long-term frame.
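
A minimal sketch of the idea behind CMS, assuming the channel response is short relative to the analysis window so that it adds the same offset to the cepstrum of every frame. The feature matrix, its dimensions, and the function name are illustrative and not taken from the slides.

```python
import numpy as np

def cepstral_mean_subtraction(feats):
    """Remove the per-dimension mean over time from a (frames x dims) matrix."""
    return feats - feats.mean(axis=0, keepdims=True)

rng = np.random.default_rng(2)
clean = rng.standard_normal((300, 13))   # hypothetical cepstral features
channel = rng.standard_normal(13)        # fixed offset, identical in every frame
distorted = clean + channel

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

Because the offset is the same in every frame, it is absorbed into the mean and the two normalized matrices coincide. Distortions longer than the analysis window, such as reverberation, do not reduce to a fixed per-frame offset, which is the case the long-term methods on the slide address.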

Page 116: Signal Analysis Using Autoregressive Models of Amplitude Modulation

Noise Compensation in FDLP

Critical-band

Window

LinearPred

Signal

+Noise

DCT IDFT | |2 DFTFDLP

Env

When speech is corrupted with additive noise

y The noise component is additive in the non-parametric Hilbert envelope

domain (assuming the signal and noise are uncorrelated)

Noise Compensation in FDLP

Critical-band

Window

InputDCT IDFT | |2 DFTWiener

Filtering

VAD

Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise

Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope

Noise Compensation in FDLP

SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

[Figure: time-frequency representation; axes: Time, Frequency]

Introduction

Conventional signal analysis - starts with the estimation of short-term spectrum (10-40 ms)

Spectrum is sampled at a preset rate before further modeling/processing stages

Contextual information is typically processed with time-series models such as HMM
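
As a rough illustration of this conventional front end, the Python sketch below frames a signal into short windows and computes one power spectrum per frame. It assumes NumPy; the 25 ms window and 10 ms hop are common choices within the 10-40 ms range quoted above, not values taken from the slides, and the function name `short_term_power_spectrum` is hypothetical.

```python
import numpy as np

def short_term_power_spectrum(signal, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Return an array of shape (n_frames, n_bins) of short-term power spectra."""
    win_len = int(round(sample_rate * win_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < win_len:
        raise ValueError("signal shorter than one analysis window")
    # The hop length fixes the preset rate at which the spectrum is sampled.
    n_frames = 1 + (len(signal) - win_len) // hop_len
    window = np.hamming(win_len)
    frames = np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
spec = short_term_power_spectrum(tone, sample_rate)
print(spec.shape)
```

Each row of `spec` is one short-term spectrum; a time-series model such as an HMM would then consume the rows (or features derived from them) in sequence.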
