Signal Analysis Using Autoregressive Models of Amplitude Modulation
description
Transcript of Signal Analysis Using Autoregressive Models of Amplitude Modulation
Signal Analysis Using Autoregressive Models of
Amplitude Modulation
Sriram Ganapathy
Advisor - Hynek Hermansky11-18-2011
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
=
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Non-Unique
AM
FM
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Non-Unique
AM
FM
119909 (119905 )=119898 (119905 )lowast cos 120596119900119905+120593 (119905 )
Desired Properties of AM Linearity
Continuity
Harmonicity
120572 119909 (119905 )120572119898 (119905 )
119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )
cos (120596119900119905)1
Uniquely satisfied by the analytic signal
Desired Properties of AM
119909 (119905) H 119895+iquest iquestoriquest 119898(119905)
120596 (119905)120596119900+120593 (119905)
119889119889119905
- Hilbert transform
119909119886 (119905)
- analytic signal ndash Hilbert envelope
However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals
Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
Desired Properties of AM
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Introduction Sub-band speech and audio signals - product of
smooth modulation with a fine carrier
Non-Unique
AM
FM
119909 (119905 )=119898 (119905 )lowast cos 120596119900119905+120593 (119905 )
Desired Properties of AM Linearity
Continuity
Harmonicity
120572 119909 (119905 )120572119898 (119905 )
119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )
cos (120596119900119905)1
Uniquely satisfied by the analytic signal
Desired Properties of AM
119909 (119905) H 119895+iquest iquestoriquest 119898(119905)
120596 (119905)120596119900+120593 (119905)
119889119889119905
- Hilbert transform
119909119886 (119905)
- analytic signal ndash Hilbert envelope
However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals
Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
Desired Properties of AM
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Desired Properties of AM Linearity
Continuity
Harmonicity
120572 119909 (119905 )120572119898 (119905 )
119909 (119905 )+120575 119909 (119905 )119898 (119905 )+120575119898 (119905 )
cos (120596119900119905)1
Uniquely satisfied by the analytic signal
Desired Properties of AM
119909 (119905) H 119895+iquest iquestoriquest 119898(119905)
120596 (119905)120596119900+120593 (119905)
119889119889119905
- Hilbert transform
119909119886 (119905)
- analytic signal ndash Hilbert envelope
However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals
Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
Desired Properties of AM
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Uniquely satisfied by the analytic signal
Desired Properties of AM
119909 (119905) H 119895+iquest iquestoriquest 119898(119905)
120596 (119905)120596119900+120593 (119905)
119889119889119905
- Hilbert transform
119909119886 (119905)
- analytic signal ndash Hilbert envelope
However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals
Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
Desired Properties of AM
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
However the Hilbert transform filter is infinitely long and can cause artifacts for finite length signals
Need for modeling the Hilbert envelope without explicit computation of the Hilbert transform
Desired Properties of AM
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119883 119886 [119896] =
Signal with zero mean in time and frequency domain for n = 0hellipN-1
0 for k geN 22 119883 [119896] for k lt N 2
119883 [119896] 119883 119886 [119896]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]119910 [119896] =4119877119890119883 [119896 ] klt N
N-point DCT
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ]DCT zero-padded with
N-zeros
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Discrete-time analytic signal
AR Model of Hilbert Envelopes
119876119886 [119896] =
Let - even-symmetrized version of
0 k geN2119876[119896 ] k lt N
119876 [119896 ]=2119877119890 119883 [119896 ] Zero-padded DCT
119910 [119896]=4119877119890119883 [119896 ] k lt N0 k geN
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
analytic signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
SignalSpectrum
PowerSpectrum
Auto-corr
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]
Spectrum ofeven-sym
Analytic Signal
Zero-paddedDCT sequence
We have shown -
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Spectrum ofHilbert env for
even-sym signal
Auto-correlation of
DCT sequence
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
Hilb env ofeven-symm
signal
Auto-corr ofDCT
F
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR Model of Hilbert Envelopes
119876119886 [119896 ] = F 119902119886 [119899 ] = [119896 ]We have shown -
F iquest119902119886 [119899 ]or2 =119903119910 [120591 ]
AR model ofHilb env
Auto-corr ofDCT
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
LP in Time and Frequency
Duality
TimePowerSpec
LP
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
LP in Time and Frequency
Duality
TimePowerSpec
LP
Duality
DCTHilbEnv
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
DCT
LP
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLPLinear prediction on the cosine transform of the signal
Hilb Env
FDLP Env
Speech
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP for Speech Representation
DCT
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP for Speech Representation
DCT
LP
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP for Speech Representation
DCTLP
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP Spectrogram
FDLP for Speech Representation
Time
Freq
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP Spectrogram
Conventional Approaches
FDLP for Speech Representation
Time
Freq
Time
Freq
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP versus Mel Spectrogram
FDLP
Mel
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Resolution of FDLP Analysis
FDLP
Mel
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Resolution of FDLP Analysis
FDLP
Mel
Res = (Critical Width)-1
Sig
FDLP Env
Sig
FDLP Env
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Resolution of FDLP Analysis
FDLP
Mel
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Properties of FDLP Analysis
FDLP
Mel
bull Summarizing the gross temporal variation with a few parametersndash Model order of FDLP controls the degree of
smoothness ndash AR model captures perceptually important high energy
regions of the signal
bull Suppressing reverberation artifactsndash Reverberation is a long-term convolutive distortion
bull Analysis in long-term windows and narrow sub-bands
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
CleanSpeech
RoomResponse = Revb
Speech
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the long-term DFT domain this translates
CleanSpeech
RoomResponse = Revb
Speech
xCleanDFT
ResponseDFT = Revb
DFT
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Reverberation
When speech is corrupted with convolutive distortion like room reverberation
In the DFT domain this translates to a multiplication In the sub-band
In narrow bands is slowly varying
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
FDLP envelope of band using all-pole parameters hellip is given by
When the sub-band signal is multiplied by the gain is modified
Normalization to convolutive distortions is achieved by reconstructing the FDLP envelope with
Gain Normalization in FDLP
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Gain Normalization in FDLPWithout gain norm
With gain norm
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
SGanapathySThomasPMotlicekandHHermanskyldquoApplications of Signal Analysis Using Autoregressive Models of Amplitude ModulationIEEEWASPAAOct2009
FM
AM
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Short-term Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Envelopes in each band are integrated along time (25 ms with a shift of 10 ms)
Integration in frequency axis to convert to mel scale
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Short-term Features
Sub-band
Window
InputDCT FDLP Gain
Norm
Log+
DCT
FeatEnergyInt
Sub-band energies are converted to cepstral coefficients by applying log and DCT along frequency axis
Delta and acceleration coefficients are appended to obtain 39 dim feat similar to conventional MFCC feat
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Speech Recognition
TIDIGITS Database (8 kHz) Clean training data test data can be clean or naturally
reverberated HMM-GMM system
Whole-word HMM models trained on clean speech Performance in terms of word error rate (WER)
Features PLP features with cepstral mean subtraction (CMS) Long term log spectral subtraction (LTLSS) [Gelbart 2002] FDLP short-term (FDLP-S) features ndash 39 dim
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Speech Recognition
SThomasSGanapathyandHHermanskyldquoRecognition of Reverberant Speech Using FDLPIEEESignalProcLetters2008
Clean Reverb0
10
20PLP-CMSLTLSSFDLP-SW
ER
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Speaker Verification NIST 2008 Speaker recognition evaluation (SRE)
Has telephone speech and far-field speech
GMM-UBM system Trained on a large set of development speakers Adapted on the enrollment data from the target speaker Nuisance attribute projection (NAP) on supervectors Detection cost function (DCF) = 099 Pfa + 01 Pmiss
Features with warping [Pelecanos 2001] Mel Frequency Cepstral Coefficients (MFCCs) FDLP short-term (FDLP-S) features
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Tel Mic Cross domain
10
20
30
MFCCFDLP-SD
CF
Speaker Verification
SGanapathyJPelecanosandMOmarldquoFeature Normalization for Speaker Verification in Room ReverberationICASSP2011
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Features
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Static compression is a logarithm ndash reduce the huge dynamic range in the in the sub-band envelope
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Dynamic compression is implemented by dynamic compression loops consisting of dividers and low pass filters [Kollmeier 1999]
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Compressed sub-band envelopes are DCT transformed to obtain modulation frequency components
14 static and dynamic modulation spectra (0-35 Hz) with 17 sub-bands gets a feature of 476 dim
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Phoneme Recognition
TIMIT Database (8 kHz) Clean training data test data can be clean additive noise
reverberated or telephone channel Multi-layer perceptron (MLP) based system
MLPs estimate phoneme posteriors Hidden Markov model (HMM) ndash MLP hybrid model Performance in phoneme error rate (PER)
Features Perceptual linear prediction (PLP) - 9 frame context Advanced ETSI standard [ETSI2002] ndash 9 frame context FDLP modulation (FDLP-M) features ndash 476 dim
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
SGanapathySThomasandHHermanskyldquoTemporal Envelope Compensation for Robust Phoneme Recognition Using Modulation SpectrumJASA2010
Clean Add Noise
Reverb Tel 30
45
60
75
PLP-9ETSI-9FDLP-M
PE
R
Phoneme Recognition
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Outline of Applications
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Audio Coding
Sub-bandDecomposition FDLP
GainNorm
Quant
Short-term Features for Speaker amp
Speech Recog
Modulation Features for Phoneme Recog
Wide-band Speech amp Audio Coding
Input
Signal
FM
AM
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Audio Coding
QMFAnalysis
FDLPInput
1
32
MDCT
Env
Carr
Q
Q
Q-1
Q-1 IMDCT
MulQMF
Synthesis
1
32
Output
SriramGanapathyPetrMotlicekandHHermanskyldquoAutoregressive Modeling of Hilbert Envelopes for Wide-band Audio CodingAES124thConventionAudioEngineeringSocietyMay2008
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Subjective Evaluations
SGanapathyPMotlicekandHHermanskyldquoAR Models of Amplitude Modulation in Audio CompressionIEEETransactionsonAudioSpeechandLanguageProc2010
Series10
20
40
60
80
100
Hidden RefLPF7kMP3FDLPAAC
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Overview
Introduction AR Model of Hilbert Envelopes FDLP and its Properties Applications Summary
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Summary
Employing AR modeling for estimating amplitude modulations
Long-term temporal analysis of signals forms an efficient alternative to conventional short-term spectrum
Provides AM-FM decomposition in sub-bands and acts as unified model for speech and audio signals
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Simple mathematical analysis for AR model of Hilbert envelopes
Investigating the resolution properties of FDLP
Gain normalization of FDLP Envelopes
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Our Contributions
Short-term feature extraction using FDLP ndashImprovements in reverb speech recog
Modulation feature extraction ndash Phoneme recognition in noisy speech
Speech and audio codec development using AM-FM signals from FDLP
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Acknowledgements Lab Buddies ndash Samuel Thomas Sivaram Garimella
Padmanbhan Rajan Harish Mallidi Vijay Peddinti Thomas Janu Aren Jansen
Idiap personnel ndash Petr Motlicek Joel Pinto Mathew Doss
IBM personnel ndash Jason Pelecanos Mohamed Omar
Others ndash Xinhui Zhou Daniel Romero Marios Athineos David Gelbart Harinath Garudadri
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Thank You
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Evidences
Physiological evidences - Spectro-temporal receptive fields [Shamma etal 2001]
Psycho-physical evidences - Perceptual importance of modulation frequencies
[Drullman et al 1994] Syllable recognition from temporal modulations with
minimal spectral cues [Shannon et al 1995]
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Applications
Modulation spectra has been used in the past Speech intelligibility [Houtgast et al 1980] RASTA processing [Hermansky et al 1994] Speech recognition [Kingsbury et al 1998] AM-FM decomposition [Kumaresan et al 1999] Sound texture modeling [Athineos et al 2003] Sound source separation [King et al 2010]
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
nn-1n-2n-3
a3a2
a1
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Linear Prediction ndash Time Domain
Current sample expressed as a linear combination of past samples
Model parameters are solved by minimizing the residual sum of squares
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR model of Power Spectrum
Filter interpretation [Makhoul 1975]
From Parsevalrsquos theorem
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR model of Power Spectrum
By definition Let
Thus parameters are solved by minimizing
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR model of Power Spectrum
Solution of the linear prediction yields an all-pole model of the power spectrum
Numerator G denotes the gain of AR model (equal to minimum residual sum of squares)
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AR model of power spectrum
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Hilbert Envelope - Definition Analytic signal is the sum of the signal and its
quadrature component
where H denotes the Hilbert transform
Hilbert envelope is the squared magnitude of the analytic signal
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Hilbert Envelope
c FDLP Env
b Hilb Env
a Speech
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Duality
LP
FDLP
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
LP in Time and Frequency
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
AM-FM Decomposition
a Signalb Hilb Envc FDLP Envd AM compe FM comp
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Spectrogram Comparison
FDLP
SriramGanapathySamuelThomasandHHermanskyldquoComparison of Modulation Frequency Features for Speech RecognitionICASSP2010
PLP
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Dealing with Convolutive Distortions
Cepstral mean subtraction (CMS) long-term log spectral subtraction (LTLSS) amp gain normalization CMS assumes distortion in neighboring frames
to be similar ndash suppresses short-term artifacts
Long-term subtraction deals with reverberation assuming over the same response over a window of long-term frames [Gelbart 2002]
Gain normalization deals with short and long term distortions within a single long-term frame
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Feature Comparison
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Feature Extraction
Critical-band
Window
InputDCT FDLP
Static
Dynamic
DCT
DCT
200ms
Sub-bandFeat
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Modulation Features
SriramGanapathySamuelThomasandHHermanskyldquoModulation Frequency Features for Phoneme Recognition in Noisy SpeechJASAExpressLetters2009
a Signalb Hilb Envc FDLP Envd Log compe Dyn comp
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Noise Compensation in FDLP
Critical-band
Window
LinearPred
Signal
+Noise
DCT IDFT | |2 DFTFDLP
Env
When speech is corrupted with additive noise
y The noise component is additive in the non-parametric Hilbert envelope
domain (assuming the signal and noise are uncorrelated)
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Noise Compensation in FDLP
Critical-band
Window
InputDCT IDFT | |2 DFTWiener
Filtering
VAD
Voice activity detector (VAD) provides information about the non-speech regions which are used for estimating the temporal envelope of the noise
Noise subtraction tries to subtract the estimate the noise envelope from the noisy speech envelope
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Noise Compensation in FDLP
SGanapathySThomasandHHermanskyldquoTemporal Envelope Subtraction for Robust Speech Recognition using Modulation SpectrumIEEEASRU2009
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Introduction
Time
Freq
uenc
y
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-
Introduction
Conventional signal analysis ndash starts with the estimation of short-term spectrum (10-40 ms)
Spectrum is sampled at a preset rate before further modelingprocessing stages
Contextual information is typically processed with time-series models such as HMM
- Signal Analysis Using Autoregressive Models of Amplitude Modula
- Overview
- Overview (2)
- Introduction
- Introduction (2)
- Introduction (3)
- Introduction (4)
- Introduction (5)
- Introduction (6)
- Introduction (7)
- Introduction (8)
- Desired Properties of AM
- Desired Properties of AM (2)
- Desired Properties of AM (3)
- Overview (3)
- Overview (4)
- AR Model of Hilbert Envelopes
- AR Model of Hilbert Envelopes (2)
- AR Model of Hilbert Envelopes (3)
- AR Model of Hilbert Envelopes (4)
- AR Model of Hilbert Envelopes (5)
- AR Model of Hilbert Envelopes (6)
- AR Model of Hilbert Envelopes (7)
- AR Model of Hilbert Envelopes (8)
- AR Model of Hilbert Envelopes (9)
- AR Model of Hilbert Envelopes (10)
- AR Model of Hilbert Envelopes (11)
- AR Model of Hilbert Envelopes (12)
- AR Model of Hilbert Envelopes (13)
- LP in Time and Frequency
- LP in Time and Frequency (2)
- FDLP
- FDLP (2)
- FDLP (3)
- FDLP (4)
- FDLP for Speech Representation
- FDLP for Speech Representation (2)
- FDLP for Speech Representation (3)
- FDLP for Speech Representation (4)
- FDLP for Speech Representation (5)
- FDLP for Speech Representation (6)
- FDLP for Speech Representation (7)
- FDLP versus Mel Spectrogram
- Overview (5)
- Overview (6)
- Resolution of FDLP Analysis
- Resolution of FDLP Analysis (2)
- Resolution of FDLP Analysis (3)
- Properties of FDLP Analysis
- Properties of FDLP Analysis (2)
- Reverberation
- Reverberation (2)
- Reverberation (3)
- Reverberation (4)
- Gain Normalization in FDLP
- Gain Normalization in FDLP (2)
- Overview (7)
- Overview (8)
- Outline of Applications
- Short-term Features
- Short-term Features (2)
- Short-term Features (3)
- Short-term Features (4)
- Speech Recognition
- Speech Recognition (2)
- Speaker Verification
- Speaker Verification (2)
- Outline of Applications (2)
- Modulation Features
- Modulation Feature Extraction
- Modulation Feature Extraction (2)
- Modulation Feature Extraction (3)
- Modulation Feature Extraction (4)
- Phoneme Recognition
- Phoneme Recognition (2)
- Outline of Applications (3)
- Audio Coding
- Audio Coding (2)
- Subjective Evaluations
- Overview (9)
- Overview (10)
- Summary
- Summary (2)
- Summary (3)
- Our Contributions
- Our Contributions (2)
- Our Contributions (3)
- Our Contributions (4)
- Our Contributions (5)
- Our Contributions (6)
- Acknowledgements
- Slide 92
- Evidences
- Evidences (2)
- Applications
- Linear Prediction ndash Time Domain
- Linear Prediction ndash Time Domain (2)
- AR model of Power Spectrum
- AR model of Power Spectrum (2)
- AR model of Power Spectrum (3)
- AR model of Power Spectrum (4)
- AR model of power spectrum
- Hilbert Envelope - Definition
- Hilbert Envelope
- Duality
- LP in Time and Frequency (3)
- AM-FM Decomposition
- Spectrogram Comparison
- Dealing with Convolutive Distortions
- Dealing with Convolutive Distortions (2)
- Dealing with Convolutive Distortions (3)
- Dealing with Convolutive Distortions (4)
- Feature Comparison
- Modulation Feature Extraction (5)
- Modulation Features (2)
- Noise Compensation in FDLP
- Noise Compensation in FDLP (2)
- Noise Compensation in FDLP (3)
- Introduction (9)
- Introduction (10)
- Introduction (11)
-