Gabor presentation
8/3/2019 Gabor presentation
http://slidepdf.com/reader/full/gabor-presentation 1/28
Easy Does It: Robust Spectro-Temporal Many-Stream ASR without Fine Tuning Streams
Ravuri, Morgan, UC Berkeley
Presented by JJ
Motivation
• Physiological experiments in different mammal species: a large percentage of neurons in the primary auditory cortex (A1) respond differently to upward- versus downward-moving ripples in the spectrogram of the input (Depireux et al., 2001).
• Spectro-temporal receptive fields (STRFs): individual neurons are sensitive to specific spectro-temporal modulation frequencies in the incoming sound signal.
Introduction
• Cortically inspired time-frequency (TF) features, which capture spectral and temporal modulations, are used for speech recognition and discrimination.
• Basically, spectro-temporal features are derived by filtering spectrograms with particular filters.
• In this case, the Gabor filter is applied to the auditory spectrogram.
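To make "filtering a spectrogram" concrete, here is a minimal sketch of a same-size 2D correlation over a toy spectrogram. The function name `filter_spectrogram`, the zero padding at the edges, and the simple temporal-difference kernel are assumptions for illustration only; in the system described here the kernel would be a Gabor filter.

```python
def filter_spectrogram(spec, kernel):
    """Same-size 2D correlation with zero padding.

    spec[k][n] is the value in frequency channel k at time frame n.
    kernel is a small 2D list with odd height and width.
    """
    n_channels, n_frames = len(spec), len(spec[0])
    half_h, half_w = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * n_frames for _ in range(n_channels)]
    for k in range(n_channels):
        for n in range(n_frames):
            acc = 0.0
            for dk in range(-half_h, half_h + 1):
                for dn in range(-half_w, half_w + 1):
                    kk, nn = k + dk, n + dn
                    # Positions outside the spectrogram count as zero.
                    if 0 <= kk < n_channels and 0 <= nn < n_frames:
                        acc += spec[kk][nn] * kernel[dk + half_h][dn + half_w]
            out[k][n] = acc
    return out


# Toy spectrogram: 3 channels, 4 frames, energy switching on at frame 2.
spec = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
# Stand-in kernel (not a Gabor filter): a temporal difference.
kernel = [[-1.0, 0.0, 1.0]]
features = filter_spectrogram(spec, kernel)
```

The filtered output responds around the onset, which is the kind of temporal modulation a spectro-temporal filter is meant to pick up.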
Example
Example Gabor Filters
Gaussian envelope
complex sinusoid s(n, k)
1D Gabor
Gaussian envelope
complex sinusoid s(n, k)
2D Gabor
Gaussian envelope
complex sinusoid s(n, k)
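The slide's equation was an image, so the following is only a sketch of the generic construction it labels: a 2D Gabor filter as the product of a Gaussian envelope and a complex sinusoid s(n, k). The parameter names (`omega_n`, `omega_k` for the temporal and spectral modulation frequencies, `sigma_n`, `sigma_k` for the envelope widths) and the example values are assumptions, not the paper's notation or settings.

```python
import cmath
import math


def gabor_2d(n_half, k_half, omega_n, omega_k, sigma_n, sigma_k):
    """Return a complex kernel g[k][n] of size (2*k_half+1) x (2*n_half+1).

    g(n, k) = envelope(n, k) * s(n, k), with
      envelope(n, k) = exp(-0.5 * ((n / sigma_n)**2 + (k / sigma_k)**2))
      s(n, k)        = exp(1j * (omega_n * n + omega_k * k))
    """
    kernel = []
    for k in range(-k_half, k_half + 1):        # spectral (channel) offset
        row = []
        for n in range(-n_half, n_half + 1):    # temporal (frame) offset
            envelope = math.exp(-0.5 * ((n / sigma_n) ** 2 + (k / sigma_k) ** 2))
            sinusoid = cmath.exp(1j * (omega_n * n + omega_k * k))
            row.append(envelope * sinusoid)
        kernel.append(row)
    return kernel


# Hypothetical parameter values, chosen only to produce a small kernel.
g = gabor_2d(n_half=4, k_half=3,
             omega_n=math.pi / 4, omega_k=math.pi / 3,
             sigma_n=2.0, sigma_k=1.5)
```

The signs of `omega_n` and `omega_k` set the orientation of the filter, which is how a filter can be made selective for upward- or downward-moving ripples.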
Their Gabor Filters
parameters
Dummy indices
Tons of Combinations!
System
[Block diagram: Stream … Stream, Merge MLP outputs, PCA, MFCC, Output]
• MLP (Multilayer Perceptron)
• The structure of the MLP depends on the type of feature and corpus.

Number of            Spectral              Cepstral
input units          567                   351
frames of context    9                     9
hidden units         160 for Aurora2,      160 for Aurora2,
                     500 for Numbers95     500 for Numbers95
output units         56                    56

Diagram dimension labels: 56D, 32D, 56D, 45D
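As a rough illustration of the layer sizes in the table, the sketch below runs one forward pass of a 567-160-56 MLP (the spectral column with the Aurora2 hidden size). The sigmoid hidden layer, softmax output, and random weights are assumptions; the slide gives only the layer sizes. Note that 567 inputs over 9 frames of context corresponds to 63 values per frame, and 351 over 9 frames to 39.

```python
import math
import random

FRAMES_OF_CONTEXT = 9
N_INPUT, N_HIDDEN, N_OUTPUT = 567, 160, 56  # spectral stream, Aurora2 sizes


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (placeholder for trained parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(x, weights, biases):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def mlp_posteriors(x, layer1, layer2):
    """Sigmoid hidden layer followed by a softmax output (assumed activations)."""
    hidden = [1.0 / (1.0 + math.exp(-a)) for a in affine(x, *layer1)]
    logits = affine(hidden, *layer2)
    peak = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
layer1 = init_layer(N_INPUT, N_HIDDEN, rng)
layer2 = init_layer(N_HIDDEN, N_OUTPUT, rng)
frame_dim = N_INPUT // FRAMES_OF_CONTEXT     # values per frame in the context window
x = [rng.gauss(0.0, 1.0) for _ in range(N_INPUT)]
posteriors = mlp_posteriors(x, layer1, layer2)
```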
• The outputs of each MLP stream provide an estimate of the posterior probability distribution over phones.
• These phone probability estimates are then combined across streams by inverse-entropy weighting.
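A minimal sketch of inverse-entropy combination for a single frame, assuming the basic form in which each stream's weight is proportional to the reciprocal of the entropy of its posterior vector. The function names and the `eps` floor are assumptions; the slide does not say which variant (for example, with or without thresholding) was used.

```python
import math


def entropy(posterior):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def inverse_entropy_merge(stream_posteriors, eps=1e-12):
    """Weight each stream by 1/entropy, normalize the weights, and average."""
    inverse = [1.0 / max(entropy(p), eps) for p in stream_posteriors]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    n_classes = len(stream_posteriors[0])
    merged = [sum(w * p[i] for w, p in zip(weights, stream_posteriors))
              for i in range(n_classes)]
    return merged, weights


# Toy example with 3 classes instead of 56 phones.
confident = [0.9, 0.05, 0.05]      # low entropy, so it gets a large weight
uncertain = [1 / 3, 1 / 3, 1 / 3]  # maximum entropy, so it gets a small weight
merged, weights = inverse_entropy_merge([confident, uncertain])
```

Streams that are close to uniform contribute little, so a stream degraded by noise tends to be down-weighted without any per-stream tuning.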
• Then apply the KL transform (Principal Components Analysis) to the log probabilities of the merged MLPs.
• The KL transform output is reduced to 32D and orthogonalized.
• The features are mean and variance normalized by utterance.
• Finally, they are appended to the MFCC features.
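The KL transform step can be sketched without external libraries as below. This is an illustrative implementation that finds the leading eigenvectors of the covariance matrix by power iteration with deflation; that numerical method is swapped in here for self-containment and is not claimed to be what the authors used (a linear-algebra library's eigendecomposition would be the usual choice). The toy 3-D rows stand in for 56-D log-posterior vectors, and `k=2` stands in for 32.

```python
import math


def covariance(rows):
    """Return (mean vector, sample covariance matrix) of the row vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def matvec(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def top_components(cov, k, iters=200):
    """Leading k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    mat = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = matvec(mat, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(mat, v)))
        components.append(v)
        # Deflate: remove the component just found before searching for the next.
        mat = [[mat[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def klt(rows, k):
    """Project mean-centered rows onto the k leading principal directions."""
    mean, cov = covariance(rows)
    components = top_components(cov, k)
    projected = [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean)))
                  for c in components] for r in rows]
    return projected, components


rows = [[-2.0, -2.0, 0.1], [-1.0, -1.0, -0.1], [0.0, 0.0, 0.1],
        [1.0, 1.0, -0.1], [2.0, 2.0, 0.1]]
projected, components = klt(rows, k=2)
```

Because the retained directions are eigenvectors of a symmetric matrix, they are mutually orthogonal, which matches the "orthogonalized" property listed above.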
• Features → HMM
Diagram dimension labels: 56D, 32D, 56D, 71D, 39D, 32D
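The per-utterance normalization and the final concatenation can be sketched as follows, using random numbers in place of real features. The function names are hypothetical, and it is an assumption that only the 32-D reduced features (not the MFCCs) are normalized at this point; the slides do not say. The sizes follow the deck: 39 MFCC values plus 32 reduced values give a 71-D vector per frame.

```python
import random
import statistics


def normalize_by_utterance(frames):
    """Scale each feature dimension to zero mean and unit variance over one utterance."""
    columns = list(zip(*frames))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant dimensions, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(frame, means, stds)]
            for frame in frames]


def append_features(mfcc_frames, extra_frames):
    """Concatenate the two feature vectors frame by frame."""
    return [list(m) + list(e) for m, e in zip(mfcc_frames, extra_frames)]


rng = random.Random(0)
n_frames = 50  # one toy utterance
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(39)] for _ in range(n_frames)]
reduced = [[rng.gauss(3.0, 2.0) for _ in range(32)] for _ in range(n_frames)]
output = append_features(mfcc, normalize_by_utterance(reduced))
```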
Experiments
Database
• Aurora 2 (0 – 20 dB)
• Numbers95
  • consists of various numeric portions extracted from telephone dialogues
  • vocabulary size of 32 words
  • training set contains 3590 utterances of clean data, totaling roughly 3 hrs
  • 2 test sets contain 1227 utterances
    • The first contains only clean data
    • The second contains the same utterances with noise added at five SNRs (20 dB, 15 dB, 10 dB, 5 dB, and 0 dB)
• Additive noise
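For readers unfamiliar with how a noisy test set at a fixed SNR is built, here is a minimal sketch of mixing additive noise into a clean signal at a target SNR in dB. The signals are synthetic and the function names are hypothetical; this is not the procedure used to create the corpora above, only the standard power-ratio arithmetic.

```python
import math
import random


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean power / noise power equals 10**(snr_db / 10)."""
    target_noise_power = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 5.0 * t / 800.0) for t in range(800)]
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]

measured = {}
for snr_db in (20, 15, 10, 5, 0):
    noisy, scaled = mix_at_snr(clean, noise, snr_db)
    # Verify the achieved SNR from the clean and scaled-noise powers.
    measured[snr_db] = 10.0 * math.log10(mean_power(clean) / mean_power(scaled))
```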
Baseline
• 39 MFCC
• 4-stream system
• 28-stream system
Uni-modulation system
• 150 stream
• spectral only and spectral/cepstral
Metric: Word Error Rate (WER)
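As a reminder of how the metric is computed, the sketch below implements the usual definition of WER: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The function name and the example strings are illustrative; scoring tools used in practice may differ in details such as text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer_deletion = word_error_rate("one two three", "one three")
wer_insertion = word_error_rate("one two", "one too two")
wer_perfect = word_error_rate("five nine", "five nine")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.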
Results
Aurora 2
Numbers 95
Discussion 1
Discussion 2
Discussion 3
Future Work
• Not just additive noise
• Another TF feature might not work
• Log-mel filterbank? Or power like PNCC?
• How to combine MLP? Inverse Entropy?