Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification...
![Page 1: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Mid-term Project Presentation](https://reader033.fdocuments.in/reader033/viewer/2022061115/5464a5d3af795950608b5223/html5/thumbnails/1.jpg)
MAJOR PROJECT MID-TERM PRESENTATION :
SPEAKER VERIFICATION FOR REMOTE AUTHENTICATION
Members:
Ganesh Tiwari (063BCT510)
Madhav Pandey (063BCT514)
Manoj Shrestha (063BCT518)
Supervisor :
Dr. Subarna Shakya
Associate Professor
INTRODUCTION
Voice biometric system for user login
Text-Prompted system
  The claimant is asked to speak a prompted text
  Speech and Speaker Recognition/Verification
  More secure against playback attacks
Web Application
  Client (Adobe Flex) : Voice capture, preprocessing and feature extraction
  Server (Java) : Training / Classification
  BlazeDS RPC for Java-Flex connectivity
BLOCK DIAGRAM OF SPEAKER / SPEECH RECOGNITION SYSTEM
Signal Capture and Pre-Processing
CAPTURE AND PREPROCESSING
Get the audio signal, i.e., analog-to-digital conversion (ADC)
Make it suitable for feature extraction
Capture
PCM Extract
Silence Removal
Pre-Emphasis
Framing
Windowing
CAPTURE AND PREPROCESSING : CAPTURE
22050 Hz, 16-bit, signed, little-endian, mono, uncompressed PCM
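As an illustration of what reading this capture format involves, here is a minimal Python sketch (Python is used only for brevity; the project itself uses Flex and Java). The function name `pcm16le_to_floats` and the scaling to the range [-1, 1) are assumptions made for this example, not the project's code.

```python
import struct

def pcm16le_to_floats(raw):
    """Decode 16-bit signed little-endian mono PCM bytes into floats in [-1, 1)."""
    n = len(raw) // 2                                  # two bytes per sample
    samples = struct.unpack("<%dh" % n, raw[:2 * n])   # "<" = little-endian, "h" = int16
    return [s / 32768.0 for s in samples]              # assumed normalisation

# Usage: four hand-made samples packed in the same byte layout
raw = struct.pack("<4h", 0, 16384, -16384, -32768)
print(pcm16le_to_floats(raw))   # [0.0, 0.5, -0.5, -1.0]
```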
CAPTURE AND PREPROCESSING : PCM EXTRACT
CAPTURE AND PREPROCESSING :
SILENCE REMOVAL
Algorithm described in the paper 'a new method for silence removal and endpoint detection' †
† G. Saha, Sandipan Chakroborty, Suman Senapati, Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India
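The slide does not list the steps of the algorithm. The following Python sketch shows one statistical scheme of this kind: background noise is modelled from the start of the recording, each sample is tested against that model, and a majority vote is taken per short window. Whether it matches the cited paper in every detail is not confirmed here; the function name `remove_silence`, the 200 ms noise segment, the 10 ms window and the threshold `k = 3` are all assumptions.

```python
import math

def remove_silence(x, fs, noise_ms=200, win_ms=10, k=3.0):
    """Keep only windows in which most samples differ clearly from background noise.

    Assumes the first noise_ms milliseconds of x contain no speech.
    """
    n_noise = max(2, int(fs * noise_ms / 1000))
    noise = x[:n_noise]
    mu = sum(noise) / len(noise)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in noise) / len(noise)) or 1e-12
    # 1 = voiced sample, 0 = silence (normalised distance from the noise mean)
    marks = [1 if abs(v - mu) / sigma > k else 0 for v in x]
    win = max(1, int(fs * win_ms / 1000))
    out = []
    for start in range(0, len(x), win):
        block = marks[start:start + win]
        if 2 * sum(block) > len(block):     # majority of the window is voiced
            out.extend(x[start:start + win])
    return out
```

A window is kept or dropped as a whole, so isolated noisy samples inside a silent stretch do not survive.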
[Figure: two waveform plots, amplitude from -1 to 1 against sample index (×10^4)]
CAPTURE AND PREPROCESSING : PRE-EMPHASIS
Boosting the high frequency energy
In time domain, y[n] = x[n]−αx[n−1], 0.9 ≤ α ≤ 1.0
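The difference equation above translates directly into code. This is a minimal Python sketch; the default `alpha = 0.95` is an assumed value inside the stated range, and passing the first sample through unchanged is one common convention rather than something the slide specifies.

```python
def pre_emphasis(x, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is copied unchanged."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

# A constant (zero-frequency) input is strongly attenuated after the first sample
print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))   # [1.0, 0.5, 0.5]
```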
[Figure: two magnitude-spectrum plots, |Y(f)| against Frequency (Hz), 0 to 12000 Hz]
CAPTURE AND PREPROCESSING :
FRAMING
Speech signal is approximately stationary (in its statistical properties) over 10-30 ms
50% overlapped frames of 23 ms each are used
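At the 22050 Hz capture rate, a 23 ms frame is about 507 samples (22050 × 0.023 = 507.15), and a 50% overlap gives a hop of about 253 samples. The Python sketch below splits a signal this way; the function name `frame_signal`, the rounding, and dropping a trailing partial frame are assumptions.

```python
def frame_signal(x, fs=22050, frame_ms=23.0, overlap=0.5):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    frame_len = int(round(fs * frame_ms / 1000.0))    # 507 samples at 22050 Hz
    hop = max(1, int(frame_len * (1.0 - overlap)))    # 253 samples for 50% overlap
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(x[start:start + frame_len])
    return frames

# One second of audio yields 86 frames of 507 samples each
frames = frame_signal(list(range(22050)))
print(len(frames), len(frames[0]))   # 86 507
```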
CAPTURE AND PREPROCESSING :
WINDOWING
Windowing is applied to the frame-blocked signal
Hamming window
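For reference, the standard Hamming window is w[n] = 0.54 − 0.46 cos(2πn / (N − 1)) for n = 0 … N − 1. A minimal Python sketch follows; the helper names `hamming` and `apply_window` are chosen for this example.

```python
import math

def hamming(N):
    """Return the N-point Hamming window."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of the same length."""
    w = hamming(len(frame))
    return [s * wn for s, wn in zip(frame, w)]
```

The window tapers each frame towards 0.08 at its edges, which reduces the spectral leakage caused by cutting the signal abruptly.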
[Figure: Hamming Window, and two waveform plots]
Feature Extraction
FEATURE EXTRACTION
Transform the input audio signal into a sequence of acoustic feature vectors
MFCC : Mel-Frequency Cepstral Coefficients as features
  Perceptual approach
  Human ear processes audio signals on the Mel scale
  Mel scale : linear up to 1 kHz and logarithmic above 1 kHz
MFCC gives the distribution of energy in Mel frequency bands
  Calculated for each frame
Fourier Transform
Mel Filter
Log
IFT : DCT
Cepstral Mean Subtraction
Energy and Deltas
FEATURE EXTRACTION :
FOURIER TRANSFORM
Gives information about the amount of energy at each frequency band
FFT used
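To make the step concrete, here is a small self-contained radix-2 FFT in Python that returns the magnitude at each frequency bin of one frame. It is a teaching sketch, not the project's implementation; zero-padding the frame to the next power of two is an assumption.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(frame):
    """Zero-pad to a power of two and return |Y| for bins 0 .. n/2."""
    n = 1
    while n < len(frame):
        n *= 2
    padded = list(frame) + [0.0] * (n - len(frame))
    return [abs(c) for c in fft(padded)[: n // 2 + 1]]
```

Bin k corresponds to the frequency k · fs / n, so with fs = 22050 Hz and n = 512 the bins are about 43 Hz apart.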
[Figure: waveform plot, and its magnitude spectrum |Y(f)| against Frequency (Hz)]
FEATURE EXTRACTION :
MEL FILTER
We used a filter bank of triangular filters spaced on the Mel scale
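A triangular Mel filter bank can be sketched as follows in Python. The conversion mel(f) = 2595 · log10(1 + f / 700) is one common formula; the slides do not state which formula, how many filters, or which FFT size the project used, so the names and the values in the usage line (26 filters, 512-point FFT) are assumptions.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Return n_filters triangular filters, each over n_fft // 2 + 1 FFT bins."""
    if f_high is None:
        f_high = fs / 2.0
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 edge points, equally spaced on the Mel scale
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / fs)) for m in mels]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):        # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):       # peak and falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

bank = mel_filterbank(26, 512, 22050)   # assumed sizes, for illustration only
```

Each Mel band energy is then the sum of the frame's power (or magnitude) spectrum weighted by one filter.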
FEATURE EXTRACTION :
MEL FILTER (CONTD..)
[Equation: Mel filter, followed by its symbol definitions]
FEATURE EXTRACTION : LOG, IFT(DCT)
Log → DCT → MFCC
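The log and DCT steps can be sketched in Python as below. The unnormalised DCT-II, the natural logarithm, the small floor that avoids log(0), and dropping the zeroth coefficient to keep 12 values are assumptions; the slides only name the steps.

```python
import math

def log_energies(mel_energies, floor=1e-10):
    """Natural log of each Mel band energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in mel_energies]

def dct_ii(v, n_coeffs):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N)."""
    N = len(v)
    return [sum(v[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(n_coeffs)]

def mfcc_from_mel(mel_energies, n_ceps=12):
    """Return n_ceps cepstral coefficients c[1] .. c[n_ceps] for one frame."""
    c = dct_ii(log_energies(mel_energies), n_ceps + 1)
    return c[1:]    # c[0] is dropped here (assumption)
```

The DCT decorrelates the log band energies, so a short vector of low-order coefficients describes the spectral envelope compactly.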
FEATURE EXTRACTION :
CEPSTRAL MEAN SUBTRACTION
CMS: for minimizing channel effect
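A fixed channel (microphone, line) multiplies the spectrum, which becomes an additive constant in the cepstral (log) domain, so subtracting the per-coefficient mean over an utterance removes it. A minimal Python sketch, with the function name chosen for this example:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    if not frames:
        return []
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

print(cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]]))
# [[-1.0, -2.0], [1.0, 2.0]]
```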
FEATURE EXTRACTION :
ENERGY AND DELTAS
For completeness of the feature vector and to achieve a high recognition rate
An energy feature
A delta (velocity) feature and a double-delta (acceleration) feature
Calculated by linear regression over a regression window M
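The regression formula itself did not survive in the transcript. A commonly used form is d[t] = Σ_{m=1..M} m · (c[t+m] − c[t−m]) / (2 Σ_{m=1..M} m²), which the Python sketch below implements. The choice M = 2 and repeating the first and last frames at the edges are assumptions. Applying the function twice gives the double-delta features, and concatenating 13 static values (12 MFCC + energy) with both gives 39 values per frame.

```python
def deltas(frames, M=2):
    """Regression-based delta features over a window of +/- M frames."""
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    out = []
    for t in range(T):
        d = [0.0] * dim
        for m in range(1, M + 1):
            nxt = frames[min(t + m, T - 1)]   # repeat the last frame at the end
            prv = frames[max(t - m, 0)]       # repeat the first frame at the start
            for i in range(dim):
                d[i] += m * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out

def add_deltas(static):
    """Append delta and double-delta features to each static feature vector."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```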
COMPOSITION OF FEATURE VECTOR
12 MFCC Features
12 Delta MFCC
12 Delta-Delta MFCC
1 Energy Feature
1 Delta Energy Feature
1 Delta-Delta Energy Feature
39 Features from each frame
Speaker Recognition/Verification by GMM
GAUSSIAN MIXTURE MODEL
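Parametric probability density function
Based on clustering technique
M Gaussian components
$p(x \mid \lambda) = \sum_{m=1}^{M} w_m \, \mathcal{N}(x; \mu_m, \Sigma_m)$
$x$ : a k-dimensional random vector
$w_m$ : mixture weight of the mth component
$\mathcal{N}(x; \mu_m, \Sigma_m)$ : k-dimensional Gaussian function (pdf) with mean $\mu_m$ and covariance $\Sigma_m$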
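Evaluating such a mixture density for one feature vector can be sketched in Python as below. Diagonal covariance matrices (one variance per dimension) and working in the log domain are assumptions made for simplicity and numerical stability; the slides do not say which covariance type was used.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log of a k-dimensional Gaussian pdf with diagonal covariance."""
    s = -0.5 * len(x) * math.log(2.0 * math.pi)
    for xi, mi, vi in zip(x, mean, var):
        s -= 0.5 * (math.log(vi) + (xi - mi) ** 2 / vi)
    return s

def gmm_log_pdf(x, weights, means, variances):
    """Log of the weighted sum of M Gaussian components, via log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

The log-likelihood of a whole utterance is the sum of `gmm_log_pdf` over its feature vectors, treating frames as independent.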
GMM TRAINING
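Goal: estimate the parameters $\lambda$
Method: Maximum Likelihood estimation
Input: $X = \{x_1, \ldots, x_T\}$
Maximize $P(X \mid \lambda)$ with the Expectation Maximization algorithm
Iterative process:
  initial model: $\lambda$
  new model: $\bar{\lambda}$
  $P(X \mid \bar{\lambda}) \ge P(X \mid \lambda)$
Convergence condition:
  $P(X \mid \bar{\lambda}) - P(X \mid \lambda) < \epsilon$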
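The iteration can be illustrated with a one-dimensional GMM in Python. This is a simplified sketch, not the project's trainer: real feature vectors are 39-dimensional, the convergence test here is applied to the log-likelihood rather than the likelihood, and the variance floor, `eps` and `max_iter` values are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, w, mu, var):
    return sum(math.log(sum(w[m] * normal_pdf(x, mu[m], var[m])
                            for m in range(len(w)))) for x in data)

def em_step(data, w, mu, var, var_floor=1e-6):
    M, T = len(w), len(data)
    # E-step: responsibility of each component for each sample
    resp = []
    for x in data:
        p = [w[m] * normal_pdf(x, mu[m], var[m]) for m in range(M)]
        total = sum(p)
        resp.append([pm / total for pm in p])
    # M-step: re-estimate weights, means and variances
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        nm = sum(r[m] for r in resp)
        mean = sum(r[m] * x for r, x in zip(resp, data)) / nm
        v = sum(r[m] * (x - mean) ** 2 for r, x in zip(resp, data)) / nm
        new_w.append(nm / T)
        new_mu.append(mean)
        new_var.append(max(v, var_floor))
    return new_w, new_mu, new_var

def train_gmm(data, w, mu, var, eps=1e-6, max_iter=200):
    prev = log_likelihood(data, w, mu, var)
    for _ in range(max_iter):
        w, mu, var = em_step(data, w, mu, var)
        cur = log_likelihood(data, w, mu, var)
        if cur - prev < eps:      # improvement below threshold: converged
            break
        prev = cur
    return w, mu, var
```

The sketch does not guard against samples that lie so far from every component that all densities underflow to zero; a production version would work in the log domain throughout.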
VERIFICATION
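Decision: Hypothesis test
  H0: the speaker is the claimed speaker
  H1: the speaker is an imposter
Based on likelihood ratio
  $\Lambda(X) = \dfrac{P(X \mid H_0)}{P(X \mid H_1)}$
Decision by threshold $\theta$
  $\Lambda(X) < \theta$ : reject identity claim
  $\Lambda(X) > \theta$ : accept identity claim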
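In the log domain the likelihood ratio becomes a difference of log-likelihoods, which avoids underflow on long utterances. A minimal Python sketch of the decision rule follows; the function name, the default threshold of 0, and how the imposter likelihood is obtained (for example from a background model) are assumptions not stated on the slide.

```python
def verify(log_lik_claimed, log_lik_imposter, threshold=0.0):
    """Accept the identity claim when the log-likelihood ratio exceeds the threshold."""
    score = log_lik_claimed - log_lik_imposter
    return score > threshold

print(verify(-10.0, -12.0))   # True: the claimed speaker's model explains X better
print(verify(-12.0, -10.0))   # False
```

Raising the threshold trades fewer false acceptances for more false rejections.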
Speech Recognition by HMM/VQ
HIDDEN MARKOV MODEL : DEFINITION
A Hidden Markov Model (HMM) is a statistical model
An HMM is an extension of a Markov process
An HMM has hidden states and observable symbols per state
HMM Model : $\lambda = (A, B, \pi)$
  Observed data : feature vectors
  Hidden states : phonemes
CODEBOOK GENERATION
K-Means Clustering
  Clustering the whole database & codebook generation
VQ : Vector Quantization is used for mapping each input feature vector to a discrete quantized symbol
  A codebook of prototype vectors is built
  For each incoming feature vector:
    Compare it to each of the prototype vectors in the codebook
    Select the one which is closest (by some distance metric)
    Replace the input vector by the index of this prototype vector
  The resulting index sequence is the observation sequence
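The codebook and quantization steps can be sketched in Python as follows. Squared Euclidean distance, the number of iterations, and starting k-means from a caller-supplied initial codebook are assumptions; the slide only says "some distance metric".

```python
def nearest_index(vec, codebook):
    """Index of the prototype vector closest to vec (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, proto in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, proto))
        if d < best_d:
            best, best_d = i, d
    return best

def quantize(frames, codebook):
    """Map each feature vector to a codebook index: the observation sequence."""
    return [nearest_index(f, codebook) for f in frames]

def kmeans(frames, codebook, iters=10):
    """Refine an initial codebook with plain k-means iterations."""
    codebook = [list(c) for c in codebook]
    for _ in range(iters):
        groups = [[] for _ in codebook]
        for f in frames:
            groups[nearest_index(f, codebook)].append(f)
        for i, g in enumerate(groups):
            if g:   # keep the old prototype if no vector was assigned to it
                codebook[i] = [sum(col) / len(g) for col in zip(*g)]
    return codebook

frames = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = kmeans(frames, [[1.0, 1.0], [9.0, 9.0]])
print(quantize(frames, codebook))   # [0, 0, 1, 1]
```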
SPEECH RECOGNITION SYSTEM: BY : HMM / VQ
HIDDEN MARKOV MODEL : TRAINING
Training by:
  Forward-backward (Baum-Welch) algorithm
The forward-backward algorithm iteratively re-estimates the parameters and improves the probability that the given observations are generated by the new parameters
Three parameters need to be re-estimated:
  Initial state distribution: $\pi_i$
  Transition probabilities: $a_{i,j}$
  Emission probabilities: $b_i(o_t)$
Input is the observation sequence, given by VQ
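Full Baum-Welch re-estimation is too long to show here, but its forward pass, which computes the probability of an observation sequence under the current parameters, is short. The Python sketch below covers only that forward pass for a discrete HMM; the function name and the toy parameter values in the usage lines are assumptions, and no scaling is applied, so it is suitable only for short sequences.

```python
def forward_probability(obs, pi, A, B):
    """P(O | model) for a discrete HMM; obs is a non-empty list of symbol indices.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward_probability([0, 1], pi, A, B))   # about 0.2156
```

Baum-Welch combines these forward values with the matching backward values to compute expected state and transition counts, from which the three parameter sets are re-estimated.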
HIDDEN MARKOV MODEL : VERIFICATION/MATCHING
Viterbi algorithm is used
Input is:
  Observation sequence, given by VQ
  HMM model of the word
Best matched word is returned
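One way to realise this matching is to score the observation sequence against each word's HMM with the Viterbi algorithm and return the highest-scoring word. The Python sketch below works in the log domain; the dictionary layout `word_models[word] = (pi, A, B)` and the toy single-state models are assumptions for illustration.

```python
import math

def _log(p):
    return math.log(p) if p > 0.0 else float("-inf")

def viterbi_log_score(obs, pi, A, B):
    """Log-probability of the single best state path for a non-empty obs sequence."""
    N = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(N)]
    for o in obs[1:]:
        delta = [max(delta[i] + _log(A[i][j]) for i in range(N)) + _log(B[j][o])
                 for j in range(N)]
    return max(delta)

def best_word(obs, word_models):
    """Return the word whose HMM gives the highest Viterbi score."""
    return max(word_models, key=lambda w: viterbi_log_score(obs, *word_models[w]))

word_models = {
    "yes": ([1.0], [[1.0]], [[0.9, 0.1]]),   # mostly emits symbol 0
    "no":  ([1.0], [[1.0]], [[0.1, 0.9]]),   # mostly emits symbol 1
}
print(best_word([0, 0, 1], word_models))     # yes
```

In a text-prompted system the returned word can then be compared with the prompted text.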
PROBLEMS FACED
Learning curve
Complex mathematics
Flex & Java connectivity (initially)
Data conversion
REMAINING TASKS
Speech Training Data Collection
Model Training (HMM, GMM)
Module Integration
Testing
Thanks