Development of voice password based speaker verification system
Voice Password Based Speaker Verification Using Vowel Region
Under the guidance of Dr. G. Pradhan, NIT Patna (ECE Dept.)
Presented by: Piyush Kumar (1104091), Kamlesh Kalvaniya (1104080), Niranjan Kumar (1104087)
Content
• Introduction
• Motivation for present work
• Issues in speaker verification
• Development of baseline
• Proposed speaker verification system
• Summary
• Conclusion
Introduction
• Speaker verification is the task of validating a person's identity claim from his/her voice.
• Voice password based speaker verification system
  – The speaker is free to choose his/her password
  – The password remains the same for training and verification
Motivation
• Development of a low-complexity speaker verification system with reasonable performance using only a few seconds of speech data
  – For mobile-based applications
  – For low-security person authentication
Issues in Limited-Speech Speaker Verification
• Information in human speech
  – Message, language, speaker, emotion/health
  – Recording environment, channel, sensor, etc.
• The speaker-specific information extracted from speech data varies depending on the other factors
• Challenge
  – Enhance the speaker-specific information
  – Normalize the other variabilities in the speech data
Baseline System
• Gaussian Mixture Model (text independent)
  – Database: NIST-2003
  – VAD: energy-based VAD (threshold: 0.6 × average energy)
  – Feature vector: 13-dimensional MFCC appended with delta and delta-delta coefficients (39 dimensions)
  – Modeling: GMM
  – GMM size: 8, 16, 32, 64
  – Comparison: log-likelihood score
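The front end listed above (energy-based VAD, then static MFCC extended with delta and delta-delta) can be sketched as follows. This is a minimal illustration under stated assumptions: the frame length and hop (25 ms / 10 ms at 8 kHz), the regression-based delta formula and its ±2-frame window, and all function names are hypothetical choices, not details taken from the slides. MFCC extraction itself is not shown; `append_deltas` takes already computed 13-dimensional frames.

```python
# Sketch of the baseline front end: energy-based VAD plus delta appending.
# Frame sizes (25 ms / 10 ms at 8 kHz) and the delta window are assumed values.
import math


def frame_energies(signal, frame_len=200, hop=80):
    """Short-time energy (sum of squared samples) of each frame."""
    return [sum(s * s for s in signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def energy_vad(energies, ratio=0.6):
    """Flag a frame as speech if its energy exceeds ratio * average frame energy."""
    if not energies:
        return []
    threshold = ratio * sum(energies) / len(energies)
    return [e > threshold for e in energies]


def deltas(features, width=2):
    """Regression-based delta coefficients over +/- width frames (edge frames repeated)."""
    n = len(features)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        out.append([
            sum(k * (features[min(t + k, n - 1)][d] - features[max(t - k, 0)][d])
                for k in range(1, width + 1)) / denom
            for d in range(len(features[t]))
        ])
    return out


def append_deltas(mfcc):
    """Turn 13-dim static MFCC frames into 39-dim [static, delta, delta-delta] frames."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


# Example: 0.2 s of silence followed by 0.2 s of a 200 Hz tone at 8 kHz.
signal = [0.0] * 1600 + [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
speech_flags = energy_vad(frame_energies(signal))
```

Silent frames fall below the 0.6 × average threshold and are dropped; tone frames are kept. Because the threshold is relative to the utterance's own average energy, no absolute level has to be tuned.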
Flowchart for GMM based SV system
04/15/2023 N.I.T. PATNA ECE, DEPTT. 8
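The comparison step of the GMM-based system (log-likelihood score of the test features under the claimed speaker's model) can be sketched as below. Assumptions: diagonal covariances, scoring by the average per-frame log-likelihood, and a simple threshold decision; the slides do not specify these, and the function names and toy models are hypothetical. Model training (e.g. EM) is not shown.

```python
# Sketch of GMM scoring with diagonal covariances (an assumption; the slides do not
# state the covariance type). A GMM is a (weights, means, variances) tuple.
import math


def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_frame_loglik(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, Sigma_k), evaluated stably with log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(lp - peak) for lp in logs))


def average_loglik(frames, gmm):
    """Average per-frame log-likelihood of the test frames under a claimed speaker's GMM."""
    return sum(gmm_frame_loglik(f, *gmm) for f in frames) / len(frames)


def verify(frames, gmm, threshold):
    """Accept the identity claim if the score reaches the decision threshold."""
    return average_loglik(frames, gmm) >= threshold


# Toy 1-D, 2-component models for two speakers.
speaker_a = ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])
speaker_b = ([0.5, 0.5], [[5.0], [6.0]], [[1.0], [1.0]])
test_frames = [[0.2], [0.8], [0.5]]
score_a = average_loglik(test_frames, speaker_a)
score_b = average_loglik(test_frames, speaker_b)
```

Averaging over frames keeps scores comparable across utterances of different lengths, which matters when the training and testing durations vary as in the experiments that follow.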
GMM based SV system EER (%)
Gaussian size | Test 15 s, Train 15 s | Test 15 s, Train full | Test full, Train 15 s | Test full, Train full
8 | 34.90 | 33.18 | 34.24 | 32.70
16 | 33.05 | 30.50 | 32.28 | 29.67
32 | 32.46 | 28.78 | 32.92 | 27.77
64 | 32.82 | 27.42 | 33.06 | 26.05
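The equal error rate (EER) used in the tables is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch of how it can be computed from genuine and impostor scores follows; the threshold sweep over observed scores, the "score ≥ threshold accepts" convention and the toy scores are assumptions for illustration, not the evaluation code used in this work.

```python
# EER sketch: sweep every observed score as a threshold (score >= threshold accepts)
# and report the point where FAR and FRR are closest. Scores below are toy values.


def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates at one threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """EER in percent: mean of FAR and FRR at the threshold where they are closest."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return 100.0 * eer


genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine_scores, impostor_scores)  # 25.0
```

With these toy scores one genuine trial (0.4) and one impostor trial (0.6) overlap, so at the threshold 0.6 both FAR and FRR are 25%.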
Conclusion
• Performance is sensitive to the duration of both testing and training data.
• Performance is more sensitive to the duration of training data than to that of testing data.
• The GMM based SV system may not be suitable for limited data.
Baseline System for Voice Password Based System
• Data collection
  – Data from 100 speakers were collected. Each speaker uttered his/her full name or roll number as the voice password, which was recorded over the phone.
  – Number of male speakers: 81; number of female speakers: 19
  – Duration of data: 2 to 5 s
  – Number of training sessions: 3; number of testing sessions: 5, with a minimum gap of one day between sessions
  – During verification, each speaker was compared with his/her own data and with 19 other (impostor) speakers.
Dynamic Time Warping
• DTW is a template-matching technique.
• The test features and the template (model) are sequences of feature vectors.
• The aim is to find the distortion between the test features and the template.
• The two sequences may have different lengths.
• DTW uses dynamic programming to find the optimal alignment path, which normalizes the length variation.
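The dynamic-programming idea described above can be sketched as follows. This is a minimal illustration, assuming a Euclidean frame distance, the common three-step recursion (both sequences advance, or one frame is reused) and normalisation by the total length n + m; the slides do not specify these choices, and the function names are hypothetical.

```python
# Sketch of DTW between a test sequence and a template (lists of feature vectors).
# Euclidean local distance, the three-step recursion and length normalisation
# are common choices assumed here, not details given in the slides.
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test, template, dist=euclidean):
    """Minimum cumulative distortion along a warping path, divided by n + m."""
    n, m = len(test), len(template)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(test[i - 1], template[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # template frame reused
                                 D[i][j - 1],      # test frame reused
                                 D[i - 1][j - 1])  # both advance
    return D[n][m] / (n + m)


template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]  # same password, spoken slower
different = [[2.0], [0.0], [2.0]]
```

`dtw_distance(stretched, template)` is 0.0 because the warping path absorbs the slower speaking rate, while `different` yields a positive distortion. Because the path is anchored at the first and last frames, leading or trailing silence is forced into the alignment, which is why end-point detection matters for this system.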
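Experimental results for DTW based system for Voice password database
EER (%). Rows: test session. Columns: training session (S1 to S3), end points (start to end, or VAD) and feature dimension (13 or 39).
Test | S1 start-to-end 13 | S1 start-to-end 39 | S1 VAD 13 | S1 VAD 39 | S2 start-to-end 13 | S2 start-to-end 39 | S2 VAD 13 | S2 VAD 39 | S3 start-to-end 13 | S3 start-to-end 39 | S3 VAD 13 | S3 VAD 39
1 | 25 | 28 | 14 | 14.6 | 25 | 27.9 | 14.7 | 15 | 25.2 | 26.3 | 14.7 | 15.7
2 | 31 | 34 | 17 | 18.9 | 30 | 33.6 | 18 | 19.3 | 29.4 | 32.6 | 20 | 21.05
3 | 28 | 29 | 18 | 19 | 29 | 31 | 18.7 | 20 | 31.5 | 32.6 | 20 | 21.94
4 | 31 | 32 | 15 | 16 | 32 | 32.3 | 16.1 | 17.5 | 30.5 | 32.6 | 16.8 | 18.94
5 | 31 | 33 | 17 | 18 | 32 | 34 | 18.2 | 20.7 | 34.7 | 35.7 | 18.9 | 21.05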
Experimental results for GMM based system for Voice password database
EER (%). Rows: test session. Columns: training session.
Test | Train Session 1 | Train Session 2 | Train Session 3
Session 1 | 17.9 | 19.7 | 21.2
Session 2 | 18.34 | 18.1 | 20.3
Session 3 | 18.69 | 19.6 | 18.7
Session 4 | 19.8 | 20.1 | 18.9
Session 5 | 20.6 | 20.6 | 20
DTW using only mean vector of GMM
EER (%). Rows: test session. Columns: training session.
Test | Train Session 1 | Train Session 2 | Train Session 3
1 | 15.9 | 19.7 | 21.2
2 | 16.26 | 18.1 | 20.3
3 | 18.69 | 19.6 | 18.7
4 | 19.8 | 20.1 | 18.9
5 | 20.6 | 20.6 | 20
Verification result comparison and discussion
• DTW based system best EER: 14%
• GMM based system best EER: 17.9%
• DTW using mean vector of GMM best EER: 15.9%
• The best result was obtained with DTW.
• Performance of the DTW based system depends on the detection of end points.
• Performance of the DTW based system may be improved by robust end-point detection and by emphasizing the more speaker-specific regions.
Hence the motivation for the present work.
Vowel Regions In Speech Signal
• VOP and VEP are two important events in the speech signal
  – VOP (vowel onset point): the instant at which the onset of a vowel takes place in the speech signal
  – VEP (vowel end point): the instant at which the offset of a vowel takes place in the speech signal
Vowel Regions In Speech Signal
VOP (circle) and VEP (arrowhead) events for the utterance /the sea/
Vowel Regions In Speech Signal
• Vowel regions are prominent regions in the speech signal:
  – High amplitude
  – Near-periodic excitation
  – Long duration
  – Lower zero-crossing rate
• Due to their high amplitude, vowel regions have high SNR.
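To make the listed cues concrete, the sketch below flags "vowel-like" frames using two of them (frame energy above the utterance mean, zero-crossing rate below a limit) and reads VOP-like and VEP-like instants off the edges of each flagged run. It is an illustration under assumed thresholds and frame sizes (20 ms frames, 10 ms hop at 8 kHz), with hypothetical function names; it is not the EMD-based VOP detection method proposed in this work.

```python
# Illustrative only: flags "vowel-like" frames from two cues listed above (high
# energy, low zero-crossing rate) and reads VOP/VEP-like instants off the edges
# of each flagged run. Thresholds and frame sizes are assumptions; this is not
# the EMD-based VOP detector proposed in this work.
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def vowel_like_frames(signal, frame_len=160, hop=80, energy_ratio=1.0, zcr_max=0.1):
    """True for frames whose energy exceeds energy_ratio * mean energy and whose ZCR is low."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    energies = [sum(x * x for x in f) for f in frames]
    mean_energy = sum(energies) / len(energies)
    return [e > energy_ratio * mean_energy and zero_crossing_rate(f) < zcr_max
            for e, f in zip(energies, frames)]


def onsets_offsets(flags):
    """Frame indices of rising edges (VOP-like) and falling edges (VEP-like)."""
    vops, veps = [], []
    previous = False
    for i, current in enumerate(flags):
        if current and not previous:
            vops.append(i)
        if previous and not current:
            veps.append(i)
        previous = current
    if previous:  # a region still open at the end of the signal
        veps.append(len(flags))
    return vops, veps


# Example at 8 kHz: noise-like buzz, a 200 Hz "vowel", then buzz again (0.2 s each).
buzz = [0.1 * (-1) ** n for n in range(1600)]
vowel = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
flags = vowel_like_frames(buzz + vowel + buzz)
vops, veps = onsets_offsets(flags)
```

In the example the onset is found at frame 20 (200 ms), where the tone begins, and the offset at frame 39, the first frame that again contains buzz. Real speech needs far more robust evidence than these two fixed thresholds, which is the gap the proposed method addresses.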
Empirical Mode Decomposition
• Empirical Mode Decomposition (EMD) is data-driven, multi-scale and suitable for non-stationary signals.
• A signal is treated as fast oscillations superimposed on slower oscillations.
• The local mean of each decomposed signal is zero, and the signal is symmetric about its local mean.
• The impact of noise on the signal can be reduced.
• Decomposed signals are called Intrinsic Mode Functions (IMFs) if they satisfy the following conditions:
• The number of extrema and the number of zero crossings are equal or differ at most by one.
• The local average is zero, i.e., the mean of the upper envelope and the lower envelope is zero at every point.
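The two IMF conditions can be checked numerically as sketched below. Assumptions, labeled here and in the code: envelopes are drawn by linear interpolation between extrema and held flat beyond the outermost ones (standard EMD uses cubic splines), and "zero mean" is accepted within a tolerance of 10% of the peak amplitude; `is_imf` and `linear_envelope` are hypothetical names.

```python
# Sketch of a check for the two IMF conditions above. Envelopes are drawn by linear
# interpolation between extrema (held flat beyond the outermost ones) and the
# "zero mean" test uses a tolerance; both are simplifying assumptions.
import math


def linear_envelope(x, idx):
    """Piecewise-linear curve through x at the sorted sample indices idx."""
    env, k = [], 0
    for i in range(len(x)):
        if i <= idx[0]:
            env.append(x[idx[0]])
        elif i >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[k + 1] < i:
                k += 1
            a, b = idx[k], idx[k + 1]
            env.append(x[a] + (x[b] - x[a]) * (i - a) / (b - a))
    return env


def is_imf(x, tol=0.1):
    """True if #extrema and #zero crossings differ by at most one and the envelope
    mean stays within tol * max|x| of zero everywhere."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i] > x[i - 1] and x[i] >= x[i + 1]]
    minima = [i for i in inner if x[i] < x[i - 1] and x[i] <= x[i + 1]]
    zero_crossings = sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:]))
    if not maxima or not minima:
        return False
    if abs(len(maxima) + len(minima) - zero_crossings) > 1:  # condition 1
        return False
    upper, lower = linear_envelope(x, maxima), linear_envelope(x, minima)
    peak = max(abs(v) for v in x)
    return all(abs(u + lw) / 2 <= tol * peak for u, lw in zip(upper, lower))  # condition 2


tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
shifted = [v + 0.5 for v in tone]  # same oscillation riding on a non-zero mean
```

A pure tone passes both tests. The shifted tone still satisfies the first condition (it keeps crossing zero twice per period) but fails the second, because its envelope mean is 0.5 rather than zero.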
EMD Algorithm
• For a given input signal X to be decomposed:
  – Identify the local extrema of X.
  – Construct the upper envelope Emax and the lower envelope Emin by interpolating the local maxima and minima, respectively.
  – Approximate the local average by the envelope mean Em, the average of the two envelopes: Em = (Emax + Emin)/2.
  – Compute the candidate intrinsic mode h1 = X - Em. If h1 is an IMF, take imf = h1 and the residue r = X - imf. Otherwise, treat h1 as the new input and repeat the above steps.
• If r still contains an intrinsic oscillation mode, set r as the input signal and repeat the steps.
• A signal S(n) can then be represented through its N IMFs as
  $S(n) = \sum_{i=1}^{N} \mathrm{imf}_i(n) + r(n)$
  where r(n) is the residue.
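The steps above can be sketched as a small sifting loop. This is a minimal illustration, not the implementation used in this work. Assumptions: envelopes use linear interpolation rather than cubic splines; the "is h1 an IMF?" test is replaced by a simplified stopping rule (relative change between successive siftings below `stop`, or `max_iter` reached); decomposition ends when the residue has fewer than two maxima or minima, or after `max_imfs` modes. All names are hypothetical.

```python
# Minimal EMD sketch following the steps above. Assumptions: envelopes use linear
# interpolation (held flat past the outermost extrema) instead of cubic splines,
# and sifting stops on a simplified normalised-change criterion or max_iter.
import math


def _extrema(x):
    """Interior local maxima and minima indices."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i] > x[i - 1] and x[i] >= x[i + 1]]
    minima = [i for i in inner if x[i] < x[i - 1] and x[i] <= x[i + 1]]
    return maxima, minima


def _envelope(x, idx):
    """Piecewise-linear curve through x at the sorted indices idx."""
    env, k = [], 0
    for i in range(len(x)):
        if i <= idx[0]:
            env.append(x[idx[0]])
        elif i >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[k + 1] < i:
                k += 1
            a, b = idx[k], idx[k + 1]
            env.append(x[a] + (x[b] - x[a]) * (i - a) / (b - a))
    return env


def sift(x, max_iter=50, stop=1e-3):
    """Repeat h <- h - Em until h changes little between iterations."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = _envelope(h, maxima), _envelope(h, minima)
        h_new = [v - (u + w) / 2 for v, u, w in zip(h, upper, lower)]  # h - Em
        change = sum((a - b) ** 2 for a, b in zip(h, h_new))
        energy = sum(a * a for a in h) + 1e-12
        h = h_new
        if change / energy < stop:
            break
    return h


def emd(x, max_imfs=6):
    """Return (imfs, residue) with x == sum(imfs) + residue up to rounding."""
    residue = [float(v) for v in x]
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:  # no oscillation mode left
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]  # r = X - imf
    return imfs, residue


# Example: 200 Hz component riding on a slower 20 Hz component (8 kHz sampling).
signal = [math.sin(2 * math.pi * 200 * n / 8000) + 0.5 * math.sin(2 * math.pi * 20 * n / 8000)
          for n in range(800)]
imfs, residue = emd(signal)
reconstructed = [sum(c[n] for c in imfs) + residue[n] for n in range(len(signal))]
```

Whatever the quality of the individual modes, subtracting each IMF from the running residue guarantees that the sum of the IMFs plus the final residue reproduces S(n), which is the representation given above.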
MOTIVATION FOR USE OF EMD
• Environmental effects on the speech data can be de-emphasized.
• Excitation information present in different frequency ranges can be analyzed separately.
• Weak transitions, as in nasal-vowel and semivowel-vowel transitions and diphthongs, can be emphasized.
Flowchart for VOP detection
VOP EVIDENCE PLOT
Experiment
• Speech data: complete TIMIT database
• Number of male speakers: 438
• Number of female speakers: 192
• Sampling frequency: 8 kHz
• The VOP experiment was performed on 100 speakers.
Performance measure
• Identification rate (IR): percentage of reference VOPs (VEPs) that are matched by detected VOPs (VEPs) within vowel regions
• Spurious rate (SR): Percentage of detected VOPs (VEPs), which are detected outside vowel regions
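The two measures can be computed as sketched below, with the identification rate evaluated for several tolerance windows as in the results table. Assumptions: times are in milliseconds; a reference VOP counts as identified if any detected VOP lies within ±`tol_ms` of it (no one-to-one pairing); vowel regions are closed (start, end) intervals. The function names and the example numbers are hypothetical and are not the TIMIT results.

```python
# Sketch of the two VOP measures. Assumptions: times are in ms, a reference VOP is
# "identified" if any detected VOP lies within +/- tol_ms of it, and vowel regions
# are closed (start_ms, end_ms) intervals. The numbers below are made up.


def identification_rate(reference, detected, tol_ms):
    """Percentage of reference VOPs with a detected VOP within +/- tol_ms."""
    hits = sum(any(abs(d - r) <= tol_ms for d in detected) for r in reference)
    return 100.0 * hits / len(reference)


def spurious_rate(detected, vowel_regions):
    """Percentage of detected VOPs that fall outside every vowel region."""
    outside = sum(not any(s <= d <= e for s, e in vowel_regions) for d in detected)
    return 100.0 * outside / len(detected)


reference_vops = [100, 300, 500]
detected_vops = [105, 330, 700]
regions = [(100, 200), (300, 400), (500, 600)]
rates = {tol: identification_rate(reference_vops, detected_vops, tol)
         for tol in (10, 20, 30, 40)}
sr = spurious_rate(detected_vops, regions)
```

Widening the window can only raise the identification rate (here from one of three references at 10 ms to two of three at 30 ms), which is why the rates in the table increase from 10 ms to 40 ms.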
Performance of proposed VOP detection method
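Method | Detection rate (%), 10 ms | Detection rate (%), 20 ms | Detection rate (%), 30 ms | Detection rate (%), 40 ms | Spurious rate (%)
Baseline | 47 | 74 | 78 | 88 | 15
Proposed | 62 | 83 | 90 | 96 | 13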
Observation:
• The proposed method performs better than the baseline in terms of both detection rate and spurious rate.
• A detection rate of 83% is achieved within a 20 ms window, which is beneficial when strings of vowel regions are compared.
SV System by applying DTW on Vowel regions only
SV System by applying DTW on mean of vowel regions only
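The two systems named above can be illustrated with a short sketch of how their input sequences might be formed. This is one plausible reading, assumed here because the slides show only the titles: either keep only the frames between each detected VOP and VEP, or collapse each vowel region into its mean feature vector, giving a much shorter sequence. Region boundaries are given as frame indices; the function names and toy features are hypothetical.

```python
# One plausible reading of the two variants (assumed, since the slides show only
# their titles): keep only frames inside each [VOP, VEP) region, or collapse each
# region to its mean vector. Either sequence can then be scored with DTW.


def vowel_region_frames(features, regions):
    """Concatenate the frames lying inside each (vop, vep) region, given as frame indices."""
    return [frame for vop, vep in regions for frame in features[vop:vep]]


def vowel_region_means(features, regions):
    """One mean feature vector per non-empty (vop, vep) region."""
    means = []
    for vop, vep in regions:
        segment = features[vop:vep]
        if segment:
            means.append([sum(col) / len(segment) for col in zip(*segment)])
    return means


features = [[float(t), 2.0 * t] for t in range(10)]
regions = [(2, 4), (6, 9)]
vowel_frames = vowel_region_frames(features, regions)
vowel_means = vowel_region_means(features, regions)
```

Using one vector per vowel region reduces a password to a handful of vectors, so the DTW cost grows with the number of vowels rather than the number of frames, which is consistent with the low-complexity goal stated in the motivation.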
Score Normalization
DET Plot for DTW & Normalized DTW
DET Plot for DTW on vowel regions only
DET Plot of DTW on mean of vowel regions
Conclusion
• The proposed VOP Detection algorithm performed better than the best method present in the literature.
• The performance of the proposed algorithm for the voice password SV system is better than that of any of the baseline systems.
• The complexity of the proposed algorithm for the voice password SV system is lower than that of any baseline system, which makes it suitable for online SV tasks.