GMM-SVM+UP-AVR

Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification

Man-Wai MAK and Wei RAO, The Hong Kong Polytechnic University

enmwmak@polyu.edu.hk, http://www.eie.polyu.edu.hk/~mwmak/

Outline

• GMM-UBM for Speaker Verification
• GMM-SVM for Speaker Verification
• Data-Imbalance Problem in GMM-SVM
• Utterance Partitioning for GMM-SVM
• Experiments on NIST SRE

Speaker Verification

To verify the identity of a claimant based on his/her own voice.

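[Figure: A claimant says "I am Mary". Is this Mary’s voice?]

[Figure: Verification process. A claimant says "I’m John"; after feature extraction, the utterance is scored against John’s model (John’s "voiceprint") and an impostor model (impostors’ "voiceprints"); score normalization and decision making compare the scores with a decision threshold to accept or reject the claim.]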

Acoustic Features
• Speech is a continuous evolution of the vocal tract.
• We need to extract a sequence of spectra or a sequence of spectral coefficients.
• Use a sliding window: 25 ms window, 10 ms shift.

[Figure: MFCC extraction: |X(ω)| → Log → DCT → MFCC]
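The sliding-window step can be sketched as below. This is a minimal illustration, not the authors' front end: it assumes 8 kHz telephone speech for the example, drops a trailing partial window, and omits the |X(ω)| → log → DCT stages that turn each frame into MFCCs. `window_params` and `frame_signal` are hypothetical helper names.

```python
def window_params(sample_rate, win_ms=25.0, shift_ms=10.0):
    """Convert window length and shift from milliseconds to samples."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    return win, shift


def frame_signal(signal, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping frames (a partial last window is dropped)."""
    win, shift = window_params(sample_rate, win_ms, shift_ms)
    if len(signal) < win:
        return []
    n_frames = 1 + (len(signal) - win) // shift
    return [signal[i * shift : i * shift + win] for i in range(n_frames)]


# One second of audio at an assumed 8 kHz sampling rate.
frames = frame_signal([0.0] * 8000, sample_rate=8000)
print(len(frames), len(frames[0]))  # 98 200
```

At 8 kHz the 25 ms window is 200 samples and the 10 ms shift is 80 samples, so one second yields 98 full frames.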

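GMM-UBM for Speaker Verification

• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by
  $\Lambda^{(s)} = \{\lambda_j^{(s)}, \mu_j^{(s)}, \Sigma_j^{(s)}\}_{j=1}^{M}$

• Gaussian mixture model (GMM) for speaker s:
  $p(\mathbf{x}|\Lambda^{(s)}) = \sum_{j=1}^{M} \lambda_j^{(s)} \, \mathcal{N}(\mathbf{x}; \mu_j^{(s)}, \Sigma_j^{(s)})$

• The acoustic vectors of a general population are modeled by another GMM called the universal background model (UBM):
  $p(\mathbf{x}|\Lambda^{(\mathrm{ubm})}) = \sum_{j=1}^{M} \lambda_j^{(\mathrm{ubm})} \, \mathcal{N}(\mathbf{x}; \mu_j^{(\mathrm{ubm})}, \Sigma_j^{(\mathrm{ubm})})$

• Parameters of the UBM:
  $\Lambda^{(\mathrm{ubm})} = \{\lambda_j^{(\mathrm{ubm})}, \mu_j^{(\mathrm{ubm})}, \Sigma_j^{(\mathrm{ubm})}\}_{j=1}^{M}$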

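[Figure: MAP adaptation. The enrollment utterance X^(s) of the client speaker adapts the universal background model into the client speaker model.]

$\mu_j^{(s)} = \alpha_j E_j(X^{(s)}) + (1 - \alpha_j)\,\mu_j^{(\mathrm{ubm})}$

where $E_j(X^{(s)})$ is the posterior-weighted mean of the enrollment vectors for mixture j.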
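A minimal sketch of mean-only MAP adaptation, assuming the relevance-factor form of Reynolds et al. (2000): α_j = n_j / (n_j + r), where n_j is the soft count of frames assigned to mixture j. Diagonal covariances, pure-Python lists, and the function names are assumptions for illustration; r = 16 matches the MAP relevance factor listed under the experimental setup.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """Mixture responsibilities P(j | x), computed stably in the log domain."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, r=16.0):
    """Mean-only MAP adaptation of UBM means to one utterance's frames."""
    n_mix, dim = len(means), len(means[0])
    n = [0.0] * n_mix                              # soft counts n_j
    first = [[0.0] * dim for _ in range(n_mix)]    # sum_t P(j|x_t) x_t
    for x in frames:
        for j, p in enumerate(posteriors(x, weights, means, variances)):
            n[j] += p
            for d in range(dim):
                first[j][d] += p * x[d]
    adapted = []
    for j in range(n_mix):
        alpha = n[j] / (n[j] + r)                  # alpha_j = n_j / (n_j + r)
        e_j = [first[j][d] / n[j] if n[j] > 0 else means[j][d] for d in range(dim)]
        adapted.append([alpha * e_j[d] + (1.0 - alpha) * means[j][d]
                        for d in range(dim)])
    return adapted


# Toy check: one mixture, 16 frames at x = 1.0, UBM mean 0.0, so alpha = 0.5.
print(map_adapt_means([[1.0]] * 16, [1.0], [[0.0]], [[1.0]]))  # [[0.5]]
```

A mixture that sees no frames keeps its UBM mean, which is the behaviour the relevance factor is designed to produce.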

2-class hypothesis problem:
H0: MFCC sequence $X^{(c)}$ comes from the true speaker
H1: MFCC sequence $X^{(c)}$ comes from an impostor

The verification score is a log-likelihood ratio:
$\mathrm{Score} = \log p(X^{(c)}|H_0) - \log p(X^{(c)}|H_1) = \log p(X^{(c)}|\Lambda^{(s)}) - \log p(X^{(c)}|\Lambda^{(\mathrm{ubm})})$

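[Figure: GMM-UBM scoring. Features extracted from the claimant's utterance are scored by the speaker model Λ^(s) (+) and the background model (−); the resulting score is compared with a threshold to accept or reject the claim.]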
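The log-likelihood-ratio score can be sketched as follows, assuming diagonal-covariance GMMs and independent frames, so that log p(X|Λ) is a sum of per-frame log-likelihoods. Many systems divide the score by the number of frames; this sketch keeps the plain sum to match the formula. The helper names and the default threshold of 0 are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(frames, weights, means, variances):
    """log p(X | Lambda) = sum_t log sum_j lambda_j N(x_t; mu_j, Sigma_j)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def gmm_ubm_score(frames, speaker_gmm, ubm):
    """Score = log p(X|Lambda_s) - log p(X|Lambda_ubm); each GMM is (weights, means, variances)."""
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *ubm)


def decide(score, threshold=0.0):
    return "accept" if score > threshold else "reject"


speaker = ([1.0], [[1.0]], [[1.0]])
ubm = ([1.0], [[0.0]], [[1.0]])
s = gmm_ubm_score([[1.0]], speaker, ubm)
print(round(s, 6), decide(s))  # 0.5 accept
```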


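GMM-SVM for Speaker Verification

[Figure: GMM-supervector mapping. The target speaker's utterance utt^(s) goes through feature extraction to give X^(s); MAP adaptation of the UBM yields the adapted means μ_1, μ_2, ..., μ_M, which are concatenated by mean stacking into the GMM-supervector of X^(s).]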
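Mean stacking can be sketched as a concatenation of the M adapted means. The second helper assumes the normalization used in the supervector kernel of Campbell et al. (2006): each adapted mean μ_j is scaled by √λ_j and by the inverse standard deviations of the UBM's diagonal covariance, so that the inner product of two normalized supervectors equals Σ_j λ_j μ_jᵃᵀ Σ_j⁻¹ μ_jᵇ. Function names are illustrative.

```python
import math


def stack_means(means):
    """Concatenate M adapted D-dim means into one M*D supervector."""
    return [value for mu in means for value in mu]


def normalized_supervector(means, ubm_weights, ubm_vars):
    """Scale each mean by sqrt(lambda_j) * Sigma_j^{-1/2} (diagonal UBM covariances assumed)."""
    return [math.sqrt(w) * m / math.sqrt(v)
            for mu, w, var in zip(means, ubm_weights, ubm_vars)
            for m, v in zip(mu, var)]


def supervector_kernel(sv_a, sv_b):
    """Linear kernel between two normalized supervectors."""
    return sum(a * b for a, b in zip(sv_a, sv_b))


print(stack_means([[1.0, 2.0], [3.0, 4.0]]))  # [1.0, 2.0, 3.0, 4.0]
```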

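[Figure: GMM-SVM scoring. Feature extraction is applied to the target speaker's utterance utt^(s), the background speakers' utterances utt^(b_1), ..., utt^(b_B), and the claimant's utterance utt^(c). GMM-supervectors are computed for target speaker s, for the background speakers, and for claimant c. SVM scoring combines the kernel values K(X^(c), X^(s)), K(X^(c), X^(b_1)), ..., K(X^(c), X^(b_B)) into the score S_GMM-SVM(X^(c)).]

$S_{\text{GMM-SVM}}(X^{(c)}) = \alpha_0^{(s)} K(X^{(c)}, X^{(s)}) - \sum_{i \in \mathrm{SV}_{\mathrm{bkg}}} \alpha_i^{(s)} K(X^{(c)}, X^{(b_i)}) + d^{(s)}$

where $\mathrm{SV}_{\mathrm{bkg}}$ is the set of support vectors from the background speakers, $\alpha_0^{(s)}$ and $\alpha_i^{(s)}$ are the Lagrange multipliers, and $d^{(s)}$ is the bias of the SVM of target speaker s.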
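Given the supervectors and the parameters of an already-trained SVM (training is not shown), the GMM-SVM score reduces to a few kernel evaluations. This sketch assumes a linear kernel on normalized supervectors, a target supervector labelled +1, and background support vectors labelled −1, which is why their terms are subtracted; all names are hypothetical.

```python
def linear_kernel(a, b):
    """Inner product of two normalized GMM-supervectors."""
    return sum(x * y for x, y in zip(a, b))


def gmm_svm_score(sv_claimant, sv_target, bkg_support_vectors, alpha0, alphas_bkg, bias):
    """S(X^(c)) = alpha0*K(c, s) - sum_i alpha_i*K(c, b_i) + d."""
    score = alpha0 * linear_kernel(sv_claimant, sv_target)      # target term, label +1
    for alpha_i, sv_b in zip(alphas_bkg, bkg_support_vectors):  # background terms, label -1
        score -= alpha_i * linear_kernel(sv_claimant, sv_b)
    return score + bias


s = gmm_svm_score([1.0, 0.0], [2.0, 0.0], [[0.0, 1.0], [1.0, 1.0]],
                  alpha0=0.5, alphas_bkg=[0.25, 0.5], bias=-0.1)
print(round(s, 6))  # 0.4
```

Only background vectors with non-zero multipliers (the support vectors) need to be stored, so scoring cost grows with the number of support vectors rather than with the full background set.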

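GMM-UBM Scoring vs. GMM-SVM Scoring

GMM-UBM:
$S_{\text{GMM-UBM}}(X^{(c)}) = \log p(X^{(c)}|\Lambda^{(s)}) - \log p(X^{(c)}|\Lambda^{(\mathrm{ubm})})$

GMM-SVM:
$S_{\text{GMM-SVM}}(X^{(c)}) = \alpha_0^{(s)} K(X^{(c)}, X^{(s)}) - \sum_{i \in \mathrm{SV}_{\mathrm{bkg}}} \alpha_i^{(s)} K(X^{(c)}, X^{(b_i)}) + d^{(s)}$

where
$K(X^{(c)}, X^{(s)}) = \left(\Omega^{-1/2}\vec{\mu}^{(c)}\right)^{\mathsf T}\left(\Omega^{-1/2}\vec{\mu}^{(s)}\right)$,
$\Omega^{-1/2}\vec{\mu}^{(c)}$ is the normalized GMM-supervector of the claimant's utterance, $\Omega^{-1/2}\vec{\mu}^{(s)}$ is the normalized GMM-supervector of the target speaker's utterance, and $\Omega = \mathrm{diag}\{\lambda_1^{-1}\Sigma_1, \ldots, \lambda_M^{-1}\Sigma_M\}$ is formed from the UBM weights and covariances.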


Data Imbalance in GMM-SVM

For each target speaker, we have only one utterance (GMM-supervector) from the target speaker and many utterances from the background speakers, so we have a highly imbalanced learning problem.

[Figure: Two 2-D toy problems (speaker class vs. impostor class) trained with a linear SVM, C=10.0, #SV=3; the decision-boundary slopes are -1.00 and -1.44. With only one training vector from the target speaker, the orientation of the decision boundary depends mainly on the impostor class.]

[Figure: A 3-dim two-class problem illustrating that the SVM decision plane is largely governed by the impostor-class supervectors. The marked region shows where the target-speaker vector can be located without changing the orientation of the decision plane.]


Utterance Partitioning

Partition an enrollment utterance of a target speaker into a number of sub-utterances, with each sub-utterance producing one GMM-supervector.

[Figure: Utterance partitioning. Feature extraction is applied to the target speaker's enrollment utterance utt^(s) and to the background speakers' utterances utt^(b_1), ..., utt^(b_B). The full-length sequence X_0^(s) is partitioned into sub-utterances X_1^(s), ..., X_4^(s), and each background utterance is likewise partitioned into four sub-utterances. MAP adaptation and mean stacking convert every sequence into a GMM-supervector for SVM training, yielding the SVM of target speaker s.]

Length-Representation Trade-off

• When the number of partitions increases, the length of each sub-utterance decreases.
• If the sub-utterances are too short, their supervectors will be almost the same as the supervector of the UBM.

[Figure: Linear SVM, C=10.0, #SV=3, slope=-1.44, showing the sub-utterance supervectors of utt^(s) and the supervector corresponding to the UBM.]
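A worked version of this trade-off, assuming the relevance-factor MAP update of Reynolds et al. (2000) with r = 16: the adaptation coefficient of mixture j shrinks as a sub-utterance supplies fewer frames to that mixture, pulling the adapted mean toward the UBM mean.

```latex
\alpha_j = \frac{n_j}{n_j + r}, \qquad
n_j = \sum_{t} P(j \mid \mathbf{x}_t), \qquad
\mu_j^{(s)} = \alpha_j E_j(X^{(s)}) + (1 - \alpha_j)\,\mu_j^{(\mathrm{ubm})}

r = 16:\quad
n_j = 16 \;\Rightarrow\; \alpha_j = 0.5, \qquad
n_j = 4 \;\Rightarrow\; \alpha_j = 0.2, \qquad
n_j \to 0 \;\Rightarrow\; \mu_j^{(s)} \to \mu_j^{(\mathrm{ubm})}
```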

Utterance Partitioning with Acoustic Vector Resampling (UP-AVR)

Goal: Increase the number of sub-utterances without compromising their representation power.

Procedure of UP-AVR:
1. Randomly rearrange the sequence of acoustic vectors in an utterance;
2. Partition the acoustic vectors of the utterance into N segments;
3. If Steps 1 and 2 are repeated R times, we obtain RN+1 target-speaker supervectors (the RN sub-utterance supervectors plus the supervector of the full-length utterance).

[Figure: MFCC sequence before and after randomization.]
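The UP-AVR procedure can be sketched as below. This is an illustrative reading of Steps 1 to 3, not the authors' code: it assumes the full-length utterance is kept as the extra (+1) sequence, splits the shuffled frames into N nearly equal contiguous segments, and uses a seeded `random.Random` for reproducibility. `up_avr` is a hypothetical name; each returned sequence would then go through MAP adaptation and mean stacking to give one supervector.

```python
import random


def up_avr(frames, n_partitions, n_resamplings, seed=None):
    """Return [full utterance] + R*N randomized sub-utterances (lists of frames)."""
    rng = random.Random(seed)
    subsets = [list(frames)]  # X_0: the full-length utterance (the "+1")
    for _ in range(n_resamplings):
        shuffled = list(frames)
        rng.shuffle(shuffled)  # Step 1: randomly rearrange the frame order
        size, extra = divmod(len(shuffled), n_partitions)
        start = 0
        for k in range(n_partitions):  # Step 2: N nearly equal segments
            end = start + size + (1 if k < extra else 0)
            subsets.append(shuffled[start:end])
            start = end
    return subsets  # Step 3: R*N + 1 sequences in total


subs = up_avr(list(range(100)), n_partitions=4, n_resamplings=3, seed=0)
print(len(subs), [len(s) for s in subs[:5]])  # 13 [100, 25, 25, 25, 25]
```

Because every resampling shuffles before partitioning, each sub-utterance draws frames from the whole enrollment utterance rather than from one contiguous stretch of it.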

[Figure: UP-AVR. Feature extraction and index randomization are applied to the target speaker's enrollment utterance utt^(s) and to the background speakers' utterances utt^(b_1), ..., utt^(b_B). The full-length sequence X_0^(s), the sub-utterances X_1^(s), ..., X_4^(s), and the background sub-utterances go through MAP adaptation and mean stacking; the resulting supervectors are used in SVM training to produce the SVM of target speaker s.]

• Characteristics of supervectors (SVs) created by UP-AVR:
  - The average pairwise distance between sub-utterance SVs is larger than the average pairwise distance between sub-utterance SVs and the full-utterance SV.
  - The average pairwise distance between the speaker class's sub-utterance SVs and the impostor class's SVs is smaller than the average pairwise distance between the speaker class's full-utterance SV and the impostor class's SVs.

[Figure: Impostor-class and speaker-class supervectors, distinguishing the sub-utterance supervectors from the full-utterance supervector.]
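The two distance statistics can be computed as below. The sketch assumes Euclidean distance between supervectors, which the slide does not specify; `avg_within` averages over all unordered pairs in one set and `avg_between` over all cross-set pairs. Both names are hypothetical.

```python
import math
from itertools import combinations


def avg_within(vectors):
    """Average Euclidean distance over all unordered pairs in one set."""
    pairs = list(combinations(vectors, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def avg_between(set_a, set_b):
    """Average Euclidean distance over all pairs (a, b) with a in set_a and b in set_b."""
    return sum(math.dist(a, b) for a in set_a for b in set_b) / (len(set_a) * len(set_b))
```

To check the first property one would compare `avg_within(sub_svs)` with `avg_between(sub_svs, [full_sv])`; for the second, `avg_between(sub_svs, impostor_svs)` with `avg_between([full_sv], impostor_svs)`.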

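Nuisance Attribute Projection (NAP) [Solomonoff et al., ICASSP 2005]

Goal: To reduce the effect of session variability.

Recall the GMM-supervector kernel:
$K(X^{(c,h)}, X^{(s,h)}) = \left(\Omega^{-1/2}\vec{\mu}^{(c,h)}\right)^{\mathsf T}\left(\Omega^{-1/2}\vec{\mu}^{(s,h)}\right)$

Define the session- and speaker-dependent supervector as
$\vec{m}^{(s,h)} = \Omega^{-1/2}\vec{\mu}^{(s,h)}$, where s stands for speaker and h stands for session.

Remove the session-dependent part (h) by removing the subspace that causes the session variability:
$\vec{m}^{(s)} = P\vec{m}^{(s,h)} = (I - VV^{\mathsf T})\,\vec{m}^{(s,h)}$

The new kernel becomes
$K(X^{(c)}, X^{(s)}) = \left(\vec{m}^{(c)}\right)^{\mathsf T}\vec{m}^{(s)} = \left(P\vec{m}^{(c,h)}\right)^{\mathsf T}\left(P\vec{m}^{(s,h)}\right)$

[Figure: The subspace representing session variability, defined by V. A supervector m^(s,h) has the component VV^T m^(s,h) inside that subspace; removing it gives m^(s) = P m^(s,h).]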
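Applying a trained NAP projection is cheap because P = I − VVᵀ never needs to be formed explicitly. The sketch assumes V is given as a list of orthonormal column vectors (the experiments use a 64-dim subspace; any number works here); estimating V is not shown, and `nap_project` and `nap_kernel` are hypothetical names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nap_project(m, v_columns):
    """Return (I - V V^T) m, assuming the columns of V are orthonormal."""
    out = list(m)
    for v in v_columns:
        coef = dot(v, m)  # component of m along this nuisance direction
        out = [o - coef * vi for o, vi in zip(out, v)]
    return out


def nap_kernel(m_c, m_s, v_columns):
    """Kernel between two session-dependent supervectors after NAP."""
    return dot(nap_project(m_c, v_columns), nap_project(m_s, v_columns))


V = [[0.0, 0.0, 1.0]]  # toy 1-dim nuisance subspace
print(nap_project([1.0, 2.0, 3.0], V))  # [1.0, 2.0, 0.0]
```

The cost is O(D·k) per supervector for a k-dim nuisance subspace, instead of O(D²) for multiplying by an explicit D×D matrix P.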

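Nuisance Attribute Projection (NAP)

The projection is chosen to minimize the distance between projected supervectors of the same speaker:
$P^{*} = \arg\min_{P} \sum_{i,j} W_{ij} \left\| P\vec{m}_i - P\vec{m}_j \right\|^2$, where $\vec{m}_i$ is the i-th session-dependent training supervector, and $W_{ij} = 1$ if $\vec{m}_i$ and $\vec{m}_j$ correspond to the same speaker and $W_{ij} = 0$ otherwise.

[Figure: Subspace representing session variability, defined by V, with m^(s) = P m^(s,h).]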
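Under the usual assumption that V has orthonormal columns, so that P = I − VVᵀ is symmetric and idempotent, the objective reduces to an eigenvalue problem. Writing d_ij = m_i − m_j for the difference between two training supervectors and W_ij for the same-speaker indicator:

```latex
\|P\mathbf{d}_{ij}\|^2 = \mathbf{d}_{ij}^{\mathsf T} P\, \mathbf{d}_{ij}
  = \|\mathbf{d}_{ij}\|^2 - \|V^{\mathsf T}\mathbf{d}_{ij}\|^2
\;\Longrightarrow\;
V^{*} = \arg\max_{V^{\mathsf T}V = I} \operatorname{tr}\!\left(V^{\mathsf T} S V\right),
\qquad
S = \sum_{i,j} W_{ij}\, \mathbf{d}_{ij}\mathbf{d}_{ij}^{\mathsf T}
```

The columns of V* are therefore the eigenvectors of S with the largest eigenvalues, and their number is the NAP corank.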

Enrollment Process of GMM-SVM with UP-AVR

[Figure: The MFCCs X^(s,h) of an utterance from target speaker s undergo resampling/partitioning to give X_i^(s,h); MAP adaptation and mean stacking produce the session-dependent supervectors m_i^(s,h), which NAP turns into session-independent supervectors; SVM training against the background supervectors m_i^(b_j) yields the SVM of target speaker s.]

Verification Process of GMM-SVM with UP-AVR

[Figure: The MFCCs of a test utterance from claimant c go through MAP adaptation and mean stacking to give the session-dependent supervector m^(c,h), which is converted into a session-independent supervector; SVM scoring against the SVM of target speaker s gives the score S(X^(c)), and T-norm (using the T-norm models) converts it into the normalized score.]

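T-Norm (Auckenthaler, 2000)

Goal: To shift and scale the verification scores so that a global decision threshold can be used for all speakers.

[Figure: The test utterance is scored by the SVM of the target speaker, giving S(X^(c)), and by R T-norm SVMs (T-Norm SVM 1, ..., T-Norm SVM R); the mean and standard deviation of the R T-norm scores of the test utterance are computed and used to normalize S(X^(c)).]

$\tilde{S}(X^{(c)}) = \dfrac{S(X^{(c)}) - \mu^{\mathrm{T}}(X^{(c)})}{\sigma^{\mathrm{T}}(X^{(c)})}$

where $\mu^{\mathrm{T}}(X^{(c)})$ and $\sigma^{\mathrm{T}}(X^{(c)})$ are the mean and standard deviation of the T-norm scores.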
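T-norm can be sketched as below, assuming the R cohort scores of the same test utterance are already available (the experiments use 200 T-norm models). `statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would be an equally plausible choice. The function name is hypothetical.

```python
import statistics


def t_norm(score, tnorm_scores):
    """(S - mean) / std, where tnorm_scores are the same test utterance's
    scores against the R T-norm models."""
    mu = statistics.mean(tnorm_scores)
    sigma = statistics.stdev(tnorm_scores)  # sample std; needs R >= 2
    return (score - mu) / sigma


print(t_norm(3.0, [1.0, 2.0, 3.0]))  # 1.0
```

Because the cohort statistics are computed per test utterance, a claimant whose utterance scores high against every model is penalized, which is what lets a single global threshold work across speakers.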



Experiments: Speech Data

Evaluations on NIST SRE 2002 and 2004
• NIST SRE 2002:
  - NIST’01 is used for computing the UBMs, the impostor-class supervectors of the SVMs, the T-norm models, and the NAP parameters
  - 2983 true-speaker trials and 36287 impostor attempts
  - 2-min utterances for training and about 1-min utterances for testing
• NIST SRE 2004:
  - The Fisher corpus is used for computing the UBMs, the impostor-class supervectors of the SVMs, and the T-norm models
  - NIST’99 and NIST’00 are used for computing the NAP parameters
  - 2386 true-speaker trials and 23838 impostor attempts
  - 5-min utterances for training and testing

Experiments: Features and Models
• 12 MFCC + 12 ΔMFCC with feature warping
• 1024-mixture GMMs for GMM-UBM
• 256-mixture GMMs for GMM-SVM
• MAP relevance factor = 16
• 300 impostor-class supervectors for GMM-SVM
• 200 T-norm models
• 64-dim session variability subspace (NAP corank, i.e., rank of V)
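A quick dimension check, assuming each GMM-SVM supervector stacks all 256 adapted mean vectors of the 24-dim features (12 MFCC + 12 ΔMFCC) and that V has orthonormal columns:

```latex
D = M \times F = 256 \times 24 = 6144, \qquad
\operatorname{rank}(P) = \operatorname{rank}\!\left(I - VV^{\mathsf T}\right) = D - 64 = 6080
```

So NAP removes 64 of the 6144 supervector dimensions and leaves the remaining 6080-dim complement for SVM training and scoring.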

Results: Number of Mixtures in GMM-SVM (NIST’02)

[Figure annotations: a large number of features with small variance; the threshold below which the variances of features are deemed too small.]

Results: Effects of NAP on Different NIST SREs

[Figure annotation: large eigenvalues mean large session variation.]

Results: Effect of NAP Corank on Performance

[Figure annotation: No NAP.]

Results: Comparing the Discriminative Power of GMM-SVM and GMM-SVM with UP-AVR

Results: EER and MinDCF vs. Number of Target-Speaker Supervectors (NIST’02)

Results: Varying the Number of Resamplings (R) and the Number of Partitions (N) (NIST’02)

Experiments and Results: Performance on NIST’02

[Figure: EERs shown in the plot: 9.05%, 9.39%, 8.16%.]

Experiments and Results: Performance on NIST’04

[Figure: Systems compared: GMM-UBM, GMM-SVM, and GMM-SVM w/ UP-AVR. EERs shown in the plot: 9.46%, 10.42%, 16.05%.]

References

1. S.X. Zhang and M.W. Mak, "Optimized Discriminative Kernel for SVM Scoring and its Application to Speaker Verification," IEEE Trans. on Neural Networks, to appear.

2. M.W. Mak and W. Rao, "Utterance Partitioning with Acoustic Vector Resampling for GMM-SVM Speaker Verification," Speech Communication, vol. 53, no. 1, pp. 119-130, Jan. 2011.

3. M.W. Mak and W. Rao, "Acoustic Vector Resampling for GMMSVM-Based Speaker Verification," Interspeech 2010, Makuhari, Japan, Sept. 2010, pp. 1449-1452.

4. S.Y. Kung, M.W. Mak, and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, 2005.

5. W. M. Campbell, D. E. Sturim, and D. A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, pp. 308-311, 2006.

6. D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19-41, 2000.
