benefitsofcondition-matchedcohortselectioninscorenormalization ... · 2017-08-28 · 0.1 0.2 0.5 1...

Post on 19-Mar-2020

0 views 0 download

Transcript of benefitsofcondition-matchedcohortselectioninscorenormalization ... · 2017-08-28 · 0.1 0.2 0.5 1...

Analysis ofmutual durationandnoise effects in speaker recognition:benefits of condition-matchedcohort selection in scorenormalization

Andreas Nautsch? Rahim Saeidi†

Christian Rathgeb? Christoph Busch?

? da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Germany† Department of Signal Processing and Acoustics, Aalto University, Finland

andreas.nautsch@{cased|h-da}.de

Motivation

• Real-life conditions in commercial andforensic applications

• Vast broadness of conditions e.g.:

– Sample completeness (duration)– Ambient noise (AC / CROWD)

• True sample condition remains unknowndue to e.g., SNR estimations

• Unified Audio Characteristics seempromising for condition-matching score-normalization

Research Questions

Q1: How extensive are mutual effects to the per-formance of a State-of-the-Art system?

Q2: Does condition-informed cohort selectionbenefit from statistic approaches ratherthan from condition-matched cohorts?

Q3: Do conditions affect basic i-vector proper-ties, i.e. mean i-vectors?

Related Work

Unified Audio Characteristics

• Single multivariate Gaussian models inoriginal (raw) i-vector spaceΛi ∼ N (µi,Σ), i = 1, . . . , 55

– Condition-dependent µi– Shared full Σ by pooling

• Condition quality vector (q-vector) ~q asposterior probabilities for i-vector ~w:

~q(i) =P (~w |Λi)∑55i=1 P (~w |Λi)

Conventional AS Score Normalization

• Adaptive & Symmetric

– Refs vs. probe-alike cohort set

– Prbs vs. reference-alike cohort set

– Computation of: µref, σref, µprb, σprb

• AS-normalized scores SAS by averagedsymmetric zero-norm of score S:

SAS =1

2

(S − µref

σref+S − µprb

σprb

)Cohort Selection Criterion

• Symmetric Kullback-Leibler divergence:

1

2

55∑i=1

~qa(i) log~qa(i)

~qb(i)+ ~qb(i) log

~qb(i)

~qa(i)

• Selecting top-c cohort q-vectors

Experimental Set-up & Results

Condition 1 2 3 4 5 6 7 8 9 10 11 . . . 15 16 . . . 30 31 . . . 55

Duration 5 s 10 s 20 s 40 s full 5 s 10 s 20 s . . . full 5 s . . . full

Noise clean Air Conditioner (AC) CROWDSNR 0 dB 5 dB 10 dB 15 dB 20 dB 0 . . . 20 dB 0 dB . . . 20 dB 0 dB . . . 20 dB

I4U Data of NIST SRE’12

• Condition-dependent sampleversions from long-duration &clean samples

• VAD labels of clean samples

• LDA: 400 to 200

• PLDA: 200 factors

• No calibration

Baseline Results: impact of mutual conditions

5 s 10 s 20 s 40 s full

0.01

0.10.2

0.512

5

Duration

Cllr

5 s 10 s 20 s 40 s full

0.01

0.10.2

0.512

5

Duration

Cllr

0 dB

5 dB

10 dB

15 dB

20 dB

Clean

Cllr

Cminllr

(a) AC noise (b) CROWD noise

Comparison of AS-norm approaches

0 5 10 15 20

0.1

0.3

0.5

0.7

SNR (in dB)

Cm

inllr

5 s 10 s 20 s 40 s full

0.1

0.3

0.5

0.7

Duration

Cm

inllr

(c) 10s/CROWD (d) CROWD-0 dB

no norm uAS (matched cohorts)cAS uAS (automatic cohort selection)

Automatic cohort pre-selection

200 400 600 800

10

20

30

40

50

Cohort speaker ID

Con

dition

(e) Unique selections on 10s/CROWD-0 dB

Cohort pre-selection by:

• Similar conditions

• SNR more relevant than duration

Analysis: i-vector pool mean shift

• Multi-variate Student t-test:

t2 =nx nynx + ny

(~̄x− ~̄y)′W−1(~̄x− ~̄y),with:

W =

∑nx

i=1 (~xi − ~̄x)(~xi − ~̄x)′ +∑ny

i=1 (~yi − ~̄y)(~yi − ~̄y)′

nx + ny − 2

• P-values by cdf F of χ2 (D = 200):

t2 ∼ χ2D, p = 1− Fχ2

D(t2)

10 20 30 40 50

10

20

30

40

50

Condition

Con

dition

0

5 e3

10 e3

15 e3

20 e3

(f) t2 scores of Hotelling’s T 2-test

⇒ High significance: p

{= 1 same condition: same mean i-vector< 10−13 cross condition: different mean i-vector

Conclusion

• Mutual effects with performance impactsby log-duration and log-SNR

• Quality-based cohort pre-selection yieldssignificant gains

• Performance gap between degraded andnon-degraded samples still as open chal-lenge

• Means of cross-condition i-vectors differ

Acknowledgements

We would like to thank the I4U consortium fordatabase sharing.

This work has been partially funded by the Cen-ter for Advanced Security Research Darmstadt(CASED), the Hesse government (project no.467/15-09, BioMobile) and the Academy ofFinland (project no. 256961 and 284671).