benefitsofcondition-matchedcohortselectioninscorenormalization ... · 2017-08-28 · 0.1 0.2 0.5 1...

1
Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization Andreas Nautsch ? Rahim Saeidi Christian Rathgeb ? Christoph Busch ? ? da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Germany Department of Signal Processing and Acoustics, Aalto University, Finland andreas.nautsch@{cased|h-da}.de Motivation • Real-life conditions in commercial and forensic applications • Vast broadness of conditions e.g.: Sample completeness (duration) Ambient noise (AC / CROWD) • True sample condition remains unknown due to e.g., SNR estimations • Unified Audio Characteristics seem promising for condition-matching score- normalization Research Questions Q1: How extensive are mutual effects to the per- formance of a State-of-the-Art system? Q2: Does condition-informed cohort selection benefit from statistic approaches rather than from condition-matched cohorts? Q3: Do conditions affect basic i-vector proper- ties, i.e. mean i-vectors? Related Work Unified Audio Characteristics • Single multivariate Gaussian models in original (raw) i-vector space Λ i ∼N (μ i , Σ),i =1,..., 55 Condition-dependent μ i Shared full Σ by pooling • Condition quality vector (q-vector) ~ q as posterior probabilities for i-vector ~ w : ~ q (i)= P ( ~ w | Λ i ) 55 i=1 P ( ~ w | Λ i ) Conventional AS Score Normalization • Adaptive & Symmetric Refs vs. probe-alike cohort set Prbs vs. reference-alike cohort set Computation of: μ ref ref prb prb • AS-normalized scores S AS by averaged symmetric zero-norm of score S : S AS = 1 2 S - μ ref σ ref + S - μ prb σ prb Cohort Selection Criterion • Symmetric Kullback-Leibler divergence: 1 2 55 X i=1 ~ q a (i) log ~ q a (i) ~ q b (i) + ~ q b (i) log ~ q b (i) ~ q a (i) • Selecting top-c cohort q-vectors Experimental Set-up & Results Condition 1 2 3 4 5 6 7 8 9 10 11 . . . 15 16 . . . 30 31 . . . 55 Duration 5s 10 s 20 s 40 s full 5s 10 s 20 s . . . full 5s ... full Noise clean Air Conditioner (AC) CROWD SNR 0 dB 5 dB 10 dB 15 dB 20 dB 0 ... 20 dB 0 dB . . . 20 dB 0 dB . . . 20 dB I4U Data of NIST SRE’12 • Condition-dependent sample versions from long-duration & clean samples • VAD labels of clean samples • LDA: 400 to 200 • PLDA: 200 factors • No calibration Baseline Results: impact of mutual conditions 5s 10 s 20 s 40 s full 0.01 0.1 0.2 0.5 1 2 5 Duration C llr 5s 10 s 20 s 40 s full 0.01 0.1 0.2 0.5 1 2 5 Duration C llr 0 dB 5 dB 10 dB 15 dB 20 dB Clean C llr C min llr (a) AC noise (b) CROWD noise Comparison of AS-norm approaches 0 5 10 15 20 0.1 0.3 0.5 0.7 SNR (in dB) C min llr 5 s 10 s 20 s 40 s full 0.1 0.3 0.5 0.7 Duration C min llr (c) 10s/CROWD (d) CROWD-0 dB no norm uAS (matched cohorts) cAS uAS (automatic cohort selection) Automatic cohort pre-selection 200 400 600 800 10 20 30 40 50 Cohort speaker ID Condition (e) Unique selections on 10s/CROWD-0 dB Cohort pre-selection by: • Similar conditions • SNR more relevant than duration Analysis: i-vector pool mean shift • Multi-variate Student t-test: t 2 = n x n y n x + n y ( ~ ¯ x - ~ ¯ y ) 0 W -1 ( ~ ¯ x - ~ ¯ y ), with: W = n x i=1 ( ~x i - ~ ¯ x)( ~x i - ~ ¯ x) 0 + n y i=1 (~ y i - ~ ¯ y )(~ y i - ~ ¯ y ) 0 n x + n y - 2 • P-values by cdf F of χ 2 (D = 200): t 2 χ 2 D , p =1 - F χ 2 D (t 2 ) 10 20 30 40 50 10 20 30 40 50 Condition Condition 0 5 e3 10 e3 15 e3 20 e3 (f) t 2 scores of Hotelling’s T 2 -test High significance: p ( =1 same condition: same mean i-vector < 10 -13 cross condition: different mean i-vector Conclusion • Mutual effects with performance impacts by log-duration and log-SNR • Quality-based cohort pre-selection yields significant gains • Performance gap between degraded and non-degraded samples still as open chal- lenge • Means of cross-condition i-vectors differ Acknowledgements We would like to thank the I4U consortium for database sharing. This work has been partially funded by the Cen- ter for Advanced Security Research Darmstadt (CASED), the Hesse government (project no. 467/15-09, BioMobile) and the Academy of Finland (project no. 256961 and 284671).

Transcript of benefitsofcondition-matchedcohortselectioninscorenormalization ... · 2017-08-28 · 0.1 0.2 0.5 1...

Page 1: benefitsofcondition-matchedcohortselectioninscorenormalization ... · 2017-08-28 · 0.1 0.2 0.5 1 2 5 Duration C llr 5s 10s 20s 40s full 0.01 0.1 0.2 0.5 1 2 5 Duration C llr 0dB

Analysis ofmutual durationandnoise effects in speaker recognition:benefits of condition-matchedcohort selection in scorenormalization

Andreas Nautsch? Rahim Saeidi†

Christian Rathgeb? Christoph Busch?

? da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Germany† Department of Signal Processing and Acoustics, Aalto University, Finland

andreas.nautsch@{cased|h-da}.de

Motivation

• Real-life conditions in commercial andforensic applications

• Vast broadness of conditions e.g.:

– Sample completeness (duration)– Ambient noise (AC / CROWD)

• True sample condition remains unknowndue to e.g., SNR estimations

• Unified Audio Characteristics seempromising for condition-matching score-normalization

Research Questions

Q1: How extensive are mutual effects to the per-formance of a State-of-the-Art system?

Q2: Does condition-informed cohort selectionbenefit from statistic approaches ratherthan from condition-matched cohorts?

Q3: Do conditions affect basic i-vector proper-ties, i.e. mean i-vectors?

Related Work

Unified Audio Characteristics

• Single multivariate Gaussian models inoriginal (raw) i-vector spaceΛi ∼ N (µi,Σ), i = 1, . . . , 55

– Condition-dependent µi– Shared full Σ by pooling

• Condition quality vector (q-vector) ~q asposterior probabilities for i-vector ~w:

~q(i) =P (~w |Λi)∑55i=1 P (~w |Λi)

Conventional AS Score Normalization

• Adaptive & Symmetric

– Refs vs. probe-alike cohort set

– Prbs vs. reference-alike cohort set

– Computation of: µref, σref, µprb, σprb

• AS-normalized scores SAS by averagedsymmetric zero-norm of score S:

SAS =1

2

(S − µref

σref+S − µprb

σprb

)Cohort Selection Criterion

• Symmetric Kullback-Leibler divergence:

1

2

55∑i=1

~qa(i) log~qa(i)

~qb(i)+ ~qb(i) log

~qb(i)

~qa(i)

• Selecting top-c cohort q-vectors

Experimental Set-up & Results

Condition 1 2 3 4 5 6 7 8 9 10 11 . . . 15 16 . . . 30 31 . . . 55

Duration 5 s 10 s 20 s 40 s full 5 s 10 s 20 s . . . full 5 s . . . full

Noise clean Air Conditioner (AC) CROWDSNR 0 dB 5 dB 10 dB 15 dB 20 dB 0 . . . 20 dB 0 dB . . . 20 dB 0 dB . . . 20 dB

I4U Data of NIST SRE’12

• Condition-dependent sampleversions from long-duration &clean samples

• VAD labels of clean samples

• LDA: 400 to 200

• PLDA: 200 factors

• No calibration

Baseline Results: impact of mutual conditions

5 s 10 s 20 s 40 s full

0.01

0.10.2

0.512

5

Duration

Cllr

5 s 10 s 20 s 40 s full

0.01

0.10.2

0.512

5

Duration

Cllr

0 dB

5 dB

10 dB

15 dB

20 dB

Clean

Cllr

Cminllr

(a) AC noise (b) CROWD noise

Comparison of AS-norm approaches

0 5 10 15 20

0.1

0.3

0.5

0.7

SNR (in dB)

Cm

inllr

5 s 10 s 20 s 40 s full

0.1

0.3

0.5

0.7

Duration

Cm

inllr

(c) 10s/CROWD (d) CROWD-0 dB

no norm uAS (matched cohorts)cAS uAS (automatic cohort selection)

Automatic cohort pre-selection

200 400 600 800

10

20

30

40

50

Cohort speaker ID

Con

dition

(e) Unique selections on 10s/CROWD-0 dB

Cohort pre-selection by:

• Similar conditions

• SNR more relevant than duration

Analysis: i-vector pool mean shift

• Multi-variate Student t-test:

t2 =nx nynx + ny

(~̄x− ~̄y)′W−1(~̄x− ~̄y),with:

W =

∑nx

i=1 (~xi − ~̄x)(~xi − ~̄x)′ +∑ny

i=1 (~yi − ~̄y)(~yi − ~̄y)′

nx + ny − 2

• P-values by cdf F of χ2 (D = 200):

t2 ∼ χ2D, p = 1− Fχ2

D(t2)

10 20 30 40 50

10

20

30

40

50

Condition

Con

dition

0

5 e3

10 e3

15 e3

20 e3

(f) t2 scores of Hotelling’s T 2-test

⇒ High significance: p

{= 1 same condition: same mean i-vector< 10−13 cross condition: different mean i-vector

Conclusion

• Mutual effects with performance impactsby log-duration and log-SNR

• Quality-based cohort pre-selection yieldssignificant gains

• Performance gap between degraded andnon-degraded samples still as open chal-lenge

• Means of cross-condition i-vectors differ

Acknowledgements

We would like to thank the I4U consortium fordatabase sharing.

This work has been partially funded by the Cen-ter for Advanced Security Research Darmstadt(CASED), the Hesse government (project no.467/15-09, BioMobile) and the Academy ofFinland (project no. 256961 and 284671).