Similarity normalization for speaker verification by fuzzy fusion

*Corresponding author. Tel.: +612-6201-2394; fax: +612-6201-5231.

E-mail address: tuanp@ise.canberra.edu.au (T. Pham)

Pattern Recognition 33 (2000) 309-315


Tuan Pham*, Michael Wagner

Faculty of Information Sciences and Engineering, University of Canberra ACT 2601, Australia

Received 11 June 1998; accepted 25 January 1999

Abstract

Similarity or likelihood normalization techniques are important for speaker verification systems because they help to alleviate the variations in the speech signals. In conventional normalization, the a priori probabilities of the cohort speakers are assumed to be equal. Starting from this observation, we apply the theory of fuzzy measures and fuzzy integrals to combine the likelihood values of the cohort speakers, relaxing the assumption of equal a priori probabilities. This approach replaces the conventional normalization term with a fuzzy integral, which acts as a non-linear fusion of the similarity measures of an utterance assigned to the cohort speakers. We illustrate the performance of the proposed approach by testing a speaker verification system with both the conventional and the fuzzy algorithms on the commercial TI46 speech corpus. The results in terms of equal error rates show that the speaker verification system using the fuzzy integral is more flexible and achieves lower equal error rates than the conventional normalization method. © 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Speaker verification; Similarity normalization; Fusion; Fuzzy measure; Fuzzy integral

1. Introduction

Speaker verification is one of the challenging areas of speech research and has many applications, including telecommunications, security systems, banking transactions, database management, forensic tasks, command and control, and others. Technically, it is one of the two tasks in speaker recognition: a speaker recognition system can be divided into two categories, speaker identification and speaker verification. A speaker identification recognizer tries to assign an unknown speaker to one of the reference speakers based on the closest measure of similarity, whereas a speaker verification recognizer aims to either accept or reject an unknown speaker by verifying the identity claim. Thus, the main point distinguishing these two tasks is the number of decision alternatives. For speaker identification, the number of decision alternatives equals the number of speakers. For speaker verification, there are only two alternatives, i.e. either accept or reject the claimed speaker. The different recognition tasks serve different purposes: verification systems are more appropriate for most commercial applications, whereas identification systems are useful for the study of parametric and speech material modeling. For more details on recent developments in speaker recognition, the reader is referred to Refs. [1-3].

In speaker verification systems, normalization techniques are important because they help to alleviate the variations in the speech signals that are due to noise and to different recording and transmission conditions [1]. There are two types of normalization techniques for speaker recognition: parameter-domain and similarity-domain. Typical works of the parameter type were proposed by Atal [4] and Furui [5], and of the similarity type by Higgins et al. [6] and Matsui and Furui [7]. It has also been reported that most speaker verification systems are based on similarity-domain normalization [8]. In this paper, we therefore focus on the verification mode with respect to similarity normalization.

0031-3203/99/$20.00 © 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00042-4


In most similarity normalization techniques, the likelihood values of the utterance coming from the cohort speakers, whose models are closest to the claimant model, are generally assumed to be equally likely. In reality, however, this assumption often does not hold, because the similarity measures between each cohort speaker and the client speaker may differ. Motivated by this drawback, we introduce a new normalized log-likelihood method based on the concept of fuzzy fusion. We relax the assumption of equal likelihood by imposing fuzzy measures of the similarities between the cohort speaker models and the client model. The scoring of the cohort models is then obtained by the fuzzy integral, which acts as a fusion operator with respect to the fuzzy measures. The rest of this paper is organized as follows. In Section 2, we present the basic formulations of the normalization techniques in the similarity domain. In Section 3, the concepts of fuzzy measure and fuzzy integral are introduced. The fuzzy fusion for scoring the normalized log-likelihood is implemented in Section 4. We compare the performance of the conventional and the proposed techniques using a commercial speech database in Section 5. Finally, Section 6 concludes the paper and suggests possible further developments.

2. Similarity-domain normalization

Given an input set of speech feature vectors $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N\}$, the verification system has to decide whether $X$ was spoken by the client (for simplicity, from now on we denote $\vec{x}$ as $x$). In the similarity domain, this can be seen as a statistical test between $H_0\colon S$ and $H_1\colon \bar{S}$, where $H_0$ is the null hypothesis that the claimant is the client $S$, while $H_1$ is the alternative hypothesis that the claimant is an impostor $\bar{S}$. The decision according to the Bayesian rule for minimum risk is given by

$$L(X) = \frac{p(X \mid S)}{p(X \mid \bar{S})} \begin{cases} > \theta: & X \in H_0, \\ \le \theta: & X \in H_1, \end{cases} \tag{1}$$

where $\theta$ is a prescribed threshold. Taking the logarithm, the likelihood ratio of Eq. (1) becomes

$$\log L(X) = \log p(X \mid S) - \log p(X \mid \bar{S}) \begin{cases} > \log\theta: & X \in H_0, \\ \le \log\theta: & X \in H_1, \end{cases} \tag{2}$$

where $\log L(X)$ is also called the normalized log-likelihood score. The normalized log-likelihood of $X$ given the client model can be determined as

$$\log p(X \mid S) = \frac{1}{N} \sum_{n=1}^{N} \log p(x_n \mid S). \tag{3}$$

Two common methods, the geometric mean and the maximum [9], can be used to calculate the normalized log-likelihood score given the background (non-client) model. For a set of background speaker models of size $B$, $\bar{S} = \{S_1, S_2, \ldots, S_B\}$, the geometric mean method is defined as

$$\log p(X \mid \bar{S}) = \frac{1}{B} \sum_{b=1}^{B} \log p(X \mid S_b). \tag{4}$$

The maximum method is defined as

$$\log p(X \mid \bar{S}) = \max_{S_b \in \bar{S}} \log p(X \mid S_b), \tag{5}$$

where the term $\log p(X \mid S_b)$ in both Eqs. (4) and (5) can be calculated as in Eq. (3); apart from the scale factor $1/N$, it is the log-likelihood of the utterance $X$ coming from one of the cohort speakers, under the assumption that the a priori probabilities are equal.
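Eqs. (2)-(5) amount to a few lines of arithmetic once the per-frame scores are available. The following is a minimal Python sketch, assuming the per-frame log-likelihoods $\log p(x_n \mid S)$ have already been computed (for example with the VQ score of Eq. (6)); the function names are illustrative and do not come from the paper.

```python
from typing import Sequence


def average_log_likelihood(frame_log_likelihoods: Sequence[float]) -> float:
    """Eq. (3): mean of the per-frame scores log p(x_n | S) over the N frames."""
    return sum(frame_log_likelihoods) / len(frame_log_likelihoods)


def geometric_mean_background(cohort_scores: Sequence[float]) -> float:
    """Eq. (4): every cohort speaker gets the same weight 1/B.

    Averaging log-likelihoods equals the log of the geometric mean of the
    cohort likelihoods, hence the method's name.
    """
    return sum(cohort_scores) / len(cohort_scores)


def maximum_background(cohort_scores: Sequence[float]) -> float:
    """Eq. (5): score against the best-matching cohort speaker only."""
    return max(cohort_scores)


def normalized_score(client_score: float, background_score: float) -> float:
    """Eq. (2): log L(X) = log p(X|S) - log p(X|S_bar)."""
    return client_score - background_score


def accept(score: float, log_threshold: float) -> bool:
    """Decide H0 (claimant is the client) when log L(X) > log(theta)."""
    return score > log_threshold
```

For instance, with client frame scores [-1.0, -2.0, -3.0] (mean -2.0) and cohort scores [-3.0, -2.5, -4.0], the geometric-mean normalization gives about 1.17 while the maximum normalization gives 0.5: the maximum method is the more conservative of the two because it compares the claimant against the single closest cohort speaker.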

As the main purpose of this paper is to improve the scoring of the similarity normalization, we simply use the vector quantization (VQ) method to generate the acoustic models. The log-likelihood, in terms of the VQ distortion measure between the set of vectors $X$ of the claimed speaker and the codebook of a speaker $S$, can then be expressed as

$$\log p(x_n \mid S) = -\min_{k} \left[ d\big(x_n, b_k(S)\big) \right], \quad k = 1, 2, \ldots, K, \tag{6}$$

where $x_n \in X$, $b_k(S)$ is a codeword of speaker $S$, $K$ is the codebook size, and $d(x_n, b_k(S))$ is the Euclidean distance between $x_n$ and $b_k(S)$.
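Eq. (6) combined with the frame averaging of Eq. (3) gives the VQ-based utterance score. A minimal pure-Python sketch, assuming feature vectors and codewords are plain lists of floats (the names are illustrative):

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance d(x_n, b_k(S)) between a frame and a codeword."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def frame_log_likelihood(x: Vector, codebook: Sequence[Vector]) -> float:
    """Eq. (6): minus the distance to the nearest of the K codewords."""
    return -min(euclidean(x, codeword) for codeword in codebook)


def utterance_log_likelihood(frames: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Eq. (3) applied to Eq. (6): average VQ score over all N frames."""
    return sum(frame_log_likelihood(x, codebook) for x in frames) / len(frames)
```

With the two-codeword codebook [[0, 0], [3, 4]], the frames [0, 0] and [3, 0] lie at distances 0 and 3 from their nearest codewords, so the utterance score is -1.5.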

3. Fuzzy measure and fuzzy integral

Stemming from the concept of fuzzy sets introduced by Zadeh [10], the theory of fuzzy measures and fuzzy integrals was first introduced by Sugeno [11]. Fuzzy measures are used as subjective scales for grades of fuzziness, which can be expressed as a "grade of importance", a "grade of closeness", etc. In mathematical terms, a fuzzy measure is a set function that is monotone but not necessarily additive. Based on the notion of a fuzzy measure, a fuzzy integral is a monotone functional used for aggregating information from multiple sources with respect to the fuzzy measure. For more details on the theory of fuzzy measures and fuzzy integrals, the reader is referred to Refs. [11-13].

3.1. Fuzzy measure

Let $Y$ be an arbitrary set, and $\mathcal{B}$ be a Borel field of $Y$. A set function $g$ defined on $\mathcal{B}$ is a fuzzy measure if it satisfies the following three axioms:

1. Boundary conditions: $g(\emptyset) = 0$, $g(Y) = 1$.
2. Monotonicity: $g(A) \le g(B)$ if $A \subseteq B$ and $A, B \in \mathcal{B}$.
3. Continuity: $\lim_{i \to \infty} g(A_i) = g\left(\lim_{i \to \infty} A_i\right)$ if $A_i \in \mathcal{B}$ and $\{A_i\}$ is monotone (an increasing sequence of measurable sets).

Table 1. $g_\lambda$ measures on the power set of $Y$

| Subset $A$ | $g_\lambda(A)$ |
|---|---|
| $\emptyset$ | 0 |
| $\{y_1\}$ | 0.34 |
| $\{y_2\}$ | 0.32 |
| $\{y_3\}$ | 0.33 |
| $\{y_1, y_2\}$ | 0.6633 |
| $\{y_2, y_3\}$ | 0.6532 |
| $\{y_1, y_3\}$ | 0.6734 |
| $\{y_1, y_2, y_3\}$ | 1 |

Sugeno [11] also proposed the $g_\lambda$-fuzzy measure, which satisfies an additional condition known as the $\lambda$-rule ($\lambda > -1$):

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B), \quad A, B \subseteq Y, \; A \cap B = \emptyset.$$
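The $\lambda$-rule can be applied one singleton at a time to build the measure of any finite subset from its fuzzy densities. A minimal Python sketch, assuming $\lambda$ is already known (here the value $\lambda = 0.0305$ from the worked example at the end of Section 3.2); the function names are illustrative:

```python
from typing import Iterable


def lambda_union(g_a: float, g_b: float, lam: float) -> float:
    """Lambda-rule for disjoint A and B: g(A u B) = g(A) + g(B) + lam*g(A)*g(B)."""
    return g_a + g_b + lam * g_a * g_b


def lambda_measure(densities: Iterable[float], lam: float) -> float:
    """Measure of a finite set, adding one singleton (density g_i) at a time."""
    g = 0.0  # g(empty set) = 0
    for d in densities:
        g = lambda_union(g, d, lam)
    return g
```

With densities 0.34, 0.32 and 0.33 this reproduces the pairwise entries of Table 1 (0.6633, 0.6532, 0.6734 after rounding), and the full set comes out at 1 to within the rounding of $\lambda$. Setting `lam = 0.0` turns the rule into plain addition, which is the probability-measure case.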

Note that when $\lambda = 0$, the $g_\lambda$-fuzzy measure becomes a probability measure [14]. In general, the value of the constant $\lambda$ can be determined from the properties of the $g_\lambda$-fuzzy measure as follows.

Let $Y = \{y_1, y_2, \ldots, y_m\}$. If the fuzzy density of the $g_\lambda$-fuzzy measure is defined as a function $g\colon y_i \in Y \to [0, 1]$ such that $g_i = g_\lambda(\{y_i\})$, $i = 1, 2, \ldots, m$, then the $g_\lambda$-fuzzy measure of a finite set can be obtained as [15]

$$g_\lambda(Y) = \sum_{i=1}^{m} g_i + \lambda \sum_{i_1=1}^{m-1} \sum_{i_2=i_1+1}^{m} g_{i_1} g_{i_2} + \cdots + \lambda^{m-1} g_1 g_2 \cdots g_m. \tag{7}$$

Provided that $\lambda \ne 0$, Eq. (7) can be rewritten as

$$g_\lambda(Y) = \frac{1}{\lambda} \left[ \prod_{i=1}^{m} (1 + \lambda g_i) - 1 \right]. \tag{8}$$

With the boundary condition $g(Y) = 1$, the constant $\lambda$ can be determined by solving the following equation:

$$\lambda + 1 = \prod_{i=1}^{m} (1 + \lambda g_i). \tag{9}$$

It has been proved [16] that, for a fixed set of densities $g_i$ with $0 < g_i < 1$, Eq. (9) has a unique root $\lambda > -1$, $\lambda \ne 0$ (provided the densities do not sum to one; if they do, $\lambda = 0$). It can also be seen from Eq. (9) that once the values of $g_i$ are known, $\lambda$ can be calculated.
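The paper does not state how Eq. (9) is solved numerically; one simple option, sketched below under that assumption, is bisection on $h(\lambda) = \prod_i (1 + \lambda g_i) - (\lambda + 1)$. The bracket follows from the sign of $h$: when the densities sum to less than one the root is positive, and when they sum to more than one it lies in $(-1, 0)$. The function name and tolerances are illustrative.

```python
import math
from typing import Sequence


def solve_lambda(densities: Sequence[float], tol: float = 1e-12, max_iter: int = 200) -> float:
    """Return the root lambda > -1, lambda != 0, of Eq. (9) by bisection.

    Assumes at least two densities, each strictly between 0 and 1.
    """
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need >= 2 densities in (0, 1)")

    def h(lam: float) -> float:
        # h(lam) = prod(1 + lam*g_i) - (lam + 1); Eq. (9) is h(lam) = 0.
        return math.prod(1.0 + lam * g for g in densities) - (lam + 1.0)

    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # densities already additive: probability measure
    if total < 1.0:
        # Root is positive: h < 0 just above 0 and h > 0 for large lambda.
        lo, hi = 0.0, 1.0
        while h(hi) < 0.0:
            hi *= 2.0
    else:
        # Root lies in (-1, 0): h(-1) > 0 and h < 0 just below 0.
        lo, hi = -1.0, 0.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # Keep the half-interval on which h changes sign.
        if (h(mid) < 0.0) == (total < 1.0):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For the densities 0.34, 0.32 and 0.33 of the example in Section 3.2 this returns approximately 0.0305; for two densities 0.6 and 0.7 (sum above one) it returns $-5/7$. `math.prod` requires Python 3.8 or later.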

3.2. Fuzzy integral

Let $(Y, \mathcal{B}, g)$ be a fuzzy measure space and $f\colon Y \to [0, 1]$ be a $\mathcal{B}$-measurable function. A fuzzy integral over $A \subseteq Y$ of the function $f$ with respect to a fuzzy measure $g$ is defined by

$$\int_A f(y) \circ g(\cdot) = \sup_{\alpha \in [0, 1]} \left[ \min\left(\alpha, g(A \cap f_\alpha)\right) \right], \tag{10}$$

where $f_\alpha$ is the $\alpha$-level set of $f$, $f_\alpha = \{y : f(y) \ge \alpha\}$. The fuzzy integral in Eq. (10) is called the Sugeno integral. When $Y = \{y_1, y_2, \ldots, y_m\}$ is a finite set and $0 \le f(y_1) \le f(y_2) \le \cdots \le f(y_m) \le 1$ (if not, the elements of $Y$ are rearranged to make this relation hold), the Sugeno integral can be computed by

$$\int_Y f(y) \circ g(\cdot) = \max_{i=1}^{m} \left[ \min\left(f(y_i), g(A_i)\right) \right], \tag{11}$$

where $A_i = \{y_i, y_{i+1}, \ldots, y_m\}$, and $g(A_i)$ can be calculated recursively in terms of the $g_\lambda$-fuzzy measure as [14]

$$g(A_m) = g_m, \qquad g(A_i) = g_i + g(A_{i+1}) + \lambda\, g_i\, g(A_{i+1}), \quad 1 \le i < m. \tag{12}$$
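A sketch of the finite Sugeno integral of Eq. (11), assuming a $g_\lambda$-measure with known densities and $\lambda$. The helper `tail_measures` (a name introduced here) builds $g(A_i)$ starting from the smallest set $A_m = \{y_m\}$ and adds one element at a time with the $\lambda$-rule, so the whole set $A_1 = Y$ is reached last.

```python
from typing import List, Sequence


def tail_measures(densities: Sequence[float], lam: float) -> List[float]:
    """g(A_i) for A_i = {y_i, ..., y_m}, built from i = m down to i = 1 with the lambda-rule."""
    g = [0.0] * len(densities)
    acc = 0.0  # g of the empty set
    for i in reversed(range(len(densities))):
        acc = densities[i] + acc + lam * densities[i] * acc
        g[i] = acc
    return g


def sugeno_integral(values: Sequence[float], densities: Sequence[float], lam: float) -> float:
    """Eq. (11): max_i min(f(y_i), g(A_i)) with f sorted in ascending order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    f = [values[i] for i in order]
    g = tail_measures([densities[i] for i in order], lam)
    return max(min(fi, gi) for fi, gi in zip(f, g))
```

Sorting the indices rather than the values keeps each density attached to its own element during the rearrangement required by Eq. (11). With the example of Section 3.2 (values 0.6, 0.7, 0.1; densities 0.34, 0.32, 0.33; $\lambda = 0.0305$) the result is 0.6.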

It can be seen that the above definition is not a proper extension of the usual Lebesgue integral: the Lebesgue integral is not recovered when the measure is additive. To overcome this drawback, the Choquet integral, interpreted by Murofushi and Sugeno [17] as an integral with respect to a fuzzy measure, can be used instead. The Choquet integral of $f$ with respect to a fuzzy measure $g$ is defined as follows:

$$\int_Y f(y)\, dg(\cdot) = \sum_{i=1}^{m} \left[ f(y_i) - f(y_{i-1}) \right] g(A_i), \tag{13}$$

in which $f(y_0) = 0$.
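The Choquet integral of Eq. (13) differs from the Sugeno integral only in the final aggregation step. The sketch below makes the same assumptions (known densities and $\lambda$, names illustrative) and also illustrates the property noted above: with an additive measure ($\lambda = 0$) it reduces to a weighted average.

```python
from typing import Sequence


def choquet_integral(values: Sequence[float], densities: Sequence[float], lam: float) -> float:
    """Eq. (13): sum_i [f(y_i) - f(y_{i-1})] * g(A_i), with f(y_0) = 0."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    f = [values[i] for i in order]
    d = [densities[i] for i in order]
    # g(A_i) for A_i = {y_i, ..., y_m}, accumulated from the top with the lambda-rule.
    g = [0.0] * len(d)
    acc = 0.0
    for i in reversed(range(len(d))):
        acc = d[i] + acc + lam * d[i] * acc
        g[i] = acc
    total, previous = 0.0, 0.0
    for fi, gi in zip(f, g):
        total += (fi - previous) * gi
        previous = fi
    return total
```

For the example in Section 3.2 this returns about 0.4637; the hand calculation there gives 0.462 because it rounds $g(A_2) = 0.6633$ to 0.66. With densities 0.5 and 0.5 and $\lambda = 0$, the values 0.2 and 0.8 integrate to their plain average, 0.5.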

To help further understand the concepts of fuzzy measures and fuzzy integrals, consider the following simple example [18,19]. Let $Y = \{y_1, y_2, y_3\}$, with fuzzy densities $g_1 = 0.34$, $g_2 = 0.32$, and $g_3 = 0.33$. Using Eq. (9), we obtain the quadratic equation $0.0359\lambda^2 + 0.3266\lambda - 0.01 = 0$. Taking its unique root $\lambda > -1$ gives $\lambda = 0.0305$. Using Eq. (8) for each subset (equivalently, the $\lambda$-rule), we can then calculate the fuzzy measures of all subsets of $Y$, whose values are shown in Table 1. Suppose that $f(y_1) = 0.6$, $f(y_2) = 0.7$, and $f(y_3) = 0.1$. We then need to rearrange the elements of $Y$ so that $f(y_1) = 0.1 < f(y_2) = 0.6 < f(y_3) = 0.7$, which yields $g_1 = 0.33$, $g_2 = 0.34$, and $g_3 = 0.32$. Using Eq. (11), with $g(A_2) = 0.6633$ rounded to 0.66, the Sugeno integral is computed as $\max[\min(0.1, 1), \min(0.6, 0.66), \min(0.7, 0.32)] = 0.6$. Using Eq. (13), the Choquet integral is obtained as $(0.1 - 0)(1.0) + (0.6 - 0.1)(0.66) + (0.7 - 0.6)(0.32) = 0.462$.
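Expanding Eq. (9) for the three densities of this example makes the quadratic and its root explicit:

$$
\begin{aligned}
\prod_{i=1}^{3}(1+\lambda g_i) &= 1 + (g_1+g_2+g_3)\lambda + (g_1g_2+g_1g_3+g_2g_3)\lambda^2 + g_1g_2g_3\lambda^3 \\
&= 1 + 0.99\lambda + 0.3266\lambda^2 + 0.035904\lambda^3 .
\end{aligned}
$$

Setting this equal to $\lambda + 1$ and dividing by $\lambda \ne 0$:

$$
0.035904\lambda^2 + 0.3266\lambda - 0.01 = 0
\quad\Longrightarrow\quad
\lambda = \frac{-0.3266 + \sqrt{0.3266^2 + 4(0.035904)(0.01)}}{2(0.035904)} \approx 0.0305 .
$$

The other root, approximately $-9.13$, violates $\lambda > -1$ and is discarded.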


4. Fuzzy-fusion based normalization

As mentioned in the foregoing sections, conventional similarity normalization methods assume that the a priori probabilities of an utterance coming from each of the cohort speakers are equal. Instead, we use the concept of the fuzzy measure to calculate the grade of similarity, or closeness, between each cohort speaker model and the client model (i.e. the fuzzy density), together with the measures of the combinations of these fuzzy densities. The final score for the normalization over the cohort speakers can then be determined by combining all of these fuzzy measures with the corresponding likelihood values using the Choquet integral. In mathematical terms, the proposed model is

$$\log L(X) = \log p(X \mid S) - \log F(X \mid \bar{S}), \tag{14}$$

where $F(X \mid \bar{S})$ is the fuzzy integral of the likelihood values of an utterance $X$ coming from the cohort speaker set $\bar{S} = \{S_b : b = 1, 2, \ldots, B\}$ with respect to the fuzzy measures of speaker similarity. It is defined as follows:

$$F(X \mid \bar{S}) = \sum_{b=1}^{B} \left[ p(X \mid S_b) - p(X \mid S_{b-1}) \right] g(Z_b \mid S), \tag{15}$$

where $p(X \mid S_b)$ has been previously defined, $Z_b = \{S_b, S_{b+1}, \ldots, S_B\}$, $g(Z_b \mid S)$ is the fuzzy measure of $Z_b$ given that the true speaker is $S$, $p(X \mid S_0) = 0$, and the relation $0 \le p(X \mid S_1) \le p(X \mid S_2) \le \cdots \le p(X \mid S_B)$ holds; otherwise the elements of $\bar{S}$ need to be rearranged.

From the preceding presentation of the fuzzy measure and the fuzzy integral, it is clear that the key factor in the fuzzy fusion process is the fuzzy density. Once the fuzzy densities are determined, the fuzzy measures can be identified, and the fuzzy integral can be computed. For the fusion of similarity measures, we regard the fuzzy density as the degree of similarity or closeness between the acoustic model of a cohort speaker and that of the client, i.e. the greater the value of the fuzzy density, the closer the two models. Therefore, we define the fuzzy density as

$$g(S_b \mid S) = 1 - \exp\left(-\alpha \|\vec{v}_b - \vec{v}_S\|^2\right), \tag{16}$$

where $\alpha$ is a positive constant, $\|\cdot\|$ is the Euclidean norm, which indicates the root-mean-square averaging process, $\vec{v}_b$ is the mean code vector of a cohort speaker $S_b$, and $\vec{v}_S$ is the mean code vector of the client speaker $S$.

It is reasonable to assume that some acoustic models of a cohort speaker, say $S_1$, may be more similar to those of the client speaker $S$ than those of another cohort speaker, say $S_2$, while some other acoustic models of $S_2$ may be more similar to those of $S$ than those of $S_1$. Since the mean code vectors are generated globally from codebooks covering all the different utterances of the speakers, we introduce the constant $\alpha$ in Eq. (16) for each cohort speaker in order to fine-tune the fuzzy density with respect to the Euclidean distance measure. At present we select the values of $\alpha$ using the training data; this issue is discussed further in the experimental section.
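Putting Eqs. (9), (14), (15) and (16) together, a minimal end-to-end sketch of the fuzzy-fusion score might look as follows. Assumptions not fixed by the paper: a single $\alpha$ shared by all cohort speakers (as in the experiments, where one value per gender was used), bisection for $\lambda$, and cohort likelihoods recovered as $p(X \mid S_b) = \exp(\log p(X \mid S_b))$ from the Eq. (3) scores. The density is implemented exactly as Eq. (16) is written. All function names are illustrative.

```python
import math
from typing import Sequence


def fuzzy_density(cohort_mean: Sequence[float], client_mean: Sequence[float], alpha: float) -> float:
    """Eq. (16) as printed: g(S_b|S) = 1 - exp(-alpha * ||v_b - v_S||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(cohort_mean, client_mean))
    return 1.0 - math.exp(-alpha * squared_distance)


def solve_lambda(densities: Sequence[float], iterations: int = 200) -> float:
    """Bisection for the root lambda > -1, lambda != 0, of Eq. (9)."""
    if sum(g > 0.0 for g in densities) < 2 or any(g >= 1.0 for g in densities):
        raise ValueError("need >= 2 positive densities, all below 1")

    def h(lam: float) -> float:
        return math.prod(1.0 + lam * g for g in densities) - (lam + 1.0)

    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0
    if total < 1.0:
        lo, hi = 0.0, 1.0
        while h(hi) < 0.0:
            hi *= 2.0
    else:
        lo, hi = -1.0, 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (h(mid) < 0.0) == (total < 1.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choquet(values: Sequence[float], densities: Sequence[float], lam: float) -> float:
    """Eq. (15): Choquet integral of the cohort likelihoods w.r.t. the lambda-measure."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    f = [values[i] for i in order]
    d = [densities[i] for i in order]
    g = [0.0] * len(d)  # g(Z_b | S) for Z_b = {S_b, ..., S_B}
    acc = 0.0
    for i in reversed(range(len(d))):
        acc = d[i] + acc + lam * d[i] * acc
        g[i] = acc
    total, previous = 0.0, 0.0
    for fi, gi in zip(f, g):
        total += (fi - previous) * gi
        previous = fi
    return total


def fuzzy_fusion_score(client_loglik: float,
                       cohort_logliks: Sequence[float],
                       cohort_means: Sequence[Sequence[float]],
                       client_mean: Sequence[float],
                       alpha: float) -> float:
    """Eq. (14): log L(X) = log p(X|S) - log F(X|S_bar)."""
    densities = [fuzzy_density(v, client_mean, alpha) for v in cohort_means]
    lam = solve_lambda(densities)
    # Eq. (3) scores are log-likelihoods; the integral fuses the likelihoods p(X|S_b).
    likelihoods = [math.exp(s) for s in cohort_logliks]
    return client_loglik - math.log(choquet(likelihoods, densities, lam))
```

When all cohort likelihoods are equal, the fused value equals that common likelihood because the measure of the whole cohort set is 1, so in that case the score coincides with the geometric-mean and maximum scores. Otherwise $F$ always lies between the smallest and largest cohort likelihood, with the fuzzy measures deciding where.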

5. Experiments

5.1. Measure of performance

One of the most common performance measures for speaker verification systems is the equal error rate (EER), which applies an a posteriori threshold that makes the false acceptance rate equal to the false rejection rate. If the score of an identity claim is above a certain threshold, the claimant is verified as the true speaker; otherwise the claim is rejected. If the threshold is set high, there is a risk of rejecting a true speaker; conversely, if it is set low, there is a risk of accepting an impostor. To balance the trade-off between these two situations, the threshold is selected at the level that makes the false acceptance and false rejection rates equal, based on the distributions of client and impostor scores. The EER thus offers an efficient way of measuring the degree of separation between the client and impostor models: the smaller the EER, the better the system performance.
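A minimal sketch of the EER computation described above, assuming lists of client and impostor scores. Sweeping the threshold over the observed scores only approximates the a posteriori threshold; interpolating between neighbouring thresholds is a common refinement that is not shown here.

```python
from typing import Sequence


def equal_error_rate(client_scores: Sequence[float], impostor_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping the threshold over every observed score.

    A claim is accepted when its score is above the threshold (as in Eq. (1)).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(client_scores) | set(impostor_scores)):
        frr = sum(s <= threshold for s in client_scores) / len(client_scores)  # clients rejected
        far = sum(s > threshold for s in impostor_scores) / len(impostor_scores)  # impostors accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer
```

Perfectly separated score sets give an EER of 0; for client scores [1, 2, 3, 4] against impostor scores [0, 1, 2, 3] the closest FAR/FRR pair is 0.5/0.25, giving 0.375.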

5.2. The database

The commercial TI46 speech corpus is used for the experiments. The TI46 corpus contains 46 words spoken repeatedly by 8 female and 8 male speakers, labeled f1-f8 and m1-m8, respectively. The vocabulary includes a set of 10 computer commands: {enter, erase, go, help, no, rubout, repeat, stop, start, yes}. Each speaker repeated the words 10 times in a single training session, and then twice in each of 8 testing sessions. The corpus is sampled at 12,500 samples/s with 12 bits/sample. The data were processed in 20.48 ms frames at a frame rate of 125 frames/s. The frames were Hamming windowed and pre-emphasized with $k = 0.9$. Forty-six mel-spectral bands with a width of 110 mel and 20 mel-frequency cepstral coefficients (MFCCs) were determined for each frame.
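The front-end settings above translate into concrete sample counts: at 12,500 samples/s a 20.48 ms frame is 256 samples, and 125 frames/s corresponds to a hop of 100 samples. The sketch below covers only pre-emphasis, framing and Hamming windowing. The ordering (pre-emphasis on the whole signal, then windowing), the symmetric window and the dropping of a partial final frame are assumptions; the mel filterbank and MFCC steps are omitted.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 12_500                      # samples/s (TI46)
FRAME_LEN = round(0.02048 * SAMPLE_RATE)  # 20.48 ms -> 256 samples
HOP = SAMPLE_RATE // 125                  # 125 frames/s -> 100-sample hop


def preemphasize(signal: Sequence[float], k: float = 0.9) -> List[float]:
    """y[n] = x[n] - k * x[n-1]; the first sample is passed through unchanged."""
    return [signal[0]] + [signal[n] - k * signal[n - 1] for n in range(1, len(signal))]


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal: Sequence[float], frame_len: int = FRAME_LEN, hop: int = HOP) -> List[List[float]]:
    """Pre-emphasize, cut into overlapping frames, and apply the window."""
    emphasized = preemphasize(signal)
    window = hamming(frame_len)
    return [[s * w for s, w in zip(emphasized[start:start + frame_len], window)]
            for start in range(0, len(emphasized) - frame_len + 1, hop)]
```

One second of audio (12,500 samples) yields 123 full frames under these assumptions, slightly fewer than the nominal 125 because partial frames at the end are discarded.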

In the training session, each speaker's 100 training tokens (10 utterances × 1 training session × 10 repetitions) were used to train a speaker-based VQ codebook by clustering the set of all of that speaker's MFCC vectors into codebooks of 32, 64 and 128 codewords using the LBG algorithm [20].
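A compact sketch of codebook training with the LBG binary-splitting procedure [20], applied to one speaker's feature vectors. The splitting factor `eps`, the relative-distortion stopping rule and the squared Euclidean distance are assumptions, not settings reported in the paper; the pure-Python loops are meant for reading rather than for full-size MFCC data.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lbg(data: Sequence[Sequence[float]], size: int, eps: float = 0.01,
        tol: float = 1e-4, max_iter: int = 100) -> List[Vector]:
    """Grow a VQ codebook to `size` codewords (a power of two) by binary splitting."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [_centroid(data)]
    while len(codebook) < size:
        # Split every codeword into two perturbed copies, c*(1+eps) and c*(1-eps).
        codebook = [[c * (1.0 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        previous = math.inf
        for _ in range(max_iter):
            # Nearest-neighbour partition of the training vectors.
            cells: List[List[Sequence[float]]] = [[] for _ in codebook]
            total = 0.0
            for x in data:
                j = min(range(len(codebook)), key=lambda k: _squared_distance(x, codebook[k]))
                cells[j].append(x)
                total += _squared_distance(x, codebook[j])
            # Centroid update; an empty cell keeps its previous codeword.
            codebook = [_centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            distortion = total / len(data)
            # Stop when the relative drop in mean distortion is small.
            if previous - distortion <= tol * distortion:
                break
            previous = distortion
    return codebook
```

Codebooks of 32, 64 and 128 entries correspond to 5, 6 and 7 splitting rounds. Because the split is multiplicative, a codeword whose components are all zero would not be separated; an additive perturbation is a common alternative in that case.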


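Table 2. Equal error rates (%EER) for the 16 speakers using the geometric mean (GM) and fuzzy fusion (FF) based normalization methods (column headings give the method and the codebook size)

| Speaker | GM 32 | GM 64 | GM 128 | FF 32 | FF 64 | FF 128 |
|---|---|---|---|---|---|---|
| f1 | 4.17 | 3.01 | 2.40 | 1.80 | 1.19 | 1.19 |
| f2 | 5.98 | 1.19 | 1.79 | 1.19 | 0.60 | 1.20 |
| f3 | 9.90 | 5.66 | 3.67 | 7.79 | 3.70 | 2.33 |
| f4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| f5 | 1.78 | 1.78 | 0.59 | 1.19 | 0.60 | 0.00 |
| f6 | 6.67 | 3.01 | 1.80 | 2.41 | 0.59 | 0.00 |
| f7 | 7.38 | 4.32 | 3.61 | 6.48 | 4.00 | 2.30 |
| f8 | 12.76 | 9.73 | 9.22 | 10.05 | 8.22 | 7.62 |
| m1 | 3.07 | 3.05 | 3.06 | 3.03 | 3.03 | 2.43 |
| m2 | 4.17 | 1.28 | 1.22 | 3.14 | 1.22 | 1.22 |
| m3 | 7.03 | 7.00 | 6.32 | 6.87 | 6.85 | 5.92 |
| m4 | 10.77 | 8.28 | 7.90 | 8.29 | 6.89 | 6.91 |
| m5 | 2.70 | 2.44 | 1.80 | 1.62 | 0.63 | 1.19 |
| m6 | 8.43 | 7.44 | 6.53 | 7.53 | 5.47 | 4.72 |
| m7 | 7.18 | 5.88 | 4.83 | 6.86 | 4.88 | 3.65 |
| m8 | 1.83 | 3.01 | 2.40 | 1.80 | 1.21 | 1.19 |
| Female | 6.08 | 3.66 | 2.89 | 3.50 | 2.48 | 1.92 |
| Male | 5.65 | 4.80 | 4.17 | 4.89 | 3.86 | 3.40 |
| Average | 5.87 | 4.23 | 3.53 | 4.20 | 3.17 | 2.66 |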

5.3. The results

Verification was tested in the text-dependent mode. Since both the geometric mean and the fuzzy fusion methods operate on the principle of integration and depend on the size of the cohort set, we compare the performance of these two methods. This is a closed-set test, as the cohort speakers in training are the same as those in testing. For the purpose of comparison, and owing to the limited number of speakers, we select for each claimed speaker a cohort set of three speakers of the same gender whose acoustic models are closest to the claimed model. In the testing mode, each cohort speaker's 160 test tokens (10 utterances × 8 testing sessions × 2 repetitions) are tested against each claimed speaker's 10-word models.

To identify the fuzzy densities for the cohort speakers, we select the values of $\alpha$ using the training data. The range of $\alpha$ was specified to be from 1 to 50, and a unit step size was applied in the incremental trial process. It was observed that using different values of $\alpha$ for different speakers could further reduce the equal error rates. However, as an initial investigation we chose the same value for each gender set, that is, $\alpha = 10$ for the female cohort set and $\alpha = 1$ for the male cohort set. Table 2 shows the resulting equal error rates for the 16 speakers with codebook sizes of 32, 64 and 128 entries. For the female speakers, fuzzy fusion reduces the average EER by (6.08 - 3.50) = 2.58, (3.66 - 2.48) = 1.18, and (2.89 - 1.92) = 0.97 percentage points for codebook sizes of 32, 64 and 128, respectively. For the male speakers, the average EER is reduced by (5.65 - 4.89) = 0.76, (4.80 - 3.86) = 0.94, and (4.17 - 3.40) = 0.77 percentage points for the same codebook sizes. Over all 16 speakers, the average EER reductions for codebook sizes of 32, 64 and 128 are (5.87 - 4.20) = 1.67, (4.23 - 3.17) = 1.06, and (3.53 - 2.66) = 0.87 percentage points, respectively.
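The incremental trial over $\alpha$ is a plain grid search. In this sketch, `training_eer` is a hypothetical callable that would score the training tokens with a given $\alpha$ and return the EER; the per-group helper mirrors the choice of one $\alpha$ per gender set. Both function names are illustrative.

```python
from typing import Callable, Dict, Iterable, Sequence


def select_alpha(training_eer: Callable[[int], float],
                 candidates: Iterable[int] = range(1, 51)) -> int:
    """Return the candidate alpha with the lowest training EER (the first one wins ties)."""
    return min(candidates, key=training_eer)


def select_alpha_per_group(training_eer: Callable[[str, int], float],
                           groups: Iterable[str],
                           candidates: Sequence[int] = range(1, 51)) -> Dict[str, int]:
    """One alpha per group (e.g. per gender set), each chosen independently."""
    return {grp: select_alpha(lambda a, grp=grp: training_eer(grp, a), candidates)
            for grp in groups}
```

Allowing a different $\alpha$ for every cohort speaker, as the text suggests could help, turns this into a search over one $\alpha$ per speaker, which grows quickly with the cohort size; this is one motivation for the optimizer discussed in Section 6.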

These results show that, on this corpus, the speaker verification system using fuzzy fusion achieves lower average EERs than the system using the geometric mean method for all three codebook sizes.

6. Conclusions

A fusion algorithm based on the fuzzy integral has been proposed and implemented in the similarity normalization for speaker verification. The experimental results show that the proposed method outperforms the conventional normalization. The key difference between the two methods is that, owing to the concept of the fuzzy measure, the fuzzy integral-based normalization does not require the assumption of equal a priori probabilities. Indeed, applications of fuzzy measures and fuzzy integrals have been attracting great attention among researchers in the field of pattern recognition [21-24]. Two useful aspects of fuzzy measures are that the importance and the interaction of features are taken into account, and fuzzy integrals serve as a basis for modeling these representations [25]. For this speaker recognition problem, we interpret the importance of features as the similarity between the acoustic models of the cohort and client speakers. There are three kinds of interaction: redundancy, complementarity, and independence. Redundancy means that the scoring of the cohort models does not increase significantly when the joint similarity is not greater than the sum of the individual similarities. Complementarity is the converse: the scoring increases significantly when the joint similarity is greater than the sum of the individual similarities. Independence indicates that each similarity measure contributes independently (additively) to the total score.
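For the $g_\lambda$-measure used here, the interaction type is the same for every pair of disjoint subsets and follows from the sign of $\lambda$, which in turn depends only on whether the fuzzy densities sum to less than, more than, or exactly one (compare Eq. (9) and the $\lambda$-rule). A small helper, with a name introduced purely for illustration, makes the mapping explicit:

```python
from typing import Sequence


def interaction_type(densities: Sequence[float], tol: float = 1e-12) -> str:
    """Classify a lambda-measure by the sign of lambda, read off from sum(g_i).

    sum < 1 -> lambda > 0 -> g(A u B) > g(A) + g(B): complementarity
    sum > 1 -> lambda < 0 -> g(A u B) < g(A) + g(B): redundancy
    sum = 1 -> lambda = 0 -> additive measure: independence
    """
    total = sum(densities)
    if abs(total - 1.0) <= tol:
        return "independence"
    return "complementarity" if total < 1.0 else "redundancy"
```

The densities 0.34, 0.32 and 0.33 of the example in Section 3 sum to 0.99, so that measure is (mildly) complementary.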

The additional complexity of the proposed method lies in determining the fuzzy densities and computing the fuzzy integrals, which require more computational effort than the conventional method. However, the difference in computer running time between the two methods was found to be negligible. One important issue for further investigation is the optimal identification of the fuzzy densities, which can offer flexibility and has a great effect on the fuzzy fusion. At present, the fuzzy densities are determined from a rough estimate of the values of $\alpha$ over a small range of integers. One convenient and promising method for finding such a solution is optimization by genetic algorithms, which are random-search algorithms based on the principles of natural genetics and have attracted great attention as function optimizers. Using genetic algorithms, the fuzzy densities can be identified such that the error on the training data is minimized in the least-squares sense. Some typical similar problems in data fusion that have been successfully tackled by genetic algorithms can be found in Refs. [26,27].

References

[1] S. Furui, An overview of speaker recognition technology, Proceedings of Workshop on Automatic Speaker Recognition, Identification and Verification, Martigny, Switzerland, 1994, pp. 1-9.

[2] J.P. Campbell, Speaker recognition: a tutorial, Proc. IEEE 85 (1997) 1437-1462.

[3] G.R. Doddington, Speaker recognition evaluation methodology: an overview and perspective, Proceedings of Workshop on Speaker Recognition and its Commercial and Forensic Applications (RLA2C), Avignon, France, 1998, pp. 60-66.

[4] B.S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Am. 55 (1974) 1304-1312.

[5] S. Furui, Cepstral analysis techniques for automatic speaker verification, IEEE Trans. Acoust. Speech Signal Process. 29 (1981) 254-272.

[6] A.L. Higgins, L. Bahler, J. Porter, Speaker verification using randomized phrase prompting, Digital Signal Processing 1 (1991) 89-106.

[7] T. Matsui, S. Furui, Concatenated phoneme models for text-variable speaker recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, USA, 1993, pp. 391-394.

[8] G. Gravier, G. Chollet, Comparison of normalization techniques for speaker verification, Proceedings of Workshop on Speaker Recognition and its Commercial and Forensic Applications (RLA2C), Avignon, France, 1998, pp. 97-100.

[9] C.S. Liu, H.C. Wang, C.H. Lee, Speaker verification using normalized log-likelihood score, IEEE Trans. Speech Audio Process. 4 (1996) 56-60.

[10] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338-353.

[11] M. Sugeno, Fuzzy measures and fuzzy integrals: a survey, in: M.M. Gupta, G.N. Saridis, B.R. Gaines (Eds.), Fuzzy Automata and Decision Processes, North-Holland, Amsterdam, 1977, pp. 89-102.

[12] Z. Wang, G.J. Klir, Fuzzy Measure Theory, Plenum Press, New York, 1992.

[13] M. Grabisch, H.T. Nguyen, E.A. Walker, Fundamentals of Uncertainty Calculi with Applications to Fuzzy Inference, Kluwer Academic Publishers, Dordrecht, Netherlands, 1995.

[14] G. Banon, Distinction between several subsets of fuzzy measures, Fuzzy Sets and Systems 5 (1981) 291-305.

[15] K. Leszczynski, P. Penczek, W. Grochulski, Sugeno's fuzzy measure and fuzzy clustering, Fuzzy Sets and Systems 15 (1985) 147-158.

[16] H. Tahani, J.M. Keller, Information fusion in computer vision using the fuzzy integral, IEEE Trans. Systems Man Cybernet. 20 (1990) 733-741.

[17] T. Murofushi, M. Sugeno, An interpretation of fuzzy measure and the Choquet integral as an integral with respect to a fuzzy measure, Fuzzy Sets and Systems 29 (1989) 201-227.

[18] T.D. Pham, H. Yan, A kriging fuzzy integral, Inform. Sci. 98 (1997) 157-173.

[19] S.B. Cho, On-line handwriting recognition with neural-fuzzy method, Proc. IEEE FUZZ-IEEE/IFES'95, Yokohama, Japan, 1995, pp. 1131-1136.

[20] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. 28 (1980) 84-95.

[21] J.M. Keller, P. Gader, H. Tahani, J.H. Chiang, M. Mohamed, Advances in fuzzy integration for pattern recognition, Fuzzy Sets and Systems 65 (1994) 273-283.

[22] M. Grabisch, J.M. Nicolas, Classification by fuzzy integral: performance and tests, Fuzzy Sets and Systems 65 (1994) 255-271.

[23] S.B. Cho, J.H. Kim, Combining multiple neural networks by fuzzy integral for robust classification, IEEE Trans. Systems Man Cybernet. 25 (1995) 380-384.

[24] Z. Chi, H. Yan, T. Pham, Fuzzy Algorithms with Applications in Image Processing and Pattern Recognition, World Scientific, Singapore, 1996.

[25] M. Grabisch, The representation of importance and interaction of features by fuzzy measures, Pattern Recognition Lett. 17 (1996) 567-575.

[26] A.L. Buczak, R.E. Uhrig, Information fusion by fuzzy set operation and genetic algorithms, Simulation 65 (1995) 52-66.

[27] T.D. Pham, H. Yan, Fusion of handwritten numeral classifiers based on fuzzy and genetic algorithms, Proceedings North America Fuzzy Information Processing Society (NAFIPS)'97, New York, USA, 1997, pp. 257-262.

About the Author: TUAN D. PHAM received the B.E. degree (1990) in Civil Engineering from the University of Wollongong, and the Ph.D. degree (1995) in Civil Engineering, with a thesis on fuzzy-set modeling in the finite element analysis of engineering problems, from the University of New South Wales. From 1994 to 1995, he was a senior systems analyst with Engineering Computer Services Ltd, and from 1996 to early 1997 he was a post-doctoral fellow with the Laboratory for Imaging Science and Engineering in the Department of Electrical Engineering at the University of Sydney. From 1997 to 1998 he held a research fellow position with the Laboratory for Human-Computer Communication in the Faculty of Information Sciences and Engineering at the University of Canberra, and he is now a lecturer in the School of Computing in the same Faculty. He is a co-author of 2 monographs, and author or co-author of over 40 technical papers published in popular journals and conferences. His main research interests include the application of computational intelligence and statistical techniques to pattern recognition, particularly in image processing, speech and speaker recognition. Dr. Pham is a member of the IEEE.

About the Author: MICHAEL WAGNER received a Diplomphysiker degree from the University of Munich in 1973 and a Ph.D. in Computer Science from the Australian National University in 1979 with a thesis on learning networks for speaker recognition. Dr. Wagner has been involved in speech and speaker recognition research since then and has held research and teaching positions at the Technical University of Munich, National University of Singapore, University of Wollongong, University of New South Wales and the Australian National University. He was the Foundation President of the Australian Speech Science and Technology Association from 1986 to 1992 and is currently a professor and head of the School of Computing at the University of Canberra. Dr. Michael Wagner is a fellow of IEAust and a member of ASSTA, ESCA and IEEE.
