Ambiguity reduction in speaker identification by the relaxation labeling process

6
Pattern Recognition 32 (1999) 1249}1254 Ambiguity reduction in speaker identi"cation by the relaxation labeling process Tuan Pham*, Michael Wagner Faculty of Information Sciences and Engineering, University of Canberra , ACT 2601, Australia Received 25 November 1997; received in revised form 28 August 1998; accepted 28 August 1998 Abstract A nonlinear probabilistic model of the relaxation labeling (RL) process is implemented in the speaker identi"cation task in order to disambiguate the labeling of the speech feature vectors. In this proposed algorithm, the deterministic labeling of the vector quantization (VQ)-based speaker identi"cation is relaxed by means of introducing initial probabilistic weights to the labeling process of the speech feature vectors. This process is then iteratively updated until no further signi"cant improvement is found. Experimental results on speaker identi"cation using a commercial speech corpus show that the relaxation labeling outperforms the conventional VQ method. ( 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Relaxation labeling; Nonlinear probabilistic model; Speaker identi"cation; Speaker-based VQ codebook 1. Introduction Speaker recognition is one of the challenging areas of speech research and has many applications including telecommunications, robotics, security systems, database management, command and control, and others. Speaker recognition is a generic term which refers to the classi"ca- tion of speakers based on their speech characteristics. This general task can be subdivided into two categories: speaker identi,cation and speaker veri,cation. Speaker identi"cation is to assign an unknown or unlabeled voice token of a speaker to one of the reference speakers based on the closet measure of similarity, whereas speaker veri"cation is aimed to either accept or reject the identity of an unlabeled voice token claiming to belonging to a speci"c reference speaker. The key di!erence between speaker identi"cation and speaker veri"cation is the *Corresponding author. Tel.: 00 612 6201 2394; Fax: 00 612 6201 5231; E-mail: tuanp@ise.canberra.edu.au number of decision alternatives. The number of decision in speaker identi"cation is equal to the population size, and therefore its performance decreases with increasing population size. Speaker veri"cation, however, is una!ec- ted by the population size, as there are only two possible decisions based on an acceptance/rejection threshold. Both identi"cation and veri"cation categories are also involved in text-dependent and text-independent speech tokens. In a text-dependent speaker recognition, the speech tokens used for both training and testing have the same text; whereas in a text-independent mode, speech tokens in both training and testing do not subject to a speci"c text. Therefore, speaker recognition in a text-independent mode is more di$cult than that in a text-dependent mode since the recognizer must deal with more variabil- ity in the unlabeled and the reference tokens. In general, speaker veri"cation task is suitable for most applications due to its tractability. However, speaker identi"cation remains active in research as it is suitable for evaluating the performance of di!erent recognizers [1} 4]. 0031-3203/99/$20.00 ( 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 9 8 ) 0 0 1 3 4 - 4

Transcript of Ambiguity reduction in speaker identification by the relaxation labeling process

Page 1: Ambiguity reduction in speaker identification by the relaxation labeling process

Pattern Recognition 32 (1999) 1249}1254

Ambiguity reduction in speaker identi"cationby the relaxation labeling process

Tuan Pham*, Michael Wagner

Faculty of Information Sciences and Engineering, University of Canberra , ACT 2601, Australia

Received 25 November 1997; received in revised form 28 August 1998; accepted 28 August 1998

Abstract

A nonlinear probabilistic model of the relaxation labeling (RL) process is implemented in the speaker identi"cationtask in order to disambiguate the labeling of the speech feature vectors. In this proposed algorithm, the deterministiclabeling of the vector quantization (VQ)-based speaker identi"cation is relaxed by means of introducing initialprobabilistic weights to the labeling process of the speech feature vectors. This process is then iteratively updated until nofurther signi"cant improvement is found. Experimental results on speaker identi"cation using a commercial speechcorpus show that the relaxation labeling outperforms the conventional VQ method. ( 1999 Pattern Recognition Society.Published by Elsevier Science Ltd. All rights reserved.

Keywords: Relaxation labeling; Nonlinear probabilistic model; Speaker identi"cation; Speaker-based VQ codebook

1. Introduction

Speaker recognition is one of the challenging areas ofspeech research and has many applications includingtelecommunications, robotics, security systems, databasemanagement, command and control, and others. Speakerrecognition is a generic term which refers to the classi"ca-tion of speakers based on their speech characteristics.This general task can be subdivided into two categories:speaker identi,cation and speaker veri,cation. Speakeridenti"cation is to assign an unknown or unlabeled voicetoken of a speaker to one of the reference speakers basedon the closet measure of similarity, whereas speakerveri"cation is aimed to either accept or reject the identityof an unlabeled voice token claiming to belonging toa speci"c reference speaker. The key di!erence betweenspeaker identi"cation and speaker veri"cation is the

*Corresponding author. Tel.: 00 612 6201 2394; Fax: 00 6126201 5231; E-mail: [email protected]

number of decision alternatives. The number of decisionin speaker identi"cation is equal to the population size,and therefore its performance decreases with increasingpopulation size. Speaker veri"cation, however, is una!ec-ted by the population size, as there are only two possibledecisions based on an acceptance/rejection threshold.Both identi"cation and veri"cation categories are alsoinvolved in text-dependent and text-independent speechtokens. In a text-dependent speaker recognition, the speechtokens used for both training and testing have the sametext; whereas in a text-independent mode, speech tokensin both training and testing do not subject to a speci"ctext. Therefore, speaker recognition in a text-independentmode is more di$cult than that in a text-dependentmode since the recognizer must deal with more variabil-ity in the unlabeled and the reference tokens. In general,speaker veri"cation task is suitable for most applicationsdue to its tractability. However, speaker identi"cationremains active in research as it is suitable for evaluatingthe performance of di!erent recognizers [1}4].

0031-3203/99/$20.00 ( 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.PII: S 0 0 3 1 - 3 2 0 3 ( 9 8 ) 0 0 1 3 4 - 4

Page 2: Ambiguity reduction in speaker identification by the relaxation labeling process

There are a number of techniques for speaker identi-"cation such as the speaker-based VQ codebook ap-proach, dynamic time warping, second-order statisticalmethod, discrete hidden Markov models, neural net-works, and others [2}3,5}7]. Among these, the VQ code-book approach is one of the most popular methodsimplemented in many speaker recognition systems as itrelieves the computational complexity and o!ers goodperformance. While the VQ approach has been com-monly used for speech and speaker recognition, it is notalways e!ective because the ambiguity inherently existingin the labeling of speech input tokens is treated in anin#exible way by its deterministic rules. Basing our mo-tivation on this reason, we propose an improved algo-rithm over the speaker-based VQ codebook approachusing the relaxation labeling in which the deterministicclassi"cation of the VQ-based approach is only an initialprocess of the probabilistic labeling. Results from thisinitial labeling will then be updated until convergenceis reached. Relaxation labeling was "rst introduced byRosenfeld et al. [8] to solve problems in pattern recogni-tion. It is a parallel algorithm that updates the probabilit-ies of labels or classes by using interactive informationbetween unknown objects with respect to the referencelabels to reduce uncertainty among labels having inter-changing properties. Since being "rst developed to tackleproblems in image analysis, its #exible framework hasattracted many researchers in the broad "eld of patternrecognition [9}13].

In this paper we apply the framework of relaxationlabeling to dealing with the VQ-based text-dependentspeaker identi"cation. The rest of this paper is organizedas follows. In Section 2 we brie#y review the standardspeaker-based VQ codebook approach, then describe theprocedures of relaxation labeling and how the task ofspeaker identi"cation can be modeled with its algorithmin Sections 3 and 4, respectively. Experiments aregiven in Section 5 to compare the performance of theproposed approach with the conventional VQ-basedmethod. Finally, conclusions about the present workand suggestion for further investigation are made inSection 6.

2. Speaker-based VQ codebook approach

For speaker identi"cation based on VQ codebookapproach [4], the codebook for each speaker is gene-rated by partitioning a set of training feature vectorsMv

1, v

2,2 , v

TN into the feature vector space

MS1, S

2,2 , S

JN, and each partition S

jis represented by

a centroid vector bj. Thus, there are N codebooks gener-

ated for N speakers. In the testing phase, the distortionbetween a set of testing feature vectors Mv

1, v

2,2 , v

LN

and each codebook is to be measured, then an average

distortion Dito the ith codebook is taken, that is

Di"

1

¸

L+l/1

minj

d (vl, b

j), j"1, 2,2 , J, (1)

where d (vl, b

j) is the distortion measure (usually a Euclid-

ean distance) between two vector vland b

j.

The recognized speaker i* is then decided by taking theminimum of the N resultant average distortion measures:

i*"arg mini

Di, i"1, 2,2 , N. (2)

3. Relaxation Labeling

The relaxation labeling algorithms consist of fourmodels when "rst introduced by Rosenfeld et al. [8] }discrete, fuzzy, linear probabilistic, and nonlinear prob-abilistic models. However, the last model o!ers the bestperformance for classi"cation problems and due to thisreason, only the nonprobabilistic relaxation model isstudied here for the speaker identi"cation. Its algorithmis described as follows.

Let a set of objects A"Ma1, a

2,2 , a

nN and a set of

labels ""Mj1, j

2,2, j

mN. An initial probability is given

to each object aihaving each label j, which is denoted as

pi(j). These probabilities satisfy the following condition:

+j|"

pi(j)"1, ∀a

i3A, 0)p

i(j))1. (3)

The relaxation labeling updates the probabilities pi(j)

in Eq. (3) using a set of compatibility coe$cients rij(j, j@),

where rij(j, j@ ) : "]">[!1, 1], whose magnitude de-

notes the strength of compatibility. The meaning of thesecompatibility coe$cients can be interpreted as follows:

rij(j, j@)G

(0: j, j@ are incompatible for ai

and aj,

"0: j, j@ are independent for ai

and aj,

'0: j, j@ are compatible for ai

and aj.

(4)

The updating factor for the estimate pi(j) at kth iter-

ation is

q(k)i

(j)"+j

dijC+j@

rij(j, j@ ) p(k)

j(j@)D , (5)

where dij

are the parameters that weight the contribu-tions to a

icoming from its neighbors a

j, and subject to

+j

dij"1. (6)

The updated probability p(k`1)i

(j) for object aiis given

by

p(k`1)i

(j)"p(k)i

(j) [1#q(k)i

(j)]

+jp(k)i

(j) [1#q(k)i

(j)]. (7)

1250 T. Pham, M. Wagner / Pattern Recognition 32 (1999) 1249}1254

Page 3: Ambiguity reduction in speaker identification by the relaxation labeling process

Thus, the iterative process given by Eqs. (5) and (7)establish the relaxation labeling, and it is stopped whenconvergence is achieved. It now becomes clear that fora successful performance of the relaxation process, theinitial label probabilities and the compatibility coe$-cients need to be well determined. Wrong estimates ofthese parameters will lead to algorithmic instabilities.

Two possible methods for computing the compatibilitycoe$cients are based on those developed by Peleg andRosenfeld [11]. The two methods employ the concepts ofstatistical correlation and mutual information. The cor-relation-based estimate of the compatibility coe$cients isde"ned by

rij(j, j@)"

+i[p

i(j)!pN (j)] [p

j(j@ )!pN (j@ )]

p (j)p(j@), (8)

where pj(j@) is the probability of a

jhaving label j@, and

ajbe the neighbors of a

i, pN (j@) is the mean of p

j(j@ ) for all

aj, and p(j@ ) is the standard deviation of p

j(j@). To allevi-

ate the e!ect of dominance among labels, the modi"edcoe$cients are

r*ij(j, j@)"[1!pN (j)][1!pN (j@)] r

ij(j, j@). (9)

The mutual-information-based estimate of the com-patibility coe$cients is

rij(j, j@)"log

n+ipi(j) p

j(j@ )

+ipi(j) +

ipi(j@)

, (10)

where, for the present problem, n is the number of featurevectors of an unknown speaker. The compatibility coe$-cients in Eq. (10) are divided by 5 in order to take valuesin the range [!1, 1].

4. Relaxation labeling for speaker identi5cation

For the classi"cation of speech samples from an un-known speaker to the best "t out of a population ofspeakers, some sets of feature vectors characterizing thevariabilities of di!erent speakers are likely to overlap;therefore, in the spirit of relaxation labeling, each featurevector is considered as an object a

i3A where A is a set of

feature vectors of a speaker, and each speaker is con-sidered as a label j in the speaker population ". We willdiscuss how to estimate the initial probabilities, how toe!ectively implement the relaxation labeling process byRosenfeld et al. [8] for the problem of speaker identi"ca-tion, and we also outline this proposed algorithm in thefollowing subsections.

4.1. Estimation of initial probabilities

Using the VQ distortion measures, the initial probabil-ity that expresses the local measurement of a vector

aibelonging to a speaker j can be estimated as

pi(j)(0)"

exp(!Dij)

+j exp(!Dij)

(11)

in which Dij is the minimum distortion measure between

aiand the set of codewords of speaker j, that is

Dij"min

k

[D(ai, b

k(j)], k"1, 2,2 , K,

where K is the codebook size.

4.2. Implementation of the RL process for speakeridentixcation

In the case of image analysis, it is important to con-sider the con"dence contributions from pixels lying in theneighborhood of a pixel, as its m-connected neighboringpixels may belong to di!erent regions. Therefore, theresulting weight of the pixel is strongly a!ected by thecon"dence weights of its neighborhood. However, forspeech analysis, it is known that the set of speech featurevectors Ma

iN (each vector a

iis equivalent to an image

pixel) is to belong to a certain speaker j. Therefore, thereis no need to consider the contributions of its adjacentvectors. On the other hand, we only consider the com-patibility of the vector Ma

iN itself with respect to speaker

j and speaker j@. With this argument, the original com-patibility coe$cients as de"ned in Eq. (4) can be ex-pressed in another form as

rii(j, j@)"r

i(j, j@) G

(0 : j, j@ are incompatible for ai,

"0 : j, j@ are independent for ai,

'0 : j, j@ are compatible for ai.

(12)

Following the above reason, the updating factor forthe estimate p

i(j) at kth iteration is rewritten as follows:

q(k)i

(j)"C+j@ri(j, j@) p(k)

i(j@)D , (13)

where the summation of dij

as de"ned in Eq. (5) is nowomitted as the contributions from the adjacent vectorsare not considered.

We assume that the majority of the speech featurevectors a

iwell belong to the speaker j, i.e. the amount of

the feature vectors having overlapping properties is lessthan that of the feature vectors having more distinctiveproperties. If this assumption is true, then the compatibil-ity coe$cients r

i(j, j@) tend to be negative as j and j@ are

incompatible for MaiN. This also leads to a negative value

for the updating factor q(k)i

(j) in Eq. (13), which is de"nedin terms of the compatibility coe$cients. From thisstandpoint, if Eq. (7) is used for updating the probability,then the con"dence for a distinctive or overlapping vec-tor a

ibelonging to the speaker j will be decreased or

T. Pham, M. Wagner / Pattern Recognition 32 (1999) 1249}1254 1251

Page 4: Ambiguity reduction in speaker identification by the relaxation labeling process

increased instead of being increased or decreased, respec-tively. Therefore, the plus sign in Eq. (7) should becomea minus sign, that is

p(k`1)i

(j)"p(k)i

(j) [1!q(k)i

(j)]

+j p(k)i

(j) [1!q(k)i

(j)]. (14)

We rewrite the computations of the compatibility coef-"cients according to Eq. (12) as follows. For the correla-tion-based estimate of the compatibility coe$cients:

ri(j, j@)"

+i[p

i(j)!pN (j)] [p

i(j@)!pN (j@)]

p (j)p(j@ ), (15)

where pi(j@) is the probability of a

ihaving label j@, pN (j@)

is the mean of pi(j@) for all a

i, and p(j@) is the standard

deviation of pi(j@ ). And the modi"ed coe$cients becomes

r*i(j, j@)"[1!pN (j)] [1!pN (j@)] r

i(j, j@). (16)

Finally, the mutual-information-based estimate of thecompatibility coe$cients is now rewritten as

ri(j, j@ )"log

n+ipi(j) p

i(j@)

+ipi(j) +

ipi(j@ )

. (17)

4.3. Outline of relaxation labeling algorithm for speakeridentixcation problem

The relaxation labeling for speaker identi"cation canbe outlined as follows:

1. Estimate the initial probabilities for each unknownfeature vector using Eq. (11).

2. Compute the compatibility coe$cients using eitherEq. (16) or Eq. (17).

3. Calculate the updating factor de"ned in Eq. (13).4. Update the probabilities for each unknown vector

using Eq. (14).5. Repeat steps 3 and 4 until the change of the probabilit-

ies is less than a prescribed threshold or equal to aprescribed number of iterations.

6. The total probability of the set of vectors for eachspeaker j is P(j)"%

ipi(j).

7. Deterministic identi"cation for the best guessedspeaker j* can now be carried out using the followingclassi"cation rule:

j*"arg maxj

P (j)

5. Experiments on speaker identi5cation

Both VQ codebook approach and relaxation labeling(RL) are simulated and tested with a set of computercommands from the TI46 speech data corpus. The TI46corpus contains 46 utterances spoken repeatedly by eightfemale and eight male speakers, labeled f1}f8 andm1}m8, respectively. The vocabulary contains a set of 10

Table 1Speaker identi"cation rates (%) by VQ approach and RLalgorithms with codebook size of 32

Speaker VQ RL1 RL2

f1 95.62 95.62 93.12f2 98.75 98.75 98.75f3 84.38 87.50 83.12f4 98.75 98.12 98.75f5 100 99.38 99.38f6 98.75 98.12 97.50f7 95.62 91.88 90.00f8 96.25 97.50 96.88m1 79.61 90.79 91.45m2 78.12 94.38 96.88m3 99.36 100 99.36m4 93.55 94.84 91.61m5 96.18 94.90 95.54m6 40.88 69.18 83.65m7 48.12 76.25 83.12m8 55.62 77.50 72.50

Average 84.98 91.55 91.98

computer commands: Menter, erase, go, help, no, rubout,repeat, stop, start, yesN. Each speaker repeated the words10 times in a single training session, and then again twicein each of eight testing sessions. The corpus is sampled at12 500 samples/s and 12 bits/sample. The data were pro-cessed in 20.48 ms frames at a frame rate at 125 frames/s.The frames were Hamming windowed and preempha-sized with k"0.9. 46 mel-spectral bands of a width of110 mel and 20 mel-frequency cepstral coe$cients(MFCC) were determined for each frame. In the trainingsession, using the LBG algorithm [14], each speaker's100 training tokens (10 utterances]1 training session]10 repetitions) were used to train the speaker-basedVQ codebook by clustering the set of all the speakers'MFCC into codebooks of 32, 64 and 128 codewords.

The speaker identi"cation was tested in the text-dependent mode. Each speaker's 160 test tokens (10utterances]8 testing sessions]2 repetitions) were testedagainst all speakers' 10-word models. For the codebookof 32 entries, the average recognition rates for speakeridenti"cation are shown in Table 1, where the totalaverage rate for: VQ"84.98%, RL1"91.55% (RL us-ing correlation-based compatibility coe$cients be de-noted as RL1), and RL2"91.98% (RL using mutual-information-based compatibility coe$cients be denotedas RL2). For the codebook of 64 entries, the averagerecognition rates for speaker identi"cation are (Table 2):VQ"89.00%, RL1"94.03%, and RL2"94.26%. Fi-nally, for the codebook of 128 entries, the average recog-nition rates for speaker identi"cation are (Table 3):VQ"91.28 %, RL1"96.02 %, and RL2"96.65 %.

1252 T. Pham, M. Wagner / Pattern Recognition 32 (1999) 1249}1254

Page 5: Ambiguity reduction in speaker identification by the relaxation labeling process

Table 2Speaker identi"cation rates (%) by VQ approach and RLalgorithms with codebook size of 64

Speaker VQ RL1 RL2

f1 95.62 97.50 96.25f2 100 99.38 98.75f3 91.25 88.12 86.25f4 99.38 99.38 99.38f5 100 100 100f6 100 100 99.38f7 96.25 96.25 94.38f8 98.75 100 99.38m1 76.97 92.76 92.76m2 88.75 96.25 97.50m3 99.36 100 99.36m4 98.06 98.71 97.42m5 98.09 96.18 95.54m6 40.88 74.84 84.91m7 73.75 86.88 90.62m8 66.88 78.12 76.25

Average 89.00 94.03 94.26

Table 3Speaker identi"cation rates (%) by VQ approach and RLalgorithms with codebook size of 128

Speaker VQ RL1 RL2

f1 97.50 98.75 98.75f2 100 100 100f3 94.38 94.38 94.38f4 100 100 100f5 100 100 100f6 100 100 100f7 98.12 96.88 98.75f8 98.75 99.38 99.38m1 78.29 92.76 93.42m2 90.62 97.50 97.50m3 99.36 100 100m4 99.35 99.35 98.71m5 99.36 100 100m6 51.57 77.36 84.28m7 77.50 92.50 93.12m8 75.00 88.75 88.12

Average 91.28 96.10 96.65

It is observed that for the three codebook sizes bothVQ and RL methods give similar results when the recog-nition rates are high as in the case of the female speakers(f1}f8). However, both RL1 and RL2 signi"cantly im-prove the results when the VQ approach yields the lowrecognition rates as it can be seen in the case of the malespeakers (m1, m2, m6}m8). As the relaxation labelingtechnique was "rst proposed to deal with ambiguity andnoise in computer vision systems, better results can beobtained by revising the inconsistencies between objectshaving interchanging properties. For the present prob-lem, possible reasons for the relatively highly improved

results in the set of the male speakers are that the maleacoustic feature vectors may be subjected to more noiseor/and the degree of overlapping between the acousticproperties of the male speakers is more than that of thefemale speakers. Moreover, when high identi"cationrates for both set of speakers are already given by the VQthen the power of the RL may reach its limit for updatinglabel inconsistencies.

In comparison between the relaxation labeling usingthe correlation-based (RL1) and the mutual-information-based (RL2) compatibility functions, the total averageidenti"cation results for the three codebook sizes ob-tained from RL2 are only slightly higher than those fromRL1. Tables 1}3 also show the individual identi"cationrates for the 16 speakers with three di!erent codebooksizes using the VQ and the relaxation labeling (RL1 andRL2) algorithms. The convergence of both RL1 and RL2are obtained about 25 iterative steps, therefore the com-puter running time is not a problem for implementation.Based on the present experimental results, it can beconcluded that speaker identi"cation using the relax-ation labeling outperforms the VQ approach.

6. Conclusions

A relaxation labeling algorithm has been presented forsolving classi"cation problem in the speaker identi"ca-tion task. The #exibility embedded in the framework ofrelaxation labeling as well as the improved experimentalresults appear to be promising as a new approach forspeech research including the task of speaker veri"cationas the convergence property of the relaxation processeswill certainly ease the veri"cation decision based on anaccept/reject threshold, i.e. improves the speaker separ-ability. This is also under our present investigation. Evensuch promising results have been presented, what hasbeen discussed here is an early step of applying therelaxation algorithms to speaker recognition, thereforefurther study with other proposed relaxation methods[9,15}17] should be encouraged in order to fully explorethe power of the relaxation labeling that can o!er to the"eld of speech and speaker recognition.

7. Summary

A nonlinear probabilistic relaxation labeling forspeaker identi"cation is presented in this paper. Thisrelaxation scheme, which is an iterative and parallelprocess, o!ers a #exible and e!ective framework for deal-ing with uncertainty inherently existing in the labelingof the speech feature vectors. Basic concepts andformulations of the relaxation algorithms are outlined.We then discuss how to model the relaxation scheme tothe labeling of the speech feature vectors for the speaker

T. Pham, M. Wagner / Pattern Recognition 32 (1999) 1249}1254 1253

Page 6: Ambiguity reduction in speaker identification by the relaxation labeling process

About the Author*TUAN D. PHAM received the B.E. degree (1990) in civil engineering from the University of Wollongong, and thePh.D. degree (1995) in civil engineering, with a thesis on fuzzy-set modeling in the "nite element analysis of engineering problems, fromthe University of New South Wales. From 1994 to 1995, he was a senior systems analyst with Engineering Computer Services Ltd, andfrom 1996 to early 1997 he was a post-doctoral fellow with the Laboratory for Imaging Science and Engineering in the Department ofElectrical Engineering at the University of Sydney. He is now a research fellow and lecturer in the Faculty of Information Sciences andEngineering at the University of Canberra. He has published many journal papers in the areas of fuzzy systems, geomechanics,mathematical geology, speech and image processing, and information fusion. He is coauthor of the mini-monograph Fuzzy ¸ogicApplied to Numerical Modelling of Engineering Problems (North-Holland, 1995) in the series of Computational Mechanics Advances withS. Valliappan, and the book Fuzzy Algorithms:=ith Applications to Image Processing and Pattern Recognition (World Scienti"c, 1996)with Z. Chi and H. Yan. His main research interests include the applications of computational intelligence and statistical techniques toimage processing, speech and speaker recognition. Dr. Pham is a member of the IEEE.

About the Author*MICHALE WAGNER received a Diplomphysiker degree from the University of Munich in 1973 and a Ph.D. inComputer Science from the Australian National University in 1979 with a thesis on learning networks for speaker recognition.Dr Wagner has been involved in speech and speaker recognition research since and has held research and teaching positions atthe Technical University of Munich, National University of Singapore, University of Wollongong, University of New South Wales andthe Australian National University. He was the Foundation President of the Australian Speech Science and Technology Associationfrom 1986 to 1992 and is currently a professor and head of the School of Computing at the University of Canberra. Dr. Michael Wagneris a fellow of IEAust and a member of ASSTA, ESCA and IEEE.

identi"cation task. The implementation is tested on acommercial speech corpus TI46. The results usingseveral codebook sizes obtained from the proposed ap-proach are more favorable than those from the conven-tional VQ (vector quantization)-based method.

Acknowledgements

The authors would like to thank Dat Tran for hisassistance in computer programming, and the referees fortheir constructive comments.

References

[1] G.R. Doddington, Speaker recognition } identifyingpeople by their voices, Proc. IEEE 73 (1985) 1651}1664.

[2] S. Furui, An overview of speaker recognition technology,Proc. ESCA Workshop on Automatic Speaker Recogni-tion Identi"cation and Veri"cation, Martigny, Switzerland1994, pp. 1}9.

[3] J. He, L. Liu, G. Palm, A text-independent speaker identi-"cation system based on neural networks, Proc. Int. Conf.Spoken Language Processing (ICSLP94), Yokohama, Ja-pan, 1994, pp. 1851}1854.

[4] F.K. Soong, E. Rosenberg, B.H. Juang, L.R. Rabiner,A vector quantization approach to speaker recognition,AT & T Tech. J. 66 (1987) 14}26.

[5] D. Genoud, F. Bimbot , G. Gravier, G. Chollet, Combin-ing methods to improve speaker veri"cation decision,Proc. 4th Int. Conf. on Spoken Language Processing, PA,USA, 1996, pp. 1756}1759.

[6] E.S. Parris, M.J. Carey, Discriminative phonemes forspeaker identi"cation, Proc. Int. Conf. Spoken LanguageProcessing (ICSLP94), Yokohama, Japan, 1994, pp.1843}1846.

[7] M. Wagner, Combined speech-recognition/speaker-veri"-cation system with modest training requirements, Proc 6thAustralian Int Conf. on Speech Science and Technology,Adelaide, Australia, 1996, pp. 139}143.

[8] A. Rosenfeld, R.A. Hummel, S.W. Zucker, Scene labelingby relaxation operations, IEEE Trans. Systems ManCybernet. 6 (1976) 420}433.

[9] Q. Chen, J.Y.S. Luh, Ambiguity reduction by relaxationlabeling, Pattern Recognition 27 (1995) 1705}1722.

[10] Q. Chen, J.Y.S. Luh, Relaxation labeling algorithm forinformation integration and its convergence, Pattern Rec-ognition 28 (1995) 1705}1722.

[11] S. Peleg, A. Rosenfeld, Determining compatibility coe$-cients for curve enhancement relaxation processes, IEEETrans. Systems Man Cybernet. 8 (1978) 548}555.

[12] J.G. Postaire, S. Olejnik, A relaxation scheme for improv-ing a convexity based clustering method, Pattern Recogni-tion Lett. 15 (1994) 1211}1221.

[13] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysisand Machine Vision, Chapman & Hall, Cambridge, 1993.

[14] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vectorquantization, IEEE Trans. Commun. 28 (1980) 84}95.

[15] H. Ogawa, A fuzzy relaxation technique for partialshape matching, Pattern Recognition Lett. 15 (1994)349}355.

[16] L. Pelkowitz, A continuous relaxation labeling algorithmfor Markov random "elds, IEEE Trans. Systems ManCybernet. 20 (1990) 709}715.

[17] S.W. Zucker, E. Krishnamurthy, R. Haar, Relaxation pro-cesses for scene labeling: convergence, speed, and stability,IEEE Trans. systems Man and Cybernet. 8 (1978) 41}48.

1254 T. Pham, M. Wagner / Pattern Recognition 32 (1999) 1249}1254