Recognition of spoken and spelled proper names

Reporter : CHEN, TZAN HWEI

Authors : Michael Meyer, Hermann Hild

2

Outline

Introduction

Experiments

Summary

3

Introduction

The recognition of increasingly large sets of spoken names is difficult: very large recognition vocabularies contain many easily confused words, or even homophones.

This paper compares the performance of proper name recognition when a name is spoken only, spelled only, or both spoken and spelled.

4

Introduction (cont)

In what contexts do people speak and spell names?

Table 1: Three scenarios for speaking and spelling a proper name

5

Experiments

Speech data : A database of about 2800 German last names, spoken by 57 different speakers according to scenario 2.

Recorded with a close-talking microphone at a sampling rate of 16 kHz.

The boundaries between all spoken and spelled names were identified in order to conduct the scenario 1 experiments.

6

Experiments (cont)

The pronunciation dictionary covers about half of the 2800 names of our speech data.

The set of 1337 spoken and spelled names is used in all the experiments described below.

For the experiments, we use an MS-TDNN as a specialized letter recognizer and the LVCSR of the JANUS system as a spoken name recognizer.

7

Experiments (cont)

JANUS : 60.0% names correct on the test set of the 1337 spoken last names.

When JANUS is used to recognize the spelled names, it achieves 93.3% names correct.

MS-TDNN : 96.5% names correct on the spelled names of the test set.

8

Experiments (cont)

Small list : We assume that the list of names to be recognized is small enough that every name can be explicitly represented in the dictionary.

How can we combine the different information provided by the spoken and spelled names?

9

Experiments (cont)

After all, the pronunciations of the spelled letters approximate the sounds of the letters in the fluently spoken word, e.g. “TOM” versus “T-O-M”.

Exceptions are letters with unusual pronunciations and letter combinations which define their own pronunciation, such as “sch” and “ch”.

In the following, we combine the two representations on the basis of their acoustic scores only.

10

Experiments (cont)

Scenario 1

Let $Y_L(i)$ be the score of a spelled name $i$ found in the N-best list of the MS-TDNN letter recognizer, and $Y_F(i)$ be the score of the same name in the N-best list of fluently spoken names as found by JANUS.

The position of name $i$ in the combined N-best list is determined by its new score

$$Y_{FL}(i) = \alpha \, Y_L(i) + (1 - \alpha) \, Y_F(i) \qquad (1)$$

As the factor $\alpha$ is close to 1, an insignificant improvement of 0.5% absolute was observed.

However, if the N-best list for the fluently spoken names is computed only for those names which were already found in the N-best list of the spelled names, it results in a recognition rate of 97.7% correct names at $\alpha = 0.96$.
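The α-weighted combination of the two N-best lists in scenario 1 can be illustrated with a minimal Python sketch. It assumes that both recognizers return comparable, higher-is-better scores and that α weights the spelled-name score; the function name `combine_nbest`, the dictionary representation of an N-best list, and the penalty for names missing from one list are all hypothetical choices made for illustration, not details taken from the paper.

```python
def combine_nbest(spelled, spoken, alpha=0.96, restrict_to_spelled=True):
    """Rank names by alpha * Y_L(i) + (1 - alpha) * Y_F(i).

    spelled: dict mapping name -> Y_L score (letter recognizer N-best list)
    spoken:  dict mapping name -> Y_F score (spoken-name recognizer N-best list)
    Scores are assumed to be comparable and higher-is-better.
    """
    if restrict_to_spelled:
        # Only consider names already present in the spelled N-best list.
        candidates = set(spelled)
    else:
        candidates = set(spelled) | set(spoken)

    # Hypothetical penalty for a name that is missing from one of the lists:
    # slightly worse than the worst score seen in either list.
    floor = min(list(spelled.values()) + list(spoken.values())) - 1.0

    combined = {
        name: alpha * spelled.get(name, floor)
        + (1 - alpha) * spoken.get(name, floor)
        for name in candidates
    }
    # Best combined score first.
    return sorted(combined, key=combined.get, reverse=True)
```

With `alpha=1.0` the ranking reduces to that of the spelled list alone, which is one way to see why a weight close to 1 changes the result only slightly.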

11

Experiments (cont)

Scenario 2

With 86.1% correct, recognition on the entire utterance is worse than on the spelled part alone.

A similar approach to that of scenario 1 can be adopted. The boundary of the first-best hypothesis was used for the weighting of all hypotheses, resulting in a recognition rate of 89.1%.

Incorporating the MS-TDNN letter recognizer results in a recognition rate of 95.8%.

12

Experiments (cont)

Figure 1: % names correct for an α-weighted combination of the N-best lists of spoken and spelled names (scenarios 1 and 2)

13

Experiments (cont)

Large lists

If the number of names exceeds the recognizer’s maximum vocabulary size, a different approach has to be taken.

A two-step approach is employed. A coarse recognition run is used to get a reduced list of name candidates. These candidates are then processed in a second run, in which all the previously described techniques for small word lists can be applied.
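The two-step idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the coarse run is replaced here by plain string similarity from the standard library (`difflib.get_close_matches`), which is a stand-in and not the recognizer used in the paper, and `rescore` is a hypothetical callback representing whatever small-list technique is applied in the second run.

```python
import difflib


def coarse_shortlist(letter_hypothesis, all_names, size=50, cutoff=0.5):
    """First pass: reduce a large name list to the most similar candidates.

    String similarity stands in for the coarse recognition run (assumption).
    """
    return difflib.get_close_matches(
        letter_hypothesis, all_names, n=size, cutoff=cutoff
    )


def two_pass(letter_hypothesis, all_names, rescore, size=50):
    """Second pass: apply a small-list scoring function to the shortlist.

    rescore: hypothetical callable mapping a name to a higher-is-better score.
    Returns the best candidate, or None if the shortlist is empty.
    """
    candidates = coarse_shortlist(letter_hypothesis, all_names, size)
    if not candidates:
        return None
    return max(candidates, key=rescore)
```

The shortlist size trades speed against the risk of dropping the correct name before the second run ever sees it.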

14

Experiments (cont)

In the case of scenario 1, the list of candidates can be easily reduced if only the spelled names are considered in the first pass.

For scenario 2, JANUS’s recognition vocabulary contains only phonemes and letters.

15

Experiments (cont)

For scenario 2, a special language model is employed.

A list of the most similar names can be retrieved and then used in another JANUS recognition run.

The letter segments are then re-recognized with the MS-TDNN.

16

Experiments (cont)

Table 3: Summary of results for the separated and combined recognition of fluently spoken and spelled last names

17

Summary

By combining the N-best lists of both the spoken and spelled recognition, the overall performance can be improved.

An input of either L or FL can be distinguished with almost 99% accuracy, resulting in 95.5% names correct without knowing a priori whether L or FL was spoken.

Caller Identification from Spelled-Out Personal Data Over the Telephone

Reporter : CHEN, TZAN HWEI

19

Outline

Introduction

The personal identification algorithm

Tests and results

Conclusions

20

Introduction

The problem of automatically identifying the caller in a telephone conversation from the information spoken in the call is extremely difficult.

The identification must take place despite rather substantial speech recognition errors that may be made by the machine.

21

Introduction (cont)

We can find a solution to the problem if we make two assumptions.

We assume that there is a database of records containing personal information about the caller, which can serve as a reference during the identification process.

We ask the caller to spell the personal identifying items, so that the spoken vocabulary is small and we can look for correlations with the items in the database.

22

The personal identification algorithm

Fig 1. The algorithm for personal identification from spelled tokens

23

The personal identification algorithm (cont)

A Bayesian computation starts with an estimate, for each record in $S$, of the probability that the record represents the identity of the caller.

It uses the acquired information $T$ and updates each record’s probability that it corresponds to the current caller’s identity.

The incremental computation is

$$P(\mathrm{record} \mid T) = \frac{P(T \mid \mathrm{record}) \, P(\mathrm{record})}{P(T)}$$

24

The personal identification algorithm (cont)

Bayesian update of probabilities

Let $S$ be the set of records that has been cached.

Let $f_i$ be the value returned by the ASR for the $i$-th field recognized, and let $f_{ij}$ be the $j$-th character of $f_i$.

Let $t_i$ be the value of the $i$-th field of the cached record $t$, and let $t_{ij}$ be the $j$-th character of $t_i$.

For each field $f_i$ that has been recognized
    For each character $f_{ij}$ in $f_i$
        For each record $t$ in $S$
            $P'(t) = \frac{CM(f_{ij}, t_{ij}) \, P(t)}{P(f_{ij})}$

The update can be derived as follows:

$$P'(t) = P(t \mid f_{ij}) = \frac{P(f_{ij} \mid t_{ij}) \, P(t)}{P(f_{ij})} = \frac{CM(f_{ij}, t_{ij}) \, P(t)}{P(f_{ij})}$$
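The character-by-character Bayesian update can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that CM is stored as a dictionary keyed by (recognized character, true character) and read as P(f | t), as the derivation suggests; that P(f_ij) is obtained by summing over the cached records, so the probabilities within S stay normalized; and that unseen character pairs receive a small floor probability. The function name and all parameter names are hypothetical.

```python
def update_posteriors(priors, records, field_index, recognized, cm, floor=1e-6):
    """Update P(t) for every cached record t, one recognized character at a time.

    priors:      dict record_id -> current probability P(t)
    records:     dict record_id -> tuple of field values (strings)
    field_index: index i of the field that was just recognized
    recognized:  string f_i returned by the ASR for that field
    cm:          dict (recognized_char, true_char) -> P(f | t)  (assumed form)
    floor:       hypothetical probability for pairs missing from cm
    """
    post = dict(priors)
    for j, f_char in enumerate(recognized):
        likelihood = {}
        for rid in post:
            value = records[rid][field_index]
            # A record whose field is shorter than the recognized string
            # has no j-th character; it falls back to the floor probability.
            t_char = value[j] if j < len(value) else None
            likelihood[rid] = cm.get((f_char, t_char), floor)
        # P(f_ij), computed over the cached set S (assumption).
        evidence = sum(post[rid] * likelihood[rid] for rid in post)
        post = {rid: post[rid] * likelihood[rid] / evidence for rid in post}
    return post
```

Because every character only rescales the current probabilities, a single misrecognized character lowers the correct record's probability without eliminating it, in contrast to an exact-match lookup.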

25

Tests and results

The system was tested using a database of one million records that was constructed by using random combinations of 4,375 female first names, 1,129 male first names, and 88,799 last names.

The account numbers were generated so that the values for the last four digits of the number occurred with equal frequency throughout the database.

The city, postal code, and phone number fields were generated to correspond to locations in the U.K.
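A synthetic database of this kind could be generated along the following lines. This is a rough sketch under assumptions: the record layout, the eight-digit account number format, and the function name `make_records` are hypothetical, and the city, postal code, and phone number fields are left out. The last four digits cycle through 0000 to 9999, so every value occurs equally often whenever the record count is a multiple of 10,000.

```python
import random


def make_records(first_names, last_names, count, seed=0):
    """Build `count` records from random name combinations.

    The account number is a hypothetical 8-digit string: four random digits
    followed by four digits that cycle through 0000..9999.
    """
    rng = random.Random(seed)  # seeded so the database is reproducible
    records = []
    for k in range(count):
        last4 = k % 10000            # equal frequency across the database
        prefix = rng.randrange(10000)
        records.append(
            {
                "first_name": rng.choice(first_names),
                "last_name": rng.choice(last_names),
                "account": f"{prefix:04d}{last4:04d}",
            }
        )
    return records
```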

26

Tests and results (cont)

Our test involved identifying 300 different records in the database.

If the system was unable to make an identification of the target record after asking the user for all of the information, the caller was asked to make a second attempt using the same information.

If the system failed to produce a result after the second attempt, the call was terminated at that point.

27


For each telephone call, the users were asked eight questions:

1. Enter your ID, using your telephone keypad, followed by the pound key. If you make a mistake, press the star key.

2. You entered (the value entered in (1)). If this is correct, press ‘1’. If it is not, press ‘2’.

3. Please say the first four letters of your last name.

4. Please say the first four letters of your first name.

5. Please say the last four digits of your card number.

6. Could you please spell the city currently listed on your account?

7. Please say your phone number.

8. Please say the postal code currently listed on your account.

29


Fig 2. Summary of results from 300 calls

30


Fig 3. Rate of ASR character misrecognition by field

31


Fig 4. Rate of misrecognition of field by ASR (misrecognition = at least one error made in spelled field value)

32


Fig 5. Average cumulative number of records examined by system

33

Conclusions

The method tolerates high misrecognition rates.

The method can be used with off-the-shelf components; it doesn’t require a specialized ASR.

Future work: allow the personal identification information to be spoken instead of spelled.

34

Flowchart: verification of a record Rv

Input: Record Rv that will be verified

Steps: Request T from caller; Collect subset near T; Add subset to S; Update the risk for each record in S; Rm <- min risk in S

Decisions (Yes/No): Rm == Rv; Risk(Rv) < Risk(reject); Another T?

Outcomes: Accept; Reject; Select another T; operator
