Task-oriented application of automatic speech recognition, Chapter 9
By Varsha Turkar


Task specific voice control and dialog system

• Goal: to integrate a speech recognition system into a task-specific application so that it performs a useful task

• The system consists of:
– A speech recognizer
– A language analyzer
– An expert system
– A physical system controlled by the voice commands
– A text-to-speech synthesizer


Fig: Block diagram of a task-specific voice control and dialog system (voice input → speech recognizer, using the vocabulary and grammar model → text → language analyzer, using semantic rules → meaning → expert system, which issues commands to the system under voice control and receives its status → text reply → text-to-speech synthesizer, using pronunciation rules → voice output)


Speech recognizer:
• Converts the speech input into grammatically correct text.
• It is constrained by the recognizer's vocabulary and grammar model.
• The resulting text string is sent to the language analyzer.
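As a rough illustration of how a vocabulary and grammar model constrain the recognizer's output, the Python sketch below picks the highest-scoring hypothesis from a hypothetical N-best list that uses only vocabulary words and is accepted by the grammar. The vocabulary, the single grammar rule and the scores are invented for illustration; a real recognizer applies these constraints during decoding rather than as a filter afterwards.

```python
# Hypothetical vocabulary and grammar for a small voice-control task.
# Each grammar rule is a sequence of slots; each slot lists the allowed words.
VOCABULARY = {"turn", "on", "off", "the", "light", "fan", "heater"}
GRAMMAR = [
    [{"turn"}, {"on", "off"}, {"the"}, {"light", "fan", "heater"}],
]


def in_grammar(words):
    """Return True if the word sequence matches some grammar rule exactly."""
    for rule in GRAMMAR:
        if len(words) == len(rule) and all(w in slot for w, slot in zip(words, rule)):
            return True
    return False


def constrained_best(nbest):
    """Pick the highest-scoring hypothesis that uses only vocabulary words
    and is accepted by the grammar; return None if none qualifies.

    nbest: list of (text, log_likelihood) pairs from a hypothetical recognizer.
    """
    best_text, best_score = None, float("-inf")
    for text, score in nbest:
        words = text.lower().split()
        if all(w in VOCABULARY for w in words) and in_grammar(words) and score > best_score:
            best_text, best_score = " ".join(words), score
    return best_text


nbest = [
    ("turn on the lite", -10.2),   # "lite" is out of vocabulary
    ("turn on the light", -10.5),
    ("turn the light on", -11.0),  # word order not allowed by this grammar
]
print(constrained_best(nbest))  # turn on the light
```

Note that the top-scoring hypothesis is discarded because it contains an out-of-vocabulary word, which is exactly the kind of error the grammar model is meant to block.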


Language analyzer:
• Extracts the meaning from the text with the help of semantic rules.
• The decoded meaning is sent to the expert system.

Expert system:
• First selects the desired action, then issues the appropriate commands to the physical system under voice control to carry out that action.
• It then receives data on the command status, e.g. “command carried out successfully” or “command carried out unsuccessfully”, and constructs a textual reply.
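The language analyzer and expert system steps can be sketched as below. The semantic rules (small keyword tables), the `Device` class standing in for the physical system under voice control, and the reply wording are all hypothetical examples chosen for illustration, not part of the system described in the slides.

```python
from dataclasses import dataclass, field

# Hypothetical semantic rules: map words in the text to meaning slots.
ACTION_WORDS = {"on": "switch_on", "off": "switch_off"}
OBJECT_WORDS = {"light", "fan", "heater"}


def analyze(text):
    """Language analyzer: extract a meaning (action, object) from the text."""
    words = text.lower().split()
    action = next((ACTION_WORDS[w] for w in words if w in ACTION_WORDS), None)
    obj = next((w for w in words if w in OBJECT_WORDS), None)
    return {"action": action, "object": obj}


@dataclass
class Device:
    """Hypothetical stand-in for the physical system under voice control."""
    states: dict = field(default_factory=lambda: {"light": False, "fan": False})

    def execute(self, action, obj):
        """Carry out a command and report its status (True = success)."""
        if obj not in self.states:
            return False
        self.states[obj] = action == "switch_on"
        return True


def expert_system(meaning, device):
    """Select the action, issue the command, read its status, build a reply."""
    action, obj = meaning["action"], meaning["object"]
    if action is None or obj is None:
        return "Sorry, I did not understand the command."
    ok = device.execute(action, obj)
    state = "on" if action == "switch_on" else "off"
    if ok:
        return f"The {obj} is now {state}."
    return f"Command carried out unsuccessfully: no {obj} is connected."


device = Device()
print(expert_system(analyze("turn on the light"), device))  # The light is now on.
print(expert_system(analyze("turn off the heater"), device))
```

The textual reply returned by `expert_system` is what would be handed to the text-to-speech synthesizer.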

Text-to-speech synthesizer:
• Converts the text reply into a speech message using the appropriate word pronunciation rules and plays it back to the user.

• Working together, the blocks in the figure carry out the specific task of interest.
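The pronunciation-rule step of the text-to-speech synthesizer can be pictured as a lexicon lookup with a fallback for unknown words. The sketch below is a simplified assumption: the tiny ARPAbet-style lexicon is illustrative, and unknown words are simply marked to be spelled out letter by letter, whereas practical synthesizers use much larger lexicons plus letter-to-sound rules.

```python
import re

# Hypothetical pronunciation lexicon (ARPAbet-style phoneme symbols).
LEXICON = {
    "the": ["DH", "AH"],
    "light": ["L", "AY", "T"],
    "is": ["IH", "Z"],
    "now": ["N", "AW"],
    "on": ["AA", "N"],
}


def to_phonemes(text):
    """Apply pronunciation rules: look up each word in the lexicon.

    Words missing from the lexicon are spelled out letter by letter,
    emitted here as 'spell:<letter>' tokens for a letter-name synthesizer.
    """
    phonemes = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            phonemes.extend(f"spell:{ch}" for ch in word if ch.isalpha())
    return phonemes


print(to_phonemes("The light is now on."))
# ['DH', 'AH', 'L', 'AY', 'T', 'IH', 'Z', 'N', 'AW', 'AA', 'N']
```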


Characteristics of speech recognition applications

• Beneficial to the user
• User friendly
• Accurate
• Real time


1. The proposed system must provide a real benefit to the user in the form of:
• Increased productivity
• Ease of use
• A better machine interface or a more natural way of communication
If the application is not useful to the user, it will not succeed over time.


2. The system must be user friendly: the user should feel comfortable, and the system must provide friendly, helpful voice prompts and an effective means of communication.

3. The system must be accurate.

4. The recognition system must respond in real time, i.e. its responses must be very fast.


Methods of handling recognition errors

Four ways to deal with the errors:
• Fail soft methods
• Self-detection/correction of errors
• Verification or multilevel decision before proceeding
• Rejection/pass on to operator


Fail soft methods:
• The cost (in terms of time) of a recognition error is low.
• Hence the error is acceptable.
• The error is detected and corrected at a later stage.
• The user can enter a correction mode to backtrack to the point where the error was made.
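A fail-soft data-entry dialog can be sketched with a simple history list: each recognized entry is accepted immediately, and a correction mode backtracks to the faulty entry. The `FailSoftEntry` class and its index-based correction interface are hypothetical, chosen only to show the backtracking idea.

```python
class FailSoftEntry:
    """Accept recognized entries at once; let the user backtrack later."""

    def __init__(self):
        self.entries = []

    def accept(self, word):
        # Fail soft: accept the recognizer output without verification,
        # since an error here is cheap to fix later.
        self.entries.append(word)

    def correct(self, position, word):
        """Correction mode: backtrack to `position` (0-based), replace the
        erroneous entry, and discard everything entered after it."""
        if not 0 <= position < len(self.entries):
            raise IndexError("no entry at that position")
        del self.entries[position:]
        self.entries.append(word)


session = FailSoftEntry()
for w in ["four", "five", "nine", "two"]:  # "nine" was misrecognized
    session.accept(w)
session.correct(2, "one")
print(session.entries)  # ['four', 'five', 'one']
```

After backtracking, the user re-enters the remaining items. An alternative design would replace only the faulty entry and keep the later ones.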


Self-detection/correction of errors:
• The recognition system uses known task constraints (e.g. a given database) to automatically detect and correct recognition errors.
• Ex. A spelled name that must come from a finite list of names.
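Self-correction against a known finite list can be sketched by mapping the recognized letter string to the closest allowed name by Levenshtein (edit) distance. The name list is hypothetical, and edit distance is a stand-in chosen for illustration, not necessarily the matching rule a particular system uses.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct_spelling(recognized, allowed_names):
    """Return the allowed name closest to the recognized letter string."""
    return min(allowed_names, key=lambda name: edit_distance(recognized, name))


NAMES = ["BROWN", "GREEN", "SMITH", "JONES"]  # hypothetical finite list
print(correct_spelling("SNITH", NAMES))  # SMITH
```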


Verification or multilevel decision before proceeding:

• The recognition system asks the user for help whenever several candidate strings have similarly high likelihood scores, so that the small differences between them are difficult to resolve.

• The recognizer asks the user to verify the first-choice decision; if it is not verified, the recognizer asks the user to verify the second choice.
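The verification strategy can be sketched as follows: if the best two hypotheses have close likelihood scores, the system asks the user to confirm the first choice, then the second, and so on down the list. The score margin, the log-likelihood values and the `ask_user` callback are hypothetical stand-ins for the recognizer's scores and the spoken confirmation prompt.

```python
def verify_decision(nbest, ask_user, margin=2.0):
    """Return the accepted string, or None if the user rejects every choice.

    nbest: list of (text, log_likelihood) pairs, in any order.
    ask_user: callback that asks "Did you say <text>?" and returns True/False.
    margin: hypothetical score gap below which the decision is ambiguous.
    """
    ranked = sorted(nbest, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]  # clear winner: no need to ask the user
    for text, _score in ranked:  # ambiguous: verify 1st, 2nd, ... choice
        if ask_user(text):
            return text
    return None


nbest = [("call Bob", -12.0), ("call Rob", -12.5), ("call Tom", -20.0)]
answers = {"call Bob": False, "call Rob": True}
print(verify_decision(nbest, lambda text: answers.get(text, False)))  # call Rob
```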


Rejection/pass on to operator:
• By recording all spoken inputs in digital format, the system can reduce the error rate by rejecting a small but finite percentage of the spoken strings and passing them on to a human operator, who makes the final decision after listening to the recorded input.

By using all four techniques, the effective accuracy of the overall system approaches 100%.
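Rejection with hand-off to an operator can be sketched as a confidence threshold: utterances whose confidence falls below it are queued, by the identifier of their recorded audio, for a human decision. The confidence values, the 0.8 threshold and the queue structure are assumptions for illustration.

```python
from collections import deque


def route_utterances(results, threshold=0.8):
    """Split recognizer results into accepted strings and an operator queue.

    results: iterable of (audio_id, text, confidence) with confidence in [0, 1].
    Returns (accepted, operator_queue); the queue keeps the audio_id so the
    operator can listen to the recorded input and make the final decision.
    """
    accepted, operator_queue = [], deque()
    for audio_id, text, confidence in results:
        if confidence >= threshold:
            accepted.append((audio_id, text))
        else:
            operator_queue.append(audio_id)
    return accepted, operator_queue


results = [("utt1", "4 1 5 9", 0.97), ("utt2", "2 7 3", 0.55), ("utt3", "8 8 0", 0.91)]
accepted, queue = route_utterances(results)
print(accepted)     # [('utt1', '4 1 5 9'), ('utt3', '8 8 0')]
print(list(queue))  # ['utt2']
```

Raising the threshold typically rejects more strings and lowers the error rate of the automatically accepted ones, at the cost of more operator work.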


Broad classes of speech recognition applications

Five broad classes:
1. Office or business systems
2. Manufacturing
3. Telephone or telecommunications
4. Medical
5. Other


1. Office or business systems
• Data entry
• Database management and control
• Keyboard enhancement

2. Manufacturing
• Eyes-free, hands-free monitoring of manufacturing for quality control


3. Telephone or telecommunications
• Many applications are feasible over the dial-up telephone network
• Automation of operator-assisted services
• Telemarketing
• Call distribution by voice


4. Medical: the primary application is the creation and editing of specialized medical reports by voice.

5. Other:
• Voice-controlled and voice-operated games and toys
• Voice recognition aids for the handicapped
• Voice control in a moving vehicle
• Climate control


Command and control applications

• The user can control machines using simple voice commands.

• Voice repertory dialer: allows a caller to place a call by speaking the name of someone in the repertory (a stored collection of names) rather than dialing the digit code.

• Used in mobile phones and in cars, where eyes-free and hands-free operation is needed.


• A repertory dialer needs a speaker-trained set of vocabulary patterns corresponding to the repertory names (and their phone numbers).

• It also needs a speaker-independent set of vocabulary patterns corresponding to the digits and to a set of command words for controlling normal telephone features (off-hook, dial, repeat, hang up).
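The split between the speaker-trained name vocabulary and the speaker-independent digit and command vocabulary can be sketched as lookup tables consulted for each recognized word. The `RepertoryDialer` class, the names and the phone numbers are hypothetical; in a real dialer the two sets of stored patterns are used during recognition itself, whereas here the recognizer output is assumed to be available already as words.

```python
# Speaker-independent vocabulary: digits and telephone command words.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
COMMANDS = {"off-hook", "dial", "repeat", "hang up"}


class RepertoryDialer:
    """Hypothetical repertory dialer driven by already-recognized words."""

    def __init__(self, repertory):
        # Speaker-trained vocabulary: spoken name -> stored phone number.
        self.repertory = dict(repertory)
        self.digits = ""         # digits spoken so far
        self.last_dialed = None  # number used by "repeat"

    def hear(self, word):
        """Handle one recognized word; return the number dialed, if any."""
        if word in self.repertory:        # a repertory name: dial at once
            self.last_dialed = self.repertory[word]
            return self.last_dialed
        if word not in DIGITS and word not in COMMANDS:
            return None                   # out of vocabulary: ignore
        if word in DIGITS:                # a digit: accumulate it
            self.digits += DIGITS[word]
        elif word == "dial" and self.digits:
            self.last_dialed, self.digits = self.digits, ""
            return self.last_dialed
        elif word == "repeat":            # redial the last number
            return self.last_dialed
        elif word == "hang up":           # discard any partial number
            self.digits = ""
        return None                       # "off-hook" is not modeled here


dialer = RepertoryDialer({"mom": "5550100"})
print(dialer.hear("mom"))  # 5550100
for w in ["one", "two", "three"]:
    dialer.hear(w)
print(dialer.hear("dial"))    # 123
print(dialer.hear("repeat"))  # 123
```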


Automated call type recognition

• Automation of operator-assisted calls

• Ex. Calls made from a pay phone that normally require operator assistance, including collect calls, person-to-person calls, third-party billing calls, operator-assisted calls and credit card calls


Since this service has five options, a vocabulary of only five words (or short phrases) is adequate:

• “Collect” to make collect calls
• “Person” to make person-to-person calls
• “Third number” to make third-party billing calls
• “Operator” to make operator-assisted calls
• “Calling card” to make calling card calls


• The system is speaker independent and can work over the standard dial-up telephone network.

• If the customer follows the voice prompt and speaks one of the command words, the accuracy of the system is more than 99%.

• Customers therefore have to use the specific command words.

• Otherwise, a keyword spotting technique has to be used to find the command words embedded within a longer sentence.
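When the customer embeds the command word in a longer sentence, a very simple form of keyword spotting can be applied to the recognized word string, as sketched below. This text-level search is a simplification for illustration only: actual keyword spotting operates on the acoustic signal, typically with filler (garbage) models absorbing the non-keyword speech.

```python
# The five-keyword vocabulary of the call-type recognition service.
CALL_TYPES = {
    ("collect",): "collect call",
    ("person",): "person-to-person call",
    ("third", "number"): "third-party billing call",
    ("operator",): "operator-assisted call",
    ("calling", "card"): "calling card call",
}


def spot_call_type(transcript):
    """Return the call type of the first keyword found in the word string,
    or None if no keyword occurs."""
    words = transcript.lower().replace(",", " ").split()
    for i in range(len(words)):
        for keyword, call_type in CALL_TYPES.items():
            if tuple(words[i:i + len(keyword)]) == keyword:
                return call_type
    return None


print(spot_call_type("I'd like to make a collect call please"))  # collect call
print(spot_call_type("um, bill it to a third number"))  # third-party billing call
```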


Call distribution by voice commands

• A call is placed that would normally be answered by an operator (attendant), who then distributes the call to the appropriate location (person) based on the user's responses to the attendant's questions.

• In this application, the attendant function is automated using voice processing.


• The voice response system poses a series of menu-based questions and, based on the user's responses, routes the call appropriately.

• Ex. Railway system
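The menu-based routing performed by the voice response system can be sketched as a walk down a menu tree, one question per level. The menu contents, desk names and the fallback to a human attendant are invented for illustration (loosely in the spirit of the railway example), and the caller's spoken answers are assumed to be already recognized as menu keywords.

```python
# Hypothetical menu tree: each node holds the prompt the voice response
# unit would speak and maps answers to another node (dict) or a
# destination (str).
MENU = {
    "question": "Say 'reservations' or 'enquiry'.",
    "options": {
        "reservations": "reservations desk",
        "enquiry": {
            "question": "Say 'arrival' or 'departure'.",
            "options": {"arrival": "arrivals desk", "departure": "departures desk"},
        },
    },
}


def route_call(answers, menu=MENU, fallback="human attendant"):
    """Walk the menu using the caller's recognized answers and return where
    the call is routed; unrecognized answers fall back to an attendant."""
    node = menu
    for answer in answers:
        nxt = node["options"].get(answer)
        if nxt is None:
            return fallback
        if isinstance(nxt, str):
            return nxt
        node = nxt
    return fallback  # caller stopped answering before reaching a destination


print(route_call(["enquiry", "arrival"]))  # arrivals desk
```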


Directory listing retrieval

Fig: Directory listing retrieval (user → phone → spelled name → computer with speech recognizer, telephone directory and voice response unit → directory information → user)


• Provides access to directory information from a spoken, spelled name.

• To retrieve the directory information for a name, the user spells the name, saying the word “stop” after the last name and again after the initials, as in “Rabiner-stop-LR-stop”.

• The speech recognizer determines the name in the directory that best matches the spoken input, and the directory information for that name is then spoken to the user.


• Similar-sounding letters (e.g. B, D, E, P, T, V) can cause recognition errors, but the telephone directory provides a task syntax that automatically detects and corrects improperly recognized letters.

• The system can also handle common misspellings of names that involve a single inserted or deleted letter.
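The retrieval step can be sketched as: split the recognized letter string at the “stop” words into last name and initials, then look the pair up in the directory, accepting an exact match first and otherwise a last name that differs by a single inserted or deleted letter. The directory entries are hypothetical, and this one-edit rule is a simplified stand-in for the task-syntax correction described above.

```python
def parse_spelled(tokens):
    """Split spoken tokens like R A B I N E R stop L R stop into
    (last_name, initials)."""
    parts, current = [], []
    for token in tokens:
        if token.lower() == "stop":
            parts.append("".join(current))
            current = []
        else:
            current.append(token.upper())
    if len(parts) != 2 or current:
        raise ValueError("expected: last name, 'stop', initials, 'stop'")
    return parts[0], parts[1]


def one_insert_or_delete(a, b):
    """True if b can be obtained from a by inserting or deleting one letter."""
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def lookup(tokens, directory):
    """Return the directory entry for the spelled name, or None."""
    last, initials = parse_spelled(tokens)
    if (last, initials) in directory:
        return directory[(last, initials)]
    matches = [info for (name, inits), info in directory.items()
               if inits == initials and one_insert_or_delete(name, last)]
    return matches[0] if len(matches) == 1 else None  # none or ambiguous


DIRECTORY = {("RABINER", "LR"): "555-0100", ("SMITH", "JA"): "555-0199"}  # hypothetical
print(lookup("R A B I N E R stop L R stop".split(), DIRECTORY))    # 555-0100
print(lookup("R A B I N E E R stop L R stop".split(), DIRECTORY))  # 555-0100
```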


Credit card sales validation

• If a merchant needs credit card (CC) validation but does not have an automatic card reader or a modem dialer, the merchant must call a specific number and give an attendant a 10-digit merchant identification number, a 15-digit CC number and the transaction amount in rupees.


• Here, the speech recognition system uses a connected-digit recognizer for the merchant identification number and the CC number, and a connected-word recognizer for the transaction amount.

• The recognition vocabulary needed for the amount is larger than that needed for the CC number.

• This is because the same amount can be spoken in various ways, e.g. Rs. 137 can be spoken as “rupees one three seven” or “rupees one thirty seven”, etc.
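The extra vocabulary needed for amounts comes from the many ways a number can be said. The sketch below converts a recognized word string for an amount into rupees, accepting digit-by-digit readings (“one three seven”), grouped readings (“one thirty seven”) and readings with “hundred” (“one hundred thirty seven”). The word lists and parsing rules are a simplified assumption that covers only amounts below Rs. 1000.

```python
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50, "sixty": 60,
        "seventy": 70, "eighty": 80, "ninety": 90}
FILLER = {"rs", "rs.", "rupees", "and"}


def _groups(words):
    """Split words into number groups: a digit, a teen, or tens (+ unit)."""
    groups, i = [], 0
    while i < len(words):
        w = words[i]
        if w in UNITS:
            groups.append(str(UNITS[w]))
        elif w in TEENS:
            groups.append(str(TEENS[w]))
        elif w in TENS:
            value = TENS[w]
            if i + 1 < len(words) and words[i + 1] in UNITS and words[i + 1] != "zero":
                value += UNITS[words[i + 1]]
                i += 1
            groups.append(str(value))
        else:
            raise ValueError(f"unexpected word: {w!r}")
        i += 1
    return groups


def spoken_amount(text):
    """Convert a spoken amount below Rs. 1000 into an integer number of rupees."""
    words = [w for w in text.lower().split() if w not in FILLER]
    if "hundred" in words:  # e.g. "one hundred thirty seven"
        if words.index("hundred") != 1 or words[0] not in UNITS:
            raise ValueError(f"cannot parse amount: {text!r}")
        rest = _groups(words[2:])
        if len(rest) > 1:
            raise ValueError(f"cannot parse amount: {text!r}")
        return UNITS[words[0]] * 100 + (int(rest[0]) if rest else 0)
    groups = _groups(words)  # digit-by-digit or grouped reading
    if not groups:
        raise ValueError(f"no number in: {text!r}")
    return int("".join(groups))


for phrase in ["Rs. one three seven", "rupees one thirty seven", "one hundred thirty seven"]:
    print(spoken_amount(phrase))  # 137 each time
```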


End of Chapter 9
