Task-oriented applications of automatic speech recognition (Chapter 9)
By Varsha Turkar
Task-specific voice control and dialog system
• Goal: integrate a speech recognition system into a task-specific application so that it performs a useful task
• The system consists of:
  – A speech recognizer
  – A language analyzer
  – An expert system
  – A physical system controlled by the voice commands
  – A text-to-speech synthesizer
Fig: Block diagram of a task-specific voice control and dialog system
• Signal path: voice input (speech) → speech recognizer → text → language analyzer → meaning → expert system → text reply → text-to-speech synthesizer → voice output (speech)
• Speech recognizer: converts the input into grammatically correct text, using the vocabulary and grammar model
• Language analyzer: extracts meaning from the text, using semantic rules
• Expert system: selects the desired action, issues commands to the system, and receives data from it
• System under voice control: executes commands (output action) and reports status
• Text-to-speech synthesizer: converts the text reply into machine-generated speech, using pronunciation rules
Speech recognizer:
• Converts the speech input into grammatically correct text
• It is constrained by the recognizer's vocabulary and grammar model
• The text string is sent to the language analyzer
Language analyzer:
• Extracts the meaning from the text with the help of semantic rules
• The decoded meaning is sent to the expert system
Expert system:
• First selects the desired action, then issues the appropriate commands to the physical system under voice control to carry out the action, and then receives data on the command status
• Ex. “command carried out successfully” or “unsuccessfully”; the expert system then constructs a textual reply
Text-to-speech synthesizer:
• The text reply is converted into a speech message using appropriate word pronunciation rules and played back to the user
• The system in the figure performs the specific task of interest
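The chain of blocks described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: typed text stands in for audio, the two-word grammar, the device names, and all function and class names (`recognize`, `analyze`, `ControlledSystem`, `expert_system`, `synthesize`, `handle`) are hypothetical and do not come from the slides.

```python
from typing import Optional

# Hypothetical vocabulary and grammar: only "<verb> <device>" strings are legal.
VERBS = {"start", "stop"}
DEVICES = {"pump", "fan"}


def recognize(utterance: str) -> Optional[str]:
    """Stand-in for the speech recognizer (typed text replaces audio).
    Returns grammatical text, or None if the input violates the grammar."""
    words = utterance.lower().split()
    if len(words) == 2 and words[0] in VERBS and words[1] in DEVICES:
        return " ".join(words)
    return None


def analyze(text: str) -> dict:
    """Language analyzer: a single semantic rule mapping text to meaning."""
    verb, device = text.split()
    return {"action": verb, "target": device}


class ControlledSystem:
    """System under voice control: executes commands and reports status."""

    def __init__(self) -> None:
        self.running = {device: False for device in DEVICES}

    def execute(self, action: str, target: str) -> bool:
        wanted = action == "start"
        if self.running[target] == wanted:
            return False  # already in that state: report failure
        self.running[target] = wanted
        return True


def expert_system(meaning: dict, system: ControlledSystem) -> str:
    """Select the action, issue the command, read the status, build a reply."""
    ok = system.execute(meaning["action"], meaning["target"])
    outcome = "successfully" if ok else "unsuccessfully"
    return f"Command {meaning['action']} {meaning['target']} carried out {outcome}"


def synthesize(reply: str) -> str:
    """Stand-in for the text-to-speech synthesizer: returns the text to speak."""
    return reply


def handle(utterance: str, system: ControlledSystem) -> str:
    text = recognize(utterance)
    if text is None:
        return synthesize("Command not understood")
    return synthesize(expert_system(analyze(text), system))
```

Calling `handle("start pump", ControlledSystem())` walks one utterance through all four stages and returns the reply that would be spoken back to the user.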
Characteristics of speech recognition applications
• Beneficial to the user
• User friendly
• Accurate
• Real time
1. The proposed system must provide a real benefit to the user in the form of
• Increased productivity
• Ease of use
• Better machine interface or a more natural way of communication
• If the application is not useful to the user, it does not succeed over time
2. The system must be user friendly. The user should feel comfortable with it; it must provide friendly and helpful voice prompts and an effective means of communication.
3. The system must be accurate.
4. The recognition system must respond in real time. The response should be very fast.
Methods of handling recognition errors
Four ways to deal with the errors:
• Fail-soft methods
• Self-detection/correction of errors
• Verification or multilevel decision before proceeding
• Rejection/pass to operator
Fail-soft methods:
• The cost (in terms of time) of a recognition error is low
• Hence the error is acceptable
• The error will be detected and corrected at a later stage
• The user can enter a correction mode to backtrack to the point where the error was made
Self-detection/correction of errors:
• The recognition system uses known task constraints (a given database) to automatically detect and correct recognition errors
• Ex. Spelling of a name from a finite list of names
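One way to picture self-detection and correction against a finite name list is the sketch below. It assumes the recognizer returns a letter string; the list `NAMES`, the function `correct_name`, and the 0.7 similarity cutoff are illustrative choices, and the standard-library `difflib` matcher is used here as a convenient stand-in for whatever matching the real system performs.

```python
import difflib

# Hypothetical finite list of legal names (the task constraint).
NAMES = ["RABINER", "ROBINSON", "SHARMA", "PATEL"]


def correct_name(recognized, names=NAMES, cutoff=0.7):
    """Return (name, was_corrected).

    An exact hit means no error was detected. Otherwise the error is
    detected (the string is not a legal name) and corrected to the closest
    listed name, or reported as (None, False) if nothing is close enough.
    """
    recognized = recognized.upper()
    if recognized in names:
        return recognized, False
    close = difflib.get_close_matches(recognized, names, n=1, cutoff=cutoff)
    if close:
        return close[0], True
    return None, False
```

For instance, a recognizer output of "RAPINER" (B heard as P) is not in the list, so the error is detected and the string is mapped back to "RABINER".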
Verification or multilevel decision before proceeding:
• The recognition system asks the user for help whenever two or more candidate strings have high likelihood scores and the small differences between them are difficult to resolve
• The recognizer asks the user to verify the first-choice decision; if it is not verified, the recognizer asks the user to verify the second choice
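The verification step can be sketched as follows, assuming the recognizer supplies a list of candidate strings with likelihood scores. The function name `decide`, the `margin` value, and the yes/no prompt wording are hypothetical; a deployed system would speak the prompt rather than print it.

```python
def decide(candidates, margin=0.05, ask=input):
    """Pick a recognized string from (text, score) pairs; higher score is better.

    If the best score clearly beats the runner-up, accept it outright.
    Otherwise ask the user to verify the first choice, then the second.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] > margin:
        return ranked[0][0]
    for text, _score in ranked[:2]:
        answer = ask(f"Did you say '{text}'? (yes/no) ")
        if answer.strip().lower() == "yes":
            return text
    return None  # neither choice was verified
```

Passing a different `ask` callable makes the function easy to exercise without a live user.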
Rejection/pass on to operator:
• By recording all spoken inputs in digital format, the system can reduce the error rate by rejecting a small but finite percentage of the spoken strings and passing them on to a human operator, who makes the final decision after listening to the spoken input
By using all four techniques, the accuracy of the speech recognizer approaches 100%
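The rejection path can be illustrated with a confidence threshold. This is a sketch only: the slides do not say how the system decides which strings to reject, so the `confidence` score, the 0.6 threshold, and the names `operator_queue` and `accept_or_reject` are assumptions.

```python
from collections import deque

# Recordings waiting for a human operator's decision (hypothetical queue).
operator_queue = deque()


def accept_or_reject(recording, text, confidence, threshold=0.6):
    """Return the recognized text if the recognizer is confident enough.

    Otherwise keep the digital recording for an operator and return None.
    """
    if confidence >= threshold:
        return text
    operator_queue.append(recording)
    return None
```

Raising the threshold rejects more strings, which trades operator workload for a lower error rate on the strings that are accepted automatically.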
Broad classes of speech recognition applications
Five broad classes:
1. Office or business systems
2. Manufacturing
3. Telephone or telecommunications
4. Medical
5. Other
1. Office or business systems
• Data entry
• Database management and control
• Keyboard enhancement
2. Manufacturing
• Eyes-free, hands-free monitoring of manufacturing for quality control
3. Telephone or telecommunications
• Many applications are feasible over dialed-up telephones
• Automation of operator-assisted services
• Telemarketing
• Call distribution by voice
4. Medical: The primary application is voice creation and editing of specialized medical reports
5. Other:
• Voice-controlled and voice-operated games and toys
• Voice recognition aids for the handicapped
• Voice control in a moving vehicle
• Climate control
Command and control applications
The user can control machines using simple commands
• Voice repertory dialer: allows a caller to place calls by speaking the name of someone in the repertory (a stored collection of names) rather than dialing the digit code
• Used in mobile phones and within a car (eyes-free and hands-free)
• A repertory dialer needs a speaker-trained set of vocabulary patterns corresponding to the repertory names (and their phone numbers)
• It also needs a speaker-independent set of vocabulary patterns corresponding to the digits and a set of command words for controlling normal telephone features (off-hook, dial, repeat, hang up)
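The two vocabularies of a repertory dialer can be modeled roughly as below. This is a text-level sketch: recognized words stand in for acoustic patterns, only the "repeat" command is modeled, and the class `RepertoryDialer`, its methods, and the treatment of "oh" as zero are illustrative assumptions.

```python
# Speaker-independent digit vocabulary ("oh" for zero is an assumption).
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


class RepertoryDialer:
    def __init__(self):
        self.repertory = {}      # speaker-trained: spoken name -> phone number
        self.last_dialed = None  # remembered for the "repeat" command

    def train(self, name, number):
        """Store a repertory name (trained by the speaker) with its number."""
        self.repertory[name.lower()] = number

    def place_call(self, spoken):
        """Return the number to dial for a name, a digit string, or 'repeat'."""
        words = spoken.lower().split()
        key = " ".join(words)
        if key in self.repertory:
            number = self.repertory[key]
        elif key == "repeat":
            number = self.last_dialed
        elif words and all(w in DIGITS for w in words):
            number = "".join(DIGITS[w] for w in words)
        else:
            return None  # not in either vocabulary
        self.last_dialed = number
        return number
```

Names go through the per-speaker dictionary, while digits and commands use the shared vocabulary, which mirrors the split between speaker-trained and speaker-independent patterns.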
Automated call type recognition
• The automation of operator-assisted calls
• Ex. Calls made from a pay phone that normally require operator assistance, including collect calls, person-to-person calls, third-party billing calls, operator-assisted calls, and credit card calls
Since there are five options for this service, a vocabulary of only five command words is adequate:
• “Collect” to make collect calls
• “Person” to make person-to-person calls
• “Third number” to make third-party billing calls
• “Operator” to make operator-assisted calls
• “Calling card” to make calling card calls
• The system is speaker independent and can work over the standard dialed-up telephone network
• If the customer obeys the voice prompt and speaks one of the command words, the accuracy of the system is more than 99%
• Customers have to use the specific command words
• Otherwise, a keyword-spotting technique has to be used to find the command words embedded within the sentence
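The idea of finding a command word embedded in a longer sentence can be shown at the text level. Real keyword spotting operates on the audio signal; the sketch below only searches a transcribed sentence, and the names `CALL_TYPES` and `spot_call_type` are hypothetical.

```python
import re

# The five command words from the slides, mapped to their call types.
CALL_TYPES = {
    "collect": "collect call",
    "person": "person-to-person call",
    "third number": "third-party billing call",
    "operator": "operator-assisted call",
    "calling card": "calling card call",
}


def spot_call_type(sentence):
    """Return the call type if exactly one command word occurs in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    padded = " " + " ".join(words) + " "
    hits = [kw for kw in CALL_TYPES if f" {kw} " in padded]
    if len(hits) == 1:
        return CALL_TYPES[hits[0]]
    return None  # no command word, or an ambiguous request
```

Returning `None` when two command words appear leaves room for a fallback such as re-prompting the caller.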
Call distribution by voice commands
• A call is placed that would normally be answered by an operator, who then distributes the call to the appropriate location (person) based on the user's responses to the questions asked by the attendant
• In this application the attendant function is automated via voice processing
• The voice response system poses a series of menu-based questions and, based on the user's responses, routes the call appropriately
• Ex. Railway system
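Menu-based call routing can be represented as a small tree that is walked with the caller's recognized responses. The railway menu below, its prompts, its desk names, and the function `route_call` are made up for illustration.

```python
# Hypothetical railway menu: inner nodes are dicts, leaves are destinations.
MENU = {
    "prompt": "Say 'reservations' or 'enquiries'.",
    "options": {
        "reservations": {
            "prompt": "Say 'new booking' or 'cancellation'.",
            "options": {
                "new booking": "booking desk",
                "cancellation": "refund desk",
            },
        },
        "enquiries": "enquiry desk",
    },
}


def route_call(responses, menu=MENU):
    """Follow the caller's recognized responses down the menu tree.

    Returns the destination, or None if a response is not on the menu or
    the responses end before a destination is reached.
    """
    node = menu
    for answer in responses:
        if not isinstance(node, dict):
            break  # destination already reached; ignore extra responses
        node = node["options"].get(answer.lower())
        if node is None:
            return None
    return node if isinstance(node, str) else None
```

Each level of the tree needs only a small recognition vocabulary, namely the options valid at that point in the dialog.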
Directory listing retrieval
Fig: Directory listing retrieval (blocks: User, Phone, Computer, Speech recognizer, Telephone directory, Voice response unit; signals: Spelled name, Directory information)
• Provides the access to directory information from spoken spelled name
• To access the directory information for a name in the directory, the user spells the name using the word “stop” between the last name and the initials as in “Rabiner-stop-LR-stop”
• The speech recognizer demands the name in the given directory which best matches the spoken input and then speaks the directory information for that name to the user.
• Similar-sounding letters may cause errors, but the telephone directory provides a task syntax that automatically detects and corrects improperly recognized letters
• The system can handle common misspellings of names with a single insertion or deletion of a letter
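A spelled-name lookup along these lines can be sketched with a letter-level edit distance. Everything specific here is an assumption: the directory entries, the set `CONFUSABLE` of letters treated as similar-sounding, the 0.5 cost for confusing two of them, and the function names. The slides do not describe the matching procedure the real system uses.

```python
# Assumption: letters that rhyme with "E" are the ones most easily confused.
CONFUSABLE = set("BCDEGPTVZ")

# Hypothetical directory: (last name, initials) -> directory information.
DIRECTORY = {
    ("RABINER", "LR"): "extension 2001",
    ("ROBINSON", "AK"): "extension 2002",
    ("SHARMA", "PS"): "extension 2003",
}


def parse_spelled(tokens):
    """Split tokens like ['R', ..., 'stop', 'L', 'R', 'stop'] into (last, initials)."""
    fields, current = [], []
    for tok in tokens:
        if tok.lower() == "stop":
            fields.append("".join(current))
            current = []
        else:
            current.append(tok.upper())
    if current:
        fields.append("".join(current))
    last = fields[0] if fields else ""
    initials = fields[1] if len(fields) > 1 else ""
    return last, initials


def letter_distance(a, b):
    """Edit distance; substituting one confusable letter for another costs 0.5."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif ca in CONFUSABLE and cb in CONFUSABLE:
                sub = 0.5
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub))
        prev = cur
    return prev[-1]


def look_up(tokens, directory=DIRECTORY):
    """Return the best-matching directory entry and its information."""
    last, initials = parse_spelled(tokens)
    best = min(
        directory,
        key=lambda e: letter_distance(last, e[0]) + letter_distance(initials, e[1]),
    )
    return best, directory[best]
```

Because the result is always one of the directory entries, a misrecognized letter or a single inserted or deleted letter is absorbed by the match instead of producing an invalid name.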
Credit card sales validation
• If a merchant needs credit card (cc) validation and does not have an automatic card reader or modem dialer, the merchant must call a specific number and provide an attendant with a 10-digit merchant identification number, a 15-digit cc number, and the amount of the transaction in rupees
• In this case a speech recognition system uses a connected-digit recognizer to recognize the merchant identification number and the cc number, and a connected-word recognizer for the transaction amount
• For the amount, the recognition vocabulary is larger than that needed for the cc number
• This is because the same amount can be spoken in various ways (e.g., Rs. 137 as “one three seven”, “one thirty seven”, etc.)
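To show why the amount needs a larger vocabulary, the sketch below maps several spoken forms of the same amount to one number. It works on recognized words rather than audio, handles whole rupees only, and the function names and the two parsing strategies (digit groups versus "hundred"/"thousand" phrases) are illustrative assumptions.

```python
SMALL = {w: n for n, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * n for n, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def _structured(words):
    """Parse forms that use 'hundred' or 'thousand', e.g. 'one hundred thirty seven'."""
    total = current = 0
    for w in words:
        if w in SMALL:
            current += SMALL[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        elif w == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown word: {w}")
    return total + current


def _digit_groups(words):
    """Parse forms read as digit groups, e.g. 'one three seven' or 'one thirty seven'."""
    out, i = "", 0
    while i < len(words):
        w = words[i]
        if w in TENS:
            value = TENS[w]
            # "thirty seven" forms one two-digit group
            if i + 1 < len(words) and 1 <= SMALL.get(words[i + 1], 0) <= 9:
                value += SMALL[words[i + 1]]
                i += 1
            out += str(value)
        elif w in SMALL:
            out += str(SMALL[w])
        else:
            raise ValueError(f"unknown word: {w}")
        i += 1
    return int(out)


def amount_in_rupees(spoken):
    """Map a spoken amount to an integer number of rupees."""
    words = spoken.lower().replace("-", " ").split()
    words = [w for w in words if w not in ("rupees", "rs", "rs.", "and")]
    if not words:
        raise ValueError("no amount spoken")
    if "hundred" in words or "thousand" in words:
        return _structured(words)
    return _digit_groups(words)
```

The digit vocabulary alone covers the merchant and card numbers, whereas the amount also needs the teens, the tens, and the multiplier words.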