How Spread Works


Spread

• Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children

• It is a game used to visually motivate deaf and hearing impaired children to learn to speak

How does Spread work?

[Diagram: the CLIENT handles Selection and Record, then sends the .wav file + current word to the SERVER, where Sphinx transcribes the recording; the server scores it and returns the result to the client as Feedback]

Selection

• The user is presented with a screen showing the word to pronounce


Recording

• Recording begins once the user clicks the record button.

Transmission

• Transmission begins once the stop button is pressed.
• The wav file, the current word and the training phoneme are sent to the server for processing.

[Diagram: the CLIENT sends K AA R to the SERVER; one phoneme is labeled "Training Phoneme"]
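
The transmission step can be pictured as a small request object carrying the three items named above. This is only a sketch under assumptions: the slides do not describe the wire format, so the class name `RecognitionRequest`, its field names, and the JSON/base64 encoding are all hypothetical.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class RecognitionRequest:
    """Hypothetical payload holding the three items sent to the server."""
    wav_bytes: bytes        # contents of the recorded .wav file
    current_word: str       # word shown on the selection screen, e.g. "CAR"
    training_phoneme: str   # phoneme being practised, e.g. "AA"

    def to_json(self) -> str:
        # Base64 keeps the binary audio safe inside a JSON string.
        return json.dumps({
            "wav": base64.b64encode(self.wav_bytes).decode("ascii"),
            "current_word": self.current_word,
            "training_phoneme": self.training_phoneme,
        })

    @classmethod
    def from_json(cls, text: str) -> "RecognitionRequest":
        # Inverse of to_json: decode the audio back into raw bytes.
        data = json.loads(text)
        return cls(
            wav_bytes=base64.b64decode(data["wav"]),
            current_word=data["current_word"],
            training_phoneme=data["training_phoneme"],
        )
```

A client would build the request when the stop button is pressed and the server would call `from_json` before handing the audio to the recognizer.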

Transcribing

• Once the wav file arrives at the server, it is fed into Sphinx to recognize what the user said

Sphinx

• Sphinx is a Java-based Hidden Markov Model speech recognition system developed by Carnegie Mellon University

• To decode the wav file, Sphinx needs three data sets
  – Acoustic Model
  – Dictionary
  – Language Model

Acoustic Model

• The Acoustic Model maps sound features to units of speech called phonemes

• It is derived by sampling a large data set of spoken words, called a speech corpus

[Diagram: example phonemes K, AA, R]

Dictionary

• The dictionary maps words into phonemes

...
CAN   K AE N
CAR   K AA R
CAT   K AE T
...
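
As a rough illustration, the dictionary can be thought of as a lookup table from a word to its phoneme sequence. The sketch below contains only the three entries shown on the slide; the helper name `phonemes_for` is hypothetical and is not part of Sphinx.

```python
from typing import Dict, List

# Pronunciation dictionary: word -> list of phonemes.
# Only the three entries shown on the slide are included.
DICTIONARY: Dict[str, List[str]] = {
    "CAN": ["K", "AE", "N"],
    "CAR": ["K", "AA", "R"],
    "CAT": ["K", "AE", "T"],
}


def phonemes_for(word: str) -> List[str]:
    """Return the expected phonemes for a word (hypothetical helper)."""
    # Entries are stored in upper case, so normalize the lookup key.
    return DICTIONARY[word.upper()]
```

Looking up "car" this way gives the expected sequence K AA R that the later scoring step compares against.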

Language Model

• The language model indicates the probability of a particular word appearing given the previous words
  – Not used, since Spread only needs to recognize individual words

Decoding

• Sphinx in Spread is configured to detect what phonemes were pronounced by the user

[Diagram: SPHINX outputs the phonemes K AA R]

Increasing Accuracy

• To increase accuracy, Sphinx in Spread is configured to recognize only a limited number of phonemes per level

• 7 levels means 7 individually configured Sphinx instances

Sphinx Level 1: CAR, JAR, STAR…
Sphinx Level 2: BED, NET, TENT…
Sphinx Level 3: PLAY, PARTY, CIRCLE…
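
One way to picture the per-level restriction is to derive, from each level's word list, the set of phonemes that level's recognizer has to handle. This is a sketch under assumptions, not the actual Sphinx configuration: the names `LEVEL_WORDS` and `phonemes_for_level` are hypothetical, only the three Level 1 words named on the slide are listed (the slide's list is truncated), and the pronunciations are written in the same notation as the dictionary slide.

```python
from typing import Dict, List, Set

# Level 1 words named on the slide. The slide's list ends in "…", so this
# is not the complete level. Pronunciations are assumptions written in the
# phoneme notation used on the dictionary slide.
LEVEL_WORDS: Dict[int, Dict[str, List[str]]] = {
    1: {
        "CAR": ["K", "AA", "R"],
        "JAR": ["JH", "AA", "R"],
        "STAR": ["S", "T", "AA", "R"],
    },
}


def phonemes_for_level(level: int) -> Set[str]:
    """Collect the phonemes a recognizer for this level would need."""
    phonemes: Set[str] = set()
    for pronunciation in LEVEL_WORDS[level].values():
        phonemes.update(pronunciation)
    return phonemes
```

With only these three words, the Level 1 recognizer would need six phonemes, which hints at why a small per-level vocabulary narrows what the decoder has to distinguish.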

Scoring

• The server compares the decoded result against the expected result, taking note of the training phoneme

[Diagram: the Sphinx output "You said: K AA R" is compared with "Expected: K AA R"; one phoneme is labeled "Training Phoneme"]

Feedback

• The result is sent over to the client to give feedback to the user

Preliminary results

• Tested with adult members of the hearing impaired community
  – Very positive
  – "I wish I had this when I was learning speech"

• Problem: too enthusiastic
  – Loud cheering noises reduced recognition rates

• SPREAD was tested with hearing impaired students of the SPED division of the Batino Elementary School in Proj. 3, Quezon City
  – Accuracy testing and software evaluation

Working with the children

• Of the 40 students, only 5 volunteered to test the software
  – The children were generally shy and hesitant to perform the speech

• The children knew very few words
  – They knew how to sign some of the words but not how to vocalize them

• The general mood was as if they were taking an exam that they were not prepared for

• Surprisingly, the children were very good at conversational phrases
  – "Good morning"
  – "Good bye and thank you!"

Working with the teachers

• Teachers still need to help the students vocalize some words
  – The system as yet cannot be left unsupervised with the students

• Noisy screen distracts students
  – Need to have a simpler screen to focus on

Recognition Rates

• Sphinx recognition rates were low
  – Hampered by noisy environment

Conclusion

• Need to work closely with SPED teachers on speech curriculum
  – Test on just recently learned words

• Conversational phrases
  – Hearing impaired children use simple phrases rather than words
  – Conversational phrases spoken, other words signed

• UI improvements: simple is better

• Accuracy improvements urgently needed

The Spread Team

Image Sources

• Microphone - http://mmflc.com/images/microphone-stock-image.jpg
• Crystal Project - http://www.everaldo.com/crystal/
• Wave form - http://bipinb.com/converting-wav-file-to-gsm-file.htm

• Extra slides follow…

Scoring

• There are three possible outcomes
  – EXCELLENT
  – Good
  – Sorry

• Getting the training phoneme right and producing the correct number of phonemes earns an EXCELLENT score
  – Expected: K AA R
  – You said: K AA R
  – 3 phonemes long, got the training phoneme

• Note that Spread is only looking for the correct pronunciation of the training vowel
  – Expected: K AA R
  – You said: K AA T
  – 3 phonemes long, got the training phoneme

• Not getting the correct word length earns a Good score
  – Expected: K AA R (3 phonemes long)
  – You said: K AA R T
  – Got the training phoneme

• Not getting the training vowel means the user will have to try again
  – Length is no longer checked
  – Expected: K AA R
  – You said: K AE R
  – Sorry =(
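
The three outcomes can be summarized in a few lines of code. This is a minimal sketch of the rules as the slides describe them, not the project's actual implementation: the function name `score` is hypothetical, and it assumes that "getting the training phoneme" means the phoneme appears anywhere in the decoded sequence, since the slides do not say whether its position is checked.

```python
from typing import List


def score(expected: List[str], heard: List[str], training_phoneme: str) -> str:
    """Return "EXCELLENT", "Good" or "Sorry" following the slide examples."""
    # The training phoneme is checked first; if it is missing,
    # the length is no longer checked and the user must try again.
    # Assumption: the phoneme may appear at any position in `heard`.
    if training_phoneme not in heard:
        return "Sorry"
    # Same number of phonemes as expected: best outcome.
    if len(heard) == len(expected):
        return "EXCELLENT"
    # Training phoneme present but the length differs.
    return "Good"
```

Applied to the four slide examples with expected K AA R and training phoneme AA: K AA R and K AA T both score EXCELLENT, K AA R T scores Good, and K AE R scores Sorry.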

Updates

• SPREAD has undergone BETA testing with a group of hearing impaired adults
  – Testing of original (pass/fail) algorithm

• Results
  – Low recognition rates even for recognizable speech
  – Puzzling due to high recognition rates with lab speech

Recognition Rate

Word     Rate   Close word
Apple    60%    Apple (60%)
Art      6%     Bat (66%)
Banana   13%    Apple (73%)
Bat      66%    Bat (66%)
Car      0%     Hand (46%)
Fan      0%     Hand (53%)
Hand     20%    Bat (33%)
Jar      0%     Hand (60%)
Lamb     0%     Apple (33%)
Sofa     0%     Hand (46%)
Star     0%     Fan (26%)
Table    0%     Apple (46%)
Van      0%     Art (26%)
Wallet   0%     Hand (60%)
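
Figures like these could be computed from a list of recognizer outputs for each target word. The sketch below rests on assumptions: the slides do not say how the columns were calculated, so it takes "Rate" to be the share of attempts recognized as the target word and "Close word" to be the most frequently returned word with its share. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def recognition_rate(target: str, results: List[str]) -> float:
    """Fraction of attempts that were recognized as the target word."""
    return sum(1 for r in results if r == target) / len(results)


def closest_word(results: List[str]) -> Tuple[str, float]:
    """Most frequently recognized word and its share of the attempts."""
    word, count = Counter(results).most_common(1)[0]
    return word, count / len(results)
```

For example, with the made-up outputs `["Bat", "Bat", "Art"]` for the target "Art", `recognition_rate` gives one third and `closest_word` reports "Bat" with two thirds of the attempts.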

Analysis

• Microphone

[Figure: lab test data compared with live data]

Recommendations

• Better microphone/setup
  – Sphinx has preprocessing modules for less noise

• Per word recognition
  – Use creative word combinations to isolate the training phoneme without having to go into per phoneme recognition

• Check out phoneme recognizers

Per phoneme recognition

• Per phoneme recognition is worse
  – Spread is highly dependent on full words for increased recognition rates

Recognizing: Lamb2.wav I heard: ae ah m
Recognizing: Lamb3.wav I heard: ae m
Recognizing: Sofa1.wav I heard: s ow l ow
Recognizing: Sofa2.wav I heard: s ae
Recognizing: Sofa3.wav I heard: s ow hh aa
Recognizing: Star1.wav I heard: ao t
Recognizing: Star2.wav I heard: s d aa r
Recognizing: Star3.wav I heard: s d aa r
Recognizing: Table1.wav I heard: ah d l
Recognizing: Table2.wav I heard: ae ah
Recognizing: Table3.wav I heard: ae ah
Recognizing: Van1.wav I heard: m ae
Recognizing: Van2.wav I heard: m ae
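
Log output of this kind can be turned into phoneme lists for further analysis. The following is a small sketch, assuming each entry has the form "Recognizing: <file> I heard: <phonemes>" as in the output above; the helper name `parse_line` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches one log entry: the file name, then the decoded phonemes.
LINE = re.compile(r"Recognizing:\s*(\S+)\s+I heard:\s*(.*)")


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split a log entry into (file name, upper-case phoneme list)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    filename, phonemes = match.groups()
    # Upper-case the phonemes to match the notation used in the dictionary.
    return filename, phonemes.upper().split()
```

For instance, the entry for Star2.wav parses to the phonemes S D AA R, which can then be compared with the expected pronunciation of the word.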