iROBOTROCK: A Speech Recognition Mobile Application

ltahvild/courses/ECE750-11/material…

Reema Pimpale, Prabhat Narayan, Anand Kamath

Outline
• Introduction
• Technologies
• Current Approaches
• Our Solution
• Users (Application Domain)
• Our Approach
• Pending Functionality
• Future Enhancements
• References

Introduction
Remotely accessing a machine to perform tasks such as:
• Play music
• Type e-mail
• Take notes
• Browse the web

Motivation
To increase user friendliness by using speech recognition.

Technologies
• iOS – Operating system developed by Apple, in use on their mobile devices such as the iPhone
• Python – Cross-platform programming language
• CMUSphinx – A speech recognition toolkit that has a number of packages for different tasks and applications
• PocketSphinx – A package of CMUSphinx
• SphinxBase – A package of CMUSphinx
• Acoustic Model – Used by a speech recognition engine to recognize speech
• LMTool – Web-based tool for generating the language model

Current Approaches
Latest technologies
• iPhone – Siri
• Android – Iris
Features supported by the above technologies:
• Send text messages
• Listen to music
• Call contacts
• Send e-mail
• View a map
• Visit websites

Our Solution
• Recognize the speech and connect to your computer.

• Open a website on your computer by giving a voice input on your phone.

• Type and send an email from your computer using voice input from your phone.

• In short, the mobile device acts as a remote control to connect to your computer and carry out tasks on it.

Users (Application Domain)

• Extremely useful for handicapped people.

• Useful for people on the go.

Our Approach
• Researched the available speech recognition libraries
• Designed the high-level architecture
• Designed the sequence diagram to understand the data flow
• Build the application for the iPhone (client application)
• Build the server application
• Test the application

• Client side – Uses the open source components of CMUSphinx to provide the complex voice-recognition functionality.

• Server side – Uses Python to dynamically load additional classes at run-time, allowing enhanced functionality without requiring future users to touch any previously developed code.

Build the Application for the iPhone (Client Application)
Uses the CMUSphinx toolkit to implement the architecture for the client.

The main components are:
• PocketSphinx: lightweight recognizer library written in C.

• SphinxBase: support library required by PocketSphinx (part of the PocketSphinx component).

• Dictionary (language model and acoustic model)



Build the Server Application

• Written in Python
• Basic TCP server that listens for messages from the client on an assigned port
• Waits for keywords to activate specific functionality
• Ex. "MESSAGE" triggers the e-mail composition component
– All subsequent messages received will be appended to a message string
– Waits for the "BYE" keyword to end the message and send the e-mail
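
The keyword flow described above can be sketched roughly as follows. This is not the project's actual server code: the class names `EmailComposer` and `KeywordHandler`, the `send_email` callback, the port 5000 and the assumption that the client sends one message per line are all hypothetical choices made for illustration.

```python
import socketserver


class EmailComposer:
    """Collects dictated text between the MESSAGE and BYE keywords."""

    def __init__(self, send_email):
        # send_email is a hypothetical callback that delivers the finished text.
        self.send_email = send_email
        self.active = False
        self.parts = []

    def handle(self, text):
        word = text.strip()
        if not self.active:
            # Outside a message, only the trigger keyword has any effect.
            if word.upper() == "MESSAGE":
                self.active = True
                self.parts = []
            return
        if word.upper() == "BYE":
            # End keyword: hand the accumulated message string to the sender.
            self.send_email(" ".join(self.parts))
            self.active = False
        elif word:
            self.parts.append(word)


class KeywordHandler(socketserver.StreamRequestHandler):
    """Reads newline-delimited messages from one client connection."""

    def handle(self):
        composer = EmailComposer(send_email=print)  # print stands in for real e-mail delivery
        for raw in self.rfile:
            composer.handle(raw.decode("utf-8", errors="replace"))


def make_server(host="127.0.0.1", port=5000):
    # Assumed address and port; the slides only say "an assigned port".
    return socketserver.TCPServer((host, port), KeywordHandler)
```

Calling `make_server().serve_forever()` would start listening. Keeping the keyword logic in `EmailComposer`, separate from the socket handling, lets it be exercised without a network connection.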





Live Demonstration





Pending Functionality
• Dynamically load additional components enabled on the server.

• Each new component will be written as a Python class with a defined set of callbacks.

• Each class defines its trigger messages and how the messages that follow should be handled to perform the required task.
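
One way the planned component loading might work is sketched below. Since this functionality is still pending, everything here is an assumption: the `load_components` function, the convention that a component is any class with a `trigger` string and an `on_message` callback, and the idea of keeping components as `.py` files in one directory.

```python
import importlib.util
import inspect
from pathlib import Path


def load_components(directory):
    """Import every .py file in `directory` and index component classes by trigger.

    Hypothetical convention: a component is a class defined in the file that has
    a non-empty string attribute `trigger` and a callable `on_message`.
    """
    registry = {}
    for path in sorted(Path(directory).glob("*.py")):
        # Load the file as a module without requiring it to be on sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if cls.__module__ != module.__name__:
                continue  # skip classes the file merely imported
            trigger = getattr(cls, "trigger", None)
            if isinstance(trigger, str) and trigger and callable(getattr(cls, "on_message", None)):
                registry[trigger] = cls
    return registry
```

With this layout, adding a feature means dropping a new file into the directory; the server would look up the received keyword in the returned dictionary and instantiate the matching class, leaving existing code untouched.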

Future Enhancements
• Currently uses an unencrypted TCP connection.

• Security can be enabled using TLS (Transport Layer Security) to encrypt messages.

• The plugin-based system can be extended in any way that the user desires.
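
As a rough illustration of the TLS enhancement, Python's standard `ssl` module can wrap a listening TCP socket. The function names, the port 5000 and the certificate paths are assumptions; a real deployment needs a certificate and private key, which are optional here only so that the context can be constructed before they exist.

```python
import socket
import ssl


def make_tls_context(certfile=None, keyfile=None):
    """Build a server-side TLS context (hypothetical helper)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    if certfile:
        # A server cannot complete a handshake without a certificate and key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def listen_tls(context, host="127.0.0.1", port=5000):
    """Return a listening socket whose accepted connections are encrypted."""
    sock = socket.create_server((host, port))
    return context.wrap_socket(sock, server_side=True)
```

Connections returned by `accept()` on the wrapped socket are encrypted, so the keyword-handling logic on top of them would not need to change.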

RESULT

Successfully implemented a mobile application using speech recognition that connects to the computer remotely and is able to send an email via voice input given through the phone.

References
• Carnegie Mellon University, "CMU Sphinx – Open Source Toolkit for Speech Recognition," March 2011. http://cmusphinx.sourceforge.net/wiki/

• Huggins-Daines, D.; Kumar, M.; Chan, A.; Black, A.W.; Ravishankar, M.; Rudnicky, A.I., "Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices," Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 1, pp. I, 14-19 May 2006.

• Gulic, M.; Lucanin, D.; Simic, A., "A digit and spelling speech recognition system for the Croatian language," MIPRO, 2011 Proceedings of the 34th International Convention, pp. 1673-1678, 23-27 May 2011.

• K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1, pp. 35-45, Jan. 1990.

• Cohen, J., "Embedded speech recognition applications in mobile phones: status, trends and challenges," Proc. ICASSP 2008, IEEE Press (2008), pp. 5352-5355.

• Python Software Foundation, "The Python Standard Library," November 2011. http://docs.python.org/library/index.html

Thank You

Questions?
