iROBOTROCK: A Speech Recognition Mobile Applicationltahvild/courses/ECE750-11/material… ·...
iROBOTROCK: A Speech Recognition Mobile Application
Reema Pimpale, Prabhat Narayan, Anand Kamath
Outline
• Introduction
• Technologies
• Current Approaches
• Our Solution
• Users (Application Domain)
• Our Approach
• Pending Functionality
• Future Enhancements
• References
Introduction
Remotely accessing a computer from a mobile phone to perform tasks such as:
• Play music
• Type e-mail
• Take notes
• Browse the web
Motivation
To make computers easier to use by controlling them through speech recognition.
Technologies
• iOS – operating system developed by Apple for its mobile devices, such as the iPhone
• Python – cross-platform programming language
• CMUSphinx – speech recognition toolkit with a number of packages for different tasks and applications
• PocketSphinx – a package of CMUSphinx
• SphinxBase – a package of CMUSphinx
• Acoustic Model – statistical model of speech sounds that the recognition engine uses to recognize speech
• LMTool – web-based tool for generating the language model
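LMTool builds a language model (and a matching pronunciation dictionary) from a plain-text corpus containing one sentence per line. The following is a minimal sketch of preparing such a corpus from the phrases the application should recognize. MESSAGE and BYE come from the server design; the other phrases, the build_corpus name and the upper-case normalization are illustrative assumptions.

```python
# Hypothetical command vocabulary: MESSAGE and BYE come from the server design,
# the remaining phrases are illustrative examples only.
COMMAND_PHRASES = [
    "MESSAGE",
    "BYE",
    "OPEN WEBSITE",
    "PLAY MUSIC",
    "TAKE NOTES",
]


def build_corpus(phrases):
    """Return corpus text for LMTool: one upper-case sentence per line,
    whitespace collapsed, empty entries and duplicates dropped, order kept."""
    seen = set()
    lines = []
    for phrase in phrases:
        normalized = " ".join(phrase.upper().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            lines.append(normalized)
    return "\n".join(lines) + "\n"


# Print the corpus; redirect it to a file and upload that file to LMTool.
print(build_corpus(COMMAND_PHRASES), end="")
```

Keeping the corpus limited to the commands the server actually understands keeps the language model small, which suits an on-device recognizer.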
Current Approaches
Latest technologies:
• iPhone – Siri
• Android – Iris
Features supported by these technologies:
• Send text messages
• Listen to music
• Call contacts
• Send e-mail
• View a map
• Visit websites
Our Solution
• Recognize speech on the phone and connect to your computer.
• Open a website on your computer by giving voice input on your phone.
• Type and send an e-mail from your computer using voice input from your phone.
• In short, the mobile device acts as a remote control that connects to your computer and carries out tasks on it.
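The remote-control idea reduces to the phone opening a TCP connection to the computer and forwarding each recognized phrase. The real client is an iOS application; the following is only a desktop Python stand-in, useful for exercising the server without a phone. The newline framing, the UTF-8 encoding and the send_phrases / remote_control names are assumptions, not details from the project.

```python
import socket


def send_phrases(sock, phrases):
    """Send each recognized phrase as one newline-terminated UTF-8 line
    (assumed framing: the server reads one message per line)."""
    for phrase in phrases:
        sock.sendall((phrase.strip() + "\n").encode("utf-8"))


def remote_control(host, port, phrases):
    """Hypothetical entry point: connect to the computer and forward phrases."""
    with socket.create_connection((host, port), timeout=5) as sock:
        send_phrases(sock, phrases)
```

For example, remote_control("192.168.0.10", 5000, ["MESSAGE", "running late", "BYE"]) would dictate a short e-mail; the address and port are placeholders.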
Users (Application Domain)
• Especially useful for people with physical disabilities.
• Useful for people on the go.
Our Approach
• Researched the available speech recognition libraries
• Designed the high-level architecture
• Designed the sequence diagram to understand the data flow
• Built the application for the iPhone (client application)
• Built the server application
• Tested the application
• Client side – uses the open-source components of CMUSphinx to provide sophisticated voice-recognition functionality.
• Server side – uses Python to load additional classes dynamically at run time, so new functionality can be added without anyone having to modify previously developed code.
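Run-time loading of this kind can be illustrated with the standard library's importlib.import_module. The following is a minimal sketch that assumes components are named by dotted "module.ClassName" strings (for example, read from a configuration file) and take no constructor arguments. load_class and load_components are hypothetical helpers, not the authors' code.

```python
import importlib


def load_class(dotted_path):
    """Import 'package.module.ClassName' at run time and return that attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_components(dotted_paths):
    """Instantiate every configured component. Adding a feature only extends
    this list of names; previously developed code stays untouched."""
    return [load_class(path)() for path in dotted_paths]
```

A new feature then ships as a new module whose class name is appended to the configuration, which matches the goal of not touching existing code.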
Client Application (iPhone)
Uses the CMUSphinx toolkit to implement the client architecture.
The main components are:
• PocketSphinx: lightweight speech recognizer library written in C.
• SphinxBase: support library required by PocketSphinx (part of the PocketSphinx component).
• Dictionary and models: the pronunciation dictionary, language model and acoustic model used by the recognizer.
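A Sphinx pronunciation dictionary is plain text: each line gives a word followed by its phones, and alternate pronunciations carry a "(2)", "(3)" suffix on the word. The following hypothetical helper parses that format and reports command words that lack a pronunciation. The sample entries are illustrative and not taken from the project's dictionary.

```python
SAMPLE_DICTIONARY = """\
BYE B AY
MESSAGE M EH S AH JH
READ R IY D
READ(2) R EH D
"""


def parse_dictionary(text):
    """Map each word to a list of pronunciations (each a list of phones)."""
    pronunciations = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(";;;"):
            continue  # skip blank lines and CMUdict-style comments
        word, phones = fields[0], fields[1:]
        base = word.split("(", 1)[0]  # "READ(2)" is an alternate for "READ"
        pronunciations.setdefault(base, []).append(phones)
    return pronunciations


def missing_words(pronunciations, words):
    """Return the words the recognizer could not match (no dictionary entry)."""
    return [w for w in words if w.upper() not in pronunciations]
```

Running missing_words over the command vocabulary before building the app is a cheap way to catch a keyword the recognizer could never output.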
Server Application
• Written in Python
• Basic TCP server that listens for messages from the client on an assigned port
• Waits for keywords to activate specific functionality
• Example: "MESSAGE" triggers the e-mail composition component
– All subsequent messages received are appended to a message string
– The server waits for the "BYE" keyword to end the message and send the e-mail
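The keyword flow above is a small state machine. The following is a minimal sketch built on the standard library's socketserver. It assumes one newline-terminated UTF-8 message per line, uses port 5000 only as an example value, and prints the finished message instead of e-mailing it. MessageSession, CommandHandler and run_server are hypothetical names.

```python
import socketserver


class MessageSession:
    """Per-connection state: idle until MESSAGE, then collect every following
    line until BYE, which hands the joined text to send_email."""

    def __init__(self, send_email):
        self.send_email = send_email  # callback, e.g. a wrapper around smtplib
        self.composing = False
        self.parts = []

    def handle(self, text):
        words = text.strip()
        if not words:
            return  # ignore empty lines
        keyword = words.upper()
        if not self.composing:
            if keyword == "MESSAGE":
                self.composing = True
                self.parts = []
            return  # other input is ignored while idle
        if keyword == "BYE":
            self.send_email(" ".join(self.parts))
            self.composing = False
            self.parts = []
        else:
            self.parts.append(words)


class CommandHandler(socketserver.StreamRequestHandler):
    """Reads newline-terminated messages from one client connection."""

    def handle(self):
        session = MessageSession(send_email=print)  # placeholder: prints, does not send
        for raw_line in self.rfile:
            session.handle(raw_line.decode("utf-8"))


def run_server(host="0.0.0.0", port=5000):
    """Listen on the assigned port until interrupted."""
    with socketserver.TCPServer((host, port), CommandHandler) as server:
        server.serve_forever()
```

Keeping the keyword logic in MessageSession, separate from the socket handler, lets it be tested without a network connection.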
Live Demonstration
Pending Functionality
• Dynamically load additional components enabled on the server.
• Each new component will be written as a Python class with a defined set of callbacks.
• Each class defines its trigger messages and how the messages that follow should be handled to perform the required task.
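One possible shape for this pending plugin interface pairs each component's trigger keywords with two callbacks: on_start when a trigger arrives, and on_message for each following message until the component reports it is done. All class and method names below are hypothetical. BrowseComponent illustrates the "open a website" task, with the opener injectable (in practice the standard library's webbrowser.open).

```python
class Component:
    """Hypothetical plugin base class."""
    triggers = ()  # keywords that activate this component

    def on_start(self, keyword):
        """Called when one of this component's trigger keywords arrives."""

    def on_message(self, text):
        """Handle a following message; return True while more are expected."""
        return False


class EmailComponent(Component):
    triggers = ("MESSAGE",)

    def __init__(self, send):
        self.send = send
        self.parts = []

    def on_start(self, keyword):
        self.parts = []

    def on_message(self, text):
        if text.upper() == "BYE":
            self.send(" ".join(self.parts))
            return False
        self.parts.append(text)
        return True


class BrowseComponent(Component):
    triggers = ("OPEN",)  # hypothetical trigger: the next message is the site

    def __init__(self, opener):
        self.opener = opener

    def on_message(self, text):
        self.opener(text)
        return False  # a single message completes the task


class Dispatcher:
    """Routes messages: a trigger activates a component, which then receives
    every message until its on_message returns False."""

    def __init__(self, components):
        self.by_trigger = {t.upper(): c for c in components for t in c.triggers}
        self.active = None

    def handle(self, text):
        text = text.strip()
        if not text:
            return
        if self.active is None:
            component = self.by_trigger.get(text.upper())
            if component is not None:
                component.on_start(text.upper())
                self.active = component
            return
        if not self.active.on_message(text):
            self.active = None
```

With this design the core server knows only the Dispatcher, so a new task is one more Component subclass.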
Future Enhancements
• The application currently uses an unencrypted TCP connection.
• Security can be added by using TLS (Transport Layer Security) to encrypt the messages.
• The plugin-based system can be extended in any way the user desires.
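On the Python server, the TLS enhancement could be layered over the existing TCP socket with the standard ssl module. The following is a minimal sketch: the certificate and key paths, port 5001 and the helper names are placeholders, and the iPhone client would also need matching TLS support.

```python
import socket
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # The server must present a certificate; paths are deployment-specific.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def serve_one_tls_client(context, host="0.0.0.0", port=5001):
    """Accept one connection, wrap it in TLS, and return its first text line."""
    with socket.create_server((host, port)) as listener:
        conn, _addr = listener.accept()
        with context.wrap_socket(conn, server_side=True) as tls_conn:
            with tls_conn.makefile("r", encoding="utf-8") as reader:
                return reader.readline()
```

A Python test client would use ssl.create_default_context(), which verifies the server certificate and host name by default.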
Result
Successfully implemented a mobile application that uses speech recognition to connect to a computer remotely and send an e-mail from voice input given on the phone.
References
• Carnegie Mellon University, "CMU Sphinx – Open Source Toolkit for Speech Recognition," March 2011. http://cmusphinx.sourceforge.net/wiki/
• D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar, and A. I. Rudnicky, "Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 1, pp. I, 14-19 May 2006.
• M. Gulic, D. Lucanin, and A. Simic, "A digit and spelling speech recognition system for the Croatian language," in Proc. 34th International Convention MIPRO, 2011, pp. 1673-1678, 23-27 May 2011.
• K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1, pp. 35-45, Jan. 1990.
• J. Cohen, "Embedded speech recognition applications in mobile phones: Status, trends and challenges," in Proc. ICASSP 2008, IEEE Press, 2008, pp. 5352-5355.
• Python Software Foundation, "The Python Standard Library," November 2011. http://docs.python.org/library/index.html
Thank You
Questions?