Voice User Interface for Mobile Applications

Building a Voice User Interface: Android Speech Recognition and Text-To-Speech

Tuesday, January 29, 2013

description

Android Speech Recognition and Text-To-Speech: How to voice-enable your mobile application

"What does a weasel look like?" We take a closer look at Android's Speech-To-Text (STT) and Text-To-Speech (TTS) capabilities, develop and deploy three small apps, each a little more capable than the last, and finally walk through the steps of building a voice-controlled assistant. Android uses Google's Speech-To-Text engine in the cloud, but Text-To-Speech has been built into the platform since Android 1.6 (Donut), using SVOX Pico with six language packages (US and UK English, German, French, Italian, and Spanish). While speech recognition, interpretation, and text-to-speech synthesis are addressed by phone manufacturers and OS makers, the core problem of how to capture knowledge and make it accessible to smart software agents is ignored, and services like Siri or Google Voice Actions remain closed, i.e., not easily extendable with third-party information or knowledge.

Transcript of Voice User Interface for Mobile Applications

Page 1: Voice User Interface for Mobile Applications

Building a Voice User Interface Android Speech Recognition and Text-To-Speech

Page 3: Voice User Interface for Mobile Applications

http://wolfpaulus.com

Page 4: Voice User Interface for Mobile Applications

Star Trek

© 2012-2013 Wolf Paulus - http://wolfpaulus.com

Page 6: Voice User Interface for Mobile Applications

Red Planet

Page 8: Voice User Interface for Mobile Applications

2001: A Space Odyssey

Page 11: Voice User Interface for Mobile Applications

If a computer could think, how could we tell?

Page 12: Voice User Interface for Mobile Applications

In 1950, Alan Turing suggested:

“If the responses from the computer were indistinguishable from those of a human, the computer could be said to be thinking.”

Page 13: Voice User Interface for Mobile Applications

Loebner Prize: Solid 18 Carat Gold Medal

Grand Prize of $100,000 and a Gold Medal for the first computer whose responses are indistinguishable from a human's.

Each year, a prize of $2000 and a bronze medal are awarded to the most human-like computer.

Page 15: Voice User Interface for Mobile Applications

How can we create a chatbot?

Page 16: Voice User Interface for Mobile Applications

Capture Speech Input → Convert Speech into Text → Create Text Response (AI) → Message or Command?
• Cmd: Execute Command → Message or Action?
    • Action: Perform Action
    • Msg: Synthesize Voice (Message) → Speak Message
• Msg: Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device
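
The flow on this page can be sketched as a single conversational turn in plain Java. This is only an illustration under stated assumptions: the class name VoiceTurn and both function fields are hypothetical, speech capture and synthesis are left out, and the "CMD " marker is borrowed from the handler code shown later in this deck.

```java
import java.util.function.Function;

/** Hypothetical model of one conversational turn; names are illustrative only. */
final class VoiceTurn {
    private static final String CMD_PREFIX = "CMD ";

    private final Function<String, String> responder; // "Create Text Response"
    private final Function<String, String> executor;  // "Execute Command"

    VoiceTurn(final Function<String, String> responder,
              final Function<String, String> executor) {
        this.responder = responder;
        this.executor = executor;
    }

    /** Takes the recognized text and returns the message that should be spoken. */
    String handle(final String recognizedText) {
        final String response = responder.apply(recognizedText);
        if (response.startsWith(CMD_PREFIX)) {
            // "Message or Command?": a command is executed and yields a new message.
            return executor.apply(response.substring(CMD_PREFIX.length()));
        }
        // A plain message goes straight to speech synthesis.
        return response;
    }
}
```

With an identity responder, this reduces to an echo: whatever was recognized is what gets spoken.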

Page 17: Voice User Interface for Mobile Applications

Echo Bot

Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 18: Voice User Interface for Mobile Applications

Capture Speech Input

void startVoiceRecognitionActivity() {
    Intent intent = new Intent( RecognizerIntent.ACTION_RECOGNIZE_SPEECH );
    intent.putExtra( RecognizerIntent.EXTRA_PROMPT, "Speak to Bot1" );
    intent.putExtra( RecognizerIntent.EXTRA_MAX_RESULTS, 1 );
    intent.putExtra( RecognizerIntent.EXTRA_CALLING_PACKAGE, getClass().getPackage().getName() );
    intent.putExtra( RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM );
    startActivityForResult( intent, VOICE_RECOGNITION_REQUEST_CODE );
}

Page 19: Voice User Interface for Mobile Applications

@Override
protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
    switch (requestCode) {
        case VOICE_RECOGNITION_REQUEST_CODE:
            if (resultCode == RESULT_OK) {
                ArrayList<String> matches = data.getStringArrayListExtra( RecognizerIntent.EXTRA_RESULTS );
                say( matches.get(0) );
            } else {
                mTV_STT.setText("");
            }
            break;
    }
}

... speech has been converted into text ...
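
The callback above speaks the first entry of the result list. When more than one result is requested through EXTRA_MAX_RESULTS, the recognizer may also supply a parallel array of confidence values under RecognizerIntent.EXTRA_CONFIDENCE_SCORES (API level 14 and later). The following is a minimal sketch in plain Java, assuming the list and the array have already been read from the Intent; the class and method names are hypothetical and not part of the demo code.

```java
import java.util.List;

/** Hypothetical helper, not part of the original demo code. */
final class BestMatch {
    private BestMatch() {}

    /**
     * Returns the candidate with the highest confidence score.
     * Falls back to the first candidate when scores are missing or mismatched,
     * and to null when there are no candidates at all.
     */
    static String pick(final List<String> matches, final float[] scores) {
        if (matches == null || matches.isEmpty()) {
            return null;
        }
        if (scores == null || scores.length != matches.size()) {
            return matches.get(0);
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return matches.get(best);
    }
}
```

Returning null for an empty result also gives the caller a chance to avoid the IndexOutOfBoundsException that matches.get(0) would throw on an empty list.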

Page 20: Voice User Interface for Mobile Applications

Convert Speech into Text

Capture Speech Input

Page 21: Voice User Interface for Mobile Applications

private TextToSpeech mTts;
..
mTts = new TextToSpeech( this, this );   // Context, TextToSpeech.OnInitListener
..

// Implement TextToSpeech.OnInitListener
@Override
public void onInit( final int status ) {
    if ( status == TextToSpeech.SUCCESS && mTts != null ) {
        startVoiceRecognitionActivity();
        // Listen again once the bot has finished speaking.
        // Note: this listener is deprecated since API level 15 in favor of setOnUtteranceProgressListener.
        mTts.setOnUtteranceCompletedListener( new TextToSpeech.OnUtteranceCompletedListener() {
            @Override
            public void onUtteranceCompleted( final String s ) {
                startVoiceRecognitionActivity();
            }
        });
    } else {
        mTV_TTS.setText("Could not initialize TextToSpeech.");
    }
}

Synthesize Voice (Message)

Page 22: Voice User Interface for Mobile Applications

private void say(final String s) {
    final HashMap<String, String> map = new HashMap<String, String>(1);
    // The utterance ID is required for the completion listener to be called.
    map.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, UTTERANCE_ID);
    mTts.speak(s, TextToSpeech.QUEUE_FLUSH, map);
}

Speak Message

Page 23: Voice User Interface for Mobile Applications

Code and Demo

Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 24: Voice User Interface for Mobile Applications

Stock Quote Bot

Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message

Example command: “stock quote for ...”

access Web Service / perform on Device

Page 25: Voice User Interface for Mobile Applications

@Override
protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
    switch (requestCode) {
        case VOICE_RECOGNITION_REQUEST_CODE:
            if (resultCode == RESULT_OK) {
                ArrayList<String> matches = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                say( matches.get(0) );
            } else {
                mTV_STT.setText("");
            }
            break;
    }
}

... speech has been converted into text ...

Page 26: Voice User Interface for Mobile Applications

@Override
protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
    switch (requestCode) {
        case VOICE_RECOGNITION_REQUEST_CODE:
            if (resultCode == RESULT_OK) {
                ArrayList<String> matches = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                final String s = matches.get(0);
                final int k = s.indexOf( KEY_WORD );
                if (0 <= k) {
                    // Everything after the keyword is treated as the ticker symbol.
                    final String ticker = s.substring( k + KEY_WORD.length() ).trim();
                    new YQuote(mHandler).execute( ticker );
                } else {
                    say( s );
                }
            } else {
                mTV_STT.setText("");
            }
            break;
    }
}

... speech has been converted into text ...
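
The keyword check above can be isolated into a small function that is easy to test off the device. This is a sketch under assumptions: the value of KEY_WORD is not shown on the slides, so "stock quote for" is taken from the example command, and the class name TickerParser is hypothetical. Unlike the slide code, this variant matches the phrase case-insensitively and removes spaces from the symbol.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the keyword check; not the original demo code. */
final class TickerParser {
    /** Assumed trigger phrase, taken from the "stock quote for ..." example. */
    static final String KEY_WORD = "stock quote for";

    private TickerParser() {}

    /**
     * Returns the ticker symbol that follows the trigger phrase, or null when
     * the utterance does not contain the phrase or nothing follows it.
     */
    static String extractTicker(final String utterance) {
        if (utterance == null) {
            return null;
        }
        final int n = KEY_WORD.length();
        for (int i = 0; i + n <= utterance.length(); i++) {
            // Case-insensitive comparison at position i.
            if (utterance.regionMatches(true, i, KEY_WORD, 0, n)) {
                // A spelled-out symbol may arrive as "g o o g"; drop the spaces.
                final String ticker = utterance.substring(i + n)
                        .replace(" ", "")
                        .toUpperCase(Locale.US);
                return ticker.isEmpty() ? null : ticker;
            }
        }
        return null;
    }
}
```

A null result corresponds to the else branch in the slide code, where the recognized text is simply spoken back.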

Page 27: Voice User Interface for Mobile Applications

Code and Demo

Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 28: Voice User Interface for Mobile Applications

Create Text Response

What is cheese?
What is chocolate?
How old are you?
Who is the President?
Where is Atlantis?
What’s up?
Did you have dinner already?

Page 30: Voice User Interface for Mobile Applications

Artificial Intelligence Markup Language (AIML)

<?xml version="1.0" encoding="ISO-8859-1"?>
<aiml>
  <category>
    <pattern>WHAT IS AIML</pattern>
    <template>AIML is short for Artificial Intelligence Markup Language</template>
  </category>
</aiml>

Page 31: Voice User Interface for Mobile Applications

<?xml version="1.0" encoding="ISO-8859-1"?>
<aiml>
  <category>
    <pattern>TELL ME WHAT AIML IS</pattern>
    <template><srai>WHAT IS AIML</srai></template>
  </category>
</aiml>

Page 32: Voice User Interface for Mobile Applications

<?xml version="1.0" encoding="ISO-8859-1"?>
<aiml>
  <category>
    <pattern>WHAT IS AIML</pattern>
    <template>
      <random>
        <li>First response</li>
        <li>Second response</li>
        <li>3rd response</li>
      </random>
    </template>
  </category>
</aiml>

Page 33: Voice User Interface for Mobile Applications

<?xml version="1.0" encoding="ISO-8859-1"?>
<aiml>
  <category>
    <pattern>TELL ME WHAT * IS</pattern>
    <template>I don't know what <star/> is.</template>
  </category>
</aiml>
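
To make the wildcard category above concrete, here is a deliberately simplified sketch of how a single pattern with "*" could be matched and how <star/> could be filled in. It is not how Program D or any other AIML interpreter is implemented, it ignores <srai>, <random>, and the "_" wildcard, and the class name MiniCategory is hypothetical. Lowercasing the captured words is also just a choice made for this illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified, hypothetical illustration of one AIML category; not a full interpreter. */
final class MiniCategory {
    private final Pattern regex;
    private final String template;

    MiniCategory(final String aimlPattern, final String template) {
        // Quote literal words and turn each "*" into a group capturing one or more words.
        final StringBuilder sb = new StringBuilder();
        for (final String word : aimlPattern.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append("\\s+");
            }
            sb.append("*".equals(word) ? "(.+)" : Pattern.quote(word));
        }
        this.regex = Pattern.compile(sb.toString());
        this.template = template;
    }

    /** Returns the filled-in template, or null when the input does not match. */
    String respond(final String input) {
        // Approximate AIML input normalization: upper case, no punctuation, single spaces.
        final String normalized = input.toUpperCase(Locale.US)
                .replaceAll("[^A-Z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
        final Matcher m = regex.matcher(normalized);
        if (!m.matches()) {
            return null;
        }
        // Only the first wildcard is substituted in this sketch.
        final String star = m.groupCount() > 0 ? m.group(1).toLowerCase(Locale.US) : "";
        return template.replace("<star/>", star);
    }
}
```

For the category on this page, the input "Tell me what a weasel is?" would produce "I don't know what a weasel is."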

Page 34: Voice User Interface for Mobile Applications

AIML Spec.
• http://www.alicebot.org/TR/2011/

AIML Primer
• http://www.alicebot.org/documentation/aiml-primer.html

Page 37: Voice User Interface for Mobile Applications

Program D v4.6
http://aitools.org/

last updated: 14-Mar-2006

Page 38: Voice User Interface for Mobile Applications

CharlieBot 4.1.8
http://sourceforge.net/projects/charliebot/

Forked from Program D v4.1.5; works on Mac OS X or any Java 1.3 or better VM

last updated: 14-Dec-2002

Page 39: Voice User Interface for Mobile Applications

ChatterBean
http://www.geocities.ws/phelio/chatterbean/

ChatterBean is an AIML interpreter (also known as "Alicebot") written in pure Java.

Fully AIML 1.0.1 compliant

last updated: 11-May-2006

Page 40: Voice User Interface for Mobile Applications

AIML Sets

• http://aitools.org/Free_AIML_sets

• http://code.google.com/p/aiml-en-us-foundation-alice/

• http://www.square-bear.co.uk/aiml/

Page 41: Voice User Interface for Mobile Applications

HTML: http://myAIMLServer:PORT/talk?botid=xyz..

XML-RPC: http://myAIMLServer:PORT/talk-xml

HTTP-POST
botid="xyz.."
input="Hello"
custid="d22.."

RESPONSE:
<result status="0" botid="xyz.." custid="d2228e2eee12d255">
  <input>Hello</input>
  <that>Hi there!</that>
</result>
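
On the client side, the RESPONSE document above has to be parsed before the bot's reply can be spoken. The sketch below uses only the standard Java XML parser and assumes the response always has the shape shown on this page: a <result> root with status and custid attributes and a <that> child holding the reply. The class name TalkXmlResponse is hypothetical, and the HTTP request itself is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical parser for the <result> document shown on this page. */
final class TalkXmlResponse {
    final int status;     // value of the status attribute, -1 if absent
    final String custId;  // value of the custid attribute, "" if absent
    final String that;    // the bot's reply text, "" if absent

    private TalkXmlResponse(final int status, final String custId, final String that) {
        this.status = status;
        this.custId = custId;
        this.that = that;
    }

    /** Parses the response body; throws IllegalArgumentException on malformed input. */
    static TalkXmlResponse parse(final String xml) {
        try {
            final Element result = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            final String statusAttr = result.getAttribute("status");
            final NodeList thats = result.getElementsByTagName("that");
            return new TalkXmlResponse(
                    statusAttr.isEmpty() ? -1 : Integer.parseInt(statusAttr),
                    result.getAttribute("custid"),
                    thats.getLength() > 0 ? thats.item(0).getTextContent().trim() : "");
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid talk-xml response", e);
        }
    }
}
```

The custid value presumably identifies the conversation, so a client would likely send it back with the next request; that behavior is an assumption and is not stated on the slide.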

Page 42: Voice User Interface for Mobile Applications

Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command?
• Cmd: Execute Command → Msg → Synthesize Voice (Message) → Speak Message
• Msg: Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 43: Voice User Interface for Mobile Applications

@Override
protected void onActivityResult(final int requestCode, final int resultCode, final Intent data) {
    switch (requestCode) {
        case VOICE_RECOGNITION_REQUEST_CODE:
            if (resultCode == RESULT_OK) {
                final ArrayList<String> matches = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                final String s = matches.get(0);
                // Send the recognized text to the AIML server; the reply arrives via mHandler.
                new AIML_RPC(mHandler).execute(s);
                mTV_STT.setText(s);
            } else {
                mTV_STT.setText("");
            }
            break;
    }
}

Speech has been converted into text ...

Page 44: Voice User Interface for Mobile Applications

@Override
public void handleMessage(final Message msg) {
    if (msg.getData() != null && !msg.getData().isEmpty()) {
        String s = msg.getData().getString(Bot3.BUNDLE_KEY_NAME_FOR_MSG);
        if (s != null && 0 < s.length()) {
            int i = s.indexOf("CMD ");
            if (0 <= i) {
                // Command: strip the "CMD " marker and a trailing period.
                s = s.substring(i + 4, s.endsWith(".") ? s.length() - 1 : s.length());
                String cmd;
                int k = s.indexOf(" ");
                if (0 < k) {
                    cmd = s.substring(0, k);
                    s = s.substring(k + 1).replace(" ", "");
                    if (Bot3.KEY_WORD.equals(cmd)) {
                        new YQuote(mHandler).execute(s);
                    }
                }
            } else {
                // Plain message: speak it.
                Bot3.this.say(s);
            }
        }
    }
}

Handler
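
The string handling inside the handler can be pulled out into a pure function, which makes the "Message or Command?" decision testable without an Android device. This is a sketch, not the original code: the class name BotReply is hypothetical, and the command word STOCK used in the usage note is made up because the value of Bot3.KEY_WORD is not shown. One deliberate difference from the handler: a command without an argument is returned with an empty argument instead of being ignored.

```java
/** Hypothetical value type for a parsed bot reply; not part of the original demo code. */
final class BotReply {
    static final String CMD_PREFIX = "CMD ";

    final String command;  // null for a plain message
    final String argument; // command argument, or the message text to speak

    private BotReply(final String command, final String argument) {
        this.command = command;
        this.argument = argument;
    }

    boolean isCommand() {
        return command != null;
    }

    /** Splits a reply such as "CMD NAME some argument." into command and argument. */
    static BotReply parse(final String reply) {
        final int i = reply.indexOf(CMD_PREFIX);
        if (i < 0) {
            return new BotReply(null, reply);
        }
        String rest = reply.substring(i + CMD_PREFIX.length()).trim();
        if (rest.endsWith(".")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        final int k = rest.indexOf(' ');
        if (k <= 0) {
            return new BotReply(rest, "");
        }
        // Spaces inside the argument are removed, as in the handler above.
        return new BotReply(rest.substring(0, k), rest.substring(k + 1).replace(" ", ""));
    }
}
```

For example, BotReply.parse("CMD STOCK G O O G.") yields the command STOCK with the argument GOOG, while BotReply.parse("Hi there!") yields a plain message.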

Page 45: Voice User Interface for Mobile Applications

Code and Demo

Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command?
• Cmd: Execute Command → Msg → Synthesize Voice (Message) → Speak Message
• Msg: Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 46: Voice User Interface for Mobile Applications

Summary

Page 47: Voice User Interface for Mobile Applications

Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 48: Voice User Interface for Mobile Applications

Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 49: Voice User Interface for Mobile Applications

AIML Bot

Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command?
• Cmd: Execute Command → Msg → Synthesize Voice (Message) → Speak Message
• Msg: Synthesize Voice (Message) → Speak Message

access Web Service / perform on Device

Page 50: Voice User Interface for Mobile Applications

Cora, your imaginary friend
Techcasita Productions
