Markku Turunen Tampere Unit for Human-Computer Interaction University of Tampere

TAUCHI – Tampere Unit for Computer-Human Interaction

Markku Turunen

Tampere Unit for Human-Computer Interaction, University of Tampere

MUMIN PhD course, Tampere, 18.-22.11.2002

Speech Application Architectures

Outline

Topics

• Background

• Architecture types

• Example architectures

• Topics for research

• Jaspis architecture

Software architectures 1

Definitions

• “software architecture defines the system in terms of components and interactions between them. Connectors are used to mediate interaction between the components” [Garlan & Shaw, 1994]

• several views can be used to describe different aspects of software architectures: design view, run-time view, module view, logical view, control view, class view, …

• human-computer interaction viewpoint: support for interaction methods and techniques

Software architectures 2

Software development tools

• support and tools for the construction of practical applications

• core architecture: basic infrastructure (hub/facilitator, communication libraries, blackboard)

• complete architecture: technology components (ASR, TTS), dialogue manager, database, …

• toolkit: dialogue editor, ASR grammar builder, corpus collection tool, annotation editor, …

Speech system components

[Figure: components of a speech system: user, telephone interface, speech recognition, natural language processing, dialogue management, database, natural language generation, speech synthesis]

Architecture types 1

Pipelines and dialogue management architectures

• pipeline (batch-sequence) architectures

– data flow

– one-way interfaces

– fixed processing order

• dialogue manager architectures

– function calls

– dialogue manager as controller

– relaxed processing order

[Figure: pipeline architecture with ASR, NLU, DM, NLG and TTS components]

[Figure: dialogue manager architecture with the DM connected to ASR, NLU, UM, NLG, DB and TTS]
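The difference between the two styles above can be sketched in a few lines. This is only an illustration: the component functions are hypothetical stand-ins that wrap their input in a label, and the `DialogueManager` class and its `confirm` flag are assumptions made for the example, not part of any system named in these slides.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real components: each wraps its input in a label
# so that the processing order is visible in the result.
def asr(audio: str) -> str:
    return f"words({audio})"

def nlu(words: str) -> str:
    return f"meaning({words})"

def dm_stage(meaning: str) -> str:
    return f"act({meaning})"

def db(meaning: str) -> str:
    return f"rows({meaning})"

def nlg(act: str) -> str:
    return f"text({act})"

def tts(text: str) -> str:
    return f"audio({text})"

def run_pipeline(audio: str) -> str:
    """Pipeline: one-way data flow through a fixed sequence of stages."""
    data = audio
    for stage in (asr, nlu, dm_stage, nlg, tts):
        data = stage(data)
    return data

COMPONENTS: Dict[str, Callable[[str], str]] = {
    "asr": asr, "nlu": nlu, "db": db, "nlg": nlg, "tts": tts,
}

class DialogueManager:
    """Controller: the dialogue manager calls the other components itself."""

    def __init__(self, components: Dict[str, Callable[[str], str]]) -> None:
        self.components = components
        self.trace: List[str] = []  # names of components called, in order

    def call(self, name: str, data: str) -> str:
        self.trace.append(name)
        return self.components[name](data)

    def handle_turn(self, audio: str, confirm: bool = False) -> str:
        meaning = self.call("nlu", self.call("asr", audio))
        if confirm:
            # Relaxed order: skip the database and ask for confirmation.
            act = f"confirm({meaning})"
        else:
            act = f"answer({self.call('db', meaning)})"
        return self.call("tts", self.call("nlg", act))
```

In the pipeline the order is fixed in `run_pipeline`; in the controller version the order is decided per turn inside `handle_turn`, which is what "relaxed processing order" refers to here.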

Architecture types 2

Client-server and blackboard architectures

• client-server architectures

– two-way messages

– hub as coordinator (star topology)

– free processing order

• blackboard (DB) architectures

– data events / db operations

– shared information

– free processing order

[Figure: client-server architecture with TTS, ASR, NLU, DM, UM, NLG and DB connected to a central HUB]

[Figure: blackboard architecture with TTS, ASR, NLU, DM, UM and NLG around a shared IS]
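A blackboard can be sketched as a shared store that notifies components when data they care about changes. The sketch below is a minimal illustration under that assumption; the `Blackboard` class, the key names and the two listener functions are hypothetical and do not come from any of the systems discussed here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

class Blackboard:
    """Shared information store that raises a data event on every write."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.listeners: DefaultDict[str, List[Callable[["Blackboard", str], None]]] = defaultdict(list)

    def subscribe(self, key: str, listener: Callable[["Blackboard", str], None]) -> None:
        self.listeners[key].append(listener)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        # Notify every component that subscribed to this key.
        for listener in list(self.listeners[key]):
            listener(self, value)

# Hypothetical components: they never call each other, they only react to
# data appearing on the blackboard and write their own results back.
def nlu(board: Blackboard, words: str) -> None:
    board.write("meaning", f"meaning({words})")

def dm(board: Blackboard, meaning: str) -> None:
    board.write("act", f"act({meaning})")

def build_board() -> Blackboard:
    board = Blackboard()
    board.subscribe("meaning", dm)   # subscription order does not matter
    board.subscribe("words", nlu)
    return board
```

Writing `"words"` to a board built with `build_board()` triggers `nlu`, whose result in turn triggers `dm`. No component holds a reference to another, so the processing order follows from the data rather than from a controller.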

Architecture types 3

Agent architectures

• independent agents

– independent agents

– facilitator

– collaborative processing

• compact agents

– compact agents

– shared knowledge

– distributed processing

[Figure: independent agents: TTS, ASR, NLU, DM, UM, NLG and DB connected through a Facilitator]

[Figure: compact agents: TTS, ASR, NLU and NLG together with agents labelled PE, PA, DA, DE, IA and UA, a Facilitator and a shared IS]

Example architectures 1

GALAXY-II

• MIT / MITRE

• DARPA Communicator reference architecture

• freely available

• HUB and servers

• frames (messages)

• hub scripts route messages

[Seneff et al., 1998]
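The hub-and-servers idea can be illustrated with a toy router. This is a sketch under stated assumptions, not the GALAXY-II API or its hub script syntax: frames are modelled as plain dictionaries, each rule is a hypothetical triple (key that must be present, key that must be absent, server name), and the three servers return canned values.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, str]
Server = Callable[[Frame], Frame]
Rule = Tuple[str, str, str]  # (required key, missing key, server name)

# Hypothetical servers: each returns a copy of the frame with one key added.
def recognizer(frame: Frame) -> Frame:
    return {**frame, "words": "bus to hospital"}

def parser(frame: Frame) -> Frame:
    return {**frame, "meaning": "destination=hospital"}

def dialogue(frame: Frame) -> Frame:
    return {**frame, "reply": "Which hospital do you mean?"}

SERVERS: Dict[str, Server] = {
    "recognizer": recognizer, "parser": parser, "dialogue": dialogue,
}

# Toy "hub script": routing is data, not code inside the servers.
RULES: List[Rule] = [
    ("audio", "words", "recognizer"),
    ("words", "meaning", "parser"),
    ("meaning", "reply", "dialogue"),
]

class Hub:
    def __init__(self, servers: Dict[str, Server], rules: List[Rule]) -> None:
        self.servers = servers
        self.rules = rules

    def process(self, frame: Frame) -> Tuple[Frame, List[str]]:
        """Apply the first matching rule repeatedly until none matches."""
        route: List[str] = []
        fired = True
        while fired:
            fired = False
            for present, absent, name in self.rules:
                if present in frame and absent not in frame:
                    frame = self.servers[name](frame)
                    route.append(name)
                    fired = True
                    break
        return frame, route
```

Because the servers only see frames, changing the processing order means editing `RULES`, not the servers. That separation is the point of routing messages with hub scripts.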

Example architectures 2

Open Agent Architecture

• general agent architecture

• Facilitator as coordinator

• requesters (tasks)

• services (solutions)

• Interagent Communication Language (ICL)

• freely available

• used in speech applications

[Martin et al., 1999]
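The requester/service split can be sketched as follows. This is an illustration only: it does not use ICL or the real OAA libraries, and the `Facilitator` class, the capability names and the agent names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[..., str]

class Facilitator:
    """Matches requested tasks to the agents that registered as services."""

    def __init__(self) -> None:
        # capability name -> list of (agent name, handler)
        self.services: Dict[str, List[Tuple[str, Handler]]] = {}

    def register(self, agent: str, capability: str, handler: Handler) -> None:
        self.services.setdefault(capability, []).append((agent, handler))

    def solve(self, capability: str, *args: str) -> List[Tuple[str, str]]:
        """Delegate a task to every provider and collect their solutions."""
        providers = self.services.get(capability, [])
        if not providers:
            raise LookupError(f"no agent can solve '{capability}'")
        return [(agent, handler(*args)) for agent, handler in providers]

def build_facilitator() -> Facilitator:
    facilitator = Facilitator()
    # Hypothetical service agents.
    facilitator.register("tts_en", "synthesize", lambda text: f"<speech:{text}>")
    facilitator.register("asr_a", "recognize", lambda audio: f"A heard {audio}")
    facilitator.register("asr_b", "recognize", lambda audio: f"B heard {audio}")
    return facilitator
```

A requester only names the task, for example `build_facilitator().solve("recognize", "clip1")`, and never addresses a specific agent. Here both recognizers answer, and the requester decides what to do with the alternatives.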

Example architectures 3

WITAS

• dialogue manager agent reacts to events sent by other agents

• dialogue manager acts as blackboard

• multimodal inputs are coordinated by DM

• based on OAA

[Lemon et al., 2001]

Example architectures 4

MITRE architecture

• dialogue manager as controller

• default processing order

• dialogue manager monitors other components

• dialogue manager is a kind of blackboard

• based on OAA

[Luperfoy et al., 1998]

Example architectures 5

TRIPS

• agents, managers and shared databases

• loosely coupled components

• no dialogue manager

• KQML messages

• facilitator does not contain control logic

[Allen et al., 2001]
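For readers unfamiliar with KQML, a message is a parenthesised performative followed by `:keyword value` pairs. The helper below is a small sketch of that surface syntax only. The agent names and message content are made up for illustration and are not taken from TRIPS.

```python
def kqml(performative: str, **params: str) -> str:
    """Format a KQML-style message; underscores in names become hyphens."""
    fields = " ".join(
        f":{name.replace('_', '-')} {value}" for name, value in params.items()
    )
    return f"({performative} {fields})"

# Hypothetical exchange between two components.
EXAMPLE = kqml(
    "tell",
    sender="parser",
    receiver="facilitator",
    content="(destination hospital)",
)
```

Keyword arguments keep their order, so the fields appear in the message in the order they were written in the call.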

Current vs. new application areas

Traditional speech applications | Future speech applications

single user | multiple users

desktop and telephony | office and home environments, mobile settings

single deterministic dialogue | open-ended, dynamically constructed concurrent dialogues

active user | (pro)active computer

centralized dialogue management | distributed interaction management

Current vs. new application areas

Traditional speech applications | Future speech applications

mostly unimodal | mostly multimodal

alternative / exclusive (sequential) modalities | concurrent / synergistic (parallel) modalities

speech, text, graphics | speech, sensors, haptics, gestures…

monolingual | multilingual

“natural” interaction methods based on human-human interaction | “innovative” interaction methods based on human-computer interaction

Adaptive systems

Need for adaptive applications

• different users: speech-based communication can differ greatly between individual users and situations

– speech is language and culture dependent

– differences in preferences and needs between user groups can be large

• different approaches: people from different backgrounds have different solutions for the same problems

• we need interaction methods and architectures that adapt to different users and situations and support multiple approaches

Future example

U: <calls the system>

S: Welcome to the bus timetable system. How may I help you?

U: I want to go to hospital.

S: Which hospital do you mean? There are three hospitals.

U: The northern one.

S: What is your departure location?

U: Railway station.

S: Bus number 17 leaves at 10:12.

U: Bye. <hangs up>

time passes, no bus is coming

U: <calls the system>

S: Welcome to the taxi service. How may I help you? …

S: <contacts the user> You have an appointment with your doctor. You need to hurry to catch the bus. It leaves the central station at 10:12.

U: Thanks.

time passes, bus strike hits the city

S: <contacts the user> I’m really sorry, there is no bus coming. The next train leaves seven minutes from now.

U: No, I’ll take a taxi.

S: Please wait… It is coming, please wait in front of the opera building.

U: Thanks.

Topics for research

Topics for speech systems

• adaptivity: how to support adaptive methods? how to make systems adaptive?

• reusability: components, interaction methods, …

• distributed systems: communication protocols, resource sharing, ubiquitous applications

• distributed interaction management: a centralized dialogue manager is not suitable for many tasks

• shared knowledge: dialogue, user etc.

• development and evaluation tools: WOZ, corpora, …

Jaspis architecture

speech application development framework

• implementation of core architecture with extensions

• designed especially for multilingual and distributed applications

• overall focus on system level adaptivity

• current focus on ubiquitous and multimodal applications

• Java and XML, freely available

• used in several projects and applications

Jaspis architecture overview

[Figure: Jaspis architecture overview. An Interaction Manager is shown with four areas: Information Management (Information Storage, InformationManager implementations in Java, Perl, etc., and a SocketInterface server exchanging XML messages), Dialogue Management (Dialogue Manager, dialogue agents, and evaluators such as the capability evaluator and the consistency evaluator), Presentation Management (Presentation Manager, presentation agents, and evaluators such as the capability evaluator and the language evaluator) and Communication Management (Communication Manager, servers, clients, devices, engines, input agents and input evaluators, with 1:1 and n:m relations). Further labels: NLG, NLU, DB, UM.]

Jaspis components

Agents, evaluators and managers

• agents handle various interaction situations, such as speech input interpretations, dialogue decisions and speech output presentations

• evaluators measure how well agents can handle the current interaction situation

• managers are used to coordinate agents and evaluators, especially to try to choose the best possible agents to handle each interaction situation
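The agent, evaluator and manager roles can be sketched as below. This is not Jaspis code (Jaspis itself is written in Java): the class and function names are hypothetical, and combining evaluator scores by multiplication is an assumption made for the sketch, since the slides do not say how scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Situation = Dict[str, str]

@dataclass
class Agent:
    name: str
    handles: str    # kind of interaction situation this agent is written for
    language: str   # language this agent can use

Evaluator = Callable[[Agent, Situation], float]

def capability_evaluator(agent: Agent, situation: Situation) -> float:
    """1.0 if the agent can handle the task at all, otherwise 0.0."""
    return 1.0 if agent.handles == situation["task"] else 0.0

def language_evaluator(agent: Agent, situation: Situation) -> float:
    """Prefer agents in the user's language; the 0.2 penalty is arbitrary."""
    return 1.0 if agent.language == situation["language"] else 0.2

class Manager:
    """Coordinates agents and evaluators and picks the best-scoring agent."""

    def __init__(self, agents: List[Agent], evaluators: List[Evaluator]) -> None:
        self.agents = agents
        self.evaluators = evaluators

    def score(self, agent: Agent, situation: Situation) -> float:
        result = 1.0
        for evaluate in self.evaluators:
            result *= evaluate(agent, situation)  # assumed combination rule
        return result

    def select(self, situation: Situation) -> Agent:
        best = max(self.agents, key=lambda agent: self.score(agent, situation))
        if self.score(best, situation) == 0.0:
            raise LookupError("no agent can handle this situation")
        return best

AGENTS = [
    Agent("greet_en", "greet", "en"),
    Agent("greet_fi", "greet", "fi"),
    Agent("error_en", "error", "en"),
]
MANAGER = Manager(AGENTS, [capability_evaluator, language_evaluator])
```

With multiplication, any evaluator can veto an agent by returning 0.0, while soft preferences such as language only lower the score. A Finnish-speaking user who hits an error still gets `error_en`, because it is the only capable agent.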

Jaspis interaction management

[Figure: Jaspis interaction management. The Interaction Manager coordinates the Input Manager, the Dialogue Manager and the Presentation Manager. Each manager is shown with its own model, agents and evaluators (Input Model, Input Agents, Input Evaluators; Dialogue Model, Dialogue Agents, Dialogue Evaluators; Presentation Model, Presentation Agents, Presentation Evaluators). Arrow labels: Coordinate, Select, Use, Evaluate.]

Information management in Jaspis

• information storing method is not fixed (XML, DB)

• information access protocol is defined (DTD)

• Information Managers are used to access the Information Storage

– these can be implemented in any language and they can use TCP/IP, XML-RPC or method calls

[Figure: Information Management with a SocketInterface (server), InformationManager implementations (Java, Perl, etc.) and the Information Storage. The figure shows the following XML documents.]

Request:

    <?xml version="1.0"?>
    <add>
      <route>
        <tag>preferences</tag>
        <tag>recognizer</tag>
      </route>
      <content>
        <port>8200</port>
      </content>
    </add>

Response:

    <?xml version="1.0"?>
    <result> success </result>

Information Storage:

    <?xml version="1.0"?>
    <state>
      <internal> </internal>
      <user> </user>
      <history> </history>
      <technical> </technical>
      <external> </external>
    </state>
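How an Information Manager might apply the `<add>` request shown above can be sketched with Python's standard `xml.etree.ElementTree` module. The semantics are assumptions made for illustration: the slides do not define them, so here `<route>` is read as a path of nested tags below the storage root (created when missing) and the children of `<content>` are appended at the end of that path. The function name `apply_add` is hypothetical and this is not Jaspis code.

```python
import xml.etree.ElementTree as ET

STORAGE_XML = """<?xml version="1.0"?>
<state>
  <internal/> <user/> <history/> <technical/> <external/>
</state>"""

ADD_REQUEST = """<?xml version="1.0"?>
<add>
  <route><tag>preferences</tag><tag>recognizer</tag></route>
  <content><port>8200</port></content>
</add>"""

SUCCESS = '<?xml version="1.0"?> <result> success </result>'

def apply_add(storage: ET.Element, request_xml: str) -> str:
    """Apply an <add> request to the storage tree (assumed semantics)."""
    request = ET.fromstring(request_xml)
    content = request.find("content")
    if request.tag != "add" or content is None:
        raise ValueError("not a well-formed <add> request")

    # Walk the route, creating intermediate elements that do not exist yet.
    node = storage
    for tag in request.findall("./route/tag"):
        name = (tag.text or "").strip()
        child = node.find(name)
        if child is None:
            child = ET.SubElement(node, name)
        node = child

    # Attach every child of <content> at the end of the route.
    for item in content:
        node.append(item)
    return SUCCESS

storage = ET.fromstring(STORAGE_XML)
response = apply_add(storage, ADD_REQUEST)
```

After the call, `storage.findtext("preferences/recognizer/port")` returns `"8200"`. Because the request and the response are plain XML text, the same exchange could run over a socket, XML-RPC or a direct method call, as the bullets above state.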

Presentation management in Jaspis

• presentation agents convert conceptual messages to speech outputs

• for every output the most suitable agent is selected by presentation evaluators

• multiple presentation management modules for different phases

[Figure: Presentation Management with a Presentation Manager, several presentation agents, and evaluators (capability evaluator, language evaluator, ...)]

Dialogue management in Jaspis

• different dialogue agents for different dialogue tasks

• alternative dialogue agents for same dialogue tasks

• dialogue evaluators select dialogue agents

• no single controller (the dialogue manager)

• multiple dialogue management modules

[Figure: Dialogue Management with a Dialogue Manager, several dialogue agents, and evaluators (capability evaluator, consistency evaluator, ...)]

Communication (I/O) management in Jaspis

• I/O agents and evaluators handle, combine and coordinate different input streams

• devices – clients – servers – engines

• run-time interpretation and multimodal fusion

• separate module for selection of input modalities

[Figure: Communication Management with a Communication Manager, servers, clients, devices, engines, input agents and input evaluators, connected by 1:1 and n:m relations]

Jaspis extensions

Beyond core infrastructure

• XML-based linguistic information (Annotation Graphs) and log formats (corpus collection, usability tests)

• visualization components (blackboard, interaction)

• speech technology interfaces for common telephony cards, synthesizers and recognizers

• reusable components: error handling, general tasks

• SMS interface, graphical components

• Wizard of Oz tools

Jaspis

Future improvements

• concurrent dialogues and multiple users

• event-based interaction management

http://www.cs.uta.fi/hci/spi/

[email protected]

[email protected]

Tampere Unit for Computer-Human Interaction

Department of Computer and Information Sciences