1
Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents
S. Kawamoto, et al.
October 27, 2004
2
Agenda
• Introduction
• Toolkit Design and Outline
  – Speech recognition module
  – Speech synthesis module
  – Facial image synthesis module
  – Agent manager
  – Virtual machine model
  – Task manager
  – Prototyping tools
• Prototype Systems
• Conclusions
3
Introduction
• An anthropomorphic spoken dialog agent (ASDA) is one of the next-generation human-computer interfaces
• Many ASDA systems have been developed, but developing a high-quality ASDA system is still challenging
  – Such systems call for an unlimited number of life-like agent characters with different faces and voices, just like humans
• For this reason, Galatea has been developed to provide a platform for building next-generation ASDA systems
4
Features of the Toolkit
• Easy customization
  – Model-based approaches: once the model parameters are trained, facial expressions and voice quality can be controlled easily
• Key techniques for natural spoken dialog
  – Incremental speech recognition, synchronization between speech and facial animation, etc.
• Modularity of functional units
  – Simple architecture to manage each functional unit
  – Users can develop, improve, debug, etc.
• Open-source free software
5
Toolkit Design and Outline
[Figure: system architecture, with the following annotations]
• Agent manager: works as an inter-module communication manager
• Devices: directly managed by the modules which utilize the devices
• New functions: added by writing a new module for the function and connecting the module to the agent manager
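The hub-style architecture described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not Galatea's code: the class `AgentManager`, its methods `register` and `send`, and the message text are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from the Galatea toolkit.
Handler = Callable[[str, str], None]  # (sender, message) -> None


class AgentManager:
    """Relays messages between registered modules (hub-and-spoke)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Adding a new function means registering one more module here.
        self._modules[name] = handler

    def send(self, sender: str, target: str, message: str) -> None:
        if target not in self._modules:
            raise KeyError(f"unknown module: {target}")
        self._modules[target](sender, message)


received: List[str] = []
am = AgentManager()
am.register("SSM", lambda sender, msg: received.append(f"SSM<-{sender}: {msg}"))
am.register("FSM", lambda sender, msg: received.append(f"FSM<-{sender}: {msg}"))
am.register("SRM", lambda sender, msg: None)

# Modules never call each other directly; every message goes through the hub.
am.send("SRM", "SSM", "hello")
```

Because each module only knows the hub, a module can be replaced or added without touching the others, which is the modularity the slide points to.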
6
Speech Recognition Module (SRM)
• Major interfaces of SRM are as follows:
  – Outputs: recognition result (XML format), engine status (“busy”, “waiting”, ...)
  – Control commands: reload grammar, change the settings of the speech recognition engine
  – Grammar representation: transforms the XML grammar into a format that is accepted by the speech recognition engine
[Figure: SRM block diagram. Components: Command Interpreter, Grammar Transformer, Speech Recognition Engine. Labels: Speech input, Grammar, Request, Response]
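Since the SRM reports recognition results in XML, a client of the module would need to parse them. The sketch below shows one way this might look. The slide does not give the actual schema, so the element and attribute names (`result`, `word`, `score`) are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical recognition result; the real SRM schema is not shown on the slide.
SAMPLE = """<result>
  <word score="0.92">hello</word>
  <word score="0.81">galatea</word>
</result>"""


def parse_result(xml_text: str) -> List[Tuple[str, float]]:
    """Return (word, score) pairs from one recognition result document."""
    root = ET.fromstring(xml_text)
    return [(w.text or "", float(w.get("score", "0"))) for w in root.findall("word")]


words = parse_result(SAMPLE)
```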
7
Speech Synthesis Module (SSM)
• Accepts arbitrary Japanese text
• Synthesizes speech with a human voice
  – An HMM-based speech synthesis method is employed
• Synchronizes the lip movement with speech
• SSM can interrupt speech output to cope with any interruption by the user
[Figure: SSM block diagram. Components: Command Interpreter, Text Analyzer, Dictionary, Waveform Generation Engine, Acoustic Models. Output: Speech Output]
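The interruption behavior listed above can be illustrated with a chunked playback loop that checks a stop flag between chunks. This is a minimal sketch, not the SSM implementation: the class `InterruptibleSpeaker`, its methods, and the chunk strings are hypothetical, and a list stands in for the audio device.

```python
import threading
from typing import Callable, List, Optional


class InterruptibleSpeaker:
    """Plays audio chunk by chunk and stops once interrupt() has been called."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.played: List[str] = []

    def interrupt(self) -> None:
        self._stop.set()

    def speak(self, chunks: List[str],
              on_chunk: Optional[Callable[[str], None]] = None) -> bool:
        """Return True if every chunk was played, False if interrupted."""
        self._stop.clear()
        for chunk in chunks:
            if self._stop.is_set():
                return False
            self.played.append(chunk)  # stand-in for writing to an audio device
            if on_chunk is not None:
                on_chunk(chunk)
        return True


speaker = InterruptibleSpeaker()
# Simulate the user barging in right after the second chunk.
finished = speaker.speak(
    ["ko", "n", "ni", "chi", "wa"],
    on_chunk=lambda c: speaker.interrupt() if c == "n" else None,
)
```

Using `threading.Event` means `interrupt()` could equally be called from another thread, for example one that listens for the user's speech.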
8
Facial Image Synthesis Module (FSM)
• Supports high-quality facial image synthesis, animation control, and precise lip-sync with voice
• A GUI is provided for fitting a generic face wireframe model onto a full-face snapshot image
• Facial action control
  – Mouth shape
  – Facial expression
9
Agent Manager (AM)
• Integrator of all the modules of the ASDA system
• Plays a central role in communication
• Acts as the synchronization manager between SSM and FSM to achieve precise lip-sync
[Figure: Agent Manager components: Dispatcher, Macro-command interpreter]
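One plausible way a macro-command interpreter could support lip-sync is to expand a single "speak" request into timed mouth-shape commands for the FSM plus a joint start for both modules. The sketch below assumes that phoneme durations are available before playback. The function `expand_speak_macro` and the command strings are hypothetical and do not reflect Galatea's actual command set.

```python
from typing import List, Tuple

Phoneme = Tuple[str, int]  # (phoneme label, duration in milliseconds)


def expand_speak_macro(phonemes: List[Phoneme]) -> List[Tuple[str, str]]:
    """Expand one 'speak' macro into (module, command) pairs on a shared timeline."""
    commands: List[Tuple[str, str]] = []
    start_ms = 0
    for label, duration_ms in phonemes:
        # The face gets a mouth-shape command at the moment the phoneme starts.
        commands.append(("FSM", f"mouth {label} at {start_ms} for {duration_ms}"))
        start_ms += duration_ms
    # Both modules are then told to start together so audio and face stay aligned.
    commands.append(("SSM", "start"))
    commands.append(("FSM", "start"))
    return commands


plan = expand_speak_macro([("k", 60), ("o", 120), ("N", 90)])
```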
10
Virtual Machine Model
• Each module interface is modeled as a machine with slots
  – Each slot indicates a machine status
• Slot values are changed through a common command set
  – “set Speak = now” means starting voice synthesis of a given text immediately
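The slot model can be sketched as a small class that stores slot values and triggers an action when a slot is set. Only the command "set Speak = now" comes from the slide. The class `SlotMachine`, the `Text` slot, and the parsing rule are assumptions made for this example.

```python
import re
from typing import Callable, Dict, List

# Assumed syntax: "set <Slot> = <value>", generalized from the slide's example.
SET_COMMAND = re.compile(r"^set\s+(\w+)\s*=\s*(.+)$")


class SlotMachine:
    """A module modeled as named slots that are changed by 'set' commands."""

    def __init__(self) -> None:
        self.slots: Dict[str, str] = {}
        self._actions: Dict[str, Callable[[str], None]] = {}

    def on_set(self, slot: str, action: Callable[[str], None]) -> None:
        self._actions[slot] = action

    def execute(self, command: str) -> None:
        match = SET_COMMAND.match(command.strip())
        if match is None:
            raise ValueError(f"unsupported command: {command!r}")
        slot, value = match.group(1), match.group(2).strip()
        self.slots[slot] = value
        if slot in self._actions:
            self._actions[slot](value)


log: List[str] = []
ssm = SlotMachine()
ssm.on_set("Speak", lambda v: log.append(f"synthesis started ({v})"))
ssm.execute("set Text = hello")  # hypothetical slot name
ssm.execute("set Speak = now")   # command quoted on the slide
```

With one command syntax shared by all modules, the agent manager can forward commands without knowing what each slot does.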
11
Task Manager (TM)
• Defines the dialog as a set of interactions which can be represented by a dialog description language
• The goal in developing the TM is that the system can use several types of dialog description languages
  – VoiceXML: high-level language; task-oriented information and the intentions of the participants
  – PDOC (primitive dialog operation commands): low-level language; device events and sequence control
12
Prototyping Tools
• “Galatea Interaction Builder (IB)”
[Figure: Interaction Builder workflow. Elements: Application Developer, Interaction Builder, Galatea MMI System, XISL File, web site. Steps: Design Scenario, Create XISL Document, Download and Execute XISL, Check]
13
Prototype Systems
14
Echo-back task
15
Conclusions
• A human-like spoken dialog agent is one of the promising man-machine interfaces for the next generation
• Galatea is a software toolkit for developing human-like spoken dialog agents
• Because of its high modularity and simple communication architecture, it will speed up research and application development based on ASDA