CONFUCIUS: an Intelligent MultiMedia storytelling interpretation & presentation system
Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Informatics
University of Ulster, Magee
Objectives of CONFUCIUS
To interpret natural language story and movie (drama) script input and to extract conceptual semantics from the natural language
To generate 3D animation and virtual worlds automatically from natural language
To integrate 3D animation with speech and non-speech audio, to form an intelligent multimedia storytelling system for presenting multimodal stories
CONFUCIUS’ context diagram
Input from the storywriter/playwright: story in natural language; movie/drama script; tailored menu for script input
Output to the user/story listener: 3D animation; speech (dialogue); non-speech audio
Previous systems
Schank’s CD Theory (1972)
  Primitive & scripts
  SAM & PAM
Automatic Text-to-Graphics Systems
  WordsEye (Coyne & Sproat, 2001)
  ‘Micons’ and CD-based language animation (Narayanan et al. 1995)
  Spoken Image (Ó Nualláin & Smith, 1994) & its successor SONAS (Kelleher et al. 2000)
MultiModal interactive storytelling
  AesopWorld
  KidsRoom
  Larsen & Petersen’s Interactive Storytelling
  Oz
  Computer games
Virtual humans & embodied agents
  BEAT (Cassell et al., 2000)
  Jack (University of Pennsylvania)
  Improv (Perlin and Goldberg, 1996)
  SimHuman
  Gandalf
  PPP persona
Architecture of CONFUCIUS
[Figure: architecture diagram. Labels: Natural language stories; Script writer; Script parser; Natural Language Processing; Text To Speech; Sound effects; Animation generation; Synchronizing & fusion; 3D world with audio in VRML. Knowledge base labels: Language knowledge (lexicon, grammar, etc.); visual knowledge (3D graphic library); semantic representations; mapping; Prefabricated objects (knowledge base), built from 3D authoring tools, existing 3D models & character models.]
Semantic representations
Knowledge representations by category, with typical applications
(1) general knowledge representation & reasoning
  rule-based representation: expert systems
  FOPC (First Order Predicate Calculus): sentence representation, expert systems
  semantic networks: lexical semantics
  Schank’s scripts: story understanding
  frame-based representations
  XML-based representations: multimodal semantics
(2) physical knowledge representation & reasoning (inc. spatial/temporal reasoning)
  Conceptual Dependency (CD)
  event-logic truth conditions
  x-schema and f-structure
  Jackendoff’s Lexical-Conceptual Semantics (LCS)
  Decomposition: decompositional predicate-argument representation
  Typical applications: dynamic vision (movement) recognition & generation
MultiModal semantic representation
Multimodal semantics: language modality; visual modality; non-speech audio modality
High-level multimodal semantic representation: XML/frame-based (media-independent representation)
Intermediate level: visual media-dependent representation; audio media-dependent representation
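As a rough illustration of an XML/frame-based, media-independent frame with media-dependent children, the sketch below builds one event frame with Python's standard `xml.etree.ElementTree`. The element and attribute names (`event`, `agent`, `visual`, `motion`, `audio`, `effect`) and the helper `build_event_frame` are assumptions made for this example, not the schema used by CONFUCIUS.

```python
import xml.etree.ElementTree as ET


def build_event_frame(predicate, agent, audio_cue=None):
    """Build a hypothetical event frame: a media-independent core
    (predicate, agent) with media-dependent visual/audio children."""
    frame = ET.Element("event", {"predicate": predicate})
    ET.SubElement(frame, "agent").text = agent
    # Visual media-dependent part: the motion the animation layer should look up.
    visual = ET.SubElement(frame, "visual")
    ET.SubElement(visual, "motion").text = predicate
    # Audio media-dependent part is optional (non-speech audio modality).
    if audio_cue is not None:
        audio = ET.SubElement(frame, "audio")
        ET.SubElement(audio, "effect").text = audio_cue
    return frame


frame = build_event_frame("bounce", "ball", audio_cue="thud")
print(ET.tostring(frame, encoding="unicode"))
```

Keeping the predicate and agent at the top level and pushing modality-specific details into child elements is one way to let the visual and audio generators read only the part they need.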
Mental imagery & meaning processing
Cognition Re-cognition
Communication
Simulation: presentation via language or other modalities
Simulation: image recognition
Simulation: language understanding
Meanings, communicable ideas, thoughts, manifestable messages, proverbs, examples, parables, etc.
Physical world Virtual world
Mental world Mental world
Knowledge base of CONFUCIUS
knowledge base
  Language knowledge
    Semantic knowledge - lexicons (e.g. WordNet)
    Syntactic knowledge - grammars
    Statistical models of language
    Associations between words
  Visual knowledge
    Object model (nouns)
    Functional information
    Internal coordinate axes (for spatial reasoning)
    Associations between objects
    Event model (event verbs, describes the motion of objects)
  World knowledge
  Spatial & qualitative reasoning knowledge
Graphic library (instantiation)
  objects/props: simple geometry files
  characters: geometry & joint hierarchy files
  motions: animation library (key frames)
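One way to picture an object-model entry in the visual knowledge is as a small record holding a geometry file, functional information, internal coordinate axes and associations. The sketch below is only an assumption about how such a record could look; the class `ObjectModel`, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


def default_axes():
    # Hypothetical internal coordinate axes used for spatial reasoning.
    return {"front": (0, 0, 1), "up": (0, 1, 0)}


@dataclass
class ObjectModel:
    """Hypothetical object-model entry for one noun."""
    noun: str
    geometry_file: str                                 # simple geometry file for the prop
    functions: list = field(default_factory=list)      # functional information
    axes: dict = field(default_factory=default_axes)   # internal coordinate axes
    associations: list = field(default_factory=list)   # associated objects


chair = ObjectModel("chair", "chair.wrl", functions=["sit"], associations=["table"])
print(chair.axes["front"])
```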
Data Flow Diagram
[Figure: data flow diagram. Components: Script writer; Script parser; Natural language processor; Animation generator; TTS; Sound effect driver; Media coordination. Libraries: Primitives library; Music library. Data labels: story; script; dialogues; Non-speech audio; Visual semantics; Scene & Actor descriptions; VRML without sound nodes; Synthesized animation.]
Animation generator
Input: LCS representation
verb semantic analysis: use lexical relations (WordNet) to replace synonyms, scripts application, etc.
match basic motions in library?
  Y: motion instantiation
  N: motion decomposition
animation controller
environment placement
Output: VRML format of the virtual story world
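The decision in this flow (replace synonyms, try to match a basic motion in the library, otherwise decompose) can be sketched as a small lookup routine. The tables `SYNONYMS`, `MOTION_LIBRARY` and `DECOMPOSITIONS`, their contents and the function `plan_motions` are hypothetical stand-ins for the WordNet lookup and the animation library; the sketch also assumes decompositions never refer back to themselves.

```python
# Hypothetical data: none of these tables come from CONFUCIUS itself.
SYNONYMS = {"stroll": "walk", "leap": "jump"}
MOTION_LIBRARY = {"walk": "walk_cycle", "jump": "jump_cycle", "turn": "turn_cycle"}
DECOMPOSITIONS = {"pace": ["walk", "turn", "walk"]}


def plan_motions(verb):
    """Return the list of library motions needed to animate `verb`."""
    # Verb semantic analysis: replace a synonym by its base verb.
    base = SYNONYMS.get(verb, verb)
    # Match basic motions in the library? Y -> instantiate directly.
    if base in MOTION_LIBRARY:
        return [MOTION_LIBRARY[base]]
    # N -> decompose into simpler verbs and plan each part in turn.
    parts = DECOMPOSITIONS.get(base)
    if parts is None:
        raise KeyError(f"no basic motion or decomposition for {verb!r}")
    plan = []
    for part in parts:
        plan.extend(plan_motions(part))
    return plan


print(plan_motions("stroll"))
print(plan_motions("pace"))
```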
Categories of events
Atomic entities
  Change physical location such as position and orientation, e.g. “bounce”, “turn”
  Change intrinsic attributes such as shape, size, color, and texture, e.g. “bend”, and even visibility, e.g. “disappear”, “fade” (in/out)
Non-atomic entities
  Non-character events
    Two or more individual objects fuse together, e.g. “melt” (in)
    One object divides into two or more individual parts, e.g. “break” (into pieces)
    Change sub-components (their position, size, color), e.g. “blossom”
    Environment events (weather verbs), e.g. “snow”, “rain”
  Character events
    Action verbs
      Intransitive verbs
      Transitive verbs
    Non-action verbs (stative, emotion, possession, mental activities, cognition & perception)
    Idioms & metaphor verbs
Categories of action verbs
Intransitive verbs
  Biped kinematics, e.g. “walk”, “swim”, & other motion models like “fly”
  Face expressions, e.g. “laugh”, “anger”
  Lip movement, e.g. “speak”, “say”
Transitive verbs
  single object, e.g. “throw”, “push”, “kick”
  multiple objects
    direct and indirect objects, e.g. “give”, “pass”, “show”
    indirect object & the instrument, e.g. “cut”, “hammer”
involve speech modality
Visual definition & word sense
Mapping: verb, word sense, visual definition entry (diagram labels: polysemy, synonymy; one-to-many, many-to-many)
word sense -- minimal complete unit of meaning in the language modality
visual definition entry -- minimal complete unit of meaning in the visual modality
Example: “close” (a door)
  1. a normal door (rotation on y axis)
  2. a sliding door (moving on x axis)
  3. a rolling shutter door (a combination of rotation on x axis and moving on y axis)
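The “close” example suggests that one word sense can map to several visual definition entries, selected by the kind of object involved. A minimal sketch of such a lookup follows; the table `VISUAL_DEFINITIONS`, the operation names `rotate` and `translate`, and the function `visual_definition` are assumptions for illustration only.

```python
# Hypothetical table: (verb, kind of object) -> list of (operation, axis) steps.
VISUAL_DEFINITIONS = {
    ("close", "normal door"): [("rotate", "y")],
    ("close", "sliding door"): [("translate", "x")],
    ("close", "rolling shutter door"): [("rotate", "x"), ("translate", "y")],
}


def visual_definition(verb, object_kind):
    """Pick the visual definition entry for one word sense and object kind."""
    return VISUAL_DEFINITIONS[(verb, object_kind)]


for kind in ("normal door", "sliding door", "rolling shutter door"):
    print(kind, visual_definition("close", kind))
```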
Troponyms & verbs derived from adjectives/nouns
A troponym elaborates the manner of a base verb (Fellbaum 1998)
  examples: “trot”-“walk” (fast), “gulp”-“eat” (quickly)
  base verb + adverb
  present the base verb + modify the manner (speed, the agent’s state, duration of the activity, iteration, etc.)
Verbs derived from adjectives or nouns
  change objects’ properties (size, color, shape) or the world state
  verbs with affixes such as -en, -ify, or -ize, e.g. “lengthen”
  using predicates scale(), squash() or changing the corresponding property fields of the object in VRML
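Both ideas on this slide can be sketched in a few lines: a troponym is resolved to its base verb plus a manner modifier, and a derived verb such as “lengthen” becomes a change to a size property. The table `TROPONYMS`, the speed factors, and the Python functions `resolve_troponym` and `scale` are hypothetical; in particular this `scale` is only a stand-in for the scale() predicate named above, whose actual signature is not given in the source.

```python
# Hypothetical table: troponym -> (base verb, manner modifiers).
TROPONYMS = {
    "trot": ("walk", {"speed": 2.0}),
    "gulp": ("eat", {"speed": 2.0}),
}


def resolve_troponym(verb, base_cycle_interval=1.0):
    """Return (base verb, cycle interval); a faster manner shortens the cycle."""
    base, manner = TROPONYMS.get(verb, (verb, {"speed": 1.0}))
    return base, base_cycle_interval / manner["speed"]


def scale(size, factors):
    """Scale an (x, y, z) size component-wise, e.g. to render "lengthen"."""
    return tuple(s * f for s, f in zip(size, factors))


print(resolve_troponym("trot"))
print(scale((2.0, 1.0, 1.0), (1.5, 1.0, 1.0)))
```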
Representing active & passive voice
active and passive voice
converse verb pairs such as “give/take”, “buy/sell”, “lend/borrow”
same activity from different point of view
use of VRML Viewpoint node
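A converse pair can be treated as one underlying event seen from two participants. The sketch below normalises both verbs of a pair to a single event and records whose point of view to use, then emits a VRML `Viewpoint` node for it. The table `CONVERSE_OF`, the functions `normalise` and `viewpoint_node`, and the choice of “give”, “sell” and “lend” as canonical forms are assumptions for this example.

```python
# Hypothetical table: converse verb -> canonical verb whose subject is the source.
CONVERSE_OF = {"take": "give", "buy": "sell", "borrow": "lend"}


def normalise(verb, subject, other_party):
    """Map a verb and its participants to one canonical event plus a viewpoint."""
    if verb in CONVERSE_OF:
        # The subject of the converse verb is the recipient of the canonical event.
        return {"event": CONVERSE_OF[verb], "source": other_party,
                "recipient": subject, "viewpoint": subject}
    return {"event": verb, "source": subject,
            "recipient": other_party, "viewpoint": subject}


def viewpoint_node(owner, position):
    """Emit a VRML Viewpoint node placed at `position` and labelled with `owner`."""
    x, y, z = position
    return f'Viewpoint {{ position {x} {y} {z} description "{owner}" }}'


print(normalise("give", "John", "Mary"))
print(normalise("take", "Mary", "John"))
print(viewpoint_node("Mary", (0, 1.6, 5)))
```

Both calls describe the same transfer from John to Mary; only the `viewpoint` entry differs, which is the part a `Viewpoint` node would express.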
Implementation: semantics → VRML
Example: “A ball is bouncing”

bounce(ball):- [moveTo(ball, [0,0,0]), moveTo(ball,[0,20,0])]L.
(a) visual definition of “bounce”

DEF ball Transform {
  translation 0 0 0
  children [
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
}
(b) VRML code of a static ball

DEF ball Transform {
  translation 0 0 0
  children [
    DEF ball-TIMER TimeSensor {
      loop TRUE
      cycleInterval 0.5
    },
    DEF ball-POS-INTERP PositionInterpolator {
      key [0, 0.5, 1]
      keyValue [0 0 0, 0 20 0, 0 0 0]
    },
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
  ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
  ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
}
(c) Output VRML code of a bouncing ball
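The step from the two moveTo targets of the visual definition to the looping VRML in (c) can be sketched as a small generator. This is an illustrative reconstruction in Python rather than the system's own Java code; the function names `vec` and `bounce_vrml` and their parameters are assumptions.

```python
def vec(point):
    """Format a 3D point as a VRML SFVec3f value, e.g. (0, 20, 0) -> '0 20 0'."""
    return " ".join(str(c) for c in point)


def bounce_vrml(name, low, high, cycle_interval=0.5, radius=5):
    """Return VRML text for a sphere that loops low -> high -> low."""
    lines = [
        f"DEF {name} Transform {{",
        f"  translation {vec(low)}",
        "  children [",
        f"    DEF {name}-TIMER TimeSensor {{ loop TRUE cycleInterval {cycle_interval} }},",
        f"    DEF {name}-POS-INTERP PositionInterpolator {{",
        "      key [0, 0.5, 1]",
        f"      keyValue [{vec(low)}, {vec(high)}, {vec(low)}]",
        "    },",
        "    Shape {",
        "      appearance Appearance { material Material {} }",
        f"      geometry Sphere {{ radius {radius} }}",
        "    }",
        "  ]",
        # The timer drives the interpolator, which drives the translation.
        f"  ROUTE {name}-TIMER.fraction_changed TO {name}-POS-INTERP.set_fraction",
        f"  ROUTE {name}-POS-INTERP.value_changed TO {name}.set_translation",
        "}",
    ]
    return "\n".join(lines)


print(bounce_vrml("ball", (0, 0, 0), (0, 20, 0)))
```

The two targets become the first and middle key values, and the first target is repeated as the last key value so that the loop returns to its starting point.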
Categories of adjectives
Visually observable
  Objects’ attributes/states: dark/light, large/small, big/little, white/black (color adj.), long/short, new/old, high/low, full/empty, open/closed
  Observable human attributes
    Feelings: happy/sad, angry, excited, surprised, terrified
    Others: old/young, beautiful/ugly, strong/weak, poor/rich, fat/thin
  Relational adj.: nasal (nose), mural (wall), dental (teeth)
Visually unobservable
  Perceivable by other modalities: wet/dry, warm/cold, coarse/smooth, hard/soft, heavy/light
  Abstract attributes
    Reference-modifying adj.: possible/impossible, former, past/present, last, other, different/same
    Unobservable human attributes (virtue): good/evil, kind, mean, ambitious
    Others: easy/difficult, real, important, particular, right/wrong, early/late
Software Analysis
Java programming language
  parsing intermediate representation
  changing VRML code to create/modify animation
  integrating modules
Natural language processing tools
  GATE (pre-processing)
  PC-PARSE (morphologic and syntax analysis)
  WordNet (lexicon, semantic inference)
3D graphic modelling
  existing 3D models on the Internet
  3D Studio Max (props & stage)
  VRML (Virtual Reality Modelling Language) 97, H-anim 2001 spec.
The Actors – using embodied agents
  Microsoft Agent (the narrator and minor actors)
  Character Studio, Internet Character Animator (protagonists)
Natural Language Processing
[Figure: natural language processing diagram. Processing labels: Pre-processing; Part-of-speech tagger; Syntactic parser; morphological parser; Coreference resolution; Semantic inference; Temporal reasoning. Resource labels: PC-PARSER; WordNet 1.6; LEXICON & MORPHOLOGICAL RULES; FEATURES.]
Contribution & prospective applications
Contributions
  multimodal semantic representation of natural language
  automatic animation generation
  multimodal fusion and coordination
Prospective applications
  Children’s education
  Multimedia presentation
  Movie/drama production
  Script writing
  Computer games
  Virtual Reality
Conclusion
The objectives of CONFUCIUS address challenging problems in language visualisation:
  formalizing the meaning of action verbs and states
  mapping language primitives to visual primitives
  building a reusable ‘common sense’ knowledge base for other systems
  performing sophisticated spatial and temporal reasoning
  coordinating the media, since representing stories by temporal multimedia requires significant coordination