Towards multimodal meaning representation
Harry Bunt & Laurent Romary
LREC Workshop on standards for language resources
Las Palmas, May 2002

Scope
• What should we consider as meaning?
  – How the processing of the input should lead to some update of the information state of the system (domain model, discourse model, user model, etc.);
  – This comprises both propositional content and communicative function.
• Such a representation:
  – Should support both interpretation and generation;
  – Should support any kind of multimodal input and output;
  – Should support the variety of semantic theories.

Objectives
– Provide interface formats within a multimodal (MM) dialogue architecture
  • Incremental construction (reference interpretation etc.) up to a final representation (e.g. fixed frame à la MUC) or system action/feedback;
  • Should also be a basis for the definition of annotation schemes of MM semantic content.
– Specification and comparison of application-specific representations
  • Towards a framework allowing one to compare existing representations (e.g. M3L) or define a new one, while ensuring some level of interoperability between these.

What it is not
• A domain model representation, or an ontology
  – There is OIL, DAML, Topic Maps, UNL etc.
• A representation of lower-level linguistic or gestural information (e.g. syntax, etc.)
  – Some features may be percolated, though, or pointed to…
• A representation of the underlying processes
  – Focus on the output of what is done by a given module

Basic constraints
• Expressive and semantic adequacy
  – Coverage of phenomena and inferencing capacities
• Uniformity (representation for various types of inputs and outputs)
• Incrementality (usable at various stages)
  – Before/after fusion, semantic/pragmatic aspects
• Underspecification and partiality
• Openness and Extensibility
  – Compatibility with various theories and approaches
  – Method for designing schemas (XML or others), rather than one specific schema

Methodology
• Basic components
  – Represent the general organization of any semantic structure
  – Parameterized by
    • data categories taken from a common registry
    • application specific data categories
• General mechanisms
  – To make the thing work
• General categories
  – Descriptive categories available to all formats

Basic components (1)
• Temporal structures (“events”)
  – Dialogue turns/utterance
  – Gestures
  – Actions on/in the task
• Referential structures (“participants”)
  – Individuals and objects participating in an event
    • Comprises spatial structures
  – Propositional content

Basic components (2)
• Restrictions (on temporal and referential structures)
  – E.g. Gesture types, Linguistic modifiers, Dialogue acts, etc.
• Dependency structures (linking events and referential structures)
  – E.g. Participant roles (cf. AGENT-SOURCE-GOAL), Discourse/rhetorical structure, temporal relations
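
The four kinds of basic component (events, participants, restrictions, dependency structures) can be pictured as a small data model. The following is only a minimal Python sketch, not something the slides specify: the class names `Event`, `Participant`, `Relation` and `SemRep`, the `restrictions` dictionary and the `roles_of` helper are all assumptions chosen to mirror the bullets. The sample values (`wanttogo`, `sing`, `agent`) are borrowed from the example slide later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Temporal structure: an utterance, a gesture or an action in the task."""
    id: str
    # Restrictions are modelled here as free category/value pairs (assumption).
    restrictions: dict[str, str] = field(default_factory=dict)


@dataclass
class Participant:
    """Referential structure: an individual or object taking part in an event."""
    id: str
    restrictions: dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """Dependency structure linking a source component to a target component."""
    source: str
    target: str
    role: str


@dataclass
class SemRep:
    """Container for one semantic representation (hypothetical name)."""
    events: dict[str, Event] = field(default_factory=dict)
    participants: dict[str, Participant] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def roles_of(self, event_id: str) -> dict[str, str]:
        """Map each role attached to an event to the id of the component filling it."""
        return {r.role: r.source for r in self.relations if r.target == event_id}


rep = SemRep()
rep.events["e1"] = Event("e1", {"tense": "present", "evtType": "wanttogo"})
rep.participants["x"] = Participant("x", {"num": "sing"})
rep.relations.append(Relation(source="x", target="e1", role="agent"))
print(rep.roles_of("e1"))
```

Keeping restrictions as open key/value pairs is one way to reflect the idea that the structure is fixed while the categories come from a registry or from the application.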

General mechanisms
• Links
  – Internal links
  – To lower levels (syntactic structures, prosodic cues, gestural trajectories, etc.)
  – To domain model (types and instances)
• Alternatives (cf. ambiguities)
  – E.g. disjunction of internal links

General categories
• Architectural
  – Producer (consumer?) of the information, confidence, devices
• Environmental
  – Time stamps, spatial information (speaker’s position, graphical configurations, gestural trajectories etc.)
• Interactional
  – Speaker (user state?), other addressees etc.
Combining basic components and data categories
Just to illustrate things…

Example
<semRep id="rep1">
  <event id="e0">
    <cat>utterance</cat>
    <speaker target="Peter"/>
    <adressee target="System"/>
  </event>
  <event id="e1">
    <tense>present</tense>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <num>sing</num>
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
</semRep>
In black: basic components and mechanisms (meta-model of semantic representation)
In blue: parameter component chosen from reference registries
  • Categories
  • Values
Pointer to speaker’s characteristics
Peter: I want to go …

<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <speaker target="Peter"/>
    <adressee target="System"/>
    <alt>
      <dialAct cert="0.8">Order</dialAct>
      <dialAct cert="0.3">Inform</dialAct>
    </alt>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <lex>I</lex>
    <synCat>Pronoun</synCat>
    <num>sing</num>
    <pers>first</pers>
    …
  </participant>
  <participant id="y">
    <lex>Paris</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <participant id="z">
    <lex>Nancy</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
  <relation source="y" target="e1">
    <role>source</role>
  </relation>
  <relation source="z" target="e1">
    <role>goal</role>
  </relation>
</semRep>
I want to go from Paris to Nancy
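
One way a consuming module might read such a representation is sketched below in Python, using the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not part of the proposal: the helper names `roles_for` and `best_dialogue_act` are hypothetical, the embedded XML is a trimmed copy of the example with the elided parts ("…") and most restrictions left out, and taking the alternative with the highest `cert` value is just one possible disambiguation policy.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example; elided content ("…") is left out.
SEM_REP = """
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <speaker target="Peter"/>
    <adressee target="System"/>
    <alt>
      <dialAct cert="0.8">Order</dialAct>
      <dialAct cert="0.3">Inform</dialAct>
    </alt>
  </event>
  <event id="e1">
    <evtType>wanttogo</evtType>
  </event>
  <participant id="x"><lex>I</lex></participant>
  <participant id="y"><lex>Paris</lex></participant>
  <participant id="z"><lex>Nancy</lex></participant>
  <relation source="x" target="e1"><role>agent</role></relation>
  <relation source="y" target="e1"><role>source</role></relation>
  <relation source="z" target="e1"><role>goal</role></relation>
</semRep>
"""


def roles_for(root, event_id):
    """Return {role: lexical form} for the participants linked to an event."""
    lex = {p.get("id"): p.findtext("lex") for p in root.iter("participant")}
    return {
        rel.findtext("role"): lex[rel.get("source")]
        for rel in root.iter("relation")
        if rel.get("target") == event_id
    }


def best_dialogue_act(root, event_id):
    """Pick the <dialAct> alternative with the highest cert value (assumed policy)."""
    event = root.find(f"event[@id='{event_id}']")
    acts = event.findall("alt/dialAct")
    return max(acts, key=lambda act: float(act.get("cert"))).text


root = ET.fromstring(SEM_REP)
print(roles_for(root, "e1"))
print(best_dialogue_act(root, "e0"))
```

The relations are resolved purely through the `source` and `target` identifiers, which is what lets the same traversal work whatever data categories an application plugs in.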

<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <agent target="Peter"/>
    <adressee target="System"/>
    <dialAct>Order</dialAct>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <event id="e2">
    <evtCat>gestural</evtCat>
    <agent target="Peter"/>
    <when>2002-02-2:02.02.02</when>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="x">
    <lex>I</lex>
    …
  </participant>
  <participant id="y">
    <lex>here</lex>
    <synCat>adverb</synCat>
  </participant>
  <participant id="z">
    <lex>there</lex>
    <synCat>adverb</synCat>
  </participant>
  <relation source="y" target="e2">
    <MMLink>co-designation</MMLink>
  </relation>
</semRep>
I want to go from here to there
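
How a fusion module might follow the multimodal link from a deictic word to the gesture that accompanies it can be sketched as follows. This is a hedged Python illustration, not a method given on the slides: the function name `co_designations` is hypothetical, the embedded XML keeps only the gesture event, the two adverbs and the one `MMLink` relation shown in the example, and `graphContext` is written as a self-closing element so that the fragment parses. Since the example links only "here" to a gesture, "there" stays unresolved.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the slide's example: gesture event, deictic adverbs, one MMLink.
SEM_REP = """
<semRep id="rep1">
  <event id="e2">
    <evtCat>gestural</evtCat>
    <agent target="Peter"/>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="y"><lex>here</lex><synCat>adverb</synCat></participant>
  <participant id="z"><lex>there</lex><synCat>adverb</synCat></participant>
  <relation source="y" target="e2"><MMLink>co-designation</MMLink></relation>
</semRep>
"""


def co_designations(root):
    """Map each co-designating word to the graphical context of its linked gesture."""
    events = {e.get("id"): e for e in root.iter("event")}
    lex = {p.get("id"): p.findtext("lex") for p in root.iter("participant")}
    resolved = {}
    for rel in root.iter("relation"):
        # Skip relations that are not multimodal co-designation links.
        if rel.findtext("MMLink") != "co-designation":
            continue
        gesture = events[rel.get("target")]
        context = gesture.find("graphContext")
        resolved[lex[rel.get("source")]] = context.get("target")
    return resolved


root = ET.fromstring(SEM_REP)
print(co_designations(root))
```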

Future work
• SIGSEM Working group on meaning representations (ACL)
  – Liaison with ISO TC37/SC4 - linguistic resources
    • Preparation of a working draft
  – Liaison with ISLE
  – Liaison with SIGMedia and SIGDial
  – W3C/VoiceXML