Date posted: 21-Dec-2015
Spatial Language and Grounded Ontologies
Amitabha Mukerjee, IIT Kanpur, INDIA
Work done with G. Satish, Mausoom Sarkar, Prithwijit Guha, Achla Raina
Spatial Knowledge
Conceptual knowledge (naïve ontology)
• Categories (concepts)
• Relations / Interactions / Commonsense Rules
• Actions
Perceptual knowledge
Linguistic Knowledge
• Semantics = conceptual space?
• Concept space = ontology?
• Ontology + subjective structuring
Actions: key to semantics
• determine argument structure
• determine role of other entities
• crucial to interaction with contextual elements
• modeled as frames in logical semantics
• structures for common actions - universal?
• organized as hierarchies → Ontology
- Need for grounding
Heider/Simmel video
[Singh et al CRV 2006]
Match object under gaze focus with words in narrative
Narrative: “the little square hit the big square”
[Heider and Simmel 1944]
Video recreated by Bridgette Hard at Barbara Tversky lab, Stanford U
Narratives: “Chase” Video
Video and commentaries from Tversky Group, Stanford University
Wide variation in Narratives :
1. Large square corners the little circle
2. Big square approaches little circle
3. Little square is moving away from the big square; and objects inside are moving closer together
4. Big block tries to go after little circle
Constructivist model : Tasks
[Singh et al CRV 2006]
Match object under gaze focus with words in narrative
Demonstrations
• Object / relation concepts: unsupervised
• Action concepts: unsupervised clusters
• Linguistic unit
• Argument structure / Syntax
• Production
[Diagram labels: Language, World, Pattern Clusters, Association, Attentive Clustering]
Object Label Associations
Maps for trajectories ending inside
Based on intervals where the attended agent is ending “in” the box.
Trajectories ending Outside
Learning Containment Spatial Descriptors
Four Clusters emerge: (instantaneous Feature Space results)
Two-agent action ontology
Two-agent actions: MA, CH, CC
Emergent Clusters
Human Labels (CC, MA, Chase) Ground Truth Label Vs Cluster assigned
CC: Come-Closer (C1), MA: Move Away (C2), C3 & C4 : Chase
Chase sub-categories:
Chase_RO-chases-LO: C3
Chase_LO-chases-RO: C4
Number of Clusters from MNG = 4 when Edge Aging = 30 (0.9 prob)
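The slides do not list the features that were clustered, so the following is only a sketch of one plausible instantaneous feature vector for a pair of agents. The function name `pair_features` and the three features (change in distance, and each agent's velocity component toward the other) are assumptions made for illustration, not the features used in the study. The unsupervised clustering step itself (MNG in the slides) is not shown.

```python
import math

def pair_features(ro_prev, ro_now, lo_prev, lo_now):
    """Instantaneous features for one time step of a two-agent scene.

    Arguments are (x, y) positions of the reference object (RO) and the
    located object (LO) in two consecutive frames.
    Returns (distance_rate, ro_approach, lo_approach).
    """
    # Vector from RO to LO in the earlier frame, and its length
    dx, dy = lo_prev[0] - ro_prev[0], lo_prev[1] - ro_prev[1]
    dist_prev = math.hypot(dx, dy)
    if dist_prev == 0:
        raise ValueError("agents coincide; direction RO->LO is undefined")
    dist_now = math.hypot(lo_now[0] - ro_now[0], lo_now[1] - ro_now[1])
    ux, uy = dx / dist_prev, dy / dist_prev  # unit vector RO -> LO

    # Velocities as simple frame differences
    ro_vx, ro_vy = ro_now[0] - ro_prev[0], ro_now[1] - ro_prev[1]
    lo_vx, lo_vy = lo_now[0] - lo_prev[0], lo_now[1] - lo_prev[1]

    ro_approach = ro_vx * ux + ro_vy * uy      # > 0: RO moves toward LO
    lo_approach = -(lo_vx * ux + lo_vy * uy)   # > 0: LO moves toward RO
    return dist_now - dist_prev, ro_approach, lo_approach

# Hypothetical one-step examples
chase = pair_features((0, 0), (1, 0), (5, 0), (6, 0))   # RO follows a fleeing LO
closer = pair_features((0, 0), (1, 0), (5, 0), (5, 0))  # RO approaches a static LO
```

Under these assumed features, a chase step keeps the distance roughly constant while one agent approaches and the other retreats, whereas a come-closer step has a negative distance change. The sign of the two approach terms would also separate the two chase directions (C3 vs C4).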
Hierarchy in Concept Space
More clusters reveal a category hierarchy:
Number of Clusters from MNG = 8 when Edge Aging = 16
Come-Closer
Come-Closer-RO-static, Come-Closer-both-moving, Come-Closer-LO-static
C1, C5, C6: sub-classes of Come-Closer; C2, C7, C8: sub-classes of Move-Away
Similarly: Move-Away : 3 subclasses
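The cluster-to-category hierarchy stated on these slides can be written down as a small lookup table. This is a minimal sketch: the names `CLUSTER_PARENT`, `coarse_label`, and `members` are hypothetical, and keeping C3 and C4 under Chase in the 8-cluster run is an assumption carried over from the 4-cluster result.

```python
# Parent category of each emergent cluster, as listed on the slides.
# C3/C4 -> Chase is taken from the 4-cluster run (assumed unchanged here).
CLUSTER_PARENT = {
    "C1": "Come-Closer", "C5": "Come-Closer", "C6": "Come-Closer",
    "C2": "Move-Away", "C7": "Move-Away", "C8": "Move-Away",
    "C3": "Chase", "C4": "Chase",
}

def coarse_label(cluster_id):
    """Map a fine-grained cluster to its parent action category."""
    return CLUSTER_PARENT[cluster_id]

def members(category):
    """List the clusters that fall under one action category."""
    return sorted(c for c, parent in CLUSTER_PARENT.items() if parent == category)

print(coarse_label("C7"))
print(members("Come-Closer"))
```

The slides do not say which of C1, C5, C6 corresponds to which sub-class (RO-static, both-moving, LO-static), so the table stops at the parent level.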
Two-agent action ontology
Two-agent actions
CC: CC-RO-static, CC-both-moving, CC-LO-static
MA: MA-RO-static, MA-both-moving, MA-LO-static
CH: CH (LO, RO), CH (RO, LO)
Multi-word fragments
Learning Word Order: Chase
Chase Cluster (C3)
Pattern: ref. object CHASE located object
Prob(“RO chase+tense LO” / C3) = 0.90
Big Square (RO) = Chaser
Small Square (LO) = Chased
Learning Word Order: Chase
Chase Cluster (C4)
Pattern: located object CHASE ref. object
Prob(“LO chase+tense RO” / C4) = 0.84
Big Square (LO) = Chaser
Circle (RO) = Chased
Learning Word Order: Chase
Chase Verb SVO Syntax
Other word patterns involving Chase for C3 and C4 (about 10%):
“RO & LO chase each other”
“RO is chased by LO”
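A quantity such as Prob(“RO chase+tense LO” / C3) can be read as a relative frequency over the narratives attached to one cluster. The sketch below illustrates that reading only: the function name `pattern_probabilities`, the article list, and the toy narratives are hypothetical, and the slides do not describe how the patterns were actually extracted.

```python
from collections import Counter

ARTICLES = {"the", "a", "an"}

def pattern_probabilities(narratives, ro_name, lo_name, verb_forms):
    """Relative frequency of each word-order pattern within one cluster.

    ro_name / lo_name are lowercase object labels; verb_forms is the set of
    inflected forms to collapse into the token "chase+tense".
    """
    counts = Counter()
    for text in narratives:
        # Replace object labels by their roles before tokenising
        replaced = text.lower().replace(ro_name, "RO").replace(lo_name, "LO")
        tokens = []
        for word in replaced.split():
            word = word.strip(".,;!?")
            if not word or word in ARTICLES:
                continue
            tokens.append("chase+tense" if word in verb_forms else word)
        counts[" ".join(tokens)] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

# Toy narratives, invented for illustration (not data from the study)
toy = [
    "The big square chases the small square.",
    "Big square chased the small square",
    "The big square chases the small square",
    "The small square is chased by the big square",
]
probs = pattern_probabilities(
    toy, "big square", "small square", {"chase", "chases", "chased", "chasing"}
)
```

Keeping function words such as "is" and "by" in the pattern is a design choice in this sketch: it lets the passive form stay distinct from the SVO form instead of being counted with it.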
Convergence -> Universality
Thank You
Objects as coherent wholes
“whole objects” = coherently moving image blobs [Spelke]
identify objects being attended to
correlate with words in input stream
keep higher correlations
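The steps above (find the attended object, correlate it with the words heard, keep the strongest correlations) can be sketched as a co-occurrence count. The scoring rule here, P(word | object) × P(object | word), is an assumption chosen so that words heard with every object (such as "the") score low; the slides do not specify the correlation measure. The function name `associate_labels` and the toy episodes are hypothetical.

```python
from collections import Counter, defaultdict

def associate_labels(episodes, top_k=1):
    """episodes: list of (attended_object_id, narrative_text) pairs.

    Returns, for each object, the top_k words ranked by an assumed
    association score P(word | object) * P(object | word).
    """
    cooc = defaultdict(Counter)  # object -> word -> number of shared episodes
    word_total = Counter()       # word -> number of episodes containing it
    obj_total = Counter()        # object -> number of episodes attending to it
    for obj, text in episodes:
        obj_total[obj] += 1
        for word in set(text.lower().split()):
            cooc[obj][word] += 1
            word_total[word] += 1

    labels = {}
    for obj, counts in cooc.items():
        scores = {
            w: (c / obj_total[obj]) * (c / word_total[w])
            for w, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[obj] = ranked[:top_k]
    return labels

# Toy episodes, invented for illustration
episodes = [
    ("sq_big", "the big square moves"),
    ("sq_big", "the big square stops"),
    ("circle", "the circle moves"),
    ("circle", "the little circle runs"),
]
labels = associate_labels(episodes, top_k=2)
```

In this toy input, "the" appears in every episode and so is outranked by "big", "square", and "circle", which co-occur with a single attended object.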
Constructivist model : Tasks
[Singh et al CRV 2006]
A. Object / relation concepts: Learn sensorimotor patterns from perceptual data (with bottom-up attention; unsupervised)
B. Action Concepts: Learn temporal patterns from perceptual data (unsupervised temporal learning)
C. Linguistic unit: Learn associations from sensorimotor pattern clusters to language elements by
1. Listening to descriptions of the scene by native speakers (input: word-separated text, not speech)
2. Associating language units (without parsing, without morphological simplification) based on maximum co-occurrence mediated by attention
D. Argument structure / Syntax: Learn argument structure and word order
E. Production: Use same model to produce descriptions of visual scene
Classification: no visual attention
Classification: with visual attention
Attentive focus limits relations to those that are conceptually salient
Object Relation Event Acquisition
9-30 min after birth
- Face vs scrambled face in moving images [Goren et al 75, Kellman/Arterberry 98]
2.5 months
- dynamic ACTION TRACKING [Aguiar/Baillargeon 98, Spelke 95]
- Expects objects to remain occluded behind occluders [Aguiar/Baillargeon 99]
- Containment: needs open top [Hespos/Baillargeon 02]
3-5 months
- OBJECT VISUAL CATEGORIES: asymmetry, CAT class excludes dogs but DOG class includes cats [Mareschal/French/Quinn:00]
5 months
- Earliest evidence of SHAPE discrimination [McMurray-Aslin:2004]
- tight vs loose CONTAINMENT [Hespos/Spelke:2004]
6 months
- dynamic tracking through OCCLUSIONS, motion prediction [Gredebaeck/vonHofsten:2004]
- learns background before figure [Coldren & Haaf 1999]
- EVENT categorization for occlusion events [Baillargeon/Wang:02]
12 months
- EVENT STRUCTURE for actions (e.g. DRINK) [Mareschal/French/Quinn:00]
Object / Event Category Acquisition
3-5 months
- OBJECT VISUAL CATEGORIES: asymmetry, CAT class excludes dogs but DOG class includes cats [Mareschal/French/Quinn:00]
5 months
- SHAPE categories
6 months
- EVENT STRUCTURES for occlusion events [Baillargeon/Wang:02]
- PROPERTY CATEGORIES: Hard/Soft: squeeze soft objects; bang hard ones
6-12 months
- PHONOLOGICAL CATEGORIES (phonemes): based on frequencies of usage
12 months
- EVENT STRUCTURE for actions (e.g. DRINK) [Mareschal/French/Quinn:00]
- EMOTION CATEGORIES (Social Referencing): looks at caregiver; if the caregiver shows positive emotions, will react more positively, etc.
Spatial Relations Development
2.5 months
- CONTAINMENT needs open top [Hespos/Baillargeon 02]
3 months
- dot ABOVE / BELOW bar [Q]; not on novel shapes
5 months
- tight vs loose CONTAINMENT
6 months
- CONTAINMENT shape-independent [Q]
- CONTAINMENT from different views [C]
- BETWEEN shape-dependent [Q]
9 months
- shape-independent BETWEEN [Q]
Concepts in Infancy Debate
Piaget: pre-conceptual sensorimotor stage
- Perceptual abstractions are procedural
- Representations change, enabling concepts and pretend play, leading to language, around 18 months
- Concepts and language acquired simultaneously
Contra: broad evidence of a number of representations in infancy (Jones/Smith:1993 Cogn. Dev, Mandler:2004 Dev Science)
Perceptual Theory of Mind
Theory of Mind: awareness of others’ beliefs and intentions [Bloom 2000]
Perceptual Theory of Mind:
• Attentive focus of speaker is similar to that of listener
• Avoids looking at others constantly
• Track the gaze of others when failing to identify referent
• May be precursor to Theory of Mind
Computational Model: Requires Synthetic Model of Dynamic Visual Attention
Computational Study: Objectives
Questions:
• Is it possible to learn concepts without any linguistic cues?
• Can such concepts be mapped to linguistic tokens?
• Does it ease the computational load?
• Does it generalize to parameters different from the inputs?
• Can we learn perceptual models for actions?
• Can we learn relational roles for participants in the action?
The big square chased the circle
[Heider and Simmel 1944] Video recreated by Bridgette Hard at Barbara Tversky lab, Stanford U