Spatial Language and Grounded Ontologies Amitabha Mukerjee IIT Kanpur, INDIA work done with G....

Spatial Language and Grounded Ontologies

Amitabha Mukerjee, IIT Kanpur, INDIA

work done with G. Satish, Mausoom Sarkar, Prithwijit Guha, Achla Raina

Spatial Knowledge Conceptual knowledge (naïve ontology)

Categories (concepts) Relations / Interactions / Commonsense Rules Actions

Perceptual knowledge

Linguistic Knowledge
• Semantics = conceptual space?
• Concept space = ontology?
• Ontology + subjective structuring

Actions : key to semantics
• determine argument structure
• determine role of other entities
• crucial to interaction with contextual elements
• modeled as frames in logical semantics

structures for common actions – universal?

organized as hierarchies → Ontology
- Need for grounding

Heider/Simmel video

[Singh et al CRV 2006]

Match object under gaze focus with words in narrative

Narrative: the little square hit the big square

[Heider and Simmel 1944]

video recreated by Bridgette Hard at Barbara Tversky lab, Stanford U

Narratives: “Chase” Video

Video and commentaries from Tversky Group, Stanford University

Wide variation in Narratives :

1. Large square corners the little circle

2. Big square approaches little circle

3. Little square is moving away from the big square; and objects inside are moving closer together

4. Big block tries to go after little circle

Constructivist model : Tasks


Match object under gaze focus with words in narrative

Demonstrations
Object / relation concepts : unsupervised
Action Concepts : unsupervised clusters
Linguistic unit
Argument structure / Syntax
Production

Language World Pattern Clusters

Association Attentive Clustering

Object Label Associations

Maps for trajectories ending inside

Based on intervals where the attended agent is ending “in” the box.

Trajectories ending Outside
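
The split into trajectories ending inside and outside the box could be computed along the following lines. This is only a minimal sketch: the axis-aligned box, the `(x, y)` point format, and the function names `ends_inside` and `split_by_containment` are assumptions made for illustration, and the hand-coded geometric test merely stands in for whatever the original system used to select these intervals.

```python
def ends_inside(trajectory, box):
    """Return True if the last point of the trajectory lies in the box.

    trajectory: list of (x, y) points (assumed format).
    box: (xmin, ymin, xmax, ymax), an axis-aligned box (assumed shape).
    """
    x, y = trajectory[-1]
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def split_by_containment(trajectories, box):
    """Partition trajectories into those ending inside and outside the box."""
    inside = [t for t in trajectories if ends_inside(t, box)]
    outside = [t for t in trajectories if not ends_inside(t, box)]
    return inside, outside
```

Each group can then be accumulated into its own map, one for "inside" endings and one for "outside" endings.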

Learning Containment Spatial Descriptors

Four Clusters emerge: (instantaneous Feature Space results)

Two agent action ontology

Two-agent actions: MA, CH, CC

Emergent Clusters

Human Labels (CC, MA, Chase) Ground Truth Label Vs Cluster assigned

CC: Come-Closer (C1), MA: Move Away (C2), C3 & C4 : Chase

Chase sub-categories:

Chase_RO-chases-LO: C3

Chase_LO-chases-RO: C4

Number of Clusters from MNG = 4 when Edge Aging = 30 (0.9 prob)
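
The slides do not list the instantaneous features that were clustered. As a rough illustration only, one plausible feature set for a pair of agents is the rate of change of their separation plus each agent's velocity along the line from RO to LO. The function name `pair_features` and the choice of features are assumptions, not the features used in the reported experiments.

```python
import math


def pair_features(ro_prev, lo_prev, ro_next, lo_next, dt=1.0):
    """Instantaneous features for two agents over one frame step.

    Each argument is an (x, y) position: the reference object (RO) and
    located object (LO) at the earlier and later frame. Hypothetical
    feature set, chosen for illustration.
    """
    gap_prev = math.dist(ro_prev, lo_prev)
    gap_next = math.dist(ro_next, lo_next)
    if gap_prev == 0:
        raise ValueError("agents coincide; direction RO->LO is undefined")

    # Unit vector pointing from RO to LO at the earlier frame.
    ux = (lo_prev[0] - ro_prev[0]) / gap_prev
    uy = (lo_prev[1] - ro_prev[1]) / gap_prev

    # Velocities of both agents over the step.
    ro_vx = (ro_next[0] - ro_prev[0]) / dt
    ro_vy = (ro_next[1] - ro_prev[1]) / dt
    lo_vx = (lo_next[0] - lo_prev[0]) / dt
    lo_vy = (lo_next[1] - lo_prev[1]) / dt

    distance_rate = (gap_next - gap_prev) / dt  # < 0: agents getting closer
    ro_along = ro_vx * ux + ro_vy * uy          # > 0: RO moves toward LO
    lo_along = lo_vx * ux + lo_vy * uy          # > 0: LO moves away from RO
    return distance_rate, ro_along, lo_along
```

Under these assumed features, a negative `distance_rate` with `lo_along` near zero would resemble Come-Closer with LO static, while both `ro_along` and `lo_along` positive would resemble RO chasing LO. An unsupervised clusterer would be run on such vectors rather than on hand-written rules.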

Hierarchy in Concept Space

More clusters reveal category hierarchy:

Number of Clusters from MNG = 8 when Edge Aging = 16

Come-Closer

Come-Closer-RO-static, Come-Closer-both-moving, Come-Closer-LO-static

C1, C5, C6 : sub-classes of Come-Closer; C2, C7, C8 : sub-classes of Move-Away

Similarly: Move-Away : 3 subclasses

Two agent action ontology

Two-agent actions
CC: CC-RO-static, CC-both-moving, CC-LO-static
MA: MA-RO-static, MA-both-moving, MA-LO-static
CH: CH (LO, RO), CH (RO, LO)
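
The two-agent action hierarchy named on these slides can be held as a small tree. The sketch below uses the category names from the slides; the nested-dict layout and the helper `path_to` are assumptions made for illustration. The slides do not say which of C5 to C8 corresponds to which sub-class, so cluster IDs are left out.

```python
# Category names are taken from the slides; the data layout is assumed.
ONTOLOGY = {
    "Two-agent actions": {
        "CC": {"CC-RO-static": {}, "CC-both-moving": {}, "CC-LO-static": {}},
        "MA": {"MA-RO-static": {}, "MA-both-moving": {}, "MA-LO-static": {}},
        "CH": {"CH (LO, RO)": {}, "CH (RO, LO)": {}},
    }
}


def path_to(concept, tree=ONTOLOGY, prefix=()):
    """Return the list of nodes from the root down to `concept`, or None."""
    for name, children in tree.items():
        here = prefix + (name,)
        if name == concept:
            return list(here)
        found = path_to(concept, children, here)
        if found:
            return found
    return None
```

Looking up a sub-class returns its chain of parents, which is the sense in which a finer clustering refines, rather than replaces, the coarser categories.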

Multi-word fragments

Learning Word Order: Chase

Chase Cluster (C3)
Pattern : ref. object CHASE located object
Prob(“RO chase+tense LO” | C3) = 0.90

Big Square (RO) = Chaser
Small Square (LO) = Chased

Chase Cluster (C4)
Pattern : located object CHASE ref. object
Prob(“LO chase+tense RO” | C4) = 0.84

Big Square (LO) = Chaser
Circle (RO) = Chased


Chase Verb SVO Syntax

Other Word Patterns involving Chase for C3 and C4 (about 10%)
“RO & LO chase each other”
“RO is chased by LO”
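
Conditional probabilities of the form Prob(pattern / cluster) can be estimated as relative frequencies of word-order patterns among the narrative fragments assigned to each cluster. The sketch below assumes the fragments have already been reduced to role patterns such as "RO chase LO"; the function name `pattern_probabilities` and the input format are hypothetical.

```python
from collections import Counter


def pattern_probabilities(observations):
    """Estimate P(pattern | cluster) by relative frequency.

    observations: iterable of (cluster, pattern) pairs, e.g.
    ("C3", "RO chase LO"). Returns {cluster: {pattern: probability}}.
    """
    counts = {}
    for cluster, pattern in observations:
        counts.setdefault(cluster, Counter())[pattern] += 1

    probabilities = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        probabilities[cluster] = {p: n / total for p, n in counter.items()}
    return probabilities
```

The dominant pattern in each cluster then gives the preferred word order for that action, and the remaining probability mass corresponds to the minority patterns listed above.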

Convergence -> Universality

Thank You

Objects as coherent wholes

“whole objects” = coherently moving image blobs [Spelke]

identify objects being attended to

correlate with words in input stream

keep higher correlations
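
The three steps above can be sketched as a co-occurrence count between the attended object and the words heard at the same time. This is an illustrative sketch, not the model's actual scoring: the score used here, P(object | word) × P(word | object), the threshold `min_score`, and the function name `associate_words` are all assumptions.

```python
from collections import Counter, defaultdict


def associate_words(episodes, min_score=0.75):
    """Rank candidate labels for each attended object.

    episodes: list of (attended_object, words) pairs, one per utterance.
    Returns {object: [words]} keeping only words whose association score
    is at least min_score, best first (ties broken alphabetically).
    """
    word_counts = Counter()            # utterances containing each word
    object_counts = Counter()          # utterances per attended object
    pair_counts = defaultdict(Counter) # co-occurrences of object and word

    for attended, words in episodes:
        object_counts[attended] += 1
        for word in set(words):
            word_counts[word] += 1
            pair_counts[attended][word] += 1

    labels = {}
    for obj, counts in pair_counts.items():
        scored = []
        for word, n in counts.items():
            # P(object | word) * P(word | object): frequent function words
            # such as "the" score low because they occur with every object.
            score = (n / word_counts[word]) * (n / object_counts[obj])
            if score >= min_score:
                scored.append((-score, word))
        labels[obj] = [word for _, word in sorted(scored)]
    return labels
```

With input such as `("circle", ["the", "circle", "runs"])`, words that occur with every object are filtered out by the threshold, which corresponds to the "keep higher correlations" step.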

Constructivist model : Tasks


A. Object / relation concepts: Learn sensorimotor patterns from perceptual data (with bottom-up attention; unsupervised)

B. Action Concepts: Learn temporal patterns from perceptual data (unsupervised temporal learning)

C. Linguistic unit: Learn associations from sensorimotor pattern clusters to language elements by

1. Listening to descriptions of scene by native speakers (input: word-separated text, not speech)

2. Associating language units (without parsing, without morphological simplification) based on maximum co-occurrence mediated by attention

D. Argument structure / Syntax: Learn argument structure and word order

E. Production: Use same model to produce descriptions of visual scene

Classification: no visual attention

Classification: with visual attention

Attentive focus limits relations to those that are conceptually salient

Object Relation Event Acquisition

9-30 min after birth

- Face vs scrambled face in moving images [Goren et al 75, Kellman/Arterberry 98]

2.5 months
- dynamic ACTION TRACKING [Aguiar/Baillargeon 98, Spelke 95]
- expects objects to remain occluded behind occluders [Aguiar/Baillargeon 99]
- Containment: needs open top [Hespos/Baillargeon 02]

3-5 months
- OBJECT VISUAL CATEGORIES : CAT class excludes dogs, but DOG class includes cats (asymmetry) [Mareschal/French/Quinn:00]

5 months
- Earliest evidence of SHAPE discrimination [McMurray-Aslin:2004]
- tight vs loose CONTAINMENT [Hespos/Spelke:2004]

6 months
- dynamic tracking through OCCLUSIONS, motion prediction [Gredebaeck/vonHofsten:2004]
- learns background before figure [Coldren&Haaf 1999]
- EVENT categorization for occlusion events [Baillargeon/Wang:02]

12 months
- EVENT STRUCTURE for actions (e.g. DRINK) [Mareschal/French/Quinn:00]

Object / Event Category Acquisition

3-5 months
- OBJECT VISUAL CATEGORIES : CAT class excludes dogs, but DOG class includes cats (asymmetry) [Mareschal/French/Quinn:00]

5 months SHAPE categories

6 months
- EVENT STRUCTURES for occlusion events [Baillargeon/Wang:02]
- PROPERTY CATEGORIES: Hard/Soft: squeeze soft objects; bang hard ones

6-12 months PHONOLOGICAL CATEGORIES (phonemes): based on frequencies of usage

12 months
- EVENT STRUCTURE for actions (e.g. DRINK) [Mareschal/French/Quinn:00]
- EMOTION CATEGORIES (Social Referencing): looks at caregiver; if the caregiver shows positive emotions, will react more positively, etc.

Spatial Relations Development

2.5 months CONTAINMENT needs open top [Hespos Baillargeon 02]

3 months dot ABOVE / BELOW bar [Q]; not on novel shapes

5 months tight vs loose CONTAINMENT

6 months
- CONTAINMENT shape-indep [Q]
- CONTAINMENT from diff views [C]
- BETWEEN shape-dependent [Q]

9 months shape-indep BETWEEN [Q]

Concepts in Infancy Debate

Piaget: pre-conceptual Sensorimotor stage
- Perceptual abstractions are procedural
- Representations change, enabling concepts and pretend play, leading to language around 18 months
- Concepts and language acquired simultaneously

Contra: Broad evidence of a number of representations in infancy (Jones/Smith:1993 Cogn. Dev, Mandler:2004 Dev Science)

Perceptual Theory of Mind

Theory of Mind: awareness of others’ beliefs and intentions [Bloom 2000]

Perceptual Theory of Mind:
- Attentive focus of speaker is similar to that of listener
- Avoids looking at others constantly
- Track the gaze of others when failing to identify referent
- May be precursor to Theory of Mind

Computational Model: Requires Synthetic Model of Dynamic Visual Attention

Computational Study: Objectives

Questions:

Is it possible to learn concepts without any linguistic cues? Can such concepts be mapped to linguistic tokens? Does it ease the computational load?

Does it generalize to parameters different from the inputs?

Can we learn perceptual models for actions? Can we learn relational roles for participants in the action?

The big square chased the circle

[Heider and Simmel 1944] video recreated by Bridgette Hard at Barbara Tversky lab, Stanford U
