and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure...

Perception and Perspective in Robotics
Paul Fitzpatrick, MIT CSAIL, USA


Page 1:

Perception and Perspective in Robotics

Paul Fitzpatrick

MIT CSAIL, USA

Page 2:

experimentation helps perception

Rachel: We have got to find out if [ugly naked guy]'s alive.

Monica: How are we going to do that? There's no way.

Joey: Well there is one way. His window's open – I say, we poke him.

(brandishes the Giant Poking Device)

Page 3:

robots can experiment

Robot: We have got to find out where this object’s boundary is.

Camera: How are we going to do that? There's no way.

Robot: Well there is one way. Looks reachable – I say, let’s poke it.

(brandishes the Giant Poking Limb)

Page 4:

the root of all vision

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

affordance exploitation (rolling)

manipulator detection (robot, human)

Page 5:

theoretical goal: a virtuous circle

familiar activities

familiar entities (objects, actors, properties, …)

use constraint of familiar activity to discover unfamiliar entity used within it

reveal the structure of unfamiliar activities by tracking familiar entities into and through them

Page 6:

practical goal: adaptive robots

Motivated by fallibility
§ Complex action and perception will fail
§ Need simpler fall-back methods that resolve ambiguity, learn from errors

Motivated by transience
§ Task for robot may change from day to day
§ Ambient conditions change
§ Best to build in adaptivity from very beginning

Motivated by infants
§ Perceptual development outpaces motor
§ Able to explore despite sloppy control

Page 7:

giant poking device: Cog

Head (7 DOFs)

Torso (3 DOFs)

Left arm (6 DOFs)

Right arm (6 DOFs)

Stand (0 DOFs)

Page 8:

giant poking device: Cog

Page 9:

talk overview

Learning from an activity
§ Poking: to learn to recognize objects, manipulators, etc.
§ Chatting: to learn the names of objects

Learning a new activity
§ Searching for an object
§ Then back to learning from the activity…

Page 10:

virtuous circle

poking

objects, …

Page 11:

virtuous circle

poking, chatting

objects, words, names, …

ball!

Page 12:

virtuous circle

poking, chatting, search

objects, words, names, …

search


Page 14:

talk overview

Learning from an activity
§ Poking: to learn to recognize objects, manipulators, etc.
§ Chatting: to learn the names of objects

Learning a new activity
§ Searching for an object
§ Then back to learning from the activity…


Page 16:

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

affordance exploitation (rolling)

manipulator detection (robot, human)

Page 17:

poking

object segmentation

Page 18:

“Active Segmentation”

segmenting objects through action

Page 19:

“Active Segmentation”

segmenting objects by coming into contact with them

Page 20:

a simple scene?

Edges of table and cube overlap

Color of cube and table are poorly separated

Cube has misleading surface pattern

Maybe some cruel grad-student faked the cube with paper, or glued it to the table

Page 21:

active segmentation

Page 22:

active segmentation

Page 23:

Sandini et al, 1993

Page 24:

where to poke?

Visual attention system
§ Robot selects a region to fixate based on salience (bright colors, motion, etc.)
§ Region won’t generally correspond to extent of object

Poking activation
§ Region is stationary
§ Region reachable (right distance, not too high up)
§ Distance measured through binocular disparity
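A minimal sketch of the poking activation test listed above, assuming a rectified stereo pair so that depth follows Z = f·B/d. The function names, focal length, baseline, and reach limits are hypothetical illustration values, not Cog's actual parameters.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo depth in metres for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # no disparity: treat the point as infinitely far
    return focal_px * baseline_m / disparity_px


def should_poke(is_stationary, disparity_px, height_m,
                focal_px=500.0, baseline_m=0.1,
                min_reach_m=0.3, max_reach_m=0.7, max_height_m=0.4):
    """Hypothetical gate: poke only a stationary region that is reachable.

    All thresholds are assumed values chosen for illustration.
    """
    distance = depth_from_disparity(disparity_px, focal_px, baseline_m)
    reachable = (min_reach_m <= distance <= max_reach_m
                 and height_m <= max_height_m)
    return is_stationary and reachable
```

With these assumed defaults, a stationary region at 100 pixels of disparity lies about half a metre away and passes the gate; a moving region, or one at 10 pixels of disparity, does not.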

Page 25:

visual attention system

attracted to skin color

attracted to bright color, movement

smooth pursuit

attracted to skin color

smooth pursuit

person approaches

shakes object

moves object

hides object

stands up

(Collaboration with Brian Scassellati, Giorgio Metta, Cynthia Breazeal)

Page 26:

tracking

Page 27:

poking activation

Page 28:

evidence for segmentation

Areas where motion is observed upon contact
§ classify as ‘foreground’

Areas where motion is observed immediately before contact
§ classify as ‘background’

Textured areas where no motion was observed
§ classify as ‘background’

Textureless areas where no motion was observed
§ no information
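The four evidence rules above can be sketched as a per-pixel labelling function. This is an illustration only: the names are hypothetical, and giving motion seen before contact priority over motion seen upon contact is an assumption, since the slide does not say how the two cases are ordered.

```python
FOREGROUND, BACKGROUND, UNKNOWN = "foreground", "background", "unknown"


def classify_pixel(moved_on_contact, moved_before_contact, textured):
    """Label one pixel from motion and texture evidence.

    Assumption: motion already present before contact (arm, shadow) wins
    over motion at contact when both flags are set.
    """
    if moved_before_contact:
        return BACKGROUND   # moving before the touch, so not the poked object
    if moved_on_contact:
        return FOREGROUND   # started moving when the arm touched it
    if textured:
        return BACKGROUND   # texture would have revealed motion; none was seen
    return UNKNOWN          # textureless and still: no information either way
```

Pixels labelled as unknown carry no evidence of their own and have to take their label from their neighbours.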

Page 29:

minimum cut

“allegiance” = cost of assigning two nodes to different layers (foreground versus background)

foreground node

background node

pixel nodes

allegiance to foreground

allegiance to background

pixel-to-pixel allegiance
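A minimal sketch of how such a graph could be cut, assuming integer allegiance costs and using a plain Edmonds-Karp max-flow. The choice of algorithm, the function name, and the data layout are assumptions for illustration; the slides do not say which minimum-cut method was used.

```python
from collections import deque


def min_cut_foreground(n_pixels, fg_allegiance, bg_allegiance, neighbor_allegiance):
    """Return the set of pixel indices left on the foreground side of a minimum cut.

    fg_allegiance[i] / bg_allegiance[i]: cost of separating pixel i from the
    foreground / background node. neighbor_allegiance[(i, j)]: cost of
    separating pixels i and j.
    """
    src, sink = "fg", "bg"
    cap = {src: {}, sink: {}}

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual edge for pushing flow back

    for i in range(n_pixels):
        add_edge(src, i, fg_allegiance[i])
        add_edge(i, sink, bg_allegiance[i])
    for (i, j), c in neighbor_allegiance.items():
        add_edge(i, j, c)
        add_edge(j, i, c)

    def bfs_parents():
        """Breadth-first search over edges with remaining capacity."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow

    # Nodes still reachable from the foreground node form its side of the cut.
    return {v for v in parent if v not in (src, sink)}
```

For a row of four pixels where pixel 0 is tied to the foreground, pixel 3 to the background, and the weakest pixel-to-pixel link sits between pixels 1 and 2, the cut falls on that weak link and pixels 0 and 1 come out as foreground.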


Page 31:

grouping (on synthetic data)

“figure” points (known motion)

proposed segmentation

“ground” points (stationary, or gripper)

Page 32:

point of contact

[Figure: image sequence, panels numbered 1 to 10]

Motion spreads suddenly, faster than the arm itself → contact

Motion spreads continuously (arm or its shadow)
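The contact cue above can be sketched as a jump detector on the per-frame extent of moving pixels. This is an illustration under stated assumptions: the function name, the area-based measure, and the `max_arm_growth` threshold are hypothetical, not the system's actual detector.

```python
def find_contact_frame(motion_area, max_arm_growth):
    """Return the index of the first frame where motion spreads too fast.

    motion_area: moving-pixel count per frame.
    max_arm_growth: largest frame-to-frame increase the arm (or its shadow)
    could plausibly cause by itself; an assumed tuning parameter.
    Returns None if motion only ever spreads continuously.
    """
    for t in range(1, len(motion_area)):
        if motion_area[t] - motion_area[t - 1] > max_arm_growth:
            return t
    return None
```

A sequence such as 10, 12, 14, 16, 60, 62 with an allowed growth of 5 reports frame index 4, the frame where the moving region suddenly includes the object.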

Page 33:

point of contact

Page 34:

segmentation examples

Impact event

Motion caused (red = novel, purple/blue = discounted)

Segmentation (green/yellow)

Side tap

Back slap

Page 35:

segmentation examples

Page 36:

segmentation examples

table

car

Page 37:

boundary fidelity

[Plot: measure of anisotropy (square root of second Hu moment) against measure of size (area at poking distance); points for cube, car, bottle, ball]

Page 38:

signal to noise

Page 39:

poking

object segmentation

edge catalog

Page 40:

“Appearance Catalog”

exhaustively characterizing the appearance of a low-level feature

Page 41:

sampling oriented regions

Page 42:

sample samples

Page 43:

most frequent samples

[Figure: samples ranked by frequency; shown are the 1st, 21st, 41st, 61st, 81st, 101st, 121st, 141st, 161st, 181st, 201st, 221st]

Page 44:

selected samples

Page 45:

some tests

Red = horizontal, green = vertical

Page 46:

natural images

0° ± 22.5°, 90° ± 22.5°, 45° ± 22.5°, −45° ± 22.5°

Page 47:

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

Page 48:

“Open Object Recognition”

detecting and recognizing familiar objects, enrolling unfamiliar objects

Page 49:

object recognition

Geometry-based
§ Objects and images modeled as set of point/surface/volume elements
§ Example real-time method: store geometric relationships in hash table

Appearance-based
§ Objects and images modeled as set of features closer to raw image
§ Example real-time method: use histograms of simple features (e.g. color)

Page 50:

geometry+appearance

Advantages: more selective; fast
Disadvantages: edges can be occluded; 2D method
Property: no need for offline training

angles + colors + relative sizes

invariant to scale, translation, in-plane rotation

Page 51:

details of features

Distinguishing elements:
§ Angle between regions (edges)
§ Position of regions relative to their projected intersection point (normalized for scale, orientation)
§ Color at three sample points along line between region centroids

Output of feature match:
§ Predicts approximate center and scale of object if match exists

Weighting for combining features:
§ Summed at each possible position of center; consistency check for scale
§ Weighted by frequency of occurrence of feature in object examples, and edge length
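The combination step above (weighted votes summed at candidate centers, with a consistency check on scale) can be sketched as follows. The grid cell size, the median-based scale check, and all names are assumptions for illustration; the slide gives only the outline of the weighting.

```python
from collections import defaultdict
from statistics import median


def locate_object(matches, cell=8, scale_tolerance=1.5):
    """Combine feature matches into one estimate of the object center.

    matches: iterable of (center_x, center_y, scale, weight) predictions,
    one per matched feature. Votes are pooled in square grid cells; within
    a cell, only votes whose scale is within a factor `scale_tolerance`
    of the cell's median scale are counted (assumed consistency check).
    Returns ((x, y), score) for the best cell, or None if there are no votes.
    """
    cells = defaultdict(list)
    for cx, cy, scale, weight in matches:
        cells[(int(cx // cell), int(cy // cell))].append((scale, weight))

    best_key, best_score = None, 0.0
    for key, entries in cells.items():
        typical = median(s for s, _ in entries)
        score = sum(w for s, w in entries
                    if typical / scale_tolerance <= s <= typical * scale_tolerance)
        if score > best_score:
            best_key, best_score = key, score

    if best_key is None:
        return None
    center = ((best_key[0] + 0.5) * cell, (best_key[1] + 0.5) * cell)
    return center, best_score
```

In this sketch the weight of each match stands in for the slide's "frequency of occurrence of feature in object examples, and edge length"; how those two factors are combined into one number is left open.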

Page 52:

localization example

look for this… …in this

Page 53:

localization example

Page 54:

localization example

Page 55:

localization example

just using geometry

geometry + appearance

Page 56:

other examples

Page 57:

other examples

Page 58:

other examples

Page 59:

just for fun

look for this… …in this

result

Page 60:

real object in real images

Page 61:

yellow on yellow

Page 62:

multiple objects

camera image

response for each object

implicated edges found and grouped

Page 63:

extending the attention system

tracker

motor control (eyes, head, neck)

egocentric map

low-level salience filters

object recognition/localization (wide)

object recognition/localization (foveal)

motor control (arm)

poking sequencer

Page 64:

attention

Page 65:

open object recognition

sees ball, “thinks” it is cube

correctly differentiates ball and cube

robot’s current view

recognized object (as seen during poking)

pokes, segments ball

Page 66:

open object recognition

Page 67:

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

manipulator detection (robot, human)

Page 68:

finding manipulators

Analogous to finding objects

Object
§ Definition: physically coherent structure
§ How to find one: poke around and see what moves together

Actor
§ Definition: something that acts on objects
§ How to find one: see what pokes objects

Page 69:

similar human and robot actions

Object connects robot and human action

Page 70:

catching manipulators in the act

contact!

manipulator approaches object

Page 71:

modeling manipulators

Page 72:

manipulator recognition

Page 73:

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

affordance exploitation (rolling)

manipulator detection (robot, human)

Page 74:

“Affordance Recognition”

switching from object-centric perception to recognizing action opportunities

(collaboration with Giorgio Metta)

Page 75:

what is an affordance?

A leaf affords rest/walking to an ant …

… but not to an elephant

Page 76:

exploring affordances

Page 77:

objects roll in different ways

a toy car: it rolls forward

a bottle: it rolls along its side

a toy cube: it doesn’t roll easily

a ball: it rolls in any direction

Page 78:

preferred direction of motion

[Four histograms, one per object: frequency of occurrence against difference between angle of motion and principal axis of object (degrees, 0 to 90)]

Bottle, pointedness=0.13

Car, pointedness=0.07

Ball, pointedness=0.02

Cube, pointedness=0.03

Rolls at right angles to principal axis

Rolls along principal axis
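A minimal sketch of the horizontal-axis quantity in these histograms, assuming both the motion direction and the principal axis are given as angles in degrees and that an axis has no preferred sign, so differences fold into the range 0 to 90. The function names and the bin width are illustrative assumptions; the "pointedness" statistic is not defined on the slide and is not reproduced here.

```python
def axis_motion_difference(motion_deg, axis_deg):
    """Angle between a motion direction and an undirected axis, folded into [0, 90]."""
    diff = abs(motion_deg - axis_deg) % 180.0
    return min(diff, 180.0 - diff)


def frequency_histogram(differences, bin_width=10.0):
    """Relative frequency of angle differences per bin over [0, 90] degrees."""
    n_bins = int(90 / bin_width)
    counts = [0] * n_bins
    for d in differences:
        # A difference of exactly 90 degrees falls in the last bin.
        counts[min(int(d // bin_width), n_bins - 1)] += 1
    total = len(differences)
    return [c / total for c in counts] if total else counts
```

Under this convention, an object that rolls along its principal axis piles up near 0 degrees and one that rolls at right angles to it piles up near 90 degrees.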

Page 79:

affordance exploitation

Caveat: this work uses an early version of object detection (not the one presented today)

Page 80:

mimicry test

Demonstration by human

Mimicry in similar situation

Mimicry when object is rotated

Invoking the object’s natural rolling affordance

Going against the object’s natural rolling affordance

Page 81:

mimicry test

Page 82:

poking

object segmentation

edge catalog

object detection (recognition, localization, contact-free segmentation)

affordance exploitation (rolling)

manipulator detection (robot, human)

Page 83:

talk overview

Learning from an activity
§ Poking: to learn to recognize objects, manipulators, etc.
§ Chatting: to learn the names of objects

Learning a new activity
§ Searching for an object
§ Then back to learning from the activity…

Page 84:

open speech recognition

Vocabulary can be extended at any time

Assumes active vocabulary is small

Isolated words only

Page 85:

keeping track of objects

EgoMap

short term memory of objects and their locations, so “out of sight” is not “out of mind”

Page 86:

keeping track of objects

Page 87:

speech and space: chatting

Page 88:

talk overview

Learning from an activity
§ Poking: to learn to recognize objects, manipulators, etc.
§ Chatting: to learn the names of objects

Learning a new activity
§ Searching for an object
§ Then back to learning from the activity…

Page 89:

Tomasello’s experiments

Designed experiments to challenge constraint-based theory of language acquisition in infants

Wants to show infants learn words through real understanding of activity (‘flow of interaction’), not hacks

Great test cases! Get beyond direct association

(But where does knowledge of activity come from?)

Page 90:

“let’s go find the toma!”

Infant plays with set of objects

Then adult says “let’s go find the toma!” (nonce word)

Acts out a search, going to several objects first before finally finding the ‘toma’

Later, infant tested to see which object it thinks is the ‘toma’

Several variants (e.g. ‘toma’ placed in inaccessible location with the infant watching – adult is upset when trying to get it)

Page 91:

“let’s go find the toma!”

Page 92:

goal

Have robot learn about search activity from examples of looking for known objects

Then apply that to a “find the toma”-like scenario

Page 93:

virtuous circle

poking, chatting

car, ball, cube, and their names

discover car, ball, and cube through poking; discover their names through chatting

Page 94:

virtuous circle

poking, chatting, search

car, ball, cube, and their names

follow named objects into search activity, and observe the structure of search

Page 95:

virtuous circle

poking, chatting, searching

car, ball, cube, toma, and their names

discover object through poking, learn its name (‘toma’) indirectly during search

Page 96:

learning about search

Page 97:

what the robot learns

‘Find’ is followed by mention of an absent object

‘Yes’ is said when a previously absent object is in view

Page 98:

how it learns this

Look for reliable event/state combinations, sequences

Events are:
§ hearing a word
§ seeing an object

States are:
§ recent events
§ situation evaluations (object corresponding to word not present, mismatch between word and object, etc.)
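One way to read "look for reliable event/state combinations" is as simple co-occurrence counting. The sketch below is an assumption-laden illustration, not the robot's actual learning procedure: the function name, the string labels, and both thresholds are hypothetical.

```python
from collections import Counter


def reliable_pairs(observations, min_count=3, min_reliability=0.9):
    """Find event/state pairs that co-occur reliably.

    observations: list of (event, state) tuples, e.g. the word just heard
    and the situation evaluation at that moment.
    Returns {(event, state): reliability}, where reliability is the share
    of the event's occurrences that came with that state. Thresholds are
    assumed values.
    """
    event_counts = Counter(event for event, _ in observations)
    pair_counts = Counter(observations)
    return {
        pair: n / event_counts[pair[0]]
        for pair, n in pair_counts.items()
        if n >= min_count and n / event_counts[pair[0]] >= min_reliability
    }
```

Fed with examples of searching for known objects, a counter like this would keep a pair such as hearing "find" together with "named object absent" once it has been seen often enough, and drop pairs that hold only some of the time.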

Page 99:

finding the toma

Page 100:

caveats

Much much less sophisticated than infants!

Cues the robot is sensitive to are very impoverished

Slightly different from Tomasello’s experiment

Saved state between stages – wasn’t one complete continuous run

Page 101:

conclusions: why do this?

Uses all the ‘alternative essences of intelligence’
§ Development
§ Social interaction
§ Embodiment
§ Integration

Points the way to really flexible robots
§ today the robot should sort widgets from wombats (neither of which it has seen before)
§ who knows what it will have to do tomorrow

Page 102:

conclusions: contributions

active segmentation: through contact

open object recognition: for correction, enrollment

appearance catalog: for oriented features

affordance recognition: for rolling

open speech recognition: for isolated words

virtuous circle of development

learning about and through activity

Page 103:

conclusions: the future

Dexterous manipulation

Object perception (visual, tactile, acoustic)
§ During dexterous manipulation
§ During failed manipulation

Integration with useful platform
§ Socially enabled
§ Mobile

Page 105:

Tom Murphy (www.cs.cmu.edu/~tom7)