Exploiting cross-modal rhythm for robot perception of objects Artur M. Arsenio Paul Fitzpatrick MIT...
-
Upload
juliana-jones -
Category
Documents
-
view
214 -
download
1
Transcript of Exploiting cross-modal rhythm for robot perception of objects Artur M. Arsenio Paul Fitzpatrick MIT...
Exploiting cross-modal rhythm for robot perception of objects
Artur M. Arsenio Paul Fitzpatrick
MIT Computer Science and Artificial Intelligence Laboratory
m
CIRAS 2003
cameras on active vision head
microphone array above torso
periodically moving object (hammer)
periodically generated sound (banging)
Cog – the humanoid platform
CIRAS 2003
Motivation
• Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files, …
• Rhythmic information across the visual and acoustic sensory modalities have complementary properties
• Features extracted from visual and acoustic processing are what is needed to build an object recognition system
CIRAS 2003
Interacting with the robotrobot
CIRAS 2003
Talk outline
• Matching sound and vision• Matching with visual distraction• Matching with acoustic distraction• Matching multiple sources• Priming sound detection using vision
• Towards object recognition
CIRAS 2003
•Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files.
•Points tracked using Lukas-Kanade algorithm
•Periodicity Analysis
•FFTs of tracked trajectories
•Periodicity Histograms
•Phase verification
Detecting periodic events
CIRAS 2003
Matching sound and vision
fre
qu
en
cy (
kHz)
0 500 1000 1500 20000
2
4
6
8
00
500
1000
1500e
ne
rgy
0-80
-70
-60
-50
500 1000 1500 2000 2500
500 1000 1500 2000 2500time (ms)h
am
me
r p
osi
tion
•The sound intensity peaks once per visual period of the hammer
CIRAS 2003
Matching with visual distraction
• One object (the car) making noise• Another object (the ball) in view
– Problem: which object goes with the sound?– Solution: Match periods of motion and sound
CIRAS 2003
Comparing periods
0 500 1000 1500 2000 25000
5
10en
ergy
0 500 1000 1500 2000 2500-50
0
50
car
posi
tion
0 500 1000 1500 2000 250040
50
60
time (ms)
ball
posi
tion
freq
uenc
y (k
Hz)
0 500 1000 1500 20000
1
2
3
4
5
6
7
8
•The sound intensity peaks twice per visual period of the car
CIRAS 2003
Matching with acoustic distraction
Matching with acoustic distraction
500 1000 1500 2000
50
60
70
80
90
car
posi
tion
freq
uenc
y (k
Hz)
0 500 1000 1500 20000
2
4
6
8
Sound
500 1000 1500 2000 25004
6
8
10
12sn
ake
po
sitio
n
CIRAS 2003
Matching multiple sources
• Two objects making sounds with distinct spectrums– Problem: which object goes with which sound?– Solution: Match periods of motion and sound
CIRAS 2003
Binding periodicity features
200 400 600 800 100012001400160018002000
-60
-40
-20
car
posi
tion
200 400 600 800 100012001400160018002000
30
35
40
ratt
le p
ositi
on
freq
uenc
y (k
Hz)
0 500 1000 1500 2000
0
2
4
6
8
•The sound intensity peaks twice per visual period of the car. For the cube rattle, the sound/visual signals have different ratios according to the frequency bands
CIRAS 2003
134146162723Plane and Mouse
00100494Snake rattle (Car in Background)
001005165Car (Snake rattle in background)
17083566Car
860141117Car & ball
00100686hammer
Missed Binds (%)
Incorrect Binds (%)
Correct binds (%)
Bind made
Sound period found
Visual period found
Experiment
StatisticsAn evaluation of cross-modal binding for various objects and situations
the sound generated by a periodically moving object can be much more complex and ambiguous than its visual trajectory
CIRAS 2003
time (ms)
fre
qu
en
cy (
kHz)
0 200 400 600 8000
2
4
6
8
time (ms)
fre
qu
en
cy (
kHz)
0 200 400 6000
2
4
6
8
Priming sound detection using vision
Signals in Phase
CIRAS 2003
time (ms)
fre
qu
en
cy (
kHz)
0 200 400 600 8000
2
4
6
8
time (ms)
fre
qu
en
cy (
kHz)
0 200 400 6000
2
4
6
8
Signals out of phase!
CIRAS 2003
Object recognition
• Visual object segmentation
• Cross-modal object recognition– Ratio between acoustic/visual fundamental frequencies– Phase between acoustic and visual signals– Range of acoustic frequency bands
CIRAS 2003
Cross-modal object recognition
Causes sound when changing direction, often quiet during remainder of trajectory (although bells vary)
Causes sound when changing direction after striking object; quiet when changing direction to strike again
Causes sound while moving rapidly with wheels spinning; quiet when changing direction
CIRAS 2003
0.4 0.5 0.6 0.7 0.8 0.9 1-200
-150
-100
-50
0
50
100
Ph
ase
diff
ere
nce
(so
un
d/v
isu
al)
Fundamental period Ratio (sound/visual)
Clustering
0.40.5
0.60.7
0.80.9
1 -200
-150
-100
-50
0
50
100
0
5
10
15
Phase difference (sound/visual)
Fundamental period Ratio (sound/visual)
Fre
quen
cy b
ands
CIRAS 2003
Conclusions• Different objects = distinct acoustic-visual patterns
which are a rich source of information for object recognition. Object differentiation from both its visual and acoustic backgrounds by binding pixels and frequency bands that are oscillating together
• Cognitive evidence that, for humans, simple visual periodicity can aid the detection of acoustic periodicity
• More feature can be used for better discrimination, like the ratio of the sound/visual peak amplitudes
• Each type of features are important for recognition when the other is absent. But when both are present, then we can do better by looking at the relationship between visual motion and the sound generated.
CIRAS 2003
Questions?Questions?