Interactive Learning of the Acoustic Properties of Objects by a Robot

Jivko Sinapov, Mark Wiemer, Alexander Stoytchev
{jsinapov|banff|alexs}@iastate.edu
Iowa State University

Motivation: why study sound?

[Figure: a sound-producing event]

[Gaver, 1993]

Motivation (2)

Why should a robot use acoustic information?

• Human environments are cluttered with objects that generate sounds
• Sound can help the robot perceive events and objects outside its field of view
• Sound can help the robot perceive the material properties of objects

Related Work

Krotkov et al. (1996) and Klatzky et al. (2000):
• Perception of material using contact sounds
• Learned sound models for tapping aluminum, brass, glass, wood, and plastic (one object per material)

Richmond and Pai (2000):
• Robotic platform for measuring contact sounds between the robot's end effector and object surfaces
• Models the contact sounds of different materials using spectrogram averaging [Richmond and Pai, 2000]

Related Work (2)

Torres-Jara, Natale and Fitzpatrick (2005):
• Robot taps objects and records the spectrogram of the resulting sound
• Recognizes objects using spectrogram matching
• Recognized the 4 test objects used during training

[Figures: tapping objects; spectrogram of tapping]

Our Study

• Demonstrate object recognition using acoustic features from interaction
• 18 different objects
• 3 different behaviors: push, grasp, drop
• Evaluate different machine learning algorithms

Robot and Objects

7-DOF Barrett WAM arm with the Barrett Hand
18 different objects:

[Figure: the robot and the 18 objects]

Robot Behaviors

Three behaviors: grasp, push, drop

Grasping: [image sequence]

Pushing: [image sequence]

Dropping: [image sequence]

Sound Feature Representation

Step 1: Segment the sound wave recorded during the interaction.

Step 2: Compute the Discrete Fourier Transform (DFT) of the sound wave.

Step 3: Compute a 2-D histogram of the DFT matrix using block averaging: 10 temporal bins by 5 frequency bins (time on the horizontal axis, frequency on the vertical axis).
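As a rough illustration of these three steps in Python (a minimal sketch, not the authors' code: the function name, the scipy windowing defaults, and the assumption of a pre-segmented clip are ours; only the 10 x 5 bin grid comes from the slide):

```python
import numpy as np
from scipy.signal import spectrogram

def sound_features(wave, rate, t_bins=10, f_bins=5):
    """Hypothetical helper: 50-D feature vector from a segmented sound clip."""
    # Step 2: windowed DFT -> spectrogram matrix sxx of shape (freq, time)
    freqs, times, sxx = spectrogram(wave, fs=rate)
    # Step 3: block-average sxx into a coarse f_bins x t_bins histogram
    # (assumes the clip is long enough that no block is empty)
    f_edges = np.linspace(0, sxx.shape[0], f_bins + 1).astype(int)
    t_edges = np.linspace(0, sxx.shape[1], t_bins + 1).astype(int)
    hist = np.empty((f_bins, t_bins))
    for i in range(f_bins):
        for j in range(t_bins):
            hist[i, j] = sxx[f_edges[i]:f_edges[i+1],
                             t_edges[j]:t_edges[j+1]].mean()
    return hist.ravel()
```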

Object Recognition using Acoustic Properties of Objects

Problem: given the robot's behavior and the sound features detected during the interaction, predict the object.

Example: behavior = grasp; sound features = [histogram shown on slide]; object class = [object shown on slide]

Problem Formulation

Let $\mathcal{B} = \{B_1, \ldots, B_N\}$ be the set of exploratory behaviors.
Let $\mathcal{O} = \{o_1, \ldots, o_M\}$ be the set of objects.
Let $(B_i, X_j, o_k)$ be a data point such that $B_i \in \mathcal{B}$, $X_j \in \mathbb{R}^{50}$ is the sound feature vector, and $o_k \in \mathcal{O}$.

For each behavior $B_i$, learn a model $M_i$ that can estimate $\Pr(o_k \mid X_j)$.

Learning Algorithms

k-NN:
• Simple instance-based algorithm
• Uses the Euclidean distance function

Support Vector Machine (SVM):
• Discriminative approach; uses the kernel trick

Bayesian Network:
• Probabilistic graphical model
• Sound features are discretized into bins

Learning Algorithms: k-NN, SVM, and Bayesian Network

k-NN: memory-based learning algorithm

[Figure: a test point "?" surrounded by labeled training points]

With k = 3: 2 red neighbors and 1 blue neighbor, therefore Pr(red) = 2/3 ≈ 0.67 and Pr(blue) = 1/3 ≈ 0.33.
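The same 3-nearest-neighbor vote, sketched with scikit-learn on toy 2-D points (illustrative data of our own; Euclidean distance is the classifier's default metric):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Three nearby training points (2 red, 1 blue) and one far-away red point
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [2.0, 2.0]])
y = np.array(["red", "red", "blue", "red"])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# 2 of the 3 nearest neighbors are red -> Pr(red) = 2/3, Pr(blue) = 1/3
print(knn.predict_proba([[0.1, 0.0]]))  # [[0.333... 0.666...]] (blue, red)
```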

Learning Algorithms: k-NN, SVM, and Bayesian Network

Support Vector Machine: discriminative learning algorithm

1. Finds the maximum-margin hyperplane that separates two classes.
2. Uses the kernel trick to map data points into a feature space in which such a hyperplane exists.

[http://www.imtech.res.in/raghava/rbpred/svm.jpg]
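A hedged sketch of the kernel trick with scikit-learn's SVC (our own toy example, not the talk's setup): XOR-like data has no separating line in the original 2-D space, but an RBF kernel implicitly maps it into a space where a maximum-margin hyperplane exists:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like labels: not linearly separable in the input space
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="rbf", C=10.0).fit(X, y)  # kernel trick via RBF
print(svm.predict(X))                      # [0 0 1 1]
```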

Learning Algorithms: k-NN, SVM, and Bayesian Network

Bayesian Network: a probabilistic graphical model

[Figure: example network over nodes A, B, C, D, E]

1. Full power of statistical modeling and inference.
2. Learning: learns both the structure of the network and the parameters (conditional probability tables).
3. Numerical features are discretized into bins.
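Structure learning for the network is beyond a short sketch, but step 3 (discretization) can be illustrated with numpy alone; the equal-width binning scheme and helper name are our assumptions:

```python
import numpy as np

def discretize(X, n_bins=5):
    """Map each column of X to integer bin indices 0..n_bins-1
    (assumes every feature has a non-zero range)."""
    edges = np.linspace(X.min(axis=0), X.max(axis=0), n_bins + 1)  # (n_bins+1, d)
    binned = np.empty_like(X, dtype=int)
    for j in range(X.shape[1]):
        # interior edges only, so indices fall in 0..n_bins-1
        binned[:, j] = np.digitize(X[:, j], edges[1:-1, j])
    return binned
```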

Using Multiple Behaviors

Given trained models $M_{grasp}$, $M_{push}$, and $M_{drop}$, and novel sounds $X_1$, $X_2$, $X_3$ from the three behaviors performed on the same object, assign the prediction to the object class $o$ that maximizes:

$\Pr(o \mid X_1) \cdot \Pr(o \mid X_2) \cdot \Pr(o \mid X_3)$
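Reading the combination rule as a product of the per-behavior probability estimates (one plausible reading of the slide's formula), a minimal numpy sketch:

```python
import numpy as np

def combine(prob_rows):
    """prob_rows: one Pr(o | X_i) vector per behavior, each of length |O|."""
    product = np.prod(np.vstack(prob_rows), axis=0)
    return int(np.argmax(product))  # index of the predicted object class

# e.g., with scikit-learn-style models for the three behaviors:
# combine([m_grasp.predict_proba(x1)[0],
#          m_push.predict_proba(x2)[0],
#          m_drop.predict_proba(x3)[0]])
```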

Evaluation

• 6 trials recorded with each of the 18 objects, with each of the 3 behaviors
• Leave-one-out cross-validation
• Compared the performance of the learning algorithms as well as of the behaviors
• Performance measure: accuracy, the percentage of correctly classified trials
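A hedged sketch of the leave-one-out protocol, assuming scikit-learn-style classifiers (the helper name and the use of sklearn are ours):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(model, X, y):
    """Fraction of trials classified correctly when each one is held out."""
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        m = clone(model).fit(X[train_idx], y[train_idx])
        correct += int(m.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```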

Results

Chance accuracy = 1/18 ≈ 5.56%

Confusion matrix for model M_push using the Bayesian Network (rows = actual object, columns = predicted object):

Predicted →
4  -  2  -  -  -  -  -  -  -
-  5  -  -  -  -  -  -  -  1
-  -  5  -  -  -  -  -  -  1
-  -  -  0  6  -  -  -  -  -
-  -  -  3  3  -  -  -  -  -
-  -  -  -  -  4  1  -  -  1
-  -  -  -  -  -  6  -  -  -
-  -  -  -  -  -  -  6  -  -
-  -  -  -  -  -  -  1  5  -
-  -  -  -  -  1  -  -  -  5

Perfect classification and no false positives for: [objects shown on slide]

Confusion matrix for model M_combined using the Bayesian Network (rows = actual object, columns = predicted object):

Predicted →
6  -  -  -
1  5  -  -
-  -  5  1
-  -  1  5

Conclusion: the errors made by models M_grasp, M_push, and M_drop are uncorrelated.
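Matrices like the two above can be tallied with scikit-learn (the toy labels below stand in for the 18 object classes):

```python
from sklearn.metrics import confusion_matrix

y_true = ["cup", "cup", "ball", "box", "box", "ball"]
y_pred = ["cup", "ball", "ball", "box", "box", "cup"]
cm = confusion_matrix(y_true, y_pred, labels=["cup", "ball", "box"])
print(cm)  # cm[i, j] = trials of object i predicted as object j
```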

Learning Rate of the Algorithms

Compare the performance of the model M_grasp as a function of dataset size for:
• k-NN
• Support Vector Machine
• Bayesian Network
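One way to get a point on such a learning curve, sketched with scikit-learn (random stratified subsampling; the exact protocol used in the talk may differ):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split

def learning_curve_point(model, X, y, train_size, seed=0):
    """Accuracy of `model` trained on a subsample of `train_size` trials
    (assumes train_size is at least the number of classes, for stratification)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_size, stratify=y, random_state=seed)
    return clone(model).fit(X_tr, y_tr).score(X_te, y_te)
```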

Learning Rate per Behavior with the Bayesian Network

[Plot shown on slide]

Summary and Conclusions

• Accurate acoustic-based object recognition with 18 objects and 3 behaviors
• Using multiple behaviors improves recognition regardless of the learning algorithm
• The Bayesian network performed best with the given feature representation
• Grasping and pushing interactions produce sound features that are more informative of the object than dropping

Future Work

Scaling up:
• Increase the number of objects
• Vary object and robot pose
• Autonomous interaction

• Use unsupervised learning to form object sound categories
• More powerful feature representations, e.g., temporal features (periodicity) of sounds
• Use the models to detect events in the world caused by others (humans or other robots)