Lectureship A proposal for advancing computer graphics, imaging and multimedia design at RGU Robert Gordon University Aberdeen, 20/6/2008 Fabio Cuzzolin INRIA Rhone-Alpes

Page 1

Lectureship: A proposal for advancing computer graphics, imaging and multimedia design at RGU

Robert Gordon University

Aberdeen, 20/6/2008

Fabio Cuzzolin, INRIA Rhone-Alpes

Page 2

Career path

Master’s thesis on gesture recognition at the University of Padova

Visiting student at ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)

Ph.D. thesis on belief functions and uncertainty theory (2001)

Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)

Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)

Marie Curie fellow at INRIA Rhone-Alpes

Page 3

Scientific production and collaborations

currently 4+10 journal papers and 31+8 conference papers

collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning

PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM

collaborations with several groups: SIPTA, Setubal, CMU, Pompeu Fabra, EPFL-IDIAP, U Boston

Page 4

My background: research

Discrete math: linear independence on lattices and matroids

Uncertainty theory: geometric approach, algebraic analysis, generalized total probability

Machine learning: manifold learning for dynamical models

Computer vision: gesture and action recognition, 3D shape analysis and matching, gait ID, pose estimation

Page 5

A multi-layer framework for human motion analysis

different tasks are integrated in a series of layers; feedbacks act between different layers

Diagram: multiple views; image data fusion; 3D reconstruction; unsupervised body-part segmentation; model fitting (stick-articulated); action segmentation; action recognition; identity recognition; applications: motion capture, surveillance, HMI

Page 6

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 7

HMMs for gesture recognition

transition matrix A -> gesture dynamics

state-output matrix C -> collection of hand poses

Hand poses were represented by “size functions” (BMVC'97)
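A minimal sketch of how such a model scores a sequence, assuming each frame's hand pose has already been quantized into one of finitely many symbols (the slide uses size-function descriptors, not reproduced here) and introducing a hypothetical initial-state distribution `pi` alongside `A` and `C`. The scaled forward algorithm returns the log-likelihood of the symbol sequence:

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, C):
    """Scaled forward algorithm for a discrete-output HMM (illustrative sketch).

    obs: sequence of integer symbols (quantized hand poses, an assumption)
    pi:  (N,) initial state distribution (hypothetical parameter)
    A:   (N, N) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    C:   (N, M) state-output matrix, C[i, k] = P(symbol k | state i)
    """
    alpha = pi * C[:, obs[0]]            # joint prob. of first symbol and each state
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale                # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * C[:, o]    # propagate through the dynamics, weight by output
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik
```

Rescaling at every step keeps `alpha` from underflowing on long gestures; the log-likelihood is the sum of the logs of the scale factors.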

Page 8

Gesture classification

HMM 1

HMM 2

HMM n

EM to learn HMM parameters from an input sequence

the new sequence is fed to the learnt gesture models

they produce a likelihood; the most likely model is chosen (if above a threshold)

OR the new model is attributed the label of the closest one (using K-L divergence or other distances)
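The decision rule (most likely model, accepted only above a threshold) can be sketched as follows. Assumptions: discrete observation symbols, strictly positive model parameters, and a threshold on the per-frame log-likelihood so that sequences of different lengths are comparable. The function names and the model dictionary are hypothetical, and learning the models by EM is not shown.

```python
import numpy as np

def sequence_log_likelihood(obs, pi, A, C):
    """Forward algorithm in log space (parameters assumed strictly positive)."""
    la = np.log(pi) + np.log(C[:, obs[0]])
    for o in obs[1:]:
        m = la.max()
        # log sum_i exp(la_i) A_ij, computed stably, then weighted by the output prob.
        la = m + np.log(np.exp(la - m) @ A) + np.log(C[:, o])
    m = la.max()
    return m + np.log(np.exp(la - m).sum())

def classify_gesture(obs, models, threshold):
    """Return the label of the most likely gesture model, or None if its
    per-frame log-likelihood is below `threshold` (a free parameter here).

    models: dict label -> (pi, A, C)"""
    scores = {label: sequence_log_likelihood(obs, *params) / len(obs)
              for label, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores
```

Returning `None` plays the role of "no known gesture"; the alternative on the slide (labelling a newly learnt model by its closest stored model) would replace the likelihood comparison with a distance between models.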

Page 9

Volumetric action recognition

• 2D approaches: features are extracted from single views -> viewpoint dependence

• volumetric approach: features are extracted from a volumetric reconstruction of the moving body (ICIP'04)

Page 10

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 11

Unsupervised coherent 3D segmentation

to recognize actions we need to extract features

segmenting moving articulated 3D bodies into parts

along sequences, in a consistent way

in an unsupervised fashion

robustly, with respect to changes of the topology of the moving body

as a building block of a wider motion analysis and capture framework

ICCV-HM'07, CVPR'08, to submit to IJCV

Page 12

Clustering after Laplacian embedding

generates a lower-dimensional, widely separated embedded cloud; less sensitive to topology changes than other methods; less computationally expensive than ISOMAP

Figure: two rigid parts and a moving joint area; neighborhoods inside the rigid parts are unaffected by the motion, neighborhoods in the joint area are affected

local neighborhoods -> stable under articulated motion

Page 13

Algorithm

K-wise clustering in the embedding space
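A sketch of clustering after Laplacian embedding for a single frame, under simplifying assumptions: a symmetric k-nearest-neighbour graph with unit weights, the unnormalized Laplacian, and plain k-means with a farthest-point start. `n_neighbors`, `dim` and the function names are illustrative, not the parameters of the original method.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=8, dim=3):
    """Laplacian eigenmap of a point cloud (e.g. voxel centres of a body).

    Builds a symmetric kNN graph with unit weights, forms L = D - W and returns
    the eigenvectors of the `dim` smallest non-zero eigenvalues
    (assumes the graph is connected)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n = len(points)
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]    # column 0 is normally the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                                # symmetrize the kNN graph
    L = np.diag(W.sum(1)) - W                             # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)                           # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]                             # drop the constant eigenvector

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return labels, centers
```

On an L-shaped cloud made of two straight "limbs", the first non-trivial eigenvector varies roughly monotonically along the body, so two clusters split it near the joint.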

Page 14

Seed propagation along time

To ensure time consistency, the clusters’ seeds have to be propagated along time

The old positions of the clusters in 3D are added to the new cloud and embedded

Result: new seeds
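The seed-propagation step can be sketched as follows: the previous frame's 3D cluster centroids are appended to the new cloud, the augmented cloud is embedded, and the embedded positions of the appended points seed k-means, so cluster j at time t+1 inherits the label of cluster j at time t. The kNN-graph Laplacian eigenmap and plain k-means used here are simplifying assumptions, and all names are hypothetical.

```python
import numpy as np

def embed(points, n_neighbors=8, dim=3):
    """Laplacian eigenmap on a symmetric kNN graph with unit weights (simplified)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n = len(points)
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    _, vecs = np.linalg.eigh(np.diag(W.sum(1)) - W)
    return vecs[:, 1:dim + 1]

def propagate_seeds(prev_centroids, new_cloud, n_neighbors=8, dim=3, n_iter=50):
    """Cluster `new_cloud` so that cluster j continues cluster j of the previous frame."""
    k = len(prev_centroids)
    augmented = np.vstack([new_cloud, prev_centroids])   # old 3D cluster positions join the new cloud
    Y = embed(augmented, n_neighbors, dim)
    X, centers = Y[:-k], Y[-k:].copy()                   # embedded old positions = new seeds
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    # 3D centroids of the new clusters, to be propagated to the next frame
    new_centroids = np.array([new_cloud[labels == j].mean(0) if np.any(labels == j)
                              else prev_centroids[j] for j in range(k)])
    return labels, new_centroids
```

Because the seeds come from the previous clusters rather than from a fresh initialisation, the cluster indices stay consistent along the sequence.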

Page 15

Results

Coherent clustering along a sequence

Handling of topology changes

Page 16

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 17

Laplacian matching of dense meshes or voxelsets

As embeddings are pose-invariant (for articulated bodies), they can be used to match dense shapes by simply aligning their images after embedding

ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI

Page 18

Eigenfunction Histogram assignment

Algorithm:

compute the Laplacian embedding of the two shapes

find an assignment between the eigenfunctions of the two shapes

this selects a section of the embedding space

the embeddings are orthogonally aligned there by EM
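A sketch of the eigenfunction-assignment step, assuming each shape's embedding is given as a matrix whose columns are eigenfunctions sampled on the shape. Each eigenfunction is standardized and histogrammed; because eigenvectors are defined only up to sign, each pair is compared both as-is and with one histogram reversed (the bins are symmetric about zero, so reversal corresponds to a sign flip). The assignment is found by brute force over permutations, which is only practical for a few eigenfunctions. Bin count, range and function names are illustrative, and the subsequent EM alignment is not shown.

```python
import itertools
import numpy as np

def eigenfunction_histograms(E, bins=20, lim=4.0):
    """One normalized histogram per eigenfunction (column of E), after standardizing."""
    Z = (E - E.mean(0)) / E.std(0)
    edges = np.linspace(-lim, lim, bins + 1)       # symmetric bins: reversal == sign flip
    return np.array([np.histogram(z, bins=edges)[0] / len(z) for z in Z.T])

def assign_eigenfunctions(E1, E2, bins=20):
    """Return (match, signs): column j of E2 corresponds to column match[j] of E1,
    up to the sign signs[j]."""
    H1, H2 = eigenfunction_histograms(E1, bins), eigenfunction_histograms(E2, bins)
    k = H1.shape[0]
    cost = np.empty((k, k))
    flip = np.empty((k, k), dtype=bool)
    for i in range(k):
        for j in range(k):
            same = np.abs(H1[i] - H2[j]).sum()         # L1 distance between histograms
            rev = np.abs(H1[i] - H2[j][::-1]).sum()    # same, with E2's eigenfunction negated
            cost[i, j] = min(same, rev)
            flip[i, j] = rev < same
    best = min(itertools.permutations(range(k)),
               key=lambda p: sum(cost[p[j], j] for j in range(k)))
    match = np.array(best)
    signs = np.array([-1 if flip[match[j], j] else 1 for j in range(k)])
    return match, signs
```

Histograms ignore the ordering of points, so the comparison needs no point-to-point correspondence between the two shapes; the sign of an eigenfunction with a symmetric histogram remains ambiguous and is left to the alignment step.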

Page 19

Results

Applications: graph matching, protein analysis, motion capture; propagating body-part segmentation in time; motion field estimation; action segmentation

Page 20

Application: spatio-temporal action segmentation

problem: segmenting parts of the video(s) containing “interesting” motions

• global approach: working on the entire sequence (a multidimensional volume)

• previous works: object segmentation on the spatio-temporal volume for single frames

idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume

• outline of an approach: smoothing using message passing + shape detection on the obtained manifold
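The 7D volume stacks, for every frame, the 3D hull points, each point's 3D motion vector and the time index (3 + 3 + 1 = 7 coordinates). A sketch, assuming the motion field is given as one 3D vector per hull point and using the frame index as time; the function name is hypothetical, and smoothing by message passing and shape detection are not shown.

```python
import numpy as np

def spatio_temporal_volume(clouds, motion_fields):
    """Stack per-frame 3D points, their 3D motion vectors and the frame index
    into one (N, 7) array with columns (x, y, z, vx, vy, vz, t)."""
    rows = []
    for t, (pts, flow) in enumerate(zip(clouds, motion_fields)):
        time = np.full((len(pts), 1), float(t))     # frame index as the time coordinate
        rows.append(np.hstack([pts, flow, time]))
    return np.vstack(rows)
```

In practice the spatial, velocity and time axes would need relative scaling before any neighbourhood-based processing; that choice is left open here.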

Page 21

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 22

Bilinear models for gait-ID

$y^{sc} = A^{s} b^{c}$

To recognize the identity of humans from their gait (CVPR'06, book chapter in progress)

nuisance factors: emotional state, illumination, appearance, viewpoint ... (literature: randomized trees)

each motion possesses several labels: action, identity, viewpoint, emotional state, etc.

• bilinear models (Tenenbaum) can be used to separate the influence of “style” and “content” (the label to classify)

Page 23

Content classification of unknown style

given a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint), an asymmetric bilinear model can be learned from it through SVD

when new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)…

an iterative EM procedure can be set up to classify the content

E step -> estimation of p(c|s), the probability of the content given the current estimate s of the style

M step -> estimation of the linear map for the unknown style s
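A sketch of both stages for the asymmetric bilinear model y^{sc} = A^s b^c (observation for style s and content c). Assumptions: training observations stacked as an (S·K) × C matrix (K-dimensional observations, one block of rows per style), isotropic Gaussian noise with a fixed `sigma` in the E step, a uniform prior over contents, and the average training style as the initial guess for the unknown style. Function names are hypothetical.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, K, J):
    """Fit y^{sc} = A^s b^c by truncated SVD.

    Y: (S*K, C); rows s*K:(s+1)*K hold style s, column c is content c.
    Returns A: (S, K, J) style matrices and B: (J, C) content vectors."""
    U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :J] * sv[:J]).reshape(-1, K, J)   # one K x J linear map per style
    B = Vt[:J]                                  # one J-dim vector per content (columns)
    return A, B

def classify_unknown_style(Y_new, A_train, B, sigma=1.0, n_iter=20):
    """EM for the contents of observations Y_new (N, K) seen in an unknown style."""
    A_new = A_train.mean(0)                      # initial guess: average training style
    for _ in range(n_iter):
        pred = (A_new @ B).T                     # (C, K) predicted observation per content
        d2 = ((Y_new[:, None, :] - pred[None]) ** 2).sum(-1)
        logp = -d2 / (2 * sigma ** 2)
        logp -= logp.max(1, keepdims=True)
        P = np.exp(logp)
        P /= P.sum(1, keepdims=True)             # E step: p(c | y_n)
        # M step: A_new = (sum_nc P y b^T)(sum_nc P b b^T)^{-1}
        num = Y_new.T @ P @ B.T                  # (K, J)
        den = (B * P.sum(0)) @ B.T               # (J, J)
        A_new = np.linalg.solve(den, num.T).T
    return P.argmax(1), A_new
```

The M step is a weighted least-squares fit of the style map given the soft content assignments, so only the style matrix is re-estimated; the content vectors learnt from the training set stay fixed.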

Page 24

Three-layer model

each sequence is encoded as an HMM; its C matrix is stacked in a single observation vector; a bilinear model is learnt from those vectors

Features: projections of silhouette's contours onto a line through the center

Page 25

Results on CMU database

Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, ID, view

Performance compared with the “baseline” algorithm and with straight k-NN on sequence HMMs

Page 26

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 27

Learning manifolds of dynamical models

Classify movements represented as dynamical models

for instance, each image sequence can be mapped to an ARMA or AR linear model

Motion classification then reduces to finding a suitable distance function in the space of dynamical models

when some a-priori info is available (training set)..

.. we can learn in a supervised fashion the “best” metric for the classification problem!

To submit to ECCV'08 – MLVMA Workshop
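As a baseline before any metric learning, a sketch that maps each sequence to AR(2) coefficients by least squares and classifies by the nearest training model under the plain Euclidean distance in parameter space. Assumptions: a scalar feature per frame and the convention x_t = a1 x_{t-1} + a2 x_{t-2} + e_t; `simulate_ar2` is a hypothetical helper for generating test data.

```python
import numpy as np

def fit_ar2(x):
    """Least-squares AR(2) fit, assuming x_t = a1 x_{t-1} + a2 x_{t-2} + e_t."""
    X = np.column_stack([x[1:-1], x[:-2]])      # regressors (x_{t-1}, x_{t-2})
    coeffs, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return coeffs

def nearest_model_label(x, train):
    """1-NN in AR(2) parameter space with the Euclidean distance.
    train: list of (sequence, label) pairs."""
    a = fit_ar2(x)
    dists = [np.linalg.norm(a - fit_ar2(seq)) for seq, _ in train]
    return train[int(np.argmin(dists))][1]

def simulate_ar2(a1, a2, n, rng):
    """Hypothetical helper: generate an AR(2) sequence with unit-variance noise."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x
```

A learnt metric would replace `np.linalg.norm(a - fit_ar2(seq))` in `nearest_model_label`; the rest of the classifier is unchanged.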

Page 28

Learning pullback metrics

many unsupervised algorithms take a dataset as input and map it to an embedded space, but fail to learn a full metric

consider then a family of diffeomorphisms F between the original space M and a metric space N

the diffeomorphism F induces a pullback metric on M; maximizing the inverse volume finds the manifold which better interpolates the data (geodesics pass through “crowded” regions)

$$ O(D) = \prod_{k=1}^{N} \frac{\det\big(g(m_k)\big)^{-1/2}}{\int_M \det\big(g(m)\big)^{-1/2} \, dm} $$
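For a dataset D = {m_1, ..., m_N} and a metric g, the inverse-volume objective is O(D) = ∏_k det g(m_k)^{-1/2} / ∫_M det g(m)^{-1/2} dm. The sketch below only evaluates its logarithm for a given metric on a 2D square domain (an assumption), approximating the integral with the midpoint rule; optimizing over the family of diffeomorphisms F is not shown, and the names are hypothetical.

```python
import numpy as np

def log_inverse_volume(data, det_g, lo=-1.0, hi=1.0, n_grid=200):
    """log O(D) = sum_k -1/2 log det g(m_k) - N log ∫_M det g(m)^{-1/2} dm.

    data:  (N, 2) points m_k in M = [lo, hi]^2 (square domain assumed)
    det_g: function mapping an (n, 2) array of points to det g at those points
    """
    h = (hi - lo) / n_grid
    centers = lo + h * (np.arange(n_grid) + 0.5)          # midpoint-rule nodes
    gx, gy = np.meshgrid(centers, centers)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    normalizer = (det_g(grid) ** -0.5).sum() * h * h      # ∫_M det g^{-1/2} dm
    return -0.5 * np.log(det_g(data)).sum() - len(data) * np.log(normalizer)
```

For data concentrated near the origin, a metric whose volume element grows away from the data, such as g(m) = (1 + λ|m|²) I with λ > 0, scores higher than the flat metric.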

Page 29

Space of AR(2) models

given an input sequence, we can identify the parameters of the linear model which best describes it

autoregressive models of order 2: AR(2)

Fisher metric on AR(2)

Compute the geodesics of the pullback metric on M

$$ g(a_1, a_2) = \frac{1}{(1 + a_1 + a_2)(1 - a_1 + a_2)(1 - a_2)} \begin{pmatrix} 1 + a_2 & a_1 \\ a_1 & 1 + a_2 \end{pmatrix} $$
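A sketch that evaluates the Fisher metric g(a1, a2) = [[1 + a2, a1], [a1, 1 + a2]] / ((1 + a1 + a2)(1 - a1 + a2)(1 - a2)) and the Riemannian length of a straight segment in the (a1, a2) plane. The sign convention for the AR coefficients is not stated on the slide; with unit noise variance, this matrix is the per-sample Fisher information of x_t = a1 x_{t-1} - a2 x_{t-2} + e_t. The straight-segment length is only an upper bound on the geodesic distance; computing actual geodesics (of this metric or of a pullback metric) requires solving the geodesic equations, which is not shown. Names are hypothetical.

```python
import numpy as np

def fisher_metric_ar2(a1, a2):
    """Fisher metric on AR(2); valid where all three denominator factors are positive."""
    denom = (1 + a1 + a2) * (1 - a1 + a2) * (1 - a2)
    return np.array([[1 + a2, a1], [a1, 1 + a2]]) / denom

def segment_length(p, q, n_steps=10000):
    """Riemannian length of the straight segment from p to q (midpoint rule);
    an upper bound on the geodesic distance between the two models."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    v = q - p
    ts = (np.arange(n_steps) + 0.5) / n_steps
    speeds = [np.sqrt(v @ fisher_metric_ar2(*(p + t * v)) @ v) for t in ts]
    return float(np.sum(speeds) / n_steps)
```

Along the a1 axis (a2 = 0) the metric reduces to 1 / (1 - a1²), so the length from (0, 0) to (0.5, 0) is arcsin(0.5) = π/6; at the origin (white noise) the metric is the identity.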

Page 30

Results on action and ID recognition

scalar feature, AR(2) and ARMA models

Page 31

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 32

Uncertainty measures: intervals, credal sets

a number of formalisms have been proposed to extend or replace classical probability

assumption: not enough evidence to determine the actual probability describing the problem

second-order distributions (Dirichlet), interval probabilities, credal sets

Belief functions (Shafer 76): special case of credal sets

Page 33

Belief functions as random sets

• Probability on a finite set: function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function

• Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$

• belief function $b: 2^\Theta \to [0,1]$: if $m$ is a mass function on $2^\Theta$ s.t. $\sum_{B \subseteq \Theta} m(B) = 1$, then $b(A) = \sum_{B \subseteq A} m(B)$
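A minimal sketch of the belief function induced by a mass function on 2^Θ, representing focal elements as Python frozensets (a representation choice, not from the slide). The example masses are those of b1 used for Dempster's rule on the next slide.

```python
def belief(mass, A):
    """b(A): total mass of the focal elements B contained in A.

    mass: dict {frozenset: float}, masses assumed to sum to 1."""
    A = frozenset(A)
    return sum(m for B, m in mass.items() if B <= A)

# b1: m({a1}) = 0.7, m({a1, a2}) = 0.3
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
```

With b1, b({a1}) = 0.7 and b({a2}) = 0 while b({a1, a2}) = 1, so b({a1}) + b({a2}) < b({a1, a2}): unlike a probability, a belief function is not additive. When every focal element is a singleton, b reduces to an ordinary probability.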

Page 34

Information fusion by Dempster’s rule

several aggregation or elicitation operators have been proposed; the original proposal is Dempster’s rule

frame: Θ = {a1, a2, a3, a4}

• b1: m({a1}) = 0.7, m({a1, a2}) = 0.3

• b2: m(Θ) = 0.1, m({a2, a3, a4}) = 0.9

• b1 ⊕ b2 (the conflicting mass is 0.7*0.9 = 0.63, so the normalization factor is 1 - 0.63 = 0.37):

m({a1}) = 0.7*0.1/0.37 = 0.19

m({a2}) = 0.3*0.9/0.37 = 0.73

m({a1, a2}) = 0.3*0.1/0.37 = 0.08
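A sketch of Dempster's rule for mass functions stored as dicts of frozensets (representation assumed): each product of masses goes to the intersection of the two focal elements, mass landing on the empty set is the conflict, and the remaining masses are renormalized by 1 - conflict. With the slide's b1 and b2 the conflict is 0.7 · 0.9 = 0.63.

```python
from itertools import product

def dempster(m1, m2):
    """Dempster's rule of combination for mass functions {frozenset: mass}."""
    combined = {}
    conflict = 0.0
    for (B, x), (C, y) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y                  # mass that would go to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
m1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
m2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
m12 = dempster(m1, m2)
```

The combined masses are 0.07/0.37, 0.27/0.37 and 0.03/0.37 on {a1}, {a2} and {a1, a2}, matching the values 0.19, 0.73 and 0.08 after rounding.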

Page 35

Imprecise classifiers and credal networks

“imprecise” classifiers: the class estimate is a belief function

exploit only the available evidence, represent ignorance

Belief networks or credal networks: at each node a belief function or a convex set of probabilities

robust version of Bayesian networks

Page 36

A multi-layer framework for human motion analysis

Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gait ID

Manifold learning for dynamical models

The role of uncertainty measures

Information fusion for model-free pose estimation

Page 37

Model-free pose estimation

estimating the “pose” (internal configuration) of a moving body from the available images

Figure: image sequence from t=0 to t=T, with pose estimates $\hat{q}_t \in Q$

if you do not have an a-priori model of the object ..

Page 38

Learning feature-pose maps

... learn a map between features and poses directly from the data

given pose and feature sequences acquired by motion capture ..

training poses $q_1, \dots, q_T$ and features $y_1, \dots, y_T$; $\tilde{Q} = \{q_1, \dots, q_T\}$

a Gaussian density for each state is set up on the feature space -> approximate feature space

• maps each cluster to the set of training poses $q_k$ with feature $y_k$ inside it
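A sketch of learning a feature-pose map from paired training sequences. The slide sets up a Gaussian density for each state; here, as a simplification, k-means clusters with isotropic Gaussians play that role. Each cluster is mapped to the training poses whose features fall inside it; a new feature is assigned to the cluster of highest density, and the mean of that cluster's poses is returned as a point estimate chosen for illustration. Names and the initialisation along the sequence are hypothetical.

```python
import numpy as np

def learn_feature_pose_map(Y, Q, k, n_iter=50):
    """Y: (T, d) training features, Q: (T, p) training poses, k clusters."""
    idx = np.linspace(0, len(Y) - 1, k).astype(int)     # deterministic init along the sequence
    centers = Y[idx].astype(float)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([Y[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    labels = ((Y[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    # isotropic per-coordinate variance of each cluster (small floor avoids log(0))
    variances = np.array([((Y[labels == j] - centers[j]) ** 2).mean() + 1e-9
                          if np.any(labels == j) else 1.0 for j in range(k)])
    pose_sets = [Q[labels == j] for j in range(k)]      # cluster -> training poses inside it
    return centers, variances, pose_sets

def estimate_pose(y, centers, variances, pose_sets):
    """Pick the cluster whose Gaussian gives y the highest density; return its mean pose."""
    d = centers.shape[1]
    log_dens = -((y - centers) ** 2).sum(1) / (2 * variances) - 0.5 * d * np.log(variances)
    j = int(np.argmax(log_dens))
    return pose_sets[j].mean(0)
```

Keeping the whole set `pose_sets[j]` instead of its mean preserves the ambiguity of the map, which is what a set-valued (evidential) treatment can exploit.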

Page 39

Evidential model


.. and approximate parameter space ..

.. form the “evidential model”

MTNS'00, ISIPTA'05, to submit to Information Fusion

Page 40

Results on human body tracking

comparison of three models: left view only, right view only, both views

Plot legend: ground truth; pose estimate yielded by the overall model; estimate associated with the “right” model; estimate associated with the “left” model

Page 41

Conclusions - Research

Hot topic in computer vision and machine learning: human motion analysis

Applications: motion capture, surveillance, human machine interaction, biometric identification

Different tools from machine learning, robust statistics, differential geometry can be useful

Several tasks are involved in a hierarchical fashion

Tasks are not isolated: they interact and generate feedbacks that help solve the others

Page 42

Conclusions - Teaching plans

machine vision involves notions coming from different branches of pure and applied mathematics: robust statistics, differential geometry, discrete math

all of them are considered as useful tools to solve real-world problems

students then have the chance to improve their mathematical background ...

... and learn at the same time how to develop real products on the ground

integrated courses can be designed along this line

Page 43

Conclusions – Commercial partnerships

several opportunities to develop technology transfer activities involving companies

biometrics: in particular, behavioral (non-controlled) identification

surveillance: multi-camera human motion detection and classification

image and video browsing: internet-based content retrieval

personal links with companies like Honeywell Labs (surveillance), Riya (image googling), MS Research