Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

ALEXANDROS A. CHAARAOUI

JOSÉ R. PADILLA-LÓPEZ

FRANCISCO FLÓREZ-REVUELTA

Sydney, December 2, 2013

3rd IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV)

© A.A. Chaaraoui, J.R. Padilla-López and F. Flórez-Revuelta (CDC4CV’13)

1. Introduction

Motivation:

Use of both skeleton and silhouette in previous works

Problems with skeletons: imprecise or noisy joint estimates, caused by occlusion by body parts or objects

Example (figure): Pick-up and Throw



Problems with silhouettes: the only available viewpoint is unfavourable for recognition

Examples (figure): Tennis Serve, Forward Punch, Hammer


Solution:

Fusing different features that complement each other: skeleton, RGB colour, silhouette (2D), volume (3D)…

In this work, we fuse skeleton and silhouette


2. Fusion of skeleton and silhouette

Concatenation of skeleton and silhouette features

Skeleton: 3D coordinates of the joints

Silhouette: Radial summary

(Figure: diagram with labels numbered 1 to 20)
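The concatenation named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `skeleton_feature` and `fuse` are hypothetical, the silhouette descriptor is taken as an already computed list of numbers, and no normalisation step is shown because the slides do not describe one.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) of one skeleton joint


def skeleton_feature(joints: Sequence[Joint]) -> List[float]:
    """Flatten the 3D joint coordinates into one vector (x1, y1, z1, x2, ...)."""
    return [coord for joint in joints for coord in joint]


def fuse(skeleton: Sequence[float], silhouette: Sequence[float]) -> List[float]:
    """Fuse two per-frame feature vectors by plain concatenation."""
    return list(skeleton) + list(silhouette)


# Toy frame: two joints instead of a full skeleton, and a three-value
# stand-in for the silhouette descriptor (both are made-up numbers).
joints = [(0.0, 1.0, 2.0), (0.5, 1.5, 2.5)]
radial = [0.2, 0.4, 0.1]
fused = fuse(skeleton_feature(joints), radial)
```

The fused vector simply has the skeleton values first and the silhouette values after them, so its length is the sum of both lengths.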


3. Classification method based on a bag of key poses

[1] A.A. Chaaraoui, P. Climent-Pérez and F. Flórez-Revuelta. Silhouette-based human action recognition using sequences of key poses. Pattern Recognition Letters, 34(15):1799-1807, 2013.




Sequence recognition

Transform a sequence into a sequence of key poses using the bag of key poses

Sequence matching using dynamic time warping
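The two steps above can be sketched as follows, under stated assumptions. The key poses are taken as given (how the bag is learned is not covered here), each frame is mapped to its nearest key pose by Euclidean distance, and the DTW local cost is the Euclidean distance between key poses. These choices, the nearest-neighbour decision in `classify`, and all function names are illustrative assumptions rather than details confirmed by the slides.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_key_pose_sequence(frames: Sequence[Vector],
                         key_poses: Sequence[Vector]) -> List[int]:
    """Replace every frame feature by the index of its nearest key pose."""
    return [min(range(len(key_poses)),
                key=lambda k: euclidean(frame, key_poses[k]))
            for frame in frames]


def dtw_distance(seq_a: Sequence[int], seq_b: Sequence[int],
                 key_poses: Sequence[Vector]) -> float:
    """Classic dynamic time warping between two key-pose index sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(key_poses[seq_a[i - 1]], key_poses[seq_b[j - 1]])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat element of seq_b
                                 cost[i][j - 1],      # repeat element of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(test_seq: Sequence[int],
             training: Sequence[Tuple[str, Sequence[int]]],
             key_poses: Sequence[Vector]) -> str:
    """Label of the training sequence with the smallest DTW distance."""
    return min(training,
               key=lambda item: dtw_distance(test_seq, item[1], key_poses))[0]


# Toy data: three 2D key poses and two labelled training sequences.
key_poses = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]
test_seq = to_key_pose_sequence(frames, key_poses)
training = [("wave", [0, 1, 2]), ("punch", [0, 2, 2, 0])]
label = classify(test_seq, training, key_poses)
```

DTW lets the test sequence repeat a key pose without penalty, which is why sequences performed at different speeds can still match.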



4. Experimentation

Evaluation with the MSR Action3D dataset


Cross-subject validation as in [2]:

Training: actors 1, 3, 5, 7 and 9

Testing: actors 2, 4, 6, 8 and 10

[2] W. Li, Z. Zhang, and Z. Liu. Action recognition based on a bag of 3D points. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 9-14, 2010.
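The cross-subject split listed above can be expressed in a few lines. This is only a sketch: the sample representation (a dictionary with hypothetical `actor` and `action` keys) is an assumption made for illustration and does not reflect the dataset's file format.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, object]

TRAIN_ACTORS = {1, 3, 5, 7, 9}   # odd-numbered actors are used for training
TEST_ACTORS = {2, 4, 6, 8, 10}   # even-numbered actors are used for testing


def cross_subject_split(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into training and test sets by actor number."""
    train = [s for s in samples if s["actor"] in TRAIN_ACTORS]
    test = [s for s in samples if s["actor"] in TEST_ACTORS]
    return train, test


# Toy samples with made-up action labels.
samples = [
    {"actor": 1, "action": "a02"},
    {"actor": 2, "action": "a02"},
    {"actor": 9, "action": "a05"},
    {"actor": 10, "action": "a05"},
]
train, test = cross_subject_split(samples)
```

No actor appears on both sides of the split, so the test measures how well recognition carries over to unseen subjects.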






Confusion matrices for AS1 (classes a02, a03, a05, a06, a10, a13, a18, a20; only the non-zero entries of each row are listed, in column order):

Skeleton
a02: 0.92 0.08
a03: 1.00
a05: 0.91 0.09
a06: 0.09 0.73 0.18
a10: 1.00
a13: 1.00
a18: 1.00
a20: 0.14 0.07 0.29 0.50

Silhouette
a02: 0.67 0.25 0.08
a03: 0.58 0.42
a05: 0.18 0.73 0.09
a06: 0.18 0.82
a10: 1.00
a13: 0.07 0.93
a18: 0.33 0.20 0.07 0.40
a20: 0.07 0.14 0.07 0.14 0.57

Fusion
a02: 1.00
a03: 1.00
a05: 0.09 0.91
a06: 0.18 0.73 0.09
a10: 1.00
a13: 1.00
a18: 1.00
a20: 0.29 0.71



Leave-one-actor-out:
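A leave-one-actor-out protocol holds out all sequences of one actor for testing and trains on the rest, repeating this for every actor. The generator below is a minimal sketch of that idea; the sample dictionaries and their `actor` and `action` keys are hypothetical, as is the function name.

```python
from typing import Dict, Iterator, List, Tuple

Sample = Dict[str, object]


def leave_one_actor_out(
    samples: List[Sample],
) -> Iterator[Tuple[object, List[Sample], List[Sample]]]:
    """Yield (held_out_actor, training_samples, test_samples) for each actor."""
    actors = sorted({s["actor"] for s in samples})
    for held_out in actors:
        train = [s for s in samples if s["actor"] != held_out]
        test = [s for s in samples if s["actor"] == held_out]
        yield held_out, train, test


# Toy samples with made-up action labels.
samples = [
    {"actor": 1, "action": "a02"},
    {"actor": 2, "action": "a02"},
    {"actor": 1, "action": "a05"},
]
folds = list(leave_one_actor_out(samples))
```

Each fold produces one recognition rate, and the rates would typically be averaged over all folds.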


5. Conclusions and future work

Straightforward fusion of skeleton and silhouette

Improvement in the recognition rate

Include also side and top projected silhouettes

Select the weight for each feature vector

Feature subset selection
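One possible reading of "select the weight for each feature vector" is to scale each feature vector before concatenating. The sketch below shows only that reading. The slides do not say how weights would be chosen or applied, so the function name, its parameters and the example values are all assumptions.

```python
from typing import List, Sequence


def weighted_fuse(skeleton: Sequence[float], silhouette: Sequence[float],
                  w_skeleton: float, w_silhouette: float) -> List[float]:
    """Scale each feature vector by its own weight, then concatenate."""
    return ([w_skeleton * v for v in skeleton] +
            [w_silhouette * v for v in silhouette])


# Made-up values: the silhouette part is given four times the skeleton weight.
fused = weighted_fuse([1.0, 2.0], [4.0], w_skeleton=0.5, w_silhouette=2.0)
```

With both weights set to 1.0 this reduces to the plain concatenation used in the present work.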



We have already applied the feature selection approach in [3] to the fusion of skeleton and silhouette

[3] A.A. Chaaraoui, J.R. Padilla-López, P. Climent-Pérez, and F. Flórez-Revuelta. Evolutionary joint selection to improve human action recognition with RGB-D devices. Expert Systems with Applications, 41(3):786-794, 2014.

Tests: Cross-Subject and LOAO (leave-one-actor-out)




Should we create a large bank of features and select them appropriately?
