Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

ALEXANDROS A. CHAARAOUI, JOSÉ R. PADILLA-LÓPEZ, FRANCISCO FLÓREZ-REVUELTA. Sydney, December 2, 2013. 3rd IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV)


Paper presented at the 3rd Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), ICCV 2013 Workshop, Sydney (Australia), 2013

Transcript of Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

Page 1: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

ALEXANDROS A. CHAARAOUI

JOSÉ R. PADILLA-LÓPEZ

FRANCISCO FLÓREZ-REVUELTA

Sydney, December 2, 2013

3rd IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV)

Page 2: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

© A.A. Chaaraoui, J.R. Padilla-López and F. Flórez-Revuelta (CDC4CV’13)

1. Introduction

Motivation:

Use of both skeleton and silhouette in previous works

Problems with skeleton: lack of precision or noise, caused by occlusions from body parts or objects

[Figure: example action, Pick-up and Throw]

Page 3: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


1. Introduction

Motivation:

Use of both skeleton and silhouette in previous works

Problems with silhouettes: the only available viewpoint is unfavourable for recognition

[Figure: example actions: Tennis Serve, Forward Punch, Hammer]

Page 4: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


1. Introduction

Solution:

Fusing different features that complement each other: skeleton, RGB colour, silhouette (2D), volume (3D)…

In this work, we fuse skeleton and silhouette

Page 5: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


2. Fusion of skeleton and silhouette

Concatenation of skeleton and silhouette features

Skeleton: 3D coordinates of the joints

Silhouette: Radial summary

[Figure: skeleton model with joints numbered 1 to 20]
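The concatenation step can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the silhouette descriptor is taken as an already computed vector (the slides do not detail how the radial summary is obtained), and no normalisation or weighting is applied.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def skeleton_feature(joints: Sequence[Joint]) -> List[float]:
    """Flatten the 3D joint coordinates into one vector (x1, y1, z1, x2, ...)."""
    return [coord for joint in joints for coord in joint]


def fuse(skeleton_vec: Sequence[float], silhouette_vec: Sequence[float]) -> List[float]:
    """Fuse the two per-frame features by plain concatenation."""
    return list(skeleton_vec) + list(silhouette_vec)


# Toy frame: 20 joints and a hypothetical 8-value silhouette descriptor.
joints = [(float(i), 0.0, 1.0) for i in range(20)]
silhouette_vec = [0.5] * 8
fused = fuse(skeleton_feature(joints), silhouette_vec)
print(len(fused))  # 20 * 3 + 8 = 68
```

The fused vector simply places the skeleton values first and the silhouette values after them, so either part can still be sliced out later.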

Page 6: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


3. Classification method based on a bag of key poses

[1] A.A. Chaaraoui, P. Climent-Pérez and F. Flórez-Revuelta. Silhouette-based human action recognition using sequences of key poses. Pattern Recognition Letters, 34(15):1799-1807, 2013.

Page 7: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


3. Classification method based on a bag of key poses


Page 8: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


3. Classification method based on a bag of key poses

Sequence recognition

Transform a sequence into a sequence of key poses using the bag of key poses

Sequence matching using dynamic time warping
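The two steps above can be illustrated with a minimal sketch. It assumes that the key poses are already available as feature vectors and that frames and key poses are compared with the Euclidean distance; both the function names and the distance choice are assumptions made for illustration, not details taken from the slides. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_key_pose(feature: Vector, key_poses: Sequence[Vector]) -> int:
    """Index of the key pose closest (Euclidean) to a frame feature."""
    return min(range(len(key_poses)), key=lambda k: math.dist(feature, key_poses[k]))


def to_key_pose_sequence(frames: Sequence[Vector], key_poses: Sequence[Vector]) -> List[int]:
    """Replace every frame by the index of its nearest key pose."""
    return [nearest_key_pose(f, key_poses) for f in frames]


def dtw_distance(seq_a: Sequence[int], seq_b: Sequence[int], key_poses: Sequence[Vector]) -> float:
    """Dynamic time warping cost between two sequences of key-pose indices."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(key_poses[seq_a[i - 1]], key_poses[seq_b[j - 1]])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy data: three 2D key poses and a four-frame sequence.
key_poses = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0), (0.1, 0.8)]
sequence = to_key_pose_sequence(frames, key_poses)
same_action = dtw_distance(sequence, [0, 1, 2], key_poses)
other_action = dtw_distance(sequence, [2, 1, 0], key_poses)
```

A test sequence would then be labelled with the class of the training sequence that gives the lowest warping cost.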


Page 9: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


4. Experimentation

Evaluation with the MSR Action3D dataset

Page 10: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


4. Experimentation

Cross-subject validation as in [2]:

Training: actors 1, 3, 5, 7 and 9

Testing: actors 2, 4, 6, 8 and 10
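The split above can be written down as a short sketch. The sample representation (a dictionary with an `actor` field) is a hypothetical choice for illustration; only the actor assignment comes from the slide.

```python
from typing import Dict, List, Tuple

TRAIN_ACTORS = {1, 3, 5, 7, 9}
TEST_ACTORS = {2, 4, 6, 8, 10}


def cross_subject_split(samples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split samples by actor: odd actors for training, even actors for testing."""
    train = [s for s in samples if s["actor"] in TRAIN_ACTORS]
    test = [s for s in samples if s["actor"] in TEST_ACTORS]
    return train, test


# Toy data: one sample per actor, all of the same action.
samples = [{"actor": a, "action": "a02"} for a in range(1, 11)]
train, test = cross_subject_split(samples)
```

No actor appears on both sides of the split, so the test measures generalisation to unseen subjects.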

[2] W. Li, Z. Zhang, and Z. Liu. Action recognition based on a bag of 3D points. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 9-14, 2010.

Page 11: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


4. Experimentation

Cross-subject validation as in [2]:


Page 12: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


4. Experimentation

Confusion matrices for AS1 (actions a02, a03, a05, a06, a10, a13, a18, a20); nonzero entries of each row, left to right:

Skeleton

a02: 0.92 0.08

a03: 1.00

a05: 0.91 0.09

a06: 0.09 0.73 0.18

a10: 1.00

a13: 1.00

a18: 1.00

a20: 0.14 0.07 0.29 0.50

Silhouette

a02: 0.67 0.25 0.08

a03: 0.58 0.42

a05: 0.18 0.73 0.09

a06: 0.18 0.82

a10: 1.00

a13: 0.07 0.93

a18: 0.33 0.20 0.07 0.40

a20: 0.07 0.14 0.07 0.14 0.57

Fusion

a02: 1.00

a03: 1.00

a05: 0.09 0.91

a06: 0.18 0.73 0.09

a10: 1.00

a13: 1.00

a18: 1.00

a20: 0.29 0.71
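Row-normalised confusion matrices of this kind can be computed as in the sketch below. It assumes that each row corresponds to the performed action and each column to the predicted one, so that every row sums to 1; the function name and the toy labels are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[str],
                     predicted_labels: Sequence[str],
                     classes: Sequence[str]) -> List[List[float]]:
    """Row-normalised confusion matrix: entry [i][j] is the fraction of
    samples of class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    rates = []
    for row in counts:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return rates


# Toy example with two of the AS1 action labels.
classes = ["a02", "a03"]
true_labels = ["a02", "a02", "a02", "a02", "a03"]
predicted = ["a02", "a02", "a02", "a03", "a03"]
matrix = confusion_matrix(true_labels, predicted, classes)
```

The diagonal of the result gives the per-action recognition rate.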

Page 13: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


4. Experimentation

Leave-one-actor-out:
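A leave-one-actor-out evaluation holds out all samples of one actor per fold and trains on the rest. The sketch below assumes a hypothetical sample representation (a dictionary with an `actor` field) and only generates the folds; training and testing are left out.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_actor_out(samples: List[Dict]) -> Iterator[Tuple[int, List[Dict], List[Dict]]]:
    """Yield (held_out_actor, train, test), one fold per actor in the data."""
    actors = sorted({s["actor"] for s in samples})
    for held_out in actors:
        train = [s for s in samples if s["actor"] != held_out]
        test = [s for s in samples if s["actor"] == held_out]
        yield held_out, train, test


# Toy data: ten actors with two samples each.
samples = [{"actor": a, "take": t} for a in range(1, 11) for t in range(2)]
folds = list(leave_one_actor_out(samples))
```

The final score would typically be the average of the recognition rates over all folds.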

Page 14: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


5. Conclusions and future work

Straightforward fusion of skeleton and silhouette

Improvement in the recognition rate

Include also side and top projected silhouettes

Select the weight for each feature vector

Feature subset selection

Page 15: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


5. Conclusions and future work

We have already applied the feature selection approach in [3] to the fusion of skeleton and silhouette

[Figure: results for Cross-Subject and LOAO validation]

[3] A.A. Chaaraoui, J.R. Padilla-López, P. Climent-Pérez, and F. Flórez-Revuelta. Evolutionary joint selection to improve human action recognition with RGB-D devices. Expert Systems with Applications, 41(3):786-794, 2014.

Page 16: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices


5. Conclusions and future work

Straightforward fusion of skeleton and silhouette

Improvement in the recognition rate

Include also side and top projected silhouettes

Select the weight for each feature vector

Feature subset selection

Should we create a large bank of features and select them appropriately?

Page 17: Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices

ALEXANDROS A. CHAARAOUI

JOSÉ R. PADILLA-LÓPEZ

FRANCISCO FLÓREZ-REVUELTA

Sydney, December 2, 2013

3rd IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV)