
Classification of indoor actions through deep neural networks

Agnese Augello1, Umberto Maniscalco1, Filippo Vella1, Vincenzo Bentivenga2 and Salvatore Gaglio1,2

1 Institute for High Performance Computing and Networking, Cognitive Robotics & Social Sensing Lab, CNR, Palermo, Italy

2 DICGIM - Università degli studi di Palermo, Palermo, Italy

What's the matter

• Non-invasive monitoring of elderly people in their homes, in order to guarantee their safety.

• Design of an automatic system that captures a person's movements and activities through accelerometers and RGB-D cameras.

• Classification of movements and activities with a deep convolutional neural network.

[Figure: pipeline diagram. MONITOR produces the raw data; CLASSIFICATION assigns a label.]

The dataset

• We have used the dataset of the SPHERE (Sensor Platform for HEalthcare in Residential Environments) project: http://irc-sphere.ac.uk/sphere-challenge/home

• The dataset contains measurements from RGB-D cameras and accelerometers, collected by asking a group of trained participants to perform specific actions in an indoor environment.


• The samples have been manually annotated with one of the given labels.

• The values sampled by the accelerometer and the RGB-D cameras are arranged in a vector of eighteen values.

• x, y, z: acceleration along the x, y, z axes (3 values)

• x, y of the center of the bounding box (2 values)

• x, y of the bottom right corner of the bounding box (2 values)

• x, y of the top left corner of the bounding box (2 values)

• x, y and z of the center of the 3D bounding box (3 values)

• x, y and z of the bottom right back corner of the 3D bounding box (3 values)

• x, y and z of the front left top corner of the 3D bounding box (3 values)

Total: 18 values
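The layout of the eighteen-value vector can be sketched as follows. This is only an illustration: the group names are hypothetical, and the ordering simply follows the list above rather than any documented file format.

```python
# Field groups and sizes as listed on the slide; the key names are hypothetical.
FEATURE_GROUPS = [
    ("acceleration_xyz", 3),
    ("bb2d_center_xy", 2),
    ("bb2d_bottom_right_xy", 2),
    ("bb2d_top_left_xy", 2),
    ("bb3d_center_xyz", 3),
    ("bb3d_bottom_right_back_xyz", 3),
    ("bb3d_front_left_top_xyz", 3),
]


def build_feature_vector(reading):
    """Concatenate the groups of one reading into a flat list of 18 floats."""
    vector = []
    for name, size in FEATURE_GROUPS:
        values = reading[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector


# Dummy reading with all values set to zero, just to show the resulting length.
example = {name: [0.0] * size for name, size in FEATURE_GROUPS}
vector = build_feature_vector(example)
```

With the dummy reading, `vector` has 3 + 2 + 2 + 2 + 3 + 3 + 3 = 18 entries.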

Labels (22)

• The actions: ascend stairs, descend stairs, jump, walk with load, walk.

• The positions: bending, kneeling, lying, sitting, squatting, standing.

• The transitions: stand-to-bend, kneel-to-stand, lie-to-sit, sit-to-lie, sit-to-stand, stand-to-kneel, stand-to-sit, bend-to-stand, turn.


Labels (5)

• All the transition labels have been clustered together into a single label, transition. The walking-related classes have been merged into a single class. The final labels are:

• bending
• standing
• lying
• sitting
• transition
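The clustering of the transition labels might look like the sketch below. The slides do not say how the remaining action and position labels map onto the five final classes, so this example deliberately leaves every non-transition label unchanged; the function name is hypothetical.

```python
# The nine transition labels listed on the previous slide.
TRANSITIONS = {
    "stand-to-bend", "kneel-to-stand", "lie-to-sit", "sit-to-lie",
    "sit-to-stand", "stand-to-kneel", "stand-to-sit", "bend-to-stand", "turn",
}


def merge_label(label):
    """Collapse every transition label into 'transition'; leave the rest as-is.

    How the other labels are merged is not specified in the slides,
    so they are passed through untouched (assumption).
    """
    return "transition" if label in TRANSITIONS else label


merged = [merge_label(label) for label in ["sit-to-stand", "sitting", "turn", "lying"]]
```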

Data sampling

[Figure: MONITOR emits the raw data as a stream of 18-value samples (groups of 3, 2, 2, 2, 3, 3, 3 values) at times t1, t2, ..., tn, sampled at 20 Hz. Down-sampling gives a final sampling rate of 2 Hz.]
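The reduction from 20 Hz to 2 Hz could be done in several ways; the slides do not say which one was used. The sketch below assumes plain decimation, keeping one sample out of every ten. Averaging each block of ten samples would be an equally plausible alternative.

```python
def downsample(samples, source_hz=20, target_hz=2):
    """Keep one sample out of every source_hz // target_hz (plain decimation)."""
    if source_hz % target_hz != 0:
        raise ValueError("source rate must be a multiple of the target rate")
    step = source_hz // target_hz
    return samples[::step]


raw = list(range(100))      # stand-in for 5 seconds of readings at 20 Hz
reduced = downsample(raw)   # 10 readings, i.e. 5 seconds at 2 Hz
```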

Arranging data

[Figure: after MONITOR and DOWN-SAMPLING, the samples pass through a FIFO buffer covering 5 seconds. In the snapshot shown, the buffer holds samples t3 to t12; t1 and t2 have already left it, and t13 and t14 are about to enter.]
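A 5-second FIFO at 2 Hz holds ten samples. A minimal sketch of such a buffer, using `collections.deque` from the Python standard library, is given below; the function name and the choice to emit a window at every new sample are assumptions, not details taken from the slides.

```python
from collections import deque

SAMPLE_RATE_HZ = 2
WINDOW_SECONDS = 5
WINDOW_SIZE = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 10 samples


def sliding_windows(samples, size=WINDOW_SIZE):
    """Yield the buffer contents every time the FIFO is full."""
    fifo = deque(maxlen=size)
    for sample in samples:
        fifo.append(sample)  # when full, the oldest sample is dropped automatically
        if len(fifo) == size:
            yield list(fifo)


# Integers 1..14 stand in for the samples t1..t14 of the figure.
windows = list(sliding_windows(range(1, 15)))
```

With fourteen samples the generator yields five windows; the third one contains t3 to t12, matching the snapshot in the figure.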

Classification

• A multilayer perceptron (MLP), used as baseline.

• A convolutional neural network composed of a convolutional layer followed by a second convolutional layer and a max pool stage.

• A convolutional neural network composed of a convolutional layer followed by a max pool stage, then a second convolutional layer and a second max pool stage.

Deep Net 1

• The first convolutional stage uses kernels of size 3x3.

• The second convolutional stage uses kernels of size 3x3.

• Before the fully connected step, a dropout with parameter equal to 0.25 is applied.

• The last stage of the net is formed by rectified linear units followed by a dropout step with parameter equal to 0.5.

Deep Net 2

• Deep Net 2 is very similar to the network above, with the difference that a max pooling layer has been added between the two convolutional stages.

• The output of the first convolutional step, with kernels of size 3x3, is processed by a max pool layer that performs non-linear downsampling.
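To see what the extra pooling stage does to the feature maps, the sketch below traces the spatial size of the data through both nets. Several things here are assumptions not stated in the slides: the input is taken to be one 5-second window arranged as 10 time steps by 18 features, convolutions use stride 1 without padding, and max pooling uses 2x2 non-overlapping regions.

```python
def conv_valid(shape, kernel=3):
    """Height and width after a stride-1 convolution without padding."""
    height, width = shape
    return (height - kernel + 1, width - kernel + 1)


def max_pool(shape, pool=2):
    """Height and width after non-overlapping max pooling."""
    height, width = shape
    return (height // pool, width // pool)


INPUT_SHAPE = (10, 18)  # assumed: 10 time steps x 18 features

# Deep Net 1: conv -> conv -> max pool
net1 = max_pool(conv_valid(conv_valid(INPUT_SHAPE)))

# Deep Net 2: conv -> max pool -> conv -> max pool
net2 = max_pool(conv_valid(max_pool(conv_valid(INPUT_SHAPE))))
```

Under these assumptions Deep Net 1 hands 3x7 feature maps to the fully connected part, while Deep Net 2 hands over 1x3 maps, so the second net has far fewer inputs per filter at that stage.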

Implementation

• To implement the architectures and perform the tests, we used the Keras library.

• Keras is a high-level Python neural networks library, capable of running on top of two of the most important numerical computation libraries used for deep learning: TensorFlow and Theano.

• Higher-level libraries like Keras allow us to rapidly produce and test prototypes.

Performances

• Performance is evaluated by comparing the ground-truth label of each sample with the label chosen by the neural network.

• True Positive (TP) counts the samples that have been correctly labeled.

• False Positive (FP) is the number of times a wrong label has been assigned to a sample.

• False Negative (FN) is the number of samples that have not been correctly classified.

• True Negative (TN) refers to the wrong labels that have not been assigned to a sample. For these experiments it has always been set to zero.
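One common way to obtain these counts in a multi-class setting is to compute them per class, treating that class as "positive" and all others as "negative". The sketch below follows that convention, which is an assumption: the slides do not state whether the counts were taken per class or over all samples. The example labels are made up for illustration.

```python
def count_for_class(true_labels, predicted_labels, target):
    """Return (TP, FP, FN) for one class, one-vs-rest."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == target and truth == target:
            tp += 1   # target predicted, and it was right
        elif predicted == target:
            fp += 1   # target predicted, but the truth was another class
        elif truth == target:
            fn += 1   # target was the truth, but another class was predicted
    return tp, fp, fn


# Made-up labels, only to exercise the function.
truth = ["sitting", "sitting", "lying", "standing", "sitting"]
predicted = ["sitting", "lying", "lying", "sitting", "sitting"]
counts = count_for_class(truth, predicted, "sitting")
```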

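$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Prec} = \frac{TP}{TP + FP}$$

$$\mathrm{Rec} = \frac{TP}{TP + FN}$$

$$F_1 = 2 \cdot \frac{\mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$$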

The F1 score is the harmonic mean of precision and recall.
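As a small worked example, the four scores can be computed from the counts as sketched below. The counts passed in are invented for illustration and are not results from the experiments; TN defaults to zero as stated for these experiments, and the function assumes non-zero denominators.

```python
def scores(tp, fp, fn, tn=0):
    """Return (accuracy, precision, recall, F1) from the raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Invented counts, only to show the arithmetic.
acc, prec, rec, f1 = scores(tp=8, fp=2, fn=2)
```

Here precision and recall are both 8/10, so their harmonic mean F1 is also 0.8, while accuracy is 8/12 because TN is zero.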


Conclusion

• Two different deep neural architectures have been tested.

• Both deep neural networks performed better than the chosen baseline, a multilayer perceptron.

• Of the two nets, the second, with an additional max pool layer, was preferred.

• Deep Net 2 proved more stable than Deep Net 1, and good performance is obtained when a suitable number of filters (more than twenty-four) is employed.

Thank you and follow our Lab

https://www.facebook.com/CRSSLAB/

@CRSS_LAB

CRSSLAB