Classification of indoor actions through deep neural networks
Agnese Augello1, Umberto Maniscalco1, Filippo Vella1, Vincenzo Bentivenga2 and Salvatore Gaglio1,2
1 Institute for High Performance Computing and Networking, Cognitive Robotics & Social Sensing Lab, CNR. Palermo, Italy
2 DICGIM - Università degli studi di Palermo. Palermo, Italy
What's the matter
• Non-invasive monitoring of elderly people, inside their domestic environment, in order to guarantee their safety.
• Designing an automatic system that captures the movements and activities of a person through accelerometers and RGB-D cameras.
• Classification of movements and activities by a Deep Convolutional Neural Network.
[Diagram: the monitor produces raw data; classification turns the raw data into a label]
The dataset
• We have used the dataset of the SPHERE (Sensor Platform for HEalthcare in Residential Environments) project: http://irc-sphere.ac.uk/sphere-challenge/home
• The dataset contains measurements from RGB-D cameras and accelerometers, collected by asking a group of trained people to perform specific actions in an indoor environment.
• The samples have been manually annotated with one of the given labels.
• The values sampled by the accelerometers and the RGB-D cameras are arranged in a vector of eighteen values.
• x, y, z: acceleration along the x, y, z axes (3 values)
• x, y of the centre of the bounding box (2 values)
• x, y of the bottom right corner of the bounding box (2 values)
• x, y of the top left corner of the bounding box (2 values)
• x, y and z for the centre of the 3D bounding box (3 values)
• x, y and z for the bottom right back corner of the 3D bounding box (3 values)
• x, y and z for the front left top corner of the 3D bounding box (3 values)
• Total: 18 values
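The vector layout above can be sketched in a few lines of Python. This is only an illustration: the group names are hypothetical, and only the group sizes and their order (3, 2, 2, 2, 3, 3, 3) come from the slide.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical group names; the sizes and their order come from the slide.
FEATURE_GROUPS: List[Tuple[str, int]] = [
    ("acceleration_xyz", 3),
    ("bb_centre_xy", 2),
    ("bb_bottom_right_xy", 2),
    ("bb_top_left_xy", 2),
    ("bb3d_centre_xyz", 3),
    ("bb3d_bottom_right_back_xyz", 3),
    ("bb3d_front_left_top_xyz", 3),
]

# Total length of one sample vector: 3 + 2 + 2 + 2 + 3 + 3 + 3.
VECTOR_SIZE = sum(size for _, size in FEATURE_GROUPS)


def build_sample(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the seven groups, in slide order, into one flat vector."""
    sample: List[float] = []
    for name, size in FEATURE_GROUPS:
        values = list(groups[name])
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        sample.extend(float(v) for v in values)
    return sample
```

Calling `build_sample` with one sequence per group returns a flat list of `VECTOR_SIZE` floats, which is the per-time-step input assumed in the rest of these sketches.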
Labels (22)
• The actions: ascend stairs, descend stairs, jump, walk with load, walk.
• The positions: bending, kneeling, lying, sitting, squatting, standing.
• The transitions: stand-to-bend, kneel-to-stand, lie-to-sit, sit-to-lie, sit-to-stand, stand-to-kneel, stand-to-sit, bend-to-stand, turn.
Labels (5)
• All the transition labels have been clustered together into a single label, transition. The walking-related classes have been merged into a single class. The final labels are:
• bending
• standing
• lying
• sitting
• transition
Data sampling
• Raw data: the monitor produces one 18-value vector (3 + 2 + 2 + 2 + 3 + 3 + 3) at each time step t1, t2, ..., tn, sampled at 20 Hz.
• Down sampling: the stream is reduced to a final sampling rate of 2 Hz.
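The reduction from 20 Hz to 2 Hz can be illustrated with a simple decimation. The slide does not say whether intermediate samples are dropped or averaged; the sketch below assumes they are dropped (every tenth sample is kept), and the function name `downsample` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(samples: Sequence[T], source_hz: int = 20, target_hz: int = 2) -> List[T]:
    """Keep one sample every source_hz // target_hz (assumed: plain decimation)."""
    if target_hz <= 0 or source_hz % target_hz != 0:
        raise ValueError("source_hz must be a positive multiple of target_hz")
    step = source_hz // target_hz  # 10 for 20 Hz -> 2 Hz
    return [samples[i] for i in range(0, len(samples), step)]
```

With the default rates, two seconds of raw data (40 samples) are reduced to 4 samples.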
Arranging data
• After down sampling, the 18-value vectors enter a FIFO buffer covering 5 seconds.
[Diagram: a window spanning t3 to t12 over the stream t1, t2, ..., t13, t14]
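A 5-second FIFO at 2 Hz holds 10 samples, which matches the t3 to t12 window in the diagram. The sketch below is one possible way to produce such windows, assuming the window advances by one sample at a time (the slide does not state the stride); `sliding_windows` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_windows(stream: Iterable[T], rate_hz: int = 2, seconds: int = 5) -> Iterator[List[T]]:
    """Yield the FIFO content each time it holds a full window of samples."""
    size = rate_hz * seconds  # 10 samples for 5 seconds at 2 Hz
    fifo: Deque[T] = deque(maxlen=size)  # the oldest sample is discarded automatically
    for sample in stream:
        fifo.append(sample)
        if len(fifo) == size:
            yield list(fifo)
```

Feeding the time steps 1 to 14 yields five windows; the third one covers steps 3 to 12, as in the diagram.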
Classification
• A Multilayer Perceptron (MLP), used as baseline.
• A Convolutional Neural Network composed of a convolutional layer followed by a second convolutional layer and a max pool stage.
• A Convolutional Neural Network composed of a convolutional layer followed by a max pool stage, followed by a second convolutional layer and a max pool stage.
Deep Net 1
• The first convolutional stage is performed with kernels of size 3x3.
• The second convolutional stage is performed with kernels of size 3x3.
• Before the fully connected step, a dropout with parameter equal to 0.25 is performed.
• The last stage of the net is formed by rectified linear units followed by a dropout step with parameter equal to 0.5.
Deep Net 2
• Deep Net 2 is very similar to the network above, with the difference that a max pooling layer has been added between the two convolutional stages.
• The values of the first convolutional step, with kernels of size 3x3, are processed through a max pool layer that performs non-linear downsampling.
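To see how the extra max pool stage changes the size of the feature maps, the two stage orders can be traced with simple shape arithmetic. Several assumptions are made here that the slides do not confirm: the input is a 10x18 window (5 seconds at 2 Hz, 18 values per sample), convolutions use no padding and stride 1, and pooling uses a 2x2 window. The function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (time steps, features)


def conv3x3(shape: Shape) -> Shape:
    """3x3 convolution, assumed without padding and with stride 1."""
    h, w = shape
    return (h - 2, w - 2)


def max_pool(shape: Shape, pool: int = 2) -> Shape:
    """Max pooling, assumed with a pool x pool window and matching stride."""
    h, w = shape
    return (h // pool, w // pool)


def deep_net_1(shape: Shape) -> Shape:
    # conv -> conv -> max pool
    return max_pool(conv3x3(conv3x3(shape)))


def deep_net_2(shape: Shape) -> Shape:
    # conv -> max pool -> conv -> max pool
    return max_pool(conv3x3(max_pool(conv3x3(shape))))
```

Under these assumptions, a 10x18 input leaves Deep Net 1 with 3x7 feature maps and Deep Net 2 with 1x3 feature maps before the fully connected step, so the second net passes far fewer values per filter to the dense layers.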
Implementation
• To implement the architectures and perform the tests, we used the Keras library.
• Keras is a high-level Python neural networks library, capable of running on top of two of the most important numerical computation libraries used for deep learning: TensorFlow and Theano.
• The use of higher-level libraries like Keras allows us to rapidly produce and test prototypes.
Performances
• Performance is evaluated by comparing the ground-truth label of each sample with the label chosen by the neural network.
• True Positive (TP) counts the samples that have been correctly classified.
• False Positive (FP) is the number of times a wrong label has been assigned to a sample.
• False Negative (FN) is the number of samples that have not been correctly classified.
• True Negative (TN) refers to the wrong labels that have not been assigned to a sample. For these experiments it has always been set to zero.
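One possible reading of these definitions, for a classifier that assigns exactly one label per sample, is that every misclassified sample counts once as a false positive (a wrong label was assigned) and once as a false negative (the correct label was missed). The sketch below follows that reading, which is an interpretation rather than the authors' code; `count_outcomes` is a hypothetical name.

```python
from typing import Dict, Sequence


def count_outcomes(truth: Sequence[str], predicted: Sequence[str]) -> Dict[str, int]:
    """Count TP, FP, FN and TN over paired ground-truth and predicted labels."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == p)
    wrong = len(truth) - tp
    # Assumed reading: each misclassified sample is both one FP and one FN.
    # TN is fixed to zero, as stated on the slide.
    return {"TP": tp, "FP": wrong, "FN": wrong, "TN": 0}
```

For example, with four samples of which two are classified correctly, the counts are TP = 2, FP = 2, FN = 2 and TN = 0.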
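$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Prec} = \frac{TP}{TP + FP}$$
$$\mathrm{Rec} = \frac{TP}{TP + FN}$$
$$F_1 = 2 \cdot \frac{\mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$$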
The F1 score is the harmonic mean of precision and recall.
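The four scores can be computed directly from the counts. The sketch below is a minimal illustration; the function name `scores` and the choice to return 0.0 when a denominator is zero are assumptions, not part of the original slides.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def scores(tp: int, fp: int, fn: int, tn: int = 0) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from the outcome counts."""
    acc = _ratio(tp + tn, tp + tn + fp + fn)
    prec = _ratio(tp, tp + fp)
    rec = _ratio(tp, tp + fn)
    f1 = _ratio(2 * prec * rec, prec + rec)
    return {"Acc": acc, "Prec": prec, "Rec": rec, "F1": f1}
```

With TN fixed at zero, accuracy is always lower than or equal to precision and recall, since its denominator contains both FP and FN.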
Conclusion
• Two different deep neural architectures have been tested.
• Both deep neural networks performed better than the chosen baseline, a multilayer perceptron.
• Of the two nets, the second, with an additional max pool layer, was preferred.
• Deep Net 2 proved more stable than Deep Net 1, and good performance is obtained when a suitable number of filters (more than twenty-four) is employed.
Thank you and follow our Lab
https://www.facebook.com/CRSSLAB/
@CRSS_LAB
CRSSLAB