Continuous human action recognition in ambient assisted living scenarios
Continuous Human Action Recognition in Ambient Assisted Living Scenarios
Francisco Flórez-Revuelta, Alexandros Andre Chaaraoui
International Workshop on Enhanced Living Environments (ELEMENT 2014), Würzburg, Germany
Overview

- Human action recognition in AAL
- Human action recognition with a bag of key poses
- Continuous human action recognition
- Experimentation
Architecture for AAL

[Architecture diagram] Cameras 1 to N each feed a motion detection module and a human behaviour analysis module; their outputs are combined by a multi-view human behaviour analysis stage. A privacy module and a reasoning system draw on a setup and profiles database (activities, inhabitants, objects, ...) and on environmental sensor information to log events, trigger alarm actuators, support long-term analysis and inform the caregiver.
Human behaviour analysis

- Bag of key poses
- Use with RGB and RGB-D data
Results with RGB data

[Results tables for the Weizmann, MuHAVi-8, MuHAVi-14, DHA and IXMAS datasets]

Results with RGB-D data
More information
Original method: Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F.: Silhouette-based Human Action Recognition Using Sequences of Key Poses. Pattern Recognition Letters, 34(15):1799–1807, 2013.
Multi-view action recognition: Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F.: An Efficient Approach for Multi-view Human Action Recognition Based on Bag-of-Key-Poses. Lecture Notes in Computer Science, 7559:29–40, 2012.
Evolutionary optimisation: Chaaraoui, A.A.; Flórez-Revuelta, F.: Optimizing Human Action Recognition Based on a Cooperative Coevolutionary Algorithm. Engineering Applications of Artificial Intelligence, 31:116–125, 2014.
Incremental learning: Chaaraoui, A.A.; Flórez-Revuelta, F.: Adaptive Human Action Recognition with an Evolving Bag of Key Poses. IEEE Transactions on Autonomous Mental Development, 6(2):139–152, 2014.
Use of RGB-D data: Chaaraoui, A.A.; Padilla-López, J.R.; Climent-Pérez, P.; Flórez-Revuelta, F.: Evolutionary Joint Selection to Improve Human Action Recognition with RGB-D Devices. Expert Systems with Applications, 41(3):786–794, 2014.
Continuous recognition

The previous methods work with pre-segmented sequences. However, their accurate recognition and outstanding temporal performance led us to extend them to continuous scenarios. A sliding and growing window is used to process the continuous stream at different overlapping locations and scales. A null class is considered in order to discard unknown actions and avoid false positives; this class corresponds to all the behaviours that may be observed but have not been modelled during learning. Continuous human action recognition is performed by detecting and classifying action zones.
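To make the windowing scheme concrete, here is a minimal Python sketch of a sliding-and-growing window generator. The growth and slide steps follow the values used later in the experimentation (5 and 10 frames); the minimum and maximum window lengths are illustrative assumptions.

```python
def candidate_windows(stream_length, grow_step=5, slide_step=10,
                      min_len=20, max_len=100):
    """Yield (start, end) frame segments that cover a continuous
    stream at overlapping locations and scales.

    grow_step and slide_step follow the experimentation section;
    min_len and max_len (length_max) are assumed values.
    """
    start = 0
    while start < stream_length:
        # Grow the window in fixed steps up to the maximum length...
        length = min_len
        while length <= max_len and start + length <= stream_length:
            yield (start, start + length)
            length += grow_step
        # ...then slide the window forward and grow again.
        start += slide_step
```

Each yielded segment would then be matched against the learned action zones, or assigned to the null class when no match is close enough.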
Action zones

Action sequences may contain irrelevant segments which are common among actions and therefore ambiguous for classification. Action zones are the most discriminative segments of an action with respect to the other action classes. Since action zones are shorter than the original sequences, matching time is significantly reduced, which allows a larger number of sliding windows to be considered at every moment.
Learning of action zones

The bag of key poses is built similarly to the original method, and the discrimination value w_kp of each key pose is obtained (a measure of how well the key pose separates its class from the others). Then, for each training sequence of action class a and each temporal instant t:

1. For each action class, the nearest-neighbour key pose kp_a(t) is obtained.
2. The raw class evidence values are computed for all the classes.
3. Normalisation is applied with respect to the highest value observed.
4. Gaussian smoothing is performed, centred on the current frame and considering only the frames from temporal instants u ≤ t.
5. The final class evidence H(t) is obtained by attenuating the resulting value.

Action zones are detected by defining thresholds HT1(t), HT2(t), …, HTA(t): for a sequence belonging to a given action class, the action zone is determined by the frames where the class evidence H(t) exceeds the corresponding threshold.
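The transcript omits the actual evidence and attenuation formulas, so the sketch below only mirrors the structure of the five steps: a per-class raw evidence (here assumed to combine the discrimination value of the nearest key pose with its distance to the frame), the global normalisation, and the causal Gaussian smoothing over frames u ≤ t. The final attenuation (step 5) is left out because its formula does not survive in this transcript.

```python
import numpy as np

def raw_evidence(frame_feature, key_poses, weights):
    """Steps 1-2: nearest-neighbour key pose per class and raw evidence.
    key_poses[a] is an (n_a, d) array of key poses of class a, and
    weights[a] the matching discrimination values w_kp.
    The evidence formula below is an assumption, not the authors' one."""
    evidence = {}
    for a in key_poses:
        d = np.linalg.norm(key_poses[a] - frame_feature, axis=1)
        nearest = int(np.argmin(d))
        evidence[a] = weights[a][nearest] / (1.0 + d[nearest])
    return evidence

def class_evidence(sequence, key_poses, weights, sigma=5.0):
    """Steps 3-4: normalise and smooth the raw evidence causally."""
    classes = sorted(key_poses)
    per_frame = [raw_evidence(f, key_poses, weights) for f in sequence]
    raw = np.array([[ev[a] for a in classes] for ev in per_frame])  # (T, A)
    raw /= raw.max()                       # step 3: normalise by the maximum
    H = np.zeros_like(raw)
    for t in range(len(sequence)):         # step 4: Gaussian smoothing
        u = np.arange(t + 1)               # only frames with u <= t
        g = np.exp(-0.5 * ((t - u) / sigma) ** 2)
        H[t] = (g[:, None] * raw[: t + 1]).sum(axis=0) / g.sum()
    return H  # step 5 (attenuation) omitted: formula not in the transcript

def action_zone(H, class_index, threshold):
    """Frames whose evidence for the true class exceeds the threshold."""
    return np.flatnonzero(H[:, class_index] >= threshold)
```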
Continuous recognition

The action zones for every learning sequence constitute the knowledge base. A sliding and growing window is used to process the continuous stream at different overlapping locations and scales, and these segments of key poses are compared with the learned action zones using DTW. In some cases, even the nearest key pose is very different from the input frame; therefore, a set of threshold parameters DT1, DT2, …, DTA indicates the highest distance allowed to trigger a recognition. If the match is not good enough, the frame is labelled as null class.
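A minimal sketch of the matching step, assuming a plain O(nm) DTW over feature vectors and one scalar distance threshold DT_a per class; the authors' exact DTW variant and distance normalisation are not specified in this transcript.

```python
import numpy as np

def dtw_distance(x, y):
    """Plain dynamic time warping between two sequences of feature
    vectors, normalised by the combined length (an assumed choice)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def classify(window, action_zones, DT):
    """action_zones[a] is a list of learned key-pose sequences for class a
    and DT[a] its distance threshold. Returns the class label, or None
    for the null class when no match is good enough."""
    best_class, best_dist = None, np.inf
    for a, zones in action_zones.items():
        for zone in zones:
            d = dtw_distance(window, zone)
            if d < best_dist:
                best_class, best_dist = a, d
    if best_class is None or best_dist > DT[best_class]:
        return None  # null class: reject the window
    return best_class
```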
Experimentation

Two sets of parameters have to be established: HT1(t), HT2(t), …, HTA(t) and DT1, DT2, …, DTA. They are set with an evolutionary algorithm that finds the best performing combination of both sets. Comparison with the ground truth is performed at segment level, and an action must be recognised with a delay lower than τ frames.
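The slides do not detail the evolutionary algorithm, so the following is a generic illustrative sketch: a mutation-plus-truncation-selection loop over candidate threshold sets, scored by a user-supplied segment-level fitness function (the hypothetical evaluate()). It treats the HT thresholds as one scalar per class for simplicity.

```python
import random

def evolve_thresholds(evaluate, n_classes, generations=200, pop_size=20):
    """evaluate(HT, DT) -> segment-level score on the training data,
    e.g. an F-score that only counts recognitions within tau frames.
    The encoding, mutation scheme and ranges are assumptions."""
    def random_individual():
        return ([random.random() for _ in range(n_classes)],        # HT
                [random.uniform(0, 10) for _ in range(n_classes)])  # DT

    def mutate(ind):
        HT = [v + random.gauss(0, 0.05) for v in ind[0]]
        DT = [max(0.0, v + random.gauss(0, 0.5)) for v in ind[1]]
        return (HT, DT)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness, keep the best half, refill with mutants.
        ranked = sorted(population, key=lambda i: evaluate(*i), reverse=True)
        elite = ranked[: pop_size // 2]
        population = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(population, key=lambda i: evaluate(*i))
```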
Validation is carried out with the multi-view IXMAS and the single-view Weizmann datasets. The window grows in 5-frame steps and, when length_max is reached, it slides 10 frames. A delayed recognition is accepted for τ = 60 frames ≅ 2 seconds.
Note: Approach 1 uses action zones; Approach 2 uses the whole sequences.
Francisco (Paco) Flórez-Revuelta
[email protected]
@fflorezrevuelta
www.dtic.ua.es/~florez
franciscoflorezrevuelta