Continuous human action recognition in ambient assisted living scenarios
Continuous Human Action Recognition in Ambient Assisted Living Scenarios
Francisco Flórez-Revuelta, Alexandros Andre Chaaraoui
International Workshop on Enhanced Living Environments (ELEMENT 2014), Würzburg, Germany
Overview

- Human action recognition in AAL
- Human action recognition with a bag of key poses
- Continuous human action recognition
- Experimentation
Architecture for AAL

[Architecture diagram] Cameras 1 to N each feed a motion detection module and a human behaviour analysis module; their outputs are combined by a multi-view human behaviour analysis stage. A privacy module and a reasoning system draw on a setup and profiles database (activities, inhabitants, objects, ...) and on environmental sensor information to log events, trigger alarm actuators, support long-term analysis and inform the caregiver.
Human behaviour analysis

- Bag of key poses
- Use with RGB and RGB-D data
Results with RGB data

[Results tables for the Weizmann, MuHAVi-8, MuHAVi-14, DHA and IXMAS datasets]

Results with RGB-D data
More information
Original method: Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F.: Silhouette-based Human Action Recognition Using Sequences of Key Poses. Pattern Recognition Letters, 34(15):1799–1807, 2013.
Multi-view action recognition: Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F.: An Efficient Approach for Multi-view Human Action Recognition Based on Bag-of-Key-Poses. Lecture Notes in Computer Science, 7559:29–40, 2012.
Evolutionary optimisation: Chaaraoui, A.A.; Flórez-Revuelta, F.: Optimizing Human Action Recognition Based on a Cooperative Coevolutionary Algorithm. Engineering Applications of Artificial Intelligence, 31:116–125, 2014.
Incremental learning: Chaaraoui, A.A.; Flórez-Revuelta, F.: Adaptive Human Action Recognition with an Evolving Bag of Key Poses. IEEE Transactions on Autonomous Mental Development, 6(2):139–152, 2014.
Use of RGB-D data: Chaaraoui, A.A.; Padilla-López, J.R.; Climent-Pérez, P.; Flórez-Revuelta, F.: Evolutionary Joint Selection to Improve Human Action Recognition with RGB-D Devices. Expert Systems with Applications, 41(3):786–794, 2014.
Continuous recognition

The previous methods work with pre-segmented sequences. However, their accurate recognition and outstanding temporal performance led us to extend them to continuous scenarios. A sliding and growing window is used to process the continuous stream at different overlapping locations and scales. A null class is considered in order to discard unknown actions and avoid false positives; this class corresponds to all the behaviours that may be observed but have not been modelled during learning. Continuous human action recognition is performed by detecting and classifying action zones.
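To make the windowing scheme concrete, here is a minimal Python sketch of a sliding-and-growing window generator. The growth and slide steps follow the values used later in the experimentation (5 and 10 frames); the minimum and maximum window lengths are illustrative assumptions.

```python
def candidate_windows(stream_length, grow_step=5, slide_step=10,
                      min_len=20, max_len=100):
    """Yield (start, end) frame segments that cover a continuous
    stream at overlapping locations and scales.

    grow_step and slide_step follow the experimentation section;
    min_len and max_len (length_max) are assumed values.
    """
    start = 0
    while start < stream_length:
        # Grow the window in fixed steps up to the maximum length...
        length = min_len
        while length <= max_len and start + length <= stream_length:
            yield (start, start + length)
            length += grow_step
        # ...then slide the window forward and grow again.
        start += slide_step
```

Each yielded segment would then be matched against the learned action zones, or assigned to the null class when no match is close enough.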
Action zones

Action sequences may contain irrelevant segments which are common among actions and therefore ambiguous for classification. Action zones are the most discriminative segments of an action with respect to the other action classes. Since action zones are shorter than the original sequences, matching time is significantly reduced, which allows a larger number of sliding windows to be considered at every moment.
Learning of action zones

The bag of key poses is built similarly to the original method, and the discrimination value w_kp of each key pose is obtained (a measure of how well the key pose separates its class from the others). Then, for each training sequence of action class a and each temporal instant t:

1. For each action class, the nearest-neighbour key pose kp_a(t) is obtained.
2. The raw class evidence values are computed for all the classes.
3. Normalisation is applied with respect to the highest value observed.
4. Gaussian smoothing is performed, centred on the current frame and considering only the frames from temporal instants u ≤ t.
5. The final class evidence H(t) is obtained by attenuating the resulting value.

Action zones are detected by defining thresholds HT1(t), HT2(t), …, HTA(t): for a sequence belonging to a given action class, the action zone is determined by the frames where the class evidence H(t) exceeds the corresponding threshold.
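The transcript omits the actual evidence and attenuation formulas, so the sketch below only mirrors the structure of the five steps: a per-class raw evidence (here assumed to combine the discrimination value of the nearest key pose with its distance to the frame), the global normalisation, and the causal Gaussian smoothing over frames u ≤ t. The final attenuation (step 5) is left out because its formula does not survive in this transcript.

```python
import numpy as np

def raw_evidence(frame_feature, key_poses, weights):
    """Steps 1-2: nearest-neighbour key pose per class and raw evidence.
    key_poses[a] is an (n_a, d) array of key poses of class a, and
    weights[a] the matching discrimination values w_kp.
    The evidence formula below is an assumption, not the authors' one."""
    evidence = {}
    for a in key_poses:
        d = np.linalg.norm(key_poses[a] - frame_feature, axis=1)
        nearest = int(np.argmin(d))
        evidence[a] = weights[a][nearest] / (1.0 + d[nearest])
    return evidence

def class_evidence(sequence, key_poses, weights, sigma=5.0):
    """Steps 3-4: normalise and smooth the raw evidence causally."""
    classes = sorted(key_poses)
    per_frame = [raw_evidence(f, key_poses, weights) for f in sequence]
    raw = np.array([[ev[a] for a in classes] for ev in per_frame])  # (T, A)
    raw /= raw.max()                       # step 3: normalise by the maximum
    H = np.zeros_like(raw)
    for t in range(len(sequence)):         # step 4: Gaussian smoothing
        u = np.arange(t + 1)               # only frames with u <= t
        g = np.exp(-0.5 * ((t - u) / sigma) ** 2)
        H[t] = (g[:, None] * raw[: t + 1]).sum(axis=0) / g.sum()
    return H  # step 5 (attenuation) omitted: formula not in the transcript

def action_zone(H, class_index, threshold):
    """Frames whose evidence for the true class exceeds the threshold."""
    return np.flatnonzero(H[:, class_index] >= threshold)
```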
Continuous recognition

The action zones for every learning sequence constitute the knowledge base. A sliding and growing window is used to process the continuous stream at different overlapping locations and scales, and these segments of key poses are compared with the learned action zones using DTW. In some cases, even the nearest key pose is very different from the input frame; therefore, a set of threshold parameters DT1, DT2, …, DTA indicates the highest distance allowed to trigger a recognition. If the match is not good enough, the frame is labelled as null class.
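A minimal sketch of the matching step, assuming a plain O(nm) DTW over feature vectors and one scalar distance threshold DT_a per class; the authors' exact DTW variant and distance normalisation are not specified in this transcript.

```python
import numpy as np

def dtw_distance(x, y):
    """Plain dynamic time warping between two sequences of feature
    vectors, normalised by the combined length (an assumed choice)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def classify(window, action_zones, DT):
    """action_zones[a] is a list of learned key-pose sequences for class a
    and DT[a] its distance threshold. Returns the class label, or None
    for the null class when no match is good enough."""
    best_class, best_dist = None, np.inf
    for a, zones in action_zones.items():
        for zone in zones:
            d = dtw_distance(window, zone)
            if d < best_dist:
                best_class, best_dist = a, d
    if best_class is None or best_dist > DT[best_class]:
        return None  # null class: reject the window
    return best_class
```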
Experimentation

Two sets of parameters have to be established: HT1(t), HT2(t), …, HTA(t) and DT1, DT2, …, DTA. They are set with an evolutionary algorithm that finds the best performing combination of both sets. Comparison with the ground truth is performed at segment level, and an action must be recognised with a delay lower than τ frames.
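The slides do not detail the evolutionary algorithm, so the following is a generic illustrative sketch: a mutation-plus-truncation-selection loop over candidate threshold sets, scored by a user-supplied segment-level fitness function (the hypothetical evaluate()). It treats the HT thresholds as one scalar per class for simplicity.

```python
import random

def evolve_thresholds(evaluate, n_classes, generations=200, pop_size=20):
    """evaluate(HT, DT) -> segment-level score on the training data,
    e.g. an F-score that only counts recognitions within tau frames.
    The encoding, mutation scheme and ranges are assumptions."""
    def random_individual():
        return ([random.random() for _ in range(n_classes)],        # HT
                [random.uniform(0, 10) for _ in range(n_classes)])  # DT

    def mutate(ind):
        HT = [v + random.gauss(0, 0.05) for v in ind[0]]
        DT = [max(0.0, v + random.gauss(0, 0.5)) for v in ind[1]]
        return (HT, DT)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness, keep the best half, refill with mutants.
        ranked = sorted(population, key=lambda i: evaluate(*i), reverse=True)
        elite = ranked[: pop_size // 2]
        population = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(population, key=lambda i: evaluate(*i))
```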
Validation is carried out with the multi-view IXMAS and the single-view Weizmann datasets. The window grows in 5-frame steps and, when length_max is reached, it slides 10 frames. A delayed recognition is accepted for τ = 60 frames ≅ 2 seconds.
Note: Approach 1 uses action zones; Approach 2 uses the whole sequences.
Francisco (Paco) Flórez-Revuelta
[email protected]
@fflorezrevuelta
www.dtic.ua.es/~florez
franciscoflorezrevuelta