Action and Gait Recognition From Recovered 3-D Human Joints IEEE TRANSACTIONS ON SYSTEMS, MAN, AND...
Action and Gait Recognition From Recovered 3-D Human Joints
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST 2010
Junxia Gu, Member, IEEE, Xiaoqing Ding, Senior Member, IEEE,
Shengjin Wang, Member, IEEE, and Youshou Wu
Adviser: Ming-Yuan Shieh
Student: Shun-Te Chuang
PPT preparation: 100%
Outline
ABSTRACT
INTRODUCTION
PREVIOUS WORK
FULL-BODY TRACKING METHOD FOR POSE RECOVERY
CLASSIFICATION
EXPERIMENTAL RESULTS AND ANALYSIS
CONCLUSION
Abstract
A common viewpoint-free framework that fuses pose recovery and classification for action and gait recognition is presented in this paper.
First, a markerless pose recovery method is adopted to automatically capture the 3-D human joint and pose parameter sequences from volume data.
Second, multiple configuration features (combination of joints) and movement features (position, orientation, and height of the body) are extracted from the recovered 3-D human joint and pose parameter sequences.
A hidden Markov model (HMM) and an exemplar-based HMM are then used to model the movement features and configuration features, respectively.
Finally, actions are classified by a hierarchical classifier that fuses the movement features and the configuration features, and persons are recognized from their gait sequences with the configuration features.
INTRODUCTION
The video-based study of human motion has received increasing attention in the past decades.
This has been motivated by applications in intelligent video surveillance and human–computer interaction.
With increased awareness of security issues, motion analysis is becoming increasingly important in surveillance systems.
Action recognition is a new requirement for understanding what the person is doing.
Current intelligent surveillance systems are in urgent need of noninvasive and viewpoint-free research on motion analysis.
This paper focuses on the movement of main body segments (arms, legs, and torso). A human gait is extracted from a “walk” action.
In this paper, a vision-based markerless pose recovery approach is proposed to extract 3-D human joints.
The human joint sequence is one of the most effective and discriminative representations of human motion.
It carries rich information, including the position and orientation of the body and the positions of the joints.
The information is categorized into two types: movement features and configuration features.
The changes of position, orientation, and height of the body, which describe the global movement of the subject, are defined as movement features.
The sequences of human joint positions, which describe the change of relative configuration of body segments, are defined as configuration features.
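As a rough illustration of this split, the sketch below derives both feature types from one frame of recovered joints. The joint layout (index 0 as the root, indices 1 and 2 as the hips, z as the vertical axis) and the function names are assumptions made for this example; the paper's own feature definitions may differ.

```python
import math

# Hypothetical joint layout (an assumption, not taken from the paper):
# each frame is a list of (x, y, z) joints, z is the vertical axis,
# index 0 = root (pelvis), 1 = left hip, 2 = right hip.
ROOT, LEFT_HIP, RIGHT_HIP = 0, 1, 2

def movement_features(frame):
    """Global movement: ground-plane position, orientation, and body height."""
    root = frame[ROOT]
    position = (root[0], root[1])
    # Orientation: direction of the left-to-right hip axis in the ground plane.
    dx = frame[RIGHT_HIP][0] - frame[LEFT_HIP][0]
    dy = frame[RIGHT_HIP][1] - frame[LEFT_HIP][1]
    orientation = math.atan2(dy, dx)
    # Height: vertical extent spanned by the joints.
    zs = [joint[2] for joint in frame]
    height = max(zs) - min(zs)
    return position, orientation, height

def configuration_features(frame):
    """Relative configuration: joints in a root-centred, orientation-normalised frame."""
    (px, py), theta, _ = movement_features(frame)
    c, s = math.cos(-theta), math.sin(-theta)
    root_z = frame[ROOT][2]
    normalised = []
    for x, y, z in frame:
        tx, ty = x - px, y - py
        # Rotate by -theta about the vertical axis so the hip axis points along +x.
        normalised.append((c * tx - s * ty, s * tx + c * ty, z - root_z))
    return normalised
```

With this normalisation, the configuration features of a pose stay the same when the subject is moved or turned about the vertical axis, which is the property a viewpoint-free representation needs.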
A hidden Markov model (HMM) and an exemplar-based HMM (EHMM) are employed to characterize the movement and configuration features, respectively.
Both the HMM and the EHMM have been used to recognize actions and gaits.
PREVIOUS WORK
1. Appearance-Based Methods: Appearance-based approaches are widely used in action and gait representation. They directly represent human motion using image information, such as silhouettes, edges, and optical flow.
2. Human Model-Based Methods: Human model-based approaches represent an action or a gait with body segments, joint positions, or pose parameters.
One earlier approach combined stochastic search with gradient descent for local pose refinement to recover complex whole-body motion.
The model was initialized automatically, with the subject standing upright with arms and legs spread in the “Da Vinci” pose.
The tracking speed was below 1 s per frame. In this paper, an adaptive particle filter method is proposed for pose recovery.
First, the whole body of a subject in each frame is segmented into several body segments.
A particle filter with an adaptive particle number is then used to track each body segment.
This method decomposes the search space and reduces the computational complexity.
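The idea of tracking one body segment with a particle filter whose particle number adapts over time can be sketched as follows. This is a minimal one-dimensional sketch, not the paper's tracker: the random-walk motion model, the effective-sample-size rule in `adapt_particle_count`, and all thresholds and names are assumptions chosen for illustration.

```python
import math
import random

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights."""
    return 1.0 / sum(w * w for w in weights)

def adapt_particle_count(n, ess, n_min=50, n_max=800):
    """Hypothetical rule: use more particles when weights are degenerate, fewer when well spread."""
    ratio = ess / n
    if ratio < 0.3:
        return min(n_max, n * 2)
    if ratio > 0.7:
        return max(n_min, n // 2)
    return n

def track_segment(observations, likelihood, n0=200, noise=0.05, seed=0):
    """Track one scalar state; returns per-frame estimates and particle counts."""
    rng = random.Random(seed)
    particles = [observations[0] + rng.gauss(0.0, noise) for _ in range(n0)]
    estimates, counts = [], []
    for obs in observations:
        # Predict: random-walk motion model (an assumption).
        particles = [p + rng.gauss(0.0, noise) for p in particles]
        # Update: weight each particle by the observation likelihood.
        weights = [likelihood(obs, p) for p in particles]
        total = sum(weights)
        if total > 0.0:
            weights = [w / total for w in weights]
        else:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / len(particles)] * len(particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        counts.append(len(particles))
        # Adapt the particle number, then resample to the new size.
        n = adapt_particle_count(len(particles), effective_sample_size(weights))
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates, counts

def gaussian_likelihood(obs, state, sigma=0.1):
    """Unnormalised Gaussian likelihood of an observation given a state."""
    return math.exp(-0.5 * ((obs - state) / sigma) ** 2)
```

Running one such filter per body segment, each over its own low-dimensional state, is what keeps the search space small compared with a single filter over the full-body pose.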
FULL-BODY TRACKING METHOD FOR POSE RECOVERY
Human Model
Human Model-Based Full-Body Tracking
CLASSIFICATION
A. HMM and EHMM Learning
B. Classifier for Gait Recognition
C. Classifier for Action Recognition
A. HMM and EHMM Learning
The EHMM is different from the HMM in the definition of observation densities. For the HMM, the general representation of observation densities is the Gaussian mixture model (GMM) of the following form:
In the EHMM, the definition of the observation probability is as follows:
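For reference, the two kinds of observation density can be written in their usual textbook forms. The first line is the standard GMM density for HMM state $j$. The second is one common exemplar-based form, in which the probability of an observation decays with its distance to an exemplar. The symbols ($c_{jm}$, $\mu_{jm}$, $\Sigma_{jm}$, $x_i$, $d$, $\sigma_i$, $Z_i$) are assumptions of this sketch and may not match the paper's notation.

```latex
% Standard GMM observation density of state j with M mixture components
b_j(y) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(y;\ \mu_{jm}, \Sigma_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1

% Assumed exemplar-based form: x_i is an exemplar, d a distance function,
% sigma_i a scale parameter, and Z_i a normalising constant
P(y \mid x_i) = \frac{1}{Z_i} \exp\!\left(-\frac{d(y, x_i)}{\sigma_i^{2}}\right)
```

In the exemplar-based form, the quantities attached to a state are a representative pose and a scale rather than mixture weights, means, and covariances.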
B. Classifier for Gait Recognition
For each person $c \in \{1, \ldots, C_p\}$ in the database, we learn an EHMM gait model $\lambda^{(c)}_{S_{gait}}$ with features $S_{gait}$ and an EHMM gait model $\lambda^{(c)}_{L_{gait}}$ with features $L_{gait}$. A testing gait sequence $Y = \{y_0, \ldots, y_K\}$ is classified with the following MAP estimation:
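A MAP decision over per-person gait models can be sketched as below. The sketch assumes that the log-likelihoods of the test sequence under each person's two models have already been computed, and that the two feature streams are combined by adding their log-likelihoods (an independence assumption made here, not a rule taken from the paper). The function name `map_classify` and its arguments are hypothetical.

```python
import math

def map_classify(loglik_s, loglik_l, priors=None):
    """Pick the person c maximising log P(Y|model_S(c)) + log P(Y|model_L(c)) + log P(c).

    loglik_s, loglik_l: dicts mapping person id -> log-likelihood of the test
    sequence under that person's two gait models.
    priors: optional dict of prior probabilities P(c); uniform if omitted.
    """
    persons = sorted(loglik_s)
    if priors is None:
        priors = {c: 1.0 / len(persons) for c in persons}
    scores = {
        c: loglik_s[c] + loglik_l[c] + math.log(priors[c])
        for c in persons
    }
    best = max(persons, key=lambda c: scores[c])
    return best, scores
```

Working with log-likelihoods avoids numerical underflow, since the likelihood of a long sequence under an HMM is a product of many small probabilities.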
C. Classifier for Action Recognition
A testing action sequence $Y = \{y_0, \ldots, y_K\}$, with features $S_Y$, $R_Y$, $P_Y$, $O_Y$, and $H_Y$, is classified with a two-layer classifier that fuses multiple features. The first layer is a weighted-MAP classifier that fuses the three movement features and the configuration feature of the whole body as
If the decision $c_1$ of the first layer belongs to the single-arm actions, sequence $Y$ is then recognized by a second MAP classifier with arm features as
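The two-layer decision can be sketched as follows, assuming per-feature log-likelihoods are available for every action. The feature keys, the weights, the uniform priors, and the membership of the single-arm set are all hypothetical values chosen for the example; the paper's actual weighting scheme is not reproduced here.

```python
def weighted_map(logliks, weights, candidates):
    """First layer: argmax over candidates of sum_f weights[f] * logliks[f][c]."""
    def score(c):
        return sum(weights[f] * logliks[f][c] for f in weights)
    return max(candidates, key=score)

def classify_action(logliks, weights, arm_logliks, single_arm_actions):
    """Two-layer classifier with uniform priors (an assumption of this sketch).

    logliks: dict feature name -> dict action -> log-likelihood.
    weights: dict feature name -> fusion weight.
    arm_logliks: dict action -> log-likelihood under the arm-feature models.
    single_arm_actions: tuple of actions handled by the second layer.
    """
    actions = list(next(iter(logliks.values())))
    c1 = weighted_map(logliks, weights, actions)
    if c1 not in single_arm_actions:
        return c1
    # Second layer: MAP among the single-arm actions using arm features only.
    return max(single_arm_actions, key=lambda c: arm_logliks[c])
```

The second layer is useful when whole-body features alone cannot separate actions that differ mainly in the motion of one arm, so the final decision among those actions is deferred to the arm features.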
EXPERIMENTAL RESULTS AND ANALYSIS
There are 11 actions, and each subject performs each action three times. All samples contain approximately 29 100 frames in total.
These actions include “check watch,” “cross arm,” “scratch head,” “sit down,” “get up,” “turn around,” “walk in a circle,” “wave hand,” “punch,” “kick,” and “pick up.” To demonstrate view invariance, the subjects freely change their orientations. The acquisition is achieved using five standard FireWire cameras.
The image resolution is 390 × 291 pixels, and the volume of interest is divided into 64 × 64 × 64 voxels.
Fig. 8. Samples of images and 3-D volume data.
Fig. 9. Annotation of the human joint. (a) Annotation of the knee joint. (b) Results of annotation.
Fig. 10. Average error of the joint positions.
Fig. 11. Average error of individual joint positions.
Fig. 12. Results of the pose recovery.
Fig. 13. Selected exemplars and recognition rate versus the number of exemplars.
Fig. 14. Comparison of convergence performance between the EHMM and the HMM.
Fig. 15. Average recognition rates of actions.
CONCLUSION
The main contribution of this paper is the fusion of pose recovery and motion recognition.
Future work includes automatically segmenting temporal sequences, reducing computational complexity, analyzing more complex actions, and recognizing 2-D actions based on the 3-D EHMM.
The free-viewpoint 3-D human joint sequence contains a significant amount of information for motion analysis.
In addition to representing the single actions used in this paper, it can be used for more applications, such as analysis of complex actions.
The high number of degrees of freedom (DOF) and the large number of 3-D points make the human model-based pose recovery method very time consuming.
To solve this problem, parallel computing, code optimization, and a GPU can be used to reduce the time cost. At present, it is difficult to obtain robust volume data of subjects in surveillance and content analysis scenarios.
Actions/gaits are affected by different factors, including clothing, age, and gender. In the future, performance with these factors will be analyzed in larger databases.