Human Pose Estimation by Deep Learning
![Page 1: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/1.jpg)
Wei Yang
Supervisors: Prof. WANG Xiaogang, Prof. OUYANG Wanli
IVP Lab, CUHK
September 11, 2015
![Page 2: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/2.jpg)
Outline
• Introduction
• Traditional Approaches
• Deep Learning Methods
– Global view (holistic view)
– Local appearance
– Combination of local appearance and global view
– Others
![Page 3: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/3.jpg)
Introduction
• What is articulated body pose estimation?
– It “recovers the pose of an articulated body, which consists of joints and rigid parts, using image-based observations.”
![Page 4: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/4.jpg)
Applications
• Action recognition
• Clothing parsing
• Gaming
• Human tracking
![Page 5: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/5.jpg)
Challenges
![Page 6: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/6.jpg)
Traditional Approaches
Pictorial Structure (Fischler & Elschlager 1973; Felzenszwalb & Huttenlocher 2005)
• Unary templates
• Pairwise springs
Mixtures of “mini-parts” (Yang & Ramanan 2011)
• Each part is modeled as a mixture of types
• Unary template for each part, conditioned on its mixture type
• Pairwise springs between two parts, conditioned on the mixture types of both
• Example of mini-parts: near-vertical and near-horizontal limbs
Figure labels: head, torso, leg
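The unary-template and pairwise-spring idea can be sketched as a score over a candidate pose. This is a minimal illustration, not the formulation of any of the cited papers: the function name, the quadratic spring penalty, and the dictionary layout are all assumptions made for the example.

```python
def pictorial_structure_score(locations, unary, edges, springs):
    """Score a pose as unary template responses minus pairwise spring costs.

    locations: {part: (x, y)}
    unary:     {part: callable (x, y) -> float}  # hypothetical template response
    edges:     list of (parent, child) pairs
    springs:   {(parent, child): (rest_dx, rest_dy, stiffness)}
    """
    score = sum(unary[part](*locations[part]) for part in locations)
    for parent, child in edges:
        rest_dx, rest_dy, stiffness = springs[(parent, child)]
        # Deviation of the child from its rest offset relative to the parent.
        dx = locations[child][0] - locations[parent][0] - rest_dx
        dy = locations[child][1] - locations[parent][1] - rest_dy
        score -= stiffness * (dx * dx + dy * dy)  # assumed quadratic spring
    return score
```

A pose whose parts sit at their rest offsets pays no spring cost; stretching a spring lowers the score even when the templates respond equally well.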
![Page 7: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/7.jpg)
Deep Learning for Pose Estimation
• Holistic view
– e.g., joint position regression
• Local view
– e.g., body part detection
• Combining global and local information
– e.g., body part detection + joint position regression
• Others
– e.g., motion features, pose estimation in videos
![Page 8: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/8.jpg)
Holistic View
DeepPose: Human Pose Estimation via Deep Neural Networks
![Page 9: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/9.jpg)
Holistic Reasoning
• Why holistic reasoning?
– Besides the extreme variability in articulation, many of the joints are barely visible
![Page 10: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/10.jpg)
DeepPose: A CNN Regressor
• Network architecture: AlexNet
– Krizhevsky, Sutskever, and Hinton, NIPS 2012 (ImageNet)
– The first time a deep model was shown to be effective at large scale
[Toshev & Szegedy, CVPR 2014]
![Page 11: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/11.jpg)
Results on LSP (Leeds Sports Pose) dataset
![Page 12: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/12.jpg)
Cascade of Pose Regressors
• The initial pose estimates are very coarse:
– Due to its fixed input size of 220 × 220, the network has limited capacity to look at detail
– Train a cascade of pose regressors for more precise joint localization
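The cascade idea can be sketched as a loop in which each stage crops a window around the current joint estimate and predicts again at higher effective resolution. In this toy version, `toy_regressor` is a stand-in for a trained stage network (it just returns the brightest pixel), and the function names and the shrinking `half_sizes` schedule are assumptions for illustration.

```python
def crop(image, cx, cy, half):
    """Return the window around (cx, cy) and the image coordinates of its top-left corner."""
    h, w = len(image), len(image[0])
    x0, y0 = max(0, cx - half), max(0, cy - half)
    x1, y1 = min(w, cx + half + 1), min(h, cy + half + 1)
    return [row[x0:x1] for row in image[y0:y1]], (x0, y0)

def toy_regressor(window):
    """Stand-in for a trained stage regressor: position of the brightest pixel."""
    _, x, y = max((v, x, y) for y, row in enumerate(window) for x, v in enumerate(row))
    return x, y

def cascade_refine(image, joint, half_sizes, regressor=toy_regressor):
    """Refine a joint estimate stage by stage, each stage looking at a smaller window."""
    x, y = joint
    for half in half_sizes:
        window, (x0, y0) = crop(image, x, y, half)
        wx, wy = regressor(window)   # prediction in window coordinates
        x, y = x0 + wx, y0 + wy      # map back to image coordinates
    return x, y
```

Because every stage only sees the neighbourhood of the previous estimate, the same fixed network input can cover progressively finer detail.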
![Page 13: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/13.jpg)
Cascade of Pose Regressors
![Page 14: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/14.jpg)
Refined pose estimation
![Page 15: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/15.jpg)
Percentage of Correct Parts (PCP) on LSP dataset
![Page 16: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/16.jpg)
Local Appearance Method
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
![Page 17: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/17.jpg)
Motivation
• Local image patches are able to capture:
– Part presence
– Pairwise part spatial relationships
[Figure: lower-arm and upper-arm patches; number of mixture types for each pair: 6; number of relationships for parts with 1 and with 2 neighbors (values not recovered)]
[Chen & Yuille NIPS 2014]
![Page 18: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/18.jpg)
Tree-structured Relational Graph
– $V$: body parts
– $E$: pairwise relationships between parts
– $p_i$: pixel location of part $i$
– $t_{ij}$: pairwise relationship type
  – Defined by relative position
  – In experiments: 13 types for each pair
![Page 19: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/19.jpg)
Formulation
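$$F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta) = \sum_{i \in V} \omega_i \cdot A_i(p_i \mid I; \theta) + \sum_{(i,j) \in E} \Big[ \omega_{ij} \cdot R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta) + \boldsymbol{\omega}_{ij}^{t_{ij}} \cdot [\text{deformation feature not recovered}] \Big]$$

• $\omega_i \cdot A_i(p_i \mid I; \theta)$: part presence
• $\omega_{ij} \cdot R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta)$: pairwise relationship
• $\boldsymbol{\omega}_{ij}^{t_{ij}} \cdot [\ldots]$: pairwise deformation

Inference:
• Tree structure
• Can be solved efficiently by dynamic programming

The weights $\omega_i$, $\omega_{ij}$, and $\boldsymbol{\omega}_{ij}^{t_{ij}}$ are learned by a latent structured SVM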
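Dynamic programming on a tree can be sketched as a max-sum pass from the leaves to the root followed by a backtracking pass. The sketch below is generic: each "state" stands for one candidate (location, type) hypothesis of a part, and the score tables, names, and data layout are assumptions rather than the paper's implementation.

```python
def best_tree_configuration(children, root, unary, pairwise):
    """Max-sum dynamic programming on a tree.

    children: {node: [child, ...]}
    unary:    {node: [score of each candidate state]}
    pairwise: {(parent, child): table[parent_state][child_state]}
    Returns (best total score, {node: chosen state}).
    """
    argbest = {}  # argbest[child][parent_state] -> best child state

    def upward(node):
        # Best score of the subtree rooted at `node`, for each state of `node`.
        scores = list(unary[node])
        for child in children.get(node, []):
            child_scores = upward(child)
            table = pairwise[(node, child)]
            argbest[child] = []
            for s in range(len(scores)):
                options = [table[s][c] + child_scores[c] for c in range(len(child_scores))]
                best_c = max(range(len(options)), key=options.__getitem__)
                argbest[child].append(best_c)
                scores[s] += options[best_c]
        return scores

    root_scores = upward(root)
    states = {root: max(range(len(root_scores)), key=root_scores.__getitem__)}

    def downward(node):
        # Read off each child's best state given its parent's chosen state.
        for child in children.get(node, []):
            states[child] = argbest[child][states[node]]
            downward(child)

    downward(root)
    return root_scores[states[root]], states
```

Each edge is visited once and costs (parent states × child states) operations, which is why a tree-structured model stays tractable where a loopy graph would not.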
![Page 20: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/20.jpg)
Learning DCNN parameters
Derive the type label for each patch
• Use relative position to represent the pairwise relations
• Cluster the relative positions over the whole training set
• Type label: cluster index
• Mean relative position: cluster center
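The clustering step can be sketched with a small k-means over 2-D relative positions, where the cluster index becomes the type label and the cluster center the mean relative position. The slide does not say which clustering algorithm or initialization is used; k-means with a deterministic farthest-point initialization is an assumption made here so the example is reproducible.

```python
def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def cluster_relative_positions(rel_positions, k, iters=20):
    """Cluster (dx, dy) offsets; return (type label per sample, cluster centers)."""
    # Farthest-point initialization (assumed, chosen for determinism).
    centers = [rel_positions[0]]
    while len(centers) < k:
        centers.append(max(rel_positions, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(rel_positions)
    for _ in range(iters):
        # Assignment step: nearest center gives the type label.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in rel_positions]
        # Update step: each center becomes the mean relative position of its members.
        for c in range(k):
            members = [p for p, lab in zip(rel_positions, labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels, centers
```

For example, offsets from an elbow to a wrist that point roughly right and roughly up fall into two clusters, giving two relationship types for that pair.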
![Page 21: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/21.jpg)
Casting Full Connections into Convolutions
[Figure: for the elbow, a part presence map and a pairwise relationship map]
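Why a fully connected layer can be cast as a convolution is easiest to see on a single output unit: applying the unit to every k × k patch gives the same numbers as sliding its weights, reshaped to a k × k kernel, over the whole image. The sketch below uses a single channel and plain lists; the function names are illustrative assumptions.

```python
def fc_on_patch(patch, weights, bias):
    """Fully connected unit: dot product of the flattened patch with the weights."""
    flat = [v for row in patch for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias

def fc_as_convolution(image, weights, bias, k):
    """Slide the same weights, reshaped to a k x k kernel, over the image."""
    kernel = [weights[r * k:(r + 1) * k] for r in range(k)]
    h, w = len(image), len(image[0])
    return [[sum(kernel[r][c] * image[y + r][x + c]
                 for r in range(k) for c in range(k)) + bias
             for x in range(w - k + 1)]
            for y in range(h - k + 1)]
```

The convolutional form produces a dense response map in one pass instead of evaluating each patch separately, which is what makes map outputs such as part presence maps cheap to compute.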
![Page 22: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/22.jpg)
PCP and PDJ on LSP dataset and FLIC dataset
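| Dataset | Method | Torso | Head | U.Leg | L.Leg | U.Arm | L.Arm | Mean PCP |
|---|---|---|---|---|---|---|---|---|
| LSP | DCNN | 92.5 | 85.1 | 82.7 | 76.3 | 70.2 | 55.9 | 74.8 |
| LSP | Ouyang et al. | 85.8 | 83.1 | 76.5 | 72.2 | 63.3 | 46.6 | 68.6 |

[Figure panels: LSP, FLIC]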
![Page 23: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/23.jpg)
Combining Local Appearance and Holistic View
Dual-Source Deep Neural Networks for Human Pose Estimation
![Page 24: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/24.jpg)
Dual-Source CNN
• Integrate both the local part appearance and the holistic view of each local part for more accurate human pose estimation
• Each input is an image pair
– Part patches
– Body patches
![Page 25: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/25.jpg)
Part patches: incorporate local appearance
• Generated by region proposals with some restrictions
– Not too small (must contain at least one body part)
– Not too big (may contain too many body parts and lack sufficient resolution)
• All classes of joints are covered by a similar number of part patches
• During testing, part patches are selected from multi-scale sliding windows
![Page 26: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/26.jpg)
Body patches: holistic view
• Also from region proposals
– Must cover all body parts
– In the testing stage, the body patch can be generated by human detection
• For DS-CNN, each training sample is made up of 3 components
– A part patch
– A body patch
– A binary mask specifying the location of the part patch in the body patch
![Page 27: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/27.jpg)
Training of the DS-CNN
Figure labels: shared weights; classification (softmax); regression (L2 distance)
![Page 28: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/28.jpg)
Testing
• Part heat map
– Same size as the input image
– Uniformly distributed probability for each sliding window
– Sum and average over all pixels
[Figure: per-window class probabilities (e.g., 0.0, 0.9, 0) over the K parts]
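One way to read the heat-map construction is sketched below: each sliding window spreads its part probability uniformly over its own pixels, and every pixel then averages the contributions of all windows that cover it. This is an interpretation of the slide, not a confirmed implementation; the window tuple layout and function name are assumptions.

```python
def part_heat_map(height, width, windows):
    """Build a heat map for one part.

    windows: list of (x0, y0, x1, y1, prob) with x1 and y1 exclusive,
             where prob is the classifier's probability for this part.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, prob in windows:
        share = prob / ((x1 - x0) * (y1 - y0))  # spread uniformly over the window
        for y in range(y0, y1):
            for x in range(x0, x1):
                total[y][x] += share
                count[y][x] += 1
    # Average over the windows covering each pixel; uncovered pixels stay at 0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)]
            for y in range(height)]
```

The result has the same size as the input image, with high values where many confident windows overlap.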
![Page 29: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/29.jpg)
Testing
• Final pose estimation
– Weighted average of predicted joint locations within part patches with high responses.
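The final fusion step can be sketched as a response-weighted mean over the joint locations regressed from high-response part patches. The threshold `min_response` and the use of the response itself as the weight are assumptions; the slide only states that a weighted average is taken.

```python
def fuse_joint_predictions(predictions, min_response):
    """Fuse per-patch joint predictions into one location.

    predictions: list of ((x, y), response), one entry per part patch.
    Returns the response-weighted mean location, or None if no patch qualifies.
    """
    kept = [(p, r) for p, r in predictions if r >= min_response]
    if not kept:
        return None
    total = sum(r for _, r in kept)
    x = sum(p[0] * r for p, r in kept) / total
    y = sum(p[1] * r for p, r in kept) / total
    return x, y
```

Low-response patches are dropped before averaging, so a stray prediction far from the joint does not pull the estimate away.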
![Page 30: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/30.jpg)
Results: PCP on LSP
![Page 31: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/31.jpg)
Other Methods & Applications
• MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
• Flowing ConvNets for Human Pose Estimation in Videos
![Page 32: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/32.jpg)
Using Motion Features for Human Pose Estimation
• Motion is a powerful visual cue that alone can be used to extract high-level information, including articulated pose.
Image credit: T. Brox, J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE TPAMI, 33(3): 500-513, 2011
![Page 33: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/33.jpg)
MoDeep: Using Motion Features for Human Pose Estimation
• Extended Frames Labeled In Cinema (FLIC) dataset with additional motion features
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation. A. Jain et al., ACCV 2014
Figure labels: average of frame pair; optical flow
![Page 34: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/34.jpg)
Multi-resolution efficient sliding window model
![Page 35: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/35.jpg)
Simple Spatial Model
• FLIC: multiple people with only one annotated person
• Testing: incorporate the annotated torso position with a simple spatial model
Figure panels: predicted left shoulder; spatial mask of left shoulder; result
![Page 36: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/36.jpg)
Experiment results
[Figure: results without and with motion features under occlusion, cluttered background, and motion blur]
![Page 37: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/37.jpg)
Flowing ConvNets for Human Pose Estimation in Videos
• A CNN can benefit from temporal context by combining information across multiple frames using optical flow.
![Page 38: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/38.jpg)
Spatial ConvNet
Why regress a heatmap instead of joint coordinates?
• The network output can be multi-modal
• Regressing coordinates directly is a highly non-linear mapping that is more difficult to learn
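The difference between the two targets can be sketched as follows: a heatmap target places a Gaussian bump at the joint, and the joint is read back as the location of the maximum. The Gaussian shape, `sigma`, and the function names are illustrative assumptions rather than details given on the slide.

```python
import math

def gaussian_heatmap(height, width, joint, sigma):
    """Regression target: a Gaussian centred on the joint location (x, y)."""
    jx, jy = joint
    return [[math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def heatmap_to_joint(heatmap):
    """Read the joint location back as the position of the strongest response."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return x, y
```

A heatmap can carry two bumps when the network is unsure between, say, the left and right wrist, whereas a single coordinate pair would be forced to a point between them.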
![Page 39: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/39.jpg)
Warping neighbouring heatmaps for improving pose estimates
• Heatmaps from frames (t − n) and (t + n) warped to frame t using tracks from optical flow (green & blue lines) can help refine the wrongly estimated part location
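Warping and combining heatmaps can be sketched with integer-valued flow and a fixed weighted sum. Both simplifications are assumptions for the example: real optical flow is sub-pixel and needs interpolation, and the combination weights would normally be learned rather than fixed.

```python
def warp_to_frame_t(heatmap, flow):
    """Warp a neighbouring frame's heatmap into frame t.

    flow[y][x] = (dx, dy): where pixel (x, y) of frame t is found
    in the neighbouring frame (integer displacements assumed).
    """
    h, w = len(heatmap), len(heatmap[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = heatmap[sy][sx]
    return warped

def pool_heatmaps(heatmaps, weights):
    """Combine aligned heatmaps with a per-frame weight."""
    h, w = len(heatmaps[0]), len(heatmaps[0][0])
    return [[sum(wt * hm[y][x] for wt, hm in zip(weights, heatmaps))
             for x in range(w)]
            for y in range(h)]
```

When the current frame's strongest response is a false positive, a confident warped neighbour can outvote it after pooling.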
![Page 40: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/40.jpg)
Results
![Page 41: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/41.jpg)
Ongoing Project
• End-to-end pose estimation
– Joint learning of pose features and pose configurations
– Allow local appearance to be fine-tuned by pose configuration
Figure labels: unary response; pairwise relationships
![Page 42: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/42.jpg)
Ongoing Project
• Add constraints between body parts in a network
[Figure: network diagram with unary response, distance transform, and pairwise relationships; node labels $x_{t-1}$, $x_t$, $x_{t+1}$ and $z_{t-1}$, $z_t$, $z_{t+1}$; weights $w_{dt}$ and $w_m$]
![Page 43: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/43.jpg)
Preliminary Results (PCP on LSP)
| Head | Torso | U.arms | L.arms | U.legs | L.legs | Mean |
|---|---|---|---|---|---|---|
| 84.7 | 91 | 68.7 | 53.6 | 80.7 | 73.3 | 72.82 |

• Future work
– Pose relational graph learning
– Multi-task learning
  • Human detection
  • Human segmentation
– Combining global information
![Page 44: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/44.jpg)
Recent developments
• DeepPose: Human Pose Estimation via Deep Neural Networks. A. Toshev, C. Szegedy. CVPR, 2014
• Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation. J. J. Tompson, A. Jain, Y. LeCun, C. Bregler. NIPS, 2014
• Human Pose Estimation with Iterative Error Feedback. J. Carreira et al. arXiv preprint arXiv:1507.06550, 2015
• Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation. S. Li, W. Zhang, A. B. Chan. arXiv preprint arXiv:1508.06708, 2015
• Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network. S. Li, Z. Q. Liu, A. B. Chan. CVPR Workshop, 2014
• Flowing ConvNets for Human Pose Estimation in Videos. T. Pfister, J. Charles, A. Zisserman. ICCV, 2015
• R-CNNs for Pose Estimation and Action Detection. G. Gkioxari, B. Hariharan, R. Girshick, J. Malik. arXiv preprint arXiv:1406.5212, 2014
• MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation. A. Jain, J. Tompson, Y. LeCun, C. Bregler. ACCV, 2014
• Efficient Object Localization Using Convolutional Networks. J. Tompson, R. Goroshin, A. Jain, Y. LeCun, C. Bregler. CVPR, 2015
• Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation. X. Fan, K. Zheng, Y. Lin, S. Wang. CVPR, 2015
• Parsing Occluded People by Flexible Compositions. X. Chen, A. L. Yuille. CVPR, 2015
• Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations. X. Chen, A. L. Yuille. NIPS, 2014
• …
![Page 45: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/45.jpg)
Thank you
![Page 46: Human Pose Estimation by Deep Learning](https://reader038.fdocuments.in/reader038/viewer/2022102811/587066c81a28ab48378b517b/html5/thumbnails/46.jpg)
Evaluation Metrics
• Percentage of Correct Parts (PCP)
– Measures the percentage of correctly localized body parts.
– A candidate body part is treated as correct if its segment endpoints lie within 50% of the ground-truth segment length from the annotated endpoints.
• Percentage of Detected Joints (PDJ)
– Measures performance as a curve of the percentage of correctly localized joints over a varying localization precision threshold, which is normalized by a scale defined as the distance between the left shoulder and the right hip
– Invariant to scale
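Both metrics can be sketched in a few lines. The sketch follows the definitions above for a single person: the PCP check requires both endpoints to pass, and the PDJ function evaluates one point of the curve for a given threshold fraction. Function names and argument layout are assumptions for illustration.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pcp_correct(pred_segment, gt_segment, alpha=0.5):
    """A part is correct if both predicted endpoints lie within
    alpha * (ground-truth segment length) of the annotated endpoints."""
    length = dist(gt_segment[0], gt_segment[1])
    return all(dist(p, g) <= alpha * length for p, g in zip(pred_segment, gt_segment))

def pdj(pred_joints, gt_joints, left_shoulder, right_hip, fraction):
    """Fraction of joints within `fraction` of the shoulder-to-hip distance."""
    scale = dist(left_shoulder, right_hip)
    hits = sum(dist(p, g) <= fraction * scale for p, g in zip(pred_joints, gt_joints))
    return hits / len(gt_joints)
```

Sweeping `fraction` over a range of values and plotting the result gives the PDJ curve; dividing by the shoulder-to-hip distance is what makes it invariant to the person's size in the image.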