Post on 05-Feb-2021
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
CVPR 2020 Oral
STRUCT Group Paper Reading
Presented by Xiao Wu, 2020.4.19
Outline
◦ Authorship
◦ Introduction
◦ Dataset
◦ Experiments
◦ Conclusion
2020/4/25 STRUCT PAPER READING 2
Introduction
◦ Coarse-grained action datasets:
  ◦ UCF101: category labels only
  ◦ THUMOS, ActivityNet: category labels + temporal locations
  ◦ Others: + spatio-temporal bounding boxes
◦ Problem:
  ◦ Background matters more than the action itself, e.g., hockey vs. gymnastics can be told apart from the scene alone
Introduction
◦ Fine-grained action datasets:
  ◦ Breakfast: each action is split into many units, e.g., juice = cut orange + …
  ◦ Diving48: each label is composed of 4 attributes, e.g., diving = back + 1.5 somersaults (15som) + 1.5 twists (15twist) + free position
◦ Problems:
  ◦ Limited number of classes (~50)
  ◦ Limited structural hierarchy
Introduction
◦ FineGym: a dataset with rich annotations
  ◦ Supports recognition, detection, auto-scoring, generation…
  ◦ 3 semantic levels (event, set, element), 2 temporal levels (action, sub-action)
Dataset
◦ Gymnastics = 10 actions (6 men's + 4 women's)
◦ 4 women's actions (vault, balance beam, floor exercise, uneven bars)
  = 15 sub-actions (e.g., balance beam turns)
  = 530 sub-sub-actions (e.g., triple twist in piked position)
◦ Release:
  ◦ Gym99: balanced distribution
  ◦ Gym288, Gym530
Dataset
◦ Stats:
  ◦ Average duration: action (~55 s), sub-action (~2 s)
  ◦ Mostly 720p or higher
Experiments
◦ Event-/set-level (action, sub-action) recognition
  ◦ 3 frames are enough for event-/set-level recognition
  ◦ RGB > Flow at this level
Experiments
◦ Element-level (sub-sub-action) recognition
  ◦ Overfitting on long-tailed classes
  ◦ Flow > RGB at the fine-grained level
  ◦ TSM, TRN > TSN
  ◦ Temporal dynamics are important
  ◦ ImageNet pre-training ≈ Kinetics pre-training
  ◦ Skeleton-based methods suffer from inaccurate skeleton estimation
Experiments
◦ Temporal action localization
  ◦ Localizing sub-actions is much more challenging than localizing actions
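One plausible contributing factor can be illustrated with temporal IoU, the overlap measure commonly used to score temporal localization. The slide does not state the metric, so its use here is an assumption, and the function name `temporal_iou` and the 0.5 s boundary error are hypothetical. The segment lengths follow the dataset statistics (~55 s per action, ~2 s per sub-action). A minimal sketch:

```python
def temporal_iou(pred, gt):
    """Temporal intersection-over-union of two (start, end) segments in seconds."""
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Same hypothetical 0.5 s shift applied to a long and a short segment.
action_iou = temporal_iou((0.5, 55.5), (0.0, 55.0))     # ~55 s action
sub_action_iou = temporal_iou((0.5, 2.5), (0.0, 2.0))   # ~2 s sub-action
print(f"action IoU: {action_iou:.3f}, sub-action IoU: {sub_action_iou:.3f}")
```

Under these assumptions the same absolute boundary error costs far more overlap on the short sub-action than on the long action, which is consistent with sub-action localization being the harder task.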
Experiments
◦ Ablation on sparse sampling
  ◦ Accuracy saturates slowly as the number of frames increases
  ◦ Every frame counts in fine-grained action recognition
  ◦ Sample rate at saturation: UCF101 (2.7%, 5 frames) vs. FineGym (30%, 12 frames)
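The sampling rates above can be read as sampled frames divided by total frames in the clip. The sketch below shows segment-based sparse sampling in the style of TSN (one frame per equal-length segment) together with that ratio. The function names and the clip lengths of 185 and 40 frames are hypothetical, chosen only so that the ratios match the percentages on the slide.

```python
import random


def sparse_sample(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length segments.

    With `rng=None` the centre frame of each segment is used (deterministic);
    otherwise a random frame inside each segment is drawn.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = max((k + 1) * num_frames // num_segments, start + 1)
        idx = (start + end - 1) // 2 if rng is None else rng.randrange(start, end)
        indices.append(min(idx, num_frames - 1))
    return indices


def sample_rate(num_sampled, num_frames):
    """Fraction of the clip's frames that the model actually sees."""
    return num_sampled / num_frames


# Hypothetical clip lengths, picked to reproduce the rates quoted on the slide.
print(sparse_sample(40, 12), f"{sample_rate(12, 40):.1%}")    # FineGym-like clip
print(sparse_sample(185, 5), f"{sample_rate(5, 185):.1%}")    # UCF101-like clip
print(sparse_sample(40, 12, rng=random.Random(0)))            # training-style jitter
```

Seen this way, a short fine-grained clip needs a much denser sample than a coarse-grained one before accuracy levels off.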
Experiments
◦ Other ablations
  ◦ (a) Flow contributes to sub-sub-action recognition
  ◦ (b) Frame order matters significantly in TRN
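Why frame order can matter is easy to see on a toy example. The sketch below is not TSN or TRN; it only contrasts an order-invariant aggregation (averaging per-frame scores, as TSN's average consensus does) with an order-sensitive one (differences between consecutive frames). The per-frame scores and function names are hypothetical.

```python
def average_consensus(frame_scores):
    """Order-invariant aggregation: the mean of per-frame scores."""
    return sum(frame_scores) / len(frame_scores)


def consecutive_differences(frame_scores):
    """Order-sensitive feature: change between each frame and the next."""
    return [later - earlier for earlier, later in zip(frame_scores, frame_scores[1:])]


# Hypothetical per-frame scores, e.g. a rotation angle that grows over time.
frames = [0.0, 1.0, 2.0, 3.0]
reordered = frames[::-1]  # reversal as a simple, deterministic reordering

print(average_consensus(frames) == average_consensus(reordered))
print(consecutive_differences(frames), consecutive_differences(reordered))
```

Averaging cannot tell the two orderings apart, while the difference feature flips sign. A model that relies on ordered frame relations therefore loses information when frames are shuffled, which is relevant for elements distinguished by the direction of motion.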
Experiments
◦ Other ablations
  ◦ (c) TSN is more robust than TSM when the number of test frames varies
  ◦ (d) On UCF101, pre-training improves I3D from 84.5% to 97.9%; on FineGym, it is not helpful
    Hypothesis: the datasets differ in their temporal patterns
Experiments
◦ Challenging classes from the confusion matrix
  ◦ Intense motion (e.g., salto, often < 1 s)
  ◦ Subtle spatial semantics (e.g., legs bent or straight)
  ◦ Complex temporal dynamics (e.g., direction of motion, degree of rotation, number of saltos)
Conclusion
◦ From coarse- to fine-grained recognition, the stronger modality shifts from RGB to flow
◦ Temporal localization: not well solved on fine-grained datasets
◦ Sparse sampling: not sufficient on fine-grained datasets
◦ Shuffling frames degrades TRN; increasing the number of test frames degrades TSM
◦ Pre-trained models are hard to transfer
Thank you!
Presented by Xiao Wu