Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples
Yu-Gang Jiang, Zhenguo Li, Shih-Fu Chang
IEEE Transactions on CSVT, 2011
Outline
• Context-Based Action Retrieval Framework
• Experiment Results
• Conclusion
Framework
A. Video Representation and Negative Sample Selection
B. Obtaining Action Context
   1. Scene Recognition
   2. Object Recognition
C. Estimating Action-Scene-Object Relationship
D. Incorporating Multiple Contextual Cues
Context-Based Action Retrieval Framework
A. Video Representation and Negative Sample Selection
• Use the bag-of-features framework
• Use k-means clustering to generate 4000 visual words
• Quantize each video clip into two 4000-D histograms of visual words
• Apply Local and Global Consistency (LGC) [27]
• Pick negative samples after propagation
[27] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Proc. Neural Inform. Process. Syst., 2004, pp. 321–328.
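The slide lists the steps but not their details. Below is a minimal sketch of them, assuming hard assignment of each local descriptor to its nearest visual word, Gaussian affinities with a hypothetical bandwidth `sigma`, and a hypothetical rule that takes the lowest-scoring non-positive videos as negatives. The propagation step follows the LGC iteration of Zhou et al. [27]. All function names and parameter values are illustrative, not taken from the paper.

```python
import math


def bof_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest visual word
    (squared Euclidean distance) and return an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        hist[nearest] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def lgc_scores(X, positive_idx, alpha=0.9, sigma=1.0, iters=100):
    """LGC with a single "positive" label column:
    iterate f <- alpha * S f + (1 - alpha) * y, with S = D^-1/2 W D^-1/2."""
    n = len(X)
    # Gaussian affinities between video histograms; W_ii = 0 as in LGC.
    W = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization; W_ij > 0 implies both degrees are positive.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i in positive_idx else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def pick_negatives(scores, positive_idx, k):
    """Hypothetical selection rule: the k non-positive samples whose
    propagated scores are lowest."""
    candidates = [i for i in range(len(scores)) if i not in positive_idx]
    return sorted(candidates, key=lambda i: scores[i])[:k]
```

With two well-separated clusters of videos and a single positive in the first cluster, the selected negatives all come from the second cluster, because propagated scores barely cross the weak inter-cluster edges.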
B. Scene Recognition
• Train a separate classifier on each of the two bag-of-features representations and average their probability predictions
• The scene models are learned by SVMs
• Adopt 10 scene classes: House, Road, Bedroom, Car Interior, Hotel, Kitchen, Living Room, Office, Restaurant, Shop
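The fusion step can be illustrated with a small sketch. It assumes each SVM already outputs calibrated per-class probabilities (for example via a probability-output option of the SVM package), and the function names are hypothetical. The fused 10-D vector is the kind of per-video scene prediction that can then serve as a contextual cue.

```python
SCENES = ["House", "Road", "Bedroom", "Car Interior", "Hotel",
          "Kitchen", "Living Room", "Office", "Restaurant", "Shop"]


def fuse_scene_probabilities(probs_a, probs_b):
    """Average the per-scene probabilities from the two classifiers, one per
    bag-of-features representation (late fusion by simple averaging)."""
    if not (len(probs_a) == len(probs_b) == len(SCENES)):
        raise ValueError("each classifier must score all 10 scene classes")
    return [(pa + pb) / 2.0 for pa, pb in zip(probs_a, probs_b)]


def most_likely_scene(fused):
    """Name of the scene class with the highest fused probability."""
    best = max(range(len(fused)), key=lambda k: fused[k])
    return SCENES[best]
```

Plain averaging needs no extra training data, which matters when only a few examples are available.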
B. Object Recognition
• The object detector covers only three classes: person, chair, and car
• Define actions
– Track objects based on location and box size
– Discard isolated detections
• Compute the average spatial distance between different types of objects
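The slide names the tracking and distance steps without specifying them. The sketch below is one way to realize them, under stated assumptions: detections are hypothetical `(frame, label, x, y, w, h)` tuples with positive box sizes, a detection joins a track of the same label if it appears in the next frame with a nearby center and a similar box area (the thresholds `max_shift` and `max_size_ratio` are made up), single-detection tracks are dropped as isolated, and distance is measured between box centers of two object types in the same frame.

```python
import math
from collections import defaultdict


def box_center(det):
    _, _, x, y, w, h = det
    return (x + w / 2.0, y + h / 2.0)


def link_tracks(detections, max_shift=30.0, max_size_ratio=1.5):
    """Greedy tracker: extend a same-label track whose last box lies in the
    previous frame, has a nearby center, and a similar area."""
    tracks = []
    for det in sorted(detections, key=lambda d: d[0]):
        frame, label, _, _, w, h = det
        cx, cy = box_center(det)
        for track in tracks:
            last = track[-1]
            if last[1] != label or last[0] != frame - 1:
                continue
            lx, ly = box_center(last)
            areas = (w * h, last[4] * last[5])  # assumes positive box sizes
            if (math.hypot(cx - lx, cy - ly) <= max_shift
                    and max(areas) / min(areas) <= max_size_ratio):
                track.append(det)
                break
        else:
            tracks.append([det])
    # Discard isolated detections: tracks that never extend past one frame.
    return [t for t in tracks if len(t) > 1]


def average_distance(tracks, label_a, label_b):
    """Mean center-to-center distance between boxes of two object types
    that co-occur in a frame; None if they never co-occur."""
    by_frame = defaultdict(lambda: defaultdict(list))
    for track in tracks:
        for det in track:
            by_frame[det[0]][det[1]].append(box_center(det))
    dists = [math.hypot(pa[0] - pb[0], pa[1] - pb[1])
             for labels in by_frame.values()
             for pa in labels.get(label_a, [])
             for pb in labels.get(label_b, [])]
    return sum(dists) / len(dists) if dists else None
```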
C. Estimating Action-Scene-Object Relationship
• Define a context-based inference score that should
– separate the positive samples (P) from the negative samples (N) well
– produce similar scores for samples that are close to each other
C. Estimating Action-Scene-Object Relationship
• F: prediction matrix of contextual cues (n training samples × m contextual cues)
• c: coefficient vector (one weight per contextual cue)
• Context-based inference scores of the n training samples: F × c
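As a worked illustration only (a regularized least-squares stand-in, not the paper's formulation), the two criteria can be folded into one objective over c. Write s = Fc for the n context-based scores, set y_i = 1 for i in P and y_i = 0 for i in N, and assume a hypothetical symmetric, nonnegative affinity matrix W between training samples and trade-off weights λ, μ ≥ 0:

```latex
s = Fc, \qquad F \in \mathbb{R}^{n \times m}, \quad c \in \mathbb{R}^{m}

J(c) = \|y - Fc\|^2
     + \frac{\lambda}{2} \sum_{i,j} W_{ij}\,(s_i - s_j)^2
     + \mu \|c\|^2
     = \|y - Fc\|^2 + \lambda\, c^\top F^\top L F c + \mu\, c^\top c,
\qquad L = D - W, \quad D_{ii} = \sum_j W_{ij}

\nabla_c J = -2F^\top (y - Fc) + 2\lambda F^\top L F c + 2\mu c = 0
\;\Longrightarrow\;
c^\ast = \left(F^\top F + \lambda F^\top L F + \mu I\right)^{-1} F^\top y
```

The first term rewards separating P from N, the second penalizes different scores for close samples, and for μ > 0 the matrix being inverted is positive definite, so c* is unique even when m is large relative to n.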
C. Estimating Action-Scene-Object Relationship
Constraint 1, Constraint 2
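A sketch of how the two constraints could be turned into a computable weight vector, under an assumed formulation that is not the paper's: minimize ||y − Fc||² + λ cᵀFᵀLFc + μ||c||², where F is the n × m cue-prediction matrix, y is 1 for P and 0 for N, W is a hypothetical sample affinity matrix, and L = D − W. Setting the gradient to zero gives c = (FᵀF + λFᵀLF + μI)⁻¹Fᵀy. The helper names and default values of `lam` and `mu` are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def learn_context_weights(F, y, W, lam=0.1, mu=1e-3):
    """Hypothetical closed form c = (F^T F + lam F^T L F + mu I)^-1 F^T y."""
    n, m = len(F), len(F[0])
    # Graph Laplacian L = D - W over the n training samples.
    L = [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    LF = [[sum(L[i][k] * F[k][j] for k in range(n)) for j in range(m)] for i in range(n)]
    A = [[sum(F[k][a] * F[k][b] + lam * F[k][a] * LF[k][b] for k in range(n))
          + (mu if a == b else 0.0)
          for b in range(m)] for a in range(m)]
    rhs = [sum(F[k][a] * y[k] for k in range(n)) for a in range(m)]
    return solve(A, rhs)


def context_scores(F, c):
    """Context-based inference scores F x c, one per sample."""
    return [sum(f_ij * c_j for f_ij, c_j in zip(row, c)) for row in F]
```

When one cue perfectly separates P from N and another is noise, the learned weights concentrate on the informative cue; raising `lam` pulls the scores of strongly connected samples together.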
D. Incorporating Multiple Contextual Cues
• Given an action a and a test sample x
– context weight parameter
– prediction score of contextual cues on x
– action prediction score based on raw visual features
– refined prediction after incorporating contextual cues
Actions: AnswerPhone, DriveCar, Eat, Kiss, GetOutCar, HandShake, FightPerson, HugPerson, Run, SitDown, SitUp, StandUp
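The slide lists the four quantities but not the rule that combines them. A minimal sketch, assuming a convex combination in which the context weight parameter interpolates between the raw-feature score and the contextual-cue score for a given action a and test sample x; the function names are hypothetical.

```python
def refine_score(raw_score, context_score, weight):
    """Assumed convex combination of the raw-feature action score and the
    contextual-cue score; weight is the context weight parameter in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("context weight must lie in [0, 1]")
    return (1.0 - weight) * raw_score + weight * context_score


def rank_by_refined_score(raw_scores, context_scores, weight):
    """Indices of test samples for one action, best first."""
    refined = [refine_score(r, c, weight) for r, c in zip(raw_scores, context_scores)]
    return sorted(range(len(refined)), key=lambda i: refined[i], reverse=True)
```

Setting the weight to 0 recovers the raw-feature ranking, which makes it easy to measure how much the contextual cues add for each action.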
Experiment Results
• Mean average precision (mAP)
• Retrieval performance by raw features
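For reference, a small sketch of the metric. The exact AP variant used in the experiments is not stated, so this assumes non-interpolated AP over a ranked list of 0/1 relevance labels, normalized by the number of relevant items in the list, and mAP as the plain mean of AP over actions.

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP: mean of precision@k over ranks k that hold a
    relevant item."""
    hits, precision_sum = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_action_relevance):
    """Mean of AP over actions; keys are action names."""
    aps = [average_precision(r) for r in per_action_relevance.values()]
    return sum(aps) / len(aps)
```

For example, a ranking with relevant items at positions 1 and 3 has AP = (1/1 + 2/3) / 2 ≈ 0.833.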
Experiment Results
• Scene vs. Object
Experiment Results
• Comparison to the state of the art
– SVM learning
– Movie script mining
Conclusion
• An algorithm based on the semi-supervised learning paradigm models the action-scene-object dependency from a limited number of samples
• This algorithm can be applied to other types of action videos