WHU-NERCMS at TRECVID 2016: Instance Search Task
November 14, 2016 NIST
TRECVID 2016 Workshop
Z. Wang, Y. Yang, S. Guan, C. Han, J. Lan, R. Shao, J. Wang, C. Liang
National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Outline
1 Introduction: Problem and Motivation
2 Proposed Approach: Framework and Details
3 Results: 4 runs
4 Conclusion
1 Introduction
Previous topics vs. topics in this year
1 Introduction
How to find the specific person?
How to find the specific location?
How to fuse the person and scene results?
How to alleviate noise influence?
Location search: Global View + Local View
Noise filtering: Outdoor scene, Non face
2 Proposed Approach
2 Proposed Approach – Face recognition
Face detection
Scale-Adaptive Deconvolutional Regression face detection network
The network is initialized from the pretrained VGG16 model
Two regression layers + softmax layer
Y. Zhu, J. Wang, C. Zhao, H. Guo and H. Lu. Scale-adaptive Deconvolutional Regression Network for Pedestrian Detection, ACCV, 2016.
Face identification
9 convolutional layers, 5 pooling layers, 2 fully connected layers
Softmax and triplet costs are combined
Trained on our collected IVA-WebFace dataset, with 80 thousand identities and about 500-800 face images per identity
Haiyun Guo, et al. Multi-View 3D Object Retrieval with Deep Embedding Network, ICIP, 2016.
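The slide says only that softmax and triplet costs are combined for face identification. A minimal sketch of one common way to do that is a weighted sum of the two losses for a single training triplet. The margin, the weight, and all function names are assumptions for illustration, not values from the slides.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against one integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive must be closer than the negative by `margin`."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)

def combined_loss(logits, label, anchor, positive, negative,
                  margin=0.2, triplet_weight=1.0):
    """Weighted sum of both costs; margin and weight are assumed values."""
    return (softmax_cross_entropy(logits, label)
            + triplet_weight * triplet_loss(anchor, positive, negative, margin))
```

The softmax term pushes each face toward its identity class, while the triplet term pulls embeddings of the same identity together and pushes different identities apart.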
Face library
Search the keyword EastEnders in Bing
Our own face library includes 815 face images
DEMO: face recognition
2 Proposed Approach – Local View + Global View
Multiple objects retrieval (local view)
By identifying typical objects in a topic scene, we can find shots of that scene indirectly
Global scene retrieval (global view)
Global feature (2048-dimensional): the output of the fully connected layer
ResNet-152 model pre-trained by Facebook AI Research
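Global scene retrieval compares one feature vector per shot with the feature of the query scene. The slides do not state the similarity measure, so the sketch below assumes cosine similarity. It uses toy 3-dimensional vectors in place of the 2048-dimensional ResNet-152 features, and the shot ids and function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_shots(query_feature, shot_features):
    """Return (shot_id, cosine similarity) pairs, best match first."""
    q = l2_normalize(query_feature)
    scored = []
    for shot_id, feature in shot_features.items():
        s = l2_normalize(feature)
        scored.append((shot_id, sum(a * b for a, b in zip(q, s))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy features standing in for the 2048-dimensional global features.
shots = {
    "shot1_1": [1.0, 0.0, 0.0],
    "shot1_2": [0.6, 0.8, 0.0],
    "shot1_3": [0.0, 0.0, 1.0],
}
ranking = rank_shots([1.0, 0.1, 0.0], shots)
```

Here `ranking` lists the shots from most to least similar to the query scene.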
DEMO: local view + global view
2 Proposed Approach – Filtering
Non-target face filter
217,894 shots are deleted
851 ground truth shots are deleted
822 of them are recovered by expanding shots
Up to 46% of the original video shots are filtered
Because of non-frontal and occluded faces, some ground truth shots are filtered by mistake.
Non-target scene filter
Global feature: the output of the fully connected layer
ResNet-152 model pre-trained by Facebook AI Research
We filter 5,592 shots
Irrelevant object categories filter
37 vehicle categories, such as ambulance, minibus and police van
52 categories that only appear outdoors, such as hippopotamus, Indian elephant and castle
We delete 19,244 shots in total
http://imagenet.stanford.edu/synset?wnid=n03417042
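The irrelevant object categories filter can be sketched as a blocklist check on each shot's predicted object category. The slides do not say how a shot is assigned a category, so the sketch assumes a single top-1 label per shot. The category names are the examples given on the slide, while the shot ids and predictions are made up.

```python
def filter_by_category(shot_labels, blocked_categories):
    """Split shots into kept and deleted by their predicted object category."""
    kept, deleted = [], []
    for shot_id, label in shot_labels.items():
        (deleted if label in blocked_categories else kept).append(shot_id)
    return kept, deleted

# Example categories from the slide; a real blocklist would hold all 37 + 52.
blocked = {"ambulance", "minibus", "police van",
           "hippopotamus", "Indian elephant", "castle"}

# Hypothetical top-1 predictions for three shots.
predictions = {
    "shot5_10": "minibus",
    "shot5_11": "dining table",
    "shot5_12": "castle",
}
kept, deleted = filter_by_category(predictions, blocked)
```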
Previous ground truth filter
Some landmark objects only appear in a specific location.
Some objects cannot appear in the topics of this year.
We filter 12,006 shots
Score adjustment and result expansion
In a TV series the scene is often blocked by the person, so the similarity scores of such shots are low.
We find the high-score shots where the score curve has a steep slope, and adjust the missed low-score shots that lie between adjacent high-score shots.
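One possible reading of the score adjustment step is sketched below. It takes the steepest drop in the sorted score curve as the boundary of the high-score shots, then raises low-score shots that sit between two nearby high-score shots in temporal order. The threshold rule, the `max_gap` parameter and the fill value are all assumptions; the slides give no formula.

```python
def high_score_threshold(scores):
    """Score just above the steepest drop of the sorted score curve.

    Expects at least two scores.
    """
    ranked = sorted(scores, reverse=True)
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    steepest = max(range(len(drops)), key=drops.__getitem__)
    return ranked[steepest]

def adjust_scores(scores, max_gap=2):
    """Raise low scores between two high-score shots with at most max_gap shots between them."""
    threshold = high_score_threshold(scores)
    high = [i for i, s in enumerate(scores) if s >= threshold]
    adjusted = list(scores)
    for left, right in zip(high, high[1:]):
        if 1 < right - left <= max_gap + 1:
            fill = min(scores[left], scores[right])
            for i in range(left + 1, right):
                adjusted[i] = fill
    return adjusted

# Scores of consecutive shots in temporal order (made-up values).
scores = [0.10, 0.90, 0.20, 0.85, 0.15, 0.10, 0.12, 0.88]
adjusted = adjust_scores(scores)
```

In this toy input the shot at index 2 lies between two high-score shots and is raised, while the three low-score shots before the last one are left alone because the gap is wider than `max_gap`.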
Result fusion
Three score vectors, each with values from 0 to 1
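The slide names neither the three score vectors nor the fusion rule. As an illustration only, the sketch below assumes person, local-view scene and global-view scene scores, combined by a normalized weighted sum. The weights and names are hypothetical.

```python
def fuse_scores(score_vectors, weights):
    """Normalized weighted sum of per-shot score vectors scaled to [0, 1]."""
    if len(score_vectors) != len(weights):
        raise ValueError("one weight per score vector is required")
    length = len(score_vectors[0])
    if any(len(v) != length for v in score_vectors):
        raise ValueError("score vectors must cover the same shots")
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, score_vectors)) / total
            for i in range(length)]

# Made-up scores for three shots; index i refers to the same shot in each vector.
person = [0.9, 0.2, 0.6]
local_scene = [0.5, 0.8, 0.5]
global_scene = [0.7, 0.5, 1.0]
fused = fuse_scores([person, local_scene, global_scene], weights=[2.0, 1.0, 1.0])
```

Dividing by the weight total keeps the fused scores in the 0 to 1 range, so they stay comparable with the inputs.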
3 Results
Description of our methods
Results of our 4 submitted runs
4 Conclusion
1 Specific person: Face recognition + Face library
2 Specific scene: Local view (BoW) + Global view (CNN)
3 Result combination: Score adjustment + Results expansion
4 Shots filter: Non face + Outdoor scene + Groundtruth
THANKS
Text script retrieval and speaker identification
Text script: for the target person Jim, the retrieval keywords are Brads, Stace, Stacey, Bradley and Dot, because they are family
Audio library (412): 6 voice segments for each target person, 4 voice segments for each of the other 93 persons
MFCC features of all voice segments
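The text script step can be sketched as a whole-word keyword search over per-shot transcripts. The keyword list is the one given for Jim on the slide. The transcripts, shot ids and the case-insensitive matching rule are assumptions for illustration.

```python
import re

def shots_mentioning(keywords, transcripts):
    """Return ids of shots whose transcript contains any keyword as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b",
        re.IGNORECASE,
    )
    return [shot_id for shot_id, text in transcripts.items()
            if pattern.search(text)]

jim_keywords = ["Brads", "Stace", "Stacey", "Bradley", "Dot"]

# Made-up transcript lines keyed by hypothetical shot ids.
transcripts = {
    "shot9_1": "Have you seen Stacey this morning?",
    "shot9_2": "Dot left the launderette early.",
    "shot9_3": "The market opens at nine.",
}
hits = shots_mentioning(jim_keywords, transcripts)
```

Shots found this way are candidates only: a name in the dialogue suggests that the target person may be nearby, not that the person is visible.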