Fisher Kernel based Relevance Feedback for Multimodal Video Retrieval
-
Upload
ionut-mironica -
Category
Engineering
-
view
133 -
download
2
Transcript of Fisher Kernel based Relevance Feedback for Multimodal Video Retrieval
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 1
Fisher Kernel based Relevance Feedbackfor Multimodal Video Retrieval
Ionu MIRONICț Ă1 Bogdan IONESCU1,2 Jasper Uijlings2 Nicu Sebe2
1LAPI – University Politehnica of Bucharest, Romania2MHUG DISI – University of Trento, Italy
ACM International Conference on Multimedia Retrieval, ICMR 2013, Dallas, Texas, USA, April 16 - 19, 2013
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 2
Agenda
• Introduction
• Relevance Feedback Algorithms
• Fisher Kernel Theory
• Implementation
• Experimental results
• Conclusion
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 3
Problem StatementConcepts• Content Based Video Retrieval• Genre Retrieval• Query by Example
Query Results
Movie Sample
Query Database
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 4
CBVR System
VideoDescriptors (vectors with
features)Movie Database
(Off-line)Descriptor compute
(On-line)Descriptor Compute
Comparison
Image Selection
(browsing)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 5
Semantic Gap and Relevance Feedback• Relevance feedback uses positive and negative examples provided by the user to improve the system’s performance.
UserFeedback
Display
Estimation &Display selection
Feedbackto system
Display
UserFeedback
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 6
State-of-the-Art algorithms:
• Rocchio Algorithm
• Feature Relevance Estimation (FRE)
• SVM – Relevance Feedback
• This paper: Fisher Kernel representation + SVM
Relevance Feedback
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 7
∑∑∈∈
−+=NN
iRR
i
ii
NN
RR
QQγβα'
• Uses a set R of relevant documents and a set N of non-relevant documents, selected in the user relevance feedback phase, and updates the query feature.
Rocchio Algorithm
Initial query
Relevant documents Non-Relevant documentsNew Query-Point
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 8
∑ ∑= =
−=d
i
d
iiiii WYXWYXDist
1 1
2)(),(
• some specific features may be more important than other features• every feature will have an importance weight that will be computed as Wi = 1/σ, where σ denotes the variance of video features.
Feature Relevance Estimation (FRE)
Initial queryRelevant documents
Non-Relevant documents
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 9
SVM – Relevance Feedback
• Train SVM with two classes: - positive samples - negative samples
• Classify next samples as positive or negative using SVM network
• Use the confidence values of the classifiers to rerank the documents
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 10
Fisher Kernel (FK) Theory- Introduced by [Jaakkola, T., Haussler, D.: Exploiting generative models in
discriminative classifiers. NIPS’99] for protein detection
- Introduced in Computer Vision by [Perronnin, F., Dance C. "Fisher kernels on visual vocabularies for image categorization." CVPR’07]
- combines the benefits of generative and discriminative approaches
- represents a signal as the gradient of the probability density function that is a learned generative model of that signal
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 11
Fisher Kernel Framework
Compute Fisher Vectors
Training & Classification
Step
Feature extraction Dictionar extraction
X = {x1 ... xm}
Altering Feature Space Training & Reranking Step
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 12
Fisher Kernel based RF
- Use a video document as a „query by example”
- The system returns top k documents (k=20)
- Use these document to train an Gaussian Mixture Model (GMM)
- Compute the Fisher Kernels for all the features (reshape the features space)
- Apply Fisher Kernel Normalization
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 13
- The user selects the relevant documents
- The system is trained with these samples (SVM classifiers)- The documents are reranked using the classifier confidence level
- If the results are not satisfactory the use can continue with more Relevance Feedback iterations)
Fisher Kernel based RF
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 14
Fisher Kernel based RF
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 15
Experimental Setup MediaEval 2012 Dataset•14,838 episodes from 2,249 shows ~ 3,260 hours of data•split into Development and Test sets
5,288 for development / 9,550 for test
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 16
Experimental Setup MediaEval 2012 Dataset
• 26 Genre labels26 Genre labels1000 art 1001 autos_and_vehicles 1002 business 1003 citizen_journalism 1004 comedy 1005 conferences_and_other_events 1006 default_category 1007 documentary 1008 educational 1009 food_and_drink 1010 gaming 1011 health 1012 literature 1013 movies_and_television 1014 music_and_entertainment 1015 personal_or_auto-biographical 1016 politics 1017 religion 1018 school_and_education 1019 sports 1020 technology 1021 the_environment 1022 the_mainstream_media 1023 travel 1024 videoblogging 1025 web_development_and_sites
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014
17
Feedback Approaches for MPEG-7 Content-based Biomedical Image
Experimental Setup
Precision = Number of returned relevant documents
Total number of returned documents
Recall = Number of returned relevant documents
Total number of relevant documents
• Precision – Recall Chart
• Mean Average Precision - summarize rankings from multiple queries by averaging average precision
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 18
Visual Descriptors• Global MPEG 7 related descriptors: Local Binary Pattern (LBP), Autocorrelogram, Color Coherence Vector (CCV), Edge Histogram (EHD) Color Layout Pattern(CLD), Scalable Color Descriptor classic color histogram (hist), color moments
• Global HoG
• Global Structural Descriptors describe contour properties and appearance parameters
Global Descriptors = mean and variance over all frame descriptors
[IJCV, C. Rasche’10]
edginess symmetry
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 19
Audio Descriptors
[B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
• Linear Predictive Coefficients,
• Line Spectral Pairs,
• Mel-Frequency Cepstral Coefficients,
• Zero-Crossing Rate,
+ variance of each feature over a certain window.
• spectral centroid, flux, rolloff, and kurtosis,
f1 fn…f2
globalfeature = mean & variance
time
+var{f2} var{fn}
[Klaus Seyerlehner et al., MIREX’11, USA]
Standard Audio Features
Block-based Audio Features• Spectral Pattern, delta Spectral Pattern, • variance delta Spectral Pattern, • Logarithmic Fluctuation Pattern , • Correlation Pattern, Local Single Gaussian model• Spectral Contrast Pattern, • Mel-Frequency Cepstral Coefficients
[B. Mathieu et al., Yaafe toolbox, ISMIR’10]
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 20
Text Descriptors
TF-IDF descriptors (Term Frequency-Inverse Document Frequency)
1. extract root words
2. remove terms <5%-percentile of the frequency distribution
3. select term corpus: retaining for each genre class m terms (e.g. m = 150 for ASR) with the highest χ2 values that occur more frequently than in complement classes
4. for each document the TF-IDF values are computed.
Text source: ASR [Lamel, et al., ICNL’08]
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 21
Evaluation
(1) Evaluating Features Metrics
Manhattan Euclidian Mahalanobis Cosinus Bray-Curtis
Chi-Square
Canberra
Hog 17.02% 17.18% 17.07% 17.00% 17.10% 17.07% 16.67%
Structural 10.87% 10.55 11.14% 2.18% 10.92% 11.58% 14.82%
MPEG 7 12.37% 10.85 21.14% 8.69% 13.34% 13.34% 25.97%
Standard Audio
7.76% 7.78 29.26% 15.28% 7.78% 8.04% 1.58%
Block-Based Audio
19.33% 19.58 20.21% 21.23% 19.71% 19.99% 20.37%
Text 8.32% 7.15 5.39% 16.64% 20.40% 9.83% 9.68%
(MAP values)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 22
Evaluation
(2) Optimizing Fisher Kernel
- number of GMM Centroids
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 23
Implementation
Visual Audio Text
Without Normalization 37.25% 38.68% 31.13%
L1 – norm 36.82% 37.97% 29.83%
L2 – norm 39.22% 41.94% 30.51%
Log – norm 38.61% 42.01% 35.07%
PN 38.51% 41.37% 34.93%
PN + L1 – norm 39.20% 42.98% 30.12%
PN + L2 – norm 39.46% 43.23% 31.71%
(2) Optimizing Fisher Kernel
- Normalization strategy
(MAP values)
PN = Power NormalizationLog – norm = Logarithmic Normalization
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 24
Implementation
Feature FK- for all data FK RF – RBF
Visual features 34.02% 38.23%
Standard audio features 38.25% 46.34%
Text features 32.37% 35.14%
(3) Fisher Kernel representation on all data
(MAP values)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 25
Evaluation
(4) Comparison to State-of-the-Art algorithms
- Rocchio- Nearest Neighbor RF - NB- Boost RF- SVM RF - Random Forest RF - (RF)- Relevance Feature Estimation - (RFE)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 26
Evaluation (4) Comparison to State-of-the-Art algorithms
• Rocchio• Nearest Neighbor RF - NB• Boost RF• SVM RF • Random Forest RF - (RF)• Relevance Feature Estimation - (RFE)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 27
Evaluation
Feature Without RF
Rocchio NB Boost SVM RF RFE FK Linear
FK RBF
HoG 17.18 25.57 24.18 26.72 26.49 26.89 27.50 29.46 29.59
Structural 14.82 21.96 23.73 23.63 24.62 24.69 23.91 26.28 23.96
MPEG 7 25.97 30.88 34.09 32.55 32.90 36.85 31.93 40.50 40.80
All Visual 26.18 32.98 34.25 35.99 36.08 42.28 32.43 41.33 42.23
Standard Audio
29.26 32.71 34.88 32.88 38.58 40.46 44.32 44.80 46.34
Block Based Audio
21.23 35.39 35.22 39.87 31.46 33.41 31.93 43.96 43.69
Text 20.40 32.55 26.91 26.93 34.70 34.70 25.82 34.84 35.14
All features 30.29 37.91 39.88 38.88 40.93 45.31 44.93 46.43 46.80
(4) Comparison to State-of-the-Art algorithms
(MAP values)
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 28
Evaluation
(5) Frame Agreggation with Fisher KernelNumber of GMM Centroids
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 29
Evaluation
(5) Frame Agreggation with Fisher Kernel
Feature FKRF Linear (T=1)
FKRF RBF (T=1)
Frame Aggregation FK-RF Linear
Frame Aggregation FK-RF RBF
HoG 29.46% 29.59% 32.12% 32.87%
MPEG-7 40.50% 40.80% 44.69% 45.43%
(MAP values)
almost 5% improvement for MPEG-7
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 30
• Relevance feedback is a powerful tool for improving content based video retrieval systems• The Fisher Kernel RF Approach outperforms classical RF algorithms (such as Rocchio or RFE and SVM) in terms of accuracy.• Modelling the temporal variation in video using the Fisher Kernel yields 5% improvement for MPEG-7 features
Future Work•Test algorithm on bigger datasets•Extend to the other features (motion, audio and text)•Test modelling temporal variation on general retrieval tasks
Conclusions
University Politehnica of Bucharest & University of Trento
ICMR 2013Sunday, November 2, 2014 31
Thank you!
Questions?