MediaEval 2016 - UNIFESP Predicting Media Interestingness Task


UNIFESP at MediaEval 2016: Predicting Media Interestingness Task

Jurandy Almeida

GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo – UNIFESP

[email protected]

MediaEval’16 – Hilversum, Netherlands – October 20-21 – 2016

Predicting Media Interestingness Task

This work was developed for the MediaEval 2016 Predicting Media Interestingness Task and addresses its video subtask only.

The goal is to automatically select the most interesting video segments according to a common viewer.

The focus is on features derived from audio-visual content or associated textual information.

Available Resources

Table: Resources made available for the task.

Resources   Textual   Visual
Used        none      Videos
Not Used    Title     Low-Level and Mid-Level Features

Proposed Approach

It relies on combining learning-to-rank algorithms and exploiting only visual information:

1. A simple, yet effective, histogram of motion patterns is used for processing visual information.

2. A majority voting scheme is used for combining machine-learned rankers and predicting the interestingness of videos.

Figure: Input -> Rankers R1, R2, ..., RN -> Outputs O1, O2, ..., ON -> Combining Rankings -> Output o.
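The combination step in the diagram can be illustrated with a minimal sketch. It assumes each ranker emits one binary label per video segment (1 = interesting, 0 = not interesting); the function name `majority_vote` and the sample outputs are hypothetical and are not taken from the slides.

```python
from collections import Counter


def majority_vote(labels_per_ranker):
    """Fuse per-ranker label lists into a single label list.

    labels_per_ranker: list of lists, one inner list per ranker,
    each holding one label per video segment (assumed layout).
    """
    n_items = len(labels_per_ranker[0])
    fused = []
    for i in range(n_items):
        votes = Counter(labels[i] for labels in labels_per_ranker)
        # The label with the most votes wins. Ties fall back to the
        # first label seen, which is an arbitrary choice in this sketch.
        fused.append(votes.most_common(1)[0][0])
    return fused


# Hypothetical outputs O1, O2, ..., ON for four segments.
outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
fused = majority_vote(outputs)
```

With an odd number of rankers and binary labels, ties cannot occur, which is one reason to prefer an odd ensemble size in a scheme like this.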

Visual Features

The low-level and mid-level features provided for the task are not used.

Instead, an algorithm is applied to encode visual properties from video segments:

“Comparison of Video Sequences with Histograms of Motion Patterns”. J. Almeida, N. J. Leite, and R. S. Torres. IEEE International Conference on Image Processing (ICIP), 2011.

It relies on three steps:

1. partial decoding;
2. feature extraction;
3. signature generation.
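To give a feel for how the three steps fit together, here is a deliberately simplified toy sketch. It is not the published HMP algorithm: it assumes partial decoding has already produced a small grid of DC coefficients per I-frame, and it invents a 4-bit pattern (two temporal and two spatial comparisons) purely for illustration. The function name, bit layout, and sample values are all assumptions.

```python
def motion_pattern_histogram(frames):
    """Toy signature: normalized histogram of 4-bit comparison patterns.

    frames: list of 2D grids (lists of lists) of DC coefficients,
    one grid per I-frame (assumed output of partial decoding).
    """
    n_bins = 16  # 4 bits per pattern in this toy encoding
    hist = [0] * n_bins
    rows, cols = len(frames[0]), len(frames[0][0])
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        for r in range(rows - 1):
            for c in range(cols - 1):
                v = cur[r][c]
                bits = (
                    v > prev[r][c],     # temporal: previous frame
                    v > nxt[r][c],      # temporal: next frame
                    v > cur[r][c + 1],  # spatial: right neighbour
                    v > cur[r + 1][c],  # spatial: neighbour below
                )
                code = sum(int(b) << i for i, b in enumerate(bits))
                hist[code] += 1
    total = sum(hist)
    # Normalize so signatures of segments with different lengths are comparable.
    return [h / total for h in hist] if total else hist


# Made-up DC coefficients for three consecutive I-frames (2x2 blocks each).
frames = [
    [[10, 12], [9, 8]],
    [[7, 9], [8, 6]],
    [[6, 9], [9, 7]],
]
signature = motion_pattern_histogram(frames)
```

The point of the sketch is only the overall shape: compressed-domain values go in, local comparisons are encoded as binary patterns, and the video segment is summarized by the distribution of those patterns.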

Visual Features

Histograms of Motion Patterns (HMP) [1]

Figure: Overview of HMP extraction in three steps: (1) Partial Decoding, (2) Feature Extraction, (3) Signature Generation. Diagram labels: Video Frames, I-frames, Macroblock, Pixel Block, DC coefficient, Time Series of Macroblocks (Previous, Current, Next), Temporal and Spatial, Motion Pattern (e.g., 0101100110010011), Histogram Distribution.

[1] J. Almeida, N. J. Leite, and R. S. Torres. “Comparison of Video Sequences with Histograms of Motion Patterns”. In: ICIP. 2011, pp. 3673–3676.

Learning to Rank Strategies

Ranking SVM [2]

Uses the traditional SVM classifier to learn a ranking function.

RankNet [3]

Uses probability distribution metrics as cost functions to be optimized.

RankBoost [4]

Minimizes the regression error on weighted distributions of pairwise rankings.

ListNet [5]

An extension of RankNet that uses a ranked list instead of pairwise rankings.

Majority Voting [6]

The label with the most votes is selected as the label for a given instance.
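The pairwise view shared by several of the strategies above can be sketched as follows. This is a generic illustration of turning scored items into pairwise preferences, on which a binary classifier such as a linear SVM could then be trained; it is not the configuration used in the runs. The function name `pairwise_transform` and the sample data are hypothetical.

```python
from itertools import combinations


def pairwise_transform(features, relevance):
    """Turn scored items into (difference vector, preference label) pairs.

    features: list of feature vectors (lists of floats), one per item.
    relevance: list of relevance scores, one per item.
    """
    diffs, labels = [], []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue  # equally relevant items express no preference
        diffs.append([a - b for a, b in zip(features[i], features[j])])
        # +1 if item i should rank above item j, otherwise -1
        labels.append(1 if relevance[i] > relevance[j] else -1)
    return diffs, labels


# Hypothetical 2-D features for three segments and their labels.
X = [[0.2, 0.7], [0.5, 0.1], [0.9, 0.4]]
y = [1, 0, 1]
diffs, labels = pairwise_transform(X, y)
```

A listwise method such as ListNet skips this pair construction and defines its loss over the whole ranked list instead.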

[2] T. Joachims. “Training linear SVMs in linear time”. In: ACM SIGKDD. 2006, pp. 217–226.

[3] C. J. C. Burges et al. “Learning to rank using gradient descent”. In: ICML. 2005, pp. 89–96.

[4] Y. Freund et al. “An Efficient Boosting Algorithm for Combining Preferences”. In: Journal of Machine Learning Research 4 (2003), pp. 933–969.

[5] Z. Cao et al. “Learning to rank: from pairwise approach to listwise approach”. In: ICML. 2007, pp. 129–136.

[6] L. Lam and C. Y. Suen. “Application of majority voting to pattern recognition: an analysis of its behavior and performance”. In: IEEE Trans. Systems, Man, and Cybernetics, Part A 27.5 (1997), pp. 553–568.

Experimental Protocol

Validation: 4-fold cross validation

Development data: 5,054 video segments from 52 movie trailers

Test data: 2,342 video segments from 26 movie trailers

Evaluation metric: Mean Average Precision (MAP)
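For readers unfamiliar with the metric, the following is a minimal sketch of MAP using the common binary-relevance definition. It assumes one ranked list per movie trailer, ordered from most to least interesting according to the system, with 1 marking a segment annotated as interesting; the official scoring tool of the task may differ in details. The function names and sample rankings are hypothetical.

```python
def average_precision(ranked_labels):
    """AP of one ranked list of binary relevance labels (best rank first)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-list AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings for two trailers.
rankings = [
    [1, 0, 1, 0],  # AP = (1/1 + 2/3) / 2
    [0, 1, 0, 0],  # AP = 1/2
]
map_score = mean_average_precision(rankings)
```

Because AP is computed per trailer and then averaged, a trailer with few segments counts as much as one with many.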

Experimental Protocol

Table: Configurations of Runs

Run   Learning-to-Rank Strategy
1     Ranking SVM
2     RankNet
3     RankBoost
4     ListNet
5     Majority Voting

Experimental Results

Figure: Results obtained on the development data. (Bar chart of MAP (%) for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting; vertical axis from 10 to 20. Bar values are not recoverable from the transcript.)

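Experimental Results

Figure: Results of the official submitted runs. (Bar chart of MAP (%) for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting; vertical axis from 0 to 25. Value labels as they appear in the transcript, not necessarily in bar order: 18.15, 16.17, 16.17, 16.56, 14.35.)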

Experimental Results

Figure: AP per movie trailer achieved in each run. (Average Precision (%), axis from 0 to 70, for test trailers video-52 to video-77; one series each for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting.)

Conclusions

Remarks

The proposed approach has explored only visual properties. Different learning-to-rank strategies were considered, including a fusion of all of them.

Findings

The results obtained suggest that the proposed approach is promising. Combining learning-to-rank algorithms can contribute to better results.

Future work

Investigate a smarter strategy for combining learning-to-rank algorithms, and consider other information sources to include more features semantically related to visual content.

Acknowledgements

Organizers of Predicting Media Interestingness Task and MediaEval 2016

Brazilian funding agencies

FAPESP, CAPES, and CNPq

Thank you!!!

Thank you for your attention!!!

If you have any questions, do not hesitate to contact me:

Jurandy Almeida ([email protected])