Multimodal Features for Search and Hyperlinking of Video Content
![Page 1: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/1.jpg)
Multimodal Features for Search and Hyperlinking of Video Content
Petra Galuščáková
Institute of Formal and Applied Linguistics, Charles University in Prague
29. 10. 2014
![Page 2: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/2.jpg)
Outline
● Speech Retrieval and Hyperlinking
● Data and Evaluation
● System Description
● Passage Retrieval, Segmentation of Recordings
● Visual and Prosodic Information
![Page 3: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/3.jpg)
Speech Retrieval and Hyperlinking
![Page 4: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/4.jpg)
Search in Audio-Visual Documents
● Input:
  ● Data collection (video recordings)
  ● Query
    – Given as text
● Output:
  ● Relevant segments (passages) of documents
● Example queries: “Children out on poetry trip Exploration of poetry by school children Poem writing”, “Space-Cowboys Space Pirates Pirates in Space talking music”, “animal park, kenya marathon , wildlife reserve”
![Page 5: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/5.jpg)
Hyperlinking
● Input:
  ● Data collection (video recordings)
  ● Query segment
● Output:
  ● Segments similar to the query segment
![Page 6: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/6.jpg)
Data and Evaluation
![Page 7: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/7.jpg)
● MediaEval is a benchmarking initiative dedicated to development, comparison, and improvement of strategies for processing and retrieving multimedia content.
● E.g. speech recognition, multimedia content analysis, music and audio analysis, user-contributed information (tags, tweets), viewer affective response, social networks, temporal and geo-coordinates
● 2012 MediaEval Search and Hyperlinking Task
● 2013 MediaEval Search and Hyperlinking Task
● 2013 Similar Segments in Social Speech Task
● 2014 MediaEval Search and Hyperlinking Task
![Page 8: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/8.jpg)
Search and Hyperlinking Task
● The main goal of the Search subtask:
  ● Find passages relevant to a user’s interest, given by a textual query, in a large set of audio-visual recordings
● And of the Hyperlinking subtask:
  ● Find more passages similar to the retrieved ones
● Scenario:
  ● A user wants to find a piece of information relevant to a given query in a collection of TV programmes (Search subtask)
  ● And then navigate through a large archive using hyperlinks to the retrieved segments (Hyperlinking subtask)
![Page 9: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/9.jpg)
Search and Hyperlinking Task 2014 Data
● TV programme recordings provided by the BBC
  ● All BBC programmes broadcast during 4 months
  ● 1335 hours for training, 2686 hours for testing
● Subtitles and three ASR transcripts (LIMSI, LIUM, and NST Sheffield)
● Metadata, detected shots, stable keyframes, prosodic features
● Search: 50 training and 30 test queries
  ● E.g. sightseeing london, egypt travel, celebrity diet
● Hyperlinking: 30 training and 30 test queries
  ● Given as a query segment (beginning and end)
![Page 10: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/10.jpg)
Evaluation
● Full document retrieval → MRR
  ● RR = 1 / rank of the first correctly retrieved document
  ● MRR = average of the RR values over the set of queries
● Retrieval of the exact passages → MRR-window (MRRw)
  ● A retrieved segment counts as correct only if its starting point lies less than 60 seconds from the starting point of the relevant segment
  ● MRRw = average of the RRw values over the set of queries
● Retrieval of the exact passages → mGAP, MASP
  ● Take into account the exact beginning (end) of a relevant segment
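The RR and windowed RR definitions above can be sketched as follows. This is a minimal illustration, assuming one relevant item per query and treating the 60-second window as a symmetric tolerance around the relevant starting point; the function names and the tuple layout are hypothetical and are not taken from the official evaluation scripts.

```python
from typing import Iterable, Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """RR = 1 / rank of the first correctly retrieved document (0.0 if absent)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def reciprocal_rank_window(
    ranked: Sequence[Tuple[str, float]],
    relevant: Tuple[str, float],
    window: float = 60.0,
) -> float:
    """RRw: a hit must be in the right document and start < `window` s away.

    `ranked` and `relevant` hold (doc_id, start_in_seconds) pairs.
    """
    rel_doc, rel_start = relevant
    for rank, (doc_id, start) in enumerate(ranked, start=1):
        if doc_id == rel_doc and abs(start - rel_start) < window:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """Average over queries; MRR and MRRw are means of RR and RRw."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

MRR and MRRw are then `mean(...)` over the per-query values.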
![Page 11: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/11.jpg)
Evaluation Cont.
● MAP, P5, P10, P20
● MAP-bin
● MAP-tol
Aly R., Eskevich M., Ordelman R., and Jones G.J.F.: Adapting Binary Information Retrieval Evaluation Metrics for Segment-based Retrieval Tasks. Technical Report, 2013.
![Page 12: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/12.jpg)
System Description
![Page 13: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/13.jpg)
Passage Retrieval
● Documents are automatically divided into shorter segments
● Segments serve as documents in the traditional IR setup
● The segmentation is crucial for the quality of the retrieval
  – Especially the segment length
→ We focus on the segmentation strategies
![Page 14: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/14.jpg)
Effect of Passage Retrieval
| Segm. | Manual MRR | Manual MRRw | Manual mGAP | ASR MRR | ASR MRRw | ASR mGAP |
|---|---|---|---|---|---|---|
| None | 0.879 | 0.315 | 0.029 | 0.858 | 0.333 | 0.027 |
| Manual | 0.897 | 0.671 | 0.277 | 0.885 | 0.669 | 0.247 |

● Segmentation may substantially improve retrieval of the segment beginnings (MRRw and mGAP measures)
● Segmentation may also improve retrieval of full recordings (MRR measure)

Similar Segments in Social Speech Task 2013
![Page 15: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/15.jpg)
Baseline System
● We employ the Terrier IR toolkit
● Hiemstra language model
  ● Parameter set to 0.35 (importance of a query term in a document)
● Stopword removal, stemming
● Post-filtering of the answers
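As a rough illustration of what the 0.35 parameter controls, the sketch below scores one segment with a Hiemstra-style language model. The closed form is written from memory of the commonly cited formulation and is an assumption, not a copy of Terrier's code; the function name, argument names, and the use of the natural logarithm are hypothetical choices.

```python
import math
from collections import Counter
from typing import Mapping, Sequence


def hiemstra_lm_score(
    query_terms: Sequence[str],
    doc_terms: Sequence[str],
    collection_tf: Mapping[str, int],
    collection_tokens: int,
    lam: float = 0.35,
) -> float:
    """Sum over query terms of log(1 + lam*tf*N / ((1-lam)*cf*doc_len)).

    tf = term frequency in the segment, cf = term frequency in the
    collection, N = number of tokens in the collection.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        cf = collection_tf.get(term, 0)
        if tf == 0 or cf == 0:
            continue  # terms absent from the segment contribute nothing
        score += math.log(
            1.0 + (lam * tf * collection_tokens) / ((1.0 - lam) * cf * doc_len)
        )
    return score
```

A larger `lam` gives more weight to the occurrence of a query term in the segment itself relative to its frequency in the whole collection.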
![Page 16: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/16.jpg)
Post-filtering Effect
● MAP, P5, P10 and P20 are notably higher in the experiments in which we did not remove partially overlapping segments
● These measures do not distinguish whether a user has already seen the retrieved segment
● The overlapping segments are not expected to be as beneficial for the users

| Transcript | Filtering | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|
| Subtitles | Yes | 0.3692 | 0.7467 | 0.7133 | 0.6050 | 0.2606 | 0.2157 |
| Subtitles | No | 16.3486 | 0.8400 | 0.8367 | 0.8433 | 0.3172 | 0.0515 |

Search and Hyperlinking Task 2014 (Search subtask)
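One possible reading of the post-filtering step is sketched below: walk the ranked list and drop any segment that overlaps a higher-ranked segment from the same recording. The exact filtering rule used in the system is not given on the slide, so this greedy rule, the `Segment` type, and the function names are assumptions for illustration.

```python
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    doc_id: str
    start: float  # seconds
    end: float    # seconds
    score: float


def overlaps(a: Segment, b: Segment) -> bool:
    """True if both segments come from one recording and share some time."""
    return a.doc_id == b.doc_id and a.start < b.end and b.start < a.end


def filter_overlapping(ranked: Sequence[Segment]) -> List[Segment]:
    """Keep a segment only if it overlaps no higher-ranked kept segment.

    `ranked` is assumed to be sorted by descending retrieval score.
    """
    kept: List[Segment] = []
    for seg in ranked:
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept
```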
![Page 17: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/17.jpg)
Baseline System - Hyperlinking
● Transformed into the Search subtask
  ● The query segment is transformed into a textual query by including all the words of the subtitles lying within the segment boundaries
● Queries created from subtitles outperform ASR queries
  ● Even if we run the retrieval on the ASR transcripts
![Page 18: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/18.jpg)
System Tuning
● Metadata
  ● Concatenate metadata with each segment
  ● Title, episode title, description, short episode synopsis, service name and programme variant
  ● In Hyperlinking: concatenate metadata with the query segment
● Context
  ● In Hyperlinking: use 200 seconds before the segment beginning and after the segment end
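The transformation of a query segment into a textual query, with the context and metadata tuning described above, might look like the sketch below. The `(timestamp, token)` transcript layout, the function name, and the way metadata strings are prepended are assumptions; only the 200-second context comes from the slides.

```python
from typing import Iterable, List, Sequence, Tuple

Word = Tuple[float, str]  # (timestamp in seconds, token); hypothetical layout


def build_query(
    words: Sequence[Word],
    seg_start: float,
    seg_end: float,
    context: float = 200.0,
    metadata: Iterable[str] = (),
) -> str:
    """Collect all transcript words inside the (context-extended) segment."""
    lo, hi = seg_start - context, seg_end + context
    tokens: List[str] = [tok for ts, tok in words if lo <= ts <= hi]
    # Metadata fields (e.g. title, synopsis) are simply concatenated in front.
    return " ".join(list(metadata) + tokens)
```

Setting `context=0.0` and leaving `metadata` empty reproduces the untuned baseline query.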
![Page 19: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/19.jpg)
System Tuning Cont.
● Search

| Transcript | Tuning | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|
| Subtitles | None | 0.4209 | 0.7933 | 0.7433 | 0.5950 | 0.3192 | 0.3155 |
| Subtitles | Metadata | 0.5127 | 0.7467 | 0.7267 | 0.6100 | 0.3538 | 0.3023 |

● Hyperlinking

| Transcript | Tuning | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|
| Subtitles | None | 0.1147 | 0.3071 | 0.2786 | 0.2036 | 0.1021 | 0.0792 |
| Subtitles | Metadata+Context | 0.4072 | 0.8067 | 0.7000 | 0.5417 | 0.2611 | 0.2237 |

Search and Hyperlinking Task 2014
![Page 20: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/20.jpg)
Segmentation Strategies
![Page 21: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/21.jpg)
Segmentation Types
● Fixed-length (Window-based)
  ● Segments of equal length with a regular shift
  ● Claimed to be a very effective approach
● Similarity-based
  ● Measure the similarity between neighboring segments (e.g. cosine distance)
  ● Algorithms TextTiling and C99
● Lexical-chain-based
  ● A lexical chain is a sequence of lexically related word occurrences
● Feature-based
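The fixed-length (window-based) strategy can be sketched as a sliding window over a time-stamped transcript. The `(timestamp, token)` layout, the function name, and the default shift of 10 seconds are assumptions for illustration; the 60-second length matches the fixed-length setting reported later in the slides.

```python
from typing import List, Sequence, Tuple

Word = Tuple[float, str]  # (timestamp in seconds, token); hypothetical layout


def fixed_length_segments(
    words: Sequence[Word], length: float = 60.0, shift: float = 10.0
) -> List[Tuple[float, float, List[str]]]:
    """Cut a transcript into windows of `length` s, starting every `shift` s.

    `words` must be sorted by timestamp. A shift smaller than the length
    yields overlapping segments.
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    if not words:
        return []
    segments = []
    start, last = words[0][0], words[-1][0]
    while start <= last:
        end = start + length
        tokens = [tok for ts, tok in words if start <= ts < end]
        if tokens:  # skip windows that fall into silence
            segments.append((start, end, tokens))
        start += shift
    return segments
```

Each returned segment is then indexed as a separate document in the IR system.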
![Page 22: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/22.jpg)
Fixed-Length Segmentation Comparison
● S – Sentence
● Sh – Shot
● Sp – Speech Segment
● TP – Time + Pause
● TO – Time + Overlap (Fixed-Length Segment)
Search and Hyperlinking Task 2012 (Search Subtask)
M. Eskevich et al.: Multimedia information seeking through search and hyperlinking, ICMR 2013.
![Page 23: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/23.jpg)
Fixed-length Segmentation: Segment Length
Search and Hyperlinking Task 2013 (Search subtask)
![Page 24: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/24.jpg)
Fixed-length Segmentation: Segment Shift
Search and Hyperlinking Task 2013 (Search subtask)
![Page 25: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/25.jpg)
Feature-based Segmentation
![Page 26: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/26.jpg)
Feature-based Segmentation
● We identify possible segment boundaries (beginnings and ends)
● J48 decision trees (almost equivalent to C4.5), Weka framework
● Training data available for the Similar Segments in Social Speech Task, MediaEval 2013
  ● Manually marked segments
  ● Conversations between university students
● Binary classification problem
  ● For each word in the transcripts, we predict whether a segment boundary occurs after this word
  ● Classes: segment boundary and segment continuation
![Page 27: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/27.jpg)
Used Features
● Cue words and tags
  ● N-grams which frequently appear at segment boundaries
  ● N-grams most informative for segment boundaries
  ● Manually defined n-grams
● Letter cases
● Length of the silence before the word
  ● Measured as the difference between the timestamps of two adjacent words
● Division given in the transcripts (e.g., speech segments defined in the LIMSI transcripts)
● The output of the TextTiling algorithm
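A few of the features listed above can be computed per word as in the sketch below. It covers only silence length, letter case, and cue n-grams; the feature names, the tiny cue set, and the `(timestamp, token)` layout are hypothetical, and the resulting dictionaries would still have to be fed to a classifier such as a decision tree.

```python
from typing import Dict, FrozenSet, Sequence, Tuple, Union

Word = Tuple[float, str]  # (timestamp in seconds, token); hypothetical layout

# Illustrative cue n-grams only; the real lists were learned or hand-defined.
CUE_NGRAMS: FrozenSet[str] = frozenset({"if", "i'm", "are you", "you have"})


def word_features(
    words: Sequence[Word], i: int, cue_ngrams: FrozenSet[str] = CUE_NGRAMS
) -> Dict[str, Union[float, bool]]:
    """Features of the i-th word for boundary / continuation classification."""
    ts, tok = words[i]
    prev_ts = words[i - 1][0] if i > 0 else ts
    bigram = f"{tok} {words[i + 1][1]}".lower() if i + 1 < len(words) else ""
    return {
        # Silence approximated by the gap between adjacent word timestamps.
        "silence_before": ts - prev_ts,
        "capitalized": tok[:1].isupper(),
        "cue_unigram": tok.lower() in cue_ngrams,
        "cue_bigram": bigram in cue_ngrams,
    }
```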
![Page 28: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/28.jpg)
Most Informative Features
● Division defined in the transcripts
● The length of silence
  ● Especially if it is longer than 300 ms, 400 ms, 500 ms, or 600 ms
● TextTiling algorithm output
● Segment beginnings: “if”, “I’m”, “especially”, “the”, “are you”, “you have”, “VBP PRP VBG”, …
● Segment ends: “good”, “interesting”, “lot”, …
![Page 29: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/29.jpg)
Feature-based Segmentation Approaches
![Page 30: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/30.jpg)
Feature-based Segmentation Approaches Comparison
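| Beg. | End | MRR | MRRw | mGAP | #Seg | Len [s] |
|---|---|---|---|---|---|---|
| -- | -- | 0.656 | 0.052 | 0.027 | 2 k | 2531.6 |
| Reg | Reg | 0.671 | 0.388 | 0.245 | 234 k | 49.5 |
| ML | -- | 0.549 | 0.117 | 0.060 | 3125 k | 2.3 |
| -- | ML | 0.607 | 0.310 | 0.192 | 280 k | 29.0 |
| ML | B+50 | 0.685 | 0.412 | 0.272 | 5820 k | 49.6 |
| E+50 | ML | 0.715 | 0.428 | 0.298 | 2580 k | 49.6 |
| ML | ML | 0.626 | 0.392 | 0.229 | 5659 k | 20.2 |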
Search and Hyperlinking Task 2013 (Search subtask), Results on the subtitles
![Page 31: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/31.jpg)
Feature-based Segmentation vs. Fixed-Length Segmentation
● Search Task

| Transcript | Segm. | Seg. Len. | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|---|
| Subtitles | Fixed | 60s | 0.5127 | 0.7467 | 0.7267 | 0.6100 | 0.3538 | 0.3023 |
| Subtitles | Features | 50s | 0.8028 | 0.7867 | 0.7667 | 0.6933 | 0.3199 | 0.2350 |

● Hyperlinking Task

| Transcript | Segm. | Seg. Len. | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|---|
| Subtitles | Fixed | 60s | 0.4366 | 0.8667 | 0.7700 | 0.5633 | 0.2724 | 0.2580 |
| Subtitles | Features | 50s | 0.8253 | 0.8867 | 0.8567 | 0.7383 | 0.2525 | 0.1991 |

Search and Hyperlinking Task 2014
![Page 32: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/32.jpg)
Visual Information in Segmentation
● Training data used for segmentation tuning are visually static → visual information would not be helpful
● Create a segment boundary only if the visual similarity between adjacent segments < weight
● Tune the weight on the Search and Hyperlinking training data

| Transcript | Segm. | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|
| Subtitles | Features | 0.8028 | 0.7867 | 0.7667 | 0.6933 | 0.3199 | 0.2350 |
| Subtitles | Features+Visual | 0.7701 | 0.7600 | 0.7500 | 0.6733 | 0.3285 | 0.2530 |
Search and Hyperlinking Task 2014 (Search Subtask)
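The visual gating rule can be sketched as a filter over candidate boundaries: a boundary proposed by the text-based segmenter survives only if the visual similarity across it is below the tuned weight. How the similarity is computed is left out here; the `(time, similarity)` pairs and both function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def keep_boundaries(
    candidates: Sequence[Tuple[float, float]], threshold: float
) -> List[float]:
    """Keep boundary times whose cross-boundary visual similarity < threshold.

    `candidates` holds (boundary_time_seconds, visual_similarity) pairs.
    """
    return [t for t, sim in candidates if sim < threshold]


def segments_from_boundaries(
    start: float, end: float, boundaries: Sequence[float]
) -> List[Tuple[float, float]]:
    """Turn the surviving boundaries into consecutive (start, end) segments."""
    points = [start] + sorted(b for b in boundaries if start < b < end) + [end]
    return list(zip(points[:-1], points[1:]))
```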
![Page 33: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/33.jpg)
Visual and Prosodic Similarity
![Page 34: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/34.jpg)
Visual Similarity
![Page 35: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/35.jpg)
Visual Similarity Cont.
● We use Feature Signatures and Signature Quadratic Form Distance
http://siret.ms.mff.cuni.cz
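For orientation, the Signature Quadratic Form Distance between two feature signatures can be sketched as below. A signature is taken to be a list of (centroid, weight) pairs; the Gaussian similarity function and the `alpha` value are assumptions, since the slide does not say which similarity function or parameters were used.

```python
import math
from typing import List, Sequence, Tuple

Signature = List[Tuple[Sequence[float], float]]  # (centroid, weight) pairs


def gaussian_similarity(a: Sequence[float], b: Sequence[float], alpha: float) -> float:
    """exp(-alpha * squared Euclidean distance); one common choice, assumed here."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-alpha * d2)


def sqfd(sig_q: Signature, sig_o: Signature, alpha: float = 1.0) -> float:
    """SQFD = sqrt(w A w^T), with w = (w_q | -w_o) over the concatenated centroids
    and A[i][j] = similarity(centroid_i, centroid_j)."""
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))
```

Because the two signatures may have different numbers of centroids, the distance works directly on keyframes with differently sized descriptions.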
![Page 36: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/36.jpg)
Visual Similarity Results
| Transcript | Meta. | Weights | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|---|
| Subtitles | No | None | 0.1618 | 0.4786 | 0.4107 | 0.2893 | 0.1423 | 0.1216 |
| Subtitles | No | Visual | 0.1660 | 0.4929 | 0.4143 | 0.3000 | 0.1483 | 0.1245 |
| Subtitles | Yes | None | 0.4301 | 0.8600 | 0.7767 | 0.5483 | 0.2689 | 0.2465 |
| Subtitles | Yes | Visual | 0.4366 | 0.8667 | 0.7700 | 0.5633 | 0.2724 | 0.2580 |
| LIMSI | Yes | None | 0.4166 | 0.8533 | 0.7133 | 0.5450 | 0.2659 | 0.2297 |
| LIMSI | Yes | Visual | 0.4168 | 0.8667 | 0.7333 | 0.5400 | 0.2692 | 0.2414 |
| LIUM | Yes | None | 0.4226 | 0.8333 | 0.7300 | 0.5433 | 0.2593 | 0.2547 |
| LIUM | Yes | Visual | 0.4212 | 0.8400 | 0.7367 | 0.5350 | 0.2622 | 0.2632 |
| NST | Yes | None | 0.4072 | 0.8067 | 0.7000 | 0.5417 | 0.2611 | 0.2237 |
| NST | Yes | Visual | 0.4160 | 0.8267 | 0.7167 | 0.5483 | 0.2655 | 0.2440 |
Search and Hyperlinking Task 2014 (Hyperlinking subtask)
![Page 37: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/37.jpg)
Visual Similarity Results: Positive Query Examples
![Page 38: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/38.jpg)
Visual Similarity Results: Negative Query Examples
![Page 39: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/39.jpg)
Prosodic Similarity
![Page 40: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/40.jpg)
Prosodic Similarity Results
| Transcript | Meta. | Weights | MAP | P5 | P10 | P20 | MAP-bin | MAP-tol |
|---|---|---|---|---|---|---|---|---|
| Subtitles | Yes | None | 0.4301 | 0.8600 | 0.7767 | 0.5483 | 0.2689 | 0.2465 |
| Subtitles | Yes | Prosodic | 0.4321 | 0.8533 | 0.7767 | 0.5517 | 0.2687 | 0.2473 |
Search and Hyperlinking Task 2014 (Hyperlinking subtask)
● Small but promising improvement
![Page 41: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/41.jpg)
System Comparison
![Page 42: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/42.jpg)
Search Task
![Page 43: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/43.jpg)
Hyperlinking Task
![Page 44: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/44.jpg)
Conclusion
![Page 45: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/45.jpg)
Conclusion
● Passage Retrieval
  ● Improves retrieval of relevant segments
  ● Can improve retrieval of full recordings
● Segmentation approach is crucial for the retrieval
  ● Fixed-length segmentation works well
  ● Feature-based segmentation outperforms fixed-length segmentation
● Visual and prosodic similarity can improve results of text-based retrieval
![Page 46: Multimodal Features for Search and Hyperlinking of Video Content](https://reader033.fdocuments.in/reader033/viewer/2022060201/559ab2b21a28aba2378b4670/html5/thumbnails/46.jpg)
Thank you
This research is supported by the Charles University Grant Agency (GA UK n. 920913)