When textual and visual information join forces for multimedia retrieval
When Textual and Visual Information Join Forces for MultiMedia Retrieval
Bahjat Safadi, Mathilde Sahuguet, Benoit Huet
EURECOM, Multimedia Department
Sophia Antipolis, France
Introduction
• EU alone hosts 500+ online video platforms
• 42.7m hrs of footage in online archives of broadcasters and producers (61% of archive footage is online)
• UGC on the advance:
  – YouTube receives 60 hrs of video/minute
  – Vine and Instagram video
• Internet video is now 40 percent of consumer Internet traffic, and will reach 62 percent by the end of 2015, 75% in 2017 (source: CISCO)
• How to make the content accessible?
  – Browsing, Searching, Hyperlinking
B Huet - Eurecom - BAMMF - 20/06/2014
Objectives and Contributions
• We propose and evaluate a video search framework using visual information to enrich the classic text-based search for video retrieval operating at the fragment level.
• We investigate the following two questions:
  – To what extent can visual concepts contribute information when retrieving videos?
  – How can we cope with the confidence in visual concept detection?
• The framework extends conventional text-based search by fusing together textual and visual scores.
• We address both the semantic and intention gaps
  – By automatically mapping the query text to semantic concepts.
  – With the addition of “visual cues”
MediaEval Search & Hyperlinking
• Information seeking in a video dataset: retrieving media fragments/anchors
The Video Archive
2323 BBC videos of different genres (440 programs)
• ~1697 h of video + audio
• Subtitles (manual)
• Two ASR transcripts (LIMSI, LIUM)
• Metadata (Title, Cast, Description, ...)
• Shot boundaries and key-frames
• Search: 50 queries from 29 users
  – Textual query + visual cues
• Face detection
• Concept detection
Text query: Medieval history of why castles were first built
Visual cues: Castle

Text query: Best players of all time; Embarrassing England performances; Wake up call for English football; Wembley massacre;
Visual cues: Poor camera quality; heavy looking football; unusual goal celebrations; unusual crowd reactions; dark; grey; overcast; black and white;
The proposed Framework
[Diagram: framework overview. Off-line: the collection (videos, scenes and subtitles) goes through content-based indexing, which gives concept indexing scores for each scene over the visual semantic concepts; scenes and subtitles go through Lucene indexing. On-line: user querying with a textual/visual query; the textual query gives text-based scores, the visual cues give the selected concepts and the visual-based scores; fusion and ranking give the ranked list.]
The proposed Framework
[Diagram detail: content-based indexing of the collection (videos, scenes and subtitles) gives concept indexing scores for each scene over the visual semantic concepts.]
No training data for visual concepts
Use 151 visual concept detectors trained on TRECVID 2012 data
Unknown performance
Visual concept detector confidence (w)
• 100 top images for the concept “Animal”
• 58 out of 100 are manually evaluated as valid
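The "Animal" example suggests one way a per-concept confidence w could be estimated: as the fraction of top-ranked images judged valid. The slides do not spell out the exact procedure, so the sketch below is an assumption; the function name and the judgment list are hypothetical.

```python
def detector_confidence(judgments):
    """Fraction of manually judged top-ranked images that are valid.

    Assumption: w is taken as this precision over the judged top images;
    the slides only report the counts, not the exact definition of w.
    """
    if not judgments:
        raise ValueError("need at least one judged image")
    return sum(1 for ok in judgments if ok) / len(judgments)


# Hypothetical judgment list reproducing the "Animal" counts from the slide:
# 58 of the 100 top images were judged valid.
animal_judgments = [True] * 58 + [False] * 42
w_animal = detector_confidence(animal_judgments)
print(w_animal)  # 0.58
```

With w fixed to 1.0 for every concept, the same pipeline ignores detector quality, which is the other setting compared in the evaluation.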
The proposed Framework
[Diagram detail: user querying with a textual/visual query.]
<queryText>Children out on poetry trip Exploration of poetry by school children Poem writing</queryText> <visualCues>House memories Farm exploration A poem on animal and shells </visualCues>
Users are not aware of visual concepts
Mapping visual cues to visual concepts
• <queryText>Children out on poetry trip Exploration of poetry by school children Poem writing</queryText> <visualCues>House memories Farm exploration A poem on animal and shells </visualCues>
Keywords: Farm, Shells, Exploration, Poem, Animal, House, Memories
WordNet mapping (keywords to visual concepts)
Visual concepts: Animal, Birds, Insect, Cattle, Dogs, Building, School, Church, Flags, Mountain
Mapping visual cues to visual concepts
• Concepts mapped to the visual query “Castle”
• Semantic similarity computed using the “Lin” distance

Concept   Windows   Plant    Court    Church   Building
β         0.4533    0.4582   0.5115   0.6123   0.701
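A minimal sketch of how the β values for "Castle" could be filtered with the similarity threshold θ used in the evaluation slides. The β values are copied from the table; the function name and the choice θ = 0.5 are illustrative assumptions, not taken from the talk.

```python
def select_concepts(beta, theta):
    """Keep only the concepts whose similarity to the query is at least theta."""
    return {concept: score for concept, score in beta.items() if score >= theta}


# Similarity scores for the visual cue "Castle", as listed in the table.
castle_beta = {
    "Windows": 0.4533,
    "Plant": 0.4582,
    "Court": 0.5115,
    "Church": 0.6123,
    "Building": 0.701,
}

# theta = 0.5 is an arbitrary illustrative threshold.
selected = select_concepts(castle_beta, theta=0.5)
print(sorted(selected))  # ['Building', 'Church', 'Court']
```

Raising θ shrinks the selected set: at θ = 0.8 none of these five concepts would remain for this query.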
The proposed Framework
[Diagram detail: Lucene indexing gives the text-based scores; WordNet similarity gives the selected concepts and the visual-based scores; fusion and ranking combine them.]
One score for each scene (t)
One score for each scene (v): computed from the scores of the selected concepts for each scene
\[ v_i^q = \sum_{c \in C'_q} w_c \times vs_i^c \]
Fusion:
\[ f_i = t_i^{\alpha} + v_i^{1-\alpha} \]
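The scoring on this slide can be sketched as follows: the visual score of a scene is the confidence-weighted sum of its scores over the selected concepts, and the fused score is the text score raised to α plus the visual score raised to 1−α. Function names and all numeric values are hypothetical, and the sketch assumes non-negative scores so that the fractional powers are well defined.

```python
def visual_score(concept_scores, weights, selected):
    """v_i^q: sum over the selected concepts c of w_c * vs_i^c."""
    return sum(weights[c] * concept_scores[c] for c in selected)


def fused_score(t, v, alpha):
    """f_i = t_i^alpha + v_i^(1 - alpha), following the formula on the slide.

    Assumes t >= 0 and v >= 0.
    """
    return t ** alpha + v ** (1.0 - alpha)


# Hypothetical detector scores for one scene (vs_i^c) and weights (w_c).
scene_concept_scores = {"Building": 0.5, "Church": 0.25, "Dogs": 0.9}
weights = {"Building": 1.0, "Church": 1.0, "Dogs": 1.0}  # the w = 1.0 setting
selected_concepts = ["Building", "Church"]               # C'_q for this query

v = visual_score(scene_concept_scores, weights, selected_concepts)
f = fused_score(t=0.25, v=v, alpha=0.5)  # t and alpha are made-up values
print(v)  # 0.75
```

Scenes would then be ranked by f in decreasing order; swapping the `weights` dictionary for per-concept confidences gives the w = confidence(c) variant.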
Evaluation
• To what extent can visual concepts contribute information when retrieving videos?
• How can we cope with the confidence in visual concept detection?
• BBC Archive subset provided by the MediaEval 2013 Search and Hyperlinking task.
• Evaluation Measures:
  – Mean Reciprocal Rank (MRR): assesses the rank of the relevant segment
  – Mean Generalized Average Precision (mGAP): takes into account the starting time of the segment
  – Mean Average Segment Precision (MASP): measures both ranking and segmentation of relevant segments
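Of the three measures, MRR is the simplest to illustrate. The sketch below assumes the rank of the relevant segment is already known for each query; deciding whether a returned segment counts as relevant (for example, any time-window tolerance) is left out, and the ranks shown are made up.

```python
def mean_reciprocal_rank(relevant_ranks):
    """Average of 1/rank over all queries.

    relevant_ranks holds, per query, the 1-based rank of the relevant
    segment, or None when it was not retrieved (contributing 0).
    """
    if not relevant_ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / rank for rank in relevant_ranks if rank is not None)
    return total / len(relevant_ranks)


# Hypothetical ranks for four queries; the third query found nothing relevant.
ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

mGAP and MASP additionally account for segment start times and boundaries, so they need more than a single rank per query.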
Retrieval Performance (50 queries)
• Low impact of visual concept detector confidence (w)
• Significant improvement can be achieved by combining only mapped concepts with θ ≥ 0.3.
• Best performance is obtained when θ ≥ 0.8 (gain ≈ 11-12%).
[Figure: two result panels, w=1.0 and w=confidence(c)]
Visual concepts and Query association
• The number of concepts associated to queries with different threshold θ.

θ            0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Min            5     5     5     2     0     0     0     0     0     0
Max           45    45    41    37    25    19    19    12     6     2
Mean          20    19    18    15    11     7     5     3     1     1
#Q(#c’q>0)    50    50    50    50    49    49    48    44    29    21
Retrieval on queries with visual concepts (21)
• Concept mapping significantly improves the performance of the text-based search task on these queries.
• The best performance was achieved with θ ≥ 0.7 (gain ≈ 32-33%).
[Figure: two result panels, w=1.0 and w=confidence(c)]
Conclusion
• A novel video search framework using visual information to enrich a text-based search for video retrieval has been presented.
• We conducted our evaluations on MediaEval 2013, where we achieved the 2nd best result on Search and the 1st on Hyperlinking.
• Experimental results show that mapping text-based queries to visual concepts significantly improves the search system.
• When appropriately selecting the relevant visual concepts, a very significant improvement is achieved (gain ≈ 33%).
Related Publications
• B. Safadi, M. Sahuguet and B. Huet, “When textual and visual information join forces for multimedia retrieval”, ICMR 2014, ACM International Conference on Multimedia Retrieval, April 1-4, 2014, Glasgow, Scotland
• M. Sahuguet and B. Huet, “Mining the Web for Multimedia-based Enriching”, MMM 2014, 20th International Conference on MultiMedia Modeling, 8-10 January 2014, Dublin, Ireland
• M. Sahuguet, B. Huet, B. Cervenkova, E. Apostolidis, V. Mezaris, D. Stein, S. Eickeler, J-L. Redondo Garcia, R. Troncy, L. Pikora, “LinkedTV at MediaEval 2013 search and hyperlinking task”, MediaEval 2013, Multimedia Benchmark Workshop, October 18-19, 2013, Barcelona, Spain
• D. Stein, A. Öktem, E. Apostolidis, V. Mezaris, J. L. Redondo García, R. Troncy, M. Sahuguet and B. Huet, “From raw data to semantically enriched hyperlinking: Recent advances in the LinkedTV analysis workflow”, NEM Summit 2013, Networked & Electronic Media, 28-30 October 2013, Nantes, France
• V. Mezaris and B. Huet, “Video Hyperlinking”, tutorial accepted at ICIP 2014 (Oct), Paris
• B. Safadi, M. Sahuguet and B. Huet, “Linking text and visual concepts semantically for cross modal multimedia search”, ICIP 2014, Paris, 2014
Questions?
http://www.slideshare.net/huetbenoit/
Thank you.