Story-based Video Retrieval in TV series using Plot...
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
Computer Vision for Human-Computer Interaction Lab
www.kit.edu
Story-based Video Retrieval in TV series using Plot Synopses
Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
Karlsruhe Institute of Technology, Germany
03 April, ACM ICMR 2014
Gandalf falls to a Balrog of Moria
Obi-Wan cuts Darth Maul in two with his light saber
Story
[Figure: two video timelines, 0:00:00 to 2:58:00 and 0:00:00 to 2:16:00]
Goal
Idea
• Names
• Places
• Talking verbs
• Action verbs
• Objects
Related Work
Crowd-sourcing
• Freiburg et al. 2011: Concert concepts with user feedback
• Wang et al. 2013: Joint latent space for images and text
Text (transcripts) to video
• Everingham et al. 2006: Person identification
• Laptev et al. 2008: Action recognition
• Xu et al. 2008: Event detection in sports
Describing images and videos
• Farhadi et al. 2010: <object, action, scene> triplets to describe images
• Habibian et al. 2013: Video2Sentence, Sentence2Video
Text – Video Alignment
Pre-processing
Character identification
Alignment
Pre-processing
Original sentence
Buffy awakens to find Dracula in her bedroom. She is helpless against his powers and unable to stop him from biting her. When she wakes the next morning …
Part-of-speech tagging
Buffy/NNP awakens/VBZ to/TO find/VBP Dracula/NNP in/IN her/PRP bedroom/NN ./. She/PRP is/VBZ helpless/JJ against/IN his/PRP powers/NNS …
Coreference resolution
Shot boundary detection
[Figure: the original text repeated, with Names and Places marked]
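The tagged output above can be turned into a list of candidate character names by keeping the tokens tagged NNP. The following is a minimal sketch, assuming the `word/TAG` format shown on the slide; the helper names `parse_tagged` and `proper_nouns` are hypothetical and not part of the presented system.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so './.' becomes ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def proper_nouns(pairs):
    """Return the distinct NNP (proper noun) words in order of first appearance."""
    seen = []
    for word, tag in pairs:
        if tag == "NNP" and word not in seen:
            seen.append(word)
    return seen


tagged = ("Buffy/NNP awakens/VBZ to/TO find/VBP Dracula/NNP in/IN "
          "her/PRP bedroom/NN ./.")
names = proper_nouns(parse_tagged(tagged))
print(names)
```

Pronouns such as "She" and "his" would still need coreference resolution to be mapped back to these names; that step is not covered by this sketch.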
00:10:01,933 --> 00:10:04,447
So I won't be taking drama with you.
00:10:04,533 --> 00:10:08,811
- What? You have to. You promised!
- I know, but Giles said that it was
00:10:08,893 --> 00:10:11,407
- The hell with Giles.
- I can hear you, Willow.
Buffy: So I won't be taking drama with you.
Willow: What? You have to, you promised!
Buffy: Well, I know, but Giles said that it just was-
Willow: The hell with Giles.
Giles: I can hear you, Willow.
Align (fan) transcripts to subtitles
• Transcripts: who speaks what?
• Subtitles: what is spoken when?
Result: weak character labels
Weakly Labeled Data
speaking: Riley?
speaking: Willow?
Bäuml et al. 2013
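To illustrate how the two text sources complement each other, here is a minimal sketch that gives each subtitle line the speaker of its most similar transcript line. It uses Python's `difflib.SequenceMatcher` on word lists as a stand-in matcher; the slides do not state which matching method is used, and the function names and the `min_ratio` threshold are assumptions.

```python
import difflib
import re


def normalize(text):
    """Lower-case and strip punctuation, returning a list of words."""
    return re.sub(r"[^a-z' ]", " ", text.lower()).split()


# (start, end, text) for each subtitle line, taken from the slide
subtitles = [
    ("00:10:01,933", "00:10:04,447", "So I won't be taking drama with you."),
    ("00:10:04,533", "00:10:08,811", "What? You have to. You promised!"),
    ("00:10:04,533", "00:10:08,811", "I know, but Giles said that it was"),
    ("00:10:08,893", "00:10:11,407", "The hell with Giles."),
    ("00:10:08,893", "00:10:11,407", "I can hear you, Willow."),
]

# (speaker, text) for each transcript line, taken from the slide
transcript = [
    ("Buffy", "So I won't be taking drama with you."),
    ("Willow", "What? You have to, you promised!"),
    ("Buffy", "Well, I know, but Giles said that it just was-"),
    ("Willow", "The hell with Giles."),
    ("Giles", "I can hear you, Willow."),
]


def label_subtitles(subtitles, transcript, min_ratio=0.6):
    """Attach to each subtitle the speaker of the best-matching transcript line."""
    labels = []
    for start, end, text in subtitles:
        sub_words = normalize(text)
        best_speaker, best_ratio = None, 0.0
        for speaker, line in transcript:
            matcher = difflib.SequenceMatcher(None, sub_words, normalize(line))
            ratio = matcher.ratio()
            if ratio > best_ratio:
                best_speaker, best_ratio = speaker, ratio
        # Leave the subtitle unlabeled when no transcript line is close enough
        labels.append((start, end, best_speaker if best_ratio >= min_ratio else None))
    return labels


speakers = [speaker for _, _, speaker in label_subtitles(subtitles, transcript)]
print(speakers)
```

The output tells which character is speaking during each subtitle interval. These labels are weak because the speaker need not be the face visible on screen, which is why the slide marks them with question marks.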
Person id in video
Automatically identify all tracks
Train classifiers
Bäuml et al. 2013
Alignment
• Compute the similarity matrix
• Find the alignment which maximizes similarity*
[Figure: similarity matrix with axes Shots and Sentences]
A simple prior
Distribute shots equally to sentences
[Figure: panels Prior and Similarity]
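The prior can be written in a few lines. This is a minimal sketch, assuming that "equally" means each sentence receives a contiguous block of roughly the same number of shots, in order; the function name `uniform_prior` is hypothetical.

```python
def uniform_prior(num_shots, num_sentences):
    """Assign shot i to sentence floor(i * num_sentences / num_shots)."""
    return [i * num_sentences // num_shots for i in range(num_shots)]


# 10 shots spread over 3 sentences
print(uniform_prior(10, 3))
```

With the per-episode averages reported for the data set (about 720 shots and 36 sentences), this rule gives each sentence 20 consecutive shots.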
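Similarity – Identities
Matrix of similarity scores (rows: sentences, columns: shots)

| Sentence | 130 | 131 | 132 | 133 | 134 |
| --- | --- | --- | --- | --- | --- |
| Riley asks Spike about Dracula, but the former commando is warned. | $+w_{\text{Riley}}$ | $+w_{\text{Spike}} + w_{\text{Riley}}$ | $+w_{\text{Spike}}$ | $+w_{\text{Dracula}}$ | 0 |
| Buffy awakens to find Dracula in her bedroom. | 0 | 0 | 0 | $+w_{\text{Dracula}}$ | $+w_{\text{Buffy}}$ |

Note: $w_A$ represents IDF or importance of A in the episode.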
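The identity-based score can be sketched as follows: a sentence and a shot gain the weight of every character that is both mentioned in the sentence and identified in the shot. The IDF formula used here, log(number of shots / number of shots containing the character), is an assumption; the slide only says that the weight reflects IDF or importance. The character sets per shot are read off the matrix above, and the function names are hypothetical.

```python
import math


def idf_weights(shot_characters):
    """Weight each character by log(#shots / #shots in which it appears)."""
    num_shots = len(shot_characters)
    counts = {}
    for chars in shot_characters:
        for name in set(chars):
            counts[name] = counts.get(name, 0) + 1
    return {name: math.log(num_shots / k) for name, k in counts.items()}


def identity_similarity(sentence_names, shot_characters, weights):
    """similarity[j][i] = sum of weights of characters shared by sentence j and shot i."""
    return [
        [sum(weights[name] for name in set(names) & set(chars))
         for chars in shot_characters]
        for names in sentence_names
    ]


# Characters identified in shots 130..134 (as implied by the slide's matrix)
shot_characters = [{"Riley"}, {"Spike", "Riley"}, {"Spike"}, {"Dracula"}, {"Buffy"}]
# Names mentioned in the two synopsis sentences
sentence_names = [{"Riley", "Spike", "Dracula"}, {"Buffy", "Dracula"}]

weights = idf_weights(shot_characters)
sim = identity_similarity(sentence_names, shot_characters, weights)
for row in sim:
    print([round(value, 2) for value in row])
```

In practice the weights would be computed over all shots of an episode rather than over five shots, so frequently seen characters would contribute less than rare ones.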
Similarity – Subtitles
Matrix of similarity scores (rows: sentences, columns: shots)

| Sentence | 24 | 25 | 26 | 27 |
| --- | --- | --- | --- | --- |
| Giles has Willow start scanning books into a computer so there can be resources for the gang to use | +1 | +1 | 0 | 0 |
| He then tells her that he’s going to England because it seems he’s no longer needed by Buffy or the Scoobies | 0 | 0 | 0 | +2 |
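A subtitle-based score of this kind can be sketched by counting the words a synopsis sentence shares with the subtitles shown during a shot. The slide does not spell out the matching rule, so the +1 per shared word, the tiny stop-word list, and the subtitle lines below are all assumptions; the subtitle text is made up purely for illustration and is not the episode's dialogue.

```python
import re

# Tiny illustrative stop-word list (an assumption, not from the slides)
STOPWORDS = {"a", "the", "to", "he", "her", "that", "then", "he's", "i'm",
             "has", "into", "of", "on", "all", "these", "them"}


def content_words(text):
    """Lower-cased words of the text with stop words removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def subtitle_similarity(sentences, shot_subtitles):
    """similarity[j][i] = number of content words shared by sentence j and shot i."""
    return [
        [len(content_words(sentence) & content_words(sub)) for sub in shot_subtitles]
        for sentence in sentences
    ]


sentences = [
    "Giles has Willow start scanning books into a computer",
    "He then tells her that he's going to England",
]
# Hypothetical subtitles for shots 24..27; shot 26 has no dialogue
shot_subtitles = [
    "Scan all of these books.",
    "Put them on the computer.",
    "",
    "I'm going back to England.",
]

sim = subtitle_similarity(sentences, shot_subtitles)
print(sim)
```

Note that "scan" and "scanning" do not match here because the sketch does no stemming; a real system would likely normalize word forms first.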
Max Similarity
Maximize joint similarity over all shot-sentence assignments such that each shot is assigned to ONE sentence
Properties
• maximizes similarity
• breaks structure, causes jumpiness
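Because every shot is assigned independently, this objective reduces to picking the best-scoring sentence for each shot. A minimal sketch follows, with a made-up similarity matrix (`similarity[j][i]` is the score of sentence j for shot i) chosen to show the jumpiness mentioned above; the function name is hypothetical.

```python
def max_assignment(similarity):
    """For each shot, pick the sentence with the highest similarity (ties: first)."""
    num_sentences = len(similarity)
    num_shots = len(similarity[0])
    return [
        max(range(num_sentences), key=lambda j: similarity[j][i])
        for i in range(num_shots)
    ]


# Hypothetical scores: 2 sentences (rows) x 4 shots (columns)
similarity = [
    [3, 0, 2, 0],
    [0, 1, 0, 3],
]
print(max_assignment(similarity))
```

The result alternates between the two sentences from shot to shot, which is the loss of story structure that the temporally constrained variants are meant to avoid.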
DTW2
maximize similarity + each shot to ONE sentence
Consecutive shots are likely to be assigned to same (or next) sentence
Properties
• maximizes similarity with temporal consistency
• efficient computation
• can assign too many shots to one sentence
• unable to handle plot non-linearity
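The constraint that a shot goes to the same sentence as the previous shot, or to the next one, leads to a standard dynamic program over the similarity matrix. The following is a minimal sketch under additional assumptions that the slide does not state: the path starts at the first sentence, ends at the last one, and never skips a sentence. The similarity matrix is made up, and the function name `dtw2` is only borrowed from the slide title.

```python
def dtw2(similarity):
    """Best monotonic assignment: each shot stays on the previous shot's
    sentence or advances to the next one. similarity[j][i] scores sentence j
    against shot i. Returns the sentence index chosen for every shot."""
    num_sentences, num_shots = len(similarity), len(similarity[0])
    if num_shots < num_sentences:
        raise ValueError("need at least one shot per sentence")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_shots for _ in range(num_sentences)]
    # back[j][i] is 1 if the best path reaches (j, i) from sentence j - 1
    back = [[0] * num_shots for _ in range(num_sentences)]
    score[0][0] = similarity[0][0]
    for i in range(1, num_shots):
        for j in range(num_sentences):
            stay = score[j][i - 1]
            move = score[j - 1][i - 1] if j > 0 else neg_inf
            if move > stay:
                best, back[j][i] = move, 1
            else:
                best, back[j][i] = stay, 0
            if best > neg_inf:
                score[j][i] = best + similarity[j][i]
    # Trace the path back from the last sentence at the last shot
    assignment = [0] * num_shots
    j = num_sentences - 1
    for i in range(num_shots - 1, -1, -1):
        assignment[i] = j
        j -= back[j][i]
    return assignment


# Hypothetical scores: 2 sentences (rows) x 4 shots (columns)
similarity = [
    [3, 0, 2, 0],
    [0, 1, 0, 3],
]
print(dtw2(similarity))
```

On this matrix a per-shot maximum would alternate between the sentences, while the dynamic program keeps the first three shots on the first sentence. The cost is O(shots × sentences), which matches the "efficient computation" property.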
DTW3
maximize similarity + each shot to ONE sentence + temporal consistency
Regularize number of shots being assigned to one sentence
Properties
• maximizes similarity with temporal consistency
• automatically controls the number of shots assigned to a sentence
• efficient computation
• unable to handle plot non-linearity
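One way to regularize the number of shots per sentence is to add a third dimension to the dynamic program that tracks how many consecutive shots the current sentence has already received. The sketch below is an illustration of that idea, not the authors' exact formulation: the geometric `decay` applied to later shots in a run and the hard `max_run` cap are both assumptions, and the similarity matrix is made up.

```python
def dtw3(similarity, max_run, decay=0.9):
    """Monotonic alignment with a run-length dimension. State (j, k) means the
    current shot is the (k+1)-th consecutive shot of sentence j; its similarity
    is scaled by decay**k and runs longer than max_run are not allowed."""
    num_sentences, num_shots = len(similarity), len(similarity[0])
    neg_inf = float("-inf")
    prev = [[neg_inf] * max_run for _ in range(num_sentences)]
    prev[0][0] = similarity[0][0]
    back = []  # back[i - 1][j][k] = predecessor state (j', k') of shot i
    for i in range(1, num_shots):
        cur = [[neg_inf] * max_run for _ in range(num_sentences)]
        ptr = [[None] * max_run for _ in range(num_sentences)]
        for j in range(num_sentences):
            if j > 0:
                # Start sentence j after the best run of sentence j - 1
                k_best = max(range(max_run), key=lambda k: prev[j - 1][k])
                if prev[j - 1][k_best] > neg_inf:
                    cur[j][0] = prev[j - 1][k_best] + similarity[j][i]
                    ptr[j][0] = (j - 1, k_best)
            for k in range(1, max_run):
                # Continue sentence j: the run grows by one shot
                if prev[j][k - 1] > neg_inf:
                    cur[j][k] = prev[j][k - 1] + similarity[j][i] * decay ** k
                    ptr[j][k] = (j, k - 1)
        back.append(ptr)
        prev = cur
    j = num_sentences - 1
    k = max(range(max_run), key=lambda r: prev[j][r])
    if prev[j][k] == neg_inf:
        raise ValueError("no feasible alignment for this max_run")
    assignment = [0] * num_shots
    for i in range(num_shots - 1, 0, -1):
        assignment[i] = j
        j, k = back[i - 1][j][k]
    assignment[0] = j
    return assignment


# Hypothetical scores: 2 sentences (rows) x 6 shots (columns)
similarity = [
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0.9, 1],
]
print(dtw3(similarity, max_run=4))
```

Without the run-length term, the first sentence would claim five of the six shots on this matrix (total 6.0 versus 5.9); with it, the fifth shot moves to the second sentence. The extra dimension multiplies the cost by `max_run` but keeps the computation polynomial.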
Evaluation
Data set
Quantitative results
Qualitative results
Data set
• Buffy the Vampire Slayer (season 5)
• Plot synopsis from Wikipedia
– 22 episodes, 15+ hours of video
– 15700 shots
– 800 sentences
– 21000 face tracks
• Per episode,
– #shots: 540 – 940; avg. ~720
– #sentences: 22 – 54; avg. ~36
Alignment accuracy

| Method | Buffy E01 | Buffy E02 | Buffy E03 | Buffy E04 | Average E01–E22 |
| --- | --- | --- | --- | --- | --- |
| Human | 81.5 | 86.4 | 77.5 | 72.8 | – |
| Prior | 2.9 | 23.8 | 27.9 | 8.8 | 10.11 |
| Character ID MAX | 11.6 | 30.9 | 23.6 | 19.1 | – |
| Character ID DTW2 | 9.4 | 35.0 | 18.8 | 28.4 | – |
| Character ID DTW3 | 42.2 | 43.8 | 40.4 | 40.3 | 41.17 |
| Subtitles DTW3 | 20.4 | 48.4 | 35.3 | 30.1 | 37.00 |
| Char-ID+Subt. DTW3 | 40.8 | 51.3 | 41.4 | 47.6 | 49.16 |

$$\text{Accuracy} = \frac{\text{correctly assigned shots}}{\text{total number of shots}} \times 100\,\%$$
Alignment result
Application
Story-based Retrieval
Demo
Retrieval
Text Query → Plot Synopsis Retrieval → Alignment → Results → Play Video
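The retrieval flow can be sketched end to end: rank the synopsis sentences against the text query, then use the shot-sentence alignment to turn each retrieved sentence into a playable time span. This is a minimal sketch; ranking by plain word overlap is an assumption (the slides do not name the text retrieval method), and the alignment and shot times below are made-up values.

```python
import re


def words(text):
    """Set of lower-cased words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(query, sentences, assignment, shot_times, top_k=5):
    """Return up to top_k (sentence index, start, end) tuples, best match first.
    assignment[i] is the sentence aligned to shot i; shot_times[i] = (start, end)."""
    query_words = words(query)
    ranked = sorted(
        range(len(sentences)),
        key=lambda j: len(query_words & words(sentences[j])),
        reverse=True,
    )
    results = []
    for j in ranked[:top_k]:
        shots = [i for i, s in enumerate(assignment) if s == j]
        if shots and query_words & words(sentences[j]):
            results.append((j, shot_times[shots[0]][0], shot_times[shots[-1]][1]))
    return results


sentences = [
    "Buffy awakens to find Dracula in her bedroom.",
    "Riley asks Spike about Dracula, but the former commando is warned.",
    "Buffy and Dracula fight in a vicious battle",
]
# Hypothetical alignment of 5 shots and their (start, end) times in seconds
assignment = [0, 0, 1, 2, 2]
shot_times = [(0, 4), (4, 9), (9, 15), (15, 21), (21, 30)]

results = retrieve("Buffy fights Dracula", sentences, assignment, shot_times)
print(results)
```

Since "fights" and "fight" do not match without stemming, the fight sentence ties with the first sentence here; both are returned within the top five, each with the time span of its aligned shots.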
Retrieval performance
62 queries;
| Query | Ground truth (time and sentence) | top5? | Time |
| --- | --- | --- | --- |
| Buffy fights Dracula | E01: m35-36; (33) Buffy and Dracula fight in a vicious battle | | Overlap |
| Toth’s spell splits Xander into two personalities | E03: m11-12; (7) The demon hits Xander with light from a rod … (8) … but then we see another Xander | × | |
| Willow teleports Glory away | E13: m39; (34) … before Willow and Tara perform a spell to teleport Glory somewhere else | | Overlap |
| Glory sucks Tara’s mind | E19: m24-27; (15) Protecting Dawn, Tara refuses, and Glory drains Tara’s mind of sanity. | | Overlap |
| Xander proposes Anya | E22: m24-27; (6) Xander proposes Anya | | 2m44s |
Reaching the goal…
Conclusion
• Story-based retrieval in TV series
• Alignment of human-written descriptions to shots in video
• Dynamic programming based efficient solution
• 15+ hours of annotated video data
Thank you!
Story-based Video Retrieval in TV series using Plot Synopses
Makarand Tapaswi
[email protected]
cvhci.anthropomatik.kit.edu/~mtapaswi
Downloads: https://cvhci.anthropomatik.kit.edu/projects/mma