1/30
Video indexing and retrieval at TREC 2002

Christian Wolf¹ ([email protected]), David Doermann²

¹ Laboratoire de Reconnaissance de Formes et Vision
Institut National des Sciences Appliquées de Lyon
Bât. Jules Verne, 20, Avenue Albert Einstein
69621 Villeurbanne cedex, France

² Laboratory for Language and Media Processing
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742-3275, USA
2/30
Plan of the presentation

- Introduction - the TREC competition
- Features & query techniques
- Experiments & results
  - Run types
  - Example queries
  - The impact of speech/text/color
- Conclusion and outlook
3/30
The NIST Text Retrieval Conference

68.45 hours of MPEG-1 video from the Internet Archive and the Open Video Project.

The goal of the conference series is to encourage research in information retrieval from large amounts of text by providing:

- a large test collection,
- uniform scoring procedures,
- a forum for organizations interested in comparing their results.

The Video Retrieval Track investigates content-based retrieval from digital video.
4/30
Aims and Tasks

Three subtasks are defined in the Video Track, and participants are free to choose the tasks for which they want to submit results:

- Shot boundary determination
- Feature extraction
- Search

Collections:

- Feature development collection (23.6 h)
- Feature test collection (5 h)
- Search test collection (40.12 h)
5/30
Search: different query types

Two query types are supported by the competition: manual and interactive queries.
6/30
Example search topics

- Find shots with Eddie Rickenbacker in them.
- Find additional shots with James H. Chandler.
- Find pictures of George Washington.
- Find shots with a depiction of Abraham Lincoln.
- Find shots of people spending leisure time at the beach, for example: walking.
- Find shots of one or more musicians: a man or woman playing a musical instrument with instrumental music audible. Musician(s) and instrument(s) must be at least partly visible sometime during the shot.
- Find shots of football players.
- Find shots of one or more women standing in long dresses. The dress should be one piece and extend below the knees; the entire dress from top to below the knees should be visible at some point.
- Find shots of the Golden Gate Bridge.
- Find shots of Price Tower, designed by Frank Lloyd Wright and built in Bartlesville, Oklahoma.
- Find shots containing Washington Square Park's arch in New York City. The entire arch should be visible at some point.
- Find overhead views of cities, downtown and suburbs. The viewpoint should be higher than the highest building visible.
- Find shots of oil fields, rigs, derricks, and oil drilling/pumping equipment. Shots just of refineries are not desired.
- Find shots with a map (sketch or graphic) of the continental US.
- Find shots of a living butterfly.
- Find more shots with one or more snow-covered mountain peaks or ridges. Some sky must be visible behind them.
7/30
The feature extraction task: overlay text

Pipeline: detection and multiple-frame integration, binarization, OCR (ScanSoft), suppression of false alarms. Example result: "Soukaina Oufkir".

A linear classifier trained with Fisher's linear discriminant is used to classify the OCR output for each text box into text and non-text.

Text example (noisy OCR of film credits):
"TONY RIYERA ARNOLD GILLESPIE EUGENE PODDANY EMERY NAWKúN5 GEORGE GORDON GERALD NEYIUD i recto r TRUE BOAROMAN CARL URBAN Art Direction EMERY NAWKINS Music Score Director GEORGE GORDON l E W K E LLER PRODUCTION a yen Pu s1c~"

Non-text example (OCR of a non-text box):
". .ai ~ ia 7) E nAl~1I.Mol, 6I J'-Nr~vir lowre,740~17-jF 00Iis!'/"
Separation of characters into 4 classes:

- upper case: A-Z
- lower case: a-z
- digits: 0-9
- bad: the rest

Features:

$F_1 = \frac{\text{number of good characters (upper + lower + digits)}}{\text{number of characters}}$

$F_2 = \frac{\text{number of class changes}}{\text{number of characters}}$
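As a minimal sketch of these two features (assuming whitespace and punctuation count as "bad" characters, which the slide does not specify; all names are illustrative):

```python
def ocr_features(s: str):
    """Return (F1, F2) for an OCR output string.

    F1: fraction of good characters (upper case, lower case, digits).
    F2: fraction of class changes between adjacent characters.
    """
    def char_class(c: str) -> str:
        if c.isupper():
            return "upper"
        if c.islower():
            return "lower"
        if c.isdigit():
            return "digit"
        return "bad"  # assumption: whitespace and punctuation are "bad"

    if not s:
        return 0.0, 0.0
    classes = [char_class(c) for c in s]
    good = sum(cls != "bad" for cls in classes)
    changes = sum(a != b for a, b in zip(classes, classes[1:]))
    return good / len(s), changes / len(s)

# Real text scores high F1 / low F2; OCR noise tends to the opposite.
print(ocr_features("Soukaina Oufkir"))
print(ocr_features(". .ai ~ ia 7) E nAl~1I"))
```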
8/30
Features

Shot boundary definition (MPEG-7 XML): 14524 shots in the search test collection (40 h).

Donated binary features (MPEG-7 XML): 10 different binary features from different donors (32 detectors in total); a confidence value is given for each shot. Examples:

- Outdoors: IBM, MSRA, Mediamill
- Face: IBM, MSRA, Mediamill

Temporal color correlograms: developed by UMD in collaboration with the University of Oulu [Rautiainen and Doermann, 2002].

Detected and recognized text: developed by INSA de Lyon [Wolf and Jolion, 2002].

Speech recognition (donated feature, MPEG-7 XML): LIMSI and MSRA.
9/30
Query techniques

[Diagram: a query is composed from text/speech, binary features, and temporal color features.]
10/30
Recognized text and speech

For the actual retrieval we used the freely available Managing Gigabytes (MG) software (http://www.cs.mu.oz.au/mg). Two query metrics are available:

- Boolean
- Ranked, based on the cosine measure
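For reference, the ranked metric is the standard cosine measure between the query and document term-weight vectors (standard formulation, not MG-specific):

$$\cos(q, d) = \frac{\sum_t w_{q,t}\, w_{d,t}}{\sqrt{\sum_t w_{q,t}^2}\,\sqrt{\sum_t w_{d,t}^2}}$$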
MG was written for error-free documents, so it checks for exact matches on the stemmed words (e.g. "produced" matches "producer"). We added an inexact match feature by using N-grams (see the sketch below):

Target: "Nick Chandler"
Query: "chandler"
N-grams: chand|handl|andle|ndler|chandl|handle|andler|chandle|handler|chandler
Results: "ni ck l6 tia ndler", "colleges cattlemen handlers of livestock"
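A minimal sketch of the N-gram generation, assuming substrings of length 5 up to the full word (these bounds reproduce the "chandler" example above; the exact bounds used are not stated on the slide):

```python
def char_ngrams(word: str, min_n: int = 5) -> list[str]:
    """All substrings of length >= min_n, so that OCR/ASR errors
    still produce partial matches against the index."""
    word = word.lower()
    return [word[i:i + n]
            for n in range(min_n, len(word) + 1)
            for i in range(len(word) - n + 1)]

print("|".join(char_ngrams("chandler")))
# chand|handl|andle|ndler|chandl|handle|andler|chandle|handler|chandler
```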
11/30
Binary features

The binary features specify the presence of a feature in each shot, given as a confidence value in [0, 1]:

- People: IBM, Mediamill, MSRA
- Outdoors: IBM, Mediamill, MSRA
- ...
The product rule:

$Q_i(x) = \prod_j C_{ij}(x)$

Quantifies the true likelihood if the features are statistically independent. Bad if the base classifiers are weakly trained or have high error rates.

The sum rule:

$Q_i(x) = \sum_j C_{ij}(x)$

Works well with base classifiers with independent noise behaviour.

A further option: training the combining classifier.

$C_{ij}$ ... output of classifier $j$ for class $i$; $Q_i$ ... output of the combined classifier for class $i$.
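A minimal sketch of the two rules, assuming detector confidences for one semantic class are stacked into a (detectors × shots) array; names are illustrative and the values reuse the "people" example on the next slide:

```python
import numpy as np

def product_rule(C: np.ndarray) -> np.ndarray:
    # Q_i(x) = prod_j C_ij(x): the true likelihood under independence,
    # but a single confident zero vetoes the whole shot.
    return np.prod(C, axis=0)

def sum_rule(C: np.ndarray) -> np.ndarray:
    # Q_i(x) = sum_j C_ij(x): more robust when detectors are noisy.
    return np.sum(C, axis=0)

C = np.array([[0.27, 0.27],   # People - IBM        (shot 1, shot 2)
              [0.87, 0.23],   # People - Mediamill
              [0.94, 0.56]])  # People - MSRA
print(product_rule(C))  # [0.2208..., 0.0347...]
print(sum_rule(C))      # [2.08, 1.06]
```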
12/30
Binary features - ranked queries

Example: confidence values for two shots and a query vector expressing the desired features:

    Feature                 Shot 1   Shot 2   Query vector
    People - IBM            0.27     0.27     1.0
    People - Mediamill      0.87     0.23     1.0
    People - MSRA           0.94     0.56     1.0
    Outdoors - IBM          0.15     0.15     1.0
    Outdoors - Mediamill    0.08     0.76     1.0
    Indoors - IBM           0.65     0.07     0.0
Euclidean distance:

$D(x, y) = (x - y)^T (x - y)$

Mahalanobis distance:

$D(x, y) = (x - y)^T \Sigma^{-1} (x - y)$

$\Sigma$ ... covariance matrix for the complete data set.

[Figure: illustration in the 3-dimensional case.]
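A minimal sketch of ranking shots by these distances, assuming one confidence vector per shot and a binary query vector; $\Sigma$ is estimated from the complete data set as on the slide, and all names are illustrative:

```python
import numpy as np

def euclidean_sq(x, y):
    d = x - y
    return d @ d                      # (x - y)^T (x - y)

def mahalanobis_sq(x, y, sigma_inv):
    d = x - y
    return d @ sigma_inv @ d          # (x - y)^T Sigma^-1 (x - y)

rng = np.random.default_rng(0)
shots = rng.random((14524, 6))        # one confidence vector per shot
query = np.array([1, 1, 1, 1, 1, 0.0])
sigma_inv = np.linalg.pinv(np.cov(shots, rowvar=False))

# Rank all shots by distance to the query vector (smallest first).
order = sorted(range(len(shots)),
               key=lambda i: mahalanobis_sq(shots[i], query, sigma_inv))
print(order[:5])
```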
13/30
Temporal color features

For each shot, a temporal color correlogram is kept [Rautiainen and Doermann, 2002]:

$\gamma^{(d)}_{c_i, c_j} = \Pr_{p_1 \in I_{c_i},\; p_2 \in I_n} \left[\, p_2 \in I_{c_j} \;\middle|\; |p_1 - p_2| = d \,\right]$

It stores the probability that, given any pixel $p_1$ of color $c_i$, a pixel $p_2$ at distance $d$ is of color $c_j$, among the shot's frames $I_n$. The distance is calculated using the L1 norm.

TREC: auto-correlogram ($c_i = c_j$).
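A minimal sketch of an auto-correlogram ($c_i = c_j$) for one shot, pooled over its frames. It assumes frames already quantized to color indices and, as a simplification, samples only the four axis-aligned offsets at each L1 distance rather than the full L1 neighborhood:

```python
import numpy as np

def auto_correlogram(frames, n_colors, distances=(1, 3, 5, 7)):
    """frames: list of 2D int arrays of quantized color indices.
    Returns gamma[c, k] = Pr(a pixel at L1 distance distances[k]
    has the same color c as the reference pixel), pooled over frames."""
    counts = np.zeros((n_colors, len(distances)))
    totals = np.zeros((n_colors, len(distances)))
    for img in frames:
        h, w = img.shape
        for k, d in enumerate(distances):
            # axis-aligned offsets at L1 distance d
            for dy, dx in ((-d, 0), (d, 0), (0, -d), (0, d)):
                src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
                dst = img[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
                same = src == dst
                for c in range(n_colors):
                    mask = src == c
                    counts[c, k] += np.count_nonzero(same & mask)
                    totals[c, k] += np.count_nonzero(mask)
    return counts / np.maximum(totals, 1)   # avoid division by zero
```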
14/30
The query tool
15/30
Querying

- Keyword-based queries on text or speech or both together, with or without N-grams, boolean or ranked.
- Ranked color queries.
- Ranked queries on binary features.
- Filters on binary features.

[Figure: example result scores of four queries (Query 1-4), each normalized from 1.00 down to 0.00.]

- AND, OR combination of query results, including weighted combinations of the rankings of both queries (a sketch follows below).
- Truncate queries.
- View the keyframes of queries.
- Export query results into stardom, the graphical browsing tool.
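A minimal sketch of the weighted OR/AND combination; the exact fusion rule of the tool is not given on the slides, so a weighted sum of scores stands in for OR and an intersection for AND (all names are illustrative):

```python
def combine_or(results_list, weights):
    """Weighted OR: a shot scores if any sub-query returns it.
    Each results dict maps shot id -> score in [0, 1]."""
    combined = {}
    total = sum(weights)
    for results, w in zip(results_list, weights):
        for shot, score in results.items():
            combined[shot] = combined.get(shot, 0.0) + w * score / total
    return combined

def combine_and(results_list, weights):
    """Weighted AND: a shot must appear in every sub-query's result set."""
    keys = set.intersection(*(set(r) for r in results_list))
    total = sum(weights)
    return {s: sum(w * r[s] for r, w in zip(results_list, weights)) / total
            for s in keys}

speech = {101: 0.9, 102: 0.4}
color  = {101: 0.2, 103: 0.8}
print(combine_or([speech, color], weights=[4, 1]))
# {101: 0.76, 102: 0.32, 103: 0.16}
```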
18/30
Experiments

    Topic   min.   Description
    75             Eddie Rickenbacker
    76      6      James Chandler
    77      14     George Washington
    78      19     Abraham Lincoln
    79      43     People at the beach
    80      20     Musicians with music playing
    81             Football players
    82      29     Women standing in long dresses
    83      19     Golden Gate Bridge
    84      11     Price Tower
    85             Washington Square Park's arch
    86      85     Overhead views of cities
    87      61     Oil fields, rigs, derricks
    88      43     Map of the continental US
    89      61     A living butterfly
    90      12     Snow-covered mountain peaks
    91             Parrots
    92      20     Sailboats, sailing ship
    93      15     Beef or dairy cattle, cows
    94      17     People walking in cities
    95      15     Nuclear explosion with mushroom cloud
    96      39     US flag
    97      23     Microscopic views of living cells
    98      15     Locomotive approaching
    99      19     Rocket or missile taking off

Three runs were submitted:

- Manual run using all available features.
- Manual run without speech recognition.
- Interactive run using all available features. The graphical tool was used to browse the data, but all submitted results were queries submitted through the command-line tool.
19/30
Example queries

"Find additional shots of James H. Chandler", manual query:

    OR-combination of (weights in parentheses):
        text/speech: "James Chandler" (4)
        text/speech: "Jim Chandler" (4)
        text/speech, N-gram: "James Chandler" (2)
        color: 1st example video (1)
        color: 2nd example video (1)
        color: 3rd example video (1)
    AND-combined (weight 100000) with:
        binary features: People >= 0.25, Landscape <= 0.75 (weight 1)

    Result (topic 76, manual): Prec./100 = 0.2, Avg. prec. = 0.38

"Shots of rockets or missiles taking off", manual & interactive:

    OR-combination of:
        text/speech: "rocket missile" (2)
        text/speech: "taking off launch start" (2)
        color: 1st example video (1)
        color: 2nd example video (1)
    AND-combined (weight 100000) with:
        binary features, ranked: 7000, -People, -Faces (weight 1)

    Result (topic 99, manual): Prec./100 = 0.05, Avg. prec. = 0.34
20/30
Manual vs. interactive queries

Topic 79 (people at the beach), manual query:

    OR-combination of:
        text/speech: "beach" (4)
        text/speech: "beach fun sun" (3)
        text/speech: "leisure sand vacation" (2)
        color: 1st example video (1)
        color: 2nd example video (1)
        color: 3rd example video (1)
        color: 4th example video (1)
    AND-combined (weight 100000) with:
        binary features: People >= 0.25, Indoors <= 0.75, Outdoors >= 0.25 (weight 1)

    Result (topic 79, manual): Prec./100 = 0, Avg. prec. = 0

Topic 79, interactive query:

    OR-combination of:
        text/speech: "swimming" (2)
        text/speech: "shore" (1)
    OR-combined (weights 2 / 1) with:
        text/speech: "water"
        binary features: PL <= 0.5, OD >= 0.5, CT <= 0.05, ID <= 0.75, LS >= 0.5
    AND-combined (weight 100000) with:
        binary features: Landscape >= 0.3, Cityscape <= 0.5, Outdoors >= 0.5 (weights 2 / 1)

    Result (topic 79, interactive): Prec./100 = 0.07, Avg. prec. = 0.11
21/30
Ranked binary queries: distance functions

Full query:

    Topic   Eucl.   Mah.
    82      0.24    0.16
    84      0       0
    99      0       0

Binary query only:

    Topic   Query                                     Eucl.   Mah.
    82      +People +Indoors -Outdoors -Landscape     0.24    0.16
    84      +Cityscape -People -Face                  0.48    0.06
    99      -People -Face                             0.96    0.8
[Figure: histograms of the confidence values of the three "people" detectors (IBM, Mediamill, MSRA) over the shots of the collection.]
Example false alarm (query vector vs. shot vector, with the per-feature standard deviation):

    Query   Shot    Std. dev.
    1       0       0.12
    1       0.18    0.13
    0       0       0.06
    0       0.23    0
    0       0       0.16
    0       0.05    0.07
    0       0.01    0
    0       1       0.12
    0       0       0.04
22/30
Precision curves per topic

[Figures: precision vs. result set size and precision vs. recall per topic, for the three runs (manual, manual without ASR, interactive).]
23/30
Precision curves consolidated

[Figures: precision vs. result set size and precision vs. recall, consolidated over all topics, for the three runs (manual, manual without ASR, interactive).]
24/30
Comparison with other teams

Average precision - manual runs:

    ID   mean   std. dev.
    1    0.23   0.14
    2    0.14   0.17
    3    0.11   0.15
    4    0.09   0.12
    5    0.09   0.16
    6    0.08   0.11
    7    0.07   0.20
    8    0.07   0.09
    9    0.06   0.10
    10   0.06   0.18
    11   0.06   0.10
    12   0.06   0.12
    13   0.06   0.09
    14   0.06   0.10
    15   0.04   0.11
    16   0.03   0.08
    17   0.03   0.06
    18   0.03   0.05
    19   0.02   0.05
    20   0.01   0.02
    21   0.01   0.01
    22   0.01   0.01
    23   0.01   0.00
    24   0.00   0.01
    25   0.00   0.01
    26   0.00   0.01
    27   0.00   0.00
25/30
Comparison with other teams

Average precision - interactive runs:

    ID   mean   std. dev.
    1    0.52   0.24
    2    0.32   0.21
    3    0.31   0.20
    4    0.29   0.21
    5    0.26   0.21
    6    0.24   0.22
    7    0.22   0.23
    8    0.18   0.21
    9    0.15   0.15
    10   0.15   0.15
    11   0.07   0.11
    12   0.05   0.08
    13   0.05   0.08
26/30
Speech

The quality of the speech queries depends heavily on the topic. In general, the result sets of speech queries are very heterogeneous and need to be filtered, e.g. by binary filters.

Example: "rocket missile"
27/30
Color

As expected, the color filters were very useful where the query images were very different from other images in terms of low-level features, or where the relevant shots in the database share common color properties with the example query (e.g. shots in the same environment).

Query "living cells": the results of the run without speech are better than those of the run including speech.
28/30
Color

Searching for "James Chandler" using the color features only.
29/30
Recognized text

Example keyframes with recognized text: "Dance", "Energy", "Gas", "Music", "Oil", "Airline", "Air plane".

The type of video present in the collection does not favor the use of recognized text: in most of the documentaries, the only text is the title at the beginning and the credits at the end.
30/30
Conclusion and Outlook

- Exploit temporal continuities between the frames, as already proposed by the Dutch team during TREC 2001. This seems especially important for video OCR, since single shots containing only text sometimes "interrupt" content shots.
- Train the combination of features; more research into the combination of the binary features (normalization, robust outlier detection, etc.).
- Browsing: the graphical viewing interface could be very promising if tiny (and enlargeable) keyframes can be integrated into the grid.
- Additional features: explicit color filters and query by (sketched) example (define regions and color ranges); motion features; use of the internet to get example images (Google).