Video Google: A Text Retrieval Approach to Object Matching...
Transcript of Video Google: A Text Retrieval Approach to Object Matching...
![Page 1: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/1.jpg)
Video Google: A Text Retrieval Approach to Object Matching in Videos
Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman
Visual Geometry GroupUniversity of Oxford
![Page 2: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/2.jpg)
The vision …Enable video, e.g. a feature length film, to be searched on its visual content with the same ease and success as a Google search of text documents.
“Groundhog Day” [Rammis, 1993]“Run Lola Run” (‘Lola Rennt’) [Tykwer, 1999]
![Page 3: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/3.jpg)
Visually defined search
Example : Groundhog Dayclose-up
Given an object specified in one frame, retrieve all shots containing the object:
• must handle viewpoint change
• must be efficient at run time
![Page 4: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/4.jpg)
Example: Groundhog Day
12 35 69Rank:
73 keyframes retrieved
53 correct, first incorrect ranked 27
50
![Page 5: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/5.jpg)
Example : Run Lola Run
rank 9
query region close-up
Ranked returned key-frames (selection)
rank 16
rank 22 rank 25
![Page 6: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/6.jpg)
Outline
1. Viewpoint invariant content based image retrieval• match key frames
• e.g. Groundhog Day: 170K frames, 1K shots, 5K key frames
2. Benefits of using video over individual images• use information from all frames
3. Lessons from text retrieval• represent frames by a set of ‘visual words’
4. Extensions• external objects
• object aspects
![Page 7: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/7.jpg)
1. Viewpoint invariant content based image retrieval
![Page 8: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/8.jpg)
Problem statement
Various key frames from ‘Run Lola Run’
Retrieve key frames containing the same object
![Page 9: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/9.jpg)
Approach
• Determine regions (segmentation) and vector descriptors in each frame which are invariant to camera viewpoint changes
• Match descriptors between frames using invariant vectors
![Page 10: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/10.jpg)
Invariance requirements
• A significant translation results in differingsurface foreshortening (also scale changes)
• also differing intensity
![Page 11: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/11.jpg)
Local invariance requirements
• Geometric: 2D affine transformation
where A is a 2x2 non-singular matrix
• Photometric: 1D affine transformation
baII += )()( 21 xx
• Objective: compute image descriptors invariant to this class of transformations
![Page 12: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/12.jpg)
Viewpoint covariant segmentation
• Characteristic scales (size of region)• Lindeberg and Garding ECCV 1994
• Lowe ICCV 1999
• Mikolajczyk and Schmid ICCV 2001
•Affine covariance (shape of region)• Baumberg CVPR 2000
• Matas et al BMVC 2002
• Mikolajczyk and Schmid ECCV 2002
• Schaffalitzky and Zisserman ECCV 2002
• Tuytelaars and Van Gool BMVC 2000
• Miscellaneous others•Tell and Carlsson ECCV 2000
![Page 13: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/13.jpg)
Covariant regions, type I
1. Detect interest points
2. Determine scale using extrema of Laplacian of Gaussian
3. Determine elliptical shape using stationary condition on second moment gradient matrix
See Mikolajczyk & Schmid, and Schaffalitzky & Zisserman ECCV 2002
Shape adapted interest point neighbourhoods – cover same scene region
![Page 14: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/14.jpg)
Covariant regions, type II:
1. Segment using watershed algorithm, and track connected components as threshold value varies.
2. An MSR is detected when the area of the component is stationary
See Matas et al BMVC 2002
Maximally Stable regions (MSR)
![Page 15: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/15.jpg)
Example: Maximally stable regions
![Page 16: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/16.jpg)
Example of covariant regions
1000+ descriptors per frame Shape adapted regions
Maximally stable regions
![Page 17: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/17.jpg)
Viewpoint invariant description
• Elliptical viewpoint covariant regions• Shape Adapted regions
• Maximally Stable Regions
• Map ellipse to circle and orientate by dominant direction
• Represent each region by SIFT descriptor (128-vector) [Lowe 1999]• see Mikolajczyk and Schmid CVPR 2003 for a comparison
![Page 18: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/18.jpg)
Return to Problem statement
Various key frames from ‘Run Lola Run’
Retrieve key frames containing the same object
![Page 19: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/19.jpg)
Pros and cons of a set of viewpoint invariant descriptors
Advantage: • local descriptors are robust to partial occlusion
• global descriptors e.g. colour histograms and texture histograms, are not
Disadvantage:• too much individual invariance
• each region can rotate independently (by different amounts)
• use semi-local and global spatial relations to verify matches
![Page 20: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/20.jpg)
Approach
• Determine regions (segmentation) and vector descriptors in each frame which are invariant to camera viewpoint changes
• Match descriptors between frames using invariant vectors
• Verify / reject frame matches using spatial consistency and multiple view geometric relations:
• spatial neighbours match• homographies• epipolar geometry
• Frames are matched if a sufficient number of regions satisfy the geometric relations between views
![Page 21: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/21.jpg)
Example
512 x 768 pixel images
![Page 22: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/22.jpg)
Goal: establish matches
![Page 23: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/23.jpg)
A sub-set of the shape adapted interest points
2300 interest points reduced to one third (770) after adaptation
![Page 24: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/24.jpg)
Stage (1): invariant indexing
Match closest vectors• all vectors within a ball in invariant space
![Page 25: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/25.jpg)
Stage (2): neighbourhood consensus
Loose constraint on spatial consistency fora putative match
• compute the K closest matched neighbours of the point in each image
• require that at least N of these must match
Here K =10, N = 1
?
![Page 26: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/26.jpg)
Stricter photometric and geometric filters can be applied
e.g. 1: local verificationMatched invariants do not imply that the regions match
• regions match if cross-correlation after registration is below threshold
e.g. 2: local geometry• neighbouring correspondences consistent with the affine transformation
image 1 image 2
A
![Page 27: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/27.jpg)
Detail
key-frames matched points
shot4
shot9
![Page 28: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/28.jpg)
2 7 6
pairs
2
Stage (1)invariantindexing
Stage (2)neighbourhood
consensus
rank retrieved key frames by number of matches
![Page 29: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/29.jpg)
2. Video: the benefits of having contiguous frames
![Page 30: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/30.jpg)
Track regions through shot …
Constant velocity dynamical model and correlation
Shape adapted regions
Maximally stable regions
![Page 31: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/31.jpg)
The benefits of contiguous frames within a shot …
Track regions throughout a shot and aggregate information:
1. Reject unstable tracks (threshold on track length)• Increases discrimination of descriptors
2. Compute mean descriptor over track • Increases accuracy (signal to noise) of descriptor
3. Compute noise covariance over tracks• Defines Mahalanobis distance for descriptor space
![Page 32: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/32.jpg)
One region detail
full frame
affine normalized
region
region close-up
benefit of SIFT
![Page 33: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/33.jpg)
A region tracked over 70 frames
Close-ups of sample frames
First 8 dimensions (columnwise) of the SIFTdescriptor evolving with time.
![Page 34: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/34.jpg)
3. Lessons from text retrieval –google like object retrieval
![Page 35: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/35.jpg)
Text retrieval overview
Stemming
Stop-list
Inverted file
Reject the very common words, e.g. “the”, “a”, “of”
Represent words by stems, e.g. “walking”, “walks” -> “walk”
Ideal book index: Term List of hits (occurrences in documents)
People [d1:hit hit hit], [d4:hit hit] …
Common [d1:hit hit], [d3: hit], [d4: hit hit hit] …
Sculpture [d2:hit], [d3: hit hit hit] …
• word matches are pre-computed
![Page 36: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/36.jpg)
Ranking • frequency of words in document (tf-idf)
• proximity weighting (google)
• PageRank (google)
![Page 37: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/37.jpg)
Building a visual vocabulary
Vector quantize descriptors• k-means clustering with Mahalanobis distance
+
+
Implementation• use 48 shots from Lola = 200K track descriptors (means)
• 6K clusters for Shape Adapted regions
• 10K clusters for Maximally Stable regions
![Page 38: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/38.jpg)
Why cluster separately ?
• the two types of regions cover different and independent scene regions
• they may be thought of as different vocabularies for describing the scene
Shape adapted regions
Maximally stable regions
![Page 39: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/39.jpg)
Samples of visual words:
Maximally stable regionsShape adapted regions
![Page 40: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/40.jpg)
Matching on visual words
• No loss of matching accuracy against standard nearest neighbour matching
• Use same visual vocabulary for matching frames outside training shots in Lola i.e. for unseen shots
• Use same visual vocabulary for matching within Groundhog Day
![Page 41: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/41.jpg)
Video representation
“histogram” of visual words
appearing in the frame
assign visual words
• in advance: represent all key frames by visual words
• video is represented as a (weighted) matrix of occurrences
visual word list
key frames
store frequency and position of words
![Page 42: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/42.jpg)
Matching stages for query region
Stage 1: match to key frames based on the visual word histogram
visual word histogram for query region
visual word list
key frames
score kf 1
score kf 2
3
0
4
0
3040
![Page 43: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/43.jpg)
• Discard mismatches• require spatial agreement with the neighbouring matches
• Compute matching score• score each match with the number of agreement matches
• accumulate the score from all matches
Image 1 Image 2
Stage 2: spatial consistency
NB very weak measure of spatial consistency
![Page 44: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/44.jpg)
Example
query region
![Page 45: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/45.jpg)
Visual indexing using text retrieval methods
Initial matches:• on common visual words
Stop-list:• Stop top 5% (large clusters of over 3K points) and bottom 10%
Spatial consistency ranking on 15 nearest neighbours:
• reject matches with no support
• rank frame by number of supporting matches
![Page 46: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/46.jpg)
Example : Run Lola Run
1 12 16 20Rank:
20 keyframes retrieved
all correct
![Page 47: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/47.jpg)
Appraisal
• matches on visual words are “pre-computed”, so at run-time retrieval is immediate (0.1s on a 2Gh machine)
• can search for objects and combinations of objects that were not considered when matches pre-computed
• only failures (against ground truth) are when descriptors are missing: e.g. motion or optical blur
![Page 48: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/48.jpg)
4. Extensions
![Page 49: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/49.jpg)
Extension I: searching from other sources
So far query frame chosen from within video ….
Extend to use external images to search for particular objects or places
image of object
detect affine co-variant regions
vector quantize onto clusters
visual word histogram
e.g. for product placement or company logos or particular type of vehicle or building
![Page 50: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/50.jpg)
Sony logo
Retrieve shots from Lola and Groundhog Day
![Page 51: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/51.jpg)
Retrieved shots in groundhog day for search on Sony logo
![Page 52: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/52.jpg)
Extension II: Object level grouping
The problem: searching on one object visual aspect will not return other aspects
A solution: use motion within a shot to group aspects
![Page 53: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/53.jpg)
Motion based grouping example
Input shot
• Track detected regions through shot• Group tracks into independently moving objects using
rigidity• Use tracks for each object to associate visual aspects
![Page 54: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/54.jpg)
Off the shelf tracking
Tracks repaired using inter-frame matching
• jump short gaps
• fill in missed regions
only tracks longer than 10 frames shown
![Page 55: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/55.jpg)
Grouping using robust affine factorization
3 largest groups
![Page 56: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/56.jpg)
query (portal) frame with outlined query
region
Object level matches
associated query frames
Extend from image based retrieval
to accessing an object model defined over multiple images
![Page 57: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/57.jpg)
Query frame with outlined query region
Object level matches – van example
Examples of retrieved frames
![Page 58: Video Google: A Text Retrieval Approach to Object Matching ...cmp.felk.cvut.cz/cmp/events/colloquium-2004-01-12/... · e.g. 1: local verification Matched invariants do not imply that](https://reader033.fdocuments.in/reader033/viewer/2022051904/5ff59d2e0759151dd5714099/html5/thumbnails/58.jpg)
Futures – research issues• Segmentation that commutes with viewpoint
• have seen a few examples here, more are required to cope with varying scene structure
• Deformable/articulated objects – Luc Van Gool’s talk
• Import further ideas from text retrieval, e.g. latent semantic indexing to find visual content
• Further reading: papers at ICCV 03 and ECCV 04