Taking Computer Vision Into The Wild
Neeraj Kumar
October 4, 2011 CSE 590V – Fall 2011
University of Washington
A Joke
Q. What is computer vision?
A. If it doesn’t work (in the wild), it’s computer vision.
(I’m only half-joking)
Instant Object Recognition Paper*
1. Design new algorithm
2. Pick dataset(s) to evaluate on
3. Repeat until conference deadline:
a. Train classifiers
b. Evaluate on test set
c. Tune parameters and tweak algorithm
4. Brag about results with ROC curves
- Fixed set of training examples
- Fixed set of classes/objects
- Training examples only have one object, often in center of image
- Fixed test set, usually from same overall dataset as training
- MTurk filtering, pruning responses, long training times, …
- How does it do on real data? New classes?
*Just add grad students
Instant Object Recognition Paper
1. User proposes new object class
2. System gathers images from flickr
3. Repeat until convergence:
a. Choose windows to label
b. Get labels from MTurk
c. Improve classifier (detector)
4. Also evaluate on Pascal VOC
[S. Vijayanarasimhan & K. Grauman – Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds (CVPR 2011)]
- Which windows to pick?
- Which images to label?
- What representation?
- How does it compare to state of the art?
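The loop in step 3 can be sketched in a few lines. This is only an illustration under loud assumptions: `ask_crowd` is a hypothetical stand-in for the MTurk labeling step, the "windows" are plain feature tuples, and the classifier is a small logistic-regression model rather than the detector used in the paper. Selection here is brute-force "closest to the decision boundary".

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def score(w, x):
    """Signed score of a linear classifier; the last weight is the bias."""
    return dot(w, list(x) + [1.0])

def train_linear(X, y, epochs=200, lr=0.1):
    """Fit weights by plain logistic-regression gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            xb = list(x) + [1.0]
            p = 1.0 / (1.0 + math.exp(-dot(w, xb)))
            for j, xj in enumerate(xb):
                grad[j] += (p - t) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def live_active_learning(pool, ask_crowd, seed_ids, rounds=5, batch=10):
    """pool: unlabeled feature tuples; ask_crowd(i) returns a 0/1 label (hypothetical)."""
    labels = {i: ask_crowd(i) for i in seed_ids}
    for _ in range(rounds):
        ids = sorted(labels)
        w = train_linear([pool[i] for i in ids], [labels[i] for i in ids])
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        # a. choose the examples the current classifier is least sure about
        unlabeled.sort(key=lambda i: abs(score(w, pool[i])))
        # b. get their labels from the crowd
        for i in unlabeled[:batch]:
            labels[i] = ask_crowd(i)
    # c. retrain on everything labeled so far
    ids = sorted(labels)
    return train_linear([pool[i] for i in ids], [labels[i] for i in ids]), labels
```

Sorting the whole pool by margin every round is what becomes too slow at scale, which is the motivation for the hashing approach later in the talk.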
Object Representation
Root from here
Deformable Parts: Root + Parts + Context
P=6 parts, from bootstrap set
C=3 context windows, excluding object candidate, defined to the left, right, above
Features: Sparse Max Pooling
| | Bag of Words | Sparse Max Pooling |
|---|---|---|
| Base features | SIFT | SIFT |
| Build vocabulary tree | ✔ | ✔ |
| Quantize features | Nearest neighbor, hard decision | Weighted nearest neighbors, sparse coded |
| Aggregate features | Spatial pyramid | Max pooling |
[Y.-L. Boureau, F. Bach, Y. LeCun, J. Ponce – Learning Mid-level Features for Recognition (CVPR 2010)]
[J. Yang, K. Yu, Y. Gong, T. Huang – Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification (CVPR 2009)]
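To make the two columns concrete, here is a toy sketch of both pipelines on made-up low-dimensional descriptors. The soft coding is a simplification I am assuming for illustration: weights spread over the `knn` nearest codewords and normalized to sum to one. It is not the sparse-coding optimization from the cited papers, and the spatial pyramid is omitted.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words(descriptors, codebook):
    """Hard decision: each descriptor adds one count to its nearest codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sqdist(d, codebook[k]))
        hist[nearest] += 1
    return hist

def soft_code(d, codebook, knn=2, sigma=1.0):
    """Weighted nearest neighbors: nonzero weights on only the knn closest codewords."""
    order = sorted(range(len(codebook)), key=lambda k: sqdist(d, codebook[k]))[:knn]
    dmin = sqdist(d, codebook[order[0]])
    # subtracting dmin keeps the nearest weight at 1.0 and avoids underflow
    raw = {k: math.exp(-(sqdist(d, codebook[k]) - dmin) / (2 * sigma ** 2)) for k in order}
    total = sum(raw.values())
    code = [0.0] * len(codebook)
    for k, v in raw.items():
        code[k] = v / total
    return code

def sparse_max_pool(descriptors, codebook, knn=2, sigma=1.0):
    """Max pooling: keep, per codeword, the strongest response over all descriptors."""
    pooled = [0.0] * len(codebook)
    for d in descriptors:
        code = soft_code(d, codebook, knn, sigma)
        pooled = [max(p, c) for p, c in zip(pooled, code)]
    return pooled
```

The histogram grows with the number of descriptors, while the max-pooled vector only records whether (and how strongly) each codeword appeared at all.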
How to Generate Root Windows?
100,000s of possible locations, aspect ratios, sizes
× 1000s of images
= too many possibilities!
Jumping Windows
[Figure panels: Training Image, Novel Query Image]
• Build lookup table of how frequently given feature in a grid cell predicts bounding box
• Use lookup table to vote for candidate windows in query image a la generalized Hough transform
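The two bullets above can be sketched as follows. This is a simplified illustration, not the paper's implementation: features are hypothetical `(word, x, y)` triples with integer coordinates, boxes are `(x0, y0, w, h)`, and votes are simple counts with no scale normalization or discriminative weighting.

```python
from collections import Counter, defaultdict

def grid_cell(fx, fy, box, grid=2):
    """Which cell of a grid x grid layout over the box contains the feature."""
    x0, y0, w, h = box
    cx = min(grid - 1, int((fx - x0) * grid / w))
    cy = min(grid - 1, int((fy - y0) * grid / h))
    return cx, cy

def build_table(training, grid=2):
    """training: list of (features, box). Counts how often a word seen in a
    grid cell predicts a box at a given offset from the feature."""
    table = defaultdict(Counter)
    for features, box in training:
        x0, y0, w, h = box
        for word, fx, fy in features:
            if x0 <= fx < x0 + w and y0 <= fy < y0 + h:
                cell = grid_cell(fx, fy, box, grid)
                table[word][(cell, (x0 - fx, y0 - fy, w, h))] += 1
    return table

def vote_windows(features, table, top=3):
    """Each query feature 'jumps' to the boxes its word predicted in training;
    windows that collect many votes become candidates."""
    votes = Counter()
    for word, fx, fy in features:
        for (_cell, (dx, dy, w, h)), count in table.get(word, {}).items():
            votes[(fx + dx, fy + dy, w, h)] += count
    return votes.most_common(top)
```

Only windows that some feature actually voted for are ever scored, which is why this avoids the exhaustive sliding-window search.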
Pick Examples via Hyperplane Hashing
[P. Jain, S. Vijayanarasimhan & K. Grauman – Hashing Hyperplane Queries to
Near Points with Applications to Large-Scale Active Learning (NIPS 2010)]
• Want to label “hard” examples near the hyperplane boundary
• But hyperplane keeps changing, so have to recompute distances…
• Instead, hash all unlabeled examples into table
• At run-time, hash current hyperplane to get index into table, to pick examples close to it
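A rough sketch of the idea, based on my reading of the two-bit hash family in the cited paper, so treat the details as assumptions: each hash function draws two random Gaussian vectors `u` and `v`; a database point `x` is hashed to `[sign(u·x), sign(v·x)]`, while a hyperplane with normal `w` is hashed to `[sign(u·w), sign(-v·w)]`. Points parallel to `w` (far from the boundary) can then never match on both bits, while points perpendicular to `w` (on the boundary) match most often. All class and function names are made up for the example.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bit(value):
    return 1 if value >= 0 else 0

class HyperplaneHasher:
    def __init__(self, dim, n_pairs=8, seed=0):
        rng = random.Random(seed)
        # one (u, v) pair of random Gaussian directions per two-bit hash
        self.pairs = [
            ([rng.gauss(0, 1) for _ in range(dim)], [rng.gauss(0, 1) for _ in range(dim)])
            for _ in range(n_pairs)
        ]

    def hash_point(self, x):
        code = []
        for u, v in self.pairs:
            code += [bit(dot(u, x)), bit(dot(v, x))]
        return tuple(code)

    def hash_hyperplane(self, w):
        code = []
        for u, v in self.pairs:
            code += [bit(dot(u, w)), bit(-dot(v, w))]  # note the flipped sign on v
        return tuple(code)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def build_table(hasher, points):
    """Hash all unlabeled examples once, up front."""
    table = {}
    for i, x in enumerate(points):
        table.setdefault(hasher.hash_point(x), []).append(i)
    return table

def pick_near_boundary(hasher, table, w, k):
    """Hash the current hyperplane and collect examples from the closest buckets."""
    key = hasher.hash_hyperplane(w)
    picked = []
    for code in sorted(table, key=lambda c: hamming(c, key)):
        picked.extend(table[code])
        if len(picked) >= k:
            break
    return picked[:k]
```

The table is built once; only `hash_hyperplane` has to be recomputed when the classifier changes. Ranking buckets by Hamming distance is a convenience for this small sketch; a real system would probe only the matching bucket or a few neighbors of it.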
Comparison on Pascal VOC
• Comparable to state-of-the-art, better on few classes
• Many fewer annotations required!
• Training time is 15 mins vs 7 hours (LSVM) vs 1 week (SP+MKL)
Online Live Learning for Pascal
• Comparable to state-of-the-art, better on fewer classes
• But using flickr data vs. Pascal data, and automatically
Sample Results
Correct
Incorrect
Lessons Learned
• It is possible to leave the sandbox
• And still do well on sandbox evaluations
• Sparse max pooling with a part model works well
• Linear SVMs can be competitive with these features
• Jumping windows is MUCH faster than sliding
• Picking examples to get labeled is a big win
• Linear SVMs also allow for fast hyperplane hashing
Limitations
“Hell is other people” (users)
With apologies to Jean-Paul Sartre
Solving Real Problems for Users
Users want to do stuff
It doesn’t work well enough
Users express their displeasure gracefully
*With apologies to John Gabriel
…And Never The Twain Shall Meet?
Pascal VOC Results from Previous Paper
[Bar chart. Vertical axis: 0 to 100. Classes: bicycle, bird, bottle, car, chair, dining table, horse, person, potted plant, sofa, tv/monitor, mean. Methods: Ours, BoF SP, LLC SP, LSVM+HOG, SP+MKL. Annotations: “Current Best of Vision Algorithms”, “User Expectation”, “?”]
Unsolved Vision Problems
Segmentation
Detection
Shape Estimation
Stereo
Tracking
Geometry
Simplify Problem!
Columbia University
University of Maryland
Smithsonian Institution
Easier Segmentation for Leafsnap
Plants vs Birds
| Plants | Birds |
|---|---|
| 2D | 3D |
| Doesn’t move | Moves |
| Okay to pluck from tree | Not okay to pluck from tree |
| Mostly single color | Many colors |
| Very few parts | Many parts |
| Adequately described by boundary | Not well described by boundary |
| Relatively easy to segment | Hard to segment |
Human-Computer Cooperation
[S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, S. Belongie – Visual Recognition with Humans in the Loop (ECCV 2010)]
What color is it?
Red!
Where’s the beak?
Top-right!
Where’s the tail?
Bottom-left!
Describe its beak
Uh, it’s pointy?
Where is it?
Okay.
Bottom-left!
<Shape descriptor>
20 Questions
http://20q.net/
Information Gain for 20Q
Pick most informative question to ask next
Expected information gain of class c, given image & previous responses
Probability of getting response ui, given image & previous responses
Entropy of class c, given image and possible new response ui
Entropy of class c right now
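The labels above annotate a formula that did not survive extraction. Written with standard entropy, the quantity they describe is the current entropy of the class distribution minus the expected entropy after hearing the answer, weighted by how likely each answer is. A minimal sketch of question selection under that reading follows; `response_model[u][c]` is a hypothetical table holding p(answer u | class c) for one question, and `prior` stands for the class distribution given the image and previous responses.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_info_gain(prior, response_model):
    """Entropy of the class right now, minus the expected entropy after the answer.

    prior[c]             : p(c | image, previous responses)
    response_model[u][c] : p(answer u | class c) for this question
    """
    gain = entropy(prior)
    for likelihood_u in response_model:
        joint = [pc * lu for pc, lu in zip(prior, likelihood_u)]
        p_u = sum(joint)  # probability of getting answer u
        if p_u > 0:
            posterior = [j / p_u for j in joint]
            gain -= p_u * entropy(posterior)
    return gain

def pick_question(prior, questions):
    """questions: dict mapping a question name to its response model."""
    return max(questions, key=lambda name: expected_info_gain(prior, questions[name]))
```

For two equally likely classes, a question whose answer identifies the class is worth one full bit, and a question answered the same way for both classes is worth nothing.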
Answers make distribution peakier
Incorporating Computer Vision
Probability of class c, given image and any set of responses
Bayes’ rule
Assume variations in user responses are NOT image-dependent
Probabilities affect entropies!
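The equation these labels point at is missing from the transcript. A reconstruction consistent with the labels, using assumed notation (x for the image, U for the set of user responses, c for the class), would be:

```latex
\begin{aligned}
p(c \mid x, U)
  &= \frac{p(U \mid c, x)\, p(c \mid x)}{p(U \mid x)}
  && \text{(Bayes' rule)} \\
  &= \frac{p(U \mid c)\, p(c \mid x)}{\sum_{c'} p(U \mid c')\, p(c' \mid x)}
  && \text{(user responses assumed independent of the image given the class)}
\end{aligned}
```

Under this reading, the vision system supplies p(c | x) and the user model supplies p(U | c). Because the entropies in the information-gain criterion are computed from p(c | x, U), a different vision estimate changes which question looks most informative.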
Incorporating Computer Vision…
…leads to different questions
Ask for User Confidences
Modeling User Responses is Effective!
Birds-200 Dataset
http://www.vision.caltech.edu/visipedia/CUB-200.html
Results
Results
With fewer questions, CV does better
With more questions, humans do better
Lessons Learned
• Computer vision is not (yet) good enough for users
• But users can meet vision halfway
• Minimizing user effort is key!
• Users are not to be trusted (fully)
• Adding vision improves recognition
• For fine-scale categorization, attributes do better than 1-vs-all classifiers if there are enough of them

| Classifier | 200 (1-vs-all) | 288 attr. | 100 attr. | 50 attr. | 20 attr. | 10 attr. |
|---|---|---|---|---|---|---|
| Avg # Questions | 6.43 | 6.72 | 7.01 | 7.67 | 8.81 | 9.52 |
Limitations
• Real system still requires much human effort
• Only birds
• Collecting and labeling data
• Crowdsourcing?
• Experts?
• Building usable system
• Minimizing
Visipedia
http://www.vision.caltech.edu/visipedia/