Transcript of Prof. Feng Liu
![Page 1: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/1.jpg)
Prof. Feng Liu
Winter 2020
http://www.cs.pdx.edu/~fliu/courses/cs410/
02/27/2020
![Page 2: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/2.jpg)
Last Time
Introduction to object recognition
The slides for this topic are used from Prof. S. Lazebnik.
![Page 3: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/3.jpg)
Today
Machine learning approach to object recognition
◼ Classifiers
◼ Bag-of-features models
![Page 4: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/4.jpg)
Recognition: A machine learning approach
Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, Kristen Grauman, and Derek Hoiem
![Page 5: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/5.jpg)
The machine learning framework
Apply a prediction function to a feature representation of
the image to get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
![Page 6: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/6.jpg)
The machine learning framework
y = f(x)
Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
(y: output, f: prediction function, x: image feature)
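As a concrete sketch of this framework (all data hypothetical; the prediction function here is a nearest-class-mean rule, chosen only for brevity):

```python
import numpy as np

# Toy "images", each already reduced to a 2-D feature vector x, with labels y.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# Training: estimate the prediction function f from the labeled examples
# (here, by computing the mean feature vector of each class).
means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def f(x):
    """Prediction: the label whose class mean is closest to x."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Testing: apply f to a never-before-seen example.
print(f(np.array([0.1, 0.05])))
```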
![Page 7: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/7.jpg)
Prediction
Steps
Training
LabelsTraining Images
Training
Training
Image
Features
Image
Features
Testing
Test Image
Learned
model
Learned
model
Slide credit: D. Hoiem
![Page 8: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/8.jpg)
Features
Raw pixels
Histograms
GIST descriptors
…
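For instance, a normalized intensity histogram makes a simple feature vector; a minimal NumPy sketch with a made-up 4x4 image:

```python
import numpy as np

# A made-up 4x4 grayscale image with intensities in [0, 255].
img = np.array([[ 10,  20, 200, 210],
                [ 15,  25, 205, 215],
                [ 12,  22, 202, 212],
                [ 18,  28, 208, 218]], dtype=np.uint8)

raw = img.ravel().astype(float)          # raw-pixel feature: just flatten

hist, _ = np.histogram(img, bins=4, range=(0, 256))
hist = hist / hist.sum()                 # normalized 4-bin intensity histogram
print(hist)
```

Unlike raw pixels, the histogram is invariant to where in the image the dark and bright pixels fall.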
![Page 9: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/9.jpg)
Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x
All we need is a distance function for our inputs
No training required!
[Diagram: a test example surrounded by training examples from class 1 and class 2.]
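A minimal nearest-neighbor classifier along these lines (toy data; Euclidean distance assumed as the distance function):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """f(x) = label of the training example nearest to x (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Toy training data: class 1 near the origin, class 2 near (1, 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.2, 0.9]])
y_train = np.array([1, 1, 2, 2])
print(nearest_neighbor_predict(X_train, y_train, np.array([0.9, 1.1])))
```

Note that "no training required" shows up directly: the function only stores the training set and defers all work to prediction time.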
![Page 10: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/10.jpg)
Classifiers: Linear
Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
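A sketch of evaluating such a linear classifier (the weights here are made up for illustration; in practice w and b would be learned, e.g. by a perceptron or an SVM):

```python
import numpy as np

def linear_classify(w, b, x):
    """f(x) = sgn(w · x + b): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical parameters defining the separating line x1 + x2 = 1.
w = np.array([1.0, 1.0])
b = -1.0
print(linear_classify(w, b, np.array([1.0, 1.0])),
      linear_classify(w, b, np.array([0.0, 0.0])))
```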
![Page 11: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/11.jpg)
Recognition task and supervision
Images in the training set must be annotated with the “correct answer” that the model is expected to produce
Contains a motorbike
![Page 12: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/12.jpg)
Supervision spectrum: unsupervised, “weakly” supervised, fully supervised
Definition depends on task
![Page 13: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/13.jpg)
Generalization
How well does a learned model generalize from
the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
![Page 14: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/14.jpg)
Generalization
Components of generalization error
◼ Bias: how much the average model over all training sets differs from the true model
Error due to inaccurate assumptions/simplifications made by the model
◼ Variance: how much models estimated from different training sets differ from each other
Underfitting: the model is too “simple” to represent all the relevant class characteristics
◼ High bias and low variance
◼ High training error and high test error
Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data
◼ Low bias and high variance
◼ Low training error and high test error
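A small numerical illustration of these two regimes, fitting synthetic noisy samples of a true linear model with a degree-1 and a degree-7 polynomial (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 8)
y_train = x_train + 0.3 * rng.standard_normal(8)   # true model y = x, plus noise
x_test = np.linspace(0.0, 1.0, 100)                # unseen inputs from the same range

results = {}
for degree in (1, 7):                              # simple vs. very flexible model
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - x_test) ** 2)  # error vs. true model
    results[degree] = (train_err, test_err)
    print(degree, train_err, test_err)
```

The degree-7 polynomial interpolates all eight noisy points, so its training error is essentially zero, yet its test error stays well above it: low training error with high test error is exactly the overfitting signature described above.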
![Page 15: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/15.jpg)
Bias-variance tradeoff
[Plot: training and test error vs. model complexity. Training error decreases as complexity grows; test error first falls, then rises. Left: underfitting (high bias, low variance); right: overfitting (low bias, high variance).]
Slide credit: D. Hoiem
![Page 16: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/16.jpg)
Bias-variance tradeoff
[Plot: test error vs. model complexity for many vs. few training examples. Complexity again runs from high bias/low variance (left) to low bias/high variance (right).]
Slide credit: D. Hoiem
![Page 17: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/17.jpg)
Effect of Training Size
[Plot: for a fixed prediction model, error vs. number of training examples. Training error rises and testing error falls as the training set grows; the gap between the curves is the generalization error.]
Slide credit: D. Hoiem
![Page 18: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/18.jpg)
Datasets
Circa 2001: 5 categories, 100s of images per
category
Circa 2004: 101 categories
Today: up to thousands of categories, millions
of images
![Page 19: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/19.jpg)
Caltech 101 & 256
Griffin, Holub, Perona, 2007
Fei-Fei, Fergus, Perona, 2004
http://www.vision.caltech.edu/Image_Datasets/Caltech101/
http://www.vision.caltech.edu/Image_Datasets/Caltech256/
![Page 20: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/20.jpg)
Caltech-101: Intraclass variability
![Page 21: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/21.jpg)
The PASCAL Visual Object Classes
Challenge (2005-present)
Challenge classes:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
http://host.robots.ox.ac.uk/pascal/VOC/
![Page 22: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/22.jpg)
The PASCAL Visual Object Classes Challenge (2005-present)
Main competitions
◼ Classification: for each of the twenty classes, predicting presence/absence of an example of that class in the test image
◼ Detection: predicting the bounding box and label of each object from the twenty target classes in the test image
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 23: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/23.jpg)
The PASCAL Visual Object Classes Challenge (2005-present)
“Taster” challenges
◼ Segmentation: generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise
◼ Person layout: predicting the bounding box and label of each part of a person (head, hands, feet)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 24: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/24.jpg)
The PASCAL Visual Object Classes Challenge (2005-present)
“Taster” challenges
◼ Action classification
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 25: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/25.jpg)
LabelMe
Russell, Torralba, Murphy, Freeman, 2008
http://labelme.csail.mit.edu/
![Page 26: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/26.jpg)
80 Million Tiny Images
http://people.csail.mit.edu/torralba/tinyimages/
![Page 28: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/28.jpg)
Today
Machine learning approach to object recognition
◼ Classifiers
◼ Bag-of-features models
![Page 29: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/29.jpg)
Bag-of-features models
![Page 30: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/30.jpg)
Origin 1: Texture recognition
Texture is characterized by the repetition of basic
elements or textons
For stochastic textures, it is the identity of the
textons, not their spatial arrangement, that matters
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
![Page 31: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/31.jpg)
Origin 1: Texture recognition
Universal texton dictionary
histogram
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
![Page 32: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/32.jpg)
Origin 2: Bag-of-words models
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
![Page 33: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/33.jpg)
US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/
![Page 36: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/36.jpg)
Bag-of-features steps
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
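The four steps can be sketched end to end; for brevity this toy example hand-picks the visual vocabulary instead of learning it by clustering, and uses made-up 2-D descriptors:

```python
import numpy as np

# Step 1: extract features (made-up 2-D descriptors from one image).
descriptors = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.1, 0.9], [1.0, 1.1]])

# Step 2: the visual vocabulary (normally learned by clustering; hand-picked here).
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0]])

# Step 3: quantize each descriptor to the index of its nearest visual word.
dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
words = np.argmin(dists, axis=1)

# Step 4: represent the image by the frequencies of its visual words.
bof = np.bincount(words, minlength=len(vocabulary)) / len(words)
print(bof)
```

The resulting histogram is orderless: shuffling the descriptors (i.e., rearranging where features occur in the image) leaves the representation unchanged.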
![Page 37: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/37.jpg)
1. Feature extraction
Regular grid or interest regions
![Page 38: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/38.jpg)
1. Feature extraction
Detect patches → normalize patch → compute descriptor
Slide credit: Josef Sivic
![Page 39: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/39.jpg)
…
1. Feature extraction
Slide credit: Josef Sivic
![Page 40: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/40.jpg)
2. Learning the visual vocabulary
…
Slide credit: Josef Sivic
![Page 41: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/41.jpg)
2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
![Page 42: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/42.jpg)
2. Learning the visual vocabulary
Clustering
…
Visual vocabulary
Slide credit: Josef Sivic
![Page 43: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/43.jpg)
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
◼ Assign each data point to the nearest center
◼ Re-compute each cluster center as the mean of
all points assigned to it
D(X, M) = Σ_k Σ_{x_i ∈ cluster k} (x_i − m_k)²
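The algorithm above, as a minimal NumPy sketch on toy data (no handling of empty clusters or random restarts, which a real implementation would need):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimize D(X, M): the sum over clusters k of sum_{x_i in cluster k} ||x_i - m_k||^2."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers at distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each data point to the nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Re-compute each cluster center as the mean of the points assigned to it.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, labels = kmeans(X, k=2)
print(centers, labels)
```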
![Page 44: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/44.jpg)
Clustering and vector quantization
• Clustering is a common method for learning a visual
vocabulary or codebook
◼ Unsupervised learning process
◼ Each cluster center produced by k-means becomes a codevector
◼ Codebook can be learned on separate training set
◼ Provided the training set is sufficiently representative, the
codebook will be “universal”
• The codebook is used for quantizing features
◼ A vector quantizer takes a feature vector and maps it to the index
of the nearest codevector in a codebook
◼ Codebook = visual vocabulary
◼ Codevector = visual word
![Page 45: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/45.jpg)
Example codebook
Appearance codebook
Source: B. Leibe
![Page 46: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/46.jpg)
Another codebook
Appearance codebook
Source: B. Leibe
![Page 47: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/47.jpg)
Yet another codebook
Fei-Fei et al. 2005
![Page 48: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/48.jpg)
Visual vocabularies: Issues
• How to choose vocabulary size?
◼ Too small: visual words not representative of all
patches
◼ Too large: quantization artifacts, overfitting
• Computational efficiency
◼ Vocabulary trees (Nister & Stewenius, 2006)
![Page 49: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/49.jpg)
Spatial pyramid representation
level 0
Extension of a bag of features
Locally orderless representation at several levels of resolution
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Lazebnik, Schmid & Ponce (CVPR 2006)
![Page 50: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/50.jpg)
Spatial pyramid representation
level 0 level 1
Extension of a bag of features
Locally orderless representation at several levels of resolution
Lazebnik, Schmid & Ponce (CVPR 2006)
![Page 51: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/51.jpg)
Spatial pyramid representation
level 0 level 1 level 2
Extension of a bag of features
Locally orderless representation at several levels of resolution
Lazebnik, Schmid & Ponce (CVPR 2006)
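One way to sketch the spatial pyramid representation: concatenate visual-word histograms over increasingly fine grids (toy implementation; the full method of Lazebnik et al. also weights the levels as in the pyramid match kernel):

```python
import numpy as np

def spatial_pyramid(positions, words, vocab_size, levels=2):
    """Concatenate visual-word histograms over a 2^l x 2^l grid of cells
    at each pyramid level l (level 0 is the plain bag of features)."""
    feats = []
    for level in range(levels):
        n = 2 ** level                                   # grid is n x n at this level
        cells = np.floor(positions * n).astype(int)      # cell index (cx, cy) per feature
        for cy in range(n):
            for cx in range(n):
                in_cell = (cells[:, 0] == cx) & (cells[:, 1] == cy)
                feats.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(feats)

# Two made-up features: word 0 near the top-left, word 1 near the bottom-right.
positions = np.array([[0.1, 0.1], [0.9, 0.9]])   # (x, y) coordinates in [0, 1)
words = np.array([0, 1])
print(spatial_pyramid(positions, words, vocab_size=2))
```

Level 0 alone is orderless; the finer levels record which cell each word fell in, restoring coarse spatial information.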
![Page 52: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/52.jpg)
Scene category dataset
Multi-class classification results (100 training images per class)
![Page 53: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/53.jpg)
Caltech101 dataset
Multi-class classification results (30 training images per class)
![Page 54: Prof. Feng Liu](https://reader030.fdocuments.in/reader030/viewer/2022012705/61a6bc352542c665a3228fed/html5/thumbnails/54.jpg)
Next Time
More classification
Visual saliency