
Scene Classification

Thomas Atta-Fosu, Daniel Hafley

December 15, 2014

Thomas Atta-Fosu, Daniel Hafley, Multilabel Classification Problem December 15, 2014 1 / 17

Motivation

Given a collection of images from different (more than 2) scenes, we wish to classify each image into the right category.

Figure: A collection of images to sort into different scenes


We consider a subset of 8 scenes from the SUN Dataset: Coast, Forest, Highway, Inside City, Mountain, Open Country/Countryside, Street, Tall Building.

Figure: Example images: (a) Coast (b) Forest (c) Highway (d) Inside City (e) Mountain (f) Open Country (g) Street (h) Tall Building


Feature Vectors

Most 'state-of-the-art' techniques use a 'bag of features' obtained from samples/statistics of filter responses of the image as a feature vector (gist, SIFT descriptors). We will not discuss the SIFT technique in this talk (see [David G. Lowe, 2004]). We discuss the gist scheme as outlined in [Oliva & Torralba, 2001].

36 Gabor filters

Example

Figure: (a) Image (b) One filter response (c) Patches

For each filter there are 16 mean values, one for each patch. Hence there are 36 × 16 = 576 features for each image.
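As a sketch, the pipeline described above (36 Gabor responses, 16 patch means each) might look like the following. The kernel size, scale spacing, and bandwidth here are illustrative assumptions, not the parameters used in the talk:

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Real part of a Gabor filter: a Gaussian window times a cosine grating."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)   # grating direction
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_theta / wavelength)

def patch_means(response, grid=4):
    """Mean response over a grid x grid tiling: 16 values for grid=4."""
    h, w = response.shape
    ph, pw = h // grid, w // grid
    return [response[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].mean()
            for i in range(grid) for j in range(grid)]

def gist_features(image, n_orientations=6, n_scales=6, grid=4):
    """36 Gabor responses (6 orientations x 6 scales), 16 patch means each."""
    feats = []
    F = np.fft.fft2(image)
    for s in range(n_scales):
        wavelength = 4.0 * 2.0 ** (0.5 * s)           # illustrative spacing
        for o in range(n_orientations):
            k = gabor_kernel(15, o * np.pi / n_orientations,
                             wavelength, sigma=0.5 * wavelength)
            # circular convolution via FFT keeps the response image-sized
            resp = np.abs(np.fft.ifft2(F * np.fft.fft2(k, s=image.shape)))
            feats.extend(patch_means(resp, grid))
    return np.asarray(feats)                          # 36 * 16 = 576 features

rng = np.random.default_rng(0)
f = gist_features(rng.random((64, 64)))
print(f.shape)  # (576,)
```

The magnitude of the complex FFT response is used so that the patch means are insensitive to the phase of the grating.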


Train and Test Set

A total of 2688 images from all 8 categories were downloaded from the SUN Dataset website.

800 training images: 100 images from each category/scene.

The remaining 1888 images were used as the test set.
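A split like this can be sketched as follows. The per-category counts below are hypothetical (only the totals 2688 / 800 / 1888 come from the talk), and the filenames are placeholders:

```python
import random

categories = ["coast", "forest", "highway", "insidecity",
              "mountain", "opencountry", "street", "tallbuilding"]
# hypothetical per-category counts chosen to sum to 2688
counts = [360, 328, 260, 308, 374, 410, 292, 356]

rnd = random.Random(0)
train, test = [], []
for cat, n in zip(categories, counts):
    imgs = [f"{cat}_{i:04d}.jpg" for i in range(n)]  # placeholder filenames
    rnd.shuffle(imgs)
    train += imgs[:100]   # 100 training images per scene
    test += imgs[100:]    # everything else goes to the test set

print(len(train), len(test))  # 800 1888
```

Fixing 100 training images per scene balances the training set, but it leaves the test set imbalanced, which matters for the metrics later.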


What we tried

- Shearlet coefficients, motivated by the work in [Torralba et al., 2003] (in progress)
- Gaussian filters: did not perform so well (accuracy ≈ 40%)
- Sampling specific pixel values in each patch
- Gabor filters


Choosing a learner

One-vs-All: we learn a classifier for each scene category, obtaining weights Wj for scene category j.

- K-Nearest Neighbor: did relatively well (accuracy ≈ 60%). Generating the confusion matrix is very easy when KNN is used.
- Logistic Regression: could not learn very well on the training sample.
- SVM: performed very well, but the recall rate was poor (due to imbalance in the training set).
- Modified SVM with 2 penalty terms: recall rate improved; precision was barely hurt.
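The slide does not spell out the "2 penalty terms". A common cost-sensitive reading is a linear SVM with separate slack penalties C+ and C- for the positive and negative classes, which is a standard remedy for one-vs-all imbalance; a minimal subgradient sketch under that assumption:

```python
import numpy as np

def weighted_svm(X, y, c_pos=10.0, c_neg=1.0, lr=1e-3, epochs=500):
    """Linear SVM with class-dependent hinge penalties:
    minimize ||w||^2 / 2 + sum_i C_{y_i} * max(0, 1 - y_i (w.x_i + b)),
    where C is c_pos for the (rare) positive class and c_neg otherwise."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    C = np.where(y > 0, c_pos, c_neg)     # per-example penalty
    for _ in range(epochs):
        active = y * (X @ w + b) < 1      # margin violators
        coef = C[active] * y[active]
        w -= lr * (w - coef @ X[active])  # regularizer + weighted hinge grad
        b -= lr * (-coef.sum())
    return w, b

# toy imbalanced data: 20 positives vs 200 negatives
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (20, 2)),
               rng.normal(-2.0, 1.0, (200, 2))])
y = np.array([1] * 20 + [-1] * 200)
w, b = weighted_svm(X, y)
```

With c_pos > c_neg, slack on the rare positive class is penalized more heavily, which pushes the boundary toward the negative class and raises recall at a small cost in precision, matching the behavior reported on the slide.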


Metrics

Due to imbalance in the test set, we used precision and recall as our metrics. After learning in the one-vs-all scheme, precision and recall on each scene were computed.

Figure: Precision by scene (coast, forest, highway, insidecity, mountain, opencountry, street, tallbuilding), modified SVM vs. normal SVM


Metrics

Figure: Recall by scene (coast, forest, highway, insidecity, mountain, opencountry, street, tallbuilding), modified SVM vs. normal SVM


Moving on: From One-vs-All to an All-inclusive Scheme

The goal is to predict the type of scene to which an image belongs. Options:

- Use a multiclass SVM
- Use a hierarchical scheme
- Use a voting scheme (motivation: next slide)


Classification Issues

In the One-vs-All scheme, an image could be labeled into at least 2 categories, i.e. for some test image i, Wj · Xi + bj > 0 for at least 2 scenes j. Or an image may not be classified into any of the scenes at all.

In such cases, we propose the following voting rule: choose scene j such that j = argmax_j (Wj · Xi + bj).

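With the eight (Wj, bj) classifiers stacked into a matrix and a vector, the voting rule is a one-liner; the stacked representation is an assumption of this sketch:

```python
import numpy as np

SCENES = ["Coast", "Forest", "Highway", "InsideCity",
          "Mountain", "OpenCountry", "Street", "TallBuilding"]

def vote(W, b, x):
    """Resolve one-vs-all ties and abstentions:
    pick the scene whose discriminant W_j . x + b_j is largest."""
    scores = W @ x + b            # one discriminant value per scene
    return SCENES[int(np.argmax(scores))]

# toy check: identity "classifiers" on one-hot features
W, b = np.eye(8), np.zeros(8)
print(vote(W, b, np.eye(8)[7]))   # TallBuilding
```

Because argmax always returns some index, the rule covers both failure modes at once: multiple positive discriminants and no positive discriminant at all.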


An Instance

Figure: A Tall Building image classified into 2 scenes (both InsideCity and TallBuilding have positive discriminants)

Table: Discriminant values for all 8 scenes

Scene          W · X + b
Coast           -41.2944
Forest          -15.6873
Highway         -17.8875
InsideCity        0.1614
Mountain        -16.4401
OpenCountry     -46.2454
Street           -3.1908
TallBuilding     23.0118

The voting rule picks TallBuilding, the scene with the largest discriminant.


Results: Confusion Matrix

                 c    f    h   ic    m   oc    s   tb
Coast          192    3   14    0    9   41    1    0
Forest           1  201    0    0   15    7    2    2
Highway         13    0  117    6    7   16    1    0
InsideCity       3    3    3  154    2    2   18   23
Mountain         5   17    8    6  188   44    0    6
OpenCountry     55   10   13    7   22  200    2    1
Street           0    3    5   14   10    3  151    6
TallBuilding     3    0    4   31   11    9    8  190

Accuracy ≈ 74%

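The per-scene precision and recall, and the 74% figure, follow directly from this matrix (assuming, as is conventional, rows are true scenes and columns are predictions):

```python
import numpy as np

scenes = ["Coast", "Forest", "Highway", "InsideCity",
          "Mountain", "OpenCountry", "Street", "TallBuilding"]
M = np.array([
    [192,   3,  14,   0,   9,  41,   1,   0],
    [  1, 201,   0,   0,  15,   7,   2,   2],
    [ 13,   0, 117,   6,   7,  16,   1,   0],
    [  3,   3,   3, 154,   2,   2,  18,  23],
    [  5,  17,   8,   6, 188,  44,   0,   6],
    [ 55,  10,  13,   7,  22, 200,   2,   1],
    [  0,   3,   5,  14,  10,   3, 151,   6],
    [  3,   0,   4,  31,  11,   9,   8, 190],
])

diag = np.diag(M)
precision = diag / M.sum(axis=0)  # correct / all predicted as that scene
recall    = diag / M.sum(axis=1)  # correct / all truly in that scene
accuracy  = diag.sum() / M.sum()

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.738
```

The dominant error mode is visible in the off-diagonal entries: Coast and OpenCountry are confused in both directions (55 and 41 images), consistent with the per-scene plots above.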

Discussions & Conclusions

- The choice and number of image features has an effect on the performance of a classifier.
- Practical considerations have to be made when choosing the learning algorithm for image classification.
- In a one-vs-all learning scheme, new decision rules may have to be devised (this is extensible to other learning problems).



Thank you.

Questions?


References

A. Torralba, K. P. Murphy, W. T. Freeman, M. A. Rubin (2003). Context-based Vision System for Place and Object Recognition. MIT AI Memo 2003-005.

A. Bosch, A. Zisserman, X. Munoz (2008). Scene Classification Using a Hybrid Generative/Discriminative Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(4), April 2008.

A. Oliva, A. Torralba (2001). Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision 42(3), 145-175, 2001.

D. G. Lowe (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91-110, 2004.
