ECCV2010: feature learning for image classification, part 3


Transcript of ECCV2010: feature learning for image classification, part 3

Page 1: ECCV2010: feature learning for image classification, part 3


Part 3: Image Classification using Sparse Coding: Advanced Topics

Kai Yu

Dept. of Media Analytics, NEC Laboratories America

Andrew Ng

Computer Science Dept., Stanford University

Page 2: ECCV2010: feature learning for image classification, part 3

Outline of Part 3

• Why can sparse coding learn good features?
  - Intuition, topic model view, and geometric view
  - A theoretical framework: local coordinate coding
  - Two practical coding methods
• Recent advances in sparse coding for image classification


Page 4: ECCV2010: feature learning for image classification, part 3

Intuition: why does sparse coding help classification?

• The coding is a nonlinear feature mapping.
• It represents data in a higher-dimensional space.
• Sparsity makes prominent patterns more distinctive.

Figure from http://www.dtreg.com/svm.htm

Page 5: ECCV2010: feature learning for image classification, part 3

A “topic model” view of sparse coding

• Each basis is a “direction” or a “topic”.
• Sparsity: each datum is a linear combination of only a few bases.
• Applicable to image denoising, inpainting, and super-resolution.

Both figures adapted from CVPR10 tutorial by F. Bach, J. Mairal, J. Ponce and G. Sapiro

(Figure: example dictionary bases, labeled Basis 1 and Basis 2, with each datum expressed as a sparse combination of a few of them.)
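To make the coding step concrete, here is a minimal sketch of dictionary learning and sparse coding with scikit-learn; the patch data, dictionary size, and sparsity penalty are illustrative assumptions, not the tutorial's settings.

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(1000, 64)                      # toy 8x8 patches, flattened
dl = DictionaryLearning(n_components=128,          # number of bases ("topics")
                        alpha=1.0,                 # L1 sparsity penalty
                        transform_algorithm='lasso_lars',
                        max_iter=20)
codes = dl.fit(X).transform(X)                     # sparse codes: most entries are zero
D = dl.components_                                 # the learned bases
print(codes.shape, (codes != 0).mean())            # code dimension and fraction of nonzeros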

Page 6: ECCV2010: feature learning for image classification, part 3

A geometric view of sparse coding

(Figure: a data manifold, with bases acting as anchor points among the data points.)

• Each basis is somewhat like a pseudo data point, an “anchor point”.
• Sparsity: each datum is a sparse combination of neighboring anchors.
• The coding scheme explores the manifold structure of the data.

Page 7: ECCV2010: feature learning for image classification, part 3

MNIST Experiment: Classification using SC

• 60K training images, 10K test images
• Dictionary size k = 512
• Linear SVM on the sparse codes

Try different values of the sparsity penalty lambda.
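A minimal end-to-end sketch of this experiment (dictionary learning, sparse coding, then a linear SVM); the data loader, the subsampling, and the lambda value are assumptions for illustration, not the exact setup behind the slides.

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0
Xtr, ytr, Xte, yte = X[:60000], y[:60000], X[60000:], y[60000:]

dl = DictionaryLearning(n_components=512,          # k = 512 bases
                        alpha=0.05,                # plays the role of lambda
                        transform_algorithm='lasso_lars')
dl.fit(Xtr[:10000])                                # subsample for speed
Ztr, Zte = dl.transform(Xtr), dl.transform(Xte)    # sparse codes

clf = LinearSVC(C=1.0).fit(Ztr, ytr)               # linear SVM on sparse codes
print('test error:', 1.0 - clf.score(Zte, yte))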

Page 8: ECCV2010: feature learning for image classification, part 3

MNIST Experiment: Lambda = 0.0005

Each basis is like a part or direction.

Page 9: ECCV2010: feature learning for image classification, part 3

MNIST Experiment: Lambda = 0.005

Again, each basis is like a part or direction.

Page 10: ECCV2010: feature learning for image classification, part 3

MNIST Experiment: Lambda = 0.05

Now each basis looks more like a digit!

Page 11: ECCV2010: feature learning for image classification, part 3

MNIST Experiment: Lambda = 0.5

Now it looks like clustering!

Page 12: ECCV2010: feature learning for image classification, part 3

Geometric view of sparse coding

(Figure: learned bases and test errors for the different lambda settings; errors of 4.54%, 3.75%, and 2.64%.)

• When SC achieves its best classification accuracy, the learned bases look like digits: each basis has a clear local class association.
• Implication: exploring the geometry of the data may be useful for classification.

Page 13: ECCV2010: feature learning for image classification, part 3

Distribution of coefficients (MNIST)

Neighboring bases tend to get nonzero coefficients.

Page 14: ECCV2010: feature learning for image classification, part 3

Distribution of coefficients (SIFT, Caltech-101)

A similar observation holds here!

Page 15: ECCV2010: feature learning for image classification, part 3

Recap: two different views to sparse coding

View 1: Discover “topic” components
• Each basis is a “direction”.
• Sparsity: each datum is a linear combination of several bases.
• Related to topic models.

View 2: Geometric structure of the data manifold
• Each basis is an “anchor point”.
• Sparsity: each datum is a linear combination of neighboring anchors.
• Somewhat like a soft VQ (the link to BoW).

• Either view can be valid for sparse coding under certain circumstances.
• View 2 seems to be the more helpful one for classifying sensory data.

Page 16: ECCV2010: feature learning for image classification, part 3

Outline of Part 3

• Why can sparse coding learn good features?
  - Intuition, topic model view, and geometric view
  - A theoretical framework: local coordinate coding
  - Two practical coding methods
• Recent advances in sparse coding for image classification

Page 17: ECCV2010: feature learning for image classification, part 3

Key theoretical question

• Why can unsupervised feature learning via sparse coding help classification?

Page 18: ECCV2010: feature learning for image classification, part 3

The image classification setting for analysis

Pipeline: dense local features → sparse coding → linear pooling → linear SVM.

The SVM is a linear function on images; sparse coding defines a nonlinear function on patches, and linear pooling carries it up to the image level.

Implication: learning an image classifier is a matter of learning nonlinear functions on patches.

Page 19: ECCV2010: feature learning for image classification, part 3

Illustration: nonlinear learning via local coding

(Figure: data points and bases; the target function is approximated locally linearly around the bases.)

Page 20: ECCV2010: feature learning for image classification, part 3

How to learn a nonlinear function?

Step 1: Learn the dictionary from unlabeled data.

Page 21: ECCV2010: feature learning for image classification, part 3

How to learn a nonlinear function?

Step 2: Use the dictionary to encode data.

Page 22: ECCV2010: feature learning for image classification, part 3

How to learn a nonlinear function?

Step 3: Estimate the global linear weights on the sparse codes of the data, i.e. fit f(x) ≈ wᵀα(x), where α(x) denotes the sparse code of x.

• Nonlinear local learning via learning a global linear function.
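A minimal sketch of the three-step recipe on a toy nonlinear target, reusing the scikit-learn tools from the earlier sketches; the target function, dictionary size, and regularization are illustrative assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Ridge

# Step 1: learn the dictionary from unlabeled data
X = np.random.rand(2000, 2)                        # toy 2-D "patches"
dl = DictionaryLearning(n_components=32, alpha=0.5,
                        transform_algorithm='lasso_lars').fit(X)

# Step 2: use the dictionary to encode the data
A = dl.transform(X)                                # sparse codes alpha(x)

# Step 3: a global linear model on the codes gives a nonlinear function of x
y = np.sin(4 * X[:, 0]) + np.cos(3 * X[:, 1])      # toy nonlinear target f(x)
lin = Ridge(alpha=1e-3).fit(A, y)
print('train R^2:', lin.score(A, y))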

Page 23: ECCV2010: feature learning for image classification, part 3

Local Coordinate Coding (LCC): connect coding to nonlinear function learning

Yu et al., NIPS 2009

If f(x) is (α, β)-Lipschitz smooth, then for any coding of x by coefficients γ_v(x) over a set of bases C:

|f(x) − Σ_{v∈C} γ_v(x) f(v)|  ≤  α ‖x − γ̄(x)‖  +  β Σ_{v∈C} |γ_v(x)| ‖v − γ̄(x)‖²,   where γ̄(x) = Σ_{v∈C} γ_v(x) v.

The first term is the coding error; the second is the locality term. Together they bound the function approximation error.

The key message: a good coding scheme should
1. have a small coding error,
2. and also be sufficiently local.

Page 24: ECCV2010: feature learning for image classification, part 3

Outline of Part 3

• Why can sparse coding learn good features?
  - Intuition, topic model view, and geometric view
  - A theoretical framework: local coordinate coding
  - Two practical coding methods
• Recent advances in sparse coding for image classification

Page 25: ECCV2010: feature learning for image classification, part 3

Application of LCC theory

• Fast implementation with a large dictionary (Wang et al., CVPR 2010)
• A simple geometric way to improve BoW (Zhou et al., ECCV 2010)

Page 26: ECCV2010: feature learning for image classification, part 3

Application of LCC theory

• Fast implementation with a large dictionary
• A simple geometric way to improve BoW

Page 27: ECCV2010: feature learning for image classification, part 3

The larger the dictionary, the higher the accuracy, but also the higher the computation cost.

The same observation holds for Caltech-256, PASCAL, ImageNet, ...

Yu et al., NIPS 2009; Yang et al., CVPR 2009

Page 28: ECCV2010: feature learning for image classification, part 3

Locality-constrained linear coding: a fast implementation of LCC

• Dictionary learning: k-means (or hierarchical k-means).
• Coding for x, Step 1 – ensure locality: find the K nearest bases.
• Step 2 – ensure low coding error: reconstruct x by a least-squares combination of those K bases, with the coefficients constrained to sum to one (see the sketch below).

Wang et al., CVPR 2010
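A minimal numpy sketch of the LLC encoding step under the formulation just described (K nearest bases, constrained least squares with codes summing to one); the regularization constant is an assumption for numerical stability.

import numpy as np

def llc_encode(x, B, K=5, reg=1e-4):
    """Approximate LLC coding of one descriptor x (d,) with dictionary B (M, d)."""
    # Step 1: locality -- restrict to the K nearest bases
    idx = np.argsort(((B - x) ** 2).sum(axis=1))[:K]
    z = B[idx] - x                                    # shift bases to the data point
    # Step 2: low coding error -- min ||x - c^T B_K||^2  s.t.  sum(c) = 1
    C = z @ z.T
    C += np.eye(K) * reg * np.trace(C)                # regularize for stability
    c = np.linalg.solve(C, np.ones(K))
    c /= c.sum()                                      # enforce the sum-to-one constraint
    code = np.zeros(B.shape[0])
    code[idx] = c                                     # sparse code over the full dictionary
    return code

# toy usage: 1024 k-means centers of 128-D descriptors (random stand-ins here)
B = np.random.randn(1024, 128)
x = np.random.randn(128)
print(int((llc_encode(x, B) != 0).sum()))             # exactly K nonzeros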

Page 29: ECCV2010: feature learning for image classification, part 3

Competitive in accuracy, cheap in computation

Wang et al., CVPR 2010

(Table: LLC accuracy versus sparse coding on the benchmark datasets; comparable with sparse coding in some settings and significantly better in others, at a much lower computational cost.)

This is one of the two major algorithms applied by the NEC-UIUC team to achieve the No. 1 position in the ImageNet challenge 2010!

Page 30: ECCV2010: feature learning for image classification, part 3

Application of the LCC theory

• Fast implementation with a large dictionary
• A simple geometric way to improve BoW

Page 31: ECCV2010: feature learning for image classification, part 3

Interpret “BoW + linear classifier”

(Figure: data points and cluster centers; BoW with a linear classifier yields a piecewise-constant, i.e. zero-order, approximation of the function.)

Page 32: ECCV2010: feature learning for image classification, part 3

Super-vector coding: a simple geometric way to improve BoW (VQ)

(Figure: data points and cluster centers; super-vector coding adds the local tangent at each center, giving a piecewise-linear, i.e. first-order, approximation.)

Zhou et al., ECCV 2010

Page 33: ECCV2010: feature learning for image classification, part 3

Super-vector coding: a simple geometric way to improve BoW (VQ)

If f(x) is β-Lipschitz smooth and v(x) denotes the codeword nearest to x (its VQ coding), then approximating f(x) by the local tangent f(v(x)) + ∇f(v(x))ᵀ(x − v(x)) gives a function approximation error on the order of β‖x − v(x)‖², i.e. it is controlled by the quantization error.

Page 34: ECCV2010: feature learning for image classification, part 3

Super-vector coding: learning a nonlinear function via a global linear model

Let v(x) be the VQ coding of x. The super-vector code of x stacks, for the active cluster, a scaled indicator together with the residual x − v(x), and zeros for the other clusters. A global linear model, with weights to be learned, on these super-vector codes then realizes a piecewise-linear function of x.

This is one of the two major algorithms applied by the NEC-UIUC team to achieve the No. 1 position in PASCAL VOC 2009!
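A minimal numpy sketch of that code construction (hard VQ assignment plus the residual to the nearest center); the scaling constant s and all sizes are illustrative assumptions.

import numpy as np

def supervector_encode(x, centers, s=1.0):
    """Super-vector code of one descriptor x (d,) given K cluster centers (K, d)."""
    K, d = centers.shape
    k = int(np.argmin(((centers - x) ** 2).sum(axis=1)))       # VQ: nearest center
    code = np.zeros(K * (d + 1))
    code[k * (d + 1)] = s                                       # zero-order (BoW-like) part
    code[k * (d + 1) + 1:(k + 1) * (d + 1)] = x - centers[k]    # first-order residual
    return code                                                 # K*(d+1) dims, mostly zeros

# toy usage: 256 centers of 128-D descriptors (random stand-ins here)
centers = np.random.randn(256, 128)
x = np.random.randn(128)
print(supervector_encode(x, centers).shape)                     # (256 * 129,)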

Page 35: ECCV2010: feature learning for image classification, part 3

Summary of Geometric Coding Methods

(Figure: Vector Quantization (BoW), (Fast) Local Coordinate Coding, and Super-vector Coding.)

• All lead to higher-dimensional, sparse, and localized codings.
• All explore the geometric structure of the data.
• The new coding methods are suitable for linear classifiers.
• Their implementations are quite straightforward.

Page 36: ECCV2010: feature learning for image classification, part 3

Things not covered here

• Improved LCC using local tangents: Yu & Zhang, ICML 2010
• Mixture of sparse coding: Yang et al., ECCV 2010
• Deep coding network: Lin et al., NIPS 2010
• Pooling methods
  - Max pooling works well in practice, but appears to be ad hoc (a small illustration follows this list).
  - An interesting analysis of max pooling: Boureau et al., ICML 2010.
  - We are working on a linear pooling method that has a similar effect to max pooling; some preliminary results are already in the super-vector coding paper (Zhou et al., ECCV 2010).
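A small numpy illustration of max pooling versus average pooling over the sparse codes of one image's patches; the toy codes below are an assumption for demonstration, not data from the tutorial.

import numpy as np

rng = np.random.default_rng(0)
# toy sparse codes: one row per patch, one column per dictionary basis
codes = rng.random((500, 1024)) * (rng.random((500, 1024)) < 0.02)

avg_pooled = codes.mean(axis=0)   # average pooling: every basis contributes a little
max_pooled = codes.max(axis=0)    # max pooling: keep the strongest response per basis
print(avg_pooled.shape, max_pooled.shape)          # both are (1024,) image-level features
print((avg_pooled > 0).mean(), (max_pooled > 0).mean())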

Page 37: ECCV2010: feature learning for image classification, part 3

Outline of Part 3

• Why can sparse coding learn good features?
  - Intuition, topic model view, and geometric view
  - A theoretical framework: local coordinate coding
  - Two practical coding methods
• Recent advances in sparse coding for image classification

Page 38: ECCV2010: feature learning for image classification, part 3

Fast approximation of sparse coding via neural networks

Gregor & LeCun, ICML 2010

• The method aims at speeding up sparse coding at coding (inference) time, not training time, potentially making sparse coding practical for video.
• Idea: given a trained sparse coding model, use its inputs and outputs as training data for a feed-forward model (see the sketch below).
• They showed a speedup of roughly 20x, but did not evaluate on real video data.
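A minimal sketch of that idea, reusing scikit-learn: generate (input, sparse code) pairs from a trained sparse coder and regress them with a small feed-forward network. The network size and data are illustrative assumptions, not the LISTA architecture from the paper.

import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.neural_network import MLPRegressor

X = np.random.randn(5000, 64)                          # toy patches
dl = DictionaryLearning(n_components=128, alpha=1.0,
                        transform_algorithm='lasso_lars').fit(X)
codes = dl.transform(X)                                # "slow" sparse codes as targets

# Feed-forward approximator: patch in, approximate sparse code out, in one pass
net = MLPRegressor(hidden_layer_sizes=(256,), max_iter=200).fit(X, codes)
fast_codes = net.predict(X[:10])
print(np.abs(fast_codes - codes[:10]).mean())          # approximation error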

Page 39: ECCV2010: feature learning for image classification, part 3

Group sparse coding

• Sparse coding is applied to patches, so the image representation is unlikely to be sparse.
• Idea: enforce joint sparsity via an L1/L2 norm on the sparse codes of a group of patches (a small example of the penalty follows).
• The resulting image representation becomes sparse, which saves memory, but classification accuracy decreases.

Bengio et al., NIPS 2009
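A small numpy example of the L1/L2 (group) penalty mentioned above: codes for one group of patches are stacked into a matrix, and each dictionary basis contributes the L2 norm of its row, so a basis is either shared by the group or switched off entirely. Shapes and data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 32))             # codes: 256 bases x 32 patches in one group

plain_l1 = np.abs(A).sum()                     # ordinary L1: sparsity entry by entry
group_l1_l2 = np.linalg.norm(A, axis=1).sum()  # L1/L2: sum of per-basis (row) L2 norms
print(plain_l1, group_l1_l2)
# Penalizing the L1/L2 norm drives whole rows (bases) to zero for the entire group,
# which is what makes the pooled image representation itself sparse.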

Page 40: ECCV2010: feature learning for image classification, part 3

Learning a hierarchical dictionary

Jenatton, Mairal, Obozinski, and Bach, 2010

A node can be active only if its ancestors are active.
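A tiny illustration of that constraint: given a tree over dictionary atoms, represented here by a hypothetical parent array, a code respects the hierarchy only if every active atom has all of its ancestors active as well.

import numpy as np

def tree_sparsity_ok(code, parent):
    """True iff every nonzero entry's ancestors (parent[i], parent[parent[i]], ...) are nonzero."""
    for i in np.flatnonzero(code):
        j = parent[i]
        while j >= 0:                 # walk up to the root, which has parent -1
            if code[j] == 0:
                return False
            j = parent[j]
    return True

# toy tree over 7 atoms: atom 0 is the root
parent = np.array([-1, 0, 0, 1, 1, 2, 2])
print(tree_sparsity_ok(np.array([1.0, 1, 0, 0.5, 0, 0, 0]), parent))  # True
print(tree_sparsity_ok(np.array([0.0, 1, 0, 0.5, 0, 0, 0]), parent))  # False: root inactive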

Page 41: ECCV2010: feature learning for image classification, part 3

References

1. Image Classification using Super-Vector Coding of Local Image Descriptors, Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang. In ECCV 2010.
2. Efficient Highly Over-Complete Sparse Coding using a Mixture Model, Jianchao Yang, Kai Yu, and Thomas Huang. In ECCV 2010.
3. Learning Fast Approximations of Sparse Coding, Karol Gregor and Yann LeCun. In ICML 2010.
4. Improved Local Coordinate Coding using Local Tangents, Kai Yu and Tong Zhang. In ICML 2010.
5. Sparse Coding and Dictionary Learning for Image Analysis, Francis Bach, Julien Mairal, Jean Ponce, and Guillermo Sapiro. CVPR 2010 Tutorial.
6. Supervised Translation-Invariant Sparse Coding, Jianchao Yang, Kai Yu, and Thomas Huang. In CVPR 2010.
7. Learning Locality-Constrained Linear Coding for Image Classification, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, and Yihong Gong. In CVPR 2010.
8. Group Sparse Coding, Samy Bengio, Fernando Pereira, Yoram Singer, and Dennis Strelow. In NIPS 2009.
9. Nonlinear Learning using Local Coordinate Coding, Kai Yu, Tong Zhang, and Yihong Gong. In NIPS 2009.
10. Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. In CVPR 2009.
11. Efficient Sparse Coding Algorithms, Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. In NIPS 2007.