A Friendly Guide To Sparse Coding
Shao-Chuan Wang
Research Center for Information Technology Innovation, Academia Sinica
E-mail: scwang@ntu.edu.tw
December 3, 2009
Sparse Coding : Shao-Chuan Wang (Academia Sinica) 1 / 18
Outline
1 Review of PCA
2 Introducing Sparsity
3 Solving the Optimization Problem
4 Learning Dictionary
5 Applications
PCA Review
Let x ∈ ℝ^m and D = [d₁, d₂, d₃, ..., d_p] ∈ ℝ^(m×p), where d_j ∈ ℝ^m. Suppose x can be approximated by a linear combination of the columns of D, i.e.,

x ≈ x̂ = Dα,   (1)

where α ∈ ℝ^p is the new coordinate of x in terms of the new basis D.
PCA Review
We want x̂ to be as close as possible to x, i.e., to minimize the reconstruction error. If we define the error metric, the L2 norm for instance,

Error = ‖x − Dα‖₂²   (2)

How do we get D?
PCA Review
If our goal is to minimize the total error, then given a dataset S = {x^(i)}, i = 1, ..., N,

min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂²   (3)
PCA Review
Without loss of generality, let us assume d_i^T d_j = δ_ij (for any vector space, the basis can be orthonormalized by the Gram–Schmidt process). From Eq. (1) we know that D^T satisfies D^T x̂ = D^T Dα = α, so α ≈ D^T x and the problem becomes

min_D ∑_i ‖x^(i) − DD^T x^(i)‖₂²   (4)
PCA Review
Using the Pythagorean theorem, (4) becomes

min_D ∑_i ‖x^(i) − DD^T x^(i)‖₂² = min_D ( ∑_i ‖x^(i)‖₂² − ∑_i ‖DD^T x^(i)‖₂² )

⇒ D = arg max_D ∑_i ‖DD^T x^(i)‖₂²
PCA Review
This optimization problem can be rewritten as

D = arg max_D ∑_i ‖DD^T x^(i)‖₂² = arg max_D ∑_j d_j^T ( ∑_i x^(i)(x^(i))^T ) d_j,

which is solved by the eigenvalue problem of the covariance matrix ∑_i x^(i)(x^(i))^T.
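As a numerical sketch of this derivation (not part of the slides; NumPy, toy data of my own choosing), the optimal D is obtained by stacking the top eigenvectors of the covariance matrix:

```python
import numpy as np

# Toy data: N = 200 points in R^5 with decaying variance per axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

# Covariance matrix sum_i x(i) x(i)^T (rows of X are the x(i))
C = X.T @ X

# The optimal D stacks the top-p eigenvectors of C as columns
eigvals, eigvecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
D = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # p = 2 principal directions

# Sanity check: the basis is orthonormal, D^T D = I_p
assert np.allclose(D.T @ D, np.eye(2))

# Reconstruction x_hat = D D^T x keeps only the top-eigenvector energy
X_hat = X @ D @ D.T
reconstruction_error = np.sum((X - X_hat) ** 2)
```

The eigendecomposition replaces the explicit arg max over D because the covariance matrix is symmetric, so its eigenvectors maximize the quadratic form.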
Introducing Sparsity
How about adding regularization?

min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂² + λψ(α),   λ ≥ 0,

where λψ(α) is called the regularization, sparsity, or prior term, and λ is the strength of the regularization. Intuitively, ψ(α) is a term that "confines" the "quota" of the α_i and therefore makes α "sparse". In fact, regularized linear regression also introduces sparsity on the θ coefficients.
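As an illustrative sketch (not from the slides; the orthonormal dictionary and toy data are my own assumptions): for the L1 choice ψ(α) = ‖α‖₁ and an orthonormal D, the minimizer has a closed form, soft-thresholding of D^T x, which is exactly where the zeros in α come from:

```python
import numpy as np

def objective(x, D, a, lam):
    # ||x - D a||_2^2 + lam * ||a||_1
    return np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def soft_threshold(v, t):
    # Per-coordinate minimizer of (c - a)^2 + 2 t |a|; zeroes small entries
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # orthonormal "dictionary"
x = rng.normal(size=5)
lam = 0.8

# With Q orthonormal, ||x - Q a||^2 = ||Q^T x - a||^2, so the problem
# decouples per coordinate and the solution is soft-thresholding at lam/2
a_sparse = soft_threshold(Q.T @ x, lam / 2)
a_dense = Q.T @ x                              # the lam = 0 (pure PCA) solution
```

The dense solution reconstructs x exactly; the regularized one trades a little reconstruction error for zeroed-out coefficients.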
Introducing Sparsity
Hence, we can conclude that sparse coding is a more generalized form of principal component analysis (PCA + Sparsity = Sparse PCA, Zou et al., 2004): d_i^T d_j may be nonzero. Also, if m = p, there is no dimension "reduction" anymore, and only sparsity shapes the basis. We can even make p > m, using an over-complete basis, and let sparsity dominate D and α.
Solving the Optimization Problem
How do we solve the optimization problem? ⇒ Too hard! Hence, we first assume D is known (i.e., a designed D). Two greedy algorithms are the most popular:
Matching Pursuit
Orthogonal Matching Pursuit
Matching Pursuit
min_{α∈ℝ^p} ‖x − Dα‖₂²   s.t. ‖α‖₀ ≤ L,   (5)

where r = x − Dα is the residual.

1: α ← 0
2: r ← x (residual)
3: while ‖α‖₀ < L do
4:   Pick the atom that correlates the most with the residual:
       i ← arg max_{i=1,...,p} |d_i^T r|
5:   Subtract the contribution and update α:
       α[i] ← α[i] + d_i^T r
       r ← r − (d_i^T r) d_i
6: end while
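A minimal NumPy sketch of the pseudocode above (the toy dictionary, signal, and unit-norm atoms are my own assumptions):

```python
import numpy as np

def matching_pursuit(x, D, L):
    """Greedy MP: repeatedly pick the atom most correlated with the
    residual and subtract its contribution (columns of D unit-norm)."""
    alpha = np.zeros(D.shape[1])
    r = x.copy()                        # residual r = x - D alpha
    while np.count_nonzero(alpha) < L and np.linalg.norm(r) > 1e-10:
        corr = D.T @ r                  # d_i^T r for every atom
        i = int(np.argmax(np.abs(corr)))
        alpha[i] += corr[i]             # alpha[i] <- alpha[i] + d_i^T r
        r -= corr[i] * D[:, i]          # r <- r - (d_i^T r) d_i
    return alpha

# Toy dictionary: 20 unit-norm atoms in R^8, signal built from 2 of them
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 20))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] + 0.5 * D[:, 7]
alpha = matching_pursuit(x, D, L=5)
```

Each iteration leaves the residual orthogonal to the atom just picked, so the residual norm decreases monotonically, but already-selected atoms may be revisited.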
Orthogonal Matching Pursuit
min_{α∈ℝ^p} ‖x − Dα‖₂²   s.t. ‖α‖₀ ≤ L   (6)

1: Γ ← ∅
2: while ‖α‖₀ < L do
3:   Pick the element that most reduces the objective:
       i ← arg min_{i∈Γᶜ} { min_{α′} ‖x − D_{Γ∪{i}} α′‖₂² }
4:   Update the active set: Γ ← Γ ∪ {i}
5:   Update α and the residual:
       α_Γ ← (D_Γ^T D_Γ)⁻¹ D_Γ^T x
       r ← x − D_Γ α_Γ
6: end while
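A NumPy sketch of OMP under the same toy setup as before (toy data and unit-norm atoms are my own assumptions; the least-squares re-fit is the step that distinguishes it from plain MP):

```python
import numpy as np

def omp(x, D, L):
    """Orthogonal MP: grow the active set Gamma one atom at a time and
    re-fit alpha_Gamma by least squares, so r stays orthogonal to D_Gamma."""
    alpha = np.zeros(D.shape[1])
    gamma = []                          # active set
    r = x.copy()
    while len(gamma) < L and np.linalg.norm(r) > 1e-10:
        # For unit-norm atoms, the objective-reducing pick reduces to the
        # atom most correlated with the current residual
        i = int(np.argmax(np.abs(D.T @ r)))
        if i in gamma:                  # residual already orthogonal: stop
            break
        gamma.append(i)
        Dg = D[:, gamma]
        # alpha_Gamma <- (D_Gamma^T D_Gamma)^{-1} D_Gamma^T x, via least squares
        ag, *_ = np.linalg.lstsq(Dg, x, rcond=None)
        alpha[:] = 0.0
        alpha[gamma] = ag
        r = x - Dg @ ag
    return alpha

# Toy setup: 2-sparse signal over a 20-atom dictionary in R^8
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 20))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] + 0.5 * D[:, 7]
alpha = omp(x, D, L=5)
```

Because every selected coefficient is re-optimized jointly, OMP never picks the same atom twice, unlike MP.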
Learning Dictionary
How do we learn D from the data?
min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂² + λ‖α‖_{0,1,2},   λ ≥ 0,   (7)

where the penalty may be an L0, L1, or L2 norm. Approaches:

Brute force
K-means-like:
  FOCUSS (K. Engan et al., 2003)
  K-SVD (M. Aharon et al., 2005)
Online Dictionary Learning (J. Mairal et al., 2009)
K-SVD (M. Aharon et al., 2005)
1: Initialize D ∈ ℝ^(m×k) with a random normalized dictionary.
2: Repeat until convergence {
     Sparse Coding Stage:
       Use a pursuit algorithm to compute the sparse code α^(i) of each x^(i).
     Codebook Update Stage:
       For j = 1, 2, ..., k do {
         Define the cluster of examples that use d_j:
           ω ← {i | 1 ≤ i ≤ M, α^(i)[j] ≠ 0}
         For each i ∈ ω do r^(i) ← x^(i) − Dα^(i).
         d, β ← arg min_{d′, β∈ℝ^|ω|} ∑_{i∈ω} ‖r^(i) + α^(i)[j] d_j − β_i d′‖₂²
         d_j ← d, and replace the nonzero entries α^(i)[j], i ∈ ω, with β_i.
       }
   }
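A NumPy sketch of the codebook-update stage above (sparse codes held fixed; the rank-1 SVD solves the inner arg min, and the toy data, dictionary, and codes are my own assumptions):

```python
import numpy as np

def ksvd_update(X, D, A):
    """One K-SVD codebook sweep. X: m x M data, D: m x k dictionary,
    A: k x M sparse codes (A[j, i] is alpha(i)[j])."""
    k = D.shape[1]
    for j in range(k):
        omega = np.nonzero(A[j])[0]          # examples that use d_j
        if omega.size == 0:
            continue
        # Residual with atom j's contribution added back, on omega only
        E = X[:, omega] - D @ A[:, omega] + np.outer(D[:, j], A[j, omega])
        # Best rank-1 fit of E gives the new atom and its coefficients
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]
        A[j, omega] = s[0] * Vt[0]
    return D, A

# Toy problem: random data, random unit dictionary, random ~20%-dense codes
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 40))
D = rng.normal(size=(6, 10))
D /= np.linalg.norm(D, axis=0)
A = rng.normal(size=(10, 40)) * (rng.random((10, 40)) < 0.2)
err_before = np.sum((X - D @ A) ** 2)
D, A = ksvd_update(X, D, A)
err_after = np.sum((X - D @ A) ** 2)
```

Each rank-1 update can only lower the restricted error (the previous atom and coefficients are a feasible point), so the total reconstruction error never increases across a sweep.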
Applications
Image de-noising (Roth and Black, 2009)
Edge detection (J. Mairal et al., 2008)
Image in-painting (Roth and Black, 2009)
Super-resolution (Yang et al., 2008)
Signal compression (in place of VQ using K-means)
Bibliography I
H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 2004.

K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski, Dictionary learning algorithms for sparse representation. Neural Computation, 2003.

M. Aharon, M. Elad, and A. M. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, November 2006.
Bibliography II
S. Roth and M. J. Black, Fields of Experts. IJCV, 2009.

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation. ECCV 2008.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding. ICML 2009.

J. Yang, J. Wright, T. Huang, and Y. Ma, Image Super-Resolution as Sparse Representation of Raw Image Patches. CVPR 2008.