A Friendly Guide To Sparse Coding
Shao-Chuan Wang
Research Center for Information Technology Innovation, Academia Sinica
E-mail: scwang@ntu.edu.tw
December 3, 2009
Sparse Coding : Shao-Chuan Wang (Academia Sinica) 1 / 18
Outline
1 Review of PCA
2 Introducing Sparsity
3 Solving the Optimization Problem
4 Learning Dictionary
5 Applications
PCA Review
Let x ∈ ℝ^m and D = [d₁, d₂, d₃, ..., d_p] ∈ ℝ^(m×p), where d_j ∈ ℝ^m. Suppose x can be approximated by a linear combination of the columns of D, i.e.,

x ≈ x̂ = Dα,   (1)

where α ∈ ℝ^p is the new coordinate of x in terms of the new basis D.
PCA Review
We want x̂ to be as close as possible to x, i.e., to minimize the reconstruction error. If we define the error metric, the L2 norm for instance,

Error = ‖x − Dα‖₂²   (2)

How do we get D?
PCA Review
If our goal is to minimize the total error, then given a dataset S = {x^(i)}, i = 1, ..., N,

min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂²   (3)
PCA Review
Without loss of generality, let us assume d_i^T d_j = δ_ij (for any vector space, the basis can be orthonormalized by the Gram–Schmidt process). From Eq. (1) we know that D^T satisfies D^T x̂ = D^T Dα = α, so α ≈ D^T x and the problem becomes

min_D ∑_i ‖x^(i) − DD^T x^(i)‖₂²   (4)
PCA Review
Using the Pythagorean theorem, (4) becomes

min_D ∑_i ‖x^(i) − DD^T x^(i)‖₂² = min_D ( ∑_i ‖x^(i)‖₂² − ∑_i ‖DD^T x^(i)‖₂² )

⇒ D = arg max_D ∑_i ‖DD^T x^(i)‖₂²
PCA Review
This optimization problem can be rewritten as

D = arg max_D ∑_i ‖DD^T x^(i)‖₂² = arg max_D ∑_j d_j^T ( ∑_i x^(i)(x^(i))^T ) d_j,

which is solved by the eigenvalue problem of the covariance matrix ∑_i x^(i)(x^(i))^T.
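As a numerical sketch of this derivation (not part of the slides; NumPy, toy data of my own choosing), the optimal D is obtained by stacking the top eigenvectors of the covariance matrix:

```python
import numpy as np

# Toy data: N = 200 points in R^5 with decaying variance per axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

# Covariance matrix sum_i x(i) x(i)^T (rows of X are the x(i))
C = X.T @ X

# The optimal D stacks the top-p eigenvectors of C as columns
eigvals, eigvecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
D = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # p = 2 principal directions

# Sanity check: the basis is orthonormal, D^T D = I_p
assert np.allclose(D.T @ D, np.eye(2))

# Reconstruction x_hat = D D^T x keeps only the top-eigenvector energy
X_hat = X @ D @ D.T
reconstruction_error = np.sum((X - X_hat) ** 2)
```

The eigendecomposition replaces the explicit arg max over D because the covariance matrix is symmetric, so its eigenvectors maximize the quadratic form.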
Introducing Sparsity
How about adding regularization?

min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂² + λψ(α),   λ ≥ 0,

where λψ(α) is called the regularization, sparsity, or prior term, and λ is the strength of the regularization. Intuitively, ψ(α) is a term that "confines" the "quota" of the α_i and therefore makes α "sparse". In fact, regularized linear regression also introduces sparsity on the θ coefficients.
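As an illustrative sketch (not from the slides; the orthonormal dictionary and toy data are my own assumptions): for the L1 choice ψ(α) = ‖α‖₁ and an orthonormal D, the minimizer has a closed form, soft-thresholding of D^T x, which is exactly where the zeros in α come from:

```python
import numpy as np

def objective(x, D, a, lam):
    # ||x - D a||_2^2 + lam * ||a||_1
    return np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def soft_threshold(v, t):
    # Per-coordinate minimizer of (c - a)^2 + 2 t |a|; zeroes small entries
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # orthonormal "dictionary"
x = rng.normal(size=5)
lam = 0.8

# With Q orthonormal, ||x - Q a||^2 = ||Q^T x - a||^2, so the problem
# decouples per coordinate and the solution is soft-thresholding at lam/2
a_sparse = soft_threshold(Q.T @ x, lam / 2)
a_dense = Q.T @ x                              # the lam = 0 (pure PCA) solution
```

The dense solution reconstructs x exactly; the regularized one trades a little reconstruction error for zeroed-out coefficients.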
Introducing Sparsity
Hence, we can conclude that sparse coding is a more generalized form of principal component analysis (PCA + Sparsity = Sparse PCA, Zou et al., 2004): d_i^T d_j may be nonzero. Also, if m = p, there is no dimension "reduction" anymore, and only sparsity shapes the basis. We can even make p > m, using an over-complete basis, and let sparsity dominate D and α.
Solving the Optimization Problem
How do we solve the optimization problem? ⇒ Too hard! Hence, we first assume D is known (i.e., a designed D). Two greedy algorithms are the most popular:
Matching Pursuit
Orthogonal Matching Pursuit
Matching Pursuit
min_{α∈ℝ^p} ‖x − Dα‖₂²   s.t. ‖α‖₀ ≤ L,   (5)

where r = x − Dα is the residual.

1: α ← 0
2: r ← x (residual)
3: while ‖α‖₀ < L do
4:   Pick the atom that correlates the most with the residual:
       i ← arg max_{i=1,...,p} |d_i^T r|
5:   Subtract the contribution and update α:
       α[i] ← α[i] + d_i^T r
       r ← r − (d_i^T r) d_i
6: end while
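A minimal NumPy sketch of the pseudocode above (the toy dictionary, signal, and unit-norm atoms are my own assumptions):

```python
import numpy as np

def matching_pursuit(x, D, L):
    """Greedy MP: repeatedly pick the atom most correlated with the
    residual and subtract its contribution (columns of D unit-norm)."""
    alpha = np.zeros(D.shape[1])
    r = x.copy()                        # residual r = x - D alpha
    while np.count_nonzero(alpha) < L and np.linalg.norm(r) > 1e-10:
        corr = D.T @ r                  # d_i^T r for every atom
        i = int(np.argmax(np.abs(corr)))
        alpha[i] += corr[i]             # alpha[i] <- alpha[i] + d_i^T r
        r -= corr[i] * D[:, i]          # r <- r - (d_i^T r) d_i
    return alpha

# Toy dictionary: 20 unit-norm atoms in R^8, signal built from 2 of them
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 20))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] + 0.5 * D[:, 7]
alpha = matching_pursuit(x, D, L=5)
```

Each iteration leaves the residual orthogonal to the atom just picked, so the residual norm decreases monotonically, but already-selected atoms may be revisited.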
Orthogonal Matching Pursuit
min_{α∈ℝ^p} ‖x − Dα‖₂²   s.t. ‖α‖₀ ≤ L   (6)

1: Γ ← ∅
2: while ‖α‖₀ < L do
3:   Pick the element that most reduces the objective:
       i ← arg min_{i∈Γᶜ} { min_{α′} ‖x − D_{Γ∪{i}} α′‖₂² }
4:   Update the active set: Γ ← Γ ∪ {i}
5:   Update α and the residual:
       α_Γ ← (D_Γ^T D_Γ)⁻¹ D_Γ^T x
       r ← x − D_Γ α_Γ
6: end while
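A NumPy sketch of OMP under the same toy setup as before (toy data and unit-norm atoms are my own assumptions; the least-squares re-fit is the step that distinguishes it from plain MP):

```python
import numpy as np

def omp(x, D, L):
    """Orthogonal MP: grow the active set Gamma one atom at a time and
    re-fit alpha_Gamma by least squares, so r stays orthogonal to D_Gamma."""
    alpha = np.zeros(D.shape[1])
    gamma = []                          # active set
    r = x.copy()
    while len(gamma) < L and np.linalg.norm(r) > 1e-10:
        # For unit-norm atoms, the objective-reducing pick reduces to the
        # atom most correlated with the current residual
        i = int(np.argmax(np.abs(D.T @ r)))
        if i in gamma:                  # residual already orthogonal: stop
            break
        gamma.append(i)
        Dg = D[:, gamma]
        # alpha_Gamma <- (D_Gamma^T D_Gamma)^{-1} D_Gamma^T x, via least squares
        ag, *_ = np.linalg.lstsq(Dg, x, rcond=None)
        alpha[:] = 0.0
        alpha[gamma] = ag
        r = x - Dg @ ag
    return alpha

# Toy setup: 2-sparse signal over a 20-atom dictionary in R^8
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 20))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] + 0.5 * D[:, 7]
alpha = omp(x, D, L=5)
```

Because every selected coefficient is re-optimized jointly, OMP never picks the same atom twice, unlike MP.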
Learning Dictionary
How do we learn D from the data?
min_{D,α} ∑_i ‖x^(i) − Dα^(i)‖₂² + λ‖α‖_{0,1,2},   λ ≥ 0,   (7)

where the penalty may be an L0, L1, or L2 norm. Approaches:

Brute force
K-means-like:
  FOCUSS (K. Engan et al., 2003)
  K-SVD (M. Aharon et al., 2005)
Online Dictionary Learning (J. Mairal et al., 2009)
K-SVD (M. Aharon et al., 2005)
1: Initialize D ∈ ℝ^(m×k) with a random normalized dictionary.
2: Repeat until convergence {
     Sparse Coding Stage:
       Use a pursuit algorithm to compute the sparse code α^(i) of each x^(i).
     Codebook Update Stage:
       For j = 1, 2, ..., k do {
         Define the cluster of examples that use d_j:
           ω ← {i | 1 ≤ i ≤ M, α^(i)[j] ≠ 0}
         For each i ∈ ω do r^(i) ← x^(i) − Dα^(i).
         d, β ← arg min_{d′, β∈ℝ^|ω|} ∑_{i∈ω} ‖r^(i) + α^(i)[j] d_j − β_i d′‖₂²
         d_j ← d, and replace the nonzero entries α^(i)[j], i ∈ ω, with β_i.
       }
   }
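A NumPy sketch of the codebook-update stage above (sparse codes held fixed; the rank-1 SVD solves the inner arg min, and the toy data, dictionary, and codes are my own assumptions):

```python
import numpy as np

def ksvd_update(X, D, A):
    """One K-SVD codebook sweep. X: m x M data, D: m x k dictionary,
    A: k x M sparse codes (A[j, i] is alpha(i)[j])."""
    k = D.shape[1]
    for j in range(k):
        omega = np.nonzero(A[j])[0]          # examples that use d_j
        if omega.size == 0:
            continue
        # Residual with atom j's contribution added back, on omega only
        E = X[:, omega] - D @ A[:, omega] + np.outer(D[:, j], A[j, omega])
        # Best rank-1 fit of E gives the new atom and its coefficients
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]
        A[j, omega] = s[0] * Vt[0]
    return D, A

# Toy problem: random data, random unit dictionary, random ~20%-dense codes
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 40))
D = rng.normal(size=(6, 10))
D /= np.linalg.norm(D, axis=0)
A = rng.normal(size=(10, 40)) * (rng.random((10, 40)) < 0.2)
err_before = np.sum((X - D @ A) ** 2)
D, A = ksvd_update(X, D, A)
err_after = np.sum((X - D @ A) ** 2)
```

Each rank-1 update can only lower the restricted error (the previous atom and coefficients are a feasible point), so the total reconstruction error never increases across a sweep.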
Applications
Image de-noising (Roth and Black, 2009)
Edge detection (J. Mairal et al., 2008)
Image in-painting (Roth and Black, 2009)
Super-resolution (Yang et al., 2008)
Signal compression (in place of VQ using K-means)
Bibliography I
H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 2004.

K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski, Dictionary learning algorithms for sparse representation. Neural Computation, 2003.

M. Aharon, M. Elad, and A. M. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, November 2006.
Bibliography II
S. Roth and M. J. Black, Fields of Experts. IJCV, 2009.

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation. ECCV 2008.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding. ICML 2009.

J. Yang, J. Wright, T. Huang, and Y. Ma, Image Super-Resolution as Sparse Representation of Raw Image Patches. CVPR 2008.