
Transcript of Sparse coding for machine learning, image processing and ...

Page 1: Sparse coding for machine learning, image processing and ...

Sparse representations for image processing and computer vision
(Représentations parcimonieuses pour le traitement d'image et la vision artificielle)

Julien Mairal

Department of Statistics, University of California, Berkeley
INRIA - projet Willow, Paris

Journées ORASIS, Praz-sur-Arly, 7 June 2011

Julien Mairal Journees ORASIS 2011 1/97

Page 2: Sparse coding for machine learning, image processing and ...

Acknowledgements

Francis Bach, Jean Ponce, Rodolphe Jenatton, Guillaume Obozinski,
Guillermo Sapiro, Miki Elad, Andrew Zisserman, Florent Couzinie-Devy,
Y-Lan Boureau, and the Willow people.

Julien Mairal Journees ORASIS 2011 2/97

Page 3: Sparse coding for machine learning, image processing and ...

What is this lecture about?

Why sparsity, what for and how?

Signal and image processing: Restoration, reconstruction.

Machine learning: Selecting relevant features.

Computer vision: Modelling image patches and image descriptors.

Optimization: Solving challenging problems.

Julien Mairal Journees ORASIS 2011 3/97

Page 4: Sparse coding for machine learning, image processing and ...

1 Image Processing Applications

2 Sparse Linear Models and Dictionary Learning

3 Computer Vision Applications

Julien Mairal Journees ORASIS 2011 4/97

Page 5: Sparse coding for machine learning, image processing and ...

1 Image Processing Applications
    Image Denoising
    Inpainting, Demosaicking
    Video Processing
    Other Applications

2 Sparse Linear Models and Dictionary Learning

3 Computer Vision Applications

Julien Mairal Journees ORASIS 2011 5/97

Page 6: Sparse coding for machine learning, image processing and ...

The Image Denoising Problem

y = x_orig + w
(y: measurements, x_orig: original image, w: noise)

Julien Mairal Journees ORASIS 2011 6/97

Page 7: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration

y = x_orig + w
(y: measurements, x_orig: original image, w: noise)

Energy minimization problem - MAP estimation

E(x) = (1/2)‖y − x‖_2^2 + Pr(x)

The quadratic term encodes the relation to the measurements; Pr(x) is the image model (the negative log-prior).

Some classical priors

Smoothness: λ‖Lx‖_2^2
Total variation: λ‖∇x‖_{2,1}
MRF priors

. . .

Julien Mairal Journees ORASIS 2011 7/97

Page 8: Sparse coding for machine learning, image processing and ...

What is a Sparse Linear Model?

Let y in R^m be a signal.

Let D = [d_1, . . . , d_p] ∈ R^{m×p} be a set of normalized "basis vectors". We call it a dictionary.

D is "adapted" to y if it can represent it with a few basis vectors, that is, if there exists a sparse vector α in R^p such that y ≈ Dα. We call α the sparse code.

y ≈ [d_1 d_2 · · · d_p] [α[1], α[2], . . . , α[p]]^T,
with y ∈ R^m, D ∈ R^{m×p}, and α ∈ R^p sparse.
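To make the notation concrete, here is a tiny NumPy sketch (dimensions and seed are arbitrary, not from the talk) that builds a normalized random dictionary and a k-sparse code:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, k = 64, 256, 5                      # signal size, dictionary size, sparsity

D = rng.standard_normal((m, p))
D /= np.linalg.norm(D, axis=0)            # normalized "basis vectors" (atoms)

alpha = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
alpha[support] = rng.standard_normal(k)   # sparse code: only k nonzero entries

y = D @ alpha                             # the signal uses only k atoms of D
print(np.count_nonzero(alpha), "nonzero coefficients out of", p)
```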

Julien Mairal Journees ORASIS 2011 8/97

Page 9: Sparse coding for machine learning, image processing and ...

First Important Idea

Why Sparsity?

A dictionary can be good for representing a class of signals, but not for representing white Gaussian noise.

Julien Mairal Journees ORASIS 2011 9/97

Page 10: Sparse coding for machine learning, image processing and ...

The Sparse Decomposition Problem

min_{α ∈ R^p} (1/2)‖y − Dα‖_2^2 + λψ(α)

The quadratic term is the data-fitting term; ψ is a sparsity-inducing regularization.

ψ induces sparsity in α. It can be

the ℓ0 "pseudo-norm": ‖α‖_0 ≜ #{i s.t. α[i] ≠ 0} (NP-hard),

the ℓ1 norm: ‖α‖_1 ≜ Σ_{i=1}^p |α[i]| (convex),

. . .

This is a selection problem. When ψ is the ℓ1-norm, the problem is called the Lasso [Tibshirani, 1996] or basis pursuit [Chen et al., 1999].
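The talk does not spell out a solver at this point; as one illustration, the ℓ1-regularized problem above can be minimized with a basic proximal-gradient (ISTA) loop built on elementwise soft-thresholding (derived for the 1-D case later in the talk). A minimal sketch with arbitrary iteration counts and toy data, not taken from the slides:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - D a||_2^2 + lam * ||a||_1 by proximal gradient."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a

# toy usage: recover a 5-sparse code from a random dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(256)
a_true[rng.choice(256, 5, replace=False)] = 1.0
y = D @ a_true + 0.01 * rng.standard_normal(64)
a_hat = ista(D, y, lam=0.05)
print("nonzero coefficients:", np.count_nonzero(np.abs(a_hat) > 1e-6))
```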

Julien Mairal Journees ORASIS 2011 10/97

Page 11: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration

Designed dictionaries

[Haar, 1910], [Zweig, Morlet, Grossman ∼70s], [Meyer, Mallat, Daubechies, Coifman, Donoho, Candès ∼80s-today]. . . (see [Mallat, 1999])
Wavelets, Curvelets, Wedgelets, Bandlets, . . . lets

Learned dictionaries of patches

[Olshausen and Field, 1997], [Engan et al., 1999], [Lewicki andSejnowski, 2000], [Aharon et al., 2006] , [Roth and Black, 2005], [Leeet al., 2007]

min_{α_i, D ∈ D} Σ_i (1/2)‖y_i − Dα_i‖_2^2 + λψ(α_i)

The quadratic term is the reconstruction error and ψ enforces sparsity, with

ψ(α) = ‖α‖_0 ("ℓ0 pseudo-norm") or ψ(α) = ‖α‖_1 (ℓ1 norm).

Julien Mairal Journees ORASIS 2011 11/97

Page 12: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration

Solving the denoising problem

[Elad and Aharon, 2006]

Extract all overlapping 8 × 8 patches y_i.

Solve a matrix factorization problem:

min_{α_i, D ∈ D} Σ_{i=1}^n (1/2)‖y_i − Dα_i‖_2^2 + λψ(α_i),   with n > 100,000.

(The quadratic term is the reconstruction error; ψ is the sparsity penalty.)

Average the reconstruction of each patch.
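This is not the authors' K-SVD implementation; a rough sketch of the same three-step pipeline using scikit-learn (patch size, number of atoms and sparsity level below are illustrative guesses) could look like this:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

def denoise(noisy, patch_size=(8, 8), n_atoms=256, n_nonzero=5):
    """Patch-based denoising: learn a dictionary on the noisy patches,
    sparse-code them, and average the overlapping reconstructions."""
    # 1) extract all overlapping patches and remove their means
    patches = extract_patches_2d(noisy, patch_size)
    X = patches.reshape(len(patches), -1)
    means = X.mean(axis=1, keepdims=True)
    X = X - means
    # 2) dictionary learning + sparse coding (OMP with a few nonzeros)
    dl = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                     batch_size=256,
                                     transform_algorithm="omp",
                                     transform_n_nonzero_coefs=n_nonzero)
    codes = dl.fit(X).transform(X)
    X_hat = codes @ dl.components_ + means
    # 3) put the patches back and average them where they overlap
    return reconstruct_from_patches_2d(X_hat.reshape(patches.shape), noisy.shape)

# usage: denoised = denoise(noisy_image)   # noisy_image: 2-D float array
```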

Julien Mairal Journees ORASIS 2011 12/97

Page 13: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
K-SVD: [Elad and Aharon, 2006]

Figure: Dictionary trained on a noisy version of the image boat.

Julien Mairal Journees ORASIS 2011 13/97

Page 14: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Grayscale vs color image patches

Julien Mairal Journees ORASIS 2011 14/97

Page 15: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration

Inpainting, Demosaicking

min_{D ∈ D, α} Σ_i (1/2)‖β_i(y_i − Dα_i)‖_2^2 + λ_i ψ(α_i)

RAW Image Processing pipeline

White balance. Black subtraction. Denoising. Demosaicking. Conversion to sRGB. Gamma correction.

Julien Mairal Journees ORASIS 2011 15/97

Page 16: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
[Mairal, Sapiro, and Elad, 2008d]

Julien Mairal Journees ORASIS 2011 16/97

Page 17: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Elad, and Sapiro, 2008b]

Julien Mairal Journees ORASIS 2011 17/97

Page 18: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Elad, and Sapiro, 2008b]

Julien Mairal Journees ORASIS 2011 18/97

Page 19: Sparse coding for machine learning, image processing and ...

Sparse representations for video restoration

Key ideas for video processing

[Protter and Elad, 2009]

Using a 3D dictionary.

Processing of many frames at the same time.

Dictionary propagation.

Julien Mairal Journees ORASIS 2011 19/97

Page 20: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results.

Julien Mairal Journees ORASIS 2011 20/97

Page 21: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results.

Julien Mairal Journees ORASIS 2011 21/97

Page 22: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results.

Julien Mairal Journees ORASIS 2011 22/97

Page 23: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results.

Julien Mairal Journees ORASIS 2011 23/97

Page 24: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results.

Julien Mairal Journees ORASIS 2011 24/97

Page 25: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results. σ = 25

Julien Mairal Journees ORASIS 2011 25/97

Page 26: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results. σ = 25

Julien Mairal Journees ORASIS 2011 26/97

Page 27: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results. σ = 25

Julien Mairal Journees ORASIS 2011 27/97

Page 28: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results. σ = 25

Julien Mairal Journees ORASIS 2011 28/97

Page 29: Sparse coding for machine learning, image processing and ...

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results. σ = 25

Julien Mairal Journees ORASIS 2011 29/97

Page 30: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Original

Julien Mairal Journees ORASIS 2011 30/97

Page 31: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Bicubic

Julien Mairal Journees ORASIS 2011 31/97

Page 32: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Proposed method

Julien Mairal Journees ORASIS 2011 32/97

Page 33: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Original

Julien Mairal Journees ORASIS 2011 33/97

Page 34: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Bicubic

Julien Mairal Journees ORASIS 2011 34/97

Page 35: Sparse coding for machine learning, image processing and ...

Digital Zooming
Couzinie-Devy, 2010, Proposed approach

Julien Mairal Journees ORASIS 2011 35/97

Page 36: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Original

Julien Mairal Journees ORASIS 2011 36/97

Page 37: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Reconstructed image

Julien Mairal Journees ORASIS 2011 37/97

Page 38: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Original

Julien Mairal Journees ORASIS 2011 38/97

Page 39: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Reconstructed image

Julien Mairal Journees ORASIS 2011 39/97

Page 40: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Original

Julien Mairal Journees ORASIS 2011 40/97

Page 41: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Reconstructed image

Julien Mairal Journees ORASIS 2011 41/97

Page 42: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Original

Julien Mairal Journees ORASIS 2011 42/97

Page 43: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Reconstructed image

Julien Mairal Journees ORASIS 2011 43/97

Page 44: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Original

Julien Mairal Journees ORASIS 2011 44/97

Page 45: Sparse coding for machine learning, image processing and ...

Inverse half-toning
Reconstructed image

Julien Mairal Journees ORASIS 2011 45/97

Page 46: Sparse coding for machine learning, image processing and ...

Important messages

Patch-based approaches are achieving state-of-the-art results for many image processing tasks.

Dictionary Learning adapts to the data you want to restore.

Dictionary Learning is well adapted to data that admit sparse representation. Sparsity is for sparse data only.

Julien Mairal Journees ORASIS 2011 46/97

Page 47: Sparse coding for machine learning, image processing and ...

Next topics

A bit of machine learning.

Why does the ℓ1-norm induce sparsity?

Some properties of the Lasso.

Links between dictionary learning and matrix factorizationtechniques.

A simple algorithm for learning dictionaries.

Julien Mairal Journees ORASIS 2011 47/97

Page 48: Sparse coding for machine learning, image processing and ...

1 Image Processing Applications

2 Sparse Linear Models and Dictionary Learning
    The machine learning point of view
    Why does the ℓ1-norm induce sparsity?
    Dictionary Learning and Matrix Factorization

3 Computer Vision Applications

Julien Mairal Journees ORASIS 2011 48/97

Page 49: Sparse coding for machine learning, image processing and ...

Sparse Linear Model: Machine Learning Point of View

Let (y^i, x^i), i = 1, . . . , n, be a training set, where the vectors x^i are in R^p and are called features. The scalars y^i are in

{−1, +1} for binary classification problems,

{1, . . . , N} for multiclass classification problems,

R for regression problems.

In a linear model, one assumes a relation y ≈ w^T x (or y ≈ sign(w^T x)), and solves

min_{w ∈ R^p} (1/n) Σ_{i=1}^n ℓ(y^i, w^T x^i) + λψ(w),

where the first term is the data-fitting term and ψ(w) is the regularization.

Julien Mairal Journees ORASIS 2011 49/97

Page 50: Sparse coding for machine learning, image processing and ...

Sparse Linear Models: Machine Learning Point of View

A few examples:

Ridge regression:  min_{w ∈ R^p} (1/2n) Σ_{i=1}^n (y^i − w^T x^i)^2 + λ‖w‖_2^2.

Linear SVM:  min_{w ∈ R^p} (1/n) Σ_{i=1}^n max(0, 1 − y^i w^T x^i) + λ‖w‖_2^2.

Logistic regression:  min_{w ∈ R^p} (1/n) Σ_{i=1}^n log(1 + e^{−y^i w^T x^i}) + λ‖w‖_2^2.

The squared ℓ2-norm induces smoothness in w. When one knows in advance that w should be sparse, one should use a sparsity-inducing regularization such as the ℓ1-norm [Chen et al., 1999, Tibshirani, 1996].

The purpose of the regularization is to add a priori knowledge about the solution.

Julien Mairal Journees ORASIS 2011 50/97

Page 51: Sparse coding for machine learning, image processing and ...

Sparse Linear Models: the Lasso

Signal processing: D is a dictionary in R^{n×p},

min_{α ∈ R^p} (1/2)‖y − Dα‖_2^2 + λ‖α‖_1.

Machine Learning:

min_{w ∈ R^p} (1/2n) Σ_{i=1}^n (y^i − x^{iT} w)^2 + λ‖w‖_1 = min_{w ∈ R^p} (1/2n)‖y − X^T w‖_2^2 + λ‖w‖_1,

with X ≜ [x^1, . . . , x^n] and y ≜ [y^1, . . . , y^n]^T.

Useful tool in signal processing, machine learning, statistics, . . . as long as one wishes to select features.
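For the machine-learning formulation, scikit-learn's Lasso already minimizes the (1/2n) objective above. A small feature-selection sketch on synthetic data (dimensions and the value of alpha are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200                           # fewer samples than features
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = [1.5, -2.0, 3.0, -1.0, 0.5]  # only 5 relevant features
y = X @ w_true + 0.1 * rng.standard_normal(n)

# sklearn's Lasso objective: (1/2n) ||y - Xw||_2^2 + alpha * ||w||_1
lasso = Lasso(alpha=0.1).fit(X, y)
print("selected features:", np.flatnonzero(lasso.coef_))
```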

Julien Mairal Journees ORASIS 2011 51/97

Page 52: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Example: a quadratic problem in 1D

min_{α ∈ R} (1/2)(y − α)^2 + λ|α|

Piecewise quadratic function with a kink at zero.

Derivative at 0^+: g^+ = −y + λ, and at 0^−: g^− = −y − λ.

Optimality conditions: α is optimal iff

|α| > 0 and −(y − α) + λ sign(α) = 0, or

α = 0 and g^+ ≥ 0 and g^− ≤ 0.

The solution is a soft-thresholding:

α^⋆ = sign(y)(|y| − λ)_+.
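In code, this closed-form 1-D solution is a one-liner (the printed values are just a sanity check):

```python
import numpy as np

def soft_threshold(y, lam):
    """argmin_a 0.5*(y - a)**2 + lam*|a|  =  sign(y) * max(|y| - lam, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.3, 2.0]), lam=0.5))
# values inside [-0.5, 0.5] are set exactly to zero; the others shrink by 0.5
```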

Julien Mairal Journees ORASIS 2011 52/97

Page 53: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?

Figure: (a) the soft-thresholding operator α^⋆(y); (b) the hard-thresholding operator α^⋆(y).

Julien Mairal Journees ORASIS 2011 53/97

Page 54: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Analysis of the norms in 1D

ψ(α) = α^2,   ψ′(α) = 2α

ψ(α) = |α|,   ψ′_−(α) = −1,  ψ′_+(α) = 1

The gradient of the ℓ2 penalty vanishes when α gets close to 0. On its differentiable part, the gradient of the ℓ1-norm has constant magnitude.

Julien Mairal Journees ORASIS 2011 54/97

Page 55: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Physical illustration

Figure: initial configuration of the two systems, E_1 = 0, rest position y_0.

Julien Mairal Journees ORASIS 2011 55/97

Page 56: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Physical illustration

First system: E_1 = (k_1/2)(y_0 − y)^2,   E_2 = (k_2/2) y^2.

Second system: E_1 = (k_1/2)(y_0 − y)^2,   E_2 = m g y.

Julien Mairal Journees ORASIS 2011 56/97

Page 57: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Physical illustration

First system: E_1 = (k_1/2)(y_0 − y)^2,   E_2 = (k_2/2) y^2.

Second system: E_1 = (k_1/2)(y_0 − y)^2,   E_2 = m g y,   and at equilibrium y = 0 !!

Julien Mairal Journees ORASIS 2011 57/97

Page 58: Sparse coding for machine learning, image processing and ...

Why does the ℓ1-norm induce sparsity?
Geometric explanation

min_{α ∈ R^p} (1/2)‖y − Dα‖_2^2 + λ‖α‖_1

min_{α ∈ R^p} ‖y − Dα‖_2^2   s.t.   ‖α‖_1 ≤ T.

Julien Mairal Journees ORASIS 2011 58/97

Page 59: Sparse coding for machine learning, image processing and ...

Important property of the Lasso
Piecewise linearity of the regularization path

Figure: Regularization path of the Lasso: the coefficient values α_1, . . . , α_5 as a function of λ for

min_α (1/2)‖y − Dα‖_2^2 + λ‖α‖_1.
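The whole piecewise-linear path can be traced exactly with the LARS/homotopy algorithm; scikit-learn exposes this as lars_path. A sketch on random data (purely illustrative, not the figure above):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(50)

# alphas: breakpoints of the path; coefs[:, k] is the Lasso solution at alphas[k]
alphas, _, coefs = lars_path(X, y, method="lasso")
print(alphas.shape, coefs.shape)   # coefficients vary linearly between breakpoints
```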

Julien Mairal Journees ORASIS 2011 59/97

Page 60: Sparse coding for machine learning, image processing and ...

Optimization for Dictionary Learning

min_{α ∈ R^{p×n}, D ∈ D} Σ_{i=1}^n (1/2)‖y_i − Dα_i‖_2^2 + λ‖α_i‖_1

D ≜ {D ∈ R^{m×p} s.t. ∀j = 1, . . . , p, ‖d_j‖_2 ≤ 1}.

Classical optimization alternates between D and α.

Good results, but slow!

Instead use online learning [Mairal et al., 2009]
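A bare-bones version of the classical alternating scheme (ISTA-style sparse coding, then a projected gradient step on D) is sketched below; the step sizes and iteration counts are arbitrary, and the online algorithm of Mairal et al. [2009], implemented in SPAMS, is considerably more refined:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(D, Y, lam, n_iter=100):
    """ISTA applied jointly to all columns of Y."""
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - Y) / L, lam / L)
    return A

def dict_learn(Y, p, lam=0.1, n_outer=20, seed=0):
    """Alternate between sparse coding A and a projected update of D."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], p))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(D, Y, lam)                       # D fixed, update A
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-8)
        D -= step * (D @ A - Y) @ A.T                    # A fixed, gradient step on D
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # project onto ||d_j||_2 <= 1
    return D

# usage: D = dict_learn(Y, p=128)   # Y: (m, n) matrix with one signal per column
```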

Julien Mairal Journees ORASIS 2011 60/97

Page 61: Sparse coding for machine learning, image processing and ...

Optimization for Dictionary Learning
Inpainting a 12-Mpixel photograph

Julien Mairal Journees ORASIS 2011 61/97

Page 62: Sparse coding for machine learning, image processing and ...

Optimization for Dictionary Learning
Inpainting a 12-Mpixel photograph

Julien Mairal Journees ORASIS 2011 62/97

Page 63: Sparse coding for machine learning, image processing and ...

Optimization for Dictionary Learning
Inpainting a 12-Mpixel photograph

Julien Mairal Journees ORASIS 2011 63/97

Page 64: Sparse coding for machine learning, image processing and ...

Optimization for Dictionary Learning
Inpainting a 12-Mpixel photograph

Julien Mairal Journees ORASIS 2011 64/97

Page 65: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning

min_{α ∈ R^{p×n}, D ∈ D} Σ_{i=1}^n (1/2)‖y_i − Dα_i‖_2^2 + λ‖α_i‖_1

can be rewritten

min_{α ∈ R^{p×n}, D ∈ D} (1/2)‖Y − Dα‖_F^2 + λ‖α‖_1,

where Y = [y_1, . . . , y_n] and α = [α_1, . . . , α_n].

Julien Mairal Journees ORASIS 2011 65/97

Page 66: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
PCA

min_{α ∈ R^{p×n}, D ∈ R^{m×p}} (1/2)‖Y − Dα‖_F^2   s.t.   D^T D = I and αα^T is diagonal.

D = [d_1, . . . , d_p] are the principal components.

Julien Mairal Journees ORASIS 2011 66/97

Page 67: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
Hard clustering

min_{α ∈ {0,1}^{p×n}, D ∈ R^{m×p}} (1/2)‖Y − Dα‖_F^2   s.t.   ∀i ∈ {1, . . . , n}, Σ_{j=1}^p α_i[j] = 1.

D = [d_1, . . . , d_p] are the centroids of the p clusters.

Julien Mairal Journees ORASIS 2011 67/97

Page 68: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
Soft clustering

min_{α ∈ R_+^{p×n}, D ∈ R^{m×p}} (1/2)‖Y − Dα‖_F^2   s.t.   ∀i ∈ {1, . . . , n}, Σ_{j=1}^p α_i[j] = 1.

D = [d_1, . . . , d_p] are the centroids of the p clusters.

Julien Mairal Journees ORASIS 2011 68/97

Page 69: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
Non-negative matrix factorization [Lee and Seung, 2001]

min_{α ∈ R_+^{p×n}, D ∈ R_+^{m×p}} (1/2)‖Y − Dα‖_F^2

Julien Mairal Journees ORASIS 2011 69/97

Page 70: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
NMF + sparsity?

min_{α ∈ R_+^{p×n}, D ∈ R_+^{m×p}} (1/2)‖Y − Dα‖_F^2 + λ‖α‖_1.

Most of these formulations can be addressed with the same types of algorithms.
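All three factorizations are available off the shelf; for instance, with scikit-learn one can fit PCA, NMF and a sparse dictionary to the same data matrix (component counts and penalties below are arbitrary, and scikit-learn stores signals as rows, i.e. the transpose of the convention used in the slides):

```python
import numpy as np
from sklearn.decomposition import PCA, NMF, MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
Y = np.abs(rng.standard_normal((500, 64)))   # rows = signals (e.g. 8x8 patches)

pca = PCA(n_components=20).fit(Y)                             # orthogonal components
nmf = NMF(n_components=20, max_iter=500).fit(Y)               # non-negative factors
dl = MiniBatchDictionaryLearning(n_components=20, alpha=1.0).fit(Y)  # sparse codes

for name, model in [("PCA", pca), ("NMF", nmf), ("DL", dl)]:
    print(name, model.components_.shape)      # each row is one atom / component
```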

Julien Mairal Journees ORASIS 2011 70/97

Page 71: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
Natural Patches

(a) PCA (b) NNMF (c) DL

Julien Mairal Journees ORASIS 2011 71/97

Page 72: Sparse coding for machine learning, image processing and ...

Matrix Factorization Problems and Dictionary Learning
Faces

(d) PCA (e) NNMF (f) DL

Julien Mairal Journees ORASIS 2011 72/97

Page 73: Sparse coding for machine learning, image processing and ...

Important messages

The ℓ1-norm induces sparsity and shrinks the coefficients (soft-thresholding).

The regularization path of the Lasso is piecewise linear.

Learning the dictionary is simple, fast and scalable.

Dictionary learning is related to several matrix factorization problems.

Software SPAMS is available for all of this:

www.di.ens.fr/willow/SPAMS/.

Julien Mairal Journees ORASIS 2011 73/97

Page 74: Sparse coding for machine learning, image processing and ...

Next topics: Computer Vision

Intriguing results on the use of dictionary learning for bags of words.

Modelling the local appearance of image patches.

Julien Mairal Journees ORASIS 2011 74/97

Page 75: Sparse coding for machine learning, image processing and ...

1 Image Processing Applications

2 Sparse Linear Models and Dictionary Learning

3 Computer Vision Applications
    Learning codebooks for image classification
    Modelling the local appearance of image patches

Julien Mairal Journees ORASIS 2011 75/97

Page 76: Sparse coding for machine learning, image processing and ...

Learning Codebooks for Image Classification

Idea

Replacing Vector Quantization by Learned Dictionaries!

unsupervised: [Yang et al., 2009]

supervised: [Boureau et al., 2010, Yang et al., 2010]

Julien Mairal Journees ORASIS 2011 76/97

Page 77: Sparse coding for machine learning, image processing and ...

Learning Codebooks for Image Classification

Let an image be represented by a set of low-level descriptors y_i at N locations identified with their indices i = 1, . . . , N.

hard-quantization:

y_i ≈ Dα_i,   α_i ∈ {0,1}^p and Σ_{j=1}^p α_i[j] = 1

soft-quantization:

α_i[j] = e^{−β‖y_i − d_j‖_2^2} / Σ_{k=1}^p e^{−β‖y_i − d_k‖_2^2}

sparse coding:

y_i ≈ Dα_i,   α_i = argmin_α (1/2)‖y_i − Dα‖_2^2 + λ‖α‖_1
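The three encoders can be written in a few lines of NumPy (descriptors are stored as rows; the sparse-coding step reuses an ISTA-style loop, and β, λ and the iteration count are illustrative choices):

```python
import numpy as np

def hard_assign(Y, D):
    """One-hot code: each descriptor is assigned to its nearest atom."""
    d2 = ((Y[:, None, :] - D.T[None, :, :]) ** 2).sum(-1)   # (N, p) squared distances
    A = np.zeros_like(d2)
    A[np.arange(len(Y)), d2.argmin(axis=1)] = 1.0
    return A

def soft_assign(Y, D, beta=1.0):
    """Softmax of negative squared distances to the atoms."""
    d2 = ((Y[:, None, :] - D.T[None, :, :]) ** 2).sum(-1)
    W = np.exp(-beta * d2)
    return W / W.sum(axis=1, keepdims=True)

def sparse_code(Y, D, lam=0.1, n_iter=200):
    """ISTA on each row of Y: argmin_a 0.5*||y - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((len(Y), D.shape[1]))
    for _ in range(n_iter):
        Z = A - (A @ D.T - Y) @ D / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return A

# Y: (N, m) descriptors, D: (m, p) dictionary with unit-norm columns
```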

Julien Mairal Journees ORASIS 2011 77/97

Page 78: Sparse coding for machine learning, image processing and ...

Learning Codebooks for Image Classification
Table from Boureau et al. [2010]

Yang et al. [2009] have won the PASCAL VOC'09 challenge using this kind of technique.

Julien Mairal Journees ORASIS 2011 78/97

Page 79: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function

Idea:

Let us consider 2 sets S−, S+ of signals representing 2 different classes. Each set should admit a dictionary best adapted to its reconstruction.

Classification procedure for a signal y ∈ R^n:

min( R⋆(y, D−), R⋆(y, D+) )

where R⋆(y, D) = min_{α ∈ R^p} ‖y − Dα‖_2^2   s.t.   ‖α‖_0 ≤ L.

"Reconstructive" training:

min_{D−} Σ_{i ∈ S−} R⋆(y_i, D−)

min_{D+} Σ_{i ∈ S+} R⋆(y_i, D+)

[Grosse et al., 2007], [Huang and Aviyente, 2006], [Sprechmann et al., 2010] for unsupervised clustering (CVPR '10)
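A sketch of this reconstruction-based classifier with scikit-learn: learn one dictionary per class and assign a test signal to the class whose dictionary reconstructs it best under the sparsity constraint (atom counts and the value of L are arbitrary; the discriminative variant on the next slide changes only the training objective):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def fit_class_dictionary(X, n_atoms=64, n_nonzero=5):
    """Learn a dictionary adapted to the reconstruction of one class."""
    return MiniBatchDictionaryLearning(n_components=n_atoms,
                                       transform_algorithm="omp",
                                       transform_n_nonzero_coefs=n_nonzero).fit(X)

def residual(model, X):
    """R*(x, D): squared reconstruction error under the sparsity constraint."""
    A = model.transform(X)
    return ((X - A @ model.components_) ** 2).sum(axis=1)

# X_minus, X_plus: training signals (one per row) for the two classes
# D_minus, D_plus = fit_class_dictionary(X_minus), fit_class_dictionary(X_plus)
# prediction: +1 if D_plus reconstructs a test signal better, -1 otherwise
# y_hat = np.where(residual(D_plus, X_test) < residual(D_minus, X_test), 1, -1)
```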

Julien Mairal Journees ORASIS 2011 79/97

Page 80: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function

“Discriminative” training

[Mairal, Bach, Ponce, Sapiro, and Zisserman, 2008a]

min_{D−, D+} Σ_i C( λ z_i ( R⋆(y_i, D−) − R⋆(y_i, D+) ) ),

where z_i ∈ {−1,+1} is the label of y_i and C is the logistic regression function.

Julien Mairal Journees ORASIS 2011 80/97

Page 81: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function
Examples of dictionaries

Top: reconstructive, Bottom: discriminative, Left: Bicycle, Right: Background.

Julien Mairal Journees ORASIS 2011 81/97

Page 82: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function
Texture segmentation

Julien Mairal Journees ORASIS 2011 82/97

Page 83: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function
Texture segmentation

Julien Mairal Journees ORASIS 2011 83/97

Page 84: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function
Pixelwise classification

Julien Mairal Journees ORASIS 2011 84/97

Page 85: Sparse coding for machine learning, image processing and ...

Learning dictionaries with a discriminative cost function
Weakly-supervised pixel classification

Julien Mairal Journees ORASIS 2011 85/97

Page 86: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
[Mairal, Leordeanu, Bach, Hebert, and Ponce, 2008c]

Good edges Bad edges

Julien Mairal Journees ORASIS 2011 86/97

Page 87: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
Berkeley segmentation benchmark

Raw edge detection on the right

Julien Mairal Journees ORASIS 2011 87/97

Page 88: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
Berkeley segmentation benchmark

Raw edge detection on the right

Julien Mairal Journees ORASIS 2011 88/97

Page 89: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
Berkeley segmentation benchmark

Julien Mairal Journees ORASIS 2011 89/97

Page 90: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
Contour-based classifier: [Leordeanu, Hebert, and Sukthankar, 2007]

Is there a bike, a motorbike, a car or a person in this image?

Julien Mairal Journees ORASIS 2011 90/97

Page 91: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification

Julien Mairal Journees ORASIS 2011 91/97

Page 92: Sparse coding for machine learning, image processing and ...

Application to edge detection and classification
Performance gain due to the prefiltering

Ours + [Leordeanu '07]   [Leordeanu '07]   [Winn '05]
96.8%                    89.4%             76.9%

Recognition rates for the same experiment as [Winn et al., 2005] on VOC 2005.

Category     Ours + [Leordeanu '07]   [Leordeanu '07]
Aeroplane    71.9%                    61.9%
Boat         67.1%                    56.4%
Cat          82.6%                    53.4%
Cow          68.7%                    59.2%
Horse        76.0%                    67%
Motorbike    80.6%                    73.6%
Sheep        72.9%                    58.4%
Tvmonitor    87.7%                    83.8%
Average      75.9%                    64.2%

Recognition performance at equal error rate for 8 classes on a subset of images from Pascal 07.

Julien Mairal Journees ORASIS 2011 92/97

Page 93: Sparse coding for machine learning, image processing and ...

Important messages

Learned dictionaries are well adapted to model the local appearance of images and edges.

They can be used to learn dictionaries of SIFT features.

Julien Mairal Journees ORASIS 2011 93/97

Page 94: Sparse coding for machine learning, image processing and ...

References I

M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing, 54(11):4311–4322, November 2006.

Y-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.

M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 54(12):3736–3745, December 2006.

K. Engan, S. O. Aase, and J. H. Husoy. Frame based signal compression using method of optimal directions (MOD). In Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, volume 4, 1999.

R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence, 2007.

Julien Mairal Journees ORASIS 2011 94/97

Page 95: Sparse coding for machine learning, image processing and ...

References II

A. Haar. Zur theorie der orthogonalen funktionensysteme. Mathematische Annalen, 69:331–371, 1910.

K. Huang and S. Aviyente. Sparse representation for signal classification. In Advances in Neural Information Processing Systems, Vancouver, Canada, December 2006.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2001.

H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19, pages 801–808. MIT Press, Cambridge, MA, 2007.

M. Leordeanu, M. Hebert, and R. Sukthankar. Beyond local appearance: Category recognition from pairwise interactions of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative learned dictionaries for local image analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008a.

Julien Mairal Journees ORASIS 2011 95/97

Page 96: Sparse coding for machine learning, image processing and ...

References III

J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, January 2008b.

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In Proceedings of the European Conference on Computer Vision (ECCV), 2008c.

J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modelling and Simulation, 7(1):214–241, April 2008d.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the International Conference on Machine Learning (ICML), 2009.

S. Mallat. A Wavelet Tour of Signal Processing, Second Edition. Academic Press, New York, September 1999.

B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1):27–36, 2009.

Julien Mairal Journees ORASIS 2011 96/97

Page 97: Sparse coding for machine learning, image processing and ...

References IV

S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar. Collaborative hierarchical sparse modeling. Technical report, 2010. Preprint arXiv:1003.0400v1.

R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005.

J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

J. Yang, K. Yu, and T. Huang. Supervised translation-invariant sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

Julien Mairal Journees ORASIS 2011 97/97