Transcript of "A Quest for a Universal Model for Signals: From Sparsity to ConvNets"
Yaniv Romano, The Electrical Engineering Department, Technion – Israel Institute of Technology
Joint work with
Jeremias Sulam Vardan Papyan Prof. Michael Elad
1
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP/2007-2013), ERC Grant Agreement ERC-SPARSE-320649
A Quest for a Universal Model for Signals:
From Sparsity to ConvNets
Local Sparsity

Signal = Dictionary (learned) × Sparse vector
Local Sparsity
Independent patch-processing
Local Sparsity
Independent patch-processing
Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Multi-Layer Convolutional Sparse Coding
Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Convolutional Neural Networks
Multi-Layer Convolutional Sparse Coding
Forward pass
Multi-Layer Convolutional Sparse Coding
Extension of the classical sparse theory to a multi-layer setting
The forward pass is a sparse-coding algorithm, serving our model
Convolutional Sparse Coding
Convolutional Neural Networks
Independent patch-processing
Local Sparsity
Our Story Begins with…
9
Image Denoising
Many (thousands of) image denoising algorithms can be cast as the minimization of an energy function of the form
10
Original Image 𝐗 + White Gaussian Noise 𝐄 = Noisy Image 𝐘

$$\min_{\mathbf{X}} \ \underbrace{\tfrac{1}{2}\|\mathbf{X}-\mathbf{Y}\|_2^2}_{\text{Relation to measurements}} + \underbrace{G(\mathbf{X})}_{\text{Prior or regularization}}$$
Leading Image Denoising Methods…
are built upon powerful patch-based local models. Popular local models: GMM, Sparse-Representation, Example-based, Low-rank, Field-of-Experts & Neural networks
11
Working locally allows us to learn the model,
e.g. dictionary learning
Why is it so Popular?

• It does come up in many applications
• Other inverse problems can be recast as iterative denoising [Zoran & Weiss '11] [Venkatakrishnan, Bouman & Wohlberg '13] [Brifman, Romano & Elad '16]
• In a recent work we show that a denoiser 𝑓(𝐗) can form a regularizer: Regularization by Denoising (RED) [Romano, Elad & Milanfar '16]
• It is the simplest inverse problem revealing the limitations of the model…
12
$$\min_{\mathbf{X}} \ \underbrace{\tfrac{1}{2}\|\mathbf{H}\mathbf{X}-\mathbf{Y}\|_2^2}_{\text{Relation to measurements}} + \underbrace{G(\mathbf{X})}_{\text{Prior}}$$

RED regularizer:

$$\min_{\mathbf{X}} \ \tfrac{1}{2}\|\mathbf{H}\mathbf{X}-\mathbf{Y}\|_2^2 + \frac{\rho}{2}\,\mathbf{X}^T\big(\mathbf{X}-f(\mathbf{X})\big)$$

Under simple conditions on 𝑓(𝐱):

$$\nabla_{\mathbf{X}} G(\mathbf{X}) = \mathbf{X} - f(\mathbf{X})$$
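To make the RED idea concrete, here is a minimal NumPy sketch (not from the talk) of minimizing the RED objective by gradient descent, using the fact that the regularizer's gradient is ρ(𝐗 − 𝑓(𝐗)). The median-filter denoiser, the identity degradation 𝐇, the step size, and the toy signal are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def red_gradient_descent(Y, H, f, rho=0.1, step=0.1, n_iters=200):
    """Minimize 0.5*||H X - Y||^2 + (rho/2) * X^T (X - f(X)) by gradient descent.

    Under the RED conditions on f, the regularizer's gradient is rho*(X - f(X)),
    so every step only applies H, H^T and the plug-in denoiser f."""
    X = H.T @ Y  # crude initialization
    for _ in range(n_iters):
        grad = H.T @ (H @ X - Y) + rho * (X - f(X))
        X = X - step * grad
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 256
    H = np.eye(N)                                       # denoising: H is the identity
    X_true = np.cumsum(rng.standard_normal(N)) / 10.0   # smooth-ish toy 1D signal
    Y = H @ X_true + 0.1 * rng.standard_normal(N)
    f = lambda x: median_filter(x, size=5)              # stand-in denoiser (illustrative)
    X_hat = red_gradient_descent(Y, H, f)
    print("input  MSE:", np.mean((Y - X_true) ** 2))
    print("output MSE:", np.mean((X_hat - X_true) ** 2))
```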
The Sparse-Land Model
13

• Assumes that every patch is a linear combination of a few columns, called atoms, taken from a matrix called a dictionary
• The operator 𝐑i extracts the i-th 𝑛-dimensional patch from 𝐗 ∈ ℝ^N: 𝐑i𝐗 = 𝛀𝛄i, where 𝛀 is the 𝑛 × 𝑚 dictionary (𝑚 > 𝑛) and 𝛄i is sparse
• Sparse coding:

$$(\mathbf{P}_0): \ \min_{\boldsymbol{\gamma}_i} \|\boldsymbol{\gamma}_i\|_0 \ \text{ s.t. } \ \mathbf{R}_i\mathbf{X} = \boldsymbol{\Omega}\boldsymbol{\gamma}_i$$
Patch Denoising
14

Given a noisy patch 𝐑i𝐘, solve

$$(\mathbf{P}_0^\epsilon): \ \hat{\boldsymbol{\gamma}}_i = \arg\min_{\boldsymbol{\gamma}_i} \|\boldsymbol{\gamma}_i\|_0 \ \text{ s.t. } \ \|\mathbf{R}_i\mathbf{Y} - \boldsymbol{\Omega}\boldsymbol{\gamma}_i\|_2 \le \epsilon$$

Clean patch: 𝛀𝛄̂i

𝐏0 and 𝐏0^ϵ are hard to solve; they are approximated by:
• Greedy methods such as Orthogonal Matching Pursuit (OMP) or Thresholding
• Convex relaxations such as Basis Pursuit (BP):

$$(\mathbf{P}_1^\epsilon): \ \min_{\boldsymbol{\gamma}_i} \|\boldsymbol{\gamma}_i\|_1 + \xi\|\mathbf{R}_i\mathbf{Y} - \boldsymbol{\Omega}\boldsymbol{\gamma}_i\|_2^2$$
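As a concrete illustration of the greedy route, here is a minimal OMP sketch for a single patch (a plain-NumPy toy, not the code behind the talk); the dictionary size, sparsity level, and stopping rule are assumptions.

```python
import numpy as np

def omp(Omega, y, k_max, eps=1e-6):
    """Orthogonal Matching Pursuit: greedily pick atoms of Omega (columns assumed
    l2-normalized) and re-fit the coefficients by least squares at every step."""
    residual = y.copy()
    support = []
    gamma = np.zeros(Omega.shape[1])
    for _ in range(k_max):
        if np.linalg.norm(residual) <= eps:
            break
        corr = Omega.T @ residual
        support.append(int(np.argmax(np.abs(corr))))
        coeffs, *_ = np.linalg.lstsq(Omega[:, support], y, rcond=None)
        gamma[:] = 0.0
        gamma[support] = coeffs
        residual = y - Omega @ gamma
    return gamma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 64, 128, 4                       # patch size, number of atoms, sparsity
    Omega = rng.standard_normal((n, m))
    Omega /= np.linalg.norm(Omega, axis=0)     # normalize the atoms
    gamma_true = np.zeros(m)
    gamma_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
    y = Omega @ gamma_true + 0.01 * rng.standard_normal(n)
    gamma_hat = omp(Omega, y, k_max=k)
    print("recovered support:", np.flatnonzero(gamma_hat))
```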
Recall K-SVD Denoising [Elad & Aharon, ‘06]
15
Flow: Noisy Image → (starting from an Initial Dictionary) Denoise each patch using OMP → Update the dictionary using K-SVD → Reconstructed Image

• Despite its simplicity, this is a very well-performing algorithm
• We refer to this framework as "patch averaging"
• A modification of this method leads to state-of-the-art results [Mairal, Bach, Ponce, Sapiro, Zisserman '09]
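The overall "patch averaging" flow can be sketched as follows; this is a toy version under stated assumptions (a fixed dictionary rather than K-SVD updates, and a generic `sparse_code` routine, e.g. the OMP sketch above with a noise-dependent stopping rule).

```python
import numpy as np

def patch_average_denoise(Y, Omega, sparse_code, patch=8):
    """Toy patch-averaging denoiser for a 2D image Y: sparse-code every overlapping
    patch independently over Omega, put the patches back, and average the overlaps."""
    H, W = Y.shape
    out = np.zeros_like(Y, dtype=float)
    weight = np.zeros_like(Y, dtype=float)
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            p = Y[i:i + patch, j:j + patch].ravel()
            dc = p.mean()                          # remove the patch mean (DC)
            gamma = sparse_code(Omega, p - dc)     # e.g. lambda Om, v: omp(Om, v, k_max=4)
            clean = Omega @ gamma + dc
            out[i:i + patch, j:j + patch] += clean.reshape(patch, patch)
            weight[i:i + patch, j:j + patch] += 1.0
    return out / weight
```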
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of…
16
What is ?
The Global-Local Gap: Efficient Independent Local Patch Processing vs. The Global Need to Model the Entire Image
𝐘
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
17
𝐑i𝐘 𝛀𝛄𝐢
What is ?
𝐗 𝐘
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
18
𝐑i𝐘 𝛀𝛄i
Orthogonal (when using the OMP)
What is ?
𝐗 𝐘
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
19
𝐑i𝐘 𝛀𝛄i
Orthogonal (when using the OMP)
The orthogonality is lost due to patch-averaging
What is ?
𝐗 𝐘
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
20
𝐑i𝐘 𝛀𝛄i
Orthogonal (when using the OMP)
What is ?
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
21
Denoise
What is ?
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
22
Denoise
Previous Result
Strengthen
What is ?
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
23
Denoise
Previous Result
Strengthen – Operate
What is ?
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
24
Denoise
Previous Result
Strengthen – Operate – Subtract
What is ?
25
Relation to Game Theory: Encourage overlapping patches to reach a consensus.
Relation to Graph Theory: Minimizing a Laplacian regularization functional:

$$\mathbf{X}^{k+1} = f(\mathbf{Y} + \rho\mathbf{X}^{k}) - \rho\mathbf{X}^{k}$$

$$\mathbf{X}^{*} = \arg\min_{\mathbf{X}} \|\mathbf{X} - f(\mathbf{Y})\|_2^2 + \rho\,\mathbf{X}^{T}\big(\mathbf{X} - f(\mathbf{X})\big)$$
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Why boosting? Because it is guaranteed to be able to improve any denoiser
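To make the SOS update above concrete, here is a minimal sketch (not from the talk) of the Strengthen-Operate-Subtract iteration around a generic denoiser; the Gaussian-filter denoiser, ρ, and the toy image are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sos_boosting(Y, denoise, rho=1.0, n_iters=10):
    """SOS boosting: Strengthen the signal (Y + rho*X), Operate the denoiser,
    Subtract the previous estimate: X_{k+1} = f(Y + rho*X_k) - rho*X_k."""
    X = denoise(Y)                                # plain denoising as initialization
    for _ in range(n_iters):
        X = denoise(Y + rho * X) - rho * X
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 3, 64)))
    noisy = clean + 0.2 * rng.standard_normal(clean.shape)
    f = lambda z: gaussian_filter(z, sigma=1.0)   # stand-in denoiser (illustrative)
    boosted = sos_boosting(noisy, f, rho=1.0, n_iters=5)
    print("plain   MSE:", np.mean((f(noisy) - clean) ** 2))
    print("boosted MSE:", np.mean((boosted - clean) ** 2))
```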
What is ?
26
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
What is ?
27
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
What is ?
28
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
• Con-Patch
Self-Similarity feature
What is ?
29
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
• Con-Patch
• RAISR (Upscaling)
Edge Features: Direction, Strength, Consistency
What is ?
30
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
• Con-Patch
• RAISR (Upscaling)
Edge Features: Direction, Strength, Consistency
What is ?
31
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
Enforcing the local model on the final patches (EPLL)
• Sparse EPLL [Sulam & Elad ‘15]
K-SVD EPLL Noisy
What is ?
32
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
Enforcing the local model on the final patches (EPLL)
• Sparse EPLL [Sulam & Elad ‘15]
• Image Synthesis [Ren, Romano & Elad ‘17]
What is ?
33
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
Enforcing the local model on the final patches (EPLL)
• Sparse EPLL [Sulam & Elad ‘15]
• Image Synthesis [Ren, Romano & Elad ‘17]
What is ?
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
Enforcing the local model on the final patches (EPLL)
• Sparse EPLL [Sulam & Elad ‘15]
• Image Synthesis [Ren, Romano & Elad ‘17]
…
What is ?
34
• Many researchers kept revisiting this algorithm with a feeling that key features are still lacking
• Here is what WE thought of… Whitening the residual image [Romano & Elad ‘13]
SOS-Boosting [Romano & Elad ’15]
Exploiting self-similarities [Ram & Elad ‘13] [Romano, Protter & Elad ‘14]
A multi-scale treatment [Ophir, Lustig & Elad ‘11] [Sulam, Ophir & Elad ‘14] [Papyan & Elad ‘15]
Leveraging the context of the patch [Romano & Elad ‘16] [Romano, Isidoro & Milanfar ‘16]
Enforcing the local model on the final patches (EPLL)
• Sparse EPLL [Sulam & Elad ‘15]
• Image Synthesis [Ren, Romano & Elad ‘17]
…
What is ?
35
Missing a theoretical backbone!
Why can we use a local prior to solve a global problem?
Now, Our Story Takes a Surprising Turn
36
37
Convolutional Sparse Coding
[LeCun, Bottou, Bengio and Haffner ‘98] [Lewicki & Sejnowski ‘99] [Hashimoto & Kurata, ‘00] [Mørup, Schmidt & Hansen ’08]
[Zeiler, Krishnan, Taylor & Fergus ‘10] [Jarrett, Kavukcuoglu, Gregor, LeCun ‘11] [Heide, Heidrich & Wetzstein ‘15] [Gu, Zuo, Xie, Meng, Feng & Zhang ‘15] …
Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding
Vardan Papyan, Jeremias Sulam and Michael Elad
Convolutional Dictionary Learning via Local Processing Vardan Papyan, Yaniv Romano, Jeremias Sulam, and Michael Elad
Intuitively…
38
𝐗
[Figure: 𝐗 written as a sum of shifted and scaled copies of the first filter and the second filter]
Convolutional Sparse Coding (CSC)
39
[Figure: the signal equals a sum of filters, each convolved with its own sparse feature map]
Convolutional Sparse Representation
40

$$\mathbf{X} = \mathbf{D}\boldsymbol{\Gamma}, \qquad \mathbf{R}_i\mathbf{X} = \boldsymbol{\Omega}\boldsymbol{\gamma}_i$$

• Each 𝑛-dimensional patch 𝐑i𝐗 equals the stripe-dictionary 𝛀 (of size 𝑛 × (2𝑛−1)𝑚, built from shifts of the local dictionary 𝐃L) times the stripe vector 𝛄i of length (2𝑛−1)𝑚
• Adjacent representations 𝛄i, 𝛄i+1 overlap, as they skip by 𝑚 items as we sweep through the patches of 𝐗
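To make the convolutional structure tangible, here is a small sketch (an assumption-laden toy that uses circular shifts for simplicity) building the global dictionary 𝐃 from 𝑚 local filters of length 𝑛, so that 𝐗 = 𝐃𝚪; extracting 𝑛 consecutive rows of 𝐃 then reveals the stripe structure described above.

```python
import numpy as np

def build_conv_dictionary(filters, N):
    """Build the global convolutional dictionary D (size N x N*m) whose columns are
    all circular shifts of the m local filters (each of length n, zero-padded to N)."""
    n, m = filters.shape
    D = np.zeros((N, N * m))
    for j in range(m):
        atom = np.zeros(N)
        atom[:n] = filters[:, j]
        for shift in range(N):
            D[:, shift * m + j] = np.roll(atom, shift)   # m columns per spatial location
    return D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, N = 5, 2, 32
    filters = rng.standard_normal((n, m))
    D = build_conv_dictionary(filters, N)
    Gamma = np.zeros(N * m)
    Gamma[[7, 40, 55]] = [1.0, -2.0, 0.5]                # a sparse global representation
    X = D @ Gamma                                        # the resulting global signal
    print(D.shape, X.shape)
```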
CSC Relation to Our Story
• A clear global model: every patch has a sparse representation w.r.t. the same local dictionary 𝛀, just as we have assumed
• No notion of disagreement on the patch overlaps
• Related to the current common practice of patch averaging (𝐑i^T puts the patch 𝛀𝛄i back in the i-th location of the global vector):

$$\mathbf{X} = \mathbf{D}\boldsymbol{\Gamma} = \frac{1}{n}\sum_i \mathbf{R}_i^T\boldsymbol{\Omega}\boldsymbol{\gamma}_i$$

• What about the pursuit? "Patch averaging": independent sparse coding for each patch; CSC: should seek all the representations together
• Is there a bridge between the two? We'll come back to this later…
• What about the theory? Until recently, little was known regarding the theoretical aspects of CSC
41
Classical Sparse Theory (Noiseless)
42

$$(\mathbf{P}_0): \ \min_{\boldsymbol{\Gamma}} \|\boldsymbol{\Gamma}\|_0 \ \text{ s.t. } \ \mathbf{X} = \mathbf{D}\boldsymbol{\Gamma}$$

Definition (Mutual Coherence): $\mu(\mathbf{D}) = \max_{i \ne j} |\mathbf{d}_i^T\mathbf{d}_j|$

Theorem [Donoho & Elad '03]: For a signal 𝐗 = 𝐃𝚪, if $\|\boldsymbol{\Gamma}\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D})}\right)$ then this solution is necessarily the sparsest.

Theorem [Tropp '04], [Donoho & Elad '03]: OMP and BP are guaranteed to recover the true sparse code assuming that $\|\boldsymbol{\Gamma}\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D})}\right)$.
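For completeness, a tiny sketch (toy dimensions are assumptions) of computing μ(𝐃) and the resulting classical sparsity bound:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |d_i^T d_j| over l2-normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 128))
    mu = mutual_coherence(D)
    print("mu(D)          :", round(mu, 3))
    print("sparsity bound :", 0.5 * (1 + 1 / mu))   # ||Gamma||_0 must stay below this
```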
CSC: The Need for a Theoretical Study
43

• Assuming that 𝑚 = 2 and 𝑛 = 64, we have that [Welch '74] μ(𝐃) ≥ 0.063
• As a result, uniqueness and success of pursuits is guaranteed as long as

$$\|\boldsymbol{\Gamma}\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D})}\right) \le \frac{1}{2}\left(1 + \frac{1}{0.063}\right) \approx 8$$

• Fewer than 8 non-zeros GLOBALLY are allowed!!! This is a very pessimistic result!
• Repeating the above for the noisy case leads to even worse performance predictions
• Bottom line: classic SparseLand theory cannot provide good explanations for the CSC model
Moving to Local Sparsity: Stripes
44

$$\|\boldsymbol{\Gamma}\|_{0,\infty}^s = \max_i \|\boldsymbol{\gamma}_i\|_0$$

$\|\boldsymbol{\Gamma}\|_{0,\infty}^s$ is low ⟺ all stripes 𝛄i are sparse ⟺ every patch has a sparse representation over 𝛀

$$(\mathbf{P}_{0,\infty}): \ \min_{\boldsymbol{\Gamma}} \|\boldsymbol{\Gamma}\|_{0,\infty}^s \ \text{ s.t. } \ \mathbf{X} = \mathbf{D}\boldsymbol{\Gamma}$$

• If 𝚪 is locally sparse [Papyan, Sulam & Elad '16]:
  - The solution of 𝐏0,∞ is necessarily unique
  - The global OMP and BP are guaranteed to recover it
• This result poses a local constraint for a global guarantee, and as such, the guarantees scale linearly with the dimension of the signal
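A minimal sketch (with assumed conventions: 1D signal, 𝑚 coefficients per spatial location, circular stripe extraction) of evaluating the ℓ0,∞ stripe norm defined above:

```python
import numpy as np

def l0_inf_stripes(Gamma, N, m, n):
    """max over locations i of the number of non-zeros in the stripe gamma_i,
    i.e. the (2n-1)*m coefficients that can influence the i-th patch."""
    coeffs = Gamma.reshape(N, m)                 # m coefficients per spatial location
    worst = 0
    for i in range(N):
        idx = np.arange(i - n + 1, i + n) % N    # the 2n-1 locations around i (circular)
        worst = max(worst, int(np.count_nonzero(coeffs[idx, :])))
    return worst

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m, n = 64, 2, 5
    Gamma = np.zeros(N * m)
    Gamma[rng.choice(N * m, size=6, replace=False)] = 1.0
    print("||Gamma||_{0,inf}^s =", l0_inf_stripes(Gamma, N, m, n))
    print("||Gamma||_0         =", int(np.count_nonzero(Gamma)))
```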
From Ideal to Noisy Signals
45

• In practice, 𝐘 = 𝐃𝚪 + 𝐄, where 𝐄 is due to noise or model deviations

$$(\mathbf{P}_{0,\infty}^\epsilon): \ \min_{\boldsymbol{\Gamma}} \|\boldsymbol{\Gamma}\|_{0,\infty}^s \ \text{ s.t. } \ \|\mathbf{Y} - \mathbf{D}\boldsymbol{\Gamma}\|_2 \le \epsilon$$

• If 𝚪 is locally sparse and the noise is bounded [Papyan, Sulam & Elad '16]:
  - The solution of 𝐏0,∞^ϵ is stable: how close is 𝚪̂ to 𝚪? The true and estimated representations are close
  - The solution obtained via global OMP/BP is stable
Global Pursuit via Local Processing
46

$$(\mathbf{P}_1^\epsilon): \ \boldsymbol{\Gamma}^{BP} = \arg\min_{\boldsymbol{\Gamma}} \tfrac{1}{2}\|\mathbf{Y} - \mathbf{D}\boldsymbol{\Gamma}\|_2^2 + \xi\|\boldsymbol{\Gamma}\|_1$$

$$\mathbf{X} = \mathbf{D}\boldsymbol{\Gamma} = \sum_i \mathbf{R}_i^T\mathbf{s}_i = \sum_i \mathbf{R}_i^T\mathbf{D}_L\boldsymbol{\alpha}_i$$

where 𝐃L is the 𝑛 × 𝑚 local dictionary, and the 𝐬i = 𝐃L𝛂i are slices, not patches
Global Pursuit via Local Processing (2)
47

$$(\mathbf{P}_1^\epsilon): \ \boldsymbol{\Gamma}^{BP} = \arg\min_{\boldsymbol{\Gamma}} \tfrac{1}{2}\|\mathbf{Y} - \mathbf{D}\boldsymbol{\Gamma}\|_2^2 + \xi\|\boldsymbol{\Gamma}\|_1$$

Using variable splitting 𝐬i = 𝐃L𝛂i:

$$\min_{\{\mathbf{s}_i,\boldsymbol{\alpha}_i\}} \ \tfrac{1}{2}\Big\|\mathbf{Y} - \sum_i \mathbf{R}_i^T\mathbf{s}_i\Big\|_2^2 + \xi\sum_i\|\boldsymbol{\alpha}_i\|_1 \ \ \text{ s.t. } \ \mathbf{s}_i = \mathbf{D}_L\boldsymbol{\alpha}_i$$

• These two problems are equivalent, and convex w.r.t. their variables
• The new formulation targets the local slices and their sparse representations
• Can be solved via ADMM: replace the constraint with a penalty
Slice-Based Pursuit
48

Local Sparse Coding: $\boldsymbol{\alpha}_i = \arg\min_{\boldsymbol{\alpha}_i} \frac{\rho}{2}\|\mathbf{s}_i - \mathbf{D}_L\boldsymbol{\alpha}_i + \mathbf{u}_i\|_2^2 + \xi\|\boldsymbol{\alpha}_i\|_1$

Slice Reconstruction: $\mathbf{p}_i = \frac{1}{\rho}\mathbf{R}_i\mathbf{Y} + \mathbf{D}_L\boldsymbol{\alpha}_i - \mathbf{u}_i$

Slice Aggregation: $\hat{\mathbf{X}} = \sum_j \mathbf{R}_j^T\mathbf{p}_j$

Local Laplacian: $\mathbf{s}_i = \mathbf{p}_i - \frac{1}{\rho + n}\mathbf{R}_i\hat{\mathbf{X}}$

Dual Variable Update: $\mathbf{u}_i = \mathbf{u}_i + \mathbf{s}_i - \mathbf{D}_L\boldsymbol{\alpha}_i$

Comment: One iteration of this procedure amounts to… the very same patch-averaging algorithm we started with
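The scheme above can be sketched in a few lines of NumPy; this is a toy 1D version under stated assumptions (circular patch extraction, and the local sparse-coding step solved approximately by a few inner ISTA iterations), mirroring the update equations on the slide rather than reproducing the authors' implementation.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def slice_based_pursuit(Y, DL, xi=0.1, rho=1.0, n_iters=20, inner=30):
    """Sketch of the slice-based ADMM pursuit (1D, circular patches): local sparse
    coding, slice reconstruction, aggregation, local Laplacian, dual update."""
    N = Y.size
    n, m = DL.shape
    alpha = np.zeros((N, m))
    s = np.zeros((N, n))
    u = np.zeros((N, n))
    R = lambda z, i: z[np.arange(i, i + n) % N]          # extract the i-th patch/slice
    step = 1.0 / np.linalg.norm(DL, 2) ** 2              # ISTA step for the local Lasso
    for _ in range(n_iters):
        for i in range(N):                               # local sparse coding (inner ISTA)
            target, a = s[i] + u[i], alpha[i]
            for _ in range(inner):
                a = soft(a - step * DL.T @ (DL @ a - target), step * xi / rho)
            alpha[i] = a
        p = np.array([R(Y, i) / rho + DL @ alpha[i] - u[i] for i in range(N)])
        X = np.zeros(N)                                  # slice aggregation
        for i in range(N):
            X[np.arange(i, i + n) % N] += p[i]
        for i in range(N):                               # local Laplacian + dual update
            s[i] = p[i] - R(X, i) / (rho + n)
            u[i] = u[i] + s[i] - DL @ alpha[i]
    X_hat = np.zeros(N)                                  # final signal: sum of slices
    for i in range(N):
        X_hat[np.arange(i, i + n) % N] += DL @ alpha[i]
    return X_hat, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    DL = rng.standard_normal((8, 16)); DL /= np.linalg.norm(DL, axis=0)
    Y = rng.standard_normal(64)
    X_hat, alpha = slice_based_pursuit(Y, DL, xi=0.2, rho=1.0, n_iters=5, inner=20)
    print(X_hat.shape, int(np.count_nonzero(alpha)))
```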
49
Two Comments About this Scheme
Patches Slices
Patches Slices
The Proposed Scheme can be used for Dictionary (𝐃L) Learning
We work with Slices and not Patches
Slice-based DL algorithm using standard patch-based tools, leading
to a faster and simpler method, compared to existing methods
Patches extracted from natural images, and their corresponding slices. Observe how the slices are
far simpler, and contained by their corresponding patches
[Wohlberg, 2016] Ours
50
Two Comments About this Scheme
Patches Slices
Patches Slices
The Proposed Scheme can be used for Dictionary (𝐃L) Learning
We work with Slices and not Patches
Slice-based DL algorithm using standard patch-based tools, leading
to a faster and simpler method, compared to existing methods
Patches extracted from natural images, and their corresponding slices. Observe how the slices are
far simpler, and contained by their corresponding patches
[Wohlberg, 2016] Ours
Heide et al., Wohlberg
51
Going Deeper
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding
Joint work with Vardan Papyan and Michael Elad
CSC and CNN
• There is an analogy between CSC and CNN:
Convolutional structure, data driven models, ReLU is a sparsifying operator, and more…
• We propose a principled way to analyze CNN
• But first, a short review of CNN…
52
CNN
Convolutional Neural Networks
SparseLand Sparse Representation Theory
*Our analysis holds true for fully connected networks as well
The Underlying Idea
Modeling data sources enables a theoretical analysis of algorithms’ performance
CNN
53
[Figure: a two-layer CNN, with input 𝐘 of length N, m₁ filters of size n₀ in 𝐖1, and m₂ filters of size n₁ in 𝐖2]

$$\text{ReLU}(z) = \max(0, z)$$
[LeCun, Bottou, Bengio and Haffner ‘98] [Krizhevsky, Sutskever & Hinton ‘12] [Simonyan & Zisserman ‘14] [He, Zhang, Ren & Sun ‘15]
Mathematically...
54

$$f(\mathbf{Y}, \{\mathbf{W}_i\}, \{\mathbf{b}_i\}) = \text{ReLU}\big(\mathbf{b}_2 + \mathbf{W}_2^T\,\text{ReLU}(\mathbf{b}_1 + \mathbf{W}_1^T\mathbf{Y})\big)$$

where 𝐘 ∈ ℝ^N, 𝐖1^T ∈ ℝ^{Nm₁×N}, 𝐛1 ∈ ℝ^{Nm₁}, 𝐖2^T ∈ ℝ^{Nm₂×Nm₁}, 𝐛2 ∈ ℝ^{Nm₂}, and the output 𝐙2 ∈ ℝ^{Nm₂}
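In plain NumPy the two-layer forward pass above reads as follows; the random dense weights and toy sizes are assumptions used only to fix the shapes (the slide's footnote notes that the analysis also covers fully connected networks).

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward_pass(Y, W1, b1, W2, b2):
    """f(Y) = ReLU(b2 + W2^T ReLU(b1 + W1^T Y)), exactly as written above."""
    Z1 = relu(b1 + W1.T @ Y)
    Z2 = relu(b2 + W2.T @ Z1)
    return Z2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m1, m2 = 16, 3, 5                    # toy dimensions
    Y = rng.standard_normal(N)
    W1 = rng.standard_normal((N, N * m1))   # W1^T maps R^N -> R^(N*m1)
    b1 = rng.standard_normal(N * m1)
    W2 = rng.standard_normal((N * m1, N * m2))
    b2 = rng.standard_normal(N * m2)
    print(forward_pass(Y, W1, b1, W2, b2).shape)   # (N*m2,)
```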
Training Stage of CNN
55

• Consider the task of classification
• Given a set of signals {𝐘j}j and their corresponding labels {h(𝐘j)}j, the CNN learns an end-to-end mapping:

$$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\},\mathbf{U}} \ \sum_j \ell\Big(h(\mathbf{Y}_j),\, \mathbf{U},\, f\big(\mathbf{Y}_j, \{\mathbf{W}_i\}, \{\mathbf{b}_i\}\big)\Big)$$

(true label, classifier 𝐔, output of last layer)
Back to CSC
56
𝐗 ∈ ℝ^N, 𝐃1 ∈ ℝ^{N×Nm₁}, 𝚪1 ∈ ℝ^{Nm₁}, 𝐃2 ∈ ℝ^{Nm₁×Nm₂}, 𝚪2 ∈ ℝ^{Nm₂}

Convolutional sparsity (CSC) assumes an inherent structure is present in natural signals. We propose to impose the same structure on the representations themselves.

Multi-Layer CSC (ML-CSC)
Intuition: From Atoms to Molecules
57
𝐗 ∈ ℝ^N, 𝐃1 ∈ ℝ^{N×Nm₁}, 𝚪1 ∈ ℝ^{Nm₁}, 𝐃2 ∈ ℝ^{Nm₁×Nm₂}, 𝚪2 ∈ ℝ^{Nm₂}

• Columns in 𝐃1 are convolutional atoms
• Columns in 𝐃2 combine the atoms in 𝐃1, creating more complex structures: each atom of the dictionary 𝐃1𝐃2 is a superposition of the atoms of 𝐃1
• The size of the effective atoms increases throughout the layers (receptive field)
Intuition: From Atoms to Molecules
58
𝐗 ∈ ℝ^N, 𝐃1 ∈ ℝ^{N×Nm₁}, 𝚪1 ∈ ℝ^{Nm₁}, 𝐃2 ∈ ℝ^{Nm₁×Nm₂}, 𝚪2 ∈ ℝ^{Nm₂}

• One could chain the multiplication of all the dictionaries into one effective dictionary 𝐃eff = 𝐃1𝐃2𝐃3 ⋯ 𝐃K, and then 𝐗 = 𝐃eff 𝚪K as in SparseLand
• However, a key property in this model is the sparsity of each representation (the feature maps)
A Small Taste: Model Training (MNIST)
59
𝐃1𝐃2𝐃3 (28×28), 𝐃1𝐃2 (15×15), 𝐃1 (7×7)

MNIST Dictionary:
• D1: 32 filters of size 7×7, with stride of 2 (dense)
• D2: 128 filters of size 5×5×32 with stride of 1 (99.09% sparse)
• D3: 1024 filters of size 7×7×128 (99.89% sparse)
A Small Taste: Pursuit
60
Γ1
Γ2
Γ3
Γ0
Y
99.51% sparse (5 nnz)
99.52% sparse (30 nnz)
94.51 % sparse (213 nnz)
A Small Taste: Pursuit
61
Γ1
Γ2
Γ3
Γ0
Y
99.41 % sparse (6 nnz) 99.25 % sparse
(47 nnz)
92.20 % sparse (302 nnz)
A Small Taste: Model Training (CIFAR)
62
𝐃1𝐃2𝐃3 (32×32), 𝐃1𝐃2 (13×13), 𝐃1 (5×5×3)

CIFAR Dictionary:
• D1: 64 filters of size 5×5×3, stride of 2 (dense)
• D2: 256 filters of size 5×5×64, stride of 2 (82.99% sparse)
• D3: 1024 filters of size 5×5×256 (90.66% sparse)
Deep Coding Problem (𝐃𝐂𝐏)
63

Noiseless Pursuit:

$$\text{Find } \{\boldsymbol{\Gamma}_j\}_{j=1}^K \ \text{ s.t. } \quad \mathbf{X} = \mathbf{D}_1\boldsymbol{\Gamma}_1,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \boldsymbol{\Gamma}_1 = \mathbf{D}_2\boldsymbol{\Gamma}_2,\ \|\boldsymbol{\Gamma}_2\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \boldsymbol{\Gamma}_{K-1} = \mathbf{D}_K\boldsymbol{\Gamma}_K,\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$

Noisy Pursuit:

$$\text{Find } \{\boldsymbol{\Gamma}_j\}_{j=1}^K \ \text{ s.t. } \quad \|\mathbf{Y} - \mathbf{D}_1\boldsymbol{\Gamma}_1\|_2 \le \mathcal{E}_0,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \|\boldsymbol{\Gamma}_1 - \mathbf{D}_2\boldsymbol{\Gamma}_2\|_2 \le \mathcal{E}_1,\ \|\boldsymbol{\Gamma}_2\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \|\boldsymbol{\Gamma}_{K-1} - \mathbf{D}_K\boldsymbol{\Gamma}_K\|_2 \le \mathcal{E}_{K-1},\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$
Deep Learning Problem (𝐃𝐋𝐏)
64

Supervised Dictionary Learning (task-driven dictionary learning [Mairal, Bach, Sapiro & Ponce '12]):

$$\min_{\{\mathbf{D}_i\}_{i=1}^K,\,\mathbf{U}} \ \sum_j \ell\Big(h(\mathbf{Y}_j),\, \mathbf{U},\, \mathbf{DCP}^{\star}\big(\mathbf{Y}_j, \{\mathbf{D}_i\}\big)\Big)$$

(true label, classifier 𝐔, deepest representation obtained from the DCP)

Unsupervised Dictionary Learning:

$$\text{Find } \{\mathbf{D}_i\}_{i=1}^K \ \text{ s.t. for } j = 1,\ldots,J: \quad \|\mathbf{Y}_j - \mathbf{D}_1\boldsymbol{\Gamma}_1^j\|_2 \le \mathcal{E}_0,\ \|\boldsymbol{\Gamma}_1^j\|_{0,\infty}^s \le \lambda_1;\quad \|\boldsymbol{\Gamma}_1^j - \mathbf{D}_2\boldsymbol{\Gamma}_2^j\|_2 \le \mathcal{E}_1,\ \|\boldsymbol{\Gamma}_2^j\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \|\boldsymbol{\Gamma}_{K-1}^j - \mathbf{D}_K\boldsymbol{\Gamma}_K^j\|_2 \le \mathcal{E}_{K-1},\ \|\boldsymbol{\Gamma}_K^j\|_{0,\infty}^s \le \lambda_K$$
ML-CSC: The Simplest Pursuit
65

• The simplest pursuit algorithm (single-layer case) is the THR algorithm, which operates on a given input signal 𝐘 = 𝐃𝚪 + 𝐄 (with 𝚪 sparse) by:

$$\hat{\boldsymbol{\Gamma}} = \mathcal{P}_\beta(\mathbf{D}^T\mathbf{Y})$$

• Restricting the coefficients to be nonnegative does not restrict the expressiveness of the model
• ReLU = Soft Nonnegative Thresholding
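A tiny numerical check of the "ReLU = soft nonnegative thresholding" claim (toy sizes assumed): the nonnegative soft-thresholding operator 𝒫β applied to 𝐃^T𝐘 coincides with a ReLU layer whose bias is −β.

```python
import numpy as np

def soft_nonneg_threshold(v, beta):
    """P_beta(v): keep only entries above beta and shrink them by beta; zero otherwise."""
    return np.maximum(v - beta, 0.0)

def relu(z):
    return np.maximum(0.0, z)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 32, 64
    D = rng.standard_normal((n, m))
    y = rng.standard_normal(n)
    beta = 0.5
    gamma_thr = soft_nonneg_threshold(D.T @ y, beta)   # THR pursuit: P_beta(D^T y)
    gamma_relu = relu(D.T @ y - beta)                  # one ReLU layer with bias -beta
    print(np.allclose(gamma_thr, gamma_relu))          # True
```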
Consider this for Solving the DCP
66

$$\mathbf{DCP}_\lambda^{\mathcal{E}}: \ \text{Find } \{\boldsymbol{\Gamma}_j\}_{j=1}^K \ \text{ s.t. } \ \|\mathbf{Y} - \mathbf{D}_1\boldsymbol{\Gamma}_1\|_2 \le \mathcal{E}_0,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \|\boldsymbol{\Gamma}_1 - \mathbf{D}_2\boldsymbol{\Gamma}_2\|_2 \le \mathcal{E}_1,\ \|\boldsymbol{\Gamma}_2\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \|\boldsymbol{\Gamma}_{K-1} - \mathbf{D}_K\boldsymbol{\Gamma}_K\|_2 \le \mathcal{E}_{K-1},\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$

• Layered thresholding (LT): estimate 𝚪1 via the THR algorithm, then estimate 𝚪2 via the THR algorithm:
$$\hat{\boldsymbol{\Gamma}}_2 = \mathcal{P}_{\beta_2}\big(\mathbf{D}_2^T\,\mathcal{P}_{\beta_1}(\mathbf{D}_1^T\mathbf{Y})\big)$$

• Forward pass of CNN:
$$f(\mathbf{Y}) = \text{ReLU}\big(\mathbf{b}_2 + \mathbf{W}_2^T\,\text{ReLU}(\mathbf{b}_1 + \mathbf{W}_1^T\mathbf{Y})\big)$$

Correspondence: ReLU ↔ Soft Non-Negative THR; Bias 𝐛 ↔ Thresholds 𝛽; Weights 𝐖 ↔ Dictionary 𝐃; Forward pass 𝑓(⋅) ↔ Layered Soft NN THR (𝐃𝐂𝐏*)
Consider this for Solving the DCP
67

• Layered thresholding (LT): $\hat{\boldsymbol{\Gamma}}_2 = \mathcal{P}_{\beta_2}\big(\mathbf{D}_2^T\,\mathcal{P}_{\beta_1}(\mathbf{D}_1^T\mathbf{Y})\big)$
• Forward pass of CNN: $f(\mathbf{Y}) = \text{ReLU}\big(\mathbf{b}_2 + \mathbf{W}_2^T\,\text{ReLU}(\mathbf{b}_1 + \mathbf{W}_1^T\mathbf{Y})\big)$

The layered (soft nonnegative) thresholding and the forward pass algorithm are the very same thing!
Consider this for Solving the DLP
68

• DLP (supervised), in SparseLand language (the deepest representation is estimated via the layered THR algorithm, i.e. Layered Soft NN THR, 𝐃𝐂𝐏*); the thresholds for the DCP should also be learned:

$$\min_{\{\mathbf{D}_i\}_{i=1}^K,\,\mathbf{U}} \ \sum_j \ell\Big(h(\mathbf{Y}_j),\, \mathbf{U},\, \mathbf{DCP}^{\star}\big(\mathbf{Y}_j, \{\mathbf{D}_i\}\big)\Big)$$

• CNN training, in CNN language (the forward pass 𝑓(⋅)):

$$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\},\mathbf{U}} \ \sum_j \ell\Big(h(\mathbf{Y}_j),\, \mathbf{U},\, f\big(\mathbf{Y}_j, \{\mathbf{W}_i\}, \{\mathbf{b}_i\}\big)\Big)$$
Consider this for Solving the DLP
69

• DLP (supervised): $\min_{\{\mathbf{D}_i\},\,\mathbf{U}} \sum_j \ell\big(h(\mathbf{Y}_j), \mathbf{U}, \mathbf{DCP}^{\star}(\mathbf{Y}_j, \{\mathbf{D}_i\})\big)$
• CNN training: $\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\},\mathbf{U}} \sum_j \ell\big(h(\mathbf{Y}_j), \mathbf{U}, f(\mathbf{Y}_j, \{\mathbf{W}_i\}, \{\mathbf{b}_i\})\big)$

The problems solved by the training stage of CNN and by the DLP are equivalent as well, assuming that the DCP is approximated via the layered thresholding algorithm.

* Recall that for the ML-CSC, there exists an unsupervised avenue for training the dictionaries that has no simple parallel in CNN
Theoretical Questions
70

[Diagram: a signal 𝐗 is generated by the ML-CSC model M (𝐗 = 𝐃1𝚪1, 𝚪1 = 𝐃2𝚪2, …, 𝚪K−1 = 𝐃K𝚪K, each 𝚪i is ℓ0,∞-sparse), contaminated to give 𝐘, and the representations {𝚪̂i}i=1..K are estimated by solving 𝐃𝐂𝐏λℰ via Layered Thresholding (Forward Pass) or other algorithms]
Theoretical Path: Possible Questions
• Having established the importance of the ML-CSC model and its associated pursuit, the DCP problem, we now turn to its analysis
• The main questions we aim to address:
I. Uniqueness of the solution (set of representations) to the 𝐃𝐂𝐏λ?
II. Stability of the solution to the 𝐃𝐂𝐏λℰ problem?
III. Stability of the solution obtained via the hard and soft layered THR algorithms (forward pass)?
IV. Limitations of this (very simple) algorithm, and alternative pursuits?
V. Algorithms for training the dictionaries {𝐃i}i=1..K, vs. CNN?
VI. New insights on how to operate on signals via CNN?
71
Uniqueness of 𝐃𝐂𝐏λ
72

$$\mathbf{DCP}_\lambda: \ \text{Find a set of representations satisfying} \ \ \mathbf{X} = \mathbf{D}_1\boldsymbol{\Gamma}_1,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \boldsymbol{\Gamma}_1 = \mathbf{D}_2\boldsymbol{\Gamma}_2,\ \|\boldsymbol{\Gamma}_2\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \boldsymbol{\Gamma}_{K-1} = \mathbf{D}_K\boldsymbol{\Gamma}_K,\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$

Is this set unique? Recall the Mutual Coherence [Donoho & Elad '03]: $\mu(\mathbf{D}) = \max_{i \ne j} |\mathbf{d}_i^T\mathbf{d}_j|$

Theorem: If a set of solutions $\{\boldsymbol{\Gamma}_i\}_{i=1}^K$ is found for $\mathbf{DCP}_\lambda$ such that
$$\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s = \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D}_i)}\right)$$
then these are necessarily the unique solution to this problem.

The feature maps CNN aims to recover are unique.
Stability of 𝐃𝐂𝐏λℰ
73

$$\mathbf{DCP}_\lambda^{\mathcal{E}}: \ \text{Find a set of representations satisfying} \ \ \|\mathbf{Y} - \mathbf{D}_1\boldsymbol{\Gamma}_1\|_2 \le \mathcal{E}_0,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \|\boldsymbol{\Gamma}_1 - \mathbf{D}_2\boldsymbol{\Gamma}_2\|_2 \le \mathcal{E}_1,\ \|\boldsymbol{\Gamma}_2\|_{0,\infty}^s \le \lambda_2;\quad \ldots;\quad \|\boldsymbol{\Gamma}_{K-1} - \mathbf{D}_K\boldsymbol{\Gamma}_K\|_2 \le \mathcal{E}_{K-1},\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$

• Suppose that we manage to solve the 𝐃𝐂𝐏λℰ and find a feasible set of representations {𝚪̂i}i=1..K satisfying all the conditions
• Is this set stable? The question we pose is: how close is 𝚪̂i to 𝚪i?
Stability of 𝐃𝐂𝐏λℰ
74

Theorem: If the true representations $\{\boldsymbol{\Gamma}_i\}_{i=1}^K$ satisfy
$$\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s \le \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D}_i)}\right)$$
then the set of solutions $\{\hat{\boldsymbol{\Gamma}}_i\}_{i=1}^K$ obtained by solving this problem (somehow) with $\mathcal{E}_0 = \|\mathbf{E}\|_2^2$ and $\mathcal{E}_i = 0$ ($i \ge 1$) must obey
$$\|\hat{\boldsymbol{\Gamma}}_i - \boldsymbol{\Gamma}_i\|_2^2 \le 4\|\mathbf{E}\|_2^2 \prod_{j=1}^{i} \frac{1}{1 - (2\lambda_j - 1)\,\mu(\mathbf{D}_j)}$$

Observe this annoying effect of error magnification as we dive into the model.

The problem CNN aims to solve is stable under certain conditions.
Local Noise Assumption
• Our analysis relied on the local sparsity of the underlying solution 𝚪, which was enforced through the ℓ0,∞ norm
• In what follows, we present stability guarantees that will also depend on the local energy in the noise vector 𝐄
• This will be enforced via the ℓ2,∞ norm, defined as:
75
$$\|\mathbf{E}\|_{2,\infty}^p = \max_i \|\mathbf{R}_i\mathbf{E}\|_2$$
Stability of Layered-THR
76

Theorem: If
$$\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D}_i)}\cdot\frac{|\Gamma_i^{\min}|}{|\Gamma_i^{\max}|}\right) - \frac{1}{\mu(\mathbf{D}_i)}\cdot\frac{\varepsilon_L^{i-1}}{|\Gamma_i^{\max}|}$$
then the layered hard THR (with the proper thresholds) will find the correct supports and
$$\|\hat{\boldsymbol{\Gamma}}_i^{LT} - \boldsymbol{\Gamma}_i\|_{2,\infty}^p \le \varepsilon_L^i$$
where we have defined $\varepsilon_L^0 = \|\mathbf{E}\|_{2,\infty}^p$ and
$$\varepsilon_L^i = \|\boldsymbol{\Gamma}_i\|_{0,\infty}^p \cdot \Big(\varepsilon_L^{i-1} + \mu(\mathbf{D}_i)\big(\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s - 1\big)\,|\Gamma_i^{\max}|\Big)$$

* Least-Squares update of the non-zeros?

The stability of the forward pass is guaranteed if the underlying representations are locally sparse and the noise is locally bounded.
Limitations of the Forward Pass
• The stability analysis reveals several inherent limitations of the forward pass (a.k.a. layered THR) algorithm:
• Even in the noiseless case, the forward pass is incapable of recovering the perfect solution of the DCP problem
• Its success depends on the ratio |Γi^min| / |Γi^max|; this is a direct consequence of relying on a simple thresholding operator
• The distance between the true sparse vector and the estimated one increases exponentially as a function of the layer depth
• Next, we propose a new algorithm that attempts to solve some of these problems
77
Special Case – Sparse Dictionaries
• Throughout the theoretical study we assumed that the representations in the different layers are ℓ0,∞-sparse
• Do we know of a simple example of a set of dictionaries {𝐃i}i=1..K and their corresponding signals 𝐗 that will obey this property?
• Assuming the dictionaries are sparse:

$$\|\boldsymbol{\Gamma}_j\|_{0,\infty}^s \le \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \prod_{i=j+1}^{K} \|\mathbf{D}_i\|_0$$

where ‖𝐃i‖0 is the maximal number of non-zeros in an atom of 𝐃i
• In the context of CNN, the above happens if a sparsity-promoting regularization, such as the ℓ1, is employed on the filters

78
Better Pursuit?
79

$$\mathbf{DCP}_\lambda: \ \text{Find a set of representations satisfying} \ \ \mathbf{X} = \mathbf{D}_1\boldsymbol{\Gamma}_1,\ \|\boldsymbol{\Gamma}_1\|_{0,\infty}^s \le \lambda_1;\quad \ldots;\quad \boldsymbol{\Gamma}_{K-1} = \mathbf{D}_K\boldsymbol{\Gamma}_K,\ \|\boldsymbol{\Gamma}_K\|_{0,\infty}^s \le \lambda_K$$

• So far we proposed the Layered THR:
$$\hat{\boldsymbol{\Gamma}}_K = \mathcal{P}_{\beta_K}\big(\mathbf{D}_K^T \cdots \mathcal{P}_{\beta_2}(\mathbf{D}_2^T\,\mathcal{P}_{\beta_1}(\mathbf{D}_1^T\mathbf{X}))\big)$$
• The motivation is clear: getting close to what CNN use
• However, this is the simplest and weakest pursuit known in the field of sparsity. Can we offer something better?
Layered Basis Pursuit (Noiseless)
80

• We can propose a Layered Basis Pursuit algorithm (related to deconvolutional networks [Zeiler, Krishnan, Taylor & Fergus '10]):

$$\boldsymbol{\Gamma}_1^{LBP} = \arg\min_{\boldsymbol{\Gamma}_1} \|\boldsymbol{\Gamma}_1\|_1 \ \text{ s.t. } \ \mathbf{X} = \mathbf{D}_1\boldsymbol{\Gamma}_1$$

$$\boldsymbol{\Gamma}_2^{LBP} = \arg\min_{\boldsymbol{\Gamma}_2} \|\boldsymbol{\Gamma}_2\|_1 \ \text{ s.t. } \ \boldsymbol{\Gamma}_1^{LBP} = \mathbf{D}_2\boldsymbol{\Gamma}_2$$
Guarantee for Success of Layered BP
81

Theorem: If a set of representations $\{\boldsymbol{\Gamma}_i\}_{i=1}^K$ of the Multi-Layer CSC model satisfy
$$\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s \le \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D}_i)}\right)$$
then the Layered BP is guaranteed to find them.

• As opposed to prior work on CNN, we can do far more than just propose an algorithm; we can analyze its conditions for success
• Consequences:
  - The Layered BP can retrieve the underlying representations in the noiseless case, a task in which the forward pass fails
  - The Layered BP's success does not depend on the ratio |Γi^min| / |Γi^max|
Layered Basis Pursuit (Noisy)
82

Similarly to the noiseless case, this can also be solved via the Layered Basis Pursuit algorithm, but in a Lagrangian form:

$$\boldsymbol{\Gamma}_1^{LBP} = \arg\min_{\boldsymbol{\Gamma}_1} \tfrac{1}{2}\|\mathbf{Y} - \mathbf{D}_1\boldsymbol{\Gamma}_1\|_2^2 + \xi_1\|\boldsymbol{\Gamma}_1\|_1$$

$$\boldsymbol{\Gamma}_2^{LBP} = \arg\min_{\boldsymbol{\Gamma}_2} \tfrac{1}{2}\|\boldsymbol{\Gamma}_1^{LBP} - \mathbf{D}_2\boldsymbol{\Gamma}_2\|_2^2 + \xi_2\|\boldsymbol{\Gamma}_2\|_1$$
Stability of Layered BP
83

Theorem: Assuming that
$$\|\boldsymbol{\Gamma}_i\|_{0,\infty}^s \le \frac{1}{3}\left(1 + \frac{1}{\mu(\mathbf{D}_i)}\right)$$
then for correctly chosen $\{\lambda_i\}_{i=1}^K$ we are guaranteed that:
1. The support of $\hat{\boldsymbol{\Gamma}}_i^{LBP}$ is contained in that of $\boldsymbol{\Gamma}_i$
2. The error is bounded: $\|\hat{\boldsymbol{\Gamma}}_i^{LBP} - \boldsymbol{\Gamma}_i\|_{2,\infty}^p \le \varepsilon_L^i$, where $\varepsilon_L^i = 7.5^i\,\|\mathbf{E}\|_{2,\infty}^p \prod_{j=1}^{i} \|\boldsymbol{\Gamma}_j\|_{0,\infty}^p$
3. Every entry in $\boldsymbol{\Gamma}_i$ greater than $\varepsilon_L^i / \|\boldsymbol{\Gamma}_i\|_{0,\infty}^p$ will be found
Layered Iterative Thresholding
84

Layered BP: $\boldsymbol{\Gamma}_j^{LBP} = \arg\min_{\boldsymbol{\Gamma}_j} \tfrac{1}{2}\|\hat{\boldsymbol{\Gamma}}_{j-1}^{LBP} - \mathbf{D}_j\boldsymbol{\Gamma}_j\|_2^2 + \xi_j\|\boldsymbol{\Gamma}_j\|_1$

Layered Iterative Soft-Thresholding:
$$\boldsymbol{\Gamma}_j^{t} = \mathcal{S}_{\xi_j/c_j}\left(\boldsymbol{\Gamma}_j^{t-1} + \frac{1}{c_j}\mathbf{D}_j^T\big(\hat{\boldsymbol{\Gamma}}_{j-1} - \mathbf{D}_j\boldsymbol{\Gamma}_j^{t-1}\big)\right)$$
* with $c_i > 0.5\,\lambda_{\max}(\mathbf{D}_i^T\mathbf{D}_i)$

• Note that our suggestion implies that groups of layers share the same dictionaries
• Can be seen as a recurrent neural network [Gregor & LeCun '10]
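A minimal sketch of the layered iterative soft-thresholding idea (dense toy dictionaries, a fixed number of inner iterations, and the step constant set from the spectral norm are all assumptions): each layer runs ISTA on its Lagrangian basis-pursuit problem, feeding its estimate to the next layer, starting from 𝚪̂0 = 𝐘.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def layered_ista(Y, dictionaries, xis, n_iters=100):
    """Layered Iterative Soft-Thresholding:
    Gamma_j^t = S_{xi_j/c_j}(Gamma_j^{t-1} + (1/c_j) D_j^T (Gamma_{j-1} - D_j Gamma_j^{t-1}))."""
    estimates = []
    prev = Y
    for D, xi in zip(dictionaries, xis):
        c = np.linalg.norm(D, 2) ** 2          # any c_j >= ||D_j||_2^2 makes ISTA converge
        Gamma = np.zeros(D.shape[1])
        for _ in range(n_iters):
            Gamma = soft(Gamma + D.T @ (prev - D @ Gamma) / c, xi / c)
        estimates.append(Gamma)
        prev = Gamma                           # feed the estimate to the next layer
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D1 = rng.standard_normal((64, 128)); D1 /= np.linalg.norm(D1, axis=0)
    D2 = rng.standard_normal((128, 256)); D2 /= np.linalg.norm(D2, axis=0)
    Gamma2 = np.zeros(256); Gamma2[rng.choice(256, 4, replace=False)] = 1.0
    Y = D1 @ (D2 @ Gamma2) + 0.01 * rng.standard_normal(64)
    G1, G2 = layered_ista(Y, [D1, D2], xis=[0.05, 0.05])
    print(int(np.count_nonzero(G1)), int(np.count_nonzero(G2)))
```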
This Talk
85
Independent patch-processing
Local Sparsity
We described the limitations of patch-based processing and ways to overcome some of these
This Talk
86
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
We presented a theoretical study of the CSC and a practical algorithm that works locally
This Talk
87
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
Convolutional Neural
Networks
We mentioned several interesting connections between CSC and CNN and this led us to …
This Talk
88
Multi-Layer Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
Convolutional Neural
Networks
… propose and analyze a multi-layer extension of CSC, shown to be tightly connected to CNN
This Talk
89
Extension of the classical sparse theory to a multi-layer setting
Multi-Layer Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
Convolutional Neural
Networks
A novel interpretation and theoretical understanding of CNN
The ML-CSC was shown to enable a theoretical study of CNN, along with new insights
This Talk
90
Extension of the classical sparse theory to a multi-layer setting
Multi-Layer Convolutional Sparse Coding
Independent patch-processing
Local Sparsity
Convolutional Sparse Coding
Convolutional Neural
Networks
A novel interpretation and theoretical understanding of CNN
The underlying idea: Modeling the data source in order to be able to theoretically analyze algorithms' performance
Questions?
91