Convolution Neural Nets meet PDE's
Transcript of Convolution Neural Nets meet PDE's
Eldad Haber, Lars Ruthotto
SIAM CS&E 2017
- Convolution Neural Networks (CNN)
- Meet PDEs
- Optimization
- Multiscale
- Example
- Future work
CNN - A quick overview

- Neural networks with a particular architecture
- Have existed for a long time (since the '90s; LeCun, Hinton)
- Can perform very well given large amounts of data

Applications:

- Image classification
- Face recognition
- Segmentation
- Driverless cars
- ...

A few recent headlines:

- "Apple Is Bringing the AI Revolution to Your iPhone", WIRED 2016
- "Why Deep Learning Is Suddenly Changing Your Life", FORTUNE 2016
CNN - A quick overview

ResNet architecture

- Propagation: for j = 1, ..., N

  Y_{j+1} = Y_j + h σ(Y_j K_j + b_j)

- Classification: c ≈ W Y_N
CNN - A quick overview

Y_{j+1} = Y_j + h σ(Y_j K_j + b_j),   c ≈ W Y_N

- Y_1 - data, c - class
- K_j - convolution kernel
- b_j - bias
- W - classifier

Forward: given an image Y_1, find its class c.

Inverse (training): given data and classes {Y_1, c}, obtain K_j, b_j, and W that approximately classify the given data.
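As a concrete illustration, the propagation and classification steps can be sketched in NumPy. This is a minimal sketch; the dense stand-ins for the convolution kernels, the tanh activation, and all shapes are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def resnet_forward(Y1, Ks, bs, h=0.1):
    """ResNet propagation: Y_{j+1} = Y_j + h * sigma(Y_j K_j + b_j).

    Y1 : (s, n) array of input features, one example per row.
    Ks : list of (n, n) arrays -- dense stand-ins for the convolution
         kernels K_j (an illustrative assumption, not a real CNN layout).
    bs : list of scalar biases b_j.
    """
    Y = Y1
    for K, b in zip(Ks, bs):
        Y = Y + h * np.tanh(Y @ K + b)  # sigma = tanh (a common choice)
    return Y

def classify(YN, W):
    """Linear classification, c ~ W Y_N in the slide's notation; with
    examples stored as rows here, it is written Y_N W (a convention choice)."""
    return YN @ W
```

Note that with h = 0 the propagation is the identity map, which is what makes the residual form stable to stack deeply.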
CNN - The optimization problem

Define a similarity measure S:

min_{K_j, b_j, W}  S(c, W Y_N)
s.t.  Y_{j+1} = Y_j + h σ(Y_j K_j + b_j)

- How to solve the problem efficiently?
- How to change scales?
- How to add robustness?
CNN as ODE

Y_{j+1} = Y_j + h σ(Y_j K_j + b_j)   ↔   Ẏ = σ(Y K(t) + b(t))

Path planning: find different paths for different classes.
Convolution and PDEs

1D convolution:

K y = [K_1, K_2, K_3] ∗ [y_1, ..., y_n]

Change the basis:

K = s_1 [0, 1, 0] + (s_2 / 2h) [−1, 0, 1] + (s_3 / h²) [1, −2, 1]

y - a discretization (grid function) of y(x). In the limit h → 0:

K y ≈ s_1 y + s_2 dy/dx + s_3 d²y/dx²
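This limit can be checked numerically. The sketch below (my own example, with an arbitrary smooth test function) applies the three-point stencil built from (s_1, s_2, s_3) and compares it against s_1 y + s_2 y' + s_3 y'' evaluated analytically:

```python
import numpy as np

h = 0.01
x = np.arange(0.0, 1.0 + h, h)
y = np.sin(2 * np.pi * x)  # arbitrary smooth test function (my choice)

s1, s2, s3 = 1.0, 2.0, 3.0
# kernel = s1*[0,1,0] + (s2/2h)*[-1,0,1] + (s3/h^2)*[1,-2,1]
kernel = (s1 * np.array([0.0, 1.0, 0.0])
          + s2 / (2 * h) * np.array([-1.0, 0.0, 1.0])
          + s3 / h**2 * np.array([1.0, -2.0, 1.0]))

# apply the stencil at interior points: (K y)_i = k0 y_{i-1} + k1 y_i + k2 y_{i+1}
Ky = kernel[0] * y[:-2] + kernel[1] * y[1:-1] + kernel[2] * y[2:]

# exact value of s1*y + s2*y' + s3*y'' at the interior points
xi = x[1:-1]
exact = (s1 * np.sin(2 * np.pi * xi)
         + s2 * 2 * np.pi * np.cos(2 * np.pi * xi)
         - s3 * (2 * np.pi) ** 2 * np.sin(2 * np.pi * xi))
err = np.max(np.abs(Ky - exact))
```

Both stencils are second-order accurate, so the error is O(h²) and shrinks by roughly a factor of four when h is halved.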
CNN - continuous formulation

min_{s_j(t), b(t), W}  S(c, W Y_N)
s.t.  Ẏ = σ( Σ_j s_j(t) D_j Y + b(t) )

- D_j - differential operators
- The continuous formulation is independent of resolution
- A guide to move between scales and add layers
CNN - optimization

- Common: stochastic gradient descent and its variants
- Recent work on stochastic Newton (Nocedal's talk on Friday)
- We use:
  - stochastic Gauss-Newton with a very large sample size (SAA)
  - variable projection for the variables in which the problem is convex
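The variable projection idea can be sketched as follows: for a least-squares similarity measure (an assumption here; the deck does not specify S), the classifier W appears linearly, so it can be eliminated in closed form at each outer iteration over the kernels. Names and shapes below are hypothetical:

```python
import numpy as np

def project_out_W(YN, C, reg=1e-4):
    """Variable projection step: for fixed propagated features Y_N, the
    optimal linear classifier solves the regularized least-squares problem
        min_W ||Y_N W - C||_F^2 + reg ||W||_F^2,
    so W never needs to be an unknown of the outer optimization.

    YN : (s, n) features, one example per row (hypothetical layout).
    C  : (s, m) target class indicators.
    """
    n = YN.shape[1]
    # normal equations; reg keeps the system well posed
    return np.linalg.solve(YN.T @ YN + reg * np.eye(n), YN.T @ C)
```

The outer solver then only sees the kernels and biases, with W re-solved cheaply whenever the features change.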
Computational bottleneck

Computing Y K, where the rows of Y are the training images:

Y = [ y_1^T ; y_2^T ; ... ; y_s^T ]

For large images, or in 3D, this is computationally expensive.

Idea: train on a coarse mesh.
Multiscale learning

Restrict the images n times
Initialize the convolution kernels and classifiers
for k = n : −1 : 1 do
    Solve the optimization problem on mesh k from its initial point
    Prolong the convolution kernels to level k − 1
    Update the classifier weights
end for
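In pseudo-Python the loop above might look like this; `restrict`, `solve_level`, `prolong`, and `update_classifier` are hypothetical placeholders for the steps on this slide:

```python
def multiscale_train(images, n_levels, restrict, solve_level, prolong,
                     update_classifier):
    """Multiscale learning loop (a sketch; the helper callables are
    hypothetical stand-ins for the slide's steps).

    1. Restrict the images n_levels - 1 times to build a mesh hierarchy.
    2. Coarsest-to-finest: solve the training problem on each mesh, then
       prolong the kernels and update the classifier for the next level.
    """
    # image hierarchy: level 0 = finest, level n_levels - 1 = coarsest
    hierarchy = [images]
    for _ in range(n_levels - 1):
        hierarchy.append(restrict(hierarchy[-1]))

    kernels, classifier = None, None
    for k in reversed(range(n_levels)):  # coarse -> fine
        # previous-level kernels/classifier serve as the initial point
        kernels, classifier = solve_level(hierarchy[k], kernels, classifier)
        if k > 0:
            kernels = prolong(kernels)          # move kernels to level k - 1
            classifier = update_classifier(classifier)
    return kernels, classifier
```

Each coarse solve is cheap and supplies a good initial point for the next finer (more expensive) level.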
How to prolong the kernels?
Moving Kernels between scales

Use multigrid methodology.

Approach 1: rediscretization

The kernel represents a differential operator. Find the operator and rediscretize.

Example in 1D:

- Convolution kernel [−1 −2 1], h = 1
- Operator: −2 + 2 d/dx − 0.5 d²/dx²
- On a mesh of size h = 2 the kernel is [−1 1 −0.25]
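The rediscretization step can be sketched as: solve the 3 × 3 change-of-basis system from the convolution-and-PDEs slide for (s_1, s_2, s_3), then re-emit the stencil at the new mesh width. The example values in the test are my own, not the slide's:

```python
import numpy as np

def basis(h):
    """Columns: identity, central first-derivative, second-derivative stencils."""
    return np.column_stack([
        np.array([0.0, 1.0, 0.0]),             # s1 * y
        np.array([-1.0, 0.0, 1.0]) / (2 * h),  # s2 * dy/dx
        np.array([1.0, -2.0, 1.0]) / h**2,     # s3 * d2y/dx2
    ])

def rediscretize(kernel, h, H):
    """Interpret a 3-point `kernel` on mesh width h as the operator
    s1 + s2 d/dx + s3 d2/dx2, then rediscretize it on mesh width H.
    Returns the new kernel and the recovered coefficients s."""
    s = np.linalg.solve(basis(h), np.asarray(kernel, dtype=float))
    return basis(H) @ s, s
```

Because `basis(h)` is square and invertible, the coefficients (s_1, s_2, s_3) are uniquely determined by the fine-mesh kernel.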
Moving Kernels between scales

Use multigrid methodology.

Approach 2: Galerkin

- In multigrid we are usually given the fine-mesh operator K_h
- The coarse-mesh operator is defined as

  K_H = R K_h P

  R - restriction, P - prolongation
Moving Kernels between scales - Galerkin

K_H = R K_h P

- This maps fine → coarse, but not coarse → fine
- Coarse → fine is, in general, non-unique
- Assuming the same sparsity pattern on the fine grid leads to a unique solution
- Can be computed by solving a small linear system (the size of the stencil)
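A sketch of the Galerkin computation on a periodic 1D mesh, using standard multigrid choices (linear-interpolation prolongation and full-weighting restriction). These choices of R and P are my assumptions; the slides do not fix them:

```python
import numpy as np

def circulant_from_stencil(stencil, n):
    """n x n periodic (circulant) matrix realizing a 3-point stencil."""
    C = np.zeros((n, n))
    for i in range(n):
        C[i, (i - 1) % n] = stencil[0]
        C[i, i] = stencil[1]
        C[i, (i + 1) % n] = stencil[2]
    return C

def galerkin_coarse_stencil(stencil, n=16):
    """K_H = R K_h P with linear-interpolation P and full-weighting
    R = P^T / 2 on a periodic 1D mesh; returns the coarse 3-point stencil."""
    nc = n // 2
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] = 1.0               # coincident fine point
        P[(2 * j + 1) % n, j] = 0.5     # in-between points: average
        P[(2 * j - 1) % n, j] = 0.5
    R = P.T / 2.0                        # full-weighting restriction
    KH = R @ circulant_from_stencil(stencil, n) @ P
    i = nc // 2                          # read the stencil off an interior row
    return np.array([KH[i, i - 1], KH[i, i], KH[i, i + 1]])
```

For the 1D Laplacian stencil [1, −2, 1] this reproduces the classical multigrid result: the Galerkin coarse operator is again a 3-point stencil, scaled by 1/4, consistent with the doubled mesh width.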
2D example

Use the MNIST data set of handwritten digits.

[Figure: a fine convolution and a coarse convolution related by prolongation and restriction]

K_h =
  −0.89  −2.03   4.30
  −2.07   0.00  −2.07
   4.39  −2.03   1.28

K_H =
  −0.48  −0.17   0.82
  −0.15  −0.80   0.37
   0.84   0.40   0.07
Training MNIST

MNIST is a data set of labeled images of handwritten digits.

Goal - automatic classification

Small images (28 × 28); a toy data set.
Training MNIST - Experiment 1

Train on the fine mesh - classify on the coarse mesh.

In some cases one trains using large computational resources, but in the "field", with poor resources, classification must run on a coarse mesh.

82% success using fine-mesh kernels on the coarse mesh.
Training MNIST - Experiment 1: confusion matrix

[Figure: example digits correctly classified as "83089517", and example digits incorrectly classified as "69599339"]
Experiment 2: multiscale

[Figure: objective function vs. iterations for the coarse mesh and the fine mesh - multilevel convergence]
Experiment 2: multiscale

[Figure: classification accuracy [%] vs. iterations for fine-level and multi-level learning]
Summary

- Proposed a new interpretation of CNNs as a PDE-constrained optimization problem
- Can move between scales
- Optimization algorithms based on Newton's method
- Preliminary results on MNIST

To come:

- More about optimization
- Depth of network and time
- Adaptive mesh refinement
An ad

Call for US students and postdocs from Iran, Iraq, Syria, Yemen, Somalia, Libya, and Sudan.

If you are a student or a postdoc who is

1. Admitted to or studying at a major US graduate school,
2. Working in the field of scientific computing / computational science and engineering, with an interest in numerical optimization, partial differential equations, or machine learning, and
3. Would like to pursue a PhD degree at the University of British Columbia,

I would love to hear from you.