LIMITING THE DEEP LEARNING - Stanford University, yplu/limitingDeepLearning.pdf
LIMITING THE DEEP LEARNING
YIPING LU, PEKING UNIVERSITY
DEEP LEARNING IS SUCCESSFUL, BUT…
GPU
PHD
Professor
PHD
LARGER THE BETTER?
Test error decreasing
Why not take the limit $p/n = +\infty$?
TWO LIMITING
Depth Limiting
Width Limit
DEPTH LIMITING: ODE
[He et al. 2015]
[E. 2017] [Haber et al. 2017] [Lu et al. 2017]
[Chen et al. 2018]
BEYOND FINITE LAYER NEURAL NETWORK
Going into infinite layer
Differential Equation
ALSO THEORETICALLY GUARANTEED
TRADITIONAL WISDOM IN DEEP LEARNING
?
BECOMING HOT NOW…
NeurIPS2018 Best Paper Award
BRIDGING CONTROL AND LEARNING
Feedback is the learning loss
Neural Network
Optimization
A better model? LM-ResNet (ICML 2018), DURR (ICLR 2019), Macaroon (submitted)
New Algorithm? YOPO (submitted)
Interpretability? PDE-Net (ICML 2018)
An W, Wang H, Sun Q, et al. A PID controller approach for stochastic optimization of deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8522-8531.
Chen T Q, Rubanova Y, Bettencourt J, et al. Neural ordinary differential equations[C]//Advances in Neural Information Processing Systems. 2018: 6571-6583.
Li Q, Hao S. An optimal control approach to deep learning and applications to discrete-weight neural networks[J]. arXiv preprint arXiv:1803.01299, 2018.
Chang B, Meng L, Haber E, et al. Multi-level residual networks from dynamical systems view[J]. arXiv preprint arXiv:1710.10348, 2017.
Chang B, Meng L, Haber E, et al. Reversible architectures for arbitrarily deep residual neural networks[C]//Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
Tao Y, Sun Q, Du Q, et al. Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling[C]//Advances in Neural Information Processing Systems. 2018: 496-506.
HOW DIFFERENTIAL EQUATION VIEW HELPS NEURAL NETWORK DESIGNING
PRINCIPLED NEURAL ARCHITECTURE DESIGN
NEURAL NETWORK AS SOLVING ODES
Dynamic System / Neural Network
Continuous limit / Numerical Approximation
WRN, ResNeXt, Inception-ResNet, PolyNet, SENet, etc.: new schemes to approximate the right-hand-side term
Why not change the way $u_t$ is discretized?
MULTISTEP ARCHITECTURE?
$x_{n+1} = x_n + f(x_n)$
$x_t = f(x)$
Linear Multi-step Scheme: $x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n)$
Linear Multi-step Residual Network
Only One More Parameter
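The two update rules on this slide can be compared side by side in a minimal NumPy sketch. The `tanh` residual branch, the function names, and the choice of a plain Euler step for the very first layer are assumptions made for illustration; they are not taken from the LM-ResNet implementation.

```python
import numpy as np

def residual_branch(x, w):
    """Toy stand-in for the residual branch f(x) (hypothetical, not a conv block)."""
    return np.tanh(w @ x)

def resnet_forward(x0, weights):
    """Forward Euler view: x_{n+1} = x_n + f(x_n)."""
    x = x0
    for w in weights:
        x = x + residual_branch(x, w)
    return x

def lm_resnet_forward(x0, weights, k):
    """Linear multi-step view: x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n).

    k[n] is the one extra scalar per layer; k[0] is unused because the first
    layer has no x_{n-1} and is taken as a plain Euler step (an assumption).
    """
    x_prev = x0
    x = x0 + residual_branch(x0, weights[0])
    for w, k_n in zip(weights[1:], k[1:]):
        x_next = (1.0 - k_n) * x + k_n * x_prev + residual_branch(x, w)
        x_prev, x = x, x_next
    return x

rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((4, 4)) for _ in range(5)]
x0 = rng.standard_normal(4)
out_resnet = resnet_forward(x0, weights)
out_lm_zero = lm_resnet_forward(x0, weights, np.zeros(5))
out_lm_half = lm_resnet_forward(x0, weights, 0.5 * np.ones(5))
```

Setting every `k_n` to zero recovers the ordinary residual network, which is one way to read "only one more parameter": the multi-step network contains ResNet as a special case.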
EXPERIMENT
PLOT THE MOMENTUM
Analysis by zero stability
MODIFIED EQUATION VIEW
$(1 + k_n)\dot{u}_n + (1 - k_n)\frac{\Delta t}{2}\ddot{u}_n + o(\Delta t^3) = f(u)$
Learn A Momentum
$x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + \Delta t\, f(x_n)$
[Su et al. 2016] [Dong et al. 2017]
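A sketch of where the modified equation comes from, assuming $x_n = u(t_n)$ for a smooth function $u$ sampled at $t_{n \pm 1} = t_n \pm \Delta t$ (this identification is an assumption made for the derivation). Taylor-expanding both neighbours and substituting into the multi-step scheme gives:

```latex
\begin{aligned}
u(t_n + \Delta t) &= u_n + \Delta t\,\dot{u}_n + \tfrac{\Delta t^2}{2}\,\ddot{u}_n + O(\Delta t^3),\\
u(t_n - \Delta t) &= u_n - \Delta t\,\dot{u}_n + \tfrac{\Delta t^2}{2}\,\ddot{u}_n + O(\Delta t^3),\\
u(t_n + \Delta t) - (1 - k_n)\,u_n - k_n\,u(t_n - \Delta t)
  &= (1 + k_n)\,\Delta t\,\dot{u}_n + (1 - k_n)\,\tfrac{\Delta t^2}{2}\,\ddot{u}_n + O(\Delta t^3)
   = \Delta t\, f(u_n).
\end{aligned}
```

Dividing by $\Delta t$ yields the first- and second-derivative terms of the modified equation. The $\ddot{u}_n$ term, with coefficient $(1 - k_n)\frac{\Delta t}{2}$, is the one that behaves like a momentum term, so learning $k_n$ can be read as learning how much momentum each layer uses.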
UNDERSTANDING DROPOUT
Can noise avoid overfitting?
$\dot{X}(t) = f(X(t), a(t)) + g(X(t), t)\,dB_t, \quad X(0) = X_0$
The numerical scheme only needs to converge weakly!
$E_{data}(loss(X(T)))$
STOCHASTIC DEPTH AS AN EXAMPLE
$x_{n+1} = x_n + \eta_n f(x_n)$
$\quad = x_n + E[\eta_n]\, f(x_n) + (\eta_n - E[\eta_n])\, f(x_n)$
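The decomposition above splits a stochastic-depth layer into a deterministic drift and a zero-mean noise term. The sketch below checks this numerically, assuming $\eta_n$ is a Bernoulli gate with survival probability `p_survive` and using a toy branch `f`; both choices are illustrative assumptions.

```python
import numpy as np

def f(x):
    """Toy residual branch (hypothetical); any function of x would do."""
    return -x

def stochastic_depth_step(x, p_survive, rng):
    """One layer x_{n+1} = x_n + eta_n f(x_n) with eta_n ~ Bernoulli(p_survive).

    Returns the new state, the sampled gate, and the drift/noise split
    E[eta] f(x) and (eta - E[eta]) f(x).
    """
    eta = float(rng.random() < p_survive)
    drift = p_survive * f(x)
    noise = (eta - p_survive) * f(x)
    return x + drift + noise, eta, drift, noise

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
samples = [stochastic_depth_step(x, 0.8, rng) for _ in range(20000)]
mean_noise = np.mean([s[3] for s in samples], axis=0)
```

Averaged over many draws, `mean_noise` is close to zero, so in expectation the layer follows the drift alone, which is the sense in which only weak convergence of the scheme is needed.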
Natural Language
UNDERSTANDING NATURAL LANGUAGE PROCESSING
Use a neural network to extract the features of a document!
Emb Emb Emb Emb Emb
Deep Neural Network
y1 y2 y3 y4 y5
Idea: Consider every word in a document as a particle in an n-body system.
TRANSFORMER AS A SPLITTING SCHEME
FFN: advection
Self-Attention: particle interaction
Label
[Vaswani et al. 2017]
[Devlin et al. 2018]
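Read as a splitting scheme, one Transformer layer takes a sub-step for the interaction term (self-attention, where every token sees every other token) followed by a sub-step for the advection term (the FFN, applied to each token on its own). The NumPy sketch below is a simplified single-head layer written under that reading; layer normalization, multiple heads, and all parameter names are omitted or hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Particle interaction: each token (row of x) is updated from all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def ffn(x, w1, w2):
    """Advection: the same map applied to every token independently."""
    return np.maximum(x @ w1, 0.0) @ w2

def transformer_layer(x, params):
    """Two sequential residual sub-steps, one per term of the split dynamics."""
    x = x + self_attention(x, params["wq"], params["wk"], params["wv"])
    x = x + ffn(x, params["w1"], params["w2"])
    return x

rng = np.random.default_rng(0)
d, hidden, n_tokens = 8, 16, 5
params = {
    "wq": rng.standard_normal((d, d)),
    "wk": rng.standard_normal((d, d)),
    "wv": rng.standard_normal((d, d)),
    "w1": rng.standard_normal((d, hidden)),
    "w2": rng.standard_normal((hidden, d)),
}
x = rng.standard_normal((n_tokens, d))
y = transformer_layer(x, params)
```

Changing one token changes the attention output for all tokens but leaves the FFN output of the other tokens untouched, which is the distinction the particle-system analogy relies on.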
A BETTER SPLITTING SCHEME
RESULT
Computer Vision
ONE NOISE LEVEL ONE NET
Network 1: $\sigma = 25$
Network 2: $\sigma = 35$
WE WANT
One Model
WE ALSO WANT GENERALIZATION
BM3D DnCNN
RETHINKING TRADITIONAL FILTERING APPROACH
Perona-Malik Equation
input
processing
Noisy
Over-smooth
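The "Noisy" versus "Over-smooth" trade-off comes from how long the diffusion is run. Below is a minimal explicit finite-difference sketch of Perona-Malik diffusion on a 2D array; the edge-stopping function $g(d) = 1 / (1 + (d/\kappa)^2)$, the step size, and all parameter names are common textbook choices assumed here, not details taken from the slides.

```python
import numpy as np

def perona_malik(u0, n_steps, dt=0.2, kappa=0.1):
    """Explicit Perona-Malik diffusion with zero-flux boundaries.

    n_steps plays the role of the stopping time: too few steps leave noise,
    too many over-smooth the image. dt <= 0.25 keeps this scheme stable.
    """
    def g(d):
        # Edge-stopping function: close to 1 in flat regions, small across edges.
        return 1.0 / (1.0 + (d / kappa) ** 2)

    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_steps):
        p = np.pad(u, 1, mode="edge")   # replicate border pixels
        dn = p[:-2, 1:-1] - u           # difference to the north neighbour
        ds = p[2:, 1:-1] - u            # south
        dw = p[1:-1, :-2] - u           # west
        de = p[1:-1, 2:] - u            # east
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(dw) * dw + g(de) * de)
    return u

rng = np.random.default_rng(0)
noisy = 0.5 + 0.05 * rng.standard_normal((16, 16))
short_run = perona_malik(noisy, 10)
long_run = perona_malik(noisy, 100)
```

A fixed `n_steps` is only right for one noise level, which is the motivation for treating the stopping time as something to be controlled rather than hard-coded.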
MOVING ENDPOINT CONTROL VS FIXED ENDPOINT CONTROL
Control dynamic: Physical Law
$\tau$ is the time of arrival at the moon
$\min \int_0^{\tau} R(w(t), t)\,dt$
$\text{s.t. } \dot{X} = f(X(t), w(t)),$
$\tau$ is the first time that the dynamic meets $X$
DURR
GENERALIZE TO UNSEEN NOISE LEVEL
DURR
DnCNN
Physics
PHYSICS DISCOVERY
Our Work
CONV FILTERS AS DIFFERENTIAL OPERATORS
0  1  0
1 -4  1
0  1  0
$\Delta u = u_{xx} + u_{yy}$
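The 5-point stencil above is the standard finite-difference approximation of the Laplacian, and applying it is the same operation as a 3x3 convolution. A small NumPy check, assuming a uniform grid with spacing `h` (the helper name `apply_stencil` is hypothetical):

```python
import numpy as np

LAPLACIAN_STENCIL = np.array([[0.0, 1.0, 0.0],
                              [1.0, -4.0, 1.0],
                              [0.0, 1.0, 0.0]])

def apply_stencil(u, stencil, h=1.0):
    """Apply a 3x3 stencil to the interior of u (no padding), scaled by 1/h^2."""
    rows, cols = u.shape
    out = np.zeros((rows - 2, cols - 2))
    for i in range(3):
        for j in range(3):
            out += stencil[i, j] * u[i:i + rows - 2, j:j + cols - 2]
    return out / h**2

h = 0.1
grid = np.arange(6) * h
X, Y = np.meshgrid(grid, grid, indexing="ij")
u = X**2 + Y**2                      # exact Laplacian is u_xx + u_yy = 4
lap = apply_stencil(u, LAPLACIAN_STENCIL, h=h)
```

For a quadratic such as $u = x^2 + y^2$ the second-order difference is exact, so every interior entry of `lap` equals 4 up to rounding. Reading the correspondence in the other direction, the weights of a convolution filter determine which differential operator it approximates.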
PDE-NET
PDE-NET 2.0
- Constrain the function space
- Theoretical Recovery Guarantee (Coming Soon)
- Symbolic Discovery
ONGOING WORK
Shock Wave, Architecture Search
Yufei Wang, Ziju Shen, Zichao Long, Bin Dong. Learning to Solve Conservation Laws
HOW DIFFERENTIAL EQUATION VIEW HELPS OPTIMIZATION
CONSIDERING THE NEURAL NETWORK STRUCTURE
RELATED WORKS
Maximum Principle [Li et al. 2018a] [Li et al. 2018b]
Adjoint Method [Chen et al. 2018]
Multigrid [Chang et al. 2017]
Domain decomposition
Our Work: ODE captures the composition structure of neural network!
TOWARDS DEEP LEARNING MODELS RESISTANT TO ADVERSARIAL ATTACKS
Problem:
- More capacity and stronger adversaries decrease transferability. Always 10 times wider
- PGD training is expensive!
Can adversarial training be cheaper?
Robust Optimization
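To see why PGD training is costly, it helps to look at the inner maximization of the robust-optimization problem. The sketch below runs a standard $\ell_\infty$ PGD attack against a toy linear logistic model; the model, the loss, and all parameter values are illustrative assumptions, and this is plain PGD, not YOPO.

```python
import numpy as np

def loss_and_input_grad(w, x, y):
    """Logistic loss log(1 + exp(-y * w.x)) and its gradient w.r.t. the input x."""
    margin = y * (w @ x)
    loss = np.log1p(np.exp(-margin))
    grad_x = -y * w / (1.0 + np.exp(margin))
    return loss, grad_x

def pgd_attack(w, x, y, eps=0.1, step=0.025, n_steps=10):
    """Projected gradient ascent on the loss inside the l_inf ball of radius eps."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        # Each step needs a full forward and backward pass through the model.
        _, g = loss_and_input_grad(w, x + delta, y)
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return x + delta

rng = np.random.default_rng(0)
w = rng.standard_normal(6)
x = rng.standard_normal(6)
y = 1.0
x_adv = pgd_attack(w, x, y)
clean_loss, _ = loss_and_input_grad(w, x, y)
adv_loss, _ = loss_and_input_grad(w, x_adv, y)
```

With `n_steps` attack iterations per parameter update, adversarial training of a deep network pays roughly `n_steps` extra full gradient computations per batch, which is the cost the question on this slide is about.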
TAKE NEURAL NETWORK ARCHITECTURE INTO CONSIDERATION
Adversary updater
Adversary updater
Black box
Previous Work
YOPO
Heavy gradient calculation
DIFFERENTIAL GAME
Trajectory
Goal
Player1
Player2
Network Layer 2
Network Layer 1
Composition Structure
DECOUPLE TRAINING
Synthetic gradients [Jaderberg et al. 2017]
Lifted Neural Network [Askari et al. 2018] [Gu et al. 2018] [Li et al. 2019]
Delayed Gradient [Huo et al. 2018]
Block Coordinate Descent Approach [Lau et al. 2018]
Can the control perspective help us understand decoupling?
Our idea: Decouple gradient backpropagation from the adversary update.
YOPO (YOU ONLY PROPAGATE ONCE)
First Iteration: $p_s^t$, $x_s^t$, $p_s^1$
Other Iteration: copy $p_s^1$
WHY DECOUPLING
RESULT
SO, WHAT IS AN INFINITELY WIDE NEURAL NETWORK?
Neural Network as Particle System
Rotskoff G M, Vanden-Eijnden E. Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error[J]. arXiv preprint arXiv:1805.00915, 2018.
Mei S, Montanari A, Nguyen P M. A mean field view of the landscape of two-layer neural networks[J]. Proceedings of the National Academy of Sciences, 2018, 115(33): E7665-E7671.
Linearized Neural Network
Infinitely wide neural network = Kernel Method
Random Feature + Neural Tangent Kernel
Jacot A, Gabriel F, Hongler C. Neural tangent kernel: Convergence and generalization in neural networks[C]//Advances in Neural Information Processing Systems. 2018: 8571-8580.
Du S S, Lee J D, Li H, et al. Gradient descent finds global minima of deep neural networks[J]. arXiv preprint arXiv:1811.03804, 2018.
Allen-Zhu Z, Li Y, Song Z. A convergence theory for deep learning via over-parameterization[J]. arXiv preprint arXiv:1811.03962, 2018.
TWO DIFFERENTIAL EQUATIONS FOR DEEP LEARNING
Feature Space
Data Space
Neural ODE
Classifier
Nonlocal PDE
Dog
Cat
Evolving Kernel
WHAT FEATURES DOES NONLOCAL PDE CAPTURE
[Osher et al. 2016]
Harmonic Equation -> Dimensionality
Dimension optimization is not enough
Biharmonic Equation -> Curvature
SEMI-SUPERVISED LEARNING
IMAGE INPAINTING
Weighted Nonlocal Laplacian: PSNR = 23.51 dB, SSIM = 0.62
Weighted Curvature Regularization: PSNR = 24.07 dB, SSIM = 0.68
GOING INTO NEURAL!
LOW FREQUENCY FIRST
Multigrid Method [Xu 1996]
Inverse Scale Space [Scherzer et al. 2001] [Burger et al. 2006]
Ridge Regression [Smale et al. 2007] [Yao et al. 2007]
Experiment In Neural Network [Rahaman et al. 2018] [Xu et al. 2019]
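The low-frequency-first effect can be seen in a toy linear setting. The sketch below runs gradient descent in function space with a smooth periodic kernel, so the residual along each eigenvector shrinks by a factor (1 - lr * eigenvalue) per step. Because the kernel's eigenvalues decay with frequency, the low-frequency part of the target is fitted long before the high-frequency part. This is a kernel model, not one of the neural-network experiments cited on the slide; the kernel, grid size, bandwidth, and step count are all assumptions chosen for illustration.

```python
import numpy as np

def low_frequency_first(n=64, sigma=0.5, steps=50):
    """Return the remaining residual amplitude of a low and a high frequency mode."""
    x = 2 * np.pi * np.arange(n) / n
    diff = x[:, None] - x[None, :]
    # Periodic Gaussian-like kernel: its eigenvectors are Fourier modes
    # and its eigenvalues decrease as the frequency grows.
    K = np.exp((np.cos(diff) - 1.0) / sigma ** 2)
    lr = 1.0 / np.linalg.eigvalsh(K).max()  # stable step size

    low, high = np.sin(x), np.sin(8 * x)
    y = low + high          # target with one low and one high frequency
    f = np.zeros(n)         # start from the zero function

    for _ in range(steps):
        f -= lr * K @ (f - y)  # kernel gradient descent on the squared loss

    r = y - f
    # Project the residual onto each mode (modes have squared norm n / 2).
    low_res = abs(2.0 / n * (r @ low))
    high_res = abs(2.0 / n * (r @ high))
    return low_res, high_res

low_res, high_res = low_frequency_first()
```

Stopping the loop early therefore acts like a low-pass filter on the fitted function, which is the sense in which early stopping regularizes in this setting.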
TEACHER STUDENT OPTIMIZATION
Train
Train
Dark Knowledge
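"Dark knowledge" usually refers to the information carried by a teacher's soft predictions on the non-target classes. A minimal sketch of the standard temperature-softened distillation loss follows. It shows the common formulation, not necessarily the procedure used in this talk; the function names, the temperature value, and the toy logits are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) between temperature-softened predictions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean())

teacher = [[5.0, 2.0, 0.0]]
hard = softmax(teacher, temperature=1.0)   # nearly one-hot
soft = softmax(teacher, temperature=4.0)   # exposes the class similarities
loss = distillation_loss([[0.0, 0.0, 0.0]], teacher)
```

Raising the temperature moves probability mass onto the non-argmax classes, which is the extra signal the student receives beyond the hard label.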
NO EARLY STOPPING, NO DISTILLATION
[Figure axis labels: Student test accuracy, Teacher test accuracy]
LABEL DENOISING
[Figure labels: noisy label, interpolate]
THEORY CAN GUARANTEE OUR METHOD!
Separate
Clusterable Dataset
…
Sufficiently wide neural network
Ensures margin
LABEL DENOISING
We use an extremely large network for the experiment!
FUTURE WORK: META-LEARNING
Aggregating AIR and GT Label
[Tanaka et al. 2018] [Sun et al. 2018]
[Han et al. 2018] [Han et al. 2019]
Meta-learner
Aggregated signal
Supervision Signal: Validation set Robustness
NOISY LABEL LEADS TO ADVERSARIAL EXAMPLE?
initialization
Clean label training
Noisy label training
Bad Lipschitz Constant
Good Lipschitz Constant
TAKE HOME MESSAGE
Feature Space
Data Space
Neural ODE
Classifier
Nonlocal PDE
Dog
Cat
Evolving Kernel
Optimization
Architecture
Geometry Feature
Implicit Regularization
Depth Limit
Width Limit
RESEARCH INTEREST
Geometry Of Data
Control View Of Deep Learning
Kernel Learning
PDEs On Graphs
Learning with limited data (noisy, semi-supervised, weakly supervised)
Computational Photography, Image Processing, Graphics.
THANK YOU AND QUESTIONS?
Always looking forward to cooperation opportunities
Contact: [email protected]
Lu Y, Zhong A, Li Q, Dong B. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations. arXiv:1710.10121. ICML 2018.
Long Z, Lu Y, Ma X, Dong B. PDE-Net: Learning PDEs from Data. arXiv:1710.09668. ICML 2018.
Zhang S, Lu Y, Liu J, Dong B. Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration. arXiv:1805.07709. ICLR 2019.
Long Z, Lu Y, Dong B. PDE-Net 2.0: Learning PDEs from Data with A Numeric-Symbolic Hybrid Deep Network. arXiv:1812.04426. Major revision, JCP.
Zhang D, Zhang T, Lu Y, Zhu Z, Dong B. You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle. arXiv:1905.00877.
Lu Y, He D, Li Z, Sun Z, Dong B, Qin T, Wang L, Liu T-Y. OreoNet: Understanding NLP from a Neural ODE Viewpoint. (Submitted)
Dong B, Ju H, Lu Y, Shi Z. CURE: Curvature Regularization for Missing Data Recovery. arXiv:1901.09548, 2019.
Dong B, Hou J, Lu Y, Zhang Z. Distillation $\approx$ Early Stopping? Extracting Knowledge Utilizing Anisotropic Information Retrieval. (Submitted)
Acknowledgement
Aoxiao Zhong, Zhiqing Sun, Zhanxing Zhu, Bin Dong, Di He, Xiaoshuai Zhang, Liwei Wang, Zhuohan Li, Jikai Hou, Tianyuan Zhang, Dinghuai Zhang, Quanzheng Li, Zhihua Zhang