Deep Learning: AI Breakthrough
![Page 1: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/1.jpg)
Deep Learning: AI Breakthrough
Mohsen Fayyaz
Sensifai
Tehran University – 15 Dey 1395 (4 Jan 2017)
![Page 2: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/2.jpg)
Video Processing and Deep Learning
![Page 3: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/3.jpg)
What is Video?
• Batches of frames
• Can we process video as batches of frames?
Motion cannot be inferred from a single frame
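A minimal sketch of why a single frame is not enough: motion only shows up when at least two frames are compared. The tiny frames and the helper names `frame_difference` and `changed_pixels` are illustrative assumptions, not part of the original slides.

```python
# Two tiny grayscale "frames" (2-D lists of pixel intensities).
# A bright pixel moves one step to the right between them.
frame_a = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
]
frame_b = [
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]


def frame_difference(prev, curr):
    """Per-pixel absolute difference between two equally sized frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def changed_pixels(diff, threshold=0):
    """(row, col) positions whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if value > threshold
    ]


moved = changed_pixels(frame_difference(frame_a, frame_b))
print(moved)  # [(1, 1), (1, 2)]
```

Either frame alone only shows a bright pixel; the pair shows that it left one position and arrived at the next.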
![Page 4: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/4.jpg)
Why do we need video processing?
• Self-Driving Cars: Video Semantic Segmentation
Feature Space Optimization for Semantic Video Segmentation, Kundu et al., 2016
![Page 5: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/5.jpg)
Why do we need video processing?
• Robots: Action Recognition
Simonyan et al., 2014
![Page 6: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/6.jpg)
Why do we need video processing?
• Google, YouTube, Aparat: Video Tagging
DenseCap, Johnson et al., 2016 (image captioning)
![Page 7: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/7.jpg)
Why do we need video processing?
• Network Video Broadcasting: Frame Prediction
Patraucean et al., 2016
![Page 8: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/8.jpg)
From Images to Video
Image: Image → CNN → Extracted Features
Video: Frames → ? → Extracted Features
![Page 9: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/9.jpg)
From Images to Video
Frames → CNN → LSTM → Extracted Spatio-Temporal Features
Donahue et al., 2015
![Page 10: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/10.jpg)
From Images to Video
Frames → CNN → LSTM → Extracted Spatio-Temporal Features
Donahue et al., 2015
What if we want regional features?
![Page 11: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/11.jpg)
From Images to Video - STFCN
Frames → CNN → Convolutional LSTM → Extracted Regional Spatio-Temporal Features
Fayyaz et al., 2016
![Page 12: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/12.jpg)
From Images to Video – C3D
Frames → 3D CNN → Extracted Regional Spatio-Temporal Features
Tran et al., 2015
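As a rough illustration of what a 3D convolution adds over a 2D one, the sketch below slides a kernel over time as well as over height and width. It is plain Python with no padding and stride 1, and it is not the C3D implementation; the function name `conv3d_valid` and the toy temporal-difference kernel are assumptions made for this example.

```python
def conv3d_valid(video, kernel):
    """Slide a kt x kh x kw kernel over a T x H x W video (nested lists).

    No padding, stride 1. Like most deep learning frameworks, this computes
    cross-correlation (the kernel is not flipped).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += video[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical 2 x 1 x 1 kernel: next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

# Three 2 x 2 frames whose brightness grows by 1 per frame.
brightening = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
# Three identical 2 x 2 frames.
static = [[[5.0] * 2 for _ in range(2)] for _ in range(3)]

print(conv3d_valid(brightening, temporal_diff))  # every entry is 1.0
print(conv3d_valid(static, temporal_diff))       # every entry is 0.0
```

Because the kernel spans two frames, the output has one fewer time step than the input and responds only where the video changes over time, which a per-frame 2D kernel cannot do.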
![Page 13: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/13.jpg)
Now that we have the appropriate tools,
let's see some real-world applications
![Page 14: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/14.jpg)
Video Semantic Segmentation - STFCN
Fayyaz et al., 2016
![Page 15: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/15.jpg)
Video Semantic Segmentation – C3D
Tran et al., 2015
![Page 16: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/16.jpg)
Action Recognition & Video Classification
Simonyan et al., 2014
![Page 17: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/17.jpg)
Does video have visual data only?
![Page 18: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/18.jpg)
Action Recognition & Video Classification
Wu et al., 2015
Audio
+
Vision
![Page 19: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/19.jpg)
Let’s briefly take a look at some state-of-the-art image-based networks
![Page 20: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/20.jpg)
Extremely Deep Networks
Residual Networks
• Problem: Gradients vanish in back-propagation
• Solution: Let’s make a shortcut for them!
• $Y = H(X, W_H)$ → $Y = H(X, W_H) + X$
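The shortcut can be sketched in a few lines. Here $H$ is assumed to be a single dense layer followed by ReLU, purely for illustration; real residual blocks stack convolutional layers. The names `plain_block` and `residual_block` are hypothetical.

```python
def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def plain_block(x, weights, bias):
    """Y = H(X, W_H)"""
    return relu(linear(x, weights, bias))


def residual_block(x, weights, bias):
    """Y = H(X, W_H) + X (output width must equal input width)."""
    h = plain_block(x, weights, bias)
    return [hi + xi for hi, xi in zip(h, x)]


x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

print(plain_block(x, zero_w, zero_b))     # [0.0, 0.0, 0.0]
print(residual_block(x, zero_w, zero_b))  # [1.0, -2.0, 3.0]
```

With all-zero weights the plain block wipes out its input, while the residual block still passes it through unchanged. The same additive path lets gradients reach earlier layers during back-propagation.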
![Page 21: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/21.jpg)
Extremely Deep Networks
Highway Networks
• Similar to ResNets
• The shortcuts are controlled by a learnable gate $T$, for a better trade-off between transforming the input and carrying it through unchanged
• $Y = H(X, W_H) \cdot T(X, W_T) + X \cdot (1 - T(X, W_T))$
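A small sketch of the gated shortcut, assuming dense layers with a tanh transform $H$ and a sigmoid gate $T$. These activation choices and the name `highway_layer` are assumptions for illustration only.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Y = H(X, W_H) * T(X, W_T) + X * (1 - T(X, W_T)), element-wise."""
    h = [math.tanh(a) for a in linear(x, w_h, b_h)]  # transform H
    t = [sigmoid(a) for a in linear(x, w_t, b_t)]    # gate T, in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -1.0]
w_h = [[1.0, 0.0], [0.0, 1.0]]
b_h = [0.0, 0.0]
w_t = [[0.0, 0.0], [0.0, 0.0]]

# A very negative gate bias closes the gate: the layer carries X through.
carried = highway_layer(x, w_h, b_h, w_t, [-50.0, -50.0])
# A very positive gate bias opens the gate: the layer outputs H(X).
transformed = highway_layer(x, w_h, b_h, w_t, [50.0, 50.0])

print(carried)      # approximately x
print(transformed)  # approximately tanh of each input
```

In training, the gate parameters are learned, so each unit can settle anywhere between these two extremes.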
![Page 22: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/22.jpg)
Extremely Deep Networks
DenseNets
• If ResNet works by connecting just the previous layers, why not connect them all?!
• $Y = F(X_n, X_{n-1}, \ldots, X_0)$
• Improvements in both the forward and backward passes
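The "connect them all" idea can be sketched as follows: every layer receives the concatenation of the block input and all earlier layer outputs. Dense layers stand in for convolutions here, and the name `dense_block` and the all-ones weights are assumptions chosen to keep the arithmetic easy to follow.

```python
def linear_relu(x, weights, bias):
    """Dense layer followed by ReLU; one row of `weights` per output unit."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def dense_block(x0, layers):
    """Each layer sees X_0 and the outputs of every earlier layer, concatenated."""
    features = [x0]
    for weights, bias in layers:
        concatenated = [v for feat in features for v in feat]
        features.append(linear_relu(concatenated, weights, bias))
    # The block output keeps every feature computed along the way.
    return [v for feat in features for v in feat]


x0 = [1.0, 2.0]
# Three layers with one output unit each; input widths grow as 2, 3, 4.
layers = [
    ([[1.0] * 2], [0.0]),
    ([[1.0] * 3], [0.0]),
    ([[1.0] * 4], [0.0]),
]

print(dense_block(x0, layers))  # [1.0, 2.0, 3.0, 6.0, 12.0]
```

The input width of each layer grows with depth, which is the cost of giving every layer direct access to all earlier features.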
![Page 23: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/23.jpg)
Now what if we use the idea of propagating data and gradients between shallow and
deep layers in video-based networks?
![Page 24: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/24.jpg)
Up to here, everything was supervised.
But there is a lot of data across the Internet with weak labels…
Let’s go through weakly supervised methods
![Page 25: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/25.jpg)
Weakly Supervised Learning
Weakly Supervised Learning with CNNs
• Multiple labeling
• Weak localization
• Data can be crawled from the Internet
• Can be adapted to video
Oquab et al., 2015
![Page 26: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/26.jpg)
How about some Unsupervised methods …
![Page 27: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/27.jpg)
Unsupervised Learning
Anticipating Visual Representations From Unlabeled Video
• Training on a huge amount of unlabeled video from across the Internet
• Training classifiers on the final output
Vondrick et al., 2016
![Page 28: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/28.jpg)
Practical considerations
![Page 29: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/29.jpg)
What Hardware do I use?
• NVIDIA GPU + SSD + HDD
• More info: http://www.DeepLearning.ir
![Page 30: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/30.jpg)
What framework do I use?
Caffe
Torch
TensorFlow
Theano
Keras
Microsoft CNTK
Deeplearning4j
…
![Page 31: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/31.jpg)
What framework do I use?
TensorFlow, Torch, Theano
From Karpathy’s slides
![Page 32: Deep Learning: AI Breakthrough](https://reader033.fdocuments.in/reader033/viewer/2022042706/587f50851a28ab0d378b523b/html5/thumbnails/32.jpg)
Distributed Training:
To be covered in my next presentation, at Sharif University of Technology
on 22 Dey 1395 (11 Jan 2017)
From Karpathy’s slides