Transcript of Thesis Presentation
A Deep Belief Network
Approach to Learning Depth
from Optical Flow
by Reuben Feinman
Applied Mathematics Honors Thesis
Background
• The visual systems of insects are exquisitely sensitive to motion
• Srinivasan et al. (1989) showed that bees judge the range of their targets using absolute motion and motion relative to the background
• Key idea: optical flow is important to navigation
Motion Parallax in the Dorsal Stream
Humans perceive depth rather precisely via motion parallax
• Motion is a powerful monocular cue to depth understanding
• Assists with interpretation of spatial relationships
• “Optical flow”: the motion information encoded in the visual system
source: opticflow.bu.edu
Deep Learning
• The mapping from motion to depth is highly nonlinear (Braunstein, 1976)
• Deep learning has made great progress: multiple layers of nonlinear processing yield a more complex input-to-output function
source: www.deeplearning.stanford.edu
[Diagram: motion information mapped through successive processing stages to a depth prediction]
Computer Graphics
• Need labeled training data; videos do not have ground truth depth
•Graphical scenes generated by a gaming engine provide large number of training samples for supervised learning
A scene excerpt from our CryEngine forest database: RGB frame and ground truth depth map
MT Motion Model
• Hierarchical model of motion processing; alternate between template matching and max pooling
• Convolutional learning of spatio-temporal features
• Extension of HMAX (Serre et al 2007)
Jhuang et al 2007
Population Responses
Dorsal velocity model outputs a motion energy feature map
• (# Speeds) × (# Directions) × Height × Width
• In other words: each pixel contains a feature vector X with (# Speeds) × (# Directions) dimensions
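The per-pixel view of the feature map can be obtained with a simple axis permutation. A minimal sketch, assuming the slide's 9 speeds × 8 directions and an arbitrary 32×32 spatial extent; the array here is random stand-in data, not model output:

```python
import numpy as np

# Motion-energy feature map: (n_speeds, n_directions, height, width).
# 9 speeds x 8 directions = 72 features per pixel, as on the slide.
n_speeds, n_directions, height, width = 9, 8, 32, 32
energy = np.random.rand(n_speeds, n_directions, height, width)

# Move the spatial axes to the front, then merge the two tuning axes:
# each pixel now carries a 72-dimensional feature vector X.
features = energy.transpose(2, 3, 0, 1).reshape(height, width, -1)
print(features.shape)  # (32, 32, 72)
```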
Deep Belief Networks
• A standard MLP fails on this task
• Lots of unlabeled data is available; maybe we can exploit it to extract deep hierarchical representations of our motion model outputs
• Initialize the network with these learned feature detectors
source: http://deeplearning.net
The RBM Model
Maximum likelihood learning: update model parameters to maximize the likelihood of our training data
Standard RBM:
Gaussian-Bernoulli RBM:
P(v,h) = (1/Z)*exp(-E(v,h))
We then define a "free energy" form F(v) which sums over all possible hidden states:
P(v) = (1/Z)*exp(-F(v))
source: http://deeplearning.net
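For the standard (Bernoulli) RBM, the sum over hidden states in F(v) can be carried out analytically. A minimal sketch, assuming a weight matrix W, visible biases b, and hidden biases c (all random or zero stand-ins, not trained parameters):

```python
import numpy as np

# Bernoulli RBM free energy:  P(v) = exp(-F(v)) / Z, where
#   F(v) = -b.v - sum_j log(1 + exp(c_j + v.W_j))
# sums the hidden units out of P(v, h) = exp(-E(v, h)) / Z.
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b = np.zeros(n_vis)   # visible biases
c = np.zeros(n_hid)   # hidden biases

def free_energy(v):
    """F(v) with the hidden units analytically summed out."""
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

v = rng.integers(0, 2, size=n_vis).astype(float)
print(free_energy(v))
```

Using `logaddexp` rather than `log(1 + exp(...))` keeps the computation numerically stable for large pre-activations.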
Justifying Greedy Layer-Wise Pre-Training
• We use a Markov chain with alternating Gibbs sampling:
h′ ~ P(h | v = v)
v′ ~ P(v | h = h′)
•Gibbs Sampling is guaranteed to reduce the KL divergence between the posterior distribution in a given layer and the model’s equilibrium distribution
Hinton et al 2006
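One up-down pass of this chain, followed by a contrastive-divergence (CD-1) parameter update, can be sketched as follows. All names and data here are illustrative stand-ins for a small Bernoulli RBM, not the thesis code:

```python
import numpy as np

# One alternating Gibbs step:  h' ~ P(h|v),  v' ~ P(v|h'),
# then a CD-1 weight update (positive minus negative statistics).
rng = np.random.default_rng(1)
n_vis, n_hid, lr = 8, 5, 0.1
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    """Sample h' from P(h|v), then v' from P(v|h')."""
    p_h = sigmoid(c + v @ W)
    h = (rng.random(n_hid) < p_h).astype(float)
    p_v = sigmoid(b + W @ h)
    v_new = (rng.random(n_vis) < p_v).astype(float)
    return v_new, p_h

v0 = rng.integers(0, 2, size=n_vis).astype(float)
v1, p_h0 = gibbs_step(v0)
p_h1 = sigmoid(c + v1 @ W)

# CD-1 update: data-driven statistics minus reconstruction statistics.
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
```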
The DBN
• The data: feature vectors have 72 elements, tuned to 9 different speeds and 8 directions (9 × 8 = 72)
• DBN takes in a 3×3 pixel window
• 3 hidden layers of 800 units; sigmoidal activation
• Linear output layer
Technicalities:
• Mini-batch training with batch size of 5000
• Sparse initialization scheme
• RMSprop learning rule (root mean square propagation)
• Backpropagation fine-tuning with dropout, dropping 20% of units at each layer except the input layer
• Geometrically decaying learning rate (LR = 0.998 × LR at each epoch)
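The architecture and the geometric learning-rate schedule above can be written down directly. A sketch under stated assumptions: the initial learning rate of 0.01 is an assumed value (the slide does not give one), and the layer-size list is only a summary of the configuration, not the thesis implementation:

```python
# DBN configuration from the slide: a 3x3 window of 72-dim feature
# vectors (648 inputs), three 800-unit sigmoid layers, linear output.
layer_sizes = [72 * 3 * 3, 800, 800, 800, 1]
batch_size = 5000
dropout_rate = 0.20          # all layers except the input
lr, decay = 0.01, 0.998      # initial LR is an assumed value

def lr_at_epoch(epoch):
    """Learning rate after `epoch` geometric decay steps (LR <- 0.998 * LR)."""
    return lr * decay ** epoch

print(lr_at_epoch(100) / lr)  # fraction of the initial LR remaining
```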
Results
[Figure panels: DBN, linear regression, ground truth]
DBN test set R²: 0.445; linear regression test set R²: 0.240
[Bar chart: R² score per model — MLP (sparse initialization), single-pixel linear regression, 3×3 window linear regression, single-pixel DBN, 3×3 window DBN]
Markov Random Field Smoothing
Receptive field can be a powerful tool for decoding
MRF defined by two potential functions:
1) Φ = Σ_i (w · x_i − d_i)²
2) Ψ = Σ_<i,j> (d_i − d_j)² / ((d_i − d_j)² + 1)
(note: <i,j> ranges over all neighboring pairs i, j)
P(d | x; α, w) = (1/Z) exp(−(αΨ + Φ))
source: Peter Orchard, University of Edinburgh
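The unnormalized MRF energy αΨ + Φ is straightforward to evaluate on a depth map. A minimal sketch, assuming random stand-in features, decoding weights, and a 4-neighbor grid; none of these values come from the thesis:

```python
import numpy as np

# MRF energy from the slide: a data term Phi tying each pixel's depth d_i
# to a linear prediction w.x_i, plus a robust smoothness term Psi over
# neighboring pixel pairs.
rng = np.random.default_rng(2)
H, Wd, F = 4, 4, 72
x = rng.normal(size=(H, Wd, F))     # per-pixel feature vectors
w = rng.normal(size=F)              # linear decoding weights
d = x @ w + 0.1 * rng.normal(size=(H, Wd))  # candidate depth map
alpha = 1.0

def energy(d):
    phi = np.sum((x @ w - d) ** 2)  # data term Phi
    def rho(a):                     # (d_i - d_j)^2 / ((d_i - d_j)^2 + 1)
        return np.sum(a ** 2 / (a ** 2 + 1.0))
    # smoothness term Psi over vertical and horizontal neighbor pairs
    psi = rho(np.diff(d, axis=0)) + rho(np.diff(d, axis=1))
    return alpha * psi + phi

print(energy(d))
```

Minimizing this energy over d (e.g. by gradient descent) is what smooths the per-pixel predictions.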
ground truth; original prediction: 0.595; MRF prediction: 0.630
Drone Test
Future Work
• Increase the pre-training dataset
• Collect real labeled video data with an Xbox Kinect
• Down-sample motion features and ground truth
Thanks!
• Thomas Serre
• Stuart Geman
• David Mely
• Youssef Barhomi
Questions?
Normalizing the Data
• Training a GB-RBM is hard; the distributions of spike firing rates vary considerably depending on the dataset
• We propose a normalized GB-RBM where the training data is normalized to zero mean and unit variance; all datasets thereafter (validation & test) are normalized with the same parameters
Dataset histograms before and after normalization
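The normalization scheme above amounts to z-scoring with statistics estimated once, on the training set only. A minimal sketch with random stand-in arrays:

```python
import numpy as np

# Z-score the training data, then apply the SAME mean and standard
# deviation to later datasets (validation, test) so all inputs share
# the training set's scale.
rng = np.random.default_rng(3)
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 72))
test = rng.normal(loc=5.0, scale=2.0, size=(200, 72))

mu = train.mean(axis=0)
sigma = train.std(axis=0)

train_n = (train - mu) / sigma
test_n = (test - mu) / sigma   # reuse the training statistics

print(train_n.mean(), train_n.std())
```

Reusing the training statistics matters: re-estimating them per dataset would hand the GB-RBM differently scaled inputs at test time.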