STAT/CSE 416: Intro to Machine Learning
Learning neural networks with hidden layers · 2018-05-31
©2017 Emily Fox
OPTIONAL
Optimizing a single-layer neuron
We train to minimize the sum of squared errors:

$$\ell(w) = \frac{1}{2}\sum_i \left[y_i - g\left(w_0 + \sum_j w_j\, x_i[j]\right)\right]^2$$

Taking gradients:

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \left[y_i - g\left(w_0 + \sum_j w_j\, x_i[j]\right)\right]\frac{\partial}{\partial w_j}\, g\left(w_0 + \sum_j w_j\, x_i[j]\right)$$

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \left[y_i - g\left(w_0 + \sum_j w_j\, x_i[j]\right)\right] x_i[j]\, g'\left(w_0 + \sum_j w_j\, x_i[j]\right)$$

The solution depends only on g′: the derivative of the activation function!
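This derivation can be sketched in code, assuming a sigmoid activation (so g′(z) = g(z)(1 − g(z))); the function and variable names here are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_gradient(w0, w, X, y):
    """Sum-of-squared-errors loss and its gradient for a single
    sigmoid neuron. X is (n, d), y is (n,), w is (d,)."""
    z = w0 + X @ w                    # score for each example
    g = sigmoid(z)                    # activation g(.)
    resid = y - g                     # y_i - g(...)
    loss = 0.5 * np.sum(resid ** 2)
    gprime = g * (1.0 - g)            # g'(z) for the sigmoid
    grad_w = -(resid * gprime) @ X    # one entry per weight w_j
    grad_w0 = -np.sum(resid * gprime)
    return loss, grad_w0, grad_w
```

Each term mirrors the derivation: the residual, the input x_i[j], and the activation derivative g′ multiply together inside the sum over examples.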
[Figure: a single neuron. Inputs x[1], x[2], …, x[d] and a constant 1 (bias) feed into a summation node Σ.]
Forward propagation (1 hidden layer):
Compute values left to right:
1. Inputs: x[1], …, x[d]
2. Hidden: v[1], …, v[d]
3. Output: y
$$\text{out}(x) = g\left(w_0 + \sum_k w_k\, g\left(w_{k0} + \sum_j w_{kj}\, x[j]\right)\right)$$
For fixed weights, forming predictions is easy!
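A minimal sketch of this forward pass, assuming sigmoid activations throughout (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """1-hidden-layer forward propagation, computed left to right.
    W_hidden[k, j] plays the role of w_kj and b_hidden[k] of w_k0."""
    v = sigmoid(b_hidden + W_hidden @ x)   # hidden values v[k]
    return sigmoid(b_out + w_out @ v)      # out(x)
```

With the weights fixed, this is just two matrix-vector products and two activations: forming predictions really is easy.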
[Figure: 1-hidden-layer network. Inputs x[1], x[2] and a constant 1 feed hidden units v[1], v[2]; the hidden units and a constant 1 feed the output y.]
Multilayer neural networks
Inference and Learning
• Forward pass: left to right, each hidden layer in turn
• Gradient computation: right to left, propagating the gradient for each node
Back-propagation – Learning
• Just gradient descent!
• A recursive algorithm for computing the gradient
• For each example:
  - Perform forward propagation
  - Start from the output layer
  - Compute the gradient of node v[k] with parents u[1], u[2], …
  - Update weight w_kj
  - Repeat (move to the preceding layer)
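The recursion above can be sketched for a 1-hidden-layer sigmoid net with squared-error loss; this is a hypothetical minimal version, not the lecture's exact pseudocode:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, w2, b2, eta=0.1):
    """One backprop update on a single example for a 1-hidden-layer
    sigmoid net trained with squared-error loss."""
    # Forward pass (left to right)
    v = sigmoid(b1 + W1 @ x)         # hidden values
    out = sigmoid(b2 + w2 @ v)       # output
    # Backward pass (right to left): start from the output layer
    delta_out = -(y - out) * out * (1 - out)     # gradient at output node
    grad_w2 = delta_out * v
    grad_b2 = delta_out
    # Propagate to the preceding layer: gradient at each hidden node
    delta_hidden = delta_out * w2 * v * (1 - v)
    grad_W1 = np.outer(delta_hidden, x)
    grad_b1 = delta_hidden
    # Gradient-descent update of every weight
    return (W1 - eta * grad_W1, b1 - eta * grad_b1,
            w2 - eta * grad_w2, b2 - eta * grad_b2)
```

Each node's gradient reuses the gradient already computed at its parents, which is exactly the recursion that makes back-propagation efficient.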
Convergence of backprop
Multilayer neural nets are not convex: gradient descent can get stuck in local minima.
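One common workaround (not covered in detail here) is random restarts: train from several random initializations and keep the best. The sketch below, with hypothetical names, shows that different initializations of the same tiny net can reach different final losses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_once(X, y, seed, steps=200, eta=0.5):
    """Train a tiny 1-hidden-layer sigmoid net from one random
    initialization; return its final squared-error loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(3, X.shape[1]))
    b1 = rng.normal(size=3)
    w2 = rng.normal(size=3)
    b2 = 0.0
    for _ in range(steps):
        V = sigmoid(b1 + X @ W1.T)                   # hidden activations
        out = sigmoid(b2 + V @ w2)                   # outputs
        d_out = -(y - out) * out * (1 - out)         # gradient at outputs
        d_hid = d_out[:, None] * w2[None, :] * V * (1 - V)
        W1 -= eta * d_hid.T @ X
        b1 -= eta * d_hid.sum(axis=0)
        w2 -= eta * V.T @ d_out
        b2 -= eta * d_out.sum()
    V = sigmoid(b1 + X @ W1.T)
    out = sigmoid(b2 + V @ w2)
    return 0.5 * np.sum((y - out) ** 2)

# XOR-style toy data: a non-convex problem for this architecture
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
losses = [train_once(X, y, seed=s) for s in range(5)]
```

Because the objective is non-convex, the five runs generally end at different loss values; keeping the best run is the simplest mitigation.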
Deep features: Deep learning + Transfer learning
©2018 Emily Fox
Transfer learning: Use data from one task to help learn on another
• Lots of data: learn a neural net; great accuracy on cat vs. dog
• Some data: neural net as feature extractor + simple classifier; great accuracy on 101 categories
Old idea, explored for deep learning by Donahue et al. ’14 & others
What’s learned in a neural net
Neural net trained for Task 1: cat vs. dog
• Later layers: very specific to Task 1; should be ignored for other tasks
• Earlier layers: more generic; can be used as a feature extractor
Transfer learning in more detail…
Neural net trained for Task 1: cat vs. dog
• Later layers: very specific to Task 1; should be ignored for other tasks
• Earlier layers: more generic; keep their weights fixed and use them as a feature extractor
• For Task 2 (predicting 101 categories), learn only the end part of the neural net
• Use a simple classifier on the extracted features, e.g., logistic regression, SVMs, nearest neighbor, …
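A minimal sketch of this recipe, assuming the Task-1 net's generic layer is a single sigmoid hidden layer (all names and the from-scratch logistic regression are illustrative stand-ins):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(X, W_frozen, b_frozen):
    """Frozen hidden layer from the Task-1 net, used as a feature
    extractor. Its weights are kept fixed and never updated."""
    return sigmoid(b_frozen + X @ W_frozen.T)

def train_logreg(F, y, steps=2000, eta=0.5):
    """Simple classifier (logistic regression) on the extracted
    features; only these end-part weights are learned for Task 2."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(b + F @ w)
        w -= eta * F.T @ (p - y) / len(y)   # gradient of log loss
        b -= eta * np.mean(p - y)
    return w, b
```

The division of labor matches the slide: the frozen extractor supplies generic features, and only the small classifier on top is trained on the limited Task-2 data.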