Monte Carlo Methods and Neural Networks
Noah Gamboa and Alexander Keller
Neural Networks
Fully connected layers
- neurons compute max{0, Σ_j w_j a_{i,j}}

[Figure: fully connected network; layer l contains neurons a_{l,0}, a_{l,1}, a_{l,2}, ..., a_{l,n_l−1}, for layers l = 0, 1, 2, ..., L]

NVIDIA Confidential 2
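The layer above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' code; the layer sizes are made up). Each neuron applies a rectifier to the weighted sum of the previous layer's activations:

```python
import numpy as np

def relu_fc_layer(a, W):
    """One fully connected layer as on the slide: each neuron
    computes max{0, sum_j w_j * a_j} over the previous activations a.
    a: shape (n_in,), W: shape (n_out, n_in)."""
    return np.maximum(0.0, W @ a)

rng = np.random.default_rng(0)
a0 = rng.random(8)            # activations of the input layer (illustrative)
W1 = rng.normal(size=(4, 8))  # weights of one fully connected layer
a1 = relu_fc_layer(a0, W1)    # activations of the next layer, all >= 0
```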
Monte Carlo Methods all over Neural Networks
Examples
- drop out
- drop connect
- stochastic binarization
- stochastic gradient descent
- fixed pseudo-random matrices for direct feedback alignment
- ...
Monte Carlo Methods all over Neural Networks
Observations
- the brain
– about 10^11 nerve cells with up to 10^4 connections to others
– much more energy efficient than a GPU
- artificial neural networks
– rigid layer structure
– expensive to scale in depth
– partially trained fully connected layers
- goal: explore algorithms linear in time and space
Partition instead of Dropout
Partition instead of Dropout
Guaranteeing coverage of neural units
- so far: drop out a neuron if threshold t > ξ
– ξ by a linear feedback shift register generator (for example)
- now: assign each neuron to partition p = ⌊ξ · P⌋ out of P partitions
– fewer random number generator calls
– all neurons guaranteed to be considered

LeNet on MNIST  | Average of t = 1/2 to 1/9 dropout | Average of P = 2 to 9 partitions
Mean accuracy   | 0.6062                            | 0.6057
StdDev accuracy | 0.0106                            | 0.009
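The partition scheme above can be sketched as follows (a minimal NumPy sketch, not the authors' implementation; the neuron count and P are illustrative). A single uniform random number per neuron assigns it to one of P partitions, and each training step drops a different partition, so across P steps every neuron is guaranteed to be considered:

```python
import numpy as np

rng = np.random.default_rng(42)
n_neurons, P = 12, 3  # illustrative sizes

# One random number generator call per neuron: partition p = floor(xi * P).
xi = rng.random(n_neurons)
partition = np.floor(xi * P).astype(int)

def keep_mask(step):
    """Mask for one training step: drop exactly the neurons whose
    partition is scheduled for this step, cycling through partitions."""
    return partition != (step % P)

# Over P consecutive steps, each neuron is dropped exactly once,
# i.e., all neurons are guaranteed to be considered.
dropped = sum((~keep_mask(s)).astype(int) for s in range(P))
```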
Partition instead of Dropout
Training accuracy with LeNet on MNIST
[Figure: training accuracy (0.2–0.6) vs. epoch of training (0–140), comparing 2 dropout partitions with 1/2 dropout]

Partition instead of Dropout
Training accuracy with LeNet on MNIST
[Figure: training accuracy (0.2–0.6) vs. epoch of training (0–140), comparing 3 dropout partitions with 1/3 dropout]
Simulating Discrete Densities
Simulating Discrete Densities
Stochastic evaluation of scalar product
- discrete density approximation of the weights
[Figure: discrete density of the normalized absolute weights on the unit interval, values between 0 and 1]
– remember to flip the sign accordingly
– transform jittered equidistant samples using the cumulative distribution function of the absolute values of the weights
Simulating Discrete Densities
Stochastic evaluation of scalar product
- partition of the unit interval by the prefix sums P_k := Σ_{j=1}^{k} |w_j| of the normalized absolute weights: 0 = P_0 < P_1 < ··· < P_m = 1
– using a uniform random variable ξ ∈ [0,1), select neuron i ⇔ P_{i−1} ≤ ξ < P_i, satisfying Prob({P_{i−1} ≤ ξ < P_i}) = |w_i|
– transform jittered equidistant samples using the cumulative distribution function of the absolute values of the weights
- in fact a derivation of quantization to weights in {−1, 0, +1}
– integer weights if a neuron is referenced more than once
– explains why ternary and binary weights did not work in some articles
– related to drop connect and drop out, too
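The stochastic scalar-product evaluation above can be sketched as follows (a minimal NumPy sketch under the slide's definitions; the weight vector, activations, and sample count are illustrative). Jittered equidistant samples are transformed through the cumulative distribution function of the normalized absolute weights, and the sign of each selected weight is flipped back into the contribution:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=64)   # weights of one neuron (illustrative)
a = rng.random(64)        # activations of the previous layer

# Prefix sums P_k of the normalized absolute weights partition [0,1).
p = np.abs(w) / np.abs(w).sum()
P = np.cumsum(p)

m = 16  # samples per neuron
# Jittered equidistant samples in [0,1), one per stratum of width 1/m.
xi = (np.arange(m) + rng.random(m)) / m
# Inverse CDF: select neuron i with P_{i-1} <= xi < P_i, i.e. with
# probability |w_i|; the clip guards against floating-point round-off.
idx = np.minimum(np.searchsorted(P, xi, side='right'), len(w) - 1)

# Monte Carlo estimate of the scalar product sum_j w_j a_j:
# flip the sign of each contribution according to the selected weight.
estimate = np.abs(w).sum() * np.mean(np.sign(w[idx]) * a[idx])
exact = w @ a
```

The estimator is unbiased: selecting index i with probability |w_i| / Σ|w| and averaging sign(w_i) a_i recovers Σ_j w_j a_j / Σ|w| in expectation.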
Simulating Discrete Densities
Test accuracy for two layer ReLU feedforward network on MNIST
- achieves 97% of the accuracy of the full model by sampling the most important 12% of the weights
[Figure: test accuracy (0.94–1) vs. number of samples per neuron (0–600)]
Simulating Discrete Densities
Application to convolutional layers
- sample from the distribution of a filter (for example, 128 × 5 × 5 = 3200 weights)
– less redundant than fully connected layers
- LeNet architecture on CIFAR-10; the best accuracy is 0.6912
- achieves 88% of the accuracy of the full model with 50% of the weights sampled
Simulating Discrete Densities
Test accuracy for LeNet on CIFAR-10
[Figure: test accuracy (0.4–0.6) vs. number of samples per filter (0–3000)]
Neural Networks linear in Time and Space
Neural Networks linear in Time and Space
Number n of neural units
- for L fully connected layers, n = Σ_{l=1}^{L} n_l, where n_l is the number of neurons in layer l
- number of weights: n_w = Σ_{l=1}^{L} n_{l−1} · n_l
- choose the number of weights per neuron such that n_w is proportional to n
– for example, a constant number of weights per neuron
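The counting argument above can be checked numerically (a small sketch; the layer widths and the per-neuron budget c are illustrative). With a constant number c of weights per neuron, the total weight count c · n grows linearly with the number n of neural units, whereas dense layers need Σ_l n_{l−1} · n_l weights:

```python
# Layer widths n_0 ... n_L (illustrative; n_0 is the input layer).
widths = [784, 300, 300, 10]

# Number n of neural units in the L fully connected layers l = 1..L.
n = sum(widths[1:])

# Dense weight count: n_w = sum_l n_{l-1} * n_l (quadratic in the widths).
n_w_dense = sum(a * b for a, b in zip(widths[:-1], widths[1:]))

# Sparse-from-scratch budget: a constant number c of weights per neuron,
# so the total weight count c * n is proportional to n.
c = 32
n_w_sparse = c * n
```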
Neural Networks linear in Time and Space
Results
[Figure: test accuracy (0–1) vs. percent of FC layers sampled (0–1), for LeNet on MNIST and LeNet on CIFAR-10]
Neural Networks linear in Time and Space
Test accuracy for AlexNet on CIFAR-10
[Figure: test accuracy (0–1) vs. percent of FC layers sampled (0–1)]
Neural Networks linear in Time and Space
Test accuracy for AlexNet on ILSVRC12
[Figure: Top-1 and Top-5 test accuracy (0–1) vs. percent of FC layers sampled (0–1)]
Neural Networks linear in Time and Space
Sampling paths through networks
- complexity bounded by the number of paths times the depth
- strong indication of a relation to Markov chains
- importance sampling by weights
Neural Networks linear in Time and Space
Sampling paths through networks
- sparse from scratch
[Figure: layered network with neurons a_{l,0}, a_{l,1}, ..., a_{l,n_l−1} in each layer l, connected only along sampled paths]
– guaranteed connectivity
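Sampling paths through a network can be sketched as follows (a hypothetical minimal sketch, not the authors' implementation; the layer widths, path count, and the dense reference weights used as a sampling distribution are all made up). Each path selects one neuron per layer, with the next neuron chosen by importance sampling proportional to the absolute connecting weights; keeping only the sampled edges yields a sparse-from-scratch network whose storage is bounded by the number of paths times the depth, with connectivity guaranteed along every path:

```python
import numpy as np

rng = np.random.default_rng(7)
widths = [8, 6, 6, 4]  # illustrative layer widths n_0 .. n_L
# Dense reference weights, used here only as a distribution to sample from.
W = [rng.normal(size=(widths[l + 1], widths[l])) for l in range(len(widths) - 1)]

def sample_path(start):
    """Sample one path: pick the next neuron in each layer with
    probability proportional to the absolute connecting weight."""
    path = [start]
    for Wl in W:
        p = np.abs(Wl[:, path[-1]])
        p /= p.sum()
        path.append(int(rng.choice(len(p), p=p)))
    return path

# A few paths per input neuron induce a sparse network: only the
# sampled edges (layer, from, to) are stored and trained.
edges = set()
for start in range(widths[0]):
    for _ in range(4):
        path = sample_path(start)
        for l in range(len(path) - 1):
            edges.add((l, path[l], path[l + 1]))
```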
Neural Networks linear in Time and Space
Test accuracy for 4 layer feedforward network (784/300/300/10)
[Figure: test accuracy (0–1) vs. number of per-pixel paths through the network (0–300), for MNIST and Fashion MNIST]
Monte Carlo Methods and Neural Networks
Monte Carlo Methods and Neural Networks
Summary
- dropout partitions reduce variance
– using far fewer random numbers
- simulating discrete densities explains {−1, 0, +1} and integer weights
– compression and quantization without retraining
- neural networks with linear complexity for both inference and training
– sparse from scratch
– sampling paths through neural networks instead of drop connect and drop out