Comparison of deep learning frameworks from a viewpoint of double backpropagation

Preferred Networks, Inc., Kenta Oono <[email protected]>
Chainer Meetup #6 @ Preferred Networks, Sep. 30th, 2017


Page 1: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Comparison of deep learning frameworks from a viewpoint of double backpropagation

Preferred Networks, Inc.
Kenta Oono <[email protected]>

Chainer Meetup #6 @ Preferred Networks, Sep. 30th, 2017

Page 2: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Agenda

• Technological stack of DL frameworks
• Design choices in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TF

Page 3: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Technology stack of a DL framework

Name                                   Functions                                Examples
Graphical visualization                -                                        DIGITS, TensorBoard
Machine learning workflow management   Dataset prep, Save/Load, Training loop   Keras, TF slim
Computational graph (CG) management    Build/Optimize CGs, Forward/Back prop    Theano, TensorFlow, Torch.nn
Multi-dimensional array processing     High-level array manipulation            NumPy, CuPy, Eigen, Torch (core)
Numerical computation                  Matrix operation, Convolution            BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL DNN
Computational device                   -                                        CPU, GPU, TPU, FPGA

Page 4: Comparison of deep learning frameworks from a viewpoint of double backpropagation

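Technology stack of Chainer

[Figure: the Chainer stack drawn against the six layers listed on Page 3 (graphical visualization, machine learning workflow management, computational graph management, multi-dimensional array processing, numerical computation, computational device).]

Components, top to bottom:
Chainer
NumPy (CPU) / CuPy (GPU)
BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU)
CPU / GPU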

Page 5: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Technology stack of TensorFlow

[Figure: the TensorFlow stack drawn against the six layers listed on Page 3.]

Components, top to bottom:
TensorBoard
TF slim, Keras
TensorFlow
Eigen::Tensor
BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU)
CPU / GPU

Page 6: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Technology stack of Theano

[Figure: the Theano stack drawn against the six layers listed on Page 3.]

Components, top to bottom:
Keras, Lasagne, Blocks, etc.
Theano
NumPy (CPU) / libgpuarray (GPU)
BLAS (CPU) / CUDA, OpenCL, CUDA Toolkit (GPU)
CPU / GPU

Page 7: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Technology stack of Keras

[Figure: the Keras stack drawn against the six layers listed on Page 3.]

Components, top to bottom:
Keras
TensorFlow / Theano
Technology stack of TF / Technology stack of Theano

Page 12: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Important design choices through a user's typical workflow

• Write NNs (in which language?)
• Compute backprop (how?)
• Update parameters (how to represent? how to update?)
• Run user code (when?)
• Optimize CG (how?)
• Scale up training (how?)

Workflow phases, in order: Coding, Execution, Improvement

Page 14: Comparison of deep learning frameworks from a viewpoint of double backpropagation

http://bit.ly/aaai-dlif


Page 15: Comparison of deep learning frameworks from a viewpoint of double backpropagation

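Neural network as a computational graph

• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of a CG is a bipartite DAG (directed acyclic graph) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Figure: CG of the two expressions above. Data nodes x1 and x2 feed the operator node mul, which outputs data node y; y and x3 feed the operator node sub, which outputs data node z.]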
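The bipartite structure described on this slide can be sketched in a few lines of plain Python. This is only an illustration: the class names `DataNode` and `OpNode` and the input values are assumptions made for this sketch and do not come from any of the frameworks discussed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataNode:
    """Holds a value; `creator` is the operator node that produced it."""
    name: str
    value: Optional[float] = None
    creator: Optional["OpNode"] = None


@dataclass
class OpNode:
    """Applies `func` to its input data nodes and yields one output data node."""
    name: str
    func: Callable[..., float]
    inputs: List[DataNode] = field(default_factory=list)

    def __call__(self, *inputs: DataNode, out_name: str) -> DataNode:
        self.inputs = list(inputs)
        value = self.func(*(d.value for d in inputs))
        return DataNode(out_name, value, creator=self)


# Leaf data nodes (graph inputs); the values are arbitrary.
x1, x2, x3 = DataNode("x1", 2.0), DataNode("x2", 3.0), DataNode("x3", 1.0)

# Edges only ever connect a data node to an operator node or vice versa,
# which is what makes the graph bipartite.
y = OpNode("mul", lambda a, b: a * b)(x1, x2, out_name="y")
z = OpNode("sub", lambda a, b: a - b)(y, x3, out_name="z")

print(z.value)  # 2 * 3 - 1 = 5.0
```

Following `z.creator` and then `z.creator.inputs` walks the graph backwards from the output to the leaves, which is the traversal that backprop relies on.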

Page 16: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Multi-layer perceptron (MLP)

[Figure: CG of a two-layer MLP.]

x → Affine (W1, b1) → h1 → ReLU → a1 → Affine (W2, b2) → h2 → ReLU → a2 → Softmax → prob → Cross Entropy (with label t) → loss
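The forward pass drawn in this graph could look roughly as follows in plain Python. The helper functions, layer sizes, and parameter values are hypothetical and chosen only to make the sketch runnable; the variable names follow the labels in the figure.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(x: Vector, W: Matrix, b: Vector) -> Vector:
    """h = W x + b, with W stored row by row."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(h: Vector) -> Vector:
    return [max(0.0, v) for v in h]


def softmax(h: Vector) -> Vector:
    m = max(h)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob: Vector, t: int) -> float:
    """Negative log-likelihood of the correct class index t."""
    return -math.log(prob[t])


# Hypothetical toy parameters: 2 inputs -> 3 hidden units -> 2 classes.
W1, b1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.1, -0.2]
W2, b2 = [[1.0, 0.0, -1.0], [-1.0, 1.0, 0.5]], [0.0, 0.0]
x, t = [1.0, 2.0], 1

# Same order of operator nodes as in the figure.
h1 = affine(x, W1, b1)
a1 = relu(h1)
h2 = affine(a1, W2, b2)
a2 = relu(h2)
prob = softmax(a2)
loss = cross_entropy(prob, t)
print(loss)
```

Each intermediate name (h1, a1, h2, a2, prob, loss) corresponds to one data node of the graph, and each function call to one operator node.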

Page 17: Comparison of deep learning frameworks from a viewpoint of double backpropagation

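How to compute backprop

Backprop through graphs
The framework builds graphs only for forward prop and performs backprop by backtracking through them.
E.g. Torch.nn, Caffe

Backprop as extended graphs
The framework builds graphs for backprop as well as for forward prop.
E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch

[Figure: left, the forward graph only: a, b → mul → y; y, c → sub → z. Right, the same forward graph extended with backprop nodes: gz (∇z z = 1) → id → gy (∇y z); gz → neg → gc; gy → mul → ga (∇a z); gy → mul → gb.]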
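As a rough illustration of the first approach (backprop through graphs), the sketch below records only the forward graph for z = a * b - c and then backtracks through it. The classes `Var`, `Mul`, and `Sub` are made up for this example. The gradients are plain floats, so no graph exists for the backward computation and it cannot be differentiated again.

```python
class Var:
    """Data node: stores a value, its gradient, and the op that created it."""
    def __init__(self, value, creator=None):
        self.value = value
        self.grad = 0.0
        self.creator = creator


class Mul:
    def __call__(self, a, b):
        self.inputs = (a, b)
        return Var(a.value * b.value, creator=self)

    def backward(self, gy):
        a, b = self.inputs
        return gy * b.value, gy * a.value  # plain floats, no graph is built


class Sub:
    def __call__(self, a, b):
        self.inputs = (a, b)
        return Var(a.value - b.value, creator=self)

    def backward(self, gy):
        return gy, -gy


def backprop(output):
    """Backtrack from `output` to the leaves, accumulating gradients.

    This simple stack traversal is enough for the tree-shaped graph below;
    a general DAG with shared nodes would need a topological order.
    """
    output.grad = 1.0  # dz/dz = 1
    stack = [output]
    while stack:
        node = stack.pop()
        if node.creator is None:
            continue
        grads = node.creator.backward(node.grad)
        for inp, g in zip(node.creator.inputs, grads):
            inp.grad += g
            stack.append(inp)


a, b, c = Var(2.0), Var(3.0), Var(4.0)
y = Mul()(a, b)
z = Sub()(y, c)
backprop(z)
print(a.grad, b.grad, c.grad)  # 3.0 2.0 -1.0
```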

Page 18: Comparison of deep learning frameworks from a viewpoint of double backpropagation

How to compute backprop

Backprop through graphs
• Easy and simple to implement: backprop computation need not be defined as graphs.
• Low flexibility: features available for graphs may not apply to backprop computations.

Backprop as extended graphs
• Implementation gets complicated.
• High flexibility: any feature available for graphs can also be applied to backprop computations (e.g. backprop of backprop).

Page 19: Comparison of deep learning frameworks from a viewpoint of double backpropagation

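Double backprop

[Figure: x, y → F → z → ・・・ → L]

from chainer import FunctionNode

class F(FunctionNode):
    def forward(self, x, y):
        # x and y are NumPy or CuPy arrays.
        return x * x + y

    def backward(self, x, y, gz):
        # x, y, and gz are chainer.Variable objects, so these operations create a CG.
        return 2 * gz * x, gz

Note: The interface is simplified from the actual implementation.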

Page 20: Comparison of deep learning frameworks from a viewpoint of double backpropagation

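Double backprop

[Figure: forward graph x, y → F → z → ・・・ → L. Backprop starts from 1.0 (= ∂L/∂L) and reaches gz (= ∂L/∂z); Grad F takes gz and outputs gx (= ∂L/∂x) and gy (= ∂L/∂y).]

Inside Grad F: gx = Mul(x, gz) * 2, gy = gz.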

Page 21: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Double backprop

[Figure: x, y → F → z. Backprop with 1.0 fed to Grad F in place of gz gives gx (= ∂z/∂x) and gy (= ∂z/∂y).]

Page 22: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Double backprop

[Figure: x, y → F → z; 1.0 → Grad F → gx, gy.]

Page 23: Comparison of deep learning frameworks from a viewpoint of double backpropagation

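Double backprop

[Figure: x, y → F → z (the forward node is labeled Mul on this slide); 1.0 → Grad F → gx, gy. Backprop from gx, starting at 1.0, runs through Double Grad F and gives ggx (= ∂²z/∂x²).]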
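The sequence of slides above can be mimicked with a very small reverse-mode sketch in which gradients are themselves graph nodes, so backprop can be applied to the result of backprop. It uses the same function z = x * x + y as the class F earlier. The `Var` class and the `grad` helper are assumptions written for this illustration and are not the API of Chainer, PyTorch, or TensorFlow.

```python
class Var:
    """Data node. `parents` holds (input Var, local-gradient function) pairs."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value,
                   ((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        other = as_var(other)
        # The local gradients are built from Var operations, so the backward
        # pass extends the graph instead of producing plain numbers.
        return Var(self.value * other.value,
                   ((self, lambda g: g * other), (other, lambda g: g * self)))

    __radd__ = __add__
    __rmul__ = __mul__


def as_var(x):
    return x if isinstance(x, Var) else Var(x)


def grad(output, wrt):
    """Return d(output)/d(wrt) as a Var, so it can be differentiated again."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    grads = {id(output): Var(1.0)}  # seed: d(output)/d(output) = 1.0
    for node in reversed(order):    # consumers are handled before producers
        g = grads[id(node)]
        for parent, local_grad in node.parents:
            contribution = local_grad(g)
            prev = grads.get(id(parent))
            grads[id(parent)] = contribution if prev is None else prev + contribution
    return grads.get(id(wrt), Var(0.0))


x, y = Var(3.0), Var(1.0)
z = x * x + y          # forward: F
gx = grad(z, x)        # backprop: Grad F, dz/dx = 2x
ggx = grad(gx, x)      # backprop of backprop: Double Grad F, d2z/dx2 = 2
print(z.value, gx.value, ggx.value)  # 10.0 6.0 2.0
```

The second call to `grad` works only because `gx` carries its own `parents`; with float gradients there would be nothing to backtrack through.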

Page 24: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Double backprop

[Figure: x → f → z]

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x.

Page 25: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Double backprop

[Figure: x → f → z; x → Grad f → gx]

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x.

Page 26: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Double backprop

[Figure: x → f → z; x → Grad f → gx; z, gx → ・・・ → L]

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x.

Page 27: Comparison of deep learning frameworks from a viewpoint of double backpropagation

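Double backprop

[Figure: forward: x → f → z; x → Grad f → gx; z, gx → ・・・ → L. Backprop from L starts at 1.0 and produces gz and ggx; Grad f (fed gz) and Double Grad f (fed ggx) together give ∂L/∂x.]

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x.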
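For a scalar x, the two backward paths in this figure correspond to the two terms of the chain rule. The worked example below is an illustration only: the symbol g for ∇f(x) and the toy choices of f and G are assumptions, not taken from the slides.

```latex
% Notation (assumed): z = f(x), g = \nabla f(x) = f'(x), L = G(z, g), scalar x.
\frac{dL}{dx}
  = \underbrace{\frac{\partial G}{\partial z}\, f'(x)}_{\text{through Grad } f}
  + \underbrace{\frac{\partial G}{\partial g}\, f''(x)}_{\text{through Double Grad } f}

% Toy example: f(x) = x^3, \quad G(z, g) = z + g^2
% f'(x) = 3x^2, \quad f''(x) = 6x, \quad
% \partial G / \partial z = 1, \quad \partial G / \partial g = 2g = 6x^2
\frac{dL}{dx} = 1 \cdot 3x^2 + 6x^2 \cdot 6x = 3x^2 + 36x^3

% Check by direct substitution: L = x^3 + (3x^2)^2 = x^3 + 9x^4
% \Rightarrow \; dL/dx = 3x^2 + 36x^3
```

The second term is the one that requires double backprop, since it involves the derivative of the gradient computation itself.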

Page 28: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Example (Chainer)

http://bit.ly/2wpEzO5

Page 29: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Example (PyTorch)

Page 30: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Example (TensorFlow)

Page 31: Comparison of deep learning frameworks from a viewpoint of double backpropagation

Conclusion

• Several DL frameworks are similar in structure.
• Differences in design choices determine the capabilities of frameworks.
• Introduced double backprop, with toy examples in several frameworks.