Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop


description

As the data world undergoes its Cambrian explosion phase, our data tools need to become more advanced to keep pace. Deep Learning has emerged as a key tool in the non-linear arms race of machine learning. In this session we will look at how we parallelize Deep Belief Networks on Hadoop’s next-generation YARN framework with Iterative Reduce. We’ll also look at some real-world examples of processing data with Deep Learning, such as image classification and natural language processing.

Transcript of Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop


Deep Learning on Hadoop

Scaleout Deep Learning on YARN


Josh Patterson

Email: [email protected]
Twitter: @jpatanooga
GitHub: https://github.com/jpatanooga

Past
• Published in IAAI-09: “TinyTermite: A Secure Routing Algorithm”
• Grad work in meta-heuristics, ant algorithms
• Tennessee Valley Authority (TVA): Hadoop and the Smartgrid
• Cloudera: Principal Solution Architect

Today: Patterson Consulting


Overview
• What is Deep Learning?
• Deep Belief Networks
• Implementation on Hadoop/YARN
• Results


What is Deep Learning?


What is Deep Learning?
• An algorithm that tries to learn simple features in lower layers
• And more complex features in higher layers


Interesting Properties of Deep Learning

• Reduces a problem with overfitting in neural networks
• Introduces new techniques for “unsupervised feature learning”
• Introduces new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm


Chasing Nature
• Learning sparse representations of auditory signals leads to filters that closely correspond to neurons in early audio processing in mammals
• When applied to speech, the learned representations showed a striking resemblance to the cochlear filters in the auditory cortex


Yann LeCun on Deep Learning
• Has become the dominant method for acoustic modeling in speech recognition
• Quickly becoming the dominant method for several vision tasks, such as object recognition, object detection, and semantic segmentation


Deep Belief Networks


What is a Deep Belief Network?

• Generative probabilistic model
• Composed of one visible layer and many hidden layers
• Each hidden layer learns relationships between units in the layer below it
• Higher-layer representations tend to become more complex
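The layer structure described above can be sketched in a few lines. This is only an illustration in plain Python, not the DeepLearning4J code: the layer sizes, the sigmoid units, the random initial weights, and the helper names (`build_stack`, `propagate_up`, `layer_activations`) are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j][i] links input unit i to hidden unit j."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_stack(sizes, seed=0):
    """sizes[0] is the visible layer; every later entry is one hidden layer."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def propagate_up(layer, inputs):
    """Activation probabilities of one hidden layer given the layer below it."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def layer_activations(stack, visible):
    """Feed a visible vector upward; each layer only sees the layer directly below."""
    activations = [visible]
    for layer in stack:
        activations.append(propagate_up(layer, activations[-1]))
    return activations


stack = build_stack([6, 4, 3])            # hypothetical sizes: 6 visible, then 4 and 3 hidden units
acts = layer_activations(stack, [1, 0, 1, 1, 0, 0])
print([len(a) for a in acts])             # [6, 4, 3]
```

The weights here are untrained; in a Deep Belief Network each layer would first be trained on the activations of the layer beneath it.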


Restricted Boltzmann Machines
• Unsupervised model: does feature learning by repeated sampling of the input data
• Learns how to reconstruct data for good feature detection
• RBMs have different formulas for different kinds of data:
  • Binary
  • Continuous
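To make "learning by sampling and reconstruction" concrete, here is a minimal sketch of a binary RBM trained with one-step contrastive divergence (CD-1). The slides do not say which update rule the implementation uses, so CD-1 is an assumption, as are the class name `BinaryRBM`, the learning rate, and the toy patterns.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BinaryRBM:
    """Binary-binary RBM; w[i][j] connects visible unit i to hidden unit j."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.vbias = [0.0] * n_visible
        self.hbias = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.hbias[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.hbias))]

    def visible_probs(self, h):
        return [sigmoid(self.vbias[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.vbias))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a single example; returns the squared reconstruction error."""
        h0_probs = self.hidden_probs(v0)
        h0 = self.sample(h0_probs)               # sample hidden states from the data
        v1_probs = self.visible_probs(h0)        # reconstruct the visible layer
        h1_probs = self.hidden_probs(v1_probs)   # hidden probabilities for the reconstruction
        for i in range(len(v0)):
            for j in range(len(h0)):
                # data-driven statistics minus reconstruction-driven statistics
                self.w[i][j] += lr * (v0[i] * h0_probs[j] - v1_probs[i] * h1_probs[j])
        for i in range(len(v0)):
            self.vbias[i] += lr * (v0[i] - v1_probs[i])
        for j in range(len(h0)):
            self.hbias[j] += lr * (h0_probs[j] - h1_probs[j])
        return sum((a - b) ** 2 for a, b in zip(v0, v1_probs))


rbm = BinaryRBM(n_visible=4, n_hidden=2)
patterns = [[1, 1, 0, 0], [0, 0, 1, 1]]          # toy binary data, made up for the example
for epoch in range(100):
    for v in patterns:
        error = rbm.cd1_step(v)
print(error)
```

A continuous-input RBM would replace the sigmoid reconstruction of the visible layer with a different formula (for example Gaussian visible units); that variant is not shown here.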


DeepLearning4J
• Implementation in Java
• Self-contained and built on Akka, Hazelcast, jblas
• Distributed to run faster and with more features than current Theano-based implementations
• Talks to any data source, expects one format


Vectorized Implementation

• Handles lots of data concurrently: any number of examples at once, but the code does not change
• Faster: allows for native/GPU execution
• One format: everything is a matrix
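The "any number of examples, same code" point can be illustrated with a small sketch. The matrix helpers below are written in plain Python purely for readability; a real implementation would hand the same matrix product to a native or GPU-backed library. The function names and the weight values are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid_matrix(m):
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in m]


def hidden_activations(batch, weights):
    """batch is n_examples x n_visible, weights is n_visible x n_hidden.

    The same expression covers one example or many, because every
    example is simply one more row of the input matrix.
    """
    return sigmoid_matrix(matmul(batch, weights))


weights = [[0.5, -0.5], [1.0, 0.0], [-1.0, 2.0]]   # made-up 3 x 2 weight matrix
one = hidden_activations([[1, 0, 1]], weights)
many = hidden_activations([[1, 0, 1], [0, 1, 0], [1, 1, 1]], weights)
print(len(one), len(many))                          # 1 3
```

The first row of `many` equals the single row of `one`: batching changes how much work is done per call, not the result for any individual example.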


DL4J vs Theano Perf
• GPUs are inherently faster than normal native execution
• Theano is not distributed, and GPUs have very low RAM
• DL4J allows for situations where you have to “throw CPUs at it”


What are Good Applications for Deep Learning?

• Image processing: high MNIST scores
• Audio processing: current champ on the TIMIT dataset
• Text / NLP processing: Word2vec, etc.


Deep Learning on Hadoop


Past Work: Parallel Iterative Algorithms on YARN

• Started with
  • Parallel linear, logistic regression
  • Parallel neural networks
• Packaged in Metronome
  • 100% Java, ASF 2.0 licensed, on GitHub


Parameter Averaging

• McDonald, 2010: “Distributed Training Strategies for the Structured Perceptron”
• Langford, 2007: Vowpal Wabbit
• Jeff Dean’s work on parallel SGD: Downpour SGD


MapReduce vs. Parallel Iterative

[Diagram: a MapReduce job (Input, three Map tasks, two Reduce tasks, Output) beside a parallel iterative job in which Processors run through Superstep 1, Superstep 2, and so on]


SGD: Serial vs Parallel

[Diagram. Serial: Training Data, Model. Parallel: Split 1, Split 2, Split 3 of the training data; Worker 1, Worker 2, ..., Worker N, each with a Partial Model; Master with the Global Model]
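The worker/master arrangement in this diagram can be simulated in a single process. The sketch below is not the IterativeReduce or Metronome API: it assumes a plain linear model trained with SGD, runs the "workers" sequentially, and uses made-up data that follows y = 2x + 1. It only illustrates the parameter-averaging pattern, where each worker refines a copy of the global model on its own split and the master averages the partial models once per superstep.

```python
def sgd_pass(weights, split, lr):
    """Worker: one SGD pass over a local data split, starting from the global model."""
    w = list(weights)
    for features, target in split:
        prediction = sum(wi * xi for wi, xi in zip(w, features))
        error = prediction - target
        w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def average(models):
    """Master: element-wise mean of the workers' partial models."""
    return [sum(values) / len(models) for values in zip(*models)]


def train(splits, n_features, supersteps=500, lr=0.05):
    global_model = [0.0] * n_features
    for _ in range(supersteps):
        # Each worker would run in its own container; here they run one after another.
        partial_models = [sgd_pass(global_model, split, lr) for split in splits]
        global_model = average(partial_models)
    return global_model


# Toy data: y = 2*x + 1, with a constant 1.0 as the second feature (bias term).
splits = [
    [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0)],
    [([3.0, 1.0], 7.0), ([0.0, 1.0], 1.0)],
    [([4.0, 1.0], 9.0), ([1.5, 1.0], 4.0)],
]
model = train(splits, n_features=2)
print(model)
```

Because the toy data is noise-free, the averaged model should settle close to the generating weights; with real data the averaged model is an approximation whose quality depends on how the splits are drawn and how often the master averages.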


Managing Resources
• Running through YARN on Hadoop is important
  • Allows for workflow scheduling
  • Allows for scheduler oversight
• Allows the jobs to be first-class citizens on Hadoop
  • And share resources nicely


Parallelizing Deep Belief Networks

• Two-phase training
  • Pre-train
  • Fine-tune
• Each phase can do multiple passes over the dataset
• Entire network is averaged at the master


PreTrain and Lots of Data
• We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks
  • Allows for the use of far less labeled data
  • Allows us to more easily model the massive amounts of structured data in HDFS


Results


DBNs on IR Performance
• Faster to train
• Parameter averaging is an automatic form of regularization
• Adagrad with IR allows for better generalization of different features and even pacing
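For readers unfamiliar with Adagrad, the "even pacing" remark can be illustrated with the standard Adagrad update, in which each parameter's step is scaled by the history of its own squared gradients. This is a generic sketch with hypothetical names and values; it does not show how the talk's implementation combines Adagrad with IterativeReduce.

```python
import math


class Adagrad:
    """Per-parameter learning rates from accumulated squared gradients."""

    def __init__(self, n_params, lr=0.1, eps=1e-8):
        self.lr = lr
        self.eps = eps
        self.sum_sq = [0.0] * n_params   # running sum of squared gradients per parameter

    def step(self, params, grads):
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.sum_sq[i] += g * g
            # Parameters with a large gradient history take smaller steps, and vice versa.
            updated.append(p - self.lr * g / (math.sqrt(self.sum_sq[i]) + self.eps))
        return updated


opt = Adagrad(n_params=2)
grads = [1.0, 0.01]                      # one frequently active feature, one rarely active
p1 = opt.step([0.0, 0.0], grads)
p2 = opt.step(p1, grads)
print(p1, p2)
```

Although the two gradients differ by a factor of 100, both parameters move by roughly the same amount on each step, and the step size shrinks as gradient history accumulates. That normalization is one plausible reading of "even pacing" across features.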


Scale Out Metrics
• Batches of records can be processed by as many workers as there are data splits
• Message passing overhead is minimal
• Exhibits linear scaling
  • Example: 3x workers, 3x faster learning


Usage From Command Line

Run Deep Learning on Hadoop:

    yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]

Evaluate model:

    ./score_model.sh [props file]


Handwriting Renders


Faces Renders


…In Which We Gather Lots of Cat Photos


Future Direction
• GPUs
• Better vectorization tooling
• Move YARN version back over to jblas for matrices


References
• Hinton, G. E., Osindero, S. and Teh, Y. “A Fast Learning Algorithm for Deep Belief Nets”. Neural Computation (2006)
• Dean, Corrado, Monga. “Large Scale Distributed Deep Networks”. NIPS (2012)
• Yosinski, Lipson. “Visually Debugging Restricted Boltzmann Machine Training with a 3D Example”. Representation Learning Workshop (2012)