Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs Connect

DISTRIBUTED DEEP LEARNING

SPARK ON AWS

Presented by Vincent Van Steenbergen - @nsteenv

WHOAMI

VINCENT VAN STEENBERGEN

Data Engineer @ Abstract Minds

Working with Scala, Akka and Spark for about 3 years

Deeply interested in Artificial Intelligence and Data Analysis

DISCLAIMER

DEEP LEARNING

Convolutional neural networks

APPLICATIONS

IMAGE ANALYSIS

IMAGE GENERATION

GAMES

TRAINING A MODEL REQUIRES:

a lot of time

even more computing power

Example: AlphaGo (distributed version) ran on 1,202 CPUs and 176 GPUs

SO HOW CAN I DO THAT...

from my laptop?

for a decent cost?

within a short timespan?

possible on a laptop but very slow

solution: distribute training over a cluster

APACHE SPARK

Scala/Python framework for big data analysis

Like Hadoop MapReduce, but faster, because it keeps intermediate data in memory

ADVANTAGES

Able to handle terabytes of data, including streaming data

Parallelises operations across a large cluster of machines

Can improve accuracy, since many more models and hyperparameter settings can be evaluated in the same time
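The pattern behind "parallelises operations" is: split the data into partitions, apply the same function to each partition on a different machine, then combine the partial results. The sketch below imitates that flow on one machine, assuming a thread pool as a stand-in for Spark executors. `partition` and `map_reduce` are hypothetical helper names, not Spark API, and threads only illustrate the data flow, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def partition(data, num_partitions):
    """Split a list into contiguous, roughly equal chunks (like Spark partitions)."""
    size, rem = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    # Drop empty chunks, which appear when there are more partitions than elements.
    return [chunk for chunk in chunks if chunk]


def map_reduce(data, map_fn, reduce_fn, num_partitions=4):
    """Map every element, reduce each partition locally, then reduce the partials.

    reduce_fn should be associative and commutative, the same requirement
    Spark places on RDD.reduce. Assumes data is non-empty.
    """
    chunks = partition(data, num_partitions)

    def process(chunk):
        # Work done by one "executor" on its own partition.
        return reduce(reduce_fn, map(map_fn, chunk))

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(process, chunks))
    # Work done by the "driver": combine one value per partition.
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    # Sum of squares of 1..100, computed partition by partition.
    print(map_reduce(list(range(1, 101)), lambda x: x * x, operator.add))  # prints 338350
```

On a real cluster the PySpark equivalent would be `sc.parallelize(range(1, 101), 4).map(lambda x: x * x).reduce(operator.add)`, where `sc` is an existing `SparkContext`.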

AMAZON WEB SERVICES (EC2)

GPU instances (g2.2xlarge, g2.8xlarge)

Spot instances (spare EC2 capacity at a variable price, generally 2-3 times cheaper than On-Demand instances)

G2.8XLARGE CONFIGURATION

Four NVIDIA GRID GPUs, each with 1,536 CUDA cores and 4 GB of video memory

32 vCPUs

60 GiB of memory

240 GB (2 x 120) of SSD storage

Average price: $1.00 per hour
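These per-instance figures make it easy to size a cluster. The sketch below multiplies the g2.8xlarge specs above by a hypothetical instance count and run time. It assumes a flat rate equal to the $1.00/hour average price on the slide and ignores storage, data transfer and spot-price fluctuations.

```python
# Per-instance g2.8xlarge figures, taken from the slide.
GPUS_PER_INSTANCE = 4
CUDA_CORES_PER_GPU = 1536
GPU_MEMORY_GB_PER_GPU = 4
VCPUS_PER_INSTANCE = 32
RAM_GIB_PER_INSTANCE = 60
PRICE_PER_HOUR_USD = 1.00  # average price from the slide (assumption: flat rate)


def cluster_summary(num_instances, hours):
    """Aggregate resources and approximate compute cost for a g2.8xlarge cluster."""
    gpus = num_instances * GPUS_PER_INSTANCE
    return {
        "gpus": gpus,
        "cuda_cores": gpus * CUDA_CORES_PER_GPU,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB_PER_GPU,
        "vcpus": num_instances * VCPUS_PER_INSTANCE,
        "ram_gib": num_instances * RAM_GIB_PER_INSTANCE,
        "cost_usd": num_instances * hours * PRICE_PER_HOUR_USD,
    }


if __name__ == "__main__":
    # Hypothetical job: 10 instances for 3 hours.
    print(cluster_summary(10, 3))
```

For scale, `cluster_summary(44, 1)` gives 176 GPUs, the GPU count quoted for AlphaGo, for about $44 per hour at that price.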

NOT BAD...

DEEP LEARNING FRAMEWORKS

TensorFlow (Google)

Caffe (Berkeley)

MNIST DATASET

Handwritten digits dataset

CROSS VALIDATION
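The slide only names cross-validation. Judging by the results slide, the gain comes from scoring many independent hyperparameter settings at once rather than one after another. The sketch below combines k-fold cross-validation with a concurrent grid search. It assumes a toy one-parameter ridge regression as a hypothetical stand-in for the CNN and a local thread pool in place of Spark executors; all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds over n samples."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit(xs, ys, lam):
    """Toy model (hypothetical stand-in for a CNN): ridge regression y ~ w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_score(xs, ys, lam, k=5):
    """Mean validation MSE over k folds for one hyperparameter value."""
    fold_errors = []
    for train, val in k_fold_indices(len(xs), k):
        w = fit([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in val) / len(val))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Cross-validate every hyperparameter value concurrently; return (best, scores)."""
    # Each setting is independent of the others, which is what makes it easy to distribute.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda lam: cv_score(xs, ys, lam, k), grid)
        scores = dict(zip(grid, results))
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 21)]
    ys = [2.0 * x for x in xs]  # noise-free data, so no regularization is best
    best, scores = grid_search(xs, ys, [0.0, 10.0, 100.0])
    print(best)  # prints 0.0
```

On Spark, the grid itself would be the distributed dataset, for example `sc.parallelize(grid).map(score_fn).collect()`, with each executor training one model; the slides do not show whether the talk used exactly this pattern.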

COMPUTATION TIME

RESULTS

7x speedup compared to training the models one at a time on one machine

the best result with hyperparameter tuning reaches 99.47% accuracy on the test set, a 34% reduction of the test error
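The slide gives the tuned accuracy and the relative error reduction, but not the starting point. Working backwards, 99.47% accuracy means 0.53% test error; if that is 34% lower than before, the earlier error was about 0.53% / 0.66 ≈ 0.80%, i.e. roughly 99.2% accuracy. This is derived arithmetic, not a figure reported in the talk:

```python
def implied_baseline_accuracy(tuned_accuracy, relative_error_reduction):
    """Recover the pre-tuning accuracy from the tuned accuracy and the relative error cut."""
    tuned_error = 1.0 - tuned_accuracy
    # tuned_error = baseline_error * (1 - relative_error_reduction)
    baseline_error = tuned_error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


if __name__ == "__main__":
    print(round(implied_baseline_accuracy(0.9947, 0.34), 4))  # prints 0.992
```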

IMAGE CLASSIFICATION

RESULTS

[('coral reef', 0.88503921), ('scuba diver', 0.025853464), ('brain coral', 0.0090828091), ('snorkel', 0.0036010914), ('promontory, headland, head, foreland', 0.0022605944)]
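This output lists a classifier's five most probable labels with their probabilities (the labels look like ImageNet class names). The slides do not show how the list was produced; the sketch below is one common way, assuming the network emits one raw score (logit) per class: apply a softmax, then keep the k highest probabilities. The labels and scores in the example are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, logits, k=5):
    """Return the k (label, probability) pairs with the highest probability, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = ["coral reef", "scuba diver", "brain coral", "snorkel"]
    logits = [4.0, 0.5, -0.5, -1.5]  # hypothetical scores
    for label, p in top_k(labels, logits, k=3):
        print(label, round(p, 3))
```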

THANK YOU!

Any questions?

My email: [email protected]