Automation of Deep learning training with AWS Step Functions

AUTOMATED DEEP LEARNING TRAINING WITH AWS STEP FUNCTIONS / AWS LAMBDA @mizti


Page 1: Automation of Deep learning training with AWS Step Functions

AUTOMATED DEEP LEARNING TRAINING WITH

AWS STEP FUNCTIONS / AWS LAMBDA

@mizti

Page 2: Automation of Deep learning training with AWS Step Functions

PROBLEM WITH DEEP LEARNING TRAINING

1.

Page 3: Automation of Deep learning training with AWS Step Functions

SERVER WITH GPU REQUIRED

Page 4: Automation of Deep learning training with AWS Step Functions

SERVERS WITH GPU ARE EXPENSIVE

• Several thousand dollars / month with an on-demand instance
• Spot instances (bidding system) are much cheaper, but the price is still not negligible for me

Page 5: Automation of Deep learning training with AWS Step Functions

NOT A NEGLIGIBLE PRICE?

• It costs about as much as 1 or 2 "Tirol choco"* per server per hour
• Not much, but it still worries me…

* "TIROL CHOCO" IS WELL KNOWN IN JAPAN AS A BYWORD FOR CHEAP CONFECTIONERY

Page 6: Automation of Deep learning training with AWS Step Functions

AND IT TAKES A VERY LONG TIME

Half a day, one day, occasionally several days

Page 7: Automation of Deep learning training with AWS Step Functions

SO, I WANT TO TERMINATE SERVERS

ONCE TRAINING HAS COMPLETED

Page 8: Automation of Deep learning training with AWS Step Functions

PROBLEM WITH DEEP LEARNING TRAINING

2.

Page 9: Automation of Deep learning training with AWS Step Functions

ANNOYING COLLECTION OF TRAINED DATA

Page 10: Automation of Deep learning training with AWS Step Functions

WITH ONE SERVER, IT TAKES ONLY A FEW MINUTES

WITH SCP

Page 11: Automation of Deep learning training with AWS Step Functions

WITH MANY SERVERS, IT TAKES A LONG TIME

WHAT IS WORSE, WE DON'T KNOW WHEN EACH TASK COMPLETES

ON EACH SERVER

Page 12: Automation of Deep learning training with AWS Step Functions

AND I GET CONFUSED: "WHAT WAS THE SETTING FOR THIS

SERVER?"

Page 13: Automation of Deep learning training with AWS Step Functions

IN THE END, I TERMINATE THE SERVER

WITHOUT EXTRACTING THE DATA

Page 14: Automation of Deep learning training with AWS Step Functions

SO, I WANT TO GATHER DATA INTO ONE PLACE AUTOMATICALLY

AND I WANT TO LABEL THE TRAINING CONDITIONS…

Page 15: Automation of Deep learning training with AWS Step Functions

SERVER-LESS ARCHITECTURE

• Serverless computing (as I understand it) means:
  • Create servers when I need them, and terminate them once the task has completed
  • Do not use any always-on server to control the above
• Thus, I normally don't need to keep any server running, and I can create as many servers as I need, whenever I need them.
• (Becoming a buzzword these days?)

Page 16: Automation of Deep learning training with AWS Step Functions

SERVER-LESS SERVICES IN AWS

• AWS Lambda
  • Users can register code written in Node.js / Python / Java / C#
  • Registered code can be hooked to events from inside AWS (and can be kicked off by hand, of course)
  • Users can automate control of AWS with the AWS SDK for each language (like boto3 for Python)
  • There are no special libraries for AWS Lambda. In other words, AWS Lambda is just a mechanism for registering and starting code
  • A single Lambda invocation can run only for a limited time (a few minutes when these slides were written; the current maximum is 15 minutes), so AWS Lambda alone is not suitable for long-running / many-state jobs.
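To make the "Lambda is just a launcher for SDK calls" point concrete, here is a minimal sketch of a handler that creates one S3 bucket per execution with boto3. It is not the author's code: the handler name, the `exec_name` event key, the returned `bucket_name` key, and the optional `s3_client` parameter (added only so the function can be tried without AWS credentials) are all assumptions for illustration.

```python
def lambda_handler(event, context, s3_client=None):
    """Create an S3 bucket named after the execution (hypothetical sketch)."""
    if s3_client is None:
        # boto3 is the AWS SDK for Python; it is imported lazily so the
        # function can be exercised locally with a stand-in client.
        import boto3
        s3_client = boto3.client("s3")

    bucket_name = event["exec_name"]  # assumed input key
    # Note: outside us-east-1 a real call also needs
    # CreateBucketConfiguration={"LocationConstraint": "<region>"}.
    s3_client.create_bucket(Bucket=bucket_name)

    # Pass the input through, plus the bucket name, for the next state.
    return dict(event, bucket_name=bucket_name)
```

The function only issues one short SDK call and returns, which is the kind of work that fits comfortably inside Lambda's run-time limit.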

Page 17: Automation of Deep learning training with AWS Step Functions

SERVER-LESS SERVICES IN AWS

• AWS Step Functions
  • Users can define a multi-state machine, something like a "cellular automaton"
  • Forked / parallel processes can also be defined
  • Each state passes its input to an AWS Lambda function and receives that function's output
  • You can check the status of each state (process) visually in the Web UI
  • Users can control long-running / multi-path processes

Page 18: Automation of Deep learning training with AWS Step Functions

WHAT I WANTED TO MAKE:

1. Create an S3 bucket for each execution
2. Bid for a spot instance
3. If the bid succeeds and a spot instance is created:
  • Notify with AWS SNS (email or SMS)
  • Prepare for training (downloading training data etc.)
  • Start training
  • Periodically upload model dumps / output data / logs into the S3 bucket
4. Once training has completed:
  • Notify with AWS SNS (email or SMS)
  • Terminate the instance after a certain period of time

Page 19: Automation of Deep learning training with AWS Step Functions

I MADE:

• Create S3 bucket
• Request Spot Instance
• Check if the bidding succeeded
• Notify bidding success
• Check if the task completed
• Wait for the task to complete
• Notify task completed
• Terminate Spot instance
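The flow above could be written down in Amazon States Language roughly as follows. This is a sketch built as a Python dict, not the definition from the author's repository: the Lambda function names, the placeholder ARN prefix, the `bid_succeeded` / `task_completed` flags, the wait intervals, and the extra wait-and-retry loop for the bidding check are all assumptions.

```python
import json

# Placeholder; real Lambda function ARNs would go here (assumption).
ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"


def task(function_name, next_state=None):
    """A Task state that invokes a Lambda function."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + function_name}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def choice(flag, done_state, wait_state):
    """A Choice state: go on when the boolean flag is true, else wait."""
    return {
        "Type": "Choice",
        "Choices": [{"Variable": "$." + flag, "BooleanEquals": True,
                     "Next": done_state}],
        "Default": wait_state,
    }


def wait(seconds, next_state):
    """A Wait state that pauses before polling again."""
    return {"Type": "Wait", "Seconds": seconds, "Next": next_state}


definition = {
    "Comment": "Sketch of the training workflow (names are hypothetical)",
    "StartAt": "CreateS3Bucket",
    "States": {
        "CreateS3Bucket": task("create_bucket", "RequestSpotInstance"),
        "RequestSpotInstance": task("request_spot", "CheckBidding"),
        "CheckBidding": task("check_bidding", "BiddingSucceeded"),
        "BiddingSucceeded": choice("bid_succeeded", "NotifyBiddingSuccess",
                                   "WaitForBidding"),
        "WaitForBidding": wait(30, "CheckBidding"),
        "NotifyBiddingSuccess": task("notify", "CheckTask"),
        "CheckTask": task("check_task", "TaskCompleted"),
        "TaskCompleted": choice("task_completed", "NotifyTaskCompleted",
                                "WaitForTask"),
        "WaitForTask": wait(300, "CheckTask"),
        "NotifyTaskCompleted": task("notify", "TerminateSpotInstance"),
        "TerminateSpotInstance": task("terminate_instance"),
    },
}


def transition_targets(state):
    """List every state name this state can move to."""
    targets = [state[k] for k in ("Next", "Default") if k in state]
    targets += [c["Next"] for c in state.get("Choices", [])]
    return targets


print(json.dumps(definition, indent=2))
```

The Check / Choice / Wait triple is what lets a job that runs for hours be driven by Lambda functions that each finish quickly: every poll is a separate short invocation, and Step Functions holds the state in between.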

Page 20: Automation of Deep learning training with AWS Step Functions

USAGE

• Input a JSON document like the one below to start the Step Function
  • exec_name: name of this execution (also becomes the name of the S3 bucket)
  • repository url: git repository of the code to execute (used like git clone {repository url})
  • data_dir / output_dir: directories for training data and output data
  • data_get_command: command executed before training (typically, fetching training data for machine learning)
  • exec_command: command executed for training
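As an illustration of the fields listed above, an input document could be assembled and sanity-checked like this. All values are made up, and the exact key for the repository URL is an assumption (`repository_url` here); the bucket-name check is a simplified version of the S3 naming rules, included because `exec_name` doubles as the bucket name.

```python
import json
import re

REQUIRED_KEYS = (
    "exec_name",
    "repository_url",   # assumed spelling of the "repository url" field
    "data_dir",
    "output_dir",
    "data_get_command",
    "exec_command",
)

# Simplified S3 bucket-name rule: 3-63 chars, lowercase letters, digits,
# hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def build_input(**fields):
    """Return the JSON text for one execution, or raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if key not in fields]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    if not _BUCKET_NAME.fullmatch(fields["exec_name"]):
        raise ValueError("exec_name is not usable as an S3 bucket name")
    return json.dumps(fields, indent=2)


example = build_input(
    exec_name="mnist-run-001",
    repository_url="https://example.com/your/training-code.git",
    data_dir="data",
    output_dir="result",
    data_get_command="python get_data.py",
    exec_command="python train.py",
)
print(example)
```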

Page 21: Automation of Deep learning training with AWS Step Functions

USAGE

Input the JSON, and…

Page 22: Automation of Deep learning training with AWS Step Functions

USAGE

Just push "Start Execution"
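The slides start the execution from the console. For completeness, the same start can presumably be scripted with boto3's Step Functions client; the sketch below assumes a state machine ARN you already have, and the helper name and the optional `sfn_client` parameter (for trying it without AWS access) are hypothetical.

```python
import json


def start_training(state_machine_arn, training_input, sfn_client=None):
    """Start one execution and return its ARN (hypothetical helper)."""
    if sfn_client is None:
        # boto3 is imported lazily so the helper can be tested offline.
        import boto3
        sfn_client = boto3.client("stepfunctions")

    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=training_input["exec_name"],   # execution name shown in the UI
        input=json.dumps(training_input),   # input must be a JSON string
    )
    return response["executionArn"]
```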

Page 23: Automation of Deep learning training with AWS Step Functions

USAGE

• Progress can be checked in the Web UI
• Output is automatically carried into the S3 bucket

Page 24: Automation of Deep learning training with AWS Step Functions

BENEFIT

• Start and forget. Sleep peacefully.
• Makes it easy to run many hyper-parameter patterns in parallel
• No need to modify training / model code
• Can probably also be used for many kinds of batch-like processes
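Running many hyper-parameter patterns in parallel then comes down to generating one input per pattern and starting one execution for each. A minimal sketch, assuming the training script accepts command-line flags such as `--batchsize` and `--epoch` (hypothetical) and that each run gets an indexed `exec_name` so every execution has its own bucket:

```python
import itertools


def grid_inputs(base_input, grid):
    """Yield one input dict per combination of hyper-parameter values."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    for index, values in enumerate(combos):
        flags = " ".join(
            "--{} {}".format(name, value) for name, value in zip(names, values)
        )
        yield dict(
            base_input,
            # Unique name per run, so each run gets its own S3 bucket.
            exec_name="{}-{:03d}".format(base_input["exec_name"], index),
            exec_command="{} {}".format(base_input["exec_command"], flags),
        )


base = {"exec_name": "mnist", "exec_command": "python train.py"}
grid = {"batchsize": [32, 64], "epoch": [10, 20]}

for run in grid_inputs(base, grid):
    print(run["exec_name"], "->", run["exec_command"])
```

Because the pattern lives in `exec_command` and the results land in a bucket named after `exec_name`, the record of which settings produced which output is kept without touching the training code.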

Page 25: Automation of Deep learning training with AWS Step Functions

MISC

• Author: @mizti (any comments / questions are welcome)
• Details: written up on my blog (in Japanese): http://mizti.hatenablog.com/entry/deeplearningwithawsstepfunction
• Code repository: https://github.com/mizti/aws_stepfunc_chainer
• Illustrations in these slides: http://www.irasutoya.com/