Intro to the Distributed Version of TensorFlow

Post on 12-Jan-2017



Dr. Miha Pelko, NorCom*

@mpelko

*views are my own

WE ARE HIRING!

Configuration at Yahoo!:

“We avoid unnecessary data movement between Hadoop clusters and separate deep learning clusters.”

“YARN works well for deep learning. Multiple experiments of deep learning can be conducted concurrently on a single cluster. It makes deep learning extremely cost effective as opposed to conventional methods. In the past, we had teams use “notepad” to schedule GPU resources manually, which was painful and worked only for a small number of users.”

From: http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop

See: https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html

Inception-v3

RETRAIN INSTEAD OF DISTRIBUTE

ERLKÖNIG RECOGNITION

3 SPECIFIC CATEGORIES

erlkönig

car

road

Cut the last layer and train a new one

~30 minutes on Desktop CPU

> 90% accuracy

Tasks

Jobs

One server per task!
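
The job/task layout above can be sketched in plain Python. This is a minimal illustration, not the speaker's code: the host names and ports are hypothetical, and only the shape (a job name mapped to a list of task addresses) is the point. In TensorFlow a dictionary of this shape is what gets passed to tf.train.ClusterSpec, and each listed task is then started as its own tf.train.Server with a job name and task index.

```python
# Hypothetical host names and ports; only the job -> task-list shape matters.
cluster = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}


def list_tasks(cluster):
    """Return (job_name, task_index, address) for every task in the cluster."""
    return [
        (job, index, address)
        for job in sorted(cluster)
        for index, address in enumerate(cluster[job])
    ]


tasks = list_tasks(cluster)
for job, index, address in tasks:
    # Each tuple corresponds to exactly one server process.
    print("/job:%s/task:%d -> %s" % (job, index, address))
```

A cluster with one parameter-server task and two worker tasks therefore needs three server processes, one per line printed.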

§ Wrapper over a Coordinator, a Saver, and a SessionManager

§ Variable initialization

§ Checkpointing

§ Summarizes to the log

§ Automatic initialization from the most recent checkpoint

§ is_chief flag in replica-type models
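
The start-up behaviour listed above can be illustrated with a toy decision function. This is a conceptual sketch only and does not use or reproduce TensorFlow's Supervisor: the file naming scheme, the function names, and the returned action strings are all assumptions made for illustration. It shows the idea that the chief either initializes variables or restores from the most recent checkpoint, while non-chief replicas wait for the chief.

```python
import os
import tempfile


def latest_checkpoint(logdir, prefix="model.ckpt-"):
    """Return the path of the checkpoint with the highest step, or None.

    Toy naming scheme (assumption): one file per checkpoint, named
    "<prefix><step>".
    """
    steps = []
    for name in os.listdir(logdir):
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            steps.append(int(suffix))
    if not steps:
        return None
    return os.path.join(logdir, prefix + str(max(steps)))


def startup_action(logdir, is_chief):
    """Decide what a replica does at start-up (hypothetical action names)."""
    if not is_chief:
        # Non-chief replicas rely on the chief to prepare the model.
        return "wait_for_chief"
    if latest_checkpoint(logdir) is None:
        return "initialize_variables"
    return "restore_from_checkpoint"


logdir = tempfile.mkdtemp()
fresh = startup_action(logdir, is_chief=True)

# Simulate two earlier checkpoints written at steps 20 and 100.
for step in (20, 100):
    open(os.path.join(logdir, "model.ckpt-%d" % step), "w").close()

resumed = startup_action(logdir, is_chief=True)
replica = startup_action(logdir, is_chief=False)
newest = os.path.basename(latest_checkpoint(logdir))
```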

Source: http://download.tensorflow.org/paper/whitepaper2015.pdf

Performance of Distributed TensorFlow: A Multi-Node and Multi-GPU Configuration

http://www.altoros.com/performance-benchmark-distributed-tensorflow.html

§ Putting it all together (including deployment management)

§ See: https://www.tensorflow.org/versions/r0.9/how_tos/distributed/index.html

§ See: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/dist_test

§ See: https://github.com/bwahn/tensorflow-kr-docker

§ In-graph replication vs. Between-graph replication

§ See: https://www.tensorflow.org/versions/r0.9/how_tos/distributed/index.html#replicated-training
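
As a rough illustration of between-graph replication: every worker process builds its own copy of the graph, pins the shared variables to the parameter-server job, and runs its compute ops on its own worker device. With in-graph replication, by contrast, a single client builds one graph that contains a copy of the compute part for each worker task. The sketch below only computes device strings in plain Python; the variable names are hypothetical, and the round-robin assignment is an assumption that mirrors the default behaviour of the tf.train.replica_device_setter helper rather than calling it.

```python
def place_variables(variable_names, num_ps_tasks):
    """Assign each variable to a parameter-server task in round-robin order."""
    return {
        name: "/job:ps/task:%d" % (i % num_ps_tasks)
        for i, name in enumerate(variable_names)
    }


def worker_device(task_index):
    """Device on which a given worker runs its own compute ops."""
    return "/job:worker/task:%d" % task_index


# Hypothetical variable names for a small model.
variables = ["conv1/weights", "conv1/biases", "fc/weights", "fc/biases"]
placement = place_variables(variables, num_ps_tasks=2)

for name in variables:
    print("%-14s -> %s" % (name, placement[name]))
print("compute ops    -> %s" % worker_device(1))
```

Every worker computes the same variable placement, so all replicas read and update the same parameters while keeping their compute local.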

§ Specific hardware components

§ How to handle GPUs?

§ Other hardware?

§ Model splitting parallelization

§ You’re on your own

§ See also: https://www.youtube.com/watch?v=YAkdydqUE2c

Thank you.