Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
-
Upload
databricks -
Category
Software
-
view
422 -
download
6
Transcript of Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
![Page 1: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/1.jpg)
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Bay Area Spark MeetupNov 8, 2017
Sue Ann Hong, Databricks
![Page 2: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/2.jpg)
This talk
• Deep Learning at scale: current state• Deep Learning Pipelines: the philosophy• End-to-end workflow with DL Pipelines• Future
![Page 3: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/3.jpg)
Deep Learning at Scale: current state
3putyour#assignedhashtagherebysettingthefooterinview-header/footer
![Page 4: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/4.jpg)
What is Deep Learning?
• A set of machine learning techniques that use layers that transform numerical inputs• Classification• Regression• Arbitrary mapping
• Popular in the 80’s as Neural Networks• Recently came back thanks to advances in data collection,
computation techniques, and hardware.
![Page 5: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/5.jpg)
Success of Deep Learning
Tremendous success for applications with complex data• AlphaGo• Image interpretation• Automatic translation• Speech recognition
![Page 6: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/6.jpg)
Deep Learning is often challengingLabeled
DataCompute Resources
& Time Engineer hours
• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data
![Page 7: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/7.jpg)
Deep Learning in industry
• Currently limited adoption• Huge potential beyond the industrial giants• How do we accelerate the road to massive availability?
![Page 8: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/8.jpg)
Deep Learning Pipelines
8putyour#assignedhashtagherebysettingthefooterinview-header/footer
![Page 9: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/9.jpg)
Deep Learning Pipelines:Deep Learning with Simplicity
• Open-source Databricks library• Focuses on ease of use and integration• without sacrificing performance
![Page 10: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/10.jpg)
Deep Learning is often challengingCompute Resources
& TimeEngineer hours Labeled
Data
• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data
![Page 11: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/11.jpg)
Deep Learning is often challenging
• Tedious or difficult to distribute computations
• No exact science around deep learning à lots of tweaking
• Low level APIs with steep learning curve
• Complex models à need a lot of data
![Page 12: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/12.jpg)
• Tedious or difficult to distribute computations
• No exact science around deep learning à lots of tweaking
• Low level APIs with steep learning curve
• Complex models à need a lot of data
Instead
• Be easy to scale
• Require little tweaking
• Be easy to write
• Require little or no data
Commonworkflowsshould
![Page 13: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/13.jpg)
How
• Be easy to scale
• Require little tweaking
• Be easy to write
• Require little or no data
Commonworkflowsshould
• Use Apache Spark for scaling out common tasks
• Leverage well-known model architectures
• Integrate with MLlib Pipelines API to capture ML workflow concisely
• Leverage pre-trained models for common tasks
Apache Spark for scaling out
MLlib Pipelines API
pre-trained models
![Page 14: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/14.jpg)
Demo: Build a visual recommendation AI
10minutes
7lines of code
Elastic Scale-out using Apache Spark
MLlib Pipelines API Leveragepre-trained models
0labels
![Page 15: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/15.jpg)
![Page 16: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/16.jpg)
To the notebook!
![Page 17: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/17.jpg)
![Page 18: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/18.jpg)
End-to-End Workflowwith Deep Learning Pipelines
18putyour#assignedhashtagherebysettingthefooterinview-header/footer
![Page 19: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/19.jpg)
A typical Deep Learning workflow
• Load data (images, text, time series, …)• Interactive work• Train• Select an architecture for a neural network• Optimize the weights of the NN
• Evaluate results, potentially re-train• Apply:• Pass the data through the NN to produce new features or output
Load data Interactive workTrain
EvaluateApply
![Page 20: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/20.jpg)
A typical Deep Learning workflow
Load data
Interactive work
Train
Evaluate
Apply
• ImageloadinginSpark
• Distributedbatchprediction• DeployingmodelsinSQL
• Transferlearning• Distributedtuning
• Pre-trainedmodels
![Page 21: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/21.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
![Page 22: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/22.jpg)
Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy arrays• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImagesimage_df = readImages(sample_img_dir)
![Page 23: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/23.jpg)
Upcoming: built-in support in Spark
• Spark-21866• Contributing image format & reading to Spark• Targeted for Spark 2.3• Joint work with Microsoft
23putyour#assignedhashtagherebysettingthefooterinview-header/footer
![Page 24: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/24.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
![Page 25: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/25.jpg)
Applying popular models
• Popular pre-trained models accessible through MLlib Transformers
predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
![Page 26: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/26.jpg)
Applying popular modelspredictor = DeepImagePredictor(inputCol="image",
outputCol="predicted_labels", modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
![Page 27: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/27.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transferlearning
![Page 28: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/28.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transferlearning
![Page 29: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/29.jpg)
Transfer learning
• Pre-trained models may not be directly applicable• New domain, e.g. shoes
• Training from scratch requires • Enormous amounts of data• A lot of compute resources & time
• Idea: intermediate representations learned for one task may be useful for other related tasks
![Page 30: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/30.jpg)
Transfer Learning
SoftMaxGIANT PANDA 0.9RACCOON 0.05RED PANDA 0.01…
![Page 31: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/31.jpg)
Transfer Learning
![Page 32: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/32.jpg)
Transfer Learning
Classifier
![Page 33: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/33.jpg)
Transfer Learning
Classifier
Rose: 0.7
Daisy: 0.3
![Page 34: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/34.jpg)
Transfer Learning as a Pipeline
DeepImageFeaturizerImage
Loading PreprocessingLogistic
Regression
MLlib Pipeline
![Page 35: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/35.jpg)
Transfer Learning as a Pipeline
35putyour#assignedhashtagherebysettingthefooterinview-header/footer
featurizer = DeepImageFeaturizer(inputCol="image",outputCol="features",modelName="InceptionV3")
lr = LogisticRegression(labelCol="label")
p = Pipeline(stages=[featurizer, lr])
p_model = p.fit(train_df)
![Page 36: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/36.jpg)
Transfer Learning
• Usually for classification tasks• Similar task, new domain
• But other forms of learning leveraging learned representations can be loosely considered transfer learning
![Page 37: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/37.jpg)
![Page 38: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/38.jpg)
Featurization for similarity-based ML
DeepImageFeaturizerImage
Loading PreprocessingLogistic
Regression
![Page 39: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/39.jpg)
Featurization for similarity-based ML
DeepImageFeaturizerImage
Loading PreprocessingClusteringKMeans
GaussianMixture
Nearest NeighborKNN LSH
Distance computation
![Page 40: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/40.jpg)
Featurization for similarity-based ML
DeepImageFeaturizerImage
Loading PreprocessingClusteringKMeans
GaussianMixture
Nearest NeighborKNN LSH
Distance computation
Duplicate Detection
RecommendationAnomaly Detection
Search result diversification
![Page 41: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/41.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transferlearning
![Page 42: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/42.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
![Page 43: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/43.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• ApplySparkSQL
Batchprediction
s
![Page 44: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/44.jpg)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• ApplySparkSQL
Batchprediction
![Page 45: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/45.jpg)
Batch prediction as an MLlib Transformer
• A model is a Transformer in MLlib• DataFrame goes in, DataFrame comes out with output columns
predictor = XXTransformer(inputCol="image",outputCol="prediction",modelSpecification={…})
predictions = predictor.transform(test_df)
![Page 46: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/46.jpg)
Hierarchy of DL transformers for images
46
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer (model_name)
(keras.Model)
(tf.Graph)
Input(Image)
Output(vector)
model_namemodel_name
Keras.Model
tf.Graph
![Page 47: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/47.jpg)
Hierarchy of DL transformers for images
47
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer (model_name)
(keras.Model)
(tf.Graph)
Input(Image)
Output(vector)
model_namemodel_name
Keras.Model
tf.Graph
![Page 48: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/48.jpg)
Hierarchy of DL transformers
48
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizer
DeepImagePredictor
NamedImageTransformer
model_name)
Input(Image
)
Output(vector
)
model_namemodel_name
Keras.Model
tf.Graph
TFTextTransformer
KerasTextTransformer
DeepTextFeaturizer
DeepTextPredictor
NamedTextTransformerInput(Text)
Output(vector)
model_namemodel_name
Keras.Model
tf.Graph
TFTransformer
![Page 49: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/49.jpg)
Hierarchy of DL transformers
49
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer
TFTextTransformer
KerasTextTransformer
DeepTextFeaturizerDeepTextPredictor
NamedTextTransformer
TFTransformer
![Page 50: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/50.jpg)
Hierarchy of DL transformers
50
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer
TFTextTransformer
KerasTextTransformer
DeepTextFeaturizerDeepTextPredictor
NamedTextTransformer
TFTransformer
![Page 51: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/51.jpg)
Common Tasks EasyAdvanced Ones Possible
51
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer
TFTextTransformer
KerasTextTransformer
DeepTextFeaturizerDeepTextPredictor
NamedTextTransformer
TFTransformer
![Page 52: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/52.jpg)
Common Tasks EasyAdvanced Ones Possible
52
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizer
DeepImagePredictor
NamedImageTransformer
TFTextTransformer
KerasTextTransformer
DeepTextFeaturizer
DeepTextPredictor
NamedTextTransformer
TFTransformer
Pre-built Solutions
80%-built Solutions
Self-built Solutions
![Page 53: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/53.jpg)
Deep Learning Pipelines : FutureIn progress• Text featurization (embeddings)• TFTransformer for arbitrary vectors
Future directions• Non-image data domains: video, text, speech, …• Distributed training• Support for more backends, e.g. MXNet, PyTorch, BigDL
![Page 54: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/54.jpg)
Questions?Thank you!
54putyour#assignedhashtagherebysettingthefooterinview-header/footer
![Page 55: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark](https://reader033.fdocuments.in/reader033/viewer/2022052308/5a64761a7f8b9afc4d8b45dd/html5/thumbnails/55.jpg)
ResourcesDL Pipelines GitHub Repo, Spark Summit Europe 2017 Deep Dive
Blog posts & webinars (http://databricks.com/blog)• Deep Learning Pipelines• GPU acceleration in Databricks• BigDL on Databricks• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks (http://docs.databricks.com)• Getting started• Deep Learning Pipelines Example• Spark integration