Introduction to Sparkling Water - Spark Summit East 2016
An Introduction to Sparkling Water
Michal Malohlava, h2o.ai
Who Am I?
Background
• PhD in CS from Charles University in Prague, Czech Republic
• Postdoc at Purdue University, experimenting with algorithms for large-scale computation
• Now a software engineer at H2O.ai
• Experience with domain-specific languages, distributed systems, software engineering, and big data
H2O.ai
H2O team
Co-founders: Sri Ambati, Cliff Click
Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie
H2O: Open-Source In-Memory Data Science Platform
• Highly optimized Java code (in-house)
• Distributed in-memory K-V store and map/reduce computation framework
• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)
• Read/write access to distributed data frames (R/Pandas-style)
• ML algorithms: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
• REST API with clients: interactive UI, R, Python
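The "distributed in-memory K-V store and map/reduce computation framework" can be pictured with a small toy. The sketch below is a pure-Python illustration under stated assumptions, not H2O's Java API: a column is split into chunks (standing in for pieces held on different nodes), each chunk is summarized locally (map), and the partial summaries are merged (reduce) to compute a column mean. The names `Partial`, `split_into_chunks`, `map_chunk`, `reduce_partials`, and `column_mean` are hypothetical.

```python
# Toy chunked map/reduce (hypothetical names; a sketch, not H2O's actual API).
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class Partial:
    """Partial result produced for one chunk: running sum and row count."""
    total: float
    count: int


def split_into_chunks(column: List[float], chunk_size: int) -> List[List[float]]:
    """Split a column into fixed-size chunks, standing in for data spread across nodes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_chunk(chunk: List[float]) -> Partial:
    """Map step: summarize one chunk locally, without moving its values elsewhere."""
    return Partial(total=sum(chunk), count=len(chunk))


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial results (associative and commutative)."""
    return Partial(total=a.total + b.total, count=a.count + b.count)


def column_mean(column: List[float], chunk_size: int = 3) -> float:
    """Mean of a column computed via a per-chunk map and a pairwise reduce."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(column, chunk_size)]
    combined = reduce(reduce_partials, partials, Partial(total=0.0, count=0))
    if combined.count == 0:
        raise ValueError("cannot take the mean of an empty column")
    return combined.total / combined.count


if __name__ == "__main__":
    ages = [23.0, 35.0, 41.0, 29.0, 52.0, 38.0, 47.0]
    print(column_mean(ages, chunk_size=3))
```

Because the merge step is associative, the result does not depend on how the column is chunked (up to floating-point rounding), which is what lets this style of computation run where the data already lives.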
Sparkling Water
Sparkling Water provides
• Transparent integration of H2O into the Spark ecosystem
• Use of H2O frames and algorithms with the Spark API
It excels in existing Spark workflows that require advanced machine learning algorithms.
TYPICAL USE CASES
Where to use Sparkling Water? Model building
[Diagram: Data Source → Data munging → Modelling → Prediction processing]
• Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
Where to use Sparkling Water? Data parsing/munging
[Diagram: Data Source → Data load/munging/exploration → Modelling]
• Load and parse data directly into an H2OFrame
• Ad hoc data transformation
Where to use Sparkling Water? Stream processing
[Diagram, two rows]
• Off-line model training: Data Source → Modelling → Deploy the model (export the model in a binary format or as code)
• Stream processing: Data Stream → Data munging → Model prediction
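The stream-processing case above trains a model off-line, exports it (in a binary format or as code), and then scores each record of a data stream. The sketch below illustrates only the "as code" idea, in pure Python and under stated assumptions: a logistic-regression model with made-up coefficients is rendered as standalone source for a scoring function, loaded back without any reference to the training code, and applied to records one at a time. The names `export_as_code`, `load_exported`, and `score_stream` are hypothetical; H2O's real exported scoring code and its binary model format are different and are not shown here.

```python
# Toy "export the model as code" sketch (hypothetical names; not H2O's export format).
from typing import Callable, Dict, Iterable, Iterator


def export_as_code(coefficients: Dict[str, float], intercept: float,
                   func_name: str = "score") -> str:
    """Render a trained logistic-regression model as standalone Python source."""
    terms = [repr(intercept)]
    terms += [f"{weight!r} * row[{name!r}]" for name, weight in coefficients.items()]
    return (
        "import math\n"
        "\n"
        f"def {func_name}(row):\n"
        f"    z = {' + '.join(terms)}\n"
        "    return 1.0 / (1.0 + math.exp(-z))\n"
    )


def load_exported(source: str, func_name: str = "score") -> Callable[[Dict[str, float]], float]:
    """Load the generated source; it needs nothing from the training code."""
    namespace: Dict[str, object] = {}
    exec(source, namespace)  # only run source you generated yourself
    return namespace[func_name]  # type: ignore[return-value]


def score_stream(scorer: Callable[[Dict[str, float]], float],
                 records: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Score records one at a time, as they arrive from a stream."""
    for record in records:
        yield scorer(record)


if __name__ == "__main__":
    # Pretend these coefficients came out of an off-line training run.
    source = export_as_code({"amount": 0.8, "age": -0.05}, intercept=-1.2)
    print(source)
    scorer = load_exported(source)
    for p in score_stream(scorer, [{"amount": 2.0, "age": 30.0}, {"amount": 0.5, "age": 60.0}]):
        print(round(p, 4))
```

Generating plain source keeps the deployed scorer free of training-time dependencies; the trade-off in this toy is that `exec` should only ever run source you produced yourself.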
WHAT IS INSIDE?
[Diagram: Sparkling Water cluster]
• Driver node: Scala/Python main program holding a SparkContext and an H2OContext
• Cluster manager
• Worker nodes: each runs a Spark executor that hosts H2O services
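[Diagram: data sharing between Spark and H2O]
• Data Source: loaded into a Spark DataFrame on the Spark executors, or directly into an H2OFrame on the H2O services
• DataFrame (Spark cluster) ↔ H2OFrame (H2O services)
• h2oContext.asH2OFrame: DataFrame → H2OFrame
• h2oContext.asDataFrame: H2OFrame → DataFrame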
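The two calls `h2oContext.asH2OFrame` and `h2oContext.asDataFrame` move data between Spark's DataFrame and H2O's H2OFrame. Running them needs a Spark cluster with Sparkling Water, so the sketch below is only a pure-Python toy of the reshaping involved, under two stated assumptions: the Spark side is modeled as row-oriented partitions, and the H2O side as column-oriented chunks, with one chunk per column for each partition. The functions `as_columnar` and `as_rows` are hypothetical stand-ins, not Sparkling Water's API.

```python
# Toy row <-> column reshaping (hypothetical names; not Sparkling Water's API).
from typing import Any, Dict, List

Row = Dict[str, Any]
Partition = List[Row]                      # Spark side (toy): rows grouped in partitions
ColumnChunks = Dict[str, List[List[Any]]]  # H2O side (toy): column name -> list of chunks


def as_columnar(partitions: List[Partition], names: List[str]) -> ColumnChunks:
    """Toy stand-in for asH2OFrame: each partition becomes one chunk per column."""
    return {name: [[row[name] for row in part] for part in partitions] for name in names}


def as_rows(frame: ColumnChunks) -> List[Partition]:
    """Toy stand-in for asDataFrame: rebuild row partitions from aligned column chunks."""
    names = list(frame)
    if not names:
        return []
    partitions: List[Partition] = []
    for chunk_index in range(len(frame[names[0]])):
        n_rows = len(frame[names[0]][chunk_index])
        partitions.append(
            [{name: frame[name][chunk_index][i] for name in names} for i in range(n_rows)]
        )
    return partitions


if __name__ == "__main__":
    spark_side = [
        [{"id": 1, "city": "Prague"}, {"id": 2, "city": "Boston"}],
        [{"id": 3, "city": "Chicago"}],
    ]
    h2o_side = as_columnar(spark_side, ["id", "city"])
    print(h2o_side)
    print(as_rows(h2o_side) == spark_side)
```

Because each chunk keeps the row order of its source partition, converting back reproduces the original partitions in this toy.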
TIME FOR DEMO!
Key Points to Remember
• Sparkling Water integrates H2O into Spark
• Enables using advanced machine learning algorithms inside Spark workflows
• Offers an eager computation model and a mutable data structure, H2OFrame
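The last point contrasts H2OFrame with Spark's lazily evaluated, immutable DataFrames. The toy below illustrates that difference only in miniature and under stated assumptions: `LazyFrame` records transformations and runs them when `collect` is called, returning a new object for each transformation, while `EagerFrame` applies an operation immediately and updates its own data. Both classes and their methods are hypothetical and are not the Spark or H2O APIs.

```python
# Toy contrast of lazy/immutable vs eager/mutable frames (hypothetical classes).
from typing import Any, Callable, Iterable, List, Tuple


class LazyFrame:
    """Spark-style (toy): transformations are recorded; work happens only on an action."""

    def __init__(self, data: Iterable[Any], ops: Tuple[Callable[[Any], Any], ...] = ()) -> None:
        self._data = list(data)
        self._ops = ops

    def map(self, fn: Callable[[Any], Any]) -> "LazyFrame":
        # Returns a NEW frame with one more pending step; self is left unchanged.
        return LazyFrame(self._data, self._ops + (fn,))

    def collect(self) -> List[Any]:
        # The action: replay every pending step over the source data now.
        result = list(self._data)
        for fn in self._ops:
            result = [fn(x) for x in result]
        return result


class EagerFrame:
    """H2OFrame-style (toy): each operation runs immediately and updates the frame in place."""

    def __init__(self, data: Iterable[Any]) -> None:
        self.data = list(data)

    def apply_inplace(self, fn: Callable[[Any], Any]) -> "EagerFrame":
        self.data = [fn(x) for x in self.data]
        return self


if __name__ == "__main__":
    calls: List[int] = []

    def double(x: int) -> int:
        calls.append(x)  # records when the function actually runs
        return 2 * x

    lazy = LazyFrame([1, 2, 3]).map(double)
    print("calls after map:", calls)
    print("collect:", lazy.collect())
    print("calls after collect:", calls)

    eager = EagerFrame([1, 2, 3])
    eager.apply_inplace(lambda x: x + 1)  # computed immediately
    print("eager data:", eager.data)
```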
THANK YOU.
@h2oai @mmalohlava
h2o.ai/download
github.com/h2oai/sparkling-water
Visit our booth K27 for live demos and more!