Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Transcript of Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
![Page 1: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/1.jpg)
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enabling Apache Zeppelin* and Spark* for Data Science in the Enterprise
Bikas Saha (@bikassaha)
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
![Page 2: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/2.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 3: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/3.jpg)
Apache Zeppelin
![Page 4: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/4.jpg)
Zeppelin makes Big Data Science Easy to Approach
Zero install – just connect via a web browser and you are ready to run
Support for multiple execution platforms (Apache Spark, JDBC, Hive…)
Support for multiple languages (Scala, SQL, Python…)
Support for built-in visualizations
Support for reporting
Support for sharing and collaborative work
Does NOT have machine learning built-in – that’s where Apache Spark comes in (or your favorite SQL engine Apache Flink/Drill/Hive… and 30+ others)
![Page 5: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/5.jpg)
Zeppelin for Sharing
![Page 6: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/6.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 7: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/7.jpg)
Current Apache Zeppelin and Spark integration
[Diagram: User, Zeppelin Server, Spark Driver, and eight Spark Executors]
![Page 8: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/8.jpg)
Architectural Issue with Secure Data Access
[Diagram: User 1, Zeppelin Server, Spark Driver, Spark Executors, and HDFS; the HDFS connection is labeled "Zeppelin Server User"]
![Page 9: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/9.jpg)
Architectural Issues with Multi-Tenancy – Fault Tolerance
[Diagram: User 1 and User 2 share one Zeppelin Server, one Spark Driver, and its Spark Executors]
User 1 failure affects User 2
Heavy-weight Spark drivers
![Page 10: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/10.jpg)
Architectural Issues with Multi-Tenancy – Privacy
[Diagram: User 1 and User 2 share one Zeppelin Server, one Spark Driver, and its Spark Executors]
User 1 can access User 2's data
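The privacy issue on this slide can be illustrated with a toy model. This is only a sketch: a plain Python dict stands in for the state held by a shared Spark driver, and `run_paragraph` is a hypothetical helper, not a Zeppelin or Spark API.

```python
def run_paragraph(code, namespace):
    """Execute one notebook paragraph against an interpreter namespace."""
    exec(code, namespace)

# One shared namespace, as when two users share a single driver.
shared = {}
run_paragraph("salaries = [100, 200, 300]", shared)  # User 2 loads data
run_paragraph("leaked = sum(salaries)", shared)      # User 1 reads it

# Separate namespaces, as when each user gets a driver of their own.
ns_user1, ns_user2 = {}, {}
run_paragraph("salaries = [100, 200, 300]", ns_user2)
try:
    run_paragraph("leaked = sum(salaries)", ns_user1)
    isolated = False
except NameError:
    # User 1 cannot see a variable that lives only in User 2's namespace.
    isolated = True
```

With the shared namespace, User 1's paragraph succeeds in reading User 2's variable; with separate namespaces it fails with a `NameError`.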
![Page 11: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/11.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Enterprise Ready Big Data Science
Future Roadmap
![Page 12: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/12.jpg)
Livy Server as a Session Management Service
[Diagram: Livy Server exposes an Interactive REST API and a Batch REST API. The interactive path uses a Session with a Remote Context in a Remote Spark Driver; the batch path runs a Standard Spark Batch Job. Both run on Spark Executors.]
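The two REST entry points named on this slide might be exercised roughly as follows. This is a minimal sketch that only builds the requests and does not send them. The host name, session id `0`, jar path, and class name are hypothetical; the endpoint paths and field names follow the Livy REST API as commonly documented, so check them against the Livy version in use.

```python
import json
from urllib.request import Request

# Hypothetical host; 8998 is Livy's usual default port.
LIVY_URL = "http://livy.example.com:8998"

def livy_post(path, payload):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Interactive API: create a session, then run a statement in it.
create_session = livy_post("/sessions", {"kind": "pyspark"})
run_statement = livy_post(
    "/sessions/0/statements",  # 0 is a placeholder for the id Livy returns
    {"code": "sc.parallelize(range(10)).sum()"},
)

# Batch API: submit a standard Spark job (placeholder jar and class).
submit_batch = livy_post(
    "/batches",
    {"file": "hdfs:///apps/example/job.jar", "className": "com.example.Job"},
)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running Livy server.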
![Page 13: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/13.jpg)
Secure Data Access - Solved
[Diagram: User, Zeppelin Server with Livy Interpreter, Livy Server with a Session, Remote Spark Driver with Remote Context, Spark Executors, and HDFS; the HDFS connection is labeled "User"]
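One way the end user's identity can reach the cluster is Livy's `proxyUser` field on session creation. The sketch below builds only the request body. The helper name and the user `alice` are hypothetical, and whether impersonation actually takes effect depends on how the cluster and Livy are configured.

```python
import json

def session_request_body(authenticated_user, kind="spark"):
    """JSON body for POST /sessions asking Livy to run the session as the end user."""
    if not authenticated_user:
        # Refuse to fall back to the service account silently.
        raise ValueError("an authenticated user name is required")
    return json.dumps({"kind": kind, "proxyUser": authenticated_user})

body = session_request_body("alice")
```

The point of the design is that data access is then checked against the end user's permissions rather than those of the Zeppelin Server user.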
![Page 14: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/14.jpg)
Multi Tenancy - Solved
[Diagram: User 1 and User 2 each use their own Livy Interpreter in the Zeppelin Server. Livy Server holds Session 1 and Session 2, each with its own Remote Spark Driver, Remote Context, and Spark Executor.]
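The session-per-user layout in this diagram can be modeled with a toy registry. This is an illustration under stated assumptions, not Livy's implementation: `SessionRegistry` and its methods are hypothetical names, and a dict stands in for each remote driver's state.

```python
class SessionRegistry:
    """Toy stand-in for a session service: one isolated session per user."""

    def __init__(self):
        self._sessions = {}
        self._next_id = 0

    def session_for(self, user):
        """Return the user's session, creating it on first use."""
        if user not in self._sessions:
            self._next_id += 1
            self._sessions[user] = {"id": self._next_id, "state": "idle", "vars": {}}
        return self._sessions[user]

    def fail(self, user):
        """Mark only this user's session as dead."""
        self._sessions[user]["state"] = "dead"

registry = SessionRegistry()
s1 = registry.session_for("user1")
s2 = registry.session_for("user2")

s1["vars"]["x"] = 1      # state written by User 1 stays in Session 1
registry.fail("user1")   # a failure in Session 1 leaves Session 2 untouched
```

After the failure, `s2` is still `idle` and holds none of User 1's variables, which mirrors the fault-tolerance and privacy properties the slide claims.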
![Page 15: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/15.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 16: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/16.jpg)
Near Term Improvements
Session Management
Debuggability
Unified session for all languages
Better visualizations for Machine Learning
Support for Spark 2.0
![Page 17: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/17.jpg)
Long Term Improvements
Controlled sharing of sessions for collaboration
Data exploration and browsing with metadata
Taking the model from training to production
![Page 18: Enabling Apache Zeppelin and Spark for Data Science in the Enterprise](https://reader031.fdocuments.in/reader031/viewer/2022030317/586fddfe1a28ab18428b6a47/html5/thumbnails/18.jpg)
Thank You