meetup 18 10 17 - inovex · Spark on Kubernetes with HDFS Johannes M. Scheuermann Karlsruhe,...
Data-aware scheduling
Spark on Kubernetes with HDFS
Johannes M. Scheuermann
Karlsruhe, 18.10.2017
Johannes M. Scheuermann
IT Engineering & Operations @ inovex
〉 Software-Defined Datacenters
〉 Infrastructure as Code
〉 Cloud technologies
〉 High Availability & Scalability
〉 Wants an IBM Z-Frame
〉 @johscheuer
Data-aware scheduling
Agenda
• Why data-aware scheduling?
• Data-aware scheduling for non-Big-Data applications
• Data-aware scheduler
• Big Data on Kubernetes
  • Spark on Kubernetes
  • HDFS on Kubernetes
Spoiler(s) :)
• I’m not a scheduling expert
• Concept is/was a PoC
• Share learnings/ideas
• Get feedback from the community
Data-locality
Why data-locality?
Data-aware scheduling for non-Big-Data workloads
• Databases
• (Large) image processing
• Video encoding
• (Web) caches
Quobyte – What is Quobyte?
• Distributed (parallel) POSIX file system
• Suited to any high-performance workload (incl. high throughput, databases, small files)
• Can be deployed in containers, on kubelet hosts
• Linearly scalable performance
• Fully fault-tolerant, split-brain safe
Quobyte - Architecture
Quobyte - Placement
• Metadata servers make placement decisions based on policies
  • on file level
  • tiering, isolation, …
  • keep the stripes of a file on disks of the same machine => enables local reads
  • allow preferring writes to local storage servers => enables local writes
• Locality information can be retrieved per file
  • that’s where the scheduler hooks in
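To make "locality information per file" concrete, here is a minimal sketch, assuming a hypothetical per-file record in which `stripes[i]` lists the storage hosts holding a replica of stripe i (Quobyte's real API and data format are not shown in the slides). It computes how much of a file each host can serve locally and which hosts can read the whole file without touching the network, which is exactly what the "keep stripes on the same machine" policy aims for.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical locality record for ONE file: stripes[i] lists the storage hosts
# holding a replica of stripe i. This is an assumed shape, not Quobyte's API.


def local_stripe_fraction(stripes: List[List[str]]) -> Dict[str, float]:
    """Fraction of the file's stripes that each host can serve from local disk."""
    if not stripes:
        return {}
    counts: Counter = Counter()
    for replicas in stripes:
        counts.update(set(replicas))  # count a host at most once per stripe
    return {host: n / len(stripes) for host, n in counts.items()}


def fully_local_hosts(stripes: List[List[str]]) -> List[str]:
    """Hosts holding every stripe, i.e. where the whole file can be read locally."""
    fractions = local_stripe_fraction(stripes)
    return sorted(host for host, f in fractions.items() if f == 1.0)


if __name__ == "__main__":
    file_stripes = [["node-1", "node-2"], ["node-1", "node-3"], ["node-1", "node-2"]]
    print(local_stripe_fraction(file_stripes))
    print(fully_local_hosts(file_stripes))  # ['node-1']
```

A scheduler hooking in at this point would prefer a host from `fully_local_hosts` and fall back to the host with the highest fraction.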
Running multiple schedulers
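Kubernetes lets several schedulers run side by side: each pod names the scheduler responsible for it in `spec.schedulerName`, and pods that do not set it are handled by `default-scheduler`. The sketch below shows only the selection step a custom scheduler performs, on plain pod dictionaries shaped like the Kubernetes API objects; the scheduler name `data-aware-scheduler` is a hypothetical placeholder, and the watch loop against the API server is left out.

```python
from typing import Any, Dict, List

# Hypothetical name for the custom scheduler; any string works as long as
# pods reference it in spec.schedulerName.
SCHEDULER_NAME = "data-aware-scheduler"
DEFAULT_SCHEDULER = "default-scheduler"  # used when schedulerName is unset


def pods_for_scheduler(pods: List[Dict[str, Any]],
                       scheduler_name: str = SCHEDULER_NAME) -> List[str]:
    """Names of pending (unbound) pods that belong to the given scheduler."""
    selected = []
    for pod in pods:
        spec = pod.get("spec", {})
        if spec.get("schedulerName", DEFAULT_SCHEDULER) != scheduler_name:
            continue  # handled by another scheduler, e.g. the default one
        if spec.get("nodeName"):
            continue  # already bound to a node
        selected.append(pod["metadata"]["name"])
    return selected


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "web"}, "spec": {}},
        {"metadata": {"name": "encode-1"}, "spec": {"schedulerName": SCHEDULER_NAME}},
        {"metadata": {"name": "encode-2"},
         "spec": {"schedulerName": SCHEDULER_NAME, "nodeName": "node-2"}},
    ]
    print(pods_for_scheduler(pods))  # ['encode-1']
```

Because the default scheduler ignores pods that name another scheduler, the data-aware scheduler only has to care about the workloads that explicitly opt in.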
Data-aware scheduling (file-based)
• Specify the wanted data
• Look up the data placement
• Remap storage locations to nodes if the storage runs in containers
• Schedule the pod
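The four steps above can be sketched as follows, under several assumptions: the pod lists its wanted files in a hypothetical annotation `example.com/wanted-files`, placement comes back as `(storage endpoint, bytes)` pairs per file, and a separate map translates containerized storage endpoints (for example storage pod IPs) to Kubernetes node names. The node with the most local bytes wins; the result is a standard `v1` `Binding` object, which a custom scheduler posts to the pod's `binding` subresource. This is an illustration, not the PoC's actual code.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical annotation through which a pod lists the files it wants (step 1).
WANTED_FILES_ANNOTATION = "example.com/wanted-files"


def wanted_files(pod: Dict[str, Any]) -> List[str]:
    raw = pod["metadata"].get("annotations", {}).get(WANTED_FILES_ANNOTATION, "")
    return [path.strip() for path in raw.split(",") if path.strip()]


def choose_node(
    pod: Dict[str, Any],
    placement: Dict[str, List[Tuple[str, int]]],  # step 2: path -> [(endpoint, bytes)]
    storage_to_node: Dict[str, str],  # step 3: storage endpoint -> node name
) -> Optional[str]:
    local_bytes: Dict[str, int] = defaultdict(int)
    for path in wanted_files(pod):
        for endpoint, nbytes in placement.get(path, []):
            # Endpoints missing from the map are assumed to be node names already
            # (storage running directly on the host).
            node = storage_to_node.get(endpoint, endpoint)
            local_bytes[node] += nbytes
    if not local_bytes:
        return None  # no locality info: leave the decision to a fallback policy
    # Node with the most local data wins; ties go to the alphabetically first node.
    return max(sorted(local_bytes), key=lambda node: local_bytes[node])


def binding_for(pod: Dict[str, Any], node_name: str) -> Dict[str, Any]:
    """Step 4: v1 Binding object that assigns the pod to the chosen node."""
    return {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {
            "name": pod["metadata"]["name"],
            "namespace": pod["metadata"].get("namespace", "default"),
        },
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }


if __name__ == "__main__":
    pod = {"metadata": {"name": "encode-1",
                        "annotations": {WANTED_FILES_ANNOTATION: "/videos/a.mp4, /videos/b.mp4"}}}
    placement = {"/videos/a.mp4": [("10.244.1.7", 600), ("10.244.2.3", 400)],
                 "/videos/b.mp4": [("10.244.2.3", 900)]}
    storage_to_node = {"10.244.1.7": "node-1", "10.244.2.3": "node-2"}
    node = choose_node(pod, placement, storage_to_node)
    print(node)  # node-2 (1300 local bytes vs. 600 on node-1)
    print(binding_for(pod, node))
```

Skipping step 3 would leave scores keyed by storage pod IPs, which never match a node name, so the remapping is what makes the containerized setup work at all.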
Scheduler Architecture (4000ft)
Scheduler Architecture (1000ft)
Scheduler Architecture (containerized)
Benchmarks
(Spark) Big-Data on Kubernetes
Spark on Kubernetes
• https://github.com/apache-spark-on-k8s/spark
• Native Kubernetes support, not the “faked” Spark on Kubernetes (a standalone Spark cluster deployed in pods)
• Still in development
• Not (yet) part of the official Apache Spark project
• Current version: v2.2.0-0.4.0
  • Alpha/Beta?!
Spark on Kubernetes
Figure: Spark libraries (Streaming, MLlib, Spark SQL, GraphX) on top of Spark Core, which runs on the cluster managers Mesos, YARN, Standalone and Kubernetes.
Spark on Kubernetes
• Integrates with Kubernetes features
  • RBAC
  • Resource Quotas
  • Audit logging
  • etc.
• Supports cluster mode only
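As an illustration of cluster-mode submission with RBAC, the sketch below only assembles a `spark-submit` argument list; it does not run anything. The `k8s://` master URL, `--deploy-mode cluster`, `spark.executor.instances`, `spark.kubernetes.namespace` and `spark.kubernetes.authenticate.driver.serviceAccountName` are taken from the Spark-on-Kubernetes documentation as I know it, but verify them against the docs for v2.2.0-0.4.0 before use. Container image settings are left out on purpose because their property names changed between the fork and later upstream releases. The API server address, class and jar path in the usage are hypothetical.

```python
from typing import List, Optional


def spark_submit_args(
    api_server: str,
    main_class: str,
    app_jar: str,
    executors: int = 2,
    namespace: str = "default",
    service_account: Optional[str] = None,
    deploy_mode: str = "cluster",
) -> List[str]:
    """Build (not run) a spark-submit command line for Spark on Kubernetes."""
    if deploy_mode != "cluster":
        # The slides list cluster mode as the only supported mode.
        raise ValueError("Spark on Kubernetes supports only cluster mode here")
    args = [
        "bin/spark-submit",
        "--master", f"k8s://{api_server}",  # e.g. k8s://https://<api-server-host>:<port>
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
    ]
    if service_account:
        # RBAC: the driver pod uses this service account to create executor pods.
        args += ["--conf",
                 f"spark.kubernetes.authenticate.driver.serviceAccountName={service_account}"]
    args.append(app_jar)  # a local:// path refers to a file inside the container image
    return args


if __name__ == "__main__":
    print(" ".join(spark_submit_args(
        "https://10.0.0.1:6443",
        "org.apache.spark.examples.SparkPi",
        "local:///opt/spark/examples/jars/spark-examples.jar",
        service_account="spark",
    )))
```

Since the driver itself runs as a pod, the service account is where RBAC and resource quotas of the namespace take effect.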
Spark on Kubernetes (cluster-mode)
Spark + HDFS on Kubernetes
Figure: Driver 1 and Driver 2 with their executors (Executor 1; Executor 2.1 and Executor 2.2), the HDFS NameNode (NN) and the DataNodes (DN 1, DN 2), spread across the pod network and the host network.
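Why the two networks matter for locality: assuming, as the diagram's two network layers suggest, that the HDFS DataNodes run on the host network while the Spark executors get pod IPs, the block locations reported by the NameNode are node addresses that never equal an executor's pod IP. A locality-aware scheduler therefore has to translate each executor pod to the node it runs on before comparing. The function below is a minimal sketch of that translation with hypothetical names; it is not the fork's implementation.

```python
from typing import Dict, Iterable, List


def node_local_executors(block_hosts: Iterable[str],
                         executor_node: Dict[str, str]) -> List[str]:
    """
    block_hosts:   addresses HDFS reports for a block's replicas (DataNodes on the
                   host network, so these are node addresses).
    executor_node: executor pod IP -> address of the node the pod runs on.
    Returns the executor pod IPs that can read the block node-locally.
    """
    replica_nodes = set(block_hosts)
    # Comparing pod IPs with block_hosts directly would never match: translate first.
    return sorted(pod_ip for pod_ip, node in executor_node.items()
                  if node in replica_nodes)


if __name__ == "__main__":
    executors = {
        "172.16.1.5": "10.0.0.1",
        "172.16.2.7": "10.0.0.3",
        "172.16.1.9": "10.0.0.1",
    }
    print(node_local_executors(["10.0.0.1", "10.0.0.2"], executors))
    # ['172.16.1.5', '172.16.1.9']
```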
Demo
Missing pieces
• Rack locality
• Node preferences
• Priority-based scheduling (alpha in Kubernetes 1.8)
• NameNode HA
• Kerberos support
Conclusions
• Good starting point
• Good integration
• Still some open points
• Work on better (more general) integration
• Play with it!
We are hiring!
www.inovexperts.com
Q&A
Further reading
• https://issues.apache.org/jira/browse/SPARK-18278
• https://www.youtube.com/watch?v=0xRHONrWwvU&feature=youtu.be
• https://www.youtube.com/watch?v=DxCDxi08HWo&feature=youtu.be
Johannes M. Scheuermann
inovex GmbH
inovex.de
github.com/johscheuer
@johscheuer
+JohannesScheuermann
youtube.com/inovexGmbH
CC BY-NC-ND