IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30....

59
Piotr Mrowczynski Cloud-native Spark on Kubernetes IT-DB-SAS Internal Training 1

Transcript of IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30....

Page 1: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Piotr Mrowczynski

Cloud-native Spark on Kubernetes

IT-DB-SAS Internal Training

!1

Page 2: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Credit - https://www.edureka.co/blog/hadoop-ecosystem

What is Apache Spark?

Apache Spark is an open-source distributed and parallel processing framework that allows for sophisticated analytics, real-time streaming and machine learning on large datasets

Piotr MrowczynskiCloud-native Spark on Kubernetes !2

Page 3: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

IT-DB-SAS !3

Spark K8S Operator Democern.ch/spark-user-guide

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

Create cluster Ref: Spark/K8S Cluster Admin Guide

Spark Drivers/Executors Pods (containers)

Lets kick off Create cluster as it can take up to 10 min

Page 4: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Current state of the art for Apache Spark at CERN

• Spark running on top of HDFS/YARN (distributed filesystem for big data and cluster resource manager). Usually processing of data happens on the same cluster of machines as storage (data locality).

Piotr MrowczynskiCloud-native Spark on Kubernetes !4

Page 5: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Current state of the art for Apache Spark at CERN

• Spark running on top of HDFS/YARN (distributed filesystem for big data and cluster resource manager). Usually processing of data happens on the same cluster of machines as storage (data locality).

• Physical machines allocated – means no resource elasticity, no isolation between users, compute coupled with data storage.

Piotr MrowczynskiCloud-native Spark on Kubernetes 3

Page 6: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Current state of the art for Apache Spark at CERN

• Spark running on top of HDFS/YARN (distributed filesystem for big data and cluster resource manager). Usually processing of data happens on the same cluster of machines as storage (data locality).

• Physical machines allocated – means no resource elasticity, no isolation between users, compute coupled with data storage.

• Stable workloads from monitoring, security, experiment operations (NXCals) with data in HDFS/Kafka

Piotr MrowczynskiCloud-native Spark on Kubernetes 3

Page 7: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Current state of the art for Apache Spark at CERN

• Spark running on top of HDFS/YARN (distributed filesystem for big data and cluster resource manager). Usually processing of data happens on the same cluster of machines as storage (data locality).

• Physical machines allocated – means no resource elasticity, no isolation between users, compute coupled with data storage.

• Stable workloads from monitoring, security, experiment operations (NXCals) with data in HDFS/Kafka

• Ad-hoc workloads from physics communities interested in analyzing data at scale using external storages (EOS/S3). Physics analysis limited to the capacity and resources of shared Spark/Hadoop cluster. Some workloads (>100TB, possibly 1PB) not possible on shared cluster without major upgrade of cluster or disruption of other workflows

Piotr MrowczynskiCloud-native Spark on Kubernetes 3

Page 8: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Use-case investigated

• Spark running on top of HDFS/YARN (distributed filesystem for big data and cluster resource manager). Usually processing of data happens on the same cluster of machines as storage (data locality).

• Physical machines allocated – means no resource elasticity, no isolation between users, compute coupled with data storage.

• Stable workloads from monitoring, security, experiment operations (NXCals) with data in HDFS/Kafka

• Ad-hoc workloads from physics communities interested in analysing data at scale using external storages (EOS/S3).

Piotr MrowczynskiCloud-native Spark on Kubernetes 3

Page 9: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Data and operational challenges

Piotr MrowczynskiCloud-native Spark on Kubernetes 4

Page 10: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

1. Grow of data brings new challenges

Mass Storage of LHC data for analysis at CERN - EOS

Currently >250PB, 12.3 PB ingest only in October 2017

Piotr MrowczynskiCloud-native Spark on Kubernetes 5

Page 11: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Piotr Mrowczynski

Physics analysis has to be reproducible! YARN clusters are difficult to provision, and difficult to setup with required environment.

2. Reproducible analysis

Cloud-native Spark on Kubernetes 6

Page 12: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Addressing the challenges

Piotr MrowczynskiCloud-native Spark on Kubernetes 7

Page 13: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Piotr Mrowczynski

Literature study

„Cloud-native is an approach to building and running applications that exploits the advantages of the cloud computing model” [1]

Companies attempted cloud-native Spark deployments using Kubernetes. Native resource scheduler implementation

had experimental release in Apache Spark in March 2018.

Research from Google defends the hypothesis, that in cluster computing disk-locality is becoming irrelevant nowadays.

Databricks and Accenture Labs benchmarked Spark with data in external storages compared to HDFS.

1. “Kubernetes becomes the first project to graduate from the cloud native computing foundation.” https://www.forbes.com/sites/janaki-rammsv/2018/03/07/kubernetes-becomes-the-first-project-to-graduate-from-the-cloud-native-computing-foundation.

Cloud-native Spark on Kubernetes 8

Page 14: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Separating Compute and Storage for Elastic Big Data

Spark

HDFSSpark

External Storage (EOS, S3, GCS + Kafka,

HDFS)

Data Locality Data External (network)

Piotr MrowczynskiCloud-native Spark on Kubernetes 10

Page 15: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Separating Compute and Storage for Elastic Big Data

Spark

HDFSSpark

External Storage (EOS, S3, GCS + Kafka,

HDFS)

- assumes disk bandwidth is higher than storage throughput over network

- assumes mass storage throughput higher than cluster disk bandwidth

Piotr Mrowczynski

Data Locality Data External (network)

Cloud-native Spark on Kubernetes 11

Page 16: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Separating Compute and Storage for Elastic Big Data

Spark

HDFSSpark

External Storage (EOS, S3, GCS + Kafka,

HDFS)

- assumes disk bandwidth is higher than storage throughput over network

- limited elasticity (both for computation and storage)

- assumes mass storage throughput higher than cluster disk bandwidth

- allows compute not being bound to storage (with cloud elasticity)

Piotr Mrowczynski

Data Locality Data External (network)

Cloud-native Spark on Kubernetes 11

Page 17: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Separating Compute and Storage for Elastic Big Data

Spark

HDFSSpark

External Storage (EOS, S3, GCS + Kafka,

HDFS)

- assumes disk bandwidth is higher than storage throughput over network

- limited elasticity (both for computation and storage)

- avoids high network bandwidth for analysis on large datasets

- assumes mass storage throughput higher than cluster disk bandwidth

- allows compute not being bound to storage (with cloud elasticity)

- assumes network delivers data to CPU faster than local disks, might generate high traffic

Piotr Mrowczynski

Data Locality Data External (network)

Cloud-native Spark on Kubernetes 11

Page 18: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Possible solution for storage elasticity

- mass storage services are easier/cheaper to scale - data stored on disk can be large, and compute nodes can be adjusted to compute needs

Piotr MrowczynskiCloud-native Spark on Kubernetes 12

Page 19: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Possible solution for storage and compute elasticity, reproducibility

- Openstack CaaS (Containers-as-a-Service) with Kubernetes provides compute elasticity and authentication mechanism (certificates).

[1] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

Piotr MrowczynskiCloud-native Spark on Kubernetes 16

Page 20: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Possible solution for storage and compute elasticity, reproducibility

- Openstack CaaS (Containers-as-a-Service) with Kubernetes provides compute elasticity and authentication mechanism (certificates).

- Separation of monolithic services simplifies operational effort

[1] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

Piotr MrowczynskiCloud-native Spark on Kubernetes 16

Page 21: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Possible solution for storage and compute elasticity, reproducibility

- Openstack CaaS (Containers-as-a-Service) with Kubernetes provides compute elasticity and authentication mechanism (certificates).

- Separation of monolithic services simplifies operational effort

- Just Spark-Submit with Kubernetes only allows to create drivers and executors (not so friendly to manage).

- Spark K8S Operator bring feature parity known from YARN (managing Spark Applications)

- Spark Operator gives idiomatic submission, application restart policies, cron support, custom volume/config/secret mounts, pod affinity/anti-affinity [1] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

Piotr MrowczynskiCloud-native Spark on Kubernetes 16

Page 22: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Containers-as-a-Service to automate deployment

BACKUP SLIDES !22

Page 23: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

openstack coe clusterCreate/Resize/Delete/GetConfiguration for

Kubernetes clusters

Spark Drivers/Executors Pods (containers)

Piotr Mrowczynski

Containers-as-a-Service for compute elasticity

Cloud-native Spark on Kubernetes 13

Page 24: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

openstack coe clusterCreate/Resize/Delete/GetConfiguration for

Kubernetes clusters

Helm Create/Delete/Update

Spark Operator to control batch/stream jobs

Spark Drivers/Executors Pods (containers)

Piotr Mrowczynski

Containers-as-a-Service for compute elasticity

Cloud-native Spark on Kubernetes 13

Page 25: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

openstack coe clusterCreate/Resize/Delete/GetConfiguration for

Kubernetes clusters

sparkctl Submit batch/stream jobs

to K8S via Operator(cluster-mode)

Helm Create/Delete/Update

Spark Operator to control batch/stream jobs

Spark Drivers/Executors Pods (containers)

Piotr Mrowczynski

Containers-as-a-Service for compute elasticity

Cloud-native Spark on Kubernetes 13

Page 26: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

openstack coe clusterCreate/Resize/Delete/GetConfiguration for

Kubernetes clusters

sparkctl Submit batch/stream jobs

to K8S via Operator(cluster-mode)

Helm/Kubectl Create/Delete/Update

Spark Operator to control batch/stream jobs

Spark Drivers/Executors Pods (containers)

Piotr Mrowczynski

Containers-as-a-Service for compute elasticity

Container Registry Fetch containers with

required software

Cloud-native Spark on Kubernetes 13

spark-submit Create driver on behalf

of user (customized by operator)

Page 27: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Workflow Reproducibility

Dockerfiles with required software packages, configurations etc

Cloud Infrastructure

Kubernetes Resource Manager

Baremetal Infrastructure

Kubernetes Resource Manager

Piotr Mrowczynski

- Kubernetes and use of containers allows Spark developers to build their isolated and reproducible environments

- Encapsulate software packages, libraries, mounting configmaps, volumes

Cloud-native Spark on Kubernetes 14

Page 28: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

IT-DB-SAS !28

Spark K8S Operator Democern.ch/spark-user-guide

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

Create cluster Ref: Spark/K8S Cluster Admin Guide

Spark Drivers/Executors Pods (containers)

Install Spark Operator Ref: Spark/K8S Cluster Admin Guide

Submit jobs Ref: Spark/K8S User Guide

Page 29: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark K8S Operator Submitting application

- Spark Operator watches for create/delete/update events of SparkApplication

- Spark Operator executes spark-submit with required configurations

- Data is uploaded/read from external storage service, and monitoring is realised with internal/external monitoring tools

Piotr MrowczynskiCloud-native Spark on Kubernetes 18

Page 30: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Spark K8S Operator Running application

- Spark Application Controller handles restarts and resubmissions.

- Submission Runner executes spark-submit

- Spark Pod Monitor reports updates of pods to controller

- Mutating Admission WebHook handles customising Docker images and node affinities

BACKUP SLIDES !30

Page 31: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

IT-DB-SAS !31

Openstack Containers-as-a-Service

Magnum - responsible for Kubernetes deployment

Nova VMs Nova Ironic (baremetal)

Heat - composition of applications in the cloud

Nova - resource provisioning

Neutron - network provisioning

Page 32: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

IT-DB-SAS !32

Spark K8S Operator Democern.ch/spark-user-guide

Spark Operator Deployment

(long-running operator for batch)

Cloud Infrastructure

Kubernetes Resource Manager

Create cluster Ref: Spark/K8S Cluster Admin Guide

Spark Drivers/Executors Pods (containers)

Install Spark Operator Ref: Spark/K8S Cluster Admin Guide

Submit jobs Ref: Spark/K8S User Guide

For Spark-Pi you are ready to go

For TPCDS: Go to IT-DB Keepass to get CS3.CERN.CH (S3) credentials

Page 33: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Evaluation of Spark on Kubernetes

Piotr MrowczynskiCloud-native Spark on Kubernetes 19

Page 34: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Evaluating Persistent Storage for Cloud-native Spark

Data over network

Persistent Storage Feature

Network Volume Mounts

Native Storage Connectors

Object Storage Connectors Monitoring Storage

ExampleCephFS,

CVMFS ,AzureDisk, EFS

HDFS, EOS, Kafka GCS, S3, Azure InfluxDB, Stackdriver, Prometheus

Use-caseSoftware Packages,

Streaming Checkpoints, Logs

Data processing Data processing, Logs, Checkpoints Events, Statistics

Spark Transactional Writes No Yes Possible -

Eventual Consistency Writes No No Yes -

Piotr MrowczynskiCloud-native Spark on Kubernetes 20

Page 35: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Memory management and Cloud-native SparkHow Kubernetes fits to Spark Memory Management

Piotr MrowczynskiCloud-native Spark on Kubernetes 21

Page 36: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Memory management and Cloud-native SparkHow Kubernetes fits to Spark Memory Management

- Kubelet monitors memory and disk available to the Node to prevent node instability.

Piotr MrowczynskiCloud-native Spark on Kubernetes 21

Page 37: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Memory management and Cloud-native SparkHow Kubernetes fits to Spark Memory Management

- Kubelet monitors memory and disk available to the Node.

- Detecting MemoryPressure or System Out Of Memory and reclaiming resources from running containers (OOMKilled errors)

Piotr MrowczynskiCloud-native Spark on Kubernetes 21

Page 38: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Memory management and Cloud-native SparkHow Kubernetes fits to Spark Memory Management

Piotr Mrowczynski

- MemoryOverheadFactor - memory for off-heap memory, non-JVM processes (e.g. in case of Python higher limit) and processes required for operation of container.

- OOMKills tuned by spark.memory.fraction and spark.memory.storageFraction

Cloud-native Spark on Kubernetes

- Kubelet monitors memory and disk available to the Node.

- Detecting MemoryPressure or System Out Of Memory and reclaiming resources from running containers (OOMKilled errors)

21

Page 39: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Persistent Storage for Cloud-native Spark Large scale data reduction in Kubernetes

“Dimuon Mass Calculation on 20TB, max 180 executors”

Job execution duration trendThroughput trend

Piotr Mrowczynski

- After optimisations (read-ahead, speculation) very high resource utilisation and significant improvement of execution time

- Workload scaled near-linear with increasing number of executors

Cloud-native Spark on Kubernetes 24

Page 40: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Persistent Storage for Cloud-native Spark

- InfluxDB to monitor the cluster usage, CPU/Network usage was pretty stable

Large scale data reduction in Kubernetes “Dimuon Mass Calculation on 20TB, max 180 executors”

Piotr MrowczynskiCloud-native Spark on Kubernetes 25

Page 41: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Network impact of overtuning the read-ahead

20 GB/s cluster read throughput for a duration of calculation accross 500 VMs (>200 hypervisors) with high read-ahead parameter

Piotr Mrowczynski

“Dimuon Mass Calculation with max available resources”

Cloud-native Spark on Kubernetes 23

Page 42: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Overhead of Cloud-native Spark Attempt to asses what user can expect with

our on-premise YARN and cloud K8S

Piotr Mrowczynski

- Very small scale of 100GB EOS TPCDS SQL Benchmark.

- Spark 2.3 - 8 Executors - 2CPU/7GB RAM

- Metrics collected per query and exported using SparkMeasure [1]

Cloud-native Spark on Kubernetes

[1] https://github.com/cerndb/sparkMeasure

26

Page 43: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Overhead of Cloud-native Spark Attempt to asses what user can expect with

our on-premise YARN and cloud K8S

Piotr MrowczynskiCloud-native Spark on Kubernetes 27

Page 44: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Overhead of Cloud-native Spark Attempt to asses what user can expect with

our on-premise YARN and cloud K8S

Piotr MrowczynskiCloud-native Spark on Kubernetes 28

Page 45: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Target use cases / workloads for Spark on Kubernetes

Piotr Mrowczynski

- Physics analysis with Spark - Physics analysis with EP-SFT (ROOT team) RDataframe approach - CMS Big Data Project with hadoop-xrootd connector and spark-root

package

- Usage of Spark as compute engine to analyse data stored in external storage systems (non-physics)

- Stable Streaming workloads with Kafka

Cloud-native Spark on Kubernetes 31

Page 46: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Service Deliverables / Future work

Piotr Mrowczynski

- Contribution to the upstream spark on kubernetes operator

- Helm charts to deploy spark operator for customisation of Spark for CERN infrastructure / use cases

- https://gitlab.cern.ch/db/spark-service/spark-service-charts

- Monitoring of Spark workloads for efficient usage of compute resources

- Consultancy for on boarding of the users and curated examples for various use cases

- Support of Spark on kubernetes

- Easier interaction with the service - Web application ? - SWAN ?

Cloud-native Spark on Kubernetes 31

Page 47: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Service Deliverables / Future work

Piotr MrowczynskiCloud-native Spark on Kubernetes 31

- Spark Streaming and Mounting CVMFS to driver/executor

- Monitoring of Spark on kubernetes workloads

- Ease of interacting with the service for creation of the cluster and submission of spark jobs - Integrate with Jupyter Notebooks (CERN SWAN) for interactive/batch analysis

- Working with the user community on the adoption of the new compute model

- Federated multi-cloud clusters ?

- Benchmark resource scheduling (start-up time of K8S and YARN for provisioning and setting up executors) ?

Page 48: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Conclusions

Piotr Mrowczynski

- Openstack provisions Kubernetes cluster of 10s of nodes in automated fashion (CaaS).

- Kubernetes allows reproducible Spark and limited operational effort using Docker containers and microservices

- Just Spark-Submit with Kubernetes only allows to create drivers and executors (not so friendly to manage)

- Spark K8S Operator provides management of Spark Applications similar to ones in YARN ecosystem

Cloud-native Spark on Kubernetes 29

Page 49: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Conclusions

Piotr Mrowczynski

Persistent Storage Feature

Network Volume Mounts

Native Storage Connectors

Object Storage Connectors

Monitoring Storage

Example CephFS, CVMFS EOS, Kafka S3 [1] InfluxDB

Use-caseIn evaluation for

logs and checkpoints

Data processing

Logs, Checkpoints

(streaming use-case), Data

processing (in evaluation)

Events, Statistics

- Native Storage Connectors best for data processing

- Network Volume Mounts and Objectstorage Connectors good for logs and checkpoints

- InfluxDB good for monitoring data

Cloud-native Spark on Kubernetes 29

- Openstack provisions Kubernetes cluster of 10s of nodes in automated fashion (CaaS).

- Kubernetes allows reproducible Spark and limited operational effort using Docker containers and microservices

- Just Spark-Submit with Kubernetes only allows to create drivers and executors (not so friendly to manage)

- Spark K8S Operator provides management of Spark Applications similar to ones in YARN ecosystem

Page 50: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Conclusions

Piotr Mrowczynski

- Spark/K8S, as Spark/YARN as native resource schedulers, handles sustained ad-hoc large-scale workloads from external storages well. Spark/YARN had ~40 nodes, Spark/K8S 1000 VMs (~200 nodes)

- Without data locality, network can be a serious problem/bottleneck

Cloud-native Spark on Kubernetes 30

Page 51: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Conclusions

Piotr Mrowczynski

- Kubernetes, as YARN, controls the executor memory. Similarly, proper tuning to avoid OOMKilled

- With proper data access tuning and speculation enabled, more resources (executors) near-linearly scaled workload

- TPCDS shown overhead of cloud-native Spark/K8S compared to Spark/YARN in similar workload setup. Shared Spark/YARN with spikes due to other heavy users

Cloud-native Spark on Kubernetes

- Spark/K8S, as Spark/YARN as native resource schedulers, handles sustained ad-hoc large-scale workloads from external storages well. Spark/YARN had ~40 nodes, Spark/K8S 1000 VMs (~200 nodes)

- Without data locality, network can be a serious problem/bottleneck

30

Page 52: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Thank you! Questions?

Piotr Mrowczynski

Special thanks for support: Supervisors at CERN: Prasanth Kothuri, Vaggelis Motesnitsalis

Cloud-native Spark on Kubernetes 32

Page 53: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Backup slides

BACKUP SLIDES !53

Page 54: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

State-of-the-art Conclusions

Piotr MrowczynskiCloud-native Spark on Kubernetes 28

Data Locality

Data Externality

Batch/StreamInteractive

Datacenter Networking

Cluster Networking

On-Premise Performance Effective

Spark/K8S Operator

Spark/YARNSpark/K8S

External Storage

Spark/YARN HDFS

Spark/YARN External StorageSpark/YARN

Spark/YARN

Cloud-Native Cost Effective

Spark/K8S Future?

• Stable streaming workloads with data in Kafka

• Ad-hoc workloads from physics and other communities interested in analyzing data at scale using external storages (EOS/S3).

Page 55: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

BACKUP SLIDES

Example of large-scale ad-hoc/external-storage workload

Physics data mining

Detect particle interactions (data),

compare with theory predictions (simulation)

Particle detection analysis

Large scale data reduction facility

!55

Page 56: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

EOS [1]uses namespace, files

are replicated, optimized for ROOT file access,

storage service

[1] AJ Peters and J Lukasz ”Exabyte Scale Storage at CERN.”, 2011.

S3, EOS and HDFSHDFS

uses namespace, files are chunked into equal blocks and replicated,

compute service

NamespaceNamespace

EOS Node

File 1

HDFS Datanode

Block 1

Block 3

HDFS Datanode

Block 2

Block 3

HDFS Datanode

Block 1

Block 2

EOS Node

File 1

S3 [1]objects are replicated,

objects can have custom ACLs and metadata,

storage service

Ceph Node

Object ID1

“File” Metadata For Object ID1

Ceph Node

Object ID1

Page 57: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

IT-DB-SAS !57

Kubernetes Federation across clouds

Page 58: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

Allows Spark developers to build their isolated and reproducible environments (software packages, libraries, mounting configmaps, volumes)

with use of containers.

Scales horizontally, self-healing system

Multi-cloud support

Elastic resource provisioning in the cloud, resources autoscaling in public clouds

Large and industry-grade community

Automating deployment, scaling, and management of containerized applications.

Kubernetes and Data Processing

BACKUP SLIDES !58

Page 59: IT-DB-SAS Internal Training · 2018. 11. 22. · openstack coe cluster ... BACKUP SLIDES !30. IT-DB-SAS !31 Openstack Containers-as-a-Service Magnum - responsible for Kubernetes deployment

“Dimuon Mass Calculation” in YARN ROOT Files Dataset skew in HDFS vs Mass Storage

- Our 20TB (6100 ROOT files) dataset has been stored in HDFS, and in mass storage EOS

- In HDFS in most servers we had big number of block replicas for the same dataset

- In EOS in most servers we had small number of file replicas for the same dataset. Large number if replicas only on small subset of servers

- Not easy to keep HDFS in balance in mass scale and high write rate

BACKUP SLIDES !59