Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira

Telmo Oliveira, Toon

Using Spark in the Cloud: A Devops perspective

Requirements

• Seamless transition

• Ensure data anonymity

• Move fast, optimise later

• Ensure multi-tenancy

• As little disruption as possible to the data science (DS) team
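
The slides do not say how data anonymity was achieved. One common building block is keyed pseudonymisation, and the sketch below shows it only as an illustration under that assumption. Direct identifiers are replaced by an HMAC-SHA256 digest, so the data science team can still join and count on a column without seeing raw values. Strictly speaking this is pseudonymisation rather than full anonymisation. The key, the helper names and the record layout are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key: in practice load it from a secrets manager and
# never hard-code it in source.
PSEUDONYMISATION_KEY = b"replace-with-a-secret-key"


def pseudonymise(value: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Return a stable, keyed SHA-256 token for an identifier.

    The same input always yields the same token, so joins still work.
    The token cannot be reversed without the key. Anyone holding the key
    could still brute-force low-entropy identifiers, so protect the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise_record(record: dict, id_fields: tuple) -> dict:
    """Copy a record, replacing the listed identifier fields with tokens."""
    cleaned = dict(record)
    for field in id_fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymise(str(cleaned[field]))
    return cleaned


if __name__ == "__main__":
    # Hypothetical record layout, for illustration only.
    row = {"customer_id": "C-1001", "postcode": "1011AB", "kwh": 3.2}
    print(anonymise_record(row, ("customer_id", "postcode")))
```

Non-identifying fields such as the hypothetical `kwh` pass through unchanged. Rotating the key breaks joins with previously tokenised data, which may or may not be desirable.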

• Cluster timeouts

• Autoscaling

• Spot instances

• Well-documented API
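
The slide lists these platform features but does not name the managed Spark service. The sketch below assumes a Databricks-style service, and the field names follow the public Databricks Clusters API. Under that assumption, the first three features map onto a few fields of a cluster-creation request: `autotermination_minutes` for timeouts, `autoscale` for worker bounds, and `aws_attributes.availability` for spot capacity. The helper function and its default values are hypothetical, not the team's actual settings.

```python
import json


def cluster_spec(name: str, spark_version: str, node_type: str,
                 min_workers: int = 2, max_workers: int = 8,
                 idle_minutes: int = 60) -> dict:
    """Build a cluster-creation payload (Databricks Clusters API field names).

    The defaults are illustrative only, not recommendations.
    """
    # This sketch's own sanity rule: at least one worker and sane bounds.
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type,
        # Autoscaling: the service adds or removes workers within these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Cluster timeout: terminate after this many idle minutes.
        "autotermination_minutes": idle_minutes,
        "aws_attributes": {
            # Keep the first node (the driver) on-demand, so a spot
            # reclaim cannot take down the whole cluster.
            "first_on_demand": 1,
            # Use spot instances for the remaining nodes, falling back to
            # on-demand when spot capacity is unavailable.
            "availability": "SPOT_WITH_FALLBACK",
        },
    }


if __name__ == "__main__":
    # "RUNTIME_VERSION" is a placeholder for a real runtime version string.
    spec = cluster_spec("ds-team", "RUNTIME_VERSION", "i3.2xlarge")
    print(json.dumps(spec, indent=2))
```

Keeping the payload in code (or in version control as JSON) is one way to make the "well-documented API" point pay off. Clusters become reproducible requests rather than UI clicks.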

Infrastructure as code

• Repeatability

• Fast deployment

• Resilience

• Documentation

Terraform

• S3 buckets

• EC2 instances

• Network topology

• Log management

• RDS instances

• IAM roles/policies
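
The slide only lists what Terraform managed. The sketch below keeps to Python and emits Terraform's JSON configuration syntax, which Terraform reads from files ending in `.tf.json`. It declares one S3 bucket for logs and one IAM role that EC2 instances can assume. The resource labels, bucket name and role name are hypothetical. Provider configuration (region, credentials) and the other listed resources are omitted.

```python
import json


def ec2_assume_role_policy() -> str:
    """IAM trust policy allowing EC2 instances to assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def terraform_config(bucket_name: str, role_name: str) -> dict:
    """Return a Terraform configuration in JSON syntax (.tf.json)."""
    return {
        "resource": {
            "aws_s3_bucket": {
                # Hypothetical resource label for the log bucket.
                "spark_logs": {"bucket": bucket_name},
            },
            "aws_iam_role": {
                # Hypothetical resource label for the node role.
                "spark_nodes": {
                    "name": role_name,
                    # Terraform expects the trust policy as a JSON string.
                    "assume_role_policy": ec2_assume_role_policy(),
                },
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(terraform_config("example-spark-logs",
                                      "example-spark-nodes"), indent=2))
```

If you save the printed JSON as `main.tf.json`, `terraform plan` picks it up alongside any `.tf` files in the same directory. In most repositories these resources would simply be written in HCL by hand.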

Ansible

• User management

• Databases and ACLs

• Custom app deployment
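
For the user-management item, the sketch below builds an Ansible playbook as a Python structure using the `ansible.builtin.user` module. It is an assumed illustration, not the team's playbook. The inventory group name and the users are hypothetical, and the databases, ACLs and app deployment items are not covered.

```python
import json


def user_tasks(users: list) -> list:
    """One ansible.builtin.user task per (username, groups, present) entry."""
    tasks = []
    for name, groups, present in users:
        args = {"name": name, "state": "present" if present else "absent"}
        if present and groups:
            # append=True adds the user to these groups without
            # removing them from any other groups.
            args["groups"] = list(groups)
            args["append"] = True
        tasks.append({
            "name": f"Ensure user {name} is {args['state']}",
            "ansible.builtin.user": args,
        })
    return tasks


def playbook(hosts: str, users: list) -> list:
    """A single-play playbook (a playbook is a list of plays)."""
    return [{
        "name": "Manage data science users",
        "hosts": hosts,
        "become": True,  # user management needs root privileges
        "tasks": user_tasks(users),
    }]


if __name__ == "__main__":
    # Hypothetical inventory group and users.
    users = [("alice", ["datascience"], True), ("bob", [], False)]
    print(json.dumps(playbook("spark_edge_nodes", users), indent=2))
```

JSON is, for practical purposes, a subset of YAML, so the printed output should be loadable as a playbook file. A hand-written YAML playbook would normally be the more readable choice.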

Architecture overview

Airflow
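
The slide only names Airflow. An Airflow DAG encodes task dependencies, and the sketch below illustrates that idea without any dependencies. It declares a hypothetical pipeline and resolves a valid run order with Python's standard `graphlib` module (Python 3.9+). In Airflow itself, the same graph would be written with operators chained by `>>` inside a DAG file. The task names here are invented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "start_cluster": set(),
    "ingest_raw": {"start_cluster"},
    "anonymise": {"ingest_raw"},
    "train_model": {"anonymise"},
    "build_report": {"anonymise"},
    "stop_cluster": {"train_model", "build_report"},
}


def run_order(graph: dict) -> list:
    """Return one valid sequential order; raises CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def parallel_batches(graph: dict) -> list:
    """Group tasks into batches whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependants become ready
    return batches


if __name__ == "__main__":
    print(parallel_batches(PIPELINE))
    # [['start_cluster'], ['ingest_raw'], ['anonymise'],
    #  ['build_report', 'train_model'], ['stop_cluster']]
```

A scheduler such as Airflow adds what this sketch leaves out: retries, schedules, backfills and a UI. The dependency resolution is the same idea.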

• External Hive metastore

• Send logs to S3

• Authorisation

• i3.2xlarge nodes
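
The sketch below covers the metastore, logging and node-type items. It assumes two things the slides do not state: a Databricks-style cluster API (field names from the public Databricks Clusters API) and a MySQL-compatible metastore database such as the RDS instances in the Terraform list. The `spark.hadoop.javax.jdo.option.*` properties point Spark SQL's Hive client at the external database. `cluster_log_conf` ships driver and executor logs to S3. Host, database, bucket, region and credentials are hypothetical, and authorisation is not covered.

```python
import json


def metastore_conf(jdbc_url: str, user: str, password: str,
                   driver: str) -> dict:
    """Spark properties pointing Spark SQL at an external Hive metastore DB.

    The spark.hadoop. prefix forwards each javax.jdo.* key to the
    Hadoop/Hive configuration used by Spark's Hive client.
    """
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": driver,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        # In practice, inject this from a secret store, not plain text.
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
    }


def cluster_extras(log_bucket: str, region: str, conf: dict) -> dict:
    """Cluster fields for node type, S3 log delivery and Spark conf."""
    return {
        "node_type_id": "i3.2xlarge",
        # Driver and executor logs are delivered under this S3 prefix.
        "cluster_log_conf": {
            "s3": {"destination": f"s3://{log_bucket}/cluster-logs",
                   "region": region},
        },
        "spark_conf": conf,
    }


if __name__ == "__main__":
    # Hypothetical endpoint and credentials; assumes MySQL Connector/J 8.x
    # is available on the cluster classpath.
    conf = metastore_conf("jdbc:mysql://metastore.example.internal:3306/hive",
                          "hive", "CHANGE_ME", "com.mysql.cj.jdbc.Driver")
    print(json.dumps(cluster_extras("example-spark-logs", "eu-west-1", conf),
                     indent=2))
```

Depending on the metastore schema version, `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` may also need to be set.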

Future plans

• Streaming

• Real time services

• Improve CI/CD

What’s all this for?

Thanks to the team

Aemro Amare, Barend Garvelink, Bert Jan Katsman, Kliment Markovski, Miquel Monreal, Stanislava Potupchik

Questions?
