Managing a Multi-Tenant Data Lake

Transcript of Managing a Multi-Tenant Data Lake

Page 1: Managing a Multi-Tenant Data Lake

Managing A Multi-Tenant Data Lake

Page 2: Managing a Multi-Tenant Data Lake

Copyright 2016 Comcast Corporation. All Rights Reserved

Agenda

• Timeline for Evolution
• Why Governance
• Multi-Tenancy Anti-Patterns / Warning Signs
• Instituting Governance
• Managing through Chaos
• Monitoring/Metrics
• Environment Tools
• SLA Management
• Support and Staffing
• Demo - Command Center

Page 3: Managing a Multi-Tenant Data Lake

Timeline – 2013

2013 – “The Experiment”

Started with a 10-node cluster

Experimentation with batch processing and enrichment of event data

Team assembled from across organization

Primarily solving single use case

30 nodes by end of trial

2 Racks

Page 4: Managing a Multi-Tenant Data Lake

Timeline – 2014 (H1)

2014 Production “Honeymoon”

Added 70 more nodes along with lower environments (Dev & QA)

Onboarded ~20 additional data sets through batch ETL

Supporting a dozen use cases

5 Racks

Page 5: Managing a Multi-Tenant Data Lake

Timeline – 2014 (H2)

2014 Production “Tiger’s Tail”

Total of 200 nodes to support additional use cases (data science)

Total of ~30 more data sets through batch ETL

Supporting several dozen use cases and ad-hoc exploration

Starting to have difficulty managing resource requests

9 Racks

Page 6: Managing a Multi-Tenant Data Lake

Timeline – 2015

2015 Production “Cortez”

Adding 250 more nodes to production environment

Fully embraced governance

Supporting 24x7 production use cases

19 Racks

Page 7: Managing a Multi-Tenant Data Lake

Timeline – 2016

2016 Production “Planetary”

Adding 1300 more nodes to production environment

Standing up a separate 500-node data science cluster

Spinning off critical compute to boundary satellite clusters

Reaping benefits from governance and resource planning

48 Racks

Page 8: Managing a Multi-Tenant Data Lake

Why Governance?

It’s about establishing acceptable behaviors for the benefit of the community

Minimize user/application impact on cluster

Users will do whatever is technically possible

Everyone has been conditioned to work “smarter not harder”

Establish guardrails, not edicts

Page 9: Managing a Multi-Tenant Data Lake

Multi-Tenancy Anti-Patterns

Speculative Execution

Optional User Training

Lack of Resource Isolation

Lack of Testing and Measurement

Ad-hoc Communication Channels

Excessive Resource Utilization/Reservation

Informal Service Level Agreements (SLAs)

Public Domain: Plynn9

Page 10: Managing a Multi-Tenant Data Lake

Signs of Looming Disaster

Pending Jobs

Queue Fidgeting

Job Rescheduling

Non Predictive Workloads

Cluster Storage Out Of Balance

Public Domain: US DOE
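Two of the warning signs on this slide, pending jobs and storage out of balance, lend themselves to simple automated checks. The sketch below is only an illustration: the `ClusterSnapshot` record, the `warning_signs` function, and both thresholds are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSnapshot:
    """One monitoring sample (hypothetical record layout)."""
    pending_apps: int                # applications waiting for resources
    node_disk_used_pct: List[float]  # per-node disk usage, in percent


def warning_signs(snapshots: List[ClusterSnapshot],
                  pending_limit: int = 10,
                  imbalance_limit_pct: float = 10.0) -> List[str]:
    """Return warnings for a window of snapshots, oldest first."""
    warnings = []
    # Pending jobs: flag only if every sample in the window is over the
    # limit, so a single short burst does not raise an alert.
    if snapshots and all(s.pending_apps > pending_limit for s in snapshots):
        warnings.append("sustained pending applications")
    # Storage balance: compare the fullest and emptiest node in the
    # most recent sample.
    latest = snapshots[-1].node_disk_used_pct if snapshots else []
    if latest and max(latest) - min(latest) > imbalance_limit_pct:
        warnings.append("cluster storage out of balance")
    return warnings
```

Checking a whole window rather than a single sample is a deliberate choice here: a brief spike in pending applications is normal on a busy cluster, while a backlog that never clears is the sign worth acting on.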

Page 11: Managing a Multi-Tenant Data Lake

Instituting Governance

Governance is not a technology problem

Governance must be solved using

• People – Who
• Processes – What / When / How
• Policy – Why

Always employ technology to help with enforcement and measurement

Page 12: Managing a Multi-Tenant Data Lake

Setting Out Governance Standards – Starting Out

Involve the business users to define light-weight policies and processes

• Onboarding users/applications/tools
• Resource Utilization Worksheets
• Deployment checklists
• Service Level Agreements / Penalties
• Updates of Governance Standards

You MUST socialize and educate your community on these policies and processes

Strive for evolution not revolution

Page 13: Managing a Multi-Tenant Data Lake

Setting Out Governance Standards – Measurement

Define universally accepted performance measures

• Storage
• Compute
• System Availability
• Issues and MTTR
• Average Completion Time
• Average Pending Apps

Be transparent with results and make them available to entire community

Establish monthly performance reviews with key stakeholders
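As a rough illustration of how two of these measures could be computed for a monthly review, the sketch below derives MTTR and system availability from a list of incident records. The `Incident` record and both function names are hypothetical, and the availability figure assumes that incidents do not overlap and fall entirely inside the reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    """One outage or issue (hypothetical record layout)."""
    opened: datetime
    resolved: datetime


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to repair: average of (resolved - opened)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def availability_pct(incidents: List[Incident],
                     period_start: datetime,
                     period_end: datetime) -> float:
    """Share of the period without an open incident, in percent.

    Assumes incidents do not overlap and lie inside the period.
    """
    period = period_end - period_start
    downtime = sum((i.resolved - i.opened for i in incidents), timedelta())
    return 100.0 * (1 - downtime / period)
```

Average completion time and average pending apps can be computed the same way, as plain means over job durations and over periodic samples of the pending count.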

Page 14: Managing a Multi-Tenant Data Lake

Setting Out Governance Standards – Enforcement

Lock down as many resources as possible

Monitor resource utilization for compliance

Automate corrective measures

It's all about transitioning from defense to offense and eliminating surprises!
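Monitoring for compliance and automating the response can start very small. The sketch below compares each tenant's measured usage with its agreed allocation and maps the result to a corrective action. The 80% warning ratio, the action names, and the function names are all assumptions made for illustration; the presentation does not specify them.

```python
from typing import Dict, Tuple

# Hypothetical mapping from compliance state to corrective action.
ACTIONS = {
    "ok": "none",
    "warn": "notify tenant",
    "breach": "open ticket and block new writes",
}


def classify_usage(used: float, allocated: float,
                   warn_ratio: float = 0.8) -> str:
    """Classify usage against an allocation as ok, warn, or breach."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    ratio = used / allocated
    if ratio >= 1.0:
        return "breach"
    if ratio >= warn_ratio:
        return "warn"
    return "ok"


def compliance_report(usage: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map each tenant to an action, given (used, allocated) pairs."""
    return {tenant: ACTIONS[classify_usage(used, allocated)]
            for tenant, (used, allocated) in usage.items()}
```

The same classification works for storage or compute, since it only needs a measured value and an agreed limit in the same unit.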

Page 15: Managing a Multi-Tenant Data Lake

Setting Out Governance Standards – Enforcement

Hadoop provides some base capabilities

• YARN Queues for compute
• HDFS Quotas/ACLs for storage

Implement custom solutions for proactive offensive capabilities

• Job monitoring and migration (Penalty Box)
• Dynamic Allocation / Queue Flexing
• Monitor and track leading indicators (Command Center)
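The slides do not describe how the Penalty Box is implemented. One possible shape, shown here purely as a sketch, is a periodic check that finds running applications exceeding a per-queue memory guardrail and prepares a command to move each one into a low-priority queue. The `RunningApp` record, the `penalty_box` queue name, and the limits are hypothetical. The command built is the Hadoop 2.x `yarn application -movetoqueue <app id> -queue <queue>` form; the sketch only constructs the argument list and does not execute anything.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunningApp:
    """A running YARN application (hypothetical record layout)."""
    app_id: str
    queue: str
    memory_mb: int


def penalty_box_moves(apps: List[RunningApp],
                      limits_mb: Dict[str, int],
                      penalty_queue: str = "penalty_box") -> List[List[str]]:
    """Build move commands for apps over their queue's memory guardrail."""
    moves = []
    for app in apps:
        limit = limits_mb.get(app.queue)
        # Skip apps already penalized and queues without a guardrail.
        if app.queue == penalty_queue or limit is None:
            continue
        if app.memory_mb > limit:
            moves.append(["yarn", "application", "-movetoqueue",
                          app.app_id, "-queue", penalty_queue])
    return moves
```

Returning the commands instead of running them keeps the decision logic testable and lets an operator review the list before any job is moved.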

Page 16: Managing a Multi-Tenant Data Lake

Multi-Tenancy: Understanding the Chaos - Monitoring/Metrics

Image Attribution: Pixabay - Creative Commons CC0

Page 17: Managing a Multi-Tenant Data Lake

Use Case – Extreme Ad Hoc (Data Science)

Page 19: Managing a Multi-Tenant Data Lake

Challenges? You bet!

Page 20: Managing a Multi-Tenant Data Lake

Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – Diverse User Community

Diverse User Community

Images: Creative Commons

Page 21: Managing a Multi-Tenant Data Lake

Challenges Monitoring and Managing a Multi-tenant Hadoop Environment - SLAs

Diverse SLAs

Page 22: Managing a Multi-Tenant Data Lake

Challenges Monitoring and Managing a Multi-tenant Hadoop Environment - Governance

Images: Creative Commons

Page 23: Managing a Multi-Tenant Data Lake

Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – Monitoring & Forecasting

Images: Creative Commons

Page 24: Managing a Multi-Tenant Data Lake

Environment

Page 25: Managing a Multi-Tenant Data Lake

Our Environment - Tools for Monitoring

Standard Hadoop Monitoring

Page 26: Managing a Multi-Tenant Data Lake

Environment - Tools for Monitoring

Command Center

Pepperdata

Page 29: Managing a Multi-Tenant Data Lake

SLA Management

Application Timing

Resource Management

Capacity Management

Images: Creative Commons
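For the application timing part of SLA management, a minimal check compares each application's finish time with its agreed deadline. The sketch below is an assumption about how such a check might look, not the presenters' implementation; `AppRun`, `sla_status`, and `missed_slas` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class AppRun:
    """One scheduled application run (hypothetical record layout)."""
    name: str
    deadline: datetime
    finished: Optional[datetime] = None  # None while still running


def sla_status(run: AppRun, now: datetime) -> str:
    """Return 'met', 'missed', or 'running' for a single run."""
    if run.finished is not None:
        return "met" if run.finished <= run.deadline else "missed"
    # Still running: already a miss once the deadline has passed.
    return "missed" if now > run.deadline else "running"


def missed_slas(runs: List[AppRun], now: datetime) -> List[str]:
    """Names of runs that have missed their deadline as of `now`."""
    return [r.name for r in runs if sla_status(r, now) == "missed"]
```

Treating a still-running job as missed the moment its deadline passes lets an alert fire while there is still time to react, instead of waiting for the job to finish.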

Page 30: Managing a Multi-Tenant Data Lake

Support & Staffing

Images: Creative Commons

Page 33: Managing a Multi-Tenant Data Lake

Takeaways for DevOps Model in Hadoop

Train Your Teams (!!!)

Measure, Forecast and Model

Automation and Frameworks

Page 34: Managing a Multi-Tenant Data Lake

Comcast Command Center

Page 38: Managing a Multi-Tenant Data Lake

The Command Center: Our Focus

Visualizations & Design

Ease Of Use

Extensibility

Alerting

Page 39: Managing a Multi-Tenant Data Lake

The Command Center for Monitoring and Alerting

• Missed SLAs
• Guardrails broken

• Definitions
• Links

• Containers
• Queue capacity

• Status
• Measures

• HDFS Usage
• Queue Usage

Continuous Evolution

Continuous Engagement

Page 40: Managing a Multi-Tenant Data Lake

Monitoring and Alerting at Comcast

The Command Center!

Page 41: Managing a Multi-Tenant Data Lake

Thanks!

Ray Harrison, Principal DevOps Architect

Mike Fagan, Principal Big Data Architect

We Are Hiring!