Managing Containers with Helix - events.static.linuxfound.org · Apache Helix Committers @ LinkedIn...


Managing Containers with Helix

Kanak Biscuitwala Jason Zhang

Apache Helix Committers @ LinkedIn helix.apache.org

@apachehelix

Intersection of Job Types

[Diagram: two OracleDB instances, two Backup jobs, HDFS, and two ETL jobs]

Long-running and batch jobs running together!

Cloud Deployment

[Diagram: three applications, A (online; DB), B (nearline; Backup), and C (batch; ETL), whose instances A1 to A3, B1 to B5, and C1 to C4 are spread across shared machines]

Applications with diverse requirements running together in a datacenter

Processes on Machines

[Diagram, with legend Machine / Container / Process / VM, comparing No Isolation, VM-based Isolation (128 MB VMs), and Container-based Isolation (containers of 256 MB and 64 MB)]

• Run as individual processes
  – Poor isolation or poor utilization
• Virtual machines
  – Better isolation
  – Xen, Hyper-V, ESX, KVM
• Containers
  – cgroup
  – YARN, Mesos
  – Super lightweight, dynamic based on application requirements

Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources

Container-Based Solution

Container-Based Solution: System Requirements

A: three containers of 64 MB
B: two containers of 128 MB
C: one container of 256 MB

Container-Based Solution: Allocation

[Diagram, with legend Machine / Container / Process: six containers (64 MB, 64 MB, 128 MB, 256 MB, 128 MB, 64 MB) are placed on machines and run processes A, A, A, B, B, and C]

Containerization is powerful!

But do processes always fit so nicely?

Container-Based Solution: Over-Utilization

[Diagram: Process 1 starts in a 256 MB container and is relaunched in a 384 MB container]

Outcome: Preemption and relaunch

Container-Based Solution: Under-Utilization

[Diagram: Process 1 in a 384 MB container and Process 2 in a 128 MB container]

Outcome: Over-provisioned until restart

Container-Based Solution: Failure

[Diagram: six containers run processes A, A, A, B, B, and C; a failure removes the 256 MB container running C and one 64 MB container running A, and both are then launched on another machine]

Outcome: Launch containers elsewhere

What about stateful systems?

[Diagram: the same layout with A's three containers holding MASTER, SLAVE, and SLAVE replicas; after the failure only the two SLAVE containers and the two B containers remain]

Without additional information, the master is unavailable until restart

Container-Based Solution: Scaling

[Diagram: two 256 MB containers, each holding 50% of the workload, are replaced by three 128 MB containers, each holding 33%]

Outcome: Relaunch with new sharding

Container-Based Solution

Utilization: Application requirements define container size
Fault Tolerance: New container is started
Scaling: Workload is repartitioned and new containers are brought up
Discovery: Existence

We need something finer-grained

The container model provides flexibility within machines, but assumes homogeneity of tasks within containers

Task-Based Solution

Task-Based Solution: System Requirements

Applications A, B, and C state requirements such as:

• complete in less than 5 hours
• always have 2 containers running
• response time should be less than 50 ms

Task-Based Solution: Allocation

[Diagram, with legend Machine / Container / Task: tasks A, A, B, B, C, and C are placed in containers across machines]

Task-Based Solution: Over-Utilization

[Diagram: Task 1 moves from one container to another]

Hide the overhead of a container restart

Task-Based Solution: Under-Utilization

[Diagram: Task 1 runs in a 384 MB container and Task 2 in a 128 MB container; Task 2 is then moved into the 384 MB container]

Optimize container allocations based on usage

Task-Based Solution: Failure

[Diagram: three containers each host one Leader and two Standby replicas of Tasks 1 to 3; when the container holding the Task 3 Leader fails, a surviving Task 3 Standby becomes Leader]

Some systems cannot wait for new containers to start
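The failover behaviour on this slide can be sketched roughly as follows. This is a minimal illustration rather than Helix code: the class `FailoverSketch`, its methods, the container names `c1` to `c3`, and the one-leader-two-standbys layout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Helix API: promote a standby when a leader's container fails.
final class FailoverSketch {
    // task -> (container -> state), where state is "LEADER" or "STANDBY"
    private final Map<String, Map<String, String>> replicas = new LinkedHashMap<>();

    void place(String task, String container, String state) {
        replicas.computeIfAbsent(task, t -> new LinkedHashMap<>()).put(container, state);
    }

    // Drops every replica on the failed container and returns the tasks whose leader moved.
    List<String> failContainer(String failed) {
        List<String> promoted = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : replicas.entrySet()) {
            Map<String, String> byContainer = e.getValue();
            String lost = byContainer.remove(failed);
            if ("LEADER".equals(lost) && !byContainer.isEmpty()) {
                // Promote the first surviving standby instead of waiting for a new container.
                String survivor = byContainer.keySet().iterator().next();
                byContainer.put(survivor, "LEADER");
                promoted.add(e.getKey());
            }
        }
        return promoted;
    }

    // Returns the container currently leading the task, or null if there is none.
    String leaderOf(String task) {
        Map<String, String> byContainer = replicas.getOrDefault(task, Collections.emptyMap());
        for (Map.Entry<String, String> e : byContainer.entrySet()) {
            if ("LEADER".equals(e.getValue())) return e.getKey();
        }
        return null;
    }

    // Assumed layout: container cN leads Task N and holds standbys for the other two tasks.
    static FailoverSketch threeByThree() {
        FailoverSketch s = new FailoverSketch();
        String[] containers = {"c1", "c2", "c3"};
        for (int t = 0; t < 3; t++) {
            for (int c = 0; c < 3; c++) {
                s.place("Task " + (t + 1), containers[c], c == t ? "LEADER" : "STANDBY");
            }
        }
        return s;
    }

    public static void main(String[] args) {
        FailoverSketch s = threeByThree();
        System.out.println("promoted: " + s.failContainer("c3"));
        System.out.println("Task 3 leader: " + s.leaderOf("Task 3"));
    }
}
```

The point of the sketch is that the surviving replica changes state immediately; acquiring a replacement container for the lost standbys can happen afterwards.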

Task-Based Solution: Discovery

[Diagram: node N1 hosts Task 1 Leader and Task 2 Standby; node N2 hosts Task 2 Leader and Task 1 Standby]

Task 1: Leader at N1, Standby at N2
Task 2: Leader at N2, Standby at N1

Learn where everything runs, and what state each task is in
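As a rough illustration of discovery by existence and state, the lookup table on this slide could be held in memory as below. The class `DiscoverySketch` and its methods are hypothetical and are not the Helix API; only the Task 1 / Task 2 placements come from the slide.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory view of "where everything runs"; not the Helix API.
final class DiscoverySketch {
    // task -> (state -> node)
    private final Map<String, Map<String, String>> view = new HashMap<>();

    void record(String task, String state, String node) {
        view.computeIfAbsent(task, t -> new HashMap<>()).put(state, node);
    }

    // Returns the node hosting the task in the given state, or null if unknown.
    String locate(String task, String state) {
        return view.getOrDefault(task, Collections.emptyMap()).get(state);
    }

    // The placements listed on the slide.
    static DiscoverySketch fromSlide() {
        DiscoverySketch d = new DiscoverySketch();
        d.record("Task 1", "LEADER", "N1");
        d.record("Task 1", "STANDBY", "N2");
        d.record("Task 2", "LEADER", "N2");
        d.record("Task 2", "STANDBY", "N1");
        return d;
    }

    public static void main(String[] args) {
        DiscoverySketch d = fromSlide();
        System.out.println("Task 1 leader:  " + d.locate("Task 1", "LEADER"));
        System.out.println("Task 2 standby: " + d.locate("Task 2", "STANDBY"));
    }
}
```

A client that needs to write to Task 1 would ask for its leader and be routed to N1, which a pure existence check ("is a container up?") cannot answer.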

Task-Based Solution: Scaling

[Diagram: tasks T1 to T6 start in two groups (T1 to T3 and T4 to T6); T5 and T6 then move to a separate container]
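Scaling by moving tasks, rather than re-sharding the workload, might look like the sketch below. It is an illustration under stated assumptions: `ScalingSketch`, `addContainer`, the container names, and the "take from the fullest container" rule are invented for the example, and the tasks it picks need not match the ones moved in the diagram.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: scale out by moving whole tasks, not by re-sharding the workload.
final class ScalingSketch {
    // Adds an empty container, then moves tasks from the fullest container until the
    // new one holds its fair share (floor of the average). Returns the moved tasks.
    static List<String> addContainer(Map<String, Deque<String>> tasksByContainer,
                                     String newContainer) {
        Deque<String> added = new ArrayDeque<>();
        tasksByContainer.put(newContainer, added);
        int total = 0;
        for (Deque<String> tasks : tasksByContainer.values()) total += tasks.size();
        int fairShare = total / tasksByContainer.size();

        List<String> moved = new ArrayList<>();
        while (added.size() < fairShare) {
            // Find the container that currently holds the most tasks.
            Deque<String> fullest = added;
            for (Deque<String> tasks : tasksByContainer.values()) {
                if (tasks.size() > fullest.size()) fullest = tasks;
            }
            String task = fullest.removeLast();
            added.addLast(task);
            moved.add(task);
        }
        return moved;
    }

    // Assumed starting point: six tasks on two containers.
    static Map<String, Deque<String>> twoContainers() {
        Map<String, Deque<String>> m = new LinkedHashMap<>();
        m.put("c1", new ArrayDeque<>(Arrays.asList("T1", "T2", "T3")));
        m.put("c2", new ArrayDeque<>(Arrays.asList("T4", "T5", "T6")));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> layout = twoContainers();
        System.out.println("moved: " + addContainer(layout, "c3"));
        System.out.println("layout: " + layout);
    }
}
```

Only the moved tasks are disturbed; the tasks that stay put keep running in their existing containers.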

Comparing Solutions

Utilization
  Container Solution: Application requirements define container size
  Task + Container Solution: Tasks are distributed as needed to a minimal container set as per SLA

Fault Tolerance
  Container Solution: New container is started
  Task + Container Solution: Existing task can assume a new state while waiting for new container

Scaling
  Container Solution: Workload is repartitioned and new containers are brought up
  Task + Container Solution: Tasks are moved across containers

Discovery
  Container Solution: Existence
  Task + Container Solution: Existence and state

Comparing Solutions: Benefits of a Task-Based Solution

• Container reuse
• Minimize overhead of container relaunch
• Fine-grained scheduling

Task : Container :: Thread : Process
Task is the right level of abstraction

Working at task granularity is powerful

We need a reactive approach to resource assignment

How can Helix help?

YARN/Mesos: containers bring flexibility in a machine
Helix: tasks bring flexibility in a container

Task Management with Helix

Application Lifecycle

Capacity Planning: Allocating physical resources for your load
Provisioning: Deploying and launching tasks
Fault Tolerance: Staying available, ensuring success
State Management: Determining what code should be running and where

Helix Overview: Cluster Roles

[Diagram: replicated Controllers manage the TASKS hosted on NODES (Participants); Spectators are the third cluster role]

Helix Controller: High-Level Overview

[Diagram: the Rebalancer, with Constraints and Nodes as inputs and a Task Assignment as output]

Example constraints: “single master”, “no more than 3 tasks per machine”
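To make the two example constraints concrete, here is a small sketch of how a proposed assignment could be checked against them. It is not the Helix constraint API: `ConstraintSketch` and both method names are hypothetical, and "single master" is read here as "at most one MASTER replica per task", which is an interpretation rather than something the slide states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical constraint checks over a proposed assignment; not the Helix constraint API.
final class ConstraintSketch {
    // assignment: task -> (node -> state). True when no task has more than one MASTER.
    static boolean atMostOneMaster(Map<String, Map<String, String>> assignment) {
        for (Map<String, String> stateByNode : assignment.values()) {
            int masters = 0;
            for (String state : stateByNode.values()) {
                if ("MASTER".equals(state)) masters++;
            }
            if (masters > 1) return false;
        }
        return true;
    }

    // True when no node hosts more than maxPerNode task replicas.
    static boolean withinNodeLimit(Map<String, Map<String, String>> assignment, int maxPerNode) {
        Map<String, Integer> perNode = new HashMap<>();
        for (Map<String, String> stateByNode : assignment.values()) {
            for (String node : stateByNode.keySet()) {
                if (perNode.merge(node, 1, Integer::sum) > maxPerNode) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> task1 = new HashMap<>();
        task1.put("N1", "MASTER");
        task1.put("N2", "SLAVE");
        Map<String, Map<String, String>> assignment = new HashMap<>();
        assignment.put("Task 1", task1);
        System.out.println("single master ok: " + atMostOneMaster(assignment));
        System.out.println("node limit ok:    " + withinNodeLimit(assignment, 3));
    }
}
```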

Helix Controller: Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and constraints, find an assignment of task to node
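The idea behind such a rebalancer can be illustrated with plain collections. The sketch below is a simplified stand-in and not the Helix implementation: `RebalancerSketch.computeMapping`, its parameters, and the "keep the previous node, else pick the least-loaded node" policy are assumptions chosen for the example, with the per-node cap standing in for a constraint like "no more than 3 tasks per machine".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified rebalancer: plain collections stand in for the Helix types.
final class RebalancerSketch {
    // Assigns each task to a live node. A task keeps its previous node when that node is
    // still live and under the cap; otherwise it goes to the least-loaded live node.
    static Map<String, String> computeMapping(List<String> tasks, List<String> liveNodes,
                                              Map<String, String> prevAssignment,
                                              int maxTasksPerNode) {
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String node : liveNodes) load.put(node, 0);

        Map<String, String> assignment = new LinkedHashMap<>();
        // First pass: keep tasks where they already run to avoid needless movement.
        for (String task : tasks) {
            String prev = prevAssignment.get(task);
            if (prev != null && load.containsKey(prev) && load.get(prev) < maxTasksPerNode) {
                assignment.put(task, prev);
                load.merge(prev, 1, Integer::sum);
            }
        }
        // Second pass: place the remaining tasks on the least-loaded live node.
        for (String task : tasks) {
            if (assignment.containsKey(task)) continue;
            String best = null;
            for (String node : liveNodes) {
                if (best == null || load.get(node) < load.get(best)) best = node;
            }
            if (best == null || load.get(best) >= maxTasksPerNode) {
                throw new IllegalStateException("constraint cannot be met for " + task);
            }
            assignment.put(task, best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("T1", "T2", "T3", "T4", "T5", "T6");
        Map<String, String> prev = new HashMap<>();
        for (String t : Arrays.asList("T1", "T2", "T3")) prev.put(t, "N1");
        for (String t : Arrays.asList("T4", "T5", "T6")) prev.put(t, "N2");
        // N2 has left the cluster and N3 has joined.
        System.out.println(computeMapping(tasks, Arrays.asList("N1", "N3"), prev, 3));
    }
}
```

Passing the previous assignment in, as the signature on the slide does, is what lets a rebalancer limit how many tasks change nodes when the cluster changes.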

What else do we need?

Helix Controller: What is Missing?

Dynamic Container Allocation

Container Isolation

Automated Service Deployment

Resource Utilization Monitoring

Helix Controller: Target Provider

Based on some constraints, determine how many containers are required in this system

TargetProviderResponse evaluateExistingContainers(
    Cluster cluster,
    ResourceId resourceId,
    Collection<Participant> participants);

class TargetProviderResponse {
    List<ContainerSpec> containersToAcquire;
    List<Participant> containersToRelease;
    List<Participant> containersToStop;
    List<Participant> containersToStart;
}

• Fixed
• CPU
• Memory
• Bin Packing

We’re working on integrating with monitoring systems in order to query for usage information
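A "Fixed" target provider is the simplest of the listed kinds, and a sketch of it may help. The code below is an assumption-laden illustration: `FixedTargetSketch`, `evaluate`, and the use of strings for container specs and ids are hypothetical; only the field names `containersToAcquire` and `containersToRelease` echo the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical "Fixed" target provider; field names echo the slide, types are simplified.
final class FixedTargetSketch {
    static final class Response {
        final List<String> containersToAcquire = new ArrayList<>(); // specs for new containers
        final List<String> containersToRelease = new ArrayList<>(); // ids of surplus containers
    }

    // Compares the running containers against a fixed target count.
    static Response evaluate(List<String> running, int targetCount, String spec) {
        Response r = new Response();
        // Too few containers: request one spec per missing container.
        for (int i = running.size(); i < targetCount; i++) {
            r.containersToAcquire.add(spec);
        }
        // Too many containers: release the ones beyond the target.
        for (int i = targetCount; i < running.size(); i++) {
            r.containersToRelease.add(running.get(i));
        }
        return r;
    }

    public static void main(String[] args) {
        // A target of 3 containers with 1024 MB each, while only one is running.
        Response r = evaluate(Arrays.asList("container-0"), 3, "memory=1024");
        System.out.println("acquire " + r.containersToAcquire
                + ", release " + r.containersToRelease);
    }
}
```

The CPU, memory, and bin-packing variants would differ in how the target is computed, not in the shape of the response.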

Helix Controller: Adding a Target Provider

[Diagram: the controller overview (Rebalancer, Constraints, Nodes, Task Assignment) with a Target Provider added]

How do we use the target provider response?

Helix Controller: Container Provider

Given the container requirements, ensure that the required number of containers is running

ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);
ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);
ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);
ListenableFuture<Boolean> stopContainer(ContainerId containerId);

• YARN
• Mesos
• Local
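For a feel of the four operations, here is a toy in-memory provider in the spirit of the "Local" option. It is only a sketch: `LocalContainerProviderSketch` is hypothetical, container ids and specs are reduced to a string and an integer, the participant argument is dropped, and the JDK's `CompletableFuture` is used in place of `ListenableFuture` so that the example needs no third-party library.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory "Local" provider; CompletableFuture replaces ListenableFuture here.
final class LocalContainerProviderSketch {
    enum State { ALLOCATED, RUNNING, STOPPED }

    private final Map<String, State> containers = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // Reserves a container; the memory figure is accepted but not enforced in this sketch.
    CompletableFuture<String> allocateContainer(int memoryMb) {
        String id = "container-" + nextId.getAndIncrement();
        containers.put(id, State.ALLOCATED);
        return CompletableFuture.completedFuture(id);
    }

    // Only allocated or stopped containers can be started.
    CompletableFuture<Boolean> startContainer(String id) {
        boolean ok = containers.replace(id, State.ALLOCATED, State.RUNNING)
                || containers.replace(id, State.STOPPED, State.RUNNING);
        return CompletableFuture.completedFuture(ok);
    }

    // Only running containers can be stopped.
    CompletableFuture<Boolean> stopContainer(String id) {
        boolean ok = containers.replace(id, State.RUNNING, State.STOPPED);
        return CompletableFuture.completedFuture(ok);
    }

    // Forgets the container; false when the id is unknown.
    CompletableFuture<Boolean> deallocateContainer(String id) {
        boolean ok = containers.remove(id) != null;
        return CompletableFuture.completedFuture(ok);
    }

    public static void main(String[] args) {
        LocalContainerProviderSketch provider = new LocalContainerProviderSketch();
        String id = provider.allocateContainer(1024).join();
        System.out.println(id + " started: " + provider.startContainer(id).join());
        System.out.println(id + " stopped: " + provider.stopContainer(id).join());
        System.out.println(id + " released: " + provider.deallocateContainer(id).join());
    }
}
```

Returning futures keeps the caller from blocking on a slow resource manager; a YARN- or Mesos-backed provider would complete them when the cluster manager responds.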

Helix Controller: Adding a Container Provider

[Diagram: the controller overview (Rebalancer, Constraints, Nodes, Task Assignment) with both a Target Provider and a Container Provider added]

Target Provider + Container Provider = Provisioner

Application Lifecycle: With Helix and the Task Abstraction

Capacity Planning: Target Provider
Provisioning: Container Provider
Fault Tolerance: Existing Helix Controller (enhanced by Provisioner)
State Management: Existing Helix Controller (enhanced by Provisioner)

System Architecture

[Diagram, built up in steps. Components: Client, Resource Provider, App Launcher, Controller Container (Provisioner, Rebalancer), and Participant Container (Participant Launcher, Helix Participant, App). Arrows: submit job, container request, assign tasks]

Helix + YARN: YARN Architecture

[Diagram components: Client, Resource Manager, two Node Managers, Application Master, Container, and an App Package in HDFS/Common Area. Arrows: submit job, container request, node status, assign work, status, grab package]

Helix + YARN: Helix + YARN Architecture

[Diagram components: Client, Resource Manager, two Node Managers, Application Master (Helix Controller, Rebalancer), Container (Helix Participant, App), and an App Package in HDFS/Common Area. Arrows: submit job, container request, node status, assign tasks, status, grab package]

Helix + Mesos: Mesos Architecture

[Diagram components: Scheduler, Scheduler Slave, Mesos Master, two Slave Machines each running a Mesos Slave, Mesos Executor, and an Executor Package in HDFS/Common Area. Arrows: offer resources, offer response, node status, grab executor]

Helix + Mesos: Helix + Mesos Architecture

[Diagram components: Scheduler, Scheduler Slave, Helix Controller, Mesos Master, two Slave Machines each running a Mesos Slave, Mesos Executor (Helix Participant/App), and a Helix Executor Package in HDFS/Common Area. Arrows: offer resources, offer response, node status, assign tasks, grab executor]

Example

Distributed Document Store: Overview

[Diagram, with legend Master / Slave: Oracle nodes, each holding Partition 0, Partition 1, and Partition 2; Backup jobs for P0, P1, and P2; ETL jobs; and HDFS]

Page 92

Distributed Document Store

YARN Example

Client

Resource Manager

submit job

container request

assign work

status

node status

Application Master

Node Manager

Helix Controller

Rebalancer

Container

Node Manager

node status

Helix Participant

Oracle

Partition 0 Partition 1

P1 Backup ETL

Page 93

YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL] # the task containers
serviceConfigMap: {
  DB: { num_containers: 3, memory: 1024 }, ...
  ETL: { time_to_complete: 5h, ... }, ...
}
servicePackageURIMap: {
  DB: 'file://path/to/db-service-pkg.tar', ...
}
...

Distributed Document Store

Page 94

YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL] # the task containers
serviceConfigMap: {
  DB: { num_containers: 3, memory: 1024 }, ...
  ETL: { time_to_complete: 5h, ... }, ...
}
servicePackageURIMap: {
  DB: 'file://path/to/db-service-pkg.tar', ...
}
...

Distributed Document Store

TargetProvider specification

Page 95

Service/Container Implementation

public class MyQueuerService extends StatelessParticipantService {
  @Override
  public void init() { ... }

  @Override
  public void onOnline() { ... }

  @Override
  public void onOffline() { ... }
}
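The callback order implied by this service can be sketched as follows. This is a minimal illustration, not the Helix API: `SketchService`, `QueuerService`, `ServiceLifecycleDemo`, and the `events` list are hypothetical names, and the assumption is that a container calls `init()` once, then `onOnline()`, then `onOffline()` on shutdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateless service base class; not the Helix API.
abstract class SketchService {
    abstract void init();
    abstract void onOnline();
    abstract void onOffline();
}

// Records each lifecycle callback so the ordering can be inspected.
class QueuerService extends SketchService {
    final List<String> events = new ArrayList<>();

    @Override void init() { events.add("init"); }
    @Override void onOnline() { events.add("online"); }
    @Override void onOffline() { events.add("offline"); }
}

class ServiceLifecycleDemo {
    // Drives the callbacks in the order assumed above (an assumption, not a spec).
    static List<String> runLifecycle() {
        QueuerService service = new QueuerService();
        service.init();
        service.onOnline();
        service.onOffline();
        return service.events;
    }

    public static void main(String[] args) {
        System.out.println(runLifecycle());
    }
}
```

In a real service the callback bodies would open and close whatever resources the service owns; the list here only makes the ordering visible.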

Distributed Document Store

Page 96

Task Implementation

public class BackupTask extends Task {
  @Override
  public ListenableFuture<Status> start() { ... }

  @Override
  public ListenableFuture<Status> cancel() { ... }

  @Override
  public ListenableFuture<Status> pause() { ... }

  @Override
  public ListenableFuture<Status> resume() { ... }
}
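One way to picture the four task methods is as requests that move a task between a few states and report the resulting state through a future. The sketch below is an assumption-laden simplification: it swaps in the JDK's `CompletableFuture` for `ListenableFuture`, and `SketchStatus` and `SketchBackupTask` are hypothetical names, since the slide does not show the `Status` type or the `Task` base class.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical status values; the Status type on the slide is not shown.
enum SketchStatus { CREATED, RUNNING, PAUSED, CANCELED }

// Hypothetical task mirroring the four slide methods with JDK futures.
class SketchBackupTask {
    private SketchStatus current = SketchStatus.CREATED;

    CompletableFuture<SketchStatus> start()  { return transition(SketchStatus.RUNNING); }
    CompletableFuture<SketchStatus> pause()  { return transition(SketchStatus.PAUSED); }
    CompletableFuture<SketchStatus> resume() { return transition(SketchStatus.RUNNING); }
    CompletableFuture<SketchStatus> cancel() { return transition(SketchStatus.CANCELED); }

    // A canceled task stays canceled; every other request takes effect.
    private synchronized CompletableFuture<SketchStatus> transition(SketchStatus next) {
        if (current != SketchStatus.CANCELED) {
            current = next;
        }
        // The future is already complete because this sketch does no real work.
        return CompletableFuture.completedFuture(current);
    }
}
```

A real backup task would complete the future only when the underlying work reaches the new state; returning an already-completed future keeps the sketch deterministic.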

Distributed Document Store

Page 97

Distributed Document Store

State Model-Style Callbacks

public class StoreStateModel extends StateModel {
  public void onBecomeMasterFromSlave() { ... }

  public void onBecomeSlaveFromMaster() { ... }

  public void onBecomeSlaveFromOffline() { ... }

  public void onBecomeOfflineFromSlave() { ... }
}
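The callback names follow an onBecome<To>From<From> pattern, so a transition can be mapped to a method by name. The sketch below shows one way such a dispatch could work using plain Java reflection. It is an illustration under assumptions, not Helix's dispatch code: `SketchStoreStateModel` and `TransitionDispatcher` are hypothetical, and the callbacks take no arguments, as on the slide.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical state model with no-argument callbacks, named as on the slide.
class SketchStoreStateModel {
    final List<String> log = new ArrayList<>();

    public void onBecomeSlaveFromOffline() { log.add("OFFLINE->SLAVE"); }
    public void onBecomeMasterFromSlave()  { log.add("SLAVE->MASTER"); }
    public void onBecomeSlaveFromMaster()  { log.add("MASTER->SLAVE"); }
    public void onBecomeOfflineFromSlave() { log.add("SLAVE->OFFLINE"); }
}

class TransitionDispatcher {
    // Turns "MASTER" into "Master" so it fits the callback naming pattern.
    static String capitalize(String state) {
        return state.substring(0, 1).toUpperCase(Locale.ROOT)
            + state.substring(1).toLowerCase(Locale.ROOT);
    }

    // Looks up onBecome<To>From<From> by name and invokes it reflectively.
    static void fire(Object model, String from, String to) {
        String name = "onBecome" + capitalize(to) + "From" + capitalize(from);
        try {
            Method callback = model.getClass().getMethod(name);
            callback.invoke(model);
        } catch (ReflectiveOperationException e) {
            // No such callback means the transition is not part of this model.
            throw new IllegalStateException("unsupported transition: " + from + " -> " + to, e);
        }
    }
}
```

Calling `TransitionDispatcher.fire(model, "OFFLINE", "SLAVE")` followed by `fire(model, "SLAVE", "MASTER")` walks a replica up to master; a transition with no matching callback, such as OFFLINE to MASTER, is rejected.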

Page 98

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}

Spectator (for Discovery)

Distributed Document Store
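The routing idea above (writes go to the master of a partition, reads go to any replica) can be tried out against an in-memory table. This is a self-contained sketch under assumptions: `SketchRoutingTable` and its methods are hypothetical and do not reproduce Helix's routing table provider, and hashing the key modulo the partition count is just one plausible way to implement `getPartition`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical in-memory routing table: partition -> state -> instance names.
class SketchRoutingTable {
    private final Map<Integer, Map<String, List<String>>> table;
    private final Random random;

    SketchRoutingTable(Map<Integer, Map<String, List<String>>> table, Random random) {
        this.table = table;
        this.random = random;
    }

    // Assumed key-to-partition mapping: non-negative hash modulo partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // All instances hosting the partition in the given state.
    List<String> getInstances(int partition, String state) {
        return table.getOrDefault(partition, Map.of()).getOrDefault(state, List.of());
    }

    // Writes go to the first MASTER replica of the partition.
    String routeWrite(int partition) {
        List<String> masters = getInstances(partition, "MASTER");
        if (masters.isEmpty()) {
            throw new IllegalStateException("no master for partition " + partition);
        }
        return masters.get(0);
    }

    // Reads may go to any replica, in any state, chosen at random.
    String routeRead(int partition) {
        List<String> replicas = new ArrayList<>();
        table.getOrDefault(partition, Map.of()).values().forEach(replicas::addAll);
        if (replicas.isEmpty()) {
            throw new IllegalStateException("no replica for partition " + partition);
        }
        return replicas.get(random.nextInt(replicas.size()));
    }
}
```

In a spectator, the table would be refreshed from cluster state changes rather than passed in once; the fixed map here only illustrates the lookup.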

Page 99

Helix at LinkedIn

Page 100

Helix at LinkedIn

Oracle Oracle OracleDB

Change Capture

Change Consumers

Index Search Index

User Writes

Data Replicator

Backup/Restore

In Production

ETL

HDFS

Analytics

Page 101

Helix at LinkedIn

In Production

Over 1000 instances covering over 30000 database partitions

Over 1000 instances for change capture consumers

As many as 500 instances in a single Helix cluster

(all numbers are per-datacenter)

Page 102

Summary

• Container abstraction has become a huge win

• With Helix, we can go a step further and make tasks the unit of work

• With the TargetProvider and ContainerProvider abstractions, any popular provisioner can be plugged in
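The pluggability in the last bullet can be pictured as a split between deciding how many containers are wanted and actually starting or stopping them. The sketch below is purely illustrative: `SketchTargetProvider`, `SketchContainerProvider`, `LocalProvider`, and `Reconciler` are hypothetical names with made-up signatures, not the Helix interfaces, and the local provider only tracks container ids in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: decides how many containers should be running.
interface SketchTargetProvider {
    int desiredContainers(int currentlyRunning);
}

// Hypothetical: knows how to start and stop containers on some provisioner.
interface SketchContainerProvider {
    String startContainer();
    void stopContainer(String id);
    List<String> running();
}

// Stand-in provisioner that only records container ids in memory.
class LocalProvider implements SketchContainerProvider {
    private int next = 0;
    private final List<String> running = new ArrayList<>();

    public String startContainer() {
        String id = "container-" + next++;
        running.add(id);
        return id;
    }

    public void stopContainer(String id) { running.remove(id); }

    public List<String> running() { return running; }
}

class Reconciler {
    // Starts or stops containers until the running count matches the target.
    static void reconcile(SketchTargetProvider target, SketchContainerProvider provider) {
        int desired = target.desiredContainers(provider.running().size());
        while (provider.running().size() < desired) {
            provider.startContainer();
        }
        while (provider.running().size() > desired) {
            List<String> ids = provider.running();
            provider.stopContainer(ids.get(ids.size() - 1));
        }
    }
}
```

Under this split, swapping `LocalProvider` for a YARN-backed or Mesos-backed implementation would leave the reconcile loop and the target policy unchanged, which is the point the bullet makes.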

Page 104