
Managing Containers with Helix

Kanak Biscuitwala · Jason Zhang

Apache Helix Committers @ LinkedIn

helix.apache.org

@apachehelix

Intersection of Job Types

[Diagram: two OracleDB instances, each with a Backup job, feeding ETL jobs that load into HDFS]

Long-running and batch jobs running together!

Cloud Deployment

[Diagram: three applications sharing a datacenter, A (online), B (nearline), and C (batch), e.g. DB, Backup, and ETL, with their instances (A1–A3, B1–B5, C1–C4) spread across machines]

Applications with diverse requirements running together in a datacenter

Processes on Machines

[Diagram: three placement models. No Isolation: processes share the machine directly. VM-based Isolation: processes run in fixed-size 128 MB VMs. Container-based Isolation: containers are sized to each process (64 MB, 128 MB, 256 MB).]

• Run as individual processes
  – Poor isolation or poor utilization
• Virtual machines
  – Better isolation
  – Xen, Hyper-V, ESX, KVM
• Containers
  – cgroup-based: YARN, Mesos
  – Super lightweight; sized dynamically based on application requirements

Processes on Machines

Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources.

Container-Based Solution

Container-Based Solution: System Requirements

[Diagram: application A needs three 64 MB containers, B needs two 128 MB containers, C needs one 256 MB container]

Container-Based Solution: Allocation

[Diagram: the requested containers (three 64 MB for A, two 128 MB for B, one 256 MB for C) packed onto machines, each container running its process]

Containerization is powerful!

But do processes always fit so nicely?

Container-Based Solution: Over-Utilization

[Diagram: Process 1 outgrows its 256 MB container and must be relaunched in a larger 384 MB container]

Outcome: Preemption and relaunch

Container-Based Solution: Under-Utilization

[Diagram: Process 1 uses only a fraction of its 384 MB container, while Process 2 fits in a 128 MB container]

Outcome: Over-provisioned until restart

Container-Based Solution: Failure

[Diagram: a machine fails, taking down one of A's 64 MB containers and C's 256 MB container; replacement containers for A and C are brought up on other machines]

Outcome: Launch containers elsewhere

What about stateful systems?

Container-Based Solution: Failure (Stateful)

[Diagram: replicas hold MASTER and SLAVE states for a partition; the machine hosting the MASTER fails, leaving only SLAVEs]

Without additional information, the master is unavailable until restart.

Container-Based Solution: Scaling

[Diagram: two 256 MB containers each serving 50% of the workload are replaced by three 128 MB containers each serving 33%]

Outcome: Relaunch with new sharding

Container-Based Solution

Utilization: application requirements define container size
Fault Tolerance: a new container is started
Scaling: workload is repartitioned and new containers are brought up
Discovery: existence

Container-Based Solution

The container model provides flexibility within machines, but it assumes homogeneity of tasks within containers. We need something finer-grained.

Task-Based Solution

Task-Based Solution: System Requirements

A: complete in less than 5 hours
B: always have 2 containers running
C: response time should be less than 50 ms

Task-Based Solution: Allocation

[Diagram: tasks for A, B, and C packed into a shared pool of containers across machines]

Task-Based Solution: Over-Utilization

[Diagram: Task 1 outgrows its container and is moved into another container that is already running]

Hide the overhead of a container restart

Task-Based Solution: Under-Utilization

[Diagram: Task 1 (384 MB) and Task 2 (128 MB) are consolidated so container allocations match actual usage]

Optimize container allocations based on usage

Task-Based Solution: Failure

[Diagram: each task runs as a Leader in one container with Standbys in others; when the machine hosting Task 3's Leader fails, a Standby for Task 3 is promoted to Leader]

Some systems cannot wait for new containers to start.

Task-Based Solution: Discovery

[Diagram: nodes N1 and N2 host replicas. Task 1: Leader at N1, Standby at N2. Task 2: Leader at N2, Standby at N1.]

Learn where everything runs, and what state each task is in.

Task-Based Solution: Scaling

[Diagram: tasks T1–T6 are redistributed as containers are added, by moving tasks rather than relaunching everything]

Comparing Solutions

Utilization
  Container Solution: application requirements define container size
  Task + Container Solution: tasks are distributed as needed to a minimal container set, per SLA

Fault Tolerance
  Container Solution: a new container is started
  Task + Container Solution: an existing task can assume a new state while waiting for the new container

Scaling
  Container Solution: workload is repartitioned and new containers are brought up
  Task + Container Solution: tasks are moved across containers

Discovery
  Container Solution: existence
  Task + Container Solution: existence and state

Comparing Solutions: Benefits of a Task-Based Solution

• Container reuse: minimize the overhead of container relaunch
• Fine-grained scheduling
• Task is the right level of abstraction (Task : Container :: Thread : Process)

Working at task granularity is powerful

We need a reactive approach to resource assignment

How can Helix help?

YARN/Mesos: containers bring flexibility in a machine
Helix: tasks bring flexibility in a container

Task Management with Helix

Application Lifecycle

Capacity Planning: allocating physical resources for your load
Provisioning: deploying and launching tasks
Fault Tolerance: staying available, ensuring success
State Management: determining what code should be running and where

Helix Overview: Cluster Roles

[Diagram: Controllers manage TASKS running on Nodes (Participants); Spectators observe the cluster]

Helix Controller: High-Level Overview

[Diagram: the Rebalancer takes Constraints and the current Nodes as input and produces a Task Assignment]

Example constraints: “single master”, “no more than 3 tasks per machine”
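
For illustration, a constraint like “single master” is usually captured in the state model definition that the rebalancer enforces. Below is a minimal sketch using Helix's StateModelDefinition.Builder; treat the exact builder calls as indicative, since they vary slightly across Helix versions.

import org.apache.helix.model.StateModelDefinition;

public class MasterSlaveModel {
  // Sketch: a MasterSlave state model in which each partition may have at
  // most one MASTER, i.e. the "single master" constraint above.
  public static StateModelDefinition build() {
    return new StateModelDefinition.Builder("MasterSlave")
        .addState("MASTER", 1)            // priority 1: most preferred state
        .addState("SLAVE", 2)
        .addState("OFFLINE")
        .initialState("OFFLINE")
        .addTransition("OFFLINE", "SLAVE")
        .addTransition("SLAVE", "MASTER")
        .addTransition("MASTER", "SLAVE")
        .addTransition("SLAVE", "OFFLINE")
        .upperBound("MASTER", 1)          // at most one master per partition
        .dynamicUpperBound("SLAVE", "R")  // slave count configured per resource
        .build();
  }
}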

Helix Controller: Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and the constraints, find an assignment of tasks to nodes.
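
To make the contract concrete, here is a minimal sketch of the kind of logic a rebalancer implements: a naive round-robin spread of partitions over live nodes. The simplified signature and string-based types are illustrative stand-ins for the Helix interfaces above, not the real API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinRebalancer {
  // Simplified stand-in for computeResourceMapping: assign each partition's
  // task to a live node in round-robin order.
  public Map<String, String> computeMapping(List<String> partitions,
                                            List<String> liveNodes) {
    Map<String, String> assignment = new HashMap<>();
    if (liveNodes.isEmpty()) {
      return assignment; // no nodes to assign to; the controller retries later
    }
    int i = 0;
    for (String partition : partitions) {
      assignment.put(partition, liveNodes.get(i++ % liveNodes.size()));
    }
    return assignment;
  }
}

A real rebalancer would also honor the state model constraints and the previous assignment, so tasks move as little as possible.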

What else do we need?

Helix Controller: What is Missing?

• Dynamic Container Allocation
• Container Isolation
• Automated Service Deployment
• Resource Utilization Monitoring

Helix Controller: Target Provider

Based on some constraints, determine how many containers are required in this system.

TargetProviderResponse evaluateExistingContainers(
    Cluster cluster,
    ResourceId resourceId,
    Collection<Participant> participants);

class TargetProviderResponse {
  List<ContainerSpec> containersToAcquire;
  List<Participant> containersToRelease;
  List<Participant> containersToStop;
  List<Participant> containersToStart;
}

Strategies: Fixed, CPU, Memory, Bin Packing

We’re working on integrating with monitoring systems in order to query for usage information.
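
For intuition, here is a sketch of the simplest strategy, Fixed: keep a constant number of containers and report the delta between the target and what is currently running. The string-based container handles are simplified stand-ins for the ContainerSpec/Participant types above.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class FixedTargetProvider {
  private final int targetCount;           // desired number of containers
  private final int memoryPerContainerMb;  // size to request for new ones

  public FixedTargetProvider(int targetCount, int memoryPerContainerMb) {
    this.targetCount = targetCount;
    this.memoryPerContainerMb = memoryPerContainerMb;
  }

  // Simplified stand-in for evaluateExistingContainers.
  public Response evaluate(Collection<String> runningContainers) {
    Response response = new Response();
    int delta = targetCount - runningContainers.size();
    if (delta > 0) {
      // Under target: ask the container provider to acquire the difference.
      for (int i = 0; i < delta; i++) {
        response.containersToAcquire.add(memoryPerContainerMb + "MB");
      }
    } else {
      // At or over target: release any surplus containers.
      runningContainers.stream().limit(-delta)
          .forEach(response.containersToRelease::add);
    }
    return response;
  }

  public static class Response {
    public final List<String> containersToAcquire = new ArrayList<>();
    public final List<String> containersToRelease = new ArrayList<>();
  }
}

A CPU- or memory-based provider would replace the fixed count with a target derived from usage metrics; a bin-packing provider would also decide which specific containers to keep.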

Helix Controller: Adding a Target Provider

[Diagram: the Target Provider feeds the Rebalancer alongside Constraints and Nodes when computing the Task Assignment]

How do we use the target provider response?

Helix Controller: Container Provider

Given the container requirements, ensure that the required number of containers is running.

ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);
ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);
ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);
ListenableFuture<Boolean> stopContainer(ContainerId containerId);

Implementations: YARN, Mesos, Local
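
As a sketch of the Local flavor, a provider can treat containers as plain local processes and wrap each lifecycle step in a Guava future. ContainerSpec and Participant are reduced to a command line and a string id here; only the ListenableFuture-based shape mirrors the interface above.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

public class LocalContainerProvider {
  private final Map<String, ProcessBuilder> allocated = new ConcurrentHashMap<>();
  private final Map<String, Process> running = new ConcurrentHashMap<>();
  private final AtomicLong nextId = new AtomicLong();

  // "Allocating" locally just reserves an id and remembers how to launch.
  public ListenableFuture<String> allocateContainer(String... command) {
    String id = "container-" + nextId.incrementAndGet();
    allocated.put(id, new ProcessBuilder(command));
    return Futures.immediateFuture(id);
  }

  public ListenableFuture<Boolean> startContainer(String id) {
    try {
      running.put(id, allocated.get(id).start()); // fork the participant process
      return Futures.immediateFuture(true);
    } catch (IOException e) {
      return Futures.immediateFailedFuture(e);
    }
  }

  public ListenableFuture<Boolean> stopContainer(String id) {
    Process process = running.remove(id);
    if (process != null) {
      process.destroy(); // ask the process to exit
    }
    return Futures.immediateFuture(true);
  }

  public ListenableFuture<Boolean> deallocateContainer(String id) {
    return Futures.immediateFuture(allocated.remove(id) != null);
  }
}

A YARN or Mesos provider keeps the same shape but forwards each call to the cluster manager's allocation API, completing the future when the cluster manager acknowledges.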

Helix Controller: Adding a Container Provider

[Diagram: the Rebalancer now consults the Target Provider and the Container Provider along with Constraints and Nodes]

Target Provider + Container Provider = Provisioner

Application Lifecycle with Helix and the Task Abstraction

Capacity Planning: Target Provider
Provisioning: Container Provider
Fault Tolerance: existing Helix controller (enhanced by the Provisioner)
State Management: existing Helix controller (enhanced by the Provisioner)

System Architecture

[Diagram: a Client submits a job to the Resource Provider. A controller container hosts the App Launcher, Provisioner, and Rebalancer; the Provisioner sends container requests to the Resource Provider, which launches participant containers, each running a Participant Launcher, a Helix Participant, and the App. The controller then assigns tasks to the participants.]

Helix + YARN: YARN Architecture

[Diagram: a Client submits a job to the Resource Manager, which launches an Application Master container. Node Managers report node status; the Application Master sends container requests, assigns work to containers, and collects status. Containers grab the App Package from HDFS or a common area.]

Helix + YARN: Helix + YARN Architecture

[Diagram: the same YARN flow, but the Application Master hosts the Helix Controller and its Rebalancer, each container runs a Helix Participant hosting the App, and the controller assigns tasks rather than raw work. Containers still grab the App Package from HDFS or a common area.]

Helix + Mesos: Mesos Architecture

[Diagram: the Mesos Master offers resources to a Scheduler, which replies with an offer response. A Mesos Slave on each slave machine reports node status and launches a Mesos Executor, grabbing the Executor Package from a common area.]

Helix + Mesos: Helix + Mesos Architecture

[Diagram: the same Mesos flow, but the Scheduler hosts the Helix Controller; each Mesos Executor grabs a Helix Executor Package from HDFS or a common area and runs a Helix Participant hosting the App, and the controller assigns tasks to the participants.]

Example

Distributed Document Store: Overview

[Diagram: a Master store (Oracle, Partitions 0–2) replicating to a Slave store (Oracle, Partitions 0–2); per-partition Backup jobs (P0, P1, P2) write to HDFS, where ETL jobs process the backups]

Distributed Document Store: YARN Example

[Diagram: the Client submits the job to the Resource Manager; the Application Master hosts the Helix Controller and Rebalancer; Node Managers report node status and run containers in which a Helix Participant hosts Oracle Partitions 0–1, the P1 Backup task, and an ETL task]

Distributed Document Store: YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL]  # the task containers
serviceConfigMap: {  # the TargetProvider specification
  DB: { num_containers: 3, memory: 1024 }, ...
  ETL: { time_to_complete: 5h, ... }, ...
}
servicePackageURIMap: {
  DB: 'file://path/to/db-service-pkg.tar', ...
}
...

Distributed Document Store: Service/Container Implementation

public class MyQueuerService extends StatelessParticipantService {
  @Override
  public void init() { ... }

  @Override
  public void onOnline() { ... }

  @Override
  public void onOffline() { ... }
}

Distributed Document Store: Task Implementation

public class BackupTask extends Task {
  @Override
  public ListenableFuture<Status> start() { ... }

  @Override
  public ListenableFuture<Status> cancel() { ... }

  @Override
  public ListenableFuture<Status> pause() { ... }

  @Override
  public ListenableFuture<Status> resume() { ... }
}
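
One plausible way to flesh out such a task, assuming Guava's ListeningExecutorService; the Status enum and the backup call are placeholders rather than the real Helix types:

import java.util.concurrent.Executors;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class SimpleBackupTask {
  enum Status { COMPLETED, CANCELLED }

  private final ListeningExecutorService executor =
      MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());
  private volatile boolean cancelled = false;

  // start() hands the long-running backup to an executor and immediately
  // returns a future the framework can watch for completion.
  public ListenableFuture<Status> start() {
    return executor.submit(() -> {
      copyPartitionToBackupStore();
      return cancelled ? Status.CANCELLED : Status.COMPLETED;
    });
  }

  public ListenableFuture<Status> cancel() {
    cancelled = true; // cooperative: the running backup checks this flag
    return Futures.immediateFuture(Status.CANCELLED);
  }

  private void copyPartitionToBackupStore() {
    // placeholder for the real backup work, e.g. streaming the partition
    // to HDFS in chunks, checking the cancelled flag between chunks
  }
}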

Distributed Document Store: State Model-Style Callbacks

public class StoreStateModel extends StateModel {
  public void onBecomeMasterFromSlave() { ... }

  public void onBecomeSlaveFromMaster() { ... }

  public void onBecomeSlaveFromOffline() { ... }

  public void onBecomeOfflineFromSlave() { ... }
}
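
For context, a participant process typically registers such a state model with its Helix manager so the controller's transitions reach these callbacks. A sketch against the long-standing Helix API; exact factory signatures differ across Helix versions:

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.participant.statemachine.StateModelFactory;

public class ParticipantBootstrap {
  static class StoreStateModelFactory extends StateModelFactory<StoreStateModel> {
    @Override
    public StoreStateModel createNewStateModel(String partitionName) {
      return new StoreStateModel(); // one state model instance per partition
    }
  }

  public static void main(String[] args) throws Exception {
    // Connect this process to the cluster as a participant.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "DocumentStoreCluster", "node1_12918",
        InstanceType.PARTICIPANT, "zk-host:2181");

    // Route MasterSlave transitions to StoreStateModel instances.
    manager.getStateMachineEngine().registerStateModelFactory(
        "MasterSlave", new StoreStateModelFactory());

    manager.connect(); // start receiving transitions from the controller
  }
}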

Distributed Document Store: Spectator (for Discovery)

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}

Helix at LinkedIn

[Diagram, “In Production”: user writes land in Oracle-backed DB instances; Change Capture feeds Change Consumers and an Index/Search Index; a Data Replicator and Backup/Restore run alongside; ETL moves data into HDFS for Analytics]

Helix at LinkedIn: In Production

• Over 1000 instances covering over 30000 database partitions
• Over 1000 instances for change capture consumers
• As many as 500 instances in a single Helix cluster

(All numbers are per-datacenter.)

Summary

•Container abstraction has become a huge win • With Helix, we can go a step further and make

tasks the unit of work • With the TargetProvider and ContainerProvider

abstractions, any popular provisioner can be plugged in

Questions?

Jason zzhang@apache.org

Kanak kanak@apache.org

Website helix.apache.org

Dev Mailing List dev@helix.apache.org

User Mailing List user@helix.apache.org

Twitter @apachehelix