
Managing Containers with Helix

Kanak Biscuitwala · Jason Zhang

Apache Helix Committers @ LinkedIn

helix.apache.org

@apachehelix

Intersection of Job Types

[Diagram: two OracleDB instances, each with a Backup job, feeding ETL jobs that load into HDFS]

Long-running and batch jobs running together!

Cloud Deployment

[Diagram: three applications sharing a datacenter, A (online), B (nearline), and C (batch), e.g. DB, Backup, and ETL, with their instances (A1–A3, B1–B5, C1–C4) spread across machines]

Applications with diverse requirements running together in a datacenter

Processes on Machines

[Diagram: three placement models. No Isolation: processes share the machine directly. VM-based Isolation: processes run in fixed-size 128 MB VMs. Container-based Isolation: containers are sized to each process (64 MB, 128 MB, 256 MB).]

• Run as individual processes
  – Poor isolation or poor utilization
• Virtual machines
  – Better isolation
  – Xen, Hyper-V, ESX, KVM
• Containers
  – cgroup-based: YARN, Mesos
  – Super lightweight; sized dynamically based on application requirements

Processes on Machines

Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources.

Container-Based Solution

Container-Based Solution: System Requirements

[Diagram: application A needs three 64 MB containers, B needs two 128 MB containers, C needs one 256 MB container]

Container-Based Solution: Allocation

[Diagram: the requested containers (three 64 MB for A, two 128 MB for B, one 256 MB for C) packed onto machines, each container running its process]

Containerization is powerful!

But do processes always fit so nicely?

Container-Based Solution: Over-Utilization

[Diagram: Process 1 outgrows its 256 MB container and must be relaunched in a larger 384 MB container]

Outcome: Preemption and relaunch

Container-Based Solution: Under-Utilization

[Diagram: Process 1 uses only a fraction of its 384 MB container, while Process 2 fits in a 128 MB container]

Outcome: Over-provisioned until restart

Container-Based Solution: Failure

[Diagram: a machine fails, taking down one of A's 64 MB containers and C's 256 MB container; replacement containers for A and C are brought up on other machines]

Outcome: Launch containers elsewhere

What about stateful systems?

Container-Based Solution: Failure (Stateful)

[Diagram: replicas hold MASTER and SLAVE states for a partition; the machine hosting the MASTER fails, leaving only SLAVEs]

Without additional information, the master is unavailable until restart.

Container-Based Solution: Scaling

[Diagram: two 256 MB containers each serving 50% of the workload are replaced by three 128 MB containers each serving 33%]

Outcome: Relaunch with new sharding

Container-Based Solution

Utilization: application requirements define container size
Fault Tolerance: a new container is started
Scaling: workload is repartitioned and new containers are brought up
Discovery: existence

Container-Based Solution

The container model provides flexibility within machines, but it assumes homogeneity of tasks within containers. We need something finer-grained.

Task-Based Solution

Task-Based Solution: System Requirements

A: complete in less than 5 hours
B: always have 2 containers running
C: response time should be less than 50 ms

Task-Based Solution: Allocation

[Diagram: tasks for A, B, and C packed into a shared pool of containers across machines]

Task-Based Solution: Over-Utilization

[Diagram: Task 1 outgrows its container and is moved into another container that is already running]

Hide the overhead of a container restart

Task-Based Solution: Under-Utilization

[Diagram: Task 1 (384 MB) and Task 2 (128 MB) are consolidated so container allocations match actual usage]

Optimize container allocations based on usage

Task-Based Solution: Failure

[Diagram: each task runs as a Leader in one container with Standbys in others; when the machine hosting Task 3's Leader fails, a Standby for Task 3 is promoted to Leader]

Some systems cannot wait for new containers to start.

Task-Based Solution: Discovery

[Diagram: nodes N1 and N2 host replicas. Task 1: Leader at N1, Standby at N2. Task 2: Leader at N2, Standby at N1.]

Learn where everything runs, and what state each task is in.

Task-Based Solution: Scaling

[Diagram: tasks T1–T6 are redistributed as containers are added, by moving tasks rather than relaunching everything]

Comparing Solutions

Utilization
  Container Solution: application requirements define container size
  Task + Container Solution: tasks are distributed as needed to a minimal container set, per SLA

Fault Tolerance
  Container Solution: a new container is started
  Task + Container Solution: an existing task can assume a new state while waiting for the new container

Scaling
  Container Solution: workload is repartitioned and new containers are brought up
  Task + Container Solution: tasks are moved across containers

Discovery
  Container Solution: existence
  Task + Container Solution: existence and state

Comparing Solutions: Benefits of a Task-Based Solution

• Container reuse: minimize the overhead of container relaunch
• Fine-grained scheduling
• Task is the right level of abstraction (Task : Container :: Thread : Process)

Working at task granularity is powerful

We need a reactive approach to resource assignment

How can Helix help?

YARN/Mesos: containers bring flexibility in a machine
Helix: tasks bring flexibility in a container

Task Management with Helix

Application Lifecycle

Capacity Planning: allocating physical resources for your load
Provisioning: deploying and launching tasks
Fault Tolerance: staying available, ensuring success
State Management: determining what code should be running and where

Helix Overview: Cluster Roles

[Diagram: Controllers manage TASKS running on Nodes (Participants); Spectators observe the cluster]

Helix Controller: High-Level Overview

[Diagram: the Rebalancer takes Constraints and the current Nodes as input and produces a Task Assignment]

Example constraints: “single master”, “no more than 3 tasks per machine”
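
For illustration, a constraint like “single master” is usually captured in the state model definition that the rebalancer enforces. Below is a minimal sketch using Helix's StateModelDefinition.Builder; treat the exact builder calls as indicative, since they vary slightly across Helix versions.

import org.apache.helix.model.StateModelDefinition;

public class MasterSlaveModel {
  // Sketch: a MasterSlave state model in which each partition may have at
  // most one MASTER, i.e. the "single master" constraint above.
  public static StateModelDefinition build() {
    return new StateModelDefinition.Builder("MasterSlave")
        .addState("MASTER", 1)            // priority 1: most preferred state
        .addState("SLAVE", 2)
        .addState("OFFLINE")
        .initialState("OFFLINE")
        .addTransition("OFFLINE", "SLAVE")
        .addTransition("SLAVE", "MASTER")
        .addTransition("MASTER", "SLAVE")
        .addTransition("SLAVE", "OFFLINE")
        .upperBound("MASTER", 1)          // at most one master per partition
        .dynamicUpperBound("SLAVE", "R")  // slave count configured per resource
        .build();
  }
}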

Helix Controller: Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and the constraints, find an assignment of tasks to nodes.
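
To make the contract concrete, here is a minimal sketch of the kind of logic a rebalancer implements: a naive round-robin spread of partitions over live nodes. The simplified signature and string-based types are illustrative stand-ins for the Helix interfaces above, not the real API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinRebalancer {
  // Simplified stand-in for computeResourceMapping: assign each partition's
  // task to a live node in round-robin order.
  public Map<String, String> computeMapping(List<String> partitions,
                                            List<String> liveNodes) {
    Map<String, String> assignment = new HashMap<>();
    if (liveNodes.isEmpty()) {
      return assignment; // no nodes to assign to; the controller retries later
    }
    int i = 0;
    for (String partition : partitions) {
      assignment.put(partition, liveNodes.get(i++ % liveNodes.size()));
    }
    return assignment;
  }
}

A real rebalancer would also honor the state model constraints and the previous assignment, so tasks move as little as possible.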

What else do we need?

Helix Controller: What is Missing?

• Dynamic Container Allocation
• Container Isolation
• Automated Service Deployment
• Resource Utilization Monitoring

Helix Controller: Target Provider

Based on some constraints, determine how many containers are required in this system.

TargetProviderResponse evaluateExistingContainers(
    Cluster cluster,
    ResourceId resourceId,
    Collection<Participant> participants);

class TargetProviderResponse {
  List<ContainerSpec> containersToAcquire;
  List<Participant> containersToRelease;
  List<Participant> containersToStop;
  List<Participant> containersToStart;
}

Strategies: Fixed, CPU, Memory, Bin Packing

We’re working on integrating with monitoring systems in order to query for usage information.
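
For intuition, here is a sketch of the simplest strategy, Fixed: keep a constant number of containers and report the delta between the target and what is currently running. The string-based container handles are simplified stand-ins for the ContainerSpec/Participant types above.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class FixedTargetProvider {
  private final int targetCount;           // desired number of containers
  private final int memoryPerContainerMb;  // size to request for new ones

  public FixedTargetProvider(int targetCount, int memoryPerContainerMb) {
    this.targetCount = targetCount;
    this.memoryPerContainerMb = memoryPerContainerMb;
  }

  // Simplified stand-in for evaluateExistingContainers.
  public Response evaluate(Collection<String> runningContainers) {
    Response response = new Response();
    int delta = targetCount - runningContainers.size();
    if (delta > 0) {
      // Under target: ask the container provider to acquire the difference.
      for (int i = 0; i < delta; i++) {
        response.containersToAcquire.add(memoryPerContainerMb + "MB");
      }
    } else {
      // At or over target: release any surplus containers.
      runningContainers.stream().limit(-delta)
          .forEach(response.containersToRelease::add);
    }
    return response;
  }

  public static class Response {
    public final List<String> containersToAcquire = new ArrayList<>();
    public final List<String> containersToRelease = new ArrayList<>();
  }
}

A CPU- or memory-based provider would replace the fixed count with a target derived from usage metrics; a bin-packing provider would also decide which specific containers to keep.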

Helix Controller: Adding a Target Provider

[Diagram: the Target Provider feeds the Rebalancer alongside Constraints and Nodes when computing the Task Assignment]

How do we use the target provider response?

Helix Controller: Container Provider

Given the container requirements, ensure that the required number of containers is running.

ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);
ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);
ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);
ListenableFuture<Boolean> stopContainer(ContainerId containerId);

Implementations: YARN, Mesos, Local
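
As a sketch of the Local flavor, a provider can treat containers as plain local processes and wrap each lifecycle step in a Guava future. ContainerSpec and Participant are reduced to a command line and a string id here; only the ListenableFuture-based shape mirrors the interface above.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

public class LocalContainerProvider {
  private final Map<String, ProcessBuilder> allocated = new ConcurrentHashMap<>();
  private final Map<String, Process> running = new ConcurrentHashMap<>();
  private final AtomicLong nextId = new AtomicLong();

  // "Allocating" locally just reserves an id and remembers how to launch.
  public ListenableFuture<String> allocateContainer(String... command) {
    String id = "container-" + nextId.incrementAndGet();
    allocated.put(id, new ProcessBuilder(command));
    return Futures.immediateFuture(id);
  }

  public ListenableFuture<Boolean> startContainer(String id) {
    try {
      running.put(id, allocated.get(id).start()); // fork the participant process
      return Futures.immediateFuture(true);
    } catch (IOException e) {
      return Futures.immediateFailedFuture(e);
    }
  }

  public ListenableFuture<Boolean> stopContainer(String id) {
    Process process = running.remove(id);
    if (process != null) {
      process.destroy(); // ask the process to exit
    }
    return Futures.immediateFuture(true);
  }

  public ListenableFuture<Boolean> deallocateContainer(String id) {
    return Futures.immediateFuture(allocated.remove(id) != null);
  }
}

A YARN or Mesos provider keeps the same shape but forwards each call to the cluster manager's allocation API, completing the future when the cluster manager acknowledges.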

Helix Controller: Adding a Container Provider

[Diagram: the Rebalancer now consults the Target Provider and the Container Provider along with Constraints and Nodes]

Target Provider + Container Provider = Provisioner

Application Lifecycle with Helix and the Task Abstraction

Capacity Planning: Target Provider
Provisioning: Container Provider
Fault Tolerance: existing Helix controller (enhanced by the Provisioner)
State Management: existing Helix controller (enhanced by the Provisioner)

System Architecture

[Diagram: a Client submits a job to the Resource Provider. A controller container hosts the App Launcher, Provisioner, and Rebalancer; the Provisioner sends container requests to the Resource Provider, which launches participant containers, each running a Participant Launcher, a Helix Participant, and the App. The controller then assigns tasks to the participants.]

Helix + YARN: YARN Architecture

[Diagram: a Client submits a job to the Resource Manager, which launches an Application Master container. Node Managers report node status; the Application Master sends container requests, assigns work to containers, and collects status. Containers grab the App Package from HDFS or a common area.]

Helix + YARN: Helix + YARN Architecture

[Diagram: the same YARN flow, but the Application Master hosts the Helix Controller and its Rebalancer, each container runs a Helix Participant hosting the App, and the controller assigns tasks rather than raw work. Containers still grab the App Package from HDFS or a common area.]

Helix + Mesos: Mesos Architecture

[Diagram: the Mesos Master offers resources to a Scheduler, which replies with an offer response. A Mesos Slave on each slave machine reports node status and launches a Mesos Executor, grabbing the Executor Package from a common area.]

Helix + Mesos: Helix + Mesos Architecture

[Diagram: the same Mesos flow, but the Scheduler hosts the Helix Controller; each Mesos Executor grabs a Helix Executor Package from HDFS or a common area and runs a Helix Participant hosting the App, and the controller assigns tasks to the participants.]

Example

Distributed Document Store: Overview

[Diagram: a Master store (Oracle, Partitions 0–2) replicating to a Slave store (Oracle, Partitions 0–2); per-partition Backup jobs (P0, P1, P2) write to HDFS, where ETL jobs process the backups]

Distributed Document Store: YARN Example

[Diagram: the Client submits the job to the Resource Manager; the Application Master hosts the Helix Controller and Rebalancer; Node Managers report node status and run containers in which a Helix Participant hosts Oracle Partitions 0–1, the P1 Backup task, and an ETL task]

Distributed Document Store: YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL]  # the task containers
serviceConfigMap: {  # the TargetProvider specification
  DB: { num_containers: 3, memory: 1024 }, ...
  ETL: { time_to_complete: 5h, ... }, ...
}
servicePackageURIMap: {
  DB: 'file://path/to/db-service-pkg.tar', ...
}
...

Distributed Document Store: Service/Container Implementation

public class MyQueuerService extends StatelessParticipantService {
  @Override
  public void init() { ... }

  @Override
  public void onOnline() { ... }

  @Override
  public void onOffline() { ... }
}

Distributed Document Store: Task Implementation

public class BackupTask extends Task {
  @Override
  public ListenableFuture<Status> start() { ... }

  @Override
  public ListenableFuture<Status> cancel() { ... }

  @Override
  public ListenableFuture<Status> pause() { ... }

  @Override
  public ListenableFuture<Status> resume() { ... }
}
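
One plausible way to flesh out such a task, assuming Guava's ListeningExecutorService; the Status enum and the backup call are placeholders rather than the real Helix types:

import java.util.concurrent.Executors;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class SimpleBackupTask {
  enum Status { COMPLETED, CANCELLED }

  private final ListeningExecutorService executor =
      MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());
  private volatile boolean cancelled = false;

  // start() hands the long-running backup to an executor and immediately
  // returns a future the framework can watch for completion.
  public ListenableFuture<Status> start() {
    return executor.submit(() -> {
      copyPartitionToBackupStore();
      return cancelled ? Status.CANCELLED : Status.COMPLETED;
    });
  }

  public ListenableFuture<Status> cancel() {
    cancelled = true; // cooperative: the running backup checks this flag
    return Futures.immediateFuture(Status.CANCELLED);
  }

  private void copyPartitionToBackupStore() {
    // placeholder for the real backup work, e.g. streaming the partition
    // to HDFS in chunks, checking the cancelled flag between chunks
  }
}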

Distributed Document Store: State Model-Style Callbacks

public class StoreStateModel extends StateModel {
  public void onBecomeMasterFromSlave() { ... }

  public void onBecomeSlaveFromMaster() { ... }

  public void onBecomeSlaveFromOffline() { ... }

  public void onBecomeOfflineFromSlave() { ... }
}
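
For context, a participant process typically registers such a state model with its Helix manager so the controller's transitions reach these callbacks. A sketch against the long-standing Helix API; exact factory signatures differ across Helix versions:

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.participant.statemachine.StateModelFactory;

public class ParticipantBootstrap {
  static class StoreStateModelFactory extends StateModelFactory<StoreStateModel> {
    @Override
    public StoreStateModel createNewStateModel(String partitionName) {
      return new StoreStateModel(); // one state model instance per partition
    }
  }

  public static void main(String[] args) throws Exception {
    // Connect this process to the cluster as a participant.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "DocumentStoreCluster", "node1_12918",
        InstanceType.PARTICIPANT, "zk-host:2181");

    // Route MasterSlave transitions to StoreStateModel instances.
    manager.getStateMachineEngine().registerStateModelFactory(
        "MasterSlave", new StoreStateModelFactory());

    manager.connect(); // start receiving transitions from the controller
  }
}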

Distributed Document Store: Spectator (for Discovery)

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}

Helix at LinkedIn

[Diagram, “In Production”: user writes land in Oracle-backed DB instances; Change Capture feeds Change Consumers and an Index/Search Index; a Data Replicator and Backup/Restore run alongside; ETL moves data into HDFS for Analytics]

Helix at LinkedIn: In Production

• Over 1000 instances covering over 30000 database partitions
• Over 1000 instances for change capture consumers
• As many as 500 instances in a single Helix cluster

(All numbers are per-datacenter.)

Summary

•Container abstraction has become a huge win • With Helix, we can go a step further and make

tasks the unit of work • With the TargetProvider and ContainerProvider

abstractions, any popular provisioner can be plugged in

Questions?

Jason zzhang@apache.org

Kanak kanak@apache.org

Website helix.apache.org

Dev Mailing List dev@helix.apache.org

User Mailing List user@helix.apache.org

Twitter @apachehelix