SwarmKit in Theory and Practice


Transcript of SwarmKit in Theory and Practice

Page 1: SwarmKit in Theory and Practice

Container Orchestration from Theory to Practice

Laura Frank & Stephen Day
Open Source Summit, Los Angeles
September 2017

v0

Page 2: SwarmKit in Theory and Practice

Stephen Day
Docker
[email protected]
@stevvooe

Laura Frank
[email protected]
@rhein_wein

Page 3: SwarmKit in Theory and Practice


Agenda

- Understanding the SwarmKit object model
- Node topology
- Distributed consensus and Raft

Page 4: SwarmKit in Theory and Practice

SwarmKit
An open source framework for building orchestration systems
https://github.com/docker/swarmkit

Page 5: SwarmKit in Theory and Practice

SwarmKit Object Model

Laura Frank & Stephen Day
Container Orchestration from Theory to Practice

Page 6: SwarmKit in Theory and Practice

Orchestration
A control system for your cluster

[Diagram: feedback control loop. The orchestrator O compares the desired state D against the cluster state St and applies operations Δ to the cluster C.]

D = Desired State
O = Orchestrator
C = Cluster
St = State at time t
Δ = Operations to converge S to D

https://en.wikipedia.org/wiki/Control_theory

Page 7: SwarmKit in Theory and Practice

Convergence
A functional view

D = Desired State
O = Orchestrator
C = Cluster
St = State at time t

f(D, S_{n-1}, C) → S_n | min(S − D)

The next state S_n is a function of the desired state, the previous observed state, and the cluster itself, chosen to minimize the remaining difference S − D.
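A minimal Go sketch of this convergence function as one reconciliation step; all names and types here are illustrative, not SwarmKit's actual API:

package main

import "fmt"

// State maps a service name to its replica count; a stand-in for S.
type State map[string]int

// Op is one operation in the set Δ: start (positive) or stop (negative) tasks.
type Op struct {
	Service string
	Delta   int
}

// reconcile plays the role of f(D, S_{n-1}, C) -> S_n: it diffs the desired
// state against the last observed state and emits the operations that
// minimize the remaining difference S - D.
func reconcile(desired, observed State) []Op {
	var ops []Op
	for svc, want := range desired {
		if have := observed[svc]; have != want {
			ops = append(ops, Op{Service: svc, Delta: want - have})
		}
	}
	return ops
}

func main() {
	desired := State{"redis": 3}
	observed := State{"redis": 1}
	fmt.Println(reconcile(desired, observed)) // [{redis 2}]: start two more tasks
}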

Page 8: SwarmKit in Theory and Practice

Observability and Controllability
The Problem

[Diagram: spectrum from low observability to high observability: failure → process state → user input]

Page 9: SwarmKit in Theory and Practice

Data Model Requirements

- Represent difference in cluster state
- Maximize observability
- Support convergence
- Do this while being extensible and reliable

Page 10: SwarmKit in Theory and Practice

Show me your data structures and I’ll show you your orchestration system

Page 11: SwarmKit in Theory and Practice

Services

- Express desired state of the cluster
- Abstraction to control a set of containers
- Enumerates resources, network availability, placement
- Leave the details of runtime to the container process
- Implement these services by distributing processes across a cluster (sketched below)

[Diagram: service processes distributed across Node 1, Node 2, and Node 3]
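A simplified Go sketch of the shape of a service object, loosely modeled on the protobuf-generated types in github.com/docker/swarmkit; the field names are illustrative, not the exact API:

// ServiceSpec expresses the desired state the user asked for.
type ServiceSpec struct {
	Name     string
	Image    string   // runtime details beyond this are left to the container process
	Replicas uint64   // how many tasks to keep running
	Networks []string // network availability
	// ... resources, placement constraints, etc.
}

// Service wraps the spec; current state is observed on the tasks
// the service expands into, not stored here.
type Service struct {
	ID   string
	Spec ServiceSpec // desired state, written through the API
}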

Page 12: SwarmKit in Theory and Practice

Declarative

$ docker network create -d overlay backend
vpd5z57ig445ugtcibr11kiiz

$ docker service create -p 6379:6379 --network backend redis
pe5kyzetw2wuynfnt7so1jsyl

$ docker service scale serene_euler=3
serene_euler scaled to 3

$ docker service ls
ID            NAME          REPLICAS  IMAGE  COMMAND
pe5kyzetw2wu  serene_euler  3/3       redis

Page 13: SwarmKit in Theory and Practice

Reconciliation
Spec → Object

[Diagram: the Spec holds the desired state, the Object holds the current state; reconciliation drives the Object toward the Spec]

Page 14: SwarmKit in Theory and Practice


Demo Time!

Page 15: SwarmKit in Theory and Practice

Task Model

Prepare: set up resources
Start: start the task
Wait: wait until the task exits
Shutdown: stop the task, cleanly

[Diagram: the runtime drives each task through these phases in order]
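This lifecycle maps naturally onto an interface. Here is a simplified sketch of that contract; SwarmKit's real agent/exec.Controller interface has additional methods:

import "context"

// Controller drives a single task through its runtime lifecycle;
// a simplified sketch of SwarmKit's agent/exec.Controller interface.
type Controller interface {
	Prepare(ctx context.Context) error  // set up resources for the task
	Start(ctx context.Context) error    // start the task
	Wait(ctx context.Context) error     // block until the task exits
	Shutdown(ctx context.Context) error // stop the task, cleanly
}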

Page 16: SwarmKit in Theory and Practice

Task Model
Atomic scheduling unit of SwarmKit

[Diagram: the Orchestrator reconciles the Spec (desired state) against the Object (current state), producing Task0, Task1, …, Taskn for the Scheduler]

Page 17: SwarmKit in Theory and Practice

Data Flow

[Diagram: a ServiceSpec containing a TaskSpec is submitted to the Manager; the Manager stores the Service and expands it into Tasks, each carrying a copy of the TaskSpec, which are dispatched to the Worker]
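In code terms, the manager's expansion step might look like this hypothetical sketch: one task per replica, each inheriting the service's TaskSpec (all names are illustrative):

import "fmt"

// TaskSpec is the per-task template embedded in the service's spec.
type TaskSpec struct{ Image string }

// Task is one unit of work derived from a service.
type Task struct {
	ID        string
	ServiceID string
	Spec      TaskSpec // copied from the service's TaskSpec at creation time
}

// expand produces one task per desired replica, each carrying the TaskSpec.
func expand(serviceID string, replicas uint64, spec TaskSpec) []Task {
	tasks := make([]Task, 0, replicas)
	for i := uint64(0); i < replicas; i++ {
		tasks = append(tasks, Task{
			ID:        fmt.Sprintf("%s.%d", serviceID, i+1), // e.g. "redis.1"
			ServiceID: serviceID,
			Spec:      spec,
		})
	}
	return tasks
}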

Page 18: SwarmKit in Theory and Practice

Consistency

Page 19: SwarmKit in Theory and Practice


Field Ownership

Only one component of the system can write to a field


Page 20: SwarmKit in Theory and Practice

Task State

[Diagram: task lifecycle state machine. Manager-owned states: New → Allocated → Assigned. Worker-owned pre-run and run states: Preparing → Ready → Starting → Running. Terminal states: Complete, Shutdown, Failed, Rejected.]

Page 21: SwarmKit in Theory and Practice

Field Handoff
Task Status

State         Owner
< Assigned    Manager
>= Assigned   Worker
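Because the states are ordered, the ownership rule reduces to a comparison. A hedged sketch using the states named on these slides; SwarmKit's real TaskState enum differs in detail:

// Illustrative subset of the task states, ordered so that ownership
// can be decided by comparison.
type TaskState int

const (
	StateNew TaskState = iota
	StateAllocated
	StateAssigned // the handoff point
	StatePreparing
	StateReady
	StateStarting
	StateRunning
)

// owner reports which component may write the task's status: the manager
// owns every state below Assigned, the worker everything from Assigned on.
func owner(s TaskState) string {
	if s < StateAssigned {
		return "manager"
	}
	return "worker"
}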

Page 22: SwarmKit in Theory and Practice

Observability and Controllability
The Problem

[Diagram: spectrum from low observability to high observability: failure → process state → user input]

Page 23: SwarmKit in Theory and Practice

Orchestration
A control system for your cluster

[Diagram: feedback control loop. The orchestrator O compares the desired state D against the cluster state St and applies operations Δ to the cluster C.]

D = Desired State
O = Orchestrator
C = Cluster
St = State at time t
Δ = Operations to converge S to D

https://en.wikipedia.org/wiki/Control_theory

Page 24: SwarmKit in Theory and Practice

Task Model
Atomic scheduling unit of SwarmKit

[Diagram: the Orchestrator reconciles the Spec (desired state) against the Object (current state), producing Task0, Task1, …, Taskn for the Scheduler]

Page 25: SwarmKit in Theory and Practice

SwarmKit Doesn’t Quit!

Page 26: SwarmKit in Theory and Practice

Node Topology in SwarmKit

Laura Frank & Stephen Day
Container Orchestration from Theory to Practice

Page 27: SwarmKit in Theory and Practice

We’ve got a bunch of nodes… now what?

Page 28: SwarmKit in Theory and Practice

Push Model vs Pull Model

[Diagram, push model: a discovery system (e.g. ZooKeeper) sits between Manager and Worker; 1: the worker registers, 2: the manager discovers it, 3: the manager pushes the payload]

[Diagram, pull model: the worker connects directly to the manager; registration and payload travel over the same connection]

Page 29: SwarmKit in Theory and Practice

Push Model

Pros:
- Provides better control over communication rate: managers decide when to contact workers

Cons:
- Requires a discovery mechanism
- More failure scenarios
- Harder to troubleshoot

Pull Model

Pros:
- Simpler to operate: workers connect to managers and don't need to bind
- Can easily traverse networks
- Easier to secure
- Fewer moving parts

Cons:
- Workers must maintain a connection to managers at all times

Page 30: SwarmKit in Theory and Practice

Push vs Pull

• SwarmKit adopted the Pull model
• Favored operational simplicity
• Engineered solutions to provide rate control in pull mode

Page 31: SwarmKit in Theory and Practice

Rate Control in a Pull Model

Page 32: SwarmKit in Theory and Practice

Rate Control: Heartbeats

• Manager dictates heartbeat rate to Workers
• Rate is configurable (not by the end user)
• Managers agree on the same rate by consensus via Raft
• Managers add jitter so pings are spread over time (avoiding bursts)

[Diagram: Worker: "Ping?" Manager: "Pong! Ping me back in 5.2 seconds"]
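A minimal sketch of the manager's side of that exchange (hypothetical names; the real logic lives in SwarmKit's dispatcher). With a 5-second base period this yields replies like the "ping me back in 5.2 seconds" in the diagram:

import (
	"math/rand"
	"time"
)

// nextHeartbeat returns when a worker should ping again: the cluster-wide
// base period (agreed on by the managers via Raft) plus random jitter, so
// pings spread out over time instead of arriving in bursts.
func nextHeartbeat(base time.Duration) time.Duration {
	jitter := time.Duration(rand.Int63n(int64(base / 4))) // up to 25% extra
	return base + jitter
}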

Page 33: SwarmKit in Theory and Practice

Rate Control: Workloads

• Worker opens a gRPC stream to receive workloads
• Manager can send data whenever it wants to
• Manager will send data in batches
• Changes are buffered and sent in batches of 100 or every 100 ms, whichever occurs first
• Adds little delay (at most 100 ms) but drastically reduces the amount of communication

[Diagram: the Worker asks "Give me work to do" and the Manager streams batches:
100 ms - batch of 12
200 ms - batch of 26
300 ms - batch of 32
340 ms - batch of 100
360 ms - batch of 100
460 ms - batch of 42
560 ms - batch of 23]
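A hedged Go sketch of that batching rule; the Change type and channel plumbing are illustrative stand-ins, not SwarmKit's actual dispatcher code:

import "time"

// Change is a placeholder for one object update sent to a worker.
type Change struct{ ID string }

// batch buffers incoming changes and flushes them when 100 have accumulated
// or 100 ms have passed, whichever comes first.
func batch(ch <-chan Change, send func([]Change)) {
	const (
		maxBatch = 100
		maxWait  = 100 * time.Millisecond
	)
	var buf []Change
	ticker := time.NewTicker(maxWait)
	defer ticker.Stop()
	for {
		select {
		case c, ok := <-ch:
			if !ok { // input closed: flush what is left and stop
				if len(buf) > 0 {
					send(buf)
				}
				return
			}
			buf = append(buf, c)
			if len(buf) >= maxBatch { // size threshold reached
				send(buf)
				buf = nil
			}
		case <-ticker.C: // time threshold reached
			if len(buf) > 0 {
				send(buf)
				buf = nil
			}
		}
	}
}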

Page 34: SwarmKit in Theory and Practice

Replication
Running multiple managers for high availability

Page 35: SwarmKit in Theory and Practice

Replication

[Diagram: three Managers (one Leader, two Followers) with a Worker connected to a Follower]

• Worker can connect to any Manager
• Followers will forward traffic to the Leader

Page 36: SwarmKit in Theory and Practice

Replication

[Diagram: Workers spread across the Followers, each Follower holding a single connection to the Leader]

• Followers multiplex all workers to the Leader using a single connection
• Backed by gRPC channels (HTTP/2 streams)
• Reduces Leader networking load by spreading the connections evenly

Example: On a cluster with 10,000 workers and 5 managers, each manager only has to handle about 2,000 connections. Each follower will forward its 2,000 workers using a single socket to the leader.

Page 37: SwarmKit in Theory and Practice

Replication

[Diagram: the Leader fails; Workers stay connected to the surviving Managers]

• Upon Leader failure, a new one is elected
• All managers start redirecting worker traffic to the new Leader
• Transparent to workers

Page 38: SwarmKit in Theory and Practice

Replication

[Diagram: a former Follower is now the Leader; the other managers redirect worker traffic to it]

• Upon Leader failure, a new one is elected
• All managers start redirecting worker traffic to the new Leader
• Transparent to workers

Page 39: SwarmKit in Theory and Practice

Replication

[Diagram: Manager 1 (Leader), Manager 2 and Manager 3 (Followers); the Worker holds the address list: Manager 1 Addr, Manager 2 Addr, Manager 3 Addr]

• Manager sends the list of all managers' addresses to Workers
• When a new manager joins, all workers are notified
• Upon manager failure, workers will reconnect to a different manager

Page 40: SwarmKit in Theory and Practice

Replication

[Diagram: the same cluster; the Worker keeps its full address list as managers change]

• Manager sends the list of all managers' addresses to Workers
• When a new manager joins, all workers are notified
• Upon manager failure, workers will reconnect to a different manager

Page 41: SwarmKit in Theory and Practice

Replication

[Diagram: the Worker's manager fails and the Worker reconnects to a random manager from its list]

• Manager sends the list of all managers' addresses to Workers
• When a new manager joins, all workers are notified
• Upon manager failure, workers will reconnect to a different manager, chosen at random (as sketched below)
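A minimal illustrative sketch of that reconnect behavior; dial is a stand-in for the worker's gRPC session setup, not SwarmKit's actual code:

import (
	"errors"
	"math/rand"
)

// connect tries the known manager addresses in random order until one
// accepts, so reconnecting workers spread evenly across the survivors.
func connect(addrs []string, dial func(addr string) error) (string, error) {
	for _, i := range rand.Perm(len(addrs)) {
		if err := dial(addrs[i]); err == nil {
			return addrs[i], nil
		}
	}
	return "", errors.New("no manager reachable")
}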

Page 42: SwarmKit in Theory and Practice

Presence
Scalable presence in a distributed environment

Page 43: SwarmKit in Theory and Practice

Presence

• Leader commits Worker state (Up vs Down) into Raft
  - Propagates to all managers
  - Recoverable in case of leader re-election
• Heartbeat TTLs kept in Leader memory
  - Too expensive to store "last ping time" in Raft: every ping would result in a quorum write
  - Leader keeps worker ↔ TTL in a heap (time.AfterFunc)
  - Upon leader failover, workers are given a grace period to reconnect
• Workers are considered Unknown until they reconnect; if they do, they move back to Up, and if they don't, they move to Down (see the sketch below)
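A hedged sketch of that TTL bookkeeping with time.AfterFunc; the type and methods are illustrative, and SwarmKit's dispatcher implements the details differently:

import (
	"sync"
	"time"
)

// ttlTracker keeps per-worker heartbeat deadlines in leader memory with
// time.AfterFunc instead of writing every ping to Raft, which would cost
// a quorum write per heartbeat.
type ttlTracker struct {
	mu     sync.Mutex
	timers map[string]*time.Timer // worker ID -> expiration timer
	onDown func(workerID string)  // called to commit the Down state to Raft
}

func newTTLTracker(onDown func(string)) *ttlTracker {
	return &ttlTracker{timers: make(map[string]*time.Timer), onDown: onDown}
}

// heartbeat resets the worker's TTL; if the timer fires before the next
// ping arrives, the worker is reported Down.
func (t *ttlTracker) heartbeat(workerID string, ttl time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if timer, ok := t.timers[workerID]; ok {
		timer.Stop()
	}
	t.timers[workerID] = time.AfterFunc(ttl, func() { t.onDown(workerID) })
}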

Page 44: SwarmKit in Theory and Practice

Distributed Consensus

Laura Frank & Stephen Day
Container Orchestration from Theory to Practice

Page 45: SwarmKit in Theory and Practice

The Raft Consensus Algorithm

Orchestration systems typically use some kind of service to maintain state in a distributed system:

- etcd
- ZooKeeper
- …

Many of these services are backed by the Raft consensus algorithm.

Page 46: SwarmKit in Theory and Practice

SwarmKit and Raft
Docker chose to implement the algorithm directly:

- Fast
- Don't have to set up a separate service to get started with orchestration
- Differentiator between SwarmKit/Docker and other orchestration systems

Page 47: SwarmKit in Theory and Practice

demo.consensus.group

secretlivesofdata.com

Page 48: SwarmKit in Theory and Practice

Consistency

Page 49: SwarmKit in Theory and Practice

Sequencer

● Every object in the store has a Version field
● Version stores the Raft index when the object was last updated
● Updates must provide a base Version; they are rejected if it is out of date
● Similar to CAS (compare-and-swap)
● Also exposed through API calls that change objects in the store

Page 50: SwarmKit in Theory and Practice

Versioned Updates

service := getCurrentService()       // read the current object, including its Version
spec := service.Spec                 // copy the spec
spec.Image = "my.serv/myimage:mytag" // modify the desired state
update(spec, service.Version)        // submit with the base Version; rejected if stale
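Since an update is rejected when its base Version is stale, callers typically retry with a fresh read. A hedged sketch reusing the slide's hypothetical helpers (getCurrentService and update; ErrSequenceConflict is also a stand-in, not SwarmKit's exact error):

import "errors"

// ErrSequenceConflict stands in for the store's version-conflict error.
var ErrSequenceConflict = errors.New("sequence conflict: base version is out of date")

// updateImage retries the read-modify-write until its compare-and-swap
// on Version succeeds.
func updateImage(image string) error {
	for {
		service := getCurrentService() // fresh read, carries the latest Version
		spec := service.Spec
		spec.Image = image
		err := update(spec, service.Version)
		if err == nil {
			return nil // our base Version matched; the write was accepted
		}
		if !errors.Is(err, ErrSequenceConflict) {
			return err // a real failure, not a lost race
		}
		// another writer bumped the Version first: re-read and try again
	}
}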

Page 51: SwarmKit in Theory and Practice

Sequencer

Original object:

  Service ABC
  Spec: Replicas = 4, Image = registry:2.3.0, ...
  Version = 189   (the Raft index when it was last updated)

Page 52: SwarmKit in Theory and Practice

Sequencer

Original object:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.3.0, ...
  Version = 189

Update request:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.4.0, ...
  Version = 189

Page 53: SwarmKit in Theory and Practice

Sequencer

Original object:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.3.0, ...
  Version = 189

Update request:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.4.0, ...
  Version = 189

The base Version in the update request (189) matches the stored Version, so the update is accepted.

Page 54: SwarmKit in Theory and Practice

Sequencer

Updated object:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.4.0, ...
  Version = 190

Page 55: SwarmKit in Theory and Practice

Sequencer

Updated object:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.4.0, ...
  Version = 190

Update request:
  Service ABC
  Spec: Replicas = 5, Image = registry:2.3.0, ...
  Version = 189

Page 56: SwarmKit in Theory and Practice

Sequencer

Updated object:
  Service ABC
  Spec: Replicas = 4, Image = registry:2.4.0, ...
  Version = 190

Update request:
  Service ABC
  Spec: Replicas = 5, Image = registry:2.3.0, ...
  Version = 189

The base Version in the update request (189) no longer matches the stored Version (190), so this update is rejected.

Page 57: SwarmKit in Theory and Practice

github.com/docker/swarmkit
github.com/coreos/etcd (Raft)

Page 58: SwarmKit in Theory and Practice

THANK YOU