Getting Rid of Zookeeper - · PDF fileZookeeper to “detect” the current leading...

Post on 02-Feb-2018

235 views 0 download

Transcript of Getting Rid of Zookeeper - · PDF fileZookeeper to “detect” the current leading...

Getting Rid of Zookeeper

MesosCon Asia 2016

1

Jay GuoSoftware Developer @IBMguojiannan@cn.ibm.com

Kapil AryaMesos Committer @Mesosphere

kapil@mesosphere.io

Motivation

2

Zookeeper is...

● Mature● Feature-rich● ...

3

But!

● Primitive K/V store○ Provide your own tooling for other abstractions!

● Heavy● Hard dependencies● Language binding instead of RESTfull API● ...

4

It’s all about having options!

5

● Chocolate

● Strawberry

● Vanilla

● ...

Mesos HA:An Overview

6

High Availability

First of all, we need a Distributed Key-Value storage...

7

Mesos HA

● At least three Mesos Masters● One leading Master

○ Leader election○ Leader detection

● Replicated Log

Zookeeper as the distributed key-value store

8

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

leader

Zookeeper cluster

Mesos Agents / Frameworks

Mesos HA: Leader Election

● All Masters “contend” to be the leader!

9

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Contend

Contend Contend

Mesos HA: Leader Election

● All Masters “contend” to be the leader!

● Only one succeeds; others fail

10

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Fail

Fail Success

Mesos HA: Leader Election

● All Masters “contend” to be the leader!

● Only one succeeds; others fail ● We have a leading Masters!

11

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Watch

Watch Hold

Mesos HA: Losing a Leader

● Suppose the leading Master is “lost”

12

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Watch

Watch

Master connection lost

Mesos HA: Losing a Leader

● Suppose the leading Master is “lost”

● All other Masters are notified

13

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Notify

Notify

Master connection lost

Mesos HA: Losing a Leader

● Suppose the leading Master is “lost”

● All other Masters are notified● The remaining Masters

contend again

14

ZKZKZK

MesosMaster

MesosMaster

Contend

Contend

MesosMaster

Mesos HA: Losing a Leader

● Suppose the leading Master is “lost”

● All other Masters are notified● The remaining Masters

contend again● One of them succeeds

15

ZKZKZK

MesosMaster

MesosMaster

Success

Fail

MesosMaster

Mesos HA: Losing a Leader

● Suppose the leading Master is “lost”

● All other Masters are notified● The remaining Masters

contend again● One of them succeeds

● A new leader is elected!

16

ZKZKZK

MesosMaster

MesosMaster

Watch

Hold

MesosMaster

What about Agents/Frameworks?

17

Mesos HA: Leader Detection

● Framework/Agent connects to Zookeeper to “detect” about the current leading Master

18

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Watch

Watch Hold

Mesos Agents/ Frameworks

Detect

Mesos HA: Leader Detection

● Framework/Agent connects to Zookeeper to “detect” about the current leading Master

● Zookeeper provides Master’s location○ I.e. IP:Port

19

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Watch

Watch Hold

IP:Port

Mesos Agents/ Frameworks

Mesos HA: Leader Detection

● Framework/Agent connects to Zookeeper to “detect” the current leading Master

● Zookeeper provides Master’s location

● Framework/Agent connectsto the “leader”

20

ZKZKZK

MesosMaster

MesosMaster

MesosMaster

Watch

Watch Hold

Connect

Mesos Agents/ Frameworks

What about Replicated Log?

Replicated Log lets you create replicated fault-tolerant append-only logs. The Mesos master uses Replicated Log to store cluster state in a replicated, durable way.

21

Mesos HA: Replicated Log

● Each replica registers its pid into ZK and maintain the presence.

22

ZKZKZK

Replica

Replica

Register & hold

Register & hold

Mesos HA: Replicated Log

23

● Each replica registers its pid into ZK and maintain the presence.

● When new replica joins the cluster, existing ones get notified and get to know the pid of new replica.

ZKZKZK

Replica

ReplicaReplica

Notified with info of new replica

registerNotified with info of new replica

Mesos HA: Replicated Log

24

● Each replica registers its own pid into ZK and maintain the presence.

● When new replica joins the cluster, existing ones get notified and get to know the pid of new replica.

● Every replica knows all nodes in the cluster and do Paxos.

ZKZKZK

Replica

ReplicaReplica

Paxos

Paxos

Paxos

ReplacingZookeeper

25

? = ZK Etcd Consul|||| ...

ZKZK?

MesosMaster

MesosMaster

MesosMaster

leader

DistributedKV Store

● Master Contender for leader election

Three Key Components

26

ZKZK?

MesosMaster

Contender

DistributedKV Store

bool contend();

● Master Contender for leader election

● Master Detector for discovery

Three Key Components

27

ZKZK?

MesosMaster

ContenderDetector

DistributedKV Store

bool contend();

MasterInfo detect(MasterInfo previous);

● Master Contender for leader election

● Master Detector for discovery

● PIDGroup for initialization

Three Key Components

28

bool contend();

MasterInfo detect(MasterInfo previous);

void initialize(pid_t pid); ZKZK?

MesosMaster

ContenderDetector PIDGroup

DistributedKV Store

A Case for Modularization!

29

● Already a clear-cut interfaces between:○ Master and Contender○ Agent and Detector○ Framework and Detector

● For new distributed KV store implementation, we just write the module without having to modify Mesos itself! ZKZK?

MesosMaster

ContenderDetector PIDGroup

DistributedKV Store

Let’s Talk about Modules!

30

MesosModules

● Module/Plugin/Extension● Add/replace a Mesos component

○ Isolators○ Authenticators○ …

● Hook modules:○ Listen to interesting events○ Modify/enhance certain code paths○ Prepare/enhance task environment○ ...

31

● Compiled as shared libraries○ E.g., libmesos_network_overlay.so

● Specified when launching Master/Agent/Frameworkmesos-agent.sh <master-parameters>

--modules=file:///path/to/modules.json

--isolation=”my_isolator”

● Gets loaded during initialization○ E.g., the ”my_isolator” isolator will be loaded into the Agent to

provide task isolation

How are Modules Used?

32

Community Modules

33

I just wrote a Mesos module that provides a really cute feature.

How do I make it useful for others!

Modules are Tricky!

34

● Developing● Building● Testing● Using● Hosting

● How can we make it all better for community?

Writing Modules

● Doesn’t require intimate Mesos knowledge○ Just the details of the subsystem being implemented (e.g., Isolators)

● Familiarity with Mesos model is required○ E.g., libprocess, events, futures and promises, etc.

● Closely tied with Mesos version○ To ensure mutual compatibility

35

Building Modules: Issues

● Build Mesos first!○ Install all Mesos dependencies○ Takes a long time to build○ Version dependencies

36

Building Modules: Good News!

● Starting Mesos 1.0 release, pre-compiled Mesos deb/rpm packages contain everything needed to build modules

37

Testing Modules

38

I just wrote a simple Mesos module that provide a cute feature and I know how to build it!

Can I write unit tests for it?

Testing Modules

● Key questions:○ How to get good test coverage?○ How can we solicit help from community?

● Good news!○ Efforts on the way to create a “libmesos_test” library that can be used to

create/run gmock style tests just like with Mesos itself.

39

How do we, as a community, make third-party modules available for general consumption?

While making sure the developers and consumers can seamlessly test/integrate into their environments!

Community-Driven Modules

40

Community Modules: Proposal

● A central registry that contains pointers:○ E.g., github.com/mesos/modules ○ Each module (or a set of related modules) in its own repository

● Make Mesos version-specific binary rpm/deb modules available○ E.g., lib_my_module_<module-version>_<mesos_version>.so

41

Module CI: Coming Soon!

● Builds binary packages for every registered module○ Across a given set of Mesos versions○ Work-in-progress!

● Automatic build/testing for upcoming Mesos release○ Catch incompatibilities sooner!

● Run tests!

42

Let’s take a look at Etcd!

43

Etcd:A Distributed KV Store

● HTTP API (no language bindings)● May already exist in your environments

44

Etcd in a Mesos Cluster

45

● Create Etcd-specific modules for:○ Master detector○ Master Contender○ PIDGroup

● No need to modify/rebuild Mesos

ZKZK

MesosMaster

ContenderDetector PIDGroup

DistributedKV Store

Etcd

Again, it’s all about having options!

46

● Chocolate

● Strawberry

● Vanilla

● ...

Again, it’s all about having options!

47

● ChocolateZookeeper

● StrawberryEtcd

● VanillaConsul

● ...

Demo!

48

Module CI:A Glimpse!

49

Acknowledgments!

● Shuai Lin● Cody Maloney● Benjamin Hindman● Joseph Wu

50