A Python Petting Zoo


Page 1: A Python Petting Zoo

A Python Petting Zoo
Python and ZooKeeper

Devon Jones, Senior Software Engineer

knewt.ly/pettingzoo-slides

Page 2: A Python Petting Zoo

Knewton is an education technology company whose goal is to bring adaptive education to the masses. Knewton makes it possible to break courses into tiny parts that are delivered to each student as personalized, real-time recommendations. Knewton recommends the best work for an individual learner by analyzing data on what we know about the student, similar students, the learning objective, and the content itself at a given point in time. The Knewton platform today has tens of thousands of students, will have over 600,000 students starting in September 2012, and will soon have millions.

Page 3: A Python Petting Zoo

What is this about?

● What ZooKeeper is and how it is useful
● State of ZooKeeper on Python
● The release of PettingZoo, Knewton's ZooKeeper recipes for managing a distributed machine learning cluster

Page 4: A Python Petting Zoo

What is ZooKeeper?

According to ZooKeeper's Apache site:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

Right, but what is it?

Page 5: A Python Petting Zoo

What is ZooKeeper?

ZooKeeper is a distributed, filesystem-like store built on a Paxos-style consensus protocol (ZooKeeper's own variant is called Zab), with a few valuable features:
● No single point of failure
● Strictly ordered, observable state
● Events
● Sequential and ephemeral primitives
● ACLs

Page 6: A Python Petting Zoo

Paxos
http://groups.csail.mit.edu/tds/papers/DePrisco/WDAG97.pdf
● Distributed finite state machine
● Pairwise connections among hosts
● Rounds of consensus propositions conducted by a leader
● Time-bound contracts on interactions
● Useful to have internal concurrency contracts

Page 7: A Python Petting Zoo

What is ZooKeeper for?

ZooKeeper is a platform for creating protocols for synchronization of distributed systems. Uses include:
● Distributed configuration
● Queues
● Implementation of distributed concurrency primitives such as locks, barriers, latches, counters, etc.

In short, it's a system for managing shared state between distributed systems.

Page 8: A Python Petting Zoo

Consistency Guarantees

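ZooKeeper makes a number of promises:
● Sequential consistency: updates from a client are applied in the order they were sent
● Atomicity: updates either succeed or fail; there are no partial results
● Single system image: a client sees the same view of the service regardless of which server it connects to
● Reliability: once an update has been applied, it persists until a client overwrites it
● Timeliness: a client's view of the system is up to date within a certain time bound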

Page 9: A Python Petting Zoo

Watches

● Can be set by any read operation
● Will fire an event to the client that set them, once and only once
● Can be set on nodes or on their data
● Will always be sent to you in a fixed order
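To make the once-and-only-once rule concrete, here is a minimal in-memory sketch. `ToyZnodeStore` is a hypothetical stand-in, not a ZooKeeper client or any binding's API; it models only the rule that a watch registered by a read fires on the next change and is then discarded.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# (event_type, path) -> None
WatchFn = Callable[[str, str], None]


class ToyZnodeStore:
    """Hypothetical in-memory model of ZooKeeper data watches (not a client)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[WatchFn]] = defaultdict(list)

    def get(self, path: str, watch: Optional[WatchFn] = None) -> bytes:
        # Any read may leave a watch behind for the next change.
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path, b"")

    def set(self, path: str, value: bytes) -> None:
        self._data[path] = value
        # pop() removes the registered watches, so each fires exactly once.
        for fn in self._watches.pop(path, []):
            fn("changed", path)


events: List[tuple] = []
store = ToyZnodeStore()
store.get("/config", watch=lambda ev, p: events.append((ev, p)))
store.set("/config", b"v1")  # the watch fires here
store.set("/config", b"v2")  # nothing fires: the watch was consumed
print(events)  # [('changed', '/config')]
```

Because a watch is consumed when it fires, a client that wants to keep observing a node has to re-read it and set a new watch inside the callback, and must tolerate changes that land between the event and that re-read.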

Page 10: A Python Petting Zoo

Sequential Nodes

● Appends a monotonically increasing number to the end of the znode (file) name
● Can be used directly for leader election
● Communicates the server's view of ordering
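The sketch below models only the naming rule for sequential nodes: the parent hands out a 10-digit, zero-padded counter that only goes up, and the lowest surviving number is treated as the leader. `ToySequentialParent` and `leader` are hypothetical names for illustration, not part of any ZooKeeper binding.

```python
from typing import List


class ToySequentialParent:
    """Hypothetical model of one parent znode handing out sequence numbers."""

    def __init__(self) -> None:
        self._counter = 0
        self.children: List[str] = []

    def create_sequential(self, prefix: str) -> str:
        # 10-digit, zero-padded suffix; the counter never goes backwards,
        # even after children are deleted.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self.children.append(name)
        return name

    def delete(self, name: str) -> None:
        self.children.remove(name)


def leader(children: List[str]) -> str:
    # Compare the numeric suffix, so differing prefixes cannot skew the order.
    return min(children, key=lambda name: int(name[-10:]))


parent = ToySequentialParent()
a = parent.create_sequential("candidate-")
b = parent.create_sequential("candidate-")
print(leader(parent.children))  # candidate-0000000000
parent.delete(a)                # the current leader goes away
print(leader(parent.children))  # candidate-0000000001
```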

Page 11: A Python Petting Zoo

Ephemeral Nodes

● Only exist for the life of a connection (session)
● Disappear if your connection does not respond to a keepalive request
● Used to ensure reliability against service disruption in most recipes
● Used to trigger events, such as reconfiguration when a service that published discovery configs goes down
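A minimal sketch of how ephemeral nodes are tied to a session, assuming a hypothetical in-memory `ToyEnsemble` rather than a real client: when a session expires (for example after missed keepalives), every ephemeral node it created is deleted and any delete watches on those nodes fire. Persistent nodes survive.

```python
from typing import Callable, Dict, List, Set


class ToyEnsemble:
    """Hypothetical model: ephemeral nodes live only as long as their session."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}            # path -> creating session
        self._ephemeral: Dict[str, Set[str]] = {}  # session -> ephemeral paths
        self._delete_watches: Dict[str, List[Callable[[str], None]]] = {}

    def create(self, session: str, path: str, ephemeral: bool = False) -> None:
        self.nodes[path] = session
        if ephemeral:
            self._ephemeral.setdefault(session, set()).add(path)

    def watch_delete(self, path: str, fn: Callable[[str], None]) -> None:
        self._delete_watches.setdefault(path, []).append(fn)

    def expire_session(self, session: str) -> None:
        # Missed keepalives end the session: drop its ephemeral nodes and
        # fire (then discard) any delete watches set on them.
        for path in sorted(self._ephemeral.pop(session, set())):
            del self.nodes[path]
            for fn in self._delete_watches.pop(path, []):
                fn(path)


gone: List[str] = []
zk = ToyEnsemble()
zk.create("s1", "/services/kestrel")  # persistent
zk.create("s1", "/services/kestrel/host-a", ephemeral=True)
zk.watch_delete("/services/kestrel/host-a", gone.append)
zk.expire_session("s1")
print(gone)            # ['/services/kestrel/host-a']
print(list(zk.nodes))  # ['/services/kestrel']
```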

Page 12: A Python Petting Zoo

Recipes

ZooKeeper's documentation explains how to implement a number of recipes:
● Barriers
● Locks
● Queues
● Counters
● Two Phase Commit
● Leader Election

Page 13: A Python Petting Zoo

Recipes

As a community, we need well-tested versions of these recipes, as well as other valuable protocols built on ZooKeeper.

Page 14: A Python Petting Zoo

State of ZooKeeper

● Very low level
● Coding for it is very complex
● Lots of edge cases
● ZooKeeper needs high level, well-tested libraries
● Very few complete, high level solutions exist

Page 15: A Python Petting Zoo

ZooKeeper Libraries

The first high level library with significant recipe implementations has emerged from Netflix. Its name is Curator. Unfortunately for us, it's in Java, not Python.

Page 16: A Python Petting Zoo

Curator

https://github.com/Netflix/curator
● State
○ High level API
○ Non-Resilient Client
○ Documentation
○ Tests, embeddable ZooKeeper for testing
● Recipes
○ Leader Latch, Leader Election
○ Multiple Locks and Semaphores
○ Multiple Queues
○ Barrier, Double Barrier
○ Shared Counter/Distributed Atomic Long

Page 17: A Python Petting Zoo

State of ZooKeeper on Python

● Current state is in a lot of flux
● Went from only low level bindings to a number of incomplete bindings in the first half of 2012
● So far nothing like Curator has emerged (but it appears to be brewing)

Page 18: A Python Petting Zoo

One of the top ranked results for 'Python ZooKeeper'

Page 19: A Python Petting Zoo

Summary of Python ZooKeeper bindings

● There are about 10 at present
● Many do not handle known edge cases in ZooKeeper
● Some have problems with resilient connections
● The following is derived from Ben Bangert's census of Python ZooKeeper bindings

Page 20: A Python Petting Zoo

Official Bindings

● State
○ Complete access to the ZooKeeper C bindings
○ Full of sharp edges
○ Not a resilient client
○ No recipes
○ Handles communication with ZooKeeper in a C thread
○ Foundation for most other libraries
○ Very low level

Page 21: A Python Petting Zoo

Bindings

Twisted:
● txzookeeper

gevent:
● gevent-zookeeper
● kazoo

Page 22: A Python Petting Zoo

Bindings (Cont)

High level clients:
● zc.zk
● kazoo
● twitter zookeeper

Others:
● zkpython (low level)
● pykeeper (not resilient)
● zoop (not resilient)

Page 23: A Python Petting Zoo

Bindings (Cont)

Recipes only:
● zktools
● PettingZoo

Page 24: A Python Petting Zoo

State of ZooKeeper on Python: Kind of a Mess

A project exists to merge some of the high level bindings in an attempt to create a Python equivalent of Curator: https://github.com/python-zk
It was started by Ben Bangert to merge Kazoo and zc.zk, with the aim of implementing all Curator recipes.

Page 25: A Python Petting Zoo

PettingZoo

https://github.com/Knewton/pettingzoo-python
● State
○ Relies on zc.zk
○ Documented, doc strings
○ Tests (mock ZooKeeper)
○ All recipes implemented in a Java version as well
● Recipes
○ Distributed Config
○ Distributed Bag
○ Leader Queue
○ Role Match

Page 26: A Python Petting Zoo

PettingZoo

● In heavy development
● Distributed Discovery and Distributed Bag are well tested and used in production
● Leader Queue and Role Match are tested, but not yet deployed
● PettingZoo will be ported to or merged with the kazoo effort when it is ready

Page 27: A Python Petting Zoo

Our Problem

We need to be able to do stream processing of observations of student interactions with course material. This involves multiple models with interdependent parameters, which requires:
● Sharding along different axes depending on the model
● Subscriptions between models for parameters
● Dynamic reconfiguration of the environment to deal with current load

Page 28: A Python Petting Zoo

Distributed Discovery

Allows services in a dynamic, distributed environment to be alerted quickly of service address changes.
● Most service discovery recipes only contain host:port; Distributed Discovery can share arbitrary data as well (using YAML)
● Can handle load balancing through random selection of config
● Handles rebalancing on pool change
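One way a client could use discovered configs for load balancing is sketched below. `choose_config` and `DiscoveredService` are hypothetical, not PettingZoo's API, and re-picking at random on every pool change is an assumption about what "rebalancing" might look like; a seeded `random.Random` keeps the example reproducible.

```python
import random
from typing import Dict, Optional


def choose_config(pool: Dict[str, dict], rng: random.Random) -> Optional[dict]:
    """Hypothetical helper: pick one advertised config uniformly at random."""
    if not pool:
        return None
    # Sorting the keys keeps the choice reproducible for a given seed.
    return pool[rng.choice(sorted(pool))]


class DiscoveredService:
    """Hypothetical client-side holder for one service's discovered configs."""

    def __init__(self, pool: Dict[str, dict], seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.pool = dict(pool)
        self.current = choose_config(self.pool, self._rng)

    def on_pool_change(self, new_pool: Dict[str, dict]) -> None:
        # Meant to be called from a watch callback. Re-picking on every
        # change (an assumption) lets newly added members take load too.
        self.pool = dict(new_pool)
        self.current = choose_config(self.pool, self._rng)


pool = {
    "node-a": {"host": "kestrel-1.example", "port": 22133},
    "node-b": {"host": "kestrel-2.example", "port": 22133},
}
svc = DiscoveredService(pool, seed=42)
print(svc.current in pool.values())  # True
svc.on_pool_change({"node-b": pool["node-b"]})  # node-a's ephemeral node vanished
print(svc.current)  # {'host': 'kestrel-2.example', 'port': 22133}
```

Random selection needs no coordination between clients, which is why it pairs well with watches: each client only reacts to pool changes it is told about.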

Page 29: A Python Petting Zoo

How does this help us scale?

● Makes discovery of dependencies simple
● Adds to reliability of the system by quickly removing dead resources
● Makes dynamic reconfiguration simple as additional resources become available

Page 30: A Python Petting Zoo

Distributed Discovery Example: Write

from pettingzoo.discovery import write_distributed_config
from pettingzoo.utils import connect_to_zk

conn = connect_to_zk('localhost:2181')
config = {
    'header': {
        'service_class': 'kestrel',
        'metadata': {
            'protocol': 'memcached',
            'version': 1.0,
        },
    },
    'host': 'localhost',
    'port': 22133,
}
write_distributed_config(conn, 'kestrel', 'platform', config)

Page 31: A Python Petting Zoo

Distributed Discovery Example: Read

from pettingzoo.discovery import DistributedMultiDiscovery
from pettingzoo.utils import connect_to_zk

conn = connect_to_zk('localhost:2181')
dmd = DistributedMultiDiscovery(conn)
conf = None

def update_configs(path, updated_configs):
    # Watch callback: merge the changed configs into conf
    conf.update(updated_configs)

conf = dmd.load_config('kestrel', 'platform', callback=update_configs)

Page 32: A Python Petting Zoo

Distributed Bag

Recipe for a distributed bag (dbag) that allows processes to share a collection. Any participant can post or remove data, alerting all others.
● Used as a part of Role Match
● Useful for any case where processes need to share configuration determined at runtime

Page 33: A Python Petting Zoo

How does this help us scale?

● Can quickly alert processes as to who is subscribing to them

● Reduces load by quickly yanking dead subscriptions

● Provides event based subscriptions, making implementation simpler

Page 34: A Python Petting Zoo

Distributed Bag

(Diagram: a <bag> node with two children: Items, holding Item1, Item2, and Item3; and Tokens, holding Token3.)

● Sequential items contain the actual data
● Items can be ephemeral
● Clients set a delete watch on discrete items
● Token is set to the id of the highest item
● Clients set a child watch on the "Tokens" node
● Can determine exact adds and deletes with a constant number of messages per delta
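The token scheme above can be sketched as client-side bookkeeping. This assumes items carry integer sequence ids and that a Tokens event delivers the id of the highest item written so far; `DBagClientView` and its `fetch_item` callback are hypothetical names, not PettingZoo's implementation.

```python
from typing import Callable, Dict, List, Optional


class DBagClientView:
    """Hypothetical client-side bookkeeping for the dbag token scheme."""

    def __init__(self, fetch_item: Callable[[int], Optional[str]]) -> None:
        self._fetch = fetch_item  # returns None if the item is already gone
        self.last_token = -1
        self.items: Dict[int, str] = {}

    def on_token(self, highest_id: int) -> List[int]:
        # Tokens child watch fired. Sequential ids mean only ids above the
        # last token seen can be new, so Items never has to be re-listed.
        added: List[int] = []
        for item_id in range(self.last_token + 1, highest_id + 1):
            data = self._fetch(item_id)
            if data is not None:  # skip items deleted before we read them
                self.items[item_id] = data
                added.append(item_id)
        self.last_token = max(self.last_token, highest_id)
        return added

    def on_item_deleted(self, item_id: int) -> None:
        # Delete watch on one discrete item fired.
        self.items.pop(item_id, None)


backing: Dict[int, str] = {0: "doc-a", 1: "doc-b"}
view = DBagClientView(backing.get)
print(view.on_token(1))    # [0, 1]
backing[2] = "doc-c"
del backing[0]
view.on_item_deleted(0)
print(view.on_token(2))    # [2]
print(sorted(view.items))  # [1, 2]
```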

Page 35: A Python Petting Zoo

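Distributed Bag Example

import yaml
from pettingzoo.dbag import DistributedBag
from pettingzoo.utils import connect_to_zk
...
conn = connect_to_zk('localhost:2181')
dbag = DistributedBag(conn, '/some/data')
docs = {}

def acb(cb_dbag, cb_id):
    # Called when any participant adds an item: fetch and cache it
    docs[cb_id] = cb_dbag.get(cb_id)

def rcb(cb_dbag, cb_id):
    # Called when an item is removed; docs is a dict, so pop rather than remove
    docs.pop(cb_id, None)

dbag.add_listeners(add_callback=acb, remove_callback=rcb)
# safe_load: plain yaml.load without a Loader is unsafe (and an error in PyYAML 6)
data = yaml.safe_load(document)
# Ephemeral: the item disappears if this process's session ends
add_id = dbag.add(data, ephemeral=True)
docs[add_id] = data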

Page 36: A Python Petting Zoo

Leader Queue

Recipe is similar to Leader Election, but makes it easy to monitor your spare capacity.
● Used in Role Match
● As services are ready to do work, they create an ephemeral, sequential node in the queue
● Any member always knows whether it is in the queue or at the front
● A watch lets the leader know when it is elected

Page 37: A Python Petting Zoo

How does this help us scale?

● Gives a convenient method of assigning work

● Makes monitoring current excess capacity easy

Page 38: A Python Petting Zoo

Leader Queue

● Candidates register with sequential, ephemeral nodes

● Candidate sets delete watch on predecessor

● Candidate is elected when it is the smallest node

● When elected, candidate takes over its new role

● When ready, candidate removes itself from the queue

● Only one candidate needs to call get_children upon any node exiting

(Diagram: a <queue> node holding candidate nodes C_1, C_3, and C_4.)
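The predecessor-watch rule can be illustrated with the queue from the diagram. `predecessor` and `seq` are hypothetical helpers that assume candidate names end in `_<sequence number>`; this sketches the ordering logic only, not the PettingZoo implementation.

```python
from typing import Dict, List, Optional


def seq(name: str) -> int:
    # Assumes names look like "C_<sequence number>".
    return int(name.rsplit("_", 1)[1])


def predecessor(me: str, queue: List[str]) -> Optional[str]:
    """Hypothetical helper: the node `me` should watch, or None at the front."""
    ordered = sorted(queue, key=seq)
    idx = ordered.index(me)
    return None if idx == 0 else ordered[idx - 1]


queue = ["C_3", "C_1", "C_4"]
watching: Dict[str, Optional[str]] = {c: predecessor(c, queue) for c in queue}
print(watching)  # {'C_3': 'C_1', 'C_1': None, 'C_4': 'C_3'}

queue.remove("C_1")  # C_1 is done and removes itself
woken = [c for c, target in watching.items() if target == "C_1"]
print(woken)                      # ['C_3']: only one candidate is notified
print(predecessor("C_3", queue))  # None, so C_3 is now elected
```

A real candidate also has to handle its predecessor vanishing before the watch is set (an existence check that comes back empty), in which case it simply re-evaluates its position.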

Page 39: A Python Petting Zoo

Leader Queue Usage Example

from pettingzoo.leader_queue import LeaderQueue, Candidate
from pettingzoo.utils import connect_to_zk

class SomeCandidate(Candidate):
    def on_elected(self):
        # <do something sexy>: take over the new role here
        pass

conn = connect_to_zk('localhost:2181')
leaderq = LeaderQueue(conn)
leaderq.add_candidate(SomeCandidate())

Page 40: A Python Petting Zoo

Role Match

Allows systems to expose needed, long-lived jobs, and lets services take over those jobs until all are filled.
● Dbag used to expose jobs
● Leader queue used to hold applicants
● Records which jobs are presently held with an ephemeral node
● Lets a new process take over if a worker dies
● We use it for sharding/segmentation to dynamically adjust the shards as needed due to load

Page 41: A Python Petting Zoo

How does this help us scale?

● Core of our ability to dynamically adjust shards

● Lets the controlling process adjust problem spaces and have those tasks become automatically filled

● Monitoring makes it easy to identify who is working on what, and when

Page 42: A Python Petting Zoo

Role Match

● Leader monitors for open jobs

● Job holder creates an ephemeral assignment

● Assignment id matches job id, indicating that it is claimed

(Diagram: a <match> node combining a DistributedBag of jobs, a Leader Queue of applicants such as A_2, and assignment nodes such as Assgn1.)
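A minimal sketch of the matching step, with plain dicts standing in for the DistributedBag of jobs and the assignment nodes. `open_jobs` and `claim_next` are hypothetical helpers; in ZooKeeper the claim would be an ephemeral create that fails if the assignment node already exists, which is what keeps two workers from holding the same job.

```python
from typing import Dict, List, Optional


def open_jobs(jobs: Dict[str, dict], assignments: Dict[str, str]) -> List[str]:
    # A job is open while no assignment shares its id.
    return sorted(job_id for job_id in jobs if job_id not in assignments)


def claim_next(applicant: str, jobs: Dict[str, dict],
               assignments: Dict[str, str]) -> Optional[str]:
    """Hypothetical step run by the applicant at the front of the leader queue."""
    candidates = open_jobs(jobs, assignments)
    if not candidates:
        return None
    job_id = candidates[0]
    # Stands in for an ephemeral create of an assignment with the job's id.
    assignments[job_id] = applicant
    return job_id


jobs: Dict[str, dict] = {"shard-0": {}, "shard-1": {}}
assignments: Dict[str, str] = {}
print(claim_next("A_2", jobs, assignments))  # shard-0
print(claim_next("A_5", jobs, assignments))  # shard-1
print(claim_next("A_7", jobs, assignments))  # None: every job is held
del assignments["shard-0"]                   # A_2's session expired
print(open_jobs(jobs, assignments))          # ['shard-0']
```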

Page 43: A Python Petting Zoo

Future: Distributed Config

Next project is Distributed Config.
● Allows service config to be recorded and changed with a YAML config
● Every process that connects creates a child node of the appropriate service
● Any change in a child node's config overrides the overall service config for that process
● Any change of the parent or child fires a watch to let the process know that its config has changed
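Since Distributed Config is still future work, the override rule can only be sketched under assumptions. `effective_config` below is hypothetical: it assumes both YAML documents have already been parsed into dicts and that nested mappings merge key by key, with the child's value winning wherever both set a key.

```python
from typing import Any, Dict


def effective_config(service: Dict[str, Any], process: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical overlay of a per-process config on the service config."""
    merged = dict(service)  # shallow copy: this function never writes into its inputs
    for key, value in process.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key.
            merged[key] = effective_config(merged[key], value)
        else:
            # Any other value on the child node wins outright.
            merged[key] = value
    return merged


service = {"workers": 4, "db": {"host": "db-main", "pool": 10}}
process = {"db": {"pool": 2}}
result = effective_config(service, process)
print(result)  # {'workers': 4, 'db': {'host': 'db-main', 'pool': 2}}
```

In the recipe as described, a watch on either the parent or the child node would simply trigger a recomputation of this merged view.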

Page 44: A Python Petting Zoo

Questions?


knewt.ly/pettingzoo-slides

knewt.ly/pettingzoo

Page 45: A Python Petting Zoo

Appendix: Official Bindings

● State
○ Complete access to the ZooKeeper C bindings
○ Full of sharp edges
○ Not a resilient client
○ No recipes
○ Handles communication with ZooKeeper in a C thread
○ Foundation for most other libraries
○ Very low level

Page 46: A Python Petting Zoo

Appendix: zkpython

https://github.com/duncf/zkpython/
Improvements to a fork of the official bindings
● Status
○ Resilient Client
○ No docs beyond the official bindings
○ Good tests
● Recipes
○ Basic Lock (using a unique id rather than a UUID)

Page 47: A Python Petting Zoo

Appendix: pykeeper

https://github.com/nkvoll/pykeeper
● State
○ Non-resilient Client (not resilient to errors)
○ High level client
○ Documented, no doc strings
○ Tests (require a running ZooKeeper)

Page 48: A Python Petting Zoo

Appendix: zoop

https://github.com/davidmiller/zoop
● State
○ Doesn't handle node create edge-case
○ Doesn't handle retryable exceptions
○ Documented, doc strings
○ Tests (require a running ZooKeeper)
● Recipes
○ Revocable Lock (doesn't handle create node edge-case; uses a permanent node instead of an ephemeral one)

Page 49: A Python Petting Zoo

Appendix: txzookeeper

https://launchpad.net/txzookeeper
● State
○ Resilient Client
○ Doc strings, no additional documentation
○ Supports Twisted (only)
○ Well tested
● Recipes
○ Basic Lock (not using UUID)
○ Queue
○ ReliableQueue
○ SerializedQueue

Page 50: A Python Petting Zoo

Appendix: twitter zookeeper
https://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper

● State
○ Resilient Client
○ Handles node create edge-case
○ Some documentation
○ Well tested
○ Tied to a lot of twitter commons code, difficult to extract
● Recipes
○ Service Registration/Discovery

Page 51: A Python Petting Zoo

Appendix: gevent-zookeeper

https://github.com/jrydberg/gevent-zookeeper/
● State
○ Supports gevent
○ No documentation
○ No tests

Page 52: A Python Petting Zoo

Appendix: Kazoo

https://github.com/nimbusproject/kazoo
● Status
○ Resilient Client
○ Doc strings, no additional documentation
○ Supports gevent
○ Minimal tests
● Recipes
○ Basic Lock (uses UUID properly)

Page 53: A Python Petting Zoo

Appendix: zc.zk

https://github.com/python-zk/zc.zk
● State
○ Resilient Client
○ High level client
○ Higher level automatic watch functionality
○ Higher level restoration of ephemeral nodes
○ Tests (mock ZooKeeper)
○ Documentation, no doc strings
● Recipes
○ Service Registration/Discovery

Page 54: A Python Petting Zoo

Appendix: zktools

https://github.com/mozilla-services/zktools
● State
○ Relies on zc.zk
○ Documented, doc strings
○ Tests (require a running ZooKeeper)
● Recipes
○ Shared Read/Write Locks
○ AsyncLock
○ Revocable Locks

Page 55: A Python Petting Zoo

Appendix: PettingZoo

https://github.com/Knewton/pettingzoo-python
● State
○ Relies on zc.zk
○ Documented, doc strings
○ Tests (mock ZooKeeper)
○ All recipes implemented in a Java version as well
● Recipes
○ Distributed Config
○ Distributed Bag
○ Leader Queue
○ Role Match