Locker: distributed consistent locking

Transcript of Locker: distributed consistent locking

Page 1: Locker: distributed consistent locking

locker: distributed locking

Knut Nesheim

@knutin

Page 2: Locker: distributed consistent locking

The need

Real-time multiplayer game at Wooga

Stateful

One process per user

One process per world

Page 3: Locker: distributed consistent locking

The need

Only one process per user & world

Cannot reconcile game worlds

Strong consistency

Fine-grained distributed coordination

Lookup table when sending game updates

Page 4: Locker: distributed consistent locking

Lock

[Diagram: two clients both want to visit World #123 and each sends a lock request; one gets ok, the other gets already_locked.]

Page 5: Locker: distributed consistent locking

[Diagram: a single World #123 process with the two clients.]

Page 6: Locker: distributed consistent locking

Next generation

Lock implemented once already

Central serialization with atomic ops (Redis)

SPoF

The next generation wants higher availability, because..

Page 7: Locker: distributed consistent locking

Living with failure

Hardware sometimes fails

The network is mostly ok

Software is almost bug-free

Downtime is often short

Page 8: Locker: distributed consistent locking

Living with a SPoF sucks..

Page 9: Locker: distributed consistent locking

Development is easy

Operations suck

Changes become scary

Operator error becomes costly

Must fix problems immediately

Page 10: Locker: distributed consistent locking

Requirements

~10k conditional writes per second

~3M eventually consistent reads per second

~150k reads on local replica

Lock expires if not kept alive

Dynamic cluster membership

Page 11: Locker: distributed consistent locking

ZooKeeper looks like the best option

Page 12: Locker: distributed consistent locking

“What would the dream solution look like?”

Page 13: Locker: distributed consistent locking

[Diagram: a fleet of App servers behind load balancers (LB), backed by S3 and a DB.]

Page 14: Locker: distributed consistent locking

[Diagram: only the App servers.]

Page 15: Locker: distributed consistent locking

[Diagram: a group of five App servers picked out from the fleet.]

Page 16: Locker: distributed consistent locking

[Diagram: the five selected App servers act as lock masters, with N=5, W=3.]
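
With N=5 and W=3, any two groups of three masters overlap in at least one node (3 + 3 > 5), so two competing lock requests for the same key cannot both reach a write quorum, and writes keep succeeding with up to two of the five masters unreachable.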

Page 17: Locker: distributed consistent locking

[Diagram: N=5, W=3, continued.]

Page 18: Locker: distributed consistent locking

[Diagram: N=5, W=3, continued.]

Page 19: Locker: distributed consistent locking

[Diagram: N=5, W=3, continued.]

Page 20: Locker: distributed consistent locking

[Diagram: N=5, W=3, continued.]

Page 21: Locker: distributed consistent locking

Run inside our app servers

Easy to debug, instrument, monitor, deploy

Simple operations

Page 22: Locker: distributed consistent locking

“How hard can it be?”

Page 23: Locker: distributed consistent locking

Distributed systems are hard

Many books, papers on distributed systems

Riak is a good example of the mindset

Idea: pick the algorithms that fit best

Page 24: Locker: distributed consistent locking

Simplest thing that could possibly work

Page 25: Locker: distributed consistent locking

Problem           "good enough" solution
Consistency       Conditional writes
Serialization     "2 Phase Commit"
Availability      Multiple serializers, quorum (CP)
Local replica     Replay transaction log
Dynamic cluster   Manual configuration
Anti-entropy      Lock keep-alive

Page 26: Locker: distributed consistent locking

Implementation

Proof of concept built in one Friday afternoon

Looked promising, spent ~3 weeks

Turned into production quality

Test race conditions

PropEr

330 lines (!)
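
The actual test suite is not shown, but the kind of property PropEr can check here might look roughly like the sketch below, assuming a locker cluster has already been started and configured; the module name, the lease length and the success/error return shapes of locker:lock/3 are assumptions:

    -module(locker_race_sketch).
    -include_lib("proper/include/proper.hrl").
    -export([prop_mutual_exclusion/0]).

    %% Property: when several processes race to lock the same fresh key,
    %% at most one of them wins.
    prop_mutual_exclusion() ->
        ?FORALL(NumClients, range(2, 10),
            begin
                Key = make_ref(),
                Parent = self(),
                [spawn(fun() ->
                           Parent ! {attempt, locker:lock(Key, self(), 5000)}
                       end) || _ <- lists:seq(1, NumClients)],
                Results = [receive {attempt, R} -> R end
                           || _ <- lists:seq(1, NumClients)],
                length([R || R <- Results, is_success(R)]) =< 1
            end).

    %% The exact return values are not shown in the talk; treat both a bare
    %% 'ok' and an {ok, ...} tuple as success.
    is_success(ok) -> true;
    is_success(R) when is_tuple(R) -> element(1, R) =:= ok;
    is_success(_) -> false.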

Page 27: Locker: distributed consistent locking

locker: http://github.com/wooga/locker

Page 28: Locker: distributed consistent locking

Beware tradeoffs!

Keep consistency, sacrifice availability during failure

No persistence; assumes enough masters stay up

No group membership or views

Manually configure masters, replicas, quorum value

Assumes perfect network during reconfiguration

Not based on a formally described algorithm

Page 29: Locker: distributed consistent locking

Usage

start_link(W)

set_nodes(AllNodes, Masters, Replicas)

lock(Key, Value, LeaseLength)

extend_lease(Key, Value, LeaseLength)

release(Key, Value)

dirty_read(Key)
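
A minimal sketch of how these calls might fit together, assuming a three-node cluster where every node is a master and the write quorum W is 2; the node names, the lease length (taken here to be milliseconds) and the return values are illustrative assumptions, not taken from the slides:

    -module(locker_usage_sketch).
    -export([setup/0, run_world/0]).

    %% Hypothetical setup, run once the cluster nodes are connected.
    setup() ->
        Nodes = ['app1@host', 'app2@host', 'app3@host'],  %% illustrative names
        {ok, _Pid} = locker:start_link(2),                %% W = 2
        locker:set_nodes(Nodes, Nodes, []),               %% all masters, no replicas
        ok.

    %% Lock world 123 for the calling process, keep the lease alive,
    %% read it back cheaply, then release.
    run_world() ->
        Key = {world, 123},
        LeaseLength = 10000,
        Result = locker:lock(Key, self(), LeaseLength),
        %% Result says whether a write quorum accepted the lock; its exact
        %% shape is not shown on the slides, so it is only printed here.
        io:format("lock: ~p~n", [Result]),

        timer:sleep(5000),
        locker:extend_lease(Key, self(), LeaseLength),    %% refresh well before expiry

        io:format("dirty_read: ~p~n", [locker:dirty_read(Key)]),  %% local, eventually consistent read

        locker:release(Key, self()).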

Page 30: Locker: distributed consistent locking

lock(Key, Value, LeaseLength)

Two phases:

Prepare: Ask masters for votes

Commit: if a majority votes yes, write on the masters

Timeouts are counted as negative votes (CP)

Asynchronous replication, wait_for/2
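
A rough sketch of these two phases from the client's side, assuming the masters are addressed as gen_server processes; the message tuples come from the following slides, but the gen_server calls/casts, the 1000 ms vote timeout and the return values are assumptions, not the library's actual internals:

    -module(two_phase_sketch).
    -export([lock/4]).

    lock(Key, Value, Masters, W) ->
        %% Phase 1 (prepare): ask every master for the write lock. A call that
        %% times out or fails counts as a negative vote, which keeps us CP.
        Votes = [catch gen_server:call(M, {get_write_lock, Key, not_found}, 1000)
                 || M <- Masters],
        case length([ok || ok <- Votes]) >= W of
            true ->
                %% Phase 2 (commit): a majority voted yes, write on the masters.
                [gen_server:cast(M, {write, Key, Value}) || M <- Masters],
                ok;
            false ->
                %% No quorum: give back any write locks we did obtain.
                [gen_server:cast(M, release_write_lock) || M <- Masters],
                {error, no_quorum}
        end.

The walkthrough on the next slides plays out exactly this exchange for two competing clients.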

Page 31: Locker: distributed consistent locking

[Diagram: Client A and Client B both race to lock the key foo (locker:lock(foo, pid(0,123,0))) against Master #1, Master #2 and Master #3.]

Page 32: Locker: distributed consistent locking

[Diagram: every master starts with an empty write-lock list []; both clients send {get_write_lock, foo, not_found} to all three masters.]

Page 33: Locker: distributed consistent locking

[Diagram: Master #1 and Master #3 have each granted a write lock ([{lock, foo}]); Master #2 is still [].]

Page 34: Locker: distributed consistent locking

[Diagram: same state; the get_write_lock request to Master #2 is still in flight.]

Page 35: Locker: distributed consistent locking

[Diagram: Master #2 has now also granted a write lock; all three masters show [{lock, foo}].]

Page 36: Locker: distributed consistent locking

[Diagram: all three masters hold [{lock, foo}]; a master that has already granted the write lock answers "already locked!" to the competing request.]

Page 37: Locker: distributed consistent locking

[Diagram: the votes come back: Client A collects [ok, ok, error], Client B collects [error, error, ok].]

Page 38: Locker: distributed consistent locking

[Diagram: Client A has a majority of positive votes (two of three); Client B does not.]

Page 39: Locker: distributed consistent locking

[Diagram: Client A commits by sending {write, foo, 123} to the masters; Client B sends release_write_lock.]

Page 40: Locker: distributed consistent locking

[Diagram: the write locks are released; every master's write-lock list is back to [], and Client A's lock on foo is committed.]

Page 41: Locker: distributed consistent locking

Use locker when

Strong consistency is sometimes needed

Protect resources

Leader election

Service discovery

Page 42: Locker: distributed consistent locking

Is it any good?

Page 43: Locker: distributed consistent locking

Some experts say yes, some experts say no

Page 44: Locker: distributed consistent locking

Conclusion

Page 45: Locker: distributed consistent locking

Distributed systems are hard

Page 46: Locker: distributed consistent locking

We could maybe have made ZooKeeper work..

Page 47: Locker: distributed consistent locking

..but we now have our dream system

Page 48: Locker: distributed consistent locking

Ambitious, naive project led to

big operational advantage

Page 49: Locker: distributed consistent locking

Q&A

http://github.com/wooga/locker

@knutin