Lecture 12 - Review
SWE 622, Spring 2017: Distributed Software Engineering
J. Bell, GMU


HW5

[Bar chart: GMU SWE 622 HW 5 scores (max possible = 80)]


What do we want from Distributed Systems?

• Scalability
• Performance
• Latency
• Availability
• Fault Tolerance

“Distributed Systems for Fun and Profit”, Takada


Distributed Systems Goals: Scalability

“the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.”

“Distributed Systems for Fun and Profit”, Takada


Distributed Systems Goals: Performance

“is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.”


Distributed Systems Goals: Latency

“The state of being latent; delay, a period between the initiation of something and its becoming visible.”


Distributed Systems Goals: Availability

“the proportion of time a system is in a functioning condition. If a user cannot access the system, it is said to be unavailable.”

Availability = uptime / (uptime + downtime)

Often measured in “nines”:

Availability %   Downtime/year
90%              > 1 month
99%              < 4 days
99.9%            < 9 hours
99.99%           < 1 hour
99.999%          5 minutes
99.9999%         31 seconds
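To make the table concrete, here is a small sketch (not from the slides) that derives the yearly downtime budget from an availability target:

    public class Nines {
        public static void main(String[] args) {
            double minutesPerYear = 365.25 * 24 * 60;
            for (double availability : new double[]{0.90, 0.99, 0.999, 0.9999, 0.99999, 0.999999}) {
                // Downtime budget = (1 - availability) fraction of the year
                double downtimeMinutes = (1 - availability) * minutesPerYear;
                System.out.printf("%.4f%% -> %.1f minutes/year%n", availability * 100, downtimeMinutes);
            }
        }
    }

For example, 99.999% availability works out to roughly 5 minutes of downtime per year, matching the table above.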


Distributed Systems Goals: Fault Tolerance

“ability of a system to behave in a well-defined manner once faults occur”

What kind of faults?

• Disks fail
• Power supplies fail
• Power goes out
• Networking fails
• Security breached
• Datacenter goes offline


More machines, more problems
• Say there's a 1% chance of some hardware failure occurring on a machine (power supply burns out, hard disk crashes, etc.)
• Now I have 10 machines
• Probability(at least one fails) = 1 - Probability(no machine fails) = 1 - (1 - 0.01)^10 ≈ 10%
• 100 machines -> 63%
• 200 machines -> 87%
• So obviously just adding more machines doesn't solve fault tolerance (the math is checked in the sketch below)
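A quick sketch (not from the slides) of that calculation:

    public class FailureOdds {
        public static void main(String[] args) {
            double perMachineFailure = 0.01; // 1% chance that any one machine fails
            for (int machines : new int[]{10, 100, 200}) {
                // P(at least one fails) = 1 - P(no machine fails)
                double atLeastOne = 1 - Math.pow(1 - perMachineFailure, machines);
                System.out.printf("%d machines -> %.0f%%%n", machines, atLeastOne * 100);
            }
        }
    }

Running it prints roughly 10%, 63%, and 87%, the figures quoted on the slide.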


Designing and Building Distributed Systems

To help design our algorithms and systems, we tend to leverage abstractions and models that make our assumptions explicit

[Figure: three modeling axes, each ordered by strength - System model: synchronous / asynchronous; Failure model: crash-fail / partitions / Byzantine; Consistency model: sequential / eventual]

Generally: stronger assumptions -> worse performance; weaker assumptions -> more complicated


Redis Replication

[Diagram: a client writes key “foo” = “bar” to the master (1); the master acknowledges the write (2) and then sends the update to each slave (3); another client reading “foo” from a slave sees “bar” once the update arrives]

Asynchronous updates, weak consistency model, highly available
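For example, a minimal Jedis sketch (the hosts, ports, and setup are assumptions, not the course's code) of why a read from a slave can be stale:

    import redis.clients.jedis.Jedis;

    public class RedisStaleRead {
        public static void main(String[] args) {
            try (Jedis master = new Jedis("localhost", 6379);
                 Jedis replica = new Jedis("localhost", 6380)) {
                master.set("foo", "bar");          // acknowledged as soon as the master applies it
                String value = replica.get("foo"); // may still be null: the update travels asynchronously
                System.out.println("replica sees: " + value);
            }
        }
    }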


CAP Theorem

• Pick two of three:

• Consistency: All nodes see the same data at the same time (strong consistency)

• Availability: Individual node failures do not prevent survivors from continuing to operate

• Partition tolerance: The system continues to operate despite message loss (from network and/or node failure)

• You can not have all three, ever*
• If you relax your consistency guarantee (from strong to weak), then you can probably guarantee availability and partition tolerance


Choosing a consistency model

• Sequential consistency
  • All over - it's the most intuitive
• Causal consistency
  • Increasingly useful

• Eventual consistency
  • Very popular in industry and academia
  • File synchronizers, Amazon's Dynamo, Bayou, and more


Transactions: Classic Example

    boolean transferMoney(Person from, Person to, float amount) {
        if (from.balance >= amount) {
            from.balance = from.balance - amount;
            to.balance = to.balance + amount;
            return true;
        }
        return false;
    }


What can go wrong here?


Agreement

• In distributed systems, we have multiple nodes that need to all agree that some object has some state

• Examples:
  • Who owns a lock
  • Whether or not to commit a transaction
  • The value of a clock


Properties of Agreement

• Safety (correctness)
  • All nodes agree on the same value (which was proposed by some node)
• Liveness (fault tolerance, availability)
  • If less than N nodes crash, the rest should still be OK


2PC Event Sequence

[Diagram: the coordinator asks the participant “Can you commit?” (participant local state: uncertain); the participant answers “Yes” (local state: prepared); the coordinator records the transaction as prepared and sends “OK, commit”; the participant commits (local state: committed) and replies “OK, I committed”; the coordinator marks the transaction committed and then done]
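A minimal coordinator-side sketch of the two phases (the Participant interface and its methods are hypothetical, and timeouts are not modeled here):

    import java.util.List;

    interface Participant {
        boolean prepare();  // "Can you commit?" -> yes/no vote
        void commit();      // "OK, commit"
        void abort();
    }

    class TwoPhaseCommitCoordinator {
        boolean run(List<Participant> participants) {
            // Phase 1: collect votes; any "no" means the transaction must abort.
            boolean allYes = true;
            for (Participant p : participants) {
                if (!p.prepare()) { allYes = false; break; }
            }
            // Phase 2: broadcast the single outcome to every participant.
            for (Participant p : participants) {
                if (allYes) p.commit(); else p.abort();
            }
            return allYes;
        }
    }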


Timeouts in 2PC
• Example:
  • Coordinator times out waiting for Goliath National Bank's response
  • Bank times out waiting for coordinator's outcome message
• Causes?
  • Network
  • Overloaded hosts
  • Both are very realistic…


Coordinator Timeouts
• If the coordinator times out waiting to hear from a bank:
  • The coordinator hasn't sent any commit messages yet
  • It can safely abort - send the abort message
  • Preserves correctness, sacrifices performance (maybe it didn't need to abort!)
  • If either bank had voted to commit, that's fine - they will eventually abort


Handling Bank Timeouts
• What if the bank doesn't hear back from the coordinator?
  • If the bank voted “no”, it's OK to abort
  • If the bank voted “yes”:
    • It can't decide to abort (maybe both banks voted “yes” and the coordinator heard this)
    • It can't decide to commit (maybe the other bank voted “no”)
• Does the bank just wait forever?


Handling Bank Timeouts
• Can resolve SOME timeout problems with guaranteed correctness in the event the bank voted “yes” to commit
• Bank asks the other bank for its status (did it hear from the coordinator?)
  • If the other bank heard “commit” or “abort”, then do that
  • If the other bank didn't hear:
    • but the other voted “no”: both banks abort
    • but the other voted “yes”: no decision possible!


2PC Timeouts

• We can solve a lot (but not all of the cases) by having the participants talk to each other

• But, if coordinator fails, there are cases where everyone stalls until it recovers

• Can the coordinator fail? …yes
• Hence, 2PC does not guarantee liveness: a single node failing can cause the entire set to fail


3 Phase Commit
• Goal: Eliminate this specific failure from blocking liveness

[Diagram: the coordinator and Participant A crash (X); A had voted yes and heard back “commit”; Participants B, C, and D also voted yes but did not hear the result]


3 Phase Commit
• Goal: Avoid blocking on node failure
• How?
  • Think about how 2PC is better than 1PC
  • 1PC means you can never change your mind or have a failure after committing
  • 2PC still means that you can't have a failure after committing (committing is irreversible)
• 3PC idea:
  • Split commit/abort into 2 sub-phases
    • 1: Tell everyone the outcome
    • 2: Agree on outcome


3PC Example

[Diagram: the coordinator, soliciting votes, sends “prepare” to participants A-D, whose status is uncertain; they respond; if all vote yes, the coordinator authorizes the commit and sends “pre-commit”, moving participants to prepared-to-commit; they reply “OK”; the coordinator sends “commit”, the participants commit and reply “OK”, and the coordinator is done]


3PC Crash Handling
• Can B/C/D reach a safe decision…
  • If any one of them has received preCommit?
  • YES! Assume A is dead. When A comes back online, it will recover and talk to B/C/D to catch up.
  • Consider it equivalent to the case in 2PC where B/C/D received the “commit” message and all voted yes

[Diagram: the coordinator and Participant A have crashed (X); Participants B, C, and D remain]


3PC Crash Handling
• Can B/C/D reach a safe decision…
  • If NONE of them has received preCommit?
  • YES! It is safe to abort, because A can not have committed (because it couldn't commit until B/C/D receive and acknowledge the pre-commit)
  • This is the big strength of the extra phase over 2PC
• Summary: Any node can crash at any time, and we can always safely abort or commit.

[Diagram: the coordinator and Participant A have crashed (X); Participants B, C, and D remain]


3PC Timeout Handling

[Diagram: the same prepare / pre-commit / commit exchange as the 3PC example, annotated with timeout behavior: timeouts while soliciting votes and while participants are uncertain cause an abort; a timeout once a participant is prepared to commit causes a commit]


Does 3PC guarantee consensus?

• Reminder, that means:
  • Liveness (availability)
    • Yes! Always terminates based on timeouts
  • Safety (correctness)
    • Hmm…


Partitions

[Diagram: the coordinator solicits votes and participants A-D all vote yes while uncertain; a network partition then separates the coordinator and Participant A (prepared to commit, commit authorized, timeout behavior: commit!) from B, C, and D (still uncertain, timeout behavior: abort). The run ends with A committed and B, C, and D aborted]


FLP - Intuition
• Why can't we make a protocol for consensus/agreement that can tolerate both partitions and node failures?
• To tolerate a partition, you need to assume that eventually the partition will heal, and the network will deliver the delayed messages
• But the messages might be delayed forever
• Hence, your protocol might not come to a result until forever (it would not have the liveness property)


Partition Tolerance

• Key idea: if you always have an odd number of nodes…
  • There will always be a minority partition and a majority partition
  • Give up processing in the minority until the partition heals and the network resumes
  • Majority can continue processing


Partition Tolerant Consensus Algorithms

• Decisions made by majority
• Typically a fixed coordinator (leader) during a time period (epoch)
• How does the leader change?
  • Assume it starts out as an arbitrary node
  • The leader sends a heartbeat
  • If you haven't heard from the leader, then you challenge it by advancing to the next epoch and trying to elect a new one
  • If you don't get a majority of votes, you don't get to be leader (see the sketch below)
  • …hence no leader in a minority partition
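A minimal sketch of that majority rule (the vote-collection mechanism here is an assumption; real protocols such as ZAB or Raft add heartbeats, logs, and persistent epochs):

    import java.util.List;

    class ElectionSketch {
        int currentEpoch = 0;
        boolean isLeader = false;

        // Called when this node has not heard a heartbeat from the leader in time.
        void onLeaderTimeout(List<Boolean> voteReplies, int clusterSize) {
            currentEpoch++; // challenge the leader: advance to the next epoch
            long votes = voteReplies.stream().filter(v -> v).count() + 1; // +1 for our own vote
            // Only a node that gathers a majority becomes leader; a minority
            // partition can never reach a majority, so it elects no leader.
            isLeader = votes > clusterSize / 2;
        }
    }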


ZooKeeper - Data Model
• Provides a hierarchical namespace
• Each node is called a znode
• ZooKeeper provides an API to manipulate these nodes (a small example follows below)

[Diagram: a tree rooted at /, with children /app1 and /app2; /app1 has children /app1/z1 and /app1/z2]
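A minimal sketch (not the course's assignment code) of manipulating znodes with the ZooKeeper Java client; the connect string and data are placeholders:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeExample {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});
            // Parents must exist before their children.
            zk.create("/app1", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create("/app1/z1", "hello".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData("/app1/z1", false, null); // read it back
            System.out.println(new String(data));
            zk.close();
        }
    }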


ZooKeeper - Guarantees

• Liveness guarantees: if a majority of ZooKeeper servers are active and communicating the service will be available

• Durability guarantees: if the ZooKeeper service responds successfully to a change request, that change persists across any number of failures as long as a quorum of servers is eventually able to recover


ZooKeeper - Applications

• Distributed locks
• Group membership
• Leader election
• Shared counters


Failure Handling in ZK

• Just using ZooKeeper does not solve failures
• Apps using ZooKeeper need to be aware of the potential failures that can occur, and act appropriately
• The ZK client will guarantee consistency if it is connected to the server cluster


Failure Handling in ZK

[Diagram: a client connected to a three-server ensemble (ZK 1, ZK 2, ZK 3): (1) the client issues a create to ZK 2; (2) ZK 2 has a network problem; (3) the client reconnects to ZK 3; (4) the client reissues the create to ZK 3]


Failure Handling in ZK
• If in the middle of an operation, the client receives a ConnectionLossException
• Also, the client receives a disconnected message
• Clients can't tell whether or not the operation was completed, though - perhaps it was completed before the failure
• Clients that are disconnected can not receive any notifications from ZK


Dangers of ignorance

[Diagram: Client 1 creates /leader, then becomes disconnected from ZK. Client 2 receives a notification that /leader is dead, creates /leader, and is now the leader. When Client 1 reconnects, it discovers it is no longer the leader]


Dangers of ignorance

• Each client needs to be aware of whether or not it's connected: when disconnected, it can not assume that it is still included in any way in operations
• By default, the ZK client will NOT close the client session just because it's disconnected!
  • Assumption that eventually things will reconnect
  • Up to you to decide to close it or not


ZK: Handling Reconnection

• What should we do when we reconnect?
  • Re-issue outstanding requests?
  • Can't assume that outstanding requests didn't succeed
  • Example: create /leader (it succeeded, but you disconnected); re-issue create /leader and it fails because you already did it! (see the sketch below)
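A hedged sketch of that situation (not the official recipe): retry the create after a ConnectionLossException, and treat NodeExistsException as a signal to check rather than blindly fail:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ReconnectRetry {
        static String tryCreateLeader(ZooKeeper zk, byte[] myId)
                throws InterruptedException, KeeperException {
            while (true) {
                try {
                    return zk.create("/leader", myId,
                            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                } catch (KeeperException.ConnectionLossException e) {
                    // Outcome unknown: the create may or may not have been applied - retry.
                } catch (KeeperException.NodeExistsException e) {
                    // Either our lost attempt succeeded, or another client became leader first;
                    // a real client must check which one is true before assuming leadership.
                    return "/leader";
                }
            }
        }
    }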


GFS
• Hundreds of thousands of regular servers
• Millions of regular disks
• Failures are normal
  • App bugs, OS bugs
  • Human error
  • Disk failure, memory failure, network failure, etc.
• Huge number of concurrent reads, writes


GFS Architecture

[Figure: GFS architecture diagram]


GFS Summary
• Limitations:
  • Master is a huge bottleneck
  • Recovery of master is slow
• Lots of success at Google
• Performance isn't great for all apps
• Consistency needs to be managed by apps
• Replaced in 2010 by Google's Colossus system - eliminates the master


Distributing Computation
• Can't I just add 100 nodes and sort my file 100 times faster?
• Not so easy:
  • Sending data to/from nodes
  • Coordinating among nodes
  • Recovering when one node fails
  • Optimizing for locality
  • Debugging


Distributing Computation

• Lots of these challenges re-appear, regardless of our specific problem:
  • How to split up the task
  • How to put the results back together
  • How to store the data
• Enter, MapReduce


MapReduce
• A programming model for large-scale computations (a word-count sketch follows below)
  • Takes large inputs, produces output
  • No side-effects or persistent state other than that input and output
• Runtime library
  • Automatic parallelization
  • Load balancing
  • Locality optimization
  • Fault tolerance
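A hedged sketch of the classic word-count job written against the Hadoop MapReduce API (an illustration of the programming model, not code from the lecture):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        public static class TokenizeMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                for (String word : line.toString().split("\\s+")) {
                    if (!word.isEmpty()) ctx.write(new Text(word), ONE); // emit (word, 1)
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get(); // combine all counts for this word
                ctx.write(word, new IntWritable(sum));
            }
        }
    }

The map and reduce functions have no side effects, which is what lets the runtime re-execute failed tasks freely.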


MapReduce: Divide & Conquer

[Diagram: a big dataset (lots of work) is partitioned into chunks w1-w5; each chunk goes to a worker that produces a result r1-r5; the results are combined into the final result]


MapReduce: Fault Tolerance
• Ideally, fine-granularity tasks (more tasks than machines)
• On worker failure:
  • Re-execute completed and in-progress map tasks
  • Re-execute in-progress reduce tasks
  • Commit completion to master
• On master failure:
  • Recover state (the master checkpoints in a primary-backup mechanism)


Hadoop + ZooKeeper

[Diagram: many DataNodes served by a primary NameNode and a secondary NameNode; each NameNode runs a ZK client that talks to a three-server ZooKeeper ensemble]

Note - this is why ZK is helpful here: we can have the ZK servers partitioned *too* and still tolerate it the same way


Where do we find data?
• What's bad with the single master picture?
• HDFS/GFS leverage the fact that there is relatively little metadata and lots of data (e.g., few files, each file is large)
• What if there is really only metadata?
• How can we build a system with high performance without having this single server that knows where data is stored?


Hashing
• The last one mapped every input to a different hash
• It doesn't have to - there could be collisions

[Diagram: inputs (Leo McGarry, Josh Lyman, Sam Seaborn, Toby Ziegler) fed through a hash function into buckets 0-7]


Hashing for Partitioning

[Diagram: some big long piece of text or database key -> hash() = 900405 -> % 20 = 5 -> Server ID 5]
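In code, that mapping is just a hash taken modulo the number of servers (a sketch, not the course's code):

    public class HashPartition {
        static int serverFor(String key, int numServers) {
            int h = key.hashCode();
            return Math.floorMod(h, numServers); // floorMod avoids negative results for negative hashes
        }

        public static void main(String[] args) {
            System.out.println(serverFor("some big long piece of text or database key", 20));
        }
    }

The drawback is that changing the number of servers remaps almost every key, which is the motivation for consistent hashing on the next slide.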


Consistent Hashing

It is relatively smooth: adding a new bucket doesn't change that much.

[Diagram: a hash ring labeled 0, 4, 8, 12. Adding a new bucket only changes the location of keys 7, 8, 9, 10; deleting a bucket only changes the location of keys 1, 2, 3]
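A minimal sketch of a consistent-hash ring using a sorted map (real implementations usually add virtual nodes per server to smooth the distribution):

    import java.util.Map;
    import java.util.TreeMap;

    public class ConsistentHashRing {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addServer(String server) {
            ring.put(hash(server), server);
        }

        void removeServer(String server) {
            ring.remove(hash(server));
        }

        String serverFor(String key) {
            // First server clockwise from the key's position; wrap around if needed.
            Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
            return (e != null ? e : ring.firstEntry()).getValue();
        }

        private static int hash(String s) {
            return Math.floorMod(s.hashCode(), 1 << 16); // positions on a 2^16-slot ring
        }

        public static void main(String[] args) {
            ConsistentHashRing ring = new ConsistentHashRing();
            ring.addServer("serverA");
            ring.addServer("serverB");
            ring.addServer("serverC");
            System.out.println(ring.serverFor("foo")); // only nearby keys move when a server joins or leaves
        }
    }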


Napster

[Diagram: client 1 tells the Napster master "Hi, I signed on, I have files f1, f2, f3"; client 2 asks the master "Who has f1?" and is told "client 1 does"; client 2 then asks client 1 "Can I have f1?" and receives "Here is f1"]

Doesn’t everything just look like GFS, even things that predated it? :)


Gnutella 1.0

[Diagram: client 1 asks its neighbors "where is f1?"; the query is flooded from client to client (clients 2-5); the answer "c2 has f1" propagates back, and client 1 asks client 2 "Can I have f1?"]


BitTorrent

[Diagram: a central tracker coordinating many clients, which then exchange data directly with one another]


Byzantine Generals Problem

[Diagram: three generals exchanging orders: two say "Attack!" while one says "Don't attack!"]


Oral BFT Example (n=4, m=1)

[Diagram: the commander sends its order to Lieutenants 1, 2, and 3, who then relay the values they received (x, y, z) to one another]


Blockchains
• Solution: make it hard for participants to take over the network; provide rewards for participants so they will still participate
• Each participant stores the entire record of transactions as blocks
• Each block contains some number of transactions and the hash of the previous block (see the sketch below)
• All participants follow a set of rules to determine if a new block is valid

[Diagram: a chain of blocks h0, h1, h2, ..., hn, each linked to the hash of the previous block]
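A minimal sketch of that chaining (illustrative only; real blockchains also include nonces, timestamps, and Merkle roots):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class HashChain {
        // Each block's hash covers its transactions plus the previous block's hash,
        // so altering any earlier block changes every hash after it.
        static String blockHash(String previousHash, String transactions) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest((previousHash + transactions).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            String h0 = blockHash("genesis", "tx1;tx2");
            String h1 = blockHash(h0, "tx3");     // h1 depends on h0
            String h2 = blockHash(h1, "tx4;tx5"); // h2 depends on h1, and transitively on h0
            System.out.println(h0 + "\n" + h1 + "\n" + h2);
        }
    }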


Blockchain's view of consensus

[Diagram: Miners 1, 2, and 3 all share blocks h1-h4. Miner 1 has appended one candidate h5; Miner 2 has appended a different h5; Miner 3 has built a further block h6 on Miner 2's h5, making that chain the longest]

"Longest chain rule" - When is a block truly safe?


CFS
• What is the system model?
  • Synchronous replication (thanks to WAIT, sketched below), but it has no rollback in the event replication fails
    • We solve this by destroying the entire server if the replication fails
  • Partition tolerant
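A hedged Jedis sketch of what a synchronous write with WAIT could look like (the host, port, replica count, and error handling are assumptions, not the assignment's actual code):

    import redis.clients.jedis.Jedis;

    public class SynchronousWrite {
        public static void main(String[] args) {
            try (Jedis master = new Jedis("localhost", 6379)) {
                master.set("foo", "bar");
                // Block until at least 2 replicas acknowledge, or give up after 1000 ms.
                long acked = master.waitReplicas(2, 1000);
                if (acked < 2) {
                    // No rollback is possible at this point; CFS's (drastic) answer is to
                    // treat the server as failed rather than serve inconsistent data.
                    throw new IllegalStateException("replication failed: only " + acked + " replicas acked");
                }
            }
        }
    }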


CFS Fault Tolerance

[Diagram: two clients talking to a master that replicates to several slaves; when the master crashes, one of the slaves is promoted to be the new master]

Goal 1: Promote a new master when the master crashes


CFS Fault Tolerance

[Diagram: the same cluster during a partition - the old master and some nodes report "disconnected", while the nodes on the other side of the partition remain connected and report "OK"]


CFS Fault Tolerance

[Diagram: during the partition, the old master and one slave are on the disconnected minority side; the majority side promotes a new master and continues serving its client]

Goal 2: Promote a new master when partitioned and holding a quorum; cease functioning in the minority


CFS + Failures + Inconsistency

[Diagram: a master partitioned away from some of its slaves]

Would a partition like this cause inconsistency? No, if we WAIT for all replicas; yes, if we only wait for a quorum!