Byzantine fault-tolerance
COMP 413, Fall 2002
Overview
• Models
  – Synchronous vs. asynchronous systems
  – Byzantine failure model
• Secure storage with self-certifying data
• Byzantine quorums
• Byzantine state machines
Models
Synchronous system: bounded message delays (implies reliable network!)
Asynchronous system: message delays are unbounded
In practice (Internet): reasonable to assume that network failures are eventually fixed (weak synchrony assumption).
Model (cont’d)
• Data and services (state machines) can be replicated on a set of nodes R.
• Each node in R has iid probability of failing
• Can specify bound f on the number of nodes that can fail simultaneously
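To make the link between the iid failure probability and the bound f concrete, here is a minimal sketch that computes how likely it is that the bound is exceeded. The function name and the example numbers (4 replicas, failure probability 0.01, f = 1) are illustrative assumptions, not values from the slides.

```python
from math import comb

def prob_more_than_f_failures(n: int, f: int, p: float) -> float:
    """Probability that more than f of n nodes are failed at the same time,
    assuming each node is failed independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(f + 1, n + 1))

# Hypothetical numbers: 4 replicas, each failed with probability 0.01, bound f = 1.
print(prob_more_than_f_failures(4, 1, 0.01))
```

The smaller this probability, the more reasonable it is to design a protocol around a fixed bound f.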
Model (cont’d)
Byzantine failures
• no assumption about nature of fault
• failed nodes can behave in arbitrary ways
• may act as intelligent adversary (compromised node), with full knowledge of the protocols
• failed nodes may conspire (act as one)
Self-certifying data
Byzantine quorums
• Data is not self-certifying (multiple writers without shared keys)
• Idea: replicate data on sufficient number of replicas (relative to f) to be able to rely on majority vote
Byzantine quorums: r/w variable
Representative problem: implement a read/write variable
Assuming no concurrent reads, writes for now
Assuming trusted clients, for now
Byzantine quorums: r/w variable
How many replicas do we need?
• clearly, need at least 2f+1, so we have a majority of good nodes
• write(x): send x to all replicas, wait for acknowledgments (must get at least f+1)
• read(x): request x from all replicas, wait for responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)
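The read rule for 2f+1 replicas can be sketched as follows. This is a minimal illustration that assumes every replica answers (synchronous network) and that replies cannot be forged; `majority_read` and the sample values are hypothetical names, not part of the slides.

```python
from collections import Counter

def majority_read(responses, f):
    """Return the value reported by at least f+1 replicas, or None if no
    value reaches that threshold."""
    if not responses:
        return None
    value, votes = Counter(responses).most_common(1)[0]
    return value if votes >= f + 1 else None

f = 1
n = 2 * f + 1              # 3 replicas, at most f of them faulty
stored = ["x=7"] * n       # write(x) reached every replica
stored[0] = "garbage"      # hypothetical faulty replica answers arbitrarily
print(majority_read(stored, f))
```

With 2f+1 replies in total, two different values cannot both collect f+1 votes, so the result is unambiguous whenever it exists.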
Byzantine quorums: r/w variable
Does this work? Yes, but only if
• system is synchronous (bounded msg delay)
• faulty nodes cannot forge messages (messages are authenticated!)
Byzantine quorums: r/w variable
Now, assume
• weak synchrony (network failures are fixed eventually)
• messages are authenticated (e.g., signed with sender’s private key)
Byzantine quorums: r/w variable
Let’s try 3f+1 replicas (known lower bound)
• write(x): send x to all replicas, wait for 2f+1 responses (must have at least f+1 good replicas with correct value)
• read(x): request x from all replicas, wait for 2f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? No: it is possible that the f nodes that did not respond were good nodes!)
Byzantine quorums: r/w variable
Let’s try 4f+1 replicas
• write(x): send x to all replicas, wait for 3f+1 responses (must have at least 2f+1 good replicas with correct value)
• read(x): request x from all replicas, wait for 3f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? No: it is possible that the f faulty nodes vote with the good nodes that have an old value of x!)
Byzantine quorums: r/w variable
Let’s try 5f+1 replicas
• write(x): send x to all replicas, wait for 4f+1 responses (must have at least 3f+1 good replicas with correct value)
• read(x): request x from all replicas, wait for 4f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)
• Actually, can use only 5f replicas if data is written with monotonically increasing timestamps
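The progression from 3f+1 to 5f+1 replicas can be checked with a small counting sketch. It assumes that both reads and writes wait for n - f replies, that faulty nodes side with correct nodes holding the old value, and that no writes are concurrent; the function name `worst_case_read` and this counting model are illustrative assumptions, not a proof taken from the slides.

```python
def worst_case_read(n: int, q: int, f: int):
    """Worst-case vote counts for a read that waits for q of n replies.

    Returns (current, stale): the minimum number of correct replicas in the
    read quorum that hold the last written value, and the maximum number of
    replies that can agree on an old value.
    """
    overlap = max(0, 2 * q - n)            # nodes shared by any read and write quorum
    current = max(0, overlap - f)          # up to f of the shared nodes may be faulty
    stale = min(q - current, f + (n - q))  # f faulty + correct nodes the write missed
    return current, stale

f = 2
for k in (3, 4, 5):
    n = k * f + 1
    q = n - f                              # wait for all but f replies
    current, stale = worst_case_read(n, q, f)
    print(f"n={k}f+1: current={current}, stale={stale}, safe={current > stale}")
```

Under these assumptions only the 5f+1 configuration guarantees that the current value outvotes any stale value in every read.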
Byzantine quorums: r/w variable
Still rely on trusted clients
• Malicious client could send different values to replicas, or send value to less than a full quorum
• To fix this, need a Byzantine agreement protocol among the replicas
Still don’t handle concurrent accesses
Still don’t handle group changes
Byzantine state machine
BFT (Castro, 2000)
• Can implement any service that behaves like a deterministic state machine
• Can tolerate malicious clients
• Safe with concurrent requests
• Requires 3f+1 replicas
• 5 rounds of messages
Byzantine state machine
• Clients send requests to one replica
• Correct replicas execute all requests in same order
• Atomic multicast protocol among replicas ensures that all replicas receive and execute all requests in the same order
• Since all replicas start in same state, correct replicas produce identical result
• Client waits for f+1 identical results from different replicas
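The client-side rule (wait for f+1 identical results from different replicas) might look like the sketch below. The function name and the (replica_id, result) reply format are assumptions made for illustration, and signature checking of replies is omitted.

```python
from collections import defaultdict

def accept_result(replies, f):
    """replies: iterable of (replica_id, result) pairs in arrival order.
    Return the first result vouched for by f+1 distinct replicas, or None
    if no result has reached that threshold yet."""
    voters = defaultdict(set)
    for replica_id, result in replies:
        voters[result].add(replica_id)     # each replica counts once per result
        if len(voters[result]) >= f + 1:
            return result
    return None

# A duplicate reply from replica 0 does not count twice.
print(accept_result([(0, "ok"), (0, "ok"), (1, "bad"), (2, "ok")], f=1))
```

Since at most f replicas are faulty, any result backed by f+1 distinct replicas includes at least one correct replica.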
BFT protocol
BFT: Protocol overview
• Client c sends m = <REQUEST,o,t,c>σc to the primary. (o = operation, t = monotonic timestamp, σc = signature of c)
• Primary p assigns seq# n to m and sends <PRE-PREPARE,v,n,m> σp to other replicas. (v=current view, i.e., replica set)
• If replica i accepts the message, it sends <PREPARE,v,n,d,i> σi to other replicas. (d is hash of the request). Signals that i agrees to assign n to m in v.
BFT: Protocol overview
• Once replica i has a pre-prepare and 2f+1 matching prepare messages, it sends <COMMIT,v,n,d,i> σi to other replicas. At this point, correct replicas agree on an order of requests within a view.
• Once replica i has 2f+1 matching prepare and commit messages, it executes m, then sends <REPLY,v,t,c,i,r> σi to the client. (The need for this last step has to do with view changes.)
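The bookkeeping a replica needs for the prepare and commit steps can be sketched as below. This is a simplified illustration: the class and method names are hypothetical, the 2f+1 thresholds follow the wording of the slides, and signature checks, message sending, and view changes are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Slot:
    """Bookkeeping for one (view, seq#) pair at a single replica."""
    digest: Optional[str] = None  # d from the accepted PRE-PREPARE, if any
    prepares: dict = field(default_factory=lambda: defaultdict(set))  # d -> replica ids
    commits: dict = field(default_factory=lambda: defaultdict(set))   # d -> replica ids

class Replica:
    def __init__(self, f: int):
        self.quorum = 2 * f + 1        # threshold as stated on the slides
        self.log = defaultdict(Slot)   # (v, n) -> Slot

    def on_pre_prepare(self, v, n, d) -> bool:
        slot = self.log[(v, n)]
        if slot.digest is None:        # accept only the first digest proposed for (v, n)
            slot.digest = d
        return slot.digest == d

    def on_prepare(self, v, n, d, i):
        self.log[(v, n)].prepares[d].add(i)

    def on_commit(self, v, n, d, i):
        self.log[(v, n)].commits[d].add(i)

    def prepared(self, v, n) -> bool:
        """True once the replica may send its COMMIT for (v, n)."""
        slot = self.log[(v, n)]
        return slot.digest is not None and len(slot.prepares[slot.digest]) >= self.quorum

    def committed(self, v, n) -> bool:
        """True once the request may be executed and a REPLY sent."""
        slot = self.log[(v, n)]
        return self.prepared(v, n) and len(slot.commits[slot.digest]) >= self.quorum

r = Replica(f=1)
r.on_pre_prepare(0, 1, "h(m)")
for i in (0, 1, 2):
    r.on_prepare(0, 1, "h(m)", i)
print(r.prepared(0, 1), r.committed(0, 1))
```

Messages are stored per digest so that prepares arriving before the pre-prepare are not lost, and a conflicting pre-prepare for the same (v, n) is rejected.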
BFT
• More complexity related to view changes and garbage collection of message logs
• Public-key signatures are the bottleneck: a variation of the protocol uses symmetric crypto (MACs) to provide authenticated channels. (Not easy, because MACs are less powerful: they can’t prove authenticity to a third party!)
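The limitation of MACs can be illustrated with Python's standard hmac module. In this sketch the shared key and the message strings are made up for the example; the point is only that the receiver holds the same key as the sender and can therefore produce valid tags itself.

```python
import hashlib
import hmac

key = b"shared-by-A-and-B"   # hypothetical key known to both sender A and receiver B

def mac(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(mac(message), tag)

# A authenticates a message to B, and B can check it.
tag_from_a = mac(b"PREPARE v=0 n=1")
# B holds the same key, so B can create a valid tag for a message A never sent.
forged_by_b = mac(b"PREPARE v=0 n=2")

print(verify(b"PREPARE v=0 n=1", tag_from_a))
print(verify(b"PREPARE v=0 n=2", forged_by_b))
```

Both tags verify, so a third party shown either one cannot tell whether A or B produced it. A digital signature does not have this problem, because only the signer holds the private key.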