© DEEDS – OS Distributed Operating Systems. © DEEDS – OS Coverage Distributed Systems (DS)...


Distributed Operating Systems


Coverage

• Distributed Systems (DS) Paradigms

– DS … NOS, DOS’s

– DS Services: communication, synchronization, coordination, replication…


What is a Distributed System

“A distributed system is the one preventing you from working because of the failure of a machine that you had never heard of”

Leslie Lamport

Multiple computers sharing (same) state and interconnected by a network

… collection of autonomous entities appearing to users as a single OS

[Figure: shared memory multiprocessor, message passing multicomputer, distributed system]


Distribution: Example Pro/Cons

The Good Stuff: Resource Sharing (concurrency performance), Distributed Access (matching spatial distribution of applications), Scalable, Load Balancing (Migration, Relocation), Fault Tolerance.

Bank account database (DB) example

– Naturally centralized: easy consistency and performance

– Fragment DB among regions: exploit locality of reference, security & reduce reliance on network for remote access

– Replicate each fragment for fault tolerance

But, we now need (additional) DS techniques

– Route request to right fragment

– Maintain access/consistency of fragments as a whole database

– Maintain access/consistency of each fragment’s replicas

– …


OS’s for DS’s

Loosely-coupled OS

– A collection of computers each running their own OS, with the OS’s allowing sharing of resources across machines

– A.K.A. Network Operating System (NOS)

– Provides local services to remote clients via remote login

– Data transfer from remote OS to local OS via FTP (File Transfer Protocol)

Tightly-coupled OS

– OS tries to maintain a single global view of the resources it manages

– A.K.A. Distributed Operating System (DOS)

– “Local access feel” as a non-distributed, standalone OS

– Data migration or computation migration modes (entire process or threads)


Network Operating Systems (NOS)

Provide an environment where users are (explicitly) aware of the multiplicity of machines.

Users can access remote resources by

– logging into the remote machine, OR

– transferring data from the remote machine to their own machine

Users should know where the required files and directories are and mount them.

Each machine could act like a server and a client at the same time.

E.g., NFS from Sun Microsystems, CMU-AFS, etc.


Distributed Operating Systems (DOS)

Runs on a cluster of machines with no shared memory

Users get the feel of a single processor - virtual uni-processor

Transparency is the driving force

Requires

– A single global IPC mechanism

– Identical process management and system calls at all nodes

– Common file system at all nodes

– State, services and data consistency


Basic Client Server Model for DOS & NOS

[Figure labels: Non-blocking comm.!!!, File based, Comm./object based]


Middleware

• Can we have the best of both worlds?

– Scalability and openness of a NOS

– Transparency and common-state of a DOS

• Solution: an additional layer of SW above the OS (Middleware)

– Mask heterogeneity

– Improve distribution transparency (and others)


Middleware Openness Basis

• Document-based middleware (e.g., WWW)

• Coordination-based MW (e.g., Linda, publish/subscribe, Jini, etc.)

• File system based MW (upload/download, remote access)

• Shared object based MW


Global Access Transparency

Illusion of a single computer across a DS

Distribution transparency: All of above + performance + flexibility (modification, enhancements for kernel/devices), balancing/scheduling, & scaling (allowing systems to expand without disrupting users) + …

Fragmentation: hide whether the resource is fragmented or not


Reliability, Performance, Scalability

Reliability

• Faults (fail stop, transient, Byzantine)

• Fault avoidance (de-cluster, rejuvenate)

• Fault tolerance

– Redundancy techniques (k failures??)

– Distributed control

• Fault detection & recovery

– Atomic transactions

– Stateless servers

– Acknowledgements and timeout-based retransmissions of messages

Performance

• Batch if possible

• Cache whenever possible

• Minimize copying of data

• Minimize network traffic

• Take advantage of fine-grain parallelism for multiprocessing

Scalability

• Avoid centralized entities

– Provides no/limited fault tolerance

– Leads to system bottlenecks

– Issues of network traffic capabilities with centralized entity

• Avoid centralized algorithms

• Perform most operations on client workstations


Design Issues

• Resource management

– Hard to obtain consistent information about utilization or availability of resources.

– Has to be calculated (costly!!) approximately using heuristic methods.

• Processor allocation

– Load balancing

– Hierarchical organization of processors. If a processor cannot handle a request, ask the parent for help.

– …BUT crashing of a higher level processor results in isolation of all processors attached to it.

• Process scheduling

– Communication dependency, Causality, Linearizability… to consider

• Fault tolerance

– Consider distribution of control and data.

• Services provided

– Typical services include name, directory, file, time, etc.


Process Addressing ~ N-OS Flavor

• Explicit addressing

– Send(process_id, message)

• Implicit addressing (functional addressing)

– Send_any(service_id, message)

• Ex: machine_id@local_id (Berkeley UNIX)

– Limited with process migration

• Link based process addressing. Ex: machine_id@local_id@machine_id

– Overload of locating a process

– Intermediate node failure

• System-wide unique identifier (location transparency)

– High level m/c independent and low level m/c dependent

– Centralized naming server for high level id (functional)


So what services do we need to realize DS?

• Communication

• Coordination (Stateful? Stateless?) & Synchronization

• Replication

• Failure Handling

• Consistency

• Liveness

• Storage


Communication (Group Comm)

• One to many communication (blocking or non-blocking?)

– Multicast/Broadcast

– Open group/Closed group

– Flexible reliability: 0-reliable, 1-reliable, m-out-of-n reliable, all-reliable

– Atomic multicast

• Many to one communication

• Many to many communication

– Absolute ordering (global clock)

– Consistent ordering (sequencer/ABCAST protocol)

– Causal ordering
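Consistent ordering with a sequencer can be sketched in a few lines. The sketch below is illustrative only: the `Sequencer` and `Receiver` classes, the hold-back queue and the sample messages are assumptions made for the example, not part of the slides or of any specific ABCAST implementation. It shows that two receivers deliver the same sequence even when the network reorders messages.

```python
import heapq


class Sequencer:
    """Hypothetical central sequencer: stamps each multicast with the next global number."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, sender, payload):
        stamped = (self.next_seq, sender, payload)
        self.next_seq += 1
        return stamped


class Receiver:
    """Holds back out-of-order messages and delivers strictly by sequence number."""

    def __init__(self):
        self.expected = 0
        self.holdback = []    # min-heap keyed by sequence number
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self.holdback, stamped)
        # Deliver every message that is now next in the global order.
        while self.holdback and self.holdback[0][0] == self.expected:
            _, sender, payload = heapq.heappop(self.holdback)
            self.delivered.append((sender, payload))
            self.expected += 1


sequencer = Sequencer()
m0 = sequencer.stamp("p1", "debit 10")
m1 = sequencer.stamp("p2", "credit 5")
m2 = sequencer.stamp("p1", "close")

r1, r2 = Receiver(), Receiver()
for msg in (m0, m1, m2):    # arrives in order at r1
    r1.receive(msg)
for msg in (m2, m0, m1):    # arrives reordered at r2
    r2.receive(msg)
```

The sequencer is itself a centralized entity, so this sketch trades fault tolerance and scalability for simplicity.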


Communication

Failure handling: delivers messages despite

– communication link(s) failure

– process failures

Main kinds of failures to tolerate

– timing (link and process)

– omission (link and process)

– value

Loss of request message

Loss of response message

Unsuccessful execution of the request (system crash)

Inter Process Communication (IPC)

– Two message IPC (request, reply)

– Three message reliable IPC (request, reply, ack)

– Four message reliable IPC (request, ack, reply, ack)

Failure handling

– At-least-once (time out)

– Idempotency (no side effects no matter how many times performed)

– Nonidempotent (exactly once semantics)

• Reply from the cache with unique Id
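A minimal sketch of "reply from the cache with unique Id", assuming a hypothetical bank-style `Server` and a client that retries on timeout. The class names, the `request_id` format and the `lost_replies` parameter (which stands in for a lossy network) are invented for illustration. The client gives at-least-once delivery; the reply cache stops the non-idempotent withdrawal from executing more than once.

```python
class Server:
    """Hypothetical server that makes a non-idempotent operation safe to retry."""

    def __init__(self):
        self.balance = 100
        self.reply_cache = {}    # request_id -> reply already sent

    def handle_withdraw(self, request_id, amount):
        if request_id in self.reply_cache:    # duplicate: do not re-execute
            return self.reply_cache[request_id]
        self.balance -= amount                # the non-idempotent side effect
        reply = ("ok", self.balance)
        self.reply_cache[request_id] = reply
        return reply


def call_with_retries(server, request_id, amount, lost_replies):
    """At-least-once client: resend the same request_id until a reply arrives.

    lost_replies simulates how many replies the network drops first.
    """
    attempts = 0
    while True:
        attempts += 1
        reply = server.handle_withdraw(request_id, amount)
        if lost_replies > 0:
            lost_replies -= 1    # reply lost: the timeout fires and we retry
            continue
        return reply, attempts


server = Server()
reply, attempts = call_with_retries(server, request_id="req-1", amount=30, lost_replies=2)
```

Here the request reaches the server three times, but the balance is debited once. A real server would also need a policy for evicting old cache entries, which this sketch leaves out.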


Communication: Reliable Delivery

• Omission failure tolerance (degree k).

• Design choices:

a) Error masking (spatial): several (> k) links

b) Error masking (temporal): repeat k+1 times

c) Error recovery: detect error and recover


Reliable Delivery (cont.)

Error detection and recovery: ACK’s and timeouts

• Positive ACK: sent when a message is received

– Timeout on sender without ACK: sender retransmits

• Negative ACK: sent when a message loss is detected

– Needs sequence #s or time-based reception semantics

• Tradeoffs

– Positive ACKs: usually faster failure detection

– NACKs: fewer msgs…

Q: what kind of situations are good for

– Spatial error masking?

– Temporal error masking?

– Error detection and recovery with positive ACKs?

– Error detection and recovery with NACKs?
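The positive-ACK scheme can be sketched as a sender that retransmits whenever no ACK arrives before its timeout. Everything here is a simplifying assumption for illustration: `LossyLink` drops transmissions according to a fixed pattern, a dropped transmission is treated as "timeout fired", and ACKs themselves are never lost. With at most k omissions, k+1 transmissions are enough.

```python
class LossyLink:
    """Hypothetical link that drops transmissions according to a fixed pattern."""

    def __init__(self, drop_pattern):
        self.drop_pattern = list(drop_pattern)    # True = drop this transmission
        self.delivered = {}                       # seq -> payload at the receiver

    def transmit(self, seq, payload):
        """Return True if an ACK comes back, False if the sender's timeout fires."""
        dropped = self.drop_pattern.pop(0) if self.drop_pattern else False
        if dropped:
            return False
        self.delivered.setdefault(seq, payload)   # receiver ignores duplicates
        return True


def send_reliably(link, seq, payload, max_retries):
    """Send once, then retransmit up to max_retries times on timeout."""
    for attempt in range(1, max_retries + 2):
        if link.transmit(seq, payload):
            return attempt
    raise TimeoutError(f"no ACK for message {seq} after {max_retries + 1} attempts")


link = LossyLink(drop_pattern=[True, True, False])    # k = 2 omissions
attempts = send_reliably(link, seq=0, payload="hello", max_retries=2)
```

If the link drops more than `max_retries` transmissions in a row, `send_reliably` raises `TimeoutError`, which is the point where the sender can no longer tell a slow receiver from a failed one.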


Resilience to Sender Failure

• Multicast FT-Communication harder than point-to-point

– Basic problem is of failure detection

– Subsets of receivers may receive msg, then sender fails

• Solutions depend on flavor of multicast reliability

a) Unreliable: no effort to overcome link failures

b) Best-effort: some steps taken to overcome link failures

c) Reliable: participants coordinate to ensure that all or none of correct recipients get it


Coverage

• DS Paradigms

– DS & OS’s

– Services and models

– Communication

• Coordination

– Distributed Mutual Exclusion (ME)

– Distributed Coordination


Co-ordination protocols in DOS/DS

• Distributed ME

• Distributed atomicity

• Distributed synchronization & ordering

How do we co-ordinate the distributed resources for ME, CS access, consistency etc?


Co-ordination in Distributed Systems

• Event ordering

– centralized system: ~ easy (common clock & memory)

– distributed system: hard (convergent/consistent dist. time)

• Example: Unix “make” program

– source files/object files → make [compiles & links based on last version]

– a.o @99 & a.c @100 → re-compile a.c [assuming a common time base]

– “make” in a DS?

[Figure: machines A and B; a.c @ 95, a.o @ 98, a.c’ @ 97 (slow clock)]
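The timestamp rule behind the “make” example can be written down directly. This is a sketch of the comparison only, not of make itself. It assumes one reading of the figure: a.o was stamped 98 on one machine, and the edited source a.c’ was written later in real time on a machine whose clock runs slow, so it was stamped 97.

```python
def needs_rebuild(source_stamp, object_stamp):
    """make-style rule: rebuild when the source looks newer than the object."""
    return source_stamp > object_stamp


# Common time base: a.o @ 99, a.c @ 100, so a.c is re-compiled.
same_clock = needs_rebuild(source_stamp=100, object_stamp=99)

# Two machines, no common time base: a.o @ 98, edited a.c' @ 97 (slow clock).
# The edit is newer in real time, yet the rule sees an older source.
skewed = needs_rebuild(source_stamp=97, object_stamp=98)
```

With the skewed clocks the rule returns `False`, so the stale object file is kept even though the source has changed.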


Synchronization

• Blocking (send primitive is blocked until receive acknowledgment)

– Time out

• Nonblocking (send and copy to buffer)

– Polling at receive primitive

– Interrupt

• Synchronous (send and receive primitives are blocked)

• Asynchronous

Distributed Synchronization with failures? DB/Control apps where “order” is essential for consistency


ME

• TSL, Semaphores, monitors… (Single OS)

• Do they work in DS given timing delays, ordering issues +++?


Let’s start with TSL for Multiprocessors

• TSL no longer atomic but over the bus! CPU #1/#2 TSL RW sequencing?

• Both CPU #1 & #2 think they have CS access – ME?

– Single CPU: disabling interrupts; Multiprocessor?

– Is TSL atomic at the distributed/networked level?

– TSL instruction can fail unless bus locking is made part of the TSL op.


Progressive TSL – Private Locks per CPU

* possible to make separate decision each time locked mutex encountered

* multiple locks needed to avoid cache thrashing

• CPU needing a locked mutex just waits for it, either by

– polling continuously,

– polling intermittently,

– or attaching itself to a list of waiting CPUs


Distributed Lock Problems

[Figure: processes p1, p2, p3 and p4 exchanging LOCK and GRANTED messages]

What happens? Solutions?


Distributed Mutual Exclusion

• Lock server solution problems

– Server is a single point of failure

– Server is performance bottleneck

– Failure of client holding lock also causes problems: No unlock sent

• Similar to garbage collection problems in DSs … validity conditions etc

What is the state of the lock server? For stateless servers?

Works? Under what assumptions?

• Solution #1: Build a lock server in a centralized manner (generally simple)
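A centralized lock server can be sketched as a holder plus a FIFO queue of waiting clients. The `LockServer` class and its method names are hypothetical, and message passing is replaced by direct method calls. The sketch makes the listed problems visible: all lock state lives in one object, and a holder that never calls `unlock` blocks everyone in the queue.

```python
from collections import deque


class LockServer:
    """Hypothetical single-lock server: one holder at a time, waiters served FIFO."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def lock(self, client):
        """Return True if the lock is granted at once, False if the client is queued."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiting.append(client)
        return False

    def unlock(self, client):
        """Release the lock and return the next holder (or None)."""
        if client != self.holder:
            raise ValueError(f"{client} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


server = LockServer()
granted_p1 = server.lock("p1")      # lock is free: granted
granted_p2 = server.lock("p2")      # lock is held: p2 waits
next_holder = server.unlock("p1")   # p1 releases, p2 becomes holder
```

This server is stateful by construction; a stateless variant would have to recover `holder` and `waiting` from the clients after a crash.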


Distributed Mutual Exclusion (cont.)

• Solution #2: Decentralized algorithm

– Replicate state in central server on all processes

– Requesting process sends LOCK message to all others

– Then waits for LOCK_GRANTED from all

– To release critical region, send UNLOCK to all

Works? What assumptions?


Co-ordination in Distributed Systems

1. Given distributed entities, how does one obtain resource “coordination” to result in an agreed action (such as CS access/ME, shared memory “writes”, producer/consumer modes or decisions)?

2. How are distributed tasks/requests “ordered”?

3. Given distributed resources, how do they “all” agree on a “single” course of action?

• Asynchronous Co-ordination - 2PC, leader elections, etc…

• Synchronous Co-ordination - clocks, ordering, serialization, etc …


Consistency & Distributed State Machines


Distributed State Machine

Consensus


Asynch: “Single” Decision - Commit

2PC: Two Phase Commit Protocols

– coordinator (pre-specified or selected dynamically)

– multiple secondary sites (“cohorts”)

Objective: All nodes agree and execute a single decision [all agree or no action taken...banking transactions]


Two – Phase Commit (2PC) Protocol

Coord. Actions

1. send PREPARE to all

....... “bounded waiting” .......

4. receive OK from all: put COMMIT in log & send COMMIT to all

4’. receive ABORT: send ABORT to all

5. ACK from all? DONE

Each Client Actions

2. get msg (PREPARE)

3. if ready, send OK (write undo/redo logs); else, send NOT-OK

4. receive COMMIT: release resources, send ACK

4’. receive ABORT: undo actions
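The coordinator and client steps above can be sketched as a failure-free simulation. The `Cohort` class, its `ready` flag and the in-memory logs are assumptions made for illustration. Timeouts (“bounded waiting”), crashes and real messaging are not modeled, and logging the ABORT decision as well as COMMIT is an added assumption.

```python
class Cohort:
    """Hypothetical participant; ready says whether it can commit its part."""

    def __init__(self, name, ready):
        self.name = name
        self.ready = ready
        self.log = []
        self.state = "INIT"

    def on_prepare(self):
        if self.ready:
            self.log.append("undo/redo")    # step 3: write logs before voting OK
            self.state = "PREPARED"
            return "OK"
        self.state = "ABORTED"
        return "NOT-OK"

    def on_decision(self, decision):
        if decision == "COMMIT":
            self.state = "COMMITTED"        # step 4: release resources
        else:
            if self.state == "PREPARED":
                self.log.append("undone")   # step 4': undo actions
            self.state = "ABORTED"
        return "ACK"


def two_phase_commit(cohorts):
    coordinator_log = []
    votes = [c.on_prepare() for c in cohorts]             # phase 1: PREPARE / vote
    decision = "COMMIT" if all(v == "OK" for v in votes) else "ABORT"
    coordinator_log.append(decision)                       # log before sending
    acks = [c.on_decision(decision) for c in cohorts]      # phase 2: decision / ACK
    return decision, coordinator_log, acks


all_ready = [Cohort("A", True), Cohort("B", True)]
outcome_commit, _, _ = two_phase_commit(all_ready)

one_refuses = [Cohort("A", True), Cohort("B", False)]
outcome_abort, _, _ = two_phase_commit(one_refuses)
```

A single NOT-OK vote is enough to abort the whole transaction, which is the all-or-nothing behaviour the protocol is meant to give.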


Two-phase commit (cont.)

Problem: coordinator failure, after PREPARE & before COMMIT, blocks participants waiting for decision (a)

• Three-phase commit overcomes this (b) … slowwwwww

– delay final decision until enough processes “know” which decision will be taken

Q: can this somehow block?


Comments...

• Time-lag in making decisions – RT applications??

• Resources locked till voting/decision is completed

• Message overhead

• Reliable common assumptions

• Possibilities of deadlock/livelock

• Limited fault tolerance (coordinator dependency)

• New coordinator initiation per new request


Distributed ME (all ACK)

• To request CS: send REQ msg. M to ALL; enQ M in local Q

• Upon receiving M from Pi

– if it does not want (or have) CS, send ACK

– if it has CS, enQ request Pi : M

– if it wants CS, enQ/ACK based on lowest ID (a time-stamp would be much nicer, but there is no common time basis for timestamps)

• To release CS: send ACK to all in Q, deQ [diff. from 2PC]

• To enter CS: enter CS when ACK received from all

[Figure: three panels with processes A, B and C; requests with IDs 8 and 12; ACK messages; queues {8,12}, {8} and {12}; “enters CS” shown twice]
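The all-ACK rules can be sketched as a small simulation in which several processes request the CS at the same time. It assumes reliable delivery, no crashes and unique process IDs, and it replaces messages with direct method calls. `Process`, `simulate` and the IDs 8, 10 and 12 are illustrative choices only.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = "IDLE"    # IDLE, WANTED or HELD
        self.queue = []        # requests we have not ACKed yet
        self.acks = set()      # who has ACKed our own request

    def on_request(self, from_pid):
        """Return True if we ACK at once, False if the request is queued."""
        if self.state == "HELD":
            self.queue.append(from_pid)
            return False
        if self.state == "WANTED" and self.pid < from_pid:
            self.queue.append(from_pid)    # we have the lower ID, so we go first
            return False
        return True


def simulate(pids, requesters):
    """All requesters send REQ concurrently; return the order of CS entry."""
    procs = {pid: Process(pid) for pid in pids}
    for pid in requesters:
        procs[pid].state = "WANTED"
    for pid in requesters:                 # send REQ to ALL others
        for other in procs.values():
            if other.pid != pid and other.on_request(pid):
                procs[pid].acks.add(other.pid)
    entered = []
    pending = set(requesters)
    while pending:
        ready = [p for p in pending if len(procs[p].acks) == len(pids) - 1]
        assert len(ready) == 1, "exactly one requester may hold all ACKs"
        winner = procs[ready[0]]
        winner.state = "HELD"              # enter CS
        entered.append(winner.pid)
        winner.state = "IDLE"              # release CS: ACK everyone queued
        for waiting in winner.queue:
            procs[waiting].acks.add(winner.pid)
        winner.queue.clear()
        pending.remove(winner.pid)
    return entered


order_two = simulate(pids=[8, 10, 12], requesters=[12, 8])
order_three = simulate(pids=[8, 10, 12], requesters=[12, 10, 8])
```

Every request costs a message to each other process plus an ACK back, and a single crashed process that never ACKs blocks all requesters. Those are the assumptions the “Works? What assumptions?” question points at.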


DS Solutions: State Machine Replication

State machine replication implements the abstraction of a single reliable server
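The idea can be illustrated with a deterministic state machine and an agreed command log. The `Counter` machine and its `add`/`mul` commands are invented for the example, and the agreed order is assumed to come from a consensus layer that is not shown. Replicas that apply the same commands in the same order end in the same state; a replica that sees a different order diverges.

```python
class Counter:
    """Deterministic state machine: same commands in the same order give the same state."""

    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, arg = command
        if op == "add":
            self.value += arg
        elif op == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown command {op!r}")
        return self.value


# Agreed order of commands (assumed to be the output of consensus).
log = [("add", 2), ("mul", 5), ("add", 1)]

replicas = [Counter() for _ in range(3)]
for command in log:
    replies = [replica.apply(command) for replica in replicas]

# A replica that applies the same commands in another order diverges.
reordered = Counter()
for command in [("mul", 5), ("add", 2), ("add", 1)]:
    reordered.apply(command)
```

This is why ordering (consensus) is the core of state machine replication: the replication itself is just re-execution.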


Efficient State Machines (with Crashes)

• Observation: worst-case failures are the exception

– Replicas often have an a-priori consistent view (w/o coordination)

– There is a correct replica (called leader) known to every replica

“Fast” consensus = short latency

• Observation: latency optimizations have overhead

– Significant latency overhead if assumptions are not met

• Observation: messages & crypto are expensive

Minimal latency + no crypto + trade latency for mess. complexity


Example: Web Applications

• Ideal setting for applying replication

– Exposed to the Internet, strong reliability requirements

[Figure: client, web server (front-end server, application code), database]


Practical Replication

[Figure: message pattern among client, primary and three backups; labels: REQUEST, ORDER, AGREE, COMMIT, deliver, f+1 REPLIES]

• Seminal work on efficient replication

– Optimal resilience

– Three phases: Non-optimal

– O(n²) message complexity: Non-optimal


Motivation

[PNUTS, ZooKeeper]

[Cassandra]

[GFS, Bigtable]

[Dynamo]

15K+ commodity servers


Internet Datacenters

• Large Scale Crashes are the common case to handle

• Need high performance for ALMOST ALL requests

– Example: Dynamo’s SLA specifies worst-case latency for 99.9% of the requests under high load

– ALL = also in presence of crashes

• Need low replication costs

– 100s to 1,000s of replicated services

– Additional replication costs must be multiplied over the number of services

– Diagnosis, repair & re-configurations

– Speed with unresponsive replicas, e.g. WAN replication

• Replicas can be located at geographically remote sites

• Some sites can become temporarily unreachable


Large Scale DS Goals

• Consistency (Safety)

– Linearizability

• Availability (Liveness)

– Wait-freedom

• Performance

– Latency, throughput, ...

Despite

– Failures

– Concurrency

– Asynchrony


Challenges

• Crash failures

– Detectable

– Very popular

• Byzantine failures

– Nodes under adversarial control

– Worst-case needs...but costly (# replicas, latency, complexity)

• Asynchronous communication

– Reflects real networks (e.g. WAN)


Efficient distributed solutions

• Main efficiency metrics

– Resilience: # of replicas

– Latency: # of communication steps

– Crypto: use of signatures

– Message complexity: # of messages

E.g., 3t+1 replicas

[Figure: client, leader replica, replicas]
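The resilience metric can be made concrete with a small calculation. The slide gives 3t+1 as an example. The sketch below adds, as an assumption, the commonly cited 2t+1 bound for crash failures in asynchronous settings, together with the matching quorum sizes. The function names are illustrative.

```python
def replicas_needed(t, byzantine):
    """Minimum number of replicas to tolerate t faulty ones (asynchronous setting assumed)."""
    return 3 * t + 1 if byzantine else 2 * t + 1


def quorum_size(t, byzantine):
    """Quorum size matching the minimum replica count above."""
    return 2 * t + 1 if byzantine else t + 1


def min_quorum_overlap(t, byzantine):
    """Smallest possible intersection of two quorums: 2q - n."""
    n = replicas_needed(t, byzantine)
    q = quorum_size(t, byzantine)
    return 2 * q - n


byz_replicas = replicas_needed(t=1, byzantine=True)       # 3t + 1
crash_replicas = replicas_needed(t=1, byzantine=False)    # 2t + 1
```

In the Byzantine case two quorums overlap in at least t+1 replicas, so at least one replica in the overlap is correct. In the crash case an overlap of one replica is enough.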


Advocated Solutions

• Key advocated abstractions:

– Consensus

– Distributed Storage

• State Machine Replication (SMR): Consensus

• Reliable Shared Memory: Distributed Storage

[Figure labels: clients, request, service, no reply or “bad” reply; clients, request, Replication, service, reply]

Practical implementations of these abstractions?


Distributed Storage

[Figure: Previously: clients, request, storage server, “bad” reply or no reply. Nowadays: clients, request, reply certificate, distributed storage]

Storage is a state machine w/ operations read and write (SWMR/MWMR…)