Topic 3: Large-scale Distributed Systems

Cloud Computing Workshop 2013, ITU

Transcript of Topic 3: Large-scale Distributed Systems

Page 1: Topic 3: Large-scale Distributed Systems

3: Large-scale Distributed Systems

Zubair Nabi

[email protected]

April 17, 2013

Zubair Nabi 3: Large-scale Distributed Systems April 17, 2013 1 / 29

Page 2: Topic 3: Large-scale Distributed Systems

Outline

1 Introduction

2 Client-server Interaction

3 Characteristics

4 Message Passing Interface

Page 4: Topic 3: Large-scale Distributed Systems

Distributed Systems

Set of discrete machines which cooperate to perform computation

Give the notion of a single “machine”

Examples:
  - Compute clusters
  - Distributed storage systems, such as Dropbox, Google Drive, etc.
  - The Web

Page 9: Topic 3: Large-scale Distributed Systems

Advantages

Scalability:
  - The scale of the Internet (think how many queries Google servers handle daily)
  - Only a matter of adding more machines
  - Cheaper than supercomputers
  - More machines means more parallelism, hence better performance

Sharing:
  - The same resource is shared between multiple users
  - Just like the Internet is shared between millions of users

Communication:
  - Communication between (potentially geographically isolated) machines and users (via email, Facebook, etc.)

Reliability:
  - The service can remain active even if multiple machines go down

Page 17: Topic 3: Large-scale Distributed Systems

Challenges

Concurrency:
  - Concurrent execution requires some form of coordination

Fault-tolerance:
  - Any component can fail at any instant due to a software or a hardware bug

Security:
  - One machine can compromise the entire system

Coordination:
  - No global time, so non-trivial to coordinate

Troubleshooting:
  - Hard to troubleshoot because it is hard to reason about the system
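The "no global time" point can be illustrated with a logical clock. The sketch below uses a Lamport clock, a well-known technique that the slides do not name and that is brought in here only as an illustration. Each process keeps its own counter and, on receiving a message, advances past the sender's timestamp, so causally related events get increasing timestamps without any shared wall clock. The class and method names are assumptions made for this example.

```python
class LamportClock:
    """Per-process logical clock (illustrative; not part of the slides)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the local counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                     # local event on machine a
stamp = a.send()             # a sends a message to b
received_at = b.receive(stamp)
```

Here `received_at` is larger than `stamp` even though the two machines never compared wall clocks.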

Page 22: Topic 3: Large-scale Distributed Systems

Transparency

Distributed systems give the notion of a single machine or keep the distribution transparent

The degree of this transparency can be mapped onto an entire spectrum of options for both users and programmers

For instance:
  - A web user is aware of network communication but the number of accessed machines is transparent

Transparency can be ensured by middleware that adds a layer of abstraction

Can span access, concurrency, failure, location, migration, persistence, relocation, replication
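A minimal sketch of how a middleware layer might hide replication and failure from its callers. The `Replica` and `Frontend` classes and the `ReplicaDown` exception are hypothetical names invented for this example; real middleware would also deal with networking, consistency, and load balancing.

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) machine cannot answer."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def read(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class Frontend:
    """Middleware layer: callers see one `read`, not the machines behind it."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def read(self, key):
        for replica in self._replicas:
            try:
                # Which machine answers (location, replication) stays hidden.
                return replica.read(key)
            except ReplicaDown:
                continue  # failure is masked by trying the next replica
        raise ReplicaDown("all replicas are down")


data = {"page": "index"}
frontend = Frontend([Replica("a", data, up=False), Replica("b", data)])
value = frontend.read("page")
```

The caller gets the value without learning that the first replica was down or how many machines exist.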

Page 27: Topic 3: Large-scale Distributed Systems

Outline

1 Introduction

2 Client-server Interaction

3 Characteristics

4 Message Passing Interface

Page 28: Topic 3: Large-scale Distributed Systems

Request-reply protocol

Standard operation:
  1. Client sends request to the server
  2. Server processes the request and sends a corresponding response

In the synchronous model, the client blocks till the response is received

In case of the asynchronous model, the client continues its execution

For instance: HTTP 1.0
  1. Client sends GET /index.html
  2. Server responds with index.html
  3. Client renders index.html
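The HTTP 1.0 exchange can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the page content, the `Handler` class, and the `fetch` and `demo` helpers are made up for illustration, and the server runs on the local machine in a background thread. The client is synchronous, so it blocks in `recv` until the server has replied and closed the connection.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"  # hypothetical index.html content


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 2: the server processes the request and sends a response.
        if self.path == "/index.html":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """Synchronous client: blocks until the full response has arrived."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 1: the client sends the request.
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)  # the client blocks here
            if not data:
                break  # the server closed the connection after replying
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body  # Step 3 (rendering) is left to the caller


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port, "/index.html")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the status line and the page body once the whole exchange has completed.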

Page 36: Topic 3: Large-scale Distributed Systems

Errors and failures

Errors are handled at the application-level
  - For instance, if the client requests a non-existent web page just return a special reply: 404 Not Found

Failures are system-level things
  - For instance, lost message, client/server crash, etc.

To handle failure, the client must time out after T
  - The client can retry on a timeout
  - Setting value of T is system-specific
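The distinction between errors and failures can be sketched as a retry loop. This is an illustrative sketch, not a prescribed design: `send` is a hypothetical transport function that either returns `(status, body)` or raises `TimeoutError` once T has elapsed, and `make_flaky_send` simulates lost messages so the example runs without a network.

```python
class ApplicationError(Exception):
    """An error reply from the server, e.g. 404 Not Found (not retried)."""


def call_with_retry(send, request, timeout_s, max_attempts=3):
    """Send `request`, retrying only when no reply arrives within `timeout_s`."""
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send(request, timeout_s)
        except TimeoutError:
            continue  # failure: message lost or peer crashed, so try again
        if status == 404:
            # error: the server did answer, so retrying would not help
            raise ApplicationError(f"{status} Not Found: {request}")
        return body, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


def make_flaky_send(lost_messages):
    """Simulated transport that drops the first `lost_messages` requests."""
    state = {"calls": 0}

    def send(request, timeout_s):
        state["calls"] += 1
        if state["calls"] <= lost_messages:
            raise TimeoutError  # pretend T elapsed without a reply
        if request == "/index.html":
            return 200, "<html>ok</html>"
        return 404, ""

    return send


body, attempts = call_with_retry(make_flaky_send(2), "/index.html", timeout_s=0.5)
```

Blind retries are only safe when repeating the request is harmless; a request with side effects may be executed twice if only the reply was lost.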

Page 43: Topic 3: Large-scale Distributed Systems

Remote Procedure Call

Request/response protocols are widely used but too low level
  - Need to define each request separately, including its network message representation

Remote procedure call (RPC) presents a simpler abstraction
  - Programmer invokes a procedure which executes on a remote machine (the server)
  - RPC subsystem takes care of message formats, communication, timeouts, etc.

Distribution of the system becomes transparent

Integrated with the programming language

RPC layer adds stubs at client end which when invoked execute a method at the server
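A rough sketch of what a client stub does. The JSON message format, the `Server` and `ClientStub` classes, and the `add` procedure are all assumptions chosen for brevity; real RPC systems define their own wire formats and generate stubs from an interface description. To keep the example self-contained, the "network" is a direct function call from stub to server.

```python
import json


class Server:
    """Holds the real procedures and dispatches incoming messages to them."""

    def __init__(self):
        self._procedures = {}

    def register(self, func):
        self._procedures[func.__name__] = func
        return func

    def handle(self, message: bytes) -> bytes:
        call = json.loads(message)
        result = self._procedures[call["method"]](*call["params"])
        return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Looks like a local object; every method call becomes a message."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, name):
        def remote_call(*params):
            # Marshal the call, ship it, and unmarshal the reply.
            request = json.dumps({"method": name, "params": list(params)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply)["result"]

        return remote_call


server = Server()


@server.register
def add(a, b):
    return a + b


# In a real system the transport would cross the network; here the client
# hands the bytes straight to the server.
stub = ClientStub(transport=server.handle)
total = stub.add(2, 3)
```

From the programmer's point of view `stub.add(2, 3)` reads like a local call, which is the transparency the slide describes.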

Page 51: Topic 3: Large-scale Distributed Systems

Example: XML-RPC

XML is used to encode method invocations (method names, parameters, etc.)

HTTP POST used to send request and receive response (also encoded in XML)

Looks like a regular web session on wire so plays well with middleboxes

Language agnostic and extensible

Extended with more features (namespaces, user-defined types, etc.) and diverse transports (TCP, UDP, etc.) to result in Simple Object Access Protocol (SOAP)
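To make the encoding concrete, Python's standard `xmlrpc.client` module can produce and parse the XML payloads without any network traffic. The method name `add` and its arguments are hypothetical; only the encoding step is shown, not the HTTP POST that would carry it.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method add(2, 3).
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# Decoding recovers the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

# A response is encoded the same way, as a one-element tuple.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)
```

The printed request is plain XML containing a `methodName` element and one `param` per argument, which is the body an XML-RPC client would send in its POST.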

Page 56: Topic 3: Large-scale Distributed Systems

RPC shortcomings

RPC mechanisms are synchronous
  - Client blocks till response is received
  - Poor responsiveness, especially in high latency networks

The mid-2000s ushered in the age of Asynchronous JavaScript and XML (AJAX)
  - Update web page without reloading
  - For instance, Google Maps, Gmail, etc.
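The cost of blocking can be sketched by comparing two ways of issuing the same calls. This is an illustration in Python rather than JavaScript: `slow_remote_call` is a made-up stand-in for an RPC over a high-latency link, and a thread pool plays the role of the asynchronous machinery.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_remote_call(x):
    """Stand-in for an RPC over a high-latency network (hypothetical)."""
    time.sleep(0.2)  # simulated round-trip time
    return x * 2


def synchronous(values):
    # Each call blocks the caller until its reply arrives.
    return [slow_remote_call(v) for v in values]


def asynchronous(values):
    # All calls are issued up front; the caller is free to do other work
    # and only waits when it actually needs the results.
    with ThreadPoolExecutor(max_workers=len(values)) as pool:
        futures = [pool.submit(slow_remote_call, v) for v in values]
        other_work = sum(range(1000))  # runs while the replies are in flight
        return [f.result() for f in futures], other_work
```

With three values, the synchronous version waits for three round trips in sequence, while the asynchronous version overlaps them and keeps the caller responsive in the meantime.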

Page 62: Topic 3: Large-scale Distributed Systems

Representational State Transfer

- AJAX still revolves around RPC (just asynchronously)
- Representational State Transfer (REST) offers an alternative
  - Every resource has a name: a URL or URI
  - Resources are manipulated with the PUT, GET, POST, and DELETE methods
  - State is sent along with operations
- Widely used these days (for instance, by Amazon, Twitter, etc.)
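To make the resource-plus-method idea concrete, here is a toy in-memory sketch. `ResourceStore` and the `/users/...` URIs are hypothetical names chosen for illustration; no real web framework is involved, and only the standard HTTP status codes are borrowed.

```python
import itertools


class ResourceStore:
    """Toy server: resources are named by URI and manipulated by method."""

    def __init__(self):
        self._resources = {}
        self._ids = itertools.count(1)

    def handle(self, method, uri, body=None):
        if method == "GET":
            if uri in self._resources:
                return 200, self._resources[uri]
            return 404, None
        if method == "PUT":
            # Create or replace the resource at exactly this URI.
            created = uri not in self._resources
            self._resources[uri] = body
            return (201 if created else 200), body
        if method == "POST":
            # Create a new resource under a collection; the server picks the name.
            new_uri = f"{uri.rstrip('/')}/{next(self._ids)}"
            self._resources[new_uri] = body
            return 201, new_uri
        if method == "DELETE":
            if uri in self._resources:
                del self._resources[uri]
                return 204, None
            return 404, None
        return 405, None  # method not allowed
```

Every request carries the full state it needs (the URI and the body), so the store keeps no per-client session between calls.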


Page 67: Topic 3: Large-scale Distributed Systems

Outline

1 Introduction

2 Client-server Interaction

3 Characteristics

4 Message Passing Interface


Page 68: Topic 3: Large-scale Distributed Systems

Clocks

- Distributed systems need to be able to:
  - Order events produced by concurrent processes
  - Synchronize senders and receivers of messages
  - Serialize concurrent accesses to shared objects
  - Coordinate joint activity
- Clocks are employed for this
- But quartz oscillators oscillate at slightly different frequencies, leading to clock drift and resulting in clock skew between clocks
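A rough feel for how drift turns into skew, as a small worked calculation. The drift rates used here (+20 and -15 parts per million) are assumed values for illustration, not measurements from the slides.

```python
def clock_reading(true_seconds, drift_ppm):
    """Seconds shown by a clock that runs drift_ppm parts per million fast (or slow if negative)."""
    return true_seconds * (1 + drift_ppm / 1_000_000)


def skew(true_seconds, drift_a_ppm, drift_b_ppm):
    """Difference between two drifting clocks after true_seconds of real time."""
    return clock_reading(true_seconds, drift_a_ppm) - clock_reading(true_seconds, drift_b_ppm)


# One day of real time with the assumed drift rates.
one_day_skew = skew(86_400, 20, -15)
```

With these assumed rates the two clocks end up about three seconds apart after a single day, which is far too much for ordering events that are milliseconds apart.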


Page 75: Topic 3: Large-scale Distributed Systems

Clock synchronization

- Clock synchronization algorithms try to minimize the skew between a set of clocks
  - Decide upon a correct time
  - Communicate to agree (compensating for delays)
  - Possibly multiple servers involved
- In practice, a skew of 1-10 ms still remains after synchronization (but we can live with that)
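One common way to compensate for delays is to assume the reply spent half of the round trip on the wire. The sketch below follows that idea (usually attributed to Cristian's algorithm, which the slides do not name); the function and parameter names are hypothetical.

```python
def estimate_offset(t_request_sent, server_time, t_response_received):
    """Estimate (server clock - local clock) from one request/response exchange.

    t_request_sent and t_response_received are local clock readings;
    server_time is the timestamp the server put in its reply.
    """
    round_trip = t_response_received - t_request_sent
    # Assumption: the reply took half of the round trip to arrive.
    estimated_server_now = server_time + round_trip / 2
    return estimated_server_now - t_response_received


def max_error(t_request_sent, t_response_received):
    """The estimate can be wrong by at most half the round-trip time."""
    return (t_response_received - t_request_sent) / 2
```

The error bound shrinks with the round-trip time, so a residual skew on the order of milliseconds is what one would expect over ordinary networks.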


Page 80: Topic 3: Large-scale Distributed Systems

Ordering

- Time is used to ensure ordering
  - Money is withdrawn at 23:59:45
  - Bank calculates interest at 00:00:00
  - The withdrawn money should not be included in the interest calculation
- In most cases, we only need to know that a happened before b, known as the happens-before relation
- Multiple algorithms exist to ensure the happens-before relation
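One such algorithm is the Lamport logical clock, sketched here as an illustration (the slides do not say which algorithms they have in mind). Each process keeps a counter; the class and method names are assumptions of this sketch.

```python
class LamportClock:
    """Logical clock: if a happened before b, then timestamp(a) < timestamp(b)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Message receipt: move past both the local and the sender's time."""
        self.time = max(self.time, sender_time) + 1
        return self.time


# The withdrawal is sent from the ATM and received by the bank before interest is computed.
atm, bank = LamportClock(), LamportClock()
withdraw_ts = atm.tick()
bank.receive(withdraw_ts)
interest_ts = bank.tick()
```

Because the bank's clock jumps past the timestamp carried by the withdrawal message, `withdraw_ts < interest_ts` holds without the two machines ever agreeing on wall-clock time.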


Page 86: Topic 3: Large-scale Distributed Systems

Distributed Mutual Exclusion

- Concurrent access to shared resources needs to be synchronized
- On a local machine, this relies on hardware support
  - Locks, semaphores, etc.
- But this support is not available across a distributed system
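For contrast, this is what the local case looks like: threads in one process share a lock object directly. A minimal sketch using Python's standard `threading` module; `run_workers` is a hypothetical helper.

```python
import threading


def run_workers(n_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by one lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time may update the counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The lock works because every thread can reach the same object in shared memory; machines in a distributed system have no such shared object, so the lock has to be built out of messages instead.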


Page 90: Topic 3: Large-scale Distributed Systems

Distributed Mutual Exclusion (2)

Multiple methods exist to ensure this:
- Central lock server: All lock requests are handled by a central server
- Token passing: Nodes are arranged into a ring and a token is passed around
- Totally-ordered multicast: Clients multicast requests to each other
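The first of these, the central lock server, can be sketched as a single-process simulation. `CentralLockServer` and the client names are hypothetical; in a real system `acquire` and `release` would be network messages.

```python
from collections import deque


class CentralLockServer:
    """Grants one lock to one client at a time; others wait in FIFO order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def acquire(self, client):
        """Return True if the lock is granted now; otherwise queue the client."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiting.append(client)
        return False

    def release(self, client):
        """Release the lock and hand it to the next waiter, if any."""
        if self.holder != client:
            raise ValueError(f"{client} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The design is simple and fair, but the server is a single point of failure and a potential bottleneck, which is the usual motivation for the token-based and multicast-based alternatives.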


Page 93: Topic 3: Large-scale Distributed Systems

Consensus

- Getting processes in a distributed system to agree on something
- Requirements for a correct solution:
  - Agreement: All nodes arrive at the same answer
  - Validity: The answer is one that was proposed by someone
  - Termination: All nodes eventually decide
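The three requirements can be written as checks over the outcome of a run. This is only a checker, not a consensus protocol; `check_consensus` and the dictionary layout are assumptions of this sketch, and "termination" is judged at the end of a finished run.

```python
def check_consensus(proposals, decisions):
    """Check the three properties for one finished run.

    proposals: dict mapping node name -> proposed value
    decisions: dict mapping node name -> decided value (None if undecided)
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        # Agreement: no two nodes decided differently.
        "agreement": len(set(decided)) <= 1,
        # Validity: every decided value was proposed by some node.
        "validity": all(v in proposals.values() for v in decided),
        # Termination: every node has decided.
        "termination": len(decided) == len(decisions),
    }
```

For example, three nodes proposing `"x"`, `"y"`, `"x"` and all deciding `"x"` satisfy all three properties, while all deciding `"z"` would violate validity.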


Page 97: Topic 3: Large-scale Distributed Systems

Distributed transactions

- Composite operations (i.e., a collection of reads and updates to a set of objects)
- A transaction is atomic
  - If it commits, all operations are applied
  - If it aborts, no state is mutated at all
- Distributed transactions span multiple transaction processing servers
  - For instance, booking flights: Lahore -> Dubai -> New York
  - Need to book the entire trip
- Actions need to be coordinated across multiple parties
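One standard way to coordinate the parties is two-phase commit, sketched below for the flight example. The slides do not name a protocol, so this is an illustrative choice; `Airline` and `book_trip` are hypothetical, and the sketch handles one transaction at a time with no crashes.

```python
class Airline:
    """Hypothetical participant that sells seats on one leg of the trip."""

    def __init__(self, leg, seats):
        self.leg = leg
        self.seats = seats
        self.held = 0

    def prepare(self):
        # Phase 1: vote yes only if a seat can be put on hold.
        if self.seats > 0:
            self.seats -= 1
            self.held += 1
            return True
        return False

    def commit(self):
        # Phase 2 (success): the held seat becomes a sale.
        self.held -= 1

    def abort(self):
        # Phase 2 (failure): give the held seat back.
        if self.held:
            self.held -= 1
            self.seats += 1


def book_trip(participants):
    """Coordinator: commit only if every participant votes yes."""
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            for q in prepared:
                q.abort()
            return False
    for p in participants:
        p.commit()
    return True
```

If the Dubai -> New York leg is sold out, the hold on Lahore -> Dubai is released again, so the traveller never ends up with half a trip.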


Page 105: Topic 3: Large-scale Distributed Systems

Replication

- A number of distributed systems involve replication
  - Data replication: Multiple copies of some object stored at different servers
  - Computation replication: Multiple servers capable of providing an operation
- Advantages:
  1. Load balancing: Work is spread out across replicas
  2. Lower latency: Better performance if a replica is close to the client
  3. Fault tolerance: Failure of some replicas can be tolerated
- Examples: DNS, content distribution networks, database replication, etc.
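The latency and fault-tolerance benefits can be seen in a small replica-selection sketch. `pick_replica`, the region names, and the latency figures are all hypothetical.

```python
def pick_replica(replicas):
    """Return the name of the lowest-latency replica that is currently up.

    replicas: list of (name, latency_ms, is_up) tuples.
    """
    live = [(latency, name) for name, latency, is_up in replicas if is_up]
    if not live:
        raise RuntimeError("no replica available")
    return min(live)[1]


replicas = [("eu", 120, True), ("asia", 30, True), ("us", 200, True)]
nearest = pick_replica(replicas)

# If the nearest replica fails, the client falls back to the next closest one.
after_failure = [("eu", 120, True), ("asia", 30, False), ("us", 200, True)]
fallback = pick_replica(after_failure)
```

The client sees slower responses after the failure, but it still gets served, which is the trade that replication is meant to buy.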


Page 112: Topic 3: Large-scale Distributed Systems

CAP

- CAP:
  1. Consistency: All nodes see the same state
  2. Availability: All requests get a response
  3. Partition tolerance: System continues to operate even when failures cut nodes off from one another
- Brewer's conjecture states that a distributed system can provide only 2 of these 3 properties at once
- In the current setup, partitioning is a given: Hardware/software fails all the time
- Therefore, systems need to choose between consistency and availability
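The choice can be illustrated with a toy register replicated on two nodes. This is a deliberately simplified sketch: `Replica`, `write`, and the `"CP"`/`"AP"` mode labels are assumptions made here, and a real system has many more failure cases.

```python
class Replica:
    """One copy of a single replicated value."""

    def __init__(self):
        self.value = None


def write(local, remote, value, partitioned, mode):
    """Write through `local`; `mode` is "CP" or "AP". Returns True if accepted."""
    if not partitioned:
        # Normal case: both replicas are updated together.
        local.value = remote.value = value
        return True
    if mode == "CP":
        # Stay consistent: refuse the write, giving up availability.
        return False
    # Stay available: accept the write locally, so the replicas may diverge.
    local.value = value
    return True
```

During a partition the "CP" mode keeps both replicas equal but rejects the request, while the "AP" mode answers the request and leaves the two replicas holding different values until they can reconcile.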


Page 118: Topic 3: Large-scale Distributed Systems

References

George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. 2011. Distributed Systems: Concepts and Design (5th ed.). Addison-Wesley Publishing Company, USA.
