Distributed Systems CS 15-440

Distributed Systems CS 15-440 Distributed System Architecture and Introduction to Networking Lecture 3, Sep 12, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud


Transcript of Distributed Systems CS 15-440

Page 1: Distributed Systems CS 15-440

Distributed Systems CS 15-440

Distributed System Architecture and Introduction to Networking

Lecture 3, Sep 12, 2011

Majd F. Sakr, Vinay Kolar, Mohammad Hammoud

Page 2: Distributed Systems CS 15-440

Today…

Last Session: Trends and challenges in Distributed Systems

Today’s session:

Part I: Distributed System Architectures

Part II: Introduction to Networking

Announcements: Project 1 design report is due on Wednesday at midnight

Project handout has been updated with a “design deliverable” section

Page 3: Distributed Systems CS 15-440

Part I

Distributed Systems Architecture

Page 4: Distributed Systems CS 15-440

A Distributed System

A distributed system is simply a collection of hardware or software components that communicate to solve a complex problem

Each component performs a “task”

[Figure: components performing tasks, linked by a communication mechanism]

Page 5: Distributed Systems CS 15-440

Bird’s eye view of some Distributed Systems

How would one classify these distributed systems?

[Figure: four example systems. Panel titles: Google Search, Bit-torrent, Airline Booking, Skype. Labels: Search Clients 1 to 3 exchanging Request and Response messages with a Google Server; Reservation Clients 1 to 3 with Expedia; Peers 1 to 4.]

Page 6: Distributed Systems CS 15-440

Classification of Distributed Systems

a) Communicating entities: What are the entities that are communicating in a DS?

b) Communication paradigms: How do the entities communicate?

c) Roles and responsibilities: What roles and responsibilities do they have?

d) Placement of entities: How are they mapped to the physical distributed infrastructure?

Page 8: Distributed Systems CS 15-440

Communicating Entities

What entities are communicating in a DS?

System-oriented entities:

Processes

Threads

Nodes

Problem-oriented entities:

Objects (in object-oriented programming based approaches)

Page 10: Distributed Systems CS 15-440

Communication Paradigms

Three types of communication paradigms:

Inter-Process Communication (IPC)

Remote Invocation

Indirect Communication

[Figure: layer stack, top to bottom: Applications, Services; Remote Invocation, Indirect Communication; IPC Primitives; Internet Protocols. The label “Middleware layers” marks the layers between the applications and the Internet protocols.]

Page 11: Distributed Systems CS 15-440

Inter-Process Communication (IPC)

Relatively low-level support for communication

e.g., Direct access to internet protocols (Socket API)

Advantages:

Enables seamless communication between processes on heterogeneous operating systems

Well-known and tested API adopted across multiple operating systems

Disadvantages:

Increased programming effort for application developers

Socket programming: Programmer has to explicitly write code for communication (in addition to program logic)

Space Coupling (Identity is known in advance): Sender should know receiver’s ID (e.g., IP Address, port)

Time Coupling: Receiver should be explicitly listening to the communication from the sender
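
The two kinds of coupling can be seen in a few lines of socket code. The sketch below is only an illustration using Python's standard `socket` module on the loopback interface; the function names (`run_receiver`, `send_message`, `demo`) and the `ack:` reply format are assumptions made for this example, not part of the lecture.

```python
import socket
import threading

def run_receiver(server_sock, received):
    # Time coupling: the receiver must be blocked in accept() to hear the sender.
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        received.append(data)
        conn.sendall(b"ack:" + data)

def send_message(host, port, payload):
    # Space coupling: the sender must know the receiver's IP address and port.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(1024)

def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    received = []
    receiver = threading.Thread(target=run_receiver, args=(server_sock, received))
    receiver.start()

    reply = send_message("127.0.0.1", port, b"hello")
    receiver.join()
    server_sock.close()
    return received, reply
```

Note how much of the code is communication plumbing rather than program logic, which is the extra programming effort listed among the disadvantages.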

Page 12: Distributed Systems CS 15-440

Remote Invocation

An entity calls a procedure that typically executes on another computer, without the programmer explicitly coding the details of this remote interaction

A middleware layer takes care of the raw communication

Examples:

Remote Procedure Call (RPC) – Sun’s RPC (ONC RPC)

Remote Method Invocation (RMI) – Java RMI

Page 13: Distributed Systems CS 15-440

Remote Invocation

Advantages:

Programmer does not have to write code for socket communication

Disadvantages:

Space Coupling: Where the procedure resides should be known in advance

Time Coupling: On the receiver, a process should be explicitly waiting to accept requests for procedure calls
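
As a rough illustration of remote invocation, the sketch below uses XML-RPC from Python's standard library. This is a stand-in chosen for brevity, not Sun RPC or Java RMI, and the procedure `add` and the helpers `start_server` and `remote_add` are hypothetical names for this example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """The procedure that actually runs inside the server process."""
    return a + b

def start_server():
    # Time coupling: a server process must be up and waiting for calls.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread

def remote_add(port, a, b):
    # Space coupling: the caller must know where the procedure resides.
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        # Looks like a local call; the middleware builds and sends the messages.
        return proxy.add(a, b)
```

A caller would start the server, read the chosen port from `server.server_address[1]`, call `remote_add(port, 2, 3)`, and finally call `server.shutdown()` and `server.server_close()`. No socket code appears in the caller, which is the advantage listed above.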

Page 14: Distributed Systems CS 15-440

Space and Time Coupling in RPC and RMI

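[Figure: request-reply exchange between a Sender and a Receiver. Sender: doOperation, (wait), (continuation). Receiver: getRequest, select operation, execute operation, send reply. A Request Message goes from sender to receiver and a Reply Message comes back. Annotations: “The sender knows the identity of the receiver (space coupling)” and “Time Coupling”.]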

RMI strongly resembles RPC but in a world of distributed objects

Page 15: Distributed Systems CS 15-440

Indirect Communication Paradigm

Indirect communication uses middleware to:

Provide one-to-many communication

Some mechanisms eliminate space and time coupling:

Sender and receiver do not need to know each other’s identities

Sender and receiver need not be explicitly listening to communicate

Approach used: Indirection

Sender -> A middle-man -> Receiver

Types of indirect communication:

1. Group communication

2. Publish-subscribe

3. Message queues

Page 16: Distributed Systems CS 15-440

1. Group Communication

One-to-many communication:

Multicast communication

Abstraction of a group:

Group is represented in the system by a groupId

Recipients join the group

A sender sends a message to the group, which is received by all the recipients

[Figure: a Sender multicasting to Recv 1, Recv 2 and Recv 3]

Page 17: Distributed Systems CS 15-440

1. Group Communication (cont’d)

Services provided by middleware:

Group membership

Handling the failure of one or more group members

Advantages:

Enables one-to-many communication

Efficient use of bandwidth

Identity of the group members need not be available at all nodes

Disadvantages:

Time coupling
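
The group abstraction can be sketched in a few lines. The class below is a hypothetical in-memory stand-in for group-communication middleware (the name `GroupService` and its methods are assumptions for this example); real middleware would also track failures and deliver over the network.

```python
class GroupService:
    """Hypothetical in-memory stand-in for group-communication middleware."""

    def __init__(self):
        self._groups = {}  # groupId -> list of member inboxes

    def join(self, group_id, inbox):
        # A recipient joins the group by registering its inbox under the groupId.
        self._groups.setdefault(group_id, []).append(inbox)

    def send(self, group_id, message):
        # The sender addresses only the groupId, never the individual members.
        members = self._groups.get(group_id, [])
        for inbox in members:
            inbox.append(message)
        return len(members)  # number of recipients reached
```

The sender never sees member identities, which gives space decoupling. A member that joins after `send` has run receives nothing, which mirrors the time coupling listed as a disadvantage.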

Page 18: Distributed Systems CS 15-440

2. Publish-Subscribe

An event-based communication mechanism

Publishers publish events to an event service

Subscribers express interest in particular events

A large number of producers distribute information to a large number of consumers

[Figure: publish-subscribe event service. Publishers issue Publish (Event1), Publish (Event2) and Publish (Event3). Subscribers issue Subscribe (Event1, Event2), Subscribe (Event1) and Subscribe (Event3).]
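
A minimal sketch of such an event service follows. It is an in-process illustration only; the class name `EventService` and the callback signature are assumptions for this example rather than any particular product's API.

```python
from collections import defaultdict

class EventService:
    """Hypothetical in-process event service: routes events to subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        # Subscribers express interest in a particular event type.
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Publishers hand the event to the service and never see the subscribers.
        callbacks = self._subscribers.get(event_type, [])
        for callback in callbacks:
            callback(event_type, payload)
        return len(callbacks)  # number of subscribers notified
```

With subscriptions set up as in the figure (one subscriber for Event1 and Event2, one for Event1, one for Event3), publishing Event1 reaches the first two subscribers and publishing Event3 reaches only the third.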

Page 19: Distributed Systems CS 15-440

2. Publish-Subscribe (cont’d)

Example: Financial trading

Dealer process

Page 20: Distributed Systems CS 15-440

3. Message Queues

A refinement of Publish-Subscribe where Producers deposit the messages in a queue

Messages are delivered to consumers through different methods

Queue takes care of ensuring message delivery

Advantages:

Enables space decoupling

Enables time decoupling
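
Time decoupling is easiest to see when the producer finishes before any consumer asks for a message. The sketch below wraps Python's standard `queue.Queue`; the class `MessageQueue` and the choice of a blocking `receive` plus a non-blocking `poll` as the two delivery methods are assumptions for this example.

```python
import queue

class MessageQueue:
    """Hypothetical in-process queue standing in for message-queue middleware."""

    def __init__(self):
        self._messages = queue.Queue()  # thread-safe FIFO from the standard library

    def deposit(self, message):
        # Producer side: returns immediately, whether or not a consumer exists yet.
        self._messages.put(message)

    def receive(self, timeout_s=1.0):
        # Blocking receive: wait up to timeout_s seconds for a message.
        return self._messages.get(timeout=timeout_s)

    def poll(self):
        # Non-blocking receive: return a message, or None if the queue is empty.
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None
```

The producer only names the queue, not the consumer (space decoupling), and messages wait in the queue until a consumer asks for them (time decoupling).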

Page 21: Distributed Systems CS 15-440

Recap: Communication Entities and Paradigms

Communicating entities (what is communicating):

System-oriented: Nodes, Processes, Threads

Problem-oriented: Objects

Communication paradigms (how they communicate):

IPC: Sockets

Remote Invocation: RPC, RMI

Indirect Communication: Group communication, Publish-subscribe, Message queues

Page 23: Distributed Systems CS 15-440

Roles and Responsibilities

In DS, communicating entities take on roles to perform tasks

Roles are fundamental in establishing overall architecture

Question: Does your smart-phone perform the same role as Google Search Server?

We classify DS architectures into two types based on the roles and responsibilities of the entities

Client-Server

Peer-to-Peer

Page 24: Distributed Systems CS 15-440

Client-Server Architecture

Approach:

Server provides a service that is needed by a client

The client sends a request to the server (invocation), and the server returns a reply (result)

Widely used in many systems, e.g., DNS, Web-servers

[Figure: clients and servers exchanging invocations and results]

Page 25: Distributed Systems CS 15-440

Client-Server Architecture: Pros and Cons

Advantages:

Simplicity and centralized control

Computation-heavy processing can be offloaded to a powerful server, so clients can be “thin”

Disadvantages:

Single point of failure at the server

Limited scalability, since every request goes through the server

Page 26: Distributed Systems CS 15-440

Peer to Peer (P2P) Architecture

In P2P, roles of all entities are identical

All nodes are peers

Peers are equally privileged participants in the application

e.g.: Napster, Bit-torrent, Skype

Page 27: Distributed Systems CS 15-440

Peer to Peer Architecture

Example: Downloading files from bit-torrent

[Figure: Peers 1 to 6 connected to one another]

Peer 1 wants a file. Parts of the file are present at peers 3, 4 and 5

Peer 3 wants a file stored at peers 1, 2 and 6
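
The first scenario, where one peer collects parts of a file from several others, can be sketched as below. This is a toy model and not the BitTorrent protocol; the function `download_file`, the chunk numbering and the peer names are assumptions for this example.

```python
def download_file(chunk_count, peers):
    """Assemble a file from chunks held by different peers.

    peers maps a peer name to a dict of {chunk_index: chunk_bytes}.
    Returns the assembled bytes and a record of which peer served each chunk.
    """
    pieces = {}
    sources = {}
    for index in range(chunk_count):
        for name, chunks in peers.items():
            if index in chunks:
                # Any peer holding the chunk can serve it; no central server is involved.
                pieces[index] = chunks[index]
                sources[index] = name
                break
        else:
            raise LookupError(f"no peer holds chunk {index}")
    data = b"".join(pieces[i] for i in range(chunk_count))
    return data, sources
```

For example, with chunk 0 at peer 3, chunk 1 at peer 4 and chunk 2 at peer 5, the requesting peer ends up with the whole file while each source peer supplies only one part.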

Page 28: Distributed Systems CS 15-440

Architectural Patterns

Primitive architectural elements can be combined to form various patterns

Tiered Architecture

Layering

Tiered architecture and layering are complementary

Layering = vertical organization of services

Tiered Architecture = horizontal splitting of services

Page 29: Distributed Systems CS 15-440

Tiered Architecture

A technique to:

1. Organize the functionality of a service, and

2. Place the functionality into appropriate servers

[Figure: airline search application. Functions: display UI screen, get user input, get data from database, rank the offers. These functions are placed on a client and on servers.]

Page 30: Distributed Systems CS 15-440

A Two-Tiered Architecture

[Figure: Tier 1, personal computers or mobile devices: user view, controls and data manipulation. Tier 2, server: application and data management.]

Page 31: Distributed Systems CS 15-440

A Three-Tiered Architecture

How do you design an airline search application? Organize the functionality of a given layer.

[Figure: EXPEDIA airline search application. Tier 1: display user input screen, get user input, display result to user. Tier 2: rank the offers. Tier 3: airline database.]
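
The three-way split can be sketched with one function per tier. In this toy version all tiers run in one process; in a real deployment each would sit on its own machine. The flight records, airport codes and function names are invented for this example.

```python
# Tier 3: data tier (hypothetical in-memory "airline database").
FLIGHTS = [
    {"airline": "A", "origin": "PIT", "dest": "DOH", "price": 900},
    {"airline": "B", "origin": "PIT", "dest": "DOH", "price": 750},
    {"airline": "C", "origin": "PIT", "dest": "JFK", "price": 200},
]

def query_flights(origin, dest):
    return [f for f in FLIGHTS if f["origin"] == origin and f["dest"] == dest]

# Tier 2: application logic (rank the offers).
def rank_offers(origin, dest):
    return sorted(query_flights(origin, dest), key=lambda f: f["price"])

# Tier 1: presentation (get user input, display the result to the user).
def render_results(origin, dest):
    return [f"{offer['airline']}: ${offer['price']}" for offer in rank_offers(origin, dest)]
```

Each tier calls only the tier next to it, so the ranking logic or the database can be moved to a different server without touching the presentation code.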

Page 32: Distributed Systems CS 15-440

A Three-Tiered Architecture

[Figure: Tier 1, personal computers or mobile devices: user view and controls. Tier 2, application server: application logic. Tier 3, database server: database manager.]

Page 33: Distributed Systems CS 15-440

Layering

A complex system is partitioned into layers

Upper layer utilizes the services of the lower layer

A vertical organization of services

Layering simplifies the design of complex distributed systems by hiding the complexity of the lower layers

Control flows from layer to layer

[Figure: stacked layers, Layer 1 through Layer 3, with a request flow and a response flow between them]

Page 34: Distributed Systems CS 15-440

Layering – Platform and middleware

Distributed Systems can be organized into three layers

1. Platform: Low-level hardware and software layers

Provides common services for higher layers

2. Middleware: Masks heterogeneity and provides convenient programming models to application programmers

Typically, it simplifies application programming by abstracting communication mechanisms

3. Applications

[Figure: layer stack with labels including Operating system and Applications]
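
The three layers can be sketched as classes in which each layer calls only the one directly below it. This is a toy model: the platform here is an in-memory loopback rather than a real network, JSON is an arbitrary choice of encoding, and all class and method names are assumptions for this example.

```python
import json

class Platform:
    """Lowest layer: moves raw bytes (here, an in-memory loopback)."""

    def __init__(self):
        self._wire = []

    def send_bytes(self, data):
        self._wire.append(data)

    def receive_bytes(self):
        return self._wire.pop(0)

class Middleware:
    """Middle layer: hides byte encoding behind a send/receive-objects model."""

    def __init__(self, platform):
        self._platform = platform

    def send(self, obj):
        self._platform.send_bytes(json.dumps(obj).encode("utf-8"))

    def receive(self):
        return json.loads(self._platform.receive_bytes().decode("utf-8"))

class Application:
    """Top layer: program logic only, no knowledge of bytes or transport."""

    def __init__(self, middleware):
        self._middleware = middleware

    def report(self, city, temperature):
        self._middleware.send({"city": city, "temperature": temperature})

    def latest(self):
        return self._middleware.receive()
```

The application never touches bytes, so the platform could be replaced by a real socket transport without changing `Application`.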

Page 36: Distributed Systems CS 15-440

Placement

Observation: A distributed system runs on a large amount of heterogeneous hardware (machines, networks).

Smart mapping of entities (processes, objects) to hardware helps performance, security and fault-tolerance.

“Placement” maps entities to underlying physical distributed infrastructure.

Placement should be decided after a careful study of application characteristics

Example strategies:

Mapping services to multiple servers

Moving the mobile code to the client

Page 37: Distributed Systems CS 15-440

Placement

[Figure: entities (user-interface, web search indexing, mobile code) mapped onto physical infrastructure (hi-performance server, desktop, smart-phone)]

Page 38: Distributed Systems CS 15-440

Recap

So far, we have covered primitive architectural elements

Communicating entities

Communication paradigms of entities

IPC, RMI, RPC, Indirect Communication

Roles and responsibilities that entities assume, and resulting architectures

Client-Server, Peer-to-Peer, Hybrid

Placement of entities

Page 39: Distributed Systems CS 15-440

Part II

Introduction to Networking

Page 40: Distributed Systems CS 15-440

Introduction to Networking – Learning objectives

You will identify how computers communicate over the Internet.

After today’s class: You will be able to identify different types of networks

After the next class, you will be able to:

Describe networking principles such as layering, encapsulation and packet-switching

Examine how packets get routed and how congestion is avoided

Analyze scalability, reliability and fault-tolerance of the Internet

Page 41: Distributed Systems CS 15-440

Networks in Distributed Systems

A distributed system is simply a collection of components that communicate to solve a problem

Why should distributed systems programmers know about networks?

Networking issues severely affect performance, fault-tolerance and security in Distributed Systems.

e.g., Gmail outage on Sep 1, 2009: a Google spokesman said “we had slightly underestimated the load which some recent changes placed on the request routers. … a few of the request routers became overloaded … causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.”

Page 42: Distributed Systems CS 15-440

Networks in Distributed Systems

Networking Issue | Comments on Distributed System design

Performance | Affects latency and data-transfer-rate of messages.

Scalability | Size of Internet is increasing. Expect greater traffic in future.

Reliability | Detect communication errors and perform error-checks at the application layer

Security | Install firewalls. Deploy end-to-end authentication, privacy and security modules.

Mobility | Expect intermittent connection for mobile devices.

Quality-of-service | Internet is best-effort. It is hard to ensure strict QoS guarantees for, say, multimedia messages.

Page 43: Distributed Systems CS 15-440

Network Classification

Important ways to classify networks:

1. Based on size

Body Area Networks (BAN)

Personal Area Networks (PAN)

Local Area Networks (LAN)

Wide Area Networks (WAN)

2. Based on technology

Ethernet Networks

Wireless Networks

Cellular Networks

Page 44: Distributed Systems CS 15-440

Network classification – BANs and PANs

Page 45: Distributed Systems CS 15-440

Network Classification – LAN

Computers connected by a single communication medium

e.g., twisted copper wire, optical fiber

High data-transfer-rate and low latency

LAN consists of:

1. Segment: Usually within a department/floor of a building

Shared bandwidth, no routing necessary

2. Local Networks: Serve a campus/office building

Many segments connected by a switch/hub

Typically, represents a network within an organization

Page 46: Distributed Systems CS 15-440

Network classification – WAN

Generally covers a wider area (cities, countries,...)

Consists of networks of different organizations

Traffic is routed from one organization to another by routers

Bandwidth and delay:

Vary

Worse than a LAN

Largest WAN = Internet

Page 47: Distributed Systems CS 15-440

Brief Summary of Important Networks (Based on Size)

A Segment

A Network

Page 48: Distributed Systems CS 15-440

Types of Networks – Based on Technology

Ethernet Networks: Predominantly used in the wired Internet

Wireless LANs: Primarily designed to provide wireless access to the Internet

Low-range (100s of m), High-bandwidth

Cellular networks (2G/3G): Initially designed to carry voice

Large range (few kms)

Low-bandwidth

Page 49: Distributed Systems CS 15-440

Typical Performance for Different Types of Networks

Network | Example | Range | Bandwidth (Mbps) | Latency (ms)

Wired LAN | Ethernet | 1 – 2 km | 10 – 10,000 | 1 – 10

Wired WAN | Internet | Worldwide | 0.5 – 600 | 100 – 500

Wireless PAN | Bluetooth | 10 – 30 m | 0.5 – 2 | 5 – 20

Wireless LAN | WiFi | 0.15 – 1.5 km | 11 – 108 | 5 – 20

Cellular | 2G – GSM | 100 m – 20 km | 0.270 – 1.5 | 5

Modern Cellular | 3G | 1 – 5 km | 0.348 – 14.4 | 100 – 500
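
To get a feel for what these figures mean for a single message, a common first-order estimate is transfer time = latency + message size / bandwidth. The helper below is a sketch under that simplifying assumption (it ignores protocol overhead, congestion and retransmission); the function name and parameter names are chosen for this example.

```python
def transfer_time_ms(size_bytes, bandwidth_mbps, latency_ms):
    """First-order estimate: one-way latency plus time to push the bits onto the link."""
    bits = size_bytes * 8
    transmission_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + transmission_ms

# Example inputs taken from the low end of two rows of the table:
# a 1 MB message over a wired LAN (10 Mbps, 1 ms) and over 2G GSM (0.270 Mbps, 5 ms).
lan_ms = transfer_time_ms(1_000_000, 10, 1)
gsm_ms = transfer_time_ms(1_000_000, 0.270, 5)
```

For small messages the latency term dominates, while for large messages the bandwidth term dominates, which is why both columns matter when designing a distributed system.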

Page 50: Distributed Systems CS 15-440

Next Class

Describe networking principles such as layering, encapsulation and packet-switching

Examine how packets get routed and how congestion is avoided

Analyze scalability, reliability and fault-tolerance of Internet

Page 51: Distributed Systems CS 15-440

References

http://en.wikipedia.org/wiki/Remote_procedure_call

http://www.generalsoftwares.co.uk/remote-services.html

http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html

http://innovation4u.wordpress.com/2010/08/17/why-we-dont-share-stuff/

http://essentiawhipsfloggers.wordpress.com/2010/05/08/waiting-times-queue-jumping/

http://www.cdk5.net/