Distributed System Building Blocks

OutlineDistributed Programming Paradigm◦ Shared Memory Programming◦ Message Passing Interface

Networking

Remote Procedure Call

Distributed Programming Paradigm

Based on the memory architecture, the programming paradigm can be roughly categorizes to different classes

Shared Memory Programming◦ The processing units share the same memory

space

Message Passing Interface Programming◦ There is no shared memory among multiple

processing units thus the processing units can only communicate sending and receiving messages

Parallel Processing IdeaSerialized processing with context switch

Parallel processing

Task 1Task 2Task 1Task 2

Shared Memory Programming ModelMultiple processing units connect to the shared memory and have the same memory address space. All the processing units can see the virtually the same memory.

Memory Buses

Processing Unit

Processing Unit

Processing Unit……

Memory Unit Memory Unit Memory Unit……

Single shared memory address space

Multi-thread ProgrammingShared memory multi-thread programming is the standard for single machine programming. It can harness the full power of multicore architecture (with careful programming). The programming model is quite simple but it is hard to program correctly and efficiently.◦ Windows: WinThread◦ Linux: pthread◦ Scientific Computing: OpenMP

We will see some examples of pthread and OpenMP programming and use the examples to show some important concepts while building our distributed systems.

Process and Thread

Thread creation and terminationCreate a thread by providing the entry of the thread (a function)◦ pthread_create(thread, attr, start_routine, arg)

Wait a thread to finish. This is a special kind of thread synchronization.◦ pthread_join

Quit the execution of a thread◦ pthread_exit

Once a thread is created, they are peers and independent.

pthreadcreate.c

Thread synchronizationAs threads share the same memory address space, it is dangerous that if a shared resources are accessed simultaneously. If this happens, the behavior of the program will not be defined. Thus, we need the mechanisms to synchronize the access to the shared resources. Commonly, this can be achieved through locks.

Another synchronization happens when you want to define the order of instruction flow in different threads. As every two threads are independent, some synchronization should be used to implement this.

Mutual ExclusionProviding mutual exclusion access to shared resources.

Shared Resources

Thread Thread

Thread

Thread

pthread Mutual Exclusion Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and for protecting shared data when multiple writes occur. Only one thread can lock (or own) a mutex variable at any given time.

pthread_mutex_init

pthread_mutex_destroy

pthread_mutex_lock

pthread_mutex_unlock

pthreadmutex.c

Define execution order among different threadsIt is quite common that events are used as a mechanism for defining the execution order among different portions of codes located in multiple threads.

Event means the execution of some threads will not continue until something has happened. Another thread will make the thing happen.

Notifythread1

thread2

Waiting

Working

pthread Conditional VariablesCondition variables allow threads to synchronize based upon the actual value of data.

pthread_cond_int(condition,attr)

pthread_cond_destroy(condition)

pthread_condition_wait(condition, mutex)

pthread_condition_signal(condition)

pthread_condition_broadcast(condition)

pthreadcondition.c

//thread 1

for (int i = 0; i<50;i++)

global_counter+=i;

//thread 2

for (int i = 50; i<=100;i++)

global_counter += i;

Race conditionTwo thread access the shared resources without synchronization. The behavior of race condition is undefined and might bring some undesirable results.

int global_counter=0;

What will be the final result of global_couter after these two code blocks finished?

//thread 1

lock(A)

lock(B)

do_something()

unlock(B)

unlock(A)

//thread 2

lock(B)

lock(A)

do_someotherthings()

unlock(A)

unlock(B)

Dead lock

Deadlock on the road

Live lockOnly request for lock but do nothing useful.

while (true){

Lock L1

if (!Lock L2)

Release(L1)

else

break;

}

do something useful here

while (true){

Lock L2

if (!Lock L1)

Release(L2)

else

break;

}

do something useful here

Message Passing InterfaceWith the large number of computing nodes, it is very difficult to build a single shared memory space for the processing units. Thus, process can exchange information by sending/receiving messages.

MPI is the de-facto standard for programming in the cluster environment for scientific computing.

MPI ProgramsEach process has its own stack and code segment. Processes exchange information by passing messages. Support both SPMD and MPMD computing.

process 0 process 1 process 2

Load

Process

Gather

Store

SPMD Program

MPI supports MPMD(a) MPMDMaster/Worker

(b) MPMDCoupled Analysis

Node 1

Node 2

Node 3

Node 1

Node 2

Node 3

prog_a

prog_b

prog_a

prog_b

prog_c

(c) MPMDStreamline

prog_a prog_b prog_c

Node 1 Node 2 Node 3

Create the MPI world#include <stdio.h>#include "mpi.h" int main （ int argc, char *argv[] ） {

int rank;int size;

MPI_Init （ argc, argv ） ; MPI_Comm_rank （ MPI_COMM_WORLD, &rank ） ; MPI_Comm_size （ MPI_COMM_WORLD, &size ） ; printf （ "Hello world from process %d of %d\n", rank, size ） ; MPI_Finalize （） ; return 0;}

Hello world from process 0 of 4 Hello world from process 1 of 4 Hello world from process 2 of 4 Hello world from process 3 of 4

MPI Basic: Point to Point Communicationint MPI_SEND(buf, count, datatype, dest, tag, comm)

int MPI_RECV(buf,count,datatype,source,tag,comm,status)

What parameter make the communication happen?◦ The buffer of sender or receiver◦ quantity of data, count◦ data type◦ source and destination◦ tag◦ the communicators and groups

Group SynchronizationMPI_Barrier(comm)◦ Creates a barrier synchronization in a group. Each task, when reaching the

MPI_Barrier call, blocks until all tasks in the group reach the same MPI_Barrier call.

Broadcastdata

broadcast

(a)

A0

A0

A0

A0

A0

A0

A0

processes

Scatter and Gather

A1

A4

A0

A5

A2 A3

scatter

gather

(b)

A1 A4 A0 A5 A2 A3

Gather: many to oneScatter: one to many

process

data

Allgather

allgather

B0

E0

A0

F0

C0 D0

B0 E0 A0 F0 C0 D0

B0 E0 A0 F0 C0 D0

B0 E0 A0 F0 C0 D0

B0 E0 A0 F0 C0 D0

B0 E0 A0 F0 C0 D0

B0 E0 A0 F0 C0 D0

data

processes

Alltoall

A1

A4

A0

A5

A2

A3

alltoall

(d)

A1 A4A0 A5A2 A3

B1 B4B0 B5B2 B3

C1 C4C0 C5C2 C3

D1 D4D0 D5D2 D3

E1 E4E0 E5E2 E3

F1 F4F0 D5F2 F3

B1

B4

B0

B5

B2

B3

C1

C4

C0

C5

C2

C3

D1

D4

D0

D5

D2

D3

E1

E4

E0

E5

E2

E3

F1

F4

F0

D5

F2

F3

process

data

Network basics

IP, TCP, DNS

Socket

Protocols

programming structures

TCP/IP, DNSIP: The Internet Protocol (IP) is the principal communications protocol used for relaying datagrams (also known as network packets) across an internetwork using the Internet Protocol Suite responsible for routing packets across network boundaries. (Routing, finding a specified destination on the Internet)

TCP: TCP provides reliable, ordered delivery of a stream of octets from a program on one computer to another program on another computer.

DNS: You can not remember something like 74.125.128.100 (IP address) easily. But you can remember www.google.com easily. DNS is just the system to translate the name of a machine to the IP address.

http://www.google.com/

What makes two process talk?The address of two machines and the identification of two process in each machine.

Source IP, Destination IP: the address of two machines

Source Port, Destination Port: the identification of two processes in two machines

So, a connection is identified by following four parameters:◦ source IP◦ destination IP◦ source Port◦ destination Port

SocketSock is the fundamental programming abstraction for communication between two processes on two different machines. (It is OK to use socket for communication in the same machine, however this is not the typical usage for inter process communication between two processes on the same machine.)

Client and server use two different types of sockets:◦ Client creates a client-side socket and make a connect call to the server

socket. After successfully connected, the client can start sending data to the server.

◦ Server creates a server-side socket and often listen on this socket waiting for the client. If a connection request package is received, the server will then make a new socket to accept the connect and start communication. The original socket can be used waiting for other connections.

PortsAs mentioned before, ports are used to identify a specific process within a machine(with an IP address).

Using different source ports allows multiple clients to connect to a server at once.

Example: Web Server (1/3)

33

1) Server creates a socket attached to port 80

80

The server creates a listener socket attached to a specific port. 80 is the agreed-upon port number for web traffic.


34

The client-side socket have to use a source port, but the OS chooses a random unused port number

When the client requests a URL (e.g., “www.google.com”), its OS uses DNS system to find its IP address.

2) Client creates a socket and connects to host

80Connect: 66.102.7.99 : 80

(anon)


35

Listener is ready for more incoming connections, while current connection can processed in parallel.

3) Server accepts connection, gets new socket for client

80

(anon) (anon)

4) Data flows across connected socket as a “stream”, just like a file

Example: Web Server

36

The network packetData transfer over Internet by using data packets.

Packets wrapper various information that is used for different usage. For example, addresses are used for routing, serial number and size are used for stream control.

Your data can be considered as payload in the packet which looks like a letter inside an envelop.

Your dataTCP headerIP header

You should know that here are some lower level of protocols for interoperate with physical devices such as MAC layer for Ethernet or using 802.11 wireless.

IP: the Internet ProtocolIP mainly focuses on how to find a machine on the Internet. Thus, IP define the addressing schema of machines.

IP packet encapsulate the upper layer protocol information as well as the data that provided by the applications.

IP protocol does not provide reliability. IP protocol just includes enough information for the data to tell the routers where is the destination for the data carried in the packet.

TCP: Transport Control TCP is built on top of IP.

TCP provides a virtual line between two ends. The data is stream oriented instead of message (packet) orientied.

TCP provides the reliability and ordering of messages.

TCP is very important basic building block for upper layer protocol. For example, HTTP is built on top of TCP.

You and the web you want to accessNot actually tube-like “underneath the hood”

Unlike phone system (circuit switched), the packet switched Internet uses many routes at once

you www.google.com

It is difficult to handle network problemsIf you can not receive a message from a specific machine, it is quite difficult even impossible to identify whether it is node crash or network crash.

If you send some data to a machine and a party to a socket disconnects, how can we identify how much data did the other receive.

Security problems: during the data transfer, Can someone in the middle intercept/modify our data?

Performance problems: Traffic congestion makes switch/router topology important for efficient throughput

Programming structures for processing network informationfork() based server data processing

multiple threads based

select() based

poll() based

see the bible of UNIX Network Programming

CLIENT

fd=socket()

setsockopt(fd)

r=connect(fd,destination)

read(fd)/write(fd)

send(fd)/recv(fd)

sendto/recvfrom(fd)

close(fd)

SERVER

listenfd=socket();

setsockopt(listenfd)

bind(listenfd)

listen(listenfd)

acceptedfd=accet(listenfd)

do_various_work_with_acceptedfd();//see following slides

Before you do the data transfer

fork based server data processingacceptedfd=accept(listenfd);

pid=for();

assert(pid>=0);

if(pid==0) //child process

close(listenfd);

do_some_thing_with_acceptedfd;

else

close(acceptedfd);

go_back_to_accept();

threads codeacceptedfd=accept(listenfd);

thread=get_free_thread_from_pool();

set_thread_data(thread,acceptedfd);

activate_thread(thread)

go_back_to_accept()

//in the thread

do_something(acceptedfd);

close(acceptedfd);

select code for multiple socketsWhy? you want to reuse the power of single thread and processing on multiple sockets. you want to stay in the same thread an thus you can keep the information more conveniently ( you don’t want to do some synchronization among threads).

FD_ZERO, FD_SET, fd_set(readfds, writefds), maxfd //setting the fds you want to monitor

switch(select(fd_set)){

-1: something is wrong; break;

0: rarely happen, you should do the select again;break;

default: for each fd you want to monitor if FD_ISSET(fd_set, fd) do data transfer with the fd.

}

poll code for multiple socketsSome other group propose the function of poll and use similar but different programming interface.

The programming structure can be the same as select.

int poll(struct pollfd *ufds, unsigned int nfds, int timeout);

POLLIN, POLLOUT, POLLPRI

If you are using the poll or select version, how can you notify the working thread?using fifo, and put one of the fd pair in the poll or select list

otherwise, you can use eventfd() which only use one instead of two fds.

RPC

What is remote procedure call?

Why RPC?

The types of RPC

How can we implement RPC (RPC internals)

RPCA remote procedure call (RPC) is an inter-process communication that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction.

That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote.

Consider the object oriented principles, RPC is called as remote invocation or remote method invocation.

Request/Response over InternetRegular client-server protocols involve sending data back and forth according to a shared state.

Client: Server:

HTTP/1.0 index.html GET 200 OK Length: 2400 (file data)HTTP/1.0 hello.gif GET 200 OK Length: 81494

…

This is the straightforward way to use the network facilities.

Call a function in another process in another machineRPC servers will call arbitrary functions in its address spaces with arguments passed over the network and return values back over network.

Client: Server:foo.dll,bar(4, 10, “hello”) “returned_string”

foo.dll,baz(42) err: no such function …

Possible modes of RPCSynchronous RPC: Client call an RPC function, and then wait until the return value is sent back from the server.

Asynchronous PRC: Client call an RPC function, and then can continue on some other work. After a while, the client can check a handle to find out whether the return value of an PRC call is ready.

callback supported RPC: Client call an RPC function and then can continue on some other work. When the execution of the function is finished on the server, the server will notify the client to call a registered callback function.

Similar concepts exists in many areas of computer science including networking (different types of sockets), operating system (think about the system call provided).

Synchronous RPC

54

s = RPC(server_name, “foo.dll”, get_hello, arg, arg, arg…)

RPC dispatcher

String get_hello(a, b, c){ … return “some hello str!”;}

foo.dll:

print(s);...

client server

time

Asynchronous RPC

55

h = Spawn(server_name, “foo.dll”, long_runner, x, y…)

RPC dispatcher

String long_runner(x, y){ … return new GiantObject();}

foo.dll:

GiantObject myObj = Sync(h);

client server

time

(More code

...

keeps running…)

Callbacks

56

h = Spawn(server_name, “foo.dll”, callback, long_runner, x, y…)

RPC dispatcher

String long_runner(x, y){ … return new Result();}

foo.dll:

void callback(o) { Uses Result}

client server

time

(More code

...

runs…)

Thread spawns:

So, how can we implement RPC?From the client side:

1 wrap the argument

2 wrap the function id

3 wrap the server:port

So, translate the function call of bar(arg0, arg1) to some underlying mechanism like rpc_call(foo.dll, bar, arg0, arg1)

Programmers want bar(arg0, arg1) but the RPC designer have to implement rpc_call

Design ConsiderationsProtocol Choices: UDP? TCP?

Fault Tolerant: what if the network is broken? The call might be sent 0, 1, 2, ……times. A client sent just one call, but the server might receive multiple invocations. What should the server do? (at-most-once semantic vs. multiple invocation semantic.)

Security: can any one call RPC functions? This is through network, some malicious user might send a lot of invocations.

Compatible: how do you handle multiple versions of a function?

Error conditions: the function call itself might return error. The RPC framework might raise errors. So, how to handle various error conditions?

Object Oriented Support: we need to marshal/unmarshal objects.

A lot of RPC protocols: DCOM, CORBA, JRMI……

Go to Lab1Continue code guide for the lib1 and help students to understand various aspects of the source code in yfs.

RPC

FUSE

Thank you! Any Questions?

Click icon to add picture

Distributed System Building Blocks

Documents

Transcript of Distributed System Building Blocks