Parallel and Distributed Programming 236370 Spring 2001


Page 1: Parallel and Distributed Programming 236370 Spring 2001


Parallel and Distributed Programming 236370, Spring 2001

Course website: www.cs.technion.ac.il/236370

Lecture: Thursday, 10:30

Lecturer: Assaf Schuster. Reception hours: Thu 14:00-15:00, Room 626

Frontal exercises: see course website.

Teaching Assistants: Ran Wolff (in charge), Nili Efargan

Checking Exercises:

Grading Policy:

3 programming home assignments in Java,

1 programming home assignment in MPI (speedups),

Midterm in the last lecture (June 19)

Moed B during the Moed A exam period (???).

Relative weight: midterm 30%; you must pass the midterm for the exercise grades to count.

Page 2: Parallel and Distributed Programming 236370 Spring 2001


Sources

• No books.

• MPI programming literature – find on the WWW and in the library.

• Java programming – find on the WWW or in the library

• Doug Lea: “Concurrent Programming in Java”, Addison-Wesley, 1996.

• Papers in the library

• Other papers, listed in the transparencies.

Page 3: Parallel and Distributed Programming 236370 Spring 2001


Planned Syllabus

• see course site.

Page 4: Parallel and Distributed Programming 236370 Spring 2001


Basic Paradigms

Process = a unit of sequential instruction execution.
Program = a collection of processes.

Process communication (a small Java sketch follows this list):

• Shared Memory; at the language level we find:
  – Shared variables
  – Semaphores for synchronization
  – Mutual exclusion, critical code, monitors/locks

• Message Passing:
  – Local variables for each process
  – Send/receive parameters and data
  – Remote Procedure Call (Java's Remote Method Invocation)
  – Barrier synchronization

• Many variants: Linda's tuple space, Ada's rendezvous, CSP's guarded execution.
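To make the two basic paradigms concrete, here is a minimal Java sketch (my own illustration, not from the course material; the class and variable names are arbitrary). The first half communicates through a shared variable guarded by a lock; the second passes a message through a channel, with java.util.concurrent.BlockingQueue standing in for send/receive:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Paradigms {
    static int shared = 0;                     // shared-memory style
    static final Object lock = new Object();   // guards "shared"

    public static void main(String[] args) throws InterruptedException {
        // Shared memory: two threads update one variable under a lock.
        Thread w1 = new Thread(() -> { synchronized (lock) { shared += 1; } });
        Thread w2 = new Thread(() -> { synchronized (lock) { shared += 1; } });
        w1.start(); w2.start();
        w1.join(); w2.join();
        System.out.println("shared = " + shared);   // always 2

        // Message passing: no shared data; the value travels in a message.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(1);
        Thread sender = new Thread(() -> {
            try { channel.put(42); }                          // send
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread receiver = new Thread(() -> {
            try { System.out.println("received " + channel.take()); }  // receive
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        sender.start(); receiver.start();
    }
}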

Page 5: Parallel and Distributed Programming 236370 Spring 2001


Reality is Different from Paradigm

• In shared memory, reads and writes are not atomic, because of write queues and caching effects.

• Message passing actually proceeds by point-to-point hops and packetization; there is no direct connection.

The OS should present the user with one of the simpler models; the user may assume everything works as in the spec.

More often than not, the implementation is buggy, or exposes details of a native view that differs from the spec.

Sometimes the model is deliberately complicated in order to enhance performance and reduce communication – relaxed consistency.

Page 6: Parallel and Distributed Programming 236370 Spring 2001


Common Types of Parallel Systems

1. Multi-threading on a uni-processor (your home PC)
2. Multi-threading on a multi-processor (SMP)
3. Tightly-coupled parallel computer (Compaq's ProLiant, SGI's Origin 2000, IBM's SP/2, Cray's T3D)
4. Distributed system (cluster)
5. Internet computing (peer-to-peer)

Traditionally: 1+2 are programmable using shared memory, 3+4 are programmable using message passing, and in 5 peer processes communicate with central control only.

However, things change! Most importantly, recent systems of type 3 move towards presenting a shared-memory interface over a physically distributed system. Is this an indication for the future?

[Figure: types 1-5 placed along two axes – communication efficiency (bandwidth + latency) versus scalability / level of parallelism.]

Page 7: Parallel and Distributed Programming 236370 Spring 2001


Execution Order

• Process execution is asynchronous: there is no global tick and no global clock. Each process has a different execution speed, which may change over time. For an observer looking at the time axis, instruction executions are ordered in an execution order. Any order is legal. (Sometimes different processes may observe different global orders – TBD.)

• Execution order for a single process is called program order.

[Figure: instructions of P1 (marked x) and P2 (marked o) interleaved along a common time axis.]

Page 8: Parallel and Distributed Programming 236370 Spring 2001


Atomicity of Instruction Execution

The atomicity model is important for answering the question:

“Is my parallel program correct?”

Consider:

    P1: INC(i)
    P2: INC(i)

Expected result: i := i+2

But what if INC(i) expands to:

    Read  Rx, i
    Add   Rx, 1
    Store Rx, i

Then a possible execution order is:

    Read  R1, i
    Read  R2, i
    Add   R1, 1
    Add   R2, 1
    Store R1, i
    Store R2, i

Result: i := i+1
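The lost update is easy to reproduce in Java, the course's exercise language. A minimal sketch (mine, not from the slides; the class name and iteration count are arbitrary) – i++ compiles to exactly the Read/Add/Store sequence above, so two unsynchronized threads lose increments:

public class LostUpdate {
    static int i = 0;   // shared and deliberately unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable inc = () -> {
            for (int k = 0; k < 1000000; k++) i++;   // non-atomic INC(i)
        };
        Thread p1 = new Thread(inc);
        Thread p2 = new Thread(inc);
        p1.start(); p2.start();
        p1.join(); p2.join();
        // Expected 2000000; usually prints less, because increments
        // interleave exactly as in the Read/Add/Store trace above.
        System.out.println("i = " + i);
    }
}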

Page 9: Parallel and Distributed Programming 236370 Spring 2001


Correctness of Concurrent Programs

Correctness is proven by means of invariants, or, properties.

• Necessity: Recall that the speed of instruction execution varies in time. Hence, if a certain property is true for any program execution, then it is necessarily true for each and every execution order.

• Sufficiency: We will assume the other direction as well: if a property is true for any execution order, then it is true for the program.

Sufficiency is not always true; it may fail to hold when “true” concurrency prevails. However, there is commonly a refinement of the model in which it holds (see above INC example).

The intuitive reason: there exists a software/hardware level at which instructions are ordered (say, when accessing a shared bus).

Page 10: Parallel and Distributed Programming 236370 Spring 2001


Correctness cntd.

Sufficiency implies a general method for proving correctness of parallel systems/programs:

By induction on all possible execution orders.

There are a lot of execution orders. For p processes of n instructions each, about p^(np).
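As a sanity check of this count (my own worked example, not on the slide): the exact number of interleavings is a multinomial coefficient, which the slide's p^(np) upper-bounds, and it is already large for tiny programs:

$$\#\text{orders} \;=\; \frac{(pn)!}{(n!)^p} \;\le\; p^{pn}, \qquad \text{e.g. } p = 2,\ n = 3:\quad \frac{6!}{3!\,3!} = 20 \;\le\; 2^6 = 64.$$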

With a little luck – induction is not too complicated.

Page 11: Parallel and Distributed Programming 236370 Spring 2001


Program Properties – Safety Properties

• are kept throughout computation, always true

• something “bad” cannot happen

• if it does not hold, we will know within a finite number of steps

Example: deadlock freedom

There is always a process that can execute another instruction (however, it does not necessarily execute it).

Example: mutual exclusion

It is not allowed for two given code regions (in two different processes) to execute concurrently.

Example: if x>y holds then x>y holds for the rest of the execution.

However: mutual exclusion as above holds even if the program does not allow any of the processes to execute any of the code regions!

Page 12: Parallel and Distributed Programming 236370 Spring 2001


Liveness Properties

• Guarantee progress in computation

• Something “good” must happen (in finite number of steps)

Example: no starvation

Any process that wishes to execute an instruction will eventually be able to do so.

Example: Program/process eventually terminates.

Example: One of the processes will enter critical section.

(note the difference from deadlock freedom)

Page 13: Parallel and Distributed Programming 236370 Spring 2001


Fairness Properties

Liveness properties give a relatively weak guarantee of access to a shared resource.

• Weak fairness – if a process waits on a certain request, then eventually it will be granted. "Eventually" is not good enough for OS and real-time systems, where response time counts.

• Strong fairness – if the process performs the request sufficiently frequently then eventually it will be granted.

• Linear waiting – if a process performs the request, it will be granted before any other process is granted twice.

• FIFO – … it will be granted before any other process that asked later. Easy to implement in a centralized system; however, in a distributed system it is not clear what "before" and "later" mean.

Page 14: Parallel and Distributed Programming 236370 Spring 2001


Mutual Exclusion

N processes perform an infinite loop of an instruction sequence, which is composed of a critical section and a non-critical section.

Mutual exclusion property: instructions from the critical sections of two or more processes must not be interleaved in the (global observer's) execution order.

[Figure: timelines of P1 and P2; the parenthesized segments are the critical sections, and they never overlap in time.]

Page 15: Parallel and Distributed Programming 236370 Spring 2001


The Solution

The solution is by way of additional instructions, executed by every process that is about to enter or leave its critical section:

• The pre_protocol

• The post_protocol

Loop
    Non_critical_section;
    Pre_protocol;
    Critical_section;
    Post_protocol;
End_loop;

Page 16: Parallel and Distributed Programming 236370 Spring 2001


Solution must guarantee

1. A process may not halt indefinitely inside the critical_section or the protocols. The solution must also ensure that a halt in the non_critical_section by one of the processes does not impair the ability of the other processes to enter the critical section.

2. No deadlock. Several processes may be executing inside their pre_protocols simultaneously; eventually, one of them must succeed in entering the critical_section.

3. No starvation. If a process enters its pre_protocol with the intention to enter the critical section, it will eventually succeed.

4. No self exclusion. In the absence of other processes trying to enter the critical_section, a single process will always succeed in doing so within a very short time.

Page 17: Parallel and Distributed Programming 236370 Spring 2001


Solution try 1 – give them a token that decides whose turn it is

Integer Turn = 1;

P1:
begin
    loop
        non_crit_1;
        loop
            exit when Turn = 1;
        end loop;
        crit_sec_1;
        Turn := 2;
    end loop;
end P1;

P2:
begin
    loop
        non_crit_2;
        loop
            exit when Turn = 2;
        end loop;
        crit_sec_2;
        Turn := 1;
    end loop;
end P2;

(Note: atomic Read/Write is assumed.)

Problem: the processes are forced to alternate strictly – if one process halts in its non_critical_section, the other can enter the critical section at most once more (violates requirement 1).

Page 18: Parallel and Distributed Programming 236370 Spring 2001


Solution try 2 – Let’s give each process a variable it can use to announce that it is in its crit_sec

Integer C1 = 1, C2 = 1;

P1:
Loop
    non_crit_sec_1;
    loop
        exit when C2 = 1;
    end loop;
    C1 := 0;
    crit_sec_1;
    C1 := 1;
End Loop;

P2:
Loop
    non_crit_sec_2;
    loop
        exit when C1 = 1;
    end loop;
    C2 := 0;
    crit_sec_2;
    C2 := 1;
End Loop;

Problem: no mutual exclusion

Execution example:

P1 sees C2=1

P2 sees C1=1

P1 sets C1 := 0

P2 sets C2 := 0

P1 enters critical sec

P2 enters critical sec

Page 19: Parallel and Distributed Programming 236370 Spring 2001


Solution try 3 – let's set the announcing variable before the waiting loop

Integer C1 = 1, C2 = 1;

P1:
Loop
    non_crit_sec_1;
    C1 := 0;
    loop
        exit when C2 = 1;
    end loop;
    crit_sec_1;
    C1 := 1;
End Loop;

P2:
Loop
    non_crit_sec_2;
    C2 := 0;
    loop
        exit when C1 = 1;
    end loop;
    crit_sec_2;
    C2 := 1;
End Loop;

Problem: deadlock

Execution example:

P1 sets C1:=0

P2 sets C2:=0

P1 checks C2 forever

P2 checks C1 forever

Page 20: Parallel and Distributed Programming 236370 Spring 2001


Solution try 4 – let's allow the other process to enter its crit_sec if we fail to do so

Integer C1 = 1, C2 = 1;

P1:
Loop
    non_crit_sec_1;
    C1 := 0;
    loop
        exit when C2 = 1;
        C1 := 1;
        C1 := 0;
    end loop;
    crit_sec_1;
    C1 := 1;
End Loop;

P2:
Loop
    non_crit_sec_2;
    C2 := 0;
    loop
        exit when C1 = 1;
        C2 := 1;
        C2 := 0;
    end loop;
    crit_sec_2;
    C2 := 1;
End Loop;

Can the other process enter between Ci := 1 and Ci := 0?

Problem: starvation – between C1 := 1 and C1 := 0, P2 may complete a full "round".

Problem: livelock.

Page 21: Parallel and Distributed Programming 236370 Spring 2001


Dekker's algorithm – let's give the processes a priority token that gives its holder the right of way when competing

Integer C1 = 1, C2 = 1, Turn = 1;

P1:
Loop
    non_crit_sec_1;
    C1 := 0;
    loop
        exit when C2 = 1;
        if Turn = 2 then
            C1 := 1;
            loop
                exit when Turn = 1;
            end loop;
            C1 := 0;
        end if;
    end loop;
    crit_sec_1;
    C1 := 1;
    Turn := 2;
End Loop;

P2:
Loop
    non_crit_sec_2;
    C2 := 0;
    loop
        exit when C1 = 1;
        if Turn = 1 then
            C2 := 1;
            loop
                exit when Turn = 2;
            end loop;
            C2 := 0;
        end if;
    end loop;
    crit_sec_2;
    C2 := 1;
    Turn := 1;
End Loop;

Algorithm Correct!!!

Suppose P1 is performing inside the "insisting loop":

• If C2 == 0, then P1 knows that P2 wants to enter its crit_sec.

• If, in addition, Turn = 2, then P1 gives the turn to P2, and waits for P2 to finish.

• Clearly, while P1 does all this, P2 itself will not give up, because it is its turn.

All the requirements for a valid solution hold.
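For concreteness, here is a hedged Java transcription of Dekker's algorithm (my sketch, not part of the slides; class and method names are arbitrary). volatile approximates the slides' assumption that individual reads and writes are atomic and immediately visible:

class Dekker {
    private volatile boolean wants0 = false, wants1 = false;  // plays C1, C2 (true = wants in)
    private volatile int turn = 0;                            // plays Turn

    void lock(int me) {                 // me is 0 or 1
        int other = 1 - me;
        setWants(me, true);             // Ci := 0 on the slide
        while (getWants(other)) {       // the "insisting loop"
            if (turn == other) {        // not our priority:
                setWants(me, false);    //   back off,
                while (turn == other) { /* spin until it is our turn */ }
                setWants(me, true);     //   and insist again
            }
        }
    }

    void unlock(int me) {
        setWants(me, false);            // Ci := 1 on the slide
        turn = 1 - me;                  // hand the priority token over
    }

    private boolean getWants(int i) { return i == 0 ? wants0 : wants1; }
    private void setWants(int i, boolean v) { if (i == 0) wants0 = v; else wants1 = v; }
}

A thread brackets its critical section with lock(me) … unlock(me).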

Page 22: Parallel and Distributed Programming 236370 Spring 2001


Bakery Algorithm – mutual exclusion for N processes

The idea is to have processes take tickets with numbers on them (just like at city hall, or at the health clinic). The other processes give the turn to the process holding the ticket with the minimal number (it got there first). If two tickets happen to carry the same number, the process with the minimal id enters.

Shared arrays:

    array(1..N) of integer Choosing, Number;

Process Pi performs (integer i := process id):

Loop
    non_crit_sec_i;
    choosing(i) := 1;
    number(i) := 1 + max(number);
    choosing(i) := 0;
    for j in 1..N loop
        if j /= i then
            loop exit when choosing(j) = 0; end loop;
            loop
                exit when number(j) = 0 or
                          number(i) < number(j) or
                          (number(i) = number(j) and i < j);
            end loop;
        end if;
    end loop;
    crit_sec_i;
    number(i) := 0;
End loop;
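A hedged Java sketch of the same algorithm (mine, not from the slides; AtomicIntegerArray stands in for the atomic shared arrays Choosing and Number):

import java.util.concurrent.atomic.AtomicIntegerArray;

class Bakery {
    private final int n;                          // number of processes
    private final AtomicIntegerArray choosing, number;

    Bakery(int n) {
        this.n = n;
        choosing = new AtomicIntegerArray(n);
        number = new AtomicIntegerArray(n);
    }

    void lock(int i) {
        choosing.set(i, 1);                       // taking a ticket
        int max = 0;
        for (int j = 0; j < n; j++) max = Math.max(max, number.get(j));
        number.set(i, 1 + max);
        choosing.set(i, 0);
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            while (choosing.get(j) == 1) { /* spin: j is still taking a ticket */ }
            while (number.get(j) != 0 &&
                   (number.get(j) < number.get(i) ||
                    (number.get(j) == number.get(i) && j < i))) {
                /* spin: j holds a smaller ticket (or an equal ticket, smaller id) */
            }
        }
    }

    void unlock(int i) {
        number.set(i, 0);                         // discard the ticket
    }
}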

Page 23: Parallel and Distributed Programming 236370 Spring 2001


Changing the rules of the game – increasing atomicity (load+store)

C – shared variable; Bi – Pi's private variable.

T&S (Test and Set) =
    Bi := C;
    C := 1;

C&S (Compare and Swap) =
    if Bi /= C then
        tmp := C;
        C := Bi;
        Bi := tmp;
    end if;

Mutual exclusion with T&S:

Loop
    non_crit_sec_i;
    loop
        T&S(Bi);
        exit when Bi = 0;
    end loop;
    crit_sec_i;
    C := 0;
End loop;

Such strong operations are usually supported by the underlying hardware/OS.
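In Java, such strong operations are exposed through the java.util.concurrent.atomic classes. A minimal sketch of the slide's T&S loop (mine, not from the slides; the class name is arbitrary) – AtomicBoolean.getAndSet is exactly Test-and-Set:

import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean c = new AtomicBoolean(false);  // the shared C

    void lock() {
        // T&S(Bi); exit when Bi = 0:
        // keep setting C to 1 (true) until its previous value was 0 (false).
        while (c.getAndSet(true)) { /* spin */ }
    }

    void unlock() {
        c.set(false);   // C := 0
    }
}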

Page 24: Parallel and Distributed Programming 236370 Spring 2001


The Price of Atomic [load+store], or: Why Not Simply Always Use Strong Operations?

The "Set" of C must be seen immediately by all other processors, in case they execute competing code. Since communication between processors goes through the main memory, the operation must cut through the cache levels. Price: dozens to hundreds of clock cycles, and growing.

[Figure: Proc. 0 – Proc. 3, each with local cache and registers, sit above two shared L2/L3 caches and a common main memory holding C. A T&S on C must travel all the way to main memory, while plain loads and stores of B0, B2 are satisfied locally.]

Page 25: Parallel and Distributed Programming 236370 Spring 2001


Semaphores

A semaphore is a special variable.

After initialization, only two atomic operations are applicable:

Busy-Wait Semaphore:

    P(S) = WAIT(S)::   when S > 0 then S := S - 1
    V(S) = SIGNAL(S):: S := S + 1

Another definition – Blocked-Set Semaphore:

    WAIT(S)::   if S > 0 then S := S - 1
                else "wait on S"
    SIGNAL(S):: if there are processes waiting on S,
                then let one of them proceed;
                else S := S + 1

NOTE: [load+store] is embedded in both WAIT and SIGNAL. Thus, mutual exclusion using semaphores is easy.
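In Java, java.util.concurrent.Semaphore provides exactly these operations (acquire is WAIT, release is SIGNAL). A minimal mutual-exclusion sketch (mine, not from the slides; names and counts are arbitrary):

import java.util.concurrent.Semaphore;

public class SemDemo {
    static final Semaphore s = new Semaphore(1, true);  // value 1; true = FIFO (blocked-queue)
    static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            for (int k = 0; k < 100000; k++) {
                s.acquireUninterruptibly();   // WAIT(S)
                try {
                    shared++;                 // critical section
                } finally {
                    s.release();              // SIGNAL(S)
                }
            }
        };
        Thread p1 = new Thread(worker);
        Thread p2 = new Thread(worker);
        p1.start(); p2.start();
        p1.join(); p2.join();
        System.out.println(shared);           // always 200000
    }
}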

Page 26: Parallel and Distributed Programming 236370 Spring 2001


Semaphores cntd.

Note: with both blocked-set and busy-wait semaphores, starvation is possible.

Blocked-Queue Semaphore:

Change the blocked-set semaphore definition so that blocked processes are released in FIFO order.

Fair Semaphore:

Change the busy-wait semaphore definition so that if S > 0 infinitely often, then every process performing WAIT(S) will eventually be released.

Page 27: Parallel and Distributed Programming 236370 Spring 2001


Binary Semaphores

Replace S:=S+1 by S:=1 in all definitions.

Note: operations are still strong and expensive.

“Implementing Semaphores by Binary Semaphores”, Hans Barz, SIGPLAN Notices, vol. 18, Feb. 1983.

S – general semaphore; S1, S2 – binary semaphores; X – integer variable.

WAIT(S):
    wait(S2);
    wait(S1);
    X := X - 1;
    if X > 0 then signal(S2);
    signal(S1);

SIGNAL(S):
    wait(S1);
    X := X + 1;
    if X = 1 then signal(S2);
    signal(S1);

Invariant: outside the "atomic" regions wait(S1) … signal(S1), X > 0 iff S2 = 1. The "if X > 0" check saves on signal operations.
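A hedged Java transcription of Barz's construction (mine, not from the paper or the slides; java.util.concurrent.Semaphore holding at most one permit stands in for a binary semaphore):

import java.util.concurrent.Semaphore;

class BarzSemaphore {
    private final Semaphore s1 = new Semaphore(1);  // binary: protects X
    private final Semaphore s2;                     // binary "gate": open iff X > 0
    private int x;                                  // the general semaphore's value

    BarzSemaphore(int initial) {
        x = initial;
        s2 = new Semaphore(initial > 0 ? 1 : 0);
    }

    void waitS() throws InterruptedException {      // WAIT(S)
        s2.acquire();                               // wait(S2)
        s1.acquire();                               // wait(S1)
        x = x - 1;
        if (x > 0) s2.release();                    // keep the gate open
        s1.release();                               // signal(S1)
    }

    void signalS() throws InterruptedException {    // SIGNAL(S)
        s1.acquire();                               // wait(S1)
        x = x + 1;
        if (x == 1) s2.release();                   // gate was closed; open it
        s1.release();                               // signal(S1)
    }
}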

Page 28: Parallel and Distributed Programming 236370 Spring 2001


Policy for Programming with Semaphores

Use semaphores as little as possible – these are strong operations!

Define the role of each semaphore via a fixed relation between the semaphore's value and "something" in the program.

Examples:

• Mutual Exclusion: a process may enter the critical section iff S = 1.

• Producer-Consumer (bounded buffer): S = # of free slots in the buffer.

Then do:

1. Justify the necessity of each wait and signal with respect to the above-mentioned role of the semaphore.

2. Same for semaphore initialization.

3. Make sure each wait is eventually released.

Page 29: Parallel and Distributed Programming 236370 Spring 2001


Semaphores – a software engineering problem

1. Processes handling a semaphore contain code that relates to the role this shared variable plays in other processes.

2. An erroneous use of a semaphore anywhere in the system manifests itself in other processes, at other times. It is extremely hard to identify the sources of such bugs.

Page 30: Parallel and Distributed Programming 236370 Spring 2001


Monitors – C.A.R. Hoare, CACM, vol 17, no. 10, Oct. 1974

Idea: let's put all the code for handling shared variables in one place.

So, let’s make something which is:

1. Object-oriented programming style (Simula class)

2. Monolithic monitor – a central core handling all requests.

Each monitor has its own mission, and private data.

Only a single process can enter a monitor at any point in time.

Monitor <name>
    (declarations of variables local to the monitor and global to the monitor's procedures)
    Procedure name1 (…)
    Procedure name2 (…)
Begin
    ::: initialize the monitor's local variables
End.

Page 31: Parallel and Distributed Programming 236370 Spring 2001


Condition Variables

Each condition variable handles a set of waiting processes.

• wait(condvar) – the process always blocks and enters the set of processes waiting on condvar.

• signal(condvar) – one process from the set of those waiting on condvar is released. If the set is empty, nothing happens.

(Since only one process is allowed into the monitor, we shall assume signal to be the last instruction when exiting the monitor.)

• Nonempty(condvar) – returns True iff the waiting set is not empty.

Page 32: Parallel and Distributed Programming 236370 Spring 2001


Monitor example – Barrier Synchronization: schedule 5 processes for concurrent execution

Integer count;
Condition five;

Procedure sync() {
    if count < 4 {
        count := count + 1;
        wait(five);
        signal(five);
    } else {
        count := 0;
        signal(five);
    }
}

Note: this program is unfair (unless five is a FIFO condition): it allows a process to be released from waiting on five, loop around, fetch the monitor again, wait on five, and be released again, while other processes keep waiting on five.
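A hedged Java rendering of this barrier (mine, not from the slides). In Java every object is a monitor: synchronized gives monitor entry, and wait/notifyAll play the condition variable five. Instead of the slide's cascade of signals, notifyAll releases the whole batch, and a generation counter makes the barrier reusable and avoids the unfairness noted above:

class Barrier {
    private static final int N = 5;   // schedule 5 processes, as on the slide
    private int count = 0;            // threads already waiting at the barrier
    private int generation = 0;       // distinguishes successive barrier rounds

    synchronized void sync() throws InterruptedException {
        int myGeneration = generation;
        count = count + 1;
        if (count < N) {
            // Re-check the condition in a while loop (see page 35):
            while (myGeneration == generation) wait();
        } else {
            count = 0;
            generation = generation + 1;  // release the current batch
            notifyAll();
        }
    }
}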

Page 33: Parallel and Distributed Programming 236370 Spring 2001


Concurrent Readers or an Exclusive Writer

Monitor readwrite;
    integer readers; boolean writing; condition okread, okwrite;

    Procedure startread {
        if (writing or nonempty(okwrite)) wait(okread);
        readers := readers + 1;
        signal(okread);
    }

    Procedure endread {
        readers := readers - 1;
        if (readers == 0) signal(okwrite);
    }

    Procedure startwrite {
        if (readers /= 0 or writing) wait(okwrite);
        writing := true;
    }

    Procedure endwrite {
        writing := false;
        if (nonempty(okread)) signal(okread)
        else signal(okwrite);
    }

Begin readers := 0; writing := false; End;

Procedure readproc {
    repeat
        M.startread; read the data; M.endread;
    forever
}

Procedure writeproc {
    repeat
        M.startwrite; write the data; M.endwrite;
    forever
}

Cobegin readproc; readproc; writeproc; … Coend.

Page 34: Parallel and Distributed Programming 236370 Spring 2001


The program in the previous slide does not work

What if a reader goes to sleep on okread? Now a writer comes in, then leaves while signaling okread, and another writer comes in. While the second writer is between startwrite and endwrite, the reader can fetch the monitor, and since this time it does not check "writing", it will enter the critical section together with the second writer.

Fix: check "writing" again when re-entering after waiting on the condition variable (replace the "if" with a "while").

Page 35: Parallel and Distributed Programming 236370 Spring 2001


In general…..

When re-acquiring the monitor after waiting on a condition variable, always make sure that conditions (program state) remain as they were when wait was performed.

Either:

1. Prove that this is always the case.

2. Check it when re-acquiring the monitor.
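In Java the same rule is the standard wait-loop idiom. A minimal sketch (mine, not from the slides; the flag "ready" is a stand-in for whatever condition the monitor tracks):

class GuardedBox {
    private boolean ready = false;

    synchronized void produce() {
        ready = true;
        notifyAll();
    }

    synchronized void consume() throws InterruptedException {
        while (!ready) {   // "while", not "if": the state may have changed
            wait();        // between the notify and our re-entry
        }
        ready = false;     // the condition held when we re-checked it
    }
}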

Page 36: Parallel and Distributed Programming 236370 Spring 2001


Recursive Monitor Calls

The problem: when a procedure A.x in monitor A calls a procedure B.y in monitor B, should the process release A before entering B?

Cons (of releasing A):

1. From a software engineering point of view, it is important that when the process returns to A, the conditions that held at its exit still persist.

2. Exiting B (right back into A) would depend on succeeding to enter A again.

Pros (of releasing A):

1. There is no activity in A while the process is in B – that time could be used.

2. It may prevent deadlock, if B depends on actions happening in A.

Current Java definition: no release.

Page 37: Parallel and Distributed Programming 236370 Spring 2001


WAIT in recursive calls

• If monitor A calls monitor B, and B waits, does this release the lock on A? – Current Java definition: no release; ONLY the lock on B is released.