ECE 1747: Parallel Programming


Page 1: ECE 1747: Parallel Programming

ECE 1747: Parallel Programming

Distributed Shared Memory (DSM)

Page 2: ECE 1747: Parallel Programming

Multiprocessor (SMP)

[Figure: processors proc1, proc2 and proc3 in a shared-memory multiprocessor; the diagram shows four copies of X=0.]

Page 3: ECE 1747: Parallel Programming

Consistency Models

• Sequential Consistency
– All processors observe the same order
– Must correspond to some serial order
– Only ordering constraint is that the reads/writes of a processor (e.g., P1) appear in that processor's program order; there are no restrictions on the relative ordering between processors.
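The definition above can be checked on a small litmus test. The sketch below is an illustration, not part of the original slides: it assumes a hypothetical two-processor program (P1 runs `x = 1; r1 = y`, P2 runs `y = 1; r2 = x`), enumerates every serial order that preserves each processor's program order, and reports which results are reachable. The names `run_schedule` and `sc_allows` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical litmus test: P1 runs {x = 1; r1 = y}, P2 runs {y = 1; r2 = x}.
 * Bit `step` of `schedule` picks which processor executes next (0 = P1,
 * 1 = P2); a finished processor is skipped. Program order within each
 * processor is always preserved. */
static void run_schedule(unsigned schedule, int *r1, int *r2)
{
    int x = 0, y = 0;
    int pc1 = 0, pc2 = 0;                 /* next operation of P1 / P2 */
    *r1 = *r2 = -1;
    for (int step = 0; step < 4; step++) {
        bool pick_p2 = (schedule >> step) & 1u;
        if (pc1 == 2) pick_p2 = true;     /* P1 has finished */
        if (pc2 == 2) pick_p2 = false;    /* P2 has finished */
        if (!pick_p2) {
            if (pc1 == 0) x = 1; else *r1 = y;
            pc1++;
        } else {
            if (pc2 == 0) y = 1; else *r2 = x;
            pc2++;
        }
    }
    assert(pc1 == 2 && pc2 == 2);         /* all four operations ran */
}

/* True if some sequentially consistent execution yields (want_r1, want_r2). */
bool sc_allows(int want_r1, int want_r2)
{
    for (unsigned schedule = 0; schedule < 16; schedule++) {
        int r1, r2;
        run_schedule(schedule, &r1, &r2);
        if (r1 == want_r1 && r2 == want_r2)
            return true;
    }
    return false;
}
```

Under these assumptions, `sc_allows(0, 0)` is false while the other three combinations are reachable: no serial order lets both reads miss both writes.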

Page 4: ECE 1747: Parallel Programming

Common consistency protocols

• Write update
– Multicast update to all replicas
• Write invalidate
– Invalidate cached copies in p2, p3
– Cache miss if p2/p3 access X; valid data is supplied by another cache
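The two protocols can be contrasted with a toy model of three cached copies of X. This is a minimal sketch under stated assumptions: the type `replicas_t` and the functions `write_update`, `write_invalidate` and `read_x` are hypothetical names, there is no bus or memory model, and a miss is simply served by the first valid cache found.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3   /* p1, p2, p3 as on the slide (indices 0..2 here) */

typedef struct {
    int  value[NPROC];   /* each processor's cached copy of X */
    bool valid[NPROC];   /* whether that copy may be read without a miss */
} replicas_t;

/* Write update: the new value is multicast to all replicas. */
void write_update(replicas_t *r, int writer, int new_value)
{
    assert(writer >= 0 && writer < NPROC);
    for (int p = 0; p < NPROC; p++) {
        r->value[p] = new_value;
        r->valid[p] = true;
    }
}

/* Write invalidate: only the writer keeps a valid copy. */
void write_invalidate(replicas_t *r, int writer, int new_value)
{
    assert(writer >= 0 && writer < NPROC);
    for (int p = 0; p < NPROC; p++)
        r->valid[p] = (p == writer);
    r->value[writer] = new_value;
}

/* Read X; on a miss, fetch valid data from another cache. */
int read_x(replicas_t *r, int reader, bool *missed)
{
    assert(reader >= 0 && reader < NPROC);
    *missed = !r->valid[reader];
    if (*missed) {
        for (int p = 0; p < NPROC; p++) {
            if (r->valid[p]) {
                r->value[reader] = r->value[p];
                r->valid[reader] = true;
                break;
            }
        }
    }
    return r->value[reader];
}
```

In this model a write-invalidate makes the next read by another processor miss once, whereas after a write-update every read hits; the price of the update protocol is a multicast on every write.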

Page 5: ECE 1747: Parallel Programming

Distributed Shared Memory (DSM)

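[Figure: nodes 0 through N, each pairing a processor (proc0 ... procN) with a local memory (mem0 ... memN), connected by a network; the local memories are presented together as one shared memory.]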

Page 6: ECE 1747: Parallel Programming

DSM programming

• Standard – pthread-like
• Synchronizations
– Barriers
– Locks
– Semaphores

Page 7: ECE 1747: Parallel Programming

Sequential SOR

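/* grid and temp are (n+1) x (n+1); rows and columns 0 and n are fixed boundary values */
for (t = 0; t < iterations; t++) {    /* some number of timesteps/iterations */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                 grid[i][j-1] + grid[i][j+1]);
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            grid[i][j] = temp[i][j];
}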

Page 8: ECE 1747: Parallel Programming

Parallel SOR with Barriers (1 of 2)

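void* sor(void* arg)
{
    int slice = (int)(intptr_t)arg;            /* thread index 0..p-1; needs <stdint.h> */
    int from  = (slice * (n-1)) / p + 1;       /* first row owned by this thread */
    int to    = ((slice+1) * (n-1)) / p + 1;   /* one past the last owned row */

    for some number of iterations { … }
    return NULL;
}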
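The `from`/`to` arithmetic splits the interior rows 1..n-1 into p contiguous, non-overlapping slices. The helper below isolates that computation so it can be checked on its own; `slice_bounds` is a hypothetical name introduced for this sketch, and the formula is copied from the slide.

```c
#include <assert.h>

/* Compute the half-open row range [from, to) owned by `slice` out of `p`
 * threads, for a grid whose interior rows are 1..n-1. */
void slice_bounds(int slice, int n, int p, int *from, int *to)
{
    assert(p > 0 && slice >= 0 && slice < p && n > 1);
    *from = (slice * (n - 1)) / p + 1;
    *to   = ((slice + 1) * (n - 1)) / p + 1;
}
```

For example, with n = 10 and p = 4 the slices are [1,3), [3,5), [5,7) and [7,10): every interior row is covered exactly once, and the last slice absorbs the remainder.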

Page 9: ECE 1747: Parallel Programming

Parallel SOR with Barriers (2 of 2)

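for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                             grid[i][j-1] + grid[i][j+1]);
barrier();    /* all slices of temp are complete before grid is overwritten */
for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        grid[i][j] = temp[i][j];
barrier();    /* all slices of grid are updated before the next iteration */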

Page 10: ECE 1747: Parallel Programming

Sequential Consistency DSM

• As proposed by Li & Hudak, TOCS ’86.
• Use virtual memory to implement sharing.
• Shared memory divided up by virtual memory pages.
• Use an SMP-like coherence protocol.
• Keep pages in one of three states:
– invalid, read-only, read-write
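The three page states suggest a small state machine driven by local faults and remote requests. The following is a simplified sketch of a write-invalidate protocol over those states, not a reproduction of the Li & Hudak protocol; the event names and the functions `next_state` and `faults` are assumptions made for illustration.

```c
#include <assert.h>

typedef enum { PAGE_INVALID, PAGE_READ_ONLY, PAGE_READ_WRITE } page_state_t;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } page_event_t;

/* State of the local copy of a page after an event. */
page_state_t next_state(page_state_t s, page_event_t e)
{
    switch (e) {
    case LOCAL_READ:   /* read fault on an invalid page fetches a read-only copy */
        return s == PAGE_INVALID ? PAGE_READ_ONLY : s;
    case LOCAL_WRITE:  /* write fault obtains exclusive, writable access */
        return PAGE_READ_WRITE;
    case REMOTE_READ:  /* another node reads: the writer is downgraded */
        return s == PAGE_READ_WRITE ? PAGE_READ_ONLY : s;
    case REMOTE_WRITE: /* another node writes: the local copy is invalidated */
        return PAGE_INVALID;
    }
    assert(!"unknown event");
    return s;
}

/* Nonzero if a local access traps (page fault) under the current protection. */
int faults(page_state_t s, page_event_t e)
{
    if (e == LOCAL_READ)  return s == PAGE_INVALID;
    if (e == LOCAL_WRITE) return s != PAGE_READ_WRITE;
    return 0;
}
```

In a real system the faults would be delivered by the virtual memory hardware, and the transitions would involve messages to the page's owner or manager; both are omitted here.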

Page 11: ECE 1747: Parallel Programming

SC implementation

• Synchronous read/write
– Writes must be propagated before moving on to the next operation

Page 12: ECE 1747: Parallel Programming

Read-Write False Sharing

[Figure: variables x and y located on the same page.]

Page 13: ECE 1747: Parallel Programming

Read-Write False Sharing (Cont.)

[Figure: timeline of page accesses; labels: w(x), r(y), r(y), r(x), w(x), w(x).]

Page 14: ECE 1747: Parallel Programming

Read-Write False Sharing (Cont.)

[Figure: timeline of page accesses; labels: w(x), r(y), r(y), r(x), synch, w(x), w(x).]

Page 15: ECE 1747: Parallel Programming

Weak Consistency (WEAKC)

• Data modifications are only propagated at the time of synchronization.

• Works fine if program is properly synchronized through system primitives.
– All programs should be …
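The idea that modifications become visible only at synchronization can be modeled in a few lines. This is a toy sketch with assumed names (`weak_mem_t`, `wc_write`, `wc_read`, `wc_synch`) and a global synchronization point; it also assumes the program is properly synchronized, so at most one node writes a given variable between two synchronizations.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NNODES 2
#define NVARS  2   /* index 0 stands for x, index 1 for y */

typedef struct {
    int  copy[NNODES][NVARS];    /* each node's local view of the shared data */
    bool dirty[NNODES][NVARS];   /* written locally since the last synch */
} weak_mem_t;

void wc_init(weak_mem_t *m)
{
    memset(m, 0, sizeof *m);
}

/* A write only touches the writer's local copy. */
void wc_write(weak_mem_t *m, int node, int var, int value)
{
    assert(node >= 0 && node < NNODES && var >= 0 && var < NVARS);
    m->copy[node][var] = value;
    m->dirty[node][var] = true;
}

/* A read returns whatever the local copy currently holds. */
int wc_read(const weak_mem_t *m, int node, int var)
{
    assert(node >= 0 && node < NNODES && var >= 0 && var < NVARS);
    return m->copy[node][var];
}

/* Synchronization point: propagate every pending modification to all nodes. */
void wc_synch(weak_mem_t *m)
{
    for (int n = 0; n < NNODES; n++)
        for (int v = 0; v < NVARS; v++)
            if (m->dirty[n][v]) {
                for (int other = 0; other < NNODES; other++)
                    m->copy[other][v] = m->copy[n][v];
                m->dirty[n][v] = false;
            }
}
```

Between synchronizations, a write to x by one node causes no traffic and does not disturb another node that is reading y on the same page, which is how weak consistency removes read-write false sharing.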

Page 16: ECE 1747: Parallel Programming

Read-Write False Sharing (Before)

[Figure: timeline of page accesses; labels: w(x), r(y), r(y), r(x), synch, w(x), w(x).]

Page 17: ECE 1747: Parallel Programming

Read-Write False Sharing (WEAKC)

[Figure: timeline of page accesses under WEAKC; labels: w(x), w(x), r(y), r(y), r(x), synch.]

Page 18: ECE 1747: Parallel Programming

Write-Write False Sharing

[Figure: variables x and y located on the same page.]

Page 19: ECE 1747: Parallel Programming

Write-Write False Sharing

[Figure: timeline of page accesses; labels: w(x), w(y), w(y), r(x), synch, w(x), w(x).]

Page 20: ECE 1747: Parallel Programming

Write-Write False Sharing (WEAKC)

[Figure: timeline of page accesses under WEAKC; labels: w(x), w(y), w(y), r(x), synch, w(x), w(x).]

Page 21: ECE 1747: Parallel Programming

Multiple Writer (MW) Protocols

• Allows multiple writers per page.
• Modifications are merged at synchronization (according to the WEAKC definition).
• Modifications are recorded through a mechanism called twinning and diffing.
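Twinning and diffing can be sketched as three small operations: copy the page before the first write (the twin), compare page and twin at synchronization (the diff), and apply the diff to other copies (the merge). The code below is a minimal illustration with an artificially small page; `PAGE_WORDS`, `diff_t`, `make_twin`, `make_diff` and `apply_diff` are hypothetical names, and real systems typically encode diffs more compactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 8   /* tiny page for illustration only */

typedef struct {
    size_t index[PAGE_WORDS];   /* which words were modified */
    int    value[PAGE_WORDS];   /* their new values */
    size_t count;               /* number of modified words */
} diff_t;

/* Twinning: save a pristine copy of the page before it is written. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof *page);
}

/* Diffing: record only the words that differ from the twin. */
void make_diff(const int *page, const int *twin, diff_t *d)
{
    d->count = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            d->index[d->count] = i;
            d->value[d->count] = page[i];
            d->count++;
        }
    }
}

/* Merging: apply another writer's diff to a local copy of the page. */
void apply_diff(int *page, const diff_t *d)
{
    for (size_t k = 0; k < d->count; k++) {
        assert(d->index[k] < PAGE_WORDS);
        page[d->index[k]] = d->value[k];
    }
}
```

If one node writes x (word 0) and another writes y (word 1) on the same page, each diff contains a single word, and exchanging the diffs at synchronization leaves both copies with both updates.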

Page 22: ECE 1747: Parallel Programming

Write-Write False Sharing and MW

[Figure: timeline of page accesses with multiple writers; labels: w(x), w(y), w(y), r(x), synch, w(x), w(x).]

Page 23: ECE 1747: Parallel Programming

Creating a diff (delta)

[Figure: creating a diff. Labels: w(x), w(x), ...; twin; diff (delta); page states: write-protected, writable, write-protected.]

Page 24: ECE 1747: Parallel Programming

Write-Write False Sharing and MW

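[Figure: timeline of page accesses with multiple writers; labels: w(x), w(y), w(y), r(x), synch, w(x), w(x); the page copies holding x and y are shown with their twins.]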

Page 25: ECE 1747: Parallel Programming

Release Consistency (RC)

• Distinguish acquires from releases
– Ordinary reads/writes wait until the previous acquire is performed
– A release waits until previous reads/writes are performed
– Acquires/releases are sequentially consistent w.r.t. one another

Page 26: ECE 1747: Parallel Programming

Eager & Lazy Release Consistency

• Eager release consistency: transfer consistency information at release of a lock.

• Lazy release consistency: transfer consistency information at acquire of a lock.
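The difference between the two variants is when, and to whom, consistency information moves. The toy model below tracks a single variable protected by a single lock and counts transfers; it is a sketch under those assumptions, and `rc_sim_t`, `eager_release`, `lazy_release` and `lazy_acquire` are names invented for this example.

```c
#include <assert.h>

#define NPROC 4   /* p1..p4 in the diagrams; indices 0..3 here */

typedef struct {
    int copy[NPROC];     /* each processor's local copy of x */
    int last_releaser;   /* -1 until the first release */
    int transfers;       /* consistency transfers performed so far */
} rc_sim_t;

void rc_init(rc_sim_t *s)
{
    for (int p = 0; p < NPROC; p++)
        s->copy[p] = 0;
    s->last_releaser = -1;
    s->transfers = 0;
}

/* Eager: at release, push the modification to every other processor. */
void eager_release(rc_sim_t *s, int p)
{
    assert(p >= 0 && p < NPROC);
    for (int q = 0; q < NPROC; q++) {
        if (q == p)
            continue;
        s->copy[q] = s->copy[p];
        s->transfers++;
    }
    s->last_releaser = p;
}

/* Lazy: a release only records who released last; nothing is sent. */
void lazy_release(rc_sim_t *s, int p)
{
    assert(p >= 0 && p < NPROC);
    s->last_releaser = p;
}

/* Lazy: at acquire, pull the modification from the last releaser. */
void lazy_acquire(rc_sim_t *s, int p)
{
    assert(p >= 0 && p < NPROC);
    if (s->last_releaser >= 0 && s->last_releaser != p) {
        s->copy[p] = s->copy[s->last_releaser];
        s->transfers++;
    }
}
```

When the lock passes along a chain of processors, the eager variant in this model performs NPROC-1 transfers per release, while the lazy variant performs one per acquire and never updates processors that do not take the lock.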

Page 27: ECE 1747: Parallel Programming

Eager Release Consistency

[Figure: eager release consistency; timelines for p1, p2, p3 and p4; operations shown: w(x) rel; acq w(x) rel; acq w(x) rel; acq r(x).]

Page 28: ECE 1747: Parallel Programming

Lazy Release Consistency

[Figure: lazy release consistency; timelines for p1, p2, p3 and p4; operations shown: w(x) rel; acq w(x) rel; acq w(x) rel; acq r(x).]

Page 29: ECE 1747: Parallel Programming

Lazy Release Consistency

• The acquiring processor determines which modifications it needs to see.

[Figure: p1: w(x) rel; p2: acq w(y) rel; p3: acq r(x) r(y); synch.]

Page 30: ECE 1747: Parallel Programming

Vector Timestamps

[Figure: p1: w(x) rel; p2: acq w(y) rel; p3: acq r(x) r(y). Vector timestamps shown: 000, 000, 000, 100, 110.]
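The timestamps in the figure (000, then 100, then 110) can be reproduced with two operations: advance one's own entry when an interval ends, and take the element-wise maximum on acquire. The sketch below assumes that an interval ends at each release; `vts_t`, `vts_tick`, `vts_merge` and `vts_missing` are hypothetical names, and a full lazy release consistency implementation tracks more than this.

```c
#include <assert.h>

#define NPROC 3   /* p1, p2, p3; indices 0..2 here */

typedef struct {
    int t[NPROC];   /* t[q] = latest interval of processor q seen so far */
} vts_t;

/* Processor p ends an interval (at a release, in this sketch). */
void vts_tick(vts_t *v, int p)
{
    assert(p >= 0 && p < NPROC);
    v->t[p]++;
}

/* On acquire: combine own knowledge with the releaser's (element-wise max). */
void vts_merge(vts_t *mine, const vts_t *theirs)
{
    for (int q = 0; q < NPROC; q++)
        if (theirs->t[q] > mine->t[q])
            mine->t[q] = theirs->t[q];
}

/* Number of processor q's intervals that the releaser has seen and the
 * acquirer has not: these are the modifications the acquirer must obtain. */
int vts_missing(const vts_t *mine, const vts_t *theirs, int q)
{
    assert(q >= 0 && q < NPROC);
    int d = theirs->t[q] - mine->t[q];
    return d > 0 ? d : 0;
}
```

Starting from 000 everywhere, p1's release gives it 100; p2 merges that on acquire and ticks at its own release, giving 110; when p3 (still at 000) acquires from p2, it is missing one interval from p1 and one from p2, so it needs the modifications to both x and y.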

Page 31: ECE 1747: Parallel Programming

DSM Summary

• Relaxed consistency
– application’s definition of correctness
• Achieves more than 70% of the performance of the corresponding message-passing applications