Instructor Neelima Gupta ngupta@cs.du.ac.in. Table of Contents Parallel Algorithms.

Post on 14-Dec-2015

Instructor: Neelima Gupta

ngupta@cs.du.ac.in

Table of Contents

Parallel Algorithms

Thanks to: Tejinder Kaur (35, MCS '09)

Parallel algorithms: solving a problem on multiple processors.

S(n) is the sequential time to solve a problem of size n.
T(n,p) is the parallel time to solve the problem on p processors.
W(n) is the work done by a parallel algorithm: W(n) = p · T(n,p).
A parallel algorithm is optimal if its work matches the time of the best known sequential algorithm, i.e. if W(n) = O(S(n)).
Speedup is how much time is gained by using more processors: speedup = S(n) / T(n,p).

Example: computing the sum of n numbers. Sequential time = Θ(n).

We have 2 processors, p1 and p2, and the numbers are 2, 3, 4, 5, 1, 11, 13, 10, 7, 8. Initially all the numbers are with p1, which sends half of them to p2. Both p1 and p2 compute the sum of their half and send the sums s1 and s2 to each other, so both have the final sum s1 + s2.

p1: 2, 3, 4, 5, 1 gives s1, then s1 + s2
p2: 11, 13, 10, 7, 8 gives s2, then s1 + s2

Communication time = Θ(1)
Computation time = Θ(n/2)
T(n,2) = Θ(n/2)
W(n) = 2 · (n/2) = n
Hence this algorithm is optimal.
Speedup = n / (n/2) = 2
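The two-processor example can be checked with a short sequential simulation. This is only a sketch: the function name `two_processor_sum` and the step counting (one addition step per element, communication ignored) are assumptions made for illustration, not part of the slides.

```python
def two_processor_sum(numbers):
    """Simulate the two-processor sum; returns (total, T, W, speedup)."""
    n = len(numbers)
    half = n // 2
    p1_part, p2_part = numbers[:half], numbers[half:]

    s1 = sum(p1_part)  # computed by p1
    s2 = sum(p2_part)  # computed by p2 during the same time steps

    # After exchanging s1 and s2, both processors hold the same total.
    total = s1 + s2

    t_parallel = max(len(p1_part), len(p2_part))  # T(n,2), about n/2 steps
    work = 2 * t_parallel                         # W(n) = p * T(n,p)
    speedup = n / t_parallel                      # S(n) / T(n,2), with S(n) = n
    return total, t_parallel, work, speedup


numbers = [2, 3, 4, 5, 1, 11, 13, 10, 7, 8]
print(two_processor_sum(numbers))  # (64, 5, 10, 2.0)
```

With the ten numbers from the example, p1 obtains s1 = 15 and p2 obtains s2 = 49, and the work equals the sequential time n = 10.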

PARALLEL MODELS

Distributed Computing
There are several independent machines. They communicate with each other by passing messages. The final result comes from all the independent machines together.

[Figure: five machines M1, M2, M3, M4 and M5 that exchange messages.]

SHARED MEMORY MODEL

All the processors read from and write to the same memory. There is no direct communication between them. Processors cannot write to the same location at the same time, but they can read from it at the same time.

[Figure: processors p1, p2, p3, ..., pn attached to one shared memory.]

Models for concurrency in the shared memory model
EREW (Exclusive Read, Exclusive Write)
CREW (Concurrent Read, Exclusive Write)
CRCW (Concurrent Read, Concurrent Write)

The weakest is EREW. CREW is stronger than EREW but weaker than CRCW. If we go from CRCW to CREW, there is a slowdown by a factor of log(n).

Made By : Deepika Kamboj ( Roll No.7, MSc '11 )

Searching for a key

[Figure: elements x1, x2, x3, ..., xi, ..., xn, one per processor p1, p2, p3, ..., pi, ..., pn. Each processor pi compares xi with the key (COMPARISON). The shared OUTPUT cell holds 0.]

Thanks to 'PREETI'

CRCW

[Figure, two frames: all processors compare their element with the key in the same step. Every processor pi with xi = key reports "Match found" and writes to the shared OUTPUT cell, which changes from 0 to 1.]

VERSION 1 OF SEARCHING
To find the existence of the given KEY.
MODEL used: CRCW (Common, Priority, Arbitrary)

Example for version 1

[Figure, three frames: Key = 7, and six processors p1 to p6 hold the values 12, 7, 30, 22, 15, 7. COMPARISON: 12≠7, 30≠7, 22≠7, 15≠7, and two processors find 7=7. OUTPUT is 0 at the start and becomes 1 once the matching processors write.]
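Version 1 can be sketched as a sequential simulation of a single CRCW step. The function name `crcw_exists` is hypothetical, and the order of the values in the list is an assumed reading of the slide layout; the loop stands in for processors that conceptually all act in the same time step.

```python
def crcw_exists(values, key):
    """Simulate one CRCW step that reports whether key occurs in values."""
    output = 0  # shared output cell, initialised to 0

    # Every processor compares its own element with the key; each one that
    # finds a match attempts to write 1 to the shared cell.
    attempted_writes = [1 for x in values if x == key]

    # Under the Common rule all concurrent writers must write the same value.
    # That holds here because every writer writes 1, so the Arbitrary and
    # Priority rules would give the same result.
    assert len(set(attempted_writes)) <= 1

    if attempted_writes:
        output = attempted_writes[0]
    return output


values = [12, 7, 22, 15, 7, 30]  # assumed order of the slide's six values
print(crcw_exists(values, 7))    # 1
print(crcw_exists(values, 5))    # 0
```

Because all writers agree on the value, this version does not depend on which conflict-resolution rule the CRCW machine uses.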

VERSION 2 OF SEARCHING
To find the id of a processor whose element equals the KEY.
MODEL used: CRCW (Common, Priority, Arbitrary)

Example for version 2

[Figure, four frames: Key = 7, and six processors p1 to p6 hold the values 12, 7, 30, 22, 15, 7; p2 and p5 find 7=7. OUTPUT is 0 at the start. Either p2 or p5 gets written: one final frame shows OUTPUT = p5, the alternative final frame shows OUTPUT = p2.]
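The "p2 or p5 gets written" behaviour can be imitated with a random choice among the competing writers. This is a minimal sketch under assumptions: `crcw_arbitrary_search` is a hypothetical name, processor ids are taken to be 1-based, the value order is an assumed reading of the slide chosen so that p2 and p5 hold the key, and `random.choice` merely stands in for whatever arbitrary winner the hardware would pick.

```python
import random


def crcw_arbitrary_search(values, key, rng=random):
    """Return the id of some processor holding key, or 0 if there is none."""
    output = 0  # shared output cell, initialised to 0

    # Processor i (1-based) holds values[i - 1]; every matching processor
    # attempts to write its own id to the shared cell in the same step.
    writers = [pid for pid, x in enumerate(values, start=1) if x == key]

    # Arbitrary rule: exactly one of the competing writes succeeds,
    # and the algorithm may not assume which one.
    if writers:
        output = rng.choice(writers)
    return output


values = [12, 7, 22, 15, 7, 30]  # assumed order; p2 and p5 hold 7
print(crcw_arbitrary_search(values, 7))  # prints 2 or 5
```

Since the matching processors write different ids, the result can differ between runs, which is what the two alternative final frames of the example show.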

VERSION 3 OF SEARCHING
To find the LEFTMOST OCCURRENCE of the given KEY.
MODEL used: CRCW (Common, Arbitrary, Priority)
[The slide carries a × mark; which variant it applies to is not recoverable.]

Example for version 3

[Figure, three frames: Key = 7, and six processors p1 to p6 hold the values 12, 7, 30, 22, 15, 7; p2 and p5 find 7=7. OUTPUT is 0 at the start and becomes p2, because P2 has the highest priority.]
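The leftmost-occurrence version can be sketched in the same style, now resolving the write conflict by priority. The assumptions are labeled here and in the comments: `crcw_priority_search` is a hypothetical name, a lower processor id is taken to mean higher priority (consistent with "P2 has highest priority" among p2 and p5), and the value order is an assumed reading of the slide.

```python
def crcw_priority_search(values, key):
    """Return the id of the leftmost processor holding key, or 0 if none."""
    # Processor i (1-based) holds values[i - 1]; every matching processor
    # attempts to write its own id to the shared output cell.
    writers = [pid for pid, x in enumerate(values, start=1) if x == key]

    # Priority rule (assumed: lower id = higher priority): among the
    # concurrent writers, the highest-priority one succeeds.
    return min(writers) if writers else 0


values = [12, 7, 22, 15, 7, 30]  # assumed order; p2 and p5 hold 7
print(crcw_priority_search(values, 7))  # 2
```

Unlike the Arbitrary rule, the outcome is deterministic, which is why only this rule yields the leftmost occurrence in a single write step.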

SUM PROBLEM

Find the sum of n numbers when there are n processors.

[Figure: binary tree over the inputs a1, a2, a3, a4, ..., an. The levels are labelled n processors, n/2 processors, n/4 processors, ..., 1 processor.]

The height of this tree is log n. Each step takes constant time. Hence this algorithm takes O(log n) time.
W(n) = n · log n = n log n.
Speedup = n / log n.
This algorithm is not optimal: half of the processors are idle in the first step, and the number of idle processors increases in further steps.
What if we use n/log n processors?
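The tree-based sum can be simulated level by level to see the log n step count. This is a sketch, not the slides' code: `tree_sum` is a hypothetical name, the input is assumed to be non-empty, and each pass of the `while` loop stands for one parallel step in which all pairs are added at once.

```python
def tree_sum(values):
    """Sum a non-empty list by pairwise combination; returns (total, steps)."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        # One parallel step: neighbouring pairs are added simultaneously,
        # each pair by its own processor.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired element moves up unchanged
        level = nxt
        steps += 1
    return level[0], steps


print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 3)
```

For n = 8 the loop runs log2(8) = 3 times, while the number of active pairs halves at every level, which is the idleness the analysis points to.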

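The number of processors is n/log n, so each processor gets log n values.

[Figure: the processors produce the partial sums s1, s2, ..., sm.]

Take m = n/log n. Each processor has log n values and adds them sequentially in log n time, so m partial sums s1, ..., sm are generated.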

The height of the tree over the m partial sums is log m, so combining them takes log m ≤ log n time.
So T(n,p) ≤ 2 log n = O(log n).
W(n) = (n/log n) · O(log n) = O(n) = O(S(n)), as the sequential time is O(n).
Hence this algorithm is optimal.
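The two phases (local sums over blocks of about log n values, then a tree over the partial sums) can be sketched as follows. The name `blocked_sum`, the use of the ceiling of log2(n) as block size, and the step counting (one step per value in phase 1, one step per tree level in phase 2) are assumptions for illustration; the input is assumed to be non-empty.

```python
import math


def blocked_sum(values):
    """Sum with about n/log n processors; returns (total, processors, steps)."""
    n = len(values)
    block = max(1, math.ceil(math.log2(n))) if n > 1 else 1

    # Phase 1: each processor adds its own block of about log n values
    # sequentially; all processors work at the same time.
    partial = [sum(values[i:i + block]) for i in range(0, n, block)]
    processors = len(partial)
    steps = block

    # Phase 2: combine the m partial sums along a binary tree (log m levels).
    level = partial
    while len(level) > 1:
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        steps += 1

    return level[0], processors, steps


total, processors, steps = blocked_sum(list(range(1, 17)))
print(total, processors, steps)  # 136 4 6
```

For n = 16 this uses 4 processors and 6 steps, within the 2 log n = 8 bound, so the work p · T stays proportional to n.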

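SORTING

Sort n numbers in parallel with n processors. Initially each processor has one element.

[Figure: merge tree over the inputs a1, a2, a3, a4, ..., an. The first level performs n/2 merges producing lists of size 2, the next level n/4 merges producing lists of size 4, and so on up to 1 merge producing the list of size n.]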

The last step takes n units of time, and the steps together take
n + n/2 + n/4 + ... + 2 ≤ 2n
units of time. So the algorithm takes O(n) time.
W(n) = n · O(n) = O(n^2)
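The geometric series in the time bound can be observed by simulating the merge tree. This is a sketch under assumptions: `merge` and `merge_tree_sort` are hypothetical names, each level's merges are taken to run in parallel, and a level is charged the length of its largest merged list (a sequential merge by one processor per pair).

```python
def merge(left, right):
    """Standard sequential merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_tree_sort(values):
    """Sort by levels of pairwise merges; returns (sorted_list, time_units)."""
    runs = [[v] for v in values]  # initially one element per processor
    time_units = 0
    while len(runs) > 1:
        merged = [merge(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        # All merges of a level run in parallel, so the level costs as much
        # as its longest merged list.
        time_units += max(len(m) for m in merged)
        if len(runs) % 2 == 1:
            merged.append(runs[-1])  # an unpaired run moves up unchanged
        runs = merged
    return (runs[0] if runs else []), time_units


print(merge_tree_sort([5, 2, 7, 1, 8, 3, 6, 4]))
# ([1, 2, 3, 4, 5, 6, 7, 8], 14)
```

For n = 8 the levels cost 2 + 4 + 8 = 14 ≤ 2n time units, matching the bound, while the last merge alone already costs n.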

Thanks to: Surbhi Tripathi (27, MCS '09)

Definition: Prefix Sums

Given: a set of n values A = {a0, a1, ..., a(n-1)}.
We want to find the prefix sums S0, S1, ..., S(n-1), where
S0 = a0
S1 = a0 + a1
...
S(n-1) = a0 + a1 + ... + a(n-1)
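As a sequential reference point for the definition, the prefix sums of a small list can be computed with the standard library. The sample values are made up for illustration; `itertools.accumulate` yields the running totals S0, S1, ..., S(n-1) in order.

```python
from itertools import accumulate

a = [3, 1, 4, 1, 5]           # illustrative values a0 .. a4
prefix = list(accumulate(a))  # S0, S1, ..., S4
print(prefix)                 # [3, 4, 8, 9, 14]
```

This takes n - 1 additions one after another, which is the sequential baseline a parallel version is compared against.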

STEP II

Input: a0 a1 a2 a3 a4 a5 a6 a7

Before this step: P1: s1, P2: a2 o a3, P3: a4 o a5, P4: a6 o a7

Computed in this step:
p1: s2 = s1 o a2
p2: s3 = s1 o a2 o a3
p3: a4 o a5 o a6
p4: a4 o a5 o a6 o a7

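STEP III

Input: a0 a1 a2 a3 a4 a5 a6 a7

Before this step:
P1: s1, P2: a2 o a3, P3: a4 o a5, P4: a6 o a7
p1: s2 = s1 o a2
p2: s3 = s1 o a2 o a3
p3: a4 o a5 o a6
p4: a4 o a5 o a6 o a7

Computed in this step:
p1: s4 = s3 o a4
p2: s5 = s3 o a4 o a5
p3: s6 = s3 o a4 o a5 o a6
p4: s7 = s3 o a4 o a5 o a6 o a7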

CREW Model

Computations of prefix sums do not require any concurrent writes.

TIME COMPLEXITY

To compute the prefix sums of n numbers: the number of prefix sums computed doubles at each step, so computing n prefix sums gives a tree of height log n.
Each step takes constant time.
So computing n prefix sums using n processors in parallel takes O(log n) time.
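The doubling pattern can be simulated sequentially to count the steps. This is a sketch of one way to read the slides' scheme, not a definitive implementation: the name `prefix_sums_doubling` and the in-place layout are assumptions. In each step, blocks of finished prefixes are merged in pairs: every position in the right block combines the left block's last prefix with its own partial result, so many processors read the same value but each writes only its own cell (CREW).

```python
def prefix_sums_doubling(a, op=lambda x, y: x + y):
    """Prefix sums by block doubling; returns (prefix_list, steps)."""
    n = len(a)
    s = list(a)
    width = 1  # size of the blocks whose prefixes are already complete
    steps = 0
    while width < n:
        new = list(s)
        for start in range(0, n, 2 * width):
            mid = start + width
            if mid >= n:
                continue  # no right block to merge with
            left_total = s[mid - 1]  # read concurrently by the right block
            for j in range(mid, min(start + 2 * width, n)):
                # Conceptually one processor per j, all in the same step;
                # each writes only new[j], so writes are exclusive.
                new[j] = op(left_total, s[j])
        s = new
        width *= 2
        steps += 1
    return s, steps


print(prefix_sums_doubling([1, 2, 3, 4, 5, 6, 7, 8]))
# ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```

For eight inputs the completed prefixes grow from 2 to 4 to 8 over three steps, which is the log n height described above.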