Transcript of LECTURE 2

Page 1: LECTURE 2

LECTURE 2

• Recurrences (Review)

• Matrix Multiplication

• Merge Sort

• Tableau Construction

• Conclusion

Page 2: LECTURE 2

Algorithmic Complexity Measures

TP = execution time on P processors.
T1 = work.
T∞ = span.*

LOWER BOUNDS
• TP ≥ T1/P
• TP ≥ T∞

*Also called critical-path length or computational depth.

Page 3: LECTURE 2

Speedup

Definition: T1/TP = speedup on P processors.

If T1/TP = Θ(P) ≤ P, we have linear speedup;
if T1/TP = P, we have perfect linear speedup;
if T1/TP > P, we have superlinear speedup,
which is not possible in our model, because of the lower bound TP ≥ T1/P.

Page 4: LECTURE 2

Parallelism

Because we have the lower bound TP ≥ T∞, the maximum possible speedup given T1 and T∞ is

T1/T∞ = parallelism
      = the average amount of work per step along the span.

Page 5: LECTURE 2

The Master Method

The Master Method for solving recurrences applies to recurrences of the form*

T(n) = a T(n/b) + f(n),

where a ≥ 1, b > 1, and f is asymptotically positive.

IDEA: Compare n^(log_b a) with f(n).

*The unstated base case is T(n) = Θ(1) for sufficiently small n.

Page 6: LECTURE 2

Master Method — CASE 1

T(n) = a T(n/b) + f(n)

n^(log_b a) ≫ f(n)

Specifically, f(n) = O(n^(log_b a – ε)) for some constant ε > 0.

Solution: T(n) = Θ(n^(log_b a)).

Page 7: LECTURE 2

Master Method — CASE 2

T(n) = a T(n/b) + f(n)

n^(log_b a) ≈ f(n)

Specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0.

Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).

Page 8: LECTURE 2

Master Method — CASE 3

T(n) = a T(n/b) + f(n)

n^(log_b a) ≪ f(n)

Specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1.

Solution: T(n) = Θ(f(n)).

Page 9: LECTURE 2

Master Method Summary

T(n) = a T(n/b) + f(n)

CASE 1: f(n) = O(n^(log_b a – ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)).
CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n).
CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).

Page 10: LECTURE 2

Master Method Quiz

• T(n) = 4 T(n/2) + n
  n^(log_b a) = n^2 ≫ n ⇒ CASE 1: T(n) = Θ(n^2).

• T(n) = 4 T(n/2) + n^2
  n^(log_b a) = n^2 = n^2 lg^0 n ⇒ CASE 2: T(n) = Θ(n^2 lg n).

• T(n) = 4 T(n/2) + n^3
  n^(log_b a) = n^2 ≪ n^3 ⇒ CASE 3: T(n) = Θ(n^3).

• T(n) = 4 T(n/2) + n^2/lg n
  Master Method does not apply!
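To see CASE 2 concretely, the recurrence T(n) = 4 T(n/2) + n^2 can be unrolled numerically and compared with the predicted n^2 lg n. The following small C program (an illustrative check, not part of the lecture) does this for n = 2^k; the ratio settles toward a constant, as CASE 2 predicts.

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Unroll T(n) = 4 T(n/2) + n^2 with T(1) = 1 for n = 2^k and compare
     * with the CASE 2 prediction Theta(n^2 lg n): the ratio T(n)/(n^2 lg n)
     * approaches a constant (here it tends to 1). */
    double T = 1.0;                        /* T(1) */
    for (int k = 1; k <= 30; k++) {
        double n = ldexp(1.0, k);          /* n = 2^k */
        T = 4.0 * T + n * n;               /* T(n) = 4 T(n/2) + n^2 */
        if (k % 5 == 0)
            printf("n = 2^%-2d   T(n)/(n^2 lg n) = %.4f\n", k, T / (n * n * k));
    }
    return 0;
}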

Page 11: LECTURE 2

LECTURE 2

• Recurrences (Review)

• Matrix Multiplication

• Merge Sort

• Tableau Construction

• Conclusion

Page 12: LECTURE 2

Square-Matrix Multiplication

C = A × B, where C = [cij], A = [aij], and B = [bij] are n × n matrices, with

cij = Σ (k = 1 to n) aik bkj .

Assume for simplicity that n = 2^k.
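As a point of reference, the definition above translates directly into the familiar triply nested loop. This serial C sketch (illustrative, not the lecture's Cilk code) assumes row-major n × n arrays:

/* Serial reference for C = A * B on n x n row-major matrices:
 * c[i][j] = sum over k of a[i][k] * b[k][j].  Theta(n^3) work. */
void matmul_serial(float *C, const float *A, const float *B, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}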

Page 13: LECTURE 2

Recursive Matrix Multiplication

Divide and conquer —

[C11 C12]   [A11 A12]   [B11 B12]
[C21 C22] = [A21 A22] × [B21 B22]

            [A11B11  A11B12]   [A12B21  A12B22]
          = [A21B11  A21B12] + [A22B21  A22B22]

8 multiplications of (n/2) × (n/2) matrices.
1 addition of n × n matrices.

Page 14: LECTURE 2

Matrix Multiply in Pseudo-Cilk

C = A · B

cilk void Mult(*C, *A, *B, n) {
  float *T = Cilk_alloca(n*n*sizeof(float));
  ⟨base case & partition matrices⟩
  spawn Mult(C11,A11,B11,n/2);
  spawn Mult(C12,A11,B12,n/2);
  spawn Mult(C22,A21,B12,n/2);
  spawn Mult(C21,A21,B11,n/2);
  spawn Mult(T11,A12,B21,n/2);
  spawn Mult(T12,A12,B22,n/2);
  spawn Mult(T22,A22,B22,n/2);
  spawn Mult(T21,A22,B21,n/2);
  sync;
  spawn Add(C,T,n);
  sync;
  return;
}

Note the absence of type declarations.

Page 15: LECTURE 2

Matrix Multiply in Pseudo-Cilk

C = A · B  (same Mult code as on Page 14.)

Coarsen base cases for efficiency.

Page 16: LECTURE 2

Matrix Multiply in Pseudo-Cilk

C = A · B  (same Mult code as on Page 14.)

Submatrices are produced by pointer calculation, not copying of elements. Also, a row-size argument is needed for array indexing.

Page 17: LECTURE 2

Matrix Multiply in Pseudo-Cilk

C = A · B  (same Mult code as on Page 14, which calls Add.)

C = C + T:

cilk void Add(*C, *T, n) {
  ⟨base case & partition matrices⟩
  spawn Add(C11,T11,n/2);
  spawn Add(C12,T12,n/2);
  spawn Add(C21,T21,n/2);
  spawn Add(C22,T22,n/2);
  sync;
  return;
}

Page 18: LECTURE 2

Work of Matrix Addition

(Add code as on Page 17.)

Work: A1(n) = 4 A1(n/2) + Θ(1)
            = Θ(n^2) — CASE 1

n^(log_b a) = n^(log_2 4) = n^2 ≫ Θ(1).

Page 19: LECTURE 2

Span of Matrix Addition

(Add code as on Page 17; the four spawned subproblems run in parallel, so the span takes their maximum.)

Span: A∞(n) = A∞(n/2) + Θ(1)
            = Θ(lg n) — CASE 2

n^(log_b a) = n^(log_2 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg^0 n).

Page 20: LECTURE 2

Work of Matrix Multiplication

(Mult code as on Page 14, with its 8 spawned multiplications followed by one Add.)

Work: M1(n) = 8 M1(n/2) + A1(n) + Θ(1)
            = 8 M1(n/2) + Θ(n^2)
            = Θ(n^3) — CASE 1

n^(log_b a) = n^(log_2 8) = n^3 ≫ Θ(n^2).

Page 21: LECTURE 2

Span of Matrix Multiplication

(Mult code as on Page 14; the 8 spawned multiplications run in parallel, followed by the Add.)

Span: M∞(n) = M∞(n/2) + A∞(n) + Θ(1)
            = M∞(n/2) + Θ(lg n)
            = Θ(lg^2 n) — CASE 2

n^(log_b a) = n^(log_2 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg^1 n).

Page 22: LECTURE 2

Parallelism of Matrix Multiply

Work: M1(n) = Θ(n^3)
Span: M∞(n) = Θ(lg^2 n)

Parallelism: M1(n)/M∞(n) = Θ(n^3/lg^2 n)

For 1000 × 1000 matrices, parallelism ≈ (10^3)^3/10^2 = 10^7.

Page 23: LECTURE 2

Stack Temporaries

In Mult (Page 14), the line

float *T = Cilk_alloca(n*n*sizeof(float));

allocates a temporary matrix on the stack. In hierarchical-memory machines (especially chip multiprocessors), memory accesses are so expensive that minimizing storage often yields higher performance.

IDEA: Trade off parallelism for less storage.

Page 24: LECTURE 2

No-Temp Matrix Multiplication

cilk void MultA(*C, *A, *B, n) {
  // C = C + A * B
  ⟨base case & partition matrices⟩
  spawn MultA(C11,A11,B11,n/2);
  spawn MultA(C12,A11,B12,n/2);
  spawn MultA(C22,A21,B12,n/2);
  spawn MultA(C21,A21,B11,n/2);
  sync;
  spawn MultA(C21,A22,B21,n/2);
  spawn MultA(C22,A22,B22,n/2);
  spawn MultA(C12,A12,B22,n/2);
  spawn MultA(C11,A12,B21,n/2);
  sync;
  return;
}

Saves space, but at what expense?

Page 25: LECTURE 2

Work of No-Temp Multiply

(MultA code as on Page 24.)

Work: M1(n) = 8 M1(n/2) + Θ(1)
            = Θ(n^3) — CASE 1

Page 26: LECTURE 2

Span of No-Temp Multiply

(MultA code as on Page 24; within each of the two sync phases the four spawns run in parallel, so each phase contributes the maximum span of its spawns.)

Span: M∞(n) = 2 M∞(n/2) + Θ(1)
            = Θ(n) — CASE 1

Page 27: LECTURE 2

Parallelism of No-Temp Multiply

Work: M1(n) = Θ(n^3)
Span: M∞(n) = Θ(n)

Parallelism: M1(n)/M∞(n) = Θ(n^2)

For 1000 × 1000 matrices, parallelism ≈ (10^3)^3/10^3 = 10^6.

Faster in practice!

Page 28: LECTURE 2

Testing Synchronization

Cilk language feature: A programmer can check whether a Cilk procedure is "synched" (without actually performing a sync) by testing the pseudovariable SYNCHED:

• SYNCHED = 0 ⇒ some spawned children might not have returned.
• SYNCHED = 1 ⇒ all spawned children have definitely returned.

Page 29: LECTURE 2

Best of Both Worlds

cilk void Mult1(*C, *A, *B, n) { // multiply & store
  ⟨base case & partition matrices⟩
  spawn Mult1(C11,A11,B11,n/2);   // multiply & store
  spawn Mult1(C12,A11,B12,n/2);
  spawn Mult1(C22,A21,B12,n/2);
  spawn Mult1(C21,A21,B11,n/2);
  if (SYNCHED) {
    spawn MultA1(C11,A12,B21,n/2);  // multiply & add
    spawn MultA1(C12,A12,B22,n/2);
    spawn MultA1(C22,A22,B22,n/2);
    spawn MultA1(C21,A22,B21,n/2);
  } else {
    float *T = Cilk_alloca(n*n*sizeof(float));
    spawn Mult1(T11,A12,B21,n/2);   // multiply & store
    spawn Mult1(T12,A12,B22,n/2);
    spawn Mult1(T22,A22,B22,n/2);
    spawn Mult1(T21,A22,B21,n/2);
    sync;
    spawn Add(C,T,n);               // C = C + T
  }
  sync;
  return;
}

This code is just as parallel as the original, but it only uses more space if runtime parallelism actually exists.

Page 30: LECTURE 2

Ordinary Matrix Multiplication

cij = Σ (k = 1 to n) aik bkj

IDEA: Spawn n^2 inner products in parallel. Compute each inner product in parallel.

Work: Θ(n^3)
Span: Θ(lg n)
Parallelism: Θ(n^3/lg n)

BUT, this algorithm exhibits poor locality and does not exploit the cache hierarchy of modern microprocessors, especially CMPs.
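Computing each inner product in parallel with Θ(lg n) span can be done by summing it with divide and conquer. A serial C sketch of that reduction (illustrative; in Cilk the two recursive halves would be spawned):

/* Divide-and-conquer inner product of row i of A with column j of B,
 * summed over k in [k_lo, k_hi).  The two halves are independent, so in
 * Cilk they could be spawned, giving Theta(lg n) span per entry. */
float dot_dc(const float *A, const float *B, int i, int j, int n,
             int k_lo, int k_hi) {
    if (k_hi - k_lo == 1)
        return A[i*n + k_lo] * B[k_lo*n + j];
    int mid = k_lo + (k_hi - k_lo) / 2;
    return dot_dc(A, B, i, j, n, k_lo, mid)
         + dot_dc(A, B, i, j, n, mid, k_hi);
}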

Page 31: LECTURE 2

LECTURE 2

• Recurrences (Review)

• Matrix Multiplication

• Merge Sort

• Tableau Construction

• Conclusion

Page 32: LECTURE 2

Merging Two Sorted Arrays

Figure: the sorted arrays 3 12 19 46 and 4 14 21 23 being merged element by element.

void Merge(int *C, int *A, int *B, int na, int nb) {
  while (na>0 && nb>0) {
    if (*A <= *B) {
      *C++ = *A++; na--;
    } else {
      *C++ = *B++; nb--;
    }
  }
  while (na>0) {
    *C++ = *A++; na--;
  }
  while (nb>0) {
    *C++ = *B++; nb--;
  }
}

Time to merge n elements = Θ(n).
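An illustrative driver (not from the slides) that runs Merge on the two arrays pictured above:

#include <stdio.h>

/* Assumes the Merge routine above is defined earlier in the same file. */
int main(void) {
    int A[] = {3, 12, 19, 46};
    int B[] = {4, 14, 21, 23};
    int C[8];
    Merge(C, A, B, 4, 4);
    for (int i = 0; i < 8; i++)
        printf("%d ", C[i]);        /* prints: 3 4 12 14 19 21 23 46 */
    printf("\n");
    return 0;
}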

Page 33: LECTURE 2

Merge Sort

cilk void MergeSort(int *B, int *A, int n) {
  if (n==1) {
    B[0] = A[0];
  } else {
    int *C;
    C = (int*) Cilk_alloca(n*sizeof(int));
    spawn MergeSort(C, A, n/2);
    spawn MergeSort(C+n/2, A+n/2, n-n/2);
    sync;
    Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

Figure: an 8-element example (values 3, 4, 12, 14, 19, 21, 33, 46) sorted by successive merges of sorted runs.

Page 34: LECTURE 2

Work of Merge Sort

(MergeSort code as on Page 33.)

Work: T1(n) = 2 T1(n/2) + Θ(n)
            = Θ(n lg n) — CASE 2

n^(log_b a) = n^(log_2 2) = n ⇒ f(n) = Θ(n^(log_b a) lg^0 n).

Page 35: LECTURE 2

Span of Merge Sort

(MergeSort code as on Page 33; the serial Merge dominates the span.)

Span: T∞(n) = T∞(n/2) + Θ(n)
            = Θ(n) — CASE 3

n^(log_b a) = n^(log_2 1) = 1 ≪ Θ(n).

Page 36: LECTURE 2

Parallelism of Merge Sort

Work: T1(n) = Θ(n lg n)
Span: T∞(n) = Θ(n)

Parallelism: T1(n)/T∞(n) = Θ(lg n)

We need to parallelize the merge!

Page 37: LECTURE 2

Parallel Merge

Figure: array A (indices 0 to na) and array B (indices 0 to nb), with na ≥ nb. A is split at its median element A[na/2]. A binary search in B finds the boundary j, j+1 between the elements ≤ A[na/2] and those ≥ A[na/2]. The "≤ A[na/2]" pieces of A and B are combined by one recursive merge, and the "≥ A[na/2]" pieces by another.

KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4) n. (It contains at most half of A plus at most all of B, i.e. at most na/2 + nb = (n + nb)/2 ≤ 3n/4, since na ≥ nb implies nb ≤ n/2.)

Page 38: LECTURE 2

Parallel Merge

cilk void P_Merge(int *C, int *A, int *B, int na, int nb) {
  if (na < nb) {
    spawn P_Merge(C, B, A, nb, na);
  } else if (na==1) {
    if (nb == 0) {
      C[0] = A[0];
    } else {
      C[0] = (A[0]<B[0]) ? A[0] : B[0];  /* minimum */
      C[1] = (A[0]<B[0]) ? B[0] : A[0];  /* maximum */
    }
  } else {
    int ma = na/2;
    int mb = BinarySearch(A[ma], B, nb);
    spawn P_Merge(C, A, B, ma, mb);
    spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb);
    sync;
  }
}

Coarsen base cases for efficiency.
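The BinarySearch routine called by P_Merge is not shown on the slide. One plausible contract, assumed here for illustration, is that it returns the number of elements of the sorted array B that are ≤ the key, i.e. the split point mb with B[0..mb-1] ≤ key ≤ B[mb..nb-1]:

/* Illustrative sketch of the BinarySearch helper assumed above: return how
 * many elements of the sorted array B[0..nb-1] are <= key.  O(lg nb) time,
 * which is the Theta(lg n) term in the P_Merge recurrences. */
int BinarySearch(int key, const int *B, int nb) {
    int lo = 0, hi = nb;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (B[mid] <= key)
            lo = mid + 1;        /* B[mid] goes to the left recursive merge */
        else
            hi = mid;
    }
    return lo;
}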

Page 39: LECTURE 2

Span of P_Merge

(P_Merge code as on Page 38; the worst-case recursive subproblem has at most 3n/4 elements.)

Span: T∞(n) = T∞(3n/4) + Θ(lg n)
            = Θ(lg^2 n) — CASE 2

n^(log_b a) = n^(log_{4/3} 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg^1 n).

Page 40: LECTURE 2

Work of P_Merge

(P_Merge code as on Page 38.)

Work: T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.

CLAIM: T1(n) = Θ(n).

Page 41: LECTURE 2

Analysis of Work Recurrence

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.

Substitution method: Inductive hypothesis is T1(k) ≤ c1 k – c2 lg k, where c1, c2 > 0. Prove that the relation holds, and solve for c1 and c2.

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n)
      ≤ c1(αn) – c2 lg(αn)
        + c1((1–α)n) – c2 lg((1–α)n) + Θ(lg n)

Page 42: LECTURE 2

Analysis of Work Recurrence

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n)
      ≤ c1(αn) – c2 lg(αn)
        + c1((1–α)n) – c2 lg((1–α)n) + Θ(lg n)

Page 43: LECTURE 2

Analysis of Work Recurrence

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.

T1(n) = T1(αn) + T1((1–α)n) + Θ(lg n)
      ≤ c1(αn) – c2 lg(αn)
        + c1((1–α)n) – c2 lg((1–α)n) + Θ(lg n)
      ≤ c1 n – c2 lg(αn) – c2 lg((1–α)n) + Θ(lg n)
      ≤ c1 n – c2 (lg(α(1–α)) + 2 lg n) + Θ(lg n)
      ≤ c1 n – c2 lg n
        – (c2 (lg n + lg(α(1–α))) – Θ(lg n))
      ≤ c1 n – c2 lg n

by choosing c1 and c2 large enough. This establishes T1(n) = O(n), and hence, with the obvious Ω(n) lower bound, the claim T1(n) = Θ(n).

Page 44: LECTURE 2

Parallelism of P_Merge

Work: T1(n) = Θ(n)
Span: T∞(n) = Θ(lg^2 n)

Parallelism: T1(n)/T∞(n) = Θ(n/lg^2 n)

Page 45: LECTURE 2

Parallel Merge Sort

cilk void P_MergeSort(int *B, int *A, int n) {
  if (n==1) {
    B[0] = A[0];
  } else {
    int *C;
    C = (int*) Cilk_alloca(n*sizeof(int));
    spawn P_MergeSort(C, A, n/2);
    spawn P_MergeSort(C+n/2, A+n/2, n-n/2);
    sync;
    spawn P_Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

Page 46: LECTURE 2

Work of Parallel Merge Sort

(P_MergeSort code as on Page 45.)

Work: T1(n) = 2 T1(n/2) + Θ(n)
            = Θ(n lg n) — CASE 2

Page 47: LECTURE 2

Span of Parallel Merge Sort

(P_MergeSort code as on Page 45.)

Span: T∞(n) = T∞(n/2) + Θ(lg^2 n)
            = Θ(lg^3 n) — CASE 2

n^(log_b a) = n^(log_2 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg^2 n).

Page 48: LECTURE 2

Parallelism of Merge Sort

Work: T1(n) = Θ(n lg n)
Span: T∞(n) = Θ(lg^3 n)

Parallelism: T1(n)/T∞(n) = Θ(n/lg^2 n)

Page 49: LECTURE 2

LECTURE 2

• Recurrences (Review)

• Matrix Multiplication

• Merge Sort

• Tableau Construction

• Conclusion

Page 50: LECTURE 2

Tableau Construction

Problem: Fill in an n × n tableau A, where

A[i, j] = f( A[i, j–1], A[i–1, j], A[i–1, j–1] ).

Dynamic programming:
• Longest common subsequence
• Edit distance
• Time warping

Figure: an 8 × 8 tableau of cells (i, j), each depending on its left, upper, and upper-left neighbors.

Work: Θ(n^2).
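A serial reference for the fill order (an illustrative sketch; the update function f and the row-0/column-0 boundary values are problem-specific and assumed to be handled by the caller):

/* Serial tableau construction: fill row by row, so each A[i][j] sees its
 * already-computed left, upper, and upper-left neighbors.  Row 0 and
 * column 0 are assumed to be initialized by the caller; f is the
 * problem-specific update (e.g., the LCS recurrence). */
void fill_tableau(int n, int A[n][n], int (*f)(int left, int up, int upleft)) {
    for (int i = 1; i < n; i++)
        for (int j = 1; j < n; j++)
            A[i][j] = f(A[i][j-1], A[i-1][j], A[i-1][j-1]);
}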

Page 51: LECTURE 2

Recursive Construction

Divide the n × n tableau into four (n/2) × (n/2) quadrants:

I   II
III IV

Cilk code:

spawn I;
sync;
spawn II;
spawn III;
sync;
spawn IV;
sync;
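Spelled out, the fragment above abbreviates a recursive routine over subsquares of the tableau. A serial C sketch of that structure (illustrative; the quadrant addressing, base case, and update f are assumptions — in Cilk, quadrants II and III would be spawned in parallel):

/* Recursive tableau construction over the m x m subsquare whose top-left
 * corner is (i0, j0).  Serial order I, II, III, IV respects the
 * dependencies; in Cilk, the calls marked II and III are independent and
 * would be spawned.  Boundary row 0 / column 0 assumed preinitialized. */
void tableau(int n, int A[n][n], int i0, int j0, int m,
             int (*f)(int left, int up, int upleft)) {
    if (m == 1) {
        if (i0 > 0 && j0 > 0)
            A[i0][j0] = f(A[i0][j0-1], A[i0-1][j0], A[i0-1][j0-1]);
        return;
    }
    int h = m / 2;
    tableau(n, A, i0,     j0,     h, f);   /* I   */
    tableau(n, A, i0,     j0 + h, h, f);   /* II  -- spawn in Cilk */
    tableau(n, A, i0 + h, j0,     h, f);   /* III -- spawn in Cilk */
    tableau(n, A, i0 + h, j0 + h, h, f);   /* IV  */
}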

Page 52: LECTURE 2

Recursive Construction

(Quadrant decomposition and Cilk code as on Page 51.)

Work: T1(n) = 4 T1(n/2) + Θ(1)
            = Θ(n^2) — CASE 1

Page 53: LECTURE 2

Recursive Construction

(Quadrant decomposition and Cilk code as on Page 51; quadrants II and III run in parallel, so the critical path passes through 3 of the 4 quadrants.)

Span: T∞(n) = 3 T∞(n/2) + Θ(1)
            = Θ(n^(lg 3)) — CASE 1

Page 54: LECTURE 2

Analysis of Tableau Construction

Work: T1(n) = Θ(n^2)
Span: T∞(n) = Θ(n^(lg 3)) ≈ Θ(n^1.58)

Parallelism: T1(n)/T∞(n) ≈ Θ(n^0.42)

Page 55: LECTURE 2

A More-Parallel Construction

Divide the n × n tableau into nine (n/3) × (n/3) subsquares, numbered along the antidiagonals:

I    II   IV
III  V    VII
VI   VIII IX

Cilk code:

spawn I;
sync;
spawn II;
spawn III;
sync;
spawn IV;
spawn V;
spawn VI;
sync;
spawn VII;
spawn VIII;
sync;
spawn IX;
sync;

Page 56: LECTURE 2

A More-Parallel Construction

(Subsquare decomposition and Cilk code as on Page 55.)

Work: T1(n) = 9 T1(n/3) + Θ(1)
            = Θ(n^2) — CASE 1

Page 57: LECTURE 2

A More-Parallel Construction

(Subsquare decomposition and Cilk code as on Page 55; the critical path passes through the 5 antidiagonal phases.)

Span: T∞(n) = 5 T∞(n/3) + Θ(1)
            = Θ(n^(log_3 5)) — CASE 1

Page 58: LECTURE 2

Analysis of Revised Construction

Work: T1(n) = Θ(n^2)
Span: T∞(n) = Θ(n^(log_3 5)) ≈ Θ(n^1.46)

Parallelism: T1(n)/T∞(n) ≈ Θ(n^0.54)

More parallel by a factor of Θ(n^0.54)/Θ(n^0.42) = Θ(n^0.12).

Page 59: LECTURE 2

LECTURE 2

• Recurrences (Review)

• Matrix Multiplication

• Merge Sort

• Tableau Construction

• Conclusion

Page 60: LECTURE 2

Key Ideas

• Cilk is simple: cilk, spawn, sync, SYNCHED
• Recurrences, recurrences, recurrences, …
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span
• Work & span

Page 61: LECTURE 2

Palindrome

• Propose a Cilk palindrome solver.
• What is the key idea?
• What is the algorithm?
• What is the span?
• What is the work?
• What is the parallelism?
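One possible answer sketch (an assumption, not the lecture's intended solution): check s[i] == s[n–1–i] for all i in the first half of the string by divide and conquer over the index range. The two halves of the range are independent, so in Cilk they could be spawned, giving Θ(n) work, Θ(lg n) span, and Θ(n/lg n) parallelism.

#include <string.h>

/* Divide-and-conquer palindrome check over index range [lo, hi) of the
 * first half of s (n = strlen(s)).  The two recursive calls are
 * independent, so in Cilk they could be spawned.  Illustrative sketch. */
static int pal_range(const char *s, int n, int lo, int hi) {
    if (hi - lo == 0) return 1;
    if (hi - lo == 1) return s[lo] == s[n - 1 - lo];
    int mid = lo + (hi - lo) / 2;
    return pal_range(s, n, lo, mid) && pal_range(s, n, mid, hi);
}

int is_palindrome(const char *s) {
    int n = (int)strlen(s);
    return pal_range(s, n, 0, n / 2);
}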