LECTURE 2
• Recurrences (Review)
• Matrix Multiplication
• Merge Sort
• Tableau Construction
• Conclusion

The Master Method
The Master Method for solving recurrences applies to recurrences of the form
$T(n) = a\,T(n/b) + f(n)$, where $a \ge 1$, $b > 1$, and $f$ is asymptotically positive.
IDEA: Compare $n^{\log_b a}$ with $f(n)$.
Note: The unstated base case is $T(n) = \Theta(1)$ for sufficiently small $n$.

Master Method: CASE 1
$T(n) = a\,T(n/b) + f(n)$
$n^{\log_b a} \gg f(n)$
Specifically, $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$.
Solution: $T(n) = \Theta(n^{\log_b a})$.

Master Method: CASE 2
$T(n) = a\,T(n/b) + f(n)$
$n^{\log_b a} \approx f(n)$
Specifically, $f(n) = \Theta(n^{\log_b a} \lg^k n)$ for some constant $k \ge 0$.
Solution: $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$.

Master Method: CASE 3
$T(n) = a\,T(n/b) + f(n)$
$n^{\log_b a} \ll f(n)$
Specifically, $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$, and $f(n)$ satisfies the regularity condition that $a\,f(n/b) \le c\,f(n)$ for some constant $c < 1$.
Solution: $T(n) = \Theta(f(n))$.

Master Method Summary
$T(n) = a\,T(n/b) + f(n)$
CASE 1: $f(n) = O(n^{\log_b a - \varepsilon})$, constant $\varepsilon > 0$ $\Rightarrow$ $T(n) = \Theta(n^{\log_b a})$.
CASE 2: $f(n) = \Theta(n^{\log_b a} \lg^k n)$, constant $k \ge 0$ $\Rightarrow$ $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$.
CASE 3: $f(n) = \Omega(n^{\log_b a + \varepsilon})$, constant $\varepsilon > 0$, and regularity condition $\Rightarrow$ $T(n) = \Theta(f(n))$.

Master Method Quiz
• $T(n) = 4\,T(n/2) + n$
  $n^{\log_b a} = n^2 \gg n \Rightarrow$ CASE 1: $T(n) = \Theta(n^2)$.
• $T(n) = 4\,T(n/2) + n^2$
  $n^{\log_b a} = n^2 = n^2 \lg^0 n \Rightarrow$ CASE 2: $T(n) = \Theta(n^2 \lg n)$.
• $T(n) = 4\,T(n/2) + n^3$
  $n^{\log_b a} = n^2 \ll n^3 \Rightarrow$ CASE 3: $T(n) = \Theta(n^3)$.
• $T(n) = 4\,T(n/2) + n^2/\lg n$
  Master method does not apply! Here $f(n)$ is smaller than $n^2$ only by a $\lg n$ factor, not by a polynomial factor $n^{\varepsilon}$, so CASE 1 fails, and CASE 2 would need $k = -1 < 0$.
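
The quiz answers can be checked mechanically when the driving function has the form $f(n) = n^d \lg^k n$ with integer $d$. The following is a minimal sketch under that assumption, not part of the lecture; the name `master_case` and its parameters are hypothetical. It compares $b^d$ with $a$, which is equivalent to comparing $n^d$ with $n^{\log_b a}$, and returns 0 when none of the three cases as stated above applies.

```c
#include <assert.h>

/* Classify T(n) = a T(n/b) + n^d lg^k n using the three cases above.
   Returns 1, 2, or 3, or 0 when no case applies.
   Assumption of this sketch: d is a non-negative integer. */
int master_case(long a, long b, int d, int k)
{
    assert(a >= 1 && b > 1 && d >= 0);

    /* Comparing n^d with n^(log_b a) is the same as comparing b^d with a.
       Stop multiplying once b^d already exceeds a, to avoid overflow. */
    long b_pow_d = 1;
    for (int i = 0; i < d && b_pow_d <= a; i++) {
        b_pow_d *= b;
    }

    if (b_pow_d < a) {
        return 1;               /* f(n) is polynomially smaller */
    }
    if (b_pow_d > a) {
        /* f(n) is polynomially larger; for this family of f the
           regularity condition holds for sufficiently large n. */
        return 3;
    }
    return (k >= 0) ? 2 : 0;    /* equal exponents: CASE 2 needs k >= 0 */
}
```

For the last quiz item, `master_case(4, 2, 2, -1)` returns 0, matching "does not apply".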
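
Square-Matrix Multiplication
$$
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\times
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{bmatrix}
$$
That is, $C = A \times B$, where
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}.$$
Assume for simplicity that $n = 2^k$.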

Recursive Matrix Multiplication
Divide and conquer:
$$
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\times
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
=
\begin{bmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{bmatrix}
+
\begin{bmatrix} A_{12}B_{21} & A_{12}B_{22} \\ A_{22}B_{21} & A_{22}B_{22} \end{bmatrix}
$$
• 8 multiplications of $(n/2) \times (n/2)$ matrices.
• 1 addition of $n \times n$ matrices.

Matrix Multiply in Pseudo-Cilk

    cilk void Mult(*C, *A, *B, n) {            // C = A · B
      float *T = Cilk_alloca(n*n*sizeof(float));
      ⟨base case & partition matrices⟩
      spawn Mult(C11,A11,B11,n/2);
      spawn Mult(C12,A11,B12,n/2);
      spawn Mult(C22,A21,B12,n/2);
      spawn Mult(C21,A21,B11,n/2);
      spawn Mult(T11,A12,B21,n/2);
      spawn Mult(T12,A12,B22,n/2);
      spawn Mult(T22,A22,B22,n/2);
      spawn Mult(T21,A22,B21,n/2);
      sync;
      spawn Add(C,T,n);
      sync;
      return;
    }

    cilk void Add(*C, *T, n) {                 // C = C + T
      ⟨base case & partition matrices⟩
      spawn Add(C11,T11,n/2);
      spawn Add(C12,T12,n/2);
      spawn Add(C21,T21,n/2);
      spawn Add(C22,T22,n/2);
      sync;
      return;
    }

Notes on the pseudocode:
• Type declarations are omitted.
• Coarsen base cases for efficiency.
• Submatrices are produced by pointer calculation, not copying of elements.
• Also need a row-size argument for array indexing.
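
To make the pointer calculation and the row-size argument concrete, here is a serial C sketch of the same recursion. It is an illustration under stated assumptions rather than the lecture's code: the names `mult` and `add`, the row-size parameters `rsC`, `rsA`, `rsB`, and the use of `malloc` in place of `Cilk_alloca` are all choices made for this sketch, the base case is left uncoarsened at $n = 1$, and `add` uses plain loops instead of recursion. The `spawn` and `sync` keywords are dropped, so the calls simply run in order.

```c
#include <assert.h>
#include <stdlib.h>

/* C = C + T for n-by-n blocks; rsC and rsT are row sizes (elements per row). */
static void add(float *C, int rsC, const float *T, int rsT, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            C[i * rsC + j] += T[i * rsT + j];
}

/* C = A * B for n-by-n blocks stored row-major, n a power of 2. */
void mult(float *C, int rsC, const float *A, int rsA,
          const float *B, int rsB, int n)
{
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {                       /* base case, not coarsened */
        C[0] = A[0] * B[0];
        return;
    }
    int h = n / 2;

    /* Partition by pointer calculation: no elements are copied. */
    const float *A11 = A, *A12 = A + h;
    const float *A21 = A + h * rsA, *A22 = A + h * rsA + h;
    const float *B11 = B, *B12 = B + h;
    const float *B21 = B + h * rsB, *B22 = B + h * rsB + h;
    float *C11 = C, *C12 = C + h;
    float *C21 = C + h * rsC, *C22 = C + h * rsC + h;

    /* Temporary n-by-n matrix with row size n (stands in for Cilk_alloca). */
    float *T = (float *)malloc((size_t)n * n * sizeof *T);
    if (T == NULL) abort();
    float *T11 = T, *T12 = T + h;
    float *T21 = T + h * n, *T22 = T + h * n + h;

    /* The eight products; each would be spawned in Cilk. */
    mult(C11, rsC, A11, rsA, B11, rsB, h);
    mult(C12, rsC, A11, rsA, B12, rsB, h);
    mult(C22, rsC, A21, rsA, B12, rsB, h);
    mult(C21, rsC, A21, rsA, B11, rsB, h);
    mult(T11, n, A12, rsA, B21, rsB, h);
    mult(T12, n, A12, rsA, B22, rsB, h);
    mult(T22, n, A22, rsA, B22, rsB, h);
    mult(T21, n, A22, rsA, B21, rsB, h);

    /* A sync would go here in Cilk, before the addition. */
    add(C, rsC, T, n, n);
    free(T);
}
```

A top-level call on full matrices passes the matrix dimension as every row size, for example `mult(C, n, A, n, B, n, n)`.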

Work of Matrix Addition
Add spawns four recursive calls on $(n/2) \times (n/2)$ quadrants and does constant work otherwise.
Work: $A_1(n) = 4\,A_1(n/2) + \Theta(1) = \Theta(n^2)$ (CASE 1),
since $n^{\log_b a} = n^{\log_2 4} = n^2 \gg \Theta(1)$.

Span of Matrix Addition
The four spawned calls run in parallel, so the span counts only the maximum of them: one call of size $n/2$.
Span: $A_\infty(n) = A_\infty(n/2) + \Theta(1) = \Theta(\lg n)$ (CASE 2),
since $n^{\log_b a} = n^{\log_2 1} = 1 \Rightarrow f(n) = \Theta(n^{\log_b a} \lg^0 n)$.

Work of Matrix Multiplication
Mult makes 8 recursive calls of size $n/2$ and then one call to Add.
Work: $M_1(n) = 8\,M_1(n/2) + A_1(n) + \Theta(1) = 8\,M_1(n/2) + \Theta(n^2) = \Theta(n^3)$ (CASE 1),
since $n^{\log_b a} = n^{\log_2 8} = n^3 \gg \Theta(n^2)$.

Span of Matrix Multiplication
The 8 spawned calls run in parallel, so the span counts one call of size $n/2$, followed by the span of Add.
Span: $M_\infty(n) = M_\infty(n/2) + A_\infty(n) + \Theta(1) = M_\infty(n/2) + \Theta(\lg n) = \Theta(\lg^2 n)$ (CASE 2),
since $n^{\log_b a} = n^{\log_2 1} = 1 \Rightarrow f(n) = \Theta(n^{\log_b a} \lg^1 n)$.

Parallelism of Matrix Multiply
Work: $M_1(n) = \Theta(n^3)$
Span: $M_\infty(n) = \Theta(\lg^2 n)$
Parallelism: $M_1(n)/M_\infty(n) = \Theta(n^3/\lg^2 n)$
For $1000 \times 1000$ matrices, parallelism $\approx (10^3)^3/10^2 = 10^7$.

Stack Temporaries
Every call to Mult allocates a temporary matrix on the stack:

    float *T = Cilk_alloca(n*n*sizeof(float));

In hierarchical-memory machines (especially chip multiprocessors), memory accesses are so expensive that minimizing storage often yields higher performance.
IDEA: Trade off parallelism for less storage.

No-Temp Matrix Multiplication

    cilk void MultA(*C, *A, *B, n) {
      // C = C + A * B
      ⟨base case & partition matrices⟩
      spawn MultA(C11,A11,B11,n/2);
      spawn MultA(C12,A11,B12,n/2);
      spawn MultA(C22,A21,B12,n/2);
      spawn MultA(C21,A21,B11,n/2);
      sync;
      spawn MultA(C21,A22,B21,n/2);
      spawn MultA(C22,A22,B22,n/2);
      spawn MultA(C12,A12,B22,n/2);
      spawn MultA(C11,A12,B21,n/2);
      sync;
      return;
    }

Saves space, but at what expense?

Work of No-Temp Multiply
Work: $M_1(n) = 8\,M_1(n/2) + \Theta(1) = \Theta(n^3)$ (CASE 1)

Span of No-Temp Multiply
Each group of four spawned calls runs in parallel, so it contributes the maximum of its calls; the two groups are separated by a sync and run one after the other, so the span counts two calls of size $n/2$.
Span: $M_\infty(n) = 2\,M_\infty(n/2) + \Theta(1) = \Theta(n)$ (CASE 1)

Parallelism of No-Temp Multiply
Work: $M_1(n) = \Theta(n^3)$
Span: $M_\infty(n) = \Theta(n)$
Parallelism: $M_1(n)/M_\infty(n) = \Theta(n^2)$
For $1000 \times 1000$ matrices, parallelism $\approx (10^3)^3/10^3 = 10^6$.
Faster in practice!
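
The two order-of-magnitude estimates can be reproduced with a few lines of arithmetic. This is only a sketch: it drops all constant factors hidden by the $\Theta$ notation, rounds $\lg n$ up to an integer (so $\lg 1000$ becomes 10, as on the slides), and uses hypothetical function names.

```c
#include <assert.h>

/* Smallest k with 2^k >= n, that is, lg n rounded up. Requires n >= 1. */
int ceil_lg(long n)
{
    assert(n >= 1);
    int k = 0;
    long p = 1;
    while (p < n) {
        p *= 2;
        k++;
    }
    return k;
}

/* Mult with temporaries: work n^3, span lg^2 n (constants ignored). */
double parallelism_with_temp(long n)
{
    assert(n >= 2);
    double lg = (double)ceil_lg(n);
    return (double)n * n * n / (lg * lg);
}

/* No-temp MultA: work n^3, span n (constants ignored). */
double parallelism_no_temp(long n)
{
    assert(n >= 1);
    return (double)n * n * n / (double)n;
}
```

For `n = 1000` the two functions give 10^7 and 10^6, matching the figures quoted on the two parallelism slides.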

Testing Synchronization
Cilk language feature: A programmer can check whether a Cilk procedure is “synched” (without actually performing a sync) by testing the pseudovariable SYNCHED:
• SYNCHED = 0 ⇒ some spawned children might not have returned.
• SYNCHED = 1 ⇒ all spawned children have definitely returned.

Best of Both Worlds

    cilk void Mult1(*C, *A, *B, n) {   // multiply & store
      ⟨base case & partition matrices⟩
      spawn Mult1(C11,A11,B11,n/2);    // multiply & store
      spawn Mult1(C12,A11,B12,n/2);
      spawn Mult1(C22,A21,B12,n/2);
      spawn Mult1(C21,A21,B11,n/2);
      if (SYNCHED) {
        spawn MultA1(C11,A12,B21,n/2); // multiply & add
        spawn MultA1(C12,A12,B22,n/2);
        spawn MultA1(C22,A22,B22,n/2);
        spawn MultA1(C21,A22,B21,n/2);
      } else {
        float *T = Cilk_alloca(n*n*sizeof(float));
        spawn Mult1(T11,A12,B21,n/2);  // multiply & store
        spawn Mult1(T12,A12,B22,n/2);
        spawn Mult1(T22,A22,B22,n/2);
        spawn Mult1(T21,A22,B21,n/2);
        sync;
        spawn Add(C,T,n);              // C = C + T
      }
      sync;
      return;
    }

This code is just as parallel as the original, but it only uses more space if runtime parallelism actually exists.

Ordinary Matrix Multiplication
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}$$
IDEA: Spawn $n^2$ inner products in parallel. Compute each inner product in parallel.
Work: $\Theta(n^3)$
Span: $\Theta(\lg n)$
Parallelism: $\Theta(n^3/\lg n)$
BUT, this algorithm exhibits poor locality and does not exploit the cache hierarchy of modern microprocessors, especially CMPs.

Merging Two Sorted Arrays
Figure: merging the sorted arrays 3 12 19 46 and 4 14 21 23 yields 3 4 12 14 19 21 23 46.

    void Merge(int *C, int *A, int *B, int na, int nb) {
      while (na>0 && nb>0) {
        if (*A <= *B) {
          *C++ = *A++; na--;
        } else {
          *C++ = *B++; nb--;
        }
      }
      while (na>0) {
        *C++ = *A++; na--;
      }
      while (nb>0) {
        *C++ = *B++; nb--;
      }
    }

Time to merge $n$ elements = $\Theta(n)$.

Merge Sort

    cilk void MergeSort(int *B, int *A, int n) {
      if (n==1) {
        B[0] = A[0];
      } else {
        int *C;
        C = (int*) Cilk_alloca(n*sizeof(int));
        spawn MergeSort(C, A, n/2);
        spawn MergeSort(C+n/2, A+n/2, n-n/2);
        sync;
        Merge(B, C, C+n/2, n/2, n-n/2);
      }
    }

Figure: successive merges of an 8-element example, from sorted pairs 19 46 | 3 12 | 4 33 | 14 21, to sorted halves 3 12 19 46 | 4 14 21 33, to the sorted array 3 4 12 14 19 21 33 46.

Work of Merge Sort
Work: $T_1(n) = 2\,T_1(n/2) + \Theta(n) = \Theta(n \lg n)$ (CASE 2),
since $n^{\log_b a} = n^{\log_2 2} = n \Rightarrow f(n) = \Theta(n^{\log_b a} \lg^0 n)$.

Span of Merge Sort
The two recursive sorts run in parallel, but the serial Merge that follows them takes $\Theta(n)$ time.
Span: $T_\infty(n) = T_\infty(n/2) + \Theta(n) = \Theta(n)$ (CASE 3),
since $n^{\log_b a} = n^{\log_2 1} = 1 \ll \Theta(n)$.

Parallelism of Merge Sort
Work: $T_1(n) = \Theta(n \lg n)$
Span: $T_\infty(n) = \Theta(n)$
Parallelism: $T_1(n)/T_\infty(n) = \Theta(\lg n)$
We need to parallelize the merge!
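
Parallel Merge
Figure: sorted arrays A (indices 0 to na) and B (indices 0 to nb), with na ≥ nb. A is split at index na/2 into a part that is ≤ A[na/2] and a part that is ≥ A[na/2]. A binary search for A[na/2] in B finds the boundary between positions j and j+1, which splits B the same way. The two "≤" parts are merged recursively, and so are the two "≥" parts.
KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4) n. The reason: a recursive merge takes half of A plus at most all of B, and na/2 + nb = (na + nb)/2 + nb/2 ≤ n/2 + n/4, because na ≥ nb implies nb ≤ n/2.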

Parallel Merge

    cilk void P_Merge(int *C, int *A, int *B, int na, int nb) {
      if (na < nb) {
        spawn P_Merge(C, B, A, nb, na);
      } else if (na==1) {
        if (nb == 0) {
          C[0] = A[0];
        } else {
          C[0] = (A[0]<B[0]) ? A[0] : B[0]; /* minimum */
          C[1] = (A[0]<B[0]) ? B[0] : A[0]; /* maximum */
        }
      } else {
        int ma = na/2;
        int mb = BinarySearch(A[ma], B, nb);
        spawn P_Merge(C, A, B, ma, mb);
        spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb);
        sync;
      }
    }

Coarsen base cases for efficiency.
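
The slides call `BinarySearch` without defining it. The sketch below fills that gap under one assumption: that the helper returns the number of elements of B smaller than the key, which is what makes the split of B line up with the split of A. The names `binary_search` and `p_merge` are chosen for this sketch, `p_merge` is a serial version of P_Merge with `spawn` and `sync` removed, and the extra `na == 0` test (both inputs empty) is an addition that the slide code does not have.

```c
#include <assert.h>

/* Assumed contract: returns how many elements of the sorted array
   B[0..nb) are strictly less than key (a lower-bound search). */
int binary_search(int key, const int *B, int nb)
{
    int lo = 0, hi = nb;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (B[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Serial version of P_Merge: merges sorted A[0..na) and B[0..nb) into C. */
void p_merge(int *C, const int *A, const int *B, int na, int nb)
{
    assert(na >= 0 && nb >= 0);
    if (na < nb) {
        p_merge(C, B, A, nb, na);          /* keep A the larger array */
    } else if (na == 0) {
        return;                            /* both inputs are empty */
    } else if (na == 1) {
        if (nb == 0) {
            C[0] = A[0];
        } else {
            C[0] = (A[0] < B[0]) ? A[0] : B[0];   /* minimum */
            C[1] = (A[0] < B[0]) ? B[0] : A[0];   /* maximum */
        }
    } else {
        int ma = na / 2;
        int mb = binary_search(A[ma], B, nb);
        /* Everything in A[0..ma) and B[0..mb) is <= A[ma], and everything
           in A[ma..na) and B[mb..nb) is >= A[ma], so the two halves can be
           merged independently (in parallel in the Cilk version). */
        p_merge(C, A, B, ma, mb);
        p_merge(C + ma + mb, A + ma, B + mb, na - ma, nb - mb);
    }
}
```

Merging `{3, 12, 19, 46}` with `{4, 14, 21, 23}` through `p_merge` produces `{3, 4, 12, 14, 19, 21, 23, 46}`, the same result as the serial Merge.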

Span of P_Merge
The two recursive merges run in parallel, and the larger one has at most $3n/4$ elements; the binary search costs $\Theta(\lg n)$.
Span: $T_\infty(n) = T_\infty(3n/4) + \Theta(\lg n) = \Theta(\lg^2 n)$ (CASE 2),
since $n^{\log_b a} = n^{\log_{4/3} 1} = 1 \Rightarrow f(n) = \Theta(n^{\log_b a} \lg^1 n)$.
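
The same bound can be seen by unrolling the recurrence instead of applying the master method. The following is a sketch only: it replaces the $\Theta(\lg n)$ term by $c \lg n$ for a hypothetical constant $c > 0$ and ignores floors and the base case.

```latex
\begin{aligned}
T_\infty(n) &\le \sum_{i=0}^{\log_{4/3} n} c \,\lg\!\bigl((3/4)^i\, n\bigr) \\
            &\le \sum_{i=0}^{\log_{4/3} n} c \,\lg n \\
            &= c\,\bigl(\log_{4/3} n + 1\bigr)\lg n \\
            &= O(\lg^2 n).
\end{aligned}
```

The matching lower bound follows because, for the first half of the levels ($i \le \tfrac{1}{2}\log_{4/3} n$), the subproblem size is at least $\sqrt{n}$, so each of those terms is at least $\tfrac{c}{2} \lg n$.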

Work of P_Merge
Work: $T_1(n) = T_1(\alpha n) + T_1((1-\alpha)n) + \Theta(\lg n)$, where $1/4 \le \alpha \le 3/4$.
CLAIM: $T_1(n) = \Theta(n)$.

Analysis of Work Recurrence
$T_1(n) = T_1(\alpha n) + T_1((1-\alpha)n) + \Theta(\lg n)$, where $1/4 \le \alpha \le 3/4$.
Substitution method: Inductive hypothesis is $T_1(k) \le c_1 k - c_2 \lg k$, where $c_1, c_2 > 0$. Prove that the relation holds, and solve for $c_1$ and $c_2$.
$$
\begin{aligned}
T_1(n) &= T_1(\alpha n) + T_1((1-\alpha)n) + \Theta(\lg n) \\
&\le c_1(\alpha n) - c_2 \lg(\alpha n) + c_1((1-\alpha)n) - c_2 \lg((1-\alpha)n) + \Theta(\lg n) \\
&\le c_1 n - c_2 \lg(\alpha n) - c_2 \lg((1-\alpha)n) + \Theta(\lg n) \\
&\le c_1 n - c_2 \bigl(\lg(\alpha(1-\alpha)) + 2 \lg n\bigr) + \Theta(\lg n) \\
&\le c_1 n - c_2 \lg n - \bigl(c_2(\lg n + \lg(\alpha(1-\alpha))) - \Theta(\lg n)\bigr) \\
&\le c_1 n - c_2 \lg n
\end{aligned}
$$
by choosing $c_1$ and $c_2$ large enough.

Parallelism of P_Merge
Work: $T_1(n) = \Theta(n)$
Span: $T_\infty(n) = \Theta(\lg^2 n)$
Parallelism: $T_1(n)/T_\infty(n) = \Theta(n/\lg^2 n)$

Parallel Merge Sort

    cilk void P_MergeSort(int *B, int *A, int n) {
      if (n==1) {
        B[0] = A[0];
      } else {
        int *C;
        C = (int*) Cilk_alloca(n*sizeof(int));
        spawn P_MergeSort(C, A, n/2);
        spawn P_MergeSort(C+n/2, A+n/2, n-n/2);
        sync;
        spawn P_Merge(B, C, C+n/2, n/2, n-n/2);
      }
    }

Work of Parallel Merge Sort
Work: $T_1(n) = 2\,T_1(n/2) + \Theta(n) = \Theta(n \lg n)$ (CASE 2)

Span of Parallel Merge Sort
Span: $T_\infty(n) = T_\infty(n/2) + \Theta(\lg^2 n) = \Theta(\lg^3 n)$ (CASE 2),
since $n^{\log_b a} = n^{\log_2 1} = 1 \Rightarrow f(n) = \Theta(n^{\log_b a} \lg^2 n)$.

Parallelism of Merge Sort
Work: $T_1(n) = \Theta(n \lg n)$
Span: $T_\infty(n) = \Theta(\lg^3 n)$
Parallelism: $T_1(n)/T_\infty(n) = \Theta(n/\lg^2 n)$

Tableau Construction
Problem: Fill in an $n \times n$ tableau $A$, where
$A[i,j] = f(A[i,j-1],\ A[i-1,j],\ A[i-1,j-1])$.
Dynamic programming:
• Longest common subsequence
• Edit distance
• Time warping

Figure: an 8 × 8 tableau with each cell labeled by its row and column index:

    00 01 02 03 04 05 06 07
    10 11 12 13 14 15 16 17
    20 21 22 23 24 25 26 27
    30 31 32 33 34 35 36 37
    40 41 42 43 44 45 46 47
    50 51 52 53 54 55 56 57
    60 61 62 63 64 65 66 67
    70 71 72 73 74 75 76 77

Work: $\Theta(n^2)$.

Recursive Construction
Figure: the $n \times n$ tableau divided into four quadrants:

    I   II
    III IV

Cilk code:

    spawn I;
    sync;
    spawn II;
    spawn III;
    sync;
    spawn IV;
    sync;

Work: $T_1(n) = 4\,T_1(n/2) + \Theta(1) = \Theta(n^2)$ (CASE 1)
Span: $T_\infty(n) = 3\,T_\infty(n/2) + \Theta(1) = \Theta(n^{\lg 3})$ (CASE 1)

Analysis of Tableau Construction
Work: $T_1(n) = \Theta(n^2)$
Span: $T_\infty(n) = \Theta(n^{\lg 3}) \approx \Theta(n^{1.58})$
Parallelism: $T_1(n)/T_\infty(n) \approx \Theta(n^{0.42})$
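
As a concrete instance, the quadrant order I; II and III; IV can be applied to the longest-common-subsequence tableau. The sketch below is a serial illustration under stated assumptions: the names `fill`, `fill_cell`, `lcs_length`, and the limit `MAXN` are hypothetical, both strings must have the same length `n`, `n` must be a power of 2, and the recursion bottoms out at single cells rather than a coarsened base case. Comments mark where the Cilk version would place its `sync` statements.

```c
#include <assert.h>

enum { MAXN = 64 };   /* size limit chosen for this sketch */

/* Fill cell (i, j) of the LCS tableau from its left, upper, and
   diagonal neighbors. t has row size rs; row 0 and column 0 are borders. */
static void fill_cell(int *t, int rs, const char *x, const char *y,
                      int i, int j)
{
    int left = t[i * rs + (j - 1)];
    int up   = t[(i - 1) * rs + j];
    int diag = t[(i - 1) * rs + (j - 1)];
    if (x[i - 1] == y[j - 1]) {
        t[i * rs + j] = diag + 1;
    } else {
        t[i * rs + j] = (left > up) ? left : up;
    }
}

/* Fill the size-by-size block whose top-left cell is (r, c). */
static void fill(int *t, int rs, const char *x, const char *y,
                 int r, int c, int size)
{
    if (size == 1) {
        fill_cell(t, rs, x, y, r, c);
        return;
    }
    int h = size / 2;
    fill(t, rs, x, y, r,     c,     h);   /* I   */
    /* sync */
    fill(t, rs, x, y, r,     c + h, h);   /* II  (parallel with III in Cilk) */
    fill(t, rs, x, y, r + h, c,     h);   /* III */
    /* sync */
    fill(t, rs, x, y, r + h, c + h, h);   /* IV  */
}

/* Length of a longest common subsequence of x[0..n) and y[0..n). */
int lcs_length(const char *x, const char *y, int n)
{
    assert(n >= 1 && n <= MAXN && (n & (n - 1)) == 0);
    int t[(MAXN + 1) * (MAXN + 1)];
    int rs = n + 1;
    for (int i = 0; i <= n; i++) {
        t[i * rs] = 0;   /* column 0 */
        t[i] = 0;        /* row 0 */
    }
    fill(t, rs, x, y, 1, 1, n);
    return t[n * rs + n];
}
```

Quadrant I must finish before II and III start because both read its border cells, and IV reads from all three, which is why the span recurrence has three sequential subproblems of size $n/2$.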

A More-Parallel Construction
Figure: the $n \times n$ tableau divided into nine blocks, numbered along anti-diagonals:

    I   II   IV
    III V    VII
    VI  VIII IX

Cilk code:

    spawn I;
    sync;
    spawn II;
    spawn III;
    sync;
    spawn IV;
    spawn V;
    spawn VI;
    sync;
    spawn VII;
    spawn VIII;
    sync;
    spawn IX;
    sync;

Work: $T_1(n) = 9\,T_1(n/3) + \Theta(1) = \Theta(n^2)$ (CASE 1)
Span: $T_\infty(n) = 5\,T_\infty(n/3) + \Theta(1) = \Theta(n^{\log_3 5})$ (CASE 1)

Analysis of Revised Construction
Work: $T_1(n) = \Theta(n^2)$
Span: $T_\infty(n) = \Theta(n^{\log_3 5}) \approx \Theta(n^{1.46})$
Parallelism: $T_1(n)/T_\infty(n) \approx \Theta(n^{0.54})$
More parallel by a factor of $\Theta(n^{0.54})/\Theta(n^{0.42}) = \Theta(n^{0.12})$.

Key Ideas
• Cilk is simple: cilk, spawn, sync, SYNCHED
• Recurrences, recurrences, recurrences, …
• Work & span, work & span, work & span, …