Automatic Parallelization of Divide and Conquer Algorithms
Radu Rugina and Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Outline
• Example
• Information required to parallelize divide and conquer algorithms
• How compiler extracts parallelism
• Key technique: constraint systems
• Results
• Related work
• Conclusion
Example - Divide and Conquer Sort
  Input:    4 7 6 1 5 3 8 2
  Divide:   4 7 | 6 1 | 5 3 | 8 2
  Conquer:  4 7 | 1 6 | 3 5 | 2 8
  Combine:  1 4 6 7 | 2 3 5 8
  Combine:  1 2 3 4 5 6 7 8
Divide and Conquer Algorithms
• Lots of Recursively Generated Concurrency
  • Recursively Solve Subproblems in Parallel
  • Combine Results in Parallel
• Good Cache Performance
  • Problems Naturally Scale to Fit in Cache
  • No Cache Size Constants in Code
• Lots of Programs
  • Sort Programs
  • Dense Matrix Programs
“Sort n Items in d, Using t as Temporary Storage”
void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    sort(d,t,n/4);
    sort(d+n/4,t+n/4,n/4);
    sort(d+n/2,t+n/2,n/4);
    sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
    merge(d,d+n/4,d+n/2,t);
    merge(d+n/2,d+3*(n/4),d+n,t+n/2);
    merge(t,t+n/2,t+n,d);
  } else insertionSort(d,d+n);
}
“Recursively Sort Four Quarters of d”
  sort(d,t,n/4);
  sort(d+n/4,t+n/4,n/4);
  sort(d+n/2,t+n/2,n/4);
  sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
Subproblems Identified Using Pointers Into Middle of Array
  d: 4 7 | 6 1 | 5 3 | 8 2   (quarters start at d, d+n/4, d+n/2, d+3*(n/4))
Sorted Results Written Back Into Input Array
  d: 4 7 | 1 6 | 3 5 | 2 8

“Merge Sorted Quarters of d Into Halves of t”
  merge(d,d+n/4,d+n/2,t);
  merge(d+n/2,d+3*(n/4),d+n,t+n/2);
  d: 4 7 1 6 3 5 2 8
  t: 1 4 6 7 | 2 3 5 8   (halves start at t, t+n/2)

“Merge Sorted Halves of t Back Into d”
  merge(t,t+n/2,t+n,d);
  t: 1 4 6 7 | 2 3 5 8
  d: 1 2 3 4 5 6 7 8

“Use a Simple Sort for Small Problem Sizes”
  insertionSort(d,d+n);
  before: 4 7 [6 1] 5 3 8 2   (brackets mark the range from d up to d+n)
  after:  4 7 [1 6] 5 3 8 2
Parallel Execution
void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d,t,n/4);
    spawn sort(d+n/4,t+n/4,n/4);
    spawn sort(d+n/2,t+n/2,n/4);
    spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
    sync;
    spawn merge(d,d+n/4,d+n/2,t);
    spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2);
    sync;
    merge(t,t+n/2,t+n,d);
  } else insertionSort(d,d+n);
}
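The spawn/sync structure above can be sketched in standard C with OpenMP tasks standing in for the Cilk keywords. This is an illustrative sketch, not the compiler's output: the CUTOFF value, the driver name sortParallel, and the use of OpenMP are assumptions. It also uses q = n/4 and h = 2*q as split points so that the four quarters tile the array for every n; the slide's d+n/2 coincides with d+2*(n/4) only when n mod 4 is 0 or 1.

```c
#define CUTOFF 4 /* hypothetical value; must be at least 3 so every recursive call shrinks */

/* Sorts the range from l up to (not including) h in place. */
static void insertionSort(int *l, int *h) {
    if (h - l < 2) return;
    for (int *p = l + 1; p < h; p++) {
        int k = *p;
        int *q = p;
        for (; q > l && k < *(q - 1); q--) *q = *(q - 1);
        *q = k;
    }
}

/* Merges the sorted ranges [l1,m) and [m,h2) into the buffer starting at d. */
static void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m, *l2 = m;
    while (l1 < h1 && l2 < h2) {
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
    }
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Each "omp task" plays the role of spawn; "omp taskwait" plays the role of sync. */
void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        int q = n / 4, h = 2 * q; /* assumption: h replaces n/2 so quarters never overlap */
        #pragma omp task
        sort(d, t, q);
        #pragma omp task
        sort(d + q, t + q, q);
        #pragma omp task
        sort(d + h, t + h, q);
        #pragma omp task
        sort(d + 3 * q, t + 3 * q, n - 3 * q);
        #pragma omp taskwait
        #pragma omp task
        merge(d, d + q, d + h, t);
        #pragma omp task
        merge(d + h, d + 3 * q, d + n, t + h);
        #pragma omp taskwait
        merge(t, t + h, t + n, d);
    } else {
        insertionSort(d, d + n);
    }
}

/* Hypothetical driver: one thread creates the root task, the team executes the rest. */
void sortParallel(int *d, int *t, int n) {
    #pragma omp parallel
    #pragma omp single
    sort(d, t, n);
}
```

Compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result, which is exactly the property the region analysis is meant to establish: the spawned calls touch disjoint memory, so their order does not matter.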
What Do You Need to Know to Exploit this Form of Parallelism?
Calls to sort access disjoint parts of d and t
Together, calls access [d,d+n-1] and [t,t+n-1]
  sort(d,t,n/4);
  sort(d+n/4,t+n/4,n/4);
  sort(d+n/2,t+n/2,n/4);
  sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
[Figure: for each call, the part it accesses within d ... d+n-1 and t ... t+n-1]
First two calls to merge access disjoint parts of d,t
Together, calls access [d,d+n-1] and [t,t+n-1]
  merge(d,d+n/4,d+n/2,t);
  merge(d+n/2,d+3*(n/4),d+n,t+n/2);
  merge(t,t+n/2,t+n,d);
[Figure: for each call, the part it accesses within d ... d+n-1 and t ... t+n-1]
Calls to insertionSort access [d,d+n-1]
  insertionSort(d,d+n);
[Figure: the part accessed within d ... d+n-1; t is not accessed]
What Do You Need to Know to Exploit this Parallelism?
The Regions of Memory Accessed by Complete Executions of Procedures
How Hard Is it to Extract these Regions?
Challenging
void insertionSort(int *l, int *h) {
  int *p, *q, k;
  for (p = l+1; p < h; p++) {
    for (k = *p, q = p-1; l <= q && k < *q; q--)
      *(q+1) = *q;
    *(q+1) = k;
  }
}
Not Immediately Obvious That insertionSort(l,h) Accesses [l,h-1]
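One way to build intuition for the claim that insertionSort(l,h) accesses [l,h-1] is to instrument the procedure and record the lowest and highest element it touches. The sketch below is a dynamic check on one input, not the static analysis the talk describes; the Region type and the touch helper are hypothetical names, and the inner loop is restructured slightly so that no pointer below l is ever formed.

```c
#include <stddef.h>

/* Hypothetical tracker: lowest and highest element address touched so far. */
typedef struct { const int *lo, *hi; } Region;

static void touch(Region *r, const int *a) {
    if (r->lo == NULL || a < r->lo) r->lo = a;
    if (r->hi == NULL || a > r->hi) r->hi = a;
}

/* Insertion sort over the range from l up to (not including) h,
   recording every element that is read or written. */
void insertionSortTraced(int *l, int *h, Region *r) {
    if (h - l < 2) return;
    for (int *p = l + 1; p < h; p++) {
        touch(r, p);
        int k = *p;
        int *q = p;
        while (q > l) {
            touch(r, q - 1);
            if (!(k < *(q - 1))) break;
            *q = *(q - 1);
            touch(r, q);
            q--;
        }
        *q = k;
        touch(r, q);
    }
}
```

Sorting a slice in the middle of a larger array and then comparing the recorded extremes with l and h-1 shows that the neighbouring elements are never touched, which is the fact the compiler has to prove symbolically for all inputs.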
void merge(int *l1, int *m, int *h2, int *d) {
  int *h1 = m;
  int *l2 = m;
  while ((l1 < h1) && (l2 < h2))
    if (*l1 < *l2) *d++ = *l1++;
    else *d++ = *l2++;
  while (l1 < h1) *d++ = *l1++;
  while (l2 < h2) *d++ = *l2++;
}
Not Immediately Obvious That merge(l,m,h,d) Accesses [l,h-1] and [d,d+(h-l)-1]
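The output region [d,d+(h-l)-1] claimed for merge can be observed directly by returning the final value of the output pointer. The sketch below keeps the merge logic unchanged and only adds that return value; the name mergeEnd is hypothetical, and this is a check on concrete inputs rather than the symbolic derivation the compiler performs.

```c
/* Same merge as in the talk, except that it returns the output pointer
   after the last write, so a caller can see how far d advanced. */
int *mergeEnd(int *l1, int *m, int *h2, int *d) {
    int *h1 = m;
    int *l2 = m;
    while ((l1 < h1) && (l2 < h2)) {
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
    }
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
    return d;
}
```

The difference between the returned pointer and the original d equals h-l, so the last element written is d+(h-l)-1.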
Issues
• Pervasive Use of Pointers
  • Pointers into Middle of Arrays
  • Pointer Arithmetic
  • Pointer Comparison
• Multiple Procedures
  • sort(int *d, int *t, int n)
  • insertionSort(int *l, int *h)
  • merge(int *l, int *m, int *h, int *t)
• Recursion
How The Compiler Does It
Structure of Compiler
• Pointer Analysis: Disambiguate References at Granularity of Arrays
• Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
• Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
• Parallelization: Independent Procedure Calls That Can Execute in Parallel
Example
void f(char *p, int n) {
  if (n > CUTOFF) {
    f(p, n/2);       /* initialize first half */
    f(p+n/2, n/2);   /* initialize second half */
  } else {
    /* base case: initialize small array */
    int i = 0;
    while (i < n) { *(p+i) = 0; i++; }
  }
}
Bounds Analysis
• For each variable at each program point, derive upper and lower bounds for value
• Bounds are symbolic expressions
  • symbolic variables in expressions represent initial values of parameters
  • linear combinations of these variables
  • multivariate polynomials
Bounds Analysis
What are upper and lower bounds for region accessed by while loop in base case?
  int i = 0;
  while (i < n) { *(p+i) = 0; i++; }
Bounds Analysis, Step 1
Build control flow graph
  i = 0
  i < n
  *(p+i) = 0; i = i+1
Bounds Analysis, Step 2
Number different versions of variables
  i0 = 0
  i1 < n
  *(p+i2) = 0; i3 = i2+1
Bounds Analysis, Step 3
Set up constraints for lower bounds
  i0 = 0                       l(i0) <= 0
  i1 < n                       l(i1) <= l(i0)
                               l(i1) <= l(i3)
  *(p+i2) = 0; i3 = i2+1       l(i2) <= l(i1)
                               l(i3) <= l(i2)+1
Bounds Analysis, Step 4
Set up constraints for upper bounds
  i0 = 0                       0 <= u(i0)
  i1 < n                       u(i0) <= u(i1)
                               u(i3) <= u(i1)
  *(p+i2) = 0; i3 = i2+1       min(u(i1),n-1) <= u(i2)
                               u(i2)+1 <= u(i3)
The constraint min(u(i1),n-1) <= u(i2) is then replaced by n-1 <= u(i2)
Bounds Analysis, Step 5
Generate symbolic expressions for bounds
Goal: express bounds in terms of parameters
  l(i0) = c1p + c2n + c3
  l(i1) = c4p + c5n + c6
  l(i2) = c7p + c8n + c9
  l(i3) = c10p + c11n + c12
  u(i0) = c13p + c14n + c15
  u(i1) = c16p + c17n + c18
  u(i2) = c19p + c20n + c21
  u(i3) = c22p + c23n + c24
Bounds Analysis, Step 6
Substitute expressions into constraints
  c1p + c2n + c3 <= 0
  c4p + c5n + c6 <= c1p + c2n + c3
  c4p + c5n + c6 <= c10p + c11n + c12
  c7p + c8n + c9 <= c4p + c5n + c6
  c10p + c11n + c12 <= c7p + c8n + c9 + 1
  0 <= c13p + c14n + c15
  c13p + c14n + c15 <= c16p + c17n + c18
  c22p + c23n + c24 <= c16p + c17n + c18
  n-1 <= c19p + c20n + c21
  c19p + c20n + c21 + 1 <= c22p + c23n + c24
Goal
Solve Symbolic Constraint System
find values for constraint variables c1, ..., c24 that satisfy the inequality constraints
Maximize Lower Bounds
Minimize Upper Bounds
Bounds Analysis, Step 7
Apply expression ordering principle
  c1p + c2n + c3 <= c4p + c5n + c6
If
  c1 <= c4, c2 <= c5, and c3 <= c6
Bounds Analysis, Step 7
Apply expression ordering principle
Generate a linear program
Objective Function: max (c1 + ... + c12) - (c13 + ... + c24)
lower bounds:
  c1 <= 0       c2 <= 0       c3 <= 0
  c4 <= c1      c5 <= c2      c6 <= c3
  c4 <= c10     c5 <= c11     c6 <= c12
  c7 <= c4      c8 <= c5      c9 <= c6
  c10 <= c7     c11 <= c8     c12 <= c9+1
upper bounds:
  0 <= c13      0 <= c14      0 <= c15
  c13 <= c16    c14 <= c17    c15 <= c18
  c22 <= c16    c23 <= c17    c24 <= c18
  0 <= c19      1 <= c20      -1 <= c21
  c19 <= c22    c20 <= c23    c21+1 <= c24
Bounds Analysis, Step 8
Solve linear program to extract bounds
  i0 = 0                       l(i0) = 0    u(i0) = 0
  i1 < n                       l(i1) = 0    u(i1) = n
  *(p+i2) = 0; i3 = i2+1       l(i2) = 0    u(i2) = n-1
                               l(i3) = 0    u(i3) = n
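The expression ordering principle from Step 7 can be applied mechanically to confirm that the bounds listed in Step 8 satisfy every constraint from Steps 3 and 4. The sketch below only checks feasibility; it does not solve the linear program or claim optimality. The Lin type and the function names are hypothetical, introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical representation of a symbolic bound cp*p + cn*n + c0. */
typedef struct { int cp, cn, c0; } Lin;

/* Expression ordering principle: a <= b is accepted when every coefficient
   of a is <= the matching coefficient of b (sound when p and n are nonnegative). */
bool leq(Lin a, Lin b) {
    return a.cp <= b.cp && a.cn <= b.cn && a.c0 <= b.c0;
}

/* Returns the expression a + 1. */
static Lin plus1(Lin a) {
    a.c0 += 1;
    return a;
}

/* Checks the Step 8 bounds against the lower-bound constraints of Step 3
   and the upper-bound constraints of Step 4 (with n-1 <= u(i2)). */
bool step8BoundsAreFeasible(void) {
    const Lin zero = {0, 0, 0};
    const Lin n = {0, 1, 0};
    const Lin nm1 = {0, 1, -1};
    Lin l[4] = {zero, zero, zero, zero}; /* l(i0) .. l(i3) */
    Lin u[4] = {zero, n, nm1, n};        /* u(i0) .. u(i3) */
    return leq(l[0], zero)
        && leq(l[1], l[0])
        && leq(l[1], l[3])
        && leq(l[2], l[1])
        && leq(l[3], plus1(l[2]))
        && leq(zero, u[0])
        && leq(u[0], u[1])
        && leq(u[3], u[1])
        && leq(nm1, u[2])
        && leq(plus1(u[2]), u[3]);
}
```

The comparison is conservative: leq can return false for expressions that are in fact ordered for all relevant p and n, so a false result means "not provable coefficient by coefficient" rather than "violated".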
Region Analysis
Goal: Compute Accessed Regions of Memory
• Intra-Procedural
  • Use bounds at each load or store
  • Compute accessed region
• Inter-Procedural
  • Use intra-procedural results
  • Set up another constraint system
  • Solve to find regions accessed by entire execution of the procedure
Basic Principle of Inter-Procedural Region Analysis
• For each procedure
  • Generate symbolic expressions for upper and lower bounds of accessed regions
• Constraint System
  • Accessed regions include regions accessed by statements in procedure
  • Accessed regions include regions accessed by invoked procedures
Inter-Procedural Constraints in Example
void f(char *p, int n) {
  if (n > CUTOFF) {
    f(p, n/2);
    f(p+n/2, n/2);
  } else {
    int i = 0;
    while (i < n) { *(p+i) = 0; i++; }
  }
}
From the call f(p, n/2):
  l(f,p,n) <= l(f,p,n/2)         u(f,p,n/2) <= u(f,p,n)
From the call f(p+n/2, n/2):
  l(f,p,n) <= l(f,p+n/2,n/2)     u(f,p+n/2,n/2) <= u(f,p,n)
From the base case loop:
  l(f,p,n) <= p                  p+n-1 <= u(f,p,n)

Derive Constraint System
• Generate symbolic expressions
  • l(f,p,n) = C1p + C2n + C3
  • u(f,p,n) = C4p + C5n + C6
• Build constraint system
  • C1p + C2n + C3 <= p
  • p + n - 1 <= C4p + C5n + C6
  • C1p + C2n + C3 <= C1p + C2(n/2) + C3
  • C4p + C5(n/2) + C6 <= C4p + C5n + C6
  • C1p + C2n + C3 <= C1(p+n/2) + C2(n/2) + C3
  • C4(p+n/2) + C5(n/2) + C6 <= C4p + C5n + C6

Solve Constraint System
• Simplify Constraint System
  • C1p + C2n + C3 <= p
  • p + n - 1 <= C4p + C5n + C6
  • C2n <= C2(n/2)
  • C5(n/2) <= C5n
  • C2(n/2) <= C1(n/2)
  • C4(n/2) <= C5(n/2)
• Generate and Solve Linear Program
  • l(f,p,n) = p
  • u(f,p,n) = p+n-1
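The computed region [p, p+n-1] for f can be compared against what an actual execution touches. The sketch below instruments the example procedure to record the lowest and highest byte written; it is a dynamic check on one call, not the inter-procedural analysis itself. The Region type, the name fTraced, and the CUTOFF value are assumptions made for illustration.

```c
#include <stddef.h>

#define CUTOFF 2 /* hypothetical value */

/* Hypothetical tracker: lowest and highest byte written so far. */
typedef struct { char *lo, *hi; } Region;

/* The example procedure f, with each store recorded in r. */
void fTraced(char *p, int n, Region *r) {
    if (n > CUTOFF) {
        fTraced(p, n / 2, r);         /* initialize first half */
        fTraced(p + n / 2, n / 2, r); /* initialize second half */
    } else {
        for (int i = 0; i < n; i++) { /* base case: initialize small array */
            char *a = p + i;
            *a = 0;
            if (r->lo == NULL || a < r->lo) r->lo = a;
            if (r->hi == NULL || a > r->hi) r->hi = a;
        }
    }
}
```

For n = 8 the recorded extremes are exactly p and p+n-1. For sizes where n/2 rounds down, the execution can touch fewer bytes than the region allows, which is consistent with the region being an upper bound on what is accessed.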
Parallelization
• Dependence Testing of Two Calls
  • Do accessed regions intersect?
  • Based on comparing upper and lower bounds of accessed regions
  • Comparison done using expression ordering principle
• Parallelization
  • Find sequences of independent calls
  • Execute independent calls in parallel
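The dependence test described above can be sketched as an interval-disjointness check on symbolic bounds, using the same coefficient-wise comparison as the expression ordering principle. The representation below is an assumption made for illustration: bounds are linear in p and a symbol h standing for n/2, and the type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical symbolic bound cp*p + ch*h + c0, where h stands for n/2. */
typedef struct { int cp, ch, c0; } Lin;

/* Inclusive symbolic region [lo, hi]. */
typedef struct { Lin lo, hi; } Region;

/* Coefficient-wise comparison (sound when p and h are nonnegative). */
static bool leq(Lin a, Lin b) {
    return a.cp <= b.cp && a.ch <= b.ch && a.c0 <= b.c0;
}

/* For integer-valued expressions, a < b is the same as a + 1 <= b. */
static bool strictlyBelow(Lin a, Lin b) {
    a.c0 += 1;
    return leq(a, b);
}

/* True when the two regions are provably disjoint. A false result means
   only that disjointness could not be proven, not that they overlap. */
bool independent(Region a, Region b) {
    return strictlyBelow(a.hi, b.lo) || strictlyBelow(b.hi, a.lo);
}
```

With h = n/2, the two recursive calls in the example f access [p, p+h-1] and [p+h, p+2h-1]; the test accepts them as independent because (p+h-1)+1 <= p+h holds coefficient by coefficient.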
Details
• Inter-procedural positivity analysis
  • Verify that variables are positive
  • Required for correctness of expression ordering principle
• Correlation Analysis
• Integer Division
  • Basic Idea : (n-1)/2 <= ⌊n/2⌋ <= n/2
  • Generalized : (n-m+1)/m <= ⌊n/m⌋ <= n/m
• Linear System Decomposition
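The integer division bounds in the list above read naturally when the middle term is taken as C integer division and the outer terms as exact division. Multiplying through by m gives n-m+1 <= m*(n/m) <= n, which can be checked in integer arithmetic. The sketch below tests that inequality exhaustively over a small range; the function names and the range are assumptions, and nonnegative n with positive m is assumed, in line with the positivity analysis.

```c
#include <stdbool.h>

/* Checks n-m+1 <= m*(n/m) <= n, the division bounds multiplied through by m.
   Assumes n >= 0 and m >= 1. */
bool divisionBoundsHold(int n, int m) {
    int q = n / m; /* C integer division; equals floor(n/m) for n >= 0, m >= 1 */
    return n - m + 1 <= m * q && m * q <= n;
}

/* Checks the bounds for every 0 <= n <= limit and 1 <= m <= limit. */
bool divisionBoundsHoldUpTo(int limit) {
    for (int n = 0; n <= limit; n++)
        for (int m = 1; m <= limit; m++)
            if (!divisionBoundsHold(n, m)) return false;
    return true;
}
```

These bounds let the analysis replace an integer division by linear expressions in n, which keeps the constraint system linear.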
Experimental Results
• Implementation - SUIF, lp_solve, Cilk
[Figures: Speedup for Sort; Speedup for Matrix Multiply. Both axes run from 0 to 8.]
Thanks: Darko Marinov, Nate Kushman, Don Dailey
Related Work
• Shape Analysis
  • Chase, Wegman, Zadeck (PLDI 90)
  • Ghiya, Hendren (POPL 96)
  • Sagiv, Reps, Wilhelm (TOPLAS 98)
• Commutativity Analysis
  • Rinard and Diniz (PLDI 96)
• Predicated Dataflow Analysis
  • Moon, Hall, Murphy (ICS 98)
• Array Region Analysis
  • Triolet, Irigoin and Feautrier (PLDI 86)
  • Havlak and Kennedy (IEEE TPDS 91)
  • Hall, Amarasinghe, Murphy, Liao and Lam (SC 95)
  • Gu, Li and Lee (PPoPP 97)
• Symbolic Analysis of Loop Variables
  • Blume and Eigenmann (IPPS 95)
  • Haghighat and Polychronopoulos (LCPC 93)
Future
• Static Race Detection for Explicitly Parallel Programs
• Static Elimination of Array Bounds Checks
• Static Pointer Validation Checks
• Result:
  • Safety Guarantees
  • No Efficiency Compromises
Context
• Mainstream Parallelizing Compilers
  • Loop Nests, Dense Matrices
  • Affine Access Functions
  • Key Problem: Solving Diophantine Equations
• Compilers for Divide and Conquer Algorithms
  • Recursion, Dense Arrays (dynamic)
  • Pointers, Pointer Arithmetic
  • Key Problems: Pointer Analysis, Symbolic Region Analysis, Solving Linear Programs