High-Level Loop Transformations and Polyhedral Compilation

287
May 30, 2017 1 / 82 High-Level Loop Transformations and Polyhedral Compilation Sven Verdoolaege Polly Labs and KU Leuven (affiliated researcher) May 30, 2017

Transcript of High-Level Loop Transformations and Polyhedral Compilation

Page 1: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 1 / 82

High-Level Loop Transformations andPolyhedral Compilation

Sven Verdoolaege

Polly Labs and KU Leuven (affiliated researcher)

May 30, 2017

Page 2: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations May 30, 2017 3 / 82

Outline1 Loop Transformations

Loop DistributionLoop FusionLoop Tiling

2 Polyhedral CompilationIntroductionPolyhedral ModelSchedulesOperationsSoftware

3 PPCGOverviewModel ExtractionDependence AnalysisSchedulingDevice Mapping

Page 3: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 4 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

Page 4: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 4 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iteration

Lr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

Page 5: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 4 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

Page 6: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 4 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

Page 7: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 4 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

No conflicts between iterations of L1 ñ can be run in parallelNo conflicts between iterations of L2 ñ can be run in parallel

Page 8: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i - 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr0sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr1sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

Page 9: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i + 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr2sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr3sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

Page 10: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i + 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iteration

Lr1s: W pAr1sq RpAr1sq RpAr2sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr3sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

Page 11: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i + 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr2sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr3sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

Page 12: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i + 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr2sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr3sq W pBr2sq

Loop distribution

changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

Page 13: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Distribution May 30, 2017 5 / 82

Loop DistributionL: for (int i = 1; i < 100; ++i) {

A[i] = f(i);B[i] = A[i] + A[i + 1];

}

Can this loop be parallelized?

Requirement:writes of iteration do not conflict with reads/writes of other iterationLr1s: W pAr1sq RpAr1sq RpAr2sq W pBr1sqLr2s: W pAr2sq RpAr2sq RpAr3sq W pBr2sq

Loop distribution changes meaning!

L1: for (int i = 1; i < 100; ++i)A[i] = f(i);

L2: for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

before distribution, Lr1s reads Ar2s value written before code fragmentafter distribution, L2r1s reads Ar2s value written by L1r2s

Page 14: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Fusion May 30, 2017 6 / 82

Loop FusionL1: for (int i = 0; i < 100; ++i)

A[i] = f(i);L2: for (int i = 0; i < 100; ++i)

B[i] = g(A[i]);

Assume A does not fit in the cacheñ elements get evicted and reloaded for use in L2

Loop fusion

(changes execution order ñ may not preserve meaning)

for (int i = 0; i < 100; ++i) {A[i] = f(i);B[i] = g(A[i]);

}ñ elements of A get reused immediatelyñ better locality

If A not needed outside code fragmentñ array can be replaced by a scalarñ memory compaction

Page 15: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Fusion May 30, 2017 6 / 82

Loop FusionL1: for (int i = 0; i < 100; ++i)

A[i] = f(i);L2: for (int i = 0; i < 100; ++i)

B[i] = g(A[i]);

Assume A does not fit in the cacheñ elements get evicted and reloaded for use in L2

Loop fusion

(changes execution order ñ may not preserve meaning)

for (int i = 0; i < 100; ++i) {A[i] = f(i);B[i] = g(A[i]);

}ñ elements of A get reused immediatelyñ better locality

If A not needed outside code fragmentñ array can be replaced by a scalarñ memory compaction

Page 16: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Fusion May 30, 2017 6 / 82

Loop FusionL1: for (int i = 0; i < 100; ++i)

A[i] = f(i);L2: for (int i = 0; i < 100; ++i)

B[i] = g(A[i]);

Assume A does not fit in the cacheñ elements get evicted and reloaded for use in L2

Loop fusion

(changes execution order ñ may not preserve meaning)

for (int i = 0; i < 100; ++i) {A[i] = f(i);B[i] = g(A[i]);

}ñ elements of A get reused immediatelyñ better locality

If A not needed outside code fragmentñ array can be replaced by a scalarñ memory compaction

Page 17: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Fusion May 30, 2017 6 / 82

Loop FusionL1: for (int i = 0; i < 100; ++i)

A[i] = f(i);L2: for (int i = 0; i < 100; ++i)

B[i] = g(A[i]);

Assume A does not fit in the cacheñ elements get evicted and reloaded for use in L2

Loop fusion

(changes execution order ñ may not preserve meaning)

for (int i = 0; i < 100; ++i) {A

[i]

= f(i);B[i] = g(A

[i]

);}ñ elements of A get reused immediatelyñ better locality

If A not needed outside code fragmentñ array can be replaced by a scalarñ memory compaction

Page 18: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Fusion May 30, 2017 6 / 82

Loop FusionL1: for (int i = 0; i < 100; ++i)

A[i] = f(i);L2: for (int i = 0; i < 100; ++i)

B[i] = g(A[i]);

Assume A does not fit in the cacheñ elements get evicted and reloaded for use in L2

Loop fusion (changes execution order ñ may not preserve meaning)for (int i = 0; i < 100; ++i) {

A

[i]

= f(i);B[i] = g(A

[i]

);}ñ elements of A get reused immediatelyñ better locality

If A not needed outside code fragmentñ array can be replaced by a scalarñ memory compaction

Page 19: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 20: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 21: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 22: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 23: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 24: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 25: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 26: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 27: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 28: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 29: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 30: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 31: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 32: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 33: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 34: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 35: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 36: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 37: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 38: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 39: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 40: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 41: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 42: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 43: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 44: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 45: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 46: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 47: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

Loop tiling

(changes execution order ñ may not preserve meaning)

for (int ti = 0; ti < 8; ti += 4)for (int tj = 0; tj < 8; tj += 4)

for (int i = ti; i < ti + 4; ++i)for (int j = tj; j < tj + 4; ++j)

C[i][j] = A[i] * B[j];

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 48: High-Level Loop Transformations and Polyhedral Compilation

Loop Transformations Loop Tiling May 30, 2017 7 / 82

Loop Tiling [17, 35]

L1: for (int i = 0; i < 8; ++i)L2: for (int j = 0; j < 8; ++j)

C[i][j] = A[i] * B[j];

Assume B does not fit in the cacheñ elements get (re)loaded and evicted in every iteration of L1

Loop tiling (changes execution order ñ may not preserve meaning)

for (int ti = 0; ti < 8; ti += 4)for (int tj = 0; tj < 8; tj += 4)

for (int i = ti; i < ti + 4; ++i)for (int j = tj; j < tj + 4; ++j)

C[i][j] = A[i] * B[j];

B

A C

computed element

load into cache

already in cache

ñ compute C in tiles, e.g., 4ˆ 4

Page 49: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation May 30, 2017 8 / 82

Outline1 Loop Transformations

Loop DistributionLoop FusionLoop Tiling

2 Polyhedral CompilationIntroductionPolyhedral ModelSchedulesOperationsSoftware

3 PPCGOverviewModel ExtractionDependence AnalysisSchedulingDevice Mapping

Page 50: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 9 / 82

Motivation

Computer architectures are becoming more difficult to programefficiently

§ multiple levels of parallelism§ non-uniform memory architectures

ñ Advanced compiler optimizations are required§ hierarchical partitioning and reordering of operations(e.g., parallelization, loop fusion, . . . )

§ mapping to different processing units§ memory transfers between processing units

ñ Global view of individual operations is requiredñ Polyhedral Model

Page 51: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependences

ñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 52: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependences

ñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 53: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependences

ñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 54: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependences

ñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 43 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 55: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependences

ñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 43 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 56: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependencesñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 57: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependencesñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependences

ñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 58: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Introduction May 30, 2017 10 / 82

Polyhedral Compilation — Examplefor (t = 0; t < T; t++)

for (i = 1; i < N - 1; i++)A[(t+1)%2][i] = A[t%2][i-1] + A[t%2][i+1];

t

i

ñ

1 1 1

3 3 3

5 5 5

0 0 0

2 2 2

4 4 4

6 6 6

t

i

1 Extract polyhedral modelñ each dynamic instance represented by pt, iq pair

2 Compute dependencesñ iteration t “ 2, i “ 3 depends on iteration t “ 1, i “ 4

3 Compute schedule respecting dependencesñ tiles with same number can be executed in parallelñ rows within tiles can be executed in parallel

Page 59: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 11 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations

defined by Presburger formula

ñ . . .Main constituents of program representation

Instance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 60: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 61: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 62: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 63: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 64: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 65: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

S[], T[]

S[0],S[1],S[2] T[0],T[1],T[2]

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 66: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

t Sris u, t Tris u

t Sris Ñ ris u t Tris Ñ ris u

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

S[0]T[2],S[1]T[1],S[2]T[0]

S[], T[]

input code input execution order

model

new code new execution order

Page 67: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 12 / 82

Polyhedral Model — Example

for (i = 0; i < 3; ++i)S: B[i] = f(A[i]);for (i = 0; i < 3; ++i)T: C[i] = g(B[2 - i]);

t Sris u, t Tris u

t Sris Ñ ris u t Tris Ñ ris u

S[0]

T[0]

B[0]

S[1]

T[1]

B[1]

S[2]

T[2]

B[2]

for (c = 0; c < 3; ++c) {B[c] = f(A[c]);C[2 - c] = g(B[c]);

}

t Sris Ñ ris; Tris Ñ r2´ is u

t Sris u, t Tris u

input code input execution order

model

new code new execution order

Page 68: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations

defined by Presburger formula

ñ . . .Main constituents of program representation

Instance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 69: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations defined by Presburger formulañ . . .

Main constituents of program representationInstance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 70: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations defined by Presburger formulañ . . .

quasi-

affine expression

(no multiplication)

§ variable§ constant integer number§ constant symbol§ addition (`), subtraction (´)

§ integer division by integer constant d (t¨{du)Presburger formula

§ true§ quasi-affine expression§ less-than-or-equal relation (ď)§ equality (“)§ first order logic connectives: ^, _, , D, @

Main constituents of program representationInstance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 71: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations defined by Presburger formulañ . . .quasi-affine expression

(no multiplication)

§ variable§ constant integer number§ constant symbol§ addition (`), subtraction (´)§ integer division by integer constant d (t¨{du)

Presburger formula§ true§ quasi-affine expression§ less-than-or-equal relation (ď)§ equality (“)§ first order logic connectives: ^, _, , D, @

Main constituents of program representationInstance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 72: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations defined by Presburger formulañ . . .quasi-affine expression (no multiplication)

§ variable§ constant integer number§ constant symbol§ addition (`), subtraction (´)§ integer division by integer constant d (t¨{du)

Presburger formula§ true§ quasi-affine expression§ less-than-or-equal relation (ď)§ equality (“)§ first order logic connectives: ^, _, , D, @

Main constituents of program representationInstance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 73: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 13 / 82

Polyhedral Model [27]

Key featuresinstance basedñ statement instancesñ array elements

compact representation based on polyhedra or similar objectsñ Presburger sets and relations defined by Presburger formulañ . . .quasi-affine expression (no multiplication)

§ variable§ constant integer number§ constant symbol§ addition (`), subtraction (´)§ integer division by integer constant d (t¨{du)

Presburger formula§ true§ quasi-affine expression§ less-than-or-equal relation (ď)§ equality (“)§ first order logic connectives: ^, _, , D, @

Main constituents of program representationInstance Setñ the set of all statement instances

Access Relationsñ the array elements accessed by a statement instance

Dependencesñ the statement instances that depend on a statement instance

Scheduleñ the relative execution order of statement instances

Contextñ constraints on parameters

Page 74: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 14 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

Instance Set (set of statement instances)

t S1ri , js : 0 ď i ă M ^ 0 ď j ă N;

S2ri , j , ks : 0 ď i ă M ^ 0 ď j ă N ^ 0 ď k ă K u

Access Relations (accessed array elements; W : write, R : read)

W “ t S1ri , js Ñ Cri , js; S2ri , j , ks Ñ Cri , js u

R “ t S2ri , j , ks Ñ Cri , js; S2ri , j , ks Ñ Ari , ks; S2ri , j , ks Ñ Brk , js u

Page 75: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 15 / 82

Schedule Representation [30]

Schedule S keeps track of relative execution order of statement instances

ñ for each pair of statement instances i and j, schedule determines§ i executed before j (i ăS j),§ i executed after j (j ăS i), or§ i and j may be executed simultaneously

Schedule trees form a combined hierarchical schedule representationMain constructs:

§ affine schedule: instances are executed according to affine function

§ band: nested sequence of affine functions called its members;combined multi-dimensional affine function is calledthe partial schedule of the band

§ sequence: partitions instances through child filters executed in order

Order of instances determined by outermost node that separates themDeriving schedule tree from AST

§ for loop ñ affine schedule corresponding to loop iterator§ compound statement ñ sequence

Page 76: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 15 / 82

Schedule Representation [30]

Schedule S keeps track of relative execution order of statement instances

ñ for each pair of statement instances i and j, schedule determines§ i executed before j (i ăS j),§ i executed after j (j ăS i), or§ i and j may be executed simultaneously

Schedule trees form a combined hierarchical schedule representationMain constructs:

§ affine schedule: instances are executed according to affine function

§ band: nested sequence of affine functions called its members;combined multi-dimensional affine function is calledthe partial schedule of the band

§ sequence: partitions instances through child filters executed in order

Order of instances determined by outermost node that separates themDeriving schedule tree from AST

§ for loop ñ affine schedule corresponding to loop iterator§ compound statement ñ sequence

Page 77: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 16 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

filters

affine functions

Page 78: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 16 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

filters

affine functions

Page 79: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 16 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

filters

affine functions

Page 80: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 16 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

filters

affine functions

Page 81: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 16 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

filters

affine functions

Page 82: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 17 / 82

Schedule Representation [30]

Schedule S keeps track of relative execution order of statement instances

ñ for each pair of statement instances i and j, schedule determines§ i executed before j (i ăS j),§ i executed after j (j ăS i), or§ i and j may be executed simultaneously

Schedule trees form a combined hierarchical schedule representationMain constructs:

§ affine schedule: instances are executed according to affine function

§ band: nested sequence of affine functions called its members;combined multi-dimensional affine function is calledthe partial schedule of the band

§ sequence: partitions instances through child filters executed in order

Order of instances determined by outermost node that separates themDeriving schedule tree from AST

§ for loop ñ affine schedule corresponding to loop iterator§ compound statement ñ sequence

Page 83: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 17 / 82

Schedule Representation [30]

Schedule S keeps track of relative execution order of statement instances

ñ for each pair of statement instances i and j, schedule determines§ i executed before j (i ăS j),§ i executed after j (j ăS i), or§ i and j may be executed simultaneously

Schedule trees form a combined hierarchical schedule representationMain constructs:

§ affine schedule: instances are executed according to affine function§ band: nested sequence of affine functions called its members;combined multi-dimensional affine function is calledthe partial schedule of the band

§ sequence: partitions instances through child filters executed in order

Order of instances determined by outermost node that separates themDeriving schedule tree from AST

§ for loop ñ affine schedule corresponding to loop iterator§ compound statement ñ sequence

Page 84: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 18 / 82

Named Presburger Relation Schedules

Schedule tree with single (band) node

Flattening a schedule treetwo nested band nodesñ replace by single band node with concatenated partial schedule

sequence with as children either leaves ortrees consisting of a single band nodeñ treat leaves as zero-dimensional band nodesñ pad lower-dimensional bands (e.g., with zero)ñ construct one-dimensional band assigning increasing values to childrenñ combine one-dimensional band with children

Page 85: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 18 / 82

Named Presburger Relation Schedules

Schedule tree with single (band) node

Flattening a schedule treetwo nested band nodesñ replace by single band node with concatenated partial schedule

sequence with as children either leaves ortrees consisting of a single band nodeñ treat leaves as zero-dimensional band nodesñ pad lower-dimensional bands (e.g., with zero)ñ construct one-dimensional band assigning increasing values to childrenñ combine one-dimensional band with children

Page 86: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 19 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

Page 87: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 19 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

sequence

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

Page 88: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 19 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rjs; S2ri , j , ks Ñ rjs

S1ri , js Ñ r0, 0s; S2ri , j , ks Ñ r1, ks

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

Page 89: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 19 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ris; S2ri , j , ks Ñ ris

S1ri , js Ñ rj , 0, 0s; S2ri , j , ks Ñ rj , 1, ks

S1ri , js Ñ r0, 0s; S2ri , j , ks Ñ r1, ks

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

Page 90: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 19 / 82

Parametric Example: Matrix Multiplicationfor (int i = 0; i < M; i++)

for (int j = 0; j < N; j++) {S1: C[i][j] = 0;

for (int k = 0; k < K; k++)S2: C[i][j] = C[i][j] + A[i][k] * B[k][j];

}

S1ri , js Ñ ri , j , 0, 0s; S2ri , j , ks Ñ ri , j , 1, ks

S1ri , js Ñ rj , 0, 0s; S2ri , j , ks Ñ rj , 1, ks

S1ri , js Ñ r0, 0s; S2ri , j , ks Ñ r1, ks

S1ri , js

S1ri , js Ñ r0s

S2ri , j , ks

S2ri , j , ks Ñ rks

Page 91: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Polyhedral Model May 30, 2017 20 / 82

Loop Transformations and the Polyhedral Model

Loop transformations result indifferent execution order of statement instancesñ different schedule

Polyhedral model can be used toevaluate a schedule and/orconstruct a schedule

Polyhedral schedules can represent (combinations of)loop distributionloop fusionloop tiling. . .

Page 92: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 21 / 82

Schedule Properties

ValidityNew schedule should preserve meaning

ParallelismCan the iterations of a given loop be executed in parallel?LocalityStatement instances scheduled closely to each otherTilabilityCan a given schedule band be tiled?

Page 93: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 94: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 95: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 96: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 97: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 98: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 99: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 100: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 22 / 82

Schedule Validity [3]

New schedule should preserve meaning

R(a) W(a) R(a) W(b) W(a) W(a)

Internal restrictionsNo read of a value may be scheduled before the write of the valueNo other write to same memory location may be scheduled in between

External restrictions (on non-temporaries)No write may be scheduled before initial read from a memory locationNo write may be scheduled after last write to a memory location

Sufficient conditions:Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Page 101: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 23 / 82

Dependences

Sufficient conditions for validity of schedule S :Every read of a memory location is scheduled after every precedingwrite to the same memory locationEvery write to a memory location is scheduled after every precedingread or write to the same memory location

Dependence relation D: pairs of statement instancesaccessing the same memory locationof which at least one is a writewith the first executed before the second in original code

Sufficient condition:@iÑ j P D : i ăS j

Page 102: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 24 / 82

Dependence AnalysisRecall: sufficient conditions for validity of schedule S :

@iÑ j P D : i ăS j

Dependence relation D: pairs of statement instancesaccessing the same memory locationof which at least one is a writewith the first executed before the second in original code

Computation:

D “``

W´1 ˝ R˘

Y`

W´1 ˝W˘

Y`

R´1 ˝W˘˘

X păS0q

W : write access relationR : read access relationS0: original schedule

instances data

order

W , RS0

ăS0

Page 103: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 24 / 82

Dependence AnalysisRecall: sufficient conditions for validity of schedule S :

@iÑ j P D : i ăS j

Dependence relation D: pairs of statement instancesaccessing the same memory locationof which at least one is a writewith the first executed before the second in original code

Computation:

D “``

W´1 ˝ R˘

Y`

W´1 ˝W˘

Y`

R´1 ˝W˘˘

X păS0q

W : write access relationR : read access relationS0: original schedule

instances data

order

W , RS0

ăS0

Page 104: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 25 / 82

Local ValiditySchedule validity:

@iÑ j P D : i ăS j

Consider subset of local dependences LAt outermost node: L “ D

Current nodeband node with partial schedule f

@iÑ j P L : f piqďlex f pjq

Carried dependences: iÑ j P L : f piq ‰ f pjqñ no longer need to be considered in nested nodesRemaining dependences: L1 “ t iÑ j P L : f piq “ f pjq u

sequence node with child position p and filters Fk

@iÑ j P L : ppiq ď ppjq

Carried dependences: iÑ j P L : ppiq ‰ ppjqRemaining dependences in child c : L1 “ t iÑ j P L : i, j P Fc uleaf node: L “ H

Page 105: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 25 / 82

Local ValiditySchedule validity:

@iÑ j P D : i ăS j

Consider subset of local dependences LAt outermost node: L “ DCurrent node

band node with partial schedule f

@iÑ j P L : f piqďlex f pjq

Carried dependences: iÑ j P L : f piq ‰ f pjqñ no longer need to be considered in nested nodesRemaining dependences: L1 “ t iÑ j P L : f piq “ f pjq u

sequence node with child position p and filters Fk

@iÑ j P L : ppiq ď ppjq

Carried dependences: iÑ j P L : ppiq ‰ ppjqRemaining dependences in child c : L1 “ t iÑ j P L : i, j P Fc uleaf node: L “ H

Page 106: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 25 / 82

Local ValiditySchedule validity:

@iÑ j P D : i ăS j

Consider subset of local dependences LAt outermost node: L “ DCurrent node

band node with partial schedule f

@iÑ j P L : f piqďlex f pjq

Carried dependences: iÑ j P L : f piq ‰ f pjqñ no longer need to be considered in nested nodesRemaining dependences: L1 “ t iÑ j P L : f piq “ f pjq usequence node with child position p and filters Fk

@iÑ j P L : ppiq ď ppjq

Carried dependences: iÑ j P L : ppiq ‰ ppjqRemaining dependences in child c : L1 “ t iÑ j P L : i, j P Fc u

leaf node: L “ H

Page 107: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 25 / 82

Local ValiditySchedule validity:

@iÑ j P D : i ăS j

Consider subset of local dependences LAt outermost node: L “ DCurrent node

band node with partial schedule f

@iÑ j P L : f piqďlex f pjq

Carried dependences: iÑ j P L : f piq ‰ f pjqñ no longer need to be considered in nested nodesRemaining dependences: L1 “ t iÑ j P L : f piq “ f pjq usequence node with child position p and filters Fk

@iÑ j P L : ppiq ď ppjq

Carried dependences: iÑ j P L : ppiq ‰ ppjqRemaining dependences in child c : L1 “ t iÑ j P L : i, j P Fc uleaf node: L “ H

Page 108: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 109: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 110: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

t Sris Ñ ris; Tris Ñ ris usatisfied: t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 ucarried: t Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 ucarried: t Sris Ñ Tris : 1 ď i ă 100 uLoop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 111: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

t Sris Ñ ris; Tris Ñ ris usatisfied: t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 ucarried: t Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 ut Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 ucarried: t Sris Ñ Tris : 1 ď i ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 112: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 113: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 ucarried: t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Page 114: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:t Sris Ñ Tris : 1 ď i ă 100; u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 115: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i + 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:t Sris Ñ Tris : 1 ď i ă 100; u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 116: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i + 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Tris Ñ Sri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 117: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i + 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Tris Ñ Sri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 u

Page 118: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 26 / 82

Loop Distribution Validityfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i + 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Tris Ñ Sri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i + 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris u, t Tris usatisfied: t Sris Ñ Tris : 1 ď i ă 100 uviolated: t Tris Ñ Sri ` 1s : 1 ď i , i ` 1 ă 100 u

Page 119: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 27 / 82

Schedule Properties

ValidityNew schedule should preserve meaning

ParallelismCan the iterations of a given loop be executed in parallel?LocalityStatement instances scheduled closely to each otherTilabilityCan a given schedule band be tiled?

Page 120: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 27 / 82

Schedule Properties

ValidityNew schedule should preserve meaningParallelismCan the iterations of a given loop be executed in parallel?

LocalityStatement instances scheduled closely to each otherTilabilityCan a given schedule band be tiled?

Page 121: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 28 / 82

Parallel Loops and Parallel Band Members

Recall:Iterations of a given loop can be executed in parallel ifwrites of iteration do not conflict with reads/writes of other iteration

iff there is no dependence between distinct iterations(for any given iteration of the outer loops)

A band member with affine function f is parallel if

@iÑ j P L : f piq “ f pjq

with L the local dependences

Page 122: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 28 / 82

Parallel Loops and Parallel Band Members

Recall:Iterations of a given loop can be executed in parallel ifwrites of iteration do not conflict with reads/writes of other iterationiff there is no dependence between distinct iterations

(for any given iteration of the outer loops)

A band member with affine function f is parallel if

@iÑ j P L : f piq “ f pjq

with L the local dependences

Page 123: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 28 / 82

Parallel Loops and Parallel Band Members

Recall:Iterations of a given loop can be executed in parallel ifwrites of iteration do not conflict with reads/writes of other iterationiff there is no dependence between distinct iterations(for any given iteration of the outer loops)

A band member with affine function f is parallel if

@iÑ j P L : f piq “ f pjq

with L the local dependences

Page 124: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 28 / 82

Parallel Loops and Parallel Band Members

Recall:Iterations of a given loop can be executed in parallel ifwrites of iteration do not conflict with reads/writes of other iterationiff there is no dependence between distinct iterations(for any given iteration of the outer loops)

A band member with affine function f is parallel if

@iÑ j P L : f piq “ f pjq

with L the local dependences

Page 125: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 29 / 82

Loop Distribution and Parallelismfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris Ñ ris ulocal: Hconflict: Hñ parallel

t Tris Ñ ris ulocal: Hconflict: Hñ parallel

Page 126: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 29 / 82

Loop Distribution and Parallelismfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

t Sris Ñ ris; Tris Ñ ris ulocal: t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 uconflict: t Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 uñ not parallel

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris Ñ ris ulocal: Hconflict: Hñ parallel

t Tris Ñ ris ulocal: Hconflict: Hñ parallel

Page 127: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 29 / 82

Loop Distribution and Parallelismfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris Ñ ris ulocal: Hconflict: Hñ parallel

t Tris Ñ ris ulocal: Hconflict: Hñ parallel

Page 128: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 29 / 82

Loop Distribution and Parallelismfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris Ñ ris ulocal: Hconflict: Hñ parallel

t Tris Ñ ris ulocal: Hconflict: Hñ parallel

Page 129: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 29 / 82

Loop Distribution and Parallelismfor (int i = 1; i < 100; ++i) {S: A[i] = f(i);T: B[i] = A[i] + A[i - 1];}

t Sris Ñ ris; Tris Ñ ris u

t Sris u, t Tris u

Dependences:

t Sris Ñ Tris : 1 ď i ă 100; Sris Ñ Tri ` 1s : 1 ď i , i ` 1 ă 100 u

Loop distribution

for (int i = 1; i < 100; ++i)A[i] = f(i);

for (int i = 1; i < 100; ++i)B[i] = A[i] + A[i - 1];

t Sris u, t Tris u

t Sris Ñ ris ut Tris Ñ ris u

t Sris Ñ ris ulocal: Hconflict: Hñ parallel

t Tris Ñ ris ulocal: Hconflict: Hñ parallel

Page 130: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 30 / 82

Parallelism Examplefor (int i = 1; i < 6; ++i)

for (int j = 0; j < 6; ++j)S: A[i][j] = f(A[i - 1][[j + 1]);

Dependences:

t Sri , js Ñ Sri ` 1, j ´ 1s : 1 ď i , i ` 1 ă 6^ 0 ď j , j ´ 1 ă 6 u

i

j

1

2

12 original schedule:Sri , js Ñ ri , jsnew schedule:Sri , js Ñ ri ` j , ispi ` jq-direction is outer parallel

Decomposition: loop skewing + loop interchange

ri , js ri , i ` js ri ` j , is

Page 131: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 30 / 82

Parallelism Examplefor (int i = 1; i < 6; ++i)

for (int j = 0; j < 6; ++j)S: A[i][j] = f(A[i - 1][[j + 1]);

Dependences:

t Sri , js Ñ Sri ` 1, j ´ 1s : 1 ď i , i ` 1 ă 6^ 0 ď j , j ´ 1 ă 6 u

i

j

1

2

12 original schedule:Sri , js Ñ ri , jsnew schedule:Sri , js Ñ ri ` j , ispi ` jq-direction is outer parallel

Decomposition: loop skewing + loop interchange

ri , js ri , i ` js ri ` j , is

Page 132: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 30 / 82

Parallelism Examplefor (int i = 1; i < 6; ++i)

for (int j = 0; j < 6; ++j)S: A[i][j] = f(A[i - 1][[j + 1]);

Dependences:

t Sri , js Ñ Sri ` 1, j ´ 1s : 1 ď i , i ` 1 ă 6^ 0 ď j , j ´ 1 ă 6 u

i

j

1

2

12 original schedule:Sri , js Ñ ri , jsnew schedule:Sri , js Ñ ri ` j , ispi ` jq-direction is outer parallel

Decomposition: loop skewing + loop interchange

ri , js ri , i ` js ri ` j , is

Page 133: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 30 / 82

Parallelism Examplefor (int i = 1; i < 6; ++i)

for (int j = 0; j < 6; ++j)S: A[i][j] = f(A[i - 1][[j + 1]);

Dependences:

t Sri , js Ñ Sri ` 1, j ´ 1s : 1 ď i , i ` 1 ă 6^ 0 ď j , j ´ 1 ă 6 u

i

j

1

2

12 original schedule:Sri , js Ñ ri , jsnew schedule:Sri , js Ñ ri ` j , ispi ` jq-direction is outer parallel

Decomposition: loop skewing + loop interchange

ri , js ri , i ` js ri ` j , is

Page 134: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 31 / 82

Schedule Properties

ValidityNew schedule should preserve meaningParallelismCan the iterations of a given loop be executed in parallel?

LocalityStatement instances scheduled closely to each otherTilabilityCan a given schedule band be tiled?

Page 135: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 31 / 82

Schedule Properties

ValidityNew schedule should preserve meaningParallelismCan the iterations of a given loop be executed in parallel?LocalityStatement instances scheduled closely to each other

TilabilityCan a given schedule band be tiled?

Page 136: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 32 / 82

LocalityStatement instances i and j that reuse memoryñ scheduled closely to each other: f pjq ´ f piq small

Types of locality:temporal localityñ instances that access the same memory elementspatial localityñ instances that access adjacent memory elements

Sometimes further distinction made:self localityñ pair of instances from same statementgroup localityñ any pair of statement instances

Temporal locality often restricted topairs of writes and reads that refer to the same valueñ dataflow

Page 137: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 32 / 82

LocalityStatement instances i and j that reuse memoryñ scheduled closely to each other: f pjq ´ f piq small

Types of locality:temporal localityñ instances that access the same memory elementspatial localityñ instances that access adjacent memory elements

Sometimes further distinction made:self localityñ pair of instances from same statementgroup localityñ any pair of statement instances

Temporal locality often restricted topairs of writes and reads that refer to the same valueñ dataflow

Page 138: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 32 / 82

LocalityStatement instances i and j that reuse memoryñ scheduled closely to each other: f pjq ´ f piq small

Types of locality:temporal localityñ instances that access the same memory elementspatial localityñ instances that access adjacent memory elements

Sometimes further distinction made:self localityñ pair of instances from same statementgroup localityñ any pair of statement instances

Temporal locality often restricted topairs of writes and reads that refer to the same valueñ dataflow

Page 139: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 32 / 82

LocalityStatement instances i and j that reuse memoryñ scheduled closely to each other: f pjq ´ f piq small

Types of locality:temporal localityñ instances that access the same memory elementspatial localityñ instances that access adjacent memory elements

Sometimes further distinction made:self localityñ pair of instances from same statementgroup localityñ any pair of statement instances

Temporal locality often restricted topairs of writes and reads that refer to the same valueñ dataflow

Page 140: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N uLast preceding write: R “ maxăS

R 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 141: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N uLast preceding write: R “ maxăS

R 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 142: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N uLast preceding write: R “ maxăS

R 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 143: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N uLast preceding write: R “ maxăS

R 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 144: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Last preceding write: R “ maxăSR 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 145: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 33 / 82

Array Dataflow Analysis [14]

Given a read from an array element, what was the last write tothe same array element before the read?

for (i = 0; i < N; ++i)for (j = 0; j < N - i; ++j)

F: a[i+j] = f(a[i+j]);for (i = 0; i < N; ++i)G: g(a[i]);

F

G

a

A1

A2

Access relations:A1 “ t Fri , js Ñ ari ` js : 0 ď i ă N ^ 0 ď j ă N ´ i uA2 “ t Gris Ñ aris : 0 ď i ă N u

Map to all writes: R2 “ A1´1 ˝A2 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N u

Map to all preceding writes:R 1 “ R2 X păSq

´1 “ t Gris Ñ Fri 1, i ´ i 1s : 0 ď i 1 ď i ă N uLast preceding write: R “ maxăS

R 1 “ t Gris Ñ Fri , 0s : 0 ď i ă N u

Page 146: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 34 / 82

Schedule Properties

ValidityNew schedule should preserve meaningParallelismCan the iterations of a given loop be executed in parallel?LocalityStatement instances scheduled closely to each other

TilabilityCan a given schedule band be tiled?

Page 147: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 34 / 82

Schedule Properties

ValidityNew schedule should preserve meaningParallelismCan the iterations of a given loop be executed in parallel?LocalityStatement instances scheduled closely to each otherTilabilityCan a given schedule band be tiled?

Page 148: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 35 / 82

Tiling a BandInput:

band of affine schedule functions

f1, f2, . . . , fn

tile sizesT1,T2, . . . ,Tn

Steps (conceptually)1 divide each direction into chunks of size Ti (strip-mining)

tf1{T1u , f1, tf2{T2u , f2, . . . , tfn{Tnu , fn

does not change execution order ñ always valid2 combine the chunking (interchange)

tf1{T1u , tf2{T2u , . . . , tfn{Tnu , f1, f2, . . . , fn

sufficient condition for interchange:all members are valid for local dependences at (top of) bandñ permutable band

Page 149: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 35 / 82

Tiling a BandInput:

band of affine schedule functions

f1, f2, . . . , fn

tile sizesT1,T2, . . . ,Tn

Steps (conceptually)1 divide each direction into chunks of size Ti (strip-mining)

tf1{T1u , f1, tf2{T2u , f2, . . . , tfn{Tnu , fn

does not change execution order ñ always valid2 combine the chunking (interchange)

tf1{T1u , tf2{T2u , . . . , tfn{Tnu , f1, f2, . . . , fn

sufficient condition for interchange:all members are valid for local dependences at (top of) bandñ permutable band

Page 150: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 35 / 82

Tiling a BandInput:

band of affine schedule functions

f1, f2, . . . , fn

tile sizesT1,T2, . . . ,Tn

Steps (conceptually)1 divide each direction into chunks of size Ti (strip-mining)

tf1{T1u , f1, tf2{T2u , f2, . . . , tfn{Tnu , fn

does not change execution order ñ always valid

2 combine the chunking (interchange)

tf1{T1u , tf2{T2u , . . . , tfn{Tnu , f1, f2, . . . , fn

sufficient condition for interchange:all members are valid for local dependences at (top of) bandñ permutable band

Page 151: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 35 / 82

Tiling a BandInput:

band of affine schedule functions

f1, f2, . . . , fn

tile sizesT1,T2, . . . ,Tn

Steps (conceptually)1 divide each direction into chunks of size Ti (strip-mining)

tf1{T1u , f1, tf2{T2u , f2, . . . , tfn{Tnu , fn

does not change execution order ñ always valid2 combine the chunking (interchange)

tf1{T1u , tf2{T2u , . . . , tfn{Tnu , f1, f2, . . . , fn

sufficient condition for interchange:all members are valid for local dependences at (top of) bandñ permutable band

Page 152: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 35 / 82

Tiling a BandInput:

band of affine schedule functions

f1, f2, . . . , fn

tile sizesT1,T2, . . . ,Tn

Steps (conceptually)1 divide each direction into chunks of size Ti (strip-mining)

tf1{T1u , f1, tf2{T2u , f2, . . . , tfn{Tnu , fn

does not change execution order ñ always valid2 combine the chunking (interchange)

tf1{T1u , tf2{T2u , . . . , tfn{Tnu , f1, f2, . . . , fn

sufficient condition for interchange:all members are valid for local dependences at (top of) bandñ permutable band

Page 153: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 36 / 82

Loop Tiling Examplefor (int i = 0; i < 8; ++i)

for (int j = 0; j < 8; ++j)S: C[i][j] = A[i] * B[j];

1 strip-mine2 interchange

Sri , js Ñ iSri , js Ñ j

Page 154: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 36 / 82

Loop Tiling Examplefor (int i = 0; i < 8; ++i)

for (int j = 0; j < 8; ++j)S: C[i][j] = A[i] * B[j];

1 strip-mine2 interchange

Sri , js Ñ iSri , js Ñ j

Page 155: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 36 / 82

Loop Tiling Examplefor (int i = 0; i < 8; ++i)

for (int j = 0; j < 8; ++j)S: C[i][j] = A[i] * B[j];

1 strip-mine

2 interchange

Sri , js Ñ 4 ti{4u

Sri , js Ñ iSri , js Ñ 4 tj{4u

Sri , js Ñ j

for (int ti = 0; ti < 8; ti += 4)for (int i = ti; i < ti + 4; ++i)

for (int tj = 0; tj < 8; tj += 4)for (int j = tj; j < tj + 4; ++j)

C[i][j] = A[i] * B[j];

Page 156: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Schedules May 30, 2017 36 / 82

Loop Tiling Examplefor (int i = 0; i < 8; ++i)

for (int j = 0; j < 8; ++j)S: C[i][j] = A[i] * B[j];

1 strip-mine2 interchange

Sri , js Ñ 4 ti{4u

Sri , js Ñ 4 tj{4u

Sri , js Ñ iSri , js Ñ j

for (int ti = 0; ti < 8; ti += 4)for (int tj = 0; tj < 8; tj += 4)

for (int i = ti; i < ti + 4; ++i)for (int j = tj; j < tj + 4; ++j)

C[i][j] = A[i] * B[j];

Page 157: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 37 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 158: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 38 / 82

Polyhedral Model Requirements

Requirements for basic polyhedral model: “regular” codeStatic controlñ control does not depend on input dataAffineñ all relevant expressions are (quasi-)affineNo Aliasingñ essentially no pointer manipulations

Note:polyhedral model may be approximation of input that does not strictlysatisfy all requirementsmany extensions are available

Page 159: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 38 / 82

Polyhedral Model Requirements

Requirements for basic polyhedral model: “regular” codeStatic controlñ control does not depend on input dataAffineñ all relevant expressions are (quasi-)affineNo Aliasingñ essentially no pointer manipulations

Note:polyhedral model may be approximation of input that does not strictlysatisfy all requirementsmany extensions are available

Page 160: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 39 / 82

Aliasing [1]

Some possible ways of handling aliasing:

use an input language that does not permit aliasingpretend the problem does not existrequire user to ensure absence of aliasingñ e.g., use restrict keywordhandle as may-writeñ may lead to too many dependencescheck aliasing at run-timeñ use original code in case of aliasing

Page 161: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 40 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 162: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 40 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 163: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 40 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 164: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 41 / 82

Polyhedral Scheduling [10, 15]

Polyhedral model can be used toevaluate a schedule and/orconstruct a schedule

Some popular polyhedral schedulers:Feautrier

§ maximal inner parallelismñ carry as many dependences as possible at outer bands

Pluto§ tilable bands§ locality: f pjq ´ f piq smallñ parallelism as extreme case: f pjq ´ f piq “ 0

Many other scheduling algorithms have been proposed

Page 165: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 41 / 82

Polyhedral Scheduling [10, 15]

Polyhedral model can be used toevaluate a schedule and/orconstruct a schedule

Some popular polyhedral schedulers:Feautrier

§ maximal inner parallelismñ carry as many dependences as possible at outer bands

Pluto§ tilable bands§ locality: f pjq ´ f piq smallñ parallelism as extreme case: f pjq ´ f piq “ 0

Many other scheduling algorithms have been proposed

Page 166: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 42 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 167: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 42 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 168: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 42 / 82

Operations on Polyhedral Model

Model Extraction§ Input: AST§ Output: instance set, access relations, original schedule

Dependence analysis§ Input: instance set, access relations, original schedule§ Output: dependence relations

Scheduling§ Input: instance set, dependence relations§ Output: schedule

AST generation (polyhedral scanning, code generation)§ Input: instance set, schedule§ Output: AST

Data layout transformations§ Input: access relations, dependence relations§ Output: transformed access relations

Page 169: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 43 / 82

Data layout transformations [12, 13]

Memory compactionReuse memory locations to store different data

ñ apply non-injective mapping to array elementsñ reduce memory requirementsñ extreme case: replace array by scalar

for (int i = 0; i < 100; ++i) {A[i] = f(i);B[i] = g(A[i]);

}

ExpansionUse different memory locations to store different data

ñ map different accesses to memory element to distinct locationsñ increase scheduling freedom (e.g., more parallelism)

Page 170: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 43 / 82

Data layout transformations [12, 13]

Memory compactionReuse memory locations to store different data

ñ apply non-injective mapping to array elementsñ reduce memory requirementsñ extreme case: replace array by scalar

for (int i = 0; i < 100; ++i) {A[i] = f(i);B[i] = g(A[i]);

}

ExpansionUse different memory locations to store different data

ñ map different accesses to memory element to distinct locationsñ increase scheduling freedom (e.g., more parallelism)

Page 171: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 43 / 82

Data layout transformations [12, 13]

Memory compactionReuse memory locations to store different data

ñ apply non-injective mapping to array elementsñ reduce memory requirementsñ extreme case: replace array by scalar

for (int i = 0; i < 100; ++i) {A

[i]

= f(i);B[i] = g(A

[i]

);}

ExpansionUse different memory locations to store different data

ñ map different accesses to memory element to distinct locationsñ increase scheduling freedom (e.g., more parallelism)

Page 172: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 43 / 82

Data layout transformations [12, 13]

Memory compactionReuse memory locations to store different data

ñ apply non-injective mapping to array elementsñ reduce memory requirementsñ extreme case: replace array by scalar

for (int i = 0; i < 100; ++i) {A

[i]

= f(i);B[i] = g(A

[i]

);}

ExpansionUse different memory locations to store different data

ñ map different accesses to memory element to distinct locationsñ increase scheduling freedom (e.g., more parallelism)

Page 173: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 44 / 82

False Dependencesfor (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}Dependences

read-after-write (“true”): t Sris Ñ Tri 1s : i 1 ě i u

write-after-read (“anti”): t Tris Ñ Sri 1s : i 1 ą i u

write-after-write (“output”): t Sris Ñ Sri 1s : i 1 ą i u

“false”

False dependences not from dataflow, but from reuse of memory location t

Possible solution: expansion/privatizationfor (int i = 0; i < n; ++i) {S: t[i] = f1(A[i]);T: B[i] = f2(t[i]);}

dataflow (subset of “true” dependences): t Sris Ñ Tris u

Page 174: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 44 / 82

False Dependencesfor (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}Dependences

read-after-write (“true”): t Sris Ñ Tri 1s : i 1 ě i u

write-after-read (“anti”): t Tris Ñ Sri 1s : i 1 ą i u

write-after-write (“output”): t Sris Ñ Sri 1s : i 1 ą i u“false”

False dependences not from dataflow, but from reuse of memory location t

Possible solution: expansion/privatizationfor (int i = 0; i < n; ++i) {S: t[i] = f1(A[i]);T: B[i] = f2(t[i]);}

dataflow (subset of “true” dependences): t Sris Ñ Tris u

Page 175: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 44 / 82

False Dependencesfor (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}Dependences

read-after-write (“true”): t Sris Ñ Tri 1s : i 1 ě i u

write-after-read (“anti”): t Tris Ñ Sri 1s : i 1 ą i u

write-after-write (“output”): t Sris Ñ Sri 1s : i 1 ą i u“false”

False dependences not from dataflow, but from reuse of memory location t

Possible solution: expansion/privatizationfor (int i = 0; i < n; ++i) {S: t[i] = f1(A[i]);T: B[i] = f2(t[i]);}

dataflow (subset of “true” dependences): t Sris Ñ Tris u

Page 176: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 45 / 82

ExpansionAssume:

instance sets and access relations are static and exactñ each read has exactly one corresponding writesingle read and write per statementñ expanded array indexed by statement instance of write

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

Dataflow: t Sris Ñ Tris u

for (int i = 0; i < n; ++i) {S: S[i] = f1(A[i]);T: B[i] = f2(S[i]);}

ñ only remaining dependences are dataflow induced

Page 177: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 45 / 82

ExpansionAssume:

instance sets and access relations are static and exactñ each read has exactly one corresponding writesingle read and write per statementñ expanded array indexed by statement instance of write

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

Dataflow: t Sris Ñ Tris u

for (int i = 0; i < n; ++i) {S: S[i] = f1(A[i]);T: B[i] = f2(S[i]);}

ñ only remaining dependences are dataflow induced

Page 178: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 45 / 82

ExpansionAssume:

instance sets and access relations are static and exactñ each read has exactly one corresponding writesingle read and write per statementñ expanded array indexed by statement instance of write

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

Dataflow: t Sris Ñ Tris u

for (int i = 0; i < n; ++i) {S: S[i] = f1(A[i]);T: B[i] = f2(S[i]);}

ñ only remaining dependences are dataflow induced

Page 179: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 46 / 82

Maximal Static Expansion [5]

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

t1[i] = f1(i);A[i] = t1[i];t2[i] = f2(i);if (f3(i))

t2[i] = f4(i);B[i] = t2[i];

Dataflow cannot be determined independently of run-time information

ñ approximate dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

ñ a read may be associated to more than one writeñ corresponding equivalence classes should not be expanded apart

Page 180: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 46 / 82

Maximal Static Expansion [5]

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

t1[i] = f1(i);A[i] = t1[i];t2[i] = f2(i);if (f3(i))

t2[i] = f4(i);B[i] = t2[i];

Dataflow cannot be determined independently of run-time information

ñ approximate dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

ñ a read may be associated to more than one writeñ corresponding equivalence classes should not be expanded apart

Page 181: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 46 / 82

Maximal Static Expansion [5]

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

t1[i] = f1(i);A[i] = t1[i];t2[i] = f2(i);if (f3(i))

t2[i] = f4(i);B[i] = t2[i];

Dataflow cannot be determined independently of run-time information

ñ approximate dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

ñ a read may be associated to more than one writeñ corresponding equivalence classes should not be expanded apart

Page 182: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 46 / 82

Maximal Static Expansion [5]

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

t1[i] = f1(i);A[i] = t1[i];t2[i] = f2(i);if (f3(i))

t2[i] = f4(i);B[i] = t2[i];

Dataflow cannot be determined independently of run-time information

ñ approximate dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

ñ a read may be associated to more than one writeñ corresponding equivalence classes should not be expanded apart

Page 183: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 47 / 82

Approximate Dataflow Analysis

How to compute dataflow in presence of data dependent control?

Two approachesDirect computation

§ distinguish between may- and must-writes

Derived from exact run-time dependent dataflow§ compute exact dataflow in terms of run-time information§ exploit properties of run-time information§ project out run-time information

Page 184: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 47 / 82

Approximate Dataflow Analysis

How to compute dataflow in presence of data dependent control?

Two approachesDirect computation

§ distinguish between may- and must-writesDerived from exact run-time dependent dataflow

§ compute exact dataflow in terms of run-time information§ exploit properties of run-time information§ project out run-time information

Page 185: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 47 / 82

Approximate Dataflow Analysis

How to compute dataflow in presence of data dependent control?

Two approachesDirect computation

§ distinguish between may- and must-writesDerived from exact run-time dependent dataflow

§ compute exact dataflow in terms of run-time information§ exploit properties of run-time information§ project out run-time information

Page 186: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 48 / 82

May WritesKeep track of whether write is possible or definite

Must-writesArray elements are definitely written by statement instanceMay-writesArray elements are possibly written by statement instance

§ statement instance not necessarily executedfor (i = 0; i < n; ++i)

if (A[i] > 0)S: B[i] = A[i];May-write: t Sris Ñ Bris u

§ array element not necessarily accessedint A[N];/* ... */T: A[B[0]] = 5;May-write: t Trs Ñ Aras : 0 ď a ă N u

Must-write access relation is subset of may-write access relation

Page 187: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 48 / 82

May WritesKeep track of whether write is possible or definite

Must-writesArray elements are definitely written by statement instanceMay-writesArray elements are possibly written by statement instance

§ statement instance not necessarily executedfor (i = 0; i < n; ++i)

if (A[i] > 0)S: B[i] = A[i];May-write: t Sris Ñ Bris u

§ array element not necessarily accessedint A[N];/* ... */T: A[B[0]] = 5;May-write: t Trs Ñ Aras : 0 ď a ă N u

Must-write access relation is subset of may-write access relation

Page 188: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 48 / 82

May WritesKeep track of whether write is possible or definite

Must-writesArray elements are definitely written by statement instanceMay-writesArray elements are possibly written by statement instance

§ statement instance not necessarily executedfor (i = 0; i < n; ++i)

if (A[i] > 0)S: B[i] = A[i];May-write: t Sris Ñ Bris u

§ array element not necessarily accessedint A[N];/* ... */T: A[B[0]] = 5;May-write: t Trs Ñ Aras : 0 ď a ă N u

Must-write access relation is subset of may-write access relation

Page 189: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 49 / 82

Approximate Dataflow — Direct Computation

Read-after-write dependences§ write and read access same memory location§ write executed before the read

ñ Approximate dataflow analysis with no must-writesDataflow dependences

§ write and read access same memory location§ write executed before the read§ no intermediate write to same memory locationñ intermediate write kills dependence

Approximate dataflow dependences§ may-write and read access same memory location§ may-write executed before the read§ no intermediate must-write to same memory locationñ intermediate must-write kills dependence

Page 190: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 49 / 82

Approximate Dataflow — Direct Computation

Read-after-write dependences§ write and read access same memory location§ write executed before the read

ñ Approximate dataflow analysis with no must-writes

Dataflow dependences§ write and read access same memory location§ write executed before the read§ no intermediate write to same memory locationñ intermediate write kills dependence

Approximate dataflow dependences§ may-write and read access same memory location§ may-write executed before the read§ no intermediate must-write to same memory locationñ intermediate must-write kills dependence

Page 191: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 49 / 82

Approximate Dataflow — Direct Computation

Read-after-write dependences§ write and read access same memory location§ write executed before the read

ñ Approximate dataflow analysis with no must-writes

Dataflow dependences§ write and read access same memory location§ write executed before the read§ no intermediate write to same memory locationñ intermediate write kills dependence

Approximate dataflow dependences§ may-write and read access same memory location§ may-write executed before the read§ no intermediate must-write to same memory locationñ intermediate must-write kills dependence

Page 192: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 49 / 82

Approximate Dataflow — Direct Computation

Read-after-write dependences§ write and read access same memory location§ write executed before the read

ñ Approximate dataflow analysis with no must-writesDataflow dependences

§ write and read access same memory location§ write executed before the read§ no intermediate write to same memory locationñ intermediate write kills dependence

Approximate dataflow dependences§ may-write and read access same memory location§ may-write executed before the read§ no intermediate must-write to same memory locationñ intermediate must-write kills dependence

Page 193: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 50 / 82

Approximate Dataflow Analysis

How to compute dataflow in presence of data dependent control?

Two approachesDirect computation

§ distinguish between may- and must-writesDerived from exact run-time dependent dataflow

§ compute exact dataflow in terms of run-time information§ exploit properties of run-time information§ project out run-time information

Page 194: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 50 / 82

Approximate Dataflow Analysis

How to compute dataflow in presence of data dependent control?

Two approachesDirect computation

§ distinguish between may- and must-writesDerived from exact run-time dependent dataflow

§ compute exact dataflow in terms of run-time information§ exploit properties of run-time information§ project out run-time information

Page 195: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 51 / 82

Run-time Dependent Dataflow Analysis [6, 32]

Approaches“fuzzy array dataflow analysis”“on-demand-parametric array dataflow analysis”

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris : βS5S6 “ 0; S5ris Ñ S6ris : βS5S6 “ 1 uβPC : any potential source instance P is executed for sink C

λPC : last potential source instance P executed for sink C

Approximate dataflow (project out β and λ)t S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

Page 196: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 51 / 82

Run-time Dependent Dataflow Analysis [6, 32]

Approaches“fuzzy array dataflow analysis”“on-demand-parametric array dataflow analysis”

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris : βS5S6 “ 0; S5ris Ñ S6ris : βS5S6 “ 1 uβPC : any potential source instance P is executed for sink C

λPC : last potential source instance P executed for sink C

Approximate dataflow (project out β and λ)t S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

Page 197: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 51 / 82

Run-time Dependent Dataflow Analysis [6, 32]

Approaches“fuzzy array dataflow analysis”“on-demand-parametric array dataflow analysis”

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris : βS5S6 “ 0; S5ris Ñ S6ris : βS5S6 “ 1 uβPC : any potential source instance P is executed for sink C

λPC : last potential source instance P executed for sink C

Approximate dataflow (project out β and λ)t S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

Page 198: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 51 / 82

Run-time Dependent Dataflow Analysis [6, 32]

Approaches“fuzzy array dataflow analysis”“on-demand-parametric array dataflow analysis”

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris : βS5S6 “ 0; S5ris Ñ S6ris : βS5S6 “ 1 uβPC : any potential source instance P is executed for sink C

λPC : last potential source instance P executed for sink C

Approximate dataflow (project out β and λ)t S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

Page 199: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 51 / 82

Run-time Dependent Dataflow Analysis [6, 32]

Approaches“fuzzy array dataflow analysis”“on-demand-parametric array dataflow analysis”

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflowt S1ris Ñ S2ris; S3ris Ñ S6ris : βS5S6 “ 0; S5ris Ñ S6ris : βS5S6 “ 1 uβPC : any potential source instance P is executed for sink C

λPC : last potential source instance P executed for sink C

Approximate dataflow (project out β and λ)t S1ris Ñ S2ris; S3ris Ñ S6ris; S5ris Ñ S6ris u

Page 200: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?

t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?ñ no, m and n cannot be treated as symbolic constants

(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?ñ requires uninterpreted functions (of arity ą 0)Alternative: use overapproximation of instance set and keep track of

which elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 201: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?

ñ no, m and n cannot be treated as symbolic constants(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?ñ requires uninterpreted functions (of arity ą 0)Alternative: use overapproximation of instance set and keep track of

which elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 202: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?ñ no, m and n cannot be treated as symbolic constants

(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?ñ requires uninterpreted functions (of arity ą 0)Alternative: use overapproximation of instance set and keep track of

which elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 203: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?ñ no, m and n cannot be treated as symbolic constants

(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?

ñ requires uninterpreted functions (of arity ą 0)Alternative: use overapproximation of instance set and keep track of

which elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 204: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?ñ no, m and n cannot be treated as symbolic constants

(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?ñ requires uninterpreted functions (of arity ą 0)

Alternative: use overapproximation of instance set and keep track ofwhich elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 205: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}What is instance set (restricted to A statement)?t Ark, i , js : 0 ď k ă 100^ 0 ď i ă m^ 0 ď j ă n u?ñ no, m and n cannot be treated as symbolic constants

(they are modified inside k-loop)

tArk, i , js : 0 ď k ă 100̂ 0 ď i ă valueOf_mpk q̂ 0 ď j ă valueOf_npkqu?ñ requires uninterpreted functions (of arity ą 0)Alternative: use overapproximation of instance set and keep track of

which elements are executed

Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark, i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 206: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark , i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 207: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 52 / 82

Representing Dynamic ConditionsN1: n = f();

for (int k = 0; k < 100; ++k) {M: m = g();

for (int i = 0; i < m; ++i)for (int j = 0; j < n; ++j)

A: a[j][i] = g();N2: n = f();

}Instance set: t Ark , i , js : 0 ď k ă 100^ 0 ď i ^ 0 ď j uFilter:

§ Filter access relations: reader Ñ [writer Ñ array element]‹ F A

1 “

Ark, i , js Ñ rMrks Ñ mrss(

‹ F A2 “

Ar0, i , js Ñ rN1rs Ñ nrss; Ark, i , js Ñ rN2rk ´ 1s Ñ nrsq : k ě 1(

§ Filter value relation:V A “ t Ark , i , js Ñ rm, ns : 0 ď k ď 99^ 0 ď i ă m ^ 0 ď j ă n u

Statement instance is executed iff values written by corresponding writeaccesses (through filter access relations) satisfy filter value relation

Page 208: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?

M “ t Tris Ñ Hris uF H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 209: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?

M “ t Tris Ñ Hris uF H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 210: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?M “ t Tris Ñ Hris u

F H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 211: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?M “ t Tris Ñ Hris uF H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 212: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?M “ t Tris Ñ Hris uF H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 213: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 53 / 82

Parametric Array Dataflow Analysiswhile (1) {N: n = f();

a = g();if (n < 100)

H: a = h();if (n > 200)

T: t(a);}

I “ t Hris : i ě 0; Tris : i ě 0 uF H “ t Hris Ñ rNris Ñ nrss u

V H “ t Hris Ñ rns : i ě 0^ n ă 100 uF T “ t Tris Ñ rNris Ñ nrss u

V T “ t Tris Ñ rns : i ě 0^ n ą 200 u

Is there any dataflow between potential source and sink at inner level?M “ t Tris Ñ Hris uF H ˝M Ď F T

ñ filter elements accessed by any potential source instance associated tosink instance forms subset of filter elements accessed by sink instance

ñ constraints on filter values at sink also apply at corresponding potentialsource: V T ˝M´1 “ t Hris Ñ rns : i ě 0^ n ą 200 u

`

V T ˝M´1˘

X V H “ Hñ there can be no dataflow at inner level

potential source

sink

Page 214: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 54 / 82

Polyhedral Process Networks [24]

Main purpose: extract task level parallelism from dataflow graph

statement Ñ processflow dependence Ñ communication channel

ñ requires dataflow analysisProcesses are mapped to parallel hardware (e.g., FPGA)

Example:

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

for (int i = 0; i < n; ++i)write(fifo , f1(A[i]));

for (int i = 0; i < n; ++i)B[i] = f2(read(fifo ));

Page 215: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 54 / 82

Polyhedral Process Networks [24]

Main purpose: extract task level parallelism from dataflow graph

statement Ñ processflow dependence Ñ communication channel

ñ requires dataflow analysisProcesses are mapped to parallel hardware (e.g., FPGA)

Example:

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

for (int i = 0; i < n; ++i)write(fifo , f1(A[i]));

for (int i = 0; i < n; ++i)B[i] = f2(read(fifo ));

Page 216: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 54 / 82

Polyhedral Process Networks [24]

Main purpose: extract task level parallelism from dataflow graph

statement Ñ processflow dependence Ñ communication channel

ñ requires dataflow analysisProcesses are mapped to parallel hardware (e.g., FPGA)

Example:

for (int i = 0; i < n; ++i) {S: t = f1(A[i]);T: B[i] = f2(t);}

for (int i = 0; i < n; ++i)write(fifo , f1(A[i]));

for (int i = 0; i < n; ++i)B[i] = f2(read(fifo ));

Page 217: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 55 / 82

Process Networks with Dynamic Control

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflow:t S1ris Ñ S2ris; S3ris Ñ S6ris : βS5

S6 “ 0;

S5ris Ñ S6ris : βS5S6 “ 1; S4ris Ñ S5ris u

f1

out_1ND_0

in_0ND_1

ED_1

f2

out_1ND_2

in_2ND_5

ED_2

f3

out_1ND_3

in_0ND_4

ED_0

f4

out_2ND_4dc0_ND_4_b

in_0ND_5

ED_3dc0_ND_5_b

CED_4

in_0ND_5

Page 218: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Operations May 30, 2017 55 / 82

Process Networks with Dynamic Control

for (int i = 0; i < n; ++i) {S1: t = f1(i);S2: A[i] = t;S3: t = f2(i);S4: if (f3(i))S5: t = f4(i);S6: B[i] = t;}

Run-time dependent dataflow:t S1ris Ñ S2ris; S3ris Ñ S6ris : βS5

S6 “ 0;

S5ris Ñ S6ris : βS5S6 “ 1; S4ris Ñ S5ris u

f1

out_1ND_0

in_0ND_1

ED_1

f2

out_1ND_2

in_2ND_5

ED_2

f3

out_1ND_3

in_0ND_4

ED_0

f4

out_2ND_4dc0_ND_4_b

in_0ND_5

ED_3dc0_ND_5_b

CED_4

in_0ND_5

Page 219: High-Level Loop Transformations and Polyhedral Compilation

Polyhedral Compilation Software May 30, 2017 56 / 82

Polyhedral Software [4, 7, 8, 9, 10, 11, 16, 18, 19, 20, 21, 22, 23, 29, 31, 34]

http://polyhedral.info/software.html

Core set manipulation libraries§ integer sets: isl, omega(+), . . .§ rational sets: PolyLib, PPL, . . .

Model extraction§ clan, pet, . . .

Dependence analysis§ petit, candl, isl, FADA, . . .

Scheduler libraries§ LetSee, isl, . . .

AST generation§ omega(+), CLooG, isl, . . .

Source-to-source polyhedral compilers§ Pluto, PoCC, PPCG, . . .

Compilers using polyhedral compilation§ gcc/graphite, LLVM/Polly, . . .

Page 220: High-Level Loop Transformations and Polyhedral Compilation

PPCG May 30, 2017 57 / 82

Outline1 Loop Transformations

Loop DistributionLoop FusionLoop Tiling

2 Polyhedral CompilationIntroductionPolyhedral ModelSchedulesOperationsSoftware

3 PPCGOverviewModel ExtractionDependence AnalysisSchedulingDevice Mapping

Page 221: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 58 / 82

CARP Project (2011–2015)Design tools and techniques to aidCorrect and Efficient Accelerator Programming

GPUs CPUs FPGAs Other accelerators

OpenCL

pencilPlatform-Neutral

Compute Intermediate Language

Domain Specific Languages

Optimizing, auto-parallelizingpencil Ñ OpenCL compiler

DSL Ñ pencil compilers

Direct OpenCLprogramming

Direct pencilprogramming(hand writtenpencil code)

Page 222: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 58 / 82

CARP Project (2011–2015)Design tools and techniques to aidCorrect and Efficient Accelerator Programming

GPUs CPUs FPGAs Other accelerators

OpenCL

pencilPlatform-Neutral

Compute Intermediate Language

Domain Specific Languages

Optimizing, auto-parallelizingpencil Ñ OpenCL compiler

DSL Ñ pencil compilers

Direct OpenCLprogramming

Direct pencilprogramming(hand writtenpencil code)

Page 223: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 59 / 82

PPCG Overview [31]

pen

c

il

PPCG OpenCL

CUDA

OpenMP

PPCG:detect/expose parallelismmap parts of the code to an acceleratorcopy data to/from deviceintroduce local copies of data

pencil:C99 with restrictions and some extra builtins and pragmas pencil

Page 224: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 59 / 82

PPCG Overview [31]

pen

c

il

PPCG OpenCL

CUDA

OpenMP

PPCG:detect/expose parallelismmap parts of the code to an acceleratorcopy data to/from deviceintroduce local copies of data

pencil:C99 with restrictions and some extra builtins and pragmas pencil

Page 225: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 59 / 82

PPCG Overview [31]

pencil PPCG OpenCL

CUDA

OpenMP

PPCG:detect/expose parallelismmap parts of the code to an acceleratorcopy data to/from deviceintroduce local copies of data

pencil:C99 with restrictions and some extra builtins and pragmas pencil

Page 226: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 60 / 82

PPCG Internal Structure [31]

C extraction accessesschedule

instances

context

dependence analysis dependences

dead codeelimination

instancesschedule constraintsschedulingschedule

mapping to device schedule AST generation AST

OpenCL CUDA

Note: as currently implemented (version 0.07), not necessarily how it should be implemented

Page 227: High-Level Loop Transformations and Polyhedral Compilation

PPCG Overview May 30, 2017 61 / 82

Connection with other Libraries and Tools

LLVM imath GMP

clang isl NTL PolyLib

Polly pet barvinok

PPCG isa iscc

pencilcc

Licenses:BSD/MITLGPLGPL

isl: manipulates parametric affine sets and relationspet: extracts polyhedral model from clang ASTPPCG: Polyhedral Parallel Code Generatorpencilcc: pencil compiler

Page 228: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 62 / 82

Instance SetRegion that needs to be extracted may be

marked by

#pragma scop#pragma endscop

autodetected (--pet-autodetect)

Internal structured dynamic control is encapsulated

for (int x = 0; x < n; ++x) {A: s = f();B: while (P(x, s))

s = g(s);C: h(s);}

Instance set: t Arxs : 0 ď x ă n; Brxs : 0 ď x ă n; Crxs : 0 ď x ă n u

Note: currently, internal order of accesses is lostñ possible loss of accuracy in dependence analysis

Page 229: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 62 / 82

Instance SetRegion that needs to be extracted may be

marked by

#pragma scop#pragma endscop

autodetected (--pet-autodetect)

Internal structured dynamic control is encapsulated

for (int x = 0; x < n; ++x) {A: s = f();B: while (P(x, s))

s = g(s);C: h(s);}

Instance set: t Arxs : 0 ď x ă n; Brxs : 0 ď x ă n; Crxs : 0 ď x ă n u

Note: currently, internal order of accesses is lostñ possible loss of accuracy in dependence analysis

Page 230: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 62 / 82

Instance SetRegion that needs to be extracted may be

marked by

#pragma scop#pragma endscop

autodetected (--pet-autodetect)

Internal structured dynamic control is encapsulated

for (int x = 0; x < n; ++x) {A: s = f();B: while (P(x, s))

s = g(s);C: h(s);}

Instance set: t Arxs : 0 ď x ă n; Brxs : 0 ď x ă n; Crxs : 0 ď x ă n u

Note: currently, internal order of accesses is lostñ possible loss of accuracy in dependence analysis

Page 231: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 63 / 82

Inlining

Enabled through C99 inline keyword on function definition

inline void set_diagonal(int n,float A[const restrict static n][n], float v)

{for (int i = 0; i < n; ++i)

U: A[i][i] = v;}

void f(int n, float A[const restrict static n][n]){#pragma scopS: set_diagonal(n, A, 0.f);

for (int i = 0; i < n; ++i)for (int j = i + 1; j < n; ++j)

T: A[i][j] += A[i][j - 1] + 1;#pragma endscop}

Instance set: t Uris : 0 ď i ă n; Tri , js : 0 ď i ă j ă n u

Page 232: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 63 / 82

Inlining

Enabled through C99 inline keyword on function definition

inline void set_diagonal(int n,float A[const restrict static n][n], float v)

{for (int i = 0; i < n; ++i)

U: A[i][i] = v;}

void f(int n, float A[const restrict static n][n]){#pragma scopS: set_diagonal(n, A, 0.f);

for (int i = 0; i < n; ++i)for (int j = i + 1; j < n; ++j)

T: A[i][j] += A[i][j - 1] + 1;#pragma endscop}

Instance set: t Uris : 0 ď i ă n; Tri , js : 0 ď i ă j ă n u

Page 233: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 63 / 82

Inlining

Enabled through C99 inline keyword on function definition

inline void set_diagonal(int n,float A[const restrict static n][n], float v)

{for (int i = 0; i < n; ++i)

U: A[i][i] = v;}

void f(int n, float A[const restrict static n][n]){#pragma scopS: set_diagonal(n, A, 0.f);

for (int i = 0; i < n; ++i)for (int j = i + 1; j < n; ++j)

T: A[i][j] += A[i][j - 1] + 1;#pragma endscop}

Instance set: t Uris : 0 ď i ă n; Tri , js : 0 ď i ă j ă n u

Page 234: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 64 / 82

Access Relations and Function Calls

void set_diagonal(int n,float A[const restrict static n][n], float v)

{for (int i = 0; i < n; ++i)

U: A[i][i] = v;}

void f(int n, float A[const restrict static n][n]){#pragma scopS: set_diagonal(n, A, 0.f);

for (int i = 0; i < n; ++i)for (int j = i + 1; j < n; ++j)

T: A[i][j] += A[i][j - 1] + 1;#pragma endscop}

May-write: t Srs Ñ Ari , is : 0 ď i ă n; Tri , js Ñ Ari , js : 0 ď i ă j ă n uMust-write: t Srs Ñ Ari , is : 0 ď i ă n; Tri , js Ñ Ari , js : 0 ď i ă j ă n u

Page 235: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 64 / 82

Access Relations and Function Calls

void set_diagonal(int n,float A[const restrict static n][n], float v)

{for (int i = 0; i < n; ++i)

U: A[i][i] = v;}

void f(int n, float A[const restrict static n][n]){#pragma scopS: set_diagonal(n, A, 0.f);

for (int i = 0; i < n; ++i)for (int j = i + 1; j < n; ++j)

T: A[i][j] += A[i][j - 1] + 1;#pragma endscop}

May-write: t Srs Ñ Ari , is : 0 ď i ă n; Tri , js Ñ Ari , js : 0 ď i ă j ă n uMust-write: t Srs Ñ Ari , is : 0 ď i ă n; Tri , js Ñ Ari , js : 0 ď i ă j ă n u

Page 236: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 65 / 82

Access Relations and Structures [26]

struct s {int a;int b;

};

int f(){

struct s a, b[10];

S: a.b = 57;T: a.a = 42;

for (int i = 0; i < 10; ++i)U: b[i] = a;}

Must-write:t Srs Ñ a_brars Ñ brss; Trs Ñ a_arars Ñ arss;

Uris Ñ b_arbris Ñ arss; Uris Ñ b_brbris Ñ brss u

Page 237: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 65 / 82

Access Relations and Structures [26]

struct s {int a;int b;

};

int f(){

struct s a, b[10];

S: a.b = 57;T: a.a = 42;

for (int i = 0; i < 10; ++i)U: b[i] = a;}

Must-write:t Srs Ñ a_brars Ñ brss; Trs Ñ a_arars Ñ arss;

Uris Ñ b_arbris Ñ arss; Uris Ñ b_brbris Ñ brss u

Page 238: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 66 / 82

Summary Functions [2, 26]

Analysis of accesses in called function may be inaccurate or even infeasiblemissing body (library function without source)unstructured controlaliasingpattern inside dynamic control is ignoredadditional information not explicitly expressed in code

ñ explicitly specify accesses in summary function pencil

Page 239: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 67 / 82

Summary Function Exampleint f(int i); int maybe (); struct s { int a; };void set_odd_summary(int n, struct s A[static n]) {

for (int i = 1; i < n; i += 2)if (maybe ())

A[i].a = 0;}__attribute__ (( pencil_access(set_odd_summary )))void set_odd(int n, struct s A[static n]){

for (int i = 0; i < n; ++i)A[2 * f(i) + 1].a = i;

}void foo(int n, struct s B[static 2 * n]){#pragma scopS: set_odd (2 * n, B);#pragma endscop}

May-write: t Srs Ñ B_arBris Ñ arss : 0 ď i ă 2n u

Page 240: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 67 / 82

Summary Function Exampleint f(int i); int maybe (); struct s { int a; };void set_odd_summary(int n, struct s A[static n]) {

for (int i = 1; i < n; i += 2)if (maybe ())

A[i].a = 0;}__attribute__ (( pencil_access(set_odd_summary )))void set_odd(int n, struct s A[static n]){

for (int i = 0; i < n; ++i)A[2 * f(i) + 1].a = i;

}void foo(int n, struct s B[static 2 * n]){#pragma scopS: set_odd (2 * n, B);#pragma endscop}

May-write: t Srs Ñ B_arBris Ñ arss : 0 ď i ă 2n u

Page 241: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 67 / 82

Summary Function Exampleint f(int i); int maybe (); struct s { int a; };void set_odd_summary(int n, struct s A[static n]) {

for (int i = 1; i < n; i += 2)if (maybe ())

A[i].a = 0;}__attribute__ (( pencil_access(set_odd_summary )))void set_odd(int n, struct s A[static n]){

for (int i = 0; i < n; ++i)A[2 * f(i) + 1].a = i;

}void foo(int n, struct s B[static 2 * n]){#pragma scopS: set_odd (2 * n, B);#pragma endscop}

May-write: t Srs Ñ B_arBris Ñ arss : 0 ď i ă 2n u

Page 242: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 67 / 82

Summary Function Exampleint f(int i); int maybe (); struct s { int a; };void set_odd_summary(int n, struct s A[static n]) {

for (int i = 1; i < n; i += 2)if (maybe ())

A[i].a = 0;}__attribute__ (( pencil_access(set_odd_summary )))void set_odd(int n, struct s A[static n]){

for (int i = 0; i < n; ++i)A[2 * f(i) + 1].a = i;

}void foo(int n, struct s B[static 2 * n]){#pragma scopS: set_odd (2 * n, B);#pragma endscop}

May-write: t Srs Ñ B_arBris Ñ arss : 0 ď i ă 2n u

Page 243: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 67 / 82

Summary Function Exampleint f(int i); int maybe (); struct s { int a; };void set_odd_summary(int n, struct s A[static n]) {

for (int i = 1; i < n; i += 2)if (maybe ())

A[i].a = 0;}__attribute__ (( pencil_access(set_odd_summary )))void set_odd(int n, struct s A[static n]){

for (int i = 0; i < n; ++i)A[2 * f(i) + 1].a = i;

}void foo(int n, struct s B[static 2 * n]){#pragma scopS: set_odd (2 * n, B);#pragma endscop}

May-write: t Srs Ñ B_arBris Ñ arss : 0 ď i ă 2n ^ i mod 2 “ 1 u

Page 244: High-Level Loop Transformations and Polyhedral Compilation

PPCG Model Extraction May 30, 2017 68 / 82

Context

The context collects constraints on the symbolic constantsderived by pet

§ exclude values that result in undefined behavior‹ negative array sizes‹ out-of-bounds accesses‹ signed integer overflow

§ __builtin_assume or __pencil_assume pencil

ñ any constraint can be specifiedñ only quasi-affine constraints on symbolic constants are exploited

specified on PPCG command line§ --ctx§ --assume-non-negative-parameters

Main purpose: simplify generated AST

Page 245: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 69 / 82

Dependence analysis in isl [27, 28]

isl contains generic dependence analysis engineñ determines dependence relations between “sources” and “sinks”

Input:Sink K : I Ñ D

May-source Y : I Ñ D

Kill L : I Ñ D

Schedule S on I ñ defines “before” and “intermediate”

Output:May-dependence relation: triples pi, k, aq

§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

Page 246: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 69 / 82

Dependence analysis in isl [27, 28]

isl contains generic dependence analysis engineñ determines dependence relations between “sources” and “sinks”

Input:Sink K : I Ñ D

May-source Y : I Ñ D

Kill L : I Ñ D

Schedule S on I ñ defines “before” and “intermediate”Output:

May-dependence relation: triples pi, k, aq§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

Page 247: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 69 / 82

Dependence analysis in isl [27, 28]

isl contains generic dependence analysis engineñ determines dependence relations between “sources” and “sinks”

Input:Sink K : I Ñ D

May-source Y : I Ñ D

Kill L : I Ñ D

Schedule S on I ñ defines “before” and “intermediate”Output:

May-dependence relation: triples pi, k, aq§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

Page 248: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 70 / 82

Dependence analysis in PPCG [28]

isl:May-dependence relation: triples pi, k, aq

§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

PPCG (without live-range reordering):flow dependences (without a) and live-in (may-no-source)

§ sink: may-read§ may-source: may-write§ kill: must-write

or pure kill

false dependences (without a)§ sink: may-write§ may-source: may-read or may-write§ kill: must-write

killed writes (without k) (ñ removed from may-write to get live-out)§ sink: must-write

or pure kill

§ may-source: may-write

Page 249: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 71 / 82

Live-Range Reordering [26, 28]

a = f1();f2(a);a = f3();f4(a);

: flow: false

Reordering rejected due to false dependences

allows such live-ranges to be reorderedusing somewhat different classification of dependencescomputed using different calls to the same dependence analysis engine

Page 250: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 71 / 82

Live-Range Reordering [26, 28]

a = f1();f2(a);a = f3();f4(a);

: flow: false

Reordering rejected due to false dependences

allows such live-ranges to be reorderedusing somewhat different classification of dependencescomputed using different calls to the same dependence analysis engine

Page 251: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 71 / 82

Live-Range Reordering [26, 28]

a = f1();f2(a);a = f3();f4(a);

: flow: false

Reordering rejected due to false dependences

Live-range reorderingallows such live-ranges to be reorderedusing somewhat different classification of dependencescomputed using different calls to the same dependence analysis engine

Page 252: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 72 / 82

Pure Kills [26]

Basic idea:Must-writes kill dependences to earlier writesPure kills can also be usefulUsed only as kills during dependence analysis, not as source

Kills can be insertedautomatically by pet

§ Variable declared within SCoPñ kill at declarationñ kill at end of enclosing block (if within SCoP)

§ Variable declared in scope that contains SCoP, only used insideñ kill at end of SCoP

manually by the user§ __pencil_kill pencil

Page 253: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 73 / 82

Dependence analysis in PPCG [28]

isl:May-dependence relation: triples pi, k, aq

§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

PPCG (without live-range reordering):flow dependences (without a) and live-in (may-no-source)

§ sink: may-read§ may-source: may-write§ kill: must-write

or pure kill

false dependences (without a)§ sink: may-write§ may-source: may-read or may-write§ kill: must-write

killed writes (without k) (ñ removed from may-write to get live-out)§ sink: must-write

or pure kill

§ may-source: may-write

Page 254: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 73 / 82

Dependence analysis in PPCG [28]

isl:May-dependence relation: triples pi, k, aq

§ i has a may-source to a§ k has a sink to a§ i is scheduled before k§ there is no intermediate kill to a

May-no-source: sinks kÑ a with no kill to a before k

PPCG (without live-range reordering):flow dependences (without a) and live-in (may-no-source)

§ sink: may-read§ may-source: may-write§ kill: must-write or pure kill

false dependences (without a)§ sink: may-write§ may-source: may-read or may-write§ kill: must-write

killed writes (without k) (ñ removed from may-write to get live-out)§ sink: must-write or pure kill§ may-source: may-write

Page 255: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 74 / 82

Kill Examplevoid f(int n, int A[restrict static n],

int B[restrict static n]){

int t;#pragma scop

for (int i = 0; i < n; ++i) {t = A[i];B[i] = t;

}__pencil_kill(t);

#pragma endscop}

Without kill of t, compiler needs to assume t may be used after loopñ last write needs to remain lastñ limited scheduling freedom (even with live-range reordering)

Note: kill inserted automatically by pet (if t not used after SCoP)

Page 256: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 74 / 82

Kill Examplevoid f(int n, int A[restrict static n],

int B[restrict static n]){

int t;#pragma scop

for (int i = 0; i < n; ++i) {t = A[i];B[i] = t;

}__pencil_kill(t);

#pragma endscop}

Without kill of t, compiler needs to assume t may be used after loopñ last write needs to remain lastñ limited scheduling freedom (even with live-range reordering)

Note: kill inserted automatically by pet (if t not used after SCoP)

Page 257: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 74 / 82

Kill Examplevoid f(int n, int A[restrict static n],

int B[restrict static n]){

int t;#pragma scop

for (int i = 0; i < n; ++i) {t = A[i];B[i] = t;

}__pencil_kill(t);

#pragma endscop}

Without kill of t, compiler needs to assume t may be used after loopñ last write needs to remain lastñ limited scheduling freedom (even with live-range reordering)

Note: kill inserted automatically by pet (if t not used after SCoP)

Page 258: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 75 / 82

Absence of Loop Carried Dependences [26]

void foo(int n, int A[restrict static n][n],int B[restrict static n][n])

{for (int i = 0; i < n; ++i)

#pragma pencil independentfor (int j = 0; j < n; ++j)

B[i][A[i][j]] = i + j;}

Assume each row of A has distinct elementsñ no loop-carried dependences, but PPCG cannot tellñ add #pragma pencil independent pencil

Note: not handled very efficiently in current version of PPCGñ only add when needed

Page 259: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 75 / 82

Absence of Loop Carried Dependences [26]

void foo(int n, int A[restrict static n][n],int B[restrict static n][n])

{for (int i = 0; i < n; ++i)

#pragma pencil independentfor (int j = 0; j < n; ++j)

B[i][A[i][j]] = i + j;}

Assume each row of A has distinct elementsñ no loop-carried dependences, but PPCG cannot tellñ add #pragma pencil independent pencil

Note: not handled very efficiently in current version of PPCGñ only add when needed

Page 260: High-Level Loop Transformations and Polyhedral Compilation

PPCG Dependence Analysis May 30, 2017 75 / 82

Absence of Loop Carried Dependences [26]

void foo(int n, int A[restrict static n][n],int B[restrict static n][n])

{for (int i = 0; i < n; ++i)

#pragma pencil independentfor (int j = 0; j < n; ++j)

B[i][A[i][j]] = i + j;}

Assume each row of A has distinct elementsñ no loop-carried dependences, but PPCG cannot tellñ add #pragma pencil independent pencil

Note: not handled very efficiently in current version of PPCGñ only add when needed

Page 261: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 76 / 82

Optimization Criteria for PPCG

Two levels of parallelismñ blocks and threads (work groups and work items)ñ parallelism

In PPCG, second level obtained through tilingñ tilability

Reduced working set for some arraysñ mapping to shared memory or registers

Obtained through tilingñ tilability

Reduced data movementñ locality

Simple schedulesñ schedule used in several subsequent steps, including AST generationñ simplicity

Page 262: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 76 / 82

Optimization Criteria for PPCG

Two levels of parallelismñ blocks and threads (work groups and work items)ñ parallelism

In PPCG, second level obtained through tilingñ tilability

Reduced working set for some arraysñ mapping to shared memory or registers

Obtained through tilingñ tilability

Reduced data movementñ locality

Simple schedulesñ schedule used in several subsequent steps, including AST generationñ simplicity

Page 263: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 77 / 82

Scheduling Constraints [28]

Validity aÑ bñ statement instance b needs to be executed after añ f pbq ě f paq

Proximity aÑ bñ statement instance b preferably executed close to añ f pbq ´ f paq as small as possible

Coincidence aÑ bñ statement instance b preferably executed together with añ f pbq “ f paqñ band member only considered “coincident” if it coschedules all pairs

Conditional validity (live-range reordering)§ condition bÑ c (ø flow dependences)§ conditioned validity aÑ b, cÑ d (ø order dependences)

Schedule constraints only relevant if coscheduled by outer nodesOther schedule constraints are said to be carried by some outer node

Page 264: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 77 / 82

Scheduling Constraints [28]

Validity aÑ bñ statement instance b needs to be executed after añ f pbq ě f paq

Proximity aÑ bñ statement instance b preferably executed close to añ f pbq ´ f paq as small as possible

Coincidence aÑ bñ statement instance b preferably executed together with añ f pbq “ f paqñ band member only considered “coincident” if it coschedules all pairs

Conditional validity (live-range reordering)§ condition bÑ c (ø flow dependences)§ conditioned validity aÑ b, cÑ d (ø order dependences)

Schedule constraints only relevant if coscheduled by outer nodesOther schedule constraints are said to be carried by some outer node

Page 265: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 77 / 82

Scheduling Constraints [28]

Validity aÑ bñ statement instance b needs to be executed after añ f pbq ě f paq

Proximity aÑ bñ statement instance b preferably executed close to añ f pbq ´ f paq as small as possible

Coincidence aÑ bñ statement instance b preferably executed together with añ f pbq “ f paqñ band member only considered “coincident” if it coschedules all pairs

Conditional validity (live-range reordering)§ condition bÑ c (ø flow dependences)§ conditioned validity aÑ b, cÑ d (ø order dependences)

Schedule constraints only relevant if coscheduled by outer nodesOther schedule constraints are said to be carried by some outer node

Page 266: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 78 / 82

Dependences and Schedule Constraints [28]

Traditional dependencesflow dependencesñ validity constraintsñ proximity constraintsñ coincidence constraints (when parallelism is important)

false dependencesñ validity constraintsñ coincidence constraints (when parallelism is important)ñ proximity constraints (optional for memory reuse)

pairs of reads with shared write (“input dependences”)ñ proximity constraints (optional)

Live-range reorderingsomewhat different classification of dependencesslightly different mapping to schedule constraints

Current PPCGadds false dependences to proximity constraints for historical reasonsdoes not consider input dependencesuses live-range reordering by default

Page 267: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 79 / 82

Forced Outer Coincidence SchedulerRecall:

Feautrier§ maximal inner parallelismñ carry as many dependences as possible at outer bands

Pluto§ tilable bands§ locality: f pjq ´ f piq smallñ parallelism as extreme case: f pjq ´ f piq “ 0

PPCG uses variant of Pluto-algorithm with Feautrier fallbackñ force outer coincidence in each bandñ locally fall back to Feautrier if infeasible (single step)

Members in bands constructed by Pluto-algorithm are permutableñ if outer member cannot be coincident, then no member can be

Each step in Feautrier algorithm carries as many dependences as possibleñ subsequent application of Pluto more likely to find coincident member

Page 268: High-Level Loop Transformations and Polyhedral Compilation

PPCG Scheduling May 30, 2017 79 / 82

Forced Outer Coincidence SchedulerRecall:

Feautrier§ maximal inner parallelismñ carry as many dependences as possible at outer bands

Pluto§ tilable bands§ locality: f pjq ´ f piq smallñ parallelism as extreme case: f pjq ´ f piq “ 0

PPCG uses variant of Pluto-algorithm with Feautrier fallbackñ force outer coincidence in each bandñ locally fall back to Feautrier if infeasible (single step)

Members in bands constructed by Pluto-algorithm are permutableñ if outer member cannot be coincident, then no member can be

Each step in Feautrier algorithm carries as many dependences as possibleñ subsequent application of Pluto more likely to find coincident member

Page 269: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 80 / 82

Device Mapping [31]

Input: schedule tree

If schedule tree contains no coincident band memberñ generate pure CPU code

Otherwise:select subtree for mapping to the deviceis entire schedule tree, except

§ coincidence-free children of outer set node§ coincidence-free initial children of outer sequence node

within selected subtree, generate kernels for§ outermost bands with coincident members§ maximal coincidence-free subtreesñ insert zero-dimensional band node

add data copying to/from device around selected subtreeadd device initialization and clean-up around entire schedule tree

Page 270: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 80 / 82

Device Mapping [31]

Input: schedule tree

If schedule tree contains no coincident band memberñ generate pure CPU code

Otherwise:select subtree for mapping to the deviceselected subtree is entire schedule tree, except

§ coincidence-free children of outer set node§ coincidence-free initial children of outer sequence node

within selected subtree, generate kernels for§ outermost bands with coincident members§ maximal coincidence-free subtreesñ insert zero-dimensional band node

add data copying to/from device around selected subtreeadd device initialization and clean-up around entire schedule tree

Page 271: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 81 / 82

Data Copying to/from DeviceCopy-out:

take may-writesremove writes only needed for dataflow inside selected subtreeapproximate to entire array

May-persist:elements that may need to be preserved by selected subtreeconsists of

§ elements that may need to be preserved by entire SCoPñ elements not definitely written and not definitely killed

§ elements in potential dataflow across selected subtree

May-not-written: pcopy-outXran may-persistq zmust-write

Copy-in: live-inYmay-not-written

Note: if array elements are structures, then entire structures are copied

Page 272: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 81 / 82

Data Copying to/from DeviceCopy-out:

take may-writesremove writes only needed for dataflow inside selected subtreeapproximate to entire array

May-persist:elements that may need to be preserved by selected subtreeconsists of

§ elements that may need to be preserved by entire SCoPñ elements not definitely written and not definitely killed

§ elements in potential dataflow across selected subtree

May-not-written: pcopy-outXran may-persistq zmust-write

Copy-in: live-inYmay-not-written

Note: if array elements are structures, then entire structures are copied

Page 273: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 81 / 82

Data Copying to/from DeviceCopy-out:

take may-writesremove writes only needed for dataflow inside selected subtreeapproximate to entire array

May-persist:elements that may need to be preserved by selected subtreeconsists of

§ elements that may need to be preserved by entire SCoPñ elements not definitely written and not definitely killed

§ elements in potential dataflow across selected subtree

May-not-written: pcopy-outXran may-persistq zmust-write

Copy-in: live-inYmay-not-written

Note: if array elements are structures, then entire structures are copied

Page 274: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 81 / 82

Data Copying to/from DeviceCopy-out:

take may-writesremove writes only needed for dataflow inside selected subtreeapproximate to entire array

May-persist:elements that may need to be preserved by selected subtreeconsists of

§ elements that may need to be preserved by entire SCoPñ elements not definitely written and not definitely killed

§ elements in potential dataflow across selected subtree

May-not-written: pcopy-outXran may-persistq zmust-write

Copy-in: live-inYmay-not-written

Note: if array elements are structures, then entire structures are copied

Page 275: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 81 / 82

Data Copying to/from DeviceCopy-out:

take may-writesremove writes only needed for dataflow inside selected subtreeapproximate to entire array

May-persist:elements that may need to be preserved by selected subtreeconsists of

§ elements that may need to be preserved by entire SCoPñ elements not definitely written and not definitely killed

§ elements in potential dataflow across selected subtree

May-not-written: pcopy-outXran may-persistq zmust-write

Copy-in: live-inYmay-not-written

Note: if array elements are structures, then entire structures are copied

Page 276: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 82 / 82

Data Copying Example

__pencil_kill(A);for (int i = 0; i < n; i++)

if (B[i] > 0)A[i] = B[i];

A may be writtenñ A in copy-outA may also not be written (completely)

, but no data can flow across kill

ñ parts of A may (be expected to) surviveñ A also needs to be in copy-in

Page 277: High-Level Loop Transformations and Polyhedral Compilation

PPCG Device Mapping May 30, 2017 82 / 82

Data Copying Example

__pencil_kill(A);for (int i = 0; i < n; i++)

if (B[i] > 0)A[i] = B[i];

A may be writtenñ A in copy-outA may also not be written (completely), but no data can flow across killñ parts of A may (be expected to) surviveñ A also needs to be in copy-in

Page 278: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 1 / 10

References I

[1] Péricles Alves, Fabian Gruber, Johannes Doerfert,Alexandros Lamprineas, Tobias Grosser, Fabrice Rastello, andFernando Magno Quintão Pereira. “Runtime PointerDisambiguation”. In: Proceedings of the 2015 ACM SIGPLANInternational Conference on Object-Oriented Programming, Systems,Languages, and Applications. OOPSLA 2015. Pittsburgh, PA, USA:ACM, 2015, pp. 589–606. doi: 10.1145/2814270.2814285.

[2] Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser,Michael Kruse, Chandan Reddy, Sven Verdoolaege, Javed Absar,Sven van Haastregt, Alexey Kravets, Anton Lokhmotov, Adam Betts,Alastair F. Donaldson, Jeroen Ketema, Róbert Dávid, andElnar Hajiyev. “PENCIL: A Platform-Neutral Compute IntermediateLanguage for Accelerator Programming”. In: Proc. ParallelArchitectures and Compilation Techniques (PACT’15). Oct. 2015.doi: 10.1109/PACT.2015.17.

Page 279: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 2 / 10

References II

[3] Riyadh Baghdadi, Albert Cohen, Sven Verdoolaege, andKonrad Trifunovic. “Improved loop tiling based on the removal ofspurious false dependences”. In: TACO 9.4 (2013), p. 52. doi:10.1145/2400682.2400711.

[4] Roberto Bagnara, Patricia M. Hill, and Enea Zaffanella. “The ParmaPolyhedra Library: Toward a Complete Set of Numerical Abstractionsfor the Analysis and Verification of Hardware and Software Systems”.In: Science of Computer Programming 72.1–2 (2008), pp. 3–21.

[5] Denis Barthou, Albert Cohen, and Jean-François Collard. “Maximalstatic expansion”. In: POPL ’98: Proceedings of the 25th ACMSIGPLAN-SIGACT symposium on Principles of programminglanguages. San Diego, California, United States: ACM, 1998,pp. 98–106. doi: 10.1145/268946.268955.

Page 280: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 3 / 10

References III

[6] Denis Barthou, Jean-François Collard, and Paul Feautrier. “FuzzyArray Dataflow Analysis”. In: J. Parallel Distrib. Comput. 40.2(1997), pp. 210–226. doi: 10.1006/jpdc.1996.1261.

[7] Cédric Bastoul. Generating loops for scanning polyhedra. Tech. rep.2002/23. Versailles University, 2002.

[8] Cédric Bastoul. Extracting polyhedral representation from high levellanguages. Tech. rep. LRI, Paris-Sud University, May 2008.

[9] Marouane Belaoucha, Denis Barthou, Adrien Eliche, andSid-Ahmed-Ali Touati. “FADAlib: an open source C++ library forfuzzy array dataflow analysis”. In: Intl. Workshop on PracticalAspects of High-Level Parallel Programming. Vol. 1. 1. Amsterdam,The Netherlands, May 2010, pp. 2075–2084. doi:DOI:10.1016/j.procs.2010.04.232.

Page 281: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 4 / 10

References IV

[10] Uday Bondhugula, Muthu Baskaran, Sriram Krishnamoorthy,J. Ramanujam, A. Rountev, and P. Sadayappan. “AutomaticTransformations for Communication-Minimized Parallelization andLocality Optimization in the Polyhedral Model”. In: InternationalConference on Compiler Construction (ETAPS CC). Apr. 2008. doi:10.1007/978-3-540-78791-4_9.

[11] Candl.http://icps.u-strasbg.fr/~bastoul/development/candl/.

[12] Alain Darte, Robert Schreiber, and Gilles Villard. “Lattice-BasedMemory Allocation”. In: IEEE Trans. Comput. 54.10 (2005),pp. 1242–1257. doi: 10.1109/TC.2005.167.

[13] Paul Feautrier. “Array expansion”. In: ICS ’88: Proceedings of the2nd international conference on Supercomputing. St. Malo, France:ACM Press, 1988, pp. 429–441. doi: 10.1145/55364.55406.

Page 282: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 5 / 10

References V

[14] Paul Feautrier. “Dataflow analysis of array and scalar references”. In:International Journal of Parallel Programming 20.1 (1991),pp. 23–53. doi: 10.1007/BF01407931.

[15] Paul Feautrier. “Some Efficient Solutions to the Affine SchedulingProblem. Part II. Multidimensional Time”. In: International Journalof Parallel Programming 21.6 (Dec. 1992), pp. 389–420. doi:10.1007/BF01379404.

[16] Tobias Grosser, Armin Größlinger, and Christian Lengauer. “Polly -Performing polyhedral optimizations on a low-level intermediaterepresentation”. In: Parallel Processing Letters 22.04 (2012). doi:10.1142/S0129626412500107.

[17] François Irigoin and Rémi Triolet. “Supernode partitioning”. In: 15thAnnual ACM Symposium on Principles of Programming Languages.San Diego, California, Jan. 1988, pp. 319–329.

Page 283: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 6 / 10

References VI[18] W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and

D. Wonnacott. New user interface for Petit and other interfaces: userguide. Tech. rep. Available as petit/doc/petit.ps in the Omegadistribution. University of Maryland, Dec. 1996.

[19] Wayne Kelly, Vadim Maslov, William Pugh, Evan Rosser,Tatiana Shpeisman, and David Wonnacott. The Omega Library.Tech. rep. University of Maryland, Nov. 1996.

[20] The Polyhedral Compiler Collection.http://www.cse.ohio-state.edu/~pouchet/software/pocc/.2012.

[21] Louis-Noël Pouchet, Cédric Bastoul, and Albert Cohen. LetSee: theLEgal Transformation SpacE Explorator. Third InternationalSummer School on Advanced Computer Architecture andCompilation for Embedded Systems (ACACES’07), L’Aquila, Italia.Extended abstract, pp 247–251. July 2007.

Page 284: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 7 / 10

References VII

[22] Konrad Trifunovic, Albert Cohen, David Edelsohn, Feng Li,Tobias Grosser, Harsha Jagasia, Razya Ladelsky, Sebastian Pop,Jan Sjödin, and Ramakrishna Upadrasta. “GRAPHITE two yearsafter: First lessons learned from real-world polyhedral compilation”.In: GCC Research Opportunities Workshop (GROW’10). 2010.

[23] Sven Verdoolaege. “isl: An Integer Set Library for the PolyhedralModel”. In: Mathematical Software - ICMS 2010. Ed. byKomei Fukuda, Joris Hoeven, Michael Joswig, and Nobuki Takayama.Vol. 6327. Lecture Notes in Computer Science. Springer, 2010,pp. 299–302. doi: 10.1007/978-3-642-15582-6_49.

[24] Sven Verdoolaege. “Polyhedral process networks”. In: Handbook ofSignal Processing Systems. Ed. by Shuvra Bhattacharrya,Ed Deprettere, Rainer Leupers, and Jarmo Takala. Springer, 2010,pp. 931–965. doi: 10.1007/978-1-4419-6345-1_33.

Page 285: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 8 / 10

References VIII

[25] Sven Verdoolaege. “Counting Affine Calculator and Applications”.In: First International Workshop on Polyhedral CompilationTechniques (IMPACT’11). Chamonix, France, Apr. 2011. doi:10.13140/RG.2.1.2959.5601.

[26] Sven Verdoolaege. PENCIL support in pet and PPCG. Tech. rep.RT-457, version 2. INRIA Paris-Rocquencourt, May 2015. doi:10.13140/RG.2.1.4063.7926.

[27] Sven Verdoolaege. Presburger Formulas and Polyhedral Compilation.2016. doi: 10.13140/RG.2.1.1174.6323.

[28] Sven Verdoolaege and Albert Cohen. “Live-Range Reordering”. In:Proceedings of the sixth International Workshop on PolyhedralCompilation Techniques. Prague, Czech Republic, Jan. 2016. doi:10.13140/RG.2.1.3272.9680.

Page 286: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 9 / 10

References IX

[29] Sven Verdoolaege and Tobias Grosser. “Polyhedral Extraction Tool”.In: Second International Workshop on Polyhedral CompilationTechniques (IMPACT’12). Paris, France, Jan. 2012. doi:10.13140/RG.2.1.4213.4562.

[30] Sven Verdoolaege, Serge Guelton, Tobias Grosser, and Albert Cohen.“Schedule Trees”. In: Proceedings of the 4th International Workshopon Polyhedral Compilation Techniques. Vienna, Austria, Jan. 2014.doi: 10.13140/RG.2.1.4475.6001.

[31] Sven Verdoolaege, Juan Carlos Juega, Albert Cohen,José Ignacio Gómez, Christian Tenllado, and Francky Catthoor.“Polyhedral parallel code generation for CUDA”. In: ACM Trans.Archit. Code Optim. 9.4 (2013), p. 54. doi:10.1145/2400682.2400713.

Page 287: High-Level Loop Transformations and Polyhedral Compilation

May 30, 2017 10 / 10

References X

[32] Sven Verdoolaege, Hristo Nikolov, and Todor Stefanov. “On DemandParametric Array Dataflow Analysis”. In: Third InternationalWorkshop on Polyhedral Compilation Techniques (IMPACT’13).Berlin, Germany, Jan. 2013. doi: 10.13140/RG.2.1.4737.7441.

[33] Sven Verdoolaege, Rachid Seghir, Kristof Beyls, Vincent Loechner,and Maurice Bruynooghe. “Counting integer points in parametricpolytopes using Barvinok’s rational functions”. In: Algorithmica 48.1(June 2007), pp. 37–66. doi: 10.1007/s00453-006-1231-0.

[34] Doran K. Wilde. A Library for doing polyhedral operations.Tech. rep. 785. IRISA, Rennes, France, 1993, 45 p.

[35] Jingling Xue. Loop tiling for parallelism. Kluwer AcademicPublishers, 2000.