Partitioning Loops with Variable Dependence Distances
Yijun Yu and Erik D’Hollander
Department of Electronics and Information Systems
University of Ghent, Belgium
Introduction
1. Overview
2. Dependence analysis: pseudo distance matrix (PDM)
3. Loop transformations: unimodular and partitioning
4. Results
5. Conclusion
1. Overview
• Loop with linear array subscripts
• Solve the dependence equation
• Find all non-constant distances
• Create the maximally covering grid and its base vectors
• Create the pseudo distance matrix (PDM), containing all base vectors of the covering grid
• Find independent loops or independent partitions, based on the rank of the PDM
Approach
[Flowchart]
Dependence analysis: linear dependence equation → H = PDM, for uniform (constant) or variable (non-constant) distances.
Loop transformation:
• rank(H) < loop depth? Y (non-full rank): unimodular transformation.
• N (full rank): det(H) > 1? Y: partitioning transformation.
Result: loop parallelization.
2. Dependence Analysis
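Loop nest with the reference pair A[f(I)] = … and … = A[g(I)]:
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1-I2+3, 2I1+I2-2) = …
        … = A(I1+I2-1, I1-I2+2)
      enddo
    enddo
Dependence equation f(i) = g(j), with i = (I1, I2) and j = (J1, J2):
4I1 - I2 + 3 = J1 + J2 - 1
2I1 + I2 - 2 = J1 - J2 + 2
In matrix form, iA + a = jB + b:
$$A = \begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix},\quad B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad a = (3, -2),\quad b = (-1, 2)$$
Examples of dependent iterations:
i          j          A[f(i)] = A[g(j)]   d = |j - i|
(1, -5)    (3, 10)    A[12, -5]           (2, 15)
(3, 0)     (9, 7)     A[15, 4]            (6, 7)
(-3, -3)   (-9, 4)    A[-6, -11]          (6, -7)
…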
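The example rows above can be checked by brute force. The following is a minimal Python sketch, not the authors' method: it enumerates the iteration space for a small N and collects every pair with f(i) = g(j). The helper names (`f`, `g`, `lex_positive`, `dependent_pairs`) are illustrative, and reading |j - i| as "the lexicographically positive one of j - i and i - j" is an assumption inferred from the third table row.

```python
from itertools import product

def f(i):
    """Subscripts of the written element A(4I1-I2+3, 2I1+I2-2)."""
    i1, i2 = i
    return (4 * i1 - i2 + 3, 2 * i1 + i2 - 2)

def g(j):
    """Subscripts of the read element A(J1+J2-1, J1-J2+2)."""
    j1, j2 = j
    return (j1 + j2 - 1, j1 - j2 + 2)

def lex_positive(v):
    """Return v or -v, whichever is lexicographically non-negative (assumed meaning of |.|)."""
    return v if v >= (0,) * len(v) else tuple(-x for x in v)

def dependent_pairs(n):
    """All (i, j, d) with f(i) == g(j) and i != j inside the square [-n, n]^2."""
    space = list(product(range(-n, n + 1), repeat=2))
    readers = {}
    for j in space:
        readers.setdefault(g(j), []).append(j)
    pairs = []
    for i in space:
        for j in readers.get(f(i), []):
            if i != j:
                d = lex_positive((j[0] - i[0], j[1] - i[1]))
                pairs.append((i, j, d))
    return pairs

pairs = dependent_pairs(10)
distances = sorted({d for _, _, d in pairs})
```

The set `distances` contains many different vectors, which is exactly the non-constant behaviour the slide illustrates.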
The dependence distance
1. The linear dependence equation:
$$iA + a = jB + b$$
2. Using Banerjee’s unimodular transformation U to obtain an echelon matrix S, the equation tS = (b - a) is solved, yielding:
$$i = tU_l,\qquad j = tU_r$$
3. The distance between dependent iterations i, j is:
$$d = |j - i| = |t(U_r - U_l)| = |tF|$$
• U_l and U_r are the left and right halves of U
• t has a constant part t_1 and an unknown part t_2
The distance set
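1. From the dependence equation tS = (b - a), the solution vector t contains a constant and an arbitrary part:
$$t = (t_1, t_2),\quad \text{where } t_1 = \text{const},\ t_2 = \text{variable}$$
2. Matrix F = U_r - U_l can be vertically separated into two sub-matrices:
$$d = |tF|,\qquad F = \begin{pmatrix} F' \\ F'' \end{pmatrix}$$
3. The distance set of the dependence equations is:
$$\{\, d = xR \mid d \succ 0,\ x_i \in \mathbb{Z} \,\},\quad \text{where } R = \begin{pmatrix} t_1 F' \\ F'' \end{pmatrix} \text{ or } R = F''$$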
Distances in the iteration space
• Iteration space (i1, i2) of loop 1 with the dependence equations 4I1-I2+3 = J1+J2-1 and 2I1+I2-2 = J1-J2+2.
• The arrows (I1, I2) → (J1, J2) represent the distance vectors between dependent iterations.
[Figure: iteration space with axes i1 and i2.]
Distances base vectors
1. The dependence distance is non-constant for the reference pair, e.g. (2,15), (6,7), (6,-7), as highlighted.
2. However, the distance set is spanned by the grid generated by the base vectors (2,1) and (0,2).
3. For example:
(2, 15) = (2,1) + 7 (0,2)
(6, 7) = 3 (2,1) + 2 (0,2)
(6, -7) = 3 (2,1) - 5 (0,2)
[Figure: iteration space with axes i1 and i2.]
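The three decompositions can be reproduced mechanically. The sketch below is illustrative only; the function name `coefficients` is hypothetical and the code is hard-wired to the base vectors (2,1) and (0,2) of this example. Because (0,2) has a zero first component, the first coordinate of d fixes the coefficient of (2,1), and the remainder must be a multiple of 2.

```python
def coefficients(d):
    """Integers (x1, x2) with d = x1*(2,1) + x2*(0,2), or None if d is off the grid."""
    d1, d2 = d
    if d1 % 2 != 0:
        return None          # first coordinate must be a multiple of 2
    x1 = d1 // 2             # coefficient of (2,1)
    rest = d2 - x1           # what (0,2) still has to contribute
    if rest % 2 != 0:
        return None
    return (x1, rest // 2)

examples = [(2, 15), (6, 7), (6, -7)]
decomposed = [coefficients(d) for d in examples]
```

A vector such as (1, 0) or (2, 0) returns `None`: it is not on the grid, so no dependence can have that distance.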
The largest base vectors
The distance set consists of linear combinations of the row vectors in R:
$$\{\, d = xR \mid d \succ 0,\ x_i \in \mathbb{Z} \,\}$$
A lattice L(R) is the group of vectors generated by all integer linear combinations of the independent row vectors of a matrix R:
$$L(R) = \{\, xR \mid x_i \in \mathbb{Z} \,\}$$
We look for the smallest lattice L(R) (generating the largest grid) that covers the whole distance set:
$$\{\, d = xR \mid d \succ 0,\ x_i \in \mathbb{Z} \,\} \subseteq L(R)$$
In this way, the spurious dependences introduced by replacing the distance set with a lattice are minimized.
Pseudo Distance Matrix (PDM)
• The Hermite normal form HNF(R) is a full row rank matrix reduced from the echelon form of R by unimodular transformations:
$$H = \mathrm{HNF}(R)$$
• Therefore H generates the same lattice as R does, that is, the smallest lattice. In addition, the rows of H are base vectors:
$$L(H) = L(R)$$
• H is called the pseudo distance matrix (PDM), because its row vectors generate the distance set.
• Since the row vectors of H are constant, the techniques developed for uniform (constant-distance) dependence matrices may apply.
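To make the reduction H = HNF(R) concrete, here is a small Python sketch of a row-style Hermite normal form. It is an assumption-laden illustration, not the authors' implementation: it uses only unimodular row operations (swap, negate, add an integer multiple of another row), drops zero rows, and follows one common convention (upper triangular, positive pivots, entries above a pivot reduced into [0, pivot)). The function name `hermite_normal_form` is hypothetical.

```python
def hermite_normal_form(rows):
    """Row-style HNF of an integer matrix given as a list of rows; zero rows are dropped."""
    m = [list(r) for r in rows]
    n_cols = len(m[0]) if m else 0
    pivot_row = 0
    for col in range(n_cols):
        # Euclidean elimination in this column among the rows not yet used as pivots.
        while True:
            nonzero = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
            if len(nonzero) <= 1:
                break
            k = min(nonzero, key=lambda r: abs(m[r][col]))  # smallest entry reduces the rest
            for r in nonzero:
                if r != k:
                    q = m[r][col] // m[k][col]
                    m[r] = [x - q * y for x, y in zip(m[r], m[k])]
        if not nonzero:
            continue  # no pivot in this column
        k = nonzero[0]
        m[pivot_row], m[k] = m[k], m[pivot_row]
        if m[pivot_row][col] < 0:
            m[pivot_row] = [-x for x in m[pivot_row]]
        # Reduce the entries above the pivot into the range [0, pivot).
        for r in range(pivot_row):
            q = m[r][col] // m[pivot_row][col]
            m[r] = [x - q * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return [tuple(r) for r in m[:pivot_row]]

R = [(0, 4), (2, 1), (0, 2)]   # generator rows of the running example
H = hermite_normal_form(R)
```

For the running example the three generator rows collapse to two base vectors, since (0,4) is already twice (0,2).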
Calculating the PDM
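1. Solving the linear dependence equations:
$$(i_1, i_2)\begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix} + (3, -2) = (j_1, j_2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} + (-1, 2)$$
2. Expressing the distance set: d = tF, with the constant part of t multiplied out and t_3, t_4 free, becomes d = xR:
$$d = (1, t_3, t_4)\begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 0 & 2 \end{pmatrix}$$
3. Finding the largest base vectors, H = HNF(R):
$$\mathrm{HNF}\begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} = \mathrm{PDM}$$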
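Two properties of this PDM can be checked numerically: every actual distance lies in the lattice L(H), and both rows of H occur as actual distances, so no coarser lattice could cover the set. The sketch below is a hedged illustration; `in_lattice` and `distances` are hypothetical names, and the closed form for j is obtained by solving the two example equations by hand (j1 = 3 i1, j2 = i1 - i2 + 4).

```python
from itertools import product

H = ((2, 1), (0, 2))  # PDM of the running example

def in_lattice(d, h=H):
    """True if d is an integer combination of the rows of the upper-triangular 2x2 matrix h."""
    (a, b), (_, c) = h
    if d[0] % a != 0:
        return False
    x1 = d[0] // a
    return (d[1] - x1 * b) % c == 0

def distances(n):
    """Raw distances j - i of all dependent pairs with i and j inside [-n, n]^2."""
    found = set()
    for i1, i2 in product(range(-n, n + 1), repeat=2):
        # f(i) = g(j) solved for j: j1 = 3*i1, j2 = i1 - i2 + 4
        j1, j2 = 3 * i1, i1 - i2 + 4
        if -n <= j1 <= n and -n <= j2 <= n:
            found.add((j1 - i1, j2 - i2))
    return found

observed = distances(8)
covered = all(in_lattice(d) for d in observed)
rows_observed = all(row in observed for row in H)
```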
3. Loop transformations: unimodular and partitioning
Legality
Any transformation should be legal, i.e. preserve the execution order of dependent iterations.
Transformations depending on rank(H):
3.1 Unimodular transformation: non-full rank PDM
3.2 Partitioning transformation: full rank PDM
3.3 Combined approach
3.1 Unimodular transformation
• Given a non-full rank (r < m) pseudo distance matrix H, where r = rank(H) and m is the loop depth, a unimodular matrix T can be constructed such that the first m-r columns of HT are zero.
• As a result, the m-r outermost loops can be parallelized.
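As a small worked instance, take the rank-1 PDM H = (2,2) of a depth-2 loop nest, which is the non-full rank example used later in the deck. The sketch below builds T as a skew followed by a swap; this particular T is an assumption (one unimodular choice that zeroes the first column of HT), and `matmul` and `det2` are hypothetical helpers.

```python
def matmul(a, b):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

H = [[2, 2]]                 # rank r = 1, loop depth m = 2
skew = [[1, -1], [0, 1]]     # (2,2) -> (2,0)
swap = [[0, 1], [1, 0]]      # (2,0) -> (0,2)
T = matmul(skew, swap)
HT = matmul(H, T)
```

Since m - r = 1, one zero column appears at the front of HT, so the outermost transformed loop carries no dependence.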
3.2 Partitioning transformation
• Given a full rank pseudo distance matrix H, the loop nest can be partitioned into det(H) independent partitions.
• The degree of partitioned parallelism is det(H).
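The count det(H) can be illustrated by labelling each iteration with the coset of L(H) it belongs to. The sketch below assumes an upper-triangular 2x2 H with positive diagonal, here the full rank PDM ((2,1),(0,2)) from the deck's example; `partition_label` is a hypothetical helper, not a routine from the presentation.

```python
from itertools import product

H = ((2, 1), (0, 2))  # full rank PDM, det = 4

def det2(h):
    """Determinant of a 2x2 matrix."""
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]

def partition_label(i, h=H):
    """Coset label of iteration i modulo the lattice generated by the rows of h."""
    (a, b), (_, c) = h
    x1, r1 = divmod(i[0], a)      # remove multiples of the first base vector
    r2 = (i[1] - x1 * b) % c      # then multiples of the second
    return (r1, r2)

grid = list(product(range(-6, 7), repeat=2))
labels = {partition_label(i) for i in grid}
```

Iterations that differ by a base vector receive the same label, so every dependence stays inside one partition and the partitions can run in parallel.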
3.3 Combined approach
• After a unimodular transformation on a non-full rank PDM, the transformed PDM has a full rank sub-matrix S.
• When det(S) > 1, additional parallelism can be found using the loop partitioning transformation.
4. Results (1) Non-full rank PDM
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(3I1+1, 2I1+I2-1) = …
        … = A(I1+3, I2+1)
      enddo
    enddo
PDM = (2,2) → (2,0) → (0,2)
$$T = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}$$
L’1: doall J1 = -2N, 2N
L’2:   do J2 = max(-N, -N-J1), min(N, N-J1)
         I1 = J2
         I2 = J1+J2
         A(3I1+1, 2I1+I2-1) = …
         … = A(I1+3, I2+1)
       enddo
     enddoall
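A quick way to gain confidence in the transformed nest is to emulate it and compare with the original. The Python sketch below is illustrative (function names are hypothetical): it checks that L’1/L’2 visit every original iteration exactly once, and that each dependence, from the writer (I1, I2) to the reader (3 I1 - 2, 2 I1 + I2 - 2) obtained by solving the subscript equations by hand, stays within one value of J1 = I2 - I1.

```python
def original_iterations(n):
    """Iteration set of the original nest L1/L2."""
    return {(i1, i2) for i1 in range(-n, n + 1) for i2 in range(-n, n + 1)}

def transformed_iterations(n):
    """Yield (J1, J2, I1, I2) in the order of the transformed nest L'1/L'2."""
    for j1 in range(-2 * n, 2 * n + 1):
        for j2 in range(max(-n, -n - j1), min(n, n - j1) + 1):
            yield (j1, j2, j2, j1 + j2)

def check(n):
    """Return (same iteration space, dependences stay within one J1 value)."""
    visited = [(i1, i2) for _, _, i1, i2 in transformed_iterations(n)]
    space = original_iterations(n)
    same_space = len(visited) == len(set(visited)) and set(visited) == space
    outer = {(i1, i2): j1 for j1, _, i1, i2 in transformed_iterations(n)}
    same_outer = all(
        outer[(i1, i2)] == outer[(3 * i1 - 2, 2 * i1 + i2 - 2)]
        for (i1, i2) in space
        if (3 * i1 - 2, 2 * i1 + i2 - 2) in space
    )
    return same_space, same_outer

result = check(5)
```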
NF-rank: dependence graphs
[Figure: dependence graphs before and after the transformation; axis labels i2, j1, j2.]
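4. Results (2) Partitioning
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1-I2+3, 2I1+I2-2) = …
        … = A(I1+I2-1, I1-I2+2)
      enddo
    enddo
$$\mathrm{PDM} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},\quad \det = 4$$
$$\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1/2 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix},\quad \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},\quad \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad \det = 1$$
L’1: doall Io1 = 0, 1
L’2:   doall Io2 = 0, 1
L’3:     do I1 = -N+mod(N+Io1,2), N-mod(N-Io1,2), 2
           Io’2 = Io2+(I1-Io1)/2
L’4:       do I2 = -N+mod(N+Io’2,2), N-mod(N-Io’2,2), 2
             A(4I1-I2+3, 2I1+I2-2) = …
             … = A(I1+I2-1, I1-I2+2)
           enddo
         enddo
       enddoall
     enddoall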
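The partitioned nest can be emulated to confirm that the four partitions tile the iteration space and that no dependence crosses a partition boundary. This is a hedged sketch: it assumes `mod` denotes the non-negative remainder (Python's `%`), and it uses the hand-derived reader iteration j = (3 I1, I1 - I2 + 4) for a writer (I1, I2). The function names are illustrative.

```python
def partitioned_iterations(n):
    """Yield ((Io1, Io2), (I1, I2)) in the order of loops L'1 .. L'4."""
    for io1 in (0, 1):
        for io2 in (0, 1):
            lo1 = -n + (n + io1) % 2
            hi1 = n - (n - io1) % 2
            for i1 in range(lo1, hi1 + 1, 2):
                io2p = io2 + (i1 - io1) // 2      # Io'2; the division is exact
                lo2 = -n + (n + io2p) % 2
                hi2 = n - (n - io2p) % 2
                for i2 in range(lo2, hi2 + 1, 2):
                    yield (io1, io2), (i1, i2)

def check(n):
    """Return (iterations covered exactly once, dependences stay inside a partition)."""
    part = {}
    count = 0
    for p, it in partitioned_iterations(n):
        part[it] = p
        count += 1
    covers = count == len(part) == (2 * n + 1) ** 2
    closed = all(
        part[(3 * i1, i1 - i2 + 4)] == p
        for (i1, i2), p in part.items()
        if (3 * i1, i1 - i2 + 4) in part
    )
    return covers, closed, len(set(part.values()))

result = check(10)
```

The third component of `result` counts the non-empty partitions, which should equal det(PDM).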
F-rank partitioning: dependence graphs
[Figure: dependence graphs for the full rank partitioning example.]
4. Results (3) Combined
PDM = (2,2) → (0,2) → (0,1)
$$(2, 2)\begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix} = (0, 2)\ \ (\det = 2),\qquad (0, 2)\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = (0, 1)\ \ (\det = 1)$$
L’1: doall J1 = -2N, 2N
L’2:   do J2 = max(-N, -N-J1), min(N, N-J1)
         I1 = J2
         I2 = J1+J2
         A(3I1+1, 2I1+I2-1) = …
         … = A(I1+3, I2+1)
       enddo
     enddoall
L’’1: doall Jo2 = 0, 1
L’’2:   doall J1 = -2N, 2N
          p2 = max(-N, -N-J1)
          q2 = min(N, N-J1)
L’’3:     do J2 = p2+mod(Jo2-p2,2), q2-mod(q2-Jo2,2), 2
            I1 = J2
            I2 = J1+J2
            A(3I1+1, 2I1+I2-1) = …
            … = A(I1+3, I2+1)
          enddo
        enddoall
      enddoall
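The combined nest can be checked the same way. The sketch below emulates L’’1 to L’’3 under the assumption that `mod` is the non-negative remainder (Python's `%`; with a truncating remainder the lower bound could fall below p2 when Jo2 - p2 is negative). It verifies that each original iteration is visited once and that a writer (I1, I2) and its reader (3 I1 - 2, 2 I1 + I2 - 2) share the same (Jo2, J1) pair. Names are illustrative.

```python
def combined_iterations(n):
    """Yield ((Jo2, J1), (I1, I2)) in the order of loops L''1 .. L''3."""
    for jo2 in (0, 1):
        for j1 in range(-2 * n, 2 * n + 1):
            p2 = max(-n, -n - j1)
            q2 = min(n, n - j1)
            lo = p2 + (jo2 - p2) % 2
            hi = q2 - (q2 - jo2) % 2
            for j2 in range(lo, hi + 1, 2):
                yield (jo2, j1), (j2, j1 + j2)

def check(n):
    """Return (iterations covered exactly once, dependences stay inside one parallel slice)."""
    label = {}
    count = 0
    for key, it in combined_iterations(n):
        label[it] = key
        count += 1
    covers = count == len(label) == (2 * n + 1) ** 2
    closed = all(
        label[(3 * i1 - 2, 2 * i1 + i2 - 2)] == key
        for (i1, i2), key in label.items()
        if (3 * i1 - 2, 2 * i1 + i2 - 2) in label
    )
    return covers, closed

result = check(6)
```

Only the innermost J2 loop, with stride 2, remains sequential; the two outer loops enumerate independent slices.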
F-rank submatrix: dependence graphs
[Figure: dependence graphs with axes j1 and j2.]
5. Conclusion
• The distances between dependent iterations are in general non-constant when the array subscripts are linear.
• A pseudo distance matrix (PDM), holding the largest base vectors of the distance space, is computed from the linear dependence equations.
• Parallelism can still be exploited for loops with variable distances through the unimodular and partitioning transformations derived from the PDM.