Cgo2007 P3 3 Birkbeck
A Dimension Abstraction Approach to Vectorization in Matlab
Neil Birkbeck, Jonathan Levesque, Jose Nelson Amaral
Computing Science, University of Alberta
Edmonton, Alberta, Canada
Problem
Problem Statement: Generate equivalent, error-free vectorized source code for Matlab source, utilizing higher-level matrix operations when possible to improve efficiency.
Motivation
Loop-based code is slower than vector code in Matlab. Why?
interpretive overhead (type/shape checking,…)
resizing of arrays in loops
Vectorization is also useful for compiled Matlab code, where optimized vector routines could be substituted.
Loop:
n=1000;
for i=1:n, A(i)=B(i)+C(i); end
Vectorized:
n=1000;
A(1:n)=B(1:n)+C(1:n);
5x faster!
Related Work
Data dependence vectorization: Allen & Kennedy's Codegen algorithm
  Build data dependence graph
  Topological visit of strongly connected components
Abstract Matrix Form (AMF) [Menon & Pingali]
  Axioms used to transform array code
  Takes advantage of matrix multiplication
  Not clear if it is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal)
Incorrect Vectorization
Example 1:
for i=1:n,
  a(i)=b(i)+c(i);
end
Pull the statement out of the loop and substitute the index variable (i -> 1:n):
a(1:n)=b(1:n)+c(1:n)
The vectorization is correct if a, b, and c are all row vectors or all column vectors.
If this is not true, the vectorized code will introduce an error!
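The failure mode in Example 1 can be reproduced with a small sketch. The shapes chosen here (b a row vector, c a column vector) and all variable names other than a, b, c are assumptions made for illustration; the exact error message depends on the Matlab or Octave version.

```matlab
% Assumed shapes: b is a row vector and c is a column vector.
n = 4;
b = 1:n;            % 1 x n row vector
c = (1:n)';         % n x 1 column vector

% The loop only ever adds scalars, so it is valid for any orientation.
a_loop = zeros(1, n);
for i = 1:n
    a_loop(i) = b(i) + c(i);
end

% Naive index substitution. Older Matlab rejects row + column outright;
% newer releases and Octave expand the sum to an n x n matrix, which
% cannot be assigned into the n elements of a_vec(1:n).
a_vec = zeros(1, n);
failed = false;
try
    a_vec(1:n) = b(1:n) + c(1:n);
catch
    failed = true;
end

% Once the orientations agree, the vector statement matches the loop.
a_fixed = zeros(1, n);
a_fixed(1:n) = b(1:n) + c(1:n)';
```

Here `failed` ends up true, while `a_fixed` equals `a_loop`.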
Incorrect Vectorization
Example 2:
for i=1:n, x(i)=y(i,h)*z(h,i); end
Matlab is untyped. The vectorization depends on whether h is a scalar or a vector.
If h is a scalar:
x(1:n)=y(1:n,h).*z(h,1:n)';
Otherwise:
x(1:n)=sum(y(1:n,h).*z(h,1:n)',2);
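A short sketch can check both vectorized forms of Example 2 against the loop. The sizes, the integer test data, and the names ending in `_loop`, `_vec`, `hs`, and `hv` are assumptions for illustration only.

```matlab
n = 4;  m = 3;
y = reshape(1:n*m, n, m);        % n x m, integer-valued test data
z = reshape(1:m*n, m, n) + 1;    % m x n

% Case 1: h is a scalar, so y(i,h)*z(h,i) is a scalar product.
hs = 2;
xs_loop = zeros(n, 1);
for i = 1:n, xs_loop(i) = y(i,hs)*z(hs,i); end
xs_vec = zeros(n, 1);
xs_vec(1:n) = y(1:n,hs).*z(hs,1:n)';

% Case 2: h is a vector, so y(i,h)*z(h,i) is an inner product
% (1 x k times k x 1) and the vector form needs a row-wise sum.
hv = [1 3];
xv_loop = zeros(n, 1);
for i = 1:n, xv_loop(i) = y(i,hv)*z(hv,i); end
xv_vec = zeros(n, 1);
xv_vec(1:n) = sum(y(1:n,hv).*z(hv,1:n)', 2);
```

Using the scalar form when h is a vector would produce an n x k matrix on the right-hand side, which is why the shape of h has to be known before vectorizing.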
Overview of Solution
Input: a vectorizable statement (from the data dependence-based vectorizer) and knowledge of the shape of variables
1. Propagate dimensionality up the parse tree
2. Dimensions agree?
   No: leave the statement in the loop
   Yes: perform transformations and output the vector statement
More Specifically
Represent the dimensionality of expressions as a list of symbols: 1 or "*" (>1).
Assume it is known for variables.
Examples:
Type         dim
scalar       (1)
1xn vector   (1,*)
nx1 vector   (*,1), (*)
mxn matrix   (*,*)
Propagate up the parse tree according to Matlab rules.
Compatibility: dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1's)
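The compatibility test can be sketched as follows. This is not the authors' implementation: the cell-array encoding and the helper names `strip_ones` and `compatible` are hypothetical, and the sketch assumes that "redundant" 1's means trailing singleton dimensions only, which is consistent with (*,1) and (*) both describing an nx1 vector in the table.

```matlab
% A dimensionality is encoded as a cell array of symbols, e.g. {'*','1'}.
% Assumption: only trailing 1's are redundant, so (*,1) ~ (*) but (1,*) is not.
strip_ones = @(d) d(1:max([1, find(~strcmp(d, '1'), 1, 'last')]));
compatible = @(a, b) isequal(strip_ones(a), strip_ones(b));

scalar = {'1'};
row    = {'1', '*'};
col    = {'*', '1'};
col1   = {'*'};
mat    = {'*', '*'};

col_ok      = compatible(col, col1);          % (*,1) vs (*)
row_vs_col  = compatible(row, col);           % (1,*) vs (*,1)
scalar_ok   = compatible(scalar, {'1', '1'}); % (1) vs (1,1)
mat_vs_col  = compatible(mat, col);           % (*,*) vs (*,1)
```

Under this assumption row and column vectors stay incompatible, which matches the error case of Example 1.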
Vectorized Dimensionality
Vectorized dimensionality: representation of dimensions after vectorization of a loop, denoted dim_i for a loop with index variable i
Introduce a new symbol r_i for index variable i
for i=1:n, a(i)=10+i; end
exp     dim(exp)   vectorized   dim_i(exp)
10      (1)        10           (1)
i       (1)        1:n          (1,r_i)
a(i)    (1)        a(1:n)       (r_i)
a       (*)        a            (*)
Vectorized Dimensionality
Expressions with incompatible vectorized dimensionality should not be vectorized.
When do dimensionalities agree?
Assignment expressions: e_lhs=e_rhs
dim_i(e_lhs)≈dim_i(e_rhs) || dim_i(e_rhs)≈(1)
Element-wise binary operators: e=e_lhs Θ e_rhs
dim_i(e_lhs)≈(1) || dim_i(e_rhs)≈(1) || dim_i(e_lhs)≈dim_i(e_rhs)
Θ in {+,-,.*,…}
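The two agreement rules can be written down as predicates over symbol lists. This is a minimal sketch: the encoding (cell arrays with 'ri' and 'rj' standing for r_i and r_j), the helper names, and the treatment of trailing 1's as the only redundant ones are all assumptions for illustration.

```matlab
% Hypothetical helpers; trailing 1's are assumed to be the redundant ones.
strip_ones = @(d) d(1:max([1, find(~strcmp(d, '1'), 1, 'last')]));
compatible = @(a, b) isequal(strip_ones(a), strip_ones(b));
is_scalar  = @(d) compatible(d, {'1'});

% Assignment e_lhs = e_rhs: dimensions agree, or the right side is scalar.
assign_ok = @(lhs, rhs) compatible(lhs, rhs) || is_scalar(rhs);

% Element-wise operator: either operand is scalar, or dimensions agree.
elementwise_ok = @(lhs, rhs) is_scalar(lhs) || is_scalar(rhs) || compatible(lhs, rhs);

% B(i,j)+C(i,j): both operands are (r_i,r_j).
same_order = elementwise_ok({'ri', 'rj'}, {'ri', 'rj'});
% B(j,i)+C(i,j): (r_j,r_i) against (r_i,r_j).
swapped    = elementwise_ok({'rj', 'ri'}, {'ri', 'rj'});
% A(i,j)=0: a scalar right-hand side is always assignable.
fill_ok    = assign_ok({'ri', 'rj'}, {'1'});
% A(i,j)=x(i): (r_i,r_j) against (r_i).
short_rhs  = assign_ok({'ri', 'rj'}, {'ri'});
```

Only `same_order` and `fill_ok` come out true; the other two statements would be left in the loop under these rules.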
Vectorized Dimensionality
Rules are very restrictive. Assume dim(A)=dim(B)=dim(C)=(*,*):
for i=1:100,
  for j=1:100
    A(i,j)=B(j,i)+C(i,j);
  end
end
dim_{i,j}(B(j,i))=(r_j,r_i)
dim_{i,j}(C(i,j))=(r_i,r_j)
Vectorization fails because (r_i,r_j) is not compatible with (r_j,r_i)
Transpose Transformation
Extension to utilize transpose when necessary is straightforward.
For assignment:
if dim_i(A)≈reverse(dim_i(B)) then A=B^T is allowable
for i=1:m,
  for j=1:n
    A(i,j)=B(j,i);
  end
end
dim_{i,j}(A(i,j))=reverse(dim_{i,j}(B(j,i)))=(r_i,r_j)
A(1:m,1:n)=(B(1:n,1:m))'
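As a quick check of the assignment rule, the transposed vector statement can be compared with the original loop. The sizes and test data below are arbitrary choices for illustration.

```matlab
m = 3;  n = 4;
B = reshape(1:n*m, n, m);      % n x m source matrix

% Original loop nest.
A_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        A_loop(i,j) = B(j,i);
    end
end

% Vectorized form: the index lists are swapped on B, so a transpose
% restores the (r_i, r_j) ordering expected by A.
A_vec = zeros(m, n);
A_vec(1:m,1:n) = (B(1:n,1:m))';
```

Without the transpose the right-hand side would be n x m and the assignment into the m x n target would fail whenever m and n differ.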
Transpose Transformation
Extension to utilize transpose when necessary is straightforward.
Similar for pointwise operations:
if dim_i(A)≈reverse(dim_i(B)) then A Θ B^T is allowable, propagate dim_i(A Θ B^T)=dim_i(A)
if reverse(dim_i(A))≈dim_i(B) then A^T Θ B is allowable, propagate dim_i(A^T Θ B)=dim_i(B)
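Applying the pointwise rule to the nested loop A(i,j)=B(j,i)+C(i,j), which the basic rules rejected, would give a statement like the one below. The vectorized line is derived here from the rule rather than quoted from the slides, and the size n and test data are assumptions.

```matlab
n = 5;
B = reshape(1:n*n, n, n);
C = magic(n);

% Original loop nest: B is accessed with swapped indices.
A_loop = zeros(n, n);
for i = 1:n
    for j = 1:n
        A_loop(i,j) = B(j,i) + C(i,j);
    end
end

% dim(B(j,i)) = (r_j,r_i) is the reverse of dim(C(i,j)) = (r_i,r_j),
% so B is transposed and the result keeps the dimensionality of C.
A_vec = zeros(n, n);
A_vec(1:n,1:n) = B(1:n,1:n)' + C(1:n,1:n);
```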
Pattern Database
Dimensionality disagreement at binary operators inhibits vectorization.
Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization.
for i=1:m,
  for j=1:n,
    A(i,j)=B(i,j)+C(i);
  end
end
Pattern:
lhs          operation   rhs        output
(r_i, r_j)   Θ           (r_i,1)    (r_i, r_j)
Transformed Result:
B(i,j)+C(i);  ->  B(1:m,1:n)+repmat(C(1:m),1,n);
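The repmat transformation for the (r_i, r_j) Θ (r_i,1) pattern can be checked against the loop as sketched below. C is assumed to be a column vector, matching the (r_i,1) operand in the pattern; the sizes and data are arbitrary.

```matlab
m = 3;  n = 4;
B = reshape(1:m*n, m, n);
C = (10:10:10*m)';             % m x 1 column vector (assumed shape)

% Original loop nest: C(i) is added to every element of row i.
A_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        A_loop(i,j) = B(i,j) + C(i);
    end
end

% Transformed statement: replicate the column n times so that both
% operands have dimensionality (r_i, r_j).
A_vec = zeros(m, n);
A_vec(1:m,1:n) = B(1:m,1:n) + repmat(C(1:m), 1, n);
```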
Pattern Database
Diagonal access pattern:
for i=1:n, a(i)=A(i,i)*b(i); end
Pattern:
lhs          operation   rhs   output
(r_i, r_i)   (index)     nil   (1, r_i)
Transformed result (column-major indexing of A):
a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n);
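The linear-index form of the diagonal access relies on Matlab storing matrices in column-major order, so element (i,i) sits at position i+size(A,1)*(i-1). A small check, assuming a and b are row vectors and using a non-square A to show that the row count is what matters:

```matlab
n = 4;
A = reshape(1:6*n, 6, n);      % 6 x n, deliberately not square
b = 1:n;                       % row vector (assumed shape)

% Original loop over the diagonal.
a_loop = zeros(1, n);
for i = 1:n
    a_loop(i) = A(i,i) * b(i);
end

% Linear indices of A(1,1), A(2,2), ..., A(n,n) in column-major storage.
idx = (1:n) + size(A,1)*((1:n)-1);

% Indexing a matrix with a row vector of linear indices yields a row
% vector, which matches the (1, r_i) output of the pattern.
a_vec = zeros(1, n);
a_vec(1:n) = A(idx).*b(1:n);
```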
Additive Reduction Statements
Additive-reduction statements use a loop variable to perform an accumulation.
Not all loop nest index variables appear in the output dimensionality.
for i1=…, for i2=…, … for ik=…
  A(J)=A(J)+E;
… end end end
Loop nest variables I={i1,i2,…,ik}; J is a subset of I.
Example:
for i=1:m,
  for j=1:n,
    a(i)=a(i)+B(i,j);
  end
end
I={i,j}, J={i}
Additive Reduction (Solution)
Maintain/propagate dimensionality and reduced variables for an expression.
ρ(E) denotes the reduced variables for expression E.
When checking statement A(J)=A(J)+E, ensure dim_{i1,i2,…,ik}(A(J))≈dim_{i1,i2,…,ik}(E) and ρ(E)=I-J.
Any variable r_i in I-J but not in ρ(E) must be reduced.
Example where r_i appears in the dimensionality of E:
for i=1:m, a=a+b(i); end
I={i}, J={}, I-J={i}, ρ(b(i))={}
r_i in dim_i(b(i))=(r_i,1)
Reduce: b(i) -> sum(b(i),1)
Vectorize: a=a+sum(b(1:m));
Example where r_i does not appear in the dimensionality of E:
for i=1:m, a=a+10; end
I={i}, J={}, I-J={i}, ρ(10)={}
r_i not in dim_i(10)
Reduce: 10 -> m*10, ρ(m*10)={r_i}
Vectorize: a=a+m*10;
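The reductions described above can be checked numerically. In this sketch the test data and the `_loop`/`_vec` names are assumptions, and the row-sum statement for a(i)=a(i)+B(i,j) is derived from the rule (reduce r_j with sum along dimension 2) rather than quoted from the slides.

```matlab
m = 4;  n = 3;
b = (1:m)';                    % m x 1 column vector
B = reshape(1:m*n, m, n);

% a=a+b(i): r_i appears in dim(b(i)), so it is reduced with sum.
s_loop = 0;
for i = 1:m, s_loop = s_loop + b(i); end
s_vec = 0 + sum(b(1:m));

% a=a+10: r_i does not appear, so the constant is scaled by the trip count.
c_loop = 0;
for i = 1:m, c_loop = c_loop + 10; end
c_vec = 0 + m*10;

% a(i)=a(i)+B(i,j): I-J={j}, so r_j is reduced by summing along dimension 2.
a_loop = zeros(m, 1);
for i = 1:m
    for j = 1:n
        a_loop(i) = a_loop(i) + B(i,j);
    end
end
a_vec = zeros(m, 1);
a_vec(1:m) = a_vec(1:m) + sum(B(1:m,1:n), 2);
```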
Additive Reduction via Matrix Multiplication
Matrix multiplication can be used to perform reductions on e=e_lhs*e_rhs, provided:
1. dim_{i1,…,ik}(e_lhs)=(S_l,r_k)
2. dim_{i1,…,ik}(e_rhs)=(r_k,S_r)
3. r_k is a reduction variable.
Implies:
dim_{i1,…,ik}(e)=(S_l,S_r)
ρ(e)=union(ρ(e_lhs), ρ(e_rhs), {r_k})
Example:
for i=1:m
  for j=1:n
    a(i)=a(i)+B(i,j)*x(j);
  end
end
• j is used for reduction
• dim_{i,j}(B(i,j))=(r_i,r_j)
• dim_{i,j}(x(j))=(r_j)
a(1:m)=a(1:m)+B(1:m,1:n)*x(1:n);
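A short check of the matrix-multiplication reduction, assuming a and x are column vectors (the sizes and data are illustrative):

```matlab
m = 3;  n = 4;
B  = reshape(1:m*n, m, n);
x  = (1:n)';                   % n x 1 column vector (assumed shape)
a0 = (1:m)';                   % initial contents of a

% Original loop nest: j only feeds the accumulation.
a_loop = a0;
for i = 1:m
    for j = 1:n
        a_loop(i) = a_loop(i) + B(i,j)*x(j);
    end
end

% (r_i,r_j) * (r_j) matches the pattern (S_l,r_k) * (r_k,S_r), so the
% product itself performs the reduction over r_j.
a_vec = a0;
a_vec(1:m) = a_vec(1:m) + B(1:m,1:n)*x(1:n);
```

With non-integer data the two results may differ in the last bits because the summation order is not the same, so a tolerance-based comparison would be the safer check in general.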
Additive Reduction Example
for i=1:m,
  for j=1:n,
    d(i)=d(i)+a(i,j)*b(j)+c(i,j);
  end
end
ρ(a(i,j))={}, dim_{i,j}(a(i,j))=(r_i,r_j)
ρ(b(j))={}, dim_{i,j}(b(j))=(r_j)
r_j is a reduction variable. Use matrix multiplication to reduce r_j:
ρ(a(i,j)*b(j))={r_j}, dim_{i,j}(a(i,j)*b(j))=(r_i)
ρ(c(i,j))={}, dim_{i,j}(c(i,j))=(r_i,r_j)
Need to reduce r_j: c(i,j) -> sum(c(i,j),2)
ρ(a(i,j)*b(j)+sum(c(i,j),2))={r_j}, dim_{i,j}(a(i,j)*b(j)+sum(c(i,j),2))=(r_i)
Dimensionality and reduced variables agree, now replace index variables:
d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2);
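The final statement of this example can be compared with the loop nest directly. The sketch assumes b and d are column vectors and uses small integer matrices so that the comparison is exact.

```matlab
m = 3;  n = 4;
a = reshape(1:m*n, m, n);
b = (1:n)';                    % n x 1 column vector (assumed shape)
c = magic(4);  c = c(1:m, 1:n);
d0 = zeros(m, 1);              % d is assumed to be a column vector

% Original loop nest: both terms are accumulated over j.
d_loop = d0;
for i = 1:m
    for j = 1:n
        d_loop(i) = d_loop(i) + a(i,j)*b(j) + c(i,j);
    end
end

% Vectorized: the product reduces r_j by matrix multiplication,
% and the remaining term reduces r_j with a row-wise sum.
d_vec = d0;
d_vec(1:m) = d_vec(1:m) + a(1:m,1:n)*b(1:n) + sum(c(1:m,1:n), 2);
```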
Implementation Prototype
The pattern database and corresponding transformations are specified in a modular, end-user-extensible manner.
[Figure: prototype flow chart. Boxes: Original Loop, Octave Parser, Embedded Control Statements? (yes/no), Create DDG, Vectorizer, Dimension Check, Success? (yes/no), Vectorize Statement, Code Generator, Vectorized Loop]
Results
Source-to-source transformation
Timing results averaged over 100 runs
Platform: Matlab 7.2.0.283, 3.0 GHz Pentium D Processor
Results
Histogram Equalization:
Input source:
h=hist(im(:),[0:255]); %histogram
heq=255*cumsum(h(:))/sum(h(:));
for i=1:size(im,1),
  for j=1:size(im,2),
    im2(i,j)=heq(im(i,j)+1);
  end
end
Vectorized Result:
h=hist(im(:),[(0:255)]);
heq=255*cumsum(h(:))/sum(h(:));
im2(1:size(im,1),1:size(im,2))=...
  heq(im(1:size(im,1),1:size(im,2))+1);
For a monochrome 8-bit 800x600 image (original/vectorized):
Entire routine: 0.178s/0.114s (speedup: 1.56)
Loop portion only: 0.0814s/0.0176s (speedup: 4.6)
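The equivalence of the two histogram equalization versions can be checked on a small synthetic image. The image below is an assumption for illustration: it holds values in 0..255 stored as doubles, since im(i,j)+1 is used as an index and a uint8 image would saturate at 255.

```matlab
% Small synthetic "image" with values in 0..255, stored as double.
im = mod(reshape(0:47, 6, 8) * 7, 256);

h   = hist(im(:), 0:255);              % histogram with one bin per gray level
heq = 255 * cumsum(h(:)) / sum(h(:));  % equalization lookup table (256 x 1)

% Input source: per-pixel table lookup.
im2_loop = zeros(size(im));
for i = 1:size(im,1)
    for j = 1:size(im,2)
        im2_loop(i,j) = heq(im(i,j)+1);
    end
end

% Vectorized result: indexing heq with a matrix of indices returns a
% matrix of the same shape as the index.
im2_vec = zeros(size(im));
im2_vec(1:size(im,1),1:size(im,2)) = ...
    heq(im(1:size(im,1),1:size(im,2))+1);
```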
Results (Menon & Pingali Examples)
Example 1:
for k=1:p,
  for j=1:(i-1),
    X(i,k)=X(i,k)-L(i,j)*X(j,k);
  end
end
Vectorized:
X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p);
Example 2:
for i=1:N,
  for j=1:N
    phi(k)=phi(k)+a(i,j)*x_se(i)*f(j);
  end
end
Vectorized:
phi(k)=phi(k)+sum(a(1:N,1:N)'*x_se(1:N).*f(1:N),1);
Example 3:
for i=1:n,
  for j=1:n,
    for k=1:n,
      for l=1:n
        y(i)=y(i)+x(j)*A(i,k)*B(l,k)*C(l,j);
      end
    end
  end
end
Vectorized:
y(1:n)=y(1:n)+x(1:n)'*...
  (A(1:n,1:n)*B(1:n,1:n)'*C(1:n,1:n))';
Settings        Input time (s)   Output time (s)   Speedup
i=500,p=5000    0.536            0.030             17
N=1000          0.174            0.012             14
n=40            0.622            0.0001            5000
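The three vectorized statements can be checked against their loops on small inputs. The variable shapes are assumptions chosen so that the vector forms are well defined: x_se, f, and x are column vectors, and y is a row vector. Sizes and data are illustrative and unrelated to the timing settings in the table.

```matlab
% Example 1: update of row i, with i fixed.
i = 4;  p = 3;
L = reshape(1:16, 4, 4);
X = reshape(1:12, 4, 3);
X_loop = X;
for k = 1:p
    for j = 1:(i-1)
        X_loop(i,k) = X_loop(i,k) - L(i,j)*X_loop(j,k);
    end
end
X_vec = X;
X_vec(i,1:p) = X_vec(i,1:p) - L(i,1:i-1)*X_vec(1:i-1,1:p);

% Example 2: double reduction into the scalar element phi(k).
N = 3;  k = 2;
a = magic(N);  x_se = (1:N)';  f = (4:6)';
phi_loop = zeros(1, 3);
for i = 1:N
    for j = 1:N
        phi_loop(k) = phi_loop(k) + a(i,j)*x_se(i)*f(j);
    end
end
phi_vec = zeros(1, 3);
phi_vec(k) = phi_vec(k) + sum(a(1:N,1:N)'*x_se(1:N).*f(1:N), 1);

% Example 3: three reduction variables collapsed into matrix products.
n = 3;
x = (1:n)';
A = magic(n);  B = reshape(1:9, 3, 3);  C = B' + 1;
y_loop = zeros(1, n);
for i = 1:n
    for j = 1:n
        for k = 1:n
            for l = 1:n
                y_loop(i) = y_loop(i) + x(j)*A(i,k)*B(l,k)*C(l,j);
            end
        end
    end
end
y_vec = zeros(1, n);
y_vec(1:n) = y_vec(1:n) + x(1:n)'*...
    (A(1:n,1:n)*B(1:n,1:n)'*C(1:n,1:n))';
```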
Remaining Issues/Future Work
Each pattern transformation is local; there is no optimization over the entire statement (e.g., we do not optimize and distribute transposes).
Control flow within loops
Function calls: functions are treated as pointwise operators (correct for many predefined arithmetic functions)
Incorporate our analysis directly with shape analysis
Summary
Contributions:
A simple method to prevent incorrect vectorization in Matlab
A user-extensible operator/dimensionality pattern database that can be used to improve vectorization
These patterns can make use of higher-level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.
Acknowledgements
Funding provided by NSERC
Grateful for reviewers' comments and suggestions
Thank You
Questions?