Assign Optimization for Index-Reuse Tapes
Max Sagebaum, Tim Albring
Chair for Scientific Computing, TU Kaiserslautern
19th EuroAD Workshop, April 2016
Overview
Motivation and problem description
Solution with tools from the C++ standard library
AD specific solution
Results & measurements
Sagebaum, Albring Assign Optimization for Index-Reuse Tapes 2/ 23
Motivation - Active Copy (Status 12/2015)
Each statement increases the statement stack by one and the Jacobi stack by the number of independent variables of that statement.
Special case: Copy statement
    ActiveReal c;
    {
      ActiveReal x;
      x = ...;
      c = x;
    }
If x is active (i.e. has an index) we call this an Active Copy.
Treated differently depending on the tape implementation:
Linear Indexing: Index of x is simply copied along with the value. No information is written on the tape.
Nonlinear Indexing: Index will be marked as free if x is destroyed. We need to write a new statement and Jacobi entry (1.0) so that c gets its own index and the adjoint vector position is correctly set to zero in the reverse sweep.
Motivation - Example: SU2 (Status 12/2015)
Under active development by Stanford University and by organizations all around the world (TUKL, TU Delft, ...)
Comprises a complete, self-contained optimization environment based on the continuous or discrete adjoint method.
Finite-Volume discretization of (U)RANS equations and turbulence modelsusing a highly modular C++ code-structure
Convergence acceleration using multi-grid and local time-stepping methods
Several space and time integration methods
Integrated AD support
Motivation - Fixed-Point Formulation for Multi-Disciplinary Design (Status 12/2015)
– $\beta \in D \subset \mathbb{R}^p$: design vector
– $U \in \mathcal{U} \subset \mathbb{R}^n$: state vector
– $X \in \mathcal{X} \subset \mathbb{R}^m$: computational mesh
– $M(\beta) = X$: mesh deformation equation
– $J(U, X)$: objective function
– $R(U, X) = 0$: discretized state equation
R, or rather G, may contain not only the flow equation but any (weakly) coupled model, e.g.
Flow + turbulence
Flow + structure
Flow + aeroacoustics
Reformulation also possible for unsteady problems (dual-time stepping).
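Original problem:
\[ \min_{\beta} J(U(\beta), X(\beta)) \quad \text{s.t.} \quad R(U(\beta), X(\beta)) = 0, \quad M(\beta) = X \]
Fixed-point reformulation:
\[ \min_{\beta} J(U(\beta), X(\beta)) \quad \text{s.t.} \quad G(U(\beta), X(\beta)) = U, \quad M(\beta) = X \]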
Assuming $R(U, X) = 0$ is solved by a fixed-point iteration:
\[ G(U^*, X) = U^* \Leftrightarrow R(U^*, X) = 0 \]
In case of a Newton-type solver:
\[ G(U, X) := U - P(U, X) R(U, X), \quad \text{where } P \approx (\partial R / \partial U)^{-1}. \]
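The Newton-type fixed-point map can be illustrated on a scalar toy problem. The sketch below assumes a residual R(u) = u^2 - 2 and the exact inverse derivative as preconditioner; both are chosen purely for illustration and are not taken from SU2. The function names `residual`, `preconditioner`, `G` and `solveFixedPoint` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar residual R(u) = u^2 - 2 (root: sqrt(2)).
inline double residual(double u) { return u * u - 2.0; }

// Preconditioner P ~ (dR/dU)^{-1}; here the exact inverse derivative 1 / (2u).
inline double preconditioner(double u) { return 1.0 / (2.0 * u); }

// Fixed-point map G(U) := U - P(U) * R(U).
inline double G(double u) { return u - preconditioner(u) * residual(u); }

// Iterate U <- G(U) until two successive iterates differ by less than tol
// or maxIter iterations have been performed.
inline double solveFixedPoint(double u0, double tol = 1e-12, int maxIter = 50) {
  assert(tol > 0.0);
  double u = u0;
  for (int i = 0; i < maxIter; ++i) {
    const double next = G(u);
    if (std::abs(next - u) < tol) return next;
    u = next;
  }
  return u;
}
```

Calling `solveFixedPoint(1.0)` returns a value u* that satisfies both G(u*) = u* and R(u*) = 0 up to rounding, which is the equivalence stated above.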
Motivation - The Discrete Adjoint Solver (Status 12/2015)
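Using the method of Lagrange multipliers, we define the Lagrangian function as:
\[ L(\beta, U, X, \bar{U}, \bar{X}) = \underbrace{J(U, X) + \bar{U}^T G(U, X)}_{=: N \text{ (shifted Lagrangian)}} - \bar{U}^T U + \bar{X}^T (M(\beta) - X) \]
KKT conditions yield equations for the adjoints $\bar{U}$, $\bar{X}$ and the sensitivity vector $dL/d\beta$:
\[ \bar{U} = \frac{\partial}{\partial U} J(U, X) + \frac{\partial}{\partial U} G^T(U, X) \bar{U} = \frac{\partial}{\partial U} N^T(U, \bar{U}, X) \quad \text{(adjoint equation)} \]
\[ \bar{X} = \frac{\partial}{\partial X} J(U, X) + \frac{\partial}{\partial X} G^T(U, X) \bar{U} = \frac{\partial}{\partial X} N^T(U, \bar{U}, X) \quad \text{(mesh adjoint equation)} \]
\[ \frac{dL}{d\beta} = \frac{d}{d\beta} M^T(\beta) \bar{X} \quad \text{(design equation)} \]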
Motivation - Test Case (Status 12/2015)
We have to account for many different equations as well as time and space discretization methods.
Simplest case: steady 2D Euler equations with explicit Euler for pseudo-time stepping
Space discretization:
1. Centered Scheme + Artificial dissipation (JST)
2. Upwind Scheme + Limiter (Roe)
Airfoil in transonic flow conditions (10127 Elements)
Motivation - Tape Specific Performance (Status 12/2015)
[Figure, panel "Memory": share of total tape memory (% of total) for statements, Jacobi entries, adjoint vector and index vector.
 Legend: JST - Linear Indexing (59.53 MB total), JST - Nonlinear Indexing (72.19 MB total), Roe - Linear Indexing (150 MB total), Roe - Nonlinear Indexing (173.88 MB total)]
[Figure, panel "Statements": share of all statements (% of total) in the categories Other, Active Copy and Linear.
 Legend: Roe (6.87 Mio total), JST (2.87 Mio total)]
Problem analysis
Differences between linear indexing and nonlinear indexing:

                                      linear indexing   nonlinear indexing
    Statement memory (byte, n args.)  1 + 12 * n        5 + 12 * n
    Copy statement (byte)             0                 17

⇒ Constant memory per statement increases
⇒ and additional statements need to be written
But: large size reduction of the adjoint vector
Solution: Avoid writing the additional statements?
Use reference counting for the indices!
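The byte counts in the table can be checked with a few lines of arithmetic. The sketch below only encodes the constants from the table; the function names are hypothetical and nothing here is taken from the CoDiPack sources.

```cpp
#include <cstddef>

// Per-statement tape memory in bytes for a statement with n arguments,
// using the constants from the table above.
constexpr std::size_t linearStatementBytes(std::size_t n) { return 1 + 12 * n; }
constexpr std::size_t nonlinearStatementBytes(std::size_t n) { return 5 + 12 * n; }

// An active copy writes nothing under linear indexing and a full
// one-argument statement under nonlinear indexing.
constexpr std::size_t linearCopyBytes() { return 0; }
constexpr std::size_t nonlinearCopyBytes() { return nonlinearStatementBytes(1); }

// 5 + 12 * 1 = 17 bytes, matching the "Copy statement" row.
static_assert(nonlinearCopyBytes() == 17, "copy statement size mismatch");
```

The constant overhead of nonlinear indexing is therefore 4 bytes per statement, plus 17 bytes for every active copy that linear indexing handles for free.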
Solution - std::shared_ptr example [1]
    #include <memory>
    #include <iostream>

    struct Foo {
      Foo() { std::cout << "Foo...\n"; }
      ~Foo() { std::cout << "~Foo...\n"; }
    };

    int main() {
      {
        std::cout << "constructor with object\n";
        std::shared_ptr<Foo> sh2(new Foo);
        std::shared_ptr<Foo> sh3(sh2);         // shares ownership with sh2
        std::cout << sh2.use_count() << '\n';
        std::cout << sh3.use_count() << '\n';
      }                                        // both owners are gone: Foo is destroyed
    }
Output:
    constructor with object
    Foo...
    2
    2
    ~Foo...
[1] Taken from: http://en.cppreference.com
Solution - Off-the-shelf solution with std::shared_ptr
    struct ActiveReal {
      double p; // primal
      std::shared_ptr<int> indexPtr;
    };
Analysis of the method:
Size of ActiveReal increases: 16 byte → 24 byte
Memory for each managed index: 24 byte + 4 byte (internal shared_ptr structure + data for int)
2 additional dereferencing operations for each index access:
    int index = *a.indexPtr.internalStructure->dataPointer;
⇒ Lots of additional memory required
⇒ Access of indices will be slow
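The off-the-shelf variant can be compared with the plain layout in a small sketch. The struct names `PlainActiveReal` and `SharedActiveReal` and the helper `readIndex` are hypothetical; the exact sizes depend on the platform and standard library, so the 16 and 24 byte figures quoted above should be read as typical 64-bit values rather than guarantees.

```cpp
#include <cassert>
#include <memory>

// Layout without reference counting: primal value plus a plain index.
struct PlainActiveReal {
  double p;
  int index;
};

// Layout with std::shared_ptr: the index lives in a separately managed object.
struct SharedActiveReal {
  double p;
  std::shared_ptr<int> indexPtr;
};

// Reading the index goes through the shared_ptr instead of a plain member.
inline int readIndex(const SharedActiveReal& a) {
  assert(a.indexPtr != nullptr);
  return *a.indexPtr;
}
```

Copying a `SharedActiveReal` shares the index and raises `use_count()`, which is the reference counting that is wanted, but every copy also touches the separately allocated control block.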
Solution - Overview CoDiPack
Focus on industrial applications
  High memory requirements
  Heterogeneous parallelisation strategies
  Run time efficiency
  ...
Clear user interfaces
Easy to use (for simple cases)
Easy to extend
Open source license (e.g. in combination with SU2)
Different tapes:
  Jacobi taping
  Primal value taping (under development)
  Index reuse
  Higher order derivatives
Additional features:
  External functions
  Vector mode
  ...
[Diagram: Expression, ActiveReal, TapeInterface]
Solution - New index manager in CoDiPack
[Diagram: Expression, ActiveReal, TapeInterface; IndexManager with the variants NonlinearIndexManager, LinearIndexManager and ???]
Solution - New index manager in CoDiPack
Idea: Perform the index counting in the index manager.
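[Diagram: the IndexManager holds the indexUse array (example entries: 1, 2, 0, ..., 2, 4, 2, ...); each ActiveReal refers to one entry via its index]
    struct ActiveReal:
      double p;
      int index;
indexUse[index] = number of ActiveReals that use this index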
Solution - New index manager in CoDiPack
Idea: Perform the index counting in the index manager.
Introduce array for the index count
Update index use when indices are accessed
Special handling for index == 0 omitted
    class IndexManager {
      int globalMaximumIndex;
      int currentMaximumIndex;
      std::vector<int> freeIndices;
      std::vector<int> indexUse;
      ...
    };
Solution - New index manager in CoDiPack
Changes with respect to the old index manager
Special handling for index == 0 omitted
    inline void freeIndex(int& index) {
      indexUse[index] -= 1;
      if (indexUse[index] == 0) { // only free the index if it is not used any longer
        ...
      }
    }

    inline int createIndex() {
      ...
      checkIndexUseSize();
      indexUse[index] = 1;
      return index;
    }

    inline void assignIndex(int& index) {
      if (indexUse[index] > 1) {
        indexUse[index] -= 1;
        index = this->createIndex();
      }
    }

    inline void copyIndex(int& lhs, const int& rhs) {
      freeIndex(lhs);
      indexUse[rhs] += 1;
      lhs = rhs;
    }
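The four functions above can be combined into a small compilable sketch. The parts that are elided with "..." on the slides are filled in with assumed logic: a free list that recycles the most recently freed index, index 0 meaning "inactive", `freeIndex` resetting the index to 0, and a self-copy guard in `copyIndex`. The helper `useCount` is hypothetical and only exists so that the counters can be inspected. This is not the CoDiPack implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a reference-counting index manager (assumptions: see lead-in).
class IndexManager {
  int currentMaximumIndex = 0;   // largest index handed out so far
  std::vector<int> freeIndices;  // indices available for reuse
  std::vector<int> indexUse{0};  // indexUse[i] = number of users of index i

 public:
  // Drop one reference; recycle the index once nobody uses it any longer.
  void freeIndex(int& index) {
    if (index != 0) {
      assert(indexUse[index] > 0);
      indexUse[index] -= 1;
      if (indexUse[index] == 0) freeIndices.push_back(index);
      index = 0;
    }
  }

  // Hand out a recycled index if one is available, otherwise a fresh one.
  int createIndex() {
    int index;
    if (!freeIndices.empty()) {
      index = freeIndices.back();
      freeIndices.pop_back();
    } else {
      index = ++currentMaximumIndex;
      indexUse.resize(static_cast<std::size_t>(index) + 1, 0);
    }
    indexUse[index] = 1;
    return index;
  }

  // Called before a statement is written for the left-hand side:
  // the index must not be shared with any other variable.
  void assignIndex(int& index) {
    if (index == 0) {
      index = createIndex();
    } else if (indexUse[index] > 1) {
      indexUse[index] -= 1;
      index = createIndex();
    }
  }

  // Active copy: lhs shares the index of rhs, no statement is written.
  void copyIndex(int& lhs, const int& rhs) {
    if (lhs == rhs) return;  // assumed guard against self-copy
    freeIndex(lhs);
    if (rhs != 0) indexUse[rhs] += 1;
    lhs = rhs;
  }

  // Hypothetical inspection helper, not part of the slides.
  int useCount(int index) const { return indexUse[index]; }
};
```

In this sketch an active copy `c = x` only increments a counter, and destroying `x` afterwards leaves the shared index alive until `c` is overwritten or destroyed as well.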
Solution - New index manager in CoDiPack
Characteristics of the implementation:
No dereferencing operation in order to access the index of an ActiveReal
Size of ActiveReal unchanged (no change to the structure)
4 byte for each managed index
Reference counting performed in array
Results - SU2 Test Case
Same test case as in the motivation
Simplest SU2 case: steady 2D Euler equations with explicit Euler for pseudo-time stepping
Space discretization:
1. Centered Scheme + Artificial dissipation (JST)
2. Upwind Scheme + Limiter (Roe)
Airfoil in transonic flow conditions (10127 Elements)
Results - Tape Specific Performance
[Figure, panel "Memory": share of total tape memory (% of total) for statements, Jacobi entries, adjoint vector and index vector.
 Legend: JST - Linear Indexing (61 MB total), JST - Nonlinear Indexing (61 MB total), JST - Nonlinear Indexing assign opt (49 MB total), Roe - Linear Indexing (152 MB total), Roe - Nonlinear Indexing (154 MB total), Roe - Nonlinear Indexing assign opt (124 MB total)]
[Figure, panel "Entries": number of statements and Jacobies (axis from 0 to 1.2 · 10^7 entries) for the same six configurations]
Results - Speed test
2D coupled Burgers equation on 601x601 grid with 32 time steps
\[ u_t + u u_x + v u_y = \frac{1}{R} (u_{xx} + u_{yy}) \tag{1} \]
\[ v_t + u v_x + v v_y = \frac{1}{R} (v_{xx} + v_{yy}) \tag{2} \]
Evaluated on one node of the Elwetritsch cluster with two Intel E5-2670 CPUs (16 cores)
Single test case: Run only one process
  Full memory bandwidth available
Multi test case: Run 16 times the same problem
  Memory bandwidth is limited
Tape memory is approx. 4.8 GB for linear indexing
Tape memory is approx. 4.5 GB for nonlinear indexing
Contains no active copy operations (have been optimized out)
Results - Speed test
[Figure: time in sec. (axis from 0.6 to 2.4) for Linear indexing, Nonlinear indexing and Nonlinear indexing assign opt.; series: single record, single interpret, multi record, multi interpret]
Conclusion & Outlook & Release
Conclusion:
Statements for active copy operations can be eliminated
Memory overhead for the management is minimal
Time overhead is also reasonable
Outlook:
Test the new scheme on real world test cases
Analyze the influence of the index pattern in the reverse mode
Release:
Today
CoDiPack Release 1.2
Modularized tape implementations
Vector mode implementation
Assign optimization index manager (new default one)
Special reference type for arguments that are used multiple times
  e.g. w = x * x * x + sin(x);
  stores 1 Jacobi instead of 4
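The "1 Jacobi instead of 4" count can be reproduced by summing the partial derivatives that belong to the same argument. The sketch below only illustrates that arithmetic; `JacobiEntry`, `mergeJacobies` and `partialsOfExample` are hypothetical names, and this is not how the reference type is implemented in CoDiPack.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// One tape entry: (index of the argument, partial derivative).
using JacobiEntry = std::pair<int, double>;

// Merge entries that refer to the same argument index by summing their partials.
inline std::vector<JacobiEntry> mergeJacobies(const std::vector<JacobiEntry>& entries) {
  std::vector<JacobiEntry> merged;
  for (const JacobiEntry& e : entries) {
    bool found = false;
    for (JacobiEntry& m : merged) {
      if (m.first == e.first) {
        m.second += e.second;
        found = true;
        break;
      }
    }
    if (!found) merged.push_back(e);
  }
  return merged;
}

// The four per-occurrence partials of w = x * x * x + sin(x):
// each factor of the product contributes x * x, the sine contributes cos(x).
inline std::vector<JacobiEntry> partialsOfExample(int xIndex, double x) {
  return {{xIndex, x * x}, {xIndex, x * x}, {xIndex, x * x}, {xIndex, std::cos(x)}};
}
```

Merging the four entries for x = 2 leaves a single entry with the value 3 * x^2 + cos(x), i.e. the total derivative dw/dx.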
Newsletter: [email protected]
Contact: [email protected]
Thank you for your attention!