Assign Optimization for Index-Reuse Tapes Max Sagebaum, Tim Albring Chair for Scientific Computing, TU Kaiserslautern 19th EuroAD Workshop April 2016


Assign Optimization for Index-Reuse Tapes

Max Sagebaum, Tim Albring

Chair for Scientific Computing, TU Kaiserslautern

19th EuroAD Workshop, April 2016


Overview

Motivation and problem description

Solution with tools from the C++ standard library

AD specific solution

Results & measurements


Motivation - Active Copy (Status 12/2015)

Each statement increases the statement stack by one and the Jacobi stack by the number of independent variables of that statement.

Special case: Copy statement

    ActiveReal c;
    {
      ActiveReal x;
      x = ...;
      c = x;
    }

If x is active (i.e. has an index) we call this an Active Copy.

Treated differently depending on the tape implementation:

Linear Indexing: Index of x is simply copied along with the value. No information is written on the tape.

Nonlinear Indexing: Index will be marked as free if x is destroyed. We need to write a new statement and Jacobi entry (1.0) so that c gets its own index and the adjoint vector position is correctly set to zero in the reverse sweep.
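The need for the extra statement can be illustrated with a small sketch. It assumes a hypothetical, minimal index-reuse manager with a free list and no reference counting (`NaiveIndexManager` and `copyWithoutStatementCollides` are illustrative names, not CoDiPack code). If the index of x were simply copied into c, the next active variable would receive the same index as c once x is destroyed:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal index-reuse manager WITHOUT reference counting.
struct NaiveIndexManager {
  int maximumIndex = 0;          // largest index handed out so far
  std::vector<int> freeIndices;  // indices that may be handed out again

  int createIndex() {
    if (!freeIndices.empty()) {  // prefer a freed index
      int index = freeIndices.back();
      freeIndices.pop_back();
      return index;
    }
    return ++maximumIndex;
  }

  void freeIndex(int index) { freeIndices.push_back(index); }
};

// Mimics the copy example above on the index level only.
// Returns true if a later variable ends up with the same index as c.
bool copyWithoutStatementCollides() {
  NaiveIndexManager manager;
  int c;
  {
    int x = manager.createIndex();  // x = ...;  x becomes active
    c = x;                          // c = x;    index copied, nothing recorded
    manager.freeIndex(x);           // x is destroyed, its index is marked as free
  }
  int y = manager.createIndex();    // the next active variable reuses the freed index
  return y == c;                    // c and y now share one adjoint position
}
```

In this sketch the collision is exactly what the additional statement with Jacobi entry 1.0 prevents: it gives c an index of its own.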


Motivation - Example: SU2 (Status 12/2015)

Under active development by Stanford University and by organizations all around the world (TUKL, TU Delft, ...)

Comprises a complete self-contained optimization environment based on the continuous or discrete adjoint method.

Finite-Volume discretization of (U)RANS equations and turbulence models using a highly modular C++ code structure

Convergence acceleration using multi-grid and local time-stepping methods

Several space and time integration methods

Integrated AD support


Motivation - Fixed-Point Formulation for Multi-Disciplinary Design (Status 12/2015)

– $\beta \in D \subset \mathbb{R}^p$: design vector

– $U \in \mathcal{U} \subset \mathbb{R}^n$: state vector

– $X \in \mathcal{X} \subset \mathbb{R}^m$: computational mesh

– $M(\beta) = X$: mesh deformation equation

– $J(U, X)$: objective function

– $R(U, X) = 0$: discretized state equation

R or rather G may not only contain the flow equation, but any (weakly) coupled model, e.g.

Flow + turbulence

Flow + structure

Flow + aeroacoustics

Reformulation also possible for unsteady problems (dual-time stepping).

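$$\min_{\beta} \; J(U(\beta), X(\beta)) \quad \text{s.t.} \quad R(U(\beta), X(\beta)) = 0, \quad M(\beta) = X$$

$$\min_{\beta} \; J(U(\beta), X(\beta)) \quad \text{s.t.} \quad G(U(\beta), X(\beta)) = U, \quad M(\beta) = X$$

Assuming $R(U, X) = 0$ is solved by a fixed-point iteration:

$$G(U^*, X) = U^* \;\Leftrightarrow\; R(U^*, X) = 0$$

In case of a Newton-type solver:

$$G(U, X) := U - P(U, X)\, R(U, X),$$

where $P \approx (\partial R / \partial U)^{-1}$.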
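As a minimal sketch of the Newton-type fixed-point form, consider a scalar toy problem. The residual R(u) = u^2 - 2, the exact preconditioner P = 1/(2u) and all function names are assumptions chosen only for illustration; they are not taken from SU2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar residual, chosen only for illustration: R(u) = u^2 - 2.
double residual(double u) { return u * u - 2.0; }

// Preconditioner P = (dR/du)^{-1}; exact here, only approximate in a real solver.
double preconditioner(double u) { return 1.0 / (2.0 * u); }

// Fixed-point operator G(u) = u - P(u) R(u).
double G(double u) { return u - preconditioner(u) * residual(u); }

// Iterates u <- G(u) until two successive iterates agree up to the tolerance.
double solveFixedPoint(double u, int maxIterations = 50, double tolerance = 1e-14) {
  for (int i = 0; i < maxIterations; ++i) {
    double next = G(u);
    if (std::abs(next - u) < tolerance) return next;
    u = next;
  }
  return u;
}
```

Started from u = 1, the iteration settles at a point where G(u) = u and R(u) = 0 hold simultaneously, which is the equivalence stated above.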


Motivation - The Discrete Adjoint Solver (Status 12/2015)

Using the method of Lagrange multipliers, we define the Lagrangian function as:

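$$L(\beta, U, X, \bar{U}, \bar{X}) = J(U, X) + \bar{U}^T \left( G(U, X) - U \right) + \bar{X}^T \left( M(\beta) - X \right)$$

with the shifted Lagrangian $N(U, \bar{U}, X) := J(U, X) + \bar{U}^T G(U, X)$.

KKT conditions yield equations for the adjoints $\bar{U}$, $\bar{X}$ and the sensitivity vector $dL/d\beta$:

$$\bar{U} = \frac{\partial}{\partial U} J(U, X) + \frac{\partial}{\partial U} G^T(U, X)\, \bar{U} = \frac{\partial}{\partial U} N^T(U, \bar{U}, X) \qquad \text{(Adjoint equation)}$$

$$\bar{X} = \frac{\partial}{\partial X} J(U, X) + \frac{\partial}{\partial X} G^T(U, X)\, \bar{U} = \frac{\partial}{\partial X} N^T(U, \bar{U}, X) \qquad \text{(Mesh Adjoint equation)}$$

$$\frac{dL}{d\beta} = \frac{d}{d\beta} M^T(\beta)\, \bar{X} \qquad \text{(Design equation)}$$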
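The step from the Lagrangian to these three equations can be written out as follows. This is a sketch of the usual stationarity argument, writing the adjoint variables as $\bar{U}$ and $\bar{X}$ and using the slide's derivative notation; it is not taken verbatim from the talk.

```latex
% Stationarity with respect to the state U gives the adjoint equation:
\frac{\partial L}{\partial U}
  = \frac{\partial}{\partial U} J(U,X)
  + \frac{\partial}{\partial U} G^T(U,X)\,\bar{U} - \bar{U} = 0
\;\Longrightarrow\;
\bar{U} = \frac{\partial}{\partial U} J(U,X)
        + \frac{\partial}{\partial U} G^T(U,X)\,\bar{U}

% Stationarity with respect to the mesh X gives the mesh adjoint equation:
\frac{\partial L}{\partial X}
  = \frac{\partial}{\partial X} J(U,X)
  + \frac{\partial}{\partial X} G^T(U,X)\,\bar{U} - \bar{X} = 0
\;\Longrightarrow\;
\bar{X} = \frac{\partial}{\partial X} J(U,X)
        + \frac{\partial}{\partial X} G^T(U,X)\,\bar{U}

% beta enters L only through the term \bar{X}^T (M(\beta) - X):
\frac{dL}{d\beta} = \frac{d}{d\beta} M^T(\beta)\,\bar{X}
```

Stationarity with respect to $\bar{U}$ and $\bar{X}$ returns the constraints $G(U, X) = U$ and $M(\beta) = X$.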


Motivation - Test Case (Status 12/2015)

We have to account for a lot of different equations as well as time and space discretization methods.

Simplest case: steady 2D Euler equations with explicit Euler for pseudo-timestepping

Space discretization:

1. Centered Scheme + Artificial dissipation (JST)

2. Upwind Scheme + Limiter (Roe)

Airfoil in transonic flow conditions (10127 Elements)


Motivation - Tape Specific Performance (Status 12/2015)

[Figure, left panel "Memory": share of the total tape memory (% of total) taken by statements, Jacobi entries, adjoint vector and index vector. Series: JST - Linear Indexing (59.53 MB total), JST - Nonlinear Indexing (72.19 MB total), Roe - Linear Indexing (150 MB total), Roe - Nonlinear Indexing (173.88 MB total).]

[Figure, right panel "Statements": share of statements (% of total) in the categories Other, Active Copy and Linear. Series: Roe (6.87 Mio total), JST (2.87 Mio total).]


Problem analysis

Differences for linear indexing and nonlinear indexing:

                                    linear indexing    nonlinear indexing
Statement memory (byte, n args.)    1 + 12 * n         5 + 12 * n
Copy statement (byte)               0                  17

⇒ Constant memory per statement increased

⇒ and additional statements need to be written

But large size reduction of the adjoint vector

Solution: Avoid the writing of additional statements? Use reference counting for the indices!
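The per-statement figures from the table can be checked with a few lines of arithmetic. The sketch below assumes that the 17 byte for a copy statement are simply the nonlinear statement cost for n = 1 argument (5 + 12 * 1); the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Memory per statement with n arguments, in byte, as given in the table.
constexpr std::size_t statementBytesLinear(std::size_t n) { return 1 + 12 * n; }
constexpr std::size_t statementBytesNonlinear(std::size_t n) { return 5 + 12 * n; }

// An active copy writes nothing with linear indexing; with nonlinear indexing
// it is read here as a regular statement with a single argument (assumption).
constexpr std::size_t copyBytesLinear() { return 0; }
constexpr std::size_t copyBytesNonlinear() { return statementBytesNonlinear(1); }

static_assert(copyBytesNonlinear() == 17, "matches the table entry for a copy statement");
static_assert(statementBytesNonlinear(3) - statementBytesLinear(3) == 4,
              "constant overhead of 4 byte per statement");
```

Both effects named on the slide are visible: a constant 4 byte more per statement, plus 17 byte for every active copy that linear indexing gets for free.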


Solution - std::shared_ptr example [1]

    #include <memory>
    #include <iostream>

    struct Foo {
      Foo() { std::cout << "Foo...\n"; }
      ~Foo() { std::cout << "~Foo...\n"; }
    };

    int main()
    {
      {
        std::cout << "constructor with object\n";
        std::shared_ptr<Foo> sh2(new Foo);
        std::shared_ptr<Foo> sh3(sh2);
        std::cout << sh2.use_count() << '\n';
        std::cout << sh3.use_count() << '\n';
      }
    }

Output:

    constructor with object
    Foo...
    2
    2
    ~Foo...

[1] Taken from: http://en.cppreference.com


Solution - Off-the-shelf solution with std::shared_ptr

    struct ActiveReal {
      double p; // primal
      std::shared_ptr<int> indexPtr;
    };

Analysis of the method:

Size of ActiveReal increases: 16 byte → 24 byte

Memory for each managed index: 24 byte + 4 byte (internal shared_ptr structure + data for int)

2 additional dereferencing operations for each index access:

    int index = *a.indexPtr.internalStructure->dataPointer;

⇒ Lots of additional memory required

⇒ Access of indices will be slow
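A minimal sketch of this off-the-shelf variant is given below. `ActiveRealPlain`, `ActiveRealShared` and `activeCopyUseCount` are illustrative names, and the index value 42 is arbitrary. The exact sizes depend on the platform and standard library; 16 and 24 byte are what a typical 64-bit implementation yields.

```cpp
#include <cassert>
#include <memory>

// Plain layout: primal value plus an integer index (typically 16 byte).
struct ActiveRealPlain {
  double p;
  int index;
};

// Off-the-shelf layout: the index lives behind a std::shared_ptr
// (typically 24 byte, since a shared_ptr usually holds two pointers).
struct ActiveRealShared {
  double p;                       // primal
  std::shared_ptr<int> indexPtr;  // reference-counted index
};

// Performs an active copy c = x and lets x go out of scope.
// Returns the use count seen by c afterwards.
long activeCopyUseCount() {
  ActiveRealShared c{};
  {
    ActiveRealShared x{1.0, std::make_shared<int>(42)};
    c = x;                                 // the copy shares the index with x
    assert(c.indexPtr.use_count() == 2);   // two owners while x is alive
  }                                        // x is destroyed, the index stays alive
  return c.indexPtr.use_count();
}
```

The copy needs no extra statement because the index survives as long as one owner is left, but every index access goes through the pointer and its control block.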


Solution - Overview CoDiPack

Focus on industrial applications

– High memory requirements

– Heterogeneous parallelisation strategies

– Run time efficiency

– ...

Clear user interfaces

Easy to use (for simple cases)

Easy to extend

Open source license (e.g. in combination with SU2)

Different tapes:

– Jacobi taping

– Primal value taping (under development)

– Index reuse

– Higher order derivatives

Additional features:

– External functions

– Vector mode

– ...

[Diagram: Expression, ActiveReal, TapeInterface]


Solution - New index manager in CoDiPack

[Diagram: Expression, ActiveReal, TapeInterface; IndexManager with LinearIndexManager, NonlinearIndexManager and a third, still open entry (???)]


Solution - New index manager in CoDiPack

Idea: Perform the index counting in the index manager.

[Diagram: the IndexManager holds an indexUse array with one counter per index, e.g. 1, 2, 0, ..., 2, 4, 2, ...]

    struct ActiveReal:
      double p;
      int index;

indexUse[index] ≙ number of ActiveReal that use this index


Solution - New index manager in CoDiPack

Idea: Perform the index counting in the index manager.

Introduce array for the index count

Update index use when indices are accessed

Special handling for index == 0 omitted

    class IndexManager {
      int globalMaximumIndex;
      int currentMaximumIndex;

      std::vector<int> freeIndices;
      std::vector<int> indexUse;

      ...
    };


Solution - New index manager in CoDiPack

Changes with respect to the old index manager

Special handling for index == 0 omitted

    inline void freeIndex(int& index) {
      indexUse[index] -= 1;
      if (indexUse[index] == 0) { // only free the index if it is not used any longer
        ...
      }
    }

    inline int createIndex() {
      ...
      checkIndexUseSize();
      indexUse[index] = 1;
      return index;
    }

    inline void assignIndex(int& index) {
      if (indexUse[index] > 1) {
        indexUse[index] -= 1;
        index = this->createIndex();
      }
    }

    inline void copyIndex(int& lhs, const int& rhs) {
      freeIndex(lhs);
      indexUse[rhs] += 1;
      lhs = rhs;
    }
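The four functions can be put together into a self-contained sketch. Member and method names follow the slides; everything the slides elide is an assumption made for this example only: the free-list handling, the growth of `indexUse` (standing in for `checkIndexUseSize()`), the treatment of index 0 as "passive", the early return when both sides already share an index, and the helper `useCount`. It is not the CoDiPack implementation.

```cpp
#include <cassert>
#include <vector>

// Sketch only: the bodies of the elided parts are assumptions (see lead-in).
class IndexManager {
  int currentMaximumIndex = 0;   // largest index handed out so far
  std::vector<int> freeIndices;  // indices that may be reused
  std::vector<int> indexUse{0};  // indexUse[i] = number of variables using index i

public:
  int createIndex() {
    int index;
    if (!freeIndices.empty()) {  // assumption: reuse freed indices first
      index = freeIndices.back();
      freeIndices.pop_back();
    } else {
      index = ++currentMaximumIndex;
      indexUse.resize(currentMaximumIndex + 1, 0);  // stands in for checkIndexUseSize()
    }
    indexUse[index] = 1;
    return index;
  }

  void freeIndex(int& index) {
    if (index == 0) return;      // assumption: 0 marks a passive variable
    indexUse[index] -= 1;
    if (indexUse[index] == 0) freeIndices.push_back(index);  // last user is gone
    index = 0;
  }

  // Called when a statement is recorded for the variable that owns `index`.
  void assignIndex(int& index) {
    if (index == 0) {
      index = createIndex();
    } else if (indexUse[index] > 1) {  // shared: detach and take a fresh index
      indexUse[index] -= 1;
      index = createIndex();
    }
  }

  // Active copy lhs = rhs: share the index instead of writing a statement.
  void copyIndex(int& lhs, const int& rhs) {
    if (lhs == rhs) return;      // assumption: already shared or self-assignment
    freeIndex(lhs);
    if (rhs != 0) indexUse[rhs] += 1;
    lhs = rhs;
  }

  int useCount(int index) const { return indexUse[index]; }  // hypothetical helper
};
```

With this sketch, an active copy only increments a counter, destroying the source only decrements it, and a fresh index is requested only when a shared variable is overwritten by a real statement.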


Solution - New index manager in CoDiPack

Characteristics of the implementation:

No dereferencing operation in order to access the index of an ActiveReal

Size of ActiveReal unchanged (no change to the structure)

4 byte for each managed index

Reference counting performed in array


Results - SU2 Test Case

Same test case as in the motivation

Simplest SU2 case: steady 2D Euler equations with explicit Euler for pseudo-timestepping

Space discretization:

1. Centered Scheme + Artificial dissipation (JST)

2. Upwind Scheme + Limiter (Roe)

Airfoil in transonic flow conditions (10127 Elements)


Results - Tape Specific Performance

[Figure, left panel "Memory": share of the total tape memory (% of total) taken by statements, Jacobi entries, adjoint vector and index vector. Series: Jst - Linear Indexing (61 MB total), Jst - Nonlinear Indexing (61 MB total), Jst - Nonlinear Indexing assign opt (49 MB total), Roe - Linear Indexing (152 MB total), Roe - Nonlinear Indexing (154 MB total), Roe - Nonlinear Indexing assign opt (124 MB total).]

[Figure, right panel "Entries": number of entries (axis from 0 to 1.2 · 10^7) for statements and Jacobies, for the same six series.]
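The totals quoted in the memory panel allow a quick estimate of the relative saving of the assign optimization over plain nonlinear indexing. The following sketch only redoes that arithmetic; the helper and constant names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction of `optimized` with respect to `baseline`.
constexpr double relativeSaving(double baseline, double optimized) {
  return (baseline - optimized) / baseline;
}

// Totals in MB as quoted for the memory panel.
constexpr double jstNonlinear = 61.0, jstAssignOpt = 49.0;
constexpr double roeNonlinear = 154.0, roeAssignOpt = 124.0;

// (61 - 49) / 61 and (154 - 124) / 154, both a little under 20 %.
constexpr double jstSaving = relativeSaving(jstNonlinear, jstAssignOpt);
constexpr double roeSaving = relativeSaving(roeNonlinear, roeAssignOpt);
```

Both discretizations end up with a tape that is roughly one fifth smaller than with nonlinear indexing alone.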


Results - Speed test

2D coupled Burgers equation on 601x601 grid with 32 time steps

$$u_t + u u_x + v u_y = \frac{1}{R} (u_{xx} + u_{yy}) \qquad (1)$$

$$v_t + u v_x + v v_y = \frac{1}{R} (v_{xx} + v_{yy}) \qquad (2)$$

Evaluated on one node of the Elwetritsch cluster with two Intel E5-2670 CPUs (16 cores)

Single test case: Run only one process; full memory bandwidth available

Multi test case: Run the same problem 16 times; memory bandwidth is limited

Tape memory is approx. 4.8 GB for linear indexing

Tape memory is approx. 4.5 GB for nonlinear indexing

Contains no active copy operations (have been optimized out)


Results - Speed test

[Figure: bar chart of the run time (time in sec., axis from 0.6 to 2.4) for Linear indexing, Nonlinear indexing and Nonlinear indexing assign opt.; bars for single record, single interpret, multi record and multi interpret.]


Conclusion & Outlook & Release

Conclusion:

Statements for active copy operations can be eliminated

Memory overhead for the management is minimal

Time overhead is also reasonable

Outlook:

Test the new scheme on real world test cases

Analyze the influence of the index pattern in the reverse mode

Release:

Today


CoDiPack Release 1.2


Modularized tape implementations

Vector mode implementation

Assign optimization index manager (new default)

Special reference type for arguments that are used multiple times, e.g. w = x * x * x + sin(x); stores 1 Jacobi instead of 4
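Why one Jacobi entry suffices can be seen by treating the four occurrences of x as separate arguments x1, x2, x3, x4 of w = x1 * x2 * x3 + sin(x4). The sketch below is plain arithmetic with illustrative names; it does not use or reproduce the CoDiPack reference type.

```cpp
#include <cassert>
#include <cmath>

// One partial derivative per occurrence of x in w = x1 * x2 * x3 + sin(x4),
// evaluated with all four arguments equal to x.
struct Partials {
  double d1, d2, d3, d4;
};

Partials perOccurrencePartials(double x) {
  return {x * x, x * x, x * x, std::cos(x)};  // x2*x3, x1*x3, x1*x2, cos(x4)
}

// All four entries refer to the same index, so they can be summed into a
// single Jacobi entry dw/dx before anything is stored.
double accumulatedPartial(double x) {
  Partials p = perOccurrencePartials(x);
  return p.d1 + p.d2 + p.d3 + p.d4;
}
```

The accumulated value equals the analytic derivative 3 * x * x + cos(x), so storing it once loses no information.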

Newsletter: [email protected]

Contact: [email protected]

Thank you for your attention!
