Domain-specific languages and automated code generation for scientific computing

Garth N. Wells

Department of Engineering, University of Cambridge

Software Frameworks for Challenging Computational Problems, University of Crete

14 January 2013

Collaborators

Martin S. Alnæs, Johan Hake, Anders Logg, Marie E. Rognes, Kristian B. Ølgaard

http://www.eng.cam.ac.uk/~gnw20


Outline

• Examples of expressive computing for PDEs

• Domain-specific languages for scientific computing

• FEniCS libraries for solving PDEs

• FEniCS examples

• Scientific software community building



Availability

All code available under GNU licenses:

http://www.fenicsproject.org

Book available under a Creative Commons license:

http://www.fenicsproject.org/book





Reaction-diffusion equation

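Differential format

$$ -\nabla^2 u + u = f \quad \text{in } \Omega $$

$$ \nabla u \cdot n = 0 \quad \text{on } \partial\Omega $$

Variational format: find $u \in V \subset H^1(\Omega)$ such that

$$ a(u, v) = L(v) \quad \forall\, v \in V $$

Bilinear and linear forms

$$ a(u, v) := \int_\Omega \nabla u \cdot \nabla v + u v \, dx $$

$$ L(v) := \int_\Omega f v \, dx $$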
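The step from the differential to the variational format can be sketched as follows, assuming $v$ is any sufficiently smooth test function in $V$ (this standard derivation is not spelled out on the slide). Multiply the equation by $v$, integrate over $\Omega$ and integrate the Laplacian term by parts:

```latex
\begin{align*}
\int_\Omega (-\nabla^2 u + u)\, v \, dx
  &= \int_\Omega \nabla u \cdot \nabla v + u v \, dx
     - \int_{\partial\Omega} (\nabla u \cdot n)\, v \, ds \\
  &= \int_\Omega \nabla u \cdot \nabla v + u v \, dx
   = a(u, v)
\end{align*}
```

The boundary integral vanishes because $\nabla u \cdot n = 0$ on $\partial\Omega$, and the right-hand side $\int_\Omega f v \, dx$ is $L(v)$.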



Reaction-diffusion equation

Complete solver – Python interface

    from dolfin import *

    # Create mesh and define function space
    mesh = UnitCubeMesh(16, 16, 16)
    V = FunctionSpace(mesh, "Lagrange", 1)

    # Define variational problem
    u = TrialFunction(V)
    v = TestFunction(V)
    f = Expression("sin(x[0])*sin(x[1])")
    a = dot(grad(u), grad(v))*dx + u*v*dx
    L = f*v*dx

    # Compute solution
    u = Function(V)
    solve(a == L, u)

    plot(u, interactive=True)



Stokes equations

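Differential format:

$$ -\nabla^2 u + \nabla p = f \quad \text{in } \Omega $$

$$ \nabla \cdot u = 0 \quad \text{in } \Omega $$

Find $(u, p) \in V \times Q$ such that

$$ a((u, p), (v, q)) = L((v, q)) \quad \forall\, (v, q) \in V \times Q $$

where

$$ a((u, p), (v, q)) := \int_\Omega \nabla u : \nabla v - p \, \nabla \cdot v + (\nabla \cdot u)\, q \, dx $$

$$ L((v, q)) := \int_\Omega f \cdot v \, dx $$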



Stokes equations

Domain-specific language representation

    # Create mixed space (Taylor-Hood)
    V = VectorElement("Lagrange", "tetrahedron", 2)
    Q = FiniteElement("Lagrange", "tetrahedron", 1)
    TH = V * Q

    # Create trial and test functions
    (u, p) = TrialFunctions(TH)
    (v, q) = TestFunctions(TH)

    # Coefficient function appearing in L
    f = Coefficient(V)

    # Define forms
    a = inner(grad(u), grad(v))*dx - p*div(v)*dx + div(u)*q*dx
    L = dot(f, v)*dx



Nonlinear Poisson-like equation

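Differential format:

$$ -\nabla \cdot \left( (1 + u^2) \nabla u \right) = f $$

Variational format: find $u \in V$ such that

$$ F(u; v) = 0 \quad \forall\, v \in V $$

where the functional $F$ is linear in $v$ and nonlinear in $u$

$$ F(u; v) := \int_\Omega (1 + u^2) \nabla u \cdot \nabla v - f v \, dx $$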



Nonlinear Poisson-like equation

Domain-specific language representation

    # Function space
    V = FiniteElement("Lagrange", "tetrahedron", 2)

    # Coefficients
    u = Coefficient(V)
    f = Coefficient(V)

    # Define residual (want to solve F = 0)
    v = TestFunction(V)
    F = (1.0 + u*u)*dot(grad(u), grad(v))*dx - f*v*dx

    # Jacobian and incremental correction for a Newton solver
    du = TrialFunction(V)
    J = derivative(F, u, du)
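For orientation, the form that `derivative(F, u, du)` stands for can be written out by hand. The following is a sketch of the Gateaux derivative of the residual in the direction $\delta u$ (written `du` in the code); the symbol $\varepsilon$ is introduced here only for the derivation:

```latex
\begin{align*}
J(u; \delta u, v)
  &= \left. \frac{d}{d\varepsilon} F(u + \varepsilon\, \delta u; v) \right|_{\varepsilon = 0} \\
  &= \int_\Omega (1 + u^2)\, \nabla \delta u \cdot \nabla v
     + 2 u \, \delta u \, \nabla u \cdot \nabla v \, dx
\end{align*}
```

The source term $f v$ does not depend on $u$ and so drops out. Deriving this by hand is easy here but quickly becomes error prone for more involved models, which is the case for automating it.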



Domain-specific languages for scientific computing

• Language designed to support an application domain

• Expressive, mathematical syntax

• Support high-level abstractions

• Correctness checks

• Scope for domain-specific optimisations

• Represent intention – oblivious to low-level details



Traditional development approaches

• Time consuming and error prone

• Mathematical abstraction discarded in software representation

• Often blurred boundary between method definition and implementation

• Efficiency – readability/generality paradox

• New hardware is shifting the burden back onto the developer

• Traditional programming languages are static



What DSLs can deliver: accelerated development

• Expressiveness

• Compact representations of intention

• Reduction in errors

• Address multiple low-level programming models transparently (threaded, MPI, OpenCL, CUDA, FPGA, . . .)

• Extensible (if well designed)

• Creation of auxiliary problems, e.g. Jacobians, adjoints for PDEs



What DSLs can deliver: higher performance

• Readable input code → fast execution code

• Algorithm-specific optimisations, e.g.
  • (A^T)^T = A
  • ∇u = 0 if u is constant

• Generate code representations that are not feasible by hand

• Search an algorithm space

• Target-specific low-level code



Not all domain-specific languages are equally expressive . . .

Input File for Wave Eqn

    @THORN SimpleWave
    @DERIVATIVES
      PDstandard2nd[i_] -> StandardCenteredDifferenceOperator[1,1,i],
      PDstandard2nd[i_, i_] -> StandardCenteredDifferenceOperator[2,1,i],
      PDstandard2nd[i_, j_] -> StandardCenteredDifferenceOperator[1,1,i] *
                               StandardCenteredDifferenceOperator[1,1,j]
    @END_DERIVATIVES
    @TENSORS
      phi, pi
    @END_TENSORS
    @GROUPS
      phi -> "phi_group",
      pi -> "pi_group"
    @END_GROUPS
    @DEFINE PD = PDstandard2nd
    ...

    ...
    @CALCULATION "initial_sine"
    @Schedule "AT INITIAL"
    @EQUATIONS
      phi -> Sin[2 Pi (x - t)],
      pi -> -2 Pi Cos[2 Pi (x - t)]
    @END_EQUATIONS
    @END_CALCULATION
    @CALCULATION "calc_rhs"
    @Schedule "in MoL_CalcRHS"
    @EQUATIONS
      dot[phi] -> pi,
      dot[pi] -> Euc[ui,uj] PD[phi,li,lj]

http://hpc.pnl.gov/conf/wolfhpc/2011/talks/StevenBrandt.pdf


Some domain-specific languages for non-PDE applications in scientific computing

• Elemental (dense linear algebra)

• SPL/Spiral (digital signal processing)

• Tensor Contraction Engine (quantum chemistry)

• . . .



Some DSLs for solving PDEs numerically

Domain-specific languages (DSL)

• Analysa

• FreeFEM++

Domain-specific embedded languages (DSEL)

• Liszt (Scala)

• FEEL++ (C++)

• Sundance (C++)

• AceGen (Mathematica) – not open

• Unified Form Language (Python)



FEniCS Project

• Collaborative project on automating the solution of PDEs

• Modular collection of free software libraries





Main FEniCS components

• FIAT (tabulation of basis functions)

• Unified Form Language (UFL)

• Instant (just-in-time compilation)

• FEniCS Form Compiler (FFC)

• UFC (generated code form specification)

• DOLFIN (problem solving environment)




UFL: Unified Form Language

A language embedded in Python for variational forms based on mathematical abstractions – involves both a specification and algorithms

Sub-languages

• Function spaces

• Expressions

• Forms

Algorithms

• Adjoints

• Differentiation

• Extraction based on form arity

• . . .

Alnæs, 2012; Alnæs, Logg, Rognes, Ølgaard, Wells, http://arxiv.org/abs/1211.4047


UFL: example language elements (1)

Function spaces

    P2 = VectorElement("Lagrange", "triangle", 2)
    P1 = FiniteElement("Discontinuous Lagrange", "triangle", 1)
    R = FiniteElement("Real", "triangle", 0)
    ME0 = P2*P1
    ME1 = MixedElement([P2, [P1, P1], P1, R])

Expressions

    u = Function(P2)
    I = Identity(element.cell().d)  # Identity tensor
    F = I + grad(u)                 # Deformation gradient
    C = F.T*F                       # Right Cauchy-Green tensor

    # Invariants of deformation tensors
    Ic = tr(C)
    J = det(F)

    # Stored strain energy density
    psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2



UFL: example language elements (2)

Forms

    M = f*dx(2) + f*ds(5)
    L = f*v*dx + g*v*ds
    a = dot(grad(u), grad(v))*dx - dot(jump(u, n), avg(grad(v)))*dS
    a = dot(grad(u), grad(v))*dx(0, {"quadrature_order": 1})

Form operations

    M = action(F, f)
    a = lhs(F)
    L = rhs(F)

Algorithms

    L = derivative(F, u, v)
    a = derivative(L, u, du)



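UFL: abstract syntax tree for H^1-conforming Poisson formulation

[Figure: abstract syntax trees of the two forms. Form a has a single cell integral built from the nodes *, kappa, inner, grad, u and v. Form L has a cell integral (nodes *, v, f) and an exterior facet integral (nodes *, -1, v, g).]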



UFL: abstract syntax tree for L^2-conforming Poisson formulation

[Figure: abstract syntax tree of form a, with a cell integral, an exterior facet integral and an interior facet integral. The tree is much larger than in the H^1-conforming case and includes restriction ([+], [-]), indexing, circumradius and facet normal (n) nodes.]



UFL: features

• Mathematical error checking

• Basic optimisations (must be floating-point safe)
  • Multiply by one, zero
  • Add zero
  • Constant folding

• Developed in Python
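The basic optimisations listed above can be illustrated with a toy embedded expression language. This is a minimal sketch and not UFL's implementation: the class and function names (`Expr`, `Const`, `Symbol`, `make_sum`, `make_product`) are hypothetical, and the rules are applied to symbolic constants at construction time, while the expression tree is being built through operator overloading.

```python
class Expr:
    """Base class: arithmetic operators build a tree instead of computing."""

    def __add__(self, other):
        return make_sum(self, as_expr(other))

    def __radd__(self, other):
        return make_sum(as_expr(other), self)

    def __mul__(self, other):
        return make_product(self, as_expr(other))

    def __rmul__(self, other):
        return make_product(as_expr(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return repr(self.value)


class Symbol(Expr):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Sum(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __repr__(self):
        return "(%r + %r)" % (self.a, self.b)


class Product(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __repr__(self):
        return "(%r * %r)" % (self.a, self.b)


def as_expr(x):
    """Wrap plain Python numbers as Const nodes."""
    return x if isinstance(x, Expr) else Const(x)


def is_const(e, value=None):
    return isinstance(e, Const) and (value is None or e.value == value)


def make_sum(a, b):
    if is_const(a) and is_const(b):
        return Const(a.value + b.value)   # constant folding
    if is_const(a, 0):
        return b                          # add zero
    if is_const(b, 0):
        return a                          # add zero
    return Sum(a, b)


def make_product(a, b):
    if is_const(a) and is_const(b):
        return Const(a.value * b.value)   # constant folding
    if is_const(a, 0) or is_const(b, 0):
        return Const(0)                   # multiply by zero
    if is_const(a, 1):
        return b                          # multiply by one
    if is_const(b, 1):
        return a                          # multiply by one
    return Product(a, b)


u, v = Symbol("u"), Symbol("v")
expr = Const(2) * 3 * u * v + 0 * u + 1 * v
print(expr)
```

The user writes ordinary-looking arithmetic, and the tree that reaches a form compiler is already free of the trivial terms `0 * u` and `1 * v`.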



Abstract syntax tree to concrete code: compilers

Generality → Compiler → Efficiency

Some UFL compilers:

• FEniCS Form Compiler, FFC (Logg, Ølgaard, Rognes and Wells)

• Symbolic Form Compiler, SFC (Alnæs and Mardal)

• Manycore Form Compiler (Markall, Rathgeber, et al.)



Automation with FEniCS

Input

Equation (variational problem)

Output

Efficient application-specific code

Kirby and Logg 2006; Ølgaard, Logg and Wells, 2010, Logg and Wells, 2010, . . .


FFC: FEniCS Form Compiler

    // This code conforms with the UFC specification version 2.1.0+
    // and was automatically generated by FFC version 1.1.0+.
    //
    // This code was generated with the option '-l dolfin' and
    // contains DOLFIN-specific wrappers that depend on DOLFIN.
    //
    // This code was generated with the following parameters:
    //
    //   cache_dir:                      ''
    //   convert_exceptions_to_warnings: False
    //   cpp_optimize:                   False
    //   cpp_optimize_flags:             '-O2'
    //   epsilon:                        1e-14
    //   error_control:                  False
    //   form_postfix:                   True
    //   format:                         'dolfin'
    //   log_level:                      10
    //   log_prefix:                     ''
    //   no_ferari:                      True
    //   optimize:                       True

Kirby and Logg 2006; Ølgaard and Wells, 2010; Logg et al, 2012


FFC: generation-time performance optimisations

• Novel representations (Kirby and Logg, ACM TOMS 2006)

• Structure-based methods for reducing floating-point operations (Kirby, et al, SISC 2005)

• Symbolic analysis to minimise floating-point operations (Ølgaard & Wells, ACM TOMS 2010)



FFC: representations

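Poisson element matrix:

$$ k_{ij} := \int_E \nabla\phi_i \cdot \nabla\phi_j \, dx $$

1. ‘Tensor contraction’ representation (affine map only)

$$ k_{ij} := A_{ijkl} G_{kl} $$

where A is independent of the geometry and G is dependent on the geometry.

2. Quadrature

$$ k_{ij} = \sum_{q=1}^{N} \sum_{\alpha_1=1}^{d} \sum_{\alpha_2=1}^{d} \sum_{\beta=1}^{d} W_q \, \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_i(X_q)}{\partial X_{\alpha_1}} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_j(X_q)}{\partial X_{\alpha_2}} \det F $$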



FFC: tensor contraction representation

Poisson element stiffness matrix

$$ k_{ij} = A_{ijkl} G_{kl} $$

where

$$ A_{ijkl} = \int_{E_0} \frac{\partial\phi_i}{\partial X_k} \frac{\partial\phi_j}{\partial X_l} \, dX $$

$$ G_{kl} = \det F \, \frac{\partial X_k}{\partial x_m} \frac{\partial X_l}{\partial x_m} $$

• A is model specific and can be evaluated prior to run-time

• G is dependent on element geometry and is evaluated at run-time

• Contraction can be unrolled

Kirby & Logg, ACM TOMS 2006
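To make the split between the reference tensor and the geometry tensor concrete, here is a small pure-Python sketch for linear (P1) Lagrange elements on a triangle. It is an illustration under stated assumptions, not FFC output: the function names are hypothetical, and the geometry tensor is taken as $G_{kl} = |\det F| \, (\partial X_k / \partial x_m)(\partial X_l / \partial x_m)$ with $F = \partial x / \partial X$ the Jacobian of the affine map from the reference triangle.

```python
# Gradients d(phi_i)/d(X_k) of the three P1 basis functions on the
# reference triangle with vertices (0,0), (1,0), (0,1).
DPHI = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
REFERENCE_AREA = 0.5


def reference_tensor():
    """A[i][j][k][l]: integral over the reference cell of
    d(phi_i)/d(X_k) * d(phi_j)/d(X_l). Geometry independent."""
    return [[[[REFERENCE_AREA * DPHI[i][k] * DPHI[j][l]
               for l in range(2)] for k in range(2)]
             for j in range(3)] for i in range(3)]


def geometry_tensor(vertices):
    """G[k][l] = |det F| * dX_k/dx_m * dX_l/dx_m for an affine triangle."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Jacobian F = dx/dX of the affine map
    f00, f01 = x1 - x0, x2 - x0
    f10, f11 = y1 - y0, y2 - y0
    det = f00 * f11 - f01 * f10
    # Inverse Jacobian dX/dx
    inv = [[f11 / det, -f01 / det], [-f10 / det, f00 / det]]
    return [[abs(det) * sum(inv[k][m] * inv[l][m] for m in range(2))
             for l in range(2)] for k in range(2)]


def element_matrix(A, vertices):
    """Contract the precomputed A with the per-cell G."""
    G = geometry_tensor(vertices)
    return [[sum(A[i][j][k][l] * G[k][l]
                 for k in range(2) for l in range(2))
             for j in range(3)] for i in range(3)]


A = reference_tensor()  # evaluated once, before any cell is visited
K_ref = element_matrix(A, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
K_scaled = element_matrix(A, [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)])
for row in K_ref:
    print(row)
```

Only `geometry_tensor` and the final contraction run per cell; a form compiler can go further and unroll the contraction, skipping the entries of A that are zero.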



Tensor contraction optimisations

Matrix representation of A for Poisson equation (Lagrange, k = 2):

     3   0   0  -1   1   1  -4  -4   0   4   0   0
     0   0   0   0   0   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0   0   0   0   0
    -1   0   0   3   1   1   0   0   4   0  -4  -4
     1   0   0   1   3   3  -4   0   0   0   0  -4
     1   0   0   1   3   3  -4   0   0   0   0  -4
    -4   0   0   0  -4  -4   8   4   0  -4   0   4
    -4   0   0   0   0   0   4   8  -4  -8   4   0
     0   0   0   4   0   0   0  -4   8   4  -8  -4
     4   0   0   0   0   0  -4  -8   4   8  -4   0
     0   0   0  -4   0   0   0   4  -8  -4   8   4
     0   0   0  -4  -4  -4   4   0  -4   0   4   8

Exploit structures in A to reduce operation count – find complexity-reducing relationships

Kirby et al. ACM TOMS 2005, 2006



FFC: quadrature optimisations

1. Eliminate operations on zeroes a priori

2. Tabulate basis functions

3. Simplify expressions, e.g. x(y + z) + 2xy → x(3y + z)

4. Loop invariant code motion to reduce floating-point operations

Ølgaard and Wells, ACM TOMS 2010
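The saving in the simplification example of item 3 can be checked with a few lines of Python. This sketch does not implement FFC's simplifier: both expression trees are written out by hand as nested tuples `(op, left, right)`, a representation chosen here purely for illustration, and the code only counts operations and confirms that the two forms agree numerically.

```python
def count_ops(expr):
    """Count binary operations in a nested tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + count_ops(left) + count_ops(right)


def evaluate(expr, values):
    """Evaluate a tree; leaves are symbol names or numeric literals."""
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, values), evaluate(right, values)
        return a + b if op == "+" else a * b
    return values.get(expr, expr)


# x(y + z) + 2xy
original = ("+", ("*", "x", ("+", "y", "z")), ("*", ("*", 2, "x"), "y"))
# x(3y + z)
simplified = ("*", "x", ("+", ("*", 3, "y"), "z"))

values = {"x": 1.5, "y": -2.0, "z": 4.0}
print(count_ops(original), count_ops(simplified))
print(evaluate(original, values), evaluate(simplified, values))
```

Five floating-point operations become three. Inside a quadrature loop nested within loops over basis functions, that saving is multiplied by the number of loop iterations.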



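FFC: quadrature optimisations – runtime performance

Weighted Laplace

[Figure: run time in seconds (logarithmic axis, 10^1 to 10^3) for the FFC optimisation options none, -zeros, -simplify, -simplify-zeros, -ip, -ip-zeros, -basis and -basis-zeros, each compiled with -O0, -O2, -O2 -funroll-loops and -O3.]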



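FFC: quadrature optimisations – runtime performance

Hyperelasticity

[Figure: run time in seconds (logarithmic axis, 10^0 to 10^5) for the FFC optimisation options none, -zeros, -simplify, -simplify-zeros, -ip, -ip-zeros, -basis and -basis-zeros, each compiled with -O0, -O2, -O2 -funroll-loops and -O3.]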



FFC: relative performance

$$ a(u, u) = \int_E (f_0 f_1 \cdots f_{n_f}) \, \nabla^s u : \nabla^s u \, dx $$

                      n_f = 1          n_f = 2           n_f = 3
                    flops   q/t      flops   q/t       flops   q/t

    p = 1, q = 1      888  0.34       3060  0.36       10224  0.11
    p = 1, q = 2     3564  1.42      11400  1.01       35748  0.33
    p = 1, q = 3    10988  3.23      34904  1.82      100388  0.63
    p = 1, q = 4    26232  5.77      82548  2.87      254304  0.93

    p = 2, q = 1      888  1.20       8220  0.31       54684  0.09
    p = 2, q = 2     7176  1.59      41712  0.49      284232  0.11
    p = 2, q = 3    22568  2.80     139472  0.71      856736  0.17
    p = 2, q = 4    54300  4.36     337692  1.01     2058876  0.23

    p = 3, q = 1     3044  0.36      30236  0.16      379964  0.02
    p = 3, q = 2    12488  0.92     126368  0.26     1370576  0.03
    p = 3, q = 3    36664  1.73     391552  0.37     4034704  0.05
    p = 3, q = 4    92828  2.55     950012  0.49     9566012  0.06

p: order of f_i

q: order of u and v


DOLFIN: problem solving environment

• Main FEniCS user interface

• Re-usable library designed to support generated application-specific code

• Third-party linear algebra interfaces (PETSc, Trilinos, NumPy, ...)

• Distributed and shared memory parallelism



DOLFIN: interfaces

• Near identical C++ and Python interfaces

• Python interface generated largely automatically from C++ using SWIG

• Smart pointers provide robust memory management between C++ and Python interfaces

• Python interface dramatically reduces user adoption threshold

• Limited use of templates in high-level user interface makes Python wrapping tractable

• UFL, FFC and DOLFIN are seamlessly integrated in Python interface



DOLFIN: C++ Poisson demo

    #include <dolfin.h>
    #include "Poisson.h"
    using namespace dolfin;

    // ...

    int main()
    {
      // Create mesh and function space
      UnitSquareMesh mesh(32, 32);
      Poisson::FunctionSpace V(mesh);

      // Define boundary condition
      Constant u0(0.0);
      DirichletBoundary boundary;
      DirichletBC bc(V, u0, boundary);

      // Define variational forms
      Poisson::BilinearForm a(V, V);
      Poisson::LinearForm L(V);
      Source f;
      L.f = f;

      // Compute solution
      Function u(V);
      solve(a == L, u, bc);

      // Save solution in VTK format
      File file("poisson.pvd");
      file << u;

      return 0;
    }



DOLFIN: Python Poisson demo

    from dolfin import *

    # Create mesh and define function space
    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, "Lagrange", 1)

    # Define Dirichlet boundary (x = 0 or x = 1)
    def boundary(x):
        return x[0] < DOLFIN_EPS or x[0] > 1.0 - DOLFIN_EPS

    # Define boundary condition
    u0 = Constant(0.0)
    bc = DirichletBC(V, u0, boundary)

    # Define variational problem
    u, v = TrialFunction(V), TestFunction(V)
    f = Expression("10*exp(-(pow(x[0]-0.5, 2) + pow(x[1]-0.5, 2))/0.02)")
    g = Expression("sin(5*x[0])")
    a = inner(grad(u), grad(v))*dx
    L = f*v*dx + g*v*ds

    # Compute solution
    u = Function(V)
    solve(a == L, u, bc)

    # Save solution in VTK format
    File("poisson.pvd") << u

    # Plot solution
    plot(u, interactive=True)



Reconciling high-level scripted interfaces and performance

Just-in-time compilation



DOLFIN: parallel

mpirun -np 1024 python demo.py



DOLFIN: parallel hardware

• Distributed paradigm (message passing) straightforward

• Intra-node hard
  • Changing hardware
  • Changing languages
  • Non-uniform memory access (NUMA)
  • Hard to develop good performance models to select best strategy

• Effective threading crucial on modern low memory-per-core machines

• DOLFIN currently being tested/developed on two systems in the top 10 of the Top 500 list



DOLFIN: Mira at Argonne National Laboratory (Blue Gene/Q)



DOLFIN: intra-node parallelism

Coloured mesh
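The slide shows only a coloured mesh, so the following is an assumption about the idea behind it rather than a description of DOLFIN's code: if cells that share a vertex never have the same colour, then all cells of one colour can be assembled by different threads without two threads writing to the same matrix rows. A minimal greedy colouring in plain Python (the function name `colour_cells` and the tiny mesh are hypothetical) might look like this:

```python
def colour_cells(cells):
    """Greedy colouring: cells that share a vertex get different colours."""
    # Map each vertex to the cells that touch it
    vertex_to_cells = {}
    for c, vertices in enumerate(cells):
        for vtx in vertices:
            vertex_to_cells.setdefault(vtx, []).append(c)

    colours = [None] * len(cells)
    for c, vertices in enumerate(cells):
        # Colours already taken by neighbouring cells
        used = {colours[n]
                for vtx in vertices
                for n in vertex_to_cells[vtx]
                if colours[n] is not None}
        colour = 0
        while colour in used:
            colour += 1
        colours[c] = colour
    return colours


# Four triangles on a 2 x 1 strip of squares, vertices numbered 0..5:
#
#   3---4---5
#   | / | / |
#   0---1---2
cells = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
colours = colour_cells(cells)

# Group cells by colour; each group could be assembled in parallel
batches = {}
for cell_index, colour in enumerate(colours):
    batches.setdefault(colour, []).append(cell_index)
print(batches)
```

Assembly would then loop over colours sequentially and over the cells within a colour in parallel. Greedy colouring does not minimise the number of colours, but for assembly it only matters that each batch is conflict free and reasonably large.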



DOLFIN: threaded matrix assembly – single socket

Intel Core i7-980 (6 cores), no re-ordering of data

[Figure: speed-up factor (0 to 6) against number of threads (1 to 6) for Poisson and Navier-Stokes assembly, compared with ideal scaling.]



DOLFIN: threaded matrix assembly – single socket

Intel Core i7-980 (6 cores) with re-ordering for data locality

[Figure: speed-up factor (axis 1 to 6).]




DOLFIN: threaded matrix assembly – dual socket NUMA

2 x Intel Xeon X5690 (12 cores) with re-ordering for data locality

[Figure: speed-up factor (axis 2 to 12).]




DOLFIN: threaded matrix assembly – dual socket NUMA

2 x Intel Xeon X5690 (12 cores) with re-ordering but no matrix insertion

[Figure: speed-up factor (axis 2 to 12).]




DOLFIN: some ongoing developments

• Hybrid threaded/MPI computation

• Distributed mesh refinement

• Multi-domain code generation

• New code generation optimisation strategies

• Target-specific code generation



Examples: hyperelasticity

Displacement field $u^\star$ given by:

$$ u^\star = \arg\min_{u \in V} \Pi(u) $$

where

• $\Pi := \int_\Omega \psi(E(u)) - B \cdot u \, dx - \int_{\partial\Omega} T \cdot u \, ds$

• $\psi(E)$ is the strain energy density

• $E := (F^T F - I)/2$ is the Green-Lagrange strain

• $F := \nabla_X u + I$ is the deformation gradient.



Examples: hyperelasticity as a minimisation problem (1)

    V = VectorElement("Lagrange", "tetrahedron", 1)

    # Current displacement
    u = Coefficient(V)

    # Body force per unit volume and traction force (on reference config)
    B, T = Coefficient(V), Coefficient(V)

    # Kinematics
    I = Identity(V.cell().d)  # Identity tensor
    F = I + grad(u)           # Deformation gradient
    C = F.T*F                 # Right Cauchy-Green tensor

    # Invariants of deformation tensors
    J, Ic = det(F), tr(C)



Examples: hyperelasticity as a minimisation problem (2)

    # Elasticity parameters
    mu, lmbda = 100, 0.3

    # Stored strain energy density (compressible neo-Hookean model)
    psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2

    # Total potential energy
    Pi = psi*dx - dot(B, u)*dx - dot(T, u)*ds

    # First variation of Pi
    v = TestFunction(V)
    F = derivative(Pi, u, v)

    # Compute Jacobian of F
    du = TrialFunction(V)
    a = derivative(F, u, du)



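Examples: time-dependent problems

Linear advection–diffusion equation

At time $t_{n+1}$, given $u_n$, $a$ and $f$, find $u_{n+1} \in V$ such that

$$ F(u_{n+1}; v) = 0 \quad \forall\, v \in V $$

where

$$ F := \int_\Omega \frac{u_{n+1} - u_n}{\Delta t} v + a \cdot \nabla u_{n+1/2} \, v + \nabla u_{n+1/2} \cdot \nabla v - f_{n+1/2} v \, dx $$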



Examples: time-dependent problems

Linear advection–diffusion implementation

    # Function space
    V = FunctionSpace(mesh, "Lagrange", 1)

    # Advective velocity
    velocity = Constant( (-100.0, 0.0) )

    # Solution from previous time step
    u0 = Coefficient(V)

    # Trial and test functions
    u, v = TrialFunction(V), TestFunction(V)

    # Mid-point solution
    u_mid = 0.5*(u0 + u)

    # Variational problem posed at mid-point
    F = (u - u0)*v*dx + dt*(dot(velocity, grad(u_mid)*v)*dx
                            + dot(grad(u_mid), grad(v))*dx)

    # Extract bilinear and linear forms
    a, L = lhs(F), rhs(F)
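What `lhs` and `rhs` extract can be written out by hand for this example. The sketch below follows the code as shown (which is scaled by $\Delta t$ and carries no source term) and writes $a$ for the advective velocity; substituting $u_{n+1/2} = (u_n + u_{n+1})/2$ and collecting the terms that contain the unknown $u_{n+1}$ gives:

```latex
\begin{align*}
\mathrm{lhs}(F) &= \int_\Omega u_{n+1} v
   + \frac{\Delta t}{2} \left( a \cdot \nabla u_{n+1} \, v
   + \nabla u_{n+1} \cdot \nabla v \right) dx \\
\mathrm{rhs}(F) &= \int_\Omega u_n v
   - \frac{\Delta t}{2} \left( a \cdot \nabla u_n \, v
   + \nabla u_n \cdot \nabla v \right) dx
\end{align*}
```

The terms that depend only on the known $u_n$ change sign when moved to the right-hand side. Changing the time discretisation only requires editing `u_mid`; the split is recomputed automatically.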



Examples: coupled systems of PDEs

    V = VectorElement("Lagrange", "triangle", 2)
    W = FiniteElement("Discontinuous Lagrange", "triangle", 1)
    Q = FiniteElement("Brezzi-Douglas-Marini", "triangle", 2)
    P = FiniteElement("Nedelec 1st kind H(curl)", "triangle", 2)

    # Define nested mixed space
    Z = MixedElement([[V, W], Q, P])
    ...
    U = Coefficient(Z)
    ...
    p_mid = (1 - theta)*p0 + theta*p
    ...
    # F_i for each process
    F0 = ...
    F1 = ...
    F2 = ...
    ...
    # Want to solve F = 0
    F = F0 + F1 + F2 + ...

    # Jacobian
    a = derivative(F, U, dU)



Community: third-party libraries and applications



Community: forums

Development repositories, bug tracker, mailing lists, answer forums hosted at http://launchpad.net


FEniCS’13 Workshop – University of Cambridge

18–19 March 2013


