A Mesh-Free, Physics-Constrained Approach to solve Partial Differential Equations with a Deep Neural Network

Bachelor’s Thesis

submitted by

Kevin Kress

at the

Faculty of Sciences

Department of Computer and Information Science

1. Reviewer: Prof. Dr. Bastian Goldlücke

2. Reviewer: Prof. Dr. Stefan Volkwein

Konstanz, 2020

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1pqapp54g26sl0


Kress, Kevin: A Mesh-Free, Physics-Constrained Approach to solve Partial Differential Equations with a Deep Neural Network. Bachelor's thesis, University of Konstanz, 2020.


Abstract

In this work, we utilized different approaches for solving partial differential equations with a deep neural network. The network respects the given physical laws of the equations by incorporating these constraints in the training process or in the network architecture. Specifically, a deep, feed-forward, and fully-connected neural network is used to approximate the partial differential equation, where the initial and boundary conditions are either hard or soft assigned. The resulting physics-informed surrogate model learns to satisfy the differential operator and the initial and boundary conditions and can be differentiated with respect to all input variables. The accuracy of the methods is demonstrated on multiple equations of different types and compared to either the exact or a finite element solution.


Contents

List of Figures

List of Tables

1 Introduction
  1.1 Notation

2 Foundations
  2.1 Partial Differential Equations
    2.1.1 Classification of Partial Differential Equations
    2.1.2 Well-Posed Partial Differential Equations
  2.2 Deep Learning
    2.2.1 Deep Neural Networks

3 Related Work
  3.1 Incorporating Neural Networks in Conventional Methods
  3.2 Solving Partial Differential Equations with Neural Networks

4 Methods
  4.1 Deep Neural Network Implementation
    4.1.1 Network Structure
    4.1.2 Training the Network
  4.2 Deep Learning for Partial Differential Equations
    4.2.1 Trial Function and Hard and Soft Assignment

5 Experiments
  5.1 Transport Equation
  5.2 Wave Equation
  5.3 Poisson Equation
    5.3.1 Poisson Equation in 2D
    5.3.2 Poisson Equation in High Dimension
  5.4 Heat Equation
  5.5 Burgers' Equation
  5.6 Max-PDE
    5.6.1 Coupled Regions
    5.6.2 Decoupled Regions
  5.7 Discussion

6 Conclusion

Bibliography

Appendix
  6.1 Further Experiments


List of Figures

2.1 A simple neural network with two input nodes, a hidden layer with three hidden nodes, and one output node.
4.1 A block of the network structure with a residual connection after two hidden layers.
5.1 Network solution of the transport equation for different timestamps
5.2 Network solution of the traveling wave equation
5.3 Network solution of the Poisson equation
5.4 3D network solution of the Poisson equation
5.5 Loss rate during training of the high-dimensional Poisson equation
5.6 Solutions of the heat equation for different timesteps
5.7 Network solution of the Burgers' equation
5.8 Network solution of the Burgers' equation at different timestamps
5.9 Solution of the coupled max-PDE for a fixed µ = 1
5.10 Solutions of the coupled max-PDE
5.11 Solution of the decoupled max-PDE for fixed µ
5.12 Solution of the decoupled max-PDE
6.1 Results of the Poisson equation with corner singularity


List of Tables

5.1 The development of the relative L2 error for varying amounts of hidden layers and neurons per layer. 20000 collocation points were used for training.
5.2 The development of the relative L2 error for varying amounts of collocation points. 8 hidden layers were used, each with 20 neurons.
5.3 Summary of all experiments with the used network parameters, training samples, training duration, prediction duration (for 100000 points) and the relative L2 error


Chapter 1

Introduction

Deep Learning (DL) has become a popular and useful approach to analyze complex systems in the biological, physical, or engineering domain. Especially with the explosive growth of available data and computing infrastructure, remarkable progress in various scientific fields has been achieved with the help of DL and data analytics, including image recognition [KSH12], natural language processing [LBH15], cognitive science [LST15], and genomics [ADWF15]. But more often than not, the amount of data necessary for training the neural networks is not available or too prohibitive to acquire, which leads to incomplete results: conclusions and decisions have to be made under partial information. Also, a small data set often hurts the robustness and convergence of the network.

One advantage of modeling physical or biological systems is that there exists a vast amount of prior knowledge. Such prior knowledge can be existing physical laws or domain expertise and can be employed to reduce the size of the solution space. Constraining the solution space helps the network to extract more meaningful information from the data and to acquire an accurate result faster.

Many physical phenomena, such as fluid dynamics, heat and mass transfer, and electromagnetic theory, can be modeled by Partial Differential Equations (PDE). For the analysis of these equations one has to respect the inherent uncertainties like boundary and initial conditions or other physical properties. Finding a numerical solution for a PDE has been a longstanding issue and is still a challenge, especially for high-dimensional PDEs [ZZKP19a]. The task of training a DL network to accurately solve the PDE by learning the non-linear map between the initial and output data seems difficult at first, but with the help of prior knowledge (e.g. boundary and initial conditions) solving the problem becomes more feasible.

In this work we present various approaches on how to solve PDEs with Deep Neural Networks (DNN) and test them on multiple different PDEs. The types and dimensions of the PDEs vary to give a better impression of the possibilities and limitations of the proposed approach. With the DNN a surrogate function is built that approximates the solution of the PDE. Two approaches are introduced on how to incorporate the auxiliary conditions in the DNN: a hard assignment that forces the surrogate function to always return the correct boundary and initial values, and a soft (weak) assignment where the DNN learns the conditions from additional training data. The results are compared to the exact solutions or to solutions obtained by conventional numerical methods like the finite element method.

The necessary background for a better understanding of PDEs and neural networks is presented in chapter 2. Chapter 3 gives an overview of the related literature and work. Chapter 4 explains the methods that are utilized to solve the PDEs with the help of a neural network. Implementing and testing the proposed methods on different PDEs is described in chapter 5, where the results are analyzed and compared to the exact solution. The last chapter concludes the work with a summary of the results and a note on future work.

1.1 Notation

In this section we introduce notations that are used throughout the thesis. Since most of the notation is standard mathematics or abbreviations, we keep this section short.

The derivatives of the function under analysis are abbreviated with the following notation:

$$u_{x_1} = \frac{\partial u}{\partial x_1}, \qquad u_{x_1 x_1} = \frac{\partial^2 u}{\partial x_1^2}, \qquad u_{x_1 x_2} = \frac{\partial^2 u}{\partial x_1 \partial x_2} = \frac{\partial}{\partial x_2}\left(\frac{\partial u}{\partial x_1}\right),$$

$$\Delta u = u_{x_1 x_1} + u_{x_2 x_2} + \dots + u_{x_n x_n},$$

where u = u(x1, x2, . . . , xn) is a function with n parameters.

The domain of the PDE is written as Ω, which is an open subset of R^n if not specified otherwise. The boundary region of the domain will be denoted as Γ or ∂Ω.


Chapter 2

Foundations

2.1 Partial Differential Equations

Partial differential equations (PDE) have a long history and appear in many applications to this day. They are often used to describe a physical system which contains a variety of phenomena, such as heat transfer, fluid dynamics, sound, and many more. PDEs often have a large degree of complexity, as they typically contain multiple independent variables in addition to an unknown function and its derivatives that depend on these variables. The highest occurring derivative defines the order of the PDE. An ordinary differential equation (ODE) is a special case with only one independent variable. The solution of an ODE behaves quite differently and generally has significantly less complexity than a PDE with multiple independent variables. As a consequence of their high complexity, there exists no general theory that gives a statement about the solvability of all PDEs. Since PDEs can model a wide variety of physical systems with different geometric and physical circumstances, such a theory is very unlikely to be developed. Therefore, researchers select and try to solve particular PDEs that can be used for further applications and research [Eva10].

Definition 1 Let k ≥ 1 be a fixed integer and Ω an open subset of R^n. Then a kth-order partial differential equation has the form

$$F(D^k u(x), D^{k-1} u(x), \dots, D u(x), u(x), x) = 0, \qquad x \in \Omega, \qquad (2.1)$$

where F is the given function, u(x) = u(x1, . . . , xn) is the unknown function, and D^k u(x) is the set of all partial derivatives of order k of the function u(x).

2.1.1 Classification of Partial Differential Equations

We introduce definitions to differentiate between different forms of PDEs in an attempt to group similar PDEs. The goal of this classification is to find PDEs that might share solution methods, or to help anticipate certain other properties.


Linear vs Non-Linear

One of the most important distinctions is made between linear and non-linear PDEs. Many mathematical techniques like Fourier series, Laplace transform or superposition can be used to gain an exact solution for numerous linear PDEs. These exact solutions can not only be used to verify calculations and simulations, but are also often utilized to obtain numerical solutions of non-linear PDEs [Eva10]. The analytical solutions of non-linear PDEs are often unattainable and, therefore, exploiting linear PDEs to acquire or develop a numerical method for solving difficult non-linear PDEs is a valuable stepping stone. The following definitions in this section are taken from [Eva10].

Definition 2 A PDE is called linear if it is a linear combination of all terms containing u and its derivatives, where the coefficients of these terms depend on the independent variables alone:

$$\sum_{i=0}^{k} a_i(x) D^i u = f(x),$$

where a_i and f are given functions.

Definition 3 A PDE is called semi-linear if the derivatives of order k appear in a linear combination with coefficients that are independent of u:

$$a_k(x) D^k u + a_0(D^{k-1}u, \dots, Du, u, x) = 0.$$

Definition 4 A PDE is called quasi-linear if the derivatives of order k appear in a linear combination with coefficients that depend only on the independent variables and on derivatives of u of order strictly less than k:

$$a_k(D^{k-1}u, \dots, Du, u, x)\, D^k u + a_0(D^{k-1}u, \dots, Du, u, x) = 0.$$

Definition 5 A PDE is called non-linear if it depends non-linearly on the derivatives of order k. Every PDE that is not quasi-linear is non-linear.

With the help of the above definitions, we can state the following relationship for PDEs:

$$\text{Linear} \subsetneq \text{Semi-linear} \subsetneq \text{Quasi-linear} \subsetneq \text{Non-linear}.$$

To give better and more concrete examples, we create a general form for the first-order partial differential equation with n independent variables from the definitions:

$$F(u_{x_1}, u_{x_2}, \dots, u_{x_n}, u, x_1, \dots, x_n) = 0,$$

where F is the given function, u = u(x1, . . . , xn) is the unknown function, and x ∈ Ω.

A single first-order PDE with two independent variables can have the following forms:


1. Linear

$$a(x, y)\, u_x + b(x, y)\, u_y = c(x, y)\, u + d(x, y)$$

2. Semi-linear

$$a(x, y)\, u_x + b(x, y)\, u_y = c(x, y, u)$$

3. Quasi-linear

$$a(x, y, u)\, u_x + b(x, y, u)\, u_y = c(x, y, u)$$

Types of Linear Partial Differential Equations

Another way to classify linear PDEs is to divide them into three types: elliptic, hyperbolic, and parabolic. Second-order PDEs are usually classified this way, but a similar definition can be used in higher dimensions, which we do not include in this thesis.

Furthermore, we create a general form for the second-order partial differential equation with two independent variables from the definitions:

$$a u_{xx} + b u_{yy} + 2c u_{xy} + d u_x + e u_y + f u + g = 0,$$

where the coefficients are given functions of x and y (i.e. a = a(x, y), . . . , g = g(x, y)) and u = u(x, y) is the unknown function.

Definition 6 A PDE of second order with two independent variables is called elliptic, hyperbolic, or parabolic if

$$\Delta(x, y) := a(x, y)\, b(x, y) - c(x, y)^2$$

is > 0 for all (x, y), < 0 for all (x, y), or = 0 for all (x, y), respectively.
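As a worked example of Definition 6 (added here for illustration; the identification of y with the time variable t for the time-dependent equations is our reading of the general form above), the three classical second-order equations are classified as follows:

$$u_{xx} + u_{yy} = 0: \quad a = b = 1,\; c = 0, \quad \Delta = 1 > 0 \quad \text{(elliptic, Laplace equation)},$$
$$u_{yy} - u_{xx} = 0: \quad a = -1,\; b = 1,\; c = 0, \quad \Delta = -1 < 0 \quad \text{(hyperbolic, wave equation with } y = t\text{)},$$
$$u_{y} - u_{xx} = 0: \quad a = -1,\; b = 0,\; c = 0, \quad \Delta = 0 \quad \text{(parabolic, heat equation with } y = t\text{)}.$$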

Since the definition depends on the points (x, y), there also exist PDEs that change their type depending on the region. The reason behind this classification is the different behavior of these PDE types. Each equation of the three types behaves very differently and can give more insight into the described physical system. Elliptic PDEs, like an ellipse, tend to be rather smooth and, for example, often describe a physical problem that incorporates a diffusion process reaching an equilibrium. Hyperbolic PDEs are often used to describe physical phenomena with discontinuities, including elasticity or acoustics, analogously to the disconnected sections of a hyperbola. For example, shock waves created by a mechanical oscillator can be expressed by a hyperbolic PDE. Parabolic PDEs are utilized for time-dependent physical systems, like heat flow or particle diffusion, and fill the gap between hyperbolic and elliptic PDEs.


2.1.2 Well-Posed Partial Differential Equations

Many PDEs express infinitely many characteristically similar phenomena or processes, due to the fact that, in general, PDEs have infinitely many solutions. Therefore, one has to define the auxiliary conditions to separate a specific solution from the set of solutions that describe a physical system. In particular, these conditions must be chosen in such a way that the PDE is well-posed.

Definition 7 A partial differential equation is well posed when

1. a solution to the problem exists,

2. the solution is unique, and

3. the solution depends continuously on the data defined in the problem statement.

Therefore, we need enough auxiliary conditions to prevent the existence of multiple solutions, but not so many that no solution exists. Furthermore, the unique solution should depend continuously on the initial data and, preferably, small changes to the initial data should lead to small changes in the solution.

Boundary Conditions

One common type of auxiliary condition is the boundary condition. As the name suggests, boundary conditions specify the behaviour of the PDE's solution at the boundary of the considered domain [PR05, 1.5.2]. Let us consider the time-dependent wave equation on a two-dimensional spatial domain Ω:

utt −∆u = 0, (x, y) ∈ Ω, t > 0.

In general, we can assume that Ω is a fixed domain with a boundary. To gain a unique solution, we need to specify the behaviour of the solution on the boundary ∂Ω. There are three very common types of boundary conditions that are encountered in applications. The first kind is the Dirichlet condition, which we also used in all our experiments. The Dirichlet condition states the value of the solution on the boundary:

u(x, y, t) = f(x, y, t), (x, y) ∈ ∂Ω, t > 0.

The second kind, called the Neumann condition, describes the normal derivative of the solution on the boundary:

$$u_n = f(x, y, t), \qquad (x, y) \in \partial\Omega,\; t > 0,$$

where $u_n = \frac{\partial u}{\partial n}$.


The last type is called the Robin condition and relates the solution to its normal derivative:

α(x, y, t)un + u(x, y, t) = f(x, y, t), (x, y) ∈ ∂Ω, t > 0.

In addition to these three types, there are also mixed boundary conditions, which consist of more than one kind of boundary condition [PR05, 1.5.2]. For example, a mixed boundary condition can be created by describing one region of the boundary with a Dirichlet condition and the rest of the boundary with a Neumann condition. There are more exceptions, but since they are not utilized in our experiments we will not dive deeper into this topic.

Initial Conditions

Another constraint that appears in nearly every time-dependent PDE is the initial condition. This condition describes the initial state of the solution (most of the time at t = 0). Naturally, the initial state of a physical system is needed to describe its development. For the example above, the initial condition can be defined as

u0 = u(x, y, 0) = f(x, y), (x, y) ∈ Ω.


2.2 Deep Learning

Many science fiction books and movies present a vision of the future which contains machines that can think and learn and, in fact, be as intelligent as humans. Even though this genre often portrays technological progress as a threat to humanity, the current state of artificial intelligence (AI) has many possible applications, and the future looks promising. Realistically speaking, while AI is a long way from being as intelligent as humans, it is a thriving field with increasing resources. Currently, research is focused on creating intelligent systems that solve specific tasks. Such tasks include comprehending speech and text, recognizing persons or objects, or making diagnoses in the medical field. Most of these problems share a common issue: they are hard to describe formally, in contrast to problems that can be stated in a formal, mathematical way and can therefore be solved with the help of AI without major hindrance. The intuitive and automatic problem solving we use for understanding speech and visual content needs to be broken down into simpler concepts so that computers can learn from them.

Even though Deep Learning (DL) is seen as a new and exciting technology by many people, it has a quite long history. DL dates back as far as 1940, when it was known as cybernetics; the field has only been known as DL since 2006 [GBC16]. The influence of different research directions and perspectives played a large part in the naming and branding of DL. There are multiple reasons why DL has become more and more popular. Firstly, with the increasing amount of available training data DL achieves better and more accurate results. Secondly, with hardware and software improvements the size of DL models grew over time. Therefore, more complex tasks can be solved with a higher accuracy. While there are more reasons, these are the most obvious, and for brevity we will not investigate the history of DL further [GBC16].

Especially today, with the explosive growth of available data and computer infrastructure, remarkable progress in multiple scientific fields has been achieved with the help of DL and data analytics, including image recognition [KSH12], natural language processing [LBH15], cognitive science [LST15], and genomics [ADWF15].

2.2.1 Deep Neural Networks

Perceptron

Before we look further into deep neural networks, we briefly explain what a perceptron (or node) is, since it is the basic building block of a deep neural network. A perceptron can be viewed as an algorithm with multiple inputs x = (x1, . . . , xn)^T and one output y. The output of a perceptron is defined as y = σ(w^T x + b), where w = (w1, . . . , wn)^T, called weights, and b, known as bias, are the parameters of the perceptron, and σ is the activation function, which defines the output values. There are many different activation functions with different advantages and disadvantages, but the most common ones are the step function (e.g. σ(x) = 0 if x < 0 and σ(x) = 1 if x ≥ 0 for binary classification) and sigmoid functions (e.g. σ(x) = tanh x). Due to its simplicity, a perceptron can only be a linear binary classifier, because it can only separate data into two different groups. Therefore, a more complex system needs to be built to model more complex relationships. One way to do that is to build a network with many perceptrons.

Figure 2.1: A simple neural network with two input nodes, a hidden layer with three hidden nodes, and one output node.
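To make the perceptron computation above concrete, the following minimal sketch (an added illustration, not code from the thesis) evaluates y = σ(w^T x + b) with a step activation; the weight and bias values are arbitrary placeholders.

    import numpy as np

    def perceptron(x, w, b):
        # weighted sum of the inputs plus bias
        z = np.dot(w, x) + b
        # step activation: 1 if z >= 0, else 0 (binary classification)
        return 1.0 if z >= 0 else 0.0

    # example with two inputs, as in Figure 2.1
    x = np.array([0.5, -1.0])
    w = np.array([0.8, 0.3])   # weights (placeholder values)
    b = 0.1                    # bias (placeholder value)
    print(perceptron(x, w, b))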

Feed-Forward Fully-Connected Deep Neural Networks

In general, a deep feed-forward network consists of various layers with multiple perceptrons which are connected in a feed-forward fashion, where the word "deep" emphasizes the characteristic of numerous hidden layers. Deep feed-forward networks are the most typical deep learning models. The purpose of a feed-forward network is often to approximate a function f by defining a mapping y = f(x; θ) and learning the values of the network's parameters θ to achieve the best result. The parameter θ consists of all weights and biases that appear in the network. Since the input information x flows through some intermediate computations to the final output y, the models are called feed-forward. The outputs of the network are never fed back to itself, because there are no cyclic or recurrent connections, in contrast to e.g. recurrent networks. Feed-forward networks were a huge milestone for machine learning. Not only do they build the basis of commercial applications and networks, they were also a conceptual stepping stone for other networks like recurrent networks, which are used for many natural language applications. Convolutional networks, for example, are a specialized kind of feed-forward network [NM18].

In the learning phase, the network learns how to use the hidden layers to produce the desired output. Since these intermediate layers do not have a desired output in the training data, they are called hidden.


Each layer consists of multiple nodes, which are connected to the nodes of the previous layer. When each node in a layer is connected to all nodes in the previous layer, the network is called fully-connected. The i-th hidden layer can be described by the function

$$l_i(x_i) = \sigma(x_i \cdot W_i + b_i), \qquad x_i \in \mathbb{R}^d,$$

where i is the number of the hidden layer, x_i is the output from the previous layer, W_i is the weight matrix with dimension d × q that represents the connections between the nodes, and b_i is the bias vector of size 1 × q. The function l_i(x_i) can be written as the map l_i : R^d → R^q. σ is the activation function, which is an element-wise, non-linear function. The output of the activation function is then transformed by the next weight matrix and bias vector and then given to another activation function. Adding a new hidden layer thus only consists of adding a weight matrix and a bias vector.

With more hidden layers or more nodes per hidden layer, the network's capability to approximate more complex functions can be increased. Multiple different activation functions can be employed in the network. Sigmoid, hyperbolic tangent (Tanh) or Rectified Linear Unit (ReLU) are popular choices. Since we have to deal with at least second-order partial differential equations, we have to calculate second and higher derivatives. This requirement removes the ReLU function f(θ) = max(0, θ) from our choices, because its second and higher-order derivatives are 0 everywhere (except at θ = 0).

With this information we can define a neural network with k − 1 hidden layers, which can easily be generalized to a network with more hidden layers, forming a deep neural network. The d-dimensional input vector x ∈ R^d is the model input, the output y ∈ R^p is a p-dimensional vector, and the hidden layers have a q-dimensional output vector. So, the function can be written as

$$y(x) = W_k \cdot (l_{k-1} \circ \dots \circ l_1 \circ l_0(x)) + b_k, \qquad x \in \mathbb{R}^d,$$

where W_0, W_i (0 < i < k), and W_k are the weight matrices of sizes d × q, q × q and q × p, and b_i (0 ≤ i < k) and b_k are the bias vectors of sizes 1 × q and 1 × p, respectively.
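As a small illustration of this forward pass (an added sketch; the layer sizes and the tanh activation are assumptions consistent with the text, not the thesis' actual code):

    import numpy as np

    def forward(x, weights, biases):
        # hidden layers l_0, ..., l_{k-1}: x -> tanh(x W_i + b_i)
        for W, b in zip(weights[:-1], biases[:-1]):
            x = np.tanh(x @ W + b)
        # final affine layer: y = x W_k + b_k
        return x @ weights[-1] + biases[-1]

    # example: d = 2 inputs, two hidden layers with q = 10 neurons, p = 1 output
    rng = np.random.default_rng(0)
    sizes = [2, 10, 10, 1]
    weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    y = forward(rng.normal(size=(5, 2)), weights, biases)   # evaluate at 5 sample points
    print(y.shape)   # (5, 1)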

For the training aspect, we need to measure the performance of the model and increase its accuracy. One way of doing this is to compute the error of the prediction with the help of an objective function and try to minimize this function [GBC16]. The so-called loss function must capture the desired result in a single number in such a way that reducing this number yields a better model accuracy. Therefore, it is important to find a suitable objective function that represents the network goals. Furthermore, training data is needed to evaluate the accuracy of the model.

Having defined a loss function, we further need an algorithm to adjust the parameters of the network in such a way that the loss function reaches a minimum. This leads to the optimization problem

$$\min_{\theta \in \Theta} f_{\mathrm{error}}(\theta),$$

where θ is the high-dimensional vector of the network parameters and Θ is the space of all possible θ. Back-propagation is a popular building block of such a learning algorithm and is a method to compute the gradient of a function (in this case the objective function). The gradients are calculated layer by layer using the chain rule, starting from the output layer and propagating towards the input layer [GBC16]. The weights and biases are then adjusted with the help of the gradients by another algorithm like stochastic gradient descent (SGD).
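The following minimal sketch (an added illustration using the TensorFlow 2 eager API; the thesis' own implementation is built on TensorFlow 1-style graphs) shows one such training step: a forward pass, a mean-squared-error loss, back-propagation of the gradients, and an SGD parameter update. The toy regression target is a placeholder.

    import tensorflow as tf

    # small fully-connected model with one tanh hidden layer
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="tanh", input_shape=(1,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

    x = tf.random.uniform((100, 1))
    y_true = tf.sin(x)                                       # placeholder training targets

    with tf.GradientTape() as tape:
        y_pred = model(x)                                    # forward pass
        loss = tf.reduce_mean(tf.square(y_pred - y_true))    # mean squared error
    grads = tape.gradient(loss, model.trainable_variables)   # back-propagation
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # SGD step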


Chapter 3

Related Work

Solving partial differential equations (PDE) has a long history, and many methods to solve them have been developed over the centuries. Typical approaches are often based on one of two general methods. One is the Finite Differences (FD) approach, where the solution is approximated on a number of finite points, usually arranged on a grid. The Finite Element (FE) method, on the other hand, divides the whole domain of the PDE into simpler elements, chooses basis functions to represent the PDE solution, and solves the resulting system of algebraic equations.

Studies related to solving PDEs with neural networks have been proposed in the past and can be classified into two categories. In the first one, the neural network replaces or supports a part of a traditional method. In the second category, neural networks completely substitute the conventional methods and approximate the PDE by themselves.

3.1 Incorporating Neural Networks in Conventional Methods

In the 90s multiple papers were published that explored the benefit of neural networks for solving differential equations. Until 1998 most of the work was restricted to the approach of using a neural network in conjunction with other classical methods like FE or FD to solve the PDE. The task of the network was to accelerate certain parts of the conventional methods, e.g. mapping the solution of the system of algebraic equations to a Hopfield neural network and minimizing its energy function to obtain a solution [LK90, WM90, YZ96].

For Ordinary Differential Equations (ODE) another approach was developed, which is based on the fact that the ODE can be solved by using certain splines as basis functions. By solving a system of linear and nonlinear equations the coefficients of the basis functions can be acquired. This form is then mapped to a Feed-Forward Artificial Neural Network (FFANN) by using a sum of piecewise activation functions instead of each spline [MF94a, MF94b]. A drawback of this method is the need for many network parameters, since in general the method requires many splines, and it is difficult to extend the method to multidimensional domains.

3.2 Solving Partial Differential Equations with Neural Networks

In 1998 Isaac Lagaris, Aristidis Likas and Dimitrios Fotiadis introduced an approach for solving ODEs and PDEs by completely replacing the conventional methods with neural networks [LLF98], which we also adopted in this work. The method relies on the capability of FFANNs to approximate functions. The solution is constructed in such a way that it has a differentiable, closed-analytic form, and a Deep Neural Network (DNN) is employed to approximate the unknown function. The function of the model consists of two parts. The first part satisfies the initial and boundary constraints and cannot be adjusted, since it has no parameters. The second part consists of the actual DNN that approximates the unknown function on the rest of the domain [LLF98]. In summary, their research adopted constraint learning to incorporate physical knowledge [SE16].

They found that using machine learning offers multiple advantages. The first one is that the learned solution given by the neural network has a differentiable, closed analytic form that can be used for any further calculations. Furthermore, the proposed approach is general and can be applied to various ODEs and PDEs. In contrast to methods that consider local functions around each grid point, increasing the dimensionality is not such a serious problem for the authors, since only the number of training points increases instead of the number of parameters [LLF98]. Also, the adoption of parallel architectures can provide more efficiency. However, one of the biggest obstacles previously has been the limitation of hardware, which has improved vastly in performance and availability in recent years.

However, for highly irregular or mixed boundary conditions the proposed approach was not easy to reconstruct. Therefore, methods were developed to handle such boundary conditions by computing the boundary condition with a distance function (or length factor [LLP00]) and then mapping the distance function to an ANN to obtain a differentiable form [BN18, NM18, LLP00].

Other authors used this approach as a stepping stone and developed a similar but distinct approach. Instead of adding the boundary conditions into the model function, the model function consists only of the DNN with the same task as before [RPK17b, NM18]. To satisfy the boundary conditions, the boundary constraint is added to the error function as an extra weighted penalty term [RPK17b, NM18]. In [NM18] the variational form [EHJ17] of the PDE was also utilized as the equation loss and provided another option to define the error function.

The authors of [RPK17b, RPK17c] called their approach physics-informed deep learning. They incorporated PDE models and data observations to estimate DNNs and solve PDEs in one and two spatial dimensions. By taking the assumption that physical dynamics can be modeled by the PDEs as prior knowledge, physical models can be approximated from limited data.

Another approach for solving quasilinear PDEs, which can be portrayed as forward-backward stochastic differential equations (FBSDE), has been developed in recent years [BEJ17, EHJ17, FTT17]. The focus of the algorithm lies on approximating the result of the PDE at a single point, in contrast to, for example, [SS18], where the DNN can accurately approximate the solution on the whole domain (spatial and temporal) and does not rely on an FBSDE representation to exist.

With the diversity in network architectures, convolutional neural networks (CNN) were employed to acquire a numerical solution to the Navier-Stokes equation by solving a large sparse linear system with the CNN [TSSP16]. Convolutional autoencoders were also employed in recent works. In [ZZKP19b] the authors adopted this approach to handle input data in the form of image-like data instead of data computed from an analytical formula. They noticed that the CNN learns the intrinsic values better than a fully connected neural network (FCNN) and also built a probabilistic surrogate model to indicate the predictive uncertainty of the model. Especially when PDEs have multiple solutions, the probabilistic surrogate model can capture and show the distribution over the solutions [ZZKP19b]. Another work utilized convolutional autoencoders to reduce the dimensionality of the input with an encoder and then learned the feature dynamics of a fluid system on the low-dimensional domain with a recurrent network architecture [GB18].

In [SS18] the authors avoid creating a fixed mesh by randomly sampling points in space and time in the given domain, thus creating a mesh-free method. They implemented a Monte Carlo method to calculate the second-order derivatives in order to reduce the computational expense for high dimensions. Furthermore, a proof for the convergence of the error was constructed in order to have a theoretical guarantee that a network with sufficient hidden units can accurately approximate a PDE.

Other studies based their approach on the Ritz method. Their approach is also mesh-free and they use neural networks as surrogate functions in the context of the Ritz method [EY17, WZ20]. For that, the PDE is transformed into a variational problem in order to obtain a lower-order functional. This reduces the complexity of the function and leads to faster but still accurate results.


Chapter 4

Methods

4.1 Deep Neural Network Implementation

4.1.1 Network Structure

In our experiments the network is a fully-connected, feed-forward neural network as described in Section 2.2.1. If not stated otherwise in the experiment, the network applies the Tanh activation function. As for the loss function, we utilized the Mean Squared Error (MSE) in all experiments. The number of layers and the layers' sizes are specified in the description of each experiment, since they vary depending on the problem setup. In some experiments, we used a large number of layers (> 8), which resulted in slower training due to the vanishing gradient problem [GBC16]. To circumvent this problem, residual connections were added to the network structure [HZRS15]. A residual connection adds the output of the hidden layer l_i to the input of the hidden layer l_{i+m}, where m is often two or three, as seen in Figure 4.1. A block of the network using a residual connection that skips m layers can be defined as a function with the map r_i : R^q → R^q:

$$r_i(x_i) = (l_{i+m} \circ \dots \circ l_i)(x_i) + x_i, \qquad x_i \in \mathbb{R}^q, \qquad (4.1)$$

where x_i is the output vector of the previous residual block function r_{i-1} and l_i, . . . , l_{i+m} are the hidden layer functions described in Section 2.2.1. A network with k − 1 hidden layers can then be described with these blocks by

$$y(x) = W_k \cdot (r_{k-m} \circ r_{k-2m} \circ \dots \circ r_m \circ r_0)(x) + b_k, \qquad x \in \mathbb{R}^d, \qquad (4.2)$$

where x is the input to the network and W_k ∈ R^{q×p} and b_k ∈ R^p are the weight matrix and the bias vector of the final layer, respectively. If the input of the network has a different dimension than the output of the residual block, a simple hidden layer can be inserted to transform the dimension. These connections can be used in the training process to skip layers, since a residual block behaves like an identity function. Therefore, the vanishing gradient problem can be mitigated by propagating larger gradients to the initial layers.
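A minimal sketch of such a residual block (an added illustration; the block width q = 20 and the skip length m = 2 are assumptions, not the exact configuration used in every experiment):

    import tensorflow as tf

    def residual_block(x, layers):
        # one block r_i: apply the m hidden layers and add the block input (eq. 4.1)
        out = x
        for layer in layers:
            out = layer(out)
        return out + x   # skip connection

    # example: q = 20 neurons, skip connection after m = 2 tanh layers
    q = 20
    block_layers = [tf.keras.layers.Dense(q, activation="tanh") for _ in range(2)]
    x = tf.random.normal((5, q))          # block input must already have width q
    y = residual_block(x, block_layers)
    print(y.shape)                        # (5, 20)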


Figure 4.1: A block of the network structure with a residual connection after two hidden layers.

4.1.2 Training the Network

To train the model we adopted the L-BFGS optimizer [LN89a] from the scipy.optimize package and the stochastic, gradient-descent based optimizer Adam [KB15] from tensorflow. The L-BFGS optimizer was superior in nearly every experiment, since the data sets were never excessively large and convergence was rather fast. For the error function, the least-squares (mean squared error) loss is implemented.

Getting enough training data to accurately model a function is a typical problem in DL, because, in general, DNNs need a lot of data for the training process. Training data often has to be created first and labeled by humans, which is normally very time consuming. Luckily, in our case we can generate as much training data as we want, since the governing equation is given. For the discretization of the domain, we could create a mesh with the desired granularity and train on these points. The disadvantage of this method becomes very noticeable when the number of independent variables increases drastically: sampling from a mesh would yield too many data points and is no longer feasible [SS18]. Instead, random points were sampled on the domain with the help of the Latin Hypercube Sampling method [Ima99], thus removing the need for a mesh.
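A minimal sketch of such mesh-free sampling (an added illustration using SciPy's quasi-Monte Carlo module scipy.stats.qmc, available from SciPy 1.7; the thesis does not state which Latin Hypercube implementation was actually used, and the domain bounds below are placeholders):

    import numpy as np
    from scipy.stats import qmc

    # sample N collocation points on a space-time domain [0, 2] x [0, 1]
    N = 10000
    sampler = qmc.LatinHypercube(d=2, seed=0)
    unit_samples = sampler.random(n=N)                                  # points in [0, 1]^2
    points = qmc.scale(unit_samples, l_bounds=[0.0, 0.0], u_bounds=[2.0, 1.0])
    x_train, t_train = points[:, :1], points[:, 1:]                     # split into x and t columns
    print(x_train.shape, t_train.shape)                                 # (10000, 1) (10000, 1)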


4.2 Deep Learning for Partial Differential Equations

In 1998, Isaac Lagaris, Aristidis Likas and Dimitrios Fotiadis (University of Ioannina, Greece) introduced an approach for solving differential equations with the help of machine learning [LLF98]. They found multiple advantages in using machine learning. The first one is that the learned solution given by the neural network has a differentiable, closed analytic form that can be used for any further calculations. Furthermore, the proposed approach is general and can be applied to various ordinary differential equations (ODE) and partial differential equations (PDE). In contrast to methods that consider local functions around each grid point, increasing the dimensionality is not such a serious problem for the authors, since only the number of training points increases instead of the number of parameters [LLF98]. Also, the adoption of parallel architectures can provide more efficiency, but one of the biggest obstacles at the time was the hardware, which has since improved vastly in performance and availability. Therefore, in recent years more researchers picked up the idea of using ML and especially DL for solving PDEs.

4.2.1 Trial Function and Hard and Soft Assignment

The partial differential equation has the form:

$$L(t, x; u(t, x)) = f(t, x), \qquad t \in [0, T],\; x \in \Omega, \qquad (4.3)$$
$$I(x; u(0, x)) = a(x), \qquad x \in \Omega, \qquad (4.4)$$
$$B(t, x; u(t, x)) = g(t, x), \qquad t \in [0, T],\; x \in \Gamma, \qquad (4.5)$$

in which L(·) is a general differential operator that can consist of spatial derivatives, time derivatives, and linear and non-linear terms, x is a vector that defines the position on a bounded, continuous spatial domain Ω ⊆ R^D, Γ is the set of the boundary locations of the domain Ω, and t ∈ [0, T]. Furthermore, I(·) and B(·) define the initial and boundary condition, respectively, and are composed of differential, linear, or non-linear operators.

We want to train a network that finds an approximate numerical solution for u(t, x) and define û = û(t, x; θ), where θ are the parameters of the network.

To acquire the optimal parameters θ, we have to define a cost function. We use the MSE:

$$E_L(\theta) = \frac{1}{2} \int_{\Omega \times [0,T]} \left| L(t, x; \hat{u}(t, x; \theta)) - f(t, x) \right|^2 \, dx \, dt,$$

$$E_I(\theta) = \frac{1}{2} \int_{\Omega} \left| I(x; \hat{u}(0, x; \theta)) - a(x) \right|^2 \, dx,$$

$$E_B(\theta) = \frac{1}{2} \int_{\Gamma \times [0,T]} \left| B(t, x; \hat{u}(t, x; \theta)) - g(t, x) \right|^2 \, dx \, dt.$$

For discretization, random points were sampled [SS18] to acquire sets of points Ω_d, Γ_d and [0, T]_d, where |Ω_d| = N_d, |Γ_d| = N_b, |[0, T]_d| = N_t and usually N_b ≪ N_d.


In its discrete form the error functions become

$$E_L(\theta) = \frac{1}{2} \frac{1}{N_d} \frac{1}{N_t} \sum_{x_j \in \Omega_d} \sum_{t_i \in [0,T]_d} \left| L(t_i, x_j; \hat{u}(t_i, x_j; \theta)) - f(t_i, x_j) \right|^2,$$

$$E_I(\theta) = \frac{1}{2} \frac{1}{N_d} \sum_{x_i \in \Omega_d} \left| I(x_i; \hat{u}(0, x_i; \theta)) - a(x_i) \right|^2,$$

$$E_B(\theta) = \frac{1}{2} \frac{1}{N_b} \frac{1}{N_t} \sum_{x_j \in \Gamma_d} \sum_{t_i \in [0,T]_d} \left| B(t_i, x_j; \hat{u}(t_i, x_j; \theta)) - g(t_i, x_j) \right|^2.$$

So, the optimal parameters θ* can be acquired by

$$\theta^* = \arg\min_{\theta} E_L(\theta) \quad \text{s.t.} \quad E_I(\theta) = 0,\; E_B(\theta) = 0.$$

The solution of the differential equation defined in (4.3) can thus be reduced to an optimization problem, in which the initial (4.4) and boundary (4.5) equations can be seen as constraints. In the experiments, this constrained optimization problem was modified into an unconstrained optimization problem. We distinguish two possible options to achieve this: hard assignment and soft assignment. In both methods the initial and boundary constraints are added to the function form or to the loss function, respectively, such that the constraints are taken into account. For the soft assignment the constraint equations are simply added to the loss function and act as extra penalty terms. It is easy to implement, because only the loss function changes.

$$u_s(t, x; \theta) := \hat{u}(t, x; \theta), \qquad (4.6)$$
$$E_s(\theta) = E_L(\theta) + \lambda_1 E_I(\theta) + \lambda_2 E_B(\theta), \qquad (4.7)$$
$$\theta^* = \arg\min_{\theta} E_s(\theta),$$

where λ1 and λ2 are used to give the constraints more relative importance, and û is the output of the feed-forward, fully-connected deep residual network. The drawbacks of this method are that it is uncertain how to set and tune the values for λ1 and λ2, and that it cannot be ensured that the initial and boundary conditions will be completely satisfied by the solution.

In the hard assignment, the function form of the solution is adapted in such a way that the initial and boundary conditions are always satisfied:

$$u_h(t, x; \theta) = G(t, x) + D(t, x)\, \hat{u}(t, x; \theta), \qquad (4.8)$$
$$E_h(\theta) = E_L(\theta), \qquad (4.9)$$
$$\theta^* = \arg\min_{\theta} E_h(\theta),$$


in which the function G(t, x) satisfies the initial and boundary conditions but has no tunable parameters, and D(·) is chosen in such a way that û does not contribute to the initial and boundary conditions. The advantage is a higher robustness of the solution in contrast to the soft assignment, but the creation of the constraint-aware function can be more difficult for irregular boundaries.

Irregular Boundaries

For brevity we drop the time parameter in the functions, since the focus is on the irregular boundary data. The hard assignment method described in (4.8) requires G(x) to be defined in Ω, to have at least the same order of continuous derivatives as the differential operator L, and to fulfill the following equation:

$$|G(x) - g(x)| \leq \varepsilon, \qquad \forall x \in \Gamma.$$

The exact form is not important as long as these requirements are satisfied [BN18]. Since we can easily create boundary data, a fitting approach is to train a simple neural network to fit g(x) for all x ∈ Γ. So we let G(x; θ) be calculated by a low-capacity network with parameters θ. The optimal parameters are defined by the minimization problem

$$\theta^* = \arg\min_{\theta} \frac{1}{2} \frac{1}{N_b} \sum_{x_i \in \Gamma_d} \left| G(x_i; \theta) - g(x_i) \right|^2.$$

A distance function that is smooth enough for calculating derivatives can be very useful for this, since it can be utilized to calculate a distance between points [BN18, Ont20]. One way to create such a function is to compute a non-smooth distance function d(x) first and then approximate it by a simple, low-capacity network D(x; θ):

$$d(x) = \min_{x_b \in \Gamma} \| x - x_b \|_2.$$

D(x; θ) does not need to have a particular form; it just has to be smooth and to satisfy

$$|D(x; \theta) - d(x)| \leq \varepsilon, \qquad \forall x \in \Gamma.$$

For the optimal parameters we have the following minimization problem:

$$\theta^* = \arg\min_{\theta} \frac{1}{2} \frac{1}{N_b + |\omega_d|} \sum_{x_i \in \Gamma_d \cup \omega_d} \left| D(x_i; \theta) - d(x_i) \right|^2,$$

in which ω_d ⊂ Ω_d such that |ω_d| + N_b ≪ N_d.

One can also employ a stricter distance function like

$$d(x) = \begin{cases} 0, & x \in \Gamma, \\ 1, & \text{otherwise}, \end{cases}$$


but experiments showed worse accuracy and convergence of the solutions compared to the previously mentioned distance function [BN18].
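A minimal sketch of how such a smoothed distance network could be fitted (an added illustration following the idea above; the unit-square domain, network size, optimizer and point counts are placeholders, not the thesis' setup):

    import numpy as np
    import tensorflow as tf

    def d(x):
        # non-smooth distance of points in [0, 1]^2 to the nearest edge of the square
        return np.minimum.reduce([x[:, 0], 1 - x[:, 0], x[:, 1], 1 - x[:, 1]])

    # low-capacity network D(x; theta) used as a smooth surrogate for d(x)
    D = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="tanh", input_shape=(2,)),
        tf.keras.layers.Dense(1),
    ])
    D.compile(optimizer="adam", loss="mse")

    # boundary points (where d = 0) plus a small set of interior points, as in the text
    rng = np.random.default_rng(0)
    interior = rng.uniform(size=(200, 2))
    t = rng.uniform(size=(200, 1))
    boundary = np.concatenate([
        np.hstack([t, np.zeros_like(t)]), np.hstack([t, np.ones_like(t)]),
        np.hstack([np.zeros_like(t), t]), np.hstack([np.ones_like(t), t]),
    ])
    pts = np.vstack([boundary, interior]).astype(np.float32)
    D.fit(pts, d(pts), epochs=200, verbose=0)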

To give an overview, Algorithm 4.1 summarizes the method in a step-by-step fashion. The while loop can also be stopped by a stopping criterion other than the maximum number of iterations.

  Define the network architecture and parameters (e.g. number of layers, size of hidden layers, learning rate, activation functions)
  Initialize network parameters θ(0)
  Select hard or soft assignment method m ∈ {h, s}
  if m = h then
      Form model function u_h as described in (4.8)
      Form loss function E_h as described in (4.9)
  else
      Form loss function E_s as described in (4.7)
  end
  Specify maximum number of iterations and set i = 0
  while i < max_iterations do
      if m = h then
          Sample n random points from [0, T] × Ω
          Calculate the loss function E_h(θ(i))
          Take a descent step θ(i+1) = θ(i) − η(i) ∇_θ E_h(θ(i))
      else
          Sample n random points from [0, T] × Ω
          Sample m random points from Γ for the boundary term
          Sample l random points from Ω for the initial term
          Calculate the loss function E_s(θ(i))
          Take a descent step θ(i+1) = θ(i) − η(i) ∇_θ E_s(θ(i))
      end
      i = i + 1
  end

Algorithm 4.1: Algorithm for weak and hard assignment of the boundary and initial conditions
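As a concrete illustration of the soft-assignment branch of Algorithm 4.1, the following sketch trains a small network with Adam in TensorFlow 2 (the thesis' own implementation uses TensorFlow 1-style graphs and mostly L-BFGS, so this is only illustrative). The transport problem of Section 5.1 is used as a placeholder for the operator and the auxiliary conditions, and all hyperparameters are assumptions.

    import tensorflow as tf

    # network u_hat(t, x; theta)
    u_hat = tf.keras.Sequential([
        tf.keras.layers.Dense(20, activation="tanh", input_shape=(2,)),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
    lam1, lam2 = 100.0, 100.0                              # penalty weights lambda_1, lambda_2

    def pde_residual(t, x):
        # residual L(t, x; u) - f(t, x); here the transport operator u_t + 2 u_x
        with tf.GradientTape(persistent=True) as g:
            g.watch(t)
            g.watch(x)
            u = u_hat(tf.concat([t, x], axis=1))
        u_t = g.gradient(u, t)
        u_x = g.gradient(u, x)
        return u_t + 2.0 * u_x

    a = lambda x: 1.0 / (1.0 + x ** 2)                     # initial condition a(x)
    g_bc = lambda t, x: 1.0 / (1.0 + (x - 2.0 * t) ** 2)   # boundary condition g(t, x)

    for i in range(1000):                                  # max_iterations
        # sample collocation, initial and boundary points on [0, T] x Omega = [0, 1] x [0, 2]
        t_c = tf.random.uniform((256, 1)); x_c = tf.random.uniform((256, 1), 0.0, 2.0)
        x_i = tf.random.uniform((64, 1), 0.0, 2.0); t_i = tf.zeros_like(x_i)
        t_b = tf.random.uniform((64, 1)); x_b = tf.round(tf.random.uniform((64, 1))) * 2.0
        with tf.GradientTape() as tape:
            E_L = tf.reduce_mean(tf.square(pde_residual(t_c, x_c)))
            E_I = tf.reduce_mean(tf.square(u_hat(tf.concat([t_i, x_i], 1)) - a(x_i)))
            E_B = tf.reduce_mean(tf.square(u_hat(tf.concat([t_b, x_b], 1)) - g_bc(t_b, x_b)))
            E_s = E_L + lam1 * E_I + lam2 * E_B            # soft-assignment loss (4.7)
        grads = tape.gradient(E_s, u_hat.trainable_variables)
        optimizer.apply_gradients(zip(grads, u_hat.trainable_variables))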


Chapter 5

Experiments

For all experiments the networks are implemented using TensorFlow [AAB+16], which is one of the most popular open-source libraries for machine learning operations and computations. Model training took place on an Nvidia Titan Xp GPU.

5.1 Transport Equation

We start our experiments with a linear, first-order transport equation that has the form

ut + cux = 0,

where c > 0 is a fixed constant. The name of the equation stems from the fact that it can be used to model the transport of matter with velocity c. Since the equation is time dependent, it has an initial condition.

Consider the following problem statement

$$u_t + 2 u_x = 0, \qquad x \in \Omega,\; t \in T,$$
$$u(x, t) = \frac{1}{1 + (x - 2t)^2}, \qquad x \in \Gamma,\; t \in T,$$
$$u(x, 0) = \frac{1}{1 + x^2}, \qquad x \in \Omega,$$

where Ω = [0, 2] is the one-dimensional spatial domain with Dirichlet boundary conditions and T = [0, 1] is the temporal domain. It can be easily verified that the analytical solution is

$$u(x, t) = \frac{1}{1 + (x - 2t)^2}, \qquad x \in \Omega,\; t \in T.$$
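For completeness (a verification step added here), writing s = x − 2t one checks directly that

$$u_t = \frac{4(x - 2t)}{\left(1 + (x - 2t)^2\right)^2}, \qquad u_x = \frac{-2(x - 2t)}{\left(1 + (x - 2t)^2\right)^2}, \qquad u_t + 2 u_x = 0,$$

and the initial and boundary conditions follow by evaluating u at t = 0 and at x ∈ Γ.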

The weak assignment method was applied to this equation. 10000 points were randomly sampled on the domain, 500 points were sampled on the boundary, and 500 points were sampled for the initial condition as training data for E_L(θ), E_I(θ) and E_B(θ) in (4.7), respectively. The network had 4 layers with 10 neurons each, which was more than sufficient for the equation. The network was trained with the L-BFGS optimizer in under one minute and the L2 error amounted to 4.159934e-04. Figure 5.1 shows the solution of the network and the exact solution for three timestamps. It also displays how the substance is carried from left to right.

Figure 5.1: Network solution of the transport equation for different timestamps.
The figures show the network solution as the yellow dashed line and the exact solution as the blue line. The left figure is at timestamp t = 0.25, the middle figure at timestamp t = 0.5, and the right one at timestamp t = 0.75.

5.2 Wave Equation

Next, we examine a linear, second-order, hyperbolic wave equation. The problem statement is kept rather simple to show the approximation accuracy of the DNN for such problems.

We consider the following problem statement:

$$u_{tt} - u_{xx} = 0, \qquad x \in \Omega,\; t \in T,$$
$$u(x, t) = \sin(-2\pi t), \qquad x \in \Gamma,\; t \in T,$$
$$u(x, 0) = \sin(2\pi x), \qquad x \in \Omega,$$
$$u_t(x, 0) = -2\pi \cos(2\pi x), \qquad x \in \Omega, \qquad (5.1)$$

where Ω = [0, 1] is the one-dimensional spatial domain and T = [0, 1] is the temporal domain. Furthermore, one can see that Dirichlet boundary conditions are used. The analytical solution of this statement is

u(x, t) = sin(2π(x− t)), x ∈ Ω, t ∈ T

which can be easily proven:

$$u_{tt} = -4\pi^2 \sin(2\pi(x - t)), \qquad u_{xx} = -4\pi^2 \sin(2\pi(x - t)).$$


10000 collocation points are randomly sampled on the domain as learning data. Although the solution of the equation is fairly simple, a second initial condition has to be assigned to the network. We learned the boundary and initial conditions with the weak assignment method and adapted the error function in (4.7) to

$$E_s(\theta) = E_L(\theta) + \lambda_1 E_I(\theta) + \lambda_2 E_B(\theta) + \lambda_3 E_{I_t}(\theta),$$

where E_{I_t}(θ) is the penalty term for the second initial condition. λ1, λ2 and λ3 were set to 100. For that, we sample 500 points on the boundary and 500 for each initial condition. The network consists of three hidden layers with 10 neurons each. Furthermore, the L-BFGS optimizer is used and no residual connections are implemented, because our generated data set and the network structure are not overly large [LN89a].

In Listing 5.1 one can see the simplicity of the code structure and how the network functions are built. With the help of automatic differentiation [BPRS15] (see tf.gradients), the function that calculates the residual of equation (5.1) can be built without big challenges.

Listing 5.1: Defining the network functions

    def net_u(self, x, t):
        # surrogate solution u(x, t): forward pass through the fully-connected network
        u = self.neural_net(tf.concat([x, t], 1), self.weights, self.biases)
        return u

    def net_f(self, x, t):
        # PDE residual f = u_tt - u_xx, built with automatic differentiation
        u = self.net_u(x, t)
        u_x = tf.gradients(u, x)[0]
        u_t = tf.gradients(u, t)[0]
        u_xx = tf.gradients(u_x, x)[0]
        u_tt = tf.gradients(u_t, t)[0]
        f = u_tt - u_xx
        return f

    def net_u_t(self, x, t):
        # time derivative u_t, needed for the second initial condition
        u = self.net_u(x, t)
        u_t = tf.gradients(u, t)[0]
        return u_t

Due to the simplicity of the PDE, the training went quickly (under 1000 iterations) and the DNN learns to approximate the PDE with an L2 error of 4.253207e-04 in under a minute. In Figure 5.2 the prediction and the error relative to the exact solution are displayed. We will not elaborate further on this experiment, as there are more interesting PDEs.


Figure 5.2: Network solution of the traveling wave equation.
Top left shows the prediction of the given wave equation on a 100 × 100 grid. Top right is the exact solution on the same grid. Bottom left shows the error between these two and bottom right the loss function during the training.

5.3 Poisson Equation

Next up we want to look at a linear, elliptic, second-order PDE, namely the famous Poisson equation with the general form

−∆u = f.

This PDE is frequently used in theoretical physics and can, for example, describe the potential field of an electric charge.


5.3.1 Poisson Equation in 2D

The problem is stated as follows:

$$-\Delta u = 0.5\,\pi^2 \sin\!\left(\pi \frac{x + 1}{2}\right) \sin\!\left(\pi \frac{y + 1}{2}\right), \qquad (x, y) \in \Omega,$$
$$u(x, y) = 0, \qquad (x, y) \in \Gamma,$$

where Ω = [0, 1]² is the two-dimensional spatial domain with Dirichlet boundary conditions. The analytical solution of this statement is

$$u(x, y) = \sin\!\left(\pi \frac{x + 1}{2}\right) \sin\!\left(\pi \frac{y + 1}{2}\right), \qquad (x, y) \in \Omega,$$

which can be easily proven.

The boundary conditions and the domain specification are not very complicated, which makes it attractive to use the hard assignment of the boundary condition specified in (4.8). The network function can then be easily created as follows:

$$u_h(x, y; \theta) = x (1 - x)\, y (1 - y)\, \hat{u}(x, y; \theta),$$

where θ are the network parameters and û is the output of the DNN. u_h is always zero when a point on the boundary is evaluated.
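A minimal sketch of this hard-assigned trial function and the corresponding residual loss (an added illustration in TensorFlow 2; only the network size of four hidden layers with 10 neurons is taken from the text, everything else, including the API style, is a placeholder):

    import numpy as np
    import tensorflow as tf

    u_net = tf.keras.Sequential(
        [tf.keras.layers.Dense(10, activation="tanh", input_shape=(2,))]
        + [tf.keras.layers.Dense(10, activation="tanh") for _ in range(3)]
        + [tf.keras.layers.Dense(1)]
    )

    def u_h(x, y):
        # trial function that vanishes on the boundary of [0, 1]^2 by construction
        return x * (1.0 - x) * y * (1.0 - y) * u_net(tf.concat([x, y], axis=1))

    def residual(x, y):
        # PDE residual -Delta u_h - f at interior collocation points
        with tf.GradientTape(persistent=True) as g2:
            g2.watch(x); g2.watch(y)
            with tf.GradientTape(persistent=True) as g1:
                g1.watch(x); g1.watch(y)
                u = u_h(x, y)
            u_x = g1.gradient(u, x); u_y = g1.gradient(u, y)
        u_xx = g2.gradient(u_x, x); u_yy = g2.gradient(u_y, y)
        f = 0.5 * np.pi ** 2 * tf.sin(np.pi * (x + 1.0) / 2.0) * tf.sin(np.pi * (y + 1.0) / 2.0)
        return -(u_xx + u_yy) - f

    x = tf.random.uniform((1000, 1)); y = tf.random.uniform((1000, 1))
    loss = tf.reduce_mean(tf.square(residual(x, y)))   # E_L; no boundary term is needed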

For the training, 10000 points were randomly sampled from the domain. Boundary training data was not needed due to the hard assignment of the boundary condition. The network consists of four layers, each with 10 neurons. The L-BFGS optimizer showed faster results and the same accuracy as the Adam optimizer.

Accuracy and speed are similar to the previous experiment. For such a simple function the DNN only needed around 250 iterations and reached an L2 error of 1.970306e-05. In Figure 5.3 one can see the solution of the DNN compared to the exact solution and also the loss per iteration during the training. This further shows that simple PDEs can be accurately and efficiently approximated by DNNs. Figure 5.4 illustrates the solution on the two-dimensional domain.


Figure 5.3: Network solution of the Poisson equation.
Top left shows the prediction of the given Poisson equation on a 100 × 100 grid. Top right is the exact solution on the same grid. Bottom left shows the error between these two and bottom right the loss function during the training.

Figure 5.4: 3D network solution of the Poisson equation


5.3.2 Poisson Equation in High Dimension

The use of Deep Learning in computer vision and artificial intelligence tasks shows that DNNs are powerful and accurate in high dimensions. In this experiment, the dimension of the domain is higher than in the other experiments, to test the performance of our approach.

The problem is stated as follows:

−∆u = 0, x ∈ Ω,

u(x) =5∑i=1

x2i−1x2k, x ∈ Γ,

where Ω = [0, 1]10 is the ten-dimensional spatial domain. As one can see Dirichletboundary conditions are used. The results of our DNN are compared to the analyticalsolution

u(x) =5∑i=1

x2i−1x2k, x ∈ Ω.

For the precise network structure, six hidden, fully-connected layers were used with a residual connection after every two hidden layers. Each layer has a size of 20 neurons. We randomly sampled 100000 points from the domain for the training step. The L-BFGS optimizer showed good results after 20 minutes of training. The L2 error amounts to 1.377442e−03. Figure 5.5 shows the error during the training process. The results show that, at least for this comparatively simple high-dimensional PDE, the number of training points does not have to increase exponentially with the dimension and the DNN still predicts the solution accurately.
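To make the residual connections concrete, a fully-connected network with a skip connection after every two hidden layers could be sketched in TensorFlow 1.x as follows; layer count, width and all names here are illustrative assumptions, not the exact thesis implementation:

import tensorflow as tf

def residual_mlp(x, hidden_size=20, num_blocks=3):
    # Project the 10-dimensional input to the hidden width.
    h = tf.layers.dense(x, hidden_size, activation=tf.tanh)
    # Each block has two hidden layers plus an identity skip connection,
    # i.e. a residual connection after every two hidden layers.
    for _ in range(num_blocks):
        z = tf.layers.dense(h, hidden_size, activation=tf.tanh)
        z = tf.layers.dense(z, hidden_size, activation=tf.tanh)
        h = h + z
    # Scalar output approximating u(x).
    return tf.layers.dense(h, 1, activation=None)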

5.4 Heat Equation

Next up we want to look at the last type of second-order PDEs, a parabolic PDE. The heat equation is a popular parabolic PDE and can be physically interpreted as the evolution in time of the density of heat or any other quantity [Eva10]. As the name suggests, the equation can be used to model the diffusion of heat in a domain. The difficulty of the problem statement is similar to the traveling wave equation, but we add a second spatial dimension and a different domain shape.

We consider the heat equation in two spatial dimensions and time on the domain Ω × [0, T] with Dirichlet boundary conditions

∂u/∂t − ∆u = f,  t ∈ [0, T], x ∈ Ω,
u = u_D,  t ∈ [0, T], x ∈ Γ,
u|_{t=0} = u_0,  x ∈ Ω,  (5.2)


Figure 5.5: Loss during training of the high-dimensional Poisson equation

and choose

u(x, t) = e^(−4π²t) cos(2πx1) cos(2πx2).

It follows that

∂u(x, t)/∂t = −4π² e^(−4π²t) cos(2πx1) cos(2πx2),
−∆u(x, t) = 8π² e^(−4π²t) cos(2πx1) cos(2πx2).

In that case we get the following partial differential equation from (5.2):

∂u/∂t − ∆u = 4π² e^(−4π²t) cos(2πx1) cos(2πx2),  t ∈ [0, T], x ∈ Ω,
u = e^(−4π²t) cos(2πx1) cos(2πx2),  t ∈ [0, T], x ∈ Γ,
u|_{t=0} = cos(2πx1) cos(2πx2),  x ∈ Ω.  (5.3)

For the implementation we choose T = 0.1, and Ω is a circle of radius r = 1 centered at the origin.


Our goal is to approximate u(x, t) with a DNN, including the information we have from the equations in (5.3).

First, we randomly sample points for f, u_D and u_0, which are used for training. For the error function we use the mean squared error.

For the soft assignment we choose the following:

u(x, t) = u_DNN(x, t; θ),
E = E_f + λ1 E_{u_D} + λ2 E_{u_0},

where u_DNN is the network that approximates u, and λ1 and λ2 are adjustable weights for the boundary and initial constraints, with

E_f = (1/N_f) ∑_{i=1}^{N_f} |f_DNN(x_f^i, t_f^i; θ) − f(x_f^i, t_f^i)|²,  where f_DNN = ∂u_DNN/∂t − ∆u_DNN,  (5.4)
E_{u_D} = (1/N_D) ∑_{i=1}^{N_D} |u_DNN(x_D^i, t_D^i; θ) − u_D^i|²,
E_{u_0} = (1/N_0) ∑_{i=1}^{N_0} |u_DNN(x_0^i, 0; θ) − u_0^i|².

Here {x_f^i, t_f^i}_{i=1}^{N_f} are the points sampled from the whole domain, {x_D^i, t_D^i, u_D^i}_{i=1}^{N_D} are the sampled boundary points and {x_0^i, u_0^i}_{i=1}^{N_0} are the sampled initial points. So E_f imposes the structure of the PDE, while E_{u_D} and E_{u_0} impose the boundary and initial structure of the PDE. For all executions the number of boundary and initial training points N_D and N_0 is relatively small in contrast to N_f; in fact, N_D = 9000, N_0 = 1000 and N_f = 1000000. The weights had to be set relatively high, λ1 = 1000 and λ2 = 1000, to accurately satisfy the initial and boundary conditions. For optimization we choose the L-BFGS algorithm [LN89b]. The results of the experiment in Figure 5.6 show that we accurately predict the solution of the well-posed PDE with a unique solution, given a sufficiently expressive network and a sufficient amount of training data. In contrast to the second experiment, the boundary condition is not completely satisfied, since a small error still exists. To achieve these results the DNN had 12 hidden layers, each with 100 neurons, and used the hyperbolic tangent activation function. For the error function we used the standard mean squared error loss. 1000000 collocation points were sampled from the domain Ω × [0, T] as training data using the Latin Hypercube Sampling method [Ima99]. The training points for the boundary penalty term were uniformly sampled along the boundary.
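A compact sketch of this composite loss in TensorFlow 1.x could look as follows; net_u and net_f denote the network and residual functions as in the listings of this chapter, and all other names and shapes are illustrative assumptions:

import tensorflow as tf

def soft_assignment_loss(net_u, net_f, x_f, t_f, f_vals,
                         x_d, t_d, u_d, x_0, u_0,
                         lam1=1000.0, lam2=1000.0):
    # E = E_f + lambda_1 * E_{u_D} + lambda_2 * E_{u_0}, all mean squared errors.
    e_f = tf.reduce_mean(tf.square(net_f(x_f, t_f) - f_vals))               # PDE residual term
    e_ud = tf.reduce_mean(tf.square(net_u(x_d, t_d) - u_d))                 # boundary term
    e_u0 = tf.reduce_mean(tf.square(net_u(x_0, tf.zeros_like(u_0)) - u_0))  # initial term at t = 0
    return e_f + lam1 * e_ud + lam2 * e_u0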


Figure 5.6: Solutions of the heat equation for different timesteps. (a) Network solution for t = 0. (b) Error to the exact solution for t = 0; the relative L2 error is 5.0948 · 10⁻³. (c) Network solution for t = 4.040 · 10⁻³. (d) Error to the exact solution for t = 4.040 · 10⁻³; the relative L2 error is 5.4676 · 10⁻³. (e) Network solution for t = 8.9898 · 10⁻³. (f) Error to the exact solution for t = 8.9898 · 10⁻³; the relative L2 error is 3.2519 · 10⁻³. 1000000 randomly sampled collocation points were used for training. Left side: the network solutions for the different timesteps t on a 100 × 100 grid. Right side: the error between the network solution and the exact solution.


5.5 Burgers’ Equation

The Burgers’ equation is a quasi-linear, PDE used in numerous fields like fluid mechanicsor nonlinear acoustics [BDH+86] with the general form

ut + uux − vuxx = 0,

where v is a term that describes the viscosity. For a small v the equation leads to asawtooth wave at the origin which can be difficult for traditional numerical methods tosolve.

In our experiment the test problem is stated as follows:

u_t + u u_x − 0.01π u_xx = 0,  x ∈ Ω, t ∈ [0, T],
u(x, t) = 0,  x ∈ Γ, t ∈ [0, T],
u(x, 0) = −sin(πx),  x ∈ Ω,

where Ω = [−1, 1] is the one-dimensional spatial domain, [0, T] with T = 1 is the temporal domain and v = 0.01π. Dirichlet boundary conditions are utilized. The analytical solution is not trivial and can be found in [BDH+86].

Soft assignment was used for this experiment, for which we refer to (4.6) and (4.7) for the general form or to the heat equation for a practical example. λ1 and λ2 were set to one. The network contained nine layers with 20 neurons each. The L-BFGS optimizer was used to train the network. 10000 points were randomly sampled across the domain and 200 points served as boundary and initial data. With these parameters we obtained accurate results in under a minute, which are summarized in Figure 5.7. As can be seen in the figure, the equation leads to a sharp separation in values at t = 0.4. As expected, the error of the prediction is higher in this intricate region than in the rest of the domain. Nevertheless, the DNN could express the complicated behavior of the equation and achieved an L2 error of 2.164368e−03.
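The PDE residual used in the penalty term E_f can be built analogously to the heat equation; a hedged TensorFlow 1.x sketch for Burgers' equation (method and attribute names are assumptions, the viscosity follows the value stated above):

import numpy as np
import tensorflow as tf

def net_f_burgers(self, x, t):
    # Residual of Burgers' equation: u_t + u * u_x - v * u_xx.
    u = self.net_u(x, t)
    u_t = tf.gradients(u, t)[0]
    u_x = tf.gradients(u, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    v = 0.01 * np.pi  # viscosity as stated in the problem above
    return u_t + u * u_x - v * u_xx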

For a better visualization, we included Figure 5.8, which shows the solution of the Burgers' equation at different timestamps. In these snapshots, one can clearly see how the solution develops into a sawtooth wave with increasing time. This behaviour is what makes the equation so complicated to approximate.


Figure 5.7: Network solution of the Burgers' equation. Top left shows the prediction of the Burgers' equation on a 100 × 100 grid. Top right is the exact solution on the same grid. Bottom left shows the error between these two and bottom right the loss function during the training.


Figure 5.8: Network solution of the Burgers' equation at different timestamps. The figures show the network solution as the yellow dashed line and the exact solution as the blue line. The left figure is at timestamp t = 0.25, the middle figure at timestamp t = 0.4 and the right one at timestamp t = 0.6.


5.6 Max-PDE

The Max-PDE is a semilinear, non-smooth, parameter-dependent, elliptic µPDE with the general form

−∇ · (c(µ)∇u) + a(µ) max(0, u) = f(µ).

A nonlinear and non-smooth max term is inherent in the PDE, which leads to additional difficulties during the calculation when using standard methods. Since the max term is not differentiable, certain methods like the standard Newton method cannot be applied [Ber19]. For a fixed µ the existence and uniqueness of the solution is proven, and the problem is therefore well-posed.

5.6.1 Coupled Regions

Consider the following parametrized, semi-linear, elliptic boundary value problem:

−∆u + 8π²µ max(0, u) = f(µ),  x ∈ Ω,
u = 0,  x ∈ Γ,

where Ω = [0, 1]² is the two-dimensional spatial domain and µ ∈ [µmin, µmax] with µmin = 0 and µmax = 4. We define the following subregions:

u(x, µ) =
  0,            x ∈ [0, 1/2] × [0, 1/2],
  b1(µ)h(x),    x ∈ [1/2, 1] × [0, 1/2],
  b2(µ)h(x),    x ∈ [1/2, 1] × [1/2, 1],
  b3(µ)h(x),    x ∈ [0, 1/2] × [1/2, 1],

where

h(x) = sin²(2πx1) sin²(2πx2),
b1(µ) = 1 − 2(min(max(3(µ − µmin)/(µmax − µmin), 2), 3) − 2),
b2(µ) = 1 + 2(min(max(3(µ − µmin)/(µmax − µmin), 1), 2) − 1),
b3(µ) = 1 − 2 min(max(3(µ − µmin)/(µmax − µmin), 0), 1).  (5.5)

From the chosen exact solution, the corresponding right-hand side follows as

f(x, µ) =
  0,                                        x ∈ [0, 1/2] × [0, 1/2],
  −b1(µ)∆h(x) + 8π²µ max(0, b1(µ)h(x)),     x ∈ [1/2, 1] × [0, 1/2],
  −b2(µ)∆h(x) + 8π²µ max(0, b2(µ)h(x)),     x ∈ [1/2, 1] × [1/2, 1],
  −b3(µ)∆h(x) + 8π²µ max(0, b3(µ)h(x)),     x ∈ [0, 1/2] × [1/2, 1],


with

∆h(x) = −8π²(2 sin²(2πx1) sin²(2πx2) − cos²(2πx1) sin²(2πx2) − sin²(2πx1) cos²(2πx2)).

Since the domain Ω is a square with side length 1 and u(x, µ) = 0 for all x ∈ Γ, the hard assignment is a suitable approach. We can easily build a trial function as described in (4.8):

u(x) = G(x) + D(x) u_DNN(x; θ),
G(x) = 0,
D(x) = x1(1 − x1) x2(1 − x2).  (5.6)

Since the boundary conditions are already satisfied in (5.6), the error function is

E = E_f,

where E_f is the same as in (5.4).

We proceed by approximating u(x, µ) for a fixed µ. For optimization we choose the L-BFGS algorithm [LN89b], because our data set is not overly large. It is a full-batch, gradient-based optimization algorithm that belongs to the family of quasi-Newton methods. The results of the experiment show that we accurately predict the solution of the well-posed PDE with a unique solution, given a sufficiently expressive network and a sufficient amount of training data.

In the following Listing 5.2 the code structure of the network function is shown again. In this listing, we want to emphasize the hard assignment of the boundary condition, which can be seen in the net_u function. The result of this function will always be zero when points on the boundary are evaluated.


Hidden layers \ Neurons      10              20              40
2                            6.262881e-02    2.197335e-02    1.234320e-02
4                            9.415829e-03    6.850964e-03    3.327710e-03
8                            3.329636e-03    2.351025e-03    1.400220e-03

Table 5.1: The development of the relative L2 error for varying amounts of hidden layers and neurons per layer. 20000 collocation points were used for training.

Listing 5.2: Defining network function

def net_u(self, x, y):
    # Hard assignment: the factor x(1-x)y(1-y) vanishes on the boundary
    # of the unit square, so the trial function satisfies u = 0 on Gamma.
    u = x * (1 - x) * y * (1 - y) * \
        self.neural_net(tf.concat([x, y], 1),
                        self.weights, self.biases)
    return u

def net_f(self, x, y):
    # Left-hand side of the max-PDE, -Delta u + 8*pi^2*mu*max(0, u),
    # assembled via automatic differentiation.
    # pi is assumed to be np.pi (or math.pi) imported elsewhere.
    u = self.net_u(x, y)
    u_x = tf.gradients(u, x)[0]
    u_y = tf.gradients(u, y)[0]
    u_xx = tf.gradients(u_x, x)[0]
    u_yy = tf.gradients(u_y, y)[0]
    f = -u_xx - u_yy + 8 * pi**2 * self.mu * tf.maximum(u, 0)
    return f
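As a complement to Listing 5.2, the full-batch L-BFGS training step can be attached through the SciPy optimizer interface provided by TensorFlow 1.x; the following is a sketch under the assumption of a self.loss tensor, a self.sess session and placeholder attributes, and is not necessarily the exact call used in the thesis:

# Sketch: full-batch L-BFGS training via the SciPy interface of TF 1.x.
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    self.loss,                     # e.g. the mean squared PDE residual E_f
    method='L-BFGS-B',
    options={'maxiter': 50000, 'ftol': 1e-10})
optimizer.minimize(self.sess,
                   feed_dict={self.x_tf: x_train, self.y_tf: y_train})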

In Figure 5.9 we can see the approximated solution of the DNN for the max-PDE. To achieve these results the DNN had 9 hidden layers, each with 40 neurons, and used the hyperbolic tangent activation function. For the error function we used the standard mean squared error loss. 20000 collocation points were sampled from the domain Ω as training data using a random sampling method [Ima99]. The relative L2 error is 2.4768 · 10⁻³ and is on par with the accuracy of the FE method, which has an L2 error of 1.1473 · 10⁻³ (compared on a 100 × 100 grid). The results confirm our assumption that the DNN can learn the underlying PDE while respecting its boundary condition.

To confirm our presumption that, most of the time, an increase in the number of hidden layers and in the number of neurons leads to a better accuracy, we evaluated the experiment on varying architectures. In Table 5.1 we can see how the relative L2 error develops. As expected, the accuracy increases as the network capacity increases. Furthermore, Table 5.2 shows how the L2 error develops if the number of collocation points increases.

Next, we wanted to train a neural network to learn the solution of the max-PDE over the whole range of µ ∈ [µmin, µmax] with µmin = 0 and µmax = 4. For training, 1000000 collocation points were used, again sampled with the Latin Hypercube Sampling strategy [Ima99]. The network was extended to 10 hidden layers, each with 50 neurons, because we need a more expressive network than before to handle the additional variable.


Figure 5.9: Solution of the coupled max-PDE for a fixed µ = 1. (a) The predicted solution of the DNN for µ = 1; 20000 collocation points, sampled with the Latin Hypercube Sampling strategy, were used for training. (b) The exact solution. (c) Error between the network solution and the exact solution; the relative L2 error is 2.4768 · 10⁻³. (d) Error between the FE solution and the exact solution; the relative L2 error is 1.1473 · 10⁻³.

Collocation points    2000          5000          10000         15000         20000
Relative L2 error     4.6219e-03    3.4260e-03    3.5837e-03    3.0221e-03    2.3363e-03

Table 5.2: The development of the relative L2 error for varying amounts of collocation points. 8 hidden layers were used, each with 20 neurons.


Since this is more of a brute-force approach, the number of collocation points was increased substantially for this particular problem. For higher dimensions, different approaches for sampling or optimization could be used to reduce the negative effect of the curse of dimensionality even further. Figure 5.10 shows the results of the network solution and its comparison to the exact solution. The resulting relative L2 error is similar to, and sometimes even better than, the error of the network trained on a fixed µ. The average relative L2 error is 3.2009 · 10⁻³.
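One way to realize this is to treat µ simply as an additional input of the network, so that a single trained model covers the whole parameter range; a hedged sketch following the structure of Listing 5.2 (names are illustrative, not the exact thesis code):

def net_u(self, x, y, mu):
    # mu enters as a third input coordinate; the hard-assignment prefactor
    # still enforces u = 0 on the boundary for every mu.
    u = x * (1 - x) * y * (1 - y) * \
        self.neural_net(tf.concat([x, y, mu], 1),
                        self.weights, self.biases)
    return u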

5.6.2 Decoupled Regions

We take the same PDE and decouple the subregions defined above, so that different µ change the values of different subregions [Ber19]. Let us take a look at the following equation:

−∆u + 8π²(µ1 + µ2) max(0, u) = f(µ1, µ2),  x ∈ Ω,
u = 0,  x ∈ Γ,

where Ω is again [0, 1]² and µ1 and µ2 are in [0, 1], each regulating a subregion. To achieve this behavior we define the following:

u(x, µ) =
  0,            x ∈ [0, 1/2] × [0, 1],
  b1(µ)h(x),    x ∈ [1/2, 1] × [0, 1/2],
  b2(µ)h(x),    x ∈ [1/2, 1] × [1/2, 1],

where

h(x) = sin²(2πx1) sin²(2πx2),
b1(µ1) = −1 + 2(µ1 − µ1,min)/(µ1,max − µ1,min),
b2(µ2) = 1 − 2(µ2 − µ2,min)/(µ2,max − µ2,min).  (5.7)

µ1 adjusts the region [0, 1/2] × [1/2, 1] with µ1,min = 0, µ1,max = 1, and µ2 adjusts the region [1/2, 1] × [1/2, 1] with µ2,min = 0, µ2,max = 1.

For brevity, we leave the rest of the problem statement to the reader, since it is analogous to the example above. The important part is how accurately and efficiently the DNN solves the changed PDE with an extra parameter. The implementation details were kept the same.

First, both µ were set to a fixed constant in the given range, which the DNN then tried to learn. Fortunately, the same network structure also led to a similarly accurate and fast result, as depicted in Figure 5.11. The figure shows how only one region of the domain changes when only µ2 is changed and µ1 is kept fixed. The L2 error is around 3.4 · 10⁻³ and changes slightly for various values of µ.

After training for fixed µ, we trained the network over the whole range of both µ. In contrast to the experiment with the coupled regions, only 100000 points were needed to acquire a good accuracy. The results for some individual µ can be seen in Figure 5.12.


Figure 5.10: Solutions of the coupled max-PDE. (a) Network solution for µ = 0. (b) Error to the exact solution for µ = 0; the relative L2 error is 5.0948 · 10⁻³. (c) Network solution for µ = 1.6161. (d) Error to the exact solution for µ = 1.6161; the relative L2 error is 1.4356 · 10⁻³. (e) Network solution for µ = 3.5959. (f) Error to the exact solution for µ = 3.5959; the relative L2 error is 7.2310 · 10⁻⁴. Shown on a 200 × 200 grid. 1000000 collocation points, sampled with the Latin Hypercube Sampling strategy, were used for training. Left side: the network solutions for the different µ. Right side: the error between the network solution and the exact solution.


Figure 5.11: Solution of the decoupled max-PDE for fixed µ. (a) Network solution for µ1 = 1 and µ2 = 0. (b) Network solution for µ1 = 1 and µ2 = 0.3. (c) Network solution for µ1 = 1 and µ2 = 0.7. (d) Network solution for µ1 = 1 and µ2 = 1. The figures show the network solution for fixed µ1 and varying µ2; the relative L2 error is 1.1473 · 10⁻³.


Figure 5.12: Solution of the decoupled max-PDE. (a) Network solution for µ1 = 0 and µ2 = 0. (b) Error to the exact solution for µ1 = 0 and µ2 = 0; the relative L2 error is 4.5379 · 10⁻³. (c) Network solution for µ1 = 0.1053 and µ2 = 0.5789. (d) Error to the exact solution for µ1 = 0.1053 and µ2 = 0.5789; the relative L2 error is 2.5030 · 10⁻³. (e) Network solution for µ1 = 0.7895 and µ2 = 0.8947. (f) Error to the exact solution for µ1 = 0.7895 and µ2 = 0.8947; the relative L2 error is 1.3948 · 10⁻⁴. Shown on a 200 × 200 grid. 100000 collocation points, sampled with the Latin Hypercube Sampling strategy, were used for training. Left side: the network solutions for the different µ1 and µ2. Right side: the error between the network solution and the exact solution.


5.7 Discussion

We showed that the proposed methods can accurately approximate multiple PDEs and can handle different types and varying numbers of dimensions without heavy modifications. The boundary and initial conditions can either be explicitly trained or enforced by construction, such that the DNN learns the underlying physical laws of the PDE. Especially with the mesh-free approach and the expressive power of DNNs, the methods do not suffer as much from scalability issues as classical methods for solving PDEs. One encountered problem is that we have no theoretical guarantee that the DNN converges to a certain accuracy and does not get stuck in local minima. Nevertheless, our experiments show that good prediction accuracy can be achieved given a sufficient DNN architecture and enough training data. Furthermore, the method is relatively simple and fast to set up, particularly when the domain shape and boundary conditions are not too complex. A point of concern is that the reliability and robustness of the method are not completely clear and also vary for different PDEs. For some experiments the DNN needed significantly more time for the training than for others despite having a similar network structure. It is also not transparent which architecture to choose for a given PDE and how the number of layers and the layer size should change when, for example, the domain or the PDE gets more complex.


PDE                            Network Parameters                     Training Samples   Training Duration   Prediction Duration   Relative L2 Error
Transport Equation             Four layers with ten neurons each      10000              ∼3s                 < 1s                  4.1599e-04
Wave Equation                  Three layers with ten neurons each     10000              ∼9s                 < 1s                  4.2532e-04
Poisson Equation 2D            Four layers with ten neurons each      10000              ∼6s                 < 1s                  1.9703e-05
Poisson Equation High Dim.     Six layers with 20 neurons each        100000             ∼831s               < 1s                  1.3774e-03
Heat Equation                  Twelve layers with 100 neurons each    1010000            ∼721s               < 1s                  4.6488e-03
Burgers' Equation              Nine layers with 20 neurons each       10000              ∼52s                < 1s                  2.1643e-03
Max-PDE Coupled Single         Nine layers with 40 neurons each       20000              ∼145s               < 1s                  1.1473e-03
Max-PDE Coupled                Ten layers with 50 neurons each        1000000            ∼5338s              < 1s                  1.1473e-03
Max-PDE Decoupled Single       Eight layers with 20 neurons each      20000              ∼76s                < 1s                  7.2310e-04
Max-PDE Decoupled              Ten layers with 50 neurons each        100000              ∼579s              < 1s                  1.1387e-03

Table 5.3: Summary of all experiments with the used network parameters, training samples, training duration, prediction duration (for 100000 points) and the relative L2 error.


Chapter 6

Conclusion

Partial differential equations are important in physics, engineering and finance, and with recent advancements deep learning could be an important and helpful approach for solving them. We showed that the solution of the underlying PDE can be approximated by a DNN which satisfies the differential operators, the initial condition and the boundary condition, by testing two different methods on different PDEs. The DNN can therefore encode the underlying physics of the data set. Instead of creating a mesh, which becomes infeasible in higher dimensions, a mesh-free approach was utilized by randomly sampling data points in time and space. In our work we used two different methods. The first is the hard assignment, where the boundary and initial conditions are always fulfilled because they are built directly into the model function, which is thereby forced to inherently satisfy the conditions. The second is the soft assignment, where the boundary and initial conditions are weakly learned by adding an extra penalty term to the loss function.

Multiple advantages can be found for solving a PDE with the shown approaches. (1) The resulting solution (DNN) has a closed analytical form and can be differentiated with respect to the spatial and temporal variables. Therefore, it provides an easily accessible tool for further calculations and post-processing (e.g. sensitivity analysis). (2) The used approach can make efficient use of modern hardware and can easily be parallelized on multiple GPUs. (3) The loss function for the soft assignment (and sometimes the model function of the hard assignment) is straightforward to construct and does not rely on a considerable, problem-dependent setup, making it easy to test and explore new PDEs quickly. (4) The method can be used for many different PDEs since it is described as a general approach. (5) The properly trained DNN gives a solution that is valid over the whole domain and, in contrast to some other methods, does not need extra interpolation.

The numerical results in Chapter 5 have to be seen in proper context, since there are many different types and solutions of PDEs (e.g. hyperbolic, parabolic or elliptic equations with linear, non-monotonic or oscillatory solutions). Therefore, further development is required to investigate the use of DNNs for solving different types of PDEs.


Even though Deep Learning can be a valuable approach for solving Partial Differential Equations, the proposed solution should not replace the best classical numerical methods, since those have been developed and improved for over 50 years and provide the robustness and efficiency standards that are required in practice. This work should show that DNNs can be used to accurately approximate the underlying PDEs and that both methods can coexist with different advantages and disadvantages.

Future Work

It will be necessary to extend the research to other types of PDEs in order to draw better conclusions about the robustness, numerical accuracy and usefulness of the approach. Especially in higher dimensions, other network architectures, optimization algorithms or training data acquisition methods could be needed. Furthermore, more information has to be gathered on how the parameter space of the DNN can be better utilized to increase robustness and efficiency. For some PDEs, the problem of getting stuck in local minima or saddle points is apparent and requires further analysis on how to deal with it. To reduce the amount of required training data, a point selection algorithm for lesser known or more complex areas can be adopted, either by taking points where the error is high or by quantifying the uncertainty of predictions given by the DNN with the help of Gaussian Processes [RPK17a] or probabilistic surrogates [ZZKP19a]. Nonetheless, the amount of training data will increase and the need for a more efficient mini-batch optimization algorithm will arise (e.g. stochastic gradient descent and its modern variants [GBC16]).

Convolutional networks could also help to predict the temporal development of a PDE in a low-dimensional latent space [GB18]. Another important point is to make the computations efficient in high dimensions. For this, faster methods for calculating derivatives can be used and implemented [SS18].


Bibliography

[AAB+16] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed systems, 2016.

[ADWF15] Babak Alipanahi, Andrew Delong, Matthew Weirauch, and Brendan Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33, 07 2015.

[BDH+86] C. Basdevant, M. Deville, P. Haldenwang, J.M. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, and A.T. Patera. Spectral and finite difference solutions of the Burgers equation. Computers & Fluids, 14(1):23–41, 1986.

[BEJ17] Christian Beck, Weinan E, and Arnulf Jentzen. Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations, 2017.

[Ber19] Marco Bernreuther. RB-based PDE-constrained non-smooth optimization, 2019.

[BN18] Jens Berg and Kaj Nystrom. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, Nov 2018.

[BPRS15] Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey, 2015.


[EHJ17] Weinan E, Jiequn Han, and Arnulf Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, 2017.

[Eva10] Lawrence C. Evans. Partial differential equations. American Mathematical Society, Providence, R.I., 2010.

[EY17] Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. CoRR, abs/1710.00211, 2017.

[FTT17] Masaaki Fujii, Akihiko Takahashi, and Masayuki Takahashi. Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs, 2017.

[GB18] Francisco J. Gonzalez and Maciej Balajewicz. Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems, 2018.

[GBC16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[Ima99] Ronald Iman. Latin hypercube sampling. 01 1999.

[KB15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

[KSH12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[LBH15] Yann LeCun, Y. Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–44, 05 2015.

[LK90] Hyuk Lee and In Seok Kang. Neural algorithm for solving differential equations. Journal of Computational Physics, 91:110–131, 11 1990.

[LLF98] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, Sep. 1998.


[LLP00] Isaac Lagaris, Aristidis Likas, and Dimitrios Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11:1041–9, 02 2000.

[LN89a] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45(1–3):503–528, August 1989.

[LN89b] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, Aug 1989.

[LST15] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.

[MF94a] A.J. Meade and A.A. Fernandez. The numerical solution of linear ordinary differential equations by feedforward neural networks. Mathematical and Computer Modelling, 19(12):1–25, 1994.

[MF94b] A.J. Meade and A.A. Fernandez. Solution of nonlinear ordinary differential equations by feedforward neural networks. Mathematical and Computer Modelling, 20(9):19–44, 1994.

[NM18] Mohammad Amin Nabian and Hadi Meidani. A deep neural network surrogate for high-dimensional random partial differential equations. CoRR, abs/1806.02957, 2018.

[Ont20] Santiago Ontanon. An overview of distance and similarity functions for structured data, 2020.

[PR05] Yehuda Pinchover and Jacob Rubinstein. Introduction, pages 1–22. Cambridge University Press, 2005.

[RPK17a] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Numerical Gaussian processes for time-dependent and non-linear partial differential equations, 2017.

[RPK17b] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.

[RPK17c] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part II): Data-driven discovery of nonlinear partial differential equations. arXiv preprint arXiv:1711.10566, 2017.


[SE16] Russell Stewart and Stefano Ermon. Label-free supervision of neural networks with physics and domain knowledge. CoRR, abs/1609.05566, 2016.

[SFG73] G. Strang, George J. Fix, and D. S. Griffin. An analysis of the finite element method. 1973.

[SS18] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, Dec 2018.

[TSSP16] Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. Accelerating Eulerian fluid simulation with convolutional networks, 2016.

[WM90] L. Wang and Jerry Mendel. Structured trainable networks for matrix algebra. In 1990 IJCNN International Joint Conference on Neural Networks, pages 125–132 vol. 2, 07 1990.

[WZ20] Zhongjian Wang and Zhiwen Zhang. A mesh-free method for interface problems using the deep learning approach. Journal of Computational Physics, 400:108963, Jan 2020.

[YZ96] R. Yentis and M. E. Zaghloul. VLSI implementation of locally connected neural network for solving partial differential equations. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 43(8):687–690, Aug 1996.

[ZZKP19a] Yinhao Zhu, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, 2019.

[ZZKP19b] Yinhao Zhu, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics, 394:56–81, Oct 2019.


Appendix

All experiments were executed and evaluated under Python 3.5.2. The Python packages used for the experiments and their versions are shown in the following table:

Package          Version
numpy            1.14.3
matplotlib       2.2.2
scipy            1.1.0
tensorflow-gpu   1.8.0

6.1 Further Experiments

Another experiment on the Poisson equation from Section 5.3 was conducted. This time the domain of the problem statement contains the "corner singularity" issue [SFG73]. The problem is defined as follows:

−∆u(x, y) = f,  (x, y) ∈ Ω,
u(x, y) = u(r, θ) = √r sin(θ/2),  (x, y) ∈ Γ,

where Ω = (−1, 1) × (−1, 1) \ ([0, 1) × {0}) is the two-dimensional domain with a slit boundary. The analytical solution is given in polar coordinates:

u(r, θ) = √r sin(θ/2).

We built the network in a similar fashion to the other experiments, but the DNN got stuck in a local minimum during training when the network was not excessively large. This shows a drawback of our proposed method, since we cannot verify the convergence speed or accuracy for every PDE. It could be that for some PDEs a different network architecture or optimizer is better suited, or that the method needs more refinement to work for a greater variety of equations. Nevertheless, we show the results of the training in Figure 6.1.


Figure 6.1: Results of the Poisson equation with corner singularity. Top left shows the prediction of the given Poisson equation on a 100 × 100 grid. Top right is the exact solution on the same grid. Bottom left shows the error between these two and bottom right the loss function during the training.

Irregular Boundary

In this experiment we wanted to briefly show an example of how irregular boundaries can be handled by the hard assignment. Therefore, we define a domain with a non-convex boundary as seen in Figure 6.2a).

We calculate a distance function d(x) with the help of 800 collocation points, which are shown in Figure 6.2b). Now we need to smooth the distance function by letting a neural network learn a smooth distance function D(x) from d(x). The network had two hidden layers and each layer contains 20 neurons. The result can be seen in Figure 6.2c) and should be close to zero on the edges. The exact form of D(x) is not important; as long as we can differentiate it, we can use it in the model function for the hard assignment in (4.8). This method can be extended to much more complex boundaries, as seen in [BN18].
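A minimal sketch of this two-step construction, assuming the boundary collocation points are available as a NumPy array and using a small fully-connected network in TensorFlow 1.x (all names, sizes and hyperparameters here are illustrative assumptions, not the thesis code):

import numpy as np
import tensorflow as tf

def raw_distance(points, boundary_pts):
    # d(x): distance of each point to its nearest boundary collocation point.
    diffs = points[:, None, :] - boundary_pts[None, :, :]
    return np.min(np.linalg.norm(diffs, axis=2), axis=1, keepdims=True)

# Small network that learns a smooth, differentiable D(x) from d(x).
x_ph = tf.placeholder(tf.float32, [None, 2])
d_ph = tf.placeholder(tf.float32, [None, 1])
h = tf.layers.dense(x_ph, 20, activation=tf.tanh)
h = tf.layers.dense(h, 20, activation=tf.tanh)
D = tf.layers.dense(h, 1, activation=None)
loss = tf.reduce_mean(tf.square(D - d_ph))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)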


Figure 6.2: (a) Domain with boundary. (b) Collocation points used to calculate d(x). (c) Smoothed distance function computed on the grid by the neural network with two hidden layers and 20 neurons per layer.