
Deep learning for universal linear embeddings of nonlinear dynamics

Bethany Lusch1,2∗, J. Nathan Kutz1, and Steven L. Brunton1,2

1 Department of Applied Mathematics, University of Washington, Seattle, WA 98195, United States
2 Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, United States

Abstract

Identifying coordinate transformations that make strongly nonlinear dynamics approximately linear is a central challenge in modern dynamical systems. These transformations have the potential to enable prediction, estimation, and control of nonlinear systems using standard linear theory. The Koopman operator has emerged as a leading data-driven embedding, as eigenfunctions of this operator provide intrinsic coordinates that globally linearize the dynamics. However, identifying and representing these eigenfunctions has proven to be mathematically and computationally challenging. This work leverages the power of deep learning to discover representations of Koopman eigenfunctions from trajectory data of dynamical systems. Our network is parsimonious and interpretable by construction, embedding the dynamics on a low-dimensional manifold parameterized by these eigenfunctions. In particular, we identify nonlinear coordinates on which the dynamics are globally linear using a modified auto-encoder. We also generalize Koopman representations to include a ubiquitous class of systems that exhibit continuous spectra, ranging from the simple pendulum to nonlinear optics and broadband turbulence. Our framework parametrizes the continuous frequency using an auxiliary network, enabling a compact and efficient embedding, while connecting our models to half a century of asymptotics. In this way, we benefit from the power and generality of deep learning, while retaining the physical interpretability of Koopman embeddings.

Keywords: Dynamical systems, Koopman theory, machine learning, deep learning.

1 Introduction

Nonlinearity is a hallmark feature of complex systems, giving rise to a rich diversity of observed dynamical behaviors across the physical, biological, and engineering sciences [17, 10]. Although computationally tractable, there exists no general mathematical framework for solving nonlinear dynamical systems.

∗ Corresponding author ([email protected]). Code: github.com/BethanyL/DeepKoopman

Thus representing nonlinear dynamics in a linear framework is particularly appealing because of powerful and comprehensive techniques for the analysis and control of linear systems [12], which do not readily generalize to nonlinear systems. Koopman operator theory, developed in 1931 [25, 26], has recently emerged as a leading candidate for the systematic linear representation of nonlinear systems [38, 35]. This renewed interest in Koopman analysis has been driven by a combination of theoretical advances [38, 35, 9, 36, 34], improved numerical methods such as dynamic mode decomposition (DMD) [46, 45, 29], and an increasing abundance of data. Eigenfunctions of the Koopman operator are now widely sought, as they provide intrinsic coordinates that globally linearize nonlinear dynamics. Despite the immense promise of Koopman embeddings, obtaining representations has proven difficult in all but the simplest systems, and representations are often intractably complex or are the output of uninterpretable black-box optimizations. In this work, we utilize the power of deep learning for flexible and general representations of the Koopman operator, while enforcing a network structure that promotes parsimony and interpretability of the resulting models.

Neural networks (NNs), which form the theoretical architecture of deep learning, were inspired by the primary visual cortex of cats, where neurons are organized in hierarchical layers of cells to process visual stimulus [20]. The first mathematical model of a NN was the neocognitron [13], which has many of the features of modern deep neural networks (DNNs), including a multi-layer structure, convolution, max pooling, and nonlinear dynamical nodes. Importantly, the universal approximation theorem [11, 18, 19] guarantees that a NN with sufficiently many hidden units and a linear output layer is capable of representing any arbitrary function, including our desired Koopman eigenfunctions. Although NNs have a four-decade history, the analysis of the ImageNet data set [28], containing over 15 million labeled images in 22,000 categories, provided a watershed moment [30]. Indeed, powered by the rise of big data and increased computational power, deep learning is resulting in transformative progress in many data-driven classification and identification tasks [30, 28, 16]. A strength of deep learning is that features of the data are built hierarchically, which enables the representation of complex functions. Thus, deep learning can accurately fit functions without hand-designed features or the user choosing a good basis [16]. However, a current challenge in deep learning research is the identification of parsimonious, interpretable, and transferable models [54].

Deep learning has the potential to enable a scalable and data-driven architecture for the discovery and representation of Koopman eigenfunctions, providing intrinsic linear representations of strongly nonlinear systems. This approach alleviates two key challenges in modern dynamical systems: 1) equations are often unknown for systems of interest [5, 47, 8], as in climate, neuroscience, epidemiology, and finance; and 2) low-dimensional dynamics are typically embedded in a high-dimensional state space, requiring scalable architectures that discover dynamics on latent variables. Although NNs have also been used to model dynamical systems [15] and other physical processes [39] for decades, great strides have been made recently in using DNNs to learn Koopman embeddings, resulting in several excellent papers [50, 33, 48, 53, 44, 31]. For example, the VAMPnet architecture [50, 33] uses a time-lagged auto-encoder and a custom variational score to identify Koopman coordinates on an impressive protein folding example. In all of these recent studies, DNN representations have been shown to be more flexible and exhibit higher accuracy than other leading methods on challenging problems.

The focus of this work is on developing DNN representations of Koopman eigenfunctions that remain interpretable and parsimonious, even for high-dimensional and strongly nonlinear systems. Our approach (see Fig. 1) differs from previous studies, as we are focused specifically on obtaining parsimonious models that match the intrinsic low-rank dynamics while avoiding overfitting and remaining interpretable, thus merging the best of DNN architectures and Koopman theory. In particular, many dynamical systems exhibit a continuous eigenvalue spectrum, which confounds low-dimensional representation using existing DNN or Koopman representations. This work develops a generalized framework and enforces new constraints specifically designed to extract the fewest meaningful eigenfunctions in an interpretable manner. For systems with continuous spectra, we utilize an augmented network to parameterize the linear dynamics on the intrinsic coordinates, avoiding an infinite asymptotic expansion in harmonic eigenfunctions. Thus, the resulting networks remain parsimonious, and the few key eigenfunctions are interpretable. We demonstrate our deep learning approach to Koopman on several examples designed to illustrate the strength of the method, while remaining intuitive in terms of classic dynamical systems.

2 Data-driven dynamical systems

To give context to our deep learning approach to identify Koopman eigenfunctions, we first summarize highlights and challenges in the data-driven discovery of dynamics. Throughout this work, we will consider discrete-time dynamical systems,

$$\mathbf{x}_{k+1} = \mathbf{F}(\mathbf{x}_k), \qquad (1)$$

where x ∈ ℝⁿ is the state of the system and F represents the dynamics that map the state of the system forward in time. Discrete-time dynamics often describe a continuous-time system that is sampled discretely in time, so that x_k = x(kΔt) with sampling time Δt. The dynamics in F are generally nonlinear, and the state x may be high dimensional, although we typically assume that the dynamics evolve on a low-dimensional attractor governed by persistent coherent structures in the state space [10]. Note that F is often unknown and only measurements of the dynamics are available.

The dominant geometric perspective of dynamical systems, in the tradition of Poincaré, concerns the organization of trajectories of Eq. 1, including fixed points, periodic orbits, and attractors. Formulating the dynamics as a system of differential equations in x often admits compact and efficient representations for many natural systems [8]; for example, Newton's second law is naturally expressed by Eq. 1. However, the solution to these dynamics may be arbitrarily complicated, and possibly even irrepresentable, except for special classes of systems. Linear dynamics, where the map F is a matrix that advances the state x, are among the few systems that admit a universal solution, in terms of the eigenvalues and eigenvectors of the matrix F, also known as the spectral expansion.
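The spectral expansion is simple enough to verify numerically; the following minimal sketch (with an illustrative linear map, not from the paper) expands an initial condition in the eigenbasis and checks it against direct iteration of F.

```python
import numpy as np

# For linear dynamics x_{k+1} = F x_k, the spectral expansion gives
# x_k = sum_j c_j * lambda_j^k * v_j, with (lambda_j, v_j) eigenpairs of F.
F = np.array([[0.9, 0.2],
              [0.0, 0.5]])      # illustrative stable linear map
lam, V = np.linalg.eig(F)
x0 = np.array([1.0, 1.0])
c = np.linalg.solve(V, x0)      # coefficients of x0 in the eigenbasis
k = 7
x_k = V @ (c * lam ** k)        # spectral expansion after k steps
assert np.allclose(x_k, np.linalg.matrix_power(F, k) @ x0)
```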

Koopman operator theory

In 1931, B. O. Koopman provided an alternative description of dynamical systems in terms of the evolution of functions in the Hilbert space of possible measurements y = g(x) of the state [25]. The so-called Koopman operator, K, that advances measurement functions is an infinite-dimensional linear operator:

$$\mathcal{K} g \triangleq g \circ \mathbf{F} \implies \mathcal{K} g(\mathbf{x}_k) = g(\mathbf{x}_{k+1}). \qquad (2)$$
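The key point of Eq. (2), that K is linear even when F is not, can be checked numerically; a minimal sketch with illustrative choices of F and observables (not from the paper):

```python
import numpy as np

# Even though F is nonlinear, the Koopman operator acts linearly on
# observables: K(a*g1 + b*g2) = a*K(g1) + b*K(g2).
def F(x):
    """An illustrative nonlinear discrete-time map."""
    return np.array([0.9 * x[0], 0.5 * x[1] + x[0] ** 2])

def K(g):
    """Koopman operator: advance an observable by composing it with F."""
    return lambda x: g(F(x))

g1 = lambda x: x[0]
g2 = lambda x: np.sin(x[1])
a, b = 2.0, -3.0
g_sum = lambda x: a * g1(x) + b * g2(x)

x_k = np.array([1.0, -0.5])
assert np.isclose(K(g_sum)(x_k), a * K(g1)(x_k) + b * K(g2)(x_k))
```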

Koopman analysis has gained significant attention recently with the pioneering work of Mezić et al. [38, 35, 9, 36, 34], and in response to the growing wealth of measurement data and the lack of known equations for many systems [8, 29]. Representing nonlinear dynamics in a linear framework, via the Koopman operator, has the potential to enable advanced nonlinear prediction, estimation, and control using the comprehensive theory developed for linear systems. However, obtaining finite-dimensional approximations of the infinite-dimensional Koopman operator has proven challenging in practical applications.

Finite-dimensional representations of the Koopman operator are often approximated using the dynamic mode decomposition (DMD) [45, 29], introduced by Schmid [46]. By construction, DMD identifies spatio-temporal coherent structures from a high-dimensional dynamical system, although it does not generally capture nonlinear transients since it is based on linear measurements of the system, g(x) = x.


[Figure 1 diagram: the encoder y = ϕ(x) maps x_k to intrinsic coordinates y_k, the decoder recovers x = ϕ⁻¹(y), and a linear matrix K advances y_k to y_{k+1}; panel labels read Autoencoder: ϕ⁻¹(ϕ(x_k)) = x_k, Prediction: ϕ⁻¹(Kϕ(x_k)) = x_{k+1}, and Linearity: Kϕ(x_k) = ϕ(x_{k+1}).]

Figure 1: Diagram of our deep learning schema to identify Koopman eigenfunctions ϕ(x). (a) Our network is based on a deep auto-encoder, which is able to identify intrinsic coordinates y = ϕ(x) and decode these coordinates to recover x = ϕ⁻¹(y). (b,c) We add an additional loss function to identify a linear Koopman model K that advances the intrinsic variables y forward in time. In practice, we enforce agreement with the trajectory data for several iterations through the dynamics, i.e., K^m. In (b), the loss function is evaluated on the state variable x, and in (c) it is evaluated on y.

Extended DMD (eDMD) and the related variational approach of conformation dynamics (VAC) [41, 42, 43] enrich the model with nonlinear measurements [51, 31]; for more details, see the SI Appendix. Identifying regression models based on nonlinear measurements will generally result in closure issues, as there is no guarantee that these measurements form a Koopman invariant subspace [7]. The resulting models are of exceedingly high dimension, and when kernel methods are employed [52], the models may become uninterpretable. Instead, many approaches seek to identify eigenfunctions of the Koopman operator directly, satisfying:

$$\varphi(\mathbf{x}_{k+1}) = \mathcal{K}\varphi(\mathbf{x}_k) = \lambda\varphi(\mathbf{x}_k). \qquad (3)$$

Eigenfunctions are guaranteed to span an invariant subspace, and the Koopman operator will yield a matrix when restricted to this subspace [7, 21]. In practice, Koopman eigenfunctions may be more difficult to obtain than the solution of (1); however, this is a one-time, up-front cost that yields a compact linear description. The challenge of identifying and representing Koopman eigenfunctions provides strong motivation for the use of powerful emerging deep learning methods [50, 33, 48, 53, 44, 31].

Koopman for systems with continuous spectra

The Koopman operator provides a global linearization of the dynamics. The concept of linearizing dynamics is not new, and locally linear representations are commonly obtained by linearizing around fixed points and periodic orbits [17]. Indeed, asymptotic and perturbation methods have been widely used since the time of Newton to approximate solutions of nonlinear problems by starting from the exact solution of a related, typically linear problem. The classic pendulum, for instance, satisfies the differential equation ẍ = −sin(ωx) and has eluded an analytic solution since its mathematical inception. The linear problem associated with the pendulum involves the small angle approximation, whereby sin(ωx) = ωx − (ωx)³/3! + ··· and only the first term is retained in order to yield exact sinusoidal solutions. The next correction, involving the cubic term, gives the Duffing equation, which is one of the most commonly studied nonlinear oscillators in physics [17]. Importantly, the cubic contribution is known to shift the linear oscillation frequency of the pendulum, ω → ω + Δω, as well as generate harmonics such as exp(±3iω) [4, 22]. An exact representation of the solution can be derived in terms of Jacobi elliptic functions, which have a Taylor series representation in terms of an infinite sum of sinusoids with frequencies (2n−1)ω for n = 1, 2, ..., ∞. Thus, the simple pendulum oscillates at the (linear) natural frequency ω for small deflections, and as the pendulum energy is increased, the frequency decreases continuously, resulting in a so-called continuous spectrum.

The importance of accounting for the continuous spectrum was discussed in 1932 in an extension by Koopman and von Neumann [26]. A continuous spectrum, as described for the simple pendulum, is characterized by a continuous range of observed frequencies, as opposed to the discrete spectrum consisting of isolated, fixed frequencies. This phenomenon is observed in a wide range of physical systems that exhibit broadband frequency content, such as turbulence and nonlinear optics. The continuous spectrum thus confounds simple Koopman descriptions, as there is not a straightforward finite approximation in terms of a small number of eigenfunctions [34].



Figure 2: Schematic of modified schema with auxiliary network to identify (parametrize) the continuous eigenvalue spectrum λ. This facilitates an aggressive dimensionality reduction in the auto-encoder, avoiding the need for higher harmonics of the fundamental frequency that are generated by the nonlinearity [4, 22]. For purely oscillatory motion, as in the pendulum, we identify the continuous frequency λ± = ±iω.

Indeed, away from the linear regime, an infinite Fourier sum is required to approximate the shift in frequency and eigenfunctions. In fact, in some cases, eigenfunctions may not exist at all.

Recently, there have been several algorithmic advances to approximate systems with continuous spectra, including nonlinear Laplacian spectral analysis [14] and the use of delay coordinates [6, 2]. A critically enabling innovation of the present work is explicitly accounting for the parametric dependence of the Koopman operator K(λ) on the continuously varying λ, related to the classic perturbation results above. By constructing an auxiliary network (see Fig. 2) to first determine the parametric dependency of the Koopman operator on the frequency λ± = ±iω, an interpretable low-rank model of the intrinsic dynamics can then be constructed. In particular, a nonlinear oscillator with continuous spectrum may now be represented as a single pair of conjugate eigenfunctions, mapping trajectories into perfect sines and cosines, with a continuous eigenvalue parameterizing the frequency. If this explicit frequency dependence is unaccounted for, then a high-dimensional network is necessary to account for the shifting frequency and eigenvalues. We conjecture that previous Koopman models using high-dimensional DNNs represent the harmonic series expansion required to approximate the continuous spectrum for systems such as the Duffing oscillator.

3 Deep learning to identify Koopman eigenfunctions

The overarching goal of this work is to leverage the power of deep learning to discover and represent eigenfunctions of the Koopman operator. Our perspective is driven by the need for parsimonious representations that are efficient, avoid overfitting, and provide minimal descriptions of the dynamics on interpretable intrinsic coordinates. Unlike previous deep learning approaches to Koopman [50, 33, 48, 53], our network architecture is designed specifically to handle a ubiquitous class of nonlinear systems characterized by a continuous frequency spectrum generated by the nonlinearity. A continuous spectrum presents unique challenges for compact and interpretable representation, and our approach is inspired by the classical asymptotic and perturbation approaches in dynamical systems.

Our core network architecture is shown in Fig. 1, and it is modified in Fig. 2 to handle the continuous spectrum. The objective of this network is to identify a few key intrinsic coordinates y = ϕ(x) spanned by a set of Koopman eigenfunctions ϕ : ℝⁿ → ℝᵖ, along with a dynamical system y_{k+1} = Ky_k. There are three high-level requirements for the network, corresponding to three types of loss functions used in training:

1. Intrinsic coordinates that are useful for reconstruction. We seek to identify a few intrinsic coordinates y = ϕ(x) where the dynamics evolve, along with an inverse x = ϕ⁻¹(y) so that the state x may be recovered. This is achieved using an auto-encoder (see Fig. 1a), where ϕ is the encoder and ϕ⁻¹ is the decoder. The dimension p of the auto-encoder subspace is a hyperparameter of the network, and this choice may be guided by knowledge of the system. Reconstruction accuracy of the auto-encoder is achieved using the following loss: ‖x − ϕ⁻¹(ϕ(x))‖.

2. Linear dynamics. To discover Koopman eigenfunctions, we learn the linear dynamics K on the intrinsic coordinates, i.e., y_{k+1} = Ky_k. Linear dynamics are achieved using the following loss: ‖ϕ(x_{k+1}) − Kϕ(x_k)‖. More generally, we enforce linear prediction over m time steps with the loss: ‖ϕ(x_{k+m}) − K^mϕ(x_k)‖. (See Fig. 1c.)

3. Future state prediction. Finally, the intrinsic coordinates must enable future state prediction. Specifically, we identify linear dynamics in the matrix K. This corresponds to the loss ‖x_{k+1} − ϕ⁻¹(Kϕ(x_k))‖, and more generally ‖x_{k+m} − ϕ⁻¹(K^mϕ(x_k))‖. (See Fig. 1b.)

Our norm ‖·‖ is mean-squared error, averaging over dimension then number of examples, and we add ℓ2 regularization.
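A minimal sketch of these three losses (not the released implementation), assuming an `encoder`, a `decoder`, and a trainable matrix `K` already exist, with `x` a batch of trajectories of shape (batch, T, n):

```python
import tensorflow as tf

mse = lambda a, b: tf.reduce_mean(tf.square(a - b))

def koopman_losses(x, encoder, decoder, K, m):
    y = encoder(x[:, 0])                          # intrinsic coordinates y_k
    loss_recon = mse(x[:, 0], decoder(y))         # 1. reconstruction
    loss_lin, loss_pred = 0.0, 0.0
    for step in range(1, m + 1):
        y = tf.matmul(y, K, transpose_b=True)     # y_{k+1} = K y_k (row-vector form)
        loss_lin += mse(encoder(x[:, step]), y)   # 2. linear dynamics
        loss_pred += mse(x[:, step], decoder(y))  # 3. future-state prediction
    return loss_recon, loss_lin / m, loss_pred / m
```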

To address the continuous spectrum, we allow the eigenvalues of the matrix K to vary, parametrized by the function λ = Λ(y), which is learned by an auxiliary network (see Fig. 2). The eigenvalues λ± = µ ± iω are then used to parametrize a block-diagonal K(µ, ω). For each pair of complex eigenvalues, the discrete-time K has a Jordan block of the form:

$$B(\mu, \omega) = \exp(\mu \Delta t)\begin{bmatrix} \cos(\omega \Delta t) & -\sin(\omega \Delta t) \\ \sin(\omega \Delta t) & \cos(\omega \Delta t) \end{bmatrix}.$$
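For concreteness, a small sketch of assembling such a block-diagonal K from a complex pair and a real eigenvalue (the numerical values are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(mu, omega, dt):
    """Discrete-time 2x2 block for lambda = mu +/- i*omega: a scaled rotation."""
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    return np.exp(mu * dt) * np.array([[c, -s],
                                       [s,  c]])

dt = 0.02
# One complex-conjugate pair plus one real eigenvalue lambda = -1:
K = block_diag(jordan_block(-0.1, 1.0, dt), np.exp(-1.0 * dt))
```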

This network structure allows the eigenvalues to vary across phase space, facilitating a small number of eigenfunctions.



Figure 3: Demonstration of neural network embedding of Koopman eigenfunctions for a simple system with a discrete eigenvalue spectrum.

To enforce circular symmetry in the eigenfunction coordinates, we often parameterize the eigenvalues by the radius, λ(‖y‖₂²). The second and third loss functions must also be modified for systems with continuous spectrum, as discussed in the SI Appendix.

To train our network, we generate trajectories from random initial conditions, which are split into training, validation, and test sets. Models are trained on the training set and compared on the validation set, which is also used for early stopping to prevent overfitting. We report accuracy on the test set.

4 Results

We demonstrate our deep learning approach to identify Koopman eigenfunctions on several example systems, including a simple model with a discrete spectrum and two examples that exhibit a continuous spectrum: the nonlinear pendulum and the high-dimensional unsteady fluid flow past a cylinder.

Simple model with discrete spectrum

Before analyzing systems with the additional challenges of a continuous spectrum and high dimensionality, we consider a simple nonlinear system with a single fixed point and a discrete eigenvalue spectrum:

$$\dot{x}_1 = \mu x_1$$
$$\dot{x}_2 = \lambda (x_2 - x_1^2).$$

This dynamical system has been well-studied in the literature [49, 7], and for stable eigenvalues λ < µ < 0, the system exhibits a slow manifold given by x_2 = x_1²; we use µ = −0.05 and λ = −1. As shown in Fig. 3, the Koopman embedding identifies nonlinear coordinates that flatten this inertial manifold, providing a globally linear representation of the dynamics; moreover, the correct Koopman eigenvalues are identified. Specific details about the network and training procedure are provided in the SI Appendix.
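For this system the eigenfunctions are known in closed form (see the SI Appendix), which gives a direct numerical check that the embedding the network should find indeed evolves linearly; a sketch:

```python
import numpy as np
from scipy.integrate import solve_ivp

mu, lam = -0.05, -1.0
rhs = lambda t, x: [mu * x[0], lam * (x[1] - x[0] ** 2)]
t = np.linspace(0, 1, 51)
sol = solve_ivp(rhs, (0, 1), [0.4, -0.3], t_eval=t, rtol=1e-10, atol=1e-10)
x1, x2 = sol.y

# Analytic eigenfunctions: y1 = x1, y2 = x2 - b*x1^2 with b = -lam/(2*mu - lam);
# each evolves linearly, with rate mu and lam respectively.
b = -lam / (2 * mu - lam)
y1, y2 = x1, x2 - b * x1 ** 2
assert np.allclose(y1, y1[0] * np.exp(mu * t), atol=1e-6)
assert np.allclose(y2, y2[0] * np.exp(lam * t), atol=1e-6)
```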

Nonlinear pendulum with continuous spectrum

As a second example, we consider the nonlinear pendulum, which exhibits a continuous eigenvalue spectrum with increasing energy:

$$\ddot{x} = -\sin(x) \implies \begin{cases} \dot{x}_1 = x_2 \\ \dot{x}_2 = -\sin(x_1). \end{cases}$$

Although this is a simple mechanical system, it has eluded parsimonious representation in the Koopman framework. The deep Koopman embedding is shown in Fig. 4, where it is clear that the dynamics are linear in the eigenfunction coordinates, given by y = ϕ(x). As the Hamiltonian energy of the system increases, corresponding to an elongation of the oscillation period, the parameterized Koopman network accounts for this continuous frequency shift and provides a compact representation in terms of two conjugate eigenfunctions. Alternative network architectures that are not specifically designed to account for continuous spectra with an auxiliary network would be forced to approximate this frequency shift with the classical asymptotic expansion in terms of harmonics. The resulting network would be overly bulky and would limit interpretability.

Recall that we have three types of losses on the network: reconstruction, prediction, and linearity. Figure 4 II.A shows that the network is able to function as an auto-encoder, accurately reconstructing the ten example trajectories. Next, we show that the network is able to predict the evolution of the system. Figure 4 II.B shows the prediction horizon for ten initial conditions that are simulated forward with the network, stopping the prediction when the relative error reaches 10%. As expected, the prediction horizon deteriorates as the energy of the initial condition increases, although the prediction is still quite accurate. Finally, we demonstrate that the dynamics in the intrinsic coordinates y are truly linear, as shown by the nearly concentric circles in Fig. 4 II.C. The eigenfunctions ϕ1(x) and ϕ2(x) are shown in Fig. 4 III, and are connected to theory [37] in the SI Appendix.

High-dimensional nonlinear fluid flow

As our final example, we consider the nonlinear fluid flow past a circular cylinder at Reynolds number 100 (based on diameter), which is characterized by vortex shedding. This model has been a benchmark in fluid dynamics for decades [40], and has been extensively analyzed in the context of data-driven modeling [8, 32] and Koopman analysis [3]. In 2003, Noack et al. [40] showed that the high-dimensional dynamics evolve on a low-dimensional attractor, given by a slow manifold in the following model:

$$\dot{x}_1 = \mu x_1 - \omega x_2 + A x_1 x_3$$
$$\dot{x}_2 = \omega x_1 + \mu x_2 + A x_2 x_3$$
$$\dot{x}_3 = -\lambda (x_3 - x_1^2 - x_2^2).$$
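A minimal sketch integrating this model with the parameter values reported in the SI Appendix (µ = 0.1, ω = 1, A = −0.1, λ = 10):

```python
import numpy as np
from scipy.integrate import solve_ivp

mu, omega, A, lam = 0.1, 1.0, -0.1, 10.0

def mean_field(t, x):
    x1, x2, x3 = x
    return [mu * x1 - omega * x2 + A * x1 * x3,
            omega * x1 + mu * x2 + A * x2 * x3,
            -lam * (x3 - x1 ** 2 - x2 ** 2)]

# Starting near the unstable equilibrium, the trajectory climbs the slow
# manifold x3 = x1^2 + x2^2 toward the vortex-shedding limit cycle.
sol = solve_ivp(mean_field, (0, 60), [1e-3, 0.0, 0.0], max_step=0.05)
```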

This mean-field model exhibits a stable limit cycle corresponding to von Kármán vortex shedding, and an unstable equilibrium corresponding to a low-drag condition.


Figure 4: Illustration of deep Koopman eigenfunctions for the nonlinear pendulum. The pendulum, although a simple mechanical system, exhibits a continuous spectrum, making it difficult to obtain a compact representation in terms of Koopman eigenfunctions. However, by leveraging a generalized network, as in Fig. 2, it is possible to identify a parsimonious model in terms of a single complex conjugate pair of eigenfunctions, parameterized by the frequency ω. In eigenfunction coordinates, the dynamics become linear, and orbits are given by perfect circles. For the sake of visualization, we use ten evenly spaced trajectories instead of the random trajectories in the testing set.

Starting near this equilibrium, the flow unwinds up the slow manifold toward the limit cycle. In [32], Loiseau showed that this flow may be modeled by a nonlinear oscillator with state-dependent damping, making it amenable to the continuous spectrum analysis. We use trajectories from this model to train a Koopman network, and the resulting eigenfunctions are shown in Fig. 5.

In this example, the damping rate µ and frequency ω are allowed to vary along level sets of the radius in eigenfunction coordinates, so that µ(R) and ω(R), where R² = y₁² + y₂²; this is accomplished with an auxiliary network as in Fig. 2. Although we only show the ability of the model to predict the future state in Fig. 5, corresponding to the second and third loss functions, the network also functions as an autoencoder.
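One linear step of this radius-parameterized model can be sketched as follows, with `aux_net` a placeholder for the learned auxiliary map of Fig. 2 returning (µ(R), ω(R)):

```python
import numpy as np

def advance(y, aux_net, dt):
    """One step y_{k+1} = K(mu, omega) y_k with radius-dependent eigenvalues."""
    mu, omega = aux_net(y[0] ** 2 + y[1] ** 2)   # evaluated on R^2 = y1^2 + y2^2
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    return np.exp(mu * dt) * np.array([c * y[0] - s * y[1],
                                       s * y[0] + c * y[1]])
```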

5 Discussion

In summary, we have employed powerful deep learning approaches to identify and represent coordinate transformations that recast strongly nonlinear dynamics into a globally linear framework. Our approach is designed to discover eigenfunctions of the Koopman operator, which provide an intrinsic coordinate system to linearize nonlinear systems, and have been notoriously difficult to identify and represent using alternative methods. Building on a deep auto-encoder framework, we enforce additional constraints and loss functions to identify Koopman eigenfunctions where the dynamics evolve linearly. Moreover, we generalize this framework to include a broad class of nonlinear systems that exhibit a continuous eigenvalue spectrum, where a continuous range of frequencies is observed. Continuous-spectrum systems are notoriously difficult to analyze, especially with Koopman theory, and naive learning approaches require asymptotic expansions in terms of higher order harmonics of the fundamental frequency, leading to unwieldy models. In contrast, we utilize an auxiliary network to parametrize and identify the continuous frequency, which then parameterizes a compact Koopman model on the auto-encoder coordinates. Thus, our deep neural network models remain both parsimonious and interpretable, merging the best of neural network representations and Koopman embeddings. In most deep learning applications, although the basic architecture is extremely general, considerable expert knowledge and intuition is typically used in the training process and in designing loss functions and constraints. Throughout this paper, we have also used physical insight and intuition from asymptotic theory and continuous spectrum dynamical systems to design parsimonious Koopman embeddings.

The use of deep learning in physics and engineering is increasing at an incredible rate, and this trend is only expected to accelerate. Nearly every field of science is revisiting challenging problems of central importance from the perspective of big data and deep learning.


Figure 5: Learned Koopman eigenfunctions for the mean-field model of fluid flow past a circular cylinder at Reynolds number 100. (top) Reconstruction of trajectory from linear Koopman model with two states; modes for each of the state space variables x are shown along the coordinate axes. (bottom) Koopman reconstruction in eigenfunction coordinates y, along with eigenfunctions y = ϕ(x).

With this explosion of interest, it is imperative that we as a community seek machine learning models that favor interpretability and promote physical insight and intuition. In this challenge, there is a tremendous opportunity to gain new understanding and insight by applying increasingly powerful techniques to data. For example, discovering Koopman eigenfunctions will result in new symmetries and conservation laws, as conserved eigenfunctions are related to conservation laws via a generalized Noether's theorem. It will also be important to apply these techniques to increasingly challenging problems, such as turbulence, epidemiology, and neuroscience, where data is abundant and models are needed. The goal is to model these systems with a small number of coupled nonlinear oscillators using similar parameterized Koopman embeddings. Finally, the use of deep learning to discover Koopman eigenfunctions may enable transformative advances in the nonlinear control of complex systems. All of these future directions will be facilitated by more powerful network representations.

Acknowledgements

We acknowledge generous funding from the Army Research Office (ARO W911NF-17-1-0306) and the Defense Advanced Research Projects Agency (DARPA HR0011-16-C-0016). We would like to thank many people for valuable discussions about neural networks and Koopman theory: Bing Brunton, Karthik Duraisamy, Eurika Kaiser, Bernd Noack, and Josh Proctor, and especially Jean-Christophe Loiseau, Igor Mezić, and Frank Noé.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] Hassan Arbabi and Igor Mezić. Ergodic theory, dynamic mode decomposition and computation of spectral properties of the Koopman operator. SIAM J. Appl. Dyn. Syst., 16(4):2096–2126, 2017.

[3] Shervin Bagheri. Koopman-mode decomposition of the cylinder wake. Journal of Fluid Mechanics, 726:596–623, 2013.

[4] Carl M. Bender and Steven A. Orszag. Advanced Mathematical Methods for Scientists and Engineers I: Asymptotic Methods and Perturbation Theory. Springer Science & Business Media, 2013.

[5] Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.

[6] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz. Chaos as an intermittently forced linear system. Nature Communications, 8(19):1–9, 2017.

[7] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz. Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLoS ONE, 11(2):e0150171, 2016.

[8] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Nat. Acad. Sci., 113(15):3932–3937, 2016.

[9] Marko Budišić, Ryan Mohr, and Igor Mezić. Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(4):047510, 2012.

[10] Mark C. Cross and Pierre C. Hohenberg. Pattern formation outside of equilibrium. Reviews of Modern Physics, 65(3):851, 1993.

[11] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, December 1989.

[12] Geir E. Dullerud and Fernando Paganini. A Course in Robust Control Theory: A Convex Approach. Texts in Applied Mathematics. Springer, Berlin, Heidelberg, 2000.

[13] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36:193–202, 1980.

[14] Dimitrios Giannakis and Andrew J. Majda. Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proc. Nat. Acad. Sci., 109(7):2222–2227, 2012.

[15] R. Gonzalez-Garcia, R. Rico-Martinez, and I. G. Kevrekidis. Identification of distributed parameter systems: A neural net based approach. Comp. & Chem. Eng., 22:S965–S968, 1998.

[16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[17] Philip Holmes and John Guckenheimer. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, volume 42 of Applied Mathematical Sciences. Springer-Verlag, 1983.

[18] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[19] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5):551–560, 1990.

[20] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160:106–154, 1962.

[21] E. Kaiser, J. N. Kutz, and S. L. Brunton. Data-driven discovery of Koopman eigenfunctions for control. arXiv preprint arXiv:1707.01146, 2017.

[22] Jirayr Kevorkian and Julian D. Cole. Perturbation Methods in Applied Mathematics, volume 34. Springer Science & Business Media, 2013.

[23] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[24] Stefan Klus, Feliks Nüske, Péter Koltai, Hao Wu, Ioannis Kevrekidis, Christof Schütte, and Frank Noé. Data-driven model reduction and transfer operator approximation. Journal of Nonlinear Science, 2018.

[25] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315–318, 1931.

[26] B. O. Koopman and J. von Neumann. Dynamical systems of continuous spectra. Proceedings of the National Academy of Sciences of the United States of America, 18(3):255, 1932.

[27] Milan Korda and Igor Mezić. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. arXiv preprint arXiv:1611.03537, 2016.

[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[29] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016.

[30] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[31] Qianxiao Li, Felix Dietrich, Erik M. Bollt, and Ioannis G. Kevrekidis. Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(10):103111, 2017.

[32] J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics, 838:42–67, 2018.

[33] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. VAMPnets: Deep learning of molecular kinetics. Nature Communications, 9(5), 2018.

[34] I. Mezić. Spectral Operator Methods in Dynamical Systems: Theory and Applications. Springer, 2017.

[35] Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1-3):309–325, 2005.

[36] Igor Mezić. Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics, 45:357–378, 2013.

[37] Igor Mezić. Koopman operator spectrum and data analysis. arXiv preprint arXiv:1702.07597, 2017.

[38] Igor Mezić and Andrzej Banaszuk. Comparison of systems with complex behavior. Physica D: Nonlinear Phenomena, 197(1):101–133, 2004.

[39] Michele Milano and Petros Koumoutsakos. Neural network modeling for near wall turbulent flow. Journal of Computational Physics, 182(1):1–26, 2002.

[40] B. R. Noack, K. Afanasiev, M. Morzyński, G. Tadmor, and F. Thiele. A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics, 497:335–363, 2003.

[41] Frank Noé and Feliks Nüske. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Modeling & Simulation, 11(2):635–655, 2013.

[42] Feliks Nüske, Bettina G. Keller, Guillermo Pérez-Hernández, Antonia S. J. S. Mey, and Frank Noé. Variational approach to molecular kinetics. Journal of Chemical Theory and Computation, 10(4):1739–1752, 2014.

[43] Feliks Nüske, Reinhold Schneider, Francesca Vitalini, and Frank Noé. Variational tensor approach for approximating the rare-event kinetics of macromolecular systems. J. Chem. Phys., 144(5):054105, 2016.

[44] Samuel E. Otto and Clarence W. Rowley. Linearly-recurrent autoencoder networks for learning dynamics. arXiv preprint arXiv:1712.01378, 2017.

[45] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech., 645:115–127, 2009.

[46] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5–28, August 2010.

[47] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.

[48] Naoya Takeishi, Yoshinobu Kawahara, and Takehisa Yairi. Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pages 1130–1140, 2017.

[49] Jonathan H. Tu, Clarence W. Rowley, Dirk M. Luchtenburg, Steven L. Brunton, and J. Nathan Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391–421, 2014.

[50] Christoph Wehmeyer and Frank Noé. Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics. arXiv preprint arXiv:1710.11239, 2017.

[51] Matthew O. Williams, Ioannis G. Kevrekidis, and Clarence W. Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlin. Sci., 25(6):1307–1346, 2015.

[52] Matthew O. Williams, Clarence W. Rowley, and Ioannis G. Kevrekidis. A kernel approach to data-driven Koopman spectral analysis. Journal of Computational Dynamics, 2(2):247–265, 2015.

[53] Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. arXiv preprint arXiv:1708.06850, 2017.

[54] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.

Supporting Information (SI)

Model problems and training datasets

We demonstrate the ability of deep learning to represent Koopman eigenfunctions on three example systems, shown in Fig. 6.

The first system has a single fixed point and a discrete eigenvalue spectrum:

$$\dot{x}_1 = \mu x_1 \qquad (4a)$$
$$\dot{x}_2 = \lambda (x_2 - x_1^2), \qquad (4b)$$

where µ = −0.05 and λ = −1. This provides a benchmark system to test our architecture.

The second system is the nonlinear pendulum:

$$\dot{x}_1 = x_2 \qquad (5a)$$
$$\dot{x}_2 = -\sin(x_1). \qquad (5b)$$

The nonlinear pendulum exhibits a continuous eigenvalue spectrum as the energy of the pendulum is increased, posing a significant challenge for classical Koopman embedding techniques. In this example, we consider the frictionless pendulum, so that the system is conservative and trajectories evolve on Hamiltonian energy level sets.

The third example is the mean-field model [40, 32] for the fluid flow past a circular cylinder at Reynolds number 100:

$$\dot{x}_1 = \mu x_1 - \omega x_2 + A x_1 x_3 \qquad (6a)$$
$$\dot{x}_2 = \omega x_1 + \mu x_2 + A x_2 x_3 \qquad (6b)$$
$$\dot{x}_3 = -\lambda (x_3 - x_1^2 - x_2^2), \qquad (6c)$$

where µ = 0.1, ω = 1, A = −0.1, and λ = 10. This system is a challenging canonical system in fluid dynamics, and is a model for the vortex shedding observed behind bluff bodies. The system exhibits a slow manifold, and we consider two cases, corresponding to trajectories that start on the slow manifold and trajectories that start off of the manifold.

Creating the datasets

We create our datasets by solving the systems of differential equations in MATLAB using the ode45 solver.

For each dynamical system, we choose 5000 initial conditions for the test set, 5000 for the validation set, and 5000–20,000 for the training set (see Table 1). For each initial condition, we solve the differential equations for some time span. That time span is t = 0, 0.02, ..., 1 for the discrete spectrum and pendulum datasets. Since the dynamics on the slow manifold for the fluid flow example are slower and more complicated, we increase the time span for that dataset to t = 0, 0.05, ..., 6. However, when we include data off the slow manifold, we want to capture the fast dynamics as the trajectories are attracted to the slow manifold, so we change the time span to t = 0, 0.01, ..., 1.

The discrete spectrum dataset is created from random initial conditions x where x1, x2 ∈ [−0.5, 0.5], since this portion of phase space is sufficient to capture the dynamics.

The pendulum dataset is created from random initial conditions x where x1 ∈ [−3.1, 3.1] (just under [−π, π]), x2 ∈ [−2, 2], and the potential function is under 0.99. The potential function for the pendulum is (1/2)x2² − cos(x1). These ranges are chosen to sample the pendulum in the full phase space, where the pendulum approaches having an infinite period.
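A sketch of this sampling procedure for the pendulum (the paper uses MATLAB's ode45; scipy's default RK45 integrator plays the analogous role here):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
t_eval = np.linspace(0, 1, 51)            # t = 0, 0.02, ..., 1
rhs = lambda t, x: [x[1], -np.sin(x[0])]

def sample_ic():
    """Rejection-sample x1 in [-3.1, 3.1], x2 in [-2, 2] with potential < 0.99."""
    while True:
        x = rng.uniform([-3.1, -2.0], [3.1, 2.0])
        if 0.5 * x[1] ** 2 - np.cos(x[0]) < 0.99:
            return x

trajectories = np.stack([solve_ivp(rhs, (0, 1), sample_ic(), t_eval=t_eval).y.T
                         for _ in range(5000)])   # shape (5000, 51, 2)
```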


Figure 6: Three model problems used to demonstrate deep neural network embedding of Koopman eigenfunctions. (top) Simple system with discrete spectrum and a single fixed point, (middle) nonlinear pendulum, exhibiting a continuous spectrum, and (bottom) unsteady fluid flow past a cylinder at Reynolds number 100.

The fluid flow problem limited to the slow manifold is created from random initial conditions x on the bowl, where r ∈ [0, 1.1], θ ∈ [0, 2π], x1 = r cos(θ), x2 = r sin(θ), and x3 = x1² + x2². This captures all types of dynamics on the slow manifold, including spiraling in towards the limit cycle at r = 1 and spiraling out towards it.

The fluid flow problem beyond the slow manifold is created from random initial conditions x where x1 ∈ [−1.1, 1.1], x2 ∈ [−1.1, 1.1], and x3 ∈ [0, 2.42]. These limits are chosen to include the dynamics on the slow manifold covered by the previous dataset, as well as trajectories that begin off the slow manifold. Any trajectory that grows to x3 > 2.5 is eliminated so that the domain is reasonably compact and well-sampled.

Table 1: Dataset Sizes

                      Discrete   Pendulum   Fluid    Fluid
                      spectrum              flow 1   flow 2
    Length of traj.   51         51         121      101
    # training traj.  5000       15,000     15,000   20,000
    Batch size        256        128        256      128

Network architecture and training

Code

We use the Python API for the TensorFlow framework [1] and the Adam optimizer [23] for training. All of our code is available online at github.com/BethanyL/DeepKoopman.

Network architecture

Each hidden layer has the form Wx + b followed by activation with the rectified linear unit (ReLU): f(x) = max{0, x}. In our experiments, training was significantly faster with ReLU as the activation function than with sigmoid. See Table 2 for the number of hidden layers in the encoder, decoder, and auxiliary network, as well as their widths. The output layers of the encoder, decoder, and auxiliary network are linear (simply Wx + b).
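A minimal Keras sketch of an encoder with this structure (the decoder mirrors it), using the discrete-spectrum settings from Table 2 as an example; this is an illustration, not the released code:

```python
import tensorflow as tf

def make_encoder(n, p, width=30, n_hidden=2):
    """ReLU hidden layers of the form Wx + b, then a linear output layer."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(width, activation="relu", input_shape=(n,)))
    for _ in range(n_hidden - 1):
        model.add(tf.keras.layers.Dense(width, activation="relu"))
    model.add(tf.keras.layers.Dense(p))   # linear output: y = phi(x)
    return model

encoder = make_encoder(n=2, p=2)          # discrete spectrum example
```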

The input to the auxiliary network is y, and it outputs the parameters for the eigenvalues of K. For each complex conjugate pair of eigenvalues λ± = µ ± iω, the network defines a function Λ mapping y_j² + y_{j+1}² to µ and ω, where y_j and y_{j+1} are the corresponding eigenfunctions. Similarly, for each real eigenvalue λ, the network defines a function mapping y_j to λ. For example, for the fluid flow problem off the attractor, we have three eigenfunctions. The auxiliary network learns a map from y₁² + y₂² to µ and ω and another map from y₃ to λ. This could be implemented as one network defining a mapping Λ : ℝ² → ℝ³ where the layers are not fully connected (y₁² + y₂² should not influence λ, and y₃ should not influence µ and ω). However, for simplicity, we implement this as two separate auxiliary networks, one for the complex conjugate pair of eigenvalues and one for the real eigenvalue.

Table 2: Network Architecture

                           Discrete   Pendulum   Fluid    Fluid
                           spectrum              flow 1   flow 2
    # hidden layers (HL)   2          2          1        1
    Width HL               30         80         105      130
    # HL aux. net.         3          1          1        2
    Width HL aux. net.     10         170        300      20

Explicit loss function

Our loss function has three weighted mean-squared error components: reconstruction accuracy L_recon, future state prediction L_pred, and linearity of dynamics L_lin. Since we know that there are no outliers in our data, we also use an L_∞ term to penalize the data point with the largest loss. Finally, we add ℓ2 regularization on the weights W to avoid overfitting. More specifically:

$$\mathcal{L} = \alpha_1(\mathcal{L}_{\mathrm{recon}} + \mathcal{L}_{\mathrm{pred}}) + \mathcal{L}_{\mathrm{lin}} + \alpha_2 \mathcal{L}_\infty + \alpha_3 \|\mathbf{W}\|_2^2 \qquad (7a)$$
$$\mathcal{L}_{\mathrm{recon}} = \|\mathbf{x}_1 - \varphi^{-1}(\varphi(\mathbf{x}_1))\|_{\mathrm{MSE}} \qquad (7b)$$
$$\mathcal{L}_{\mathrm{pred}} = \frac{1}{S_p}\sum_{m=1}^{S_p}\|\mathbf{x}_{m+1} - \varphi^{-1}(K^m\varphi(\mathbf{x}_1))\|_{\mathrm{MSE}} \qquad (7c)$$
$$\mathcal{L}_{\mathrm{lin}} = \frac{1}{T-1}\sum_{m=1}^{T-1}\|\varphi(\mathbf{x}_{m+1}) - K^m\varphi(\mathbf{x}_1)\|_{\mathrm{MSE}} \qquad (7d)$$
$$\mathcal{L}_\infty = \|\mathbf{x}_1 - \varphi^{-1}(\varphi(\mathbf{x}_1))\|_\infty + \|\mathbf{x}_2 - \varphi^{-1}(K\varphi(\mathbf{x}_1))\|_\infty, \qquad (7e)$$

where MSE refers to mean squared error and T is the number of time steps in each trajectory. The weights α1, α2, and α3 are hyperparameters, and the integer Sp is a hyperparameter for how many steps to check in the prediction loss. The hyperparameters α1, α2, α3, and Sp are listed in Table 3.
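Assembling Eq. (7) from these pieces might look as follows, reusing the `koopman_losses` sketch from the main text; `weights` collects the W matrices, and all names are illustrative rather than the released code:

```python
import tensorflow as tf

def total_loss(x, encoder, decoder, K, weights, alpha1, alpha2, alpha3, Sp):
    loss_recon, loss_lin, loss_pred = koopman_losses(x, encoder, decoder, K, Sp)
    # L_inf (Eq. 7e): worst-case entries of the reconstruction and
    # one-step prediction errors.
    y0 = encoder(x[:, 0])
    y1 = tf.matmul(y0, K, transpose_b=True)
    loss_inf = (tf.reduce_max(tf.abs(x[:, 0] - decoder(y0)))
                + tf.reduce_max(tf.abs(x[:, 1] - decoder(y1))))
    l2 = tf.add_n([tf.reduce_sum(tf.square(W)) for W in weights])  # ||W||_2^2
    return (alpha1 * (loss_recon + loss_pred) + loss_lin
            + alpha2 * loss_inf + alpha3 * l2)
```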

Table 3: Loss Hyperparameters

          Discrete   Pendulum   Fluid    Fluid
          spectrum              flow 1   flow 2
    α1    0.1        0.001      0.1      0.1
    α2    10⁻⁷       10⁻⁹       10⁻⁷     10⁻⁹
    α3    10⁻¹⁵      10⁻¹⁴      10⁻¹³    10⁻¹³
    Sp    30         30         30       30

The matrix K is parametrized by the function λ = Λ(y), which is learned by an auxiliary network. The eigenvalues can vary along a trajectory, so in L_pred and L_lin, K^m = K(λ1) · K(λ2) ··· K(λm). However, for Hamiltonian systems such as the pendulum, the eigenvalues are constant along each trajectory. If a system is known to be Hamiltonian, the network training could be sped up by encoding the constraint that K^m = K(λ)^m. In order to demonstrate that this specialized knowledge is not necessary, we use the more general case for all of our datasets, including the pendulum.
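The m-step propagation with step-varying eigenvalues can be sketched as a product of per-step matrices; `aux_net` and `make_K` (e.g., the Jordan-block builder sketched in the main text) are placeholders:

```python
import numpy as np

def predict_m_steps(y, aux_net, make_K, dt, m):
    """K^m = K(lambda_1) K(lambda_2) ... K(lambda_m), re-evaluating Lambda(y)."""
    for _ in range(m):
        mu, omega = aux_net(y)           # lambda = Lambda(y) at the current step
        y = make_K(mu, omega, dt) @ y    # one linear step with K(lambda)
    return y
```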

Training

We initialize each weight matrix W randomly from a uniform distribution in the range [−s, s] with s = 1/√a, where a is the dimension of the input of the layer. This distribution was suggested in [16]. Each bias vector b is initialized to 0. The model for the discrete spectrum example is trained for two hours on an NVIDIA K80 GPU; the other models are each trained for six hours. The learning rate for the Adam optimizer is 0.001.



Figure 7: On the left is the average log10 prediction error as the number of prediction steps increases for the discrete spectrum example. On the right, for each trajectory, we show how many steps the network can take before reaching 10% relative error.

On the pendulum and fluid flow datasets, we pretrain the network for five minutes as a simple autoencoder, using the autoencoder loss but not the linearity or prediction losses, as this speeds up the training. For each dynamical system, we train multiple models in a random search of parameter space and choose the one with the lowest validation error. Each model is initialized with different random weights. We also use early stopping; for each model, at the end of training, we revert to the step with the lowest validation error.
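The initialization described above can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(a_in, a_out):
    """W uniform on [-s, s] with s = 1/sqrt(a_in), as suggested in [16]; b = 0."""
    s = 1.0 / np.sqrt(a_in)
    W = rng.uniform(-s, s, size=(a_in, a_out))
    b = np.zeros(a_out)
    return W, b
```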

Results

The training, validation, and test errors for all examples are reported in Table 4.

Discrete spectrum example

Figure 7 shows the average prediction error versus the number of prediction steps. Even for a large number of steps, the error is quite small, giving good prediction. This figure also demonstrates prediction performance on example trajectories. The eigenfunctions for this example are shown in Fig. 8. We see that one is quadratic and the other is linear. This is expected, because we can analytically derive that y1 = x1 and y2 = x2 − b x1², with b = −λ/(2µ − λ), is a pair of eigenfunctions for this system. When the eigenvalues are allowed to vary with the auxiliary network used for continuous spectrum systems, the eigenvalues remain relatively constant, near the true values of −0.05 and −1, as shown in Fig. 9.

Table 4: Errors for each Problem

                   Discrete     Pendulum     Fluid        Fluid
                   spectrum                  flow 1       flow 2
    Training       1.4 × 10⁻⁷   8.5 × 10⁻⁸   5.4 × 10⁻⁷   2.8 × 10⁻⁶
    Validation     1.4 × 10⁻⁷   9.4 × 10⁻⁸   5.4 × 10⁻⁷   2.9 × 10⁻⁶
    Test           1.5 × 10⁻⁷   1.1 × 10⁻⁷   5.5 × 10⁻⁷   2.9 × 10⁻⁶


Figure 8: Eigenfunctions for the discrete spectrum example.


Figure 9: When the eigenvalues of the discrete spectrum example are allowed to vary in terms of y1 and y2, they remain relatively constant; i.e., they are close to the true values −0.05 and −1, as expected.

Nonlinear pendulum

The nonlinear pendulum is one of the simplest examples that exhibits a continuous eigenvalue spectrum. Using the auxiliary network, we allow the frequency ω of the Koopman eigenvalues to vary continuously with the embedded coordinates y1 and y2, as shown in Fig. 10. The frequency ω varies smoothly with the radius √(y1² + y2²), from around −0.95 to −0.4 as the energy is increased. When the damping rate is also allowed to vary continuously, it remains nearly constant around the value µ = 0, since the system is conservative. Figure 11 shows the Koopman eigenfunctions in magnitude and phase coordinates, where it can be seen that the magnitude essentially traces level sets of the Hamiltonian energy. This is consistent with previous theoretical derivations of Mezić [37], and we thank him for communicating this connection to us.

Fluid flow on attractor

For the final example, we consider the nonlinear fluid vortex shedding behind a cylinder. We begin by considering dynamics on the attracting manifold. When we train the network with trajectories on the slow manifold, we are able to identify a single conjugate eigenfunction pair, shown in Fig. 13. The corresponding continuously varying eigenvalues are shown in Fig. 12, where it can be seen that the frequency ω is extremely close to the true constant −1, while the damping µ varies significantly, and in fact switches stability for trajectories outside the natural limit cycle. This is consistent with the data-driven model of Loiseau [32].



Figure 10: Eigenvalues for the pendulum vary in terms of y1 and y2. Note that the frequency decreases as the radius increases, and µ ≈ 0.


Figure 11: Magnitude and phase of the pendulum eigenfunctions.

Figure 12: Continuous eigenvalues as a function of y1 and y2. Note that the frequency ω ≈ −1. The parameter µ shows growth inside the limit cycle (marked in red) and decay outside the limit cycle.


Figure 13: Magnitude and phase of the eigenfunctions for the fluid flow on the attracting slow manifold.

Figure 14: The upper left plot is the average log10 prediction error as the number of prediction steps increases for the fluid flow example with trajectories starting off the attractor. The upper right plot shows a trajectory on the attractor in linear coordinates y1 and y2. The bottom row shows two examples of trajectories that begin off the attractor. The Koopman model is able to reconstruct both given only the initial condition.

Fluid flow off attractor

We now consider the case where we train a network using trajectories that start off of the attracting slow manifold. Figure 14 shows the prediction performance of the Koopman neural network for trajectories that start away from the bowl; in both cases, the dynamics are faithfully captured and propagated forward until the limit cycle.

The eigenfunctions are shown in Fig. 15, where it can be seen that the mode shapes match those in the on-attractor data in Fig. 13. The continuously varying eigenvalues are shown in Fig. 16. Again, similar to the on-attractor case, the damping µ varies considerably with radius, while the frequency is very nearly a constant −1.

Miscellaneous notes

Connection between eDMD and VAC

It has recently been shown that eDMD is equivalent to the variational approach of conformation dynamics (VAC) [41, 42, 43], first derived by Noé and Nüske in 2013 to simulate molecular dynamics with a broad separation of timescales. Further connections between eDMD and VAC, and between DMD and time-lagged independent component analysis (TICA), are explored in a recent review [24]. A key contribution of VAC is a variational score enabling the objective assessment of Koopman models via cross-validation.



Figure 15: Eigenfunctions for the fluid flow example for trajectories starting off the attractor, corresponding to the complex conjugate pair of eigenvalues; the second row contains the magnitude and phase of those eigenfunctions.

Figure 16: Parameter variations for the complex eigenvalues in terms of y1 and y2. Note that this is a natural extension of Fig. 12, which is limited to data on the bowl.

Recently, eDMD has been demonstrated to improve model predictive control performance in nonlinear systems [27].
