
ADAPTIVE HAMILTONIAN VARIATIONAL INTEGRATORS AND

APPLICATIONS TO SYMPLECTIC ACCELERATED OPTIMIZATION

VALENTIN DURUISSEAUX, JEREMY SCHMITT, AND MELVIN LEOK

Abstract. It is well known that symplectic integrators lose their near energy preservation properties when variable step sizes are used. The most common approach to combining adaptive step sizes and symplectic integrators involves the Poincaré transformation of the original Hamiltonian. In this article, we provide a framework for the construction of variational integrators using the Poincaré transformation. Since the transformed Hamiltonian is typically degenerate, the use of Hamiltonian variational integrators based on Type II or Type III generating functions is required instead of the more traditional Lagrangian variational integrators based on Type I generating functions. Error analysis is provided, and numerical tests based on the Taylor variational integrator approach of Schmitt et al. [30] to time-adaptive variational integration of Kepler's 2-body problem are presented. Finally, we use our adaptive framework together with the variational approach to accelerated optimization presented in Wibisono et al. [36] to design efficient variational and non-variational explicit integrators for symplectic accelerated optimization.

1. Introduction

Symplectic integrators form a class of geometric numerical integrators of interest since, when applied to Hamiltonian systems, they yield discrete approximations of the flow that preserve the symplectic 2-form (see [10]). The preservation of symplecticity results in the preservation of many qualitative aspects of the underlying dynamical system. In particular, when applied to conservative Hamiltonian systems, symplectic integrators show excellent long-time near-energy preservation. However, when symplectic integrators were first used in combination with variable time-steps, the near-energy preservation was lost and the integrators performed poorly (see [5; 8]). Backward error analysis provided justification both for the excellent long-time near-energy preservation of symplectic integrators and for the poor performance experienced when using variable time-steps (see Chapter IX of [10]). Backward error analysis shows that symplectic integrators can be associated with a modified Hamiltonian in the form of a power series in terms of the time-step. The use of a variable time-step results in a different modified Hamiltonian at every iteration where the time-step is changed, which is the source of the poor energy conservation. There has been a great effort to circumvent this problem, and there have been many successes. However, there has yet to be a unified general framework for constructing adaptive symplectic integrators. In this paper, we contribute to this effort by demonstrating how Hamiltonian variational integrators [18] can be used to systematically construct symplectic integrators that allow for the use of variable time-steps.

The use of variable time-steps is motivated by the observation that the global error estimates for a numerical method depend in part on the maximum local truncation error, and this in turn is related to both the step size and the magnitude of the (r+1)-st derivatives of the solution for an r-th order numerical method. For a fixed number of time-steps, the maximum local truncation error is minimized if the local truncation error is equidistributed over the time intervals. In turn, this can be achieved if, for example, the time-step is chosen to be an appropriate function of the reciprocal of the relevant derivative of the solution. This derivative can be estimated a posteriori by comparing methods with different orders of accuracy, or methods with the same order of accuracy but different error constants. Alternatively, in the Kepler two-body problem, for example, Kepler's second law states that the line joining the planet and the Sun sweeps out equal areas during equal intervals of



time, so the angular velocity of the planet is proportional to the reciprocal of the radius squared, which gives an a priori bound. In essence, variable time-steps are chosen to control the error incurred at each time-step, which in turn affects the global accuracy of the numerical trajectory.

The goal of this paper is to develop an analogue of the methods derived using the framework of [9; 27], but directly in terms of generating functions of symplectic maps. These prior results are based on symplectic (partitioned) Runge–Kutta methods, which are related to Type I generating functions [34], but we desire an explicit characterization of the flow maps of time-adaptive Hamiltonian systems so that we can employ the Hamiltonian variational integrator framework instead.

Variational integrators provide a systematic method for constructing symplectic integrators of arbitrarily high order based on the discretization of Hamilton's principle [11; 19], or equivalently, by the approximation of generating functions, but there has not been a systematic attempt to incorporate time-adaptivity into the setting of variational integrators. This is due to the fact that the Poincaré transformed Hamiltonian that is used is in general degenerate, so there is no corresponding Lagrangian analogue, which prevents the use of traditional variational integrators that are based on a Lagrangian formulation of mechanics and involve the construction of a discrete Lagrangian that approximates a Type I generating function given by Jacobi's solution of the Hamilton–Jacobi equation. Instead, we propose the use of Hamiltonian variational integrators [18], which are based on Type II and Type III generating functions that have no difficulty with this degeneracy.

After a brief introduction to variational integrators in Section 2.1, we will review the construction of Type II and Type III Hamiltonian Taylor variational integrators from [30] and present a new theorem concerning their order of accuracy in Section 2.2. We will then present a framework for variable time-step variational integrators in Section 3.1, derive corresponding error analysis results in Section 3.2, and test our approach with Hamiltonian Taylor variational integrators on Kepler's 2-body problem in Section 3.3. Finally, in Section 4, we will design efficient variational and non-variational explicit integrators for symplectic accelerated optimization, using our adaptive approach in the variational framework to accelerated optimization introduced in [36].

2. Hamiltonian Variational Integrators

2.1. Variational Integration. Variational integrators are derived by discretizing Hamilton's principle, instead of discretizing Hamilton's equations directly. As a result, variational integrators are symplectic, preserve many invariants and momentum maps, and have excellent long-time near-energy preservation (see [19]).

Type I: Traditionally, variational integrators have been designed based on the Type I generating function known as the discrete Lagrangian, L_d : Q \times Q \to R. The exact discrete Lagrangian of the true flow of Hamilton's equations can be represented in both a variational form and in a boundary-value form. The latter is given by

L_d^E(q_0, q_1; h) = \int_0^h L(q(t), \dot{q}(t)) \, dt,

where q(0) = q_0, q(h) = q_1, and q satisfies the Euler–Lagrange equations over the time interval [0, h]. A variational integrator is defined by constructing an approximation L_d : Q \times Q \to R to L_d^E, and then applying the discrete Euler–Lagrange equations,

p_k = -D_1 L_d(q_k, q_{k+1}), \qquad p_{k+1} = D_2 L_d(q_k, q_{k+1}),

which implicitly define the integrator F_{L_d} : (q_k, p_k) \mapsto (q_{k+1}, p_{k+1}), where D_i denotes the partial derivative with respect to the i-th argument. The error analysis is greatly simplified via Theorem 2.3.1 of [19], which states that if a discrete Lagrangian, L_d : Q \times Q \to R, approximates the exact discrete Lagrangian L_d^E : Q \times Q \to R to order r, i.e.,

L_d(q_0, q_1; h) = L_d^E(q_0, q_1; h) + O(h^{r+1}),


then the discrete Hamiltonian map F_{L_d} : (q_k, p_k) \mapsto (q_{k+1}, p_{k+1}), viewed as a one-step method, has order of accuracy r. Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian, as opposed to analyzing the integrator directly.
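To make this concrete, the following is a minimal sketch (our illustration, not code from the paper) of a Lagrangian variational integrator: for L(q, \dot{q}) = \frac{m}{2} \dot{q}^2 - V(q) and the rectangle-rule discrete Lagrangian L_d(q_0, q_1; h) = h [ \frac{m}{2} ((q_1 - q_0)/h)^2 - V(q_0) ], the discrete Euler–Lagrange equations can be solved explicitly and reproduce the symplectic Euler method.

    # A minimal sketch (ours): variational integrator from a rectangle-rule
    # discrete Lagrangian; the discrete Euler-Lagrange equations
    #   p_k = -D1 L_d(q_k, q_{k+1}),  p_{k+1} = D2 L_d(q_k, q_{k+1})
    # happen to be explicitly solvable for this choice of L_d.
    def variational_step(q0, p0, h, grad_V, m=1.0):
        # p0 = -D1 L_d = m*(q1 - q0)/h + h*grad_V(q0), solved for q1:
        q1 = q0 + (h / m) * (p0 - h * grad_V(q0))
        # p1 = D2 L_d = m*(q1 - q0)/h  (equals p0 - h*grad_V(q0)):
        p1 = m * (q1 - q0) / h
        return q1, p1

    # Usage on the harmonic oscillator V(q) = q**2/2:
    q, p = 1.0, 0.0
    for _ in range(1000):
        q, p = variational_step(q, p, 0.01, lambda q: q)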

More recently, variational integrators have been extended to the framework of Type II and Type III generating functions, commonly referred to as discrete Hamiltonians (see [15; 18; 29]). Hamiltonian variational integrators are derived by discretizing Hamilton's phase space principle.

Type II: The boundary-value formulation of the exact Type II generating function of the time-h flow of Hamilton's equations is given by the exact discrete right Hamiltonian,

H_d^{+,E}(q_0, p_1; h) = p_1^T q_1 - \int_0^h \left[ p(t)^T \dot{q}(t) - H(q(t), p(t)) \right] dt, \qquad (2.1)

where (q, p) satisfies Hamilton's equations with boundary conditions q(0) = q_0 and p(h) = p_1. A Type II Hamiltonian variational integrator is constructed by using an approximate discrete Hamiltonian H_d^+, and applying the discrete right Hamilton's equations,

p_0 = D_1 H_d^+(q_0, p_1), \qquad q_1 = D_2 H_d^+(q_0, p_1), \qquad (2.2)

which implicitly define the integrator F_{H_d^+} : (q_0, p_0) \mapsto (q_1, p_1).

Theorem 2.3.1 of [19], which simplified the error analysis for Lagrangian variational integrators, has an analogue for Hamiltonian variational integrators. Theorem 2.2 in [29] states that if a discrete right Hamiltonian H_d^+ approximates the exact discrete right Hamiltonian H_d^{+,E} to order r, i.e.,

H_d^+(q_0, p_1; h) = H_d^{+,E}(q_0, p_1; h) + O(h^{r+1}),

then the discrete right Hamiltonian map F_{H_d^+} : (q_k, p_k) \mapsto (q_{k+1}, p_{k+1}), viewed as a one-step method, is order r accurate.
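In code, the discrete right Hamilton's equations (2.2) turn any discrete right Hamiltonian into a one-step method. The sketch below is ours (not from the paper): D1_Hd and D2_Hd are user-supplied partial derivatives of an approximate H_d^+, and the first equation is solved numerically for p_1 since it is implicit in general.

    from scipy.optimize import fsolve

    # A minimal sketch (ours) of the discrete right Hamiltonian map (2.2).
    def type2_step(q0, p0, h, D1_Hd, D2_Hd):
        # Solve p0 = D1 Hd+(q0, p1; h) for p1 (implicit in general):
        p1 = fsolve(lambda p1: D1_Hd(q0, p1, h) - p0, p0)
        # Then q1 = D2 Hd+(q0, p1; h) is explicit:
        q1 = D2_Hd(q0, p1, h)
        return q1, p1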

Type III: The boundary-value formulation of the exact Type III generating function of the time-h flow of Hamilton's equations is given by the exact discrete left Hamiltonian,

H_d^{-,E}(q_1, p_0; h) = -p_0^T q_0 - \int_0^h \left[ p(t)^T \dot{q}(t) - H(q(t), p(t)) \right] dt, \qquad (2.3)

where (q, p) satisfies Hamilton's equations with boundary conditions q(h) = q_1 and p(0) = p_0. A Type III Hamiltonian variational integrator is constructed by using an approximate discrete left Hamiltonian H_d^-, and applying the discrete left Hamilton's equations,

p_1 = -D_1 H_d^-(q_1, p_0), \qquad q_0 = -D_2 H_d^-(q_1, p_0), \qquad (2.4)

which implicitly define the integrator F_{H_d^-} : (q_0, p_0) \mapsto (q_1, p_1). As mentioned in [29], the proof of Theorem 2.2 in [29] can easily be adjusted to prove an equivalent theorem for the discrete left Hamiltonian case, which states that if a discrete left Hamiltonian H_d^- approximates the exact discrete left Hamiltonian H_d^{-,E} to order r, i.e.,

H_d^-(q_1, p_0; h) = H_d^{-,E}(q_1, p_0; h) + O(h^{r+1}),

then the discrete left Hamiltonian map F_{H_d^-} : (q_k, p_k) \mapsto (q_{k+1}, p_{k+1}), viewed as a one-step method, is order r accurate.

Hamiltonian and Lagrangian variational integrators are not always equivalent. In particular, it was shown in [29] that even when the Hamiltonian and Lagrangian integrators are analytically equivalent, they might still have different numerical properties because of numerical conditioning issues. Even more to the point, Lagrangian variational integrators cannot always be constructed when the underlying Hamiltonian is degenerate, and in that situation, Hamiltonian variational integrators are the more natural choice. In Section 3, we will examine a transformation commonly used to construct variable time-step symplectic integrators, which in most cases of interest results in a degenerate Hamiltonian. We will apply Hamiltonian variational integrators to the resulting transformed Hamiltonian system.

Depending on the form of the Hamiltonian and the method used to design the corresponding approximate discrete Hamiltonian, one of the Type II or Type III approaches might be more appropriate or more convenient than the other, in the sense that it might allow for an explicit algorithm or might allow for higher-order methods given some constraints on the type of methods permitted. As an example, for the optimization application which will be presented in Section 4, we will prefer Type II Hamiltonian Taylor variational integrators to their Type III analogues, and this choice will be justified carefully based on the order and explicitness of the resulting methods.

Examples of Hamiltonian variational integrators include Galerkin variational integrators (see [18]), Prolongation-Collocation variational integrators (see [17]), and Taylor variational integrators (introduced in [30] and presented in the next section with new results about their order of accuracy).

2.2. Hamiltonian Taylor Variational Integrators (HTVIs). In this section, we present Type II and Type III Hamiltonian Taylor variational integrators (first introduced in [30]), together with a new theorem concerning their order of accuracy, which is analogous to Theorem 3.1 in [30] for their Type I Lagrangian counterpart. A discrete approximate Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Hamiltonian Taylor variational integrator is then generated by the discrete Hamilton's equations associated to that discrete Hamiltonian. More explicitly, we first construct the r-th order and (r+1)-th order Taylor methods \Psi_h^{(r)} and \Psi_h^{(r+1)} approximating the exact time-h flow map \Phi_h : T^*Q \to T^*Q. Let \pi_{T^*Q} : (q, p) \mapsto p and \pi_Q : (q, p) \mapsto q. Given a quadrature rule of order s with weights and nodes (b_i, c_i) for i = 1, ..., m, the Type II and Type III integrators are then constructed as follows:

Type II HTVI:
(1) Approximate p(0) by the solution \tilde{p}_0 of p_1 = \pi_{T^*Q} \circ \Psi_h^{(r)}(q_0, \tilde{p}_0).
(2) Approximate (q(c_i h), p(c_i h)) via (q_{c_i}, p_{c_i}) = \Psi_{c_i h}^{(r)}(q_0, \tilde{p}_0).
(3) Approximate q_1 via \tilde{q}_1 = \pi_Q \circ \Psi_h^{(r+1)}(q_0, \tilde{p}_0).
(4) Apply the continuous Legendre transform: \dot{q}_{c_i} = \frac{\partial H}{\partial p}(q_{c_i}, p_{c_i}).
(5) The quadrature rule then gives
H_d^+(q_0, p_1) = p_1^T \tilde{q}_1 - h \sum_{i=1}^m b_i \left[ p_{c_i}^T \dot{q}_{c_i} - H(q_{c_i}, p_{c_i}) \right].
(6) The discrete right Hamilton's equations (2.2) then define the variational integrator.

Type III HTVI:
(1) Approximate q(0) by the solution \tilde{q}_0 of q_1 = \pi_Q \circ \Psi_h^{(r+1)}(\tilde{q}_0, p_0).
(2) Approximate (q(c_i h), p(c_i h)) via (q_{c_i}, p_{c_i}) = \Psi_{c_i h}^{(r)}(\tilde{q}_0, p_0).
(3) Apply the continuous Legendre transform: \dot{q}_{c_i} = \frac{\partial H}{\partial p}(q_{c_i}, p_{c_i}).
(4) The quadrature rule then gives
H_d^-(q_1, p_0) = -p_0^T \tilde{q}_0 - h \sum_{i=1}^m b_i \left[ p_{c_i}^T \dot{q}_{c_i} - H(q_{c_i}, p_{c_i}) \right].
(5) The discrete left Hamilton's equations (2.4) then define the variational integrator.
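As a small worked instance (ours, not from the paper), carrying out the Type II steps above with r = 0, a single-node rectangle rule (c_1 = 0, b_1 = 1), and H(q, p) = \frac{1}{2} p^T M^{-1} p + V(q) gives \tilde{p}_0 = p_1, \tilde{q}_1 = q_0 + h M^{-1} p_1, and H_d^+(q_0, p_1) = p_1^T q_0 + \frac{h}{2} p_1^T M^{-1} p_1 + h V(q_0), whose discrete right Hamilton's equations (2.2) reduce to the symplectic Euler-B method:

    import numpy as np

    # Type II HTVI with r = 0 and rectangle-rule quadrature (our worked example):
    #   p0 = D1 Hd+ = p1 + h*grad_V(q0)   =>   p1 = p0 - h*grad_V(q0)
    #   q1 = D2 Hd+ = q0 + h*Minv @ p1
    def htvi_r0_step(q0, p0, h, grad_V, Minv):
        p1 = p0 - h * grad_V(q0)
        q1 = q0 + h * Minv @ p1
        return q1, p1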

We now present a theorem specifying the order of accuracy of the resulting variational integrators:

Theorem 2.1. If the Hamiltonian H and \partial H / \partial p are Lipschitz continuous in both variables, then the discrete Hamiltonian H_d^\pm(q_0, p_1; h), obtained using the above construction, approximates H_d^{\pm,E} with at least order of accuracy min(r+1, s). By Theorem 2.2 in [29] (or its analogue for the left Hamiltonian case), the associated discrete Hamiltonian map has the same order of accuracy.

Proof. See Appendix A. ∎


3. Adaptive Integrators and Variational Error Analysis

3.1. The Poincaré Transformation and Discrete Hamiltonians. Given a Hamiltonian H(q, p) and a desired transformation of time t \mapsto \tau, described by

\frac{dt}{d\tau} = g(q, p),

where g is a monitor function, a new Hamiltonian system is constructed using the Poincaré transformation,

\bar{H}(\bar{q}, \bar{p}) = g(q, p) \left( H(q, p) + p^t \right), \qquad (3.1)

where \bar{q} = (q, q^t) and \bar{p} = (p, p^t). We will make the common choice q^t = t and p^t = -H(q(0), p(0)), so that \bar{H}(\bar{q}, \bar{p}) = 0 along all integral curves through (\bar{q}(0), \bar{p}(0)). The time t shall be referred to as the physical time, while \tau will be referred to as the fictive time.
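As an illustration (ours), assuming an autonomous Hamiltonian H(q, p) and monitor function g(q, p) supplied as callables together with their derivatives, the right-hand side of Hamilton's equations for \bar{H} in the extended phase space can be assembled as follows; the expressions match the extended equations of motion written out explicitly in Section 3.2.

    import numpy as np

    # A minimal sketch (ours) of the extended vector field for
    # Hbar(qbar, pbar) = g(q, p) * (H(q, p) + pt).
    def poincare_vector_field(H, dHdq, dHdp, g, dgdq, dgdp):
        def rhs(qbar, pbar):
            q, qt = qbar[:-1], qbar[-1]
            p, pt = pbar[:-1], pbar[-1]
            E = H(q, p) + pt                    # vanishes along the exact flow
            qdot  = dgdp(q, p) * E + g(q, p) * dHdp(q, p)
            qtdot = g(q, p)                     # dt/dtau = g recovers physical time
            pdot  = -(dgdq(q, p) * E + g(q, p) * dHdq(q, p))
            ptdot = 0.0                         # Hbar does not depend on qt here
            return np.append(qdot, qtdot), np.append(pdot, ptdot)
        return rhs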

In general, along an integral curve through (\bar{q}(0), \bar{p}(0)),

\frac{\partial^2 \bar{H}}{\partial \bar{p}^2} = \begin{bmatrix} \frac{\partial H}{\partial p} \nabla_p g(q,p)^T + g(q,p) \frac{\partial^2 H}{\partial p^2} + \nabla_p g(q,p) \frac{\partial H}{\partial p}^T & \nabla_p g(q,p) \\ \nabla_p g(q,p)^T & 0 \end{bmatrix}, \qquad (3.2)

which can be singular for many initial Hamiltonians H and choices of monitor function g. Most of the prior literature on variable time-step symplectic integrators cited in this paper focuses exclusively on monitor functions that are only a function of position, in which case \partial^2 \bar{H} / \partial \bar{p}^2 is singular and the associated Legendre transformation, F\bar{H} : T^*\bar{Q} \to T\bar{Q}, is non-invertible, which is to say that the resulting transformed Hamiltonian is degenerate and there is no corresponding Lagrangian formulation. Therefore, the Type II and Type III Hamiltonian variational integrator frameworks are the most general and natural way to derive variable time-step variational integrators.

The exact Type II generating function for the transformed Hamiltonian is given by

\bar{H}_d^{+,E}(\bar{q}_0, \bar{p}_1; h) = \bar{p}_1^T \bar{q}_1 - \int_0^h \left( \bar{p}(\tau)^T \dot{\bar{q}}(\tau) - \bar{H}(\bar{q}(\tau), \bar{p}(\tau)) \right) d\tau,

where (\bar{q}(\tau), \bar{p}(\tau)) satisfy Hamilton's equations corresponding to the Poincaré transformed Hamiltonian \bar{H}, with boundary conditions \bar{q}(0) = \bar{q}_0 and \bar{p}(h) = \bar{p}_1. This exact discrete right Hamiltonian implicitly defines a symplectic map with respect to the symplectic form \bar{\omega}(\bar{p}_k, \bar{q}_k) on T^*\bar{Q} via the discrete Legendre transforms given by

\bar{p}_0 = \frac{\partial \bar{H}_d^{+,E}}{\partial \bar{q}_0}, \qquad \bar{q}_1 = \frac{\partial \bar{H}_d^{+,E}}{\partial \bar{p}_1}.

Similarly, the exact Type III generating function for the transformed Hamiltonian is given by

\bar{H}_d^{-,E}(\bar{q}_1, \bar{p}_0; h) = -\bar{p}_0^T \bar{q}_0 - \int_0^h \left( \bar{p}(\tau)^T \dot{\bar{q}}(\tau) - \bar{H}(\bar{q}(\tau), \bar{p}(\tau)) \right) d\tau,

where (\bar{q}(\tau), \bar{p}(\tau)) satisfy Hamilton's equations corresponding to the Poincaré transformed Hamiltonian \bar{H}, with boundary conditions \bar{q}(h) = \bar{q}_1 and \bar{p}(0) = \bar{p}_0. This exact discrete left Hamiltonian implicitly defines a symplectic map with respect to the symplectic form \bar{\omega}(\bar{p}_k, \bar{q}_k) on T^*\bar{Q} via the discrete Legendre transforms given by

\bar{p}_1 = -\frac{\partial \bar{H}_d^{-,E}}{\partial \bar{q}_1}, \qquad \bar{q}_0 = -\frac{\partial \bar{H}_d^{-,E}}{\partial \bar{p}_0}.

Our approach is to construct Hamiltonian variational integrators using a discrete Hamiltonian \bar{H}_d^\pm that approximates the corresponding exact discrete Hamiltonian \bar{H}_d^{\pm,E} to order r. The resulting integrator will be symplectic with constant step size in fictive time \tau and, more importantly, with the desired variable step size in physical time t via dt/d\tau = g(q, t, p). It is important to note that this method will be symplectic in two different ways. It will be symplectic both with respect to the symplectic form d\bar{p} \wedge d\bar{q} and with respect to the symplectic form dp \wedge dq. Since \dot{p}^t = 0, p^t is constant and dp_k^t \wedge dq_k^t = 0, so the symplectic form in generalized coordinates is given by

\bar{\omega}(\bar{p}_k, \bar{q}_k) = d\bar{p}_k \wedge d\bar{q}_k = \sum_{i=1}^{n+1} d\bar{p}_{k,i} \wedge d\bar{q}_{k,i} = \sum_{i=1}^{n} dp_{k,i} \wedge dq_{k,i} = \omega(p_k, q_k).

A symplectic variable time-step method was proposed independently in [9] and [27], which applied a symplectic integrator to the Poincaré transformed Hamiltonian. In [9], it is noted that one of the first applications of the Poincaré transformation was by Levi-Civita, who applied it to the three-body problem. A more in-depth discussion of such time transformations can be found in [32]. Further work using this type of transformation has been published, such as [2; 3], which focused on developing symplectic, explicit, splitting methods with variable time-steps.

The novelty of our approach consists in discretizing the Type II or Type III generating function for the flow of Hamilton's equations, where the Hamiltonian is given by the Poincaré transformation. Therefore, we are constructing variational integrators, and in particular Hamiltonian variational integrators (see [15; 18]). This approach works seamlessly with existing methods and theorems of Hamiltonian variational integrators, but now the system under consideration is the transformed Hamiltonian system resulting from the Poincaré transformation. The methods of [9; 27] include the possibility of applying a given variational integrator to the transformed differential equations. Our approach gives a framework for constructing variational integrators at the level of the generating function by using the Poincaré transformed discrete right Hamiltonian. In most cases, these two approaches will produce equivalent integrators, but our new approach allows for the method to be analyzed at the level of the generating function, and indicates that most such symplectic methods are best interpreted as coming from a Type II or Type III Hamiltonian generating function, as opposed to a Type I Lagrangian generating function.

Remark. Other approaches to variable time-step variational integrators can be found in [14; 20; 21]. In particular, [14] is inspired by the result of [7], which states that constant time-step symplectic integrators of autonomous Hamiltonian systems cannot exactly conserve the energy unless they agree with the exact flow map up to a time reparametrization. Therefore, they sought a variable time-step energy-conserving symplectic integrator in an expanded non-autonomous system. However, symplecticity is with respect to the space-time symplectic form dp \wedge dq + dH \wedge dt. The time-step is determined by enforcing discrete energy conservation, which arises as a consequence of the fact that energy is the Noether quantity associated with time translational symmetry. An extended Hamiltonian is used, similar in spirit to the Poincaré transformation. An approach that builds off this idea and space-time symplecticity was presented in [21], and a less constrained choice of time-step was allowed. In [20], adaptive variational integrators are constructed using a transformation of the Lagrangian, which is motivated by the Poincaré transformation, but is not equivalent to it. The lack of equivalence is not surprising, since the Poincaré transformed Hamiltonian is degenerate for their choice of monitor functions. As a consequence, the phase space path is not preserved.

Note that our framework can be extended to the case where the original Hamiltonian H and the chosen monitor function g depend explicitly on the time t (inspired by [9]). Given a Hamiltonian H(q, t, p), consider a desired transformation of time t \mapsto \tau, given by dt/d\tau = g(q, t, p). Then, we can define \bar{q} = (q, q^t) where q^t = t, and \bar{p} = (p, p^t) where p^t is the conjugate momentum for q^t, with p^t(0) = -H(q(0), 0, p(0)). Consider the new Hamiltonian system given by the Poincaré transformation

\bar{H}(\bar{q}, \bar{p}) = g(q, q^t, p) \left( H(q, q^t, p) + p^t \right). \qquad (3.3)


The corresponding equations of motion in the extended phase space are then given by

\dot{\bar{q}} = \frac{\partial \bar{H}}{\partial \bar{p}}, \qquad \dot{\bar{p}} = -\frac{\partial \bar{H}}{\partial \bar{q}}.

Suppose (\bar{Q}(\tau), \bar{P}(\tau)) are solutions to these extended equations of motion, and let (q(t), p(t)) solve Hamilton's equations for the original Hamiltonian H. Then

\bar{H}(\bar{Q}(\tau), \bar{P}(\tau)) = \bar{H}(\bar{Q}(0), \bar{P}(0)) = 0.

Thus, the components (Q(\tau), P(\tau)) in the original phase space of the solutions (\bar{Q}(\tau), \bar{P}(\tau)) satisfy

H(Q(\tau), \tau, P(\tau)) = -p^t(\tau), \qquad H(Q(0), 0, P(0)) = -p^t(0) = H(q(0), 0, p(0)).

Then, (Q(\tau), P(\tau)) and (q(t), p(t)) both satisfy Hamilton's equations for the original Hamiltonian H with the same initial values, so they must be the same. As before,

\frac{\partial^2 \bar{H}}{\partial \bar{p}^2} = \begin{bmatrix} \frac{\partial H}{\partial p} \nabla_p g(q,p)^T + g(q,p) \frac{\partial^2 H}{\partial p^2} + \nabla_p g(q,p) \frac{\partial H}{\partial p}^T & \nabla_p g(q,p) \\ \nabla_p g(q,p)^T & 0 \end{bmatrix}

will be singular in many cases, so Hamiltonian variational integrators are the most general and natural way to derive variable time-step variational integrators.

3.2. Variational Error Analysis. The standard error analysis for Hamiltonian variational integrators assumes a non-degenerate Hamiltonian, i.e., det( \partial^2 H / \partial p^2 ) \neq 0 (see [29]), which might not be the case for the Poincaré transformed Hamiltonian. The non-degeneracy of the Hamiltonian ensures that we can apply the usual implicit function theorem to the discrete Hamilton's equations, and the proof of the standard error analysis theorem relies upon Lemma 2.3 of [29]:

Lemma 3.1. Let f_1, g_1, e_1, f_2, g_2, e_2 \in C^r (r-times continuously differentiable) be such that

f_1(x, h) = g_1(x, h) + h^{r+1} e_1(x, h), \qquad f_2(x, h) = g_2(x, h) + h^{r+1} e_2(x, h).

Then, there exist functions e_{12} and \bar{e}_1, bounded on compact sets, such that

f_2(f_1(x, h), h) = g_2(g_1(x, h), h) + h^{r+1} e_{12}(g_1(x, h), h),
f_1^{-1}(y) = g_1^{-1}(y) + h^{r+1} \bar{e}_1(y).

Given a discrete Hamiltonian H_d^\pm, we introduce the discrete fiber derivatives (or discrete Legendre transforms), F^\pm H_d^\pm,

F^+ H_d^+ : (q_0, p_1) \mapsto (D_2 H_d^+(q_0, p_1), p_1), \qquad F^+ H_d^- : (q_1, p_0) \mapsto (q_1, -D_1 H_d^-(q_1, p_0)),
F^- H_d^+ : (q_0, p_1) \mapsto (q_0, D_1 H_d^+(q_0, p_1)), \qquad F^- H_d^- : (q_1, p_0) \mapsto (-D_2 H_d^-(q_1, p_0), p_0).

We observe that the following diagrams commute: in the Type II case, (q_0, p_1) maps to (q_0, p_0) under F^- H_d^+ and to (q_1, p_1) under F^+ H_d^+, with the discrete Hamiltonian map F_{H_d^+} : (q_0, p_0) \mapsto (q_1, p_1) closing the triangle; in the Type III case, (q_1, p_0) maps to (q_0, p_0) under F^- H_d^- and to (q_1, p_1) under F^+ H_d^-, with F_{H_d^-} : (q_0, p_0) \mapsto (q_1, p_1) closing the triangle.

As such, the discrete left and right Hamiltonian maps can be expressed in terms of the discrete fiber derivatives,

F_{H_d^\pm}(q_0, p_0) = F^+ H_d^\pm \circ (F^- H_d^\pm)^{-1} (q_0, p_0) = (q_1, p_1),

and this observation, together with Lemma 3.1, ensures that the order of accuracy of the integrator is at least the order to which the discrete Hamiltonian H_d^\pm approximates the exact discrete Hamiltonian H_d^{\pm,E}.


However, the Poincaré transformed Hamiltonian might be degenerate, so we cannot apply the usual implicit function theorem, and we need to establish the invertibility of F^- \bar{H}_d^\pm in a different way. The strongest general result we have been able to establish involves the case where the original Hamiltonian is autonomous, i.e., H = H(q, p), and nondegenerate, and the monitor function is autonomous as well. These assumptions hold for an interesting and useful class of problems, and we will show that the exact discrete left and right Hamiltonians can be reduced to a particular form and that the extended variables p_1^t and q_1^t can be solved for explicitly. As a result, the implicit function theorem is not needed with respect to these variables. Hamilton's equations for the transformed Hamiltonian \bar{H}(\bar{q}, \bar{p}) = g(q, p) ( H(q, p) + p^t ) are

\dot{\bar{q}} = \begin{bmatrix} \nabla_p g(q,p) (H(q,p) + p^t) + \frac{\partial H}{\partial p} g(q,p) \\ g(q,p) \end{bmatrix}, \qquad \dot{\bar{p}} = -\begin{bmatrix} \nabla_q g(q,p) (H(q,p) + p^t) + \frac{\partial H}{\partial q} g(q,p) \\ 0 \end{bmatrix}.

Using these equations, the corresponding exact discrete Hamiltonians are of the form

\bar{H}_d^{+,E}(\bar{q}_0, \bar{p}_1; h) = p_1^T q_1 + p_1^t q_1^t - \int_0^h \left( p(\tau)^T \dot{q}(\tau) - g(q(\tau), p(\tau)) H(q(\tau), p(\tau)) \right) d\tau,

\bar{H}_d^{-,E}(\bar{q}_1, \bar{p}_0; h) = -p_0^T q_0 - p_0^t q_0^t - \int_0^h \left( p(\tau)^T \dot{q}(\tau) - g(q(\tau), p(\tau)) H(q(\tau), p(\tau)) \right) d\tau.

As a result, only one part of these exact discrete left and right Hamiltonians requires approximations of the extended variables q^t and p^t. Furthermore, \dot{p}^t = 0 implies that p_1^t = p_0^t.

Now, let \bar{H}_d^\pm be approximations to the exact discrete left and right Hamiltonians of the form

\bar{H}_d^+(\bar{q}_0, \bar{p}_1; h) = p_1^T \tilde{q}_1(q_0, p_1; h) + p_1^t \tilde{q}_1^t(q_0^t, q_0, p_1; h) - I_1(q_0, p_1; h),
\bar{H}_d^-(\bar{q}_1, \bar{p}_0; h) = -p_0^T \tilde{q}_0(q_1, p_0; h) - p_0^t \tilde{q}_0^t(q_1^t, q_1, p_0; h) - I_2(q_1, p_0; h),

where \tilde{\cdot} denotes an approximation, and where I_1(q_0, p_1; h) and I_2(q_1, p_0; h) both approximate

\int_0^h \left( p(\tau)^T \dot{q}(\tau) - g(q(\tau), p(\tau)) H(q(\tau), p(\tau)) \right) d\tau.

Then, the discrete right Legendre transforms give the following relations for p_1^t and q_1^t,

\begin{bmatrix} p_0 \\ p_0^t \end{bmatrix} = \begin{bmatrix} \frac{\partial \tilde{q}_1}{\partial q_0}^T p_1 + p_1^t \frac{\partial \tilde{q}_1^t}{\partial q_0} - \frac{\partial I_1}{\partial q_0} \\ \frac{\partial \tilde{q}_1^t}{\partial q_0^t} p_1^t \end{bmatrix}, \qquad \begin{bmatrix} q_1 \\ q_1^t \end{bmatrix} = \begin{bmatrix} \tilde{q}_1 + \frac{\partial \tilde{q}_1}{\partial p_1}^T p_1 + \frac{\partial \tilde{q}_1^t}{\partial p_1}^T p_1^t - \frac{\partial I_1}{\partial p_1} \\ \tilde{q}_1^t \end{bmatrix}.

Now, since the analytic solution satisfies p_1^t = p_0^t, there is no need to approximate p_1^t. Therefore, \partial \tilde{q}_1^t / \partial q_0^t = 1. The resulting two systems can both be solved by first setting p_1^t = p_0^t, then implicitly solving for p_1 in terms of (q_0^t, q_0, p_1^t, p_0), explicitly solving for q_1, and finally explicitly solving for q_1^t. Since p_1 is not determined by q_1^t, the implicit function theorem is only needed for finding p_1. Therefore, we need det( \partial^2 \bar{H} / \partial p^2 ) \neq 0, and from equation (3.2), this is the same as

det\left( \frac{\partial H}{\partial p} \nabla_p g(q,p)^T + g(q,p) \frac{\partial^2 H}{\partial p^2} + \nabla_p g(q,p) \frac{\partial H}{\partial p}^T \right) \neq 0.

Note that this holds for nondegenerate Hamiltonians H and p-independent monitor functions. Similarly, the discrete left Legendre transforms give the following relations,

\begin{bmatrix} p_1 \\ p_1^t \end{bmatrix} = \begin{bmatrix} \frac{\partial \tilde{q}_0}{\partial q_1}^T p_0 + p_0^t \frac{\partial \tilde{q}_0^t}{\partial q_1} + \frac{\partial I_2}{\partial q_1} \\ \frac{\partial \tilde{q}_0^t}{\partial q_1^t} p_0^t \end{bmatrix}, \qquad \begin{bmatrix} q_0 \\ q_0^t \end{bmatrix} = \begin{bmatrix} \tilde{q}_0 + \frac{\partial \tilde{q}_0}{\partial p_0}^T p_0 + \frac{\partial \tilde{q}_0^t}{\partial p_0}^T p_0^t + \frac{\partial I_2}{\partial p_0} \\ \tilde{q}_0^t \end{bmatrix},

which can be solved provided that

det\left( \frac{\partial H}{\partial p} \nabla_p g(q,p)^T + g(q,p) \frac{\partial^2 H}{\partial p^2} + \nabla_p g(q,p) \frac{\partial H}{\partial p}^T \right) \neq 0.

The results that we have established are summarized in the following theorem:


Theorem 3.1. Suppose we are given a nondegenerate Hamiltonian H and a monitor function g \in C^1 such that

det\left( \frac{\partial H}{\partial p} \nabla_p g(q,p)^T + g(q,p) \frac{\partial^2 H}{\partial p^2} + \nabla_p g(q,p) \frac{\partial H}{\partial p}^T \right) \neq 0.

Then, if the discrete Hamiltonian \bar{H}_d^\pm approximates the exact discrete Hamiltonian \bar{H}_d^{\pm,E} to order r, i.e.,

\bar{H}_d^\pm(\bar{q}_0, \bar{p}_1; h) = \bar{H}_d^{\pm,E}(\bar{q}_0, \bar{p}_1; h) + O(h^{r+1}),

then the discrete Hamiltonian map F_{\bar{H}_d^\pm} : (\bar{q}_k, \bar{p}_k) \mapsto (\bar{q}_{k+1}, \bar{p}_{k+1}), viewed as a one-step method, is order r accurate.

Remark 3.1. It should be noted that the assumptions that the original Hamiltonian is nondegenerate and autonomous fail to hold in the application we consider of time-adaptive variational integrators to the discretization of the Bregman Hamiltonian associated with accelerated optimization, since that Hamiltonian is time-dependent. This is unavoidable, as it models a system with dissipation, which cannot be described by an autonomous Hamiltonian: the Hamiltonian would otherwise be an integral of motion, as it is the Noether quantity associated with time translational symmetry. In the cases when the original Hamiltonian is degenerate or nonautonomous, we need to analyze the solvability of the discrete Hamiltonian equations on a case-by-case basis, but as we demonstrate, this can be done in the case of the Bregman Hamiltonian with the given choices of monitor function g(t) and discrete Hamiltonians that we consider.

3.3. Numerical Tests on Kepler's Planar 2-Body Problem. We will now demonstrate the approach using the Hamiltonian Taylor variational integrators presented in Section 2.2 on Kepler's planar 2-body problem. For a lucid exposition, we will at first assume g(q, p) = g(q) and H(q, p) = \frac{1}{2} p^T M^{-1} p + V(q). Consider the discrete right Hamiltonian given by approximating q_1 with a first-order Taylor method about q_0, approximating p_0 with a zeroth-order Taylor expansion about p_0, and using the rectangular quadrature rule about the initial point:

\bar{H}_d^+ = p_1^T \left( q_0 + \frac{1}{2} h g(q_0) M^{-1} p_1 \right) + p_1^t \left( q_0^t + h g(q_0) \right) + h g(q_0) V(q_0).

The corresponding variational integrator is given by

\bar{p}_1 = \begin{bmatrix} p_0 - h g(q_0) \nabla V(q_0) - h \nabla g(q_0) \left( \frac{1}{2} p_1^T M^{-1} p_1 + V(q_0) + p_0^t \right) \\ p_0^t \end{bmatrix}, \qquad \bar{q}_1 = \begin{bmatrix} q_0 + h g(q_0) M^{-1} p_1 \\ q_0^t + h g(q_0) \end{bmatrix}. \qquad (3.4)

This integrator is merely symplectic Euler-B applied to the transformed Hamiltonian system,

\bar{q}_1 = \bar{q}_0 + h \frac{\partial \bar{H}(\bar{q}_0, \bar{p}_1)}{\partial \bar{p}}, \qquad \bar{p}_1 = \bar{p}_0 - h \frac{\partial \bar{H}(\bar{q}_0, \bar{p}_1)}{\partial \bar{q}}.

In fact, this is precisely the adaptive symplectic integrator first proposed in [9] and also presented on page 254 of [16]. Most existing symplectic integrators can be interpreted as variational integrators, but there are also new methods that are most naturally derived as variational integrators. We will also consider a fourth-order Hamiltonian Taylor variational integrator (HTVI), which is distinct from any existing symplectic method.

One of the most important aspects of implementing a variable time-step symplectic integrator of this form is a well-chosen monitor function, g(q). We need g to be positive-definite, so that we never stall or march backward in time. Noting that the above integrator is first-order, a natural choice is to use the second-order truncation error, given by -\frac{(q_1^t - q_0^t)^2}{2} M^{-1} \nabla V(q_0). Let tol be some desired level of accuracy; then one choice for g would be

g(q_0) = \frac{tol}{\left\| \frac{(q_1^t - q_0^t)^2}{2} g(q_0) M^{-1} \nabla V(q_0) \right\|} = \frac{tol}{\left\| \frac{h^2 g(q_0)^3}{2} M^{-1} \nabla V(q_0) \right\|},


since q_1^t - q_0^t = h g(q_0). Thus,

g(q_0) = \left( \frac{tol}{\left\| \frac{h^2}{2} M^{-1} \nabla V(q_0) \right\|} \right)^{1/4}.

Experimentally, the 4th root did not affect the results very much, but required messier computations, which is why we have chosen the simpler yet very similar monitor function

g(q_0) = \frac{tol}{\left\| \frac{h^2}{2} M^{-1} \nabla V(q_0) \right\|}, \qquad (3.5)

which achieves an error comparable to the chosen value of tol. Alternative choices of g, proposed in [9], include the p-independent arclength parametrization

g(q) = \left( 2(H_0 - V(q)) + \nabla V(q)^T M^{-1} \nabla V(q) \right)^{-1/2}, \qquad (3.6)

and a choice particular to Kepler's two-body problem,

g(q) = q^T q, \qquad (3.7)

which is motivated by Kepler's second law, which states that a line segment joining the two bodies sweeps out equal areas during equal intervals of time.

We have tested the algorithm given by equation (3.4) on Kepler's planar two-body problem, with an eccentricity of 0.9, using the three choices of monitor function g given by (3.5), (3.6), and (3.7). Of these three choices, (3.7) is specific to Kepler's two-body problem, while (3.5) and (3.6) are more general choices. Unlike (3.6), which is independent of the order of the method, (3.5) is based on the truncation error, and thus the corresponding cost of computing this function will increase as the order of the method increases. Simulations using Kepler's two-body problem with an eccentricity of 0.9 over a time interval of [0, 1000] were run using the three different choices of g and the usual symplectic Euler-B. Results indicate that symplectic Euler-B takes the most steps and computational time to achieve a level of accuracy around 10^{-5}. To achieve this level of accuracy, the choice of the truncation error monitor function, (3.5), resulted in the least number of steps and the second lowest computational time. The lowest computational time belonged to (3.7), but it used significantly more steps than (3.5). The lower computational cost can be attributed to the cheaper evaluation cost of the monitor function and its derivative. Finally, the monitor function (3.6) required the most steps and computational time of the adaptive algorithms, but it is still a good choice in general given its broad applicability. See Figures 1 and 2, which present the energy and angular momentum errors for the fixed time-step method vs. the adaptive time-step method, and the step sizes for the different monitor functions, respectively.
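For reference, here is a minimal sketch (ours, not the authors' code) of the adaptive integrator (3.4) applied to this problem with the Kepler monitor function (3.7) and M = I; the initial condition below is our own choice of an eccentricity-0.9 orbit starting at aphelion, and the implicit p_1-update is resolved with a few fixed-point iterations.

    import numpy as np

    def V(q):      return -1.0 / np.linalg.norm(q)       # Kepler potential
    def gradV(q):  return q / np.linalg.norm(q)**3
    def g(q):      return q @ q                          # monitor function (3.7)
    def gradg(q):  return 2.0 * q

    def adaptive_step(q, p, qt, pt, h):
        # The p1-update of (3.4) is implicit through p1.p1; fixed-point iterate:
        p1 = p.copy()
        for _ in range(5):
            p1 = p - h * g(q) * gradV(q) \
                   - h * gradg(q) * (0.5 * (p1 @ p1) + V(q) + pt)
        q1  = q + h * g(q) * p1
        qt1 = qt + h * g(q)        # physical time advances by h*g(q)
        return q1, p1, qt1, pt     # pt stays constant

    # Eccentricity-0.9 orbit starting at aphelion (our choice of initial data):
    q, p   = np.array([1.9, 0.0]), np.array([0.0, np.sqrt(0.1 / 1.9)])
    qt, pt = 0.0, -(0.5 * (p @ p) + V(q))
    while qt < 100.0:
        q, p, qt, pt = adaptive_step(q, p, qt, pt, 0.001)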

Next, we consider a Type II fourth-order Hamiltonian Taylor variational integrator constructed using the strategy from Section 2.2. We will now drop the assumption of p-independent monitor functions and consider g(q, p). The following monitor functions were considered:

g(q) = (q^T q)^\gamma \text{ for } \gamma = \frac{1}{2}, 1, \qquad (Gamma) \qquad (3.8)
g(q) = \left( 2(H_0 - V(q)) + \nabla V(q)^T M^{-1} \nabla V(q) \right)^{-1/2}, \qquad (Arclength) \qquad (3.9)
g(q, p) = \| p^t - L(q, M^{-1} p) \|^{-1/2}. \qquad (Energy) \qquad (3.10)

The monitor function (3.10) was originally intended to be \| p^t + H(q, p) \|^{-1/2}, but experimental results suggested that (3.10) is the better choice. We will discuss the shortcomings of using the inverse energy error in the next paragraph. Note that \| L(q, M^{-1} p) \|^{-1/2} also performs decently, but the addition of p^t = -H(q_0, p_0) showed noticeable improvement. It was noted in [9] that the inverse Lagrangian had been considered as a possible choice for g in the Poincaré transformation, but not in the framework of symplectic integration. While the choice of (3.8) was generally the most efficient, (3.10) was very close in terms of efficiency and offers a more general monitor function.

[Figure 1: panels (a) Symplectic Euler-B and (b) adaptive method with monitor function (3.5), each showing the energy error and angular momentum error over time.] Figure 1. Symplectic Euler-B and the adaptive algorithm with monitor function given by equation (3.5) were applied to Kepler's planar two-body problem over a time interval of [0, 100] with an eccentricity of 0.9.

[Figure 2: three plots of step size against step number.] Figure 2. Time-steps taken for the various choices of monitor functions. The top, middle, and bottom plots correspond to (3.5), (3.6), and (3.7), respectively. All of the monitor functions appear to increase and decrease the time-step at the same points along the trajectory, but clearly (3.5) allowed for larger steps to be taken.

This also implies that efficiency is not limited to only q- or p-independent monitor functions. However, various attempts to construct separable transformed Hamiltonians (see [2; 3]) required the use of q- or p-independent monitor functions, so this is where such monitor functions are most useful. Figure 3 displays the time-steps taken for the different choices of monitor functions for this fourth-order Hamiltonian Taylor variational integrator.

[Figure 3: step size against step number for the three monitor functions.] Figure 3. Time-steps taken for the various choices of monitor functions. The energy (3.10) and gamma (3.8) monitor functions performed better, in terms of fewest steps, lowest computational cost, and lowest global error, than the arclength monitor function (3.9). Note that (3.10) took neither the largest nor the smallest steps.

The truncation error monitor function, (3.5), performed quite well for first-order methods, and this motivated the choice of using Taylor variational integrators, since derivatives would be readily available. However, its success does not carry over as easily to higher-order methods. This is due to the fact that for higher-order truncation errors, one obtains an implicit differential-algebraic definition of the monitor function. This deviates from the first-order case, where the monitor function can be solved for explicitly. Another seemingly natural choice for the monitor function is the inverse of the energy error. However, Taylor variational integrators are constructed using Taylor expansions about the initial point, and consequently the monitor function is mostly evaluated at or near the initial point. If the initial point is at a particularly tricky part of the dynamics and requires a small first step, then the energy error at the first step will not reflect this, since initially the energy error is zero. In contrast, the inverse Lagrangian will be small at an initial point that requires a small first step. The inverse energy error may work well for methods that primarily evaluate the energy error at the end point rather than the initial point.

It is also often advantageous to bound the time-step below or above. As noted in [16, page 248], this can be done by setting a = \Delta t_{min} / \Delta\tau and b = \Delta t_{max} / \Delta\tau, and defining the new monitor function as

\tilde{g} = \frac{b g + a}{g + b}. \qquad (3.11)
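In code, this is a one-line wrapper (our sketch); for g \geq 0, the expression (3.11) tends to a/b as g \to 0 and to b as g \to \infty, so the fictive step stays bounded.

    # A minimal sketch (ours) of the bounded monitor function (3.11):
    # wraps an existing monitor function g in the transformation g~ = (b*g + a)/(g + b).
    def bounded_monitor(g, a, b):
        return lambda *args: (b * g(*args) + a) / (g(*args) + b)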

Tables 1 and 2 display a comparison of bounds, computational time, steps, and error. We note that for methods such as the Taylor variational integrator, bounding g(q, p) does bound the step size, but not directly. Also, compared to non-adaptive variational integrators, the adaptive methods showed a significant gain in efficiency for Kepler's planar 2-body problem with high eccentricity, while low-eccentricity models do not need, nor do they benefit from, adaptivity. A Hamiltonian dynamical system with regions of high curvature in the vector field and its norm will in general benefit from an adaptive scheme such as the one outlined here.

Method   Monitor g(q,p)   h        min Step   max Step   min g    max g   Energy Error   Global Error   Steps   Time
HTVI4    Gamma            0.1      0.0020     0.2493     0.01     8       1.43E-05       7.09E-06       181     26.9
HTVI4    Energy           0.1      0.0051     0.1809     0.0001   2       1.93E-06       4.76E-06       146     28.3
HTVI4    Arclength        0.1      0.0040     0.1458     0.003    0.3     1.10E-04       3.69E-05       185     70.2
HTVI4    -                0.0025   0.0025     0.0025     -        -       2.50E-06       2.89E-05       4000    120

Table 1. Kepler Planar 2-Body Problem, Eccentricity = 0.9

Method   Monitor g(q,p)   h        min Step   max Step   min g    max g   Energy Error   Global Error   Steps   Time
HTVI4    Gamma            0.1      0.00006    0.2648     0.0005   8       4.88E-05       5.60E-06       372     49.3
HTVI4    Energy           0.03     0.00015    0.1462     1E-6     5       9.13E-06       4.63E-06       383     58.4
HTVI4    Arclength        0.1      0.00005    0.1379     0.0008   10      1.31E-05       1.49E-05       691     146.0
HTVI4    -                0.0005   0.0005     0.0005     -        -       1.38E-01       7.83E-01       20000   525.2
SV       -                5E-7     5E-7       5E-7       -        -       3.34E-06       2.68E-05       2E7     189.2

Table 2. Kepler Planar 2-Body Problem, Eccentricity = 0.99


4. Application to Symplectic Accelerated Optimization

4.1. Accelerated Optimization. Efficient optimization has become one of the major concerns in data analysis. Many machine learning algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of the data sets and the size of the problems, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. The first gradient descent algorithm was proposed by Cauchy in [6] to deal with the very large systems of equations he was facing when trying to simulate orbits of celestial bodies, and many gradient-based optimization methods have been proposed since Cauchy's work in 1847. In 1983, Nesterov's Accelerated Gradient method was introduced in [23],

x_k = y_{k-1} - h \nabla f(y_{k-1}), \qquad y_k = x_k + \frac{k-1}{k+2} (x_k - x_{k-1}), \qquad (4.1)

which converges in O(1/k^2) to the minimum of the convex objective function f, improving on the O(1/k) convergence rate exhibited by standard gradient descent methods. This O(1/k^2) convergence rate was shown in [24] to be optimal among first-order methods using only information about \nabla f at consecutive iterates. The phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov's algorithm, such as accelerated mirror descent [22] and the accelerated cubic-regularized Newton's method [25]. More recently, it was shown in [33] that Nesterov's Accelerated Gradient method limits to the second-order ODE,

\ddot{x}(t) + \frac{3}{t} \dot{x}(t) + \nabla f(x(t)) = 0,

as h \to 0. The authors also proved that the objective function f(x(t)) converges to its optimal value at a rate of O(1/t^2) along the trajectories of this ODE. It was then shown in [36] that, in continuous time, the convergence rate of f(x(t)) can be accelerated to an arbitrary convergence rate O(1/t^p) by considering flow maps generated by time-dependent Lagrangian and Hamiltonian systems. We will present this result in more detail in the next section, together with the variational framework introduced in [36] for accelerated optimization, which will be at the heart of our approach.
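For concreteness, here is a minimal sketch of the update (4.1) (ours; the quadratic test objective is our own choice, not from the paper):

    import numpy as np

    def nag(grad_f, x0, h, num_iters):
        # Nesterov's Accelerated Gradient method, equation (4.1).
        x_prev = x0.copy()
        y = x0.copy()
        for k in range(1, num_iters + 1):
            x = y - h * grad_f(y)                         # gradient step at lookahead point
            y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)  # momentum extrapolation
            x_prev = x
        return x

    # Usage on f(x) = 2*|x|^2, so grad_f(x) = 4x:
    x_star = nag(lambda x: 4.0 * x, np.array([1.0, -2.0]), h=0.1, num_iters=200)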

4.2. Variational Framework for Accelerated Optimization. In this section, we will review the variational framework introduced in [36] for accelerated optimization, which will be the basis for the methods we will design. In a general space X, given a convex, continuously differentiable function h : X \to R such that \| \nabla h(x) \| \to \infty as \| x \| \to \infty, its corresponding Bregman divergence is given by

D_h(x, y) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle.

We then define the Bregman Lagrangian and Hamiltonian,

L_{\alpha,\beta,\gamma}(x, v, t) = e^{\alpha(t)+\gamma(t)} \left[ D_h(x + e^{-\alpha(t)} v, x) - e^{\beta(t)} f(x) \right], \qquad (4.2)
H_{\alpha,\beta,\gamma}(x, r, t) = e^{\alpha(t)+\gamma(t)} \left[ D_{h^*}(\nabla h(x) + e^{-\gamma(t)} r, \nabla h(x)) + e^{\beta(t)} f(x) \right], \qquad (4.3)

which are scalar-valued functions of position x \in X, velocity v \in R^d, momentum r \in R^d, and time t, and which are parametrized by smooth functions of time, \alpha, \beta, \gamma, where h^*(r) = \sup_v \left[ \langle r, v \rangle - h(v) \right] is the Legendre transform (or convex dual function) of h. These parameters \alpha, \beta, \gamma are said to satisfy the ideal scaling conditions if

\dot{\beta}(t) \leq e^{\alpha(t)} \quad \text{and} \quad \dot{\gamma}(t) = e^{\alpha(t)}. \qquad (4.4)

A very important property of this family of Bregman Lagrangians is its closure under time dilation, proven in [36]:


Theorem 4.1. If x(t) satisfies the Euler–Lagrange equations corresponding to the Bregman Lagrangian L_{\alpha,\beta,\gamma}, then the reparametrized curve y(t) = x(\tau(t)) satisfies the Euler–Lagrange equations corresponding to the Bregman Lagrangian L_{\tilde{\alpha},\tilde{\beta},\tilde{\gamma}}, where

\tilde{\alpha}(t) = \alpha(\tau(t)) + \log \dot{\tau}(t), \qquad \tilde{\beta}(t) = \beta(\tau(t)), \qquad \tilde{\gamma}(t) = \gamma(\tau(t)), \qquad (4.5)

and

L_{\tilde{\alpha},\tilde{\beta},\tilde{\gamma}}(x, v, t) = \dot{\tau}(t) L_{\alpha,\beta,\gamma}\left( x, \frac{v}{\dot{\tau}(t)}, \tau(t) \right). \qquad (4.6)

Furthermore, \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma} satisfy the ideal scaling conditions (4.4) if and only if \alpha, \beta, \gamma do.

A subfamily of Bregman Lagrangians of interest, indexed by a parameter p > 0, is given by the choice of functions

\alpha(t) = \log p - \log t, \qquad \beta(t) = p \log t + \log C, \qquad \gamma(t) = p \log t, \qquad (4.7)

where C > 0 is a constant. The Bregman Lagrangian and Hamiltonian become

L(x, v, t) = p t^{p-1} \left[ D_h\left( x + \frac{t}{p} v, x \right) - C t^p f(x) \right],
H(x, r, t) = p t^{p-1} \left[ D_{h^*}\left( \nabla h(x) + \frac{r}{t^p}, \nabla h(x) \right) + C t^p f(x) \right].

These parameter functions are of interest since they satisfy the ideal scaling conditions (4.4), and the resulting evolution x(t) of the corresponding dynamical system was shown in [36] to satisfy the aforementioned O(1/t^p) convergence rate,

f(x(t)) - f(x^*) \leq O(1/t^p),

where x^* is the desired minimizer of the objective function f.
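Indeed, as a quick check (ours), substituting (4.7) into (4.4) gives

\dot{\beta}(t) = \frac{p}{t} = e^{\log p - \log t} = e^{\alpha(t)}, \qquad \dot{\gamma}(t) = \frac{p}{t} = e^{\alpha(t)},

so the ideal scaling conditions hold, with the inequality \dot{\beta}(t) \leq e^{\alpha(t)} satisfied with equality.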

4.3. Adaptive Variational Integrators for Symplectic Accelerated Optimization. For simplicity of exposition, we will consider the case where h(x) = \frac{1}{2} \langle x, x \rangle. Our new approaches will make use of the adaptive framework developed in Section 3 via carefully chosen Poincaré transformations. Recalling the discussion of Section 3.1, there might not be a Lagrangian formulation for the Poincaré transformed systems we will consider, so we will need to work from the Hamiltonian point of view to design variational integrators. When h(x) = \frac{1}{2} \langle x, x \rangle, the Bregman Hamiltonian with parameters \alpha, \beta, \gamma given by equation (4.7) for a specific value of p > 0 becomes

H(q, r, t) = \frac{p}{2 t^{p+1}} \langle r, r \rangle + C p t^{2p-1} f(q). \qquad (4.8)

As mentioned in the previous section, the solution to the corresponding Hamilton's equations was shown in [36] to satisfy the convergence rate

f(q(t)) - f(q^*) \leq O(1/t^p),

where q^* is the desired minimizer of the objective function f. Together with the time-dilation result from Theorem 4.1, this implies that this entire subfamily of Bregman trajectories, indexed by the parameter p, can be obtained by speeding up or slowing down along the Bregman curve in spacetime corresponding to any specific value of p. We will present two new approaches based on the adaptive framework developed in Section 3 to integrate the Bregman Hamiltonian dynamics, thereby solving the optimization problem, and then compare their performance.

Direct Approach: Our first approach will use our adaptive framework with monitor function g(q, r) = 1 to design a variational integrator for the Bregman Hamiltonian given in equation (4.8) for a given value of p > 0. This choice of monitor function will convert the time-dependent Bregman Hamiltonian into an autonomous Hamiltonian in extended phase space. More precisely, given a value of p > 0, the time transformation t \mapsto \tau given by dt/d\tau = g(q, t, r) = 1 generates the Poincaré transformed Hamiltonian

\bar{H}(\bar{q}, \bar{r}) = \frac{p}{2 (q^t)^{p+1}} \langle r, r \rangle + C p (q^t)^{2p-1} f(q) + r^t, \qquad (4.9)

in the phase space with extended coordinates (\bar{q}, \bar{r}). This strategy is equivalent to the usual trick to remove time-dependency by considering the time t as an additional position variable and adding a corresponding conjugate momentum variable, which is the energy (see [1] for an example with the Hamiltonian given by equation (4.3)). This shows that our adaptive framework is very general and can also be used for purposes other than solely enforcing a desired variable time-stepping.

Adaptive Approach: Our second approach will exploit the time-dilation property of the Bregman dynamics together with our adaptive framework with a carefully tuned monitor function. More precisely, we will use adaptivity to transform the Bregman Hamiltonian corresponding to a specific value of p > 0 into an autonomous version of the Bregman Hamiltonian corresponding to a smaller value \mathring{p} < p in extended phase space. This will allow us to integrate the higher-order Bregman dynamics corresponding to the value p while benefiting from the computational efficiency of integrating the lower-order Bregman dynamics corresponding to the value \mathring{p} < p. Explicitly, solving equation (4.5) for \tau(t), so as to transform the Bregman dynamics corresponding to the values of \alpha, \beta, \gamma as in equation (4.7) for the given value of p into the Bregman dynamics corresponding to the values of \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma} as in equation (4.7) for a given value \mathring{p} < p, yields \tau(t) = t^{\mathring{p}/p}. The corresponding monitor function is given by

\frac{dt}{d\tau} = g(q, t, r) = \frac{p}{\mathring{p}} t^{1 - \mathring{p}/p},

and generates the Poincaré transformed Hamiltonian

\bar{H}(\bar{q}, \bar{r}) = \frac{1}{\mathring{p}} \left[ \frac{p^2}{2} (q^t)^{-p - \mathring{p}/p} \langle r, r \rangle + C p^2 (q^t)^{2p - \mathring{p}/p} f(q) + p r^t (q^t)^{1 - \mathring{p}/p} \right]. \qquad (4.10)

4.4. Presentation of Numerical Methods. In this section, we will test the performance of our new adaptive framework by implementing variational and non-variational integrators in the case where X = R^d and \langle x, x \rangle = x^T x, and discuss the results obtained. Keeping in mind the machine learning application where data sets are very large, we will restrict ourselves to explicit first-order optimization algorithms.

Looking at the forms of Hamilton's equations in both the Direct and Adaptive approaches, we can note that the objective function f and its gradient ∇f only appear in the expression for ṙ, and are functions of q only. Looking back at the construction of HTVIs from Section 2.2, and denoting the order of the Taylor methods by ρ instead of r to avoid confusion with the extended momentum variable r, we can note that both the Type II and Type III approaches require ρ-order Taylor approximations of r and (ρ+1)-order Taylor approximations of q. This means that the highest value of ρ that we can choose to obtain a gradient-based algorithm is ρ = 1 (since ṙ already involves ∇f, any higher value of ρ would require second or higher derivatives of f). Now, the starting point of the Type III approach is a (ρ+1)-order Taylor approximation q̃_0 of q_0. As a consequence, the subsequent steps in the Type III method with ρ = 0 and ρ = 1 will contain evaluations of the objective function f and its gradient ∇f at this approximation q̃_0. Aside from the inconvenience of the function evaluations not being at the iterates q_0 and q_1 themselves, if f is a nonlinear function, this will also generate nonlinearity in the equations for the updates. As a result, we will not be able to design an explicit algorithm, or at least not a general explicit algorithm that would work for all the functions f considered. On the other hand, the starting point of the Type II approach is a ρ-order Taylor approximation p̃_0 of p_0. A similar issue to the Type III case arises when ρ = 1, due to the approximations (q_{c_i}, p_{c_i}) = Ψ^{(ρ)}_{c_i h}(q_0, p̃_0). Therefore, we cannot design a general explicit algorithm for the Type II case with ρ = 1. The remaining possibility is to construct


a Type II HTVI using ρ = 0. This will produce explicit gradient-based algorithms, where all the evaluations of the objective function f and its gradient ∇f are performed at the iterates q_0 and q_1. Note that when ρ = 0, we have (q_{c_i}, p_{c_i}) = Ψ^{(0)}_{c_i h}(q_0, p_0) = (q_0, p_0) for all i, so for given values of p and p̊, every quadrature rule generates the same integrator. Following the method of Section 2.2, with time step h, we derive explicit gradient-based HTVIs:

Type II Hamiltonian Taylor Variational Integrators (HTVIs) with ρ = 0: As mentioned earlier, since ρ = 0, the choice of quadrature rule does not matter, so we can take the rectangular quadrature rule about the initial point (c_1 = 0 and b_1 = 1). We approximate r(0) = r̃_0 via r_1 = π_{T*Q} ∘ Ψ^{(0)}_h(q_0, r̃_0) = r̃_0, and generate the approximations (q_{c_1}, r_{c_1}) = Ψ^{(0)}_{c_1 h}(q_0, r̃_0) = (q_0, r̃_0).

Direct Approach:
\[
\tilde{\bar{q}}_1 = \pi_Q \circ \Psi^{(1)}_h(\bar{q}_0, \tilde{\bar{r}}_0) = \begin{bmatrix} q_0 + \dfrac{hp}{(q^t_0)^{p+1}}\, \tilde{r}_0 \\[4pt] q^t_0 + h \end{bmatrix},
\]
\[
H^+_d(\bar{q}_0, \bar{r}_1; h) = r_1^T q_0 + r^t_1 q^t_0 + \frac{hp}{2(q^t_0)^{p+1}}\, r_1^T r_1 + hCp\,(q^t_0)^{2p-1} f(q_0) + h\, r^t_1.
\]
The discrete right Hamilton's equations (2.2) yield the explicit variational integrator
\begin{align*}
r_1 &= r_0 - hCp\,(q^t_0)^{2p-1}\, \nabla f(q_0), \\
r^t_1 &= r^t_0 + \frac{hp(p+1)}{2(q^t_0)^{p+2}}\, r_1^T r_1 - hCp(2p-1)(q^t_0)^{2p-2} f(q_0), \\
q_1 &= q_0 + \frac{hp}{(q^t_0)^{p+1}}\, r_1, \\
q^t_1 &= q^t_0 + h.
\end{align*}

Adaptive Approach:
\[
\tilde{\bar{q}}_1 = \pi_Q \circ \Psi^{(1)}_h(\bar{q}_0, \tilde{\bar{r}}_0) = \begin{bmatrix} q_0 + \dfrac{hp^2}{\mathring{p}}\,(q^t_0)^{-p-\mathring{p}/p}\, \tilde{r}_0 \\[4pt] q^t_0 + \dfrac{hp}{\mathring{p}}\,(q^t_0)^{1-\mathring{p}/p} \end{bmatrix},
\]
\[
H^+_d(\bar{q}_0, \bar{r}_1; h) = r_1^T \left[ q_0 + \frac{p^2 h}{2\mathring{p}}\,(q^t_0)^{-p-\mathring{p}/p}\, r_1 \right] + r^t_1 \left[ q^t_0 + \frac{hp}{\mathring{p}}\,(q^t_0)^{1-\mathring{p}/p} \right] + \frac{hCp^2}{\mathring{p}}\,(q^t_0)^{2p-\mathring{p}/p} f(q_0).
\]
The discrete right Hamilton's equations (2.2) yield the explicit variational integrator
\begin{align*}
r_1 &= r_0 - \frac{p^2}{\mathring{p}}\, hC\,(q^t_0)^{2p-\mathring{p}/p}\, \nabla f(q_0), \\
r^t_{1/2} &= \frac{(p^3 + p\mathring{p})\,h}{2\mathring{p}\,(q^t_0)^{p+\mathring{p}/p+1}}\, r_1^T r_1 + \frac{(\mathring{p}p - 2p^3)\,hC}{\mathring{p}\,(q^t_0)^{\mathring{p}/p+1-2p}}\, f(q_0), \\
r^t_1 &= \left[ 1 - h\,(q^t_0)^{-\mathring{p}/p} \left( 1 - \frac{p}{\mathring{p}} \right) \right]^{-1} \left( r^t_0 + r^t_{1/2} \right), \\
q_1 &= q_0 + \frac{p^2}{\mathring{p}}\, h\,(q^t_0)^{-p-\mathring{p}/p}\, r_1, \\
q^t_1 &= q^t_0 + \frac{p}{\mathring{p}}\, h\,(q^t_0)^{1-\mathring{p}/p}.
\end{align*}
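The Direct updates above assemble into a complete gradient-based optimization loop. The following Python sketch is our own illustration of it; the parameter defaults, function names, and stopping rule are assumptions for demonstration, not the settings used in the experiments below.

```python
import numpy as np

def direct_htvi(f, grad_f, q0, p=4.0, C=1.0, h=1e-3, t0=1.0,
                max_iter=100000, tol=1e-10):
    """Explicit Type II HTVI (rho = 0) for the Direct Poincare-transformed
    Hamiltonian (4.9). A sketch, not the authors' reference implementation."""
    q, qt = q0.astype(float).copy(), t0   # position and time component q^t > 0
    r, rt = np.zeros_like(q), 0.0         # momentum and its time component r^t
    for _ in range(max_iter):
        g = grad_f(q)
        if np.linalg.norm(g) < tol:
            break
        # the four updates of the Direct explicit variational integrator
        r = r - h * C * p * qt**(2*p - 1) * g
        # r^t is carried along for completeness; it does not feed back into q, r
        rt = (rt + h * p * (p + 1) / (2 * qt**(p + 2)) * (r @ r)
                 - h * C * p * (2*p - 1) * qt**(2*p - 2) * f(q))
        q = q + h * p / qt**(p + 1) * r
        qt = qt + h
    return q
```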

Other types of first-order variational integrators can be constructed for the Poincaré transformed Hamiltonians, such as prolongation-collocation variational integrators (see [17]), Galerkin variational integrators (see [18]), and higher-order HTVIs. We will not consider these integrators for the optimization application, since they require solving systems of nonlinear equations and cannot be implemented explicitly in general. Having said that, in practice, implicit methods for the numerical solution of ODEs that can be solved using fixed-point iterations (as opposed to Newton iterations) can be quite competitive, since a good initial guess may allow them to converge in a small number of iterations, and the iterations are inexpensive as they do not require the assembly of a Jacobian. The convergence of the fixed-point iteration depends on the conditioning of the system of equations, and may impose a stringent time-step restriction. This can be overcome by the use of exponential integrators [12], and in particular, symplectic and energy-preserving exponential integrators [31].

In this paper, we have also implemented other non-variational methods based on these Direct and Adaptive approaches, and more traditional optimization methods have been tested as well, such as the classical 4th-order explicit Runge–Kutta method (ERK4), MATLAB's explicit ODE solvers (ode23 and ode45), and Nesterov's Accelerated Gradient algorithm (NAG) (4.1).


Non-Variational Symplectic Integrators based on the Direct and Adaptive Approaches:

(i) Direct and Adaptive Approaches with Splitting of the Hamiltonian: The Direct approach with splitting of the Hamiltonian is the approach presented in [1]. The three terms of the Poincaré transformed Hamiltonian (4.9) are considered separately: they generate dynamics in the extended phase space via six vector fields, and a symmetric leapfrog composition of the corresponding component dynamics is constructed to obtain a symplectic integrator (referred to in the numerical results section as "Direct Splitting"). A new symplectic integrator can also be obtained by adapting the approach presented in [1] to the adaptive Poincaré transformed Hamiltonian (4.10) (referred to in the numerical results section as "Adaptive Splitting"). The resulting updates are given in Table 3.

Direct Approach:
\begin{align*}
t &\leftarrow t + \tfrac{h}{2}, \\
r^t &\leftarrow r^t + \frac{h}{2}\,\frac{p(p+1)}{2t^{p+2}}\, r^T r - \frac{h}{2}\, Cp(2p-1)\, t^{2p-2} f(q), \\
r &\leftarrow r - \frac{h}{2}\, Cp\, t^{2p-1}\, \nabla f(q), \\
q &\leftarrow q + \frac{hp}{t^{p+1}}\, r, \\
r &\leftarrow r - \frac{h}{2}\, Cp\, t^{2p-1}\, \nabla f(q), \\
r^t &\leftarrow r^t + \frac{h}{2}\,\frac{p(p+1)}{2t^{p+2}}\, r^T r - \frac{h}{2}\, Cp(2p-1)\, t^{2p-2} f(q), \\
t &\leftarrow t + \tfrac{h}{2}.
\end{align*}

Adaptive Approach:
\begin{align*}
t &\leftarrow \left( t^{\mathring{p}/p} + \tfrac{h}{2} \right)^{p/\mathring{p}}, \\
\theta &\leftarrow \frac{\mathring{p}}{\mathring{p} - p} \left[ \left( \frac{p^3}{2\mathring{p}} + \frac{p}{2} \right) t^{-p-1}\, r^T r + \left( p - \frac{2p^3}{\mathring{p}} \right) t^{2p-1}\, C f(q) \right], \\
r^t &\leftarrow (r^t + \theta) \exp\!\left( \left( 1 - \frac{p}{\mathring{p}} \right) \frac{h}{2}\, t^{-\mathring{p}/p} \right) - \theta, \\
r &\leftarrow r - \frac{hCp^2}{2\mathring{p}}\, t^{2p - \mathring{p}/p}\, \nabla f(q), \\
q &\leftarrow q + \frac{hp^2}{\mathring{p}\, t^{p + \mathring{p}/p}}\, r, \\
r &\leftarrow r - \frac{hCp^2}{2\mathring{p}}\, t^{2p - \mathring{p}/p}\, \nabla f(q), \\
\theta &\leftarrow \frac{\mathring{p}}{\mathring{p} - p} \left[ \left( \frac{p^3}{2\mathring{p}} + \frac{p}{2} \right) t^{-p-1}\, r^T r + \left( p - \frac{2p^3}{\mathring{p}} \right) t^{2p-1}\, C f(q) \right], \\
r^t &\leftarrow (r^t + \theta) \exp\!\left( \left( 1 - \frac{p}{\mathring{p}} \right) \frac{h}{2}\, t^{-\mathring{p}/p} \right) - \theta, \\
t &\leftarrow \left( t^{\mathring{p}/p} + \tfrac{h}{2} \right)^{p/\mathring{p}}.
\end{align*}

Table 3. Updates for the Direct and Adaptive Approaches with Splitting of the Hamiltonian.
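As an illustration, one symmetric leapfrog step of the Direct splitting scheme can be sketched in Python as follows (our own code; variable names and defaults are assumptions). Note that q is unchanged between the final gradient kick of one step and the first gradient kick of the next, so the evaluations of f and ∇f can be cached and only one new evaluation of each is needed per step.

```python
def direct_splitting_step(q, t, r, rt, h, f, grad_f, p=4.0, C=1.0):
    """One leapfrog step of the Direct splitting of the Hamiltonian (4.9),
    following the composition in Table 3; a sketch, not reference code.
    q, r are numpy arrays; t, rt are scalars."""
    t = t + 0.5 * h
    rt = rt + 0.5 * h * (p * (p + 1) / (2 * t**(p + 2)) * (r @ r)
                         - C * p * (2*p - 1) * t**(2*p - 2) * f(q))
    r = r - 0.5 * h * C * p * t**(2*p - 1) * grad_f(q)
    q = q + h * p / t**(p + 1) * r
    r = r - 0.5 * h * C * p * t**(2*p - 1) * grad_f(q)
    rt = rt + 0.5 * h * (p * (p + 1) / (2 * t**(p + 2)) * (r @ r)
                         - C * p * (2*p - 1) * t**(2*p - 2) * f(q))
    t = t + 0.5 * h
    return q, t, r, rt
```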

(ii) Phase-space Cloning and Splitting: This approach is based on the technique introduced in [26]. The dimension of the extended phase space is doubled, and a new Hamiltonian is defined via two copies of the Poincaré transformed Hamiltonian:
\[
\bar{H}(q, \tilde{q}, r, \tilde{r}) = \bar{H}_1(q, \tilde{r}) + \bar{H}_2(\tilde{q}, r),
\]
where \bar{H}_1 = \bar{H}_2 = \bar{H}. Hamilton's equations are then given by
\[
\dot{q} = \nabla_{r} \bar{H}_2, \qquad \dot{\tilde{q}} = \nabla_{\tilde{r}} \bar{H}_1, \qquad \dot{r} = -\nabla_{q} \bar{H}_1, \qquad \dot{\tilde{r}} = -\nabla_{\tilde{q}} \bar{H}_2.
\]
We can then integrate this new Hamiltonian system explicitly using a Strang splitting, or a Yoshida 4 or Yoshida 6 splitting, for instance (referred to as "CloningStrang", "CloningY4", and "CloningY6" in the numerical results section). This approach will usually require a larger number of evaluations of the objective function f and of its gradient at each step.
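To make the composition concrete, here is a sketch in Python of one Strang step for the doubled system (our own code; the helper functions dHdq and dHdr, returning the partial derivatives of the transformed Hamiltonian with respect to its extended position and momentum arguments, are assumed to be supplied by the user). Each sub-flow is exact and explicit because the variables that each copy of the Hamiltonian depends on are frozen along its own flow.

```python
def cloning_strang_step(q, qc, r, rc, h, dHdq, dHdr):
    """One Strang step for H(q, qc, r, rc) = H1(q, rc) + H2(qc, r) with
    H1 = H2 = the Poincare transformed Hamiltonian, as in [26].
    A sketch; qc, rc denote the cloned copies of q and r."""
    # half-step flow of H1(q, rc): q and rc are frozen, so qc and r
    # move with constant velocity and the update is exact
    qc = qc + 0.5 * h * dHdr(q, rc)
    r = r - 0.5 * h * dHdq(q, rc)
    # full-step flow of H2(qc, r): qc and r are frozen, q and rc move
    q = q + h * dHdr(qc, r)
    rc = rc - h * dHdq(qc, r)
    # second half-step flow of H1(q, rc)
    qc = qc + 0.5 * h * dHdr(q, rc)
    r = r - 0.5 * h * dHdq(q, rc)
    return q, qc, r, rc
```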

Page 18: ADAPTIVE HAMILTONIAN VARIATIONAL INTEGRATORS AND …mleok/pdf/DuScLe2020_avi_accopt.pdf · 2020. 12. 5. · ADAPTIVE HAMILTONIAN VARIATIONAL INTEGRATORS AND SYMPLECTIC ACCELERATED

18 VALENTIN DURUISSEAUX, JEREMY SCHMITT, AND MELVIN LEOK

4.5. Numerical Results. The numerical methods presented in the previous section were used to minimize the quartic function
\[
f(x) = \left[ (x - 1)^T \Sigma (x - 1) \right]^2, \tag{4.11}
\]
where x ∈ R^{50} and Σ_{ij} = 0.9^{|i−j|}. This convex function achieves its minimum value 0 at x* = 1. Unless specified otherwise, the termination criterion used was
\[
|f(x_k) - f(x_{k-1})| < \delta \quad \text{and} \quad \|\nabla f(x_k)\| < \delta, \qquad \text{where } \delta = 10^{-10}. \tag{4.12}
\]
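For concreteness, this objective and its gradient can be coded as follows (our own Python sketch; since Σ is symmetric, the gradient of (v^T Σ v)^2 is 4 (v^T Σ v) Σ v with v = x − 1):

```python
import numpy as np

d = 50
Sigma = 0.9 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))

def f(x):
    """Quartic objective (4.11), with minimum value 0 at x* = 1."""
    v = x - 1.0
    return float(v @ Sigma @ v) ** 2

def grad_f(x):
    # gradient of (v^T Sigma v)^2 for symmetric Sigma
    v = x - 1.0
    return 4.0 * (v @ Sigma @ v) * (Sigma @ v)
```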

4.5.1. Adaptive versus Direct approach. Numerical experiments show that the Adaptive approach, when the parameters are tuned carefully, enjoys a significantly better rate of convergence and requires a much smaller number of steps to achieve convergence than the Direct approach, as can be seen from Figure 4 and Table 4. The Adaptive approach may require a smaller fictive time step h than the Direct approach, but the physical time steps resulting from t = τ^{p/p̊} in the Adaptive approach grow rapidly to values larger than the physical time step of the Direct approach.

[Figure 4 plots log10(Error) against log10(Iterations) for Direct and Adaptive HTVI with p = 4 and p = 10 (left), and for Direct and Adaptive Cloning Y4 with p = 4 and p = 10 (right).]

Figure 4. Comparison of the rates of convergence between the Direct and Adaptive approaches for the HTVI method and the Cloning method with a Yoshida 4 splitting. We can clearly see that the Adaptive approach outperforms the Direct approach.

Approach      p    p̊    h         Iterations      Approach         p    p̊    h         Iterations
Direct HTVI   4    -     8.00E-04  77 878          Direct CloneY4   4    -     9.00E-04  106 530
Adap. HTVI    4    0.5   1.21E-04   6 867          Adap. CloneY4    4    0.5   1.15E-04    7 101
Direct HTVI   10   -     4.00E-04  13 872          Direct CloneY4   10   -     3.30E-04   15 498
Adap. HTVI    10   0.5   1.95E-05   5 564          Adap. CloneY4    10   0.5   1.62E-05    6 300

Table 4. Comparison of the Direct and Adaptive approaches for the HTVI method and the Cloning method with a Yoshida 4 splitting. The Adaptive approach clearly outperforms the Direct approach in terms of the number of iterations required.


4.5.2. Comparison of Methods within the Direct approach and within the Adaptive approach. Numerical experiments were conducted to compare the various algorithms presented in Section 4.4, and the results are presented in Figure 5 and Table 5. Although the numbers of iterations for all methods were of the same order of magnitude, the HTVI method and the Splitting method based on the idea of [1] performed much better than the methods based on the phase-space Cloning idea of [26]. This is mostly due to the fact that the phase-space Cloning methods require several evaluations of the objective function f and of its gradient ∇f at each iteration (3 for the Strang splitting, 7 for Yoshida's 4th-order splitting, and 19 for Yoshida's 6th-order splitting), while the HTVI and Splitting methods only require one such evaluation at each iteration. As a result, the phase-space Cloning methods also required much more computational time to achieve convergence. It might be possible to improve the performance of these phase-space Cloning methods by adding an extra term to the final Hamiltonian which binds the two copies of the Poincaré transformed Hamiltonian, as was done in [35]. However, this additional term is likely to require a larger number of compositions when splitting the final Hamiltonian, which would require more gradient evaluations of the objective function at each step. Thus, even though the trick presented in [35] could reduce the number of iterations required to achieve convergence, it seems very unlikely that the resulting algorithm would be competitive against the HTVI and Splitting methods in terms of computational time and total number of gradient evaluations.

[Figure 5 plots log10(Error) against log10(GradientEvaluations) for the different symplectic integrators.]

Figure 5. Comparison of the convergence, in terms of gradient evaluations needed, of the different symplectic integrators within the Direct approach (on the left), and within the Adaptive approach (on the right).

Method              p    h        Iterations      Method                p    p̊   h        Iterations
Direct HTVI         4    8.7E-04  57 504          Adaptive HTVI         4    1    2.4E-04   9 361
Direct Splitting    4    9.5E-04  60 881          Adaptive Splitting    4    1    2.9E-04   8 313
Direct CloneStrang  4    9.7E-04  81 367          Adaptive CloneStrang  4    1    2.4E-04  10 638
Direct CloneY4      4    8.9E-04  74 747          Adaptive CloneY4      4    1    2.9E-04  10 721
Direct CloneY6      4    7.9E-04  87 075          Adaptive CloneY6      4    1    2.0E-04  11 549

Table 5. Number of iterations needed until convergence of the different symplectic integrators within the Direct (on the left) and Adaptive (on the right) approaches.


4.5.3. Dependence on p and p̊ in the Adaptive approach. We conducted numerical experiments with the HTVI method to study how the performance of the Adaptive approach evolves as the parameters p and p̊ are varied. We can deduce from the results presented in Figure 6 and Table 6 that the Adaptive HTVI method becomes more and more efficient as p is increased and p̊ is decreased. The improvement in efficiency is substantial as we increase p from p = 2 to p = 4, while it is minor but still noticeable as we increase p from p = 4 to p = 8. A possible explanation for this behaviour is that the integrator might not be of high enough order to distinguish between the p = 6 and p = 8 Bregman dynamics. Note that the fictive time step h must be reduced as p increases or p̊ decreases, but the time rescaling t = τ^{p/p̊} ensures that the resulting physical time steps do not become significantly smaller.

[Figure 6 plots log10(Error) against log10(Iterations) for the Adaptive HTVI method as p and p̊ are varied.]

Figure 6. Evolution of the rates of convergence of the HTVI method as p is increased (on the left), and as p̊ is decreased (on the right).

Method          p    p̊   h        Iterations      Method          p    p̊    h        Iterations
Adaptive HTVI   2    1    8.0E-04  77 855          Adaptive HTVI   4    2     3.8E-04  22 128
Adaptive HTVI   4    1    2.4E-04   9 361          Adaptive HTVI   4    1     2.4E-04   9 361
Adaptive HTVI   6    1    9.4E-05   7 785          Adaptive HTVI   4    0.5   1.2E-04   7 099
Adaptive HTVI   8    1    6.1E-05   6 133          Adaptive HTVI   4    0.1   2.4E-05   5 689

Table 6. Evolution of the fictive time step h and number of iterations until convergence for the HTVI method as p increases (left), and as p̊ decreases (right).

4.5.4. Comparison to non-symplectic methods. In this subsection, we present the results of numerical experiments conducted to investigate the role that symplecticity plays in the Direct and Adaptive approaches. We first implemented fixed time-step explicit Runge–Kutta methods, such as the 4th-order explicit Runge–Kutta method (ERK4), but these failed to converge in both the Direct and Adaptive approaches. We then considered variable time-step explicit Runge–Kutta methods. To this end, we tested MATLAB's differential equation solvers ode45 and ode23, which are explicit variable time-step Runge–Kutta pairs, and the corresponding numerical results are presented in Figure 7. The HTVI method required a significantly smaller number of iterations than the MATLAB solvers. Furthermore, an inherent part of the step size control in embedded Runge–Kutta methods is that, at each iteration, the underlying Runge–Kutta method may be executed several times to determine the appropriate time-step that satisfies the prescribed tolerances. For


this reason, the MATLAB solvers require more evaluations of f and ∇f at each iteration, and since they also required more iterations than the HTVI method, these MATLAB solvers are much less competitive. It should also be noted that the MATLAB solvers did not exhibit any improvement when used with the Adaptive approach instead of the Direct approach, while the HTVI method improved significantly. This is not surprising, since the MATLAB solvers ode23 and ode45 both use a variable time-step strategy regardless of the approach chosen.
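For readers without MATLAB, an analogous comparison can be set up with SciPy's RK23/RK45 solvers playing the role of ode23/ode45. The following sketch is our own substitution, with assumed tolerances and time span; it integrates the Bregman dynamics underlying the Hamiltonian (4.9) in physical time:

```python
import numpy as np
from scipy.integrate import solve_ivp

def bregman_rhs(t, y, grad_f, d, p=4.0, C=1.0):
    """Hamilton's equations for the Bregman dynamics underlying (4.9):
    dq/dt = p r / t^(p+1), dr/dt = -C p t^(2p-1) grad f(q)."""
    q, r = y[:d], y[d:]
    dq = p / t**(p + 1) * r
    dr = -C * p * t**(2*p - 1) * grad_f(q)
    return np.concatenate([dq, dr])

# usage sketch: y0 packs the initial (q, r); tolerances are assumed values
# sol = solve_ivp(bregman_rhs, (1.0, 1e3), y0, method='RK23',
#                 args=(grad_f, d), rtol=1e-8, atol=1e-8)
```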

[Figure 7 plots log10(Error) against log10(Iterations) for ode45, ode23, and the HTVI method.]

Figure 7. Comparison of the HTVI method with the ode23 and ode45 MATLAB functions in the Direct (left) and Adaptive (right) approaches, with p = 10 and p̊ = 0.5. The HTVI method clearly outperforms the MATLAB solvers.

Finally, we compared our Adaptive HTVI method to Nesterov's Accelerated Gradient (NAG) algorithm (4.1). We relaxed the termination condition (4.12) from δ = 10^{-10} to δ = 10^{-9}, since NAG failed to achieve the 10^{-10} tolerance even after 10^{10} iterations. The HTVI method outperformed Nesterov's Accelerated Gradient method (see Figure 8).

Figure 8. Comparison of the performance of the HTVI and NAG methods on the quartic objective function (4.11), both with the same initial time step of 2 × 10^{-6}.
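For reference, a minimal sketch of the NAG baseline in Python (our own rendering of the standard Nesterov update, not (4.1) verbatim; the constant step size eta and the stopping rule are assumptions):

```python
import numpy as np

def nag(grad_f, x0, eta=2e-6, max_iter=10**7, tol=1e-9):
    """Nesterov's accelerated gradient with the standard (k-1)/(k+2)
    momentum coefficient; a sketch of the baseline method."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, max_iter + 1):
        x = y - eta * grad_f(y)                        # gradient step at the lookahead point
        y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)   # momentum extrapolation
        if np.linalg.norm(x - x_prev) < tol:
            break
        x_prev = x
    return x
```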


Remark. In [1], the authors noted that Nesterov's Accelerated Gradient algorithm transitions into an exponential rate of convergence once it is sufficiently close to the minimum of certain objective functions, and suggested that this behaviour requires strong convexity of the objective function in a neighborhood of the minimum. Similarly to the strategy presented in [1], a gradient flow can be incorporated into the updates of the Direct and Adaptive algorithms presented here, so that for certain objective functions the same exponential rate of convergence can be achieved close to the minimum.

5. Conclusion

Due to the degeneracy of the Hamiltonian, adaptive variational integrators based on the Poincaré transformation should be constructed using discrete Hamiltonians, which are Type II or Type III generating functions. This has potential implications for the numerical properties of such integrators, and might explain why there has only been a limited amount of work on the construction of adaptive variational integrators from the traditional Lagrangian perspective. The efficiency of the resulting integrator depends largely on a proper choice of the monitor function g, and more research is needed to find a general choice of g that maintains a decent level of efficiency.

We have also noted that the gain in efficiency provided by adaptivity depends on the properties of the Hamiltonian dynamical system, and tends to be more significant in regions where the Hamiltonian vector field has high curvature. We focused primarily on Taylor variational integrators, but Galerkin variational integrators are likely to be very promising as well, since the cost of evaluating the monitor function and its derivatives should be low. In addition, the Galerkin approximation scheme may help inform a better choice of monitor function, due to the extensive literature on efficient a posteriori error estimation. A posteriori error estimation, in general, would be a welcome addition, as it would give some guarantees on global accuracy.

Finally, we used our adaptive framework together with the variational approach to accelerated optimization presented in [36] to design efficient Hamiltonian variational and non-variational explicit integrators for symplectic accelerated optimization. We noted that a careful use of adaptivity and symplecticity can result in significantly faster algorithms. It would be desirable to understand at a theoretical level the roles that adaptivity and symplecticity play in the accurate and stable discretization of flows that correspond to accelerated optimization algorithms; this could better inform the choice of monitor function, and of the convex function used to define the Bregman divergence that arises in the construction of the Bregman Hamiltonian. This direction seems particularly promising for constructing novel optimization algorithms with superior computational efficiency and performance.

Another possible future research direction is to consider the implications of this work for stochastic gradient descent methods [28], by considering it in the context of a Bregman Lagrangian or Hamiltonian, but with a stochastic perturbation of the potential. This naturally leads to considering stochastic generalizations of the adaptive Hamiltonian variational integrators considered in this paper, by extending the existing work on stochastic variational integrators [4; 13].

Acknowledgments

The authors were supported in part by NSF under grants DMS-1411792, DMS-1345013, DMS-1813635, and by AFOSR under grant FA9550-18-1-0288.

Appendix A. Proof of Theorem 2.1

The proof of Theorem 2.1 is similar to the one presented in the Appendix of [30] for Lagrangian Taylor variational integrators. We first consider the right Hamiltonian Taylor variational integrator case. Let q(t) and p(t) denote the solutions of Hamilton's boundary-value problem
\[
\dot{q}(t) = g(q(t), p(t), t), \qquad \dot{p}(t) = f(q(t), p(t), t), \qquad q(0) = q_0, \qquad p(h) = p_1.
\]


Let q1 = q(h) and p0 = p(0).

Lemma A.1. Given an r-order Taylor method Ψ^{(r)}_h approximating the exact time-h flow map corresponding to Hamilton's equations, let p̃_0 solve the problem p_1 = π_{T*Q} ∘ Ψ^{(r)}_h(q_0, p̃_0). Then,
\[
\tilde{p}_0 = p_0 + O(h^{r+1}).
\]

Proof. Solving the equation p_1 = π_{T*Q} ∘ Ψ^{(r)}_h(q_0, p̃_0) for p̃_0 yields
\[
\tilde{p}_0 = p_1 - \sum_{k=1}^{r} \frac{h^k}{k!}\, f^{(k-1)}(q_0, \tilde{p}_0, 0).
\]
The exact solution p(t) belongs to C^{r+1}([0, h]), so Taylor's Theorem gives
\[
p_0 = p_1 - \sum_{k=1}^{r} \frac{h^k}{k!}\, f^{(k-1)}(q_0, p_0, 0) + R_r(h).
\]
Now, since p(t) belongs to C^{r+1}([0, h]), f^{(k-1)} is Lipschitz continuous in its arguments for k = 1, ..., r − 1. Let M be the largest of the corresponding (r − 1) Lipschitz constants with respect to the second argument over the compact interval [0, C]. Then, using the triangle inequality,
\[
\|\tilde{p}_0 - p_0\| = \left\| R_r(h) - \sum_{k=1}^{r} \frac{h^k}{k!} \left[ f^{(k-1)}(q_0, \tilde{p}_0, 0) - f^{(k-1)}(q_0, p_0, 0) \right] \right\| \leq M \sum_{k=1}^{r} \frac{h^k}{k!}\, \|\tilde{p}_0 - p_0\| + \|R_r(h)\|.
\]
Thus, \left( 1 - M \sum_{k=1}^{r} \frac{h^k}{k!} \right) \|\tilde{p}_0 - p_0\| \leq \|R_r(h)\| = O(h^{r+1}), and by continuity, there exists \bar{C} \in (0, C) such that for all h \in (0, \bar{C}), the term inside the parenthesis is bounded away from zero, which concludes the proof. □

We now show that starting the r-order Taylor method with initial conditions (q_0, p̃_0) rather than (q_0, p_0) will not affect the order of accuracy of the method.

Lemma A.2. The r-order Taylor method Ψ^{(r)}_h with initial conditions (q_0, p̃_0), where p̃_0 solves p_1 = π_{T*Q} ∘ Ψ^{(r)}_h(q_0, p̃_0), is accurate to at least O(h^{r+1}) for the Hamiltonian boundary-value problem.

Proof. Let (q(t), p(t)) denote the exact solution to Hamilton's equations with initial values (q_0, p_0), let (q̃(t), p̃(t)) denote the exact solution with initial values (q_0, p̃_0), and let (q_d(t), p_d(t)) denote the values generated by the r-order Taylor method with initial conditions (q_0, p̃_0). The Hamiltonian initial-value problem is well-posed, so denoting the Lipschitz constant with respect to the second argument by M, we get
\[
\|(q(t), p(t)) - (q_d(t), p_d(t))\| \leq \|(q(t), p(t)) - (\tilde{q}(t), \tilde{p}(t))\| + \|(\tilde{q}(t), \tilde{p}(t)) - (q_d(t), p_d(t))\| \leq M \|p_0 - \tilde{p}_0\| + O(h^{r+1}) \leq O(h^{r+1}),
\]
where we have used the triangle inequality, and the fact that the local truncation error of an r-order Taylor method is O(h^{r+1}) to bound \|(\tilde{q}(t), \tilde{p}(t)) - (q_d(t), p_d(t))\|. □

We are now ready to prove Theorem 2.1 for right Hamiltonian Taylor variational integrators.

Theorem A.1. Consider a Hamiltonian H such that H and ∂H/∂p are Lipschitz continuous in both variables. Given an r-order accurate Taylor method Ψ^{(r)}_h and an s-order accurate quadrature formula with weights and nodes (b_i, c_i), define the associated Taylor discrete right Hamiltonian
\[
H^+_d(q_0, p_1; h) = p_1^T q_1 - h \sum_{i=1}^{m} b_i \left[ p_{c_i}^T \dot{q}_{c_i} - H(q_{c_i}, p_{c_i}) \right],
\]
where p̃_0 solves p_1 = π_{T*Q} ∘ Ψ^{(r)}_h(q_0, p̃_0), where q_1 = π_Q ∘ Ψ^{(r+1)}_h(q_0, p̃_0), where (q_{c_i}, p_{c_i}) = Ψ^{(r)}_{c_i h}(q_0, p̃_0), and where we use the continuous Legendre transform to obtain q̇_{c_i}.

Then, H^+_d approximates H^{+,E}_d with order of accuracy at least min(r + 1, s). By Theorem 2.2 in [29], the associated discrete right Hamiltonian map has the same order of accuracy.


Proof. From Lemma A.2 we have that q(c_i h) = q_{c_i} + O(h^{r+1}) and p(c_i h) = p_{c_i} + O(h^{r+1}), and since ∂H/∂p is Lipschitz in both variables,
\[
\dot{q}(c_i h) - \dot{q}_{c_i} = \frac{\partial H}{\partial p}(q(c_i h), p(c_i h)) - \frac{\partial H}{\partial p}(q_{c_i}, p_{c_i}) = O(h^{r+1}).
\]
Since the quadrature formula is s-order accurate, equation (2.1) for H^{+,E}_d(q_0, p_1; h) gives
\[
H^{+,E}_d(q_0, p_1; h) = p_1^T q(h) - h \sum_{i=1}^{m} b_i \left[ p(c_i h)^T \dot{q}(c_i h) - H\left( q_{c_i} + O(h^{r+1}), p_{c_i} + O(h^{r+1}) \right) \right] + O(h^{s+1}).
\]
Now, since q_1 = π_Q ∘ Ψ^{(r+1)}_h(q_0, p̃_0), it follows from Lemma A.2 that q(h) = q_1 + O(h^{r+2}). Therefore, combining this with the fact that H is Lipschitz continuous in both variables yields
\[
H^{+,E}_d(q_0, p_1; h) = p_1^T q_1 - h \sum_{i=1}^{m} b_i \left[ p_{c_i}^T \dot{q}_{c_i} - H(q_{c_i}, p_{c_i}) \right] + O(h^{r+2}) + O(h^{s+1}) = H^+_d(q_0, p_1; h) + O(h^{\min(r+1, s)+1}).
\]
Therefore, H^+_d approximates H^{+,E}_d with order of accuracy at least min(r + 1, s). □

Theorem 2.1 can be proven in a similar way for left Hamiltonian Taylor variational integrators. Now, q(t) and p(t) denote the solutions of Hamilton's boundary-value problem
\[
\dot{q}(t) = g(q(t), p(t), t), \qquad \dot{p}(t) = f(q(t), p(t), t), \qquad q(h) = q_1, \qquad p(0) = p_0,
\]
and we let q_0 = q(0) and p_1 = p(h). Lemma A.1 is replaced by

Lemma A.3. Given a (r + 1)-order Taylor method Ψ^{(r+1)}_h approximating the exact time-h flow map corresponding to Hamilton's equations, let q̃_0 solve the problem q_1 = π_Q ∘ Ψ^{(r+1)}_h(q̃_0, p_0). Then,
\[
\tilde{q}_0 = q_0 + O(h^{r+2}).
\]

Proof. We proceed as in the proof of Lemma A.1. We first solve q_1 = π_Q ∘ Ψ^{(r+1)}_h(q̃_0, p_0) for q̃_0, and then Taylor expand the exact solution q(t), which belongs to C^{r+2}([0, h]). Now, g^{(k-1)} is Lipschitz continuous in its arguments for k = 1, ..., r, so we can let M be the largest of the corresponding r Lipschitz constants with respect to the first argument over the compact interval [0, C]. Then, as before, the triangle inequality can be used to show that
\[
\left( 1 - M \sum_{k=1}^{r+1} \frac{h^k}{k!} \right) \|\tilde{q}_0 - q_0\| = O(h^{r+2}),
\]
and by continuity, the term inside the parenthesis is bounded away from zero. □

In analogy to Lemma A.2, we now show that starting the r-order Taylor method with initial conditions (q̃_0, p_0) rather than (q_0, p_0) will not affect the order of accuracy of the method.

Lemma A.4. The r-order Taylor method Ψ^{(r)}_h with initial conditions (q̃_0, p_0), where q̃_0 solves q_1 = π_Q ∘ Ψ^{(r)}_h(q̃_0, p_0), is accurate to at least O(h^{r+1}) for the Hamiltonian boundary-value problem.

Proof. Let (q(t), p(t)) denote the exact solution to Hamilton's equations with initial values (q_0, p_0), let (q̃(t), p̃(t)) denote the exact solution with initial values (q̃_0, p_0), and let (q_d(t), p_d(t)) denote the values generated by the r-order Taylor method with initial conditions (q̃_0, p_0). The Hamiltonian initial-value problem is well-posed, so denoting the Lipschitz constant with respect to the first argument by M, we get
\[
\|(q(t), p(t)) - (q_d(t), p_d(t))\| \leq \|(q(t), p(t)) - (\tilde{q}(t), \tilde{p}(t))\| + \|(\tilde{q}(t), \tilde{p}(t)) - (q_d(t), p_d(t))\| \leq M \|q_0 - \tilde{q}_0\| + O(h^{r+1}) \leq O(h^{r+1}),
\]
where we have used the triangle inequality, and the fact that the local truncation error of an r-order Taylor method is O(h^{r+1}) to bound \|(\tilde{q}(t), \tilde{p}(t)) - (q_d(t), p_d(t))\|. □


We are now ready to prove Theorem 2.1 for left Hamiltonian Taylor variational integrators.

Theorem A.2. Consider a Hamiltonian H such that H and ∂H/∂p are Lipschitz continuous in both variables. Given an r-order accurate Taylor method Ψ^{(r)}_h and an s-order accurate quadrature formula with weights and nodes (b_i, c_i), define the associated Taylor discrete left Hamiltonian
\[
H^-_d(q_1, p_0; h) = -p_0^T \tilde{q}_0 - h \sum_{i=1}^{m} b_i \left[ p_{c_i}^T \dot{q}_{c_i} - H(q_{c_i}, p_{c_i}) \right],
\]
where q̃_0 solves the problem q_1 = π_Q ∘ Ψ^{(r+1)}_h(q̃_0, p_0), where (q_{c_i}, p_{c_i}) = Ψ^{(r)}_{c_i h}(q̃_0, p_0), and where we use the continuous Legendre transform to obtain q̇_{c_i}.

Then, H^-_d approximates H^{-,E}_d with order of accuracy at least min(r + 1, s). By a result analogous to Theorem 2.2 in [29], the associated discrete left Hamiltonian map has the same order of accuracy.

Proof. From Lemma A.4 we have that q(c_i h) = q_{c_i} + O(h^{r+1}) and p(c_i h) = p_{c_i} + O(h^{r+1}), and since ∂H/∂p is Lipschitz in both variables,
\[
\dot{q}(c_i h) - \dot{q}_{c_i} = \frac{\partial H}{\partial p}(q(c_i h), p(c_i h)) - \frac{\partial H}{\partial p}(q_{c_i}, p_{c_i}) = O(h^{r+1}).
\]
Since the quadrature formula is s-order accurate, equation (2.3) for H^{-,E}_d(q_1, p_0; h) gives
\[
H^{-,E}_d(q_1, p_0; h) = -p_0^T q_0 - h \sum_{i=1}^{m} b_i \left[ p(c_i h)^T \dot{q}(c_i h) - H\left( q_{c_i} + O(h^{r+1}), p_{c_i} + O(h^{r+1}) \right) \right] + O(h^{s+1}).
\]
Now, since q_1 = π_Q ∘ Ψ^{(r+1)}_h(q̃_0, p_0), it follows from Lemma A.3 that q̃_0 = q_0 + O(h^{r+2}). Therefore, combining this with the fact that H is Lipschitz continuous in both variables yields
\[
H^{-,E}_d(q_1, p_0; h) = H^-_d(q_1, p_0; h) + O(h^{\min(r+1, s)+1}). \qquad \square
\]

References

[1] M. Betancourt, M. I. Jordan, and A. Wilson. On symplectic optimization. arXiv: Computation, 2018.
[2] S. Blanes and C. J. Budd. Explicit adaptive symplectic (easy) integrators: A scaling invariant generalisation of the Levi-Civita and KS regularisations. Celestial Mechanics and Dynamical Astronomy, 89(4):383–405, 2004.
[3] S. Blanes and A. Iserles. Explicit adaptive symplectic integrators for solving Hamiltonian systems. Celestial Mechanics and Dynamical Astronomy, 114(3):297–317, 2012.
[4] N. Bou-Rabee and H. Owhadi. Stochastic variational integrators. IMA J. Numer. Anal., 29(2):421–443, 2009.
[5] J. P. Calvo and J. M. Sanz-Serna. The development of variable-step symplectic integrators, with application to the two-body problem. SIAM J. Sci. Comp., 14(4):936–952, 1993.
[6] A. Cauchy. Méthode générale pour la résolution des systèmes d'équations simultanées. Acad. Sci. Paris, 25:536–538, 1847.
[7] Z. Ge and J. E. Marsden. Lie–Poisson Hamilton–Jacobi theory and Lie–Poisson integrators. Physics Letters A, 133(3):134–139, 1988.
[8] B. Gladman, M. Duncan, and J. Candy. Symplectic integrators for long-time integrations in celestial mechanics. Celestial Mech. Dynamical Astronomy, 52:221–240, 1991.
[9] E. Hairer. Variable time step integration with symplectic methods. Applied Numerical Mathematics, 25(2-3):219–227, 1997.
[10] E. Hairer, C. Lubich, and G. Wanner. Geometric numerical integration, volume 31 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, second edition, 2006.
[11] J. Hall and M. Leok. Spectral variational integrators. Numer. Math., 130(4):681–740, 2015.
[12] M. Hochbruck and A. Ostermann. Exponential integrators. Acta Numerica, 19:209–286, 2010.


[13] D. D. Holm and T. M. Tyranowski. Stochastic discrete Hamiltonian variational integrators. BIT Numerical Mathematics, 58(4):1009–1048, 2018.
[14] C. Kane, J. E. Marsden, and M. Ortiz. Symplectic-energy-momentum preserving variational integrators. Journal of Mathematical Physics, 40(7), 1999.
[15] S. Lall and M. West. Discrete variational Hamiltonian mechanics. J. Phys. A, 39(19):5509–5519, 2006.
[16] B. Leimkuhler and S. Reich. Simulating Hamiltonian Dynamics, volume 14 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2004.
[17] M. Leok and T. Shingel. Prolongation-collocation variational integrators. IMA J. Numer. Anal., 32(3):1194–1216, 2012.
[18] M. Leok and J. Zhang. Discrete Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 31(4):1497–1532, 2011.
[19] J. E. Marsden and M. West. Discrete mechanics and variational integrators. Acta Numer., 10:357–514, 2001.
[20] K. Modin and C. Führer. Time-step adaptivity in variational integrators with application to contact problems. Z. Angew. Math. Mech., 86(10):785–794, 2006.
[21] S. Nair. Time adaptive variational integrators: A space-time geodesic approach. Physica D: Nonlinear Phenomena, 241(4):315–325, 2012.
[22] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, 1983.
[23] Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2):372–376, 1983.
[24] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
[25] Y. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Math. Program., 112:159–181, 2008.
[26] P. Pihajoki. Explicit methods in extended phase space for inseparable Hamiltonian problems. Celestial Mechanics and Dynamical Astronomy, 121:211–231, 2015.
[27] S. Reich. Backward error analysis for numerical integrators. SIAM J. Numer. Anal., 36:1549–1570, 1999.
[28] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22(3):400–407, 1951.
[29] J. M. Schmitt and M. Leok. Properties of Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 36(2), 2017.
[30] J. M. Schmitt, T. Shingel, and M. Leok. Lagrangian and Hamiltonian Taylor variational integrators. BIT Numerical Mathematics, 2017. Online First.
[31] X. Shen and M. Leok. Geometric exponential integrators. J. Comput. Phys., 382:27–42, 2019.
[32] J. Struckmeier. Hamiltonian dynamics on the symplectic extended phase space for autonomous and non-autonomous systems. Journal of Physics A, 38:1257–1278, 2005.
[33] W. Su, S. Boyd, and E. Candès. A differential equation for modeling Nesterov's Accelerated Gradient method: theory and insights. Journal of Machine Learning Research, 17(153):1–43, 2016.
[34] Yu. B. Suris. Hamiltonian methods of Runge–Kutta type and their variational interpretation. Mat. Model., 2(4):78–87, 1990.
[35] M. Tao. Explicit symplectic approximation of nonseparable Hamiltonians: Algorithm and long time performance. Physical Review E, 94(4):043303, 2016.
[36] A. Wibisono, A. Wilson, and M. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351–E7358, 2016.