Numerical optimization and adjoint state methods for large-scale nonlinear least-squares problems

Ludovic Métivier¹ and the SEISCOPE group¹,²,³

¹ LJK, Univ. Grenoble Alpes, CNRS, Grenoble, France
² ISTerre, Univ. Grenoble Alpes, CNRS, Grenoble, France
³ Geoazur, Univ. Nice Sophia Antipolis, CNRS, Valbonne, France

http://seiscope2.osug.fr

Joint Inversion Summer School, Barcelonnette, France, 15-19 June 2015

SEISCOPE

Outline

1 Numerical optimization methods for large-scale smooth unconstrained minimization problems
    Numerical optimization for nonlinear least-squares problems
    Steepest descent method
    Newton method
    Quasi-Newton methods
    What about the nonlinear conjugate gradient?
    Summary

2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation

3 Summary
Numerical optimization for inverse problems in geosciences

Nonlinear least-squares problem

In this presentation, we will consider the inverse problem

min_m f(m) = 1/2 ‖d_cal(m) − d_obs‖²

where

d_obs are data associated with a physical phenomenon and a measurement protocol: seismic waves, electromagnetic waves, gravimetry, ultrasound, X-ray, ...

m is the parameter of interest we want to reconstruct: P- and S-wave velocities, density, anisotropy parameters, attenuation, or a collection of these parameters

d_cal(m) are synthetic data, computed numerically, often through the solution of partial differential equations

f(m) is a misfit function which measures the discrepancy between observed and synthetic data

Of course, in joint inversion, we may consider a misfit function that is a sum of such functions associated with different measurements: the theory remains the same.

We will also assume that f(m) is continuous and twice differentiable: the gradient is continuous, and the matrix of second-order derivatives H(m) (the Hessian matrix) is also continuous.

The methods we are going to review are local optimization methods: we put aside global optimization methods and stochastic/genetic algorithms, which are unaffordable for large-scale optimization problems.

All the methods we review are presented in (Nocedal and Wright, 2006).

Local methods to find the minimum of a function

Necessary condition

To detect an extremum of a differentiable function f(m), we have the necessary condition

∇f(m) = 0

This is not enough: is it a minimum or a maximum?

Necessary and sufficient conditions

At a local minimum, the function is locally convex: the Hessian is positive definite

∇f(m) = 0,   ∇²f(m) > 0

Practical implementation

However, this is not what we implement in practice. From an initial guess m_0, a sequence m_k is built such that

the limit m* should satisfy the necessary condition

∇f(m*) = 0

at each iteration

f(m_{k+1}) < f(m_k)

We have to guarantee the decrease of the misfit function at each iteration.


How to find the zero of the gradient: first-order method


The fixed-point method

We want to find m* such that

∇f(m*) = 0

The simplest method is to apply the fixed-point iteration to I − α∇f:

m_{k+1} = (I − α∇f)(m_k) = m_k − α∇f(m_k),   α ∈ ℝ⁺*

At convergence we should have

m* = (I − α∇f)(m*) = m* − α∇f(m*)   ⟹   ∇f(m*) = 0

Ensuring the decrease of the misfit function

We need to ensure

f(m_{k+1}) < f(m_k)

We have

f(m + dm) = f(m) + ∇f(m)^T dm + O(‖dm‖²)

Therefore, if

m_{k+1} = m_k − α∇f(m_k),

we have

f(m_{k+1}) = f(m_k − α∇f(m_k)) = f(m_k) − α∇f(m_k)^T ∇f(m_k) + O(α² ‖∇f(m_k)‖²)

that is

f(m_{k+1}) = f(m_k) − α ‖∇f(m_k)‖² + O(α² ‖∇f(m_k)‖²)

Therefore, for α small enough, we can ensure the decrease condition.

Fixed point on I − α∇f = steepest-descent method

To summarize, using the fixed-point iteration on I − α∇f yields the sequence

m_{k+1} = m_k − α∇f(m_k)

We have just rediscovered the steepest-descent iteration.
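
As an illustration, here is a minimal steepest-descent loop in Python/NumPy with a crude backtracking rule that halves α until the decrease condition f(m_{k+1}) < f(m_k) holds. The quadratic toy misfit below is an assumption for illustration, not part of the lecture material.

```python
import numpy as np

def steepest_descent(f, grad, m0, alpha0=1.0, tol=1e-8, max_iter=500):
    """Steepest descent with a backtracking step ensuring f decreases."""
    m = m0.copy()
    for _ in range(max_iter):
        g = grad(m)
        if np.linalg.norm(g) < tol:      # approximate stationarity grad f(m*) = 0
            break
        alpha = alpha0
        while f(m - alpha * g) >= f(m) and alpha > 1e-16:
            alpha *= 0.5                 # shrink until the misfit decreases
        m = m - alpha * g
    return m

# Toy quadratic misfit f(m) = 1/2 ||Am - d||^2 (illustrative assumption)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
d = np.array([1.0, -1.0])
f = lambda m: 0.5 * np.sum((A @ m - d) ** 2)
grad = lambda m: A.T @ (A @ m - d)
print(steepest_descent(f, grad, np.zeros(2)))
```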


How to find the zero of the gradient: second-order method

Newton method

A faster (quadratic) convergence for finding the zero of ∇f(m) can be achieved with the Newton method.

We approximate ∇f(m_{k+1}) by its first-order Taylor expansion around m_k:

∇f(m_{k+1}) ≃ ∇f(m_k) + (∂∇f(m_k)/∂m)(m_{k+1} − m_k)

We look for the zero of this approximation

∇f(m_k) + (∂∇f(m_k)/∂m)(m_{k+1} − m_k) = 0

which yields

m_{k+1} = m_k − (∂∇f(m_k)/∂m)^{-1} ∇f(m_k)

Notations

In the following, we use the notation

∂∇f(m_k)/∂m = H(m_k)

for the Hessian operator (the matrix of second-order derivatives of the misfit function).

Decrease of the misfit function

Do we ensure the decrease of the misfit function?

f(m_{k+1}) = f(m_k − α_k H(m_k)^{-1} ∇f(m_k))
           = f(m_k) − α_k ∇f(m_k)^T H(m_k)^{-1} ∇f(m_k) + O(α_k² ‖H(m_k)^{-1} ∇f(m_k)‖²)

We have

∇f(m_k)^T H(m_k)^{-1} ∇f(m_k) > 0

if and only if H(m_k)^{-1} > 0.
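
To make the iteration concrete, here is a sketch of the Newton iteration m_{k+1} = m_k − H(m_k)^{-1} ∇f(m_k) on a tiny nonlinear least-squares toy whose Hessian can be formed explicitly; the forward model dcal below is an assumption for illustration.

```python
import numpy as np

dobs = np.array([1.0, 2.0, 4.0])

def dcal(m):                                     # toy nonlinear forward model
    return np.array([m[0] ** 2, m[0] * m[1], m[1] ** 2])

def grad_hess(m):
    r = dcal(m) - dobs                           # residuals
    J = np.array([[2 * m[0], 0.0],
                  [m[1],     m[0]],
                  [0.0,      2 * m[1]]])         # Jacobian of dcal
    g = J.T @ r                                  # grad f = J^T r
    S = np.array([[2 * r[0], r[1]],
                  [r[1],     2 * r[2]]])         # sum_i r_i * Hessian of r_i
    return g, J.T @ J + S                        # full Hessian H = J^T J + S

m = np.array([0.8, 1.8])
for _ in range(10):
    g, H = grad_hess(m)
    m = m - np.linalg.solve(H, g)                # Newton step
print(m)                                         # ~ (1, 2): dcal(m) = dobs
```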

Difficulties

The Hessian operator is not necessarily positive definite: the function f(m) may not be strictly convex, as the forward problem is nonlinear (f(m) is not quadratic)

For large-scale applications, how do we compute H(m) and its inverse H(m)^{-1}?


The l-BFGS method

Principle

The l-BFGS method (Nocedal, 1980) relies on the iterative scheme

m_{k+1} = m_k − α_k Q_k ∇f(m_k)

where

Q_k ≃ H(m_k)^{-1} is symmetric positive definite,

and

α_k ∈ ℝ⁺*

is a scalar parameter computed through a linesearch process.

l-BFGS approximation

The l-BFGS approximation consists in defining Q_k as

Q_k = (V_{k−1}^T ··· V_{k−l}^T) Q_k^0 (V_{k−l} ··· V_{k−1})
    + ρ_{k−l} (V_{k−1}^T ··· V_{k−l+1}^T) s_{k−l} s_{k−l}^T (V_{k−l+1} ··· V_{k−1})
    + ρ_{k−l+1} (V_{k−1}^T ··· V_{k−l+2}^T) s_{k−l+1} s_{k−l+1}^T (V_{k−l+2} ··· V_{k−1})
    + ···
    + ρ_{k−1} s_{k−1} s_{k−1}^T,

where the pairs (s_k, y_k) are

s_k = m_{k+1} − m_k,   y_k = ∇f(m_{k+1}) − ∇f(m_k),

the scalars ρ_k are

ρ_k = 1 / (y_k^T s_k),

and the matrices V_k are defined by

V_k = I − ρ_k y_k s_k^T.


Implementation: two-loop recursion
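
The algorithm listing on this slide did not survive extraction. Below is the standard two-loop recursion (Nocedal, 1980; Nocedal and Wright, 2006), sketched in Python/NumPy: it returns −Q_k ∇f(m_k) from the l most recent pairs (s_i, y_i) without ever forming Q_k.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Return -Q_k g via the two-loop recursion.
    s_list, y_list: the l most recent pairs, ordered oldest first."""
    q = g.copy()
    cache = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # first loop (newest first)
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        cache.append((a, rho, s, y))
        q -= a * y
    if s_list:                                             # initial scaling Q_k^0 = gamma I
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for a, rho, s, y in reversed(cache):                   # second loop (oldest first)
        b = rho * (y @ q)
        q += (a - b) * s
    return -q
```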

Truncated Newton method

Principle

The truncated Newton method (Nash, 2000) relies on the iterative scheme

m_{k+1} = m_k + α_k Δm_k

where Δm_k is computed as an approximate solution of the linear system

H(m_k) Δm_k = −∇f(m_k)

Implementation

A matrix-free conjugate gradient is used to solve this linear system (Saad, 2003), as sketched below

This only requires the capability to compute matrix-vector products H(m_k)v for given vectors v: the full Hessian matrix need not be formed explicitly

When the conjugate gradient is truncated upon detection of negative curvature, the resulting approximation of the Hessian only accounts for positive eigenvalues of H(m_k): Δm_k is then ensured to be a descent direction
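
A minimal sketch of the matrix-free inner solve, assuming a routine hessvec(m, v) that returns H(m)v (for instance via the second-order adjoint method of the next part); truncation is mimicked by simply capping the number of CG iterations, and the negative-curvature test is omitted for brevity. This is a sketch, not the SEISCOPE implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def truncated_newton_step(m, grad, hessvec, cg_iters=20):
    """Approximately solve H(m) dm = -grad(m) with matrix-free CG."""
    n = m.size
    Hop = LinearOperator((n, n), matvec=lambda v: hessvec(m, v), dtype=float)
    dm, _ = cg(Hop, -grad(m), maxiter=cg_iters)   # truncated inner iterations
    return dm
```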


Conjugate gradient

Conjugate gradient for symmetric positive definite linear systems

The conjugate gradient is an iterative method for the solution of symmetric positive definite linear systems

Am = b

The method enjoys several interesting properties:

Convergence in at most n iterations for a system of size n

A fast convergence rate is possible, depending on the eigenvalue distribution of A: in practice, an acceptable approximation of the solution can be obtained in k iterations with k ≪ n

Implementation

Only matrix-vector products with A need to be performed.
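
The algorithm listing from this slide was also lost in extraction. The following is the textbook conjugate gradient (Saad, 2003), written so that A enters only through a user-supplied matrix-vector product.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve Am = b for SPD A, given only the map v -> Av."""
    m = np.zeros_like(b)
    r = b - matvec(m)                  # residual r = b - Am
    p = r.copy()                       # initial search direction
    rs = r @ r
    for _ in range(max_iter or b.size):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)          # exact minimization along p
        m += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p      # new A-conjugate direction
        rs = rs_new
    return m
```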

Nonlinear conjugate gradient

How can the conjugate gradient be extended to the solution of nonlinear minimization problems? There is a link: solving

Am = b

where A is symmetric positive definite is equivalent to solving

min_m f(m) = 1/2 m^T A m − m^T b

because

∇f(m) = Am − b

and f is strictly convex (a single extremum, which is a minimum).

Implementation

Simply replace the residual r in the preceding algorithm by −∇f(m).
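
A sketch of the resulting nonlinear conjugate gradient. The slides leave β_k unspecified; the Fletcher-Reeves choice β_k = ‖∇f(m_k)‖² / ‖∇f(m_{k−1})‖² is used here as one common option, together with a crude backtracking line search (both are assumptions for illustration).

```python
import numpy as np

def nonlinear_cg(f, grad, m0, tol=1e-8, max_iter=200):
    m = m0.copy()
    g = grad(m)
    p = -g                                    # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = 1.0
        while f(m + alpha * p) >= f(m) and alpha > 1e-16:
            alpha *= 0.5                      # crude backtracking line search
        m = m + alpha * p
        g_new = grad(m)
        beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves beta_k
        p = -g_new + beta * p                 # conjugate update of the direction
        g = g_new
    return m
```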


Summary

An iterative scheme for local optimization

We have seen four different methods, all based on the same iterative scheme

m_{k+1} = m_k + α_k Δm_k

Nonlinear optimization methods

The four methods only differ in the way Δm_k is computed:

Steepest descent:   Δm_k = −∇f(m_k)
Nonlinear CG:       Δm_k = −∇f(m_k) + β_k Δm_{k−1}
l-BFGS:             Δm_k = −Q_k ∇f(m_k),   Q_k ≃ H_k^{-1}
Truncated Newton:   H(m_k) Δm_k = −∇f(m_k)   (solved with CG)

Large-scale applications

From this quick overview, we see that the two key quantities to be estimated for the solution of

min_m f(m) = 1/2 ‖d_cal(m) − d_obs‖²

are

the gradient of the misfit function ∇f(m)

Hessian-vector products H(m)v for given v (only for the truncated Newton method)

We shall see in the next part how to compute them at a reasonable computational cost (memory footprint and flops) for large-scale applications, using adjoint state methods.

Outline

1 Numerical optimization methods for large-scale smooth unconstrained minimization problems

2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation
    Gradient computation of a nonlinear least-squares function
    First-order adjoint state method
    Second-order adjoint state method

3 Summary

Gradient computation of a nonlinear least-squares function

Framework

We consider the problem

min_m f(m) = 1/2 ‖d_cal(m) − d_obs‖²

For a perturbation dm we have

f(m + dm) = 1/2 ‖d_cal(m + dm) − d_obs‖²
          = 1/2 ‖d_cal(m) − d_obs + J(m) dm + O(‖dm‖²)‖²

where

J(m) = ∂d_cal / ∂m

is the Jacobian matrix. Expanding the square,

f(m + dm) = 1/2 ‖d_cal(m) − d_obs‖² + (d_cal − d_obs, J(m) dm) + O(‖dm‖²)
          = 1/2 ‖d_cal(m) − d_obs‖² + (J(m)^T (d_cal − d_obs), dm) + O(‖dm‖²)

Therefore

f(m + dm) − f(m) = (J(m)^T (d_cal − d_obs), dm) + O(‖dm‖²)

and the gradient is

∇f(m) = J(m)^T (d_cal − d_obs)
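
This formula is easy to sanity-check numerically. The toy forward model below is an assumption for illustration; the expression ∇f(m) = J(m)^T (d_cal(m) − d_obs) is compared against a centered finite difference of f.

```python
import numpy as np

dobs = np.array([1.0, 2.0])

def dcal(m):                                  # toy forward model (assumption)
    return np.array([m[0] * m[1], m[1] ** 2])

def jacobian(m):                              # its Jacobian d(dcal)/dm
    return np.array([[m[1], m[0]],
                     [0.0,  2 * m[1]]])

f = lambda m: 0.5 * np.sum((dcal(m) - dobs) ** 2)
m = np.array([0.5, 1.5])
g = jacobian(m).T @ (dcal(m) - dobs)          # grad f = J^T (dcal - dobs)

eps = 1e-6
g_fd = np.array([(f(m + eps * e) - f(m - eps * e)) / (2 * eps)
                 for e in np.eye(2)])
print(np.allclose(g, g_fd, atol=1e-6))        # True
```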

Implementation for large-scale applications

The size of J(m) can be problematic for large-scale applications. After discretization, it is a matrix with N rows and M columns, where

1. N is the number of discrete data
2. M is the number of discrete model parameters

For Full Waveform Inversion, for instance, we can have approximately

N ≃ 10¹⁰,   M ≃ 10⁹

This prevents us from

1. computing J(m) at each iteration of the inversion
2. storing J(m) in memory (storing it on disk is possible, but the expensive I/O then severely degrades performance)

Can we avoid computing the Jacobian matrix?

Yes, using adjoint state methods.


First-order adjoint state method

Specializing the forward problem

Now the problem is specialized such that

d_cal(m) = R u(m)

where u(m) satisfies

A(m, ∂x, ∂y, ∂z) u = s,

u is the solution of the PDE (a wavefield, for instance) over the whole volume

R is an extraction operator since, most of the time, only partial measurements are available

References

The adjoint state method comes from optimal control theory and the preliminary work of Lions (1968)

It was first applied to seismic imaging by Chavent (1974)

A nice review of its application in this field has been proposed by Plessix (2006)

The Lagrangian function

From constrained optimization, we introduce the function

L(m, u, λ) = 1/2 ‖Ru − d_obs‖² + (A(m, ∂x, ∂y, ∂z) u − s, λ)

Link with the misfit function

Let u(m) be the solution of the forward problem for a given m; then

L(m, u(m), λ) = 1/2 ‖Ru(m) − d_obs‖² = f(m)

Link with the gradient of the misfit function

Therefore

∂L(m, u(m), λ) / ∂m = ∇f(m)

Expanding

This means that

(∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ) + ∂L(m, u(m), λ)/∂u · ∂u(m)/∂m = ∇f(m)

Potential simplification

Therefore, if we define λ(m) such that

∂L(m, u(m), λ(m)) / ∂u = 0

we have

(∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m)) = ∇f(m)

Adjoint state formula

What does

∂L(m, u(m), λ(m)) / ∂u = 0

mean?

Consider a perturbation du. We have

L(m, u + du, λ) = 1/2 ‖Ru − d_obs + R du‖² + (A(m)u − s + A(m) du, λ)
                = 1/2 ‖Ru − d_obs‖² + (Ru − d_obs, R du) + (A(m)u − s, λ)
                  + (A(m) du, λ) + O(‖du‖²)
                = L(m, u, λ) + (R^T(Ru − d_obs), du) + (du, A(m)^T λ) + O(‖du‖²)
                = L(m, u, λ) + (A(m)^T λ + R^T(Ru − d_obs), du) + O(‖du‖²)

Therefore

∂L(m, u(m), λ(m)) / ∂u = A(m)^T λ + R^T(Ru − d_obs)

Adjoint state equation

Remember we are looking for λ(m) such that

∂L(m, u(m), λ(m)) / ∂u = 0

This simply means that λ(m) should be the solution of the adjoint PDE

A(m)^T λ + R^T(Ru(m) − d_obs) = 0

Self-adjoint case

In some cases, the forward problem is self-adjoint, and the adjoint state λ(m) is the solution of the same equation as u(m), except that the source term is different

In addition, computing this source term requires u(m) to have been computed beforehand, as it depends on this field

Summary

We have seen that we can compute the gradient of the misfit function through the formula

∇f(m) = (∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m))

where u(m) satisfies

A(m, ∂x, ∂y, ∂z) u = s,

and λ(m) satisfies

A(m, ∂x, ∂y, ∂z)^T λ + R^T(Ru(m) − d_obs) = 0
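
A minimal discrete sketch of this recipe, assuming a toy operator A(m) = K + diag(m) with K a fixed SPD matrix, so that (∂A/∂m_i) u = e_i u_i and the gradient formula reduces componentwise to ∇f_i = u_i λ_i. The whole toy setup is an assumption for illustration: two linear solves yield the full gradient, which is checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
K = (np.diag(2.0 * np.ones(n))                 # 1D Laplacian-like SPD matrix
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
s = rng.normal(size=n)                         # source term
R = np.eye(n)[:3]                              # extract 3 "receiver" samples
dobs = rng.normal(size=3)

def f(m):
    u = np.linalg.solve(K + np.diag(m), s)
    return 0.5 * np.sum((R @ u - dobs) ** 2)

def gradient(m):
    A = K + np.diag(m)
    u = np.linalg.solve(A, s)                  # forward solve: A u = s
    lam = np.linalg.solve(A.T, -R.T @ (R @ u - dobs))  # adjoint solve
    return u * lam                             # grad_i = ((dA/dm_i) u, lam) = u_i lam_i

m = np.ones(n)
eps = 1e-6
g_fd = np.array([(f(m + eps * e) - f(m - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
print(np.allclose(gradient(m), g_fd, atol=1e-6))   # True
```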

Implementation issues

What are the benefits of the adjoint-state approach?

To compute the gradient, we first have to compute u(m): first PDE solve

Then we compute λ(m): second PDE solve

Finally we form the gradient through the formula

∇f(m) = (∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m))

The Jacobian matrix never has to be formed or stored explicitly!


Second-order adjoint state method

Computing Hessian-vector products

We have seen that, in the particular case of the truncated Newton method, it is required to know how to compute, for any v, the Hessian-vector product

H(m) v

However, as for the Jacobian matrix J(m), the size of H(m) for large-scale applications is such that it can be neither computed explicitly nor stored.

Again, the adjoint-state method allows us to overcome this difficulty; see (Fichtner and Trampert, 2011; Epanomeritakis et al., 2008; Métivier et al., 2013).

Principle of the method

Consider the function

h_v(m) = (∇f(m), v)

For a perturbation dm we have

h_v(m + dm) = (∇f(m + dm), v)
            = (∇f(m) + H(m) dm, v) + O(‖dm‖²)
            = (∇f(m), v) + (H(m) dm, v) + O(‖dm‖²)
            = (∇f(m), v) + (dm, H(m) v) + O(‖dm‖²)
            = h_v(m) + (dm, H(m) v) + O(‖dm‖²)

H(m)v through the gradient of h_v

Therefore

∇h_v(m) = H(m) v

All we have to do is apply the previous strategy to the function h_v(m)!

Consider the new Lagrangian function

L_v(m, u, λ, g, µ_1, µ_2, µ_3) = (g, v) + ((∂A(m)/∂m u)^T λ − g, µ_1)
                               + (A(m)^T λ − R^T(Ru − d_obs), µ_2)
                               + (A(m) u − s, µ_3)

For u = u(m), λ = λ(m), g = g(m), respectively solutions of

A(m) u = s,   A(m)^T λ = R^T(Ru(m) − d_obs),   g(m) = (∂A(m)/∂m u(m))^T λ(m),

we have

L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) = h_v(m)

Hence

∂L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) / ∂m = ∇h_v(m) = H(m) v

Again, we develop the previous expression:

∂L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) / ∂m =
      ((∂²A(m)/∂m² u(m))^T λ(m), µ_1)
    + (∂A(m)^T/∂m λ(m), µ_2)
    + (∂A(m)/∂m u(m), µ_3)
    + ∂L_v/∂u · ∂u/∂m + ∂L_v/∂λ · ∂λ/∂m + ∂L_v/∂g · ∂g/∂m

Now we look for µ_1, µ_2, µ_3 such that

∂L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) / ∂u = 0
∂L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) / ∂λ = 0
∂L_v(m, u(m), λ(m), g(m), µ_1, µ_2, µ_3) / ∂g = 0

This is equivalent to

(∂A/∂m µ_1)^T λ(m) − R^T R µ_2 + A(m)^T µ_3 = 0
(∂A/∂m u(m))^T µ_1 + A(m) µ_2 = 0
v − µ_1 = 0

Reorganizing these equations, we find that

µ_1 = v
A(m) µ_2 = −(∂A/∂m u(m))^T v
A(m)^T µ_3 = −(∂A/∂m v)^T λ(m) + R^T R µ_2

Implementation

µ_1 is given for free: it is v

µ_2 is the solution of a forward problem involving a new source term which depends on v and u(m)

µ_3 is the solution of an adjoint problem involving a new source term which depends on v, λ(m) and µ_2

Summary

The computation of H(m)v for a given v can be obtained through the formula

H(m) v = ((∂²A(m)/∂m² u(m))^T λ(m), µ_1)
       + (∂A(m)^T/∂m λ(m), µ_2)
       + (∂A(m)/∂m u(m), µ_3)

where

Forward and adjoint simulations

u(m) is computed as a solution of the forward problem

λ(m) is computed as a solution of the adjoint problem

µ_2 is computed as a solution of the forward problem for a new source term

µ_3 is computed as a solution of the adjoint problem for a new source term
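
A minimal sketch of this recipe on the same kind of toy operator A(m) = K + diag(m) as used for the first-order sketch (an assumption for illustration; here ∂²A/∂m² = 0, so the first term of the formula vanishes). Using the sign convention of the first-order part (A^T λ = −R^T(Ru − d_obs)), one extra forward solve and one extra adjoint solve produce H(m)v, which is checked against a finite difference of the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
K = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
s = rng.normal(size=n)
R = np.eye(n)[:3]
dobs = rng.normal(size=3)

def gradient(m):
    A = K + np.diag(m)
    u = np.linalg.solve(A, s)                               # forward solve
    lam = np.linalg.solve(A.T, -R.T @ (R @ u - dobs))       # adjoint solve
    return u * lam, u, lam, A

def hessvec(m, v):
    _, u, lam, A = gradient(m)
    mu2 = np.linalg.solve(A, -v * u)                        # extra forward solve
    mu3 = np.linalg.solve(A.T, -v * lam - R.T @ (R @ mu2))  # extra adjoint solve
    return mu2 * lam + u * mu3                              # H(m) v

m, v = np.ones(n), rng.normal(size=n)
eps = 1e-6
hv_fd = (gradient(m + eps * v)[0] - gradient(m - eps * v)[0]) / (2 * eps)
print(np.allclose(hessvec(m, v), hv_fd, atol=1e-5))         # True
```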


Summary

Optimization methods for nonlinear least-squares problems

min_m f(m) = 1/2 ‖d_cal(m) − d_obs‖²

An iterative scheme for local optimization

Local optimization methods are all based on the same iterative scheme

m_{k+1} = m_k + α_k Δm_k

Four nonlinear optimization methods

The differences come from the computation of Δm_k:

Steepest descent:   Δm_k = −∇f(m_k)
Nonlinear CG:       Δm_k = −∇f(m_k) + β_k Δm_{k−1}
l-BFGS:             Δm_k = −Q_k ∇f(m_k),   Q_k ≃ H_k^{-1}
Truncated Newton:   H(m_k) Δm_k = −∇f(m_k)   (solved with CG)

Adjoint methods

The gradient can be computed through the first-order adjoint method at the price of

1 forward modeling
1 adjoint modeling

The Hessian-vector product (only required for truncated Newton) can be computed through the second-order adjoint method at the price of

1 additional forward modeling
1 additional adjoint modeling

SEISCOPE Toolbox

A set of optimization routines in FORTRAN90

Optimization routines for differentiable functions

Steepest descent, nonlinear conjugate gradient

l-BFGS, truncated Newton

Implemented using a reverse communication protocol: the user is in charge of computing gradients and Hessian-vector products

Open-source code available here:
https://seiscope2.obs.ujf-grenoble.fr/SEISCOPE-OPTIMIZATION-TOOLBOX

Acknowledgments

Thank you for your attention

National HPC facilities of GENCI-IDRIS-CINES under Grant 046091

Local HPC facilities of CIMENT-SCCI (Univ. Grenoble) and SIGAMM (Obs. Nice)

SEISCOPE sponsors: http://seiscope2.osug.fr

A few references

Chavent, G. (1974). Identification of parameter distributed systems. In Goodson, R. and Polis, M., editors, Identification of function parameters in partial differential equations, pages 31-48. American Society of Mechanical Engineers, New York.

Epanomeritakis, I., Akcelik, V., Ghattas, O., and Bielak, J. (2008). A Newton-CG method for large-scale three-dimensional elastic full waveform seismic inversion. Inverse Problems, 24:1-26.

Fichtner, A. and Trampert, J. (2011). Hessian kernels of seismic data functionals based upon adjoint techniques. Geophysical Journal International, 185(2):775-798.

Lions, J. L. (1968). Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles. Dunod, Paris.

Métivier, L., Brossier, R., Virieux, J., and Operto, S. (2013). Full Waveform Inversion and the truncated Newton method. SIAM Journal on Scientific Computing, 35(2):B401-B437.

Nash, S. G. (2000). A survey of truncated Newton methods. Journal of Computational and Applied Mathematics, 124:45-59.

Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773-782.

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, 2nd edition.

Plessix, R. E. (2006). A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophysical Journal International, 167(2):495-503.

Saad, Y. (2003). Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia.