Low-rank Techniques:
from Matrix Equations to High-dimensional PDEs
Peter Benner
Joint work with Sergey Dolgov (U Bath),
Boris Khoromskii and Venera Khoromskaia (MPI DCTS and MPI MIS Leipzig), Patrick Kürschner and Jens Saak (MPI DCTS),
Akwum Onwunta (MPI DCTS/U Maryland soon)
Martin Stoll (TU Chemnitz)
17 November 2017
Landscapes in Mathematical Sciences, University of Bath
The Max Planck Institute (MPI) in Magdeburg
The Max Planck Society
operates 84 institutes — 79 in Germany, 2 in Italy, 1 each in the Netherlands, Luxembourg, and the USA;
has ∼ 23,000 employees;
has had 18 Nobel Laureates since 1948.
"The first MPI in engineering…"
MPI Magdeburg
founded 1998
4 departments (directors)
9 research groups
budget ∼ 15 million EUR
∼ 230 employees,
130 scientists,
doing research in
biotechnology, chemical engineering, process engineering, energy conversion, applied math
©Peter Benner Low-Rank Techniques 2/40
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations: Some Basics; The Low-Rank Structure; LR-ADI Method; The New LR-ADI Applied to Lyapunov Equations
3. From Matrix Equations to PDEs in d Dimensions: The Curse of Dimensionality; Tensor Techniques; Numerical Examples
Prelude — 20 years of working in low-rank methods
Model Reduction in Frequency Domain
Approximate the linear control system

ẋ = Ax + Bu,  A ∈ R^(n×n), B ∈ R^(n×m),
y = Cx + Du,  C ∈ R^(q×n), D ∈ R^(q×m),

by a reduced-order system

d/dt x̂ = Âx̂ + B̂u,  Â ∈ R^(r×r), B̂ ∈ R^(r×m),
ŷ = Ĉx̂ + D̂u,  Ĉ ∈ R^(q×r), D̂ ∈ R^(q×m)

of order r ≪ n, such that

‖y − ŷ‖ = ‖Gu − Ĝu‖ ≤ ‖G − Ĝ‖ · ‖u‖ < tolerance · ‖u‖,

where

G(s) = C(sI_n − A)^(−1)B + D,  Ĝ(s) = Ĉ(sI_r − Â)^(−1)B̂ + D̂

are the associated transfer functions of the systems.
Prelude — 20 years of working in low-rank methods
Balanced Truncation [Moore ’81]

System Σ:
ẋ(t) = Ax(t) + Bu(t),
y(t) = Cx(t),  with A stable, i.e., Λ(A) ⊂ C⁻,

is balanced if the system Gramians, i.e., the solutions P, Q of the Lyapunov equations

AP + PA^T + BB^T = 0,  A^T Q + QA + C^T C = 0,

satisfy P = Q = diag(σ1, …, σn) with σ1 ≥ σ2 ≥ … ≥ σn > 0.
σ1, …, σn are the Hankel singular values (HSVs) of Σ.

Compute a balanced realization of Σ via the state-space transformation

T : (A, B, C) ↦ (TAT^(−1), TB, CT^(−1)) = ( [ A11 A12 ; A21 A22 ], [ B1 ; B2 ], [ C1 C2 ] ).

Truncation: (Â, B̂, Ĉ) = (A11, B1, C1).
Prelude — 20 years of working in low-rank methods
Balanced Truncation [Moore ’81]
. . . is one of the greatest model reduction techniques for linear systems, since
the reduced-order model is stable with HSVs σ1, …, σr;
it allows an adaptive choice of r via a computable error bound (there is a free lunch!):

‖y − ŷ‖₂ ≤ ‖G − Ĝ‖_H∞ ‖u‖₂ ≤ ( 2 Σ_(k=r+1)^n σ_k ) ‖u‖₂.
Prelude — 20 years of working in low-rank methods

Balanced Truncation [Moore ’81]

. . . is one of the greatest model reduction techniques for linear systems!

But: as suggested in [Moore ’81], it is an O(n³) method! (Bottleneck: solution of the Lyapunov equations, followed by an SVD of their Cholesky factors to determine the necessary parts of T and T^(−1).)

In 1997, we derived a method to compute low-rank factors S, R ∈ R^(n×ℓ), ℓ ≪ n, such that P ≈ SS^T, Q ≈ RR^T [1], and then showed that the truncation can be obtained via a cheap ℓ_S × ℓ_R SVD of S^T R [2]!

(Some similar ideas: low-rank Lyapunov solvers [Saad ’90, Jaimoukha/Kasenally ’94], small-scale SVD in BT [Penzl ’98, Li/White ’99].)

[1] P. Benner and E. S. Quintana-Ortí. Solving stable generalized Lyapunov equations with the matrix sign function. Numerical Algorithms, 20(1):75–100, 1999. (Preprint SFB393 97-23)
[2] P. Benner, E. S. Quintana-Ortí, and G. Quintana-Ortí. Balanced truncation model reduction of large-scale dense systems on parallel computers. Mathematical and Computer Modeling of Dynamical Systems, 6(4):383–405, 2000.
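In code, square-root balanced truncation with the small SVD of S^T R reads roughly as follows (a minimal dense sketch, not the parallel low-rank implementation of the cited papers; the random test system, the eigendecomposition-based Gramian factors, and the sampled H∞ check are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def square_root_bt(A, B, C, r):
    """Square-root balanced truncation of (A, B, C); dense and illustrative."""
    P = solve_continuous_lyapunov(A, -B @ B.T)        # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)      # observability Gramian
    def sqrt_factor(M):                               # M = S S^T via eigh (clipped)
        w, V = np.linalg.eigh((M + M.T) / 2)
        return V * np.sqrt(np.clip(w, 0, None))
    S, R = sqrt_factor(P), sqrt_factor(Q)
    U, hsv, Vt = np.linalg.svd(S.T @ R)               # Hankel singular values
    Tr = S @ U[:, :r] / np.sqrt(hsv[:r])              # right projection matrix
    W = R @ Vt[:r, :].T / np.sqrt(hsv[:r])            # left projection, W^T Tr = I
    return W.T @ A @ Tr, W.T @ B, C @ Tr, hsv

rng = np.random.default_rng(1)
n, r = 40, 6
A0 = rng.standard_normal((n, n))
A = A0 - (0.5 + np.real(np.linalg.eigvals(A0)).max()) * np.eye(n)  # make A stable
B, C = rng.standard_normal((n, 1)), rng.standard_normal((1, n))
Ar, Br, Cr, hsv = square_root_bt(A, B, C, r)

# sampled transfer-function error vs. the computable bound 2 * sum(tail HSVs)
freqs = np.logspace(-2, 3, 200)
err = max(abs((C @ np.linalg.solve(1j * w * np.eye(n) - A, B)
               - Cr @ np.linalg.solve(1j * w * np.eye(r) - Ar, Br))[0, 0])
          for w in freqs)
print(err, 2 * hsv[r:].sum())
```

The printed pair illustrates the free lunch: the sampled H∞ error stays below 2(σ_(r+1) + … + σ_n).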
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations: Some Basics; The Low-Rank Structure; LR-ADI Method; The New LR-ADI Applied to Lyapunov Equations
3. From Matrix Equations to PDEs in d Dimensions
Solving Large-Scale Sylvester and Lyapunov Equations
Sylvester equation
James Joseph Sylvester (September 3, 1814 – March 15, 1897)

AX + XB = C.

Lyapunov equation

Alexander Michailowitsch Ljapunow (June 6, 1857 – November 3, 1918)

AX + XA^T = C,  C = C^T.
Solving Large-Scale Sylvester and Lyapunov Equations
Generalizations of Sylvester (AX + XB = C) and Lyapunov (AX + XA^T = C) Equations

Generalized Sylvester equation:
AXD + EXB = C.

Generalized Lyapunov equation:
AXE^T + EXA^T = C,  C = C^T.

Stein equation:
X − AXB = C.

(Generalized) discrete Lyapunov/Stein equation:
EXE^T − AXA^T = C,  C = C^T.

Note:
We consider only regular cases, which have a unique solution!
Solutions of symmetric cases are symmetric, X = X^T ∈ R^(n×n); otherwise, X ∈ R^(n×ℓ) with n ≠ ℓ in general.
Applications
Gramian-based model order reduction
Linear stability analysis
Continuation algorithms for detecting Hopf bifurcations
Determining metastable equilibrium points of stochastic differential equations
Block-triangularization of matrices and matrix pencils
Computing covariance matrices, e.g., in biochemical reaction networks
Numerical solution of fractional partial differential equations
Solving algebraic Riccati equations via Newton’s method
. . .
Existence of Solutions of Linear Matrix Equations

Exemplarily, consider the generalized Sylvester equation

AXD + EXB = C.  (1)

Vectorization (using the Kronecker product) yields a representation as a linear system:

( D^T ⊗ A + B^T ⊗ E ) vec(X) = vec(C)  ⇐⇒  𝒜 x = c,

with 𝒜 := D^T ⊗ A + B^T ⊗ E, x := vec(X), c := vec(C).

⇒ "(1) has a unique solution ⇐⇒ 𝒜 is nonsingular."

Lemma
Λ(𝒜) = { α_j + β_k | α_j ∈ Λ(A,E), β_k ∈ Λ(B,D) }.
Hence, (1) has a unique solution ⇐⇒ Λ(A,E) ∩ −Λ(B,D) = ∅.

Example: the Lyapunov equation AX + XA^T = C has a unique solution ⇐⇒ there is no µ ∈ C with ±µ ∈ Λ(A).
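The vectorized form can be checked directly on a small instance (a sketch; the sizes and random matrices are illustrative, and a random pencil satisfies the eigenvalue condition with probability one). Note that vec(·) stacks columns, hence `order="F"`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 5
A, E = rng.standard_normal((n, n)), rng.standard_normal((n, n))
D, B = rng.standard_normal((m, m)), rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# vec(A X D) = (D^T kron A) vec(X) with column-stacking vec
K = np.kron(D.T, A) + np.kron(B.T, E)
x = np.linalg.solve(K, C.flatten(order="F"))   # generically, K is nonsingular
X = x.reshape((n, m), order="F")

print(np.linalg.norm(A @ X @ D + E @ X @ B - C))  # ≈ 0
```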
Solving Large-Scale Sylvester and Lyapunov Equations

The Low-Rank Structure

Sylvester Equations

Find X ∈ R^(n×m) solving

AX − XB = FG^T,

where A ∈ R^(n×n), B ∈ R^(m×m), F ∈ R^(n×r), G ∈ R^(m×r).

If n, m are large but r ≪ n, m, then X has a small numerical rank: rank(X, τ) = f ≪ min(n, m). [Penzl 1999, Grasedyck 2004, Antoulas/Sorensen/Zhou 2002]

[Figure: singular values σ(X) of a 1600 × 900 example on a log scale; σ₁₅₅ ≈ u, the machine precision]

Compute low-rank solution factors Z ∈ R^(n×f), Y ∈ R^(m×f), D ∈ R^(f×f), such that X ≈ ZDY^T with f ≪ min(n, m).
Solving Large-Scale Sylvester and Lyapunov Equations

The Low-Rank Structure

Lyapunov Equations

Find X ∈ R^(n×n) solving

AX + XA^T = −FF^T,

where A ∈ R^(n×n), F ∈ R^(n×r).

If n is large but r ≪ n, then X has a small numerical rank: rank(X, τ) = f ≪ n. [Penzl 1999, Grasedyck 2004, Antoulas/Sorensen/Zhou 2002]

[Figure: singular values σ(X) of a 1600 × 900 example on a log scale; σ₁₅₅ ≈ u, the machine precision]

Compute low-rank solution factors Z ∈ R^(n×f), D ∈ R^(f×f), such that X ≈ ZDZ^T with f ≪ n.
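The singular value decay is easy to reproduce (a sketch; the shifted-Laplacian test matrix, the sizes, and the rank-1 right-hand side are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n = 100
# stable symmetric A: shifted 1D Laplacian, spectrum roughly in [-11, -1]
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L
F = np.ones((n, 1))                            # rank-1 right-hand side

X = solve_continuous_lyapunov(A, -F @ F.T)     # A X + X A^T + F F^T = 0
s = np.linalg.svd(X, compute_uv=False)
print(s[15] / s[0])    # rapid decay: tiny after a handful of singular values
```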
Solving Large-Scale Sylvester and Lyapunov Equations

Theoretical Foundation of the Low-rank Structure

Lyapunov case: [Grasedyck ’04]

AX + XA^T + BB^T = 0  ⇐⇒  𝒜 vec(X) = − vec(BB^T),  𝒜 := I_n ⊗ A + A ⊗ I_n.

For stable M, i.e., Λ(M) ⊂ C⁻, apply

M^(−1) = − ∫₀^∞ exp(tM) dt

to 𝒜 and approximate the integral via (sinc) quadrature ⇒

𝒜^(−1) ≈ − Σ_(i=−k)^k ω_i exp(t_i 𝒜),

with error ∼ exp(−√k) (exp(−k) if A = A^T). Then an approximate Lyapunov solution is given by

vec(X) ≈ vec(X_k) = Σ_(i=−k)^k ω_i exp(t_i 𝒜) vec(BB^T).

Now observe that

exp(t_i 𝒜) = exp( t_i (I_n ⊗ A + A ⊗ I_n) ) ≡ exp(t_i A) ⊗ exp(t_i A).

Hence,

vec(X_k) = Σ_(i=−k)^k ω_i ( exp(t_i A) ⊗ exp(t_i A) ) vec(BB^T)

⇒ X_k = Σ_(i=−k)^k ω_i exp(t_i A) BB^T exp(t_i A^T) ≡ Σ_(i=−k)^k ω_i B_i B_i^T,

so that rank(X_k) ≤ (2k + 1)m with

‖X − X_k‖₂ ≲ exp(−√k)  ( exp(−k) for A = A^T )!

Extended to AX + XA^T + Σ_(j=1)^m N_j X N_j^T + BB^T = 0. [B./Breiten 2012]
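The genuine sinc rule requires carefully chosen nodes t_i and weights ω_i; as a hedged stand-in, the same low-rank effect appears with a plain trapezoidal rule applied after the substitution t = e^s, i.e., X = ∫ exp(e^s A) BB^T exp(e^s A^T) e^s ds (the interval, step size, and test matrix below are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L                  # stable symmetric, spectrum ~ [-11, -1]
B = np.ones((n, 1))

Xref = solve_continuous_lyapunov(A, -B @ B.T)

# trapezoidal rule in s (t = e^s) for X = int_0^inf e^{tA} B B^T e^{tA^T} dt;
# each node contributes one column, so X_k = Z Z^T is low-rank by construction
s_nodes = np.linspace(-10, 5, 76)
h = s_nodes[1] - s_nodes[0]
cols = [np.sqrt(h * np.exp(s)) * (expm(np.exp(s) * A) @ B) for s in s_nodes]
Z = np.hstack(cols)
Xk = Z @ Z.T

print(np.linalg.norm(Xk - Xref) / np.linalg.norm(Xref))  # small
```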
Exploiting the Low-Rank Structure I

For simplicity, consider again the Lyapunov equation AX + XA^T + BB^T = 0.

Definition
For M ∈ R^(n×n) with no purely imaginary eigenvalues, write M = T [ J⁻ 0 ; 0 J⁺ ] T^(−1), with J∓ containing all Jordan blocks of M corresponding to eigenvalues with negative/positive real parts. The matrix sign function of M is

sign(M) = T [ −I 0 ; 0 I ] T^(−1).

Observations
1. sign( [ A BB^T ; 0 −A^T ] ) = [ −I 2X ; 0 I ].
2. sign(M) = lim_(k→∞) M_k with M_(k+1) = ½ (M_k + M_k^(−1)), M_0 = M.

Sign function iteration for solving Lyapunov equations
Applying the Newton iteration from Observation 2 to M_0 = [ A BB^T ; 0 −A^T ], together with the inversion formula for block-triangular matrices, yields

A_(k+1) ← ½ (A_k + A_k^(−1)),
B_(k+1) B_(k+1)^T ← ½ ( B_k B_k^T + A_k^(−1) B_k (A_k^(−1) B_k)^T ),

so that, by Observation 1, B_k B_k^T → 2X, i.e., X = ZZ^T with Z = (1/√2) lim_(k→∞) B_k.

Factored sign function iteration for Lyapunov equations [B./Quintana-Ortí 1997/99]

A_(k+1) ← ½ (A_k + A_k^(−1)),
B_(k+1) ← (1/√2) [ B_k, A_k^(−1) B_k ].

Problem: the number of columns in B_k doubles in each iteration!

Cure: a truncation operator,

B_(k+1) ← T_ε( (1/√2) [ B_k, A_k^(−1) B_k ] ),

with, e.g., T_ε returning the scaled left singular vectors of the truncated SVD w.r.t. the numerical rank tolerance ε.
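A minimal sketch of the factored iteration including the truncation cure (the stable test matrix, tolerance, and iteration limit are illustrative assumptions; the result is checked against a dense Lyapunov solve):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sign_lyap(A, B, tol=1e-10, maxit=25):
    """Factored sign iteration with truncation for A X + X A^T + B B^T = 0."""
    Ak, Bk = A.copy(), B.copy()
    for _ in range(maxit):
        Ainv = np.linalg.inv(Ak)
        # B-update followed by the truncation operator T_eps (SVD compression)
        U, s, _ = np.linalg.svd(np.hstack([Bk, Ainv @ Bk]) / np.sqrt(2),
                                full_matrices=False)
        keep = s > tol * s[0]
        Bk = U[:, keep] * s[keep]
        Anew = 0.5 * (Ak + Ainv)
        converged = np.linalg.norm(Anew - Ak, 1) <= tol * np.linalg.norm(Ak, 1)
        Ak = Anew
        if converged:
            break
    return Bk / np.sqrt(2)            # Z with X = Z Z^T (since B_k B_k^T -> 2X)

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L              # stable symmetric test matrix
B = np.ones((n, 1))
Z = sign_lyap(A, B)
Xref = solve_continuous_lyapunov(A, -B @ B.T)
print(Z.shape[1], np.linalg.norm(Z @ Z.T - Xref) / np.linalg.norm(Xref))
```

Without the truncation, Z would carry 2^25 columns here; with it, the column count settles at the numerical rank of X.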
Exploiting the Low-Rank Structure II

Exploiting the Low-rank Structure for Large and Sparse Coefficients

The Sylvester equation AX − XB = FG^T is equivalent to the linear system of equations

( I_m ⊗ A − B^T ⊗ I_n ) vec(X) = vec(FG^T).

This cannot be used for numerical solution unless nm ≤ 1,000 (or so), since
it requires O(n²m²) storage;
a direct solver needs O(n³m³) flops;
the low (tensor-)rank of the right-hand side is ignored;
in the Lyapunov case, symmetry and possible definiteness are not respected.

Possible solvers:
Standard Krylov subspace solvers in operator form [Hochbruck, Starke, Reichel, …]
Block-/tensor-Krylov subspace methods with truncation [Kressner/Tobler, Bollhöfer/Eppler, B./Breiten, …]
Galerkin-type methods based on (extended, rational) Krylov subspace methods [Jaimoukha, Kasenally, Jbilou, Simoncini, Druskin, Knizhnerman, …]
Doubling-type methods [Smith, Chu et al., B./Sadkane/El Khoury, …]
ADI methods [Wachspress, Reichel, Li, Penzl, B., Saak, Kürschner, Opmeer/Reis, …]
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Sylvester and Stein equations

Let α_k ≠ β_k with α_k ∉ Λ(B), β_k ∉ Λ(A). Then

AX − XB = FG^T  (Sylvester equation)
⇔  X = A_k X B_k + (β_k − α_k) F_k G_k^H  (Stein equation)

with the Cayley-like (Möbius) transformations

A_k := (A − β_k I_n)^(−1)(A − α_k I_n),  B_k := (B − α_k I_m)^(−1)(B − β_k I_m),
F_k := (A − β_k I_n)^(−1) F,  G_k := (B − α_k I_m)^(−H) G.

Alternating directions implicit (ADI) iteration:

X_k = A_k X_(k−1) B_k + (β_k − α_k) F_k G_k^H,  for k ≥ 1, X_0 ∈ R^(n×m).  [Wachspress 1988]
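The Sylvester-to-Stein equivalence can be verified numerically for one shift pair (a sketch; the stable/anti-stable random test matrices and the shifts α = −1, β = 1 are illustrative choices satisfying α ∉ Λ(B), β ∉ Λ(A)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
n, m, r = 30, 20, 2
A0, B0 = rng.standard_normal((n, n)), rng.standard_normal((m, m))
A = A0 - (1 + np.linalg.norm(A0, 2)) * np.eye(n)   # Lambda(A) in left half-plane
B = B0 + (1 + np.linalg.norm(B0, 2)) * np.eye(m)   # Lambda(B) in right half-plane
F, G = rng.standard_normal((n, r)), rng.standard_normal((m, r))
alpha, beta = -1.0, 1.0

X = solve_sylvester(A, -B, F @ G.T)                # A X - X B = F G^T

# Cayley-like transformations (real data, so ^H reduces to ^T)
Ak = np.linalg.solve(A - beta * np.eye(n), A - alpha * np.eye(n))
Bk = np.linalg.solve(B - alpha * np.eye(m), B - beta * np.eye(m))
Fk = np.linalg.solve(A - beta * np.eye(n), F)
Gk = np.linalg.solve((B - alpha * np.eye(m)).T, G)

res = np.linalg.norm(X - (Ak @ X @ Bk + (beta - alpha) * Fk @ Gk.T))
print(res / np.linalg.norm(X))  # ≈ 0: X also solves the Stein equation
```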
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Sylvester ADI iteration [Wachspress 1988]

X_k = A_k X_(k−1) B_k + (β_k − α_k) F_k G_k^H,
A_k := (A − β_k I_n)^(−1)(A − α_k I_n),  B_k := (B − α_k I_m)^(−1)(B − β_k I_m),
F_k := (A − β_k I_n)^(−1) F ∈ C^(n×r),  G_k := (B − α_k I_m)^(−H) G ∈ C^(m×r).

Now set X_0 = 0 and find a factorization X_k = Z_k D_k Y_k^H:

X_1 = (β_1 − α_1)(A − β_1 I_n)^(−1) F G^T (B − α_1 I_m)^(−1)
⇒ V_1 := Z_1 = (A − β_1 I_n)^(−1) F ∈ C^(n×r),
  D_1 = (β_1 − α_1) I_r ∈ C^(r×r),
  W_1 := Y_1 = (B − α_1 I_m)^(−H) G ∈ C^(m×r).

X_2 = A_2 X_1 B_2 + (β_2 − α_2) F_2 G_2^H
⇒ V_2 = V_1 + (β_2 − α_1)(A − β_2 I_n)^(−1) V_1 ∈ C^(n×r),
  W_2 = W_1 + (α_2 − β_1)(B − α_2 I_m)^(−H) W_1 ∈ C^(m×r),
  Z_2 = [Z_1, V_2],  D_2 = diag(D_1, (β_2 − α_2) I_r),  Y_2 = [Y_1, W_2].
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Algorithm [B. 2005, Li/Truhar 2008, B./Li/Truhar 2009]

Algorithm 1: Low-rank Sylvester ADI / factored ADI (fADI)

Input:  Matrices defining AX − XB = FG^T and shift parameters α_1, …, α_kmax, β_1, …, β_kmax.
Output: Z, D, Y such that ZDY^H ≈ X.

1  Z_1 = V_1 = (A − β_1 I_n)^(−1) F.
2  Y_1 = W_1 = (B − α_1 I_m)^(−H) G.
3  D_1 = (β_1 − α_1) I_r.
4  for k = 2, …, kmax do
5     V_k = V_(k−1) + (β_k − α_(k−1))(A − β_k I_n)^(−1) V_(k−1).
6     W_k = W_(k−1) + (α_k − β_(k−1))(B − α_k I_m)^(−H) W_(k−1).
7     Update the solution factors:
      Z_k = [Z_(k−1), V_k],  Y_k = [Y_(k−1), W_k],  D_k = diag(D_(k−1), (β_k − α_k) I_r).
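Algorithm 1 translates almost line by line into code (a sketch for real shifts; the symmetric test matrices and the log-spaced heuristic shifts are illustrative assumptions — choosing good shifts is exactly the set-up phase discussed next):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def fadi(A, B, F, G, alphas, betas):
    """Factored ADI for A X - X B = F G^T with real shifts: X ~ Z D Y^T."""
    n, m, r = A.shape[0], B.shape[0], F.shape[1]
    In, Im = np.eye(n), np.eye(m)
    V = np.linalg.solve(A - betas[0] * In, F)
    W = np.linalg.solve((B - alphas[0] * Im).T, G)     # ^H is ^T for real data
    Z, Y = V, W
    d = [betas[0] - alphas[0]] * r
    for k in range(1, len(alphas)):
        V = V + (betas[k] - alphas[k - 1]) * np.linalg.solve(A - betas[k] * In, V)
        W = W + (alphas[k] - betas[k - 1]) * np.linalg.solve((B - alphas[k] * Im).T, W)
        Z, Y = np.hstack([Z, V]), np.hstack([Y, W])
        d += [betas[k] - alphas[k]] * r
    return Z, np.diag(d), Y

rng = np.random.default_rng(3)
n, m, r = 60, 40, 2
Qa, _ = np.linalg.qr(rng.standard_normal((n, n)))
Qb, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = Qa @ np.diag(-np.linspace(1, 10, n)) @ Qa.T     # Lambda(A) in [-10, -1]
B = Qb @ np.diag(np.linspace(1, 10, m)) @ Qb.T      # Lambda(B) in [1, 10]
F, G = rng.standard_normal((n, r)), rng.standard_normal((m, r))

p = np.geomspace(1, 10, 8)                          # heuristic log-spaced shifts
Z, D, Y = fadi(A, B, F, G, alphas=-p, betas=p)
Xref = solve_sylvester(A, -B, F @ G.T)
print(np.linalg.norm(Z @ D @ Y.T - Xref) / np.linalg.norm(Xref))  # small
```

After 8 double steps the factors have only 16 columns, yet the approximation error is already near the level predicted by the ADI error bound for well-separated spectra.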
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Disadvantages of low-rank ADI as of 2012:
1. No efficient stopping criteria:
   Difference of iterates / norm of the columns added per step: not reliable, often stops too late.
   The residual is a full dense matrix and cannot be computed as such.
2. Requires complex arithmetic for real coefficients when complex shifts are used.
3. Expensive (only semi-automatic) set-up phase to precompute the ADI shifts.

None of these disadvantages remains today:
Key observation: the residual is a low-rank matrix, with rank at most that of the right-hand side!
⇒ speed-ups of the new over the old LR-ADI can be up to a factor of 20!
Other Low-rank Lyapunov Solvers. . .
. . . for Lyapunov equation 0 = AX + XAT + BBT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Projection-based methods for Lyapunov equations with A + AT < 0:
1. Compute orthonormal basis range (Z), Z ∈ Rn×r , for subspace Z ⊂ Rn, dimZ = r .
2. Set A := ZTAZ , B := ZTB.
3. Solve small-size Lyapunov equation AX + X AT + BBT = 0.
4. Use X ≈ ZXZT .
Examples:
Krylov subspace methods, i.e., for m = 1:
Z = K(A,B, r) = span{B, AB, A²B, …, A^{r−1}B}
[Saad 1990, Jaimoukha/Kasenally 1994, Jbilou 2002–2008].
Extended Krylov subspace method (EKSM) [Simoncini 2007],
Z = K(A,B, r) ∪ K(A^{−1},B, r).
Rational Krylov subspace methods (RKSM) [Druskin/Simoncini 2011].
©Peter Benner Low-Rank Techniques 20/40
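The four projection steps above can be sketched directly. A minimal dense illustration (the function name `galerkin_lyap` is mine) using a plain block Krylov basis, the simplest of the listed subspace choices, with SciPy's small dense Lyapunov solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def galerkin_lyap(A, B, r):
    """Projection sketch for A X + X A^T + B B^T = 0.

    1. Orthonormal basis Z of the block Krylov space K_r(A, B).
    2. Project: Ahat = Z^T A Z, Bhat = Z^T B.
    3. Solve the small Lyapunov equation.
    4. Lift: X ~ Z Xhat Z^T.
    """
    blocks = [B]
    for _ in range(r - 1):
        blocks.append(A @ blocks[-1])
    Z, _ = np.linalg.qr(np.hstack(blocks))       # step 1: orthonormal basis
    Ahat = Z.T @ A @ Z                           # step 2: projected matrices
    Bhat = Z.T @ B
    # step 3: solve Ahat Xhat + Xhat Ahat^T + Bhat Bhat^T = 0
    Xhat = solve_continuous_lyapunov(Ahat, -Bhat @ Bhat.T)
    return Z, Xhat                               # step 4: X ~ Z @ Xhat @ Z.T
```

In practice the basis is built by (rational/extended) Arnoldi with reorthogonalization rather than explicit powers of A, but the project-solve-lift pattern is the same.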
The New LR-ADI Applied to Lyapunov Equations
Example: an ocean circulation problem [Van Gijzen et al. 1998]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
FEM discretization of a simple 3D ocean circulation model (barotropic, constant depth) ⟹ stiffness matrix −A with n = 42,249; choose artificial constant term B = rand(n,5).
Convergence history:
[Convergence plot: relative residual ‖R‖/‖B^TB‖ (from 10^0 down to 10^{−10}) versus coldim(Z) (0–500), with tolerance line TOL; curves for LR-ADI with adaptive shifts and EKSM.]
CPU times: LR-ADI ≈ 110 sec, EKSM ≈ 135 sec.
©Peter Benner Low-Rank Techniques 21/40
Solving Large-Scale Sylvester and Lyapunov Equations
Summary & Outlook (Matrix Equations). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Numerical enhancements of low-rank ADI for large Sylvester/Lyapunovequations:
1. low-rank residuals, reformulated implementation,
2. computation of real low-rank factors in the presence of complex shifts,
3. self-generating shift strategies (quantification in progress).
For a diffusion-convection-reaction example: 332.02 sec. down to 17.24 sec. ⟹ acceleration by a factor of almost 20.
Generalized version enables derivation of low-rank solvers for variousgeneralized Sylvester equations.Ongoing work:
Apply LR-ADI in Newton methods for algebraic Riccati equations
R(X ) = AX + XAT + GGT − XSSTX = 0,
D(X ) = AXAT − EXET + GGT + ATXF (Ir + FTXF )−1FTXA = 0.
For nonlinear AREs see
P. Benner, P. Kürschner, J. Saak. Low-rank Newton-ADI methods for large nonsymmetric algebraic Riccati equations. J. Franklin Inst., 2015.
©Peter Benner Low-Rank Techniques 22/40
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations
3. From Matrix Equations to PDEs in d DimensionsThe Curse of DimensionalityTensor TechniquesNumerical Examples
©Peter Benner Low-Rank Techniques 23/40
From Matrix Equations to PDEs in d Dimensions
The Curse of Dimensionality [Bellman 1957]
Refining the mesh h → h/2 increases the matrix size of the discretized differential operator by a factor of 2^d.
⟹ rapid increase of dimensionality, called the Curse of Dimensionality (d > 3).
Consider −∆u = f in [ 0 , 1 ]× [ 0 , 1 ] ⊂ R2, uniformly discretized as
(I ⊗ A + A⊗ I ) x =: Ax = b ⇐⇒ AX + XAT = B
with x = vec (X ) and b = vec (B) with low-rank right hand side B ≈ b1bT2 .
Low-rankness of X := VWT ≈ X follows from properties of A and B, e.g.,[Penzl ’00, Grasedyck ’04].
We solve this using low-rank Krylov subspace solvers. These essentially require matrix-vector multiplication and vector computations.
Hence, A vec(X_k) = A vec(V_k W_k^T) = vec([AV_k, V_k] [W_k, AW_k]^T).
The rank of [AV_k, V_k] ∈ R^{n×2r}, [W_k, AW_k] ∈ R^{n_t×2r} increases, but can be controlled using a truncation operator T_ε (as in the sign function solver for Lyapunov equations) ⟹ low-rank Krylov subspace solvers [Kressner/Tobler, B./Breiten, Savostyanov/Dolgov, …].
The ideas correspond to separation of variables in the continuous setting: b(x, y) ≈ b1(x)b2(y).
©Peter Benner Low-Rank Techniques 24/40
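The equivalence between the Kronecker-form system and the matrix equation can be checked numerically. A small sketch (the grid size and the separable right-hand side are chosen for illustration only):

```python
import numpy as np

# 1D Dirichlet Laplacian on a uniform grid; I⊗A + A⊗I is the 2D operator.
n = 8
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
A2 = np.kron(I, A) + np.kron(A, I)

# Separable source f(x, y) = f1(x) f2(y)  ->  rank-1 matrix B = b1 b2^T.
x = np.linspace(h, 1 - h, n)
b1, b2 = np.sin(np.pi * x), x * (1 - x)
Bmat = np.outer(b1, b2)

# Solve the Kronecker system; column-major vec matches kron's convention,
# so the result also solves the Lyapunov-type equation A X + X A^T = B.
X_vec = np.linalg.solve(A2, Bmat.flatten(order="F")).reshape((n, n), order="F")
```

The point of the low-rank methods above is that X can be found in factored form without ever forming the n² × n² matrix A2.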
From Matrix Equations to PDEs in d Dimensions
Ideas extend to d ≥ 3: let A_j ∈ R^{n_j×n_j}, j = 1, …, d, be 1D discretization matrices corresponding to grid points x_j^{(1)}, x_j^{(2)}, …, x_j^{(n_j)}; then, under certain assumptions, the Laplace operator (with homogeneous boundary conditions) can be discretized as

A = A_1 ⊗ I ⊗ ⋯ ⊗ I + I ⊗ A_2 ⊗ I ⊗ ⋯ ⊗ I + ⋯ + I ⊗ ⋯ ⊗ I ⊗ A_d.

Note: if n_j = n ∀ j, then A ∈ R^{n^d × n^d}!
If the source term can be written as f(x_1, x_2, …, x_d) = f_1(x_1) f_2(x_2) ⋯ f_d(x_d), the discretized right-hand side becomes

b = b_1 ⊗ b_2 ⊗ ⋯ ⊗ b_d ∈ R^{n^d} with b_j = [f_j(x_j^{(1)}), …, f_j(x_j^{(n)})]^T.
Now if the solution x of Ax = b
1. could be written as x = x1 ⊗ x2 ⊗ · · · ⊗ xd with xj ∈ Rn,
2. and could be computed without forming A, b, but working only with Aj , bj ,
then we could hope to reduce the storage and computational complexity from n^d to dn!
This requires low-rank tensor techniques.
(Note: for x we will need a more complex representation, as 1. does not hold in general!)
©Peter Benner Low-Rank Techniques 25/40
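The d-dimensional Kronecker-sum construction can be sketched as follows (the helper name `kron_sum` is mine; forming A densely is only for illustration, since the whole point of the tensor methods is never to form it):

```python
import numpy as np
from functools import reduce

def kron_sum(As):
    """A = sum_j I ⊗ ... ⊗ A_j ⊗ ... ⊗ I, the Kronecker-sum structure
    of the d-dimensional Laplace operator built from 1D matrices As[j]."""
    terms = []
    for j, Aj in enumerate(As):
        factors = [np.eye(Ak.shape[0]) for Ak in As]
        factors[j] = Aj                     # A_j sits in slot j, identities elsewhere
        terms.append(reduce(np.kron, factors))
    return sum(terms)

# A separable right-hand side needs only d factors of length n each,
# instead of one vector of length n^d:
bs = [np.arange(1.0, 4.0)] * 3              # b_j, j = 1, ..., 3
b_full = reduce(np.kron, bs)                # full vector of length 3^3 = 27
```

For diagonal A_j the eigenvalues of the Kronecker sum are simply the sums of the 1D eigenvalues, which is a quick sanity check of the construction.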
Tensor Techniques
Separation of Variables and Low-rank Approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Approximate:

x(i_1, …, i_d) [tensor] ≈ Σ_α x_α^{(1)}(i_1) x_α^{(2)}(i_2) ⋯ x_α^{(d)}(i_d) [tensor product decomposition].

Goals:
Store and manipulate x ⟹ O(dn) cost instead of O(n^d).
Solve equations Ax = b ⟹ O(dn²) cost instead of O(n^{2d}).
©Peter Benner Low-Rank Techniques 26/40
Data Compression in 2D: Low-Rank Matrices
Discrete separation of variables:

[x_{1,1} ⋯ x_{1,n}; ⋮ ⋱ ⋮; x_{n,1} ⋯ x_{n,n}] = Σ_{α=1}^{r} [v_{1,α}; ⋮; v_{n,α}] [w_{α,1} ⋯ w_{α,n}] + O(ε).

[Tensor network diagram: a node x with legs i_1, i_2 ≈ nodes v and w connected by a bond index α.]

Rank r ≪ n.
mem(v) + mem(w) = 2nr ≪ n² = mem(x).
Singular Value Decomposition (SVD) ⟹ ε(r) optimal w.r.t. spectral/Frobenius norm.
©Peter Benner Low-Rank Techniques 27/40
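The SVD-based truncation behind this picture can be sketched in a few lines (the helper name `truncate` is mine):

```python
import numpy as np

def truncate(X, eps):
    """Low-rank approximation X ~ V @ W.T with ||X - V W^T||_F <= eps.

    By the Eckart-Young theorem the truncated SVD is the best rank-r
    approximation in the spectral and Frobenius norms.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # errs[r] is the Frobenius error of the rank-r truncation
    # (square root of the sum of the discarded squared singular values).
    errs = np.sqrt(np.concatenate([np.cumsum((s**2)[::-1])[::-1], [0.0]]))
    r = int(np.argmax(errs <= eps))        # smallest rank meeting the tolerance
    V = U[:, :r] * s[:r]                   # absorb singular values into V
    W = Vt[:r].T
    return V, W
```

Storing V and W costs 2nr numbers instead of n² for X, exactly the mem(v) + mem(w) = 2nr ≪ n² count on the slide.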
Data Compression in Higher Dimensions
Tensor Trains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Matrix Product States / Tensor Train (TT) format
[Wilson ’75, White ’93, Verstraete ’04, Oseledets ’09/’11]:

For indices

i_p … i_q = (i_p − 1) n_{p+1} ⋯ n_q + (i_{p+1} − 1) n_{p+2} ⋯ n_q + ⋯ + (i_{q−1} − 1) n_q + i_q,

the TT format can be expressed as

x(i_1 … i_d) = Σ_{α_1,…,α_{d−1}} x_{α_1}^{(1)}(i_1) · x_{α_1,α_2}^{(2)}(i_2) · x_{α_2,α_3}^{(3)}(i_3) ⋯ x_{α_{d−1}}^{(d)}(i_d)

or

x(i_1 … i_d) = x^{(1)}(i_1) ⋯ x^{(d)}(i_d),  x^{(k)}(i_k) ∈ R^{r_{k−1}×r_k},

[Tensor network diagram: chain of nodes x^{(1)}, …, x^{(d)} with physical legs i_1, …, i_d and bond indices α_1, …, α_{d−1}.]
Storage: O(dnr 2).
©Peter Benner Low-Rank Techniques 28/40
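A TT tensor is fully described by its cores, and evaluating a single entry is just a chain of small matrix products. A minimal sketch (the function name `tt_entry` is mine; cores use the layout (r_{k−1}, n_k, r_k) with boundary ranks r_0 = r_d = 1, as above):

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate one entry of a TT tensor: x(i1,...,id) = x1(i1) @ ... @ xd(id).

    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so the
    running product v is always a 1 x r_k row vector (as a matrix).
    """
    v = np.ones((1, 1))
    for core, i in zip(cores, idx):
        v = v @ core[:, i, :]     # multiply by the r_{k-1} x r_k slice for index i
    return v[0, 0]
```

The cores hold O(dnr²) numbers in total, which is the storage count on the slide, yet they encode all n^d entries of the tensor.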
Overloading Tensor Operations
Always work with the factors x^{(k)} ∈ R^{r_{k−1}×n_k×r_k} instead of full tensors.
Sum z = x + y ⟹ tensor rank r_z = r_x + r_y.
TT format for a high-dimensional operator:
A(i_1 … i_d, j_1 … j_d) = A^{(1)}(i_1, j_1) ⋯ A^{(d)}(i_d, j_d).
Matrix-vector multiplication y = Ax ⟹ tensor rank r_y = r_A · r_x.
Additions and multiplications increase TT ranks.
Decrease ranks quasi-optimally via a truncation operator T_ε using SVD (or QR).
Iterative linear solvers (and preconditioners) can be implemented in TT format; we use, e.g., TT-GMRES.
©Peter Benner Low-Rank Techniques 29/40
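The rank growth under addition is concrete: the cores of z = x + y are block-diagonal concatenations of the cores of x and y, so the TT ranks add. A sketch (the function name `tt_add` is mine; cores in the (r_{k−1}, n_k, r_k) layout):

```python
import numpy as np

def tt_add(x_cores, y_cores):
    """Sum of two TT tensors: ranks add, middle cores are stacked
    block-diagonally; first/last cores are concatenated along the rank
    axis since r_0 = r_d = 1."""
    d = len(x_cores)
    z = []
    for k, (xc, yc) in enumerate(zip(x_cores, y_cores)):
        rx0, n, rx1 = xc.shape
        ry0, _, ry1 = yc.shape
        if k == 0:
            zc = np.concatenate([xc, yc], axis=2)     # (1, n, rx1 + ry1)
        elif k == d - 1:
            zc = np.concatenate([xc, yc], axis=0)     # (rx0 + ry0, n, 1)
        else:
            zc = np.zeros((rx0 + ry0, n, rx1 + ry1))  # block-diagonal core
            zc[:rx0, :, :rx1] = xc
            zc[rx0:, :, rx1:] = yc
        z.append(zc)
    return z
```

Repeated sums and operator applications therefore inflate the ranks, which is why the truncation operator T_ε is applied after every such step.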
Numerical Examples
Our work
Apply low-rank iterative solvers in TT format to discrete optimality systems resulting from
PDE-constrained optimization problems under uncertainty,
discretized by the stochastic Galerkin method, i.e., applying Galerkin projection to the weak formulation, using
implicit Euler (discontinuous Galerkin, dG(0)) in time,
classical finite element methods (FEM) in physical space,
generalized polynomial chaos in N-dimensional parameter space (resulting from parameterizing the uncertain parameters).
This naturally yields a (1 + 2(3) + N)-way tensor representation of the solution, which we approximate in the low-rank TT format.
Take home message
Biggest problem solved so far has n = 1.29 · 10^15 unknowns (optimality system for unsteady incompressible Navier-Stokes control problem with uncertain viscosity).
Would require ≈ 10 petabytes (PB) = 10,000 TB to store the solution vector!
Using low-rank tensor techniques, we need ≈ 7 · 10^10 bytes = 70 GB to solve the optimality system in MATLAB in less than one hour!
©Peter Benner Low-Rank Techniques 30/40
Optimal Control of Unsteady Heat Equation
Consider the optimization problem
J(y, u) = (1/2) ‖y − ȳ‖²_{L²(0,T;D)⊗L²(Ω)} + (α/2) ‖std(y)‖²_{L²(0,T;D)} + (β/2) ‖u‖²_{L²(0,T;D)⊗L²(Ω)}

subject, P-almost surely, to

∂y(t, x, ω)/∂t − ∇·(a(x, ω) ∇y(t, x, ω)) = u(t, x, ω), in (0,T] × D × Ω,
y(t, x, ω) = 0, on (0,T] × ∂D × Ω,
y(0, x, ω) = y_0, in D × Ω,

where
for any z : D × Ω → R, z(x, ·) is a random variable defined on the complete probability space (Ω, F, P) for each x ∈ D;
∃ 0 < a_min < a_max < ∞ s.t. P(ω ∈ Ω : a(x, ω) ∈ [a_min, a_max] ∀ x ∈ D) = 1.
©Peter Benner Low-Rank Techniques 31/40
Stochastic Galerkin Finite Element Method
Weak formulation of the random PDE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Seek y ∈ H¹(0,T; H¹₀(D) ⊗ L²(Ω)) such that, P-almost surely,

⟨y_t, v⟩ + B(y, v) = ℓ(u, v) ∀ v ∈ H¹₀(D) ⊗ L²(Ω),

with the coercive¹ bilinear form

B(y, v) := ∫_Ω ∫_D a(x, ω) ∇y(x, ω) · ∇v(x, ω) dx dP(ω),  y, v ∈ H¹₀(D) ⊗ L²(Ω),

and

ℓ(u, v) = ⟨u(x, ω), v(x, ω)⟩ := ∫_Ω ∫_D u(x, ω) v(x, ω) dx dP(ω),  u, v ∈ H¹₀(D) ⊗ L²(Ω).

Coercivity and boundedness of B + Lax-Milgram ⟹ a unique solution exists.
1due to the positivity assumption on a(x, ω)
©Peter Benner Low-Rank Techniques 32/40
Stochastic Galerkin Finite Element Method
Weak formulation of the optimality system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Theorem [Chen/Quarteroni 2014, B./Onwunta/Stoll 2016]
Under appropriate regularity assumptions, there exists a unique adjoint state p andoptimal solution (y , u, p) to the optimal control problem for the random unsteady heatequation, satisfying the stochastic optimality conditions (KKT system) for t ∈ (0,T ]P-almost surely
⟨y_t, v⟩ + B(y, v) = ℓ(u, v), ∀ v ∈ H¹₀(D) ⊗ L²(Ω),
⟨p_t, w⟩ − B*(p, w) = ℓ((y − ȳ) + (α/2) S(y), w), ∀ w ∈ H¹₀(D) ⊗ L²(Ω),
ℓ(βu − p, w̃) = 0, ∀ w̃ ∈ L²(D) ⊗ L²(Ω),

where
S(y) is the Fréchet derivative of ‖std(y)‖²_{L²(0,T;D)};
B* is the adjoint operator of B.
©Peter Benner Low-Rank Techniques 33/40
Stochastic Galerkin Finite Element Method
Discretization of the random PDE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
y , p, u are approximated using standard Galerkin ansatz, yieldingapproximations of the form
z(t, x, ω) = Σ_{k=0}^{P−1} Σ_{j=1}^{J} z_{jk}(t) φ_j(x) ψ_k(ξ) = Σ_{k=0}^{P−1} z_k(t, x) ψ_k(ξ).

Here,
{φ_j}_{j=1}^{J} are linear finite elements;
{ψ_k}_{k=0}^{P−1} are the P = (N+n)!/(N! n!) multivariate Legendre polynomials of degree ≤ n.
Implicit Euler/dG(0) used for temporal discretization w/ constant time step τ .
©Peter Benner Low-Rank Techniques 34/40
The Fully Discretized Optimal Control Problem
Discrete first order optimality conditions/KKT system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[ τM₁    0      −K_t^T ] [ y ]   [ τM_α ȳ ]
[ 0      βτM₂   τN^T   ] [ u ] = [ 0      ]
[ −K_t   τN     0      ] [ p ]   [ d      ]

where
M₁ = D ⊗ G_α ⊗ M =: D ⊗ M_α,  M₂ = D ⊗ G₀ ⊗ M,
K_t = I_{n_t} ⊗ [ Σ_{i=0}^{N} G_i ⊗ K̂_i ] + (C ⊗ G₀ ⊗ M),
N = I_{n_t} ⊗ G₀ ⊗ M,
and
G₀ = diag(⟨ψ₀²⟩, ⟨ψ₁²⟩, …, ⟨ψ²_{P−1}⟩),  G_i(j, k) = ⟨ξ_i ψ_j ψ_k⟩, i = 1, …, N,
G_α = G₀ + α diag(0, ⟨ψ₁²⟩, …, ⟨ψ²_{P−1}⟩) (with first moments ⟨·⟩ w.r.t. P),
K̂₀ = M + τK₀,  K̂_i = τK_i, i = 1, …, N,
M, K_i ∈ R^{J×J} are the mass and stiffness matrices w.r.t. the spatial discretization, where K_i corresponds to the contributions of the i-th KLE term to the stiffness,
C = −diag(ones, −1) ∈ R^{n_t×n_t} (entries −1 on the first subdiagonal), D = diag(1/2, 1, …, 1, 1/2) ∈ R^{n_t×n_t}.
©Peter Benner Low-Rank Techniques 35/40
The Fully Discretized Optimal Control Problem
Discrete first order optimality conditions/KKT system (as above). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Linear system with 3JPnt unknowns!
©Peter Benner Low-Rank Techniques 35/40
Solving the Optimality System
Optimality system leads to the saddle point problem

[ A   B^T ]
[ B   0   ].

Very large scale setting, (block-)structured sparsity ⟹ iterative solution.
Krylov subspace methods for indefinite symmetric systems: MINRES, … .
Requires a good preconditioner.
Famous three-iterations-convergence result [Murphy/Golub/Wathen 2000]: using the ideal preconditioner

P := [ A  0; 0  −S ] with the Schur complement S := −BA^{−1}B^T,

MINRES finds the exact solution in at most three steps.
Motivates the use of a mean-based approximate Schur complement preconditioner

[ Â  0; 0  Ŝ ].

Implemented in TT format ⟹ TT-MINRES.
©Peter Benner Low-Rank Techniques 36/40
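The Murphy/Golub/Wathen result can be observed numerically. A dense stand-in (the blocks below are random placeholders, not the KKT matrices above) using SciPy's MINRES with the ideal block-diagonal preconditioner applied through a LinearOperator:

```python
import numpy as np
from scipy.sparse.linalg import minres, LinearOperator

rng = np.random.default_rng(1)
n, m = 40, 10
E = rng.standard_normal((n, n))
A = E @ E.T + n * np.eye(n)              # SPD (1,1) block
B = rng.standard_normal((m, n))          # full-rank constraint block
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
b = rng.standard_normal(n + m)

# Ideal preconditioner P = diag(A, S) with S = B A^{-1} B^T.
# MINRES needs the action of P^{-1}, passed as a LinearOperator.
S = B @ np.linalg.solve(A, B.T)

def apply_Pinv(r):
    return np.concatenate([np.linalg.solve(A, r[:n]),
                           np.linalg.solve(S, r[n:])])

M = LinearOperator((n + m, n + m), matvec=apply_Pinv)

its = []
x, info = minres(K, b, M=M, callback=lambda xk: its.append(1))
```

The preconditioned matrix has only three distinct eigenvalues, so MINRES converges in (at most) three iterations; in the TT setting the same idea is used with an approximate, mean-based P applied in low-rank arithmetic.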
Numerical Results
Mean-Based Preconditioned TT-MINRES

    TT-MINRES               # iter (t)    # iter (t)    # iter (t)
    n_t                     2^5           2^6           2^8
    dim(A) = 3JPn_t         10,671,360    21,342,720    85,370,880

    α = 1, tol = 10^-3
    β = 10^-5               6 (285.5)     6 (300.0)     8 (372.2)
    β = 10^-6               4 (77.6)      4 (130.9)     4 (126.7)
    β = 10^-8               4 (56.7)      4 (59.4)      4 (64.9)

    α = 0, tol = 10^-3
    β = 10^-5               4 (207.3)     6 (366.5)     6 (229.5)
    β = 10^-6               4 (153.9)     4 (158.3)     4 (172.0)
    β = 10^-8               2 (35.2)      2 (37.8)      2 (40.0)
©Peter Benner Low-Rank Techniques 37/40
Optimal Control 3D Stokes-Brinkman w/ Uncertain Viscosity
[Figures: state and control fields, showing mean and standard deviation]

Full size: n_x n_ξ n_t ≈ 3 · 10^9. Reduction: mem(TT)/mem(full) = 0.002.
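The memory reduction comes from storing the solution in the tensor-train (TT) format rather than as a full array. As an illustrative sketch (not the AMEn/TT-MINRES solver used for the slides' results; `tt_svd`, `tt_full`, and the test tensor are made up for illustration), the TT-SVD algorithm compresses a tensor with low TT ranks by a sweep of truncated SVDs:

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    # Sweep over the dimensions, splitting off one TT core per step
    # via a truncated SVD of the current unfolding (TT-SVD).
    dims = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(s > tol * s[0])))
        cores.append(U[:, :rank].reshape(r, dims[k], rank))
        mat = (s[:rank, None] * Vt[:rank]).reshape(rank * dims[k + 1], -1)
        r = rank
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_full(cores):
    # Contract the TT cores back into the full tensor (for checking only).
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([out.ndim - 1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# A 4-dimensional tensor with exact TT ranks 2: T[i,j,k,l] = sin(i+j+k+l).
T = np.sin(np.indices((10, 10, 10, 10)).sum(axis=0))
cores = tt_svd(T)
print(sum(c.size for c in cores) / T.size)   # memory ratio: 0.012
print(np.max(np.abs(tt_full(cores) - T)))    # reconstruction error ≈ 1e-14
```

For smooth parameter dependence the TT ranks stay small while the full tensor grows exponentially with the dimension, which is the mechanism behind ratios such as mem(TT)/mem(full) = 0.002 above.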
©Peter Benner Low-Rank Techniques 38/40
Conclusions (High-dimensional PDEs)
Low-rank tensor solver for unsteady heat and Navier-Stokes equations with uncertain viscosity.

Similar techniques used for Stokes(-Brinkman) optimal control problems.

Adapted AMEn (TT) solver to saddle point systems.

To consider next:

Navier-Stokes: many parameters coming from uncertain geometry or a Karhunen-Loève expansion of random fields.
Initial results: the more parameters, the more significant the complexity reduction in memory, up to a factor of 10^9 for the control problem for a backward facing step.
Exploit multicore technology (needs efficient parallelization of AMEn).
©Peter Benner Low-Rank Techniques 39/40
References for Optimal Control under Uncertainty
P. Benner, S. Dolgov, A. Onwunta, and M. Stoll. Solving optimal control problems governed by random Navier-Stokes equations using low-rank methods. arXiv preprint arXiv:1703.06097, March 2017.

P. Benner, S. Dolgov, A. Onwunta, and M. Stoll. Low-rank solvers for unsteady Stokes-Brinkman optimal control problem with random data. Computer Methods in Applied Mechanics and Engineering, 304:26–54, 2016.

P. Benner, A. Onwunta, and M. Stoll. Low rank solution of unsteady diffusion equations with stochastic coefficients. SIAM/ASA Journal on Uncertainty Quantification, 3(1):622–649, 2015.

P. Benner, A. Onwunta, and M. Stoll. Block-diagonal preconditioning for optimal control problems constrained by PDEs with uncertain inputs. SIAM Journal on Matrix Analysis and Applications, 37(2):491–518, 2016.

C.E. Powell and D.J. Silvester. Preconditioning steady-state Navier–Stokes equations with random data. SIAM Journal on Scientific Computing, 34(5):A2482–A2506, 2012.

E. Rosseel and G.N. Wells. Optimal control with stochastic PDE constraints and uncertain controls. Computer Methods in Applied Mechanics and Engineering, 213–216:152–167, 2012.

M. Stoll and T. Breiten. A low-rank in time approach to PDE-constrained optimization. SIAM Journal on Scientific Computing, 37(1):B1–B29, 2015.
©Peter Benner Low-Rank Techniques 40/40