Low-rank Techniques:
from Matrix Equations to High-dimensional PDEs
Peter Benner
Joint work with Sergey Dolgov (U Bath),
Boris Khoromskii and Venera Khoromskaia (MPI DCTS and MPI MIS Leipzig), Patrick Kürschner and Jens Saak (MPI DCTS),
Akwum Onwunta (MPI DCTS/U Maryland soon)
Martin Stoll (TU Chemnitz)
17 November 2017
Landscapes in Mathematical Sciences, University of Bath
The Max Planck Institute (MPI) in Magdeburg
The Max Planck Society
operates 84 institutes — 79 in Germany, 2 in Italy, 1 each in the Netherlands, Luxembourg, and the USA;
has ∼ 23,000 employees;
has had 18 Nobel Laureates since 1948.
"The first MPI in engineering…"
MPI Magdeburg
founded 1998
4 departments (directors)
9 research groups
budget ∼ 15 million EUR
∼ 230 employees,
130 scientists,
doing research in
biotechnology, chemical engineering, process engineering, energy conversion, applied math
©Peter Benner Low-Rank Techniques 2/40
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations: Some Basics; The Low-Rank Structure; LR-ADI Method; The New LR-ADI Applied to Lyapunov Equations
3. From Matrix Equations to PDEs in d Dimensions: The Curse of Dimensionality; Tensor Techniques; Numerical Examples
Prelude — 20 years of working in low-rank methods
Model Reduction in Frequency Domain
Approximate the linear control system

ẋ = Ax + Bu,  A ∈ R^(n×n), B ∈ R^(n×m),
y = Cx + Du,  C ∈ R^(q×n), D ∈ R^(q×m),

by a reduced-order system

d/dt x̂ = Âx̂ + B̂u,  Â ∈ R^(r×r), B̂ ∈ R^(r×m),
ŷ = Ĉx̂ + D̂u,  Ĉ ∈ R^(q×r), D̂ ∈ R^(q×m)

of order r ≪ n, such that

‖y − ŷ‖ = ‖Gu − Ĝu‖ ≤ ‖G − Ĝ‖ · ‖u‖ < tolerance · ‖u‖,

where

G(s) = C(sI_n − A)^(−1)B + D,  Ĝ(s) = Ĉ(sI_r − Â)^(−1)B̂ + D̂

are the associated transfer functions of the systems.
Prelude — 20 years of working in low-rank methods
Balanced Truncation [Moore ’81]

System Σ:
ẋ(t) = Ax(t) + Bu(t),
y(t) = Cx(t),  with A stable, i.e., Λ(A) ⊂ C⁻,

is balanced if the system Gramians, i.e., the solutions P, Q of the Lyapunov equations

AP + PA^T + BB^T = 0,  A^T Q + QA + C^T C = 0,

satisfy P = Q = diag(σ1, …, σn) with σ1 ≥ σ2 ≥ … ≥ σn > 0.
σ1, …, σn are the Hankel singular values (HSVs) of Σ.

Compute a balanced realization of Σ via the state-space transformation

T : (A, B, C) ↦ (TAT^(−1), TB, CT^(−1)) = ( [ A11 A12 ; A21 A22 ], [ B1 ; B2 ], [ C1 C2 ] ).

Truncation: (Â, B̂, Ĉ) = (A11, B1, C1).
Prelude — 20 years of working in low-rank methods
Balanced Truncation [Moore ’81]
. . . is one of the greatest model reduction techniques for linear systems, since
the reduced-order model is stable with HSVs σ1, …, σr;
it allows an adaptive choice of r via a computable error bound (there is a free lunch!):

‖y − ŷ‖₂ ≤ ‖G − Ĝ‖_H∞ ‖u‖₂ ≤ ( 2 Σ_(k=r+1)^n σ_k ) ‖u‖₂.
Prelude — 20 years of working in low-rank methods

Balanced Truncation [Moore ’81]

. . . is one of the greatest model reduction techniques for linear systems!

But: as suggested in [Moore ’81], it is an O(n³) method! (Bottleneck: solution of the Lyapunov equations, followed by an SVD of their Cholesky factors to determine the necessary parts of T and T^(−1).)

In 1997, we derived a method to compute low-rank factors S, R ∈ R^(n×ℓ), ℓ ≪ n, such that P ≈ SS^T, Q ≈ RR^T [1], and then showed that the truncation can be obtained via a cheap ℓ_S × ℓ_R SVD of S^T R [2]!

(Some similar ideas: low-rank Lyapunov solvers [Saad ’90, Jaimoukha/Kasenally ’94], small-scale SVD in BT [Penzl ’98, Li/White ’99].)

[1] P. Benner and E. S. Quintana-Ortí. Solving stable generalized Lyapunov equations with the matrix sign function. Numerical Algorithms, 20(1):75–100, 1999. (Preprint SFB393 97-23)
[2] P. Benner, E. S. Quintana-Ortí, and G. Quintana-Ortí. Balanced truncation model reduction of large-scale dense systems on parallel computers. Mathematical and Computer Modeling of Dynamical Systems, 6(4):383–405, 2000.
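In code, square-root balanced truncation with the small SVD of S^T R reads roughly as follows (a minimal dense sketch, not the parallel low-rank implementation of the cited papers; the random test system, the eigendecomposition-based Gramian factors, and the sampled H∞ check are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def square_root_bt(A, B, C, r):
    """Square-root balanced truncation of (A, B, C); dense and illustrative."""
    P = solve_continuous_lyapunov(A, -B @ B.T)        # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)      # observability Gramian
    def sqrt_factor(M):                               # M = S S^T via eigh (clipped)
        w, V = np.linalg.eigh((M + M.T) / 2)
        return V * np.sqrt(np.clip(w, 0, None))
    S, R = sqrt_factor(P), sqrt_factor(Q)
    U, hsv, Vt = np.linalg.svd(S.T @ R)               # Hankel singular values
    Tr = S @ U[:, :r] / np.sqrt(hsv[:r])              # right projection matrix
    W = R @ Vt[:r, :].T / np.sqrt(hsv[:r])            # left projection, W^T Tr = I
    return W.T @ A @ Tr, W.T @ B, C @ Tr, hsv

rng = np.random.default_rng(1)
n, r = 40, 6
A0 = rng.standard_normal((n, n))
A = A0 - (0.5 + np.real(np.linalg.eigvals(A0)).max()) * np.eye(n)  # make A stable
B, C = rng.standard_normal((n, 1)), rng.standard_normal((1, n))
Ar, Br, Cr, hsv = square_root_bt(A, B, C, r)

# sampled transfer-function error vs. the computable bound 2 * sum(tail HSVs)
freqs = np.logspace(-2, 3, 200)
err = max(abs((C @ np.linalg.solve(1j * w * np.eye(n) - A, B)
               - Cr @ np.linalg.solve(1j * w * np.eye(r) - Ar, Br))[0, 0])
          for w in freqs)
print(err, 2 * hsv[r:].sum())
```

The printed pair illustrates the free lunch: the sampled H∞ error stays below 2(σ_(r+1) + … + σ_n).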
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations: Some Basics; The Low-Rank Structure; LR-ADI Method; The New LR-ADI Applied to Lyapunov Equations
3. From Matrix Equations to PDEs in d Dimensions
Solving Large-Scale Sylvester and Lyapunov Equations
Sylvester equation
James Joseph Sylvester (September 3, 1814 – March 15, 1897)

AX + XB = C.

Lyapunov equation

Alexander Michailowitsch Ljapunow (June 6, 1857 – November 3, 1918)

AX + XA^T = C,  C = C^T.
Solving Large-Scale Sylvester and Lyapunov Equations
Generalizations of Sylvester (AX + XB = C) and Lyapunov (AX + XA^T = C) Equations

Generalized Sylvester equation:
AXD + EXB = C.

Generalized Lyapunov equation:
AXE^T + EXA^T = C,  C = C^T.

Stein equation:
X − AXB = C.

(Generalized) discrete Lyapunov/Stein equation:
EXE^T − AXA^T = C,  C = C^T.

Note:
We consider only regular cases, which have a unique solution!
Solutions of symmetric cases are symmetric, X = X^T ∈ R^(n×n); otherwise, X ∈ R^(n×ℓ) with n ≠ ℓ in general.
Applications
Gramian-based model order reduction
Linear stability analysis
Continuation algorithms for detecting Hopf bifurcations
Determining metastable equilibrium points of stochastic differential equations
Block-triangularization of matrices and matrix pencils
Computing covariance matrices, e.g., in biochemical reaction networks
Numerical solution of fractional partial differential equations
Solving algebraic Riccati equations via Newton’s method
. . .
Existence of Solutions of Linear Matrix Equations

Exemplarily, consider the generalized Sylvester equation

AXD + EXB = C.  (1)

Vectorization (using the Kronecker product) yields a representation as a linear system:

( D^T ⊗ A + B^T ⊗ E ) vec(X) = vec(C)  ⇐⇒  𝒜 x = c,

with 𝒜 := D^T ⊗ A + B^T ⊗ E, x := vec(X), c := vec(C).

⇒ "(1) has a unique solution ⇐⇒ 𝒜 is nonsingular."

Lemma
Λ(𝒜) = { α_j + β_k | α_j ∈ Λ(A,E), β_k ∈ Λ(B,D) }.
Hence, (1) has a unique solution ⇐⇒ Λ(A,E) ∩ −Λ(B,D) = ∅.

Example: the Lyapunov equation AX + XA^T = C has a unique solution ⇐⇒ there is no µ ∈ C with ±µ ∈ Λ(A).
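The vectorized form can be checked directly on a small instance (a sketch; the sizes and random matrices are illustrative, and a random pencil satisfies the eigenvalue condition with probability one). Note that vec(·) stacks columns, hence `order="F"`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 5
A, E = rng.standard_normal((n, n)), rng.standard_normal((n, n))
D, B = rng.standard_normal((m, m)), rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# vec(A X D) = (D^T kron A) vec(X) with column-stacking vec
K = np.kron(D.T, A) + np.kron(B.T, E)
x = np.linalg.solve(K, C.flatten(order="F"))   # generically, K is nonsingular
X = x.reshape((n, m), order="F")

print(np.linalg.norm(A @ X @ D + E @ X @ B - C))  # ≈ 0
```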
Solving Large-Scale Sylvester and Lyapunov Equations

The Low-Rank Structure

Sylvester Equations

Find X ∈ R^(n×m) solving

AX − XB = FG^T,

where A ∈ R^(n×n), B ∈ R^(m×m), F ∈ R^(n×r), G ∈ R^(m×r).

If n, m are large but r ≪ n, m, then X has a small numerical rank: rank(X, τ) = f ≪ min(n, m). [Penzl 1999, Grasedyck 2004, Antoulas/Sorensen/Zhou 2002]

[Figure: singular values σ(X) of a 1600 × 900 example on a log scale; σ₁₅₅ ≈ u, the machine precision]

Compute low-rank solution factors Z ∈ R^(n×f), Y ∈ R^(m×f), D ∈ R^(f×f), such that X ≈ ZDY^T with f ≪ min(n, m).
Solving Large-Scale Sylvester and Lyapunov Equations

The Low-Rank Structure

Lyapunov Equations

Find X ∈ R^(n×n) solving

AX + XA^T = −FF^T,

where A ∈ R^(n×n), F ∈ R^(n×r).

If n is large but r ≪ n, then X has a small numerical rank: rank(X, τ) = f ≪ n. [Penzl 1999, Grasedyck 2004, Antoulas/Sorensen/Zhou 2002]

[Figure: singular values σ(X) of a 1600 × 900 example on a log scale; σ₁₅₅ ≈ u, the machine precision]

Compute low-rank solution factors Z ∈ R^(n×f), D ∈ R^(f×f), such that X ≈ ZDZ^T with f ≪ n.
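The singular value decay is easy to reproduce (a sketch; the shifted-Laplacian test matrix, the sizes, and the rank-1 right-hand side are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n = 100
# stable symmetric A: shifted 1D Laplacian, spectrum roughly in [-11, -1]
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L
F = np.ones((n, 1))                            # rank-1 right-hand side

X = solve_continuous_lyapunov(A, -F @ F.T)     # A X + X A^T + F F^T = 0
s = np.linalg.svd(X, compute_uv=False)
print(s[15] / s[0])    # rapid decay: tiny after a handful of singular values
```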
Solving Large-Scale Sylvester and Lyapunov Equations

Theoretical Foundation of the Low-rank Structure

Lyapunov case: [Grasedyck ’04]

AX + XA^T + BB^T = 0  ⇐⇒  𝒜 vec(X) = − vec(BB^T),  𝒜 := I_n ⊗ A + A ⊗ I_n.

For stable M, i.e., Λ(M) ⊂ C⁻, apply

M^(−1) = − ∫₀^∞ exp(tM) dt

to 𝒜 and approximate the integral via (sinc) quadrature ⇒

𝒜^(−1) ≈ − Σ_(i=−k)^k ω_i exp(t_i 𝒜),

with error ∼ exp(−√k) (exp(−k) if A = A^T). Then an approximate Lyapunov solution is given by

vec(X) ≈ vec(X_k) = Σ_(i=−k)^k ω_i exp(t_i 𝒜) vec(BB^T).

Now observe that

exp(t_i 𝒜) = exp( t_i (I_n ⊗ A + A ⊗ I_n) ) ≡ exp(t_i A) ⊗ exp(t_i A).

Hence,

vec(X_k) = Σ_(i=−k)^k ω_i ( exp(t_i A) ⊗ exp(t_i A) ) vec(BB^T)

⇒ X_k = Σ_(i=−k)^k ω_i exp(t_i A) BB^T exp(t_i A^T) ≡ Σ_(i=−k)^k ω_i B_i B_i^T,

so that rank(X_k) ≤ (2k + 1)m with

‖X − X_k‖₂ ≲ exp(−√k)  ( exp(−k) for A = A^T )!

Extended to AX + XA^T + Σ_(j=1)^m N_j X N_j^T + BB^T = 0. [B./Breiten 2012]
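The genuine sinc rule requires carefully chosen nodes t_i and weights ω_i; as a hedged stand-in, the same low-rank effect appears with a plain trapezoidal rule applied after the substitution t = e^s, i.e., X = ∫ exp(e^s A) BB^T exp(e^s A^T) e^s ds (the interval, step size, and test matrix below are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L                  # stable symmetric, spectrum ~ [-11, -1]
B = np.ones((n, 1))

Xref = solve_continuous_lyapunov(A, -B @ B.T)

# trapezoidal rule in s (t = e^s) for X = int_0^inf e^{tA} B B^T e^{tA^T} dt;
# each node contributes one column, so X_k = Z Z^T is low-rank by construction
s_nodes = np.linspace(-10, 5, 76)
h = s_nodes[1] - s_nodes[0]
cols = [np.sqrt(h * np.exp(s)) * (expm(np.exp(s) * A) @ B) for s in s_nodes]
Z = np.hstack(cols)
Xk = Z @ Z.T

print(np.linalg.norm(Xk - Xref) / np.linalg.norm(Xref))  # small
```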
Exploiting the Low-Rank Structure I

For simplicity, consider again the Lyapunov equation AX + XA^T + BB^T = 0.

Definition
For M ∈ R^(n×n) with no purely imaginary eigenvalues, write M = T [ J⁻ 0 ; 0 J⁺ ] T^(−1), with J∓ containing all Jordan blocks of M corresponding to eigenvalues with negative/positive real parts. The matrix sign function of M is

sign(M) = T [ −I 0 ; 0 I ] T^(−1).

Observations
1. sign( [ A BB^T ; 0 −A^T ] ) = [ −I 2X ; 0 I ].
2. sign(M) = lim_(k→∞) M_k with M_(k+1) = ½ (M_k + M_k^(−1)), M_0 = M.

Sign function iteration for solving Lyapunov equations
Applying the Newton iteration from Observation 2 to M_0 = [ A BB^T ; 0 −A^T ], together with the inversion formula for block-triangular matrices, yields

A_(k+1) ← ½ (A_k + A_k^(−1)),
B_(k+1) B_(k+1)^T ← ½ ( B_k B_k^T + A_k^(−1) B_k (A_k^(−1) B_k)^T ),

so that, by Observation 1, B_k B_k^T → 2X, i.e., X = ZZ^T with Z = (1/√2) lim_(k→∞) B_k.

Factored sign function iteration for Lyapunov equations [B./Quintana-Ortí 1997/99]

A_(k+1) ← ½ (A_k + A_k^(−1)),
B_(k+1) ← (1/√2) [ B_k, A_k^(−1) B_k ].

Problem: the number of columns in B_k doubles in each iteration!

Cure: a truncation operator,

B_(k+1) ← T_ε( (1/√2) [ B_k, A_k^(−1) B_k ] ),

with, e.g., T_ε returning the scaled left singular vectors of the truncated SVD w.r.t. the numerical rank tolerance ε.
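A minimal sketch of the factored iteration including the truncation cure (the stable test matrix, tolerance, and iteration limit are illustrative assumptions; the result is checked against a dense Lyapunov solve):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sign_lyap(A, B, tol=1e-10, maxit=25):
    """Factored sign iteration with truncation for A X + X A^T + B B^T = 0."""
    Ak, Bk = A.copy(), B.copy()
    for _ in range(maxit):
        Ainv = np.linalg.inv(Ak)
        # B-update followed by the truncation operator T_eps (SVD compression)
        U, s, _ = np.linalg.svd(np.hstack([Bk, Ainv @ Bk]) / np.sqrt(2),
                                full_matrices=False)
        keep = s > tol * s[0]
        Bk = U[:, keep] * s[keep]
        Anew = 0.5 * (Ak + Ainv)
        converged = np.linalg.norm(Anew - Ak, 1) <= tol * np.linalg.norm(Ak, 1)
        Ak = Anew
        if converged:
            break
    return Bk / np.sqrt(2)            # Z with X = Z Z^T (since B_k B_k^T -> 2X)

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = -np.eye(n) - 2.5 * L              # stable symmetric test matrix
B = np.ones((n, 1))
Z = sign_lyap(A, B)
Xref = solve_continuous_lyapunov(A, -B @ B.T)
print(Z.shape[1], np.linalg.norm(Z @ Z.T - Xref) / np.linalg.norm(Xref))
```

Without the truncation, Z would carry 2^25 columns here; with it, the column count settles at the numerical rank of X.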
Exploiting the Low-Rank Structure II

Exploiting the Low-rank Structure for Large and Sparse Coefficients

The Sylvester equation AX − XB = FG^T is equivalent to the linear system of equations

( I_m ⊗ A − B^T ⊗ I_n ) vec(X) = vec(FG^T).

This cannot be used for numerical solution unless nm ≤ 1,000 (or so), since
it requires O(n²m²) storage;
a direct solver needs O(n³m³) flops;
the low (tensor-)rank of the right-hand side is ignored;
in the Lyapunov case, symmetry and possible definiteness are not respected.

Possible solvers:
Standard Krylov subspace solvers in operator form [Hochbruck, Starke, Reichel, …]
Block-/tensor-Krylov subspace methods with truncation [Kressner/Tobler, Bollhöfer/Eppler, B./Breiten, …]
Galerkin-type methods based on (extended, rational) Krylov subspace methods [Jaimoukha, Kasenally, Jbilou, Simoncini, Druskin, Knizhnerman, …]
Doubling-type methods [Smith, Chu et al., B./Sadkane/El Khoury, …]
ADI methods [Wachspress, Reichel, Li, Penzl, B., Saak, Kürschner, Opmeer/Reis, …]
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Sylvester and Stein equations

Let α_k ≠ β_k with α_k ∉ Λ(B), β_k ∉ Λ(A). Then

AX − XB = FG^T  (Sylvester equation)
⇔  X = A_k X B_k + (β_k − α_k) F_k G_k^H  (Stein equation)

with the Cayley-like (Möbius) transformations

A_k := (A − β_k I_n)^(−1)(A − α_k I_n),  B_k := (B − α_k I_m)^(−1)(B − β_k I_m),
F_k := (A − β_k I_n)^(−1) F,  G_k := (B − α_k I_m)^(−H) G.

Alternating directions implicit (ADI) iteration:

X_k = A_k X_(k−1) B_k + (β_k − α_k) F_k G_k^H,  for k ≥ 1, X_0 ∈ R^(n×m).  [Wachspress 1988]
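The Sylvester-to-Stein equivalence can be verified numerically for one shift pair (a sketch; the stable/anti-stable random test matrices and the shifts α = −1, β = 1 are illustrative choices satisfying α ∉ Λ(B), β ∉ Λ(A)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
n, m, r = 30, 20, 2
A0, B0 = rng.standard_normal((n, n)), rng.standard_normal((m, m))
A = A0 - (1 + np.linalg.norm(A0, 2)) * np.eye(n)   # Lambda(A) in left half-plane
B = B0 + (1 + np.linalg.norm(B0, 2)) * np.eye(m)   # Lambda(B) in right half-plane
F, G = rng.standard_normal((n, r)), rng.standard_normal((m, r))
alpha, beta = -1.0, 1.0

X = solve_sylvester(A, -B, F @ G.T)                # A X - X B = F G^T

# Cayley-like transformations (real data, so ^H reduces to ^T)
Ak = np.linalg.solve(A - beta * np.eye(n), A - alpha * np.eye(n))
Bk = np.linalg.solve(B - alpha * np.eye(m), B - beta * np.eye(m))
Fk = np.linalg.solve(A - beta * np.eye(n), F)
Gk = np.linalg.solve((B - alpha * np.eye(m)).T, G)

res = np.linalg.norm(X - (Ak @ X @ Bk + (beta - alpha) * Fk @ Gk.T))
print(res / np.linalg.norm(X))  # ≈ 0: X also solves the Stein equation
```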
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Sylvester ADI iteration [Wachspress 1988]

X_k = A_k X_(k−1) B_k + (β_k − α_k) F_k G_k^H,
A_k := (A − β_k I_n)^(−1)(A − α_k I_n),  B_k := (B − α_k I_m)^(−1)(B − β_k I_m),
F_k := (A − β_k I_n)^(−1) F ∈ C^(n×r),  G_k := (B − α_k I_m)^(−H) G ∈ C^(m×r).

Now set X_0 = 0 and find a factorization X_k = Z_k D_k Y_k^H:

X_1 = (β_1 − α_1)(A − β_1 I_n)^(−1) F G^T (B − α_1 I_m)^(−1)
⇒ V_1 := Z_1 = (A − β_1 I_n)^(−1) F ∈ C^(n×r),
  D_1 = (β_1 − α_1) I_r ∈ C^(r×r),
  W_1 := Y_1 = (B − α_1 I_m)^(−H) G ∈ C^(m×r).

X_2 = A_2 X_1 B_2 + (β_2 − α_2) F_2 G_2^H
⇒ V_2 = V_1 + (β_2 − α_1)(A − β_2 I_n)^(−1) V_1 ∈ C^(n×r),
  W_2 = W_1 + (α_2 − β_1)(B − α_2 I_m)^(−H) W_1 ∈ C^(m×r),
  Z_2 = [Z_1, V_2],  D_2 = diag(D_1, (β_2 − α_2) I_r),  Y_2 = [Y_1, W_2].
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Algorithm [B. 2005, Li/Truhar 2008, B./Li/Truhar 2009]

Algorithm 1: Low-rank Sylvester ADI / factored ADI (fADI)

Input:  Matrices defining AX − XB = FG^T and shift parameters α_1, …, α_kmax, β_1, …, β_kmax.
Output: Z, D, Y such that ZDY^H ≈ X.

1  Z_1 = V_1 = (A − β_1 I_n)^(−1) F.
2  Y_1 = W_1 = (B − α_1 I_m)^(−H) G.
3  D_1 = (β_1 − α_1) I_r.
4  for k = 2, …, kmax do
5     V_k = V_(k−1) + (β_k − α_(k−1))(A − β_k I_n)^(−1) V_(k−1).
6     W_k = W_(k−1) + (α_k − β_(k−1))(B − α_k I_m)^(−H) W_(k−1).
7     Update the solution factors:
      Z_k = [Z_(k−1), V_k],  Y_k = [Y_(k−1), W_k],  D_k = diag(D_(k−1), (β_k − α_k) I_r).
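Algorithm 1 translates almost line by line into code (a sketch for real shifts; the symmetric test matrices and the log-spaced heuristic shifts are illustrative assumptions — choosing good shifts is exactly the set-up phase discussed next):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def fadi(A, B, F, G, alphas, betas):
    """Factored ADI for A X - X B = F G^T with real shifts: X ~ Z D Y^T."""
    n, m, r = A.shape[0], B.shape[0], F.shape[1]
    In, Im = np.eye(n), np.eye(m)
    V = np.linalg.solve(A - betas[0] * In, F)
    W = np.linalg.solve((B - alphas[0] * Im).T, G)     # ^H is ^T for real data
    Z, Y = V, W
    d = [betas[0] - alphas[0]] * r
    for k in range(1, len(alphas)):
        V = V + (betas[k] - alphas[k - 1]) * np.linalg.solve(A - betas[k] * In, V)
        W = W + (alphas[k] - betas[k - 1]) * np.linalg.solve((B - alphas[k] * Im).T, W)
        Z, Y = np.hstack([Z, V]), np.hstack([Y, W])
        d += [betas[k] - alphas[k]] * r
    return Z, np.diag(d), Y

rng = np.random.default_rng(3)
n, m, r = 60, 40, 2
Qa, _ = np.linalg.qr(rng.standard_normal((n, n)))
Qb, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = Qa @ np.diag(-np.linspace(1, 10, n)) @ Qa.T     # Lambda(A) in [-10, -1]
B = Qb @ np.diag(np.linspace(1, 10, m)) @ Qb.T      # Lambda(B) in [1, 10]
F, G = rng.standard_normal((n, r)), rng.standard_normal((m, r))

p = np.geomspace(1, 10, 8)                          # heuristic log-spaced shifts
Z, D, Y = fadi(A, B, F, G, alphas=-p, betas=p)
Xref = solve_sylvester(A, -B, F @ G.T)
print(np.linalg.norm(Z @ D @ Y.T - Xref) / np.linalg.norm(Xref))  # small
```

After 8 double steps the factors have only 16 columns, yet the approximation error is already near the level predicted by the ADI error bound for well-separated spectra.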
Solving Large-Scale Sylvester and Lyapunov Equations

LR-ADI Method

Disadvantages of low-rank ADI as of 2012:
1. No efficient stopping criteria:
   Difference of iterates / norm of the columns added per step: not reliable, often stops too late.
   The residual is a full dense matrix and cannot be computed as such.
2. Requires complex arithmetic for real coefficients when complex shifts are used.
3. Expensive (only semi-automatic) set-up phase to precompute the ADI shifts.

None of these disadvantages remains today:
Key observation: the residual is a low-rank matrix, with rank at most that of the right-hand side!
⇒ speed-ups of the new over the old LR-ADI can be up to a factor of 20!
Other Low-rank Lyapunov Solvers. . .
. . . for Lyapunov equation 0 = AX + XAT + BBT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Projection-based methods for Lyapunov equations with A + AT < 0:
1. Compute orthonormal basis range (Z), Z ∈ Rn×r , for subspace Z ⊂ Rn, dimZ = r .
2. Set A := ZTAZ , B := ZTB.
3. Solve small-size Lyapunov equation AX + X AT + BBT = 0.
4. Use X ≈ ZXZT .
Examples:
Krylov subspace methods, i.e., for m = 1:
Z = K(A,B, r) = span{B, AB, A²B, …, A^{r−1}B}
[Saad 1990, Jaimoukha/Kasenally 1994, Jbilou 2002–2008].
Extended Krylov subspace method (EKSM) [Simoncini 2007],
Z = K(A,B, r) ∪ K(A^{−1},B, r).
Rational Krylov subspace methods (RKSM) [Druskin/Simoncini 2011].
©Peter Benner Low-Rank Techniques 20/40
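The four projection steps above can be sketched directly. A minimal dense illustration (the function name `galerkin_lyap` is mine) using a plain block Krylov basis, the simplest of the listed subspace choices, with SciPy's small dense Lyapunov solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def galerkin_lyap(A, B, r):
    """Projection sketch for A X + X A^T + B B^T = 0.

    1. Orthonormal basis Z of the block Krylov space K_r(A, B).
    2. Project: Ahat = Z^T A Z, Bhat = Z^T B.
    3. Solve the small Lyapunov equation.
    4. Lift: X ~ Z Xhat Z^T.
    """
    blocks = [B]
    for _ in range(r - 1):
        blocks.append(A @ blocks[-1])
    Z, _ = np.linalg.qr(np.hstack(blocks))       # step 1: orthonormal basis
    Ahat = Z.T @ A @ Z                           # step 2: projected matrices
    Bhat = Z.T @ B
    # step 3: solve Ahat Xhat + Xhat Ahat^T + Bhat Bhat^T = 0
    Xhat = solve_continuous_lyapunov(Ahat, -Bhat @ Bhat.T)
    return Z, Xhat                               # step 4: X ~ Z @ Xhat @ Z.T
```

In practice the basis is built by (rational/extended) Arnoldi with reorthogonalization rather than explicit powers of A, but the project-solve-lift pattern is the same.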
The New LR-ADI Applied to Lyapunov Equations
Example: an ocean circulation problem [Van Gijzen et al. 1998]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
FEM discretization of a simple 3D ocean circulation model (barotropic, constant depth) ⟹ stiffness matrix −A with n = 42,249; choose artificial constant term B = rand(n,5).
Convergence history:
[Convergence plot: relative residual ‖R‖/‖B^TB‖ (from 10^0 down to 10^{−10}) versus coldim(Z) (0–500), with tolerance line TOL; curves for LR-ADI with adaptive shifts and EKSM.]
CPU times: LR-ADI ≈ 110 sec, EKSM ≈ 135 sec.
©Peter Benner Low-Rank Techniques 21/40
Solving Large-Scale Sylvester and Lyapunov Equations
Summary & Outlook (Matrix Equations). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Numerical enhancements of low-rank ADI for large Sylvester/Lyapunovequations:
1. low-rank residuals, reformulated implementation,
2. computation of real low-rank factors in the presence of complex shifts,
3. self-generating shift strategies (quantification in progress).
For a diffusion-convection-reaction example: 332.02 sec. down to 17.24 sec. ⟹ acceleration by a factor of almost 20.
Generalized version enables derivation of low-rank solvers for variousgeneralized Sylvester equations.Ongoing work:
Apply LR-ADI in Newton methods for algebraic Riccati equations
R(X ) = AX + XAT + GGT − XSSTX = 0,
D(X ) = AXAT − EXET + GGT + ATXF (Ir + FTXF )−1FTXA = 0.
For nonlinear AREs see
P. Benner, P. Kürschner, J. Saak. Low-rank Newton-ADI methods for large nonsymmetric algebraic Riccati equations. J. Franklin Inst., 2015.
©Peter Benner Low-Rank Techniques 22/40
Outline
1. Motivation
2. Solving Large-Scale Sylvester and Lyapunov Equations
3. From Matrix Equations to PDEs in d DimensionsThe Curse of DimensionalityTensor TechniquesNumerical Examples
©Peter Benner Low-Rank Techniques 23/40
From Matrix Equations to PDEs in d Dimensions
The Curse of Dimensionality [Bellman 1957]
Refining the mesh h → h/2 increases the matrix size of the discretized differential operator by a factor of 2^d.
⟹ rapid increase of dimensionality, called the Curse of Dimensionality (d > 3).
Consider −∆u = f in [ 0 , 1 ]× [ 0 , 1 ] ⊂ R2, uniformly discretized as
(I ⊗ A + A⊗ I ) x =: Ax = b ⇐⇒ AX + XAT = B
with x = vec (X ) and b = vec (B) with low-rank right hand side B ≈ b1bT2 .
Low-rankness of X := VWT ≈ X follows from properties of A and B, e.g.,[Penzl ’00, Grasedyck ’04].
We solve this using low-rank Krylov subspace solvers. These essentially require matrix-vector multiplication and vector computations.
Hence, A vec(X_k) = A vec(V_k W_k^T) = vec([AV_k, V_k] [W_k, AW_k]^T).
The rank of [AV_k, V_k] ∈ R^{n×2r}, [W_k, AW_k] ∈ R^{n_t×2r} increases, but can be controlled using a truncation operator T_ε (as in the sign function solver for Lyapunov equations) ⟹ low-rank Krylov subspace solvers [Kressner/Tobler, B./Breiten, Savostyanov/Dolgov, …].
The ideas correspond to separation of variables in the continuous setting: b(x, y) ≈ b1(x)b2(y).
©Peter Benner Low-Rank Techniques 24/40
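The equivalence between the Kronecker-form system and the matrix equation can be checked numerically. A small sketch (the grid size and the separable right-hand side are chosen for illustration only):

```python
import numpy as np

# 1D Dirichlet Laplacian on a uniform grid; I⊗A + A⊗I is the 2D operator.
n = 8
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
A2 = np.kron(I, A) + np.kron(A, I)

# Separable source f(x, y) = f1(x) f2(y)  ->  rank-1 matrix B = b1 b2^T.
x = np.linspace(h, 1 - h, n)
b1, b2 = np.sin(np.pi * x), x * (1 - x)
Bmat = np.outer(b1, b2)

# Solve the Kronecker system; column-major vec matches kron's convention,
# so the result also solves the Lyapunov-type equation A X + X A^T = B.
X_vec = np.linalg.solve(A2, Bmat.flatten(order="F")).reshape((n, n), order="F")
```

The point of the low-rank methods above is that X can be found in factored form without ever forming the n² × n² matrix A2.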
From Matrix Equations to PDEs in d Dimensions
Ideas extend to d ≥ 3: let A_j ∈ R^{n_j×n_j}, j = 1, …, d, be 1D discretization matrices corresponding to grid points x_j^{(1)}, x_j^{(2)}, …, x_j^{(n_j)}; then, under certain assumptions, the Laplace operator (with homogeneous boundary conditions) can be discretized as

A = A_1 ⊗ I ⊗ ⋯ ⊗ I + I ⊗ A_2 ⊗ I ⊗ ⋯ ⊗ I + ⋯ + I ⊗ ⋯ ⊗ I ⊗ A_d.

Note: if n_j = n ∀ j, then A ∈ R^{n^d × n^d}!
If the source term can be written as f(x_1, x_2, …, x_d) = f_1(x_1) f_2(x_2) ⋯ f_d(x_d), the discretized right-hand side becomes

b = b_1 ⊗ b_2 ⊗ ⋯ ⊗ b_d ∈ R^{n^d} with b_j = [f_j(x_j^{(1)}), …, f_j(x_j^{(n)})]^T.
Now if the solution x of Ax = b
1. could be written as x = x1 ⊗ x2 ⊗ · · · ⊗ xd with xj ∈ Rn,
2. and could be computed without forming A, b, but working only with Aj , bj ,
then we could hope to reduce the storage and computational complexity from n^d to dn!
This requires low-rank tensor techniques.
(Note: for x we will need a more complex representation, as 1. does not hold in general!)
©Peter Benner Low-Rank Techniques 25/40
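The d-dimensional Kronecker-sum construction can be sketched as follows (the helper name `kron_sum` is mine; forming A densely is only for illustration, since the whole point of the tensor methods is never to form it):

```python
import numpy as np
from functools import reduce

def kron_sum(As):
    """A = sum_j I ⊗ ... ⊗ A_j ⊗ ... ⊗ I, the Kronecker-sum structure
    of the d-dimensional Laplace operator built from 1D matrices As[j]."""
    terms = []
    for j, Aj in enumerate(As):
        factors = [np.eye(Ak.shape[0]) for Ak in As]
        factors[j] = Aj                     # A_j sits in slot j, identities elsewhere
        terms.append(reduce(np.kron, factors))
    return sum(terms)

# A separable right-hand side needs only d factors of length n each,
# instead of one vector of length n^d:
bs = [np.arange(1.0, 4.0)] * 3              # b_j, j = 1, ..., 3
b_full = reduce(np.kron, bs)                # full vector of length 3^3 = 27
```

For diagonal A_j the eigenvalues of the Kronecker sum are simply the sums of the 1D eigenvalues, which is a quick sanity check of the construction.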
Tensor Techniques
Separation of Variables and Low-rank Approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Approximate:

x(i_1, …, i_d) [tensor] ≈ Σ_α x_α^{(1)}(i_1) x_α^{(2)}(i_2) ⋯ x_α^{(d)}(i_d) [tensor product decomposition].

Goals:
Store and manipulate x ⟹ O(dn) cost instead of O(n^d).
Solve equations Ax = b ⟹ O(dn²) cost instead of O(n^{2d}).
©Peter Benner Low-Rank Techniques 26/40
Data Compression in 2D: Low-Rank Matrices
Discrete separation of variables:

[x_{1,1} ⋯ x_{1,n}; ⋮ ⋱ ⋮; x_{n,1} ⋯ x_{n,n}] = Σ_{α=1}^{r} [v_{1,α}; ⋮; v_{n,α}] [w_{α,1} ⋯ w_{α,n}] + O(ε).

[Tensor network diagram: a node x with legs i_1, i_2 ≈ nodes v and w connected by a bond index α.]

Rank r ≪ n.
mem(v) + mem(w) = 2nr ≪ n² = mem(x).
Singular Value Decomposition (SVD) ⟹ ε(r) optimal w.r.t. spectral/Frobenius norm.
©Peter Benner Low-Rank Techniques 27/40
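The SVD-based truncation behind this picture can be sketched in a few lines (the helper name `truncate` is mine):

```python
import numpy as np

def truncate(X, eps):
    """Low-rank approximation X ~ V @ W.T with ||X - V W^T||_F <= eps.

    By the Eckart-Young theorem the truncated SVD is the best rank-r
    approximation in the spectral and Frobenius norms.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # errs[r] is the Frobenius error of the rank-r truncation
    # (square root of the sum of the discarded squared singular values).
    errs = np.sqrt(np.concatenate([np.cumsum((s**2)[::-1])[::-1], [0.0]]))
    r = int(np.argmax(errs <= eps))        # smallest rank meeting the tolerance
    V = U[:, :r] * s[:r]                   # absorb singular values into V
    W = Vt[:r].T
    return V, W
```

Storing V and W costs 2nr numbers instead of n² for X, exactly the mem(v) + mem(w) = 2nr ≪ n² count on the slide.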
Data Compression in Higher Dimensions
Tensor Trains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Matrix Product States / Tensor Train (TT) format
[Wilson ’75, White ’93, Verstraete ’04, Oseledets ’09/’11]:

For indices

i_p … i_q = (i_p − 1) n_{p+1} ⋯ n_q + (i_{p+1} − 1) n_{p+2} ⋯ n_q + ⋯ + (i_{q−1} − 1) n_q + i_q,

the TT format can be expressed as

x(i_1 … i_d) = Σ_{α_1,…,α_{d−1}} x_{α_1}^{(1)}(i_1) · x_{α_1,α_2}^{(2)}(i_2) · x_{α_2,α_3}^{(3)}(i_3) ⋯ x_{α_{d−1}}^{(d)}(i_d)

or

x(i_1 … i_d) = x^{(1)}(i_1) ⋯ x^{(d)}(i_d),  x^{(k)}(i_k) ∈ R^{r_{k−1}×r_k},

[Tensor network diagram: chain of nodes x^{(1)}, …, x^{(d)} with physical legs i_1, …, i_d and bond indices α_1, …, α_{d−1}.]
Storage: O(dnr 2).
©Peter Benner Low-Rank Techniques 28/40
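A TT tensor is fully described by its cores, and evaluating a single entry is just a chain of small matrix products. A minimal sketch (the function name `tt_entry` is mine; cores use the layout (r_{k−1}, n_k, r_k) with boundary ranks r_0 = r_d = 1, as above):

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate one entry of a TT tensor: x(i1,...,id) = x1(i1) @ ... @ xd(id).

    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so the
    running product v is always a 1 x r_k row vector (as a matrix).
    """
    v = np.ones((1, 1))
    for core, i in zip(cores, idx):
        v = v @ core[:, i, :]     # multiply by the r_{k-1} x r_k slice for index i
    return v[0, 0]
```

The cores hold O(dnr²) numbers in total, which is the storage count on the slide, yet they encode all n^d entries of the tensor.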
Overloading Tensor Operations
Always work with the factors x^{(k)} ∈ R^{r_{k−1}×n_k×r_k} instead of full tensors.
Sum z = x + y ⟹ tensor rank r_z = r_x + r_y.
TT format for a high-dimensional operator:
A(i_1 … i_d, j_1 … j_d) = A^{(1)}(i_1, j_1) ⋯ A^{(d)}(i_d, j_d).
Matrix-vector multiplication y = Ax ⟹ tensor rank r_y = r_A · r_x.
Additions and multiplications increase TT ranks.
Decrease ranks quasi-optimally via a truncation operator T_ε using SVD (or QR).
Iterative linear solvers (and preconditioners) can be implemented in TT format; we use, e.g., TT-GMRES.
©Peter Benner Low-Rank Techniques 29/40
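The rank growth under addition is concrete: the cores of z = x + y are block-diagonal concatenations of the cores of x and y, so the TT ranks add. A sketch (the function name `tt_add` is mine; cores in the (r_{k−1}, n_k, r_k) layout):

```python
import numpy as np

def tt_add(x_cores, y_cores):
    """Sum of two TT tensors: ranks add, middle cores are stacked
    block-diagonally; first/last cores are concatenated along the rank
    axis since r_0 = r_d = 1."""
    d = len(x_cores)
    z = []
    for k, (xc, yc) in enumerate(zip(x_cores, y_cores)):
        rx0, n, rx1 = xc.shape
        ry0, _, ry1 = yc.shape
        if k == 0:
            zc = np.concatenate([xc, yc], axis=2)     # (1, n, rx1 + ry1)
        elif k == d - 1:
            zc = np.concatenate([xc, yc], axis=0)     # (rx0 + ry0, n, 1)
        else:
            zc = np.zeros((rx0 + ry0, n, rx1 + ry1))  # block-diagonal core
            zc[:rx0, :, :rx1] = xc
            zc[rx0:, :, rx1:] = yc
        z.append(zc)
    return z
```

Repeated sums and operator applications therefore inflate the ranks, which is why the truncation operator T_ε is applied after every such step.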
Numerical Examples
Our work
Apply low-rank iterative solvers in TT format to discrete optimality systems resulting from
PDE-constrained optimization problems under uncertainty,
discretized by the stochastic Galerkin method, i.e., applying Galerkin projection to the weak formulation, using
implicit Euler (discontinuous Galerkin, dG(0)) in time,
classical finite element methods (FEM) in physical space,
generalized polynomial chaos in N-dimensional parameter space (resulting from parameterizing the uncertain parameters).
This naturally yields a (1 + 2(3) + N)-way tensor representation of the solution, which we approximate in the low-rank TT format.
Take home message
Biggest problem solved so far has n = 1.29 · 10^15 unknowns (optimality system for unsteady incompressible Navier-Stokes control problem with uncertain viscosity).
Would require ≈ 10 petabytes (PB) = 10,000 TB to store the solution vector!
Using low-rank tensor techniques, we need ≈ 7 · 10^10 bytes = 70 GB to solve the optimality system in MATLAB in less than one hour!
©Peter Benner Low-Rank Techniques 30/40
Optimal Control of Unsteady Heat Equation
Consider the optimization problem
J(y, u) = (1/2) ‖y − ȳ‖²_{L²(0,T;D)⊗L²(Ω)} + (α/2) ‖std(y)‖²_{L²(0,T;D)} + (β/2) ‖u‖²_{L²(0,T;D)⊗L²(Ω)}

subject, P-almost surely, to

∂y(t, x, ω)/∂t − ∇·(a(x, ω) ∇y(t, x, ω)) = u(t, x, ω), in (0,T] × D × Ω,
y(t, x, ω) = 0, on (0,T] × ∂D × Ω,
y(0, x, ω) = y_0, in D × Ω,

where
for any z : D × Ω → R, z(x, ·) is a random variable defined on the complete probability space (Ω, F, P) for each x ∈ D;
∃ 0 < a_min < a_max < ∞ s.t. P(ω ∈ Ω : a(x, ω) ∈ [a_min, a_max] ∀ x ∈ D) = 1.
©Peter Benner Low-Rank Techniques 31/40
Stochastic Galerkin Finite Element Method
Weak formulation of the random PDE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Seek y ∈ H¹(0,T; H¹₀(D) ⊗ L²(Ω)) such that, P-almost surely,

⟨y_t, v⟩ + B(y, v) = ℓ(u, v) ∀ v ∈ H¹₀(D) ⊗ L²(Ω),

with the coercive¹ bilinear form

B(y, v) := ∫_Ω ∫_D a(x, ω) ∇y(x, ω) · ∇v(x, ω) dx dP(ω),  y, v ∈ H¹₀(D) ⊗ L²(Ω),

and

ℓ(u, v) = ⟨u(x, ω), v(x, ω)⟩ := ∫_Ω ∫_D u(x, ω) v(x, ω) dx dP(ω),  u, v ∈ H¹₀(D) ⊗ L²(Ω).

Coercivity and boundedness of B + Lax-Milgram ⟹ a unique solution exists.
1due to the positivity assumption on a(x, ω)
©Peter Benner Low-Rank Techniques 32/40
Stochastic Galerkin Finite Element Method
Weak formulation of the optimality system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Theorem [Chen/Quarteroni 2014, B./Onwunta/Stoll 2016]
Under appropriate regularity assumptions, there exists a unique adjoint state p andoptimal solution (y , u, p) to the optimal control problem for the random unsteady heatequation, satisfying the stochastic optimality conditions (KKT system) for t ∈ (0,T ]P-almost surely
⟨y_t, v⟩ + B(y, v) = ℓ(u, v), ∀ v ∈ H¹₀(D) ⊗ L²(Ω),
⟨p_t, w⟩ − B*(p, w) = ℓ((y − ȳ) + (α/2) S(y), w), ∀ w ∈ H¹₀(D) ⊗ L²(Ω),
ℓ(βu − p, w̃) = 0, ∀ w̃ ∈ L²(D) ⊗ L²(Ω),

where
S(y) is the Fréchet derivative of ‖std(y)‖²_{L²(0,T;D)};
B* is the adjoint operator of B.
©Peter Benner Low-Rank Techniques 33/40
Stochastic Galerkin Finite Element Method
Discretization of the random PDE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
y , p, u are approximated using standard Galerkin ansatz, yieldingapproximations of the form
z(t, x, ω) = Σ_{k=0}^{P−1} Σ_{j=1}^{J} z_{jk}(t) φ_j(x) ψ_k(ξ) = Σ_{k=0}^{P−1} z_k(t, x) ψ_k(ξ).

Here,
{φ_j}_{j=1}^{J} are linear finite elements;
{ψ_k}_{k=0}^{P−1} are the P = (N+n)!/(N! n!) multivariate Legendre polynomials of degree ≤ n.
Implicit Euler/dG(0) used for temporal discretization w/ constant time step τ .
©Peter Benner Low-Rank Techniques 34/40
The Fully Discretized Optimal Control Problem
Discrete first order optimality conditions/KKT system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[ τM₁    0      −K_t^T ] [ y ]   [ τM_α ȳ ]
[ 0      βτM₂   τN^T   ] [ u ] = [ 0      ]
[ −K_t   τN     0      ] [ p ]   [ d      ]

where
M₁ = D ⊗ G_α ⊗ M =: D ⊗ M_α,  M₂ = D ⊗ G₀ ⊗ M,
K_t = I_{n_t} ⊗ [ Σ_{i=0}^{N} G_i ⊗ K̂_i ] + (C ⊗ G₀ ⊗ M),
N = I_{n_t} ⊗ G₀ ⊗ M,
and
G₀ = diag(⟨ψ₀²⟩, ⟨ψ₁²⟩, …, ⟨ψ²_{P−1}⟩),  G_i(j, k) = ⟨ξ_i ψ_j ψ_k⟩, i = 1, …, N,
G_α = G₀ + α diag(0, ⟨ψ₁²⟩, …, ⟨ψ²_{P−1}⟩) (with first moments ⟨·⟩ w.r.t. P),
K̂₀ = M + τK₀,  K̂_i = τK_i, i = 1, …, N,
M, K_i ∈ R^{J×J} are the mass and stiffness matrices w.r.t. the spatial discretization, where K_i corresponds to the contributions of the i-th KLE term to the stiffness,
C = −diag(ones, −1) ∈ R^{n_t×n_t} (entries −1 on the first subdiagonal), D = diag(1/2, 1, …, 1, 1/2) ∈ R^{n_t×n_t}.
©Peter Benner Low-Rank Techniques 35/40
The Fully Discretized Optimal Control Problem
Discrete first order optimality conditions/KKT system (as above). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Linear system with 3JPnt unknowns!
©Peter Benner Low-Rank Techniques 35/40
Solving the Optimality System
Optimality system leads to the saddle point problem

[ A   B^T ]
[ B   0   ].

Very large scale setting, (block-)structured sparsity ⟹ iterative solution.
Krylov subspace methods for indefinite symmetric systems: MINRES, … .
Requires a good preconditioner.
Famous three-iterations-convergence result [Murphy/Golub/Wathen 2000]: using the ideal preconditioner

P := [ A  0; 0  −S ] with the Schur complement S := −BA^{−1}B^T,

MINRES finds the exact solution in at most three steps.
Motivates the use of a mean-based approximate Schur complement preconditioner

[ Â  0; 0  Ŝ ].

Implemented in TT format ⟹ TT-MINRES.
©Peter Benner Low-Rank Techniques 36/40
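The Murphy/Golub/Wathen result can be observed numerically. A dense stand-in (the blocks below are random placeholders, not the KKT matrices above) using SciPy's MINRES with the ideal block-diagonal preconditioner applied through a LinearOperator:

```python
import numpy as np
from scipy.sparse.linalg import minres, LinearOperator

rng = np.random.default_rng(1)
n, m = 40, 10
E = rng.standard_normal((n, n))
A = E @ E.T + n * np.eye(n)              # SPD (1,1) block
B = rng.standard_normal((m, n))          # full-rank constraint block
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
b = rng.standard_normal(n + m)

# Ideal preconditioner P = diag(A, S) with S = B A^{-1} B^T.
# MINRES needs the action of P^{-1}, passed as a LinearOperator.
S = B @ np.linalg.solve(A, B.T)

def apply_Pinv(r):
    return np.concatenate([np.linalg.solve(A, r[:n]),
                           np.linalg.solve(S, r[n:])])

M = LinearOperator((n + m, n + m), matvec=apply_Pinv)

its = []
x, info = minres(K, b, M=M, callback=lambda xk: its.append(1))
```

The preconditioned matrix has only three distinct eigenvalues, so MINRES converges in (at most) three iterations; in the TT setting the same idea is used with an approximate, mean-based P applied in low-rank arithmetic.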
Numerical Results
Mean-Based Preconditioned TT-MINRES

    TT-MINRES               # iter (t)    # iter (t)    # iter (t)
    n_t                     2^5           2^6           2^8
    dim(A) = 3JPn_t         10,671,360    21,342,720    85,370,880

    α = 1, tol = 10^-3
    β = 10^-5               6 (285.5)     6 (300.0)     8 (372.2)
    β = 10^-6               4 (77.6)      4 (130.9)     4 (126.7)
    β = 10^-8               4 (56.7)      4 (59.4)      4 (64.9)

    α = 0, tol = 10^-3
    β = 10^-5               4 (207.3)     6 (366.5)     6 (229.5)
    β = 10^-6               4 (153.9)     4 (158.3)     4 (172.0)
    β = 10^-8               2 (35.2)      2 (37.8)      2 (40.0)
©Peter Benner Low-Rank Techniques 37/40
Optimal Control 3D Stokes-Brinkman w/ Uncertain Viscosity
[Figures: state and control fields, showing mean and standard deviation]

Full size: n_x n_ξ n_t ≈ 3 · 10^9. Reduction: mem(TT)/mem(full) = 0.002.
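The memory reduction comes from storing the solution in the tensor-train (TT) format rather than as a full array. As an illustrative sketch (not the AMEn/TT-MINRES solver used for the slides' results; `tt_svd`, `tt_full`, and the test tensor are made up for illustration), the TT-SVD algorithm compresses a tensor with low TT ranks by a sweep of truncated SVDs:

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    # Sweep over the dimensions, splitting off one TT core per step
    # via a truncated SVD of the current unfolding (TT-SVD).
    dims = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(s > tol * s[0])))
        cores.append(U[:, :rank].reshape(r, dims[k], rank))
        mat = (s[:rank, None] * Vt[:rank]).reshape(rank * dims[k + 1], -1)
        r = rank
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_full(cores):
    # Contract the TT cores back into the full tensor (for checking only).
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([out.ndim - 1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# A 4-dimensional tensor with exact TT ranks 2: T[i,j,k,l] = sin(i+j+k+l).
T = np.sin(np.indices((10, 10, 10, 10)).sum(axis=0))
cores = tt_svd(T)
print(sum(c.size for c in cores) / T.size)   # memory ratio: 0.012
print(np.max(np.abs(tt_full(cores) - T)))    # reconstruction error ≈ 1e-14
```

For smooth parameter dependence the TT ranks stay small while the full tensor grows exponentially with the dimension, which is the mechanism behind ratios such as mem(TT)/mem(full) = 0.002 above.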
©Peter Benner Low-Rank Techniques 38/40
Conclusions (High-dimensional PDEs)
Low-rank tensor solver for unsteady heat and Navier-Stokes equations with uncertain viscosity.

Similar techniques used for Stokes(-Brinkman) optimal control problems.

Adapted AMEn (TT) solver to saddle point systems.

To consider next:

Navier-Stokes: many parameters coming from uncertain geometry or a Karhunen-Loève expansion of random fields.
Initial results: the more parameters, the more significant the complexity reduction in memory, up to a factor of 10^9 for the control problem for a backward facing step.
Exploit multicore technology (needs efficient parallelization of AMEn).
©Peter Benner Low-Rank Techniques 39/40
References for Optimal Control under Uncertainty
P. Benner, S. Dolgov, A. Onwunta, and M. Stoll. Solving optimal control problems governed by random Navier-Stokes equations using low-rank methods. arXiv preprint arXiv:1703.06097, March 2017.

P. Benner, S. Dolgov, A. Onwunta, and M. Stoll. Low-rank solvers for unsteady Stokes-Brinkman optimal control problem with random data. Computer Methods in Applied Mechanics and Engineering, 304:26–54, 2016.

P. Benner, A. Onwunta, and M. Stoll. Low rank solution of unsteady diffusion equations with stochastic coefficients. SIAM/ASA Journal on Uncertainty Quantification, 3(1):622–649, 2015.

P. Benner, A. Onwunta, and M. Stoll. Block-diagonal preconditioning for optimal control problems constrained by PDEs with uncertain inputs. SIAM Journal on Matrix Analysis and Applications, 37(2):491–518, 2016.

C.E. Powell and D.J. Silvester. Preconditioning steady-state Navier–Stokes equations with random data. SIAM Journal on Scientific Computing, 34(5):A2482–A2506, 2012.

E. Rosseel and G.N. Wells. Optimal control with stochastic PDE constraints and uncertain controls. Computer Methods in Applied Mechanics and Engineering, 213–216:152–167, 2012.

M. Stoll and T. Breiten. A low-rank in time approach to PDE-constrained optimization. SIAM Journal on Scientific Computing, 37(1):B1–B29, 2015.
©Peter Benner Low-Rank Techniques 40/40