Krylov subspace methods de Sturler/crcd_01c.pdf · ©2001 Eric de Sturler A cunning plan: Since...
©2001 Eric de Sturler
Eric de Sturler
Department of Computer Science
University of Illinois at Urbana-Champaign
www-faculty.cs.uiuc.edu/sturler
Krylov subspace methods
Hermitian Problems
Consider again how GMRES builds an orthogonal basis for the Krylov space $K_{m+1}(A, r_0)$:
(Arnoldi) algorithm:
  $v_1 = r_0/\|r_0\|_2$;
  for $k = 1 : m$,
    $v_{k+1} = Av_k$;
    for $j = 1 : k$,
      $h_{j,k} = v_j^H v_{k+1}$;
      $v_{k+1} = v_{k+1} - h_{j,k}v_j$;
    end
    $h_{k+1,k} = \|v_{k+1}\|_2$;
    $v_{k+1} = v_{k+1}/h_{k+1,k}$;
  end
Verify that the (Arnoldi) algorithm generates the following recurrence: $AV_m = V_{m+1}H_{m+1,m}$.
What does $H_{m+1,m}$ look like?
Prove $V_{m+1}$ is orthogonal.
Note $H_{m+1,m} = V_{m+1}^H A V_m$, $\mathrm{range}(V_m) = K_m(A, r_0)$ and $\mathrm{range}(V_{m+1}) = K_{m+1}(A, r_0)$. So both $\mathrm{range}(U_m)$ and $\mathrm{range}(C_m)$ from GCR are contained in $\mathrm{range}(V_{m+1})$.
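The two exercises above (the recurrence and the orthogonality of $V_{m+1}$) can be checked numerically. The following is a minimal sketch in plain Python, assuming a real matrix stored as a list of rows; the helper names `matvec`, `dot`, `arnoldi` and the small test matrix are illustrative assumptions, not part of the slides.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def arnoldi(A, r0, m):
    """Return (V, H): V is a list of m+1 basis vectors, H is (m+1) x m."""
    nrm = math.sqrt(dot(r0, r0))
    V = [[x / nrm for x in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        w = matvec(A, V[k])
        for j in range(k + 1):                 # orthogonalize against v_1..v_k
            H[j][k] = dot(V[j], w)
            w = [wi - H[j][k] * vj for wi, vj in zip(w, V[j])]
        H[k + 1][k] = math.sqrt(dot(w, w))
        V.append([wi / H[k + 1][k] for wi in w])
    return V, H

# Small nonsymmetric test matrix (made up for illustration).
A = [[4.0, 1.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 2.0],
     [1.0, 0.0, 1.0, 6.0]]
r0 = [1.0, 0.0, 0.0, 0.0]
m = 3
V, H = arnoldi(A, r0, m)

# Largest entry of A V_m - V_{m+1} H_{m+1,m}.
AV = [matvec(A, V[k]) for k in range(m)]
recurrence_err = max(
    abs(AV[k][i] - sum(H[j][k] * V[j][i] for j in range(m + 1)))
    for k in range(m) for i in range(len(r0)))
# Largest deviation of V_{m+1}^H V_{m+1} from the identity.
orth_err = max(abs(dot(V[i], V[j]) - (1.0 if i == j else 0.0))
               for i in range(m + 1) for j in range(m + 1))
```

Both error measures should be at rounding level, and the entries of `H` below the first subdiagonal are never touched, which is the upper Hessenberg structure asked about.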
MINRES
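Now consider $A$ being Hermitian: $A^H = A$.
Another way to write the recurrence relation from Arnoldi:
$AV_m = V_{m+1}\underline{H}_m = V_m H_m + v_{m+1}\eta_m^T h_{m+1,m}$,
where $H_m$ is the upper $m \times m$ part of $\underline{H}_m$ (and $\eta_m$ is the $m$th unit vector).
So, $V_m^H A V_m = V_m^H V_m H_m + V_m^H v_{m+1}\eta_m^T h_{m+1,m} = H_m$,
and since $A^H = A$, $(V_m^H A V_m)^H = V_m^H A^H V_m = V_m^H A V_m$, so $H_m$ must be Hermitian as well.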
This has some important consequences ...
A Hermitian upper Hessenberg matrix is tridiagonal!
This means that (in exact arithmetic) we need to orthogonalize each new vector $Av_i$ only against the vectors $v_{i-1}$ and $v_i$.
We could solve the least squares problem in the same way as for GMRES, except that we save on orthogonalizations (inner products and vector updates).
What is the computational cost of $m$ iterations of GMRES?
Theorem: Let $A$ be Hermitian and let $v_1, v_2, \ldots, v_m$ be the vectors generated by the Arnoldi algorithm (so they span $K_m(A, v_1)$). Then $Av_i \perp v_1, v_2, \ldots, v_{i-2}$ and so $Av_i \perp \mathrm{span}\{v_1, v_2, \ldots, v_{i-2}\}$.
Proof:
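One possible argument, given here as a sketch rather than as the proof intended on the slide, uses only $A^H = A$ and the nesting of the Krylov spaces. It assumes no breakdown, so that $\mathrm{span}\{v_1, \ldots, v_j\} = K_j(A, v_1)$ for every $j \le m$, and the convention $\langle x, y \rangle = y^H x$.

```latex
% Assumption: no breakdown, so span{v_1,...,v_j} = K_j(A, v_1) for j <= m.
\begin{align*}
\langle A v_i, v_j \rangle &= v_j^H A v_i = (A^H v_j)^H v_i = (A v_j)^H v_i
  && \text{(since } A^H = A\text{)} \\
A v_j &\in K_{j+1}(A, v_1) = \mathrm{span}\{v_1, \ldots, v_{j+1}\}, \\
j \le i-2 \;&\Rightarrow\; j+1 \le i-1 \;\Rightarrow\; v_i \perp A v_j
  \;\Rightarrow\; \langle A v_i, v_j \rangle = 0 .
\end{align*}
```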
The algorithm now proceeds as follows:
Lanczos recurrence: $AV_m = V_{m+1}\underline{T}_m$ (T for tridiagonal).
Lanczos is Arnoldi in the Hermitian case (2 orthogonalizations).
Solve $y_m = \arg\min \|r_0 - AV_m y\|_2$ just as in GMRES:
We have $AV_m = V_{m+1}\underline{T}_m = V_{m+1}\underline{Q}_m R_m$,
and we compute $y_m = R_m^{-1}\underline{Q}_m^H V_{m+1}^H r_0$ (solving least squares problem).
Every step we update the QR-decomposition of $\underline{T}_i$ and solve $R_i y_i = \underline{Q}_i^H \eta_1 \|r_0\|_2$.
At the end we update $x_m = x_0 + V_m y_m$ and $r_m = r_0 - V_{m+1}\underline{T}_m y_m$.
Note that each step we only orthogonalize on the previous 2 vectors.
What would seem an obvious improvement? Can we do that here?
Since we only orthogonalize on the previous two vectors, we would
like to discard the other vectors.
However, we need them for the update at the end.
Can we update every step and discard the vectors $v_i$?
The problem is that $R_m$ changes and hence $y_m$ changes (in general) completely. So we need all previous $v_i$.
We need a trick.
A cunning plan:
Since $y_m$ changes completely every step, apply a change of variables.
Alternative for update $V_m y_m$: Take $W_m = V_m R_m^{-1}$ and $\hat y_m = R_m y_m = R_m R_m^{-1}\underline{Q}_m^H \eta_1 \|r_0\|_2 = \underline{Q}_m^H \eta_1 \|r_0\|_2$.
Then $W_m \hat y_m = V_m y_m$ and each iteration only the last component of $\hat y_m$ changes. So we can update $W_m \hat y_m$ without keeping all $w_i$.
$R_m$ from the Givens QR decomposition of a tridiagonal matrix is upper triangular with 2 upper diagonals.
The columns of $W_m$ are found by solving $W_m R_m = V_m$ each iteration.
So looking at the last (= the new) column we have:
$w_m r_{m,m} + w_{m-1} r_{m-1,m} + w_{m-2} r_{m-2,m} = v_m$, with only $w_m$ not known:
$w_m = r_{m,m}^{-1}(v_m - w_{m-1} r_{m-1,m} - w_{m-2} r_{m-2,m})$
Update solution: $x_m = x_0 + V_m y_m = x_0 + W_m \hat y_m$
Since $\hat y_m$, contrary to $y_m$, changes only in its last position we can do the update iteration-wise:
$x_m = x_0 + \sum_{i=1}^{m} w_i \hat y_{i,m} = x_0 + \sum_{i=1}^{m-1} w_i \hat y_{i,m} + w_m \hat y_{m,m} = x_{m-1} + w_m \hat y_{m,m}$
How many vectors do we need to keep (independent of # iterations)?
Do we need $r_m$ to continue the iteration?
What would be an update formula for $r_m$?
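MINRES: Ax = b
choose $x_0$ (giving $r_0 = b - Ax_0$) and tol, set $k = 0$; $v_1 = r_0/\|r_0\|_2$;
while $\|r_k\|_2 >$ tol do
  $k = k + 1$;
  $v_{k+1} = Av_k - t_{k,k}v_k - t_{k-1,k}v_{k-1}$;
  $t_{k+1,k} = \|v_{k+1}\|_2$; $v_{k+1} = v_{k+1}/t_{k+1,k}$;
  Update QR: $\underline{T}_k \rightarrow \underline{Q}_k, R_k$, $\hat y_k \equiv \underline{Q}_k^H \eta_1 \|r_0\|_2$
    ($Q_{k+1} = Q_k G_k$; $R_k = G_k^H(Q_k^H T_k)$; $\hat y_{k,k} = q_k^H \eta_1 \|r_0\|_2$);
  $w_k = r_{k,k}^{-1}(v_k - w_{k-1}r_{k-1,k} - w_{k-2}r_{k-2,k})$;
  $x_k = x_{k-1} + w_k \hat y_{k,k}$
end
(Here $t_{k,k} = v_k^H A v_k$, $t_{k-1,k} = t_{k,k-1}$, and vectors with index $\le 0$ are zero.)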
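The loop above can be turned into a short program. The following is a minimal sketch for a real symmetric, nonsingular matrix stored as a list of rows, not a definitive implementation: the function name `minres`, the variable names, the stopping test on the recursively updated residual norm, and the test problem are all assumptions made for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def minres(A, b, x0, tol=1e-10, maxit=100):
    """MINRES sketch for real symmetric, nonsingular A (dense list of rows)."""
    n = len(b)
    x = list(x0)
    r0 = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    phi = math.sqrt(dot(r0, r0))          # |phi| equals ||r_k||_2 throughout
    if phi <= tol:
        return x, 0, phi
    v_old, v = [0.0] * n, [ri / phi for ri in r0]
    w_old2, w_old = [0.0] * n, [0.0] * n
    t_sub = 0.0                           # t_{k,k-1} (= t_{k-1,k})
    c_old, s_old, c, s = 1.0, 0.0, 1.0, 0.0   # rotations G_{k-2} and G_{k-1}
    k = 0
    while abs(phi) > tol and k < maxit:
        k += 1
        # Lanczos step: new column (t_sub, t_diag, t_next) of the tridiagonal matrix
        p = matvec(A, v)
        t_diag = dot(p, v)
        p = [pi - t_diag * vi - t_sub * oi for pi, vi, oi in zip(p, v, v_old)]
        t_next = math.sqrt(dot(p, p))
        # apply the two previous Givens rotations to the new column
        r_km2 = s_old * t_sub
        tmp = c_old * t_sub
        r_km1 = c * tmp + s * t_diag
        tmp = -s * tmp + c * t_diag
        # new rotation G_k eliminates t_next
        r_kk = math.hypot(tmp, t_next)
        c_old, s_old = c, s
        c, s = tmp / r_kk, t_next / r_kk
        y_hat = c * phi                   # new last component of y-hat
        phi = -s * phi
        # w_k from the last column of W R = V, then the solution update
        w = [(vi - r_km1 * w1 - r_km2 * w2) / r_kk
             for vi, w1, w2 in zip(v, w_old, w_old2)]
        x = [xi + y_hat * wi for xi, wi in zip(x, w)]
        w_old2, w_old = w_old, w
        if t_next == 0.0:                 # invariant subspace reached
            break
        v_old, v = v, [pi / t_next for pi in p]
        t_sub = t_next
    return x, k, abs(phi)

# Symmetric indefinite test problem (made up for illustration).
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, -3.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 2.0],
     [0.0, 0.0, 2.0, -1.0]]
x_true = [1.0, -1.0, 2.0, 0.5]
b = matvec(A, x_true)
x, its, res_est = minres(A, b, [0.0] * 4)
true_res = math.sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, matvec(A, x))))
```

Only two Lanczos vectors, two direction vectors and the iterate are kept, independent of the number of iterations, and the residual norm is available from the rotated right-hand side without forming $r_k$.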
Hermitian matrices: Error minimization in the A-norm
We are solving $Ax = b$ with initial guess $x_0$ (so $r_0 = b - Ax_0$), and $x$ is the solution to $Ax = b$.
The error at iteration $i$ is $e_i = x - (x_0 + z_i)$, where $z_i \in K_i(A, r_0)$ is the $i$th update to the initial guess.
Theorem:
Let $A$ be Hermitian (and positive definite, so that $\|\cdot\|_A$ is a norm), then the vector $z_i \in K_i(A, r_0)$ satisfies
$z_i = \arg\min\{\|x - (x_0 + z)\|_A : z \in K_i(A, r_0)\}$ iff $r_i \equiv r_0 - Az_i$ satisfies $r_i \perp K_i(A, r_0)$.
The most important algorithm of this class is the Conjugate Gradient Algorithm.
Conjugate Gradients
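Proof:
$z_i = \arg\min\{\|x - (x_0 + z)\|_A : z \in K_i(A, r_0)\} \Leftrightarrow (x - x_0) - z_i \perp_A K_i(A, r_0)$
We know $K_i(A, r_0) = \mathrm{span}\{r_0, r_1, \ldots, r_{i-1}\}$. This gives
$r_k \perp A(x - x_0 - z_i)$ for $k = 0, \ldots, i-1$ $\Leftrightarrow$
$\langle A(x - x_0 - z_i), r_k \rangle = 0$ for $k = 0, \ldots, i-1$ $\Leftrightarrow$
$\langle b - Ax_0 - Az_i, r_k \rangle = 0$ for $k = 0, \ldots, i-1$ $\Leftrightarrow$
$\langle r_0 - Az_i, r_k \rangle = 0$ for $k = 0, \ldots, i-1$ $\Leftrightarrow$
$\langle r_i, r_k \rangle = 0$ for $k = 0, \ldots, i-1$ $\Leftrightarrow$
$r_i \perp K_i(A, r_0)$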
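Lanczos iteration:
Choose $q_1$; $\beta_0 = 0$; $q_0 = 0$;
for $i = 1, 2, \ldots$ do
  $q_{i+1} = Aq_i$;
  $\alpha_i = \langle Aq_i, q_i \rangle$; $q_{i+1} = q_{i+1} - \alpha_i q_i$; $q_{i+1} = q_{i+1} - \beta_{i-1}q_{i-1}$;
  $\beta_i = \|q_{i+1}\|_2$; $q_{i+1} = q_{i+1}/\beta_i$;
end
Show that $q_{i+1} = q_{i+1} - \beta_{i-1}q_{i-1}$ sets $q_{i+1} \perp q_{i-1}$ (one argument is the symmetry of the Hessenberg matrix for Arnoldi, give another).
This generates the recurrence relation:
$AQ_i = Q_i T_i + \beta_i q_{i+1}\eta_i^T$, where $Q_i = [q_1\ q_2\ \ldots\ q_i]$ and
$T_i = \begin{pmatrix} \alpha_1 & \beta_1 & 0 & \cdots \\ \beta_1 & \alpha_2 & \beta_2 & \ddots \\ 0 & \beta_2 & \ddots & \ddots \\ \vdots & \ddots & \ddots & \ddots \end{pmatrix}$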
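The three-term recurrence above can be tried out directly. The sketch below is plain Python for a real symmetric matrix; the names `alpha` and `beta` for the diagonal and off-diagonal entries of $T_i$, the helper names and the test matrix are assumptions made for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def lanczos(A, q1, m):
    """Return (Q, alpha, beta): Q = [q_1..q_{m+1}], alpha_1..m, beta_1..m."""
    nrm = math.sqrt(dot(q1, q1))
    q_prev = [0.0] * len(q1)              # q_0 = 0
    q = [x / nrm for x in q1]
    Q, alpha, beta = [q], [], []
    b_prev = 0.0                          # beta_0 = 0
    for _ in range(m):
        w = matvec(A, q)
        a = dot(w, q)
        # orthogonalize against the two previous vectors only
        w = [wi - a * qi - b_prev * pi for wi, qi, pi in zip(w, q, q_prev)]
        b = math.sqrt(dot(w, w))
        q_prev, q = q, [wi / b for wi in w]
        Q.append(q)
        alpha.append(a)
        beta.append(b)
        b_prev = b
    return Q, alpha, beta

# Symmetric test matrix (made up for illustration).
A = [[4.0, 1.0, 2.0, 0.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 5.0, 1.0],
     [0.0, 1.0, 1.0, 6.0]]
m = 3
Q, alpha, beta = lanczos(A, [1.0, 1.0, 1.0, 1.0], m)

# T = Q_m^H A Q_m should be tridiagonal with alpha on and beta next to the diagonal.
T = [[dot(Q[i], matvec(A, Q[j])) for j in range(m)] for i in range(m)]
offband = max(abs(T[i][j]) for i in range(m) for j in range(m) if abs(i - j) > 1)
orth_err = max(abs(dot(Q[i], Q[j]) - (1.0 if i == j else 0.0))
               for i in range(m + 1) for j in range(m + 1))
```

Although each new vector is orthogonalized only against two predecessors, the whole basis comes out orthonormal up to rounding for this small example; for long runs in floating point that orthogonality is known to degrade.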
Use Lanczos orthonormal basis for minimizing A-norm of error.
$z_i = \arg\min\{\|x - (x_0 + z)\|_A : z \in K_i(A, r_0)\}$ iff $r_i \equiv r_0 - Az_i$ satisfies $r_i \perp K_i(A, r_0)$.
$q_1 = r_0/\|r_0\|_2$;
Lanczos method: $AQ_i = Q_i T_i + \beta_i q_{i+1}\eta_i^T$
Solve $r_0 - AQ_i y_i \perp Q_i \Leftrightarrow Q_i^H(\|r_0\|_2 q_1 - AQ_i y_i) = 0 \Leftrightarrow \|r_0\|_2 \eta_1 - Q_i^H A Q_i y_i = 0$.
Notice $\mathrm{range}(Q_i) = \mathrm{span}\{r_0, r_1, \ldots, r_{i-1}\}$.
$AQ_i = Q_i T_i + \beta_i q_{i+1}\eta_i^T \Rightarrow Q_i^H A Q_i = T_i$
So we reduced the problem to solving $\|r_0\|_2 \eta_1 - T_i y_i = 0$:
$y_i = T_i^{-1}\eta_1 \|r_0\|_2$
In order to update step-by-step we use the same trick as in MINRES:
Let $T_i = L_i D_i L_i^H$, then $y_i = L_i^{-H} D_i^{-1} L_i^{-1}\eta_1 \|r_0\|_2$, where $L_i$ is unit lower bi-diagonal with lower diagonal coefficients $l_1, l_2, \ldots, l_{i-1}$ (the index gives the column).
Change of variables:
$P_i = Q_i L_i^{-H}$ and $\hat y_i = D_i^{-1} L_i^{-1}\eta_1 \|r_0\|_2$: $Q_i y_i = P_i \hat y_i$
Notice that each iteration only the last component of $\hat y_i$ changes.
From $P_i L_i^H = Q_i$ we get a recurrence for $p_i$: $p_i + l_{i-1}p_{i-1} = q_i$ ($p_1 = q_1$)
So every new step we compute a new $q_{i+1}$, we update the decomposition of $T_i$ and from that $\hat y_{i+1}$ and $p_{i+1}$.
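$x_i = x_{i-1} + p_i \hat y_{i,i}$ (where $\hat y_{i,i}$ is the $i$th component of the vector $\hat y_i$)
$r_i = r_{i-1} - Ap_i \hat y_{i,i} = -q_{i+1}\beta_i \hat y_{i,i}$ (where $\beta_i$ is the normalization coefficient from the Lanczos iteration)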
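The factorization and the new last component of $\hat y_i$ can both be updated with scalar recurrences. The block below is a worked sketch, assuming $T_i$ has diagonal entries $\alpha_j$ and real off-diagonal entries $\beta_j$ (these symbol names are an assumption) and that all pivots $d_j$ are nonzero, which holds when $A$ is positive definite. It follows from comparing entries in $T_i = L_i D_i L_i^H$ and from forward substitution in $L_i z = \eta_1 \|r_0\|_2$ with $\hat y_{j,j} = z_j / d_j$.

```latex
% Assumptions: T_i has diagonal \alpha_1..\alpha_i and real off-diagonal
% \beta_1..\beta_{i-1}; all pivots d_j are nonzero.
\begin{align*}
d_1 &= \alpha_1, & l_j &= \beta_j / d_j, & d_{j+1} &= \alpha_{j+1} - l_j \beta_j, \\
\hat y_{1,1} &= \|r_0\|_2 / d_1, &
\hat y_{j+1,j+1} &= -\, l_j d_j \, \hat y_{j,j} / d_{j+1}
  = -\,\beta_j \, \hat y_{j,j} / d_{j+1}. &&
\end{align*}
```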
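(Easier form of) CG algorithm: Ax = b
Choose $x_0$ (giving $r_0 = b - Ax_0$); $p_1 = r_0$; $i = 0$
while $\|r_i\|_2 >$ tol do
  $i = i + 1$;
  $\alpha_i = \frac{\langle r_{i-1}, r_{i-1} \rangle}{\langle p_i, Ap_i \rangle}$;
  $x_i = x_{i-1} + \alpha_i p_i$;
  $r_i = r_{i-1} - \alpha_i Ap_i$;
  $\beta_i = \frac{\langle r_i, r_i \rangle}{\langle r_{i-1}, r_{i-1} \rangle}$;
  $p_{i+1} = r_i + \beta_i p_i$;
end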
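As a runnable counterpart to the loop above, here is a minimal sketch of the standard CG iteration in plain Python, assuming a real symmetric positive definite matrix stored as a list of rows. The function name `cg`, the `maxit` safeguard and the test problem are assumptions added for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cg(A, b, x0, tol=1e-10, maxit=100):
    """Conjugate gradients for real symmetric positive definite A."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = list(r)                           # p_1 = r_0
    rr = dot(r, r)
    it = 0
    while math.sqrt(rr) > tol and it < maxit:
        it += 1
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]   # next search direction
        rr = rr_new
    return x, it, math.sqrt(rr)

# Symmetric, strictly diagonally dominant (hence positive definite) test matrix,
# made up for illustration.
A = [[4.0, 1.0, 2.0, 0.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 5.0, 1.0],
     [0.0, 1.0, 1.0, 6.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
b = matvec(A, x_true)
x, its, res = cg(A, b, [0.0] * 4)
```

Each iteration needs one matrix-vector product, two inner products and three vector updates, and only the vectors `x`, `r`, `p` and `Ap` are stored.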
[Figure: convergence of CG and GMRES, $\log_{10}\|r\|_2$ versus # iterations (matvecs).]
Krylov subspace methods
Comparing Methods
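GMRES: Ax = b
choose $x_0$ (e.g. $x_0 = 0$) and tol
$r_0 = b - Ax_0$; $k = 0$; $v_1 = r_0/\|r_0\|_2$;
while $\|r_k\|_2 >$ tol
  $k = k + 1$;
  $v_{k+1} = Av_k$;
  for $j = 1 : k$,
    $h_{j,k} = v_j^H v_{k+1}$; $v_{k+1} = v_{k+1} - h_{j,k}v_j$;
  end
  $h_{k+1,k} = \|v_{k+1}\|_2$; $v_{k+1} = v_{k+1}/h_{k+1,k}$;
  update QR-dec: $\underline{H}_k = Q_{k+1}\underline{R}_k$
  $\|r_k\|_2 = |q_{k+1}^H \eta_1| \, \|r_0\|_2$
end
$y_k = R_k^{-1}\underline{Q}_k^H \eta_1 \|r_0\|_2$; $x_k = x_0 + V_k y_k$;
$r_k = r_0 - V_{k+1}\underline{H}_k y_k = V_{k+1}(I - \underline{Q}_k \underline{Q}_k^H)\eta_1 \|r_0\|_2$; (or simply $r_k = b - Ax_k$)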
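The GMRES loop above can be sketched in plain Python as follows, assuming a real nonsingular matrix stored as a list of rows. The QR update is done with Givens rotations, which is one common choice and an assumption here; the function name `gmres`, the variable names and the test problem are likewise illustrative.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gmres(A, b, x0, tol=1e-10, maxit=50):
    """Full (unrestarted) GMRES sketch for real nonsingular A."""
    r0 = [bi - ai for bi, ai in zip(b, matvec(A, x0))]
    nrm = math.sqrt(dot(r0, r0))
    if nrm <= tol:
        return list(x0), 0, nrm
    V = [[ri / nrm for ri in r0]]
    R = []                    # R[j] holds column j of the triangular factor
    rots = []                 # Givens rotations (c, s)
    g = [nrm]                 # rotated right-hand side; |g[-1]| = ||r_k||_2
    k = 0
    while abs(g[-1]) > tol and k < maxit:
        k += 1
        w = matvec(A, V[-1])
        h = []
        for vj in V:                              # orthogonalize against all v_j
            hj = dot(vj, w)
            w = [wi - hj * vji for wi, vji in zip(w, vj)]
            h.append(hj)
        h_next = math.sqrt(dot(w, w))
        for i, (c, s) in enumerate(rots):         # apply the old rotations
            h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s * h[i] + c * h[i + 1]
        rho = math.hypot(h[-1], h_next)           # new rotation removes h_next
        c, s = h[-1] / rho, h_next / rho
        rots.append((c, s))
        h[-1] = rho
        R.append(h)
        g[-1], g_new = c * g[-1], -s * g[-1]
        g.append(g_new)
        if h_next == 0.0:                         # invariant subspace reached
            break
        V.append([wi / h_next for wi in w])
    y = [0.0] * k                                 # back substitution R y = g[:k]
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, k))) / R[i][i]
    x = [x0i + sum(y[j] * V[j][i] for j in range(k)) for i, x0i in enumerate(x0)]
    return x, k, abs(g[-1])

# Nonsymmetric test problem (made up for illustration).
A = [[4.0, 1.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 2.0],
     [1.0, 0.0, 1.0, 6.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
b = matvec(A, x_true)
x, its, res_est = gmres(A, b, [0.0] * 4)
```

In contrast to MINRES and CG, all basis vectors in `V` are kept until the end, and the orthogonalization work in step `k` grows with `k`.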
Swiss Center for Scientific Computing © Eric de Sturler
Iterative Methods: Cost
- Many Cheap Iterations versus Minimum Number of Expensive Iterations
  - same as sequential but issues determining cost change
- four main kernels
  - matrix-vector product: comp: 2*N*nz1, comm: "neighbour"
  - preconditioner: comp: 2*N*nz2, comm: "neighbour" (& global)
  - vector update: comp: 2*N, comm: none
  - inner product: comp: 2*N, comm: global
- Methods
  - GMRES, GCR, FOM, BiCG, CGS, BiCGSTAB(l)
  - short recurrence: cheap iteration / many iterations
  - full orthogonalization: minimal number of iterations / expensive
- Matrix vector product often linked with grid/domain partitioning
  - partition scheme to minimize comm. volume/number of messages
  - separate local and nonlocal references
  - overlap communication (latency hiding)
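The kernel costs listed above give a rough operation count per method. The sketch below is an estimate under stated assumptions, not a measured cost: no preconditioner, step $k$ of GMRES is charged one matrix-vector product plus $k+1$ inner products and $k+1$ vector updates, a CG step is charged one matrix-vector product, two inner products and three vector updates, and the small least squares work and the final solution update in GMRES are ignored. The function names are hypothetical.

```python
def gmres_flops(N, nz, m):
    """Approximate flops for m steps of full GMRES (no preconditioner)."""
    matvec = 2 * N * nz                  # matrix-vector product kernel
    vec = 2 * N                          # inner product or vector update kernel
    # step k: (k + 1) inner products and (k + 1) vector updates
    orth = sum(k + 1 for k in range(1, m + 1))
    return m * matvec + 2 * orth * vec

def cg_flops(N, nz, m):
    """Approximate flops for m steps of CG (no preconditioner)."""
    matvec = 2 * N * nz
    vec = 2 * N
    return m * (matvec + (2 + 3) * vec)  # 2 inner products, 3 vector updates

# Example: N unknowns with nz nonzeros per row (numbers chosen for illustration).
N, nz = 100, 5
work = [(m, gmres_flops(N, nz, m), cg_flops(N, nz, m)) for m in (1, 10, 100)]
```

The GMRES count grows quadratically in the number of steps while the CG count grows linearly, which is the trade-off between a minimal number of expensive iterations and many cheap ones.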
Model Problem
Convection-Diffusion(-Reaction) Equation: $Lu = f$ with
$Lu = -(pu_x)_x - (qu_y)_y + ru_x + su_y + tu = f$
Dirichlet boundary conditions: $u = u_n$, $u = u_s$, $u = u_w$, $u = u_e$ on the north, south, west and east sides of the domain.
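To make the model problem concrete, the sketch below assembles a small dense matrix for it. The slides do not state the discretization, so everything here is an assumption: central finite differences on an $n \times n$ interior grid of the unit square with $h = 1/(n+1)$, constant $p = q = 1$, $t = 0$, and the equations scaled by $h^2$. The Dirichlet values only enter the right-hand side and are left out. The function name `assemble` is hypothetical.

```python
import math

def assemble(n, r=0.0, s=0.0):
    """Dense h^2-scaled 5-point matrix for -u_xx - u_yy + r u_x + s u_y."""
    h = 1.0 / (n + 1)
    N = n * n
    A = [[0.0] * N for _ in range(N)]
    for j in range(n):                    # grid index in y
        for i in range(n):                # grid index in x
            row = j * n + i
            A[row][row] = 4.0
            if i > 0:
                A[row][row - 1] = -1.0 - 0.5 * r * h   # west neighbour
            if i < n - 1:
                A[row][row + 1] = -1.0 + 0.5 * r * h   # east neighbour
            if j > 0:
                A[row][row - n] = -1.0 - 0.5 * s * h   # south neighbour
            if j < n - 1:
                A[row][row + n] = -1.0 + 0.5 * s * h   # north neighbour
    return A

n = 3
h = 1.0 / (n + 1)
A0 = assemble(n)                  # pure diffusion: symmetric
A1 = assemble(n, r=5.0, s=5.0)    # with convection: nonsymmetric

# For r = s = 0 the grid function sin(pi x) sin(pi y) is an eigenvector
# with eigenvalue 4 - 4 cos(pi h).
u = [math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
     for j in range(n) for i in range(n)]
lam = 4.0 - 4.0 * math.cos(math.pi * h)
A0u = [sum(a * uj for a, uj in zip(row, u)) for row in A0]
eig_err = max(abs(x - lam * y) for x, y in zip(A0u, u))
```

With $r = s = 0$ the matrix is symmetric, so CG and MINRES apply; any nonzero convection term makes it nonsymmetric, which is the case where GMRES is needed.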
CG vs GMRES for various mesh widths (h)
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for CG and GMRES; curves for h = 1/11, 1/21, 1/31, 1/51, 1/71.]
Eigenvalues for various h
[Figure: eigenvalue distributions for h = 1/11 and h = 1/31.]
Eigenvalues
n (h = 1/(n+1))    min        max     cond. nr.
10                 0.162      7.84    48.4
20                 4.47e-2    7.95    178
30                 2.05e-2    7.98    389
50                 7.57e-3    7.99    1.06e3
70                 3.92e-3    8.00    2.04e3
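The tabulated values can be compared with a closed formula. The sketch below assumes the table refers to the pure diffusion case ($p = q = 1$, $r = s = t = 0$) discretized with the unscaled 5-point stencil on an $n \times n$ interior grid, $h = 1/(n+1)$; for that matrix the extreme eigenvalues are $4 \mp 4\cos(\pi h)$. The slides do not state this, but the numbers agree with the table to within about one percent.

```python
import math

def laplace_extremes(n):
    """Smallest/largest eigenvalue and condition number of the 5-point matrix."""
    h = 1.0 / (n + 1)
    lam_min = 4.0 - 4.0 * math.cos(math.pi * h)
    lam_max = 4.0 + 4.0 * math.cos(math.pi * h)
    return lam_min, lam_max, lam_max / lam_min

# Values copied from the table above: n -> (min, max, cond. nr.)
table = {
    10: (0.162, 7.84, 48.4),
    20: (4.47e-2, 7.95, 178.0),
    30: (2.05e-2, 7.98, 389.0),
    50: (7.57e-3, 7.99, 1.06e3),
    70: (3.92e-3, 8.00, 2.04e3),
}

# Largest relative difference between formula and table, per row.
rel_diff = {
    n: max(abs(c - t) / t for c, t in zip(laplace_extremes(n), row))
    for n, row in table.items()
}
worst_rel = max(rel_diff.values())
```

Since $4 - 4\cos(\pi h) \approx 2\pi^2 h^2$ for small $h$, the condition number grows like $h^{-2}$, which matches the roughly fourfold increase in the table when $h$ is halved.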
CG vs GMRES
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for CG and GMRES. Parameters: p=q=1; t=0; f=0; h=1/11; us=0; uw=1; un=1; ue=0.]
CG vs GMRES (time)
[Figure: $\log_{10}\|r\|_2$ versus time (s) for CG and GMRES. Parameters: p=q=1; t=0; f=0; h=1/11; us=0; uw=1; un=1; ue=0.]
CG vs GMRES
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for CG and GMRES. Parameters: p=q=1; t=0; f=0; h=1/51; us=0; uw=1; un=1; ue=0.]
CG vs GMRES (time)
[Figure: $\log_{10}\|r\|_2$ versus time (s) for CG and GMRES. Parameters: p=q=1; t=0; f=0; h=1/51; us=0; uw=1; un=1; ue=0.]
Iterations for GMRES(m)
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 3, 5, 10, 30, 50 and full GMRES (156). Parameters: p=q=1; t=0; f=0; h=1/51; us=0; uw=1; un=1; ue=0.]
Time for GMRES(m)
[Figure: $\log_{10}\|r\|_2$ versus time (s) for GMRES(m) with m = 3, 5, 10, 30, 50 and full GMRES (156). Parameters: p=q=1; t=0; f=0; h=1/51; us=0; uw=1; un=1; ue=0.]
CG for a non-Hermitian Problem
[Figure: CG convergence curve. Parameters: p=q=1; r=s=5; h=1/31; us=0; uw=0; un=1; ue=1.]
Eigenvalues
[Figure: eigenvalues in the complex plane (real versus imaginary part). Parameters: p=q=1; r=s=70; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES for varying convection
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES; curves for r=s=10, 70, 100, 250. Parameters: p=q=1; r=s: given; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES(m) with r=s=10
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 10, 25, 50 and full GMRES. Parameters: p=q=1; r=s=10; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES(m) with r=s=70
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 10, 25, 50 and full GMRES. Parameters: p=q=1; r=s=70; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES(m) with r=s=100
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 10, 25, 50 and full GMRES. Parameters: p=q=1; r=s=100; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES(m) with r=s=250
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 10, 25, 50 and full GMRES. Parameters: p=q=1; r=s=250; h=1/31; us=0; uw=0; un=1; ue=1.]
GMRES(m) after shifting spectrum
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 10, 25, 50, 100, 120 and full GMRES. Parameters: p=q=1; r=s=70; h=1/31; us=0; uw=0; un=1; ue=1; A=A-3.65*I.]
GMRES(m)
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(m) with m = 3, 5, 10, 20, 30, 50, 75, 76 and full GMRES (119). Parameters: p=q=1; r=200; s=-200; t=0; f=0; h=1/51; us=0; uw=100; un=100; ue=0.]
GMRES(m)
[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for GMRES(100). Parameters: p=q=1; r=200; s=-200; t=0; f=0; h=1/51; us=0; uw=100; un=100; ue=0.]