AMS526: Numerical Analysis I (Numerical Linear …jiao/teaching/ams526_fall14/lectures/...AMS526:...
AMS526: Numerical Analysis I (Numerical Linear Algebra)
Lecture 15: Reduction to Hessenberg and Tridiagonal Forms; Rayleigh Quotient Iteration
Xiangmin Jiao
Stony Brook University
Xiangmin Jiao Numerical Analysis I 1 / 25
Outline
1 Schur Factorization
2 Reduction to Hessenberg and Tridiagonal Forms
3 Rayleigh Quotient Iteration
“Obvious” Algorithms
The most obvious method is to find the roots of the characteristic polynomial $p_A(\lambda)$, but this problem is very ill-conditioned
Another idea is power iteration, using the fact that the sequence
$$\frac{x}{\|x\|},\ \frac{Ax}{\|Ax\|},\ \frac{A^2x}{\|A^2x\|},\ \frac{A^3x}{\|A^3x\|},\ \ldots$$
converges to an eigenvector corresponding to the largest eigenvalue of $A$ in absolute value, but it may converge very slowly
Instead, compute an eigenvalue-revealing factorization, such as the Schur factorization
$$A = QTQ^*$$
by introducing zeros, using algorithms similar to QR factorization
A Fundamental Difficulty
However, an eigenvalue-revealing factorization cannot be computed in a finite number of steps:
Any eigenvalue solver must be iterative
To see this, consider a general polynomial of degree $m$
$$p(z) = z^m + a_{m-1} z^{m-1} + \cdots + a_1 z + a_0$$
There is no closed-form expression for the roots of $p$ (Abel, 1824): in general, the roots of polynomial equations of degree higher than four cannot be written in terms of a finite number of arithmetic operations and root extractions
A Fundamental Difficulty Cont'd
However, the roots of $p$ are the eigenvalues of the companion matrix
$$
A = \begin{bmatrix}
0 &        &        &   & -a_0 \\
1 & 0      &        &   & -a_1 \\
  & 1      & \ddots &   & \vdots \\
  &        & \ddots & 0 & -a_{m-2} \\
  &        &        & 1 & -a_{m-1}
\end{bmatrix},
$$
whose characteristic polynomial is $p_A = p$
Therefore, in general, we cannot find the eigenvalues of a matrix in a finite number of steps
In practice, however, there are algorithms that converge to the desired precision in a few iterations
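The link between polynomial roots and companion-matrix eigenvalues can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `companion` and the cubic test polynomial are illustrative choices, not part of the lecture.

```python
import numpy as np

def companion(a):
    """Companion matrix of the monic polynomial
    p(z) = z^m + a[m-1] z^(m-1) + ... + a[1] z + a[0]."""
    a = np.asarray(a, dtype=float)
    m = a.size
    A = np.zeros((m, m))
    A[np.arange(1, m), np.arange(m - 1)] = 1.0   # ones on the subdiagonal
    A[:, -1] = -a                                # last column: -a_0, ..., -a_{m-1}
    return A

# Illustrative example: p(z) = (z - 1)(z - 2)(z - 3) = z^3 - 6 z^2 + 11 z - 6
a = [-6.0, 11.0, -6.0]                           # a_0, a_1, a_2
eigs = np.sort(np.linalg.eigvals(companion(a)).real)
print(eigs)                                      # approximately [1. 2. 3.]
```

Note that `np.linalg.eigvals` is itself an iterative solver, which is consistent with the point of the slide: the eigenvalues are obtained to working precision, not by a closed-form formula.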
Schur Factorization and Diagonalization
Most eigenvalue algorithms compute the Schur factorization $A = QTQ^*$ by transforming $A$ with a sequence of similarity transformations
$$\underbrace{Q_j^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_j}_{Q},$$
where the $Q_i$ are unitary matrices, and the transformed matrix converges to $T$ as $j \to \infty$
Note: Real matrices might need complex Schur forms and eigenvalues
Question: For Hermitian $A$, what matrix will the sequence converge to?
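Two Phases of Eigenvalue Computations
General $A$: First convert to upper-Hessenberg form, then to upper triangular form
$$
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A \neq A^*}
\xrightarrow{\text{Phase 1}}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
       & \times & \times & \times & \times \\
       &        & \times & \times & \times \\
       &        &        & \times & \times
\end{bmatrix}}_{\text{upper-Hessenberg}}
\xrightarrow{\text{Phase 2}}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
       & \times & \times & \times & \times \\
       &        & \times & \times & \times \\
       &        &        & \times & \times \\
       &        &        &        & \times
\end{bmatrix}}_{\text{triangular}}
$$
Hermitian $A$: First convert to tridiagonal form, then to diagonal form
$$
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A = A^*}
\xrightarrow{\text{Phase 1}}
\underbrace{\begin{bmatrix}
\times & \times &        &        &        \\
\times & \times & \times &        &        \\
       & \times & \times & \times &        \\
       &        & \times & \times & \times \\
       &        &        & \times & \times
\end{bmatrix}}_{\text{tridiagonal}}
\xrightarrow{\text{Phase 2}}
\underbrace{\begin{bmatrix}
\times &        &        &        &        \\
       & \times &        &        &        \\
       &        & \times &        &        \\
       &        &        & \times &        \\
       &        &        &        & \times
\end{bmatrix}}_{\text{diagonal}}
$$
In general, phase 1 is direct and requires $O(m^3)$ flops, while phase 2 is iterative and requires $O(m)$ iterations, costing $O(m^3)$ flops in total for non-Hermitian matrices and $O(m^2)$ flops for Hermitian matrices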
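Introducing Zeros by Similarity Transformations
First attempt: Compute the Schur factorization $A = QTQ^*$ by applying Householder reflectors from both left and right ($x$ marks an entry changed by the latest multiplication, $\times$ an unchanged entry)
$$
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A}
\xrightarrow{Q_1^* \,\cdot}
\underbrace{\begin{bmatrix}
x & x & x & x & x \\
0 & x & x & x & x \\
0 & x & x & x & x \\
0 & x & x & x & x \\
0 & x & x & x & x
\end{bmatrix}}_{Q_1^* A}
\xrightarrow{\cdot\, Q_1}
\underbrace{\begin{bmatrix}
x & x & x & x & x \\
x & x & x & x & x \\
x & x & x & x & x \\
x & x & x & x & x \\
x & x & x & x & x
\end{bmatrix}}_{Q_1^* A Q_1}
$$
Unfortunately, the right multiplication destroys the zeros introduced by $Q_1^*$
This approach could not work in general because of Abel's theorem
However, the subdiagonal entries typically decrease in magnitude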
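The Hessenberg Form
Second attempt: try to compute an upper Hessenberg matrix $H$ similar to $A$ ($x$ marks an entry changed by the latest multiplication, $\times$ an unchanged entry):
$$
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A}
\xrightarrow{Q_1^* \,\cdot}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
x & x & x & x & x \\
0 & x & x & x & x \\
0 & x & x & x & x \\
0 & x & x & x & x
\end{bmatrix}}_{Q_1^* A}
\xrightarrow{\cdot\, Q_1}
\underbrace{\begin{bmatrix}
\times & x & x & x & x \\
\times & x & x & x & x \\
       & x & x & x & x \\
       & x & x & x & x \\
       & x & x & x & x
\end{bmatrix}}_{Q_1^* A Q_1}
$$
The zeros introduced by $Q_1^* A$ were not destroyed this time!
Continuing with the remaining columns results in Hessenberg form:
$$
\xrightarrow{Q_2^* \,\cdot}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
       & x & x & x & x \\
       & 0 & x & x & x \\
       & 0 & x & x & x
\end{bmatrix}}_{Q_2^* Q_1^* A Q_1}
\xrightarrow{\cdot\, Q_2}
\underbrace{\begin{bmatrix}
\times & \times & x & x & x \\
\times & \times & x & x & x \\
       & \times & x & x & x \\
       &        & x & x & x \\
       &        & x & x & x
\end{bmatrix}}_{Q_2^* Q_1^* A Q_1 Q_2}
\cdots
$$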
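The Hessenberg Form
After $m - 2$ steps, we obtain the Hessenberg form:
$$
\underbrace{Q_{m-2}^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_{m-2}}_{Q}
= H =
\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
       & \times & \times & \times & \times \\
       &        & \times & \times & \times \\
       &        &        & \times & \times
\end{bmatrix}
$$
For a Hermitian matrix $A$, $H$ is Hermitian and hence is tridiagonal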
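Householder Reduction to Hessenberg
Algorithm: Householder Reduction to Hessenberg Form
for $k = 1$ to $m - 2$
  $x = A_{k+1:m,\,k}$
  $v_k = \mathrm{sign}(x_1)\,\|x\|_2\, e_1 + x$
  $v_k = v_k / \|v_k\|_2$
  $A_{k+1:m,\,k:m} = A_{k+1:m,\,k:m} - 2 v_k \left( v_k^* A_{k+1:m,\,k:m} \right)$
  $A_{1:m,\,k+1:m} = A_{1:m,\,k+1:m} - 2 \left( A_{1:m,\,k+1:m} v_k \right) v_k^*$
Note: $Q$ is never formed explicitly.
Operation count
$$\sim \sum_{k=1}^{m-2} \left[ 4(m-k)^2 + 4m(m-k) \right] \sim \frac{4m^3}{3} + 4m^3 - \frac{4m^3}{2} = \frac{10m^3}{3}$$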
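The reduction algorithm above can be sketched in NumPy as follows. This is an illustrative sketch restricted to real matrices (so $v_k^*$ becomes $v_k^T$), with 0-based indices; the function name `hessenberg_householder`, the convention sign(0) = +1, and the random test matrix are assumptions made for the example.

```python
import numpy as np

def hessenberg_householder(A):
    """Reduce a real square matrix to upper Hessenberg form H = Q^T A Q.
    Returns H and the unit reflection vectors v_k (Q itself is not formed)."""
    H = np.array(A, dtype=float)          # work on a copy
    m = H.shape[0]
    vs = []
    for k in range(m - 2):                # 0-based; the slide uses k = 1, ..., m-2
        x = H[k + 1:, k].copy()
        s = 1.0 if x[0] >= 0 else -1.0    # sign(0) taken as +1 (a convention)
        v = x.copy()
        v[0] += s * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv == 0.0:                     # column is already zero below the diagonal
            vs.append(np.zeros_like(v))
            continue
        v /= nv
        # Left multiplication by the reflector: rows k+1.., columns k..
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        # Right multiplication by the reflector: all rows, columns k+1..
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        vs.append(v)
    return H, vs

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H, _ = hessenberg_householder(A)
print(np.round(H, 3))                     # entries below the first subdiagonal are ~0
```

Applying the same function to a symmetric matrix such as `A + A.T` should give a numerically tridiagonal result, although this sketch does not exploit symmetry to reduce the cost.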
Reduction to Tridiagonal Form
If $A$ is Hermitian, then
$$
\underbrace{Q_{m-2}^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_{m-2}}_{Q}
= H =
\begin{bmatrix}
\times & \times &        &        &        \\
\times & \times & \times &        &        \\
       & \ddots & \ddots & \ddots &        \\
       &        & \times & \times & \times \\
       &        &        & \times & \times
\end{bmatrix}
$$
For Hermitian $A$, the operation count would be the same as for Householder QR: $4m^3/3$
  First, taking advantage of sparsity, the cost of applying right reflectors is also $4(m-k)^2$ instead of $4m(m-k)$, so the cost is
  $$\sim \sum_{k=1}^{m-2} 8(m-k)^2 \sim \frac{8m^3}{3}$$
  Second, taking advantage of symmetry, the cost is reduced by 50% to $4m^3/3$
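Stability of Hessenberg Reduction
Theorem
Householder reduction to Hessenberg form is backward stable, in that
$$\tilde{Q} \tilde{H} \tilde{Q}^* = A + \delta A, \qquad \frac{\|\delta A\|}{\|A\|} = O(\epsilon_{\text{machine}})$$
for some $\delta A \in \mathbb{C}^{m \times m}$, where $\tilde{H}$ is the computed Hessenberg matrix
Note: Similar to Householder QR, $\tilde{Q}$ is the exactly unitary matrix defined by the computed vectors $\tilde{v}_k$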
Solving Eigenvalue Problems
All eigenvalue solvers must be iterative
Iterative algorithms have multiple facets:
1 Basic idea behind the algorithms
2 Convergence and techniques to speed up convergence
3 Efficiency of implementation
4 Termination criteria
We will focus on the first two aspects
Simplification: Real Symmetric Matrices
We will consider eigenvalue problems for real symmetric matrices, i.e., $A = A^T \in \mathbb{R}^{m \times m}$, and $Ax = \lambda x$ for $x \in \mathbb{R}^m$
  Note: $x^* = x^T$, and $\|x\| = \sqrt{x^T x}$
$A$ has real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ and orthonormal eigenvectors $q_1, q_2, \ldots, q_m$, where $\|q_j\| = 1$
Eigenvalues are often also ordered in a particular way (e.g., ordered from large to small in magnitude)
In addition, we focus on symmetric tridiagonal form
  Why? Because phase 1 of the two-phase algorithm reduces the matrix into tridiagonal form
Rayleigh Quotient
The Rayleigh quotient of $x \in \mathbb{R}^m$ is the scalar
$$r(x) = \frac{x^T A x}{x^T x}$$
For an eigenvector $x$, its Rayleigh quotient is $r(x) = x^T \lambda x / x^T x = \lambda$, the corresponding eigenvalue of $x$
For general $x$, $r(x)$ is the value of $\alpha$ that minimizes $\|Ax - \alpha x\|_2$
$x$ is an eigenvector of $A$ $\iff$ $\nabla r(x) = \frac{2}{x^T x} \left( Ax - r(x) x \right) = 0$ with $x \neq 0$
$r(x)$ is smooth and $\nabla r(q_j) = 0$ for any $j$, and therefore is quadratically accurate:
$$r(x) - r(q_J) = O(\|x - q_J\|^2) \quad \text{as } x \to q_J \text{ for some } J$$
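Two of the properties above, the least-squares characterization and the quadratic accuracy near an eigenvector, can be checked numerically. This is a small sketch assuming NumPy; the function name `rayleigh_quotient`, the random symmetric test matrix, and the perturbation direction `d` are illustrative choices.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """r(x) = x^T A x / x^T x for real symmetric A and nonzero x."""
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # random real symmetric test matrix
lam, Q = np.linalg.eigh(A)
q = Q[:, 0]                              # unit eigenvector for lam[0]

# r(x) coincides with the least-squares solution alpha of  x * alpha ~ A x
x = rng.standard_normal(5)
alpha_ls = np.linalg.lstsq(x[:, None], A @ x, rcond=None)[0][0]
print(rayleigh_quotient(A, x), alpha_ls)

# Quadratic accuracy: shrinking the perturbation by 10 shrinks the error by ~100
d = rng.standard_normal(5)
for eps in (1e-1, 1e-2, 1e-3):
    err = abs(rayleigh_quotient(A, q + eps * d) - lam[0])
    print(f"eps = {eps:.0e}   |r(x) - lambda| = {err:.2e}")
```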
Power Iteration
Simple power iteration for largest eigenvalue
Algorithm: Power Iteration
$v^{(0)}$ = some unit-length vector
for $k = 1, 2, \ldots$
  $w = A v^{(k-1)}$
  $v^{(k)} = w / \|w\|$
  $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$
Termination condition is omitted for simplicity
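The algorithm above translates almost line by line into NumPy. The sketch below is illustrative only: the function name `power_iteration`, the fixed iteration count used in place of a termination test, and the 3-by-3 symmetric tridiagonal test matrix (eigenvalues $3 - \sqrt{3}$, $3$, $3 + \sqrt{3}$) are assumptions made for the example.

```python
import numpy as np

def power_iteration(A, v0, num_iters=200):
    """Power iteration for real symmetric A; returns (lambda_k, v_k)."""
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    for _ in range(num_iters):           # fixed count stands in for a termination test
        w = A @ v
        v = w / np.linalg.norm(w)
        lam = v @ A @ v                  # Rayleigh quotient of the unit vector v
    return lam, v

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, v = power_iteration(A, np.ones(3))
print(lam, 3 + np.sqrt(3))               # the two values should agree closely
```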
Convergence of Power Iteration
Expand the initial $v^{(0)}$ in orthonormal eigenvectors $q_i$, and apply $A^k$:
$$v^{(0)} = a_1 q_1 + a_2 q_2 + \cdots + a_m q_m$$
$$
\begin{aligned}
v^{(k)} &= c_k A^k v^{(0)} \\
        &= c_k \left( a_1 \lambda_1^k q_1 + a_2 \lambda_2^k q_2 + \cdots + a_m \lambda_m^k q_m \right) \\
        &= c_k \lambda_1^k \left( a_1 q_1 + a_2 (\lambda_2/\lambda_1)^k q_2 + \cdots + a_m (\lambda_m/\lambda_1)^k q_m \right)
\end{aligned}
$$
If $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_m| \geq 0$ and $q_1^T v^{(0)} \neq 0$, this gives
$$\|v^{(k)} - (\pm q_1)\| = O\left( |\lambda_2/\lambda_1|^k \right), \qquad |\lambda^{(k)} - \lambda_1| = O\left( |\lambda_2/\lambda_1|^{2k} \right)$$
where the $\pm$ sign is chosen to be the sign of $q_1^T v^{(k)}$
It finds the largest eigenvalue (unless the eigenvector is orthogonal to $v^{(0)}$)
Error reduces by only a constant factor ($\approx |\lambda_2/\lambda_1|$) each step, and very slowly especially when $|\lambda_2| \approx |\lambda_1|$
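The linear convergence rates stated above can be observed on a small example. The sketch below uses a diagonal matrix, an illustrative choice made so that the eigenvalues (4, 2, 1) and the dominant eigenvector $q_1 = e_1$ are known exactly; the variable names are hypothetical.

```python
import numpy as np

# Diagonal test matrix: lambda_1 = 4, lambda_2 = 2, lambda_3 = 1, and q_1 = e_1.
A = np.diag([4.0, 2.0, 1.0])
q1 = np.array([1.0, 0.0, 0.0])

v = np.ones(3) / np.sqrt(3.0)
vec_err, val_err = [], []
for k in range(12):
    w = A @ v
    v = w / np.linalg.norm(w)
    lam = v @ A @ v
    vec_err.append(np.linalg.norm(v - np.sign(q1 @ v) * q1))
    val_err.append(abs(lam - 4.0))

# Ratios of successive errors
vec_ratio = vec_err[-1] / vec_err[-2]    # expected to approach |lambda_2/lambda_1| = 0.5
val_ratio = val_err[-1] / val_err[-2]    # expected to approach |lambda_2/lambda_1|^2 = 0.25
print(vec_ratio, val_ratio)
```

The eigenvalue estimate converges at twice the rate of the eigenvector, which reflects the quadratic accuracy of the Rayleigh quotient.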
Inverse Iteration
Apply power iteration on $(A - \mu I)^{-1}$, with eigenvalues $\{(\lambda_j - \mu)^{-1}\}$
If $\mu \approx \lambda_J$ for some $J$, then $(\lambda_J - \mu)^{-1}$ may be far larger than $(\lambda_j - \mu)^{-1}$, $j \neq J$, so power iteration may converge rapidly
Algorithm: Inverse Iteration
$v^{(0)}$ = some unit-length vector
for $k = 1, 2, \ldots$
  Solve $(A - \mu I) w = v^{(k-1)}$ for $w$
  $v^{(k)} = w / \|w\|$
  $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$
Converges to eigenvector $q_J$ if parameter $\mu$ is close to $\lambda_J$
$$\|v^{(k)} - (\pm q_J)\| = O\left( \left| \frac{\mu - \lambda_J}{\mu - \lambda_K} \right|^k \right), \qquad |\lambda^{(k)} - \lambda_J| = O\left( \left| \frac{\mu - \lambda_J}{\mu - \lambda_K} \right|^{2k} \right)$$
where $\lambda_J$ and $\lambda_K$ are the closest and second closest eigenvalues to $\mu$
Standard method for determining an eigenvector given an eigenvalue
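A direct NumPy sketch of the inverse iteration algorithm above is given below. The function name `inverse_iteration`, the fixed iteration count, the shift $\mu = 2.9$, and the test matrix (eigenvalues $3 - \sqrt{3}$, $3$, $3 + \sqrt{3}$) are illustrative assumptions. A dense solve is used at every step for brevity; since the shifted matrix is fixed, a practical implementation would likely factor it once and reuse the factors.

```python
import numpy as np

def inverse_iteration(A, mu, v0, num_iters=20):
    """Inverse iteration with a fixed shift mu for real symmetric A."""
    m = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    shifted = A - mu * np.eye(m)         # fixed matrix; could be factored once and reused
    for _ in range(num_iters):
        w = np.linalg.solve(shifted, v)
        v = w / np.linalg.norm(w)
        lam = v @ A @ v
    return lam, v

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
# The shift 2.9 is closest to the eigenvalue 3, so the iteration targets that eigenpair.
lam, v = inverse_iteration(A, mu=2.9, v0=np.array([1.0, 0.0, 0.0]))
print(lam)                               # close to 3
```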
Rayleigh Quotient Iteration
Parameter $\mu$ is constant in inverse iteration, but convergence is better for $\mu$ close to the eigenvalue
Improvement: At each iteration, set $\mu$ to the last computed Rayleigh quotient
Algorithm: Rayleigh Quotient Iteration
$v^{(0)}$ = some unit-length vector
$\lambda^{(0)} = r(v^{(0)}) = (v^{(0)})^T A v^{(0)}$
for $k = 1, 2, \ldots$
  Solve $(A - \lambda^{(k-1)} I) w = v^{(k-1)}$ for $w$
  $v^{(k)} = w / \|w\|$
  $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$
Cost per iteration is linear for a tridiagonal matrix
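The Rayleigh quotient iteration above can be sketched in NumPy as follows. The function name `rayleigh_quotient_iteration`, the small fixed iteration count, the handling of an exactly singular shifted matrix, and the test matrix are assumptions made for this example; the dense solve does not exploit tridiagonal structure.

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_iters=5):
    """Rayleigh quotient iteration for real symmetric A.
    Returns the final (lambda, v) and the history of eigenvalue estimates."""
    m = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    history = [lam]
    for _ in range(num_iters):
        try:
            w = np.linalg.solve(A - lam * np.eye(m), v)
        except np.linalg.LinAlgError:
            break                        # shift is numerically an exact eigenvalue
        v = w / np.linalg.norm(w)
        lam = v @ A @ v                  # updated shift for the next step
        history.append(lam)
    return lam, v, history

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, v, history = rayleigh_quotient_iteration(A, np.ones(3))
for k, est in enumerate(history):
    print(k, abs(est - (3 + np.sqrt(3))))   # error in the eigenvalue estimate
```

For this starting vector the estimates approach $3 + \sqrt{3}$, and the printed errors should drop very quickly until they reach rounding level. Which eigenpair the iteration finds depends on the starting vector.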
Convergence of Rayleigh Quotient Iteration
Cubic convergence in Rayleigh quotient iteration
$$\|v^{(k+1)} - (\pm q_J)\| = O\left( \|v^{(k)} - (\pm q_J)\|^3 \right)$$
and
$$|\lambda^{(k+1)} - \lambda_J| = O\left( |\lambda^{(k)} - \lambda_J|^3 \right)$$
In other words, each iteration triples the number of digits of accuracy
Proof idea: If $v^{(k)}$ is close to an eigenvector, $\|v^{(k)} - (\pm q_J)\| \leq \epsilon$, then the accuracy of the Rayleigh quotient estimate $\lambda^{(k)}$ is $|\lambda^{(k)} - \lambda_J| = O(\epsilon^2)$. One step of inverse iteration then gives
$$\|v^{(k+1)} - q_J\| = O\left( |\lambda^{(k)} - \lambda_J| \, \|v^{(k)} - q_J\| \right) = O(\epsilon^3)$$
Rayleigh quotient iteration is great at finding the largest (or smallest) eigenvalue and its corresponding eigenvector. What if we want to find all eigenvalues?
Operation Counts
In Rayleigh quotient iteration,
if $A \in \mathbb{R}^{m \times m}$ is a full matrix, then solving $(A - \mu I) w = v^{(k-1)}$ may take $O(m^3)$ flops per step
if $A \in \mathbb{R}^{m \times m}$ is upper Hessenberg, then each step takes $O(m^2)$ flops
if $A \in \mathbb{R}^{m \times m}$ is tridiagonal, then each step takes $O(m)$ flops
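To illustrate the $O(m)$ cost in the tridiagonal case, here is a sketch of a tridiagonal solve by elimination without pivoting (often called the Thomas algorithm). This is one possible approach, not necessarily what the lecture has in mind: the function name `solve_tridiagonal` and the test data are hypothetical, the code assumes $m \geq 2$, and the absence of pivoting is an assumption that is reasonable for the diagonally dominant example here but can be unsafe when the shift is very close to an eigenvalue.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T w = rhs for tridiagonal T in O(m) flops (no pivoting, m >= 2).
    lower/upper: sub- and superdiagonal (length m-1); diag: diagonal (length m)."""
    m = len(diag)
    c = np.zeros(m - 1)                  # modified superdiagonal
    d = np.zeros(m)                      # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, m):                # forward elimination
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < m - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    w = np.zeros(m)
    w[-1] = d[-1]
    for i in range(m - 2, -1, -1):       # back substitution
        w[i] = d[i] - c[i] * w[i + 1]
    return w

# Shifted system (A - mu I) w = v for a symmetric tridiagonal A
m = 6
off = np.ones(m - 1)                     # off-diagonal entries of A
main = np.arange(2.0, 2.0 + m)           # diagonal entries 2, 3, ..., 7
mu = 0.5                                 # shift chosen away from the eigenvalues
v = np.ones(m) / np.sqrt(m)
w = solve_tridiagonal(off, main - mu, off, v)
```

Each loop touches every row once with a constant amount of work, which is where the linear cost per iteration comes from.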