AMS526: Numerical Analysis I (Numerical Linear …jiao/teaching/ams526_fall14/lectures/...AMS526:...

AMS526: Numerical Analysis I (Numerical Linear Algebra)

Lecture 15: Reduction to Hessenberg and Tridiagonal Forms; Rayleigh Quotient Iteration

Xiangmin Jiao

Stony Brook University


Outline

1 Schur Factorization

2 Reduction to Hessenberg and Tridiagonal Forms

3 Rayleigh Quotient Iteration


“Obvious” Algorithms

Most obvious method is to find roots of the characteristic polynomial $p_A(\lambda)$, but this root-finding problem is very ill-conditioned

Another idea is power iteration, using the fact that the sequence

$$\frac{x}{\|x\|},\quad \frac{Ax}{\|Ax\|},\quad \frac{A^2x}{\|A^2x\|},\quad \frac{A^3x}{\|A^3x\|},\ \dots$$

converges to an eigenvector corresponding to the largest eigenvalue of $A$ in absolute value, but it may converge very slowly

Instead, compute an eigenvalue-revealing factorization, such as the Schur factorization

$$A = QTQ^*$$

by introducing zeros, using algorithms similar to QR factorization
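The ill-conditioning of the polynomial route can be seen numerically. The sketch below is only an illustration under assumed test data: the eigenvalues 1, 2, ..., 20, the random orthogonal matrix, and the helper name `eigenvalue_errors` are choices made here, not part of the lecture.

```python
import numpy as np

def eigenvalue_errors(m=20, seed=0):
    """Compare two routes to the eigenvalues of a symmetric matrix whose
    exact eigenvalues are 1, 2, ..., m (a hypothetical test case)."""
    rng = np.random.default_rng(seed)
    exact = np.arange(1.0, m + 1.0)
    # Random orthogonal Q from a QR factorization; A = Q diag(exact) Q^T.
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    A = Q @ np.diag(exact) @ Q.T
    A = (A + A.T) / 2                  # remove rounding asymmetry

    # Route 1: coefficients of the characteristic polynomial, then its roots.
    # The coefficients are built from the exact roots, which is if anything
    # kinder than computing them from A.
    coeffs = np.poly(exact)
    roots = np.sort(np.roots(coeffs).real)
    err_poly = np.max(np.abs(roots - exact))

    # Route 2: a standard symmetric eigenvalue solver (ascending order).
    eigs = np.linalg.eigvalsh(A)
    err_eig = np.max(np.abs(eigs - exact))
    return err_poly, err_eig

err_poly, err_eig = eigenvalue_errors()
print(f"error via polynomial roots: {err_poly:.2e}")
print(f"error via eigvalsh:         {err_eig:.2e}")
```

The polynomial route loses many digits even though the eigenvalues themselves are perfectly well-conditioned for a symmetric matrix.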


A Fundamental Difficulty

However, an eigenvalue-revealing factorization cannot be computed in a finite number of steps:

Any eigenvalue solver must be iterative

To see this, consider a general polynomial of degree $m$

$$p(z) = z^m + a_{m-1} z^{m-1} + \cdots + a_1 z + a_0$$

There is no closed-form expression for the roots of $p$: (Abel, 1824) In general, the roots of polynomial equations higher than fourth degree cannot be written in terms of a finite number of additions, subtractions, multiplications, divisions, and root extractions


A Fundamental Difficulty Cont’d

However, the roots of $p$ are the eigenvalues of the companion matrix

$$A = \begin{bmatrix}
0 &        &        &   & -a_0 \\
1 & 0      &        &   & -a_1 \\
  & 1      & \ddots &   & \vdots \\
  &        & \ddots & 0 & -a_{m-2} \\
  &        &        & 1 & -a_{m-1}
\end{bmatrix}$$

Therefore, in general, we cannot find the eigenvalues of a matrix in a finite number of steps

In practice, however, there are algorithms that converge to desired precision in a few iterations
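As a small check of the companion-matrix statement, the sketch below builds the matrix for a cubic with known roots and compares its eigenvalues with those roots. The helper name `companion` and the example polynomial are assumptions made for illustration.

```python
import numpy as np

def companion(a):
    """Companion matrix of p(z) = z^m + a[m-1] z^(m-1) + ... + a[1] z + a[0].

    `a` lists the coefficients a_0, ..., a_{m-1} in that order.
    """
    a = np.asarray(a, dtype=float)
    m = a.size
    C = np.zeros((m, m))
    C[1:, :-1] = np.eye(m - 1)   # ones on the subdiagonal
    C[:, -1] = -a                # last column holds -a_0, ..., -a_{m-1}
    return C

# p(z) = (z - 1)(z - 2)(z - 3) = z^3 - 6 z^2 + 11 z - 6
a = [-6.0, 11.0, -6.0]           # a_0, a_1, a_2
eigs = np.sort(np.linalg.eigvals(companion(a)).real)
print(eigs)                      # close to the roots 1, 2, 3
```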


Schur Factorization and Diagonalization

Most eigenvalue algorithms compute the Schur factorization $A = QTQ^*$ by transforming $A$ with similarity transformations

$$\underbrace{Q_j^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_j}_{Q},$$

where the $Q_i$ are unitary matrices and the product converges to $T$ as $j \to \infty$

Note: Real matrices might need complex Schur forms and eigenvalues

Question: For Hermitian $A$, what matrix will the sequence converge to?



Two Phases of Eigenvalue Computations

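General $A$: First convert to upper-Hessenberg form, then to upper triangular

$$\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A \ne A^*}
\xrightarrow{\text{Phase 1}}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
 & \times & \times & \times & \times \\
 & & \times & \times & \times \\
 & & & \times & \times
\end{bmatrix}}_{\text{upper-Hessenberg}}
\xrightarrow{\text{Phase 2}}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
 & \times & \times & \times & \times \\
 & & \times & \times & \times \\
 & & & \times & \times \\
 & & & & \times
\end{bmatrix}}_{\text{triangular}}$$

Hermitian $A$: First convert to tridiagonal form, then to diagonal

$$\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A = A^*}
\xrightarrow{\text{Phase 1}}
\underbrace{\begin{bmatrix}
\times & \times & & & \\
\times & \times & \times & & \\
 & \times & \times & \times & \\
 & & \times & \times & \times \\
 & & & \times & \times
\end{bmatrix}}_{\text{tridiagonal}}
\xrightarrow{\text{Phase 2}}
\underbrace{\begin{bmatrix}
\times & & & & \\
 & \times & & & \\
 & & \times & & \\
 & & & \times & \\
 & & & & \times
\end{bmatrix}}_{\text{diagonal}}$$

In general, phase 1 is direct and requires $O(m^3)$ flops, and phase 2 is iterative and requires $O(m)$ iterations, and $O(m^3)$ flops for non-Hermitian matrices and $O(m^2)$ flops for Hermitian matrices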


Introducing Zeros by Similarity Transformations

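First attempt: Compute Schur factorization $A = QTQ^*$ by applying Householder reflectors from both left and right ($\mathbf{x}$ marks entries changed by the current step, $\times$ entries left unchanged)

$$\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A}
\xrightarrow{Q_1^* \,\cdot}
\underbrace{\begin{bmatrix}
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_1^* A}
\xrightarrow{\cdot \, Q_1}
\underbrace{\begin{bmatrix}
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_1^* A Q_1}$$

Unfortunately, the right multiplication destroys the zeros introduced by $Q_1^*$

This would not work because of Abel’s theorem

However, the subdiagonal entries typically decrease in magnitude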


The Hessenberg Form

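Second attempt: try to compute upper Hessenberg matrix $H$ similar to $A$ ($\mathbf{x}$ marks entries changed by the current step, $\times$ entries left unchanged):

$$\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}}_{A}
\xrightarrow{Q_1^* \,\cdot}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
0 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_1^* A}
\xrightarrow{\cdot \, Q_1}
\underbrace{\begin{bmatrix}
\times & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\times & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_1^* A Q_1}$$

The zeros introduced by $Q_1^* A$ were not destroyed this time!

Continuing with the remaining columns would result in Hessenberg form:

$$\xrightarrow{Q_2^* \,\cdot}
\underbrace{\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
 & \mathbf{x} & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & 0 & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & 0 & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_2^* Q_1^* A Q_1}
\xrightarrow{\cdot \, Q_2}
\underbrace{\begin{bmatrix}
\times & \times & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
\times & \times & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & \times & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & & \mathbf{x} & \mathbf{x} & \mathbf{x} \\
 & & \mathbf{x} & \mathbf{x} & \mathbf{x}
\end{bmatrix}}_{Q_2^* Q_1^* A Q_1 Q_2}
\cdots$$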
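The difference between the two attempts can be checked numerically. The sketch below applies one reflector each way to a random $5 \times 5$ matrix; the helper `householder` and the random test matrix are assumptions made for illustration only.

```python
import numpy as np

def householder(x):
    """Return the reflector F = I - 2 v v^T that maps x to a multiple of e_1."""
    v = x.astype(float)
    sign = 1.0 if v[0] >= 0 else -1.0
    v[0] += sign * np.linalg.norm(x)
    v /= np.linalg.norm(v)
    return np.eye(x.size) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
m = 5
A = rng.standard_normal((m, m))

# First attempt: the reflector acts on all rows, zeroing A[1:, 0].
Q1 = householder(A[:, 0])
left = Q1.T @ A
full = left @ Q1                 # right multiplication mixes all columns

# Second attempt: the reflector acts on rows 2..m only, zeroing A[2:, 0].
Q1h = np.eye(m)
Q1h[1:, 1:] = householder(A[1:, 0])
left_h = Q1h.T @ A
hess = left_h @ Q1h              # right multiplication leaves column 1 alone

print(np.abs(left[1:, 0]).max())    # tiny: zeros introduced
print(np.abs(full[1:, 0]).max())    # not tiny: zeros destroyed
print(np.abs(hess[2:, 0]).max())    # tiny: zeros preserved
```

The second attempt works because the first column of the block reflector is $e_1$, so multiplying on the right never touches column 1.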


The Hessenberg Form

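After $m - 2$ steps, we obtain the Hessenberg form:

$$\underbrace{Q_{m-2}^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_{m-2}}_{Q} = H =
\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
 & \times & \times & \times & \times \\
 & & \times & \times & \times \\
 & & & \times & \times
\end{bmatrix}$$

For Hermitian matrix $A$, $H$ is Hermitian and hence is tridiagonal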


Householder Reduction to Hessenberg

Householder Reduction to Hessenberg Form
for $k = 1$ to $m - 2$
    $x = A_{k+1:m,\,k}$
    $v_k = \operatorname{sign}(x_1)\,\|x\|_2\, e_1 + x$
    $v_k = v_k / \|v_k\|_2$
    $A_{k+1:m,\,k:m} = A_{k+1:m,\,k:m} - 2 v_k (v_k^* A_{k+1:m,\,k:m})$
    $A_{1:m,\,k+1:m} = A_{1:m,\,k+1:m} - 2 (A_{1:m,\,k+1:m} v_k) v_k^*$

Note: $Q$ is never formed explicitly.

Operation count

$$\sim \sum_{k=1}^{m-2} \left[ 4(m-k)^2 + 4m(m-k) \right] \sim \frac{4m^3}{3} + 4m^3 - \frac{4m^3}{2} = \frac{10m^3}{3}$$
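A minimal NumPy sketch of the reduction above, assuming a real square input (so $v_k^*$ becomes $v_k^T$) and 0-based indexing. The function name `hessenberg_reduce`, the choice to return the reflector vectors in a matrix `V`, and the zero-column guard are additions made here for illustration.

```python
import numpy as np

def hessenberg_reduce(A):
    """Householder reduction of a real square matrix to upper Hessenberg form.

    Returns (H, V): H is orthogonally similar to A, and V holds the unit
    reflector vectors v_k in its columns (zero-padded), so Q stays implicit.
    """
    H = np.array(A, dtype=float)       # work on a copy
    m = H.shape[0]
    V = np.zeros((m, max(m - 2, 0)))
    for k in range(m - 2):             # 0-based k; the slides use k = 1..m-2
        x = H[k + 1:, k]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:              # column already zero: nothing to do
            continue
        v /= norm_v
        # Left reflector: rows k+1..m, columns k..m.
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        # Right reflector: all rows, columns k+1..m.
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        V[k + 1:, k] = v
    return H, V

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
H, V = hessenberg_reduce(A)
print(np.round(H, 3))                  # entries below the subdiagonal are ~0
```

Entries below the first subdiagonal come out at rounding level rather than exactly zero, since the sketch does not overwrite them.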


Reduction to Tridiagonal Form

If $A$ is Hermitian, then

$$\underbrace{Q_{m-2}^* \cdots Q_2^* Q_1^*}_{Q^*} \, A \, \underbrace{Q_1 Q_2 \cdots Q_{m-2}}_{Q} = H =
\begin{bmatrix}
\times & \times & & & \\
\times & \times & \times & & \\
 & \ddots & \ddots & \ddots & \\
 & & \times & \times & \times \\
 & & & \times & \times
\end{bmatrix}$$

For Hermitian $A$, operation count would be same as Householder QR: $4m^3/3$

  First, taking advantage of sparsity, cost of applying right reflectors is also $4(m-k)^2$ instead of $4m(m-k)$, so cost is

  $$\sim \sum_{k=1}^{m-2} 8(m-k)^2 \sim \frac{8m^3}{3}$$

  Second, taking advantage of symmetry, cost is reduced by 50% to $4m^3/3$
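The asymptotic counts for both reductions follow from two standard sums. A short worked version is sketched below; the substitution $j = m - k$ is introduced here for convenience and is not in the slides.

```latex
\begin{align*}
\sum_{k=1}^{m-2} (m-k)^2 &= \sum_{j=2}^{m-1} j^2 \sim \frac{m^3}{3}, \qquad
\sum_{k=1}^{m-2} (m-k) = \sum_{j=2}^{m-1} j \sim \frac{m^2}{2}, \\
\text{Hessenberg:}\quad
\sum_{k=1}^{m-2} \bigl[ 4(m-k)^2 + 4m(m-k) \bigr]
  &\sim \frac{4m^3}{3} + 2m^3 = \frac{10m^3}{3}, \\
\text{tridiagonal, sparsity only:}\quad
\sum_{k=1}^{m-2} 8(m-k)^2 &\sim \frac{8m^3}{3}, \\
\text{tridiagonal, sparsity and symmetry:}\quad
\frac{1}{2} \cdot \frac{8m^3}{3} &= \frac{4m^3}{3}.
\end{align*}
```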


Stability of Hessenberg Reduction

Theorem

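Householder reduction to Hessenberg form is backward stable, in that

$$\tilde{Q} \tilde{H} \tilde{Q}^* = A + \delta A, \qquad \frac{\|\delta A\|}{\|A\|} = O(\epsilon_{\text{machine}})$$

for some $\delta A \in \mathbb{C}^{m \times m}$, where $\tilde{H}$ is the computed Hessenberg matrix

Note: Similar to Householder QR, $\tilde{Q}$ is the exactly unitary matrix defined by the computed reflector vectors $\tilde{v}_k$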



Solving Eigenvalue Problems

All eigenvalue solvers must be iterative

Iterative algorithms have multiple facets:

1 Basic idea behind the algorithms

2 Convergence and techniques to speed up convergence

3 Efficiency of implementation

4 Termination criteria

We will focus on the first two aspects


Simplification: Real Symmetric Matrices

We will consider eigenvalue problems for real symmetric matrices, i.e. $A = A^T \in \mathbb{R}^{m \times m}$, and $Ax = \lambda x$ for $x \in \mathbb{R}^m$

  Note: $x^* = x^T$, and $\|x\| = \sqrt{x^T x}$

$A$ has real eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ and orthonormal eigenvectors $q_1, q_2, \dots, q_m$, where $\|q_j\| = 1$

Eigenvalues are often also ordered in a particular way (e.g., ordered from large to small in magnitude)

In addition, we focus on symmetric tridiagonal form

  Why? Because phase 1 of the two-phase algorithm reduces the matrix to tridiagonal form


Rayleigh Quotient

The Rayleigh quotient of $x \in \mathbb{R}^m$ is the scalar

$$r(x) = \frac{x^T A x}{x^T x}$$

For an eigenvector $x$, its Rayleigh quotient is $r(x) = x^T \lambda x / x^T x = \lambda$, the eigenvalue corresponding to $x$

For general $x$, $r(x)$ is the scalar $\alpha$ that minimizes $\|Ax - \alpha x\|_2$.

$x$ is an eigenvector of $A$ $\iff$ $\nabla r(x) = \dfrac{2}{x^T x}\,(Ax - r(x)x) = 0$ with $x \ne 0$

$r(x)$ is smooth and $\nabla r(q_j) = 0$ for any $j$, and therefore is quadratically accurate:

$$r(x) - r(q_J) = O(\|x - q_J\|^2) \quad \text{as } x \to q_J \text{ for some } J$$
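Quadratic accuracy can be observed directly: shrinking the perturbation of an eigenvector by a factor of 10 should shrink the Rayleigh quotient error by roughly 100. The sketch below uses an assumed random symmetric test matrix and a reference eigenpair from `np.linalg.eigh`; both are illustration choices.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """r(x) = x^T A x / x^T x for a real symmetric A and nonzero x."""
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(2)
m = 6
B = rng.standard_normal((m, m))
A = (B + B.T) / 2                      # hypothetical symmetric test matrix
lam, Qm = np.linalg.eigh(A)
q, lam_q = Qm[:, 0], lam[0]            # one eigenpair used as reference

d = rng.standard_normal(m)
d -= (d @ q) * q                       # perturbation direction orthogonal to q
d /= np.linalg.norm(d)

errors = []
for eps in (1e-1, 1e-2, 1e-3):
    x = q + eps * d                    # ||x - q|| = eps
    errors.append(abs(rayleigh_quotient(A, x) - lam_q))
    print(f"eps = {eps:.0e}   |r(x) - lambda| = {errors[-1]:.2e}")
```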


Power Iteration

Simple power iteration for largest eigenvalue

Algorithm: Power Iteration
$v^{(0)}$ = some unit-length vector
for $k = 1, 2, \dots$
    $w = A v^{(k-1)}$
    $v^{(k)} = w / \|w\|$
    $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$

Termination condition is omitted for simplicity
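A direct NumPy transcription of power iteration might look as follows. Since no termination test is specified, the sketch simply runs a fixed number of steps; `num_steps` and the $3 \times 3$ test matrix are assumptions made here.

```python
import numpy as np

def power_iteration(A, v0, num_steps=200):
    """Power iteration for a real symmetric A; returns (lambda, v)."""
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    for _ in range(num_steps):
        w = A @ v
        v = w / np.linalg.norm(w)
        lam = v @ A @ v            # Rayleigh quotient of the unit vector v
    return lam, v

# Symmetric tridiagonal example with eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, v = power_iteration(A, np.ones(3))
print(lam, np.linalg.norm(A @ v - lam * v))
```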


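Convergence of Power Iteration

Expand initial $v^{(0)}$ in orthonormal eigenvectors $q_i$, and apply $A^k$:

$$\begin{aligned}
v^{(0)} &= a_1 q_1 + a_2 q_2 + \cdots + a_m q_m \\
v^{(k)} &= c_k A^k v^{(0)} \\
        &= c_k \left( a_1 \lambda_1^k q_1 + a_2 \lambda_2^k q_2 + \cdots + a_m \lambda_m^k q_m \right) \\
        &= c_k \lambda_1^k \left( a_1 q_1 + a_2 (\lambda_2/\lambda_1)^k q_2 + \cdots + a_m (\lambda_m/\lambda_1)^k q_m \right)
\end{aligned}$$

If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_m| \ge 0$ and $q_1^T v^{(0)} \ne 0$, this gives

$$\|v^{(k)} - (\pm q_1)\| = O\!\left( \left| \lambda_2/\lambda_1 \right|^{k} \right), \qquad
|\lambda^{(k)} - \lambda_1| = O\!\left( \left| \lambda_2/\lambda_1 \right|^{2k} \right)$$

where the $\pm$ sign is chosen to be the sign of $q_1^T v^{(k)}$

It finds the largest eigenvalue (unless the eigenvector is orthogonal to $v^{(0)}$)

Error reduces by only a constant factor ($\approx |\lambda_2/\lambda_1|$) each step, and very slowly especially when $|\lambda_2| \approx |\lambda_1|$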
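The two rates can be observed on a small example. The sketch below uses an assumed diagonal matrix with $\lambda_2/\lambda_1 = 2/3$, so the eigenvector error should shrink by about $2/3$ per step and the eigenvalue error by about $(2/3)^2$.

```python
import numpy as np

A = np.diag([3.0, 2.0, 1.0])       # hypothetical example: lambda_2 / lambda_1 = 2/3
q1 = np.array([1.0, 0.0, 0.0])     # eigenvector for lambda_1 = 3
v = np.ones(3) / np.sqrt(3.0)

vec_errors, eig_errors = [], []
for k in range(1, 21):
    w = A @ v
    v = w / np.linalg.norm(w)
    vec_errors.append(np.linalg.norm(v - q1))
    eig_errors.append(abs(v @ A @ v - 3.0))

# Ratios of successive errors, to be compared with 2/3 and (2/3)^2.
vec_ratio = vec_errors[-1] / vec_errors[-2]
eig_ratio = eig_errors[-1] / eig_errors[-2]
print(vec_ratio, 2.0 / 3.0)
print(eig_ratio, (2.0 / 3.0) ** 2)
```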


Inverse Iteration

Apply power iteration on $(A - \mu I)^{-1}$, with eigenvalues $\{(\lambda_j - \mu)^{-1}\}$

If $\mu \approx \lambda_J$ for some $J$, then $(\lambda_J - \mu)^{-1}$ may be far larger in magnitude than $(\lambda_j - \mu)^{-1}$, $j \ne J$, so power iteration may converge rapidly

Algorithm: Inverse Iteration
$v^{(0)}$ = some unit-length vector
for $k = 1, 2, \dots$
    Solve $(A - \mu I) w = v^{(k-1)}$ for $w$
    $v^{(k)} = w / \|w\|$
    $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$

Converges to eigenvector $q_J$ if parameter $\mu$ is close to $\lambda_J$

$$\|v^{(k)} - (\pm q_J)\| = O\!\left( \left| \frac{\mu - \lambda_J}{\mu - \lambda_K} \right|^{k} \right), \qquad
|\lambda^{(k)} - \lambda_J| = O\!\left( \left| \frac{\mu - \lambda_J}{\mu - \lambda_K} \right|^{2k} \right)$$

where $\lambda_J$ and $\lambda_K$ are the closest and second closest eigenvalues to $\mu$

Standard method for determining an eigenvector given an eigenvalue
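Inverse iteration can be sketched in NumPy as below. The fixed step count, the test matrix, and the shift `mu=2.9` are assumptions for illustration; the sketch also calls `np.linalg.solve` on every step for brevity, although the shifted matrix is constant and a factorization could be reused.

```python
import numpy as np

def inverse_iteration(A, mu, v0, num_steps=20):
    """Inverse iteration with a fixed shift mu for a real symmetric A."""
    m = A.shape[0]
    shifted = A - mu * np.eye(m)
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    for _ in range(num_steps):
        w = np.linalg.solve(shifted, v)   # solve (A - mu I) w = v
        v = w / np.linalg.norm(w)
        lam = v @ A @ v
    return lam, v

# Eigenvalues are 3 - sqrt(3), 3, 3 + sqrt(3); the shift 2.9 is closest to 3.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, v = inverse_iteration(A, mu=2.9, v0=np.array([1.0, 0.0, 0.0]))
print(lam, np.linalg.norm(A @ v - lam * v))
```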


Rayleigh Quotient Iteration

Parameter $\mu$ is constant in inverse iteration, but convergence is better for $\mu$ close to the eigenvalue

Improvement: At each iteration, set $\mu$ to the last computed Rayleigh quotient

Algorithm: Rayleigh Quotient Iteration
$v^{(0)}$ = some unit-length vector
$\lambda^{(0)} = r(v^{(0)}) = (v^{(0)})^T A v^{(0)}$
for $k = 1, 2, \dots$
    Solve $(A - \lambda^{(k-1)} I) w = v^{(k-1)}$ for $w$
    $v^{(k)} = w / \|w\|$
    $\lambda^{(k)} = r(v^{(k)}) = (v^{(k)})^T A v^{(k)}$

Cost per iteration is linear for tridiagonal matrix
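A NumPy sketch of Rayleigh quotient iteration follows. It uses a dense solve rather than a tridiagonal one, runs a fixed number of steps, and stops early if the shifted matrix becomes exactly singular; all three are simplifications assumed here, as is the test matrix.

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_steps=5):
    """Rayleigh quotient iteration for a real symmetric A.

    Returns (lambda, v, history), where history lists lambda^(0), lambda^(1), ...
    """
    m = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    history = [lam]
    for _ in range(num_steps):
        try:
            w = np.linalg.solve(A - lam * np.eye(m), v)
        except np.linalg.LinAlgError:
            break                      # shift hit an eigenvalue exactly
        v = w / np.linalg.norm(w)
        lam = v @ A @ v
        history.append(lam)
    return lam, v, history

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, v, history = rayleigh_quotient_iteration(A, np.array([1.0, 0.0, 0.0]))
for k, value in enumerate(history):
    print(k, value)                    # estimates settle within a few steps
```

Which eigenpair is found depends on the starting vector; the nearly singular solves close to convergence are expected and harmless here because only the direction of `w` is used.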


Convergence of Rayleigh Quotient Iteration

Cubic convergence in Rayleigh quotient iteration

$$\|v^{(k+1)} - (\pm q_J)\| = O\!\left( \|v^{(k)} - (\pm q_J)\|^3 \right)$$

and

$$|\lambda^{(k+1)} - \lambda_J| = O\!\left( |\lambda^{(k)} - \lambda_J|^3 \right)$$

In other words, each iteration triples the number of digits of accuracy

Proof idea: If $v^{(k)}$ is close to an eigenvector, $\|v^{(k)} - (\pm q_J)\| \le \epsilon$, then the accuracy of the Rayleigh quotient estimate $\lambda^{(k)}$ is $|\lambda^{(k)} - \lambda_J| = O(\epsilon^2)$. One step of inverse iteration then gives

$$\|v^{(k+1)} - q_J\| = O\!\left( |\lambda^{(k)} - \lambda_J| \, \|v^{(k)} - q_J\| \right) = O(\epsilon^3)$$

Rayleigh quotient iteration is great at finding a single eigenvalue (which one depends on the starting vector) and its corresponding eigenvector. What if we want to find all eigenvalues?


Operation Counts

In Rayleigh quotient iteration,

  if $A \in \mathbb{R}^{m \times m}$ is a full matrix, then solving $(A - \mu I) w = v^{(k-1)}$ may take $O(m^3)$ flops per step

  if $A \in \mathbb{R}^{m \times m}$ is upper Hessenberg, then each step takes $O(m^2)$ flops

  if $A \in \mathbb{R}^{m \times m}$ is tridiagonal, then each step takes $O(m)$ flops
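To illustrate the $O(m)$ tridiagonal case, the sketch below solves a shifted tridiagonal system with the Thomas algorithm (Gaussian elimination without pivoting). This is one possible linear-time solver, not necessarily the one intended in the lecture; it assumes $m \ge 2$ and that no zero pivot arises, which is not guaranteed when the shift is close to an eigenvalue, so a pivoted solver would be the safer choice inside Rayleigh quotient iteration. The function name and test data are hypothetical.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T w = rhs in O(m) flops for tridiagonal T (Thomas algorithm).

    lower, upper: sub- and superdiagonal (length m-1); diag: main diagonal.
    No pivoting is performed, so this sketch assumes no zero pivot arises
    (for example, T strictly diagonally dominant).
    """
    m = len(diag)
    c = np.zeros(m - 1)                # modified superdiagonal
    d = np.zeros(m)                    # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, m):              # forward elimination
        pivot = diag[i] - lower[i - 1] * c[i - 1]
        if i < m - 1:
            c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / pivot
    w = np.zeros(m)
    w[-1] = d[-1]
    for i in range(m - 2, -1, -1):     # back substitution
        w[i] = d[i] - c[i] * w[i + 1]
    return w

m = 6
off = np.ones(m - 1)
main = np.full(m, 4.0)
mu = 0.5                               # hypothetical shift
v = np.arange(1.0, m + 1.0)
w = solve_tridiagonal(off, main - mu, off, v)
T = np.diag(main - mu) + np.diag(off, 1) + np.diag(off, -1)
print(np.linalg.norm(T @ w - v))       # residual of the shifted system
```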
