ICS 6N Computational Linear Algebra: Symmetric Matrices and Orthogonal Diagonalization
Xiaohui Xie
University of California, Irvine
Symmetric matrices
An n × n matrix A is symmetric if A^T = A.
Component-wise: A is symmetric if
a_{ij} = a_{ji}
for i, j = 1, 2, . . . , n.
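For a quick numerical check of this definition, here is a minimal NumPy sketch (the matrix is the example diagonalized later in these slides):

```python
import numpy as np

# The symmetric example matrix used later in these slides.
A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

# A is symmetric exactly when it equals its own transpose:
# a_ij == a_ji for all i, j.
print(np.allclose(A, A.T))  # True
```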
Matrix Diagonalization
Matrix A is diagonalizable if there exist a diagonal matrix Λ and an invertible matrix P such that
A = PΛP^{-1}
If A can be diagonalized, then A^k = PΛ^k P^{-1}.
Not all matrices can be diagonalized.
A matrix can be diagonalized if and only if it has n linearly independent eigenvectors.
Some special cases:
If an n × n matrix A has n distinct eigenvalues, then it is diagonalizable.
If A is symmetric, then it is diagonalizable.
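The identity A^k = PΛ^k P^{-1} is easy to verify numerically; below is a minimal NumPy sketch using an arbitrary diagonalizable 2 × 2 matrix (the matrix and the power k = 5 are assumptions for illustration):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues 5 and 2, so A is diagonalizable

# np.linalg.eig returns eigenvalues and a matrix whose columns are eigenvectors.
lam, P = np.linalg.eig(A)

# Reconstruct A = P Λ P^{-1}.
print(np.allclose(A, P @ np.diag(lam) @ np.linalg.inv(P)))   # True

# Powers via diagonalization: A^5 = P Λ^5 P^{-1}.
A5 = P @ np.diag(lam**5) @ np.linalg.inv(P)
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))         # True
```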
Diagonalization of symmetric matrices
Example: diagonalize the matrix
A = \begin{bmatrix} 6 & −2 & −1 \\ −2 & 6 & −1 \\ −1 & −1 & 5 \end{bmatrix}

Characteristic equation of A:

0 = −λ^3 + 17λ^2 − 90λ + 144 = −(λ − 8)(λ − 6)(λ − 3)

so we have three distinct eigenvalues λ_1 = 8, λ_2 = 6, λ_3 = 3.
Find the corresponding eigenvectors:

v_1 = \begin{bmatrix} −1 \\ 1 \\ 0 \end{bmatrix}, v_2 = \begin{bmatrix} −1 \\ −1 \\ 2 \end{bmatrix}, v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

Note that v_1^T v_2 = 0, v_1^T v_3 = 0, v_2^T v_3 = 0, i.e., the eigenvectors are mutually orthogonal.
Further normalize the eigenvectors to unit vectors:

u_1 = \begin{bmatrix} −1/√2 \\ 1/√2 \\ 0 \end{bmatrix}, u_2 = \begin{bmatrix} −1/√6 \\ −1/√6 \\ 2/√6 \end{bmatrix}, u_3 = \begin{bmatrix} 1/√3 \\ 1/√3 \\ 1/√3 \end{bmatrix}
Let
P = \begin{bmatrix} −1/√2 & −1/√6 & 1/√3 \\ 1/√2 & −1/√6 & 1/√3 \\ 0 & 2/√6 & 1/√3 \end{bmatrix}, D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix}

Then A = PDP^T, since P is an orthogonal matrix (P^{-1} = P^T).
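A minimal NumPy check of this factorization (same A, P, and D as above):

```python
import numpy as np

A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
P = np.array([[-1/s2, -1/s6, 1/s3],
              [ 1/s2, -1/s6, 1/s3],
              [ 0.0,   2/s6, 1/s3]])
D = np.diag([8.0, 6.0, 3.0])

# P is orthogonal: P^T P = I, hence P^{-1} = P^T.
print(np.allclose(P.T @ P, np.eye(3)))  # True
# Orthogonal diagonalization: A = P D P^T.
print(np.allclose(A, P @ D @ P.T))      # True
```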
Spectral theorem
If A is an n × n symmetric matrix, then:
1 All eigenvalues of A are real.
2 A has exactly n real eigenvalues (counting multiplicity), but this doesn't mean they are distinct.
3 The geometric multiplicity of λ, dim(Null(A − λI)), equals the algebraic multiplicity of λ.
4 The eigenspaces are mutually orthogonal: if λ_1 ≠ λ_2 are two distinct eigenvalues, then their corresponding eigenvectors v_1, v_2 are orthogonal.
Proof
1 Let λ be an eigenvalue of A with corresponding eigenvector x, so Ax = λx and, taking complex conjugates, Ax^* = λ^* x^*. Then
λ^* x^T x^* = x^T A x^* = (Ax)^T x^* = λ x^T x^*.
Since x^T x^* = ‖x‖^2 > 0, it follows that λ^* = λ, so λ is real.
2 Let x_1 and x_2 be two eigenvectors corresponding to two distinct eigenvalues λ_1 and λ_2. Then
x_1^T A x_2 = (x_1^T A x_2)^T = x_2^T A^T (x_1^T)^T = x_2^T A x_1
⟹ λ_2 x_1^T x_2 = λ_1 x_1^T x_2 ⟹ (λ_1 − λ_2)(x_1^T x_2) = 0.
Since λ_1 ≠ λ_2, x_1^T x_2 = 0, so they are orthogonal.
Orthogonal diagonalization
If an n × n matrix A is symmetric, its eigenvectors v_1, …, v_n can be chosen to be orthonormal.
If A has n distinct eigenvalues, then the n eigenvectors are orthogonal; normalize these vectors to make them orthonormal.
If an eigenvalue λ has multiplicity greater than 1, find an orthonormal basis of the corresponding eigenspace, Null(A − λI), and use the vectors in this basis as eigenvectors.
In this case, P = [v_1 v_2 … v_n] is an orthogonal matrix, that is, P^{-1} = P^T.
And A can be orthogonally diagonalized:
A = PΛP^T
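In NumPy, this procedure is packaged in np.linalg.eigh, which is intended for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors. A minimal sketch (the random symmetric test matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # symmetrize to obtain a symmetric test matrix

lam, P = np.linalg.eigh(A)        # real eigenvalues (ascending) and eigenvectors

print(lam.dtype)                              # float64: eigenvalues are real
print(np.allclose(P.T @ P, np.eye(4)))        # True: columns are orthonormal
print(np.allclose(A, P @ np.diag(lam) @ P.T)) # True: A = P Λ P^T
```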
Orthogonal diagonalization: an example
Orthogonally diagonalize the matrix A = \begin{bmatrix} 3 & −2 & 4 \\ −2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}

Characteristic equation:

0 = −λ^3 + 12λ^2 − 21λ − 98 = −(λ − 7)^2(λ + 2)

Produce bases for the eigenspaces by solving linear equations:

λ = 7: v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, v_2 = \begin{bmatrix} −1/2 \\ 1 \\ 0 \end{bmatrix}; λ = −2: v_3 = \begin{bmatrix} −1 \\ −1/2 \\ 1 \end{bmatrix}
Apply Gram-Schmidt to produce an orthogonal basis for the eigenspace of λ = 7.
The component of v_2 orthogonal to v_1 is

z_2 = v_2 − ((v_2 · v_1)/(v_1 · v_1)) v_1 = \begin{bmatrix} −1/4 \\ 1 \\ 1/4 \end{bmatrix}
Normalize v_1 and z_2:

u_1 = \begin{bmatrix} 1/√2 \\ 0 \\ 1/√2 \end{bmatrix}, u_2 = \begin{bmatrix} −1/√18 \\ 4/√18 \\ 1/√18 \end{bmatrix}
Normalize v_3 to obtain u_3.
A = PDP^T where P = [u_1, u_2, u_3] and D = diag(7, 7, −2).
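A minimal NumPy sketch of this Gram-Schmidt step and the resulting factorization, using the vectors from the example:

```python
import numpy as np

A = np.array([[ 3.0, -2.0, 4.0],
              [-2.0,  6.0, 2.0],
              [ 4.0,  2.0, 3.0]])

v1 = np.array([ 1.0,  0.0, 1.0])   # eigenvectors for lambda = 7
v2 = np.array([-0.5,  1.0, 0.0])
v3 = np.array([-1.0, -0.5, 1.0])   # eigenvector for lambda = -2

# Gram-Schmidt: subtract from v2 its component along v1.
z2 = v2 - (v2 @ v1) / (v1 @ v1) * v1

# Normalize all three vectors to obtain an orthonormal basis.
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, z2, v3))

P = np.column_stack([u1, u2, u3])
D = np.diag([7.0, 7.0, -2.0])
print(np.allclose(A, P @ D @ P.T))  # True
```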
Application 1: Quadratic Forms
Any quadratic function of x can be expressed in the form

Q(x) = x^T A x

where x is a vector in R^n and A is an n × n symmetric matrix.
More explicitly,

x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j
Example
For example,
Q(x) = 2x_1^2 + 3x_2^2 + 4x_3^2 + 5x_2x_3 + 6x_1x_2

can be written in quadratic form with matrix

A = \begin{bmatrix} 2 & 3 & 0 \\ 3 & 3 & 5/2 \\ 0 & 5/2 & 4 \end{bmatrix}
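A quick numerical check that this matrix reproduces Q (a minimal NumPy sketch; the test point x is an arbitrary assumption):

```python
import numpy as np

A = np.array([[2.0, 3.0, 0.0],
              [3.0, 3.0, 2.5],
              [0.0, 2.5, 4.0]])

x = np.array([1.0, -2.0, 0.5])   # arbitrary test vector

# Quadratic form x^T A x ...
qform = x @ A @ x
# ... versus the polynomial written out term by term.
x1, x2, x3 = x
direct = 2*x1**2 + 3*x2**2 + 4*x3**2 + 5*x2*x3 + 6*x1*x2
print(np.isclose(qform, direct))  # True
```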
Optimizing quadratic functions
Consider the following optimization problem (without cross-product terms):

max Q(x) = 2x_1^2 + 3x_2^2 + 4x_3^2
subject to ‖x‖ = 1
Solution: Since 2x_1^2 ≤ 4x_1^2 and 3x_2^2 ≤ 4x_2^2, we have

Q(x) ≤ 4x_1^2 + 4x_2^2 + 4x_3^2 = 4‖x‖^2 = 4.

In addition, we can choose x_1 = 0, x_2 = 0, x_3 = 1 to reach the maximum.
Optimizing quadratic functions
A more general problem:
max Q(x) = x^T A x
subject to ‖x‖ = 1
Solution: Use A = PΛP^T to transform the problem into an easier form:

Q(x) = x^T PΛP^T x = (P^T x)^T Λ (P^T x)

Use y = P^T x to change variables; since P is orthogonal, ‖y‖ = ‖x‖. Convert the problem to

max Q(y) = y^T Λ y = λ_1 y_1^2 + · · · + λ_n y_n^2
subject to ‖y‖ = 1

max x^T A x subject to ‖x‖ = 1: λ_max(A)
min x^T A x subject to ‖x‖ = 1: λ_min(A)
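A minimal NumPy sketch of this result: on random unit vectors the quadratic form never exceeds λ_max, and the corresponding eigenvector attains it (the random symmetric matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                        # random symmetric matrix

lam, P = np.linalg.eigh(A)               # eigenvalues in ascending order

# Evaluate x^T A x on many random unit vectors.
X = rng.standard_normal((1000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = np.einsum('ij,jk,ik->i', X, A, X)    # row-wise quadratic forms

print(q.max() <= lam[-1] + 1e-9)         # True: bounded by lambda_max
u = P[:, -1]                             # unit eigenvector for lambda_max
print(np.isclose(u @ A @ u, lam[-1]))    # True: the bound is attained
```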
Optimizing quadratic functions: example
max Q(x) = x_1^2 − 8x_1x_2 − 5x_2^2
subject to ‖x‖ = 1
Solution:
The matrix of the quadratic form is

A = \begin{bmatrix} 1 & −4 \\ −4 & −5 \end{bmatrix}

Orthogonally diagonalize A:

P = \begin{bmatrix} 2/√5 & 1/√5 \\ −1/√5 & 2/√5 \end{bmatrix}, D = \begin{bmatrix} 3 & 0 \\ 0 & −7 \end{bmatrix}

Change variables from x to y = P^T x, and rewrite the objective function:

x_1^2 − 8x_1x_2 − 5x_2^2 = x^T A x = (Py)^T A (Py) = y^T D y = 3y_1^2 − 7y_2^2

The maximum of Q(x) subject to ‖x‖ = 1 is 3.
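Checking the example numerically (a minimal NumPy sketch):

```python
import numpy as np

A = np.array([[ 1.0, -4.0],
              [-4.0, -5.0]])

lam, P = np.linalg.eigh(A)         # eigenvalues in ascending order
print(lam)                         # [-7.  3.]

u = P[:, -1]                       # unit eigenvector for lambda = 3
print(np.isclose(u @ A @ u, 3.0))  # True: the maximum of Q on the unit circle
```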
Application 2: Principal Component Analysis (PCA)
Problem: Given a set of data points {x^{(1)}, x^{(2)}, · · · , x^{(m)}} in R^n, find the axis along which the data points have maximal variance.
Assume the data are centered around the origin. If not, subtract the mean from each data point.
Use a unit vector u in R^n to denote the direction of the axis.
Project each data point onto u to obtain {y^{(1)}, y^{(2)}, · · · , y^{(m)}}, where y^{(i)} = u^T x^{(i)}.
The variance of the projected points is

σ^2 = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)})^2 = \frac{1}{m} \sum_{i=1}^{m} u^T x^{(i)} (x^{(i)})^T u = u^T X u

where the matrix X, defined by

X = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T,

is called the covariance matrix.
Reformulate the problem as a quadratic optimization problem:

max u^T X u
subject to ‖u‖ = 1

where X = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T is the covariance matrix.

Solution: u is the eigenvector corresponding to the largest eigenvalue of X. The resulting y values are called the first principal component.
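A minimal NumPy sketch of the whole PCA recipe (the synthetic data set is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: m = 500 points in R^2, stretched along one direction.
data = rng.standard_normal((500, 2)) @ np.array([[3.0, 1.0],
                                                 [0.0, 0.5]])

data = data - data.mean(axis=0)       # center the data at the origin
X = (data.T @ data) / len(data)       # covariance matrix (1/m) sum x x^T

lam, P = np.linalg.eigh(X)            # eigenvalues in ascending order
u = P[:, -1]                          # eigenvector of the largest eigenvalue

y = data @ u                          # first principal component scores
print(np.isclose(y.var(), lam[-1]))   # True: projected variance = lambda_max
```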