Page 1

Chapter 28 – Part II

Matrix Operations

Page 2

Matrix Operations
- Gaussian elimination
- LU factorization
- Gaussian elimination with partial pivoting
- LUP factorization
- Error analysis
- Complexity of matrix multiplication & inversion
- SVD and PCA
- SVD and PCA applications

Page 3

Project 2 – image compression using SVD

Page 4

Eigenvalues & Eigenvectors

Eigenvectors (for a square m×m matrix S): $Sv = \lambda v$, $v \ne 0$, where $\lambda$ is an eigenvalue and $v$ the corresponding (right) eigenvector.

How many eigenvalues are there at most?

$Sv = \lambda v$ only has a non-zero solution $v$ if $\det(S - \lambda I) = 0$.

This is an m-th order equation in $\lambda$ which can have at most m distinct solutions (roots of the characteristic polynomial), and they can be complex even though S is real.
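As a quick numerical check of these definitions, here is a minimal NumPy sketch (the matrix S below reappears in the example on Page 8); np.linalg.eig returns the roots of the characteristic polynomial together with one eigenvector per root:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig solves det(S - lambda*I) = 0 numerically; vecs holds one
# eigenvector per eigenvalue, as columns.
vals, vecs = np.linalg.eig(S)
print(vals)  # at most m = 2 distinct eigenvalues

# Each column v satisfies S v = lambda v (up to round-off).
for lam, v in zip(vals, vecs.T):
    assert np.allclose(S @ v, lam * v)
```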

Example

Some of these slides are adapted from notes of Stanford CS276 & CMU CS385

Page 5

Matrix-vector multiplication

$S = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ has eigenvalues 3, 2, 0 with corresponding eigenvectors

$v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\quad v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$

On each eigenvector, S acts as a multiple of the identity matrix, but as a different multiple on each.

Any vector (say $x = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}$) can be viewed as a combination of the eigenvectors: $x = 2v_1 + 4v_2 + 6v_3$.

Page 6

Matrix-vector multiplication

Thus a matrix-vector multiplication such as Sx can be rewritten in terms of the eigenvalues/vectors:

$Sx = S(2v_1 + 4v_2 + 6v_3) = 2Sv_1 + 4Sv_2 + 6Sv_3 = 2\lambda_1 v_1 + 4\lambda_2 v_2 + 6\lambda_3 v_3$

Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues/eigenvectors.

Suggestion: the effect of "small" eigenvalues is small.
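A minimal sketch of this slide's example, checking that the lambda = 0 direction of x is wiped out:

```python
import numpy as np

S = np.diag([3.0, 2.0, 0.0])
v1, v2, v3 = np.eye(3)      # eigenvectors for eigenvalues 3, 2, 0
x = 2*v1 + 4*v2 + 6*v3      # x = (2, 4, 6)

# Direct product vs. eigen-expansion: Sx = 2*3*v1 + 4*2*v2 + 6*0*v3.
assert np.allclose(S @ x, 6*v1 + 8*v2)
print(S @ x)                # [6. 8. 0.]: the lambda = 0 component vanished
```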

Page 7

Eigenvalues & Eigenvectors

- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
$Sv_{\{1,2\}} = \lambda_{\{1,2\}} v_{\{1,2\}}$ and $\lambda_1 \ne \lambda_2 \;\Rightarrow\; v_1 \cdot v_2 = 0$

- All eigenvalues of a real symmetric matrix are real:
$|S - \lambda I| = 0$ for complex $\lambda$, and $S = S^T \;\Rightarrow\; \lambda \in \mathbb{R}$

- All eigenvalues of a positive semidefinite matrix are non-negative:
$\forall w \in \mathbb{R}^n,\ w^T S w \ge 0$; then if $Sv = \lambda v$, $\lambda \ge 0$
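These three properties can be spot-checked numerically; a minimal sketch on a random PSD matrix (built as $AA^T$, which is always symmetric positive semidefinite):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T                        # symmetric positive semidefinite

vals, vecs = np.linalg.eig(S)
assert np.allclose(vals.imag, 0)   # real eigenvalues
assert np.all(vals.real > -1e-10)  # non-negative (up to round-off)
# Eigenvectors for distinct eigenvalues are orthogonal:
assert np.allclose(vecs.T @ vecs, np.eye(4), atol=1e-6)
```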

Page 8

Example

Let $S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (real, symmetric).

Then $|S - \lambda I| = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = (2-\lambda)^2 - 1 = 0$.

The eigenvalues are 1 and 3 (non-negative, real).

The eigenvectors are orthogonal (and real): $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Page 9

Eigen/diagonal Decomposition

Let $S$ be a square matrix with m linearly independent eigenvectors.

Theorem: there exists an eigen decomposition $S = U \Lambda U^{-1}$.

Columns of U are the eigenvectors of S; the diagonal elements of $\Lambda$ are the eigenvalues of S.

Page 10

Diagonal decomposition (2)

Let U have the eigenvectors as columns: $U = [v_1 \ \cdots \ v_n]$.

Then SU can be written

$SU = S[v_1 \ \cdots \ v_n] = [\lambda_1 v_1 \ \cdots \ \lambda_n v_n] = [v_1 \ \cdots \ v_n]\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} = U\Lambda$

Thus $SU = U\Lambda$, or $U^{-1}SU = \Lambda$. And $S = U\Lambda U^{-1}$.
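A minimal sketch of this factorization on the Page 8 example (np.linalg.eig already returns the eigenvectors as the columns of U):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, U = np.linalg.eig(S)     # columns of U are eigenvectors of S
Lam = np.diag(vals)

assert np.allclose(S @ U, U @ Lam)                 # SU = U Lam
assert np.allclose(S, U @ Lam @ np.linalg.inv(U))  # S = U Lam U^{-1}
```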

Page 11

Diagonal decomposition - example

Recall $S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$; $\lambda_1 = 1$, $\lambda_2 = 3$.

The eigenvectors $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ form $U = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$.

Inverting U, $U^{-1} = \begin{pmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}$.

Then $S = U\Lambda U^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}$.

Page 12

Example continued

Let's divide U (and multiply $U^{-1}$) by $\sqrt{2}$. Then

$S = \underbrace{\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}}_{Q}\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}\underbrace{\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}}_{Q^{-1} = Q^T}$

Page 13: Chapter 28 – Part II Matrix Operations. Gaussian elimination Gaussian elimination LU factorization LU factorization Gaussian elimination with partial.

If is a symmetric matrix: Theorem: Exists a (unique) eigen

decomposition where Q is orthogonal:

Q-1= QT

Columns of Q are normalized eigenvectors

Columns are orthogonal. (everything is real)

Symmetric Eigen Decomposition

TQQS
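For symmetric matrices NumPy provides np.linalg.eigh, which returns an orthogonal Q directly; a minimal sketch on the running example:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, Q = np.linalg.eigh(S)    # eigh is specialized to symmetric/Hermitian S

assert np.allclose(Q.T @ Q, np.eye(2))           # Q orthogonal: Q^{-1} = Q^T
assert np.allclose(S, Q @ np.diag(vals) @ Q.T)   # S = Q Lam Q^T
print(vals)                                      # [1. 3.]
```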

Page 14

Exercise

Examine the symmetric eigen decomposition, if any, for each of the following matrices:

$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix},\quad \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}$

Page 15

Singular Value Decomposition

For an m×n matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) as follows:

$A = U\Sigma V^T$, where U is m×m, $\Sigma$ is m×n, and V is n×n.

The columns of U are orthogonal eigenvectors of $AA^T$.

The columns of V are orthogonal eigenvectors of $A^TA$.

The eigenvalues $\lambda_1, \dots, \lambda_r$ of $AA^T$ are also the eigenvalues of $A^TA$, and

$\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$ with $\sigma_i = \sqrt{\lambda_i}$: the singular values.

Page 16

Singular Value Decomposition

[Figure: illustration of SVD dimensions and sparseness.]

Page 17

SVD example

Let $A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$. Thus m = 3, n = 2. Its SVD is

$A = \begin{pmatrix} 0 & 2/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & 1/\sqrt{6} & -1/\sqrt{3} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{3} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$

Typically, the singular values are arranged in decreasing order (not the case here).
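A minimal sketch checking this example with np.linalg.svd (which does sort the singular values in decreasing order, so they come back as √3, 1):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])
U, s, Vt = np.linalg.svd(A)       # full SVD: U is 3x3, Vt is 2x2
print(s)                          # [1.732... 1.0] = [sqrt(3), 1]

# Rebuild A = U Sigma V^T with the 3x2 Sigma.
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

# Columns of V are eigenvectors of A^T A; sigma_i = sqrt(lambda_i).
vals = np.linalg.eigvalsh(A.T @ A)          # ascending: [1, 3]
assert np.allclose(np.sqrt(vals[::-1]), s)
```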

Page 18

Low-rank Approximation

SVD can be used to compute optimal low-rank approximations.

Approximation problem: find $A_k$ of rank k such that

$A_k = \arg\min_{X:\ \mathrm{rank}(X)=k} \|A - X\|_F$

where $\|\cdot\|_F$ is the Frobenius norm (the entrywise 2-norm). $A_k$ and X are both m×n matrices.

Typically, we want k << r.

Page 19

Low-rank Approximation

Solution via SVD: set the smallest r − k singular values to zero:

$A_k = U\,\mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0)\,V^T$

Column notation: sum of k rank-1 matrices:

$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$
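A minimal sketch of the truncation, with the error check from the next slide folded in (the helper name low_rank is mine):

```python
import numpy as np

def low_rank(A, k):
    """Best rank-k approximation of A: keep the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # sum of k rank-1 terms

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
k = 2
Ak = low_rank(A, k)

# Frobenius error = sqrt of the sum of the discarded squared singular values.
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - Ak, 'fro'),
                  np.sqrt(np.sum(s[k:] ** 2)))
```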

Page 20

Approximation error

How good (bad) is this approximation?

It's the best possible, as measured by the Frobenius norm of the error:

$\min_{X:\ \mathrm{rank}(X)=k} \|A - X\|_F = \|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$

where the $\sigma_i$ are ordered such that $\sigma_i \ge \sigma_{i+1}$. (In the spectral 2-norm the error is $\sigma_{k+1}$.)

This suggests why the Frobenius error drops as k is increased.

Page 21

SVD Application

Image compression
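A minimal sketch of the application (and of Project 2's idea), assuming a grayscale image is already loaded as a 2-D array img; loading and display are omitted:

```python
import numpy as np

def compress(img, k):
    """Rank-k SVD approximation of a grayscale image."""
    U, s, Vt = np.linalg.svd(img.astype(float), full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    m, n = img.shape
    # Storage drops from m*n numbers to k*(m + n + 1).
    ratio = k * (m + n + 1) / (m * n)
    return approx, ratio

# e.g. for a 512x512 image, k = 50 keeps about 20% of the numbers
# while preserving most of the visual content.
```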

Page 22

SVD example

Eigenvalues of A'A? Eigenvectors? Matrix U?

Page 23

SVD example -- Action of A on a unit circle

[Figure: the unit circle is mapped to an ellipse in stages: apply $V^T$, then $\Sigma$, then U.]

Page 24

PCA

Principal Components Analysis

Page 25

Data Presentation

Example: 53 blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32 non-alcoholics).

Matrix format:

      H-WBC   H-RBC   H-Hgb    H-Hct    H-MCV    H-MCH   H-MCHC
A1    8.0000  4.8200  14.1000  41.0000   85.0000 29.0000 34.0000
A2    7.3000  5.0200  14.7000  43.0000   86.0000 29.0000 34.0000
A3    4.3000  4.4800  14.1000  41.0000   91.0000 32.0000 35.0000
A4    7.5000  4.4700  14.9000  45.0000  101.0000 33.0000 33.0000
A5    7.3000  5.5200  15.4000  46.0000   84.0000 28.0000 33.0000
A6    6.9000  4.8600  16.0000  47.0000   97.0000 33.0000 34.0000
A7    7.8000  4.6800  14.7000  43.0000   92.0000 31.0000 34.0000
A8    8.6000  4.8200  15.8000  42.0000   88.0000 33.0000 37.0000
A9    5.1000  4.7100  14.0000  43.0000   92.0000 30.0000 32.0000

Spectral format:

[Figure: value vs. measurement index, one trace per person.]

Page 26

Data Presentation

[Figures: univariate (H-Bands per person), bivariate (C-LDH vs. C-Triglycerides), and trivariate (M-EPI vs. C-Triglycerides and C-LDH) views of the data.]

Page 27

Data Presentation

Is there a better presentation than the ordinate axes? Do we need a 53-dimensional space to view the data? How do we find the "best" low-dimensional space that conveys maximum useful information?

One answer: find "Principal Components".

Page 28

Principal Components

The first PC is the direction of maximum variance from the origin.

Subsequent PCs are orthogonal to the 1st PC and describe the maximum residual variance.

[Figure: scatter of Wavelength 2 vs. Wavelength 1, with the PC 1 and PC 2 directions overlaid.]

Page 29: Chapter 28 – Part II Matrix Operations. Gaussian elimination Gaussian elimination LU factorization LU factorization Gaussian elimination with partial.

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.

The Goal

Page 30

Applications

Uses:
- Data Visualization
- Data Reduction
- Data Classification
- Trend Analysis
- Factor Analysis
- Noise Reduction

Examples:
- How many unique "sub-sets" are in the sample?
- How are they similar / different?
- What are the underlying factors that influence the samples?
- Which time / temporal trends are (anti)correlated?
- Which measurements are needed to differentiate?
- How to best present what is "interesting"?
- Which "sub-set" does this new sample rightfully belong to?

Page 31

Trick: Rotate Coordinate Axes

Suppose we have a population measured on p random variables X1,…,Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability. This is accomplished by rotating the axes.

[Figure: data cloud in the (X1, X2) plane with the axes rotated along its directions of greatest variability.]

Page 32

PCA: General

From k original variables x1, x2, ..., xk, produce k new variables y1, y2, ..., yk:

y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk

Page 33

PCA: General

From k original variables x1, x2, ..., xk, produce k new variables y1, y2, ..., yk:

y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk

such that (see the sketch after this list):
- the yk's are uncorrelated (orthogonal)
- y1 explains as much as possible of the original variance in the data set
- y2 explains as much as possible of the remaining variance
- etc.
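A minimal sketch, on synthetic data of my own, of how the coefficients a_ij are obtained in practice: they are the eigenvector coefficients of the sample covariance matrix, and the resulting y's come out uncorrelated with decreasing variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.3])  # synthetic data

B = X - X.mean(axis=0)             # center the variables
S = B.T @ B / (len(B) - 1)         # sample covariance matrix

vals, V = np.linalg.eigh(S)        # eigenvalues in ascending order
V = V[:, ::-1]                     # reorder so y1 gets the largest variance
Y = B @ V                          # columns of V hold the coefficients a_ij

C = np.cov(Y.T)
off = C - np.diag(np.diag(C))
assert np.allclose(off, 0, atol=1e-8)   # the y's are uncorrelated
print(np.diag(C))                       # variances = eigenvalues, decreasing
```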

Page 34

[Figure: scatter plot with the 1st principal component, y1, and the 2nd principal component, y2, drawn through the data.]

Page 35

PCA Scores

[Figure: the scores yi,1 and yi,2 of a data point (xi1, xi2) are its coordinates along the principal components.]

Page 36

PCA Eigenvalues

[Figure: the eigenvalues λ1 and λ2 measure the variance along PC 1 and PC 2.]

Page 37

PCA: Another Explanation

From k original variables x1, x2, ..., xk, produce k new variables y1, y2, ..., yk:

y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk

such that:
- the yk's are uncorrelated (orthogonal)
- y1 explains as much as possible of the original variance in the data set
- y2 explains as much as possible of the remaining variance
- etc.

The yk's are the Principal Components.

Page 38

Example

          Apple  Samsung  Nokia
student1    1       2       1
student2    4       2      13
student3    7       8       1
student4    8       4       5
Mean        5       4       5

Subtracting the mean gives B:

          Apple  Samsung  Nokia
student1   -4      -2      -4
student2   -1      -2       8
student3    2       4      -4
student4    3       0       0

Page 39

Example (continued)

Sample covariance matrix:

S = B'B/(n-1) = [10   6   0
                  6   8  -8
                  0  -8  32]

PCA of B = finding eigenvalues and eigenvectors of S: S = VDV'

V = [ 0.5686  0.8193 -0.0740      D = [1.6057   0        0
     -0.7955  0.5247 -0.3030           0       13.8430   0
     -0.2094  0.2312  0.9501]          0        0       34.5513]

1st principal component: [-0.0740, -0.3030, 0.9501]', i.e. y1 = -0.0740A - 0.3030S + 0.9501N
2nd principal component: [0.8193, 0.5247, 0.2312]', y2 ...
3rd principal component: [0.5686, -0.7955, -0.2094]', y3 ...

Total variance = tr(D) = 50 = tr(S)
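A minimal sketch reproducing this slide's numbers with NumPy (eigh returns the eigenvalues in ascending order, so the last column of V is the 1st principal component, possibly up to sign):

```python
import numpy as np

X = np.array([[1, 2, 1],        # columns: Apple, Samsung, Nokia
              [4, 2, 13],
              [7, 8, 1],
              [8, 4, 5]], dtype=float)
B = X - X.mean(axis=0)          # subtract the column means (5, 4, 5)
S = B.T @ B / (len(X) - 1)      # [[10, 6, 0], [6, 8, -8], [0, -8, 32]]

lam, V = np.linalg.eigh(S)      # S = V D V'
print(lam)                      # [ 1.6057 13.8430 34.5513]
print(V[:, -1])                 # 1st PC: [-0.0740, -0.3030, 0.9501] up to sign
print(np.trace(S), lam.sum())   # total variance: 50.0 50.0

C = B @ V                       # scores; compare with the table on Page 41
```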


Page 41

New view of the data

y1 = -0.0740A - 0.3030S + 0.9501N, ...; C = BV =

C            y3        y2        y1
student1    0.1542   -5.2514   -2.8984
student2   -0.6528   -0.0191    8.2808
student3   -1.2072    2.8126   -5.1604
student4    1.7058    2.4579   -0.222

[Figure: the original (x1, x2) axes and the rotated (y1, y2) axes over the data.]

          Apple  Samsung  Nokia
student1    1       2       1
student2    4       2      13
student3    7       8       1
student4    8       4       5

Page 42

Another look at the data

Make all variances = 1.

Var^-1*B:
             A        S        N
student1   1.7321   3.4641   1.7321
student2   0.6827   0.3413   2.2186
student3   1.8489   2.1131   0.2641
student4   3.8431   1.9215   2.4019

Original B:
          Apple  Samsung  Nokia
student1    1       2       1
student2    4       2      13
student3    7       8       1
student4    8       4       5

After subtracting the mean:
             A         S         N
student1  -0.2946    1.5041    0.077925
student2  -1.3440   -1.6187    0.564425
student3  -0.1778    0.1531   -1.390080
student4   1.8164   -0.0385    0.747725

S = VDV', lambda = [0.5873; 1.4916; 2.2370]; scores C:
             y3        y2        y1
student1   0.8888    0.9583   -0.8043
student2   0.2289   -0.5312    2.1001
student3  -0.9428    1.0480   -0.0100
student4  -0.1749   -1.4751   -1.2858
y2