SVD and PCA

Derek Onken and Li Xiong

January 29, 2018

  • Feature Extraction

    Create new features (attributes) by combining/mapping existing ones

    Common methods

    Principal Component Analysis

    Singular Value Decomposition

    Other compression methods (time-frequency analysis)

    Fourier transform (e.g. time series)

    Discrete Wavelet Transform (e.g. 2D images)

  • Principal Component Analysis (PCA)

    Principal component analysis: find the dimensions that capture the most variance

    A linear mapping of the data to a new coordinate system such that the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.

    Steps

    Normalize input data: each attribute falls within the same range

    Compute k orthonormal (unit) vectors, i.e., principal components; each input data vector is a linear combination of the k principal component vectors

    The principal components are sorted in order of decreasing “significance”

    Weak components can be eliminated, i.e., those with low variance


  • Dimensionality Reduction: PCA

    Mathematically

    Compute the covariance matrix

    Find the eigenvectors of the covariance matrix that correspond to large eigenvalues

    (Figure: data in the X-Y plane with the principal direction v.)
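The steps above can be sketched with NumPy (a minimal illustration on synthetic data; the variable names and the test data are mine, not from the slides):

```python
import numpy as np

# Synthetic 2-D data with most variance along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# Step 1: normalize the input data (center each attribute)
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix and its eigenvectors (the principal components)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending order

# Step 3: sort components by decreasing "significance" (variance)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: eliminate weak components: keep only the top-k
k = 1
X_reduced = Xc @ eigvecs[:, :k]               # project onto top-k principal axes
```

Projecting onto the leading eigenvector keeps the direction of greatest variance while halving the dimensionality.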

  • PCA: Illustrative Example

    (Slides 5-9: figures only, stepping through the example.)

  • Eigen Decomposition

    How the eigenvalues and eigenvectors create a matrix decomposition: A = Q Λ Q⁻¹

    Q is the matrix whose columns are the eigenvectors

    Λ is the diagonal matrix containing all the eigenvalues
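The decomposition can be checked numerically (a sketch with NumPy; the symmetric test matrix is my own choice, and for symmetric matrices Q⁻¹ = Qᵀ):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric, so Q is orthogonal

eigvals, Q = np.linalg.eigh(A)    # columns of Q: eigenvectors
Lam = np.diag(eigvals)            # Λ: diagonal matrix of eigenvalues

# Reassemble the matrix: A = Q Λ Q^T
A_rebuilt = Q @ Lam @ Q.T
```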

  • Singular Value Decomposition (SVD)

  • Similarity of Eigen and SVD

    Columns of Q are eigenvectors

    Λ contains eigenvalues

    Columns of U are left-singular vectors

    Columns of V are right-singular vectors

    Σ contains the ordered singular values σᵢ

    For the eigendecomposition, A must be square; here we take A = MᵀM.

    The vⱼ are eigenvectors of MᵀM.

    The uᵢ are eigenvectors of MMᵀ.

    The eigenvalues are the squares of the singular values: λᵢ = σᵢ²
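This relationship is easy to verify numerically (a sketch; M is an arbitrary test matrix of my own choosing):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Singular values of M
sigma = np.linalg.svd(M, compute_uv=False)

# Eigenvalues of A = M^T M, sorted in decreasing order to match
lam = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]

# lambda_i = sigma_i^2
same = np.allclose(lam, sigma**2)
```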

  • An Application Example

  • From: “Dimensionality Reduction: SVD & CUR”

    CS246: Mining Massive Datasets, Jure Leskovec, Stanford University

    http://cs246.stanford.edu

  • SVD - Properties

    It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where

    U, Σ, V: unique

    U, V: column orthonormal

    UᵀU = I; VᵀV = I (I: identity matrix)

    (Columns are orthogonal unit vectors)

    Σ: diagonal

    Entries (singular values) are positive and sorted in decreasing order (σ₁ ≥ σ₂ ≥ ... ≥ 0)


    Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf

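These properties can be checked directly (a sketch; the matrix is a random example of my own, not from the slides):

```python
import numpy as np

A = np.random.default_rng(1).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U, V are column orthonormal: U^T U = I and V^T V = I
ortho_U = np.allclose(U.T @ U, np.eye(4))
ortho_V = np.allclose(Vt @ Vt.T, np.eye(4))

# Singular values are non-negative and sorted in decreasing order
sorted_desc = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))

# The decomposition reconstructs A exactly: A = U Σ V^T
exact = np.allclose(A, U @ np.diag(s) @ Vt)
```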

  • SVD – Example: Users-to-Movies

    Consider a matrix. What does SVD do?

    A is an m × n users-to-movies matrix. Rows are users; columns are the
    movies Matrix, Alien, Serenity, Casablanca, Amelie. The first four users
    rate the SciFi movies; the last three mostly rate the Romance movies.

        A  =  | 1 1 1 0 0 |
              | 3 3 3 0 0 |
              | 4 4 4 0 0 |
              | 5 5 5 0 0 |  =  U Σ Vᵀ
              | 0 2 0 4 4 |
              | 0 0 0 5 5 |
              | 0 1 0 2 2 |

    “Concepts”, AKA latent dimensions, AKA latent factors

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example: Users to Movies

    (Columns of A: Matrix, Alien, Serenity, Casablanca, Amelie; SciFi users
    on top, Romance users below.)

        | 1 1 1 0 0 |     | 0.13  0.02 -0.01 |
        | 3 3 3 0 0 |     | 0.41  0.07 -0.03 |
        | 4 4 4 0 0 |     | 0.55  0.09 -0.04 |     | 12.4  0    0   |     | 0.56  0.59  0.56  0.09  0.09 |
        | 5 5 5 0 0 |  =  | 0.68  0.11 -0.05 |  ×  |  0    9.5  0   |  ×  | 0.12 -0.02  0.12 -0.69 -0.69 |
        | 0 2 0 4 4 |     | 0.15 -0.59  0.65 |     |  0    0    1.3 |     | 0.40 -0.80  0.40  0.09  0.09 |
        | 0 0 0 5 5 |     | 0.07 -0.73 -0.67 |
        | 0 1 0 2 2 |     | 0.07 -0.29  0.32 |

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    The first column/row of the factors is the SciFi-concept; the second is
    the Romance-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    U is the “user-to-concept” factor matrix: its columns correspond to the
    SciFi-concept and the Romance-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    Each diagonal entry of Σ is the “strength” of a concept; 12.4 is the
    strength of the SciFi-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    V is the “movie-to-concept” factor matrix: the rows of Vᵀ correspond to
    the SciFi-concept and the Romance-concept.

  • SVD - Interpretation #1

    ‘movies’, ‘users’ and ‘concepts’:

    U: user-to-concept matrix

    V: movie-to-concept matrix

    Σ: its diagonal elements give the ‘strength’ of each concept

  • SVD – Best Low Rank Approx.

    Fact: SVD gives the ‘best’ axes to project on:

    ‘best’ = minimizing the sum of squared reconstruction errors

    A = U Σ Vᵀ; zeroing out the smallest singular values in Σ gives
    B = U S Vᵀ, and B is the best low-rank approximation of A:

    ‖A − B‖_F = √( Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)² )

  • Example of SVD
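The users-to-movies example can be reproduced end to end (a sketch; the 7×5 ratings matrix is the one from the slides, the variable names are mine):

```python
import numpy as np

# Rows: users; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# First three singular values match the slides (≈ 12.4, 9.5, 1.3);
# the rest are ~0 because A has rank 3.

# Best rank-2 approximation B: keep the two strongest concepts
B = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
err = np.linalg.norm(A - B)   # Frobenius norm of A - B, here ≈ s[2]
```

Dropping the weakest concept loses only the smallest singular value's worth of reconstruction error, which is the low-rank-approximation fact from the previous slide.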

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’ - how?

    (The slide repeats the A = U Σ Vᵀ decomposition shown above, with the
    SciFi and Romance groups highlighted.)

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’ - how?

    q = [5 0 0 0 0]   (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

    q is the rating vector of a new user who rated only ‘Matrix’.

    Project q into concept space: take the inner product of q with each
    ‘concept’ vector vᵢ.

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’: the projection of
    q = [5 0 0 0 0] onto v₁ is the inner product q·v₁ (and likewise q·v₂).

  • Case study: How to query?

    Compactly, we have: q_concept = q V

    E.g., with q = [5 0 0 0 0] and the movie-to-concept factors V (top two
    concepts):

        q_concept = [5 0 0 0 0]  ×  | 0.56  0.12 |   =   [2.8  0.6]
                                    | 0.59 -0.02 |
                                    | 0.56  0.12 |
                                    | 0.09 -0.69 |
                                    | 0.09 -0.69 |

    The first entry (2.8) is the strength of the SciFi-concept in q.
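Mapping a query into concept space is a single matrix product (a sketch; here V is recovered from the SVD of the ratings matrix rather than copied from the slides, and SVD factors are only determined up to sign):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                        # movie-to-concept factors, top 2 concepts

q = np.array([5.0, 0, 0, 0, 0])     # new user who rated only 'Matrix'
q_concept = q @ V                   # ≈ [2.8, 0.6] up to sign: strongly SciFi
```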

  • Case study: How to query?

    How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
    The same way: d_concept = d V

    E.g., with d = [0 4 5 0 0]:

        d_concept = [0 4 5 0 0]  ×  | 0.56  0.12 |   ≈   [5.2  0.4]
                                    | 0.59 -0.02 |
                                    | 0.56  0.12 |
                                    | 0.09 -0.69 |
                                    | 0.09 -0.69 |

  • Case study: How to query?

    Observation: user d, who rated (‘Alien’, ‘Serenity’), will be similar to
    user q, who rated (‘Matrix’), although d and q have zero ratings in common!

    d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]
    q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]

    Zero ratings in common, yet similarity > 0: both vectors load mostly on
    the SciFi-concept.
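The similarity of q and d in concept space can be quantified with cosine similarity (a sketch using the concept vectors from the slides; the choice of cosine similarity is mine, as the slides only say "similarity > 0"):

```python
import numpy as np

q_concept = np.array([2.8, 0.6])   # user who rated only 'Matrix'
d_concept = np.array([5.2, 0.4])   # user who rated 'Alien' and 'Serenity'

# Cosine similarity: dot product over the product of norms
cos_sim = q_concept @ d_concept / (
    np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
# Close to 1: both vectors point almost entirely along the SciFi concept
```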

  • SVD: Drawbacks

    + Optimal low-rank approximation in terms of the Frobenius norm

    − Interpretability problem: a singular vector specifies a linear
    combination of all input columns or rows

    − Lack of sparsity: singular vectors are dense, even when A is sparse