SVD and PCA

Derek Onken and Li Xiong

January 29, 2018

  • Feature Extraction

    Create new features (attributes) by combining/mapping existing ones

    Common methods

    Principal Component Analysis

    Singular Value Decomposition

    Other compression methods (time-frequency analysis)

    Fourier transform (e.g. time series)

    Discrete Wavelet Transform (e.g. 2D images)

  • Principal Component Analysis (PCA)

    Principal component analysis: find the dimensions that capture the most variance

    A linear mapping of the data to a new coordinate system such that the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.

    Steps

    Normalize input data: each attribute falls within the same range

    Compute k orthonormal (unit) vectors, i.e., principal components; each input data vector is a linear combination of the k principal component vectors

    The principal components are sorted in order of decreasing “significance”

    Weak components can be eliminated, i.e., those with low variance


  • Dimensionality Reduction: PCA

    Mathematically

    Compute the covariance matrix

    Find the eigenvectors of the covariance matrix that correspond to large eigenvalues

    (Figure: data in the X-Y plane with the principal direction v.)
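The steps above can be sketched with NumPy (a minimal illustration on synthetic data; the variable names and the test data are mine, not from the slides):

```python
import numpy as np

# Synthetic 2-D data with most variance along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# Step 1: normalize the input data (center each attribute)
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix and its eigenvectors (the principal components)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending order

# Step 3: sort components by decreasing "significance" (variance)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: eliminate weak components: keep only the top-k
k = 1
X_reduced = Xc @ eigvecs[:, :k]               # project onto top-k principal axes
```

Projecting onto the leading eigenvector keeps the direction of greatest variance while halving the dimensionality.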

  • PCA: Illustrative Example

    (Slides 5-9: figures only, stepping through the example.)

  • Eigen Decomposition

    How the eigenvalues and eigenvectors create a matrix decomposition: A = Q Λ Q⁻¹

    Q is the matrix whose columns are the eigenvectors

    Λ is the diagonal matrix containing all the eigenvalues
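The decomposition can be checked numerically (a sketch with NumPy; the symmetric test matrix is my own choice, and for symmetric matrices Q⁻¹ = Qᵀ):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric, so Q is orthogonal

eigvals, Q = np.linalg.eigh(A)    # columns of Q: eigenvectors
Lam = np.diag(eigvals)            # Λ: diagonal matrix of eigenvalues

# Reassemble the matrix: A = Q Λ Q^T
A_rebuilt = Q @ Lam @ Q.T
```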

  • Singular Value Decomposition (SVD)

  • Similarity of Eigen and SVD

    Columns of Q are eigenvectors

    Λ contains eigenvalues

    Columns of U are left-singular vectors

    Columns of V are right-singular vectors

    Σ contains the ordered singular values σᵢ

    For the eigendecomposition, A must be square; here we take A = MᵀM.

    The vⱼ are eigenvectors of MᵀM.

    The uᵢ are eigenvectors of MMᵀ.

    The eigenvalues are the squares of the singular values: λᵢ = σᵢ²
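This relationship is easy to verify numerically (a sketch; M is an arbitrary test matrix of my own choosing):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Singular values of M
sigma = np.linalg.svd(M, compute_uv=False)

# Eigenvalues of A = M^T M, sorted in decreasing order to match
lam = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]

# lambda_i = sigma_i^2
same = np.allclose(lam, sigma**2)
```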

  • An Application Example

  • From: “Dimensionality Reduction: SVD & CUR”

    CS246: Mining Massive Datasets, Jure Leskovec, Stanford University

    http://cs246.stanford.edu

  • SVD - Properties

    It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where

    U, Σ, V: unique

    U, V: column orthonormal

    UᵀU = I; VᵀV = I (I: identity matrix)

    (Columns are orthogonal unit vectors)

    Σ: diagonal

    Entries (singular values) are positive and sorted in decreasing order (σ₁ ≥ σ₂ ≥ ... ≥ 0)


    Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf

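These properties can be checked directly (a sketch; the matrix is a random example of my own, not from the slides):

```python
import numpy as np

A = np.random.default_rng(1).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U, V are column orthonormal: U^T U = I and V^T V = I
ortho_U = np.allclose(U.T @ U, np.eye(4))
ortho_V = np.allclose(Vt @ Vt.T, np.eye(4))

# Singular values are non-negative and sorted in decreasing order
sorted_desc = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))

# The decomposition reconstructs A exactly: A = U Σ V^T
exact = np.allclose(A, U @ np.diag(s) @ Vt)
```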

  • SVD – Example: Users-to-Movies

    Consider a matrix. What does SVD do?

    A is an m × n users-to-movies matrix. Rows are users; columns are the
    movies Matrix, Alien, Serenity, Casablanca, Amelie. The first four users
    rate the SciFi movies; the last three mostly rate the Romance movies.

        A  =  | 1 1 1 0 0 |
              | 3 3 3 0 0 |
              | 4 4 4 0 0 |
              | 5 5 5 0 0 |  =  U Σ Vᵀ
              | 0 2 0 4 4 |
              | 0 0 0 5 5 |
              | 0 1 0 2 2 |

    “Concepts”, AKA latent dimensions, AKA latent factors

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example: Users to Movies

    (Columns of A: Matrix, Alien, Serenity, Casablanca, Amelie; SciFi users
    on top, Romance users below.)

        | 1 1 1 0 0 |     | 0.13  0.02 -0.01 |
        | 3 3 3 0 0 |     | 0.41  0.07 -0.03 |
        | 4 4 4 0 0 |     | 0.55  0.09 -0.04 |     | 12.4  0    0   |     | 0.56  0.59  0.56  0.09  0.09 |
        | 5 5 5 0 0 |  =  | 0.68  0.11 -0.05 |  ×  |  0    9.5  0   |  ×  | 0.12 -0.02  0.12 -0.69 -0.69 |
        | 0 2 0 4 4 |     | 0.15 -0.59  0.65 |     |  0    0    1.3 |     | 0.40 -0.80  0.40  0.09  0.09 |
        | 0 0 0 5 5 |     | 0.07 -0.73 -0.67 |
        | 0 1 0 2 2 |     | 0.07 -0.29  0.32 |

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    The first column/row of the factors is the SciFi-concept; the second is
    the Romance-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    U is the “user-to-concept” factor matrix: its columns correspond to the
    SciFi-concept and the Romance-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    Each diagonal entry of Σ is the “strength” of a concept; 12.4 is the
    strength of the SciFi-concept.

  • SVD – Example: Users-to-Movies

    A = U Σ Vᵀ - example (same decomposition as above)

    V is the “movie-to-concept” factor matrix: the rows of Vᵀ correspond to
    the SciFi-concept and the Romance-concept.

  • SVD - Interpretation #1

    ‘movies’, ‘users’ and ‘concepts’:

    U: user-to-concept matrix

    V: movie-to-concept matrix

    Σ: its diagonal elements give the ‘strength’ of each concept

  • SVD – Best Low Rank Approx.

    Fact: SVD gives the ‘best’ axes to project on:

    ‘best’ = minimizing the sum of squared reconstruction errors

    A = U Σ Vᵀ; zeroing out the smallest singular values in Σ gives
    B = U S Vᵀ, and B is the best low-rank approximation of A:

    ‖A − B‖_F = √( Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)² )

  • Example of SVD
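The users-to-movies example can be reproduced end to end (a sketch; the 7×5 ratings matrix is the one from the slides, the variable names are mine):

```python
import numpy as np

# Rows: users; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# First three singular values match the slides (≈ 12.4, 9.5, 1.3);
# the rest are ~0 because A has rank 3.

# Best rank-2 approximation B: keep the two strongest concepts
B = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
err = np.linalg.norm(A - B)   # Frobenius norm of A - B, here ≈ s[2]
```

Dropping the weakest concept loses only the smallest singular value's worth of reconstruction error, which is the low-rank-approximation fact from the previous slide.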

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’ - how?

    (The slide repeats the A = U Σ Vᵀ decomposition shown above, with the
    SciFi and Romance groups highlighted.)

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’ - how?

    q = [5 0 0 0 0]   (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

    q is the rating vector of a new user who rated only ‘Matrix’.

    Project q into concept space: take the inner product of q with each
    ‘concept’ vector vᵢ.

  • Case study: How to query?

    Q: Find users that like ‘Matrix’

    A: Map the query into a ‘concept space’: the projection of
    q = [5 0 0 0 0] onto v₁ is the inner product q·v₁ (and likewise q·v₂).

  • Case study: How to query?

    Compactly, we have: q_concept = q V

    E.g., with q = [5 0 0 0 0] and the movie-to-concept factors V (top two
    concepts):

        q_concept = [5 0 0 0 0]  ×  | 0.56  0.12 |   =   [2.8  0.6]
                                    | 0.59 -0.02 |
                                    | 0.56  0.12 |
                                    | 0.09 -0.69 |
                                    | 0.09 -0.69 |

    The first entry (2.8) is the strength of the SciFi-concept in q.
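Mapping a query into concept space is a single matrix product (a sketch; here V is recovered from the SVD of the ratings matrix rather than copied from the slides, and SVD factors are only determined up to sign):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                        # movie-to-concept factors, top 2 concepts

q = np.array([5.0, 0, 0, 0, 0])     # new user who rated only 'Matrix'
q_concept = q @ V                   # ≈ [2.8, 0.6] up to sign: strongly SciFi
```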

  • Case study: How to query?

    How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
    The same way: d_concept = d V

    E.g., with d = [0 4 5 0 0]:

        d_concept = [0 4 5 0 0]  ×  | 0.56  0.12 |   ≈   [5.2  0.4]
                                    | 0.59 -0.02 |
                                    | 0.56  0.12 |
                                    | 0.09 -0.69 |
                                    | 0.09 -0.69 |

  • Case study: How to query?

    Observation: user d, who rated (‘Alien’, ‘Serenity’), will be similar to
    user q, who rated (‘Matrix’), although d and q have zero ratings in common!

    d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]
    q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]

    Zero ratings in common, yet similarity > 0: both vectors load mostly on
    the SciFi-concept.
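The similarity of q and d in concept space can be quantified with cosine similarity (a sketch using the concept vectors from the slides; the choice of cosine similarity is mine, as the slides only say "similarity > 0"):

```python
import numpy as np

q_concept = np.array([2.8, 0.6])   # user who rated only 'Matrix'
d_concept = np.array([5.2, 0.4])   # user who rated 'Alien' and 'Serenity'

# Cosine similarity: dot product over the product of norms
cos_sim = q_concept @ d_concept / (
    np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
# Close to 1: both vectors point almost entirely along the SciFi concept
```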

  • SVD: Drawbacks

    + Optimal low-rank approximation in terms of the Frobenius norm

    − Interpretability problem: a singular vector specifies a linear
    combination of all input columns or rows

    − Lack of sparsity: singular vectors are dense, even when A is sparse