Dimensionality Reduction and Principal Components Analysis (wcohen/10-601/601-pca.pdf)


Outline
•  What is dimensionality reduction?
•  Principal Components Analysis (PCA)
   – Example (Bishop, ch 12)
   – PCA vs linear regression
   – PCA as a mixture model variant
   – Implementing PCA
•  Other matrix factorization methods
   – Applications (collaborative filtering)
   – MF with SGD
   – MF vs clustering vs LDA vs ….


A DIMENSIONALITY REDUCTION METHOD YOU ALREADY KNOW….

http://ufldl.stanford.edu/wiki/


Outline
•  What’s new in ANNs in the last 5-10 years?
   – Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation ✔
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   – Convolutional networks (CNNs) ✔
   – Long short-term memory networks (LSTM) ✔
   – Word2vec and embeddings ✔
   – Autoencoders
•  What are the hot research topics for deep learning?


Neural network auto-encoding
•  Assume we would like to learn the following (trivial?) output function:

   Input      Output
   00000001   00000001
   00000010   00000010
   …          …
   10000000   10000000

•  One approach is to use this network

http://ufldl.stanford.edu/wiki/

Neural network auto-encoding
Maybe learning something like this:

http://ufldl.stanford.edu/wiki/


Neural network autoencoding
•  The hidden layer is a compressed version of the data
•  If training error is zero then the training data is perfectly reconstructed
   – It probably won’t be unless you overfit
•  Other similar examples will be imperfectly reconstructed
•  High reconstruction error usually means the example is different from the training data
   – Reconstruction error on a vector x is related to P(x) under the distribution that the auto-encoder was trained on.
•  Autoencoders are related to generative models of the data
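
A minimal sketch of the 8-input example, assuming one hidden layer of 3 sigmoid units, squared-error loss, and plain batch gradient descent. The layer size, learning rate, step count, and the names `train_autoencoder` and `reconstruction_error` are illustrative choices, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden=3, lr=0.5, n_steps=5000, seed=0):
    """Train an n / n_hidden / n sigmoid autoencoder by batch gradient descent."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.normal(scale=0.5, size=(n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n)); b2 = np.zeros(n)
    losses = []
    for _ in range(n_steps):
        H = sigmoid(X @ W1 + b1)           # hidden layer: compressed code
        Y = sigmoid(H @ W2 + b2)           # reconstruction of the input
        err = Y - X
        losses.append(float(np.mean(err ** 2)))   # monitored loss
        # Backpropagate 0.5 * (sum of squared errors) through both layers.
        dY = err * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses

def reconstruction_error(params, x):
    """Squared reconstruction error of a single vector x."""
    W1, b1, W2, b2 = params
    y = sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)
    return float(np.sum((y - x) ** 2))

X = np.eye(8)                               # the 8 one-hot training vectors
params, losses = train_autoencoder(X)
```

Calling `reconstruction_error(params, x)` on a vector unlike the training set (for example one with several bits set) is one way to probe the "high reconstruction error means different from the training data" point.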


(Deep) Stacked Autoencoders

Train, then apply: new dataset H of hidden unit activations for X

http://ufldl.stanford.edu/wiki/

Modeling documents using top 2000 words.


Applications of dimensionality reduction
•  Visualization
   – examining data in 2 or 3 dimensions
•  Semi-supervised learning
   – supervised learning in the reduced space
•  Anomaly detection and data completion
   – coming soon
•  Convenience in data processing
   – Go from words (10^6 dimensions) to word vectors (hundreds of dimensions)
   – Use dense GPU-friendly matrix operations instead of sparse ones


PCA


PCA is basically this picture…

•  with linear units (not sigmoid units)

•  trained to minimize squared error

•  with a constraint that the “hidden units” are orthogonal
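
The three bullets above can be sketched directly: a linear encoder and decoder sharing one weight matrix with orthonormal columns, scored by squared error. This is a sketch on synthetic data; the variable names and the use of `numpy.linalg.svd` to obtain the weights are assumptions, not the slide's method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic data (assumption)
Xc = X - X.mean(axis=0)                                   # center the data

k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                         # d x k weights; columns are orthonormal

H = Xc @ W                           # linear "hidden units" (encoder)
X_hat = H @ W.T                      # linear decoder reuses the same weights
sq_err = np.sum((Xc - X_hat) ** 2)   # squared reconstruction error

# Any other orthonormal weight matrix should reconstruct no better.
Q, _ = np.linalg.qr(rng.normal(size=(5, k)))
sq_err_random = np.sum((Xc - (Xc @ Q) @ Q.T) ** 2)
```

The squared error of the PCA weights equals the sum of the discarded squared singular values, which is why no other choice of k orthonormal directions can do better.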


Motivating Example of PCA


A Motivating Example


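PCA as matrices
•  V is the data matrix: 1000 images × 10,000 pixels, with entries v11 … vij … vnm
•  V[i,j] = pixel j in image i
•  V ~ (n × 2 matrix with rows x1 y1; x2 y2; …; xn yn) times (2 × m matrix with rows a1 a2 … am and b1 b2 … bm)
•  The two rows a1 … am and b1 … bm are the 2 prototypes, PC1 and PC2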
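
A small sketch of this factorization, scaled down from 1000 × 10,000 and using synthetic "images" that are exact mixtures of two hypothetical prototypes. The data is not centered here, so strictly this is a rank-2 truncated SVD; the names `Z` and `P` are introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_pixels = 100, 400                    # scaled down from 1000 x 10,000
prototypes = rng.normal(size=(2, n_pixels))      # hypothetical "true" prototypes
coeffs = rng.normal(size=(n_images, 2))
V = coeffs @ prototypes                          # V[i, j] = pixel j in image i

U, s, Vt = np.linalg.svd(V, full_matrices=False)
Z = U[:, :2] * s[:2]                             # n x 2: rows (x_i, y_i)
P = Vt[:2]                                       # 2 x m: rows PC1 and PC2
V_approx = Z @ P

i = 7
image_i = Z[i, 0] * P[0] + Z[i, 1] * P[1]        # image i as x_i * PC1 + y_i * PC2
```

Because the synthetic data has rank exactly 2, the approximation is exact here; on real images it would only be approximate.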


PCA as matrices: the optimization problem


PCA as vectors: the optimization problem
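
One common way to state the problem, given as an assumption rather than a recovery of the slide's own formulas: with centered data vectors $\mathbf{v}_i$ (the rows of $V$) and orthonormal directions $\mathbf{u}_1,\dots,\mathbf{u}_k$, minimize the squared reconstruction error. The symbols $\mathbf{u}_j$, $Z$ and $P$ are notation chosen here.

```latex
% Vector form: project each example onto k orthonormal directions.
\min_{\mathbf{u}_1,\dots,\mathbf{u}_k}\;
  \sum_{i=1}^{n} \Big\| \mathbf{v}_i - \sum_{j=1}^{k} (\mathbf{u}_j^{\top}\mathbf{v}_i)\,\mathbf{u}_j \Big\|^2
\quad \text{s.t.} \quad \mathbf{u}_j^{\top}\mathbf{u}_l = \delta_{jl}

% Matrix form: Z is n x k (coordinates), P is k x m (components as rows).
\min_{Z,\,P}\; \| V - Z P \|_F^2
\quad \text{s.t.} \quad P P^{\top} = I_k
```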


A cartoon of PCA


A 3D cartoon of PCA


More cartoons


PCA vs linear regression
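
A standard way to see the contrast, sketched on synthetic 2-D data (the data and variable names are assumptions): regression of y on x minimizes vertical squared error, while the first principal component minimizes perpendicular squared error, so the two fitted lines differ.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.5, size=500)    # synthetic data (assumption)
xc, yc = x - x.mean(), y - y.mean()

# Linear regression of y on x: minimizes vertical squared error.
ols_slope = (xc @ yc) / (xc @ xc)

# PCA: the first principal component minimizes perpendicular squared error.
C = np.cov(np.vstack([xc, yc]))                  # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                             # direction of largest variance
pca_slope = pc1[1] / pc1[0]
```

For correlated but noisy data the PCA line is steeper in magnitude than the regression line, because regression treats x as noise-free while PCA treats both coordinates symmetrically.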


PCA vs mixtures of Gaussians


PCA: IMPLEMENTATION


Finding principal components of a matrix
•  There are two algorithms I like
   – EM (Roweis, NIPS 1997)
   – Eigenvectors of the correlation matrix (next)
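
A sketch of the EM option, following the usual zero-noise form of Roweis's algorithm as I understand it; the function name `em_pca`, the iteration count, and the synthetic data are assumptions. The E-step solves for latent coordinates given the current basis, and the M-step refits the basis given those coordinates.

```python
import numpy as np

def em_pca(X, k, n_iters=100, seed=0):
    """EM for PCA in the zero-noise limit. X is n x d and assumed centered.
    Returns a d x k orthonormal basis for the estimated principal subspace."""
    rng = np.random.default_rng(seed)
    Y = X.T                                     # d x n, columns are examples
    W = rng.normal(size=(Y.shape[0], k))        # random initial basis
    for _ in range(n_iters):
        Z = np.linalg.solve(W.T @ W, W.T @ Y)   # E-step: latent coordinates, k x n
        W = Y @ Z.T @ np.linalg.inv(Z @ Z.T)    # M-step: new basis, d x k
    Q, _ = np.linalg.qr(W)                      # orthonormalize the final basis
    return Q

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])
X = X - X.mean(axis=0)
Q = em_pca(X, k=2)
```

Note that this recovers the subspace spanned by the top k components, not necessarily the individual components; a small eigendecomposition within that subspace would be needed to separate them.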


PCA as vectors: the optimization problem


PCA as matrices: the optimization problem


Implementing PCA
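
A minimal sketch of the eigenvector approach, assuming the usual steps (center, form the covariance matrix, eigendecompose, keep the top k). The function name `pca_eig` and the synthetic data are illustrative.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal components from eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # 1. center the data
    C = (Xc.T @ Xc) / Xc.shape[0]               # 2. d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # 3. eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1][:k]       # 4. indices of the k largest
    components = eigvecs[:, order]              #    d x k, one component per column
    return mean, components, eigvals[order]

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
mean, components, variances = pca_eig(X, k=2)
Z = (X - mean) @ components                     # low-dimensional coordinates
```

Each returned eigenvalue is the variance of the data along the corresponding component, so `Z.var(axis=0)` should match `variances`.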


Implementing PCA: why does this work?
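
The standard variance-maximization argument (the formulation used in Bishop, ch 12) is sketched below as an assumption about what this slide covers. Here $C$ is the covariance matrix of the centered data and $\mathbf{w}$ is a unit-length direction.

```latex
% Variance of the data projected onto w, maximized subject to unit length:
\max_{\mathbf{w}}\; \mathbf{w}^{\top} C\, \mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{\top}\mathbf{w} = 1

% Lagrangian and stationarity condition:
L(\mathbf{w}, \lambda) = \mathbf{w}^{\top} C\, \mathbf{w} - \lambda\,(\mathbf{w}^{\top}\mathbf{w} - 1),
\qquad
\frac{\partial L}{\partial \mathbf{w}} = 2 C \mathbf{w} - 2 \lambda \mathbf{w} = 0
\;\Longrightarrow\; C \mathbf{w} = \lambda \mathbf{w}

% At any stationary point the projected variance equals the eigenvalue:
\mathbf{w}^{\top} C\, \mathbf{w} = \lambda\, \mathbf{w}^{\top}\mathbf{w} = \lambda
```

So the maximizing direction is the eigenvector of $C$ with the largest eigenvalue, and repeating the argument in the orthogonal complement gives the remaining components.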


Some more examples of PCA: Eigenfaces

PCA for image denoising
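
A sketch of the denoising idea on synthetic data, assuming the clean "images" lie near a low-dimensional subspace: project the noisy data onto the top k components and reconstruct. The function name `pca_denoise`, the rank, and the noise level are illustrative assumptions.

```python
import numpy as np

def pca_denoise(X_noisy, k):
    """Project onto the top-k principal components and map back."""
    mean = X_noisy.mean(axis=0)
    Xc = X_noisy - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                # d x k principal directions
    return mean + (Xc @ W) @ W.T                # reconstruction from k coordinates

rng = np.random.default_rng(5)
clean = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))   # rank-3 synthetic data
noisy = clean + rng.normal(scale=0.5, size=clean.shape)
denoised = pca_denoise(noisy, k=3)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

The reconstruction discards the noise that falls outside the k-dimensional subspace, which is most of it when k is much smaller than the number of pixels.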


PCA FOR MODELING TEXT (SVD = SINGULAR VALUE DECOMPOSITION)


A Scalability Problem with PCA
•  Covariance matrix is large in high dimensions
   – With d features covariance matrix is d*d
•  SVD is a closely-related method that can be implemented more efficiently in high dimensions
   – Don’t explicitly compute covariance matrix
   – Instead decompose X to X = U S V^T
   – S is k*k where k<<d
   – S is diagonal and S[i,i]=sqrt(λi) for V[:,i]
   – Columns of V ~= principal components
   – Rows of US ~= embedding for examples
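
The relationships in these bullets can be checked numerically. This is a sketch on synthetic centered data, assuming λi denotes the eigenvalues of X^T X (not divided by n); the eigendecomposition is computed only for comparison.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 8)) * np.linspace(3.0, 0.5, 8)
X = X - X.mean(axis=0)                              # centered data, n x d

U, s, Vt = np.linalg.svd(X, full_matrices=False)    # X = U @ diag(s) @ Vt
V = Vt.T

# Eigenvalues of X^T X, for comparison only; SVD avoids forming this matrix.
lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

k = 2
components = V[:, :k]                               # columns of V ~ principal components
embedding = U[:, :k] * s[:k]                        # rows of US ~ embedding for examples
```

The embedding computed from U and S is the same as projecting X onto the components, so either route gives the low-dimensional coordinates.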


SVD example


•  The Neatest Little Guide to Stock Market Investing
•  Investing For Dummies, 4th Edition
•  The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
•  The Little Book of Value Investing
•  Value Investing: From Graham to Buffett and Beyond
•  Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
•  Investing in Real Estate, 5th Edition
•  Stock Investing For Dummies
•  Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/


https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TFIDF counts would be better
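
A sketch of TFIDF weighting for the nine titles, assuming one common variant (raw term count times log(N / document frequency)) and a simple lowercase alphanumeric tokenizer; both choices are assumptions, and other TFIDF variants exist.

```python
import math
import re
from collections import Counter

titles = [
    "The Neatest Little Guide to Stock Market Investing",
    "Investing For Dummies, 4th Edition",
    "The Little Book of Common Sense Investing: The Only Way to Guarantee "
    "Your Fair Share of Stock Market Returns",
    "The Little Book of Value Investing",
    "Value Investing: From Graham to Buffett and Beyond",
    "Rich Dad's Guide to Investing: What the Rich Invest in, "
    "That the Poor and the Middle Class Do Not!",
    "Investing in Real Estate, 5th Edition",
    "Stock Investing For Dummies",
    "Rich Dad's Advisors: The ABC's of Real Estate Investing: "
    "The Secrets of Finding Hidden Profits Most Investors Miss",
]

# Term counts per document, vocabulary, and document frequency of each term.
docs = [Counter(re.findall(r"[a-z0-9]+", t.lower())) for t in titles]
vocab = sorted(set().union(*docs))
df = {w: sum(w in d for d in docs) for w in vocab}

def tfidf(doc, word):
    """Assumed weighting: raw term count times log(N / document frequency)."""
    return doc[word] * math.log(len(docs) / df[word])

V = [[tfidf(d, w) for w in vocab] for d in docs]   # V[i][j]: term j in doc i
```

A term such as "investing", which occurs in every title, gets weight zero under this scheme, while rarer terms such as "dummies" keep a positive weight; that is the sense in which TFIDF is more informative than raw counts.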


Recovering latent factors in a matrix
•  V is the doc term matrix: n documents × m terms, with entries v11 … vij … vnm
•  V[i,j] = TFIDF score of term j in doc i
•  V ~ (n × 2 matrix with rows x1 y1; x2 y2; …; xn yn) times (2 × m matrix with rows a1 a2 … am and b1 b2 … bm)
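
A sketch of recovering two latent factors from a small doc term matrix with a truncated SVD. For brevity it uses raw counts rather than TFIDF scores and only five of the titles; both simplifications, and the names `doc_factors` and `term_factors`, are assumptions.

```python
import re
import numpy as np

titles = [
    "Investing For Dummies, 4th Edition",
    "The Little Book of Value Investing",
    "Value Investing: From Graham to Buffett and Beyond",
    "Investing in Real Estate, 5th Edition",
    "Stock Investing For Dummies",
]
tokens = [re.findall(r"[a-z0-9]+", t.lower()) for t in titles]
vocab = sorted({w for doc in tokens for w in doc})

# V[i, j] = count of term j in doc i (n docs x m terms).
V = np.array([[doc.count(w) for w in vocab] for doc in tokens], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(V, full_matrices=False)
doc_factors = U[:, :k] * s[:k]       # n x 2: rows (x_i, y_i), one per document
term_factors = Vt[:k]                # 2 x m: rows (a_1 .. a_m) and (b_1 .. b_m)
V_approx = doc_factors @ term_factors
```

Plotting the rows of `doc_factors` as 2-D points is the usual way to inspect which documents the two factors group together.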


[Figure labels: “Investing for real estate”; “Rich Dad’s Advisor’s: The ABCs of Real Estate Investment …”; “The little book of common sense investing: …”; “Neatest Little Guide to Stock Market Investing”]