Machine Learning Dimensionality Reduction

Jeff Howbert, Introduction to Machine Learning, Winter 2012

slides thanks to Xiaoli Fern (CS534, Oregon State Univ., 2011)

Dimensionality reduction

Many modern data domains involve huge numbers of features / dimensions:

– Documents: thousands of words, millions of bigrams

– Images: thousands to millions of pixels

– Genomics: thousands of genes, millions of DNA polymorphisms

Why reduce dimensions?

High dimensionality has many costs:

– Redundant and irrelevant features degrade the performance of some ML algorithms.

– Interpretation and visualization become difficult.

– Computation may become infeasible; what if your algorithm scales as O(n^3)?

– Curse of dimensionality.


[Figures: geometric intuition for PCA. Fit the direction of greatest variance in the data, then the next; repeat until m lines.]

Steps in principal component analysis

– Mean center the data.

– Compute the covariance matrix Σ.

– Calculate the eigenvalues and eigenvectors of Σ:

  – The eigenvector with the largest eigenvalue λ_1 is the 1st principal component (PC).

  – The eigenvector with the kth largest eigenvalue λ_k is the kth PC.

  – λ_k / Σ_i λ_i is the proportion of variance captured by the kth PC.
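A minimal sketch of these steps in NumPy (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the covariance matrix.

    X: (n_samples, n_features) data matrix.
    Returns PCs (as columns), eigenvalues, and variance ratios,
    all sorted by decreasing eigenvalue.
    """
    # Step 1: mean center the data
    Xc = X - X.mean(axis=0)
    # Step 2: compute the covariance matrix (features x features)
    cov = np.cov(Xc, rowvar=False)
    # Step 3: eigenvalues and eigenvectors (eigh handles symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; sort descending so column k-1 is the kth PC
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # lambda_k / sum_i lambda_i: proportion of variance captured by the kth PC
    var_ratio = eigvals / eigvals.sum()
    return eigvecs, eigvals, var_ratio
```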

Applying principal component analysis

The full set of PCs comprises a new orthogonal basis for feature space, whose axes are aligned with the directions of maximum variance in the original data.

Projecting the original data onto the first k PCs gives a reduced-dimensionality representation of the data.

Transforming the reduced-dimensionality projection back into the original space gives a reduced-dimensionality reconstruction of the original data.

The reconstruction will have some error, but it is often small and acceptable given the other benefits of dimensionality reduction.
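Continuing the sketch above, projection onto the first k PCs and reconstruction back in the original space might look like this (again illustrative; `eigvecs` comes from the `pca` function sketched earlier):

```python
import numpy as np

def project(X, eigvecs, k):
    """Project mean-centered data onto the first k PCs: (n_samples, k)."""
    return (X - X.mean(axis=0)) @ eigvecs[:, :k]

def reconstruct(X, eigvecs, k):
    """Map the k-dimensional projection back to the original space."""
    mu = X.mean(axis=0)
    Z = (X - mu) @ eigvecs[:, :k]
    return Z @ eigvecs[:, :k].T + mu

# Reconstruction error shrinks as k grows, and is zero at k = n_features:
# err = np.mean((X - reconstruct(X, eigvecs, k)) ** 2)
```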

PCA example

[Figures: original data; mean-centered data with PCs overlaid.]

PCA example

[Figures: original data projected into full PC space; original data reconstructed using only a single PC.]


Choosing the dimension k
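The slide's figure is not recoverable. One common heuristic (an assumption here; the lecture may use a different criterion) is to pick the smallest k whose PCs capture a target fraction of the total variance, using the eigenvalue ratios from the steps above:

```python
import numpy as np

def choose_k(eigvals, target=0.95):
    """Smallest k whose top-k PCs capture `target` of the total variance."""
    ratios = np.sort(eigvals)[::-1] / eigvals.sum()
    cumulative = np.cumsum(ratios)
    # First index where the cumulative variance reaches the target
    return int(np.searchsorted(cumulative, target) + 1)
```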


PCA: a useful preprocessing step

Helps reduce computational complexity, and can help supervised learning:

– Reduced dimension → simpler hypothesis space.

– Smaller VC dimension → less risk of overfitting.

PCA can also be seen as noise reduction. Caveats:

– Fails when the data consists of multiple separate clusters.

– Directions of greatest variance may not be the most informative (i.e., may not have the greatest classification power).
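To illustrate the preprocessing pattern, here is a small sketch using scikit-learn (a library choice of this rewrite, not mentioned in the slides); passing a float as n_components keeps enough PCs to explain that fraction of variance:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 8x8 digit images: 64 pixel features per sample
X, y = load_digits(return_X_y=True)

# Reduce to the PCs explaining 90% of the variance, then classify
pipe = make_pipeline(PCA(n_components=0.90),
                     LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```

Note that the caveat above still applies: the variance criterion is unsupervised, so the retained directions are not guaranteed to be the most discriminative for the classification task.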


Off-the-shelf classifiers

Per Tom Dietterich: "Methods that can be applied directly to data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure."


Off-the-shelf criteria

slide thanks to Tom Dietterich (CS534, Oregon State Univ., 2005)

Practical advice on machine learning

From Andrew Ng at Stanford:

slides: http://cs229.stanford.edu/materials/ML-advice.pdf

video: http://www.youtube.com/v/sQ8T9b-uGVE (starting at 24:56)