
Department of Computer Science
CSCI 5622: Machine Learning

Chenhao Tan
Lecture 16: Dimensionality Reduction

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen


Midterm

A. Review session

B. Flipped classroom

C. Go over the example midterm

D. Clustering!


Learning objectives

• Understand what unsupervised learning is for

• Learn principal component analysis

• Learn singular value decomposition


Supervised learning

Data: X, with labels: Y

Unsupervised learning

Data: X only; we look for latent structure: Z

When do we need unsupervised learning?

• Acquiring labels is expensive
• You may not even know what labels to acquire
• Exploratory data analysis
• Learn patterns/representations that can be useful for supervised learning (representation learning)
• Generate data
• …

https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/

Unsupervised learning


• Dimensionality reduction

• Clustering

• Topic modeling


Principal Component Analysis - Motivation

A dataset's features are almost certainly correlated, which makes it hard to see hidden structure.

To make this easier, let's try to reduce the data to one dimension.

We need to shift our perspective:

• Change the definition of up-down-left-right
• Choose new features as linear combinations of the old features
• This is a change of feature basis


Important: Center and normalize the data before performing PCA. We will assume this has already been done throughout this lecture.
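A minimal sketch of this preprocessing step, assuming NumPy (the function name and data are ours, for illustration):

```python
import numpy as np

def standardize(X):
    """Center each feature to zero mean and scale it to unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Example: three features, the third correlated with the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] += 0.8 * X[:, 0]
X_std = standardize(X)
```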


Proceed incrementally (a sketch of this greedy procedure follows the list):

• If we could choose only one combination to describe the data, which combination leads to the least loss of information?
• Once we've found that one, look for another, perpendicular to the first, that retains the next most information
• Repeat until done (or good enough)
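A minimal NumPy sketch of this greedy view (`top_direction` and `greedy_pca` are our illustrative names, not from the slides):

```python
import numpy as np

def top_direction(X):
    """Unit vector w that maximizes the variance of the projection X @ w."""
    C = np.cov(X, rowvar=False)        # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return vecs[:, -1]                 # eigenvector of the largest eigenvalue

def greedy_pca(X, k):
    """Find k orthogonal directions one at a time, deflating after each."""
    X = X - X.mean(axis=0)
    directions = []
    for _ in range(k):
        w = top_direction(X)
        directions.append(w)
        X = X - np.outer(X @ w, w)     # remove the variance already captured by w
    return np.column_stack(directions)
```

In practice the loop is unnecessary: all the directions come out of a single eigendecomposition, as shown in "The How" below.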


The best vector to project onto is called the 1st principal component. What properties should it have?

• It should capture the largest variance in the data
• It should probably be a unit vector

After we've found the first, look for the second, which:

• Captures the largest amount of leftover variance
• Should probably be a unit vector
• Should be orthogonal to the one that came before it


Main idea: The principal components give a new perpendicular coordinate system in which to view the data, where each principal component describes successively less information.

So far, all we've done is a change of basis on the feature space. But when do we reduce the dimension?

Picture data points in a 3-D feature space. What if the points lay mostly along a single vector?

The other two principal components are still there, but they do not carry much information.

Throw them away and work with the low-dimensional representation: reduce the 3-D data to 1-D!
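A minimal sketch of this 3-D-to-1-D reduction, assuming NumPy and synthetic data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points lying mostly along one direction in 3-D, plus small noise.
t = rng.normal(size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]]) / np.sqrt(6.0)
X = t @ direction + 0.05 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                      # center the data
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w1 = vecs[:, -1]                             # 1st principal component
z = Xc @ w1                                  # 1-D representation of each point
X_approx = np.outer(z, w1) + X.mean(axis=0)  # reconstruction back in 3-D
```

Because the points hug a single line, `X_approx` stays close to `X` even though each point is stored as a single number in `z`.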

Principal Component Analysis – The How

But how do we find w?
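The derivation on these slides did not survive extraction, but the standard PCA answer is: maximize the projected variance wᵀCw subject to wᵀw = 1, where C is the covariance matrix. Setting the gradient of the Lagrangian wᵀCw − λ(wᵀw − 1) to zero gives Cw = λw, so w is the eigenvector of C with the largest eigenvalue, and later components are the remaining eigenvectors in decreasing eigenvalue order. A minimal NumPy sketch:

```python
import numpy as np

def pca(X, k):
    """Top-k principal components (columns of W) and the projected data."""
    Xc = X - X.mean(axis=0)            # center (assumed already done in lecture)
    C = np.cov(Xc, rowvar=False)       # covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]           # top-k eigenvectors, largest first
    return W, Xc @ W
```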


PCA – Dimensionality reduction

Questions (one common answer is sketched below):
• How do we reduce dimensionality?
• How much stuff should we keep?
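A common recipe, as a hedged sketch (the slides do not spell out a rule; the 95% threshold below is a conventional heuristic, not necessarily the course's): keep the smallest k whose components explain a target share of the total variance.

```python
import numpy as np

def choose_k(X, target=0.95):
    """Smallest k whose top components explain `target` of the variance."""
    Xc = X - X.mean(axis=0)
    vals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    explained = np.cumsum(vals) / np.sum(vals)                 # cumulative share
    return int(np.searchsorted(explained, target) + 1)
```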

Quiz


PCA - applications

Connecting PCA and SVD

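The content of this slide was graphical, but the connection itself is standard: if the centered data matrix factors as Xc = UΣVᵀ (the SVD), then the covariance is C = XcᵀXc/(n−1) = V(Σ²/(n−1))Vᵀ, so the columns of V are the principal components and the explained variances are σᵢ²/(n−1). A quick NumPy check of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
eig_vals, _ = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of C equal squared singular values over (n - 1).
assert np.allclose(np.sort(S**2 / (len(Xc) - 1)), np.sort(eig_vals))
```

In practice the SVD route is usually preferred numerically, since it avoids forming XcᵀXc explicitly.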

SVD Applications


Wrap up


Dimensionality reduction can be a useful way to

• explore data
• visualize data
• represent data