
Department of Computer Science
CSCI 5622: Machine Learning

Chenhao Tan
Lecture 16: Dimensionality Reduction

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen


Midterm

A. Review session

B. Flipped classroom

C. Go over the example midterm

D. Clustering!


Learning objectives

• Understand what unsupervised learning is for

• Learn principal component analysis

• Learn singular value decomposition


Supervised learning

Data: X, with labels: Y

Unsupervised learning

Data: X only; we look for latent structure: Z

When do we need unsupervised learning?

• Acquiring labels is expensive
• You may not even know what labels to acquire
• Exploratory data analysis
• Learn patterns/representations that can be useful for supervised learning (representation learning)
• Generate data
• …

https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/

Unsupervised learning


• Dimensionality reduction

• Clustering

• Topic modeling


Principal Component Analysis - Motivation

A dataset's features are almost certainly correlated, which makes it hard to see hidden structure.

To make this easier, let's try to reduce the data to one dimension.

We need to shift our perspective:

• Change the definition of up-down-left-right
• Choose new features as linear combinations of the old features
• This is a change of feature basis


Important: Center and normalize the data before performing PCA. We will assume this has already been done throughout this lecture.
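A minimal sketch of this preprocessing step, assuming NumPy (the function name and data are ours, for illustration):

```python
import numpy as np

def standardize(X):
    """Center each feature to zero mean and scale it to unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Example: three features, the third correlated with the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] += 0.8 * X[:, 0]
X_std = standardize(X)
```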


Proceed incrementally (a sketch of this greedy procedure follows the list):

• If we could choose only one combination to describe the data, which combination leads to the least loss of information?
• Once we've found that one, look for another, perpendicular to the first, that retains the next most information
• Repeat until done (or good enough)
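A minimal NumPy sketch of this greedy view (`top_direction` and `greedy_pca` are our illustrative names, not from the slides):

```python
import numpy as np

def top_direction(X):
    """Unit vector w that maximizes the variance of the projection X @ w."""
    C = np.cov(X, rowvar=False)        # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return vecs[:, -1]                 # eigenvector of the largest eigenvalue

def greedy_pca(X, k):
    """Find k orthogonal directions one at a time, deflating after each."""
    X = X - X.mean(axis=0)
    directions = []
    for _ in range(k):
        w = top_direction(X)
        directions.append(w)
        X = X - np.outer(X @ w, w)     # remove the variance already captured by w
    return np.column_stack(directions)
```

In practice the loop is unnecessary: all the directions come out of a single eigendecomposition, as shown in "The How" below.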


The best vector to project onto is called the 1st principal component. What properties should it have?

• It should capture the largest variance in the data
• It should probably be a unit vector

After we've found the first, look for the second, which:

• Captures the largest amount of leftover variance
• Should probably be a unit vector
• Should be orthogonal to the one that came before it


Main idea: The principal components give a new perpendicular coordinate system in which to view the data, where each principal component describes successively less information.

So far, all we've done is a change of basis on the feature space. But when do we reduce the dimension?

Picture data points in a 3-D feature space. What if the points lay mostly along a single vector?

The other two principal components are still there, but they do not carry much information.

Throw them away and work with the low-dimensional representation: reduce the 3-D data to 1-D!
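A minimal sketch of this 3-D-to-1-D reduction, assuming NumPy and synthetic data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points lying mostly along one direction in 3-D, plus small noise.
t = rng.normal(size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]]) / np.sqrt(6.0)
X = t @ direction + 0.05 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                      # center the data
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w1 = vecs[:, -1]                             # 1st principal component
z = Xc @ w1                                  # 1-D representation of each point
X_approx = np.outer(z, w1) + X.mean(axis=0)  # reconstruction back in 3-D
```

Because the points hug a single line, `X_approx` stays close to `X` even though each point is stored as a single number in `z`.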

Principal Component Analysis – The How

But how do we find w?
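The derivation on these slides did not survive extraction, but the standard PCA answer is: maximize the projected variance wᵀCw subject to wᵀw = 1, where C is the covariance matrix. Setting the gradient of the Lagrangian wᵀCw − λ(wᵀw − 1) to zero gives Cw = λw, so w is the eigenvector of C with the largest eigenvalue, and later components are the remaining eigenvectors in decreasing eigenvalue order. A minimal NumPy sketch:

```python
import numpy as np

def pca(X, k):
    """Top-k principal components (columns of W) and the projected data."""
    Xc = X - X.mean(axis=0)            # center (assumed already done in lecture)
    C = np.cov(Xc, rowvar=False)       # covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]           # top-k eigenvectors, largest first
    return W, Xc @ W
```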


PCA – Dimensionality reduction

Questions (one common answer is sketched below):
• How do we reduce dimensionality?
• How much stuff should we keep?
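A common recipe, as a hedged sketch (the slides do not spell out a rule; the 95% threshold below is a conventional heuristic, not necessarily the course's): keep the smallest k whose components explain a target share of the total variance.

```python
import numpy as np

def choose_k(X, target=0.95):
    """Smallest k whose top components explain `target` of the variance."""
    Xc = X - X.mean(axis=0)
    vals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    explained = np.cumsum(vals) / np.sum(vals)                 # cumulative share
    return int(np.searchsorted(explained, target) + 1)
```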

Quiz


PCA - applications

Connecting PCA and SVD

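The content of this slide was graphical, but the connection itself is standard: if the centered data matrix factors as Xc = UΣVᵀ (the SVD), then the covariance is C = XcᵀXc/(n−1) = V(Σ²/(n−1))Vᵀ, so the columns of V are the principal components and the explained variances are σᵢ²/(n−1). A quick NumPy check of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
eig_vals, _ = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of C equal squared singular values over (n - 1).
assert np.allclose(np.sort(S**2 / (len(Xc) - 1)), np.sort(eig_vals))
```

In practice the SVD route is usually preferred numerically, since it avoids forming XcᵀXc explicitly.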

SVD Applications


Wrap up


Dimensionality reduction can be a useful way to

• explore data
• visualize data
• represent data