Topics in Machine Learning - University of Minnesotalerman/bootcamp/machine_learning_cours… ·...


Page 1

Gilad Lerman

School of Mathematics

University of Minnesota

Topics in Machine Learning

Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng

Page 3

Machine Learning - Motivation

• Arthur Samuel (1959): “Field of study that gives computers the ability to learn without being explicitly programmed”

• Lies at the intersection of computer science, statistics, optimization, …

• Three categories (the boundaries between them are soft):

Supervised learning

Unsupervised learning

Reinforcement learning

Page 4

Difficulties

• Understanding the methods (requires knowledge of various areas)

• Understanding data and application areas

• Sometimes hard to establish mathematical guarantees

• Sometimes hard to code and test

• Fast-developing area of research

Page 5

Simplification

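• To avoid such difficulties, but still obtain a fine level of knowledge in 2 days, we’ll follow the book An Introduction to Statistical Learning (G. James, D. Witten, T. Hastie, R. Tibshirani):

• The book is available online

• Plan: the last 3 chapters (8-10) and a bit more…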

Page 6

Review

• Supervised learning (training and test sets) vs. unsupervised learning

• Examples of supervised learning: regression, classification

• Examples of unsupervised learning: density/function estimation, clustering, dimension reduction

• Recall: regression, bias-variance tradeoff, resampling (e.g., cross-validation), linear and non-linear models
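As a reminder of how cross-validation estimates test error, here is a minimal k-fold sketch. It assumes a single predictor and a polynomial least-squares fit via NumPy's `polyfit`; the function name `kfold_cv_mse` and its defaults are illustrative, not taken from the slides.

```python
import numpy as np


def kfold_cv_mse(x, y, k=5, degree=1, seed=0):
    """Estimate the test MSE of a polynomial fit of the given degree by k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))            # shuffle once
    folds = np.array_split(idx, k)           # k (nearly) equal folds
    fold_mse = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)      # everything outside the held-out fold
        coef = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coef, x[fold])
        fold_mse.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(fold_mse))          # average over the k folds


# Usage on simulated data: y = 2 + 3x + noise with standard deviation 0.5
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = 2 + 3 * x + rng.normal(0, 0.5, 100)
print(kfold_cv_mse(x, y, k=5, degree=1))
```

For a correctly specified linear model the CV estimate should land near the noise variance (0.25 here), though the exact value depends on the seed.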

Page 7

Quick Review of Regression and Nearest Neighbors

• Regression predicts a (quantitative) response variable Y in terms of input variables (predictors) X1,…,Xp, given n samples in $\mathbb{R}^p$; denote X=(X1,…,Xp)

• The regression function f(x)=E(Y|X=x) minimizes the mean squared prediction error $E[(Y - g(X))^2 \mid X = x]$ over all functions g

• We cannot compute f exactly, since for a given x we have few, if any, observations with X = x

Page 8

Estimating f by NN
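A minimal sketch of nearest-neighbor regression, assuming Euclidean distance and a plain average over the k closest training points (the helper `knn_regress` is hypothetical). It estimates $f(x_0) = E(Y \mid X = x_0)$ by averaging responses of observations whose X is near $x_0$ rather than exactly equal to it.

```python
import numpy as np


def knn_regress(X_train, y_train, x0, k=3):
    """Estimate f(x0) by averaging y over the k training points closest to x0."""
    dists = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    return float(np.mean(y_train[nearest]))


X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 10.0, 20.0, 30.0])
print(knn_regress(X_train, y_train, np.array([1.4]), k=2))
```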

Page 9

Remarks on NN and Classification

• NN averaging works well only for small p (roughly p ≤ 4) and sufficiently large n

• Nearest neighbors tend to be far away in high dimensions

• Can use kernel or spline smoothing instead

• Other common methods: parametric and structured models

Page 10

Neighborhoods in Increasing Dimensions
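The following sketch illustrates why neighborhoods stop being local as p grows, under the assumption (not stated on the slides) that the data are uniform on the unit hypercube $[0,1]^p$. A sub-cube containing a fraction r of the data needs edge length $r^{1/p}$, about 0.79 for r = 0.1 and p = 10; the simulation also shows the average nearest-neighbor distance growing with p. Both helper functions are hypothetical.

```python
import numpy as np


def edge_length(fraction, p):
    """Edge of a sub-cube of [0,1]^p expected to contain `fraction` of uniform data."""
    return fraction ** (1.0 / p)


def mean_nn_distance(n, p, seed=0):
    """Average distance from each of n uniform points in [0,1]^p to its nearest neighbor."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise distances
    np.fill_diagonal(d, np.inf)                                 # exclude each point itself
    return float(d.min(axis=1).mean())


for p in (1, 2, 10, 100):
    print(f"p = {p:3d}: edge for 10% of data = {edge_length(0.1, p):.3f}, "
          f"mean NN distance (n = 200) = {mean_nn_distance(200, p):.3f}")
```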

Page 11

More on Regression

• Assessing model accuracy:

Page 12

More on Regression

Flexibility = degrees of freedom (each square marks the method drawn in the same color). The dashed line is the irreducible error (explained later).
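To see training and test error diverge as flexibility grows, this sketch fits polynomials of increasing degree to simulated data. The true function $\sin(2\pi x)$, the noise level 0.3 and the sample sizes are arbitrary assumptions; polynomial degree plays the role of flexibility (degree d has d + 1 degrees of freedom).

```python
import numpy as np

rng = np.random.default_rng(0)


def true_f(x):
    return np.sin(2 * np.pi * x)        # assumed "true" regression function


sigma = 0.3                              # assumed noise standard deviation
x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(0, sigma, 30)
x_test = rng.uniform(0, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, sigma, 1000)

degrees = list(range(1, 9))
train_mse, test_mse = [], []
for d in degrees:
    coef = np.polyfit(x_train, y_train, d)            # least-squares fit, d + 1 parameters
    train_mse.append(np.mean((y_train - np.polyval(coef, x_train)) ** 2))
    test_mse.append(np.mean((y_test - np.polyval(coef, x_test)) ** 2))

for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training MSE can only decrease as the degree grows (the models are nested), while test MSE typically decreases and then rises again; the exact numbers depend on the seed.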

Page 13

More on Regression

Page 14

More on Regression

Page 15

More on Regression

Page 16

On Regression Error

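• For an estimator $\hat f$ learned on a training set, the mean squared error at $x$ is
$E[(Y - \hat f(X))^2 \mid X = x]$

• Assume $Y = f(X) + \varepsilon$, where $\varepsilon$ is independent noise with mean zero; then
$E[(Y - \hat f(X))^2 \mid X = x] = E[(f(X) + \varepsilon - \hat f(X))^2 \mid X = x] = E[(f(X) - \hat f(X))^2 \mid X = x] + \mathrm{Var}(\varepsilon)$
(the cross term $2E[\varepsilon\,(f(X) - \hat f(X)) \mid X = x]$ vanishes because $\varepsilon$ has mean zero and is independent of $X$ and of the training sample)

• $\mathrm{Var}(\varepsilon)$ is the irreducible error

• $E[(f(X) - \hat f(X))^2 \mid X = x]$ is the reducible error ($\hat f(X)$ depends on the random training sample)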

Page 17

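Regression Error: Bias and Variance

• $E[(f(X) - \hat f(X))^2 \mid X = x] = E[(\hat f(X) - E[\hat f(X) \mid X = x])^2 \mid X = x] + (E[\hat f(X) \mid X = x] - f(x))^2 = \mathrm{Var}(\hat f(X) \mid X = x) + \mathrm{Bias}^2(\hat f(X) \mid X = x)$

• $E[(Y - \hat f(X))^2 \mid X = x] = \mathrm{Var}(\hat f(X) \mid X = x) + \mathrm{Bias}^2(\hat f(X) \mid X = x) + \mathrm{Var}(\varepsilon)$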
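The decomposition can be checked numerically. This Monte Carlo sketch assumes a hypothetical true function $f(x) = \sin(2\pi x)$, Gaussian noise with $\sigma = 0.3$, training sets of size 30 and polynomial fits; it refits $\hat f$ on many independent training sets and estimates the variance and squared bias of $\hat f(x_0)$ at a single point $x_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3                       # assumed noise standard deviation, Var(eps) = sigma**2
x0 = 0.25                         # the point x at which the error is decomposed


def f(x):
    return np.sin(2 * np.pi * x)  # assumed true regression function


def fhat_at_x0(degree, n=30):
    """Fit a polynomial to a fresh random training set and return fhat(x0)."""
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    return np.polyval(np.polyfit(x, y, degree), x0)


def decompose(degree, reps=2000):
    preds = np.array([fhat_at_x0(degree) for _ in range(reps)])
    variance = preds.var()                          # Var(fhat(X) | X = x0)
    bias_sq = (preds.mean() - f(x0)) ** 2           # Bias^2(fhat(X) | X = x0)
    reducible = np.mean((f(x0) - preds) ** 2)       # E(f(X) - fhat(X) | X = x0)^2
    y0 = f(x0) + rng.normal(0, sigma, reps)         # fresh responses at x0
    test_mse = np.mean((y0 - preds) ** 2)           # E(Y - fhat(X) | X = x0)^2
    return variance, bias_sq, reducible, test_mse


for d in (1, 3, 9):
    v, b2, red, mse = decompose(d)
    print(f"degree {d}: var {v:.4f}, bias^2 {b2:.4f}, "
          f"var + bias^2 + sigma^2 {v + b2 + sigma**2:.4f}, MC test MSE {mse:.4f}")
```

For the empirical quantities, `reducible` equals `variance + bias_sq` up to rounding, mirroring the first identity; the Monte Carlo test MSE should be close to variance + bias² + σ², with agreement improving as `reps` grows.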

Page 18

Bias-Variance Tradeoff

Two other tradeoffs:

Page 19

Bias-Variance Tradeoff

Page 20

Quick Review of Classification and Nearest Neighbors

• Classification:

Page 21

Quick Review of Classification and Nearest Neighbors

• Example:

Page 22

Quick Review of Classification and Nearest Neighbors

Page 23

Quick Review of Classification and Nearest Neighbors

Page 24

Quick Review of Classification and Nearest Neighbors

Page 25

Quick Review of Classification and Nearest Neighbors

Page 26

Quick Review of Classification and Nearest Neighbors
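A minimal k-NN classifier sketch, assuming Euclidean distance and a majority vote among the k nearest training points. The helper `knn_classify` is hypothetical, and breaking ties in favor of the label seen first (nearest first) is one arbitrary choice.

```python
from collections import Counter

import numpy as np


def knn_classify(X_train, y_train, x0, k=3):
    """Assign x0 to the most common class among its k nearest training points."""
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]                   # neighbors ordered by distance
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]                 # ties go to the label seen first


X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array(["blue"] * 3 + ["orange"] * 3)
print(knn_classify(X_train, y_train, np.array([0.2, 0.2])))
print(knn_classify(X_train, y_train, np.array([5.5, 5.5])))
```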

Page 27

Chapter 9: SVM

Page 28

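Separation of 2 Classes by a hyperplane

• Training set: 𝑛 points $(x_{i,1}, \dots, x_{i,p})$, $1 \le i \le n$, with 𝑛 labels $y_i \in \{-1, 1\}$, $1 \le i \le n$

• A separating hyperplane (if one exists) satisfies:
$y_i(\beta_0 + \beta_1 x_{i,1} + \dots + \beta_p x_{i,p}) > 0$ for all $1 \le i \le n$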

Page 29

Separation of 2 Classes by a hyperplane

Example:

Page 30

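Separation of 2 Classes by a hyperplane

• If a separating hyperplane exists, then for a test observation $x^*$ a classifier is obtained from the sign of
$f(x^*) = \beta_0 + \beta_1 x^*_1 + \dots + \beta_p x^*_p$
(negative sign → −1, positive sign → 1)

• The magnitude of $f(x^*)$ indicates confidence in the class assignment, since it is proportional to the distance from $x^*$ to the hyperplane:
$d(x^*, \text{Hyp.}) = \left|\beta_0 + \sum_{i=1}^p \beta_i x^*_i\right| \Big/ \sqrt{\sum_{i=1}^p \beta_i^2}$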
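A direct transcription of the classification rule, as a sketch: classify by the sign of $f(x^*) = \beta_0 + \sum_i \beta_i x^*_i$ and report the distance $|f(x^*)| / \sqrt{\sum_i \beta_i^2}$ to the hyperplane. The helper name and the convention of sending $f(x^*) = 0$ to −1 are assumptions.

```python
import numpy as np


def hyperplane_classify(beta0, beta, x_star):
    """Classify x* by the sign of f(x*); return (label, f(x*), distance to hyperplane)."""
    beta = np.asarray(beta, dtype=float)
    f_val = beta0 + beta @ np.asarray(x_star, dtype=float)   # f(x*) = beta_0 + sum_i beta_i x*_i
    label = 1 if f_val > 0 else -1                           # f(x*) = 0 is sent to -1 here
    dist = abs(f_val) / np.linalg.norm(beta)                 # |f(x*)| / sqrt(sum_i beta_i^2)
    return label, float(f_val), float(dist)


# Hyperplane x1 + x2 - 1 = 0, i.e. beta0 = -1, beta = (1, 1)
print(hyperplane_classify(-1.0, [1.0, 1.0], [2.0, 2.0]))
print(hyperplane_classify(-1.0, [1.0, 1.0], [0.0, 0.0]))
```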

Page 31

Maximal Margin Classifier

Page 32

Maximal Margin Classifier

• MMC is the solution of

• No explanation in the book, but immediate for a math student…

• The actual algorithm is not discussed…

Page 33

Numerical Solution (following A. Ng’s CS229 notes)

• Change of notation: $y^{(i)} = y_i$, $x^{(i)} = (x_{i,1}, \dots, x_{i,p})$

• Recall: the distance of $x^{(i)}$ to the hyperplane $w^T x + b = 0$ is $|w^T x^{(i)} + b| / \|w\|$

Page 34

Numerical Solution (following A. Ng’s CS229 notes)

Original Problem (non-convex):

Equivalent non-convex problem via

Page 35

Numerical Solution (following A. Ng’s CS229 notes)

Scale w and b by the same constant so that

(no effect on the problem) and change

to the convex problem (quadratic program)
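In Ng's CS229 notes the resulting quadratic program is $\min_{w,b} \tfrac12\|w\|^2$ subject to $y^{(i)}(w^T x^{(i)} + b) \ge 1$ for all i. The sketch below solves it with SciPy's general-purpose SLSQP method on a tiny hand-made separable data set; this is for illustration only (a dedicated QP or SVM solver would normally be used), and the helper name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def hard_margin_svm(X, y):
    """Solve min 1/2 ||w||^2  s.t.  y_i (w^T x_i + b) >= 1  (separable data assumed)."""
    n, p = X.shape

    def objective(theta):                  # theta = (w_1, ..., w_p, b)
        return 0.5 * theta[:p] @ theta[:p]

    def objective_grad(theta):
        return np.concatenate([theta[:p], [0.0]])

    def margins(theta):                    # SLSQP 'ineq' constraints require fun(theta) >= 0
        return y * (X @ theta[:p] + theta[p]) - 1.0

    cons = [{"type": "ineq", "fun": margins,
             "jac": lambda theta: np.column_stack([y[:, None] * X, y])}]
    res = minimize(objective, np.ones(p + 1), jac=objective_grad, method="SLSQP",
                   constraints=cons, options={"ftol": 1e-10, "maxiter": 200})
    return res.x[:p], res.x[p]


X = np.array([[2, 2], [2, 3], [3, 2], [-2, -2], [-2, -3], [-3, -2]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
w, b = hard_margin_svm(X, y)
print("w =", w, "b =", b, "margin =", 1 / np.linalg.norm(w))
```

Here the closest opposite points are (2, 2) and (−2, −2), so the maximal margin hyperplane is $x_1 + x_2 = 0$ with $w = (0.25, 0.25)$, $b = 0$ and margin $1/\|w\| = 2\sqrt2$.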

Page 36

Equivalent Formulation (following A. Ng’s CS229 notes)

Lagrangian:

Dual:

Solution: Hence:

(used later)
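In the standard derivation (as in Ng's CS229 notes), the dual of the hard-margin problem is $\max_\alpha \sum_i \alpha_i - \tfrac12 \sum_{i,j} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y^{(i)} = 0$; the solution gives $w = \sum_i \alpha_i y^{(i)} x^{(i)}$, hence $w^T x + b = \sum_i \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$. The sketch below solves this dual with SciPy's SLSQP on a toy data set (an illustrative choice, not a production solver), recovers $w$, and recovers $b$ with the midpoint formula $b = -(\max_{i: y^{(i)}=-1} w^T x^{(i)} + \min_{i: y^{(i)}=1} w^T x^{(i)})/2$.

```python
import numpy as np
from scipy.optimize import minimize


def svm_dual(X, y):
    """Solve the hard-margin dual; return (alpha, w, b)."""
    n = X.shape[0]
    Z = y[:, None] * X
    G = Z @ Z.T                                      # G_ij = y_i y_j <x_i, x_j>

    def neg_dual(a):                                 # minimize the negated dual objective
        return 0.5 * a @ G @ a - a.sum()

    def neg_dual_grad(a):
        return G @ a - 1.0

    res = minimize(neg_dual, np.zeros(n), jac=neg_dual_grad, method="SLSQP",
                   bounds=[(0.0, None)] * n,          # alpha_i >= 0
                   constraints=[{"type": "eq", "fun": lambda a: a @ y,
                                 "jac": lambda a: y}],  # sum_i alpha_i y_i = 0
                   options={"ftol": 1e-10, "maxiter": 500})
    alpha = res.x
    w = (alpha * y) @ X                              # w = sum_i alpha_i y_i x_i
    scores = X @ w
    b = -(scores[y == -1].max() + scores[y == 1].min()) / 2
    return alpha, w, b


X = np.array([[2, 2], [2, 3], [3, 2], [-2, -2], [-2, -3], [-3, -2]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
alpha, w, b = svm_dual(X, y)

# Decision value for a new point, using only the inner products <x_i, x_new>
x_new = np.array([1.0, 0.5])
f_new = np.sum(alpha * y * (X @ x_new)) + b
print(np.round(alpha, 4), w, b, f_new)
```

Only the two closest points end up with non-zero $\alpha_i$, which is why the later slides can restrict sums to the support vectors.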

Page 37

A Non-separable Example

Page 38

Non-robustness of the Maximal Margin Classifier

Page 39

The Support Vector Classifier

• If εi = 0 → correct side of the margin

• If εi > 0 → wrong side of the margin

• If εi > 1 → wrong side of the hyperplane

• The solution is affected only by the support vectors, i.e., observations that lie on the margin or on the wrong side of it.
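The three cases can be read off from a given fit. This sketch assumes the book's formulation of the support vector classifier, with $\sum_j \beta_j^2 = 1$ and constraints $y_i f(x_i) \ge M(1 - \varepsilon_i)$, $\varepsilon_i \ge 0$, $\sum_i \varepsilon_i \le C$; for a fixed hyperplane and margin M it computes the smallest admissible slacks. The helper `slack_report` is hypothetical.

```python
import numpy as np


def slack_report(X, y, beta0, beta, M):
    """Smallest slacks eps_i with y_i f(x_i) >= M (1 - eps_i), for a unit-norm beta."""
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(np.linalg.norm(beta), 1.0):
        raise ValueError("this formulation assumes sum_j beta_j^2 = 1")
    f = beta0 + X @ beta                   # signed distance of each x_i to the hyperplane
    eps = np.maximum(0.0, 1.0 - y * f / M)
    status = np.where(eps == 0, "correct side of margin",
                      np.where(eps <= 1, "wrong side of margin",
                               "wrong side of hyperplane"))
    return eps, status


# Hyperplane x1 = 0 (beta0 = 0, beta = (1, 0)) with margin M = 1; all three points have y = 1
X = np.array([[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]])
y = np.array([1, 1, 1])
eps, status = slack_report(X, y, 0.0, [1.0, 0.0], 1.0)
print(eps, status)
```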

Page 40

Concept Demonstration

Page 41

More on the Optimization Problem

• C (a budget on the total slack $\sum_i \varepsilon_i$) controls the number of observations on the wrong side of the margin

• C controls the bias-variance trade-off

• The optimizer is affected only by the support vectors

Increasing C in clockwise order:
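A hedged illustration with scikit-learn's `SVC` on simulated overlapping classes. Note that scikit-learn's `C` is a penalty on margin violations (as in Ng's notes), so it works in the opposite direction from the book's budget C: a small scikit-learn `C` corresponds to a wide margin and many support vectors. The data and the grid of C values are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes in R^2 (simulated data)
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

n_support = {}
for C in (0.01, 0.1, 1, 10, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = int(clf.n_support_.sum())   # points on or inside the margin (or misclassified)
    print(f"C = {C:>6}: {n_support[C]} support vectors")
```

Fewer support vectors means a fit driven by fewer observations: lower bias but higher variance, which is the trade-off the slide refers to.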

Page 42

Equivalent Formulation (following A. Ng’s CS229 notes)

• Dual:

• As before, $w^T x$ is a linear combination of the inner products $\langle x, x^{(i)} \rangle$

Page 43

Support Vector Machine (SVM)

• From linear to nonlinear boundaries by embedding the data into a higher-dimensional space

• The algorithm can be written in terms of dot products only

• Instead of embedding into a very high-dimensional space, replace the dot products with kernels
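A small check of the kernel idea, as a sketch: for $x, z \in \mathbb{R}^2$ the polynomial kernel $K(x, z) = (1 + \langle x, z\rangle)^2$ equals an ordinary dot product $\langle \phi(x), \phi(z)\rangle$ in a 6-dimensional feature space, so the embedding never has to be computed explicitly. The map `phi` below is the standard one for this kernel; the test points are arbitrary.

```python
import numpy as np


def phi(x):
    """Explicit feature map for K(x, z) = (1 + <x, z>)^2 when x is in R^2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])


def poly_kernel(x, z, d=2):
    """Polynomial kernel: one dot product in R^2 instead of one in R^6."""
    return (1.0 + np.dot(x, z)) ** d


x = np.array([1.0, 2.0])
z = np.array([-0.5, 3.0])
print(np.dot(phi(x), phi(z)), poly_kernel(x, z))
```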

Page 44

Clarification

Page 45

Clarification

Page 46

More (following book)

By solution of SVC (recall earlier comment)

Can use only support vectors for SVC

For SVM – replace dot products with kernels
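The kernelized solution has the form $f(x) = \beta_0 + \sum_{i \in S} \alpha_i K(x, x_i)$, where S is the set of support vectors. The sketch below, assuming scikit-learn's `SVC` with an RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$ and simulated circular data, rebuilds the decision function from the fitted support vectors: scikit-learn stores the products $y_i \alpha_i$ for the support vectors in `dual_coef_` (playing the role of the book's $\alpha_i$) and $\beta_0$ in `intercept_`.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Simulated data: class 1 inside a circle, class -1 outside (not linearly separable)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where((X ** 2).sum(axis=1) < 1.5, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)


def rbf(A, B):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)


X_new = rng.uniform(-2, 2, size=(5, 2))
# f(x) = beta_0 + sum over support vectors of (y_i alpha_i) K(x, x_i)
f_manual = rbf(X_new, clf.support_vectors_) @ clf.dual_coef_.ravel() + clf.intercept_[0]
print(np.column_stack([f_manual, clf.decision_function(X_new)]))
print(len(clf.support_vectors_), "support vectors out of", len(X))
```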

Page 47

Demonstration

Page 48

SVM for K>2 Classes

• OVO (One vs. One): For training, construct $\binom{K}{2}$ classifiers (1/−1), one for each pair of the K classes. For a test point, use voting (assign the class that wins the most pairwise comparisons)

• OVA (One vs. All): For training, construct K classifiers, each labeling one class 1 and the rest of the classes −1. For a test point x*, assign the class whose classifier gives the largest estimated f(x*)

• OVO is preferable when K is not too large
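Both strategies are available as wrappers in scikit-learn (`OneVsOneClassifier` and `OneVsRestClassifier` in `sklearn.multiclass`). The sketch assumes 4 simulated Gaussian classes and a linear-kernel `SVC` as the base classifier; it mainly shows that OVO fits $\binom{K}{2}$ classifiers while OVA fits K.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
K = 4
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]], dtype=float)
X = np.vstack([c + rng.normal(0, 0.7, size=(30, 2)) for c in centers])
y = np.repeat(np.arange(K), 30)

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # pairwise classifiers + voting
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # one-vs-rest, largest score wins

print(len(ovo.estimators_), len(ova.estimators_))
x_star = np.array([[3.8, 4.1]])
print(ovo.predict(x_star), ova.predict(x_star))
```

Scikit-learn's `SVC` already uses OVO internally for multi-class problems; the explicit wrappers just make the two schemes easy to compare.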

Page 49

Chapter 8: Tree-based Methods (or CART)

• Decision Trees for Regression

• Example: predicting log(salary/1000) as a function of the number of years in the major leagues and the number of hits in the previous year

• Terminology: leaf/terminal node, internal node, branch

Page 50

Chapter 8: Tree-based Methods (or CART)

Page 51

Building a Decision Tree

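• We wish to minimize the RSS (residual sum of squares) over partitions of the predictor space into boxes $R_1, \dots, R_J$:
$\sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \hat y_{R_j})^2$, where $\hat y_{R_j}$ is the mean response of the training observations in $R_j$

• Minimizing over all partitions is computationally infeasible; instead use recursive binary splitting (a top-down, greedy procedure)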

Page 52

Recursive Binary Splitting

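• At each node (top to bottom), determine the predictor $X_j$ and cutoff $s$ minimizing
$\sum_{i:\, x_i \in R_1(j,s)} y_i^2 - \frac{\big(\sum_{i:\, x_i \in R_1(j,s)} y_i\big)^2}{|R_1(j,s)|} + \sum_{i:\, x_i \in R_2(j,s)} y_i^2 - \frac{\big(\sum_{i:\, x_i \in R_2(j,s)} y_i\big)^2}{|R_2(j,s)|}$,
i.e., the RSS of the two regions $R_1(j,s) = \{X \mid X_j < s\}$ and $R_2(j,s) = \{X \mid X_j \ge s\}$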
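A brute-force sketch of one splitting step, assuming $R_1(j,s) = \{X \mid X_j < s\}$ and $R_2(j,s) = \{X \mid X_j \ge s\}$ and trying every observed value of every predictor as the cutoff. It costs O(pn²) per node; the function names are hypothetical.

```python
import numpy as np


def rss(y):
    """Residual sum of squares around the region mean (0 for an empty region)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0


def best_split_bruteforce(X, y):
    """Return (j, s, RSS) minimizing RSS(R1) + RSS(R2) over all predictors and cutoffs."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[1:]:          # cutoffs that leave both regions non-empty
            left = X[:, j] < s
            total = rss(y[left]) + rss(y[~left])
            if total < best[2]:
                best = (j, float(s), total)
    return best


X = np.array([[1, 5], [2, 3], [3, 8], [10, 1], [11, 9], [12, 2]], dtype=float)
y = np.array([1, 1, 1, 5, 5, 5], dtype=float)
print(best_split_bruteforce(X, y))
```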

Page 53

Recursive Binary Splitting

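• For $j = 1, \dots, p$, determine the $s$ that maximizes
$\frac{\big(\sum_{i:\, x_i \in R_1(j,s)} y_i\big)^2}{|R_1(j,s)|} + \frac{\big(\sum_{i:\, x_i \in R_2(j,s)} y_i\big)^2}{|R_2(j,s)|}$
(equivalent to the previous minimization, since $\sum_i y_i^2$ does not depend on the split)

• This can be done by sorting the values of $X_j$ and checking all n−1 consecutive pairs $(x_i, x_{i+1})$ (O(1) operations for each, using running sums), then reporting the average of $x_i$ and $x_{i+1}$ for the maximizing i.

• Total cost is O(pn) per node once each predictor is sorted (sorting itself costs O(pn log n)).

• We assumed continuous random variables (the procedure can be modified for discrete ones)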
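The sorted sweep can be sketched as follows (hypothetical helper names): after sorting $X_j$, running sums give each of the n−1 candidate scores in O(1), and the reported cutoff is the midpoint of consecutive sorted values. Tied values are skipped, since no cutoff can separate them.

```python
import numpy as np


def best_split_sorted(x, y):
    """For one predictor, maximize (sum_R1 y)^2/|R1| + (sum_R2 y)^2/|R2| over cutoffs.
    Returns (cutoff, score), or (None, -inf) if all x values are tied."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    total = ys.sum()
    left_sum = 0.0
    best_s, best_score = None, -np.inf
    for i in range(n - 1):                      # candidate split between positions i and i+1
        left_sum += ys[i]                       # running sum: O(1) update per candidate
        if xs[i] == xs[i + 1]:                  # no cutoff separates tied values
            continue
        n_left = i + 1
        score = left_sum ** 2 / n_left + (total - left_sum) ** 2 / (n - n_left)
        if score > best_score:
            best_s, best_score = (xs[i] + xs[i + 1]) / 2, score
    return best_s, best_score


def best_split(X, y):
    """Loop over predictors j = 1, ..., p and keep the best (j, s, score)."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        s, score = best_split_sorted(X[:, j], y)
        if s is not None and score > best[2]:
            best = (j, float(s), float(score))
    return best


X = np.array([[1, 5], [2, 3], [3, 8], [10, 1], [11, 9], [12, 2]], dtype=float)
y = np.array([1, 1, 1, 5, 5, 5], dtype=float)
print(best_split(X, y))
```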

Page 54

More on Recursive Binary Splitting

• The previous process is repeated until a stopping criterion is met

• Predict the response by the mean of the training observations in the region to which the test sample belongs
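Putting the pieces together, here is a sketch of recursive binary splitting with prediction by region means. The stopping rule (a minimum leaf size `min_leaf`, or a node whose responses are all equal) is an assumed example of a stopping criterion, and the nested-dict tree representation is purely illustrative.

```python
import numpy as np


def find_split(X, y, min_leaf):
    """Exhaustive search for the (j, s) minimizing RSS(R1) + RSS(R2), where
    R1 = {x_j < s}, R2 = {x_j >= s}, and each region keeps >= min_leaf points."""
    best, best_rss = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for s in (values[:-1] + values[1:]) / 2:       # midpoints of consecutive values
            left = X[:, j] < s
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            yl, yr = y[left], y[~left]
            rss = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if rss < best_rss:
                best, best_rss = (j, float(s)), rss
    return best


def build_tree(X, y, min_leaf=5):
    """Recursive binary splitting (top-down, greedy)."""
    # Assumed stopping criterion: node too small to split, or all responses equal
    if len(y) < 2 * min_leaf or np.all(y == y[0]):
        return {"value": float(y.mean())}
    split = find_split(X, y, min_leaf)
    if split is None:                                  # no admissible split: make a leaf
        return {"value": float(y.mean())}
    j, s = split
    left = X[:, j] < s
    return {"j": j, "s": s,
            "left": build_tree(X[left], y[left], min_leaf),
            "right": build_tree(X[~left], y[~left], min_leaf)}


def predict(tree, x):
    """Descend to the region containing x; predict the mean training response there."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["j"]] < tree["s"] else tree["right"]
    return tree["value"]


# Usage: a step function in the first predictor; the second predictor is constant
X = np.column_stack([np.arange(40.0), np.zeros(40)])
y = np.where(X[:, 0] < 20, 1.0, 3.0)
tree = build_tree(X, y, min_leaf=5)
print(predict(tree, np.array([3.0, 0.0])), predict(tree, np.array([30.0, 0.0])))
```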

Page 55

Tree Pruning

• Continued on page 17 of the book’s slides (trees.pdf)