LECTURE 05: CLASSIFICATION PT. 1 - Smith College · 2017-09-24 · Q&A: confidence vs. prediction...

LECTURE 05: CLASSIFICATION PT. 1 September 25, 2017 SDS 293: Machine Learning



Q&A: homework format

Q: What file format should we use for our homework?

A: PDF is fine for conceptual exercises; a Jupyter notebook is preferable for applied exercises, but .Rmd and .py files are also acceptable.


Q&A: confidence vs. prediction intervals

Q: Problem 3.8 asks for the predicted mpg of a car with 98 horsepower, and then asks for the associated prediction and confidence intervals. I thought that for single-number predictions you could only use prediction intervals, and that confidence intervals were for means and coefficients?

A: Let’s step back and consider what each one is telling us:


Q&A: confidence vs. prediction intervals

• Confidence interval: We have 95% confidence that the mean of all samples with these predictors will fall within this interval

• Prediction interval: We have 95% confidence that the next sample with these predictors will fall within this interval
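For simple linear regression (one predictor, as in Problem 3.8), the two intervals differ by a single term under the square root. The formulas below are the standard textbook ones rather than something taken from the slides; the symbols $s$ (residual standard error), $S_{xx} = \sum_i (x_i - \bar{x})^2$, and $t_{0.975,\,n-2}$ (the $t$ quantile for a 95% interval) are notation introduced here for illustration.

```latex
\begin{align*}
\text{Confidence interval for the mean response at } x_0:\quad
  & \hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{Prediction interval for a new response at } x_0:\quad
  & \hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The extra 1 accounts for the noise in a single new observation, so the prediction interval is always wider than the confidence interval, while both are centered on the same predicted value.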


Outline

• Motivation
• Bayes classifier
• K-nearest neighbors
• Logistic regression
  - Logistic model
  - Estimating coefficients with maximum likelihood
  - Multivariate logistic regression
  - Multiclass logistic regression
  - Limitations
• Linear discriminant analysis (LDA)
  - Bayes’ theorem
  - LDA on one predictor
  - LDA on multiple predictors
• Comparing Classification Methods


Motivation

• So far: predicted quantitative responses

[Figure: illustration of a quantitative prediction, with height written as a sum of terms]

Image credit: Ming Malaykham


Motivation

• Question: what if we have a qualitative response?


Motivation

• Like in regression, we have a set of training observations:

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$

• Want to build a model to predict the (qualitative) response for new observations


Classification

• Predicting a qualitative response for an observation can be thought of as classifying that observation


Quantifying accuracy

• With quantitative responses, we measured a model’s accuracy using the mean squared error:

$\mathrm{MSE} = \mathrm{Avg}\left((y_i - \hat{y}_i)^2\right)$

• Problem: how do we take the difference of two classes?
• Solution: we’ll try to minimize the proportion of misclassifications (the test error rate) instead:

$\mathrm{Error\ rate} = \mathrm{Avg}\left(I(y_i \neq \hat{y}_i)\right)$

* $I(\cdot)$ is an indicator function that outputs 1 if the input is true, 0 otherwise
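The two accuracy measures on this slide, mean squared error for quantitative responses and the proportion of misclassifications for qualitative ones, can be sketched in a few lines of Python. The function names and the sample values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference; only meaningful for quantitative responses."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Proportion of misclassified observations: the average of I(y_i != y_hat_i)."""
    return sum(1 for y, y_hat in zip(y_true, y_pred) if y != y_hat) / len(y_true)


# Quantitative responses: squared differences are well defined.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])   # (1 + 0 + 4) / 3

# Qualitative responses: "orange" minus "blue" has no meaning, so count mismatches.
err = error_rate(["orange", "blue", "blue", "orange"],
                 ["orange", "orange", "blue", "orange"])       # 1 mismatch out of 4
```

The indicator function shows up as the `y != y_hat` test: each mismatch contributes 1, each match contributes 0.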


Why won’t linear regression work?

• LR only works on quantitative responses (why?)
• Could try approach we took with qualitative predictors:

$Y = \begin{cases} 1 & \text{if } \ldots \\ 2 & \text{if } \ldots \\ 3 & \text{if } \ldots \end{cases}$

• What’s the problem?


Why won’t linear regression work?

• Is it any better if we only have a binary response?

$Y = \begin{cases} 0 & \text{if } \ldots \\ 1 & \text{if } \ldots \text{ or } \ldots \end{cases}$

• In this case, how might we interpret the fitted values?
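One commonly cited difficulty with a 0/1 response, offered here as a sketch rather than as the slide's intended answer: if the fitted values are read as probabilities, ordinary least squares can return values below 0 or above 1. The data and the helper name below are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: a binary response coded 0/1 that switches class as x grows.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_simple_linear_regression(x, y)
fitted = [intercept + slope * xi for xi in x]

# The fitted values at the two extremes fall outside the [0, 1] range.
lowest, highest = fitted[0], fitted[-1]
```

Fitted values in the middle of the range still behave like rough probability estimates; the trouble is at the edges, where they cannot be read as probabilities at all.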


Bayes’ classifier

• A simple, elegant classifier: assign each observation to the most likely class, given its predictor values

• Mathematically: assign a test observation with predictor vector $x_0$ to the class $j$ that maximizes:

$\Pr(Y = j \mid X = x_0)$
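As a minimal sketch, the Bayes classifier is just an argmax over the conditional class probabilities at the test point. It assumes those probabilities are already known, which is rarely true in practice; the function name and the numbers are hypothetical.

```python
def bayes_classify(class_probs):
    """Return the class j with the largest Pr(Y = j | X = x0).

    class_probs maps each class label to its conditional probability at x0.
    """
    return max(class_probs, key=class_probs.get)


# Hypothetical conditional probabilities at a single test point x0.
probs_at_x0 = {"orange": 0.7, "blue": 0.3}
label = bayes_classify(probs_at_x0)
```

With two classes this reduces to picking whichever class has probability above 50%.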


Toy example

[Figure: two-class data (orange and blue points) plotted on axes X1 and X2, with the Bayes’ Decision Boundary. Region labels: Pr(Y = orange | X) > 50% and Pr(Y = blue | X) > 50%]


Bayes’ classifier

• Test error rate of the Bayes classifier:

$1 - E\left(\max_j \Pr(Y = j \mid X)\right)$

• Great news! This error rate is provably* optimal!
• Just one problem…
• Okay, let’s estimate!

* proof left to the reader
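The Bayes error rate can be illustrated numerically when the true conditional probabilities are given. The sketch below approximates the expectation by a plain average over a handful of equally weighted points, which is an assumption made for illustration; the probabilities are hypothetical. In practice these true probabilities are unknown, which is why the slide turns to estimation.

```python
def bayes_error_rate(conditional_probs):
    """Estimate 1 - E[max_j Pr(Y = j | X)] by averaging over a list of points.

    conditional_probs holds, for each point, a dict of class -> Pr(Y = class | X).
    """
    avg_max = sum(max(p.values()) for p in conditional_probs) / len(conditional_probs)
    return 1 - avg_max


# Hypothetical true conditional probabilities at four points.
points = [
    {"orange": 0.9, "blue": 0.1},
    {"orange": 0.6, "blue": 0.4},
    {"orange": 0.2, "blue": 0.8},
    {"orange": 0.5, "blue": 0.5},
]
rate = bayes_error_rate(points)   # 1 - (0.9 + 0.6 + 0.8 + 0.5) / 4
```

Points near the decision boundary (probabilities close to 0.5) contribute most of the error; points deep inside a region contribute little.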


Back to our toy example

[Figure: two scatter plots of the toy data, each on axes X1 and X2]


K-nearest neighbors

• Input: a positive integer K and a test value $x_0$

• Step 1: Identify the K training examples closest to $x_0$ (call them $\mathcal{N}_0$)

• Step 2: Estimate:

$\Pr(Y = j \mid X = x_0) \approx \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j)$

• Step 3: Assign $x_0$ to whichever class has the highest (estimated) probability
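The three steps translate almost line for line into code. This is a minimal sketch, not the lab's implementation: it assumes Euclidean distance as the meaning of "closest", breaks ties by whichever class is counted first, and uses made-up training data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x0, k):
    """Classify x0 by majority vote among its k nearest training points."""
    # Step 1: find the k training examples closest to x0 (Euclidean distance).
    by_distance = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x0))
    neighbors = by_distance[:k]

    # Step 2: estimate Pr(Y = j | X = x0) as the fraction of neighbors in class j.
    counts = Counter(train_y[i] for i in neighbors)
    probs = {label: count / k for label, count in counts.items()}

    # Step 3: assign x0 to the class with the highest estimated probability.
    return max(probs, key=probs.get), probs


# Made-up training data: two predictors, two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_y = ["blue", "blue", "blue", "orange", "orange", "orange"]

label, probs = knn_predict(train_X, train_y, x0=(2.0, 2.0), k=3)
```

Note that there is no training step: all the work happens at prediction time, when distances to every training point are computed.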


K-nearest neighbors example: K = 3

[Figure: KNN decision boundary on a small two-class example, with regions labeled blue and orange]


K-nearest neighbors example: K = 10

[Figure: KNN decision boundary for the toy data on axes X1 and X2, with regions labeled blue and orange]


KNN vs. Bayes

• Despite being extremely simple, KNN can produce classifiers that are close to optimal (is this surprising?)

• Problem: what’s the right K?

• Question: does it matter?


Choosing the right K

[Figure: KNN decision boundaries on the toy data in two panels, KNN: K=1 and KNN: K=100]

• KNN: K=1: low bias, high variance (test error rate = 0.1695)
• KNN: K=100: high bias, low variance (test error rate = 0.1925)
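The two extremes of K can be checked on a tiny made-up data set. The sketch below (hypothetical helper names, Euclidean distance assumed) measures the training error: with K = 1 every training point is its own nearest neighbor, so the training error is zero, while with K equal to the number of observations every point is assigned the overall majority class.

```python
import math
from collections import Counter


def knn_label(train_X, train_y, x0, k):
    """Majority vote among the k training points nearest to x0."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x0))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def training_error(train_X, train_y, k):
    """Misclassification rate when the training set is used as its own test set."""
    wrong = sum(knn_label(train_X, train_y, x, k) != y for x, y in zip(train_X, train_y))
    return wrong / len(train_X)


# Made-up data: mostly separable, with one orange point inside the blue cluster.
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (6, 6), (6, 7), (7, 6), (7, 7)]
train_y = ["blue", "blue", "blue", "blue", "orange", "orange", "orange", "orange", "orange"]

err_k1 = training_error(train_X, train_y, k=1)              # memorizes the training set
err_kn = training_error(train_X, train_y, k=len(train_X))   # majority class everywhere
```

A training error of zero at K = 1 says nothing about test performance: the boundary bends around every single point, which is the high-variance end of the trade-off.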

KNN training vs. test error

[Figure: training errors and test errors plotted as error rate (0.00 to 0.20) against flexibility (1/K, from 0.01 to 1.00); bigger K on the left, smaller K on the right]


Lab: K-nearest neighbors

• To do today’s lab in R: class package
• To do today’s lab in Python: pandas, numpy, sklearn

• Instructions and code: http://www.science.smith.edu/~jcrouser/SDS293/labs/lab3.html

• Full version can be found beginning on p. 163 of ISLR

• Note: we’re going a little out of order, so you may want to stick with the demo code


Discussion: KNN on quantitative responses

• Question 1: is there any reason we couldn’t use KNN to predict quantitative responses?
• Question 2: what (if anything) would need to change?
• Question 3: how does it compare to LR?

$\Pr(j \mid x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j)$

$\hat{f}(x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} y_i$
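For a quantitative response, the only change to KNN is the last step: instead of a majority vote, average the responses of the K nearest neighbors. A minimal sketch under the same assumptions as before (Euclidean distance, made-up data, hypothetical function name):

```python
import math


def knn_regress(train_X, train_y, x0, k):
    """Predict a quantitative response at x0 as the mean response of its k nearest neighbors."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x0))
    neighbors = order[:k]
    return sum(train_y[i] for i in neighbors) / k


# Made-up one-predictor data (each x is a 1-tuple so math.dist applies).
train_X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
train_y = [1.1, 1.9, 3.2, 3.9, 5.1]

pred = knn_regress(train_X, train_y, x0=(2.6,), k=3)   # neighbors are x = 3, 2, 4
```

The neighbor search is identical to the classification case; only the way the neighbors' responses are combined differs.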

LR vs. KNN

Linear Regression
• Parametric (meaning?)
  - We assume an underlying functional form for f(X)
• Pros:
  - Coefficients have simple interpretations
  - Easy to do significance testing
• Cons:
  - Wrong about the functional form → poor performance

K-Nearest Neighbors
• Non-parametric (meaning?)
  - No explicit assumptions about the form for f(X)
• Pros:
  - Doesn’t require knowledge about the underlying form
  - More flexible approach
• Cons:
  - Can accidentally “mask” the underlying function


Discussion: which method?

• Question 1: would you expect LR to outperform KNN when the underlying relationship is linear? Why?
  - Yes: when the truth is linear, KNN’s extra flexibility increases variance without any offsetting reduction in bias
• Question 2: what happens as the # of dimensions increases, but the # of observations stays the same?
  - “Curse of dimensionality”: the more dimensions there are, the farther away each observation’s “nearest neighbors” can be
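The curse of dimensionality is easy to see in a small simulation. The sketch below keeps the number of observations fixed and measures how far away the nearest neighbor is, on average, as the number of dimensions grows. The uniform points in a unit cube, the seed, and the function name are assumptions chosen for illustration.

```python
import math
import random


def mean_nearest_neighbor_distance(n_points, n_dims, rng):
    """Average distance from each point to its nearest neighbor, for uniform points in the unit cube."""
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / n_points


rng = random.Random(293)   # fixed seed so the run is repeatable
n_points = 100             # the number of observations stays the same
distances = {d: mean_nearest_neighbor_distance(n_points, d, rng) for d in (1, 2, 10, 50)}
```

Printing `distances` shows the average nearest-neighbor distance growing with the number of dimensions, so in high dimensions the "nearest" neighbors are no longer local and their responses say less about the test point.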


Coming up

• Wednesday: Logistic regression
  - Logistic model
  - Estimating coefficients with maximum likelihood
  - Multivariate logistic regression
  - Multiclass logistic regression
  - Limitations

• A1 due Weds. 9/27 by 11:59pm (submit using Moodle)