CHAPTER 2: Supervised Learning

Transcript of lecture slides (akfletcher/stat261/Lec2Slides_UnSupBayes.pdf), 30 pages.

Page 1: CHAPTER 2: Supervised Learning

Page 2:

Learning a Class from Examples

Class C of a "family car"

Prediction: Is car x a family car?

Knowledge extraction: What do people expect from a family car?

Output:

Positive (+) and negative (–) examples

Input representation:

$x_1$: price, $x_2$: engine power

Page 3:

Training set X

$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$

$r^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \text{ is positive} \\ 0 & \text{if } \mathbf{x}^t \text{ is negative} \end{cases}$

$\mathbf{x} = [x_1, x_2]^T$

Page 4:

Class C: $(p_1 \le \text{price} \le p_2)$ AND $(e_1 \le \text{engine power} \le e_2)$

Page 5:

Hypothesis class H

$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$

Error of h on X:

$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \ne r^t\big)$
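The empirical error above can be sketched in Python for the family-car example. The rectangle bounds and data points below are hypothetical, chosen only to illustrate the counting of mismatches between h and the labels.

```python
# Hypothetical sketch: empirical error E(h | X) of an axis-aligned
# rectangle hypothesis h on a labeled sample X = {(x^t, r^t)}.

def h_rectangle(x, p1, p2, e1, e2):
    """h(x) = 1 if price and engine power fall inside the rectangle."""
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

def empirical_error(X, r, p1, p2, e1, e2):
    """E(h | X) = sum over t of 1(h(x^t) != r^t)."""
    return sum(h_rectangle(x, p1, p2, e1, e2) != rt for x, rt in zip(X, r))

X = [(12_000, 90), (18_000, 120), (30_000, 200), (9_000, 60)]
r = [1, 1, 0, 0]  # positive = family car
print(empirical_error(X, r, 10_000, 25_000, 80, 150))  # 0 on this sample
```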

Page 6:

S, G, and the Version Space


most specific hypothesis, S

most general hypothesis, G

Any h in H between S and G is consistent with the training set; together they make up the version space (Mitchell, 1997).

Page 7:

Margin

Choose the h with the largest margin.

Page 8:

VC Dimension

N points can be labeled in $2^N$ ways as +/−.

H shatters N points if for every such labeling there exists an $h \in H$ consistent with it. The VC dimension, VC(H), is the largest N that H shatters.

An axis-aligned rectangle shatters 4 points only !
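The claim above can be checked by brute force. The sketch below is illustrative, not from the slides: for each labeling, an axis-aligned rectangle consistent with it exists iff the tightest rectangle around the positive points contains no negative point. The "diamond" configuration of four points is one that rectangles shatter.

```python
# Brute-force shattering check for axis-aligned rectangles.
from itertools import product

def rectangle_consistent(points, labels):
    pos = [p for p, l in zip(points, labels) if l == 1]
    neg = [p for p, l in zip(points, labels) if l == 0]
    if not pos:
        return True  # a degenerate rectangle covering no point works
    x_lo = min(p[0] for p in pos); x_hi = max(p[0] for p in pos)
    y_lo = min(p[1] for p in pos); y_hi = max(p[1] for p in pos)
    # consistent iff the tightest rectangle around the positives
    # contains no negative point
    return not any(x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
                   for p in neg)

def shatters(points):
    return all(rectangle_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(shatters(diamond))             # True: these 4 points are shattered
print(shatters(diamond + [(0, 0)]))  # False: the 5th point breaks it
```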

Page 9:

Probably Approximately Correct (PAC) Learning

How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)

Each strip is at most ε/4.

Pr that one instance misses a strip: 1 − ε/4

Pr that N instances miss a strip: $(1 - \varepsilon/4)^N$

Pr that N instances miss any of the 4 strips: $\le 4(1 - \varepsilon/4)^N$

Require $4(1 - \varepsilon/4)^N \le \delta$; using $(1 - x) \le e^{-x}$:

$4 \exp(-\varepsilon N / 4) \le \delta$, hence $N \ge (4/\varepsilon)\,\ln(4/\delta)$
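The bound just derived is easy to evaluate numerically. A minimal sketch (the ε and δ values are illustrative; note the log in the bound is the natural log, matching the exp in the derivation):

```python
# PAC sample-size bound for the axis-aligned rectangle class:
# N >= (4/eps) * ln(4/delta) gives error <= eps with prob >= 1 - delta.
import math

def pac_sample_size(eps, delta):
    return math.ceil((4 / eps) * math.log(4 / delta))

print(pac_sample_size(0.1, 0.05))  # 176
```

Note how the bound grows as either ε or δ shrinks: demanding lower error or higher confidence requires more examples.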

Page 10:

Noise and Model Complexity

Use the simpler one because it is:

Simpler to use (lower computational complexity)

Easier to train (lower space complexity)

Easier to explain (more interpretable)

Generalizes better (lower variance; Occam's razor)

Page 11:

Multiple Classes, $C_i$, $i = 1,\dots,K$

$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$

$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$

Train hypotheses $h_i(\mathbf{x})$, $i = 1,\dots,K$:

$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$

Page 12:

Regression

$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \quad r^t \in \mathbb{R}, \quad r^t = f(x^t) + \varepsilon$

Linear model: $g(x) = w_1 x + w_0$

Quadratic model: $g(x) = w_2 x^2 + w_1 x + w_0$

$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - g(x^t)\big]^2$

$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - (w_1 x^t + w_0)\big]^2$
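Minimizing $E(w_1, w_0 \mid \mathcal{X})$ has the classical closed-form least-squares solution. A self-contained sketch (the data points are illustrative, generated to lie exactly on a line):

```python
# Ordinary least squares for the linear model g(x) = w1*x + w0,
# minimizing the average squared error from the slide.

def fit_line(x, r):
    n = len(x)
    mx = sum(x) / n
    mr = sum(r) / n
    w1 = (sum((xt - mx) * (rt - mr) for xt, rt in zip(x, r))
          / sum((xt - mx) ** 2 for xt in x))
    w0 = mr - w1 * mx
    return w1, w0

x = [1, 2, 3, 4]
r = [3, 5, 7, 9]          # exactly r = 2x + 1
w1, w0 = fit_line(x, r)
print(w1, w0)             # 2.0 1.0
```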

Page 13:

Model Selection & Generalization

Learning is an ill-posed problem; data is not sufficient to find a unique solution.

The need for inductive bias: assumptions about H.

Generalization: how well a model performs on new data.

Overfitting: H more complex than C or f

Underfitting: H less complex than C or f

Page 14:

Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):

1. Complexity of H, c(H)

2. Training set size, N

3. Generalization error, E, on new data

As N increases, E decreases.

As c(H) increases, E first decreases and then increases.

Page 15:

Cross-Validation

To estimate generalization error, we need data unseen during training. We split the data as:

Training set (50%)

Validation set (25%)

Test (publication) set (25%)

Resampling when there is little data
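The 50/25/25 split above can be sketched as follows; the shuffle and the random seed are illustrative details, not prescribed by the slides.

```python
# Split a dataset into training (50%), validation (25%), and
# test/publication (25%) subsets after shuffling.
import random

def split_data(data, seed=0):
    data = data[:]                     # copy before shuffling
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = n // 2                   # 50% training
    n_val = n // 4                     # 25% validation
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]      # remaining ~25% test set
    return train, val, test

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 50 25 25
```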

Page 16:

Dimensions of a Supervised Learner

1. Model: $g(\mathbf{x} \mid \theta)$

2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\big(r^t, g(\mathbf{x}^t \mid \theta)\big)$

3. Optimization procedure: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$

Page 17:

CHAPTER 3:

Bayesian Decision Theory

Page 18:

Probability and Inference

Result of tossing a coin is {Heads, Tails}.

Random variable $X \in \{1, 0\}$

Bernoulli: $P\{X = x\} = p_o^{x}\,(1 - p_o)^{1 - x}$

Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$

Estimation: $p_o = \#\{\text{Heads}\}/\#\{\text{Tosses}\} = \sum_t x^t / N$

Prediction of next toss: Heads if $p_o > 1/2$, Tails otherwise
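The estimator and prediction rule above take a few lines; the toss sequence below is an illustrative sample.

```python
# Estimate the Bernoulli parameter p0 from tosses coded as 1 = Heads,
# then predict the next toss by thresholding at 1/2.

def estimate_p0(sample):
    return sum(sample) / len(sample)

def predict_next(sample):
    return "Heads" if estimate_p0(sample) > 0.5 else "Tails"

tosses = [1, 0, 1, 1, 0, 1, 1, 0]
print(estimate_p0(tosses))   # 0.625
print(predict_next(tosses))  # Heads
```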

Page 19:

Classification

Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.

Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$

Prediction:

choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > 0.5$, otherwise $C = 0$

or equivalently:

choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2)$, otherwise $C = 0$

Page 20:

Bayes' Rule

$P(C \mid \mathbf{x}) = \dfrac{P(C)\, p(\mathbf{x} \mid C)}{p(\mathbf{x})}$

$\text{posterior} = \dfrac{\text{prior} \times \text{likelihood}}{\text{evidence}}$

$P(C = 0) + P(C = 1) = 1$

$p(\mathbf{x}) = p(\mathbf{x} \mid C = 1)\, P(C = 1) + p(\mathbf{x} \mid C = 0)\, P(C = 0)$

$P(C = 0 \mid \mathbf{x}) + P(C = 1 \mid \mathbf{x}) = 1$
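The two-class rule above is a one-liner once the evidence is expanded by total probability. A minimal sketch; the prior and likelihood values are illustrative numbers, not from the slides.

```python
# Posterior P(C=1 | x) from the prior and the two class likelihoods,
# with the evidence p(x) computed by total probability.

def posterior_c1(prior_c1, lik_c1, lik_c0):
    prior_c0 = 1 - prior_c1
    evidence = lik_c1 * prior_c1 + lik_c0 * prior_c0
    return lik_c1 * prior_c1 / evidence

p = posterior_c1(prior_c1=0.3, lik_c1=0.8, lik_c0=0.1)
print(round(p, 4))  # 0.7742
```

Note that a class with a small prior can still win the posterior when its likelihood is much larger, as here.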

Page 21:

Bayes' Rule: K > 2 Classes

$P(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})} = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k)\, P(C_k)}$

$P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$

choose $C_i$ if $P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$

Page 22:

Losses and Risks

Actions: $\alpha_i$

Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$

Expected risk (Duda and Hart, 1973):

$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$

choose $\alpha_i$ if $R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$
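The expected-risk rule above can be sketched directly; the loss matrix and posteriors below are illustrative numbers chosen to show that the minimum-risk action need not be the most probable class.

```python
# R(a_i | x) = sum_k lambda_ik * P(C_k | x); choose the minimum-risk action.

def expected_risks(loss, posteriors):
    """loss[i][k] = loss of action i when the true class is C_k."""
    return [sum(l_ik * p_k for l_ik, p_k in zip(row, posteriors))
            for row in loss]

def choose_action(loss, posteriors):
    risks = expected_risks(loss, posteriors)
    return min(range(len(risks)), key=risks.__getitem__)

loss = [[0.0, 10.0],   # predict C_0: costly if the truth is C_1
        [1.0, 0.0]]    # predict C_1: mild loss if the truth is C_0
posteriors = [0.8, 0.2]
# Although C_0 is more probable, predicting C_1 has lower risk:
# 1.0 * 0.8 = 0.8 < 10.0 * 0.2 = 2.0
print(choose_action(loss, posteriors))  # 1
```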

Page 23:

Losses and Risks: 0/1 Loss

$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$

$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$

For minimum risk, choose the most probable class.

Page 24:

Losses and Risks: Reject

$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \text{ (reject)},\ 0 < \lambda < 1 \\ 1 & \text{otherwise} \end{cases}$

$R(\alpha_{K+1} \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda\, P(C_k \mid \mathbf{x}) = \lambda$

$R(\alpha_i \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$

choose $C_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})$ for all $k \ne i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; otherwise reject
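The reject rule above reduces to a threshold on the maximum posterior. A minimal sketch; the λ value and posterior vectors are illustrative.

```python
# Choose C_i if its posterior is the maximum AND exceeds 1 - lambda;
# otherwise reject (the risk of rejecting, lambda, is then lower).

def classify_with_reject(posteriors, lam):
    i = max(range(len(posteriors)), key=posteriors.__getitem__)
    if posteriors[i] > 1 - lam:
        return i
    return "reject"

print(classify_with_reject([0.9, 0.05, 0.05], lam=0.2))  # 0
print(classify_with_reject([0.5, 0.3, 0.2], lam=0.2))    # reject
```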

Page 25:

Discriminant Functions

$g_i(\mathbf{x})$, $i = 1,\dots,K$; choose $C_i$ if $g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$

$g_i(\mathbf{x})$ can be:

$-R(\alpha_i \mid \mathbf{x})$

$P(C_i \mid \mathbf{x})$

$p(\mathbf{x} \mid C_i)\, P(C_i)$

K decision regions $\mathcal{R}_1, \dots, \mathcal{R}_K$: $\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})\}$

Page 26:

K = 2 Classes

Dichotomizer (K = 2) vs. polychotomizer (K > 2)

$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$

choose $C_1$ if $g(\mathbf{x}) > 0$, $C_2$ otherwise

Log odds: $\log \dfrac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})}$

Page 27:

Utility Theory

Probability of state k given evidence $\mathbf{x}$: $P(S_k \mid \mathbf{x})$

Utility of $\alpha_i$ when the state is k: $U_{ik}$

Expected utility:

$EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik}\, P(S_k \mid \mathbf{x})$

Choose $\alpha_i$ if $EU(\alpha_i \mid \mathbf{x}) = \max_j EU(\alpha_j \mid \mathbf{x})$

Page 28:

Association Rules

Association rule: X → Y

People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.

A rule implies association, not necessarily causation.


Page 29:

Association Measures

Support (X → Y):

$P(X, Y) = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$

Confidence (X → Y):

$P(Y \mid X) = \dfrac{P(X, Y)}{P(X)} = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$

Lift (X → Y):

$\dfrac{P(X, Y)}{P(X)\,P(Y)} = \dfrac{P(Y \mid X)}{P(Y)}$
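The three measures above are simple counts over a transaction list. A sketch with illustrative baskets:

```python
# Support, confidence, and lift of the rule X -> Y computed by
# counting transactions (each transaction is a set of items).

def support(transactions, X, Y):
    both = sum(1 for t in transactions if X in t and Y in t)
    return both / len(transactions)

def confidence(transactions, X, Y):
    has_x = sum(1 for t in transactions if X in t)
    both = sum(1 for t in transactions if X in t and Y in t)
    return both / has_x

def lift(transactions, X, Y):
    p_y = sum(1 for t in transactions if Y in t) / len(transactions)
    return confidence(transactions, X, Y) / p_y

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread"}, {"milk"}, {"eggs", "bread"}]
print(support(baskets, "milk", "bread"))     # 2/5 = 0.4
print(confidence(baskets, "milk", "bread"))  # 2/3
print(lift(baskets, "milk", "bread"))        # (2/3) / (4/5) = 5/6
```

A lift below 1, as here, means buying milk actually makes bread slightly less likely than its base rate in this (tiny) sample.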

Page 30:

Apriori Algorithm (Agrawal et al., 1996)

For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) must all be frequent.

If (X, Y) is not frequent, none of its supersets can be frequent.

Once we find the frequent k-item sets, we convert them to rules such as X, Y → Z and X → Y, Z.
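The pruning principle above can be sketched as a level-wise search. This is a minimal illustration of the idea, not the full algorithm from Agrawal et al.; the baskets and support threshold are illustrative.

```python
# Level-wise frequent-itemset mining: grow candidate (k+1)-item sets
# only from frequent k-item sets, pruning any candidate with an
# infrequent subset (the Apriori property).
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq, k_sets = [], [frozenset([i]) for i in items]
    while k_sets:
        counted = [(s, sum(1 for t in transactions if s <= t) / n)
                   for s in k_sets]
        level = [s for s, sup in counted if sup >= min_support]
        freq.extend(level)
        # candidate (k+1)-sets: unions of frequent k-sets all of whose
        # k-subsets are themselves frequent
        cands = {a | b for a in level for b in level
                 if len(a | b) == len(a) + 1}
        k_sets = [s for s in cands
                  if all(frozenset(c) in level
                         for c in combinations(s, len(s) - 1))]
    return freq

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "bread"}]
print(frequent_itemsets(baskets, min_support=0.5))
```

On these baskets the result is the three single items plus the pairs {milk, bread} and {bread, eggs}; the triple is pruned because {milk, eggs} is infrequent.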