Assessing Performance - University of Washington · 2017-01-30 · Assessing Performance CSE 446:...


CSE 446: Machine Learning

Assessing Performance
CSE 446: Machine Learning
Emily Fox
University of Washington
January 11, 2017

©2017 Emily Fox

CSE 446: Machine Learning 2

Make predictions, get $, right??

[Diagram: Model + Algorithm → fitted function f; Predictions → decisions → outcome]


CSE 446: Machine Learning 3

Or, how much am I losing?

Example: Lost $ due to inaccurate listing price
- Too low → low offers

- Too high → few lookers + no/low offers

How much am I losing compared to perfection?

Perfect predictions: Loss = 0

My predictions: Loss = ???

CSE 446: Machine Learning 4

Measuring loss

Loss function:

L(y,fŵ(x))

Examples (assuming underpredicting and overpredicting incur the same loss):

Absolute error: L(y,fŵ(x)) = |y-fŵ(x)|

Squared error: L(y,fŵ(x)) = (y-fŵ(x))²

(y = actual value; ŷ = fŵ(x) = predicted value; L is the cost of using ŵ at x when y is true)
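As a concrete sketch of the two losses on this slide, here they are in code; the prices used below are hypothetical:

```python
import numpy as np

def absolute_error(y, y_hat):
    # L(y, f_w(x)) = |y - f_w(x)|
    return np.abs(y - y_hat)

def squared_error(y, y_hat):
    # L(y, f_w(x)) = (y - f_w(x))^2
    return (y - y_hat) ** 2

# Hypothetical actual price y and predicted price y_hat = f_w(x), in $:
y, y_hat = 310_000.0, 300_000.0
abs_loss = absolute_error(y, y_hat)   # 10000.0
sq_loss = squared_error(y, y_hat)     # 100000000.0
```

Both losses penalize under- and over-prediction equally, matching the symmetric assumption stated above.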


CSE 446: Machine Learning

“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” (George Box, 1987)

CSE 446: Machine Learning

Assessing the loss



CSE 446: Machine Learning

Assessing the loss
Part 1: Training error

CSE 446: Machine Learning 8

Define training data

[Figure: training data as points, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 9

Define training data

[Figure: training data as points, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 10

Example: Fit quadratic to minimize RSS

[Figure: quadratic fit to training data, price ($) vs. square feet (sq.ft.)]

ŵ minimizes RSS on the training data

Page 6: Assessing Performance - University of Washington · 2017-01-30 · Assessing Performance CSE 446: Machine Learning Emily Fox University of Washington January 11, 2017 ©2017 Emily

1/11/2017

6

CSE 446: Machine Learning 11

Compute training error

1. Define a loss function L(y,fŵ(x)), e.g., squared error, absolute error, …

2. Training error
= avg. loss on houses in training set
= (1/N) Σᵢ L(yᵢ, fŵ(xᵢ)), where ŵ was fit using the training data

CSE 446: Machine Learning 12

Example: Use squared error loss (y-fŵ(x))²

[Figure: quadratic fit with training errors marked, price ($) vs. square feet (sq.ft.)]

Training error(ŵ) = 1/N *
[($train 1 - fŵ(sq.ft.train 1))²
+ ($train 2 - fŵ(sq.ft.train 2))²
+ ($train 3 - fŵ(sq.ft.train 3))²
+ … include all training houses]
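The fit-then-score recipe above can be sketched numerically. This is a minimal illustration assuming a small made-up dataset of (sq.ft., price) pairs; `np.polyfit` plays the role of minimizing RSS:

```python
import numpy as np

# Hypothetical training data (invented for illustration):
sqft  = np.array([1.0, 1.5, 2.0, 2.5, 3.0])        # sq.ft., in thousands
price = np.array([300., 410., 500., 570., 620.])   # price, in $1000s

# Fit a quadratic: polyfit returns the w_hat minimizing RSS on training data.
w_hat = np.polyfit(sqft, price, deg=2)
f_w_hat = np.poly1d(w_hat)

# Training error = (1/N) * sum of squared-error losses over training houses.
N = len(sqft)
training_error = np.sum((price - f_w_hat(sqft)) ** 2) / N
```

Since the quadratic model nests the constant model, this training error cannot exceed the error of simply predicting the mean price.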


CSE 446: Machine Learning 13

Training error vs. model complexity

[Figure: error vs. model complexity, with a low-complexity example fit, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 14

Training error vs. model complexity

[Figure: error vs. model complexity, with a more complex example fit, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 15

Training error vs. model complexity

[Figure: error vs. model complexity, with a still more complex example fit, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 16

Training error vs. model complexity

[Figure: error vs. model complexity, with a very high-complexity example fit, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 17

Training error vs. model complexity

[Figure: training error curve decreasing with model complexity; insets contrast a simple fit and a highly complex fit]
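The monotone drop in training error can be reproduced on synthetic data (the data-generating process below is an assumption, not from the slides): each higher-degree polynomial nests the lower-degree ones, so its training error can only go down.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.5, 3.5, 20)                     # sq.ft., in thousands
y = 100 + 150 * x + rng.normal(0, 20, x.size)     # price ($1000s), synthetic

def training_error(deg):
    # Fit a degree-`deg` polynomial, then average squared error on the
    # same points it was fit to.
    f = np.poly1d(np.polyfit(x, y, deg))
    return float(np.mean((y - f(x)) ** 2))

errs = [training_error(d) for d in range(6)]
# errs decreases (or stays flat) as the degree grows.
```

This is exactly why training error alone cannot pick the model complexity: it always favors the most complex model.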

CSE 446: Machine Learning 18

Is training error a good measure of predictive performance?

How do we expect to perform on a new house?

[Figure: fitted curve and a new house query, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 19

Is training error a good measure of predictive performance?

Is there something particularly bad about a house having xt sq.ft.??

[Figure: fitted curve with query point xt marked, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 20

Is training error a good measure of predictive performance?

Issue: Training error is overly optimistic… ŵ was fit to the training data.

[Figure: overfit curve passing near the training points, with query point xt, price ($) vs. square feet (sq.ft.)]

Small training error ⇏ good predictions, unless training data includes everything you might ever see


CSE 446: Machine Learning

Assessing the loss
Part 2: Generalization (true) error

CSE 446: Machine Learning 22

Generalization error

Really want an estimate of loss over all possible (house, $) pairs

Lots of houses in the neighborhood, but not in the dataset


CSE 446: Machine Learning 23

Distribution over houses

In our neighborhood, houses of what # sq.ft. are we likely to see?

[Figure: distribution over square feet (sq.ft.)]

CSE 446: Machine Learning 24

Distribution over sales prices

For houses with a given # sq.ft., what house prices $ are we likely to see?

[Figure: distribution over price ($) for a fixed # sq.ft.]

Page 13: Assessing Performance - University of Washington · 2017-01-30 · Assessing Performance CSE 446: Machine Learning Emily Fox University of Washington January 11, 2017 ©2017 Emily

1/11/2017

13

CSE 446: Machine Learning 25

Generalization error definition

Really want an estimate of loss over all possible (house, $) pairs

Formally:
generalization error = E_x,y[L(y,fŵ(x))]

where ŵ is fit using the training data, and the expectation averages over all possible (x,y) pairs, weighted by how likely each is
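Generalization error can't be computed for real houses, but the definition can be illustrated on a synthetic world where we choose the (x,y) distribution ourselves (everything below is an assumption for illustration): draw a huge sample from that distribution and average the loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_houses(n):
    # Assumed distribution over (sq.ft., price): x uniform, y = truth + noise.
    x = rng.uniform(0.5, 3.5, n)                 # sq.ft., in thousands
    y = 100 + 150 * x + rng.normal(0, 20, n)     # price, in $1000s
    return x, y

x_tr, y_tr = sample_houses(30)
f = np.poly1d(np.polyfit(x_tr, y_tr, deg=2))     # fit on one training set

# Monte Carlo approximation of E_x,y[(y - f(x))^2]:
x_all, y_all = sample_houses(200_000)
gen_error = float(np.mean((y_all - f(x_all)) ** 2))
# gen_error sits near (and above) the noise variance sigma^2 = 400.
```

The large sample stands in for "all possible (x,y) pairs, weighted by how likely each is"; with real data no such sample exists, which motivates the test-set approximation below.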

CSE 446: Machine Learning 26

Generalization error vs. model complexity

[Figure: generalization error vs. model complexity, with a low-complexity example fit, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 27

Generalization error vs. model complexity

[Figure: generalization error vs. model complexity, with a more complex example fit, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 28

Generalization error vs. model complexity

[Figure: generalization error vs. model complexity, with an example fit fŵ, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 29

Generalization error vs. model complexity

[Figure: generalization error vs. model complexity, with a higher-complexity fit fŵ, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 30

Generalization error vs. model complexity

[Figure: generalization error vs. model complexity, with a very high-complexity example fit, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 31

Generalization error vs. model complexity

[Figure: generalization error first decreases, then increases with model complexity; insets contrast a simple fit and a highly complex fit]

Can't compute! Generalization error requires averaging over all possible (x,y) pairs.

CSE 446: Machine Learning

Assessing the loss
Part 3: Test error


CSE 446: Machine Learning 33

Approximating generalization error

Wanted: estimate of loss over all possible (house, $) pairs

Approximate by looking at houses not in the training set

CSE 446: Machine Learning 34

Forming a test set

Hold out some (house, $) pairs that are not used for fitting the model

[Diagram: dataset split into a training set and a test set]


CSE 446: Machine Learning 35

Forming a test set

Hold out some (house, $) pairs that are not used for fitting the model

[Diagram: training set / test set split; the test set is a proxy for “everything you might see”]

CSE 446: Machine Learning 36

Compute test error

Test error
= avg. loss on houses in test set
= (1/N_test) Σᵢ L(yᵢ, fŵ(xᵢ)), summed over the N_test test points

where ŵ was fit using the training data and has never seen the test data!
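A minimal sketch of the whole recipe on synthetic data (the split sizes and data are assumptions): fit on the training set only, then average squared error on held-out points the fit has never seen.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 3.5, 50)                  # sq.ft., in thousands
y = 100 + 150 * x + rng.normal(0, 20, 50)      # price ($1000s), synthetic

idx = rng.permutation(50)                      # random 80/20 train/test split
train, test = idx[:40], idx[40:]

# The fit only ever sees the training points.
f = np.poly1d(np.polyfit(x[train], y[train], deg=2))

# Test error = avg. squared-error loss over the N_test held-out points.
n_test = len(test)
test_error = float(np.sum((y[test] - f(x[test])) ** 2) / n_test)
```

Shuffling before splitting matters: if the data were sorted (say, by sq.ft.), a contiguous split would make the test set unrepresentative.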


CSE 446: Machine Learning 37

Example: As before, fit quadratic to training data

[Figure: quadratic fit to training data, price ($) vs. square feet (sq.ft.)]

ŵ minimizes RSS on the training data

CSE 446: Machine Learning 38

Example: As before, use squared error loss (y-fŵ(x))²

Test error(ŵ) = 1/N_test *
[($test 1 - fŵ(sq.ft.test 1))²
+ ($test 2 - fŵ(sq.ft.test 2))²
+ ($test 3 - fŵ(sq.ft.test 3))²
+ … include all test houses]

[Figure: test houses and their errors under the trained fit, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 39

Training, true, & test error vs. model complexity

[Figure: training error decreases with model complexity, while true error and its test-error approximation eventually increase]

Overfitting if: training error is small but true (generalization) error is large

CSE 446: Machine Learning

Training/test split



CSE 446: Machine Learning 41

Training/test splits

How many points in the training set vs. the test set?

[Diagram: training set / test set split]

CSE 446: Machine Learning 42

Training/test splits

Too few training points → ŵ poorly estimated

[Diagram: small training set, large test set]


CSE 446: Machine Learning 43

Training/test splits

Too few test points → test error is a bad approximation of generalization error

[Diagram: large training set, small test set]

CSE 446: Machine Learning 44

Training/test splits

Typically, hold out just enough test points to form a reasonable estimate of generalization error.

If this leaves too few for training, use other methods like cross validation (will see later…)

[Diagram: training set / test set split]
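When data is too scarce for a dedicated test set, the slide points to cross validation. A minimal K-fold sketch (the structure below is a standard formulation, assumed here; details come later in the course): each point serves as validation data in exactly one fold.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.5, 3.5, 30)                  # small synthetic dataset
y = 100 + 150 * x + rng.normal(0, 20, 30)      # price ($1000s)

def cv_error(deg, K=5):
    # Shuffle indices, split into K folds, and rotate the held-out fold.
    folds = np.array_split(rng.permutation(len(x)), K)
    errs = []
    for k in range(K):
        val = folds[k]                                   # held-out fold
        trn = np.concatenate(folds[:k] + folds[k + 1:])  # remaining folds
        f = np.poly1d(np.polyfit(x[trn], y[trn], deg))
        errs.append(np.mean((y[val] - f(x[val])) ** 2))
    return float(np.mean(errs))   # average validation error across folds

cv_err = cv_error(deg=2)
```

Every point is used for both fitting and validation (in different folds), so nothing is permanently sacrificed to a test set.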


CSE 446: Machine Learning

3 sources of error + the bias-variance tradeoff


CSE 446: Machine Learning 46

3 sources of error

In forming predictions, there are 3 sources of error:

1. Noise

2. Bias

3. Variance



CSE 446: Machine Learning 47

Data inherently noisy

yᵢ = fw(true)(xᵢ) + εᵢ

[Figure: noisy observations around the true function, price ($) vs. square feet (sq.ft.)]

The noise εᵢ has variance σ²; this is irreducible error.

CSE 446: Machine Learning 48

Bias contribution

Assume we fit a constant function.

[Figures: constant fits fŵ(train1) and fŵ(train2) from two different sets of N house sales, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 49

Bias contribution

Over all possible size-N training sets, what do I expect my fit to be?

[Figure: specific fits fŵ(train1), fŵ(train2), fŵ(train3); their average fw̄; and the true function fw(true)]

CSE 446: Machine Learning 50

Bias contribution

bias(x) = fw(true)(x) - fw̄(x)

[Figure: gap between the true function fw(true) and the average fit fw̄]

Low complexity → high bias.

Is our approach flexible enough to capture fw(true)? If not, error in predictions.


CSE 446: Machine Learning 51

Variance contribution

How much do specific fits vary from the expected fit?

[Figure: fits fŵ(train1), fŵ(train2), fŵ(train3) around the average fit fw̄, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 52

Variance contribution

How much do specific fits vary from the expected fit?

[Figure: fits fŵ(train1), fŵ(train2), fŵ(train3) around the average fit fw̄]


CSE 446: Machine Learning 53

Variance contribution

How much do specific fits vary from the expected fit?

[Figure: tight spread of constant fits]

Low complexity → low variance.

Can specific fits vary widely? If so, erratic predictions.

CSE 446: Machine Learning 54

Variance of high-complexity models

Assume we fit a high-order polynomial.

[Figures: very different high-order fits fŵ(train1) and fŵ(train2) from two datasets, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 55

Variance of high-complexity models

Assume we fit a high-order polynomial.

[Figure: fits fŵ(train1), fŵ(train2), fŵ(train3) and their average fw̄]

CSE 446: Machine Learning 56

Variance of high-complexity models

[Figure: wide spread of fits around the average fw̄]

High complexity → high variance.


CSE 446: Machine Learning 57

Bias of high-complexity models

[Figure: average fit fw̄ closely tracks the true function fw(true)]

High complexity → low bias.

CSE 446: Machine Learning 58

Bias-variance tradeoff

[Figure: as model complexity grows, bias² falls and variance rises; insets contrast a simple and a complex fit]


CSE 446: Machine Learning 59

Error vs. amount of data

[Figure: error vs. # data points in training set: held-out error falls and training error rises as N grows, converging toward each other]
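The qualitative shape of this slide can be simulated (all quantities below are synthetic assumptions): with a fixed-complexity model, held-out error falls toward the irreducible noise level σ² as the number of training points N grows.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 20.0

def heldout_error(n_train):
    # Fit a quadratic on n_train synthetic points, then score it on a
    # large held-out sample from the same distribution.
    x = rng.uniform(0.5, 3.5, n_train)
    y = 100 + 150 * x + rng.normal(0, sigma, n_train)
    f = np.poly1d(np.polyfit(x, y, deg=2))
    xt = rng.uniform(0.5, 3.5, 20_000)
    yt = 100 + 150 * xt + rng.normal(0, sigma, 20_000)
    return np.mean((yt - f(xt)) ** 2)

# Average over repeated training sets to smooth out randomness.
err_small_N = float(np.mean([heldout_error(8) for _ in range(50)]))
err_large_N = float(np.mean([heldout_error(200) for _ in range(50)]))
# err_large_N sits near sigma**2 = 400; err_small_N is noticeably higher.
```

No amount of data pushes the error below σ²; more data only shrinks the variance contribution.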

CSE 446: Machine Learning

Summary of assessing performance



CSE 446: Machine Learning 61

What you can do now…

• Describe what a loss function is and give examples

• Contrast training, generalization, and test error

• Compute training and test error given a loss function

• Discuss issue of assessing performance on training set

• Describe tradeoffs in forming training/test splits

• List and interpret the 3 sources of avg. prediction error: irreducible error, bias, and variance


CSE 446: Machine Learning

More in depth on the 3 sources of error…


CSE 446: Machine Learning 63

Accounting for training set randomness

Training set was just a random sample of N houses sold.
What if N other houses had been sold and recorded?

[Figures: different fits fŵ(1) and fŵ(2) from two possible training samples, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 64

Accounting for training set randomness

Training set was just a random sample of N houses sold.
What if N other houses had been sold and recorded?

[Figures: fits fŵ(1) and fŵ(2), each with its own generalization error]


CSE 446: Machine Learning 65

Accounting for training set randomness

Ideally, want performance averaged over all possible training sets of size N.

[Figures: fits fŵ(1) and fŵ(2) with their generalization errors]

CSE 446: Machine Learning 66

Expected prediction error

E_training set[generalization error of ŵ(training set)]

averaging over all training sets (weighted by how likely each is); ŵ(training set) are the parameters fit on a specific training set

[Figure: one fitted function fŵ(training set)]


CSE 446: Machine Learning 67

Prediction error at target input

Start by considering:
1. Loss at target xt (e.g., 2640 sq.ft.)
2. Squared error loss L(y,fŵ(x)) = (y-fŵ(x))²

[Figure: fitted function with target input xt marked, price ($) vs. square feet (sq.ft.)]

CSE 446: Machine Learning 68

Sum of 3 sources of error

[Figure: fitted function at target xt]

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))


CSE 446: Machine Learning 69

Error variance of the model

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

σ² = “variance”: y = fw(true)(x) + ε, where the noise ε has variance σ²

[Figure: spread of y values around the true function at xt]

This is the irreducible error: no model can remove it.

CSE 446: Machine Learning 70

Bias of function estimator

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

[Figures: fits fŵ(train1) and fŵ(train2) on two training sets, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 71

Bias of function estimator

Average estimated function: fw̄(x) = E_train[fŵ(train)(x)], averaged over all training sets of size N
True function: fw(x)

[Figure: specific fits fŵ(train1), fŵ(train2), their average fw̄, and fw, with xt marked]

CSE 446: Machine Learning 72

Bias of function estimator

Average estimated function = fw̄(x); true function = fw(x)

bias(fŵ(xt)) = fw(xt) - fw̄(xt)

[Figure: gap between fw and fw̄ at xt]


CSE 446: Machine Learning 73

Bias of function estimator

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

CSE 446: Machine Learning 74

Variance of function estimator

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

[Figure: fits fŵ(train1), fŵ(train2), fŵ(train3) around fw̄, price ($) vs. square feet (sq.ft.)]


CSE 446: Machine Learning 75

Variance of function estimator

Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

[Figure: spread of fits around fw̄ at xt]

CSE 446: Machine Learning 76

Variance of function estimator

var(fŵ(xt)) = E_train[(fŵ(train)(xt) - fw̄(xt))²]

the expected squared deviation, over all training sets of size N, of the specific fit fŵ(train) (fit on a specific training dataset) from the expected fit fw̄ (what I expect to learn over all training sets), at xt

[Figure: spread of fits around fw̄ at xt]
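The bias and variance definitions above can be checked numerically. A simulation sketch (true function, noise level, and target xt are all assumptions for illustration): refit a quadratic on many independent training sets, then compare σ² + bias² + variance against a direct estimate of the average prediction error at xt.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 20.0

def f_true(x):
    # Assumed true function: price ($1000s) vs. sq.ft. (thousands).
    return 100 + 150 * x

xt = 2.64  # target input: 2640 sq.ft., in thousands

# Fit on many training sets of size N = 30; record each fit's prediction at xt.
preds = []
for _ in range(2000):
    x = rng.uniform(0.5, 3.5, 30)
    y = f_true(x) + rng.normal(0, sigma, 30)
    preds.append(np.poly1d(np.polyfit(x, y, deg=2))(xt))
preds = np.array(preds)

f_bar_xt = preds.mean()                       # expected fit fw_bar(xt)
bias2 = (f_true(xt) - f_bar_xt) ** 2          # [bias(f_hat(xt))]^2
var = preds.var()                             # var(f_hat(xt))

# Direct estimate: a fresh noisy y_t for each training set's prediction.
y_t = f_true(xt) + rng.normal(0, sigma, preds.size)
avg_pred_error = float(np.mean((y_t - preds) ** 2))
# avg_pred_error ≈ sigma**2 + bias2 + var
```

Since the true function lies inside the quadratic model class, the bias term comes out near zero here; a constant fit would instead show large bias and small variance.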


CSE 446: Machine Learning

Why 3 sources of error? A formal derivation

CSE 446: Machine Learning 78

Deriving expected prediction error

Expected prediction error
= E_train[generalization error of ŵ(train)]
= E_train[E_x,y[L(y, fŵ(train)(x))]]

1. Look at specific xt
2. Consider L(y,fŵ(x)) = (y-fŵ(x))²

Expected prediction error at xt
= E_train,yt[(yt - fŵ(train)(xt))²]


CSE 446: Machine Learning 79

Deriving expected prediction error

Expected prediction error at xt
= E_train,yt[(yt - fŵ(train)(xt))²]
= E_train,yt[((yt - fw(true)(xt)) + (fw(true)(xt) - fŵ(train)(xt)))²]

Expanding the square, the cross term has zero expectation (the noise yt - fw(true)(xt) has mean 0 and is independent of the training set), leaving σ² + MSE[fŵ(train)(xt)].

CSE 446: Machine Learning 80

Equating MSE with bias and variance

MSE[fŵ(train)(xt)] = E_train[(fw(true)(xt) - fŵ(train)(xt))²]
= E_train[((fw(true)(xt) - fw̄(xt)) + (fw̄(xt) - fŵ(train)(xt)))²]

Again the cross term has zero expectation (since E_train[fŵ(train)(xt)] = fw̄(xt)), leaving [bias(fŵ(xt))]² + var(fŵ(xt)).


CSE 446: Machine Learning 81

Putting it all together

Expected prediction error at xt
= σ² + MSE[fŵ(xt)]
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))

The 3 sources of error: irreducible noise, bias, and variance.