Model Selection Using AIC and BIC

Transcript of Model Selection Using AIC and BIC

Page 1: Model Selection Using AIC and BIC

Common Model Selection Statistics: AIC and BIC

Addictions Research Seminar

August 8, 2007

Page 2: Model Selection Using AIC and BIC

[Cartoon © David Farley: "A visit from a flying mutant ______."]

Page 3: Model Selection Using AIC and BIC

Elephant Model

Page 4: Model Selection Using AIC and BIC

Objectives

• Know what AIC and BIC do
• Know the role that some statisticians think AIC and BIC should play in research
• Be aware of alternatives
• Be motivated to look further

Page 5: Model Selection Using AIC and BIC

Outline

• Objectives

• Model Selection (MS) Problems

• Commonly Used MS Statistics

– Motivations

– Use

• Alternatives and Recommendations

Page 6: Model Selection Using AIC and BIC

Model Selection

• What you do depends on:

– Study Design

– Suite of Collected Variables

– Purpose

– Philosophy on model building

Page 7: Model Selection Using AIC and BIC

Model Selection

• Model selection is not model testing

• Psychological model/theory vs. statistical model

Page 8: Model Selection Using AIC and BIC

Research Context

• Null Hypothesis Significance Testing
• Model Testing
  – Testing Structure
  – Parameter Testing
• Exploratory/Model Building
  – Descriptive motivations
  – Predictive utility
  – Evidence production

Page 9: Model Selection Using AIC and BIC

Model Selection: Approaches

1. Select only the full model
2. Use stepwise selection, ignoring selection uncertainty
3. Use an MS statistic, ignoring selection uncertainty
4. Use an MS statistic and consider selection uncertainty
5. Do multimodel inference
6. First reduce the predictors, then thoughtfully weigh models with the help of MS statistics

Page 10: Model Selection Using AIC and BIC

Although [MS Stats] are helpful exploratory tools, the model-building process should utilize theory and common sense.

Alan Agresti

Model selection is rarely based solely on [MS Stats] but depends also on the purpose of the analysis and subject matter information.

Jouni Kuha

Page 11: Model Selection Using AIC and BIC

Model Selection Criteria

• Test of hypotheses (NHST)

• Ad hoc methods

• Optimization of some selection criteria

– Criteria based on MSE, MS prediction error

– Information Criteria

– Consistent estimators of P(true model)

Page 12: Model Selection Using AIC and BIC

NHST does not mesh with IC in model selection

“A very common mistake seen in the applied literature is to ‘test’ to see whether the best model is significantly better than the second-best model.”
Anderson & Burnham 2002

Page 13: Model Selection Using AIC and BIC

Using Statistics to Help Guide

Model Fit (MF)
• R²
• χ² goodness of fit
• MSE

Model Selection (MS)
• AIC
• BIC
• TIC, NIC, EIC, FIC, GIC, SIC, QAIC, Cp, PRESS, CAICF, MDL, HQ, Vapnik-Chervonenkis dimension, …

Page 16: Model Selection Using AIC and BIC

AIC Motivation

• A measure of the predictive performance of the models
• It is based on information loss

Page 17: Model Selection Using AIC and BIC

AIC Motivation

• Based on Kullback-Leibler (K-L) information loss
• I(f, g) is the information loss due to the use of a model g to approximate reality f
• It turns out that you can compare models’ relative information loss without being able to describe reality exactly

Page 18: Model Selection Using AIC and BIC

AIC Motivation

Akaike found that the log-likelihood value of a model was a biased estimate of the relative information loss. The bias was approximately equal to the number of parameters in the model.

relative Ê(K-L) = ℓ(θ̂ | data) − p
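
To make the bookkeeping concrete, here is a minimal sketch of the resulting criterion, AIC = −2ℓ(θ̂) + 2p, computed for two nested linear models. The data and models are invented for illustration (they are not from the seminar), and the error variance counts as a parameter:

```python
# Minimal AIC computation for normal linear models (illustrative data).
import numpy as np

def gaussian_loglik(y, y_hat):
    """Maximized log-likelihood of a normal linear model (ML estimate of sigma^2)."""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)   # ML estimate of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(loglik, n_params):
    """AIC = -2*loglik + 2*p; lower is better."""
    return -2 * loglik + 2 * n_params

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)   # truth is linear

X1 = np.column_stack([np.ones_like(x), x])           # intercept + slope
X2 = np.column_stack([np.ones_like(x), x, x ** 2])   # adds a quadratic term
for X, p in [(X1, 3), (X2, 4)]:                      # p includes sigma^2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(aic(gaussian_loglik(y, X @ beta), p))
```

With these data the quadratic term usually costs more in penalty than it gains in fit, so the smaller model tends to win.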

Page 19: Model Selection Using AIC and BIC

AIC Functionality

• AIC selects a best model in terms of the bias/variance trade-off, not a quasi-true model
• The target model changes with the sample size

Page 20: Model Selection Using AIC and BIC

• AIC is not consistent. There is always a possibility that it will select models with too many variables (without finite-sample adjustments).
• AIC is efficient. The expected prediction error of AIC-selected models is the smallest possible (for large N).

Page 21: Model Selection Using AIC and BIC

What AIC Values Mean

AIC values are not interpretable on their own; they contain arbitrary constants. Only differences between models matter:

Δi = AICi − AICmin

4 ≤ Δi ≤ 7: considerably less support
Δi > 10: difficult to support
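
As a small worked illustration (the AIC values below are hypothetical), the Δi computation and the support bands above look like this in code; the "substantial support" band for Δi ≤ 2 is Burnham and Anderson's companion rule of thumb:

```python
# Delta_i = AIC_i - AIC_min for a set of hypothetical candidate models.
aics = {"model_A": -176.0, "model_B": -174.0, "model_C": -158.0}

aic_min = min(aics.values())               # AIC of the best model in the set
for name, a in aics.items():
    delta = a - aic_min
    if delta <= 2:
        note = "substantial support"       # Burnham & Anderson's companion band
    elif 4 <= delta <= 7:
        note = "considerably less support"
    elif delta > 10:
        note = "difficult to support"
    else:
        note = "in between: a judgment call"
    print(f"{name}: Delta_i = {delta:.1f} ({note})")
```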

Page 22: Model Selection Using AIC and BIC

“When all the models have very low weights, there is no inferential credibility for any single model regarding what are the ‘important’ predictor variables. It is foolish to think that the variables included in the best model are ‘the’ important ones and the excluded are not important.”
Burnham & Anderson 2002

Page 23: Model Selection Using AIC and BIC

AIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − 2(p₂ − p₁)

BIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − log(n)(p₂ − p₁)
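
A sketch of these two difference formulas with invented log-likelihoods and parameter counts; it shows how the same fit improvement can favor the larger model under AIC yet the smaller one under BIC once log(n) exceeds 2:

```python
import math

def aic_bic_differences(ll2, ll1, p2, p1, n):
    """The slide's differences; a positive value favors model 2."""
    fit_gain = 2 * (ll2 - ll1)                   # 2[l(theta_2) - l(theta_1)]
    return (fit_gain - 2 * (p2 - p1),            # AIC difference
            fit_gain - math.log(n) * (p2 - p1))  # BIC difference

# Hypothetical: model 2 adds 3 parameters and gains 4 log-likelihood units.
d_aic, d_bic = aic_bic_differences(ll2=-120.0, ll1=-124.0, p2=8, p1=5, n=500)
print(d_aic)   # 2.0 -> AIC slightly favors the larger model
print(d_bic)   # about -10.6 -> BIC prefers the smaller model
```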

Page 24: Model Selection Using AIC and BIC

BIC Motivation

“Aim of Bayesian approach is to identify the model with the highest probability of being the true model”

Kuha 2004

“The assumed purpose of the BIC-selected model was often simple prediction, as opposed to scientific understanding of the system under study”

Burnham & Anderson 2002

Page 25: Model Selection Using AIC and BIC

BIC Motivation

BF₂₁ = p(D | M₂) / p(D | M₁)

Bayes Factor = evidence in favor of model 2 over model 1

BIC is an approximation of a transformation of the Bayes Factor (for a limited set of priors).

Page 26: Model Selection Using AIC and BIC

BIC does not always need to be a good approximation of the Bayes Factor if it is used mainly to identify which of the models has the highest posterior probability.

Page 27: Model Selection Using AIC and BIC

Justification for BIC

BIC is consistent: it asymptotically reaches its goal. As N grows, the probability that BIC selects the true model (if it is in the candidate set) approaches 1.

Page 28: Model Selection Using AIC and BIC

Meaning of BIC Values

pi is the posterior probability that model i is the true model (assuming that there is a true model and that it is in your model set).

pi = exp(−½ ΔBICi) / Σr exp(−½ ΔBICr), with the sum running over all R candidate models.
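
Here is a minimal sketch of this formula applied to the six BIC values from the PDA example on the next slide; it reproduces that slide's P(T.M.) column:

```python
import math

# BIC values for the six PDA models on the following slide.
bics = [-112.0, -70.0, -100.0, -99.0, -124.0, -116.0]

bic_min = min(bics)
weights = [math.exp(-0.5 * (b - bic_min)) for b in bics]  # exp(-Delta_BIC_i / 2)
total = sum(weights)                                      # normalizing sum over R models
for i, w in enumerate(weights, start=1):
    print(f"model {i}: P(true model) = {w / total:.6f}")
```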

Page 29: Model Selection Using AIC and BIC

PDA Model

Candidate terms: Linear, Quadratic, Cubic time; HDRS; Tx; PDA1; Attendance (the "x" marks show how many terms each model includes)

Model  Terms included  AIC   ΔAIC  BIC   P(T.M.)
1      x x x x x       -174   2    -112  0.002428
2      x x x x x x     -141  35     -70  1.84E-12
3      x x x x x x     -166  10    -100  6.02E-06
4      x x x           -151  25     -99  3.65E-06
5      x x x x         -176   0    -124  0.979624
6      x x x           -158  18    -116  0.017942

Page 30: Model Selection Using AIC and BIC

Similarities
• Penalized model selection criteria
• Data must be fixed
• They can be special cases of each other
• Both good at approximating their target quantities
• Bayesian or frequentist derivation
• Ambivalence
• Only as good as your data

Differences
• BIC is dimension consistent / AIC approximates relative information loss
• BIC penalizes complex models more than AIC
• Definition of a “good model”
• Need for a true model

Page 31: Model Selection Using AIC and BIC

Burnham and Anderson's Objection to BIC

We question the concept of a simple “true model” in the biological sciences, and would surely think, if it existed, that it would not be in the set of candidate models.

There is nothing in the foundation of BIC that addresses a bias-variance trade-off, and hence addresses parsimony as a feature of BIC model selection.

Page 32: Model Selection Using AIC and BIC

Others’ Views

For model selection purposes, there is no clear choice between AIC and BIC.
Kuha 2002

The BIC target model doesn’t depend on N, but we know that the number of parameters selected will, so BIC can’t deliver on its objective in practice.
KMC

Page 33: Model Selection Using AIC and BIC

“All models are wrong, some models are useful.”

George Box

Any model is just a simplification of reality.

Select a model that is a useful description or powerful predictor.

Page 34: Model Selection Using AIC and BIC

Simulation Results

• BIC better than AIC when the true model is included as a candidate, and often better than AICc
• AIC does better when the true model is not in the set
• These are not universal results

Page 35: Model Selection Using AIC and BIC

Simulation Results

[Figure: relative error of selected models; Kuha 2004]

Page 36: Model Selection Using AIC and BIC

Alternative Approaches Exist

• Direct

• Cross-validation

• Use all the models at the same time!

• Report out top contenders

[Diagram: data split into Train / Validate / Test]
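
As one concrete alternative, here is a minimal sketch of K-fold cross-validation, which estimates prediction error directly rather than penalizing the log-likelihood. The data and the two competing models are invented for illustration:

```python
# K-fold cross-validation for comparing two linear models (illustrative).
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error of least-squares fits over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # everything not in this fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
x = rng.normal(size=120)
y = 1.0 + 2.0 * x + rng.normal(size=120)         # truth is linear

X_lin = np.column_stack([np.ones_like(x), x])
X_cub = np.column_stack([np.ones_like(x), x, x**2, x**3])
print(kfold_mse(X_lin, y))   # usually smaller: matches the truth
print(kfold_mse(X_cub, y))   # extra terms mostly add variance
```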

Page 37: Model Selection Using AIC and BIC

Recommendations

• Establish a philosophy

• Conduct thoughtful model building

• Use MS stats as a guide only

• Use multiple stats simultaneously

Page 38: Model Selection Using AIC and BIC

Elephant Model

Page 39: Model Selection Using AIC and BIC

Objectives

• Know what AIC and BIC do.
• Know the role that some statisticians think AIC and BIC should play in research.
• Be aware of alternatives.
• Be motivated to learn more about AIC/BIC.

Page 40: Model Selection Using AIC and BIC

[Cartoon © David Farley]

Page 41: Model Selection Using AIC and BIC

Restricted Space and Directed Selection

• Akaike believed that the most important contribution of his general approach was the clarification of the importance of modeling and the need for substantial prior information on the system being studied.
• The importance of carefully defining a small set of candidate models cannot be overemphasized. (A & B 2002)