Lecture 4 Model Formulation and Choice of Functional Forms: Translating Your Ideas into Models.


Topics

• Alternative models as multiple working hypotheses.

• Null models.

• Choice of functional forms.

The triangle of statistical inference

[Figure: a triangle linking Data, Scientific Model* (hypothesis), Probability Model, and Inference.]

*All hypotheses can be expressed as models!

The Scientific Method

“Science is a process for learning about nature in which competing ideas are measured against observations”

Feynman 1965

Scientific Process

• Devise alternative hypotheses.

• Devise experiment(s) with alternative possible outcomes.

• Carry out experiments.

• Recycle procedure. -- Platt 1964 (Strong inference)

But this is time-consuming and impractical for many questions…

The method of multiple working hypotheses

• “It differs from the simple working hypothesis in that it distributes the effort and divides the affection.”

• “Bring up into review every rational explanation of the phenomenon in hand and to develop every tenable hypothesis relative to its nature.”

• “Some of the hypotheses have already been proposed and used while others are the investigator’s own creations.”

• “An adequate explanation often involves the coordination of several causes.”

• “When faithfully followed for a sufficient time it develops the habit of parallel or complex thought.”

• “The power of viewing phenomena analytically and synthetically at the same time appears to be gained.”

-- T. C. Chamberlin, 1890. Science 15: 92.

What is the best model to use?

• This is the critical question in making valid inferences from data.

• Careful a priori consideration of alternative models will often require a major change in emphasis among scientists.

• Model specification is more difficult than the application of likelihood techniques.

Formulation of Candidate Models

• Conceptually difficult.

• Subjective.

• Original and innovative.

• Models represent a scientific hypothesis.

Translating your qualitative ideas into a quantitative, algebraic model that can be tested against alternative models…

Where do models come from?

• Scientific literature.

• Results of manipulative experiments.

• Personal experience.

• Scientific debate.

• Natural resource management questions.

• Monitoring programs.

• Judicial hearings.

Are models truth?

• Truth has infinite dimensions

• Sample data are finite

• Models should provide a good approximation to the data

• Larger data sets will support more complex approximations to reality

“…empiricism, like theory, is based on a series of simplifying assumptions… By choosing what to measure and what to ignore, an empiricist is making as many assumptions as does any theoretician.”

--David Tilman

Model selection is implicit in science

Develop a set of a priori candidate models

• Include a global model that includes all potential relevant effects.

• Test the global model (R-squared, goodness-of-fit tests).

• Develop alternative simpler models.

Assessing alternative models

• How well does the model approximate “truth” relative to its competitors? (high accuracy or low bias).

• How repeatable is the prediction of a model relative to its competitors? (high precision or low variance).

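Why do model selection at all? Principle of parsimony

[Figure: bias² and variance plotted against the number of parameters, from few to many; bias² falls and variance rises as parameters are added.]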
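The trade-off in this figure reflects a standard decomposition of expected prediction error (a textbook result added for illustration, not taken from the slide). Assuming data are generated as $y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and the noise in a new observation independent of the data used to fit $\hat f$, the expected squared error at a new point $x_0$ splits into three parts:

```latex
\mathbb{E}\left[\left(y_0 - \hat f(x_0)\right)^2\right]
  = \underbrace{\left(\mathbb{E}\left[\hat f(x_0)\right] - f(x_0)\right)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\left[\hat f(x_0)\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Adding parameters typically lowers the first term and raises the second; a parsimonious model is one that keeps their sum small.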

Principle of parsimony applied to model selection

• We typically penalize added complexity.

• A more complex model has to exceed a certain threshold of improvement over a simpler model.

• Added complexity usually makes a model more unstable.

• Complex models spread the data too thinly over the parameters.

• Model selection is not about whether something is true or not but about whether we have enough information to characterize it properly.
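One concrete version of such a threshold is the likelihood ratio test for two nested models that differ by a single parameter. It is offered here only as an illustration and is not necessarily the criterion used later in the course. The complex model is preferred only if twice its gain in log-likelihood exceeds the χ² critical value with 1 degree of freedom. This stdlib-only sketch relies on the fact that a χ²₁ quantile equals the square of a two-sided standard normal quantile. The log-likelihood values are made up, and `prefer_complex` is a hypothetical helper name:

```python
from statistics import NormalDist


def chi2_1df_critical(alpha: float = 0.05) -> float:
    """Upper-alpha critical value of a chi-square distribution with 1 df.

    If Z ~ N(0, 1) then Z**2 ~ chi-square(1), so the chi-square quantile
    is the square of the two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def prefer_complex(loglik_simple: float, loglik_complex: float,
                   alpha: float = 0.05) -> bool:
    """Likelihood ratio test for nested models that differ by ONE parameter."""
    statistic = 2.0 * (loglik_complex - loglik_simple)
    return statistic > chi2_1df_critical(alpha)


if __name__ == "__main__":
    print(round(chi2_1df_critical(), 3))    # 3.841
    print(prefer_complex(-120.0, -119.5))   # False: 2 * 0.5 = 1.0 < 3.841
    print(prefer_complex(-120.0, -115.0))   # True: 2 * 5.0 = 10.0 > 3.841
```

The extra parameter therefore has to buy roughly 1.92 log-likelihood units before it is kept at α = 0.05.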

Example from pages 33-34 of Burnham and Anderson

Reality: Actual data [true model: a nonlinear function of x involving $e^{0.3x}$]

A set of candidate models

$E(y) = \beta_0 + \beta_1 x$

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \beta_5 x^5$

UNDERFITTING!!

$E(y) = \beta_0 + \beta_1 x$

Too simple: High bias (low accuracy)

OVERFITTING!!

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \beta_5 x^5$

Too complicated: High variance (low precision)

REASONABLE FIT

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$

The compromise: a parsimonious model
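The underfitting and overfitting contrast can be reproduced by simulation. The sketch below fits the linear, quadratic, and fifth-degree polynomial candidates by ordinary least squares. It then compares error on the 20 training points with error on fresh data. The "true" curve (1 + 2x − 1.5x² plus Gaussian noise, x in [−1, 1]) is a hypothetical choice and not the Burnham and Anderson example. Pure Python is used so the block has no dependencies. The normal equations are adequate for this small, well-scaled problem, although a QR solver would be more robust in general:

```python
import random


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations X'X beta = X'y."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(coef, xs, ys):
    """Mean squared error of the polynomial with coefficients coef."""
    pred = (sum(c * x ** k for k, c in enumerate(coef)) for x in xs)
    return sum((y - yhat) ** 2 for y, yhat in zip(ys, pred)) / len(xs)


def simulate(n, rng):
    # HYPOTHETICAL truth: curved mean plus Gaussian noise with sd 0.3.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x - 1.5 * x ** 2 + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(42)
x_train, y_train = simulate(20, rng)
x_test, y_test = simulate(500, rng)

results = {}
for degree in (1, 2, 5):
    coef = fit_poly(x_train, y_train, degree)
    results[degree] = (mse(coef, x_train, y_train), mse(coef, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Because the models are nested, training error can only fall as terms are added. Error on new data carries no such guarantee, and that gap is what the parsimony principle guards against.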

Null Models

• Parametric methods advocate testing hypotheses against a null expectation (H₀).

• Often the null is probably false simply on a priori grounds (e.g., that parameter θ has no effect).

• In likelihood terms this usually means the null model is the one that sets the value of parameter θ equal to 0 or 1.

States of mind of a null hypothesis tester

[Figure: a 2 × 2 grid crossing the statistical significance of an observed difference (not significant, significant) with its practical importance (not important, important).]

Model Selection Methods

• Adjusted R-squared.

• Likelihood Ratio Tests.

• Akaike’s Information Criterion.

We will talk about these topics later…

Choice of Functional Forms

• Model formulation requires the specification of a functional form that formalizes the relationship between the predictor variables and the process we are trying to understand.

• The functional form should clarify the verbal description of the mechanisms driving the process under study.

• Choosing a functional form is a skill that needs to be developed over time.

Choice of Functional Forms: Mechanism vs. phenomenology

• Mechanistic: based on some biological or ecological model.

• Phenomenological: functions that fit the data well or are simple/convenient to use.

Choice of functional forms: What matters?

• Does it represent what happens in your model?

• Does the shape of the function resemble actual data?

• Does the function produce values over the desired range?

• Does the function allow for ready variation of the aspects of the question that the researcher wants to explore?

• What happens at either end (as x → 0 and as x → ∞)?

• What happens in the middle?

• Critical points (maxima, minima).
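The questions about the ends of the range and about critical points can be checked numerically before any fitting. Below is a minimal sketch using an arbitrary illustrative function, y = x·e^(−x), which is not from the lecture. It evaluates the function very close to 0 and at a very large x, then locates the interior maximum by a simple grid search. The helper name `probe` and its default limits are hypothetical choices:

```python
import math


def probe(f, x_small=1e-9, x_large=1e9, lo=1e-6, hi=50.0, n=100_000):
    """Evaluate f near 0, far out, and find its largest value on a grid over [lo, hi]."""
    at_small = f(x_small)
    at_large = f(x_large)
    step = (hi - lo) / n
    grid = [lo + i * step for i in range(n + 1)]
    x_peak = max(grid, key=f)
    return at_small, at_large, x_peak, f(x_peak)


def hump(x):
    # Illustrative only: y = x * exp(-x) rises from 0, peaks at x = 1, decays to 0.
    return x * math.exp(-x)


small, large, x_peak, y_peak = probe(hump)
print(small, large)
print(round(x_peak, 3), round(y_peak, 4))   # 1.0 0.3679
```

A grid search only finds the peak to within the grid spacing. For a function with an analytic derivative, solving f′(x) = 0 is the more precise route.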

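Model Functions Vs. Probability Density Functions

Model function: $y = f(x)$

Properties of pdf’s: $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$

[Figure: a probability density plotted as Prob(x) versus x.]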
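The practical difference is that a pdf must integrate to 1, whereas a model function only needs the right shape. This sketch integrates two exponential curves numerically with the trapezoid rule. The first is the exponential pdf, rate·e^(−rate·x). The second is a model function, a·e^(−bx), whose area is a/b. The parameter values and the upper limit of 100, which stands in for infinity, are illustrative choices:

```python
import math


def trapezoid(f, lo, hi, n=100_000):
    """Composite trapezoid rule on [lo, hi] with n equal panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def exponential_pdf(x, rate=0.5):
    """A proper pdf on [0, inf): rate * exp(-rate * x) integrates to 1."""
    return rate * math.exp(-rate * x)


def exponential_decay(x, a=2.0, b=0.5):
    """A model function a * exp(-b * x): same shape, but its area is a / b."""
    return a * math.exp(-b * x)


# Tails beyond x = 100 are negligible for these parameter values.
area_pdf = trapezoid(exponential_pdf, 0.0, 100.0)
area_model = trapezoid(exponential_decay, 0.0, 100.0)
print(round(area_pdf, 4), round(area_model, 4))   # 1.0 4.0
```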

Some useful functions (not necessarily pdf’s!)

• Exponential.

• Weibull.

• Logistic.

• Lognormal.

• Power.

• Generalized Poisson.

• Logarithmic.

Exponential

$Y = a e^{bx}$

$Y = e^{a + bx}$

$Y = a e^{x/b}$

$Y = \frac{a}{b - a}\left(e^{ax} - e^{bx}\right)$

Exponential: Decline in maximum potential growth as a function of crowding

$\text{Growth multiplier} = e^{-a\,\text{NCI}^{b}}$

[Figure: effect on growth (growth multiplier, 0 to 1) versus NCI (Neighborhood Crowding Index) for Species A and Species B.]
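A minimal sketch of the crowding multiplier, assuming the form Growth multiplier = exp(−a·NCI^b). With a, b > 0 the multiplier equals 1 with no neighbors, declines monotonically with crowding, and never drops below 0, which matches the 0 to 1 range on the figure's axis. The parameter values for the two species are hypothetical, not fitted values from the lecture:

```python
import math


def growth_multiplier(nci: float, a: float, b: float) -> float:
    """Crowding effect on growth: exp(-a * NCI**b). Equals 1 when NCI = 0."""
    return math.exp(-a * nci ** b)


# HYPOTHETICAL parameters for two species, chosen only to give different shapes.
species = {"A": (0.05, 1.5), "B": (0.30, 0.8)}
for name, (a, b) in species.items():
    values = [growth_multiplier(n, a, b) for n in (0, 5, 10, 20)]
    print(name, [round(v, 3) for v in values])
```

Because the multiplier is bounded in (0, 1], it can be multiplied onto a maximum potential growth rate without changing that rate's units.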

Michaelis-Menten function

$\text{Log Growth} = \frac{a \cdot \text{Light}}{\frac{a}{s} + \text{Light}}$

[Figure: log radial growth (mm), 0 to 1.8, versus percent full light, 0 to 100, for two fitted curves: a = 1.43, s = 0.76 and a = 1.63, s = 0.31.]
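Writing the Michaelis-Menten function as a·Light / (a/s + Light) gives both parameters a direct meaning. The curve starts at 0, its slope at Light = 0 is s, and it approaches a as light increases. The sketch below checks these properties numerically for the two parameter sets on the slide. It assumes Light is measured in percent of full light, as on the figure's axis:

```python
def log_growth(light: float, a: float, s: float) -> float:
    """Michaelis-Menten form a * Light / (a / s + Light).

    a: asymptote (growth at very high light); s: slope at Light = 0.
    """
    return a * light / (a / s + light)


for a, s in ((1.43, 0.76), (1.63, 0.31)):
    slope_at_zero = log_growth(1e-6, a, s) / 1e-6   # finite-difference slope near 0
    print(f"a={a}, s={s}: growth at 100% light = {log_growth(100.0, a, s):.3f}, "
          f"slope near 0 = {slope_at_zero:.3f}")
```

Under this parameterization, the second curve has a higher asymptote but a shallower initial slope. This gives a compact way to describe how the two response curves differ.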

Weibull function

The exponential is a special case of the Weibull function (β = 0):

$f(z) = \alpha z^{\beta} \exp\left(-\frac{\alpha z^{\beta+1}}{\beta+1}\right)$

where α is the scale parameter and β is the shape parameter.
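The sketch below assumes the Weibull density is written as f(z) = α·z^β·exp(−α·z^(β+1)/(β+1)). That parameterization is an assumption here and may differ from the one in the original slides. In this form the conventional Weibull shape is β + 1, so β = 0 collapses to the exponential density α·e^(−αz). The code checks that reduction and confirms numerically that the density integrates to about 1 for one illustrative parameter set:

```python
import math


def weibull_pdf(z, alpha, beta):
    """Assumed form: alpha * z**beta * exp(-alpha * z**(beta + 1) / (beta + 1))."""
    return alpha * z ** beta * math.exp(-alpha * z ** (beta + 1) / (beta + 1))


def exponential_pdf(z, alpha):
    """Exponential density alpha * exp(-alpha * z)."""
    return alpha * math.exp(-alpha * z)


def integrate(f, lo, hi, n=200_000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + i * h) for i in range(1, n)) + 0.5 * f(hi))


# beta = 0 reproduces the exponential density.
for z in (0.0, 0.5, 2.0):
    assert math.isclose(weibull_pdf(z, 1.3, 0.0), exponential_pdf(z, 1.3))

# Illustrative parameters alpha = 0.8, beta = 1.5: total area should be ~1.
print(round(integrate(lambda z: weibull_pdf(z, 0.8, 1.5), 0.0, 20.0), 4))   # 1.0
```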

Weibull Example: Dispersal functions

[Equation: seedling density as a function of STR, parent DBH, a normalizing term n, and a distance-decay term with parameter D.]

[Figure: seedling dispersion in partial canopy sites: seedling density (#/m²), 0 to 2.5, versus distance from parent tree (m), 0 to 50, for HW, CW, SX, PL, BL, BA, CT, AT, and EP.]

Logistic

$Y = \frac{e^{(a + bx)}}{1 + e^{(a + bx)}}$

[Figure: Y, 0 to 1, versus X, 0 to 10, for a = -2, b = 1 and a = 2, b = -1.]
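A minimal sketch of the logistic Y = e^(a+bx) / (1 + e^(a+bx)). It is written so that `math.exp` is never called with a large positive argument, which would otherwise raise `OverflowError`. The two parameter sets are the ones on the slide. Both curves cross Y = 0.5 at x = −a/b = 2, and because their linear predictors are negatives of each other they mirror one another:

```python
import math


def logistic(x, a, b):
    """Y = exp(a + b*x) / (1 + exp(a + b*x)), evaluated without overflow."""
    eta = a + b * x
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# The two curves from the figure: increasing (b > 0) and decreasing (b < 0).
for a, b in ((-2.0, 1.0), (2.0, -1.0)):
    print(a, b, [round(logistic(x, a, b), 3) for x in (0, 2, 4, 10)])
```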

Logistic: Probability of mortality as a function of storm severity

$\text{Prob. Mortality} = \frac{e^{(a + b \cdot S \cdot \text{DBH}^{c})}}{1 + e^{(a + b \cdot S \cdot \text{DBH}^{c})}}$

where S is storm severity.

[Figure: storm mortality functions (40 cm DBH): probability of mortality versus storm severity, 0 to 1, for ACRU, ACSA, BELU, FAGR, PIRU, PRSE, and TSCA.]

Canham et al. 2001
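The sketch below assumes the mortality model takes the form logit(p) = a + b·S·DBH^c. That form may not match the published parameterization exactly. Because DBH enters only through its product with S, tree size changes how fast risk rises with severity but has no effect when S = 0. The parameters are hypothetical and are not the fitted values from Canham et al. 2001:

```python
import math


def storm_mortality(severity, dbh, a, b, c):
    """Assumed logistic mortality: eta = a + b * S * DBH**c, p = 1 / (1 + exp(-eta))."""
    eta = a + b * severity * dbh ** c
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL parameters, for illustration only.
a, b, c = -4.0, 3.0, 0.5
for dbh in (10.0, 40.0):
    print(dbh, [round(storm_mortality(s, dbh, a, b, c), 3) for s in (0.0, 0.5, 1.0)])
```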

Lognormal

$Y = a \exp\left[-\left(\frac{\ln(x/b)}{c}\right)^{2}\right]$

Lognormal: Leaf litterfall as a function of distance to the parent tree

$\text{Litterfall} = \text{STR}\left(\frac{\text{DBH}}{30}\right)^{2} \exp\left[-\frac{1}{2}\left(\frac{\ln(\text{Distance}/X_0)}{X_b}\right)^{2}\right]$

[Figure: leaf litterfall (g/m²), 0 to 80, versus distance from a 30 cm DBH tree (m), 0 to 25, for ACRU, ACSA, FAGR, FRAM, QURU, and TSCA.]

Data from GMF, CT

Lognormal: Growth as a function of DBH

$\text{Pot. Growth} = \text{Max. Growth} \cdot \exp\left[-\frac{1}{2}\left(\frac{\ln(\text{DBH}/X_0)}{X_b}\right)^{2}\right]$

[Figure: maximum potential growth (cm/yr), 0 to 35, versus DBH (cm), 0 to 140, for CASARB, DACEXC, MANBID, INGLAU, SLOBER, CECSCH, TABHET, GUAGUI, ALCLAT, SCHMOR, and BUCTET.]

Data from LFDP, Puerto Rico
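A minimal sketch of the lognormal growth curve, assuming the form Pot. Growth = Max. Growth · exp[−½ (ln(DBH/X₀)/X_b)²]. The parameters have direct interpretations. The curve peaks at DBH = X₀ with value Max. Growth, X_b controls its breadth, and it is symmetric on a log scale, so DBH = 2X₀ and DBH = X₀/2 give the same growth. The parameter values are hypothetical, and DBH must be positive for the logarithm to be defined:

```python
import math


def potential_growth(dbh, max_growth, x0, xb):
    """Lognormal-shaped growth: max_growth * exp(-0.5 * (ln(dbh / x0) / xb)**2)."""
    return max_growth * math.exp(-0.5 * (math.log(dbh / x0) / xb) ** 2)


# HYPOTHETICAL parameters: peak growth 20 at DBH = 40 cm, breadth xb = 0.8.
MAX_GROWTH, X0, XB = 20.0, 40.0, 0.8
for dbh in (5.0, 20.0, 40.0, 80.0, 140.0):
    print(dbh, round(potential_growth(dbh, MAX_GROWTH, X0, XB), 2))
```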

Power function: small mammal distribution as a function of canopy tree neighborhood

$Y = a\,b^{(x - c)^{2}}$

[Figure: mice, voles, and chipmunks in 1995 (the year after a mast) versus canopy tree neighborhood ordination axis, -5 (maple) to 5 (oak).]

Schnurr et al. 2004.

Parameter trade-offs: More than one way to get there…

$\text{Growth multiplier} = e^{-a\,\text{NCI}^{b}}$

[Figure: effect on growth (growth multiplier, 0 to 1) versus NCI (Neighborhood Crowding Index) for Species A and Species B.]

Trade-off?
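A trade-off between multiplicative parameters can be shown exactly for the form Growth multiplier = exp(−a·NCI^b). Any pair (a, b) with a·x₀^b = k gives the same multiplier, exp(−k), at crowding level x₀. If most observations sit near x₀, the data cannot tell those pairs apart, and a and b will be strongly correlated in the fit. In this sketch, x₀ = 10 and k = 1 are hypothetical choices. The curves only diverge away from x₀, which is why data spread over the whole NCI range are needed:

```python
import math


def growth_multiplier(nci, a, b):
    """Crowding multiplier exp(-a * NCI**b)."""
    return math.exp(-a * nci ** b)


NCI_REF = 10.0   # HYPOTHETICAL: where most observations are assumed to lie
K = 1.0          # target value of a * NCI_REF**b

for b in (0.5, 1.0, 1.5, 2.0):
    a = K / NCI_REF ** b   # choose a so that a * 10**b = K
    print(f"b={b}: a={a:.4f}, "
          f"multiplier at NCI=10: {growth_multiplier(NCI_REF, a, b):.4f}, "
          f"at NCI=2: {growth_multiplier(2.0, a, b):.4f}")
```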

Things to keep in mind

• Scaling issues: Pay attention to units, scales, and conversions.

• Multiplicative functions and parameter trade-offs.

• Computational issues: large exponent values, division by zero, and logs of negative numbers.
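In Python's `math` module, each of these computational issues surfaces as an exception. `math.exp` overflows for arguments a little above 709, `math.log` rejects non-positive values, and dividing by zero raises `ZeroDivisionError`. The sketch below triggers all three and shows one simple guard for each. The helper names `safe_exp`, `safe_log`, and `safe_ratio`, along with their cut-off values, are hypothetical choices:

```python
import math


def safe_exp(x, cap=700.0):
    """exp() with the argument clipped below the overflow point (about 709.78)."""
    return math.exp(min(x, cap))


def safe_log(x, floor=1e-300):
    """log() of a value forced to be positive; math.log raises ValueError for x <= 0."""
    return math.log(max(x, floor))


def safe_ratio(num, den, eps=1e-12):
    """Division guarded against a zero or near-zero denominator."""
    if abs(den) < eps:
        den = eps if den >= 0 else -eps
    return num / den


for label, thunk in (("exp(1000)", lambda: math.exp(1000.0)),
                     ("log(-1)", lambda: math.log(-1.0)),
                     ("1/0", lambda: 1.0 / 0.0)):
    try:
        thunk()
    except (OverflowError, ValueError, ZeroDivisionError) as err:
        print(label, "->", type(err).__name__)
```

Clipping keeps an optimizer running, but it can also hide a badly specified model. It is often better to constrain parameters to valid ranges or to compute on the log scale, for example by working with log-likelihoods rather than likelihoods.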

Some useful references

• Catalog of curves for curve fitting. British Columbia Ministry of Forests.

• Abramowitz, M. and I. Stegun. 1965. Handbook of Mathematical Functions.

• McGill, B. 2003. “Strong and weak tests of macroecological theory.” Oikos.

• Vanclay, J. 1995. “Growth models for tropical forests: a synthesis of models and methods.” Forest Science.
