Lecture 4 Model Formulation and Choice of Functional Forms: Translating Your Ideas into Models.
Topics
• Alternate models as multiple working hypotheses.
• Null models.
• Choice of functional forms.
The triangle of statistical inference
[Figure: triangle diagram with the labels Data, Scientific Model* (hypothesis), Probability Model, and Inference.]
*All hypotheses can be expressed as models!
The Scientific Method
“Science is a process for learning about nature in which competing ideas are measured against observations”
Feynman 1965
Scientific Process
• Devise alternative hypotheses.
• Devise experiment(s) with alternative possible outcomes.
• Carry out experiments.
• Recycle procedure. -- Platt 1964 (Strong inference)
But this is time-consuming and not very useful for many questions…
The method of multiple working hypotheses
• “It differs from the simple working hypothesis in that it distributes the effort and divides the affection.”
• “Bring up into review every rational explanation of the phenomenon in hand and to develop every tenable hypothesis relative to its nature.”
• “Some of the hypotheses have already been proposed and used while others are the investigator’s own creations.”
• “An adequate explanation often involves the coordination of several causes.”
• “When faithfully followed for a sufficient time it develops the habit of parallel or complex thought.”
• “The power of viewing phenomena analytically and synthetically at the same time appears to be gained.”
-- T. C. Chamberlin, 1890. Science 15: 92.
What is the best model to use?
• This is the critical question in making valid inferences from data.
• Careful a priori consideration of alternative models will often require a major change in emphasis among scientists.
• Model specification is more difficult than the application of likelihood techniques.
Formulation of Candidate Models
• Conceptually difficult.
• Subjective.
• Original and innovative.
• Models represent a scientific hypothesis.
Translating your qualitative ideas into a quantitative, algebraic model that can be tested against alternative models…
Where do models come from?
• Scientific literature.
• Results of manipulative experiments.
• Personal experience.
• Scientific debate.
• Natural resource management questions.
• Monitoring programs.
• Judicial hearings.
Are models truth?
• Truth has infinite dimensions
• Sample data are finite
• Models should provide a good approximation to the data
• Larger data sets will support more complex approximations to reality
“…empiricism, like theory, is based on a series of simplifying assumptions… By choosing what to measure and what to ignore, an empiricist is making as many assumptions as does any theoretician.”
-- David Tilman
Model selection is implicit in science
Develop a set of a priori candidate models
• Include a global model that contains all potentially relevant effects.
• Test the fit of the global model (R-squared, goodness-of-fit tests).
• Develop alternative simpler models.
Assessing alternative models
• How well does the model approximate “truth” relative to its competitors? (high accuracy or low bias).
• How repeatable is the prediction of a model relative to its competitors? (high precision or low variance).
Why do model selection at all? Principle of parsimony
[Figure: bias² and variance plotted against the number of parameters, from few to many.]
Principle of parsimony applied to model selection
• We typically penalize added complexity.
• A more complex model has to exceed a certain threshold of improvement over a simpler model.
• Added complexity usually makes a model more unstable.
• Complex models spread the data too thinly over the parameters.
• Model selection is not about whether something is true or not but about whether we have enough information to characterize it properly.
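Reality: Actual data
[Equation and figure: the generating function for the example data, y as a function of x.]
Example from pages 33-34 of Burnham and Anderson
A set of candidate models
$E(y) = \beta_0 + \beta_1(x)$
$E(y) = \beta_0 + \beta_1(x) + \beta_2(x^2)$
$E(y) = \beta_0 + \beta_1(x) + \beta_2(x^2) + \beta_3(x^3) + \beta_4(x^4) + \beta_5(x^5)$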
UNDERFITTING!!
$E(y) = \beta_0 + \beta_1(x)$
Too simple: High bias (low accuracy)
OVERFITTING!!
$E(y) = \beta_0 + \beta_1(x) + \beta_2(x^2) + \beta_3(x^3) + \beta_4(x^4) + \beta_5(x^5)$
Too complicated: High variance (low precision)
REASONABLE FIT
$E(y) = \beta_0 + \beta_1(x) + \beta_2(x^2)$
The compromise: a parsimonious model
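The underfitting and overfitting contrast can be sketched in Python with NumPy. This is an illustration only: the generating function `true_curve`, the noise level, the sample size, and the random seed are all assumptions made here, not the values behind the Burnham and Anderson example. The three polynomial degrees correspond to the linear, quadratic, and fifth-order candidate models.

```python
import numpy as np

def true_curve(x):
    # Hypothetical generating function, assumed for illustration only.
    return np.exp(-(x - 0.3) ** 2) - 1.0

def fit_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares
    computed on the same data used for fitting."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

rng = np.random.default_rng(42)           # fixed seed so the run is repeatable
x = np.linspace(-2.0, 2.0, 40)
y = true_curve(x) + rng.normal(0.0, 0.1, size=x.size)

# Degrees 1, 2, and 5 mirror the three candidate models.
rss = {degree: fit_rss(x, y, degree) for degree in (1, 2, 5)}
for degree, value in rss.items():
    print(f"degree {degree}: {degree + 1} coefficients, RSS = {value:.3f}")
```

Because the models are nested, the residual sum of squares on the fitted data cannot increase as terms are added. Goodness of fit alone therefore always favors the most complex model, which is why added complexity has to be penalized.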
Null Models
• Parametric methods advocate testing hypotheses against a null expectation ($H_0$).
• Often the null is probably false simply on a priori grounds (e.g., the parameter θ had no effect).
• In likelihood terms this usually means the null model is the one that sets the value of parameter θ equal to 0 or 1.
States of mind of a null hypothesis tester
[Figure: 2 × 2 grid crossing the practical importance of the observed difference (Not important, Important) with the statistical significance of the observed difference (Not significant, Significant).]
Model Selection Methods
• Adjusted R-squared.
• Likelihood Ratio Tests.
• Akaike’s Information Criterion.
We will talk about these topics later…
Choice of Functional Forms
• Model formulation requires the specification of a functional form that formalizes the relationship between the predictive variables and the process we are trying to understand.
• The functional form should clarify the verbal description of the mechanisms driving the process under study.
• Choosing a functional form is a skill that needs to be developed over time.
Choice of Functional Forms: Mechanism vs. phenomenology
• Mechanistic: based on some biological or ecological model.
• Phenomenological: functions that fit the data well or are simple/convenient to use.
Choice of functional forms: What matters?
• Does it represent what happens in your model?
• Does the shape of the function resemble actual data?
• Does the function deliver the desired range of values?
• Does the function allow for ready variation of the aspects of the question that the researcher wants to explore?
• What happens at either end (as x → 0 and x → ∞)?
• What happens in the middle?
• Critical points (maxima, minima).
Model Functions Vs. Probability Density Functions
Model function: $y = f(x)$
Properties of pdf's
• $f(x) \ge 0$
• $\int f(x)\,dx = 1$
[Figure: Prob(x) versus x.]
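The difference between a model function and a pdf can be checked numerically. The sketch below is a minimal Python illustration, assuming a hypothetical rate of 0.5 and a hypothetical scaling of 3.0: the same exponential shape integrates to 1 when written as a density, and to an arbitrary value when used as a model function.

```python
import math

def trapezoid(f, lower, upper, steps=100_000):
    """Composite trapezoid rule for the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width

RATE = 0.5  # hypothetical rate parameter, assumed for illustration

def exponential_pdf(x):
    # A probability density: non-negative, with area 1 over [0, infinity).
    return RATE * math.exp(-RATE * x)

def exponential_model(x, a=3.0):
    # A model function with the same shape; the scaling a is a free parameter.
    return a * math.exp(-RATE * x)

# The upper limit of 60 stands in for infinity; the tail beyond it is negligible.
pdf_area = trapezoid(exponential_pdf, 0.0, 60.0)
model_area = trapezoid(exponential_model, 0.0, 60.0)
print(f"area under pdf:            {pdf_area:.6f}")
print(f"area under model function: {model_area:.6f}")
```

Only the first function could serve as a probability model. The second is free to take whatever scale the process being described requires.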
Some useful functions (not necessarily pdf’s!)
• Exponential.
• Weibull.
• Logistic.
• Lognormal.
• Power.
• Generalized Poisson.
• Logarithmic.
Exponential
$Y = a e^{bx}$
$Y = a e^{b/x}$
$Y = e^{a + bx}$
$Y = a\,(e^{bx} - e^{ax})$
Exponential: Decline in maximum potential growth as a function of crowding
$\text{Growth multiplier} = e^{-a \cdot \mathrm{NCI}^{b}}$
[Figure: effect on growth (growth multiplier, 0 to 1) versus NCI (Neighborhood Crowding Index) for Species A and Species B.]
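A minimal Python sketch of the crowding effect, assuming the equation on this slide reads growth multiplier = exp(-a · NCI^b). The parameter values for the two species are hypothetical, chosen only to show how a and b change the rate of decline.

```python
import math

def growth_multiplier(nci, a, b):
    """Exponential crowding effect exp(-a * NCI**b), for NCI >= 0.

    Returns 1.0 with no neighbors and declines toward 0 as crowding increases
    (for a > 0 and b > 0).
    """
    return math.exp(-a * nci ** b)

# Hypothetical parameter values, not estimates from any data set.
species = {"Species A": (0.02, 1.5), "Species B": (0.10, 0.8)}

for name, (a, b) in species.items():
    curve = [growth_multiplier(nci, a, b) for nci in (0.0, 5.0, 10.0, 20.0)]
    print(name, [round(value, 3) for value in curve])
```

The multiplier scales maximum potential growth, so a value of 1 means no reduction from crowding.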
Michaelis-Menten function
$\text{LogGrowth} = \dfrac{a \cdot \text{Light}}{(a/s) + \text{Light}}$
[Figure: log radial growth (mm) versus percent full light (0 to 100) for two fitted curves, one with a = 1.43, s = 0.76 and one with a = 1.63, s = 0.31.]
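The roles of the two parameters can be seen in a short Python sketch. It assumes the Michaelis-Menten form a · Light / ((a/s) + Light), in which a is the asymptote at high light and s is the slope at zero light. The two parameter pairs are the ones listed on the slide.

```python
def michaelis_menten(light, a, s):
    """a * light / (a / s + light): asymptote a, initial slope s (light >= 0)."""
    return a * light / (a / s + light)

# Parameter pairs taken from the slide.
fits = [(1.43, 0.76), (1.63, 0.31)]

for a, s in fits:
    values = [michaelis_menten(light, a, s) for light in (0, 1, 5, 25, 100)]
    print(f"a = {a}, s = {s}:", [round(value, 3) for value in values])
```

A larger s makes the curve rise faster in deep shade, while a larger a raises the level it saturates at, so the two curves can cross.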
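Weibull function
The exponential is a special case of the Weibull function (β = 1):
$f(z) = \dfrac{\beta}{\alpha}\left(\dfrac{z}{\alpha}\right)^{\beta-1}\exp\left[-\left(\dfrac{z}{\alpha}\right)^{\beta}\right]$
α is the scale parameter
β is the shape parameter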
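In the standard two-parameter form, the Weibull density reduces to the exponential density when the shape parameter equals 1. The Python sketch below checks this numerically. The parameterization (scale and shape) and the scale value of 4.0 are assumptions made for illustration.

```python
import math

def weibull_pdf(z, scale, shape):
    """Two-parameter Weibull density for z > 0."""
    ratio = z / scale
    return (shape / scale) * ratio ** (shape - 1.0) * math.exp(-(ratio ** shape))

def exponential_pdf(z, scale):
    """Exponential density with mean equal to scale."""
    return math.exp(-z / scale) / scale

scale = 4.0  # hypothetical scale parameter
for z in (0.5, 2.0, 10.0):
    w = weibull_pdf(z, scale, shape=1.0)
    e = exponential_pdf(z, scale)
    print(f"z = {z}: Weibull(shape=1) = {w:.6f}, exponential = {e:.6f}")
```

Shape values below 1 give a curve that falls more steeply near zero, and values above 1 give a hump away from zero, which is what makes the function useful for dispersal kernels.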
Weibull Example: Dispersal functions
[Figure: Seedling dispersion, partial canopy sites. Seedling density (#/m²) versus distance from parent tree (m), 0 to 50 m, for species HW, CW, SX, PL, BL, BA, CT, AT, EP. The fitted function expresses seedling density in terms of STR, DBH, a normalizer n, and a Weibull kernel of the form $e^{-D \cdot \mathrm{Distance}^{3}}$.]
Logistic
$Y = \dfrac{e^{(a + bx)}}{1 + e^{(a + bx)}}$
[Figure: Y (0 to 1) versus X (0 to 10) for a = -2, b = 1 and for a = 2, b = -1.]
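A minimal Python sketch of the logistic function, assuming the form exp(a + bx) / (1 + exp(a + bx)). The two parameter pairs are the ones shown in the figure legend. The function branches on the sign of the exponent so that large values of a + bx do not overflow, which is an implementation choice made here rather than anything stated on the slide.

```python
import math

def logistic(x, a, b):
    """exp(a + b*x) / (1 + exp(a + b*x)), evaluated without overflow."""
    t = a + b * x
    if t >= 0:
        # Algebraically identical form that only exponentiates a non-positive number.
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

for a, b in [(-2.0, 1.0), (2.0, -1.0)]:
    values = [logistic(x, a, b) for x in range(0, 11, 2)]
    print(f"a = {a}, b = {b}:", [round(value, 3) for value in values])
```

The sign of b sets the direction of the curve, and the midpoint Y = 0.5 falls where a + bx = 0, that is at x = -a/b.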
Logistic: Probability of mortality as a function of storm severity
$\text{Prob. Mortality} = \dfrac{e^{(a + b\,S\,\mathrm{DBH}^{c})}}{1 + e^{(a + b\,S\,\mathrm{DBH}^{c})}}$
[Figure: Storm Mortality Functions (40 cm DBH). Probability of mortality versus storm severity (0.0 to 1.0) for species ACRU, ACSA, BELU, FAGR, PIRU, PRSE, TSCA. Canham et al. 2001.]
Lognormal
$Y = a \exp\left[-\dfrac{1}{2}\left(\dfrac{\ln(x/b)}{c}\right)^{2}\right]$
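A minimal Python sketch of a lognormal-shaped model function, assuming the form a · exp(-0.5 · (ln(x/b)/c)²). Under that assumption a is the peak height, b is the value of x where the peak occurs, and c controls the breadth of the curve. The parameter values are hypothetical.

```python
import math

def lognormal_curve(x, a, b, c):
    """a * exp(-0.5 * (ln(x / b) / c)**2), defined for x > 0."""
    return a * math.exp(-0.5 * (math.log(x / b) / c) ** 2)

# Hypothetical values: peak of 80 at x = 5, breadth 0.8.
a, b, c = 80.0, 5.0, 0.8

for x in (1.0, 2.5, 5.0, 10.0, 25.0):
    print(f"x = {x}: Y = {lognormal_curve(x, a, b, c):.2f}")
```

The curve is symmetric on a log scale around x = b, so it rises quickly below the peak and has a long tail above it.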
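Lognormal: Leaf litterfall as a function of distance to the parent tree
[Figure: leaf litterfall (g/m²) versus distance from a 30 cm DBH tree (m), 0 to 25 m, for species ACRU, ACSA, FAGR, FRAM, QURU, TSCA. The fitted function scales STR by DBH relative to 30 cm and multiplies it by a lognormal kernel of the form $\exp\left[-\frac{1}{2}\left(\frac{\ln(\mathrm{Distance}/X_0)}{X_b}\right)^{2}\right]$.]
Data from GMF, CT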
Lognormal: Growth as a function of DBH
$\text{Pot. Growth} = \text{Max. Growth} * \mathrm{EXP}\left[-\dfrac{1}{2}\left(\dfrac{\ln(\mathrm{DBH}/X_0)}{X_b}\right)^{2}\right]$
[Figure: maximum potential growth (cm/yr) versus DBH (cm), 0 to 140 cm, for species CASARB, DACEXC, MANBID, INGLAU, SLOBER, CECSCH, TABHET, GUAGUI, ALCLAT, SCHMOR, BUCTET.]
Data from LFDP, Puerto Rico
Power function: small mammal distribution as a function of canopy tree neighborhood
$Y = a b^{(x - c)^{2}}$
[Figure: mice, voles, and chipmunks versus canopy tree neighborhood ordination axis (-5 to 5, running from Maple to Oak), 1995 (year after mast).]
Schnurr et al. 2004.
Parameter trade-offs: More than one way to get there…
$\text{Growth multiplier} = e^{-a \cdot \mathrm{NCI}^{b}}$
[Figure: effect on growth (growth multiplier, 0 to 1) versus NCI (Neighborhood Crowding Index) for Species A and Species B.]
Trade-off?
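One way to see the trade-off is to build two different parameter pairs that give nearly the same curve over the observed range of crowding. The Python sketch below assumes the crowding function exp(-a · NCI^b). All parameter values, the reference NCI of 10, and the range of 5 to 15 are hypothetical.

```python
import math

def growth_multiplier(nci, a, b):
    """Exponential crowding effect exp(-a * NCI**b)."""
    return math.exp(-a * nci ** b)

def matching_a(a_ref, b_ref, b_new, nci_ref):
    """Value of a that, paired with b_new, reproduces the reference curve at nci_ref."""
    return a_ref * nci_ref ** (b_ref - b_new)

a1, b1 = 0.05, 1.0                          # reference pair (hypothetical)
b2 = 1.2                                    # alternative exponent (hypothetical)
a2 = matching_a(a1, b1, b2, nci_ref=10.0)   # compensating change in a

grid = [5.0 + 0.5 * i for i in range(21)]   # NCI from 5 to 15
max_gap = max(
    abs(growth_multiplier(n, a1, b1) - growth_multiplier(n, a2, b2)) for n in grid
)
print(f"pair 1: a = {a1}, b = {b1}")
print(f"pair 2: a = {a2:.4f}, b = {b2}")
print(f"largest difference between the curves on the grid: {max_gap:.4f}")
```

When data cover only a narrow range of NCI, an increase in b can be offset by a decrease in a, so the two parameters are hard to estimate separately.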
Things to keep in mind
• Scaling issues: Pay attention to units, scales, and conversions.
• Multiplicative functions and parameter trade-offs.
• Computational issues:
  • Large exponent values
  • Division by zero
  • Logs of negative numbers
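The three computational issues can be guarded against in code. The helpers below are a hypothetical Python sketch, not part of any library: the function names, the clamping threshold, and the floor values are assumptions, and clamping is only one possible policy (raising an error or skipping the parameter set are others).

```python
import math

MAX_EXPONENT = 700.0  # math.exp overflows a double a little above 709

def safe_exp(t):
    """exp(t) with the exponent clamped so the result stays finite."""
    return math.exp(min(t, MAX_EXPONENT))

def safe_log(x, floor=1e-300):
    """Natural log with zero or negative inputs replaced by a small positive floor."""
    return math.log(max(x, floor))

def safe_divide(numerator, denominator, eps=1e-12):
    """Division that substitutes eps for a denominator of exactly zero."""
    if denominator == 0:
        denominator = eps
    return numerator / denominator

print(safe_exp(1e6))          # finite, where math.exp(1e6) raises OverflowError
print(safe_log(-5.0))         # finite, where math.log(-5.0) raises ValueError
print(safe_divide(1.0, 0.0))  # finite, where 1.0 / 0.0 raises ZeroDivisionError
```

Guards like these keep an optimizer running when it wanders into extreme parameter values, but they can also hide a badly scaled model, so it is worth checking how often they are triggered.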
Some useful references
• Catalog of curves for curve fitting. British Columbia Ministry of Forests.
• Abramowitz, M. and I. Stegun. 1965. Handbook of Mathematical Functions.
• McGill, B. 2003. “Strong and weak tests of macroecological theory”. Oikos.
• Vanclay, J. 1995. “Growth models for tropical forests: a synthesis of models and methods”. Forest Science.