Fall 2015. Statistical Models For Crash Data Modeling Process Determine Modeling Objectives...
Predictive Methods and Development of Statistical Models – Part II
Fall 2015
Statistical Models For Crash Data: Modeling Process
Determine Modeling Objectives
• Definition (intersections, pedestrians, etc.)
• Data availability
• Unit scales (crashes/year; severity; etc.)
Establish Appropriate Process
• Sampling models
• Observational models
• Process/system state models
• Parameter models (Bayesian models only)
Statistical Models For Crash Data: Modeling Process
Determine Inferential Goals
• Point estimate (value + standard error)
• Distribution (Bayesian models)
• Percentiles (2.5%, 85%, etc.; Bayesian models)
Select Computation Techniques
• Frequentist (MLE)
• Bayesian (via simulation)
• Empirical Bayes
Evaluate Models
• Goodness-of-fit
• Prediction
• Confidence intervals
Statistical Models For Crash Data: Data and Methodological Issues Associated with Crash-Frequency Data
Data/methodological issue: associated problems
• Overdispersion: can violate some of the basic count-data modeling assumptions of some modeling approaches.
• Underdispersion: as with overdispersion, can violate some of the basic count-data modeling assumptions of some modeling approaches.
• Time-varying explanatory variables: averaging variables over the studied time intervals ignores potentially important variations within those intervals, which can result in erroneous parameter estimates.
• Temporal and spatial correlation: correlation over time and space causes losses in estimation efficiency.
• Low sample mean and small sample size: causes an excess number of observations with zero crashes, which can cause errors in parameter estimates.
• Injury severity and crash type correlation: correlation between severities and crash types causes losses in estimation efficiency when separate severity-count models are estimated.
• Underreporting: can distort model predictions and lead to erroneous inferences regarding the influence of explanatory variables.
• Omitted-variables bias: if significant variables are omitted from the model, parameter estimates will be biased, and erroneous inferences regarding the influence of explanatory variables may result.
• Endogenous variables: if endogenous variables are included without appropriate statistical corrections, parameter estimates will be biased, and erroneous inferences regarding the influence of explanatory variables may be drawn.
• Functional form: if an incorrect functional form is used, the result will be biased parameter estimates and possibly erroneous inferences regarding the influence of explanatory variables.
• Fixed parameters: if parameters are estimated as fixed when they actually vary across observations, the result will be biased parameter estimates and possibly erroneous inferences regarding the influence of explanatory variables.
Statistical Models For Crash Data: Summary of Existing Models for Analyzing Crash-Frequency Data
Model type: advantages; disadvantages
• Poisson. Advantages: most basic model; easy to estimate. Disadvantages: cannot handle over- and under-dispersion; negatively influenced by the low sample mean and small sample size bias.
• Negative binomial/Poisson-gamma. Advantages: easy to estimate; can account for overdispersion. Disadvantages: cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias.
• Poisson-lognormal. Advantages: more flexible than the Poisson-gamma in handling over-dispersion. Disadvantages: cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias (less so than the Poisson-gamma); cannot estimate a varying dispersion parameter.
• Zero-inflated Poisson and negative binomial. Advantages: handle datasets that have a large number of zero-crash observations. Disadvantages: can create theoretical inconsistencies; the zero-inflated negative binomial can be adversely influenced by the low sample mean and small sample size bias.
• Conway-Maxwell-Poisson. Advantages: can handle under-dispersion, over-dispersion, or a combination of both using a variable dispersion (scaling) parameter. Disadvantages: could be negatively influenced by the low sample mean and small sample size bias; no multivariate extensions available to date.
• Gamma. Advantages: can handle under-dispersed data. Disadvantages: truncated distribution (full gamma function); independence of data (incomplete gamma function).
• Generalized estimating equation models. Advantages: can handle temporal correlation. Disadvantages: may need to determine or evaluate the type of temporal correlation a priori; results are sensitive to missing values.
• Generalized additive models. Advantages: more flexible than the traditional generalized estimating equation models; allow non-linear variable interactions. Disadvantages: relatively complex to implement; may not be easily transferable to other datasets.
• Random-effects models. Advantages: handle temporal and spatial correlation. Disadvantages: may not be easily transferable to other datasets.
• Negative multinomial. Advantages: can account for overdispersion and serial correlation; panel count data. Disadvantages: cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias.
• Random-parameters models. Advantages: more flexible than the traditional fixed-parameter models in accounting for unobserved heterogeneity. Disadvantages: complex estimation process; may not be easily transferable to other datasets.
• Bivariate/multivariate models. Advantages: can model different crash types simultaneously; more flexible functional form than the generalized estimating equation models (can use non-linear functions). Disadvantages: complex estimation process; requires formulation of a correlation matrix.
• Finite mixture/Markov switching. Advantages: can be used for analyzing sources of dispersion in the data. Disadvantages: complex estimation process; may not be easily transferable to other datasets.
• Duration models. Advantages: by considering the time between crashes (as opposed to crash frequency directly), allow a very in-depth analysis of data and duration effects. Disadvantages: require more detailed data than traditional crash-frequency models; time-varying explanatory variables are difficult to handle.
• Hierarchical/multilevel models. Advantages: can handle temporal, spatial, and other correlations among groups of observations. Disadvantages: may not be easily transferable to other datasets; correlation results can be difficult to interpret.
• Neural network, Bayesian neural network, and support vector machine. Advantages: nonparametric approach that does not require an assumption about the distribution of the data; flexible functional form; usually provides a better statistical fit than traditional parametric models. Disadvantages: complex estimation process; may not be transferable to other datasets; work as black boxes; may not have interpretable parameters.
Review of Multivariate Linear Models: Ordinary Least Squares
Method: This is an estimation technique used for estimating unknown coefficients. It minimizes the sum of squared errors, which amounts to solving p = k + 1 simultaneous linear equations.
Let
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
Note: E(ε) = 0 and Var(ε) = σ².
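Review of Multivariate Linear Models
The least squares function S is given by
$$S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$
The S function is to be minimized with respect to β0, β1, …, βk. The least squares estimators, say b0, b1, …, bk, must satisfy
$$\left.\frac{\partial S}{\partial \beta_0}\right|_{b_0, b_1, \ldots, b_k} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{k} b_j x_{ij} \right) = 0$$
$$\left.\frac{\partial S}{\partial \beta_j}\right|_{b_0, b_1, \ldots, b_k} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - \sum_{l=1}^{k} b_l x_{il} \right) x_{ij} = 0, \quad j = 1, 2, \ldots, k$$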
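Review of Multivariate Linear Models
It is easier to solve the equations in matrix form. The model can be written as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$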
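Review of Multivariate Linear Models
We need to find the least squares estimator b that minimizes
$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
It can be shown that S(β) can be expressed as
$$S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
The least squares estimator* must satisfy
$$\left.\frac{\partial S}{\partial \boldsymbol{\beta}}\right|_{\mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0}$$
which simplifies to
$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y} \quad\Rightarrow\quad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
* b is called the ordinary least squares estimator of β.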
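The following sketch computes the OLS estimator b = (X'X)⁻¹X'y on simulated data. The data, the seed, and the "true" coefficients are hypothetical. The code solves the normal equations X'Xb = X'y directly rather than forming the inverse, which is the usual numerically safer choice.

```python
import numpy as np

def ols_estimator(X, y):
    """Return b = (X'X)^{-1} X'y by solving the normal equations X'X b = X'y."""
    # Solving the linear system avoids explicitly inverting X'X
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical simulated data: n observations, k = 2 explanatory variables
rng = np.random.default_rng(42)
n, k = 200, 2
x = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), x])      # first column of ones carries beta_0
beta_true = np.array([1.0, 2.0, -0.5])    # assumed coefficients used only to simulate y
y = X @ beta_true + rng.normal(scale=0.3, size=n)

b = ols_estimator(X, y)
residuals = y - X @ b                     # X' residuals = 0 is the normal-equation condition
```

At the solution, X'(y − Xb) is zero up to rounding. This is exactly the condition ∂S/∂β = 0.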
Review of Multivariate Linear Models: Maximum Likelihood
Method: The likelihood function is found from the joint probability distribution of the observations. Assuming that the errors are normally distributed with constant variance σ², the likelihood function is the following (normal distribution):
$$L(\mathbf{y}, \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})}$$
Same model as before: y = Xβ + ε.
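Review of Multivariate Linear Models
The maximum likelihood estimators are the values of the parameters β and σ² that maximize the likelihood function. Maximizing the likelihood is equivalent to maximizing the log-likelihood, ln(L). The log-likelihood is:
$$\ln L(\mathbf{y}, \boldsymbol{\beta}, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
The derivative of the log-likelihood function is called the score function. Taking the derivatives with respect to the coefficients β and setting them equal to zero yields
$$\left.\frac{\partial \ln L}{\partial \boldsymbol{\beta}}\right|_{\mathbf{b}} = -\frac{1}{2\sigma^2}\left(-2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b}\right) = \frac{1}{\sigma^2}\left(\mathbf{X}'\mathbf{y} - \mathbf{X}'\mathbf{X}\mathbf{b}\right) = \mathbf{0}$$
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
which is identical to the OLS estimator.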
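Review of Multivariate Linear Models
Taking the partial derivative with respect to σ² gives
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) = 0$$
which gives
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$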
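The sketch below evaluates the normal log-likelihood and plugs in the closed-form maximizers b and σ̂². It then checks that moving away from them lowers ln L. The data are simulated and hypothetical. Note that σ̂² divides by n. The unbiased residual variance instead divides by n − p, so the MLE is slightly smaller.

```python
import numpy as np

def normal_log_likelihood(beta, sigma2, X, y):
    """ln L(y, beta, sigma^2) for the linear model with i.i.d. normal errors."""
    n = len(y)
    r = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2) - (r @ r) / (2 * sigma2)

def normal_mle(X, y):
    """Closed-form maximizers: b = (X'X)^{-1} X'y and sigma2_hat = (y - Xb)'(y - Xb) / n."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ b
    sigma2_hat = (r @ r) / len(y)   # MLE divides by n, not n - p
    return b, sigma2_hat

# Hypothetical simulated data
rng = np.random.default_rng(7)
n = 150
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=2.0, size=n)

b, sigma2_hat = normal_mle(X, y)
ll_max = normal_log_likelihood(b, sigma2_hat, X, y)
```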
Generalized Linear Models
In the previous overheads, the normal distribution played a central role in estimating the coefficients of probabilistic models and in drawing inferences from them. Unfortunately, there are many practical situations where the normality assumption is not valid. Count data, binary responses (0 or 1), and continuous variables with positive, highly skewed distributions cannot be modeled with normally distributed errors.
The generalized linear model (GLM) was developed to allow fitting regression models for univariate response data that follow a very general family of distributions called the exponential family. This family includes the normal, Poisson, binomial, negative binomial (with a fixed dispersion parameter), geometric, gamma, etc.
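The slides do not say how a GLM is fitted. One common algorithm is iteratively reweighted least squares (IRLS), which is Fisher scoring applied to the GLM likelihood. The sketch below covers only the Poisson model with a log link. The simulated data and coefficients are hypothetical, and the first column of X is assumed to be the intercept.

```python
import numpy as np

def fit_poisson_glm(X, y, tol=1e-10, max_iter=100):
    """Poisson regression with a log link, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the intercept-only fit (column 0 = ones)
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        mu = np.exp(eta)                  # inverse of the log link
        z = eta + (y - mu) / mu           # working response: eta + (y - mu) * d(eta)/d(mu)
        w = mu                            # working weights: (d(mu)/d(eta))^2 / Var(y) = mu
        XtW = X.T * w                     # X'W without building the n x n diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    raise RuntimeError("IRLS did not converge")

# Hypothetical simulated counts
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(0, 2, size=n)])
y = rng.poisson(np.exp(X @ np.array([0.2, 0.6])))
beta_hat = fit_poisson_glm(X, y)
```

At convergence the Poisson score X'(y − μ̂) is zero. This is the GLM analogue of the normal equations.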
Statistical Models For Crash Data: Poisson-gamma Model (NB)
The crash count (or any count) follows a Poisson distribution:
$$f(y_i \mid \mu_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}$$
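Conditional on μ_i, y_i is Poisson distributed, with conditional mean and variance both equal to μ_i. In the Poisson-gamma model, the Poisson mean is multiplied by a gamma-distributed random term θ_i with mean 1 and variance 1/φ. The distribution of y_i is therefore obtained by integrating over θ_i:
$$f(y_i \mid \mu_i, \phi) = \int_0^\infty \frac{e^{-\mu_i\theta_i}(\mu_i\theta_i)^{y_i}}{y_i!} \cdot \frac{\phi^{\phi}\,\theta_i^{\phi-1}e^{-\phi\theta_i}}{\Gamma(\phi)}\, d\theta_i$$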
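The sketch below checks numerically that the gamma mixture integral reproduces the standard closed-form negative binomial probability Γ(y+φ)/(Γ(y+1)Γ(φ)) · (φ/(μ+φ))^φ · (μ/(μ+φ))^y. It makes three assumptions: the gamma term uses shape = rate = φ, the values μ = 3.9 and φ = 2.46 are purely illustrative, and φ > 1 so the integrand vanishes at θ = 0.

```python
import math

def poisson_pmf(y, mean):
    # P(Y = y) for a Poisson count with the given mean
    return math.exp(-mean) * mean ** y / math.factorial(y)

def gamma_pdf(theta, phi):
    # Gamma density with shape = rate = phi, so E(theta) = 1 and Var(theta) = 1/phi
    return phi ** phi * theta ** (phi - 1) * math.exp(-phi * theta) / math.gamma(phi)

def nb_pmf(y, mu, phi):
    # Closed-form Poisson-gamma (negative binomial) probability, computed on the log scale
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi / (mu + phi)) + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)

def mixture_pmf(y, mu, phi, upper=20.0, steps=20000):
    # Composite Simpson's rule over theta in [0, upper]; the theta = 0 term is dropped
    # because the integrand is 0 there when phi > 1, and the tail beyond upper is negligible
    h = upper / steps
    total = 0.0
    for k in range(1, steps + 1):
        theta = k * h
        weight = 1 if k == steps else (4 if k % 2 else 2)
        total += weight * poisson_pmf(y, mu * theta) * gamma_pdf(theta, phi)
    return total * h / 3

mu, phi = 3.9, 2.46   # illustrative values only
```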
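Statistical Models For Crash Data: Poisson-gamma Model (NB)
The probability mass function of the Poisson-gamma regression for y_i is
$$f(y_i; \phi, \mu_i) = \frac{\Gamma(y_i + \phi)}{\Gamma(y_i + 1)\,\Gamma(\phi)} \left(\frac{\phi}{\mu_i + \phi}\right)^{\phi} \left(\frac{\mu_i}{\mu_i + \phi}\right)^{y_i}$$
The mean and variance are given by
$$E(y_i) = \mu_i, \qquad \operatorname{Var}(y_i) = \mu_i + \frac{\mu_i^2}{\phi}$$
The mean function is given by
$$E(y_i) = \mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$$
and the variance can equivalently be written as
$$\operatorname{Var}(y_i) = \mu_i + \alpha\mu_i^2, \qquad \alpha = 1/\phi$$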
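As a quick sanity check of the mean and variance formulas, the sketch below sums the negative binomial PMF over y. It confirms that E(y) = μ and Var(y) = μ + μ²/φ = μ + αμ². The values μ = 3.9 and φ = 2.46 are illustrative assumptions, and the sum is truncated at a y_max where the remaining tail is negligible.

```python
import math

def nb_pmf(y, mu, phi):
    # Poisson-gamma (negative binomial) probability with mean mu and parameter phi
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi / (mu + phi)) + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)

def nb_moments(mu, phi, y_max=1000):
    # Mean and variance by direct summation over y = 0..y_max (tail beyond is negligible here)
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, phi = 3.9, 2.46      # illustrative values
alpha = 1.0 / phi        # dispersion in the alternative parameterization
mean, var = nb_moments(mu, phi)
```

The variance exceeds the mean, which is the overdispersion the Poisson model cannot capture.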
Statistical Models For Crash Data: Poisson-gamma Model
Example – Crash Data at 3-legged signalized intersections:
Functional form needed to model crash data:
$$\mu = \beta_0 F_{maj}^{\beta_1} F_{min}^{\beta_2}$$
where
μ = expected number of crashes
F_maj = major traffic flow
F_min = minor traffic flow
We need to take the natural log of the flow variables, which gives the log-linear form estimated with a log link:
$$\mu = e^{\ln(\beta_0) + \beta_1 \ln(F_{maj}) + \beta_2 \ln(F_{min})}$$
Statistical Models For Crash Data: Poisson-gamma Model
The GENMOD Procedure
Model Information
Data Set: WORK.C; Distribution: Negative Binomial; Link Function: Log; Dependent Variable: Total
Number of Observations Read: 255; Number of Observations Used: 255
Criteria For Assessing Goodness Of Fit (Criterion: DF, Value, Value/DF)
Deviance: 252, 288.8580, 1.1463
Scaled Deviance: 252, 288.8580, 1.1463
Pearson Chi-Square: 252, 312.6975, 1.2409
Scaled Pearson X2: 252, 312.6975, 1.2409
Log Likelihood: 836.0686
Full Log Likelihood: -606.7989
AIC (smaller is better): 1221.5978
AICC (smaller is better): 1221.7578
BIC (smaller is better): 1235.7628
Algorithm converged.
Analysis Of Maximum Likelihood Parameter Estimates (Parameter: DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq)
Intercept: 1, -10.0648, 1.3659, [-12.7420, -7.3876], 54.29, <.0001
logf_maj: 1, 0.7517, 0.1320, [0.4929, 1.0105], 32.41, <.0001
logf_min: 1, 0.4837, 0.0562, [0.3735, 0.5939], 74.01, <.0001
Dispersion: 1, 0.3153, 0.0519, [0.2135, 0.4170]
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
$$\hat{\mu} = e^{-10.113} F_{maj}^{0.740} F_{min}^{0.505} = 4.05\times10^{-5}\, F_{maj}^{0.740} F_{min}^{0.505}$$
$$\operatorname{Var}(y) = \hat{\mu} + 0.313\,\hat{\mu}^2$$
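The sketch below evaluates the fitted equation above for one site. It uses the printed coefficients −10.113, 0.740 and 0.505, and the μ² coefficient 0.313. The approach flows are hypothetical and are given in the same units as the original data. The check confirms that the log-linear and multiplicative forms agree and that e^(−10.113) ≈ 4.05 × 10⁻⁵.

```python
import math

# Coefficients as printed in the fitted equation above
b0, b_maj, b_min = -10.113, 0.740, 0.505
k_disp = 0.313            # coefficient on mu^2 in Var(y) = mu + 0.313 mu^2

def expected_crashes(f_maj, f_min):
    # Log-linear form: mu = exp(b0 + b_maj ln F_maj + b_min ln F_min)
    return math.exp(b0 + b_maj * math.log(f_maj) + b_min * math.log(f_min))

def crash_variance(mu):
    # Poisson-gamma variance: mu + k * mu^2
    return mu + k_disp * mu ** 2

# Hypothetical flows, chosen only for illustration
mu = expected_crashes(20_000, 5_000)
var = crash_variance(mu)
```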
Statistical Models For Crash Data: Statistical Fit (Goodness of Fit)
There are various methods for estimating the statistical fit of models. The methods can be divided into two categories:
Likelihood Statistics
• Log-Likelihood
• Deviance
• Pearson Chi-Square
• Akaike’s Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
Model Errors
• Mean Absolute Deviation (MAD)
• Mean Squared Prediction Error (MSPE)
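Statistical Models For Crash Data: Log-likelihood
Poisson:
$$\ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \ln(\mu_i) - \mu_i - \ln(y_i!) \right]$$
NB:
$$\ln L(\boldsymbol{\beta}, \phi) = \sum_{i=1}^{n} \left[ \ln\Gamma(y_i + \phi) - \ln\Gamma(\phi) - \ln(y_i!) + \phi \ln\left(\frac{\phi}{\phi + \mu_i}\right) + y_i \ln\left(\frac{\mu_i}{\phi + \mu_i}\right) \right]$$
where μ_i = exp(x_i β), and φ is the NB parameter in Var(y_i) = μ_i + μ_i²/φ.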
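The two log-likelihoods can be sketched as plain functions of the counts and fitted means. The counts and means below are hypothetical. In a real fit, μ_i would come from exp(x_iβ). As φ grows, the NB log-likelihood approaches the Poisson one, and the test uses this as a consistency check.

```python
import math

def poisson_loglik(y, mu):
    # ln L = sum[ y_i ln(mu_i) - mu_i - ln(y_i!) ]
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def nb_loglik(y, mu, phi):
    # ln L = sum[ lnG(y+phi) - lnG(phi) - ln(y!) + phi ln(phi/(phi+mu)) + y ln(mu/(phi+mu)) ]
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (math.lgamma(yi + phi) - math.lgamma(phi) - math.lgamma(yi + 1)
                  - phi * math.log1p(mi / phi)        # equals phi ln(phi/(phi+mu)), stable form
                  + yi * math.log(mi / (phi + mi)))
    return total

# Hypothetical counts and fitted means, for illustration only
y = [0, 1, 3, 2, 7, 0, 4]
mu = [0.8, 1.2, 2.5, 2.0, 4.1, 0.5, 3.3]
```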
Statistical Models For Crash Data: Log-likelihood
Example – Crash Data at 3-legged signalized intersections:
Poisson: ln L = −685.34
NB: ln L = −606.80
Statistical Models For Crash Data: Statistical Fit (Goodness of Fit)
The deviance statistic is defined as twice the difference between the maximum log-likelihood achievable (the saturated model, where μ = y) and the log-likelihood of the fitted model:
$$D(\mathbf{y} \mid \hat{\boldsymbol{\mu}}) = 2\{\ell(\mathbf{y}) - \ell(\hat{\boldsymbol{\mu}})\}$$
When competing models are compared, the model with the lowest deviance offers the best statistical fit. A note of caution: this is only valid when the dispersion parameter Φ is the same for each competing model.
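Statistical Models For Crash Data: Statistical Fit (Goodness of Fit)
The deviance statistic for the Poisson model is the following:
$$D_P = 2\sum_{i=1}^{n}\left[ y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i) \right]$$
The deviance statistic for the Poisson-gamma model is the following:
$$D_{NB} = 2\sum_{i=1}^{n}\left[ y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i + \phi)\ln\left(\frac{y_i + \phi}{\hat{\mu}_i + \phi}\right) \right]$$
(with y_i ln(y_i/μ̂_i) taken as 0 when y_i = 0)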
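The two deviance formulas translate directly into code. The counts and fitted means below are hypothetical. The zero-count convention is handled explicitly, and math.log1p is used as an assumed, numerically stable way to compute ln((y+φ)/(μ̂+φ)). With the same μ̂, the NB deviance never exceeds the Poisson deviance, and it converges to it as φ → ∞.

```python
import math

def _y_log_y_over_mu(yi, mi):
    # y ln(y/mu), with the convention 0 * ln(0) = 0 for zero counts
    return yi * math.log(yi / mi) if yi > 0 else 0.0

def poisson_deviance(y, mu):
    # D_P = 2 sum[ y ln(y/mu) - (y - mu) ]
    return 2 * sum(_y_log_y_over_mu(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))

def nb_deviance(y, mu, phi):
    # D_NB = 2 sum[ y ln(y/mu) - (y + phi) ln((y + phi)/(mu + phi)) ]
    return 2 * sum(_y_log_y_over_mu(yi, mi)
                   - (yi + phi) * math.log1p((yi - mi) / (mi + phi))   # ln((y+phi)/(mu+phi))
                   for yi, mi in zip(y, mu))

y = [0, 1, 3, 2, 7, 0, 4]                 # hypothetical counts
mu = [0.8, 1.2, 2.5, 2.0, 4.1, 0.5, 3.3]  # hypothetical fitted means
```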
Statistical Models For Crash Data: Statistical Fit (Goodness of Fit)
For the 3-legged signalized intersection example:
The deviance statistic for the Poisson model is D_P = 644.4.
The deviance statistic for the Poisson-gamma model is D_NB = 288.9.
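Statistical Models For Crash Data: Statistical Fit (Goodness of Fit)
AIC:
$$AIC = -2\ln(L) + 2P$$
BIC:
$$BIC = -2\ln(L) + P\ln(n)$$
P = number of estimated parameters (regression coefficients, plus 1 for the dispersion parameter when one is estimated)
n = number of observations
AIC and BIC penalize the fit when additional variables are added to the model; the model with the smaller value is preferred.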
Statistical Models For Crash Data: AIC and BIC
Poisson:
$$AIC_P = 2(685.3) + 2(3) = 1{,}376.7$$
$$BIC_P = 2(685.3) + 3\ln(255) = 1{,}387.2$$
NB:
$$AIC_{NB} = 2(606.8) + 2(4) = 1{,}221.6$$
$$BIC_{NB} = 2(606.8) + 4\ln(255) = 1{,}235.8$$
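The sketch below reproduces these numbers from the reported log-likelihoods. It uses −685.34 for the Poisson model and the GENMOD "Full Log Likelihood" of −606.7989 for the NB model, with P = 3 for the Poisson model and P = 4 for the NB model (3 coefficients plus the dispersion parameter). The NB values match the SAS output (1221.5978 and 1235.7628). The Poisson BIC comes out near 1,387.3 rather than 1,387.2 because the slide rounds ln L to −685.3 first.

```python
import math

def aic(log_lik, p):
    # AIC = -2 ln(L) + 2P
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, n):
    # BIC = -2 ln(L) + P ln(n)
    return -2.0 * log_lik + p * math.log(n)

n = 255
aic_p, bic_p = aic(-685.34, 3), bic(-685.34, 3, n)        # Poisson: 3 coefficients
aic_nb, bic_nb = aic(-606.7989, 4), bic(-606.7989, 4, n)  # NB: 3 coefficients + dispersion
```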
Statistical Models For Crash Data: Statistical Fit (Model Errors)
Mean Absolute Deviation (MAD)
This criterion was proposed by Oh et al. (2003) to evaluate the fit of models. The Mean Absolute Deviation (MAD) is the average absolute difference between the estimated and observed values:
$$MAD = \frac{1}{n}\sum_{i=1}^{n} \left| \hat{\mu}_i - y_i \right|$$
Mean Squared Prediction Error (MSPE)
The Mean Squared Prediction Error (MSPE) is a traditional indicator of error; it is the average squared difference between the estimated and observed values:
$$MSPE = \frac{1}{n}\sum_{i=1}^{n} \left( \hat{\mu}_i - y_i \right)^2$$
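Both error measures are simple averages. The sketch below uses three hypothetical observations. In practice they are often computed on data not used to estimate the model, although the slides do not specify this.

```python
def mad(y, mu_hat):
    # Mean absolute deviation: average |mu_hat_i - y_i|
    return sum(abs(m - yi) for yi, m in zip(y, mu_hat)) / len(y)

def mspe(y, mu_hat):
    # Mean squared prediction error: average (mu_hat_i - y_i)^2
    return sum((m - yi) ** 2 for yi, m in zip(y, mu_hat)) / len(y)

y_obs = [0, 3, 5]            # hypothetical observed counts
mu_hat = [1.0, 1.0, 4.0]     # hypothetical model estimates
```

Here the absolute errors are 1, 2 and 1, so MAD = 4/3. The squared errors are 1, 4 and 1, so MSPE = 2. Because MSPE squares the errors, it weights the single large miss more heavily.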
Statistical Models For Crash Data: Recent Models for Over-dispersion
◦ Poisson-lognormal: the Poisson mean follows a lognormal distribution
◦ Poisson-Weibull: the Poisson mean follows a Weibull distribution
◦ Random-parameters (investigation of the variance)
◦ Negative binomial-Lindley (highly dispersed data): overcomes problems with zero-inflated models
◦ Generalized Sichel (highly dispersed data)
◦ Generalized Waring (highly dispersed data; investigation of the variance)
◦ Finite mixture (Poisson and Poisson-gamma; investigation of the variance and structure of the data)
◦ Bayesian Model Averaging (automatically compares different models)
◦ See AA&P (Accident Analysis & Prevention) and Safety Science for information on some of these models.
Statistical Models For Crash Data: Recent Models for Under-dispersion
◦ Not very common; usually occurs with a low sample mean and is often based on the model output (conditional on the mean).
◦ All the models below can also be used for over-dispersion:
◦ Gamma (time-dependent): observations are not independent
◦ Conway-Maxwell-Poisson: has become increasingly popular
◦ Double-Poisson: work published
◦ Hyper-Poisson: work published
Statistical Models For Crash Data
Crash data often have the characteristic that the mean μ can be very low (below 1.0).
This creates problems with goodness-of-fit and prediction.
Read the papers by:
◦ Wood, G.R. (2004) Generalised Linear Models and Goodness of Fit Testing. Accident Analysis & Prevention, Vol. 34, pp. 417-427.
◦ Lord, D. (2006) Modeling Motor Vehicle Crashes using Poisson-gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, pp. 751-766.
Statistical Models For Crash Data: Low Mean Issue
Statistical Models For Crash Data: Time Trend Effects
[Figure: mean (crashes per year), 0 to 2.5, plotted against year, 0 to 7]
Statistical Models For Crash Data: Time Trend Effects
Goal: capture changes that vary from year to year directly in the model. The model structure is given by the following:
$$\mu_{it} = \exp\left(\beta_{0,t} + \sum_{j=1}^{p} \beta_j x_{ijt}\right)$$
where μ_it is the expected number of crashes at site i in year t. The time trend is captured with the intercept (i.e., one intercept β_{0,t} for each year).
Characteristic: each year is defined as a different observation.
Issues: Since each site is observed at different points in time, temporal serial correlation exists and affects the statistical inferences drawn from the models. Therefore, this correlation needs to be accounted for in the model.
Modeling approaches: Generalized Estimating Equations (GEE), random-effects models, etc.
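The sketch below shows one way to set up the mean structure with one intercept per year. Each site-year row gets a one-hot year indicator in place of a single global intercept, followed by the covariates. The helper name and the small two-site panel are hypothetical. This only builds the design matrix. It does not handle the within-site serial correlation, which still requires an approach such as GEE or random effects.

```python
import numpy as np

def year_intercept_design(years, covariates):
    """Design matrix with one intercept column per year (beta_{0,t}) followed by covariates.

    years: length-n sequence of year labels, one per site-year observation
    covariates: n x p array of explanatory variables (e.g., log flows)
    """
    years = np.asarray(years)
    levels = np.unique(years)                                      # sorted distinct years
    dummies = (years[:, None] == levels[None, :]).astype(float)    # one-hot year indicators
    X = np.column_stack([dummies, np.asarray(covariates, dtype=float)])
    return X, levels

# Hypothetical panel: 2 sites, each observed in years 1-3
years = [1, 2, 3, 1, 2, 3]
log_flows = [[9.8], [9.9], [10.0], [8.7], [8.8], [8.8]]
X, levels = year_intercept_design(years, log_flows)
```

No separate global intercept column is added, because the year indicators already sum to one in every row and would otherwise be collinear with it.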
Bayes Methods
The Bayes method approaches the analysis of data differently from the classical (frequentist) method:
• Subjective judgment is more easily incorporated with the observed data and models
• Unknown coefficients of regression models are treated as random variables
• Data analysis is less limited by the number of observations (it can be supplemented with subjective judgment)
• Computationally intensive (no longer an issue)
Bayes Methods
The Bayes method makes inferences from data using probability models for quantities that are observed and for quantities one wishes to learn about.
Bayesian data analysis can be divided into three steps:
◦ Setting up a full probability model: provide a joint probability distribution for all observable and unobservable quantities
◦ Conditioning on observed data: calculating and interpreting the appropriate posterior distribution (the conditional probability distribution of the unobserved quantities given the observed data)
◦ Evaluating the fit of the model and the implications of the posterior distribution
Emphasis is placed on interval estimation (credible intervals, the Bayesian counterpart of confidence intervals) rather than hypothesis testing.
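The three steps can be illustrated with the simplest conjugate case. The slides do not prescribe a prior. This sketch assumes a gamma prior on a site's Poisson mean, the same gamma assumption used for the EB method. The prior values (mean 3.9, shape 2.46) and the single observed year with 10 crashes are illustrative. The Gamma(a, b) prior uses the rate parameterization, so the posterior is Gamma(a + Σy, b + n). The interval is summarized from Monte Carlo draws.

```python
import numpy as np

def gamma_poisson_posterior(a, b, counts):
    """Posterior Gamma(shape, rate) for a Poisson mean with a Gamma(a, b) prior (rate form)."""
    counts = np.asarray(counts)
    return a + counts.sum(), b + counts.size

# Step 1: full probability model (illustrative prior with mean a/b = 3.9 and shape a = 2.46)
a, b = 2.46, 2.46 / 3.9
# Step 2: condition on the observed data (one year with 10 crashes)
shape, rate = gamma_poisson_posterior(a, b, [10])
post_mean = shape / rate
# Step 3: summarize the posterior with a 95% credible interval from Monte Carlo draws
rng = np.random.default_rng(3)
draws = rng.gamma(shape, 1.0 / rate, size=200_000)   # numpy's gamma takes scale = 1/rate
lower, upper = np.percentile(draws, [2.5, 97.5])
```

With a prior mean of μ̂ and shape φ, the posterior mean equals μ̂(φ + y)/(φ + μ̂). This is the weighted combination of μ̂ and y used by the empirical Bayes estimate.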
In the EB method, a weight is assigned to the prior distribution (the regression-model estimate) and the complementary weight to the standard estimate (the observed crash count).
In safety analyses, the weights are estimated with the assumption that the mean (μ) for each site follows a gamma distribution.
EB estimates have been found to outperform other estimates, such as the MLE.
The EB framework is presented on the next overhead.
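Empirical Bayes Model
Formulation:
$$\hat{\hat{\mu}} = w\,\hat{\mu} + (1 - w)\,y$$
where
$$w = \frac{1}{1 + \hat{\mu}/\phi}$$
φ = dispersion parameter of the NB regression
μ̂ = mean of a Poisson-gamma regression (model estimate)
y = observed number of crashes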
Empirical Bayes Model
Using the same example shown earlier: F1 = 24,164; F2 = 2,560; y = 10
The values are estimated as follows (crashes per year):
$$\hat{\mu} = 5.5\times10^{-5} \times 24{,}164^{0.816} \times 2{,}560^{0.373} = 3.9$$
$$w = \frac{1}{1 + \frac{3.90}{2.46}} = 0.39$$
$$\hat{\hat{\mu}} = 0.39 \times 3.9 + (1 - 0.39) \times 10 = 7.63$$
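The EB calculation can be sketched in a few lines. It uses the flows in the μ̂ equation above (24,164 and 2,560), φ = 2.46 and y = 10. Carried out without intermediate rounding, μ̂ is about 3.9, w is about 0.39, and the EB estimate is about 7.64. The slide reports 7.63.

```python
def eb_weight(mu_hat, phi):
    # w = 1 / (1 + mu_hat / phi)
    return 1.0 / (1.0 + mu_hat / phi)

def eb_estimate(mu_hat, phi, y):
    # EB estimate: w * mu_hat + (1 - w) * y
    w = eb_weight(mu_hat, phi)
    return w * mu_hat + (1.0 - w) * y

# Model estimate from the example's fitted equation
mu_hat = 5.5e-5 * 24_164 ** 0.816 * 2_560 ** 0.373   # about 3.9
w = eb_weight(3.9, 2.46)            # about 0.39
eb = eb_estimate(3.9, 2.46, 10)     # about 7.64 (the slide reports 7.63)
```

The EB estimate lands between the model estimate and the observed count. It is pulled toward the count because w < 0.5 here, meaning μ̂ is large relative to φ.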
[Figure: Empirical Bayes Model, crashes per year plotted against year (1, 2, …, t); MLE estimate = 3.9, EB estimate = 7.63, observed value = 10]