Datasets and Analysis for Analysis of
Crash-Related Injury
Center for the Management of Information for Safe and
Sustainable Transportation (CMISST)
Jonathan D. Rupp, Ph.D.
Overview of Talk
1. Crash data for injury analysis
2. Available Datasets
a. Contents
b. What they are good for
c. Analysis tips and pitfalls
d. Sample analyses
Crash Data for Injury Analysis
Crash data come in many flavors, but all
crash datasets are case datasets
A case dataset is one where all of the
observations have a particular outcome
Crash Data for Injury Analysis
Examples of Case Sampling Basis:
• In a crash
• In a tow-away crash
• Injured in a crash
• Died in a crash
• Etc.
Crash Data for Injury Analysis
What does this mean for analysis?
The first question you should ask before
analyzing any crash database is:
How was this database sampled?
Case Database Example
Analysis of O-ring failures for launches
prior to Challenger looked like this:
[Figure: number of incidents vs. ambient temperature (F), 50 to 90 F; series: Failures]
Case Database Example
When successes are added, it looks like
this:
[Figure: number of incidents vs. ambient temperature (F), 50 to 90 F; series: Failures, Successes]
Case Database Example
And when you look at percent failure, it
looks like this:
[Figure: percent of O-ring failure (0% to 100%) vs. ambient temperature (F), 50 to 85 F]
Case Database Example
Analysis of O-ring failures for launches
prior to Challenger looked like this:
[Figure: number of incidents vs. ambient temperature (F), 50 to 90 F; series: Failures]
Why Risk Analysis Can’t Be Performed
Using Only Case Data (Injury Based)
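$$\text{risk} = \frac{\#\,\text{injured}}{\#\,\text{exposed}} = \frac{\#\,\text{injured}}{\#\,\text{injured} + \#\,\text{uninjured}}$$
The # uninjured term is not in an injury-selected case dataset.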
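A small worked sketch of the formula above, using made-up counts (every number here is hypothetical and not taken from any crash dataset). It illustrates that the uninjured occupants are needed in the denominator, and that a file selected on injury cannot supply them.

```python
# Hypothetical counts for illustration only (not from any real dataset).
injured = 200
uninjured = 1800  # known only if the sample is NOT selected on injury


def injury_risk(n_injured, n_uninjured):
    """Risk = injured / exposed, where exposed = injured + uninjured."""
    exposed = n_injured + n_uninjured
    return n_injured / exposed


# With a sample of all exposed occupants, risk can be computed.
full_sample_risk = injury_risk(injured, uninjured)
print(f"Risk from full sample: {full_sample_risk:.1%}")

# In an injury-selected case dataset every record is injured, so the same
# formula returns 100% no matter what the true risk is.
case_only_risk = injury_risk(injured, 0)
print(f"'Risk' from injury-only cases: {case_only_risk:.1%}")
```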
Why Risk Analysis Can’t Be Performed Using
Only Case Data (Injury Based)
[Figure: occupants with thorax injury, lower-extremity injury, and head injury]
Why Risk Analysis Can’t Be Performed Using Only
Case Data
[Figure: thorax, lower-extremity, and head injury groups shown within all occupants in towaway crashes]
Case Databases
What’s the message?
• Case databases are really useful, but require
caution
• Sometimes special methods can help…
• Know your sample!
Crash Datasets or, What’s Out There?
1. National Datasets
a. Fatality Analysis Reporting System (FARS)
b. National Automotive Sampling System
(NASS)
• General Estimates System (GES)
• Crashworthiness Data System (CDS)
2. State Crash Datasets
3. Hospital and Trauma Datasets
4. Data Linkage
Data Use and Analysis
FARS
• Census of all fatal crashes on public roads
in the U.S.
• Good for counting fatalities in a variety of
conditions
• Not good for comparing risks because non-
fatal crashes are not included
Data Use and Analysis
FARS
What’s a fatality?
For FARS, a fatality is a death that occurs
because of crash-related injuries within 30
days of the crash
Data Use and Analysis
FARS Example
Motor-Vehicle Occupants Killed in 2011 (FARS)
• Restrained: 44%
• Unrestrained: 48%
• Unknown: 8%
Data Use and Analysis
FARS Example
Compare belt use among fatally injured occupants to use rates on
the road (use rates come from the National Occupant Protection
Use Survey (NOPUS))
National belt use rate for 2011: 84%
Belt use rate among fatalities in 2011: 44%
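The comparison above can be turned into a rough ratio. The sketch below uses only the percentages quoted in the slides; the final "crude relative ratio" is an illustrative addition, and it assumes that observed on-road belt use is a fair stand-in for the belt use of people involved in crashes, which may not hold (unbelted occupants may also differ in speed, alcohol use, and other risk factors).

```python
# Figures quoted in the slides: 2011 FARS shares and the 2011 NOPUS use rate.
belted_share_of_fatalities = 0.44
unbelted_share_of_fatalities = 0.48  # the remaining 8% are unknown
belted_share_on_road = 0.84
unbelted_share_on_road = 1.0 - belted_share_on_road

# Representation ratio: share among fatalities / share on the road.
belted_ratio = belted_share_of_fatalities / belted_share_on_road
unbelted_ratio = unbelted_share_of_fatalities / unbelted_share_on_road

# Crude relative ratio, unbelted vs. belted.
# ASSUMPTION: road-use rates approximate exposure; no other factors adjusted.
crude_relative_ratio = unbelted_ratio / belted_ratio

print(f"Belted occupants:   {belted_ratio:.2f}x their share on the road")
print(f"Unbelted occupants: {unbelted_ratio:.2f}x their share on the road")
print(f"Crude unbelted-vs-belted ratio: {crude_relative_ratio:.1f}")
```

This is an exposure-based comparison, not a risk estimate from FARS alone, since FARS contains no non-fatal crashes.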
Data Use and Analysis
FARS
• Available from NHTSA ftp site:
ftp://ftp.nhtsa.dot.gov/fars
• Tips and pitfalls
- Occupants who did not die but were in a
crash with others who did die are
included
- Your analysis may require filtering on
Occupant Injury Severity and choosing "K-fatal
injury"; this selects only fatalities
(eliminating other occupants in the crashes)
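A minimal sketch of the filtering step described above, written in plain Python on hypothetical person-level records. The field name INJ_SEV and the code 4 for a fatal (K) injury are assumptions here; confirm both against the FARS coding manual for the data year in use.

```python
# Hypothetical person-level records (illustration only).
# ASSUMPTION: INJ_SEV holds injury severity and 4 means "fatal injury (K)".
FATAL_CODE = 4

persons = [
    {"ST_CASE": 10001, "VEH_NO": 1, "PER_NO": 1, "INJ_SEV": 4},
    {"ST_CASE": 10001, "VEH_NO": 1, "PER_NO": 2, "INJ_SEV": 2},
    {"ST_CASE": 10001, "VEH_NO": 2, "PER_NO": 1, "INJ_SEV": 0},
    {"ST_CASE": 10002, "VEH_NO": 1, "PER_NO": 1, "INJ_SEV": 4},
]

# Keep only the people who died; survivors of fatal crashes are dropped.
fatalities = [p for p in persons if p["INJ_SEV"] == FATAL_CODE]
survivors = [p for p in persons if p["INJ_SEV"] != FATAL_CODE]

print(f"{len(persons)} people in fatal crashes, {len(fatalities)} fatalities")
```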
Data Use and Analysis
NASS-GES
• Nationally representative sample of
~50,000 police-reported crashes in the U.S.
per year
• Police-reported information is coded to a
common standard and entered into the
GES database
• More serious crashes are overrepresented
in the sample, so cases are weighted to
reflect the appropriate representation in the
U.S.
Data Use and Analysis
Injury in GES
Overall injury severity is assessed by police on
the KABCO scale:
K=Killed
A=Suspected Serious Injury
B=Suspected Minor Injury
C=Possible Injury
O=Property Damage Only
Data Use and Analysis
NASS-GES
• Example question: How are alcohol-related
crashes distributed over time and day of the
week?
• How is that different from all crashes?
Data Use and Analysis
NASS-GES
• Example question: Has the ESC mandate
helped?
In 2009, NHTSA mandated Electronic Stability
Control (ESC) on all new vehicles by MY 2011
Has it helped?
Data Use and Analysis
NASS-GES Example
[Figure: percent of occupants by crash type (Side, Rollover, Frontal) for MY 99-07 vs. MY 08+; all occupants; pickups, vans, and SUVs]
Data Use and Analysis
NASS-GES Example
[Figure: percent of occupants by crash type (Side, Rollover, Frontal) for MY 99-07 vs. MY 08+; all occupants and occupants with KA injuries; pickups, vans, and SUVs]
Data Use and Analysis
NASS-GES
• Tips and pitfalls
1. GES datasets can be downloaded from
NHTSA FTP site:
ftp://ftp.nhtsa.dot.gov/GES/
2. GES changed codes in 2009 and 2010 to
be more compatible with FARS
– Watch out for changes in variable
values across years!
Data Use and Analysis
NASS-GES
• Tips and pitfalls
3. GES includes a number of tables that
may need to be merged for analysis
4. Analysis (and table) levels:
– Crash
– Vehicle
– Occupant
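Merging across levels usually means a one-to-many join on the case and vehicle identifiers. The sketch below shows the idea in plain Python on hypothetical records; the key names CASENUM, VEH_NO, and PER_NO and the other fields are assumptions for illustration and should be checked against the GES documentation.

```python
# Hypothetical vehicle-level and occupant-level records (illustration only).
vehicles = [
    {"CASENUM": 1, "VEH_NO": 1, "BODY_TYPE": "SUV"},
    {"CASENUM": 1, "VEH_NO": 2, "BODY_TYPE": "Car"},
    {"CASENUM": 2, "VEH_NO": 1, "BODY_TYPE": "Pickup"},
]
occupants = [
    {"CASENUM": 1, "VEH_NO": 1, "PER_NO": 1, "INJ": "O"},
    {"CASENUM": 1, "VEH_NO": 2, "PER_NO": 1, "INJ": "B"},
    {"CASENUM": 1, "VEH_NO": 2, "PER_NO": 2, "INJ": "C"},
    {"CASENUM": 2, "VEH_NO": 1, "PER_NO": 1, "INJ": "A"},
]

# Index vehicles by their (crash, vehicle) key.
vehicle_by_key = {(v["CASENUM"], v["VEH_NO"]): v for v in vehicles}

# Attach vehicle attributes to each occupant row, producing an
# occupant-level analysis file.
merged = []
for occ in occupants:
    veh = vehicle_by_key[(occ["CASENUM"], occ["VEH_NO"])]
    merged.append({**veh, **occ})

for row in merged:
    print(row)
```

The result has one row per occupant, so vehicle attributes repeat for vehicles with more than one occupant; counts of vehicles should be taken from the vehicle table, not from this merged file.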
Data Use and Analysis
NASS-GES
• Tips and pitfalls
5. Weights must be used to get correct
estimates from GES
– The weight variable is called "WEIGHT"
– This can be done using the weight
statement in any SAS procedure
6. Survey methods must be used to
estimate variance or test significance
(e.g., PROC SURVEYFREQ)
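To illustrate why weights matter, the sketch below compares an unweighted and a weighted share on four hypothetical records. The records and weight values are invented; only the variable name WEIGHT comes from the slides. It covers point estimates only; variances and significance tests still need survey methods that account for the sample design.

```python
# Hypothetical occupant records; serious cases are oversampled, so they
# carry small weights (illustration only).
records = [
    {"INJ": "K", "WEIGHT": 20.0},
    {"INJ": "A", "WEIGHT": 50.0},
    {"INJ": "O", "WEIGHT": 400.0},
    {"INJ": "O", "WEIGHT": 530.0},
]


def share(rows, is_target, weighted=True):
    """Share of rows meeting is_target, optionally weighted by WEIGHT."""
    def w(row):
        return row["WEIGHT"] if weighted else 1.0
    numerator = sum(w(r) for r in rows if is_target(r))
    denominator = sum(w(r) for r in rows)
    return numerator / denominator


def is_ka(row):
    return row["INJ"] in ("K", "A")


unweighted_share = share(records, is_ka, weighted=False)
weighted_share = share(records, is_ka, weighted=True)

print(f"Unweighted KA share: {unweighted_share:.1%}")
print(f"Weighted KA share:   {weighted_share:.1%}")
```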
Data Use and Analysis
NASS-CDS
• Nationally representative sample of ~5,000
towaway crashes involving a light vehicle in
the U.S. per year
• In-depth crash investigations done for each
case
• Injuries are coded from medical records,
and delta-V (crash severity) is estimated
from vehicle crush
Data Use and Analysis
Injury in CDS
CDS includes medical outcome data for injured
occupants. All injuries are coded using the
Abbreviated Injury Scale (AIS) and associated
metrics can be calculated (Maximum AIS,
Injury Severity Score, etc.)
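As a sketch of the associated metrics, the functions below compute Maximum AIS (MAIS) and the Injury Severity Score (ISS) from a list of coded injuries. ISS is taken here in its standard form: the sum of squares of the highest AIS severity in each of the three most severely injured ISS body regions, set to 75 if any injury is AIS 6. The input format (pairs of ISS body region and AIS severity) and the example injuries are hypothetical; mapping CDS injury codes to ISS regions is left to the analyst.

```python
def mais(injuries):
    """Maximum AIS severity across all coded injuries (0 if none)."""
    return max((severity for _, severity in injuries), default=0)


def iss(injuries):
    """Injury Severity Score from (iss_body_region, ais_severity) pairs."""
    # Any unsurvivable (AIS 6) injury sets ISS to its maximum of 75.
    if any(severity == 6 for _, severity in injuries):
        return 75
    # Highest severity within each body region.
    worst_by_region = {}
    for region, severity in injuries:
        worst_by_region[region] = max(severity, worst_by_region.get(region, 0))
    # Sum of squares over the three most severely injured regions.
    top_three = sorted(worst_by_region.values(), reverse=True)[:3]
    return sum(s * s for s in top_three)


# Hypothetical occupant with five coded injuries.
example = [
    ("head_neck", 3),
    ("chest", 4),
    ("chest", 2),
    ("extremities", 2),
    ("external", 1),
]

example_mais = mais(example)
example_iss = iss(example)
print(f"MAIS = {example_mais}, ISS = {example_iss}")
```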
Data Use and Analysis
NASS-CDS
• Example questions:
1. How is injury outcome related to delta-V
(crash severity) for different vehicle
types?
2. How effective are seat-belts at preventing
head injury?
3. How effective are side airbags in near-
side impacts?
Crash Data Analysis
NASS-CDS Example: Advanced Automatic Collision
Notification (AACN) and Injury Severity Prediction
ACN & Triage Model
Model inputs:
• Crash severity
• Crash direction
• Belt status
• 1 or 2 events
• Vehicle type
• Age
• Gender
Model output: predicted risk
If predicted risk is high, advisor will alert EMS
EMS may choose to increase triage priority as a result
ACN & Triage Model
Where does the injury risk model come from?
CDS Analysis
CDS has:
• Detailed injury outcome, based on medical records
• Crash severity (delta-V), based on accident
investigation
Use delta-V (and other variables) to predict injury
outcome
Injury Risk
[Figure: risk of injury (0% to 100%) vs. delta-V (0 to 120 kph); curves: Head, Thorax, Lower extremity, AIS3+]
Develop injury risk curves like this
Injury Risk
[Figure: risk of injury (0% to 100%) vs. delta-V (0 to 120 kph); curves: Head, Thorax, Lower extremity, AIS3+]
Choose a cutoff: If predicted risk>cutoff, alert EMS
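The cutoff rule can be sketched with a simple risk curve. The slides do not say which model form was used; a logistic curve in delta-V is assumed here, and the coefficients and cutoff are hypothetical values chosen for illustration, not estimates fitted to CDS.

```python
import math

# HYPOTHETICAL values for illustration only; not fitted to any data.
INTERCEPT = -5.0
SLOPE_PER_KPH = 0.08
CUTOFF = 0.20


def predicted_risk(delta_v_kph):
    """Assumed logistic risk curve: risk rises smoothly with delta-V."""
    z = INTERCEPT + SLOPE_PER_KPH * delta_v_kph
    return 1.0 / (1.0 + math.exp(-z))


def alert_ems(delta_v_kph, cutoff=CUTOFF):
    """Alert when predicted risk exceeds the chosen cutoff."""
    return predicted_risk(delta_v_kph) > cutoff


for dv in (20, 40, 60, 80):
    print(f"delta-V {dv:3d} kph: risk {predicted_risk(dv):.1%}, "
          f"alert={alert_ems(dv)}")
```

A real model would include the other predictors listed for the triage model (crash direction, belt status, number of events, vehicle type, age, gender), and the cutoff would be chosen by weighing missed serious injuries against unnecessary alerts.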
Data Use and Analysis
NASS-CDS
• Tips and pitfalls
1. CDS datasets can be downloaded from
NHTSA FTP site:
ftp://ftp.nhtsa.dot.gov/CDS/
2. CDS changed some codes in 2009 and
2010
Data Use and Analysis
NASS-CDS
• Tips and pitfalls
3. CDS includes a number of tables that
may need to be merged for analysis
4. Analysis (and table) levels:
– Crash
– Vehicle
– Occupant
– Injury
Data Use and Analysis
NASS-CDS
• Tips and pitfalls
5. Weights must be used to get correct
estimates from CDS
– The weight variable is called "RATWGT"
– This can be done using the weight
statement in any SAS procedure
6. Survey methods must be used to
estimate variance or test significance
(e.g., PROC SURVEYFREQ)
Data Use and Analysis
State Crash Data
• Each state crash database is a census of
police-reported crashes
• Each state has its own variables and codes;
to use states together, codes must be
mapped by the analyst
• State data use KABCO for injury severity
Data Use and Analysis
State Crash Data
What good are state crash data?
– State databases are large! Lots of crashes
mean you can analyze rare or unusual
events (e.g., crashes involving the newest
technologies)
– Different states have different laws, so
differences in crashes between states might
reflect different effects of laws
Data Use and Analysis
State Crash Data
• State crash data example:
Helmet law change in Michigan
On April 13, 2012, a modified helmet law went into
effect in Michigan allowing motorcyclists 21 and
over to choose not to wear a helmet. What was the
effect and how is fatality risk affected by helmet
use?
Fatalities and Injuries
Apr 13-Dec 31
Year(s)  Helmet Use  Fatalities  Serious Injuries  Percent Fatal  Percent Serious
                     (per year)  (per year)                       Injury
2011     Yes         97          574               3.2%           19%
2011     No          6           23                7.2%           31%
2012     Yes         56          390               2.3%           16%
2012     No          55          194               6.5%           23%
Fatalities and Injuries
Apr 13-Dec 31
• Overall fatality rate in 2011 = 3.3%
• Overall fatality rate for 2012 = 3.4%
• In 2012, the fatality rate for those who didn't
wear helmets was 2.8 times the rate for those
who did (6.5% vs. 2.3%)
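The arithmetic behind these figures can be checked from the table of fatalities and injuries. In the sketch below, the counts and percentages are the 2012 values from that table; the rider totals are back-calculated from rounded percentages, so they are approximations introduced here, not figures from the slides.

```python
# 2012 values (Apr 13-Dec 31) from the fatalities and injuries table.
fatalities = {"helmet": 56, "no_helmet": 55}
pct_fatal = {"helmet": 2.3, "no_helmet": 6.5}

# Ratio of fatality rates, unhelmeted vs. helmeted.
rate_ratio = pct_fatal["no_helmet"] / pct_fatal["helmet"]

# Approximate number of crash-involved riders in each group,
# back-calculated from the rounded percentages (an approximation).
riders = {k: fatalities[k] / (pct_fatal[k] / 100.0) for k in fatalities}

# Overall fatality rate across both groups.
overall_rate = sum(fatalities.values()) / sum(riders.values())

print(f"Rate ratio (no helmet vs. helmet): {rate_ratio:.1f}")
print(f"Overall 2012 fatality rate: {overall_rate:.1%}")
```

This ratio is unadjusted: it does not separate the effect of the helmet from other differences between riders who do and do not wear one.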
Who Wears Helmets?
Driver Drinking      Year(s)  Helmet Use Rate
Driver Not Drinking  2008-11  98%
Driver Not Drinking  2012     76%
Driver Drinking      2008-11  90%
Driver Drinking      2012     54%
Separating the Effect of Alcohol from
the Effect of the Helmet
How do we figure out what effect the helmet has,
separate from risk-taking factors like alcohol use?
Regression models allow us to predict risk of fatality or
injury while accounting for alcohol, speed, age, and other
factors
Bottom Line…
Taking risk-taking factors into account, we find:
• Alcohol more than quadruples the risk of death
and nearly triples the risk of serious injury
• After accounting for other risk factors, not wearing a
helmet doubles the risk of fatality and increases
the risk of serious injury by 60%
Data Use and Analysis
State Crash Data
• Sample Analysis: Effectiveness of ESC in
preventing rollovers
1. Identify make/model/year for which ESC
was standard and not standard
Data Use and Analysis
State Crash Data
• Sample Analysis: Effectiveness of ESC in
preventing rollovers
2. Identify relevant crash type and control
crash type:
a. Relevant: Single-vehicle rollover
b. Control: Rear end with rear damage
(front vehicle in rear-end collision)
Data Use and Analysis
State Crash Data
• Sample Analysis: Effectiveness of ESC in
preventing rollovers
3. Select states for analysis and identify
relevant variables and codes
e.g.:
Florida: Area(s) of impact—initial = 19
(overturn)
Michigan: Area(s) of impact—initial = 0
(rollover)
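Mapping state-specific codes to a common definition can be done with a small lookup table. The sketch below uses the two codes given above (Florida 19, Michigan 0); the field names and the example records are hypothetical.

```python
# Codes from the slides: Florida 19 = overturn, Michigan 0 = rollover,
# both for the initial area of impact.
ROLLOVER_CODES = {
    "FL": {19},
    "MI": {0},
}


def is_rollover(state, initial_impact_code):
    """True if the state's own code marks the crash as a rollover."""
    return initial_impact_code in ROLLOVER_CODES[state]


# Hypothetical crash records; field names are invented for illustration.
crashes = [
    {"state": "FL", "initial_impact": 19},
    {"state": "FL", "initial_impact": 0},
    {"state": "MI", "initial_impact": 0},
    {"state": "MI", "initial_impact": 19},
]

rollover_flags = [is_rollover(c["state"], c["initial_impact"]) for c in crashes]
print(rollover_flags)
```

Keeping the mapping in one table makes it clear that the same numeric code can mean different things in different states.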
Data Use and Analysis
State Crash Data
• Sample Analysis: Effectiveness of ESC in
preventing rollovers
4. Tabulate crash type vs. vehicle
equipped/not equipped (hypothetical
example below)
Vehicle Equipment  Rollover  Control Crash
ESC Equipped       300       800
No ESC             400       700
Data Use and Analysis
State Crash Data
• Sample Analysis: Effectiveness of ESC in
preventing rollovers
5. Compare rates
Vehicle Equipment  Rollover  Control Crash  Rate
ESC Equipped       300       800            27%
No ESC             400       700            36%
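The rates in the hypothetical table are the rollover share of each row: rollovers divided by rollovers plus control crashes. The sketch below reproduces them; the odds ratio at the end is a common way to summarize such a table and is an addition here, not something the slides compute.

```python
# Hypothetical counts from the example table.
counts = {
    "ESC Equipped": {"rollover": 300, "control": 800},
    "No ESC": {"rollover": 400, "control": 700},
}


def rollover_share(row):
    """Rollovers as a share of rollover plus control crashes."""
    return row["rollover"] / (row["rollover"] + row["control"])


esc_share = rollover_share(counts["ESC Equipped"])
no_esc_share = rollover_share(counts["No ESC"])

# Odds of rollover relative to the control crash type, ESC vs. no ESC.
esc_odds = counts["ESC Equipped"]["rollover"] / counts["ESC Equipped"]["control"]
no_esc_odds = counts["No ESC"]["rollover"] / counts["No ESC"]["control"]
odds_ratio = esc_odds / no_esc_odds

print(f"ESC equipped: {esc_share:.0%}, no ESC: {no_esc_share:.0%}")
print(f"Odds ratio (ESC vs. no ESC): {odds_ratio:.2f}")
```

The control crash type stands in for exposure, so the comparison is only as good as the assumption that ESC does not affect the control crash.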
Data Use and Analysis
State Crash Data
• Tips and pitfalls
1. Access to state crash data varies;
Michigan data are available at:
www.michigantrafficcrashfacts.org
Data Use and Analysis
State Crash Data
• Tips and pitfalls
2. Values within variables are not
standardized
– Every state has different contents of
variables
– User must map variables and codes to
combine and compare across states
Data Use and Analysis
State Crash Data
• Tips and pitfalls
3. State crash data are best for analyses
that require very large numbers of cases
(e.g., identifying the effect of new
equipment), but results are not
necessarily nationally representative
Data Use and Analysis
Trauma Datasets
• Trauma datasets (e.g. statewide, hospital-
specific or National Trauma Data Bank)
indicate the cause of the trauma, including
motor-vehicle crash
• All cases have presence of injury requiring
hospital admission, so tend to be serious
crashes
Data Use and Analysis
Trauma Datasets
• Do not contain many crash-related
variables (e.g., damage location, crash
configuration, seat position)
• Predictors limited to restraint use, blood-
alcohol content (of patient—BAC of driver
may or may not be included), and vehicle
type
• Best for understanding in-hospital outcome
for different treatments and injury types
Data Use and Analysis
Trauma Databases
Example analysis questions:
• What is the distribution of specific serious
injuries associated with crashes?
• Do crash victims have better or worse
outcomes (e.g., in-hospital mortality) than other
trauma patients?
• How are serious injuries and in-hospital
outcomes different for helmet wearers vs.
non-helmet wearers?
Data Use and Analysis
Trauma Databases
• Tips and pitfalls
1. Access to trauma databases is generally
more restricted than crash databases;
application must be made for NTDB and
others
2. Trauma databases are not ideal for
assessing risk of injury, since uninjured
and less injured occupants are not
included
Data Use and Analysis
Data Linkage and Integration
• Many analyses require multiple datasets
• In many cases, some exposure information is
required—e.g., travel behavior, number of licensed
drivers
• In others, different datasets have different strengths
(e.g., CDS has detail, but GES has big sample of all
police-reported crashes)
• If your dataset doesn’t have everything, look for ways
to find information in another dataset
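A minimal sketch of combining a crash dataset with an exposure dataset: crash counts from one source are divided by licensed-driver counts from another. All numbers and age groups below are hypothetical and serve only to show the calculation.

```python
# Hypothetical crash counts by driver age group (from a crash dataset).
crashes_by_group = {"16-24": 1200, "25-64": 3000}

# Hypothetical licensed-driver counts (from a separate exposure dataset).
licensed_drivers = {"16-24": 40_000, "25-64": 250_000}

# Crashes per 1,000 licensed drivers, joined on the shared group label.
rate_per_1000 = {
    group: 1000 * crashes_by_group[group] / licensed_drivers[group]
    for group in crashes_by_group
}

for group, rate in rate_per_1000.items():
    print(f"{group}: {rate:.1f} crashes per 1,000 licensed drivers")
```

The two sources must define their groups the same way for the join to be meaningful.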
Best Database by Question Type
Question Type: Database(s)
• Crash risk: State crash database, GES; must handle exposure with another dataset or other methods
• Injury risk, incidence: CDS
• Crash incidence: GES (national); state databases
• Benefits of vehicle safety content: State crash databases
• Effects of different laws on outcomes in crashes: State crash databases
• Treatment effects for crash victims (e.g., predicting intubation or ICU stays): Trauma or hospital databases (with crashes identified)