Datasets and Analysis for Analysis of Crash-Related Injury

Center for the Management of Information for Safe and Sustainable Transportation (CMISST)

Jonathan D. Rupp, Ph.D.

Overview of Talk

1. Crash data for injury analysis

2. Available Datasets

a. Contents

b. What they are good for

c. Analysis tips and pitfalls

d. Sample analyses

Crash Data for Injury Analysis

Crash data come in many flavors, but all crash datasets are case datasets.

A case dataset is one where all of the observations have a particular outcome.

Crash Data for Injury Analysis

Examples of Case Sampling Basis:

• In a crash

• In a tow-away crash

• Injured in a crash

• Died in a crash

• Etc.

Crash Data for Injury Analysis

What does this mean for analysis?

The first question you should ask before analyzing any crash database is:

How was this database sampled?

Crash Data for Injury Analysis

Why do we care?

Example: Space Shuttles and O-Rings…

Case Database Example

Analysis of O-ring failures for launches prior to Challenger looked like this:

[Figure: number of incidents (0 to 3.5) vs. ambient temperature (50 to 90 F); series: Failures]

Case Database Example

When successes are added, it looks like this:

[Figure: number of incidents (0 to 3.5) vs. ambient temperature (50 to 90 F); series: Failures, Successes]

Case Database Example

And when you look at percent failure, it looks like this:

[Figure: percent of O-ring failure (0% to 100%) vs. ambient temperature (50 to 85 F)]
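The contrast between the failures-only chart and the percent-failure chart can be sketched in a few lines of Python. The launch records below are invented for illustration (they are not the actual shuttle data), and the helper names are hypothetical. The point is only that a dataset selected on the outcome cannot show how the failure rate changes with temperature.

```python
from collections import defaultdict

# Hypothetical launch records: (ambient temperature in F, number of O-ring incidents).
# These values are made up for illustration; they are NOT the real shuttle data.
launches = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 0), (72, 0), (73, 0), (75, 2),
    (76, 0), (78, 0), (79, 0), (81, 0),
]

def temp_bin(temp_f, width=10):
    """Lower edge of the temperature bin that contains temp_f."""
    return (temp_f // width) * width

def percent_with_failure(records):
    """Percent of launches in each temperature bin with at least one incident."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for temp, incidents in records:
        b = temp_bin(temp)
        totals[b] += 1
        if incidents > 0:
            failures[b] += 1
    return {b: 100.0 * failures[b] / totals[b] for b in sorted(totals)}

# Case dataset: only launches that had a failure (selected on the outcome).
case_only = [r for r in launches if r[1] > 0]

case_view = percent_with_failure(case_only)  # every bin is 100% by construction
full_view = percent_with_failure(launches)   # rate falls as temperature rises

print("failures only:", case_view)
print("all launches: ", full_view)
```

In the failures-only view every temperature bin shows a 100% failure rate, because the launches without a failure were never sampled. Only the full set of launches gives a denominator.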

Case Database Example

Analysis of O-ring failures for launches prior to Challenger looked like this:

[Figure: number of incidents (0 to 3.5) vs. ambient temperature (50 to 90 F); series: Failures]

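Why Risk Analysis Can’t Be Performed Using Only Case Data (Injury Based)

$$\text{risk} = \frac{\#\,\text{injured}}{\#\,\text{exposed}} = \frac{\#\,\text{injured}}{\#\,\text{injured} + \#\,\text{uninjured}}$$

The number uninjured is not in an injury-selected case dataset.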

Why Risk Analysis Can’t Be Performed Using Only Case Data (Injury Based)

[Figure, built up over three slides: regions labeled Head Injury, Thorax Injury, Lower-Ex Injury, and All Occupants in Towaway Crashes]

Case Databases

What’s the message?

• Case databases are really useful, but require caution

• Sometimes special methods can help…

• Know your sample!

Crash Datasets or, What’s Out There?

1. National Datasets

a. Fatality Analysis Reporting System (FARS)

b. National Automotive Sampling System (NASS)

• General Estimates System (GES)

• Crashworthiness Data System (CDS)

2. State Crash Datasets

3. Hospital and Trauma Datasets

4. Data Linkage

Data Use and Analysis

FARS

• Census of all fatal crashes on public roads in the U.S.

• Good for counting fatalities in a variety of conditions

• Not good for comparing risks because non-fatal crashes are not included

Data Use and Analysis

FARS

What’s a fatality?

For FARS, a fatality is a death that occurs because of crash-related injuries within 30 days of the crash

Data Use and Analysis

FARS

Example:

- How many fatalities occurred in 2011 by restraint type?

Data Use and Analysis

FARS Example

Motor-Vehicle Occupants Killed in 2011 (FARS), by restraint use:

Restrained 44%

Unrestrained 48%

Unknown 8%

Data Use and Analysis

FARS Example

Compare belt use among fatally injured occupants to use rates on the road (use rates come from the National Occupant Protection Use Survey (NOPUS))

National belt use rate for 2011: 84%

Belt use rate among fatalities in 2011: 44%
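One hedged way to put a number on this comparison is to compare the odds of belt use among fatalities with the odds of belt use on the road. The sketch below uses only the percentages quoted on the slides. Dropping the 8% with unknown restraint status is an assumption made here, not something stated in the talk, and the result is a descriptive contrast rather than an effectiveness estimate, since drivers observed on the road and occupants in fatal crashes may differ in many other ways.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

# Values quoted on the slides (2011).
belt_use_on_road = 0.84      # NOPUS national belt use rate
restrained_killed = 0.44     # FARS: share of occupants killed who were restrained
unrestrained_killed = 0.48   # FARS: share of occupants killed who were unrestrained

# Assumption: set aside the 8% with unknown restraint use and renormalize.
belt_share_known = restrained_killed / (restrained_killed + unrestrained_killed)

# Odds of being belted among fatalities relative to odds of being belted on the road.
odds_ratio = odds(belt_share_known) / odds(belt_use_on_road)

print(f"belt share among fatalities with known status: {belt_share_known:.3f}")
print(f"odds ratio (fatalities vs. road): {odds_ratio:.3f}")
```

An odds ratio well below 1 says that belted occupants are underrepresented among fatalities relative to their share of road users.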

Data Use and Analysis

FARS

• Available from NHTSA ftp site: ftp://ftp.nhtsa.dot.gov/fars

• Tips and pitfalls

- Occupants who did not die but were in a crash with others who did die are included

- Your analysis may require filtering on Occupant Injury Severity and choosing "K-fatal injury"; this will select only fatalities (eliminating other occupants in the crashes)
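The filtering tip can be sketched as follows. The field names (`ST_CASE`, `PER_NO`, `INJ_SEV`) and the code 4 for a fatal (K) injury are assumptions for illustration, and the records are invented. Check the FARS coding manual for the year you are analyzing before relying on any specific code.

```python
# Hypothetical person-level records; field names and codes are assumptions.
persons = [
    {"ST_CASE": 10001, "PER_NO": 1, "INJ_SEV": 4},  # died
    {"ST_CASE": 10001, "PER_NO": 2, "INJ_SEV": 2},  # survived, same crash
    {"ST_CASE": 10002, "PER_NO": 1, "INJ_SEV": 0},  # uninjured, same crash
    {"ST_CASE": 10002, "PER_NO": 2, "INJ_SEV": 4},  # died
    {"ST_CASE": 10002, "PER_NO": 3, "INJ_SEV": 4},  # died
]

FATAL_CODE = 4  # assumed code for "K-fatal injury"

# Everyone in a fatal crash is in the file; keep only those who died.
fatalities = [p for p in persons if p["INJ_SEV"] == FATAL_CODE]

print(f"{len(persons)} people in fatal crashes, {len(fatalities)} fatalities")
```

Counting rows without this filter would overstate fatalities, because survivors of fatal crashes are also in the file.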

Data Use and Analysis

NASS-GES

• Nationally representative sample of ~50,000 police-reported crashes in the U.S. per year

• Police-reported information is coded to a common standard and entered into the GES database

• More serious crashes are overrepresented in the sample, so cases are weighted to reflect the appropriate representation in the U.S.

Data Use and Analysis

Injury in GES

Overall injury severity is assessed by police on the KABCO scale:

K=Killed

A=Suspected Serious Injury

B=Suspected Minor Injury

C=Possible Injury

O=Property Damage Only

Data Use and Analysis

NASS-GES

• Example question: How are alcohol-related crashes distributed over time and day of the week?

• How is that different from all crashes?

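Data Use and Analysis

Crashes by day and time of day in U.S.

[Figure: crashes by day of week and time of day; annotated peak: evening rush hour]

Data Use and Analysis

Alcohol-involved crashes by day and time of day in U.S.

[Figure: alcohol-involved crashes by day of week and time of day; annotated peak: 1-2 am on weekends]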

Data Use and Analysis

NASS-GES

• Example question: Has the ESC mandate helped?

In 2009, NHTSA mandated Electronic Stability Control (ESC) on all new vehicles by MY 2011

Has it helped?

Data Use and Analysis

NASS-GES Example

[Figure: Pickups, Vans, and SUVs. Percent of occupants (0% to 100%) by model year range (MY 99-07, MY 08+) for all occupants; series: Side, Rollover, Frontal]

Data Use and Analysis

NASS-GES Example

[Figure: Pickups, Vans, and SUVs. Percent of occupants (0% to 100%) by model year range (MY 99-07, MY 08+) and occupant injury level (All Occupants, Occupants with KA Injuries); series: Side, Rollover, Frontal]

Data Use and Analysis

NASS-GES

• Tips and pitfalls

1. GES datasets can be downloaded from NHTSA FTP site: ftp://ftp.nhtsa.dot.gov/GES/

2. GES changed codes in 2009 and 2010 to be more compatible with FARS

– Watch out for changes in variable values across years!

Data Use and Analysis

NASS-GES

• Tips and pitfalls

3. GES includes a number of tables that may need to be merged for analysis

4. Analysis (and table) levels:

– Crash

– Vehicle

– Occupant
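Merging the crash, vehicle, and occupant levels can be sketched as below. The key names (`CASENUM`, `VEH_NO`, `PER_NO`) and the other fields are assumptions for illustration, and the records are invented. The idea is that each occupant row picks up the attributes of its vehicle and its crash, so the merged table is at the occupant level.

```python
# Hypothetical tables at three levels; key and field names are assumptions.
crashes = [
    {"CASENUM": 1, "WEEKDAY": "Sat"},
    {"CASENUM": 2, "WEEKDAY": "Mon"},
]
vehicles = [
    {"CASENUM": 1, "VEH_NO": 1, "BODY_TYPE": "SUV"},
    {"CASENUM": 1, "VEH_NO": 2, "BODY_TYPE": "Car"},
    {"CASENUM": 2, "VEH_NO": 1, "BODY_TYPE": "Pickup"},
]
occupants = [
    {"CASENUM": 1, "VEH_NO": 1, "PER_NO": 1, "INJURY": "O"},
    {"CASENUM": 1, "VEH_NO": 2, "PER_NO": 1, "INJURY": "B"},
    {"CASENUM": 1, "VEH_NO": 2, "PER_NO": 2, "INJURY": "A"},
    {"CASENUM": 2, "VEH_NO": 1, "PER_NO": 1, "INJURY": "C"},
]

# Index the higher-level tables by their keys.
crash_by_key = {c["CASENUM"]: c for c in crashes}
vehicle_by_key = {(v["CASENUM"], v["VEH_NO"]): v for v in vehicles}

# Build one occupant-level row per person, carrying crash and vehicle fields.
merged = []
for occ in occupants:
    row = dict(crash_by_key[occ["CASENUM"]])
    row.update(vehicle_by_key[(occ["CASENUM"], occ["VEH_NO"])])
    row.update(occ)
    merged.append(row)

for row in merged:
    print(row)
```

The level of the merged table determines what is being counted: summing over these rows counts occupants, not crashes or vehicles.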

Data Use and Analysis

NASS-GES

• Tips and pitfalls

5. Weights must be used to get correct estimates from GES

– The weight variable is called "WEIGHT"

– This can be done using the weight statement in any SAS procedure

6. Survey methods must be used to estimate variance or test significance (e.g., PROC SURVEYFREQ)
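The slides refer to SAS; the same idea can be sketched in Python. The records and weight values below are invented, and only the variable name `WEIGHT` comes from the slides. The sketch shows why unweighted shares are misleading when serious crashes are oversampled. It gives point estimates only; variances still require survey methods that account for the sample design.

```python
from collections import defaultdict

# Hypothetical sampled cases. Serious crashes are oversampled, so they carry
# small weights; minor crashes carry large weights. Values are invented.
records = [
    {"severity": "K", "WEIGHT": 5.0},
    {"severity": "A", "WEIGHT": 20.0},
    {"severity": "O", "WEIGHT": 150.0},
    {"severity": "O", "WEIGHT": 225.0},
]

def shares(rows, weight_key=None):
    """Share of each severity level, optionally weighted."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["severity"]] += r[weight_key] if weight_key else 1.0
    grand_total = sum(totals.values())
    return {level: value / grand_total for level, value in totals.items()}

unweighted = shares(records)
weighted = shares(records, weight_key="WEIGHT")

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this made-up sample, fatal cases are a quarter of the sampled rows but only about 1% of the weighted total.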

Data Use and Analysis

NASS-CDS

• Nationally representative sample of ~5,000 towaway crashes involving a light vehicle in the U.S. per year

• In-depth crash investigations done for each case

• Injuries are coded from medical records, and delta-V (crash severity) is estimated from vehicle crush

Data Use and Analysis

Injury in CDS

CDS includes medical outcome data for injured occupants. All injuries are coded using the Abbreviated Injury Scale (AIS) and associated metrics can be calculated (Maximum AIS, Injury Severity Score, etc.)
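As a sketch of how these metrics are derived from injury-level records: Maximum AIS (MAIS) is the highest AIS severity an occupant sustained, and the Injury Severity Score (ISS) is the sum of the squared highest AIS values in the three most severely injured ISS body regions, set to 75 if any injury is AIS 6. The region labels and the example occupant below are hypothetical, and real data need a mapping from AIS codes to the six ISS body regions that is not shown here.

```python
def max_ais(injuries):
    """Maximum AIS severity across all of an occupant's injuries (0 if none)."""
    return max((ais for _region, ais in injuries), default=0)

def injury_severity_score(injuries):
    """ISS from (iss_body_region, ais_severity) pairs."""
    # By convention, any AIS 6 injury sets ISS to the maximum of 75.
    if any(ais == 6 for _region, ais in injuries):
        return 75
    # Keep the worst injury in each body region.
    worst_by_region = {}
    for region, ais in injuries:
        worst_by_region[region] = max(worst_by_region.get(region, 0), ais)
    # Sum the squares of the three highest regional values.
    top_three = sorted(worst_by_region.values(), reverse=True)[:3]
    return sum(ais ** 2 for ais in top_three)

# Hypothetical occupant with four coded injuries.
occupant_injuries = [
    ("head_neck", 3),
    ("chest", 4),
    ("chest", 2),
    ("extremities", 2),
]

mais = max_ais(occupant_injuries)
iss = injury_severity_score(occupant_injuries)
print(f"MAIS = {mais}, ISS = {iss}")
```

For this occupant the worst injuries by region are 4 (chest), 3 (head/neck), and 2 (extremities), so ISS = 16 + 9 + 4.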

Data Use and Analysis

NASS-CDS

• Example questions:

1. How is injury outcome related to delta-V (crash severity) for different vehicle types?

2. How effective are seat-belts at preventing head injury?

3. How effective are side airbags in near-side impacts?


Crash Data Analysis

NASS-CDS Example: Advanced Automatic Collision Notification (AACN) and Injury Severity Prediction

ACN & Triage Model

EDR

EDR = Event Data Recorder, a car’s "black box"

ACN & Triage Model

Model inputs:

• Crash severity

• Crash direction

• Belt status

• 1 or 2 events

• Vehicle type

• Age

• Gender

Model output: predicted risk

If predicted risk is high, advisor will alert EMS. EMS may choose to increase triage priority as a result.

ACN & Triage Model

Where does the injury risk model come from?

CDS Analysis

CDS has:

• Detailed injury outcome, based on medical records

• Crash severity (delta-V), based on accident investigation

Use delta-V (and other variables) to predict injury outcome

Develop injury risk curves like this:

[Figure: Injury Risk. Risk of injury (0% to 100%) vs. delta-V (0 to 120 kph); legend entries: Head Risk, Thorax, Lowerextr, AIS3+]

[Figure: Injury Risk. Risk of injury (0% to 100%) vs. delta-V (0 to 120 kph); legend entries: Head Risk, Thorax, Lowerextr, AIS3+]

Choose a cutoff: If predicted risk > cutoff, alert EMS
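A risk curve plus a cutoff rule can be sketched as below. A logistic function of delta-V is assumed here as the form of the curve, and the coefficients and cutoff are invented for illustration; they are not the model from the talk. A real model would be fitted to weighted CDS data and would include the other inputs (crash direction, belt status, and so on).

```python
import math

# Hypothetical logistic coefficients (NOT fitted to any data).
INTERCEPT = -5.0
SLOPE_PER_KPH = 0.08
CUTOFF = 0.20  # hypothetical alert threshold on predicted risk

def predicted_risk(delta_v_kph):
    """Predicted probability of serious injury from a logistic curve."""
    linear = INTERCEPT + SLOPE_PER_KPH * delta_v_kph
    return 1.0 / (1.0 + math.exp(-linear))

def should_alert_ems(delta_v_kph, cutoff=CUTOFF):
    """Apply the cutoff rule: alert when predicted risk exceeds the cutoff."""
    return predicted_risk(delta_v_kph) > cutoff

for dv in (30, 50, 62.5, 80):
    print(f"delta-V {dv:>5} kph: risk = {predicted_risk(dv):.2f}, "
          f"alert = {should_alert_ems(dv)}")
```

Moving the cutoff trades missed serious injuries against unnecessary alerts, so the threshold is a policy choice as much as a statistical one.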

Data Use and Analysis

NASS-CDS

• Tips and pitfalls

1. CDS datasets can be downloaded from NHTSA FTP site: ftp://ftp.nhtsa.dot.gov/CDS/

2. CDS changed some codes in 2009 and 2010

Data Use and Analysis

NASS-CDS

• Tips and pitfalls

3. CDS includes a number of tables that may need to be merged for analysis

4. Analysis (and table) levels:

– Crash

– Vehicle

– Occupant

– Injury

Data Use and Analysis

NASS-CDS

• Tips and pitfalls

5. Weights must be used to get correct estimates from CDS

– The weight variable is called "RATWGT"

– This can be done using the weight statement in any SAS procedure

6. Survey methods must be used to estimate variance or test significance (e.g., PROC SURVEYFREQ)

Data Use and Analysis

State Crash Data

• Each state crash database is a census of police-reported crashes

• Each state has its own variables and codes; to use states together, codes must be mapped by the analyst

• State data use KABCO for injury

Data Use and Analysis

State Crash Data

What good are state crash data?

– State databases are large! Lots of crashes mean you can analyze rare or unusual events (e.g., crashes involving the newest technologies)

– Different states have different laws, so differences in crashes between states might reflect different effects of laws

Data Use and Analysis

State Crash Data

• State crash data example:

Helmet law change in Michigan

On April 13, 2012, a modified helmet law went into effect in Michigan allowing motorcyclists 21 and over to choose not to wear a helmet. What was the effect and how is fatality risk affected by helmet use?

Fatalities and Injuries, Apr 13-Dec 31

| Year(s) | Helmet Use | Fatalities (per year) | Serious Injuries (per year) | Percent Fatal | Percent Serious Injury |
|---|---|---|---|---|---|
| 2011 | Yes | 97 | 574 | 3.2% | 19% |
| 2011 | No | 6 | 23 | 7.2% | 31% |
| 2012 | Yes | 56 | 390 | 2.3% | 16% |
| 2012 | No | 55 | 194 | 6.5% | 23% |

Fatalities and Injuries

Apr 13-Dec 31

• Overall fatality rate in 2011 = 3.3%

• Overall fatality rate for 2012 = 3.4%

• Fatality rate was 2.8 times higher for those who didn’t wear helmets in 2012 compared to those who did
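These summary figures can be checked against the counts and percentages in the fatalities table. The sketch below backs out the approximate number of crash-involved riders in each group from the fatality count and the percent fatal. Because the published percentages are rounded, the implied denominators are approximate.

```python
# Fatality counts and percent fatal from the slide table (Apr 13-Dec 31).
table = {
    2011: {"helmet": (97, 0.032), "no_helmet": (6, 0.072)},
    2012: {"helmet": (56, 0.023), "no_helmet": (55, 0.065)},
}

def overall_fatality_rate(groups):
    """Total fatalities divided by the implied total number of riders."""
    total_fatalities = sum(fatal for fatal, _rate in groups.values())
    # Implied riders in each group = fatalities / fatality rate (approximate).
    implied_riders = sum(fatal / rate for fatal, rate in groups.values())
    return total_fatalities / implied_riders

rate_2011 = overall_fatality_rate(table[2011])
rate_2012 = overall_fatality_rate(table[2012])

# Ratio of fatality rates, unhelmeted vs. helmeted, in 2012.
rate_ratio_2012 = table[2012]["no_helmet"][1] / table[2012]["helmet"][1]

print(f"overall fatality rate 2011: {rate_2011:.1%}")
print(f"overall fatality rate 2012: {rate_2012:.1%}")
print(f"unhelmeted vs. helmeted rate ratio, 2012: {rate_ratio_2012:.1f}")
```

The overall rate barely moves between years even though the unhelmeted rate is much higher, which is one reason the talk goes on to ask who chooses not to wear a helmet.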

Who Wears Helmets?

| Driver Drinking | Year(s) | Helmet Use Rate |
|---|---|---|
| Driver Not Drinking | 2008-11 | 98% |
| Driver Not Drinking | 2012 | 76% |
| Driver Drinking | 2008-11 | 90% |
| Driver Drinking | 2012 | 54% |

Separating the Effect of Alcohol from

the Effect of the Helmet

How do we figure out what effect the helmet has, separate from risk-taking factors like alcohol use?

Regression models allow us to predict risk of fatality or injury while accounting for alcohol, speed, age, and other factors

Bottom Line…

Taking risk-taking factors into account, we find:

• Alcohol more than quadruples the risk of death and nearly triples the risk of serious injury

• After accounting for other risk factors, not wearing a helmet doubles the risk of fatality and increases the risk of serious injury by 60%
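The talk uses regression models for this adjustment. As a simpler stand-in (not the method used in the talk), the sketch below stratifies by alcohol and combines the strata with a Mantel-Haenszel odds ratio. All counts are invented for illustration; they were chosen so that the crude and adjusted estimates differ, which is how confounding by alcohol would show up.

```python
# Hypothetical counts per alcohol stratum:
# (unhelmeted fatal, unhelmeted nonfatal, helmeted fatal, helmeted nonfatal)
strata = {
    "driver not drinking": (20, 480, 40, 1960),
    "driver drinking": (30, 170, 20, 230),
}

def odds_ratio(a, b, c, d):
    """Odds of fatality without a helmet relative to with a helmet."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Crude estimate: collapse the strata and ignore alcohol.
crude_counts = [sum(cells) for cells in zip(*strata.values())]
crude_or = odds_ratio(*crude_counts)

# Adjusted estimate: compare like with like within each alcohol stratum.
adjusted_or = mantel_haenszel_or(strata.values())

print(f"crude odds ratio:    {crude_or:.2f}")
print(f"adjusted odds ratio: {adjusted_or:.2f}")
```

In these made-up data, drinking riders are both less likely to wear a helmet and more likely to die, so the crude comparison overstates the helmet effect relative to the stratified one.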

Data Use and Analysis

State Crash Data

• Sample Analysis: Effectiveness of ESC in

preventing rollovers

1. Identify make/model/year for which ESC was standard and not standard

Data Use and Analysis

State Crash Data

• Sample Analysis: Effectiveness of ESC in

preventing rollovers

2. Identify relevant crash type and control crash type:

a. Relevant: Single-vehicle rollover

b. Control: Rear end with rear damage (front vehicle in rear-end collision)

Data Use and Analysis

State Crash Data

• Sample Analysis: Effectiveness of ESC in

preventing rollovers

3. Select states for analysis and identify relevant variables and codes, e.g.:

Florida: Area(s) of impact—initial = 19 (overturn)

Michigan: Area(s) of impact—initial = 0 (rollover)
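The mapping step can be sketched as a per-state lookup. Only the two codes quoted above (Florida 19, Michigan 0) come from the slides. The state abbreviations, field names, and sample records are hypothetical, and a real analysis would take the full code lists from each state's data dictionary.

```python
# Rollover codes for the initial area-of-impact variable, by state.
# Only these two codes are taken from the slides.
ROLLOVER_CODES = {
    "FL": {19},  # overturn
    "MI": {0},   # rollover
}

def is_rollover(state, area_of_impact_initial):
    """Map a state-specific code to a common rollover flag."""
    return area_of_impact_initial in ROLLOVER_CODES.get(state, set())

# Hypothetical records: the same numeric code means different things by state.
records = [
    {"state": "FL", "impact": 19},
    {"state": "FL", "impact": 0},
    {"state": "MI", "impact": 0},
    {"state": "MI", "impact": 19},
]

flags = [is_rollover(r["state"], r["impact"]) for r in records]
print(flags)
```

Keeping the mapping in one table makes it easy to review and to extend as more states are added.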

Data Use and Analysis

State Crash Data

• Sample Analysis: Effectiveness of ESC in

preventing rollovers

4. Tabulate crash type vs. vehicle equipped/not equipped (hypothetical example below)

| Vehicle Equipment | Rollover | Control Crash |
|---|---|---|
| ESC Equipped | 300 | 800 |
| No ESC | 400 | 700 |

Data Use and Analysis

State Crash Data

• Sample Analysis: Effectiveness of ESC in

preventing rollovers

5. Compare rates

| Vehicle Equipment | Rollover | Control Crash | Rate |
|---|---|---|---|
| ESC Equipped | 300 | 800 | 27% |
| No ESC | 400 | 700 | 36% |
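The rates in this hypothetical table are rollovers as a share of rollover plus control crashes. The sketch below reproduces them and adds an odds ratio, which is one common way to summarize this kind of comparison against a control crash type. The odds ratio is an addition here, not a figure from the slides.

```python
# Hypothetical counts from the slide table.
table = {
    "ESC equipped": {"rollover": 300, "control": 800},
    "No ESC": {"rollover": 400, "control": 700},
}

def rollover_share(counts):
    """Rollovers as a share of rollover plus control crashes."""
    return counts["rollover"] / (counts["rollover"] + counts["control"])

def rollover_odds(counts):
    """Rollovers per control crash."""
    return counts["rollover"] / counts["control"]

shares = {group: rollover_share(counts) for group, counts in table.items()}
odds_ratio = rollover_odds(table["ESC equipped"]) / rollover_odds(table["No ESC"])

for group, share in shares.items():
    print(f"{group}: {share:.0%}")
print(f"odds ratio (ESC vs. no ESC): {odds_ratio:.2f}")
```

The control crash type stands in for exposure: being struck from behind is assumed to be unaffected by ESC, so a lower ratio of rollovers to control crashes suggests fewer rollovers per unit of driving.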

Data Use and Analysis

State Crash Data

• Tips and pitfalls

1. Access to state crash data varies; Michigan data are available at: www.michigantrafficcrashfacts.org

Data Use and Analysis

State Crash Data

• Tips and pitfalls

2. Values within variables are not standardized

– Every state has different contents of variables

– User must map variables and codes to combine and compare across states

Data Use and Analysis

State Crash Data

• Tips and pitfalls

3. State crash data are best for analyses that require very large numbers of cases (e.g., identifying the effect of new equipment), but results are not necessarily nationally representative

Data Use and Analysis

Trauma Datasets

• Trauma datasets (e.g., statewide, hospital-specific, or National Trauma Data Bank) indicate the cause of the trauma, including motor-vehicle crash

• All cases have presence of injury requiring hospital admission, so tend to be serious crashes

Data Use and Analysis

Trauma Datasets

• Do not contain many crash-related variables (e.g., damage location, crash configuration, seat position)

• Predictors limited to restraint use, blood-alcohol content (of the patient; BAC of the driver may or may not be included), and vehicle type

• Best for understanding in-hospital outcome for different treatments and injury types

Data Use and Analysis

Trauma Databases

Example analysis questions:

• What is the distribution of specific serious injuries associated with crashes?

• Do crash victims have better or worse outcomes (e.g., in-hospital mortality) than other trauma patients?

• How are serious injuries and in-hospital outcomes different for helmet wearers vs. non-helmet wearers?

Data Use and Analysis

Trauma Databases

• Tips and pitfalls

1. Access to trauma databases is generally more restricted than crash databases; application must be made for NTDB and others

2. Trauma databases are not ideal for assessing risk of injury, since uninjured and less injured occupants are not included

Data Use and Analysis

Data Linkage and Integration

• Many analyses require multiple datasets

• In many cases, some exposure information is required (e.g., travel behavior, number of licensed drivers)

• In others, different datasets have different strengths (e.g., CDS has detail, but GES has big sample of all police-reported crashes)

• If your dataset doesn’t have everything, look for ways to find information in another dataset

Best Database by Question Type

| Question Type | Database(s) |
|---|---|
| Crash risk | State crash database, GES; must handle exposure with another dataset or other methods |
| Injury risk, incidence | CDS |
| Crash incidence | GES (national); state databases |
| Benefits of vehicle safety content | State crash databases |
| Effects of different laws on outcomes in crashes | State crash databases |
| Treatment effects for crash victims (e.g., predicting intubation or ICU stays) | Trauma or hospital databases (with crashes identified) |

Data Use and Analysis

Questions?

Thanks for your attention.

University of Michigan Transportation Research Institute (UMTRI)