
Finding a Range Using Statistics in Traffic Crash Reconstruction

Jeremy Daily, The University of Tulsa

Jackson Hole Scientific Investigations, Inc.

1st Annual Traffic Crash Reconstruction Cruise Conference

6-13 July 2008


Abstract

Quantities used in crash reconstruction often have some inherent variation, so a single value is not appropriate. The easiest way to overcome this deficiency is to use a range: a high and a low value. The question is how to determine a range in a logical and mathematically consistent fashion. We can use sampling statistics and probability theory to generate a distribution of possible values. The investigator chooses a significance level, and the corresponding range is determined with a Bootstrap (Monte Carlo) sampling scheme implemented in Excel. The results provide a conservative range and can deal with small sample sizes. The results, however, may not make physical sense, and reality must take precedence over statistics. Crash-related examples are included.


Outline of Presentation

1 Introduction
    Motivation
    Statistical Definitions

2 Probability Theory
    Axioms of Probability
    Conditional Probability
    Bayes’ Theorem

3 Random Variables
    Normal Distribution
    Central Limit Theorem

4 Sampling
    Descriptive Statistics
    Statistics for the Mean
    Statistics for the Variance

5 Determining a Range
    Frequentist Assumptions
    Significance Levels
    The Bootstrapping Method

6 Examples
    Drag Factors
    Crush Stiffness Coefficients
    Perception-Reaction Time
    Walking Speeds
    Drag Sleds vs. Accelerometers

7 Conclusion




How do we determine a range?

Example

An investigator conducts 4 skid tests with a similar vehicle on the surface of a crash site and gets the following results:

0.763    0.720

0.751    0.743

These values are not within 5% of each other.

What values should we use in a reconstruction?

Do we discard any of the data? If so, which value is not valid?

The answer: There is a 95% chance that the drag factor for the crash was between 0.65 and 0.83.


Motivation

Crash Reconstruction by nature is highly variable.

Statistics provide some tools to help us quantify the variation found in crash reconstruction.

All calculations can be performed with a spreadsheet. A preprogrammed spreadsheet is publicly available at http://www.jhscientific.com/cgi-bin/downloads.py.

We can deal with both large samples and small samples.

Experience DOES matter! Mathematics does not turn bad data into good data. All answers should be checked.


Two Types of Uncertainty

Aleatory uncertainty describes the inherent variation associated with the physical system. This is also called the noise in a system.

Epistemic uncertainty is a result of our ignorance. This can be either because we lack enough data or because our mathematical models are not good enough. If we collect more data, our understanding improves.

We will have one tool that deals with both types of uncertainty.


Statistics Definitions

Stochastic is a term given to a quantity (variable) that never has one specific value. Its values are represented by a probability function.

Deterministic is the opposite of stochastic: the value of a deterministic quantity is unique for a given situation.

Constants are parameters that never change. For our purposes, the acceleration due to gravity is a constant 32.2 ft/s².


Stochastic Variables

What are some common quantities in reconstruction that are stochastic?

Drag Factor

Crush Stiffness Coefficients

Human Performance
    Walking Speeds
    Perception Time
    Reaction Time

Some variables are difficult to quantify using statistics:
    Departure angles from a crash
    Take-off angle for an airborne analysis

Always make sure the result of a statistical analysis makes sense given the physical evidence!


Probability and Statistics

Statistics is the art of learning from data.

Probability theory provides the framework to discuss the interpretation of the data.




What is Probability?

Objective probability is probability theory that is based on a known outcome, such as rolling fair dice. This also includes the relative frequency definition:

\[ P(A) = \lim_{N \to \infty} \frac{n_A}{N} \]

which says the probability of event A is the limiting ratio of the number of occurrences of A, $n_A$, to the total number of trials, $N$. This is considered the frequentist approach.

Subjective probability is a personal probability expressing your degree of belief. This is where expert experience manifests itself in the logical constructs of probability theory. This is considered the Bayesian approach.


Axioms of Probability

1 The probability of an event is represented as a number between 0 and 1.
    If the probability of an event is 0, then the event will never happen.
    If the probability of an event is 1, then the event will always occur.

2 The probabilities of all possible (mutually exclusive) outcomes must add to 1.

3 The probability that one or the other of two mutually exclusive events occurs is the sum of their individual probabilities.


An Example of a Die

Example

When a six-sided die is rolled, there are six mutually exclusive, equally likely outcomes. What is the probability of rolling an even number?

Each number has a 1/6 chance of occurring, assuming the die is fair, because the probabilities of all outcomes must add to 1. The probability of rolling an even number on a die is

\[ P(\text{Even}) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5 \]
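The relative frequency definition of probability can be checked against this result with a short simulation. The sketch below is only an illustration in Python (the presentation itself uses Excel); the function name, the seed, and the trial counts are assumptions chosen for the example. It counts how often a simulated fair die shows an even face and reports the ratio n_A / N, which should drift toward 0.5 as N grows.

```python
import random

def relative_frequency_even(trials, seed=1):
    """Estimate P(even) as n_A / N for a simulated fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_even = sum(1 for _ in range(trials) if rng.randint(1, 6) % 2 == 0)
    return n_even / trials

# The estimate settles near the theoretical value of 0.5 as N increases.
for n in (100, 10_000, 200_000):
    print(n, relative_frequency_even(n))
```

The estimate never equals 0.5 exactly for finite N, which is the practical meaning of the limit in the definition.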


Conditional Probability

Definition

Conditional probability has a very logical basis. The notation P(A|B) denotes the probability of event A given event B. This allows us to condition the probability on some event.

\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]


Rolling the Dice

Example

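Consider rolling two dice, one at a time. What is the probability that the two dice total 3? A total of three can happen one of two ways: either a 1 followed by a 2, or a 2 followed by a 1. The probability is written mathematically as:

\[ P(3) = P(1 \text{ and } 2) + P(2 \text{ and } 1) = P(1)\,P(2) + P(2)\,P(1) = \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{2}{36} \]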


Rolling the Dice

Example

What if we roll one die at a time and the first die shows a 1 (the argument is identical if it shows a 2)? Now the probability of getting a total of three has changed. It is now conditional.

\[ P(3 \mid 1) = \frac{P(3 \text{ and } 1)}{P(1)} = \frac{1/36}{1/6} = \frac{1}{6} \]

The probability of the final outcome is conditioned by the result of the first event.
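Both dice results can be confirmed by listing all 36 equally likely outcomes and counting. The following Python sketch does that with exact fractions; the helper name `prob` is hypothetical, and the first die is taken to show a 1 purely as an assumption for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favorable = [o for o in conditioned if event(o)]
    return Fraction(len(favorable), len(conditioned))

def total_is_3(o):
    return o[0] + o[1] == 3

def first_is_1(o):
    return o[0] == 1

p_total_3 = prob(total_is_3)                                  # unconditional
p_total_3_given_first_1 = prob(total_is_3, given=first_is_1)  # conditional
print(p_total_3, p_total_3_given_first_1)
```

Counting over the restricted list of outcomes is the same operation as dividing P(A and B) by P(B) in the definition of conditional probability.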



Theorem

The probability of every real-world event is conditional, even if only on background information.

Proof.

Every real event was influenced by either the previous event or the surroundings. Our beliefs about events in a crash depend on our experience and the evidence at the scene.



Example

The probability of a vehicle negotiating a turn depends on the radius of the turn and the friction of the road. The friction is dependent on the weather, the weather is dependent on the season, and so forth.

Conditional probability that includes background information is written as:

P(A|B, I)

The background information is always present in an analysis and must be understood.


Bayes’ Theorem

Theorem

Bayes’ theorem allows us to work with conditional probabilities:

\[ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \text{not } A)\,P(\text{not } A)} \]

This can lead to a discussion of joint and marginal probability (off topic).
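The theorem follows from the definition of conditional probability given earlier. The short derivation below is a sketch of that step using only the symbols already defined; the second line applies the same definition with the roles of A and B exchanged, and the third splits B into the part that occurs with A and the part that occurs without it.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \text{ and } B)}{P(B)} \\
P(A \text{ and } B) &= P(B \mid A)\,P(A) \\
P(B) &= P(B \mid A)\,P(A) + P(B \mid \text{not } A)\,P(\text{not } A) \\
\Rightarrow \quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \text{not } A)\,P(\text{not } A)}
\end{align*}
```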


Priors, Posteriors and Likelihood

Simplify Bayes’ Theorem:

P(Event|Data, I ) ∝ P(Data|Event, I )P(Event|I )

I is the background information

P(Event|I ) is the prior probability

P(Data|Event, I ) is the likelihood function

P(Event|Data, I ) is the posterior probability

We can combine our knowledge with test results to get a new distribution.

What is a probability distribution function?




Random Variables

Definition

A random variable is the numerical outcome of a random experiment.

An event needs to be coded or operationalized to become a random variable.

Some random events have little meaning as random variables. Examples: driver fatigue or human emotion.

Random variables are described in terms of probability distributions.


The Normal Distribution

Figure: The probability density function f(x) of a Normal distribution with a mean of µ (mu) and a standard deviation of σ (sigma). The shaded region, bounded by µ − σ and µ + σ, represents 68% of the area. The area is equal to the probability.


The Normal Distribution

Definition

The equation for the probability density function (PDF) of the Normal (Gaussian) distribution is:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2}\left[ \frac{x-\mu}{\sigma} \right]^2 \right) \]

It is written in shorthand as N(µ, σ).


Normal Distributions Representing Drag Factor

Example

Plot the Probability Density Functions (PDFs) of the drag factors for a new Crown Victoria and an old 3/4 ton Pickup. The PDF of the Crown Victoria is:

N(0.8,0.03)

The PDF of the Pickup is:

N(0.6,0.08)


Normal Distributions Representing Drag Factor

Figure: PDFs of the drag factor f for the two vehicles, centered at 0.6 (Pickup) and 0.8 (Crown Victoria). The normal distribution representing the truck has a lower mean but more spread.
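The two densities can be tabulated without plotting software. This is a minimal sketch using Python's standard `statistics.NormalDist` class; the evaluation points and variable names are assumptions chosen for illustration, while the means and standard deviations are the N(0.8, 0.03) and N(0.6, 0.08) values from the example.

```python
from statistics import NormalDist

crown_vic = NormalDist(mu=0.8, sigma=0.03)  # N(0.8, 0.03) from the example
pickup = NormalDist(mu=0.6, sigma=0.08)     # N(0.6, 0.08) from the example

# Tabulate both PDFs at a few drag factor values
print(" f    Crown Vic   Pickup")
for f in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{f:.1f}  {crown_vic.pdf(f):9.4f}  {pickup.pdf(f):8.4f}")

# Area within one standard deviation of the mean (about 68% for any normal)
within_1_sigma = pickup.cdf(0.6 + 0.08) - pickup.cdf(0.6 - 0.08)
print(f"Area within 1 sigma: {within_1_sigma:.4f}")
```

The narrower Crown Victoria distribution has the taller peak, because the total area under each curve must equal 1.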


Central Limit Theorem

Why do we use a normal distribution?

Theorem

Variables that are influenced by many different, unrelated factors approximate a normal distribution.

Note: Normal distributions are found in many aspects of physical phenomena. They are a reasonable default when no other information about the distribution is available.




Sampling from a Population

A Population includes all possible values.

A sample is a random and unbiased subset of the population.

Non-random sampling and biased samples are the pitfalls of this method.
    In human testing, the demographic of your sample must be similar to the person in question.
    People learn and train themselves, thus introducing bias unknowingly.

Assume from here on that we have n correct data points.


The Sample Mean

Definition

The sample mean, $\bar{x}$, is the arithmetic average of the data:

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]

This is also written as:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

where ∑ is the summation symbol and means to add all the items in the list together.

In Excel: =AVERAGE(array)


The Sample Standard Deviation

Definition

The sample variance, $s^2$, is obtained with the following formula:

\[ s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1} \]

This is also written as:

\[ s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} \]

The standard deviation is the square root of the variance:

\[ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}} \]

In Excel: =STDEV(array) and =VAR(array)


Other Descriptions of Data

Central Tendency

    Median: 50th percentile
    Mode: the most frequent value
    For a symmetric distribution: mean = median = mode

Measures of Spread

    Range
    Percentiles
    Box and Whisker Plots


Drag Factor Example

Example

The sample mean, $\bar{x}$, of the given data is:

\[ \bar{x} = \frac{0.763 + 0.720 + 0.751 + 0.743}{4} = 0.744 \]

The sample variance is:

\[ s^2 = \mathrm{VAR}(0.763, 0.720, 0.751, 0.743) = 0.0003289 \]

The sample standard deviation is:

\[ s = \sqrt{0.0003289} = 0.01814 \]
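The same summary statistics can be reproduced outside of Excel. The sketch below uses Python's standard `statistics` module as a stand-in for =AVERAGE, =VAR and =STDEV; the variable names are assumptions for illustration. Like the Excel functions, `statistics.variance` and `statistics.stdev` divide by n − 1.

```python
import math
import statistics

drag_factors = [0.763, 0.720, 0.751, 0.743]  # the four skid tests

n = len(drag_factors)
mean = statistics.mean(drag_factors)           # like =AVERAGE(array)
variance = statistics.variance(drag_factors)   # like =VAR(array), divides by n - 1
std_dev = statistics.stdev(drag_factors)       # like =STDEV(array)
std_error = std_dev / math.sqrt(n)             # standard error of the mean

print(f"mean      = {mean:.3f}")
print(f"variance  = {variance:.7f}")
print(f"std dev   = {std_dev:.5f}")
print(f"std error = {std_error:.6f}")
```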


Estimators

The sample mean and sample standard deviation are the most likely estimates of the true population mean and true population standard deviation.

\[ \bar{x} \Longleftrightarrow \mu, \qquad s \Longleftrightarrow \sigma \]

Since these sample statistics are just estimates, the true population parameters follow some distribution with respect to their respective estimators.


Distribution of the Mean

Theorem

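The distribution of sample means follows a Student-t distribution that accounts for the number of samples in the data. The result is exact when the underlying population is normal and a good approximation otherwise.

\[ \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n,\alpha} \]

where the term $s/\sqrt{n}$ is called the standard error.

Note:

As the number of samples becomes large, the standard error drops to zero and the sample mean approaches the true mean. Most textbooks index this distribution by n − 1 degrees of freedom; this presentation and its spreadsheet use n.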


The Student-t distribution

Figure: The Student-t distribution f(x) becomes narrower as the number of samples increases, until it approaches the standard normal distribution, N(0,1). The dashed line has 2 degrees of freedom, the solid line has 5 d.o.f., and the dotted line is the standard normal. Notice there is more area in the tail region for small samples.


An Example of the Distribution of the Mean

Example

Construct the two-sided confidence interval for the mean of the 4 drag factor samples at the α = 0.05 significance level. We can write the distribution of the true population mean as:

\[ \mu = \bar{x} + t_{4,0.05}\left(\frac{s}{\sqrt{n}}\right) \]

The standard error is:

\[ \frac{s}{\sqrt{n}} = \frac{0.01814}{\sqrt{4}} = 0.009068 \]


A Picture of the Distribution of the Mean

Figure: The distribution of the mean µ, centered on $\bar{x} = 0.744$, follows a Student-t distribution. For this example the critical t value for 4 dof and 95% confidence is 2.7764. This gives a confidence bound on the mean of 0.744 ± 2.7764(0.009068), which is a bound between 0.719 and 0.769.

Note:

The critical t value is computed in Excel as =TINV(.05,4)
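The interval arithmetic can be sketched in a few lines of Python. The critical value 2.7764 is copied from the slide rather than computed, because the Python standard library has no inverse Student-t function; the variable names are assumptions for illustration.

```python
import math
import statistics

drag_factors = [0.763, 0.720, 0.751, 0.743]
t_crit = 2.7764  # taken from the slide: =TINV(.05,4) in Excel

mean = statistics.mean(drag_factors)
std_error = statistics.stdev(drag_factors) / math.sqrt(len(drag_factors))

# Two-sided confidence bound on the mean: mean +/- t * (s / sqrt(n))
low_mean = mean - t_crit * std_error
high_mean = mean + t_crit * std_error
print(f"Mean lies between {low_mean:.4f} and {high_mean:.4f}")
```

A different significance level or sample count only changes `t_crit`, which would be looked up again in Excel or a t table.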


The Distribution of the Variance

We have only examined the mean. What about the variance (std. dev.)?

Theorem

The distribution of the true variance follows a χ² (chi-squared) distribution:

\[ \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \]

where the χ² distribution depends on the degrees of freedom (n−1). The mean and variance of a $\chi^2_{n-1}$ distribution are:

\[ \mu = n-1 \]

\[ \sigma^2 = 2(n-1) \]


The χ2 Distribution

Figure: Some examples of χ² distributions, plotted as pdf(x) for x from 0 to 18. The mean of each distribution corresponds to its number of degrees of freedom (n−1). The dashed line has 2 dof, the black line has 5 dof, and the red line has 10 dof.


An Example of the Distribution of the Variance

We are only interested in the upper bound on the variance.

Find the lower critical χ² value (=CHIINV(.95,3) in Excel): $\chi^2_{\text{left}} = 0.3518$

Determine the upper bound of the variance:

\[ \sigma^2_{\text{upper}} = \frac{(n-1)s^2}{\chi^2_{(n-1),\text{left}}} \]

\[ \sigma^2_{\text{upper}} = \frac{(3)\,0.0003289}{0.3518} = 0.002804 \]

\[ \sigma_{\text{upper}} = \sqrt{\sigma^2_{\text{upper}}} = 0.05296 \]

Figure: The χ² probability density, pdf(χ²), for χ² from 0 to 10. To find the largest variation we must divide by the smallest (left) critical χ² value.
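For readers without Excel, the left critical value for this particular case can be found numerically. The sketch below is an illustration only and is limited to 3 degrees of freedom, where the χ² cumulative distribution has a simple closed form; the function names are hypothetical. It bisects for the value with 5% of the area to its left and then applies the upper-bound formula above.

```python
import math
import statistics

def chi2_cdf_3dof(x):
    """Closed-form CDF of a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi2_left_critical_3dof(alpha, lo=0.0, hi=50.0):
    """Bisect for the value with lower-tail area alpha (Excel: =CHIINV(1-alpha,3))."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_3dof(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

drag_factors = [0.763, 0.720, 0.751, 0.743]
n = len(drag_factors)  # n - 1 = 3 dof, matching the closed form above

chi2_left = chi2_left_critical_3dof(0.05)
var_upper = (n - 1) * statistics.variance(drag_factors) / chi2_left
sigma_upper = math.sqrt(var_upper)
print(f"chi2_left = {chi2_left:.4f}, sigma_upper = {sigma_upper:.5f}")
```

Other sample sizes need a general χ² inverse, such as Excel's =CHIINV, because the closed form used here does not carry over to other degrees of freedom.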




Determining a Range

Variation exists in the variables used in reconstruction.

A range captures the essence of the variation with simplicity.

How do we determine the correct range?
    1 Frequentist Approach
    2 Bayesian Approach

Bayesian approaches let you incorporate guesses and prior experience. This is beyond the scope of this lecture.


Frequentist (Classical) Assumptions

The population of samples is Normally distributed.

Every sample is independent of all other samples.

Every sample comes from the same parent distribution.

The mean and standard deviation of the parent population are unknown.


Significance Levels

Definition

The significance level is the probability that a proposition is not true. We denote the significance level as α. This choice is arbitrary, but the commonly used significance levels are α = 0.10, α = 0.05, and α = 0.01.

We use this definition to determine the most likely intervals of a particular value.


Two Tails or One Tail

Figure: The difference between a one-sided and a two-sided confidence interval on the density f(x). (a) A one-sided confidence interval, bounded above at µ + 1.645σ. (b) A two-sided confidence interval, bounded by µ − 1.96σ and µ + 1.96σ. In each case 95% of the area is enclosed.
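The multipliers 1.645 and 1.96 in the figure come from the inverse of the standard normal cumulative distribution. A minimal Python sketch, assuming α = 0.05 as in the figure and using hypothetical variable names, shows where each comes from.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal, N(0, 1)

# One-sided: all of alpha sits in a single tail
z_one_sided = z.inv_cdf(1 - alpha)

# Two-sided: alpha is split evenly between the two tails
z_two_sided = z.inv_cdf(1 - alpha / 2)

print(f"one-sided z = {z_one_sided:.3f}")
print(f"two-sided z = {z_two_sided:.3f}")
```

Splitting α between two tails pushes each bound farther from the mean, which is why the two-sided multiplier is the larger one.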


Overall equation

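\[ \mathrm{pdf}(x) = \underbrace{\bar{x} + t_n\left(\frac{s}{\sqrt{n}}\right)}_{\mu} + t_{n-1}\,\underbrace{\sqrt{\frac{(n-1)s^2}{\chi^2_{n-1}}}}_{\sigma} \tag{1} \]

The overall distribution depends on three underlying distributions: $t_n$, $t_{n-1}$, and $\chi^2_{n-1}$.

It seems as if the problem is getting harder...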


Bootstrapping (Monte Carlo) Method

1 Sample each distribution independently.
    Use the =RAND() function in Excel.

2 Combine the sampled distributions according to Eq. 1.

3 Rank the results.

4 Choose the values corresponding to the desired significance.
    For a 95% two-sided interval with 10,000 samples, the lower bound is the 250th sorted sample and the upper bound is the 9750th sorted sample.
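The four steps above can be sketched in Python as follows. This is not the author's spreadsheet: it is a minimal illustration assuming Eq. 1 is applied exactly as written, with each distribution drawn from the standard `random` module instead of Excel's =RAND(). The function names, the seed, and the trial count are assumptions chosen for the example.

```python
import math
import random
import statistics

def chi_squared(rng, dof):
    """Draw from a chi-squared distribution (a gamma with shape dof/2, scale 2)."""
    return rng.gammavariate(dof / 2, 2)

def student_t(rng, dof):
    """Draw from a Student-t distribution as normal / sqrt(chi-squared / dof)."""
    return rng.gauss(0, 1) / math.sqrt(chi_squared(rng, dof) / dof)

def bootstrap_range(data, alpha=0.05, trials=10_000, seed=2008):
    """Return (lower, upper) bounds by sampling Eq. 1 and ranking the results."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Step 1: sample each distribution independently
        mu = mean + student_t(rng, n) * s / math.sqrt(n)
        sigma = math.sqrt((n - 1) * s**2 / chi_squared(rng, n - 1))
        # Step 2: combine according to Eq. 1
        results.append(mu + student_t(rng, n - 1) * sigma)
    results.sort()  # Step 3: rank the results
    # Step 4: e.g. the 250th and 9750th of 10,000 sorted samples for alpha = 0.05
    lower = results[round(trials * alpha / 2) - 1]
    upper = results[round(trials * (1 - alpha / 2)) - 1]
    return lower, upper

lower, upper = bootstrap_range([0.763, 0.720, 0.751, 0.743], trials=50_000)
print(f"95% range: {lower:.3f} to {upper:.3f}")
```

Because the method is random, the bounds shift slightly with the seed and the number of trials, just as the spreadsheet results change with each recalculation.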


A Picture of the Range

Figure: Empirical Cumulative Probability. The result of the Monte Carlo simulation represented as a cumulative distribution (Monte Carlo CDF versus drag factor, 0.6 to 0.9), with the range marked. The actual value has a 95% chance of lying in the range shown.


Range Estimation of a Normally Distributed Variable With an Unknown Mean and Unknown Standard Deviation

Label: Example Drag Factor Tests

Data: 0.763, 0.720, 0.751, 0.743

Summary Statistics
    Count              4
    Sample Mean        0.744
    Sample Std Dev     0.01814
    Standard Error     0.009068030
    Variance           0.0003289167
    Significance       95%
    alpha              0.05 (this can be changed)

Confidence Interval for the Mean
    Student-t          2.7764 (two tailed)
    Low Mean           0.7191
    High Mean          0.7694

Upper Bound on the Variance
    Chi Squared Left   0.351846 (one sided)
    Max Variance       0.002804
    Max Stdev          0.052957

Bootstrap (Monte Carlo) Results (use this range if it makes sense)
    lower bound        0.653
    upper bound        0.833

Significance Bounds on Most Likely Normal Distribution (computed from summary statistics)
    lower bound        0.709
    upper bound        0.780

Difference (should be positive)
    upper bound        0.055450
    lower bound        0.053324

Instructions For Use:

1.) Enter the desired data in Column B

2.) Adjust the value of alpha as desired (0.05 is commonly accepted)

3.) Press F9 to calculate the worksheet (calculation takes a long time)

Note: The Monte Carlo results may change slightly with each calculation


Concluding Our Example

The results of the simulation for our drag factor data are:

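\[ f_{\text{lower}} = 0.65 \]

\[ f_{\text{upper}} = 0.83 \]

With the slide-to-stop relation $S = \sqrt{30\,d\,f}$ (S in mph, d in ft), a 100 ft slide to stop gives calculated speeds from about 44.2 mph to 49.9 mph.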




Another Drag Factor Example

Example

Consider three skid tests from an accelerometer mounted in an exemplar vehicle. What range of the drag factor should be used if the three readings are 0.786, 0.812, and 0.794? Let x denote the drag factor.

Note:

These measurements are within 5%.




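Label: Example Drag Factor Tests

Data: 0.786, 0.812, 0.794

Summary Statistics
    Count              3
    Sample Mean        0.797
    Variance           0.00018
    Significance       95%

Confidence Interval for the Mean
    Low Mean           0.7729
    High Mean          0.8218

Upper Bound on the Variance
    Max Stdev          0.05880

Significance Bounds on Most Likely Normal Distribution
    lower bound        0.771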











Summary Statistics

Sample Mean:

\[ \bar{x} = \frac{0.786 + 0.812 + 0.794}{3} = 0.797 \]

Sample Variance:

\[ s^2 = \frac{(0.786-0.797)^2 + (0.812-0.797)^2 + (0.794-0.797)^2}{3-1} = 0.00018 \]

Sample Standard Deviation:

\[ s = \sqrt{s^2} = 0.01332 \]

Standard Error:

\[ \text{StdErr} = \frac{0.01332}{\sqrt{3}} = 0.00769 \]


Confidence Interval on the Mean

Specify the significance level: α = 0.05

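Find the critical t value (=TINV(Prob,DOF) in Excel): $t_{\text{crit}} = 3.182$

Figure: The distribution pdf(µ) of the mean, centered on $\bar{x}$ with bounds at $\bar{x} - 3.182\,s/\sqrt{n}$ and $\bar{x} + 3.182\,s/\sqrt{n}$. The 95% confidence interval for the mean is 0.797 ± 0.0244.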


Confidence Bound on the Variance

We are only interested in the upper bound on the variance.

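Find the lower critical χ² value (=CHIINV(Prob,DOF) in Excel): $\chi^2_{\text{left}} = 0.1026$

Determine the upper bound of the variance:

\[ \sigma^2_{\text{upper}} = \frac{(n-1)s^2}{\chi^2_{(n-1),\text{left}}} \]

\[ \sigma^2_{\text{upper}} = \frac{(2)\,0.000177}{0.1026} = 0.00345 \]

\[ \sigma_{\text{upper}} = \sqrt{\sigma^2_{\text{upper}}} = 0.0588 \]

Figure: The χ² probability density, pdf(χ²), for χ² from 0 to 10.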


Run Monte Carlo on Overall Equation

Evaluate 10,000 trials of the equation

Rank all 10,000 results

Extract the 250th and 9750th result to create a range.

Figure: Empirical Cumulative Probability. The result of the Monte Carlo simulation represented as a cumulative distribution (CDF versus drag factor, 0.5 to 1.1), with the range marked. The range extracted is 0.68 to 0.91.


The Range from the Most Likely Normal Distribution

The range given from the normal distribution with a mean of $\bar{x}$ and a standard deviation of s will always be tighter than the range previously determined.

For our example x is the drag factor:

Figure: Normal distribution of the drag factor x, centered on $\bar{x}$. The shaded region represents 95% of the area and is bounded by $\bar{x} \pm 1.96s$. This gives a lower bound of 0.771 and an upper bound of 0.823. This technique does not account for any uncertainty in the mean or standard deviation.
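The "most likely normal" range is simple enough to compute directly. The following Python sketch fits a normal distribution to the three accelerometer readings and reads off the central 95%; the variable names are assumptions for illustration.

```python
import statistics
from statistics import NormalDist

readings = [0.786, 0.812, 0.794]  # the three accelerometer readings
alpha = 0.05

# Normal distribution with the sample mean and sample standard deviation
fitted = NormalDist(statistics.mean(readings), statistics.stdev(readings))

# Central 95%: equivalent to mean +/- 1.96 s
lower = fitted.inv_cdf(alpha / 2)
upper = fitted.inv_cdf(1 - alpha / 2)
print(f"Most likely normal range: {lower:.3f} to {upper:.3f}")
```

This range treats the sample mean and standard deviation as if they were exact, which is why it is narrower than the Monte Carlo range of 0.68 to 0.91 for the same data.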


Crush Stiffness

We can use this for more than drag factors...

Example

A query of the StifCalcs database for 1991-1993 frontal crash test data for the Honda Accord provided the following data for the A stiffness coefficient values:

354.7  600.0  649.7  362.4  908.2  353.6  386.7

578.9  364.2  424.9  620.2  857.1  533.5  627.6

936.8  311.4  624.6  601.5  752.4  405.2  326.7

703.6  320.9  347.3  332.8  328.1

Let’s determine a range to use in an analysis using the spreadsheet...




Label: A Stiffness, 4N6XPRT StifCalcs for 1991-1993 Honda Accord Frontal Impacts

Data: 354.7, 600, 649.7, 362.4, 908.2, 353.6, 386.7, 578.9, 364.2, 424.9, 620.2, 857.1, 533.5, 627.6, 936.8, 311.4, 624.6, 601.5, 752.4, 405.2, 326.7, 703.6, 320.9, 347.3, 332.8, 328.1

Summary Statistics
    Count              26
    Sample Mean        523.577
    Sample Std Dev     195.740
    Standard Error     38.388
    Variance           38313.980
    Significance       95%
    alpha              0.05 (this can be changed)

Confidence Interval for the Mean
    Student-t          2.06 (two tailed)
    Low Mean           444.67
    High Mean          602.48

Upper Bound on the Variance
    Chi Squared Left   14.611 (one sided)
    Max Variance       65554.909
    Max Stdev          256.037

Significance Bounds on Most Likely Normal Distribution
    lower bound        139.934











Problem: Huge Scatter

1 The spreadsheet gives the following:

$A_{\text{low}} = 89.18$ lb/in

$A_{\text{high}} = 960.39$ lb/in

2 The range is absurd!
    Ratio of s to $\bar{x}$ is large
    Underlying distribution does not follow a Normal distribution

3 Further investigation reveals that the data included vehicle-to-vehicle data combined with vehicle-to-barrier data.

4 Cannot take samples from two different populations.

Note:

Excessive ranges indicate improper data or testing techniques. In this case, check the original crash test reports and ensure the same physical process is being measured.


Are These Data Normal?

Figure: Comparison of the Cumulative Probability Functions. Cumulative probability (CDF) versus A stiffness coefficient (0 to 1200), showing the Normal CDF and the empirical CDF of the A coefficients. The actual data are sorted and ranked. Their ranking is divided by n+1 to get the percentile (probability). The plot of the percentile (empirical CDF) is compared to the plot of the Normal CDF [=NORMINV(prob,mean,stddev)]. These data do not appear to be normally distributed. (There are two different populations.)
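The comparison described in the caption can be tabulated without a chart. The sketch below is a Python illustration of the same procedure: sort and rank the data, divide each rank by n + 1, and compare each data value with the value a fitted normal distribution would give at that percentile. The 26 values are those listed in the spreadsheet output for the Honda Accord query; the variable names are assumptions for illustration.

```python
import statistics
from statistics import NormalDist

a_coefficients = [
    354.7, 600.0, 649.7, 362.4, 908.2, 353.6, 386.7,
    578.9, 364.2, 424.9, 620.2, 857.1, 533.5, 627.6,
    936.8, 311.4, 624.6, 601.5, 752.4, 405.2, 326.7,
    703.6, 320.9, 347.3, 332.8, 328.1,
]
n = len(a_coefficients)
fitted = NormalDist(statistics.mean(a_coefficients),
                    statistics.stdev(a_coefficients))

rows = []
for rank, value in enumerate(sorted(a_coefficients), start=1):
    percentile = rank / (n + 1)                # empirical CDF
    normal_value = fitted.inv_cdf(percentile)  # like =NORMINV(prob,mean,stddev)
    rows.append((value, percentile, normal_value))

print("   data   percentile   normal")
for value, percentile, normal_value in rows:
    print(f"{value:7.1f}   {percentile:10.3f}   {normal_value:7.1f}")

# Largest disagreement between the data and the fitted normal curve
largest_gap = max(abs(value - normal_value) for value, _, normal_value in rows)
print(f"Largest gap: {largest_gap:.1f} lb/in")
```

If the data were normal, the first and third columns would track each other closely; systematic gaps across a stretch of the table are the numerical counterpart of the mismatch visible in the figure.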


Perception-Reaction Time Example

Adjusted Perception-Reaction time was measured for a vehicle following situation.

The data were taken from [Muttart, 2005] and entered into the spreadsheet.

A normal driver was defined as the driver in the middle two thirds, so α = 1/3.

Results for 66% and 95% are compared.

The CDFs are plotted.




Label: PRT for Vehicle following for rad/s > .007 (Muttart, 2005)

Data (the values that survive on the slide, out of 35): 2.120, 1.200, 1.100, 1.100, 1.550, 1.300, 1.500, 0.900, 1.720, 0.950, 0.800, 1.160, 0.900, 1.550, 1.000, 1.400, 2.320, 1.220, 0.980, 1.690, 1.200, 1.140, 1.070, 1.870, 1.610

Summary Statistics
    Count              35
    Variance           0.1503840336

Confidence Interval for the Mean
    Low Mean           1.2514
    High Mean          1.3801

Upper Bound on the Variance
    Max Variance       0.170777
    Max Stdev          0.413251

Bootstrap (Monte Carlo) Results (use this range if it makes sense)
    lower bound        0.915
    upper bound        1.698

Significance Bounds on Most Likely Normal Distribution (computed from summary statistics)
    lower bound        0.940
    upper bound        1.691











Label: PRT for Vehicle following for rad/s > .007 (Muttart, 2005)

Data: the same 35 perception-reaction times as in the preceding sheet

Summary Statistics
    Count              35
    Variance           0.1503840336

Confidence Interval for the Mean
    Low Mean           1.1826
    High Mean          1.4488

Upper Bound on the Variance
    Max Stdev          0.485812

Bootstrap (Monte Carlo) Results (use this range if it makes sense)
    lower bound        0.481
    upper bound        2.157

Significance Bounds on Most Likely Normal Distribution (computed from summary statistics)
    lower bound        0.556
    upper bound        2.076
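The "most likely normal" bounds for the two significance levels can be reproduced from the summary statistics alone. In this Python sketch the variance is the value reported by the spreadsheet, and the sample mean is taken as the midpoint of the reported Low Mean and High Mean; that midpoint is an assumption made here because the sample mean itself does not appear in the surviving output. The function name is hypothetical.

```python
import math
from statistics import NormalDist

variance = 0.1503840336       # from the spreadsheet output
mean = (1.2514 + 1.3801) / 2  # assumption: midpoint of Low Mean and High Mean
fitted = NormalDist(mean, math.sqrt(variance))

def central_range(alpha):
    """Bounds that enclose the central (1 - alpha) share of the fitted normal."""
    return fitted.inv_cdf(alpha / 2), fitted.inv_cdf(1 - alpha / 2)

middle_two_thirds = central_range(1 / 3)  # the "normal driver", alpha = 1/3
middle_95 = central_range(0.05)

print("middle two thirds: {:.3f} to {:.3f}".format(*middle_two_thirds))
print("middle 95%:        {:.3f} to {:.3f}".format(*middle_95))
```

Moving from α = 1/3 to α = 0.05 roughly doubles the width of the range, which is the trade-off between how narrow a range is and how often it will contain the true value.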









Are PRT Data Normal?

Figure: Cumulative Probability of Vehicle Following PRT. Cumulative probability versus perception-reaction time (0.000 to 3.000), showing the empirical, Normal, and LogNormal curves. The actual data are sorted and ranked. Their ranking is divided by n+1 to get the percentile (probability). The plot of the percentile (empirical CDF) is compared to the plot of the Normal CDF [=NORMINV(prob,mean,stddev)]. A log normal CDF is also shown. These data are near normal.


Examine Walking Speeds for Insight on Sample Size

The walking speeds of a reconstruction class were measured (Wisconsin 2002)

Pedestrians were timed walking 100 ft

Average walking speeds were computed in mph for each participant

The data were analyzed unsorted (as they were obtained)

Example

Perform the range analysis for data sets of different sizes.




Label: Walking Speeds of a Reconstruction Class (Ft. McCoy, Wisconsin 2005)

Significance: 95%

Data (values that survive on the slides, in the order entered, with gaps): 3.833, 3.394, 3.742, 3.021, 3.742, 3.328, 2.862, 3.236, 2.924, 3.311, 3.183, 3.008, 3.145, 3.242, 3.212, 3.761

Spreadsheet results as the sample count grows (a dash marks an entry that does not survive on the slide):

Count  Sample Mean  Variance      Low Mean  High Mean  Max Stdev  Bootstrap lower  Bootstrap upper  Normal lower
2      3.613        0.0962611539  2.6693    4.5572     4.947788   -13.350          16.270           3.005
3      3.656        0.0536720707  3.2305    4.0819     1.022925   -                -                3.202
4      3.497        0.1366764031  2.9841    4.0106     1.079520   -                -                2.773
8      3.380        0.1337944855  3.0815    3.6779     0.657361   -                -                2.663
12     -            0.0979774229  3.1197    3.5135     0.485370   -                -                2.703
16     -            0.0990953990  3.0728    3.4065     0.452455   -                -                2.623
20     -            0.0822245426  3.0925    3.3600     0.392963   -                -                2.664
24     -            0.0819145686  3.1287    3.3698     0.379373   -                -                2.688
28     -            0.0824666760  3.1661    3.3885     0.371292   2.670            3.908            2.714

Low Mean and High Mean are the confidence interval for the mean, Max Stdev is the upper bound on the standard deviation, the Bootstrap columns are the Monte Carlo range, and Normal lower is the lower significance bound on the most likely normal distribution.











The Bounds on Walking Speeds According to the Number of Samples

Figure: Walking speed (mph, 1 to 6) versus number of samples (0 to 28), with curves for the Upper Bound, MLE Upper Bound, Mean, MLE Lower Bound, and Lower Bound.


Sample Size Effect

As the number of samples increases, we recover the inherent variability.

With small samples we are more uncertain about the fixed but unknown parameters.

More samples reduce the difference between the "most likely" range and the Bootstrap Method range.

These walking speeds approximately follow a normal distribution.


Graph Showing the Effect of Sample Size

Figure: Cumulative Probability of Walking Speeds. Cumulative probability versus walking speed (mph, 2.50 to 4.00), comparing the empirical CDF with the Normal CDF.


Variation Between Drag Sleds and Accelerometers

Example

Determine the range from a series of friction tests based on drag sleds and accelerometers.

Friction determination of the same surface at the same time using different techniques

NAPARS Conference, June 2005, in Columbus, OH

Vehicle-mounted accelerometer (VC3000)

Friction based on drag sleds

Determine a range given both sets of data


Range Estimation of a Normally Distributed Variable With an Unknown Mean and Unknown Standard Deviation

Label: Drag Factor NAPARS VC3000

Data (the values that survive on the slide, out of 15): 0.640, 0.645, 0.625, 0.661, 0.645, 0.645, 0.599, 0.645, 0.626, 0.630

Summary Statistics
    Count              15
    Variance           0.00081
    Confidence         95%

Confidence Interval for the Mean
    Low Mean           0.6280
    High Mean          0.6594

Upper Bound on the Variance
    Max Stdev          0.04161

Confidence Bounds on Most Likely Normal Distribution
    lower bound        0.588











Range Estimation of a Normally Distributed Variable With an Unknown Mean and Unknown Standard Deviation

Label: Drag Factor NAPARS Drag Sled

Data (the values that survive on the slide, out of 21): 0.736, 0.670, 0.736, 0.730, 0.730, 0.730, 0.800, 0.500, 0.550, 0.500, 0.736, 0.730, 0.920, 0.675

Summary Statistics
    Count              21
    Variance           0.01118
    Confidence         95%

Confidence Interval for the Mean
    Low Mean           0.6555
    High Mean          0.7515

Upper Bound on the Variance
    Max Stdev          0.14352

Confidence Bounds on Most Likely Normal Distribution
    lower bound        0.496











Drag Sled and Accelerometer Remarks

The range from the accelerometer was 0.57 to 0.71

The range from the drag sleds was 0.46 to 0.94

This was the same surface, measured with different techniques!

More consistent testing methods reduce variation.

Make sure the samples come from the same population.

Make sure the population represents the proper physics.

The probability of getting values in the tails is smaller than the probability of getting a centrally located value.




Conclusions

If the parent population is nearly normally distributed, then a range can be determined based on sampling statistics.

As the number of samples increases, the estimated mean and standard deviation converge to the unknown population mean and standard deviation.

If an underlying distribution is clearly not normal, then the range obtained herein may be inappropriate.

For normally distributed data, the distribution of the mean follows a Student-t distribution.

If the population is normally distributed and the range seems too large, then more samples must be obtained.

This technique does not acknowledge any prior understanding of the data.


References

B. M. Ayyub and R. H. McCuen. Probability, Statistics, and Reliability for Engineers. CRC Press LLC, Boca Raton, 1997.

Colin G. G. Aitken and Franco Taroni. Statistics and the Evaluation of Evidence for Forensic Scientists. John Wiley & Sons, Chichester, 2004.

W. Bartlett et al. Evaluating the uncertainty in various measurement tasks common to accident reconstruction. In Accident Reconstruction SP-1666, number 2002-01-0546 in SP, pages 57–70. Society of Automotive Engineers, Warrendale, PA, March 2002.

R. B. Dean and W. J. Dixon. Simplified statistics for small numbers of observations. Analytical Chemistry, 23(4):636–638, 1951.

Larry Gonick and Woollcott Smith. The Cartoon Guide To Statistics. Harper Collins, New York, 1993. A good introduction to statistics.

James Joyce. Bayes’ theorem. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Winter 2003. http://plato.stanford.edu/archives/win2003/entries/bayes-theorem/ Last accessed on 17 August 2005.

C. C. O. Marks. Accident analysis uncertainty in the forensic context. Personal Communication, 2002.

Jeffrey W. Muttart, William F. Messerschmidt, and Larry G. Gillen. Relationship between relative velocity detection and driver response time in vehicle following situations. In Human Factors in Driving, Telematics and Seating Comfort (SP-1934), number 2005-01-0427. Society of Automotive Engineers, Warrendale, PA, 2005.

Douglas C. Montgomery. Design and Analysis of Experiments. John Wiley and Sons, New York, 5th edition, 2004. A complete discussion of designing, conducting, and analyzing experiments.

William L. Oberkampf, Jon C. Helton, and Kari Sentz. Mathematical representation of uncertainty. American Institute of Aeronautics and Astronautics (AIAA), (1645):1–22, 2001.

Bernard Robertson and G. A. Vignaux. Interpreting Evidence: Evaluating Forensic Science in the Courtroom. John Wiley & Sons, Chichester, 1995.

Rinaldo B. Schinazi. Probability with Statistical Applications. Birkhauser, Boston, 2001.

Daniel W. Vomhof. A comprehensible approach to statistical evaluation in document examination. Personal Communication, 2002.

Pete Wildman. Estimating a population standard deviation or variance. 2002. http://wind.cc.whecn.edu/~pwildman/statnew/estimating_a_population_standard_devation_or_variance.htm Last accessed on 28 August 2005.
