Statistics Introduction to Statistic [stuh-tis-tik] noun. A numerical fact or datum, especially one...
lindsay-martin -
Introduction to Statistics
Statistic [stuh-tis-tik] noun. A numerical fact or datum, especially one computed from a sample.
How long does the ball take to fall?
Measured values: See Board
• How do we decide which of these measured values is correct?
• How do we discuss the variation in our measurements?
Mean
Also known as the “average.” Add all results and divide by the number of measurements. Equation form:
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Propagation of Uncertainty
Accuracy
Sources of inaccuracy:
Broken measurement device
Parallax
Random error?
Precision
Sources of imprecision:
Multiple measurement methods
Systematic error?
Low bias, high variability
High bias, low variability
Variance and Standard Deviation
Squared deviation: How much variation is there from the mean?
Variance: the average squared distance of the observations from the mean.
Standard deviation: the square root of the variance, in the same units as the measurements.
s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
Error
Error is the difference between the measured and expected value.
Error is how we make sense of differences between two measurements that should be the same
Error is NOT mistakes! If you made a mistake, do it again.
Types of Error Descriptions
For a true mean, µ, and standard deviation, σ, the sample mean has an uncertainty of the standard deviation over the square root of the number of samples:
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
This gives a measure of the reliability of the mean.
The sample standard error tells you how close your sample mean should be to the true mean.
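As a rough illustration, the standard error can be computed as the standard deviation divided by the square root of the number of measurements. This sketch assumes the spread of the sample stands in for the true σ and uses the data set from the worked example in these slides; the function name `standard_error` is made up for this example.

```python
import math

def standard_error(xs):
    """Standard deviation of xs divided by sqrt(n).

    Assumption: the deviation is computed by dividing by n,
    and the sample spread is used in place of the true sigma.
    """
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return std / math.sqrt(n)

values = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(values)
print(round(se, 3))
```

Taking four times as many measurements with the same spread would halve this number, which is why more data gives a more reliable mean.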
Using the Standard Error
Expected value inside x̄ ± σ_x̄: confirmed
Expected value outside x̄ ± σ_x̄: not confirmed
This is the simplest way of using data to confirm or refute a hypothesis.
This is also what is used to create the error bars.
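The check described on this slide might be sketched as follows, assuming that "inside" means the expected value lies within one standard error of the sample mean. The fall times and expected values below are invented for illustration and are not the values measured in class; `is_confirmed` is a hypothetical helper name.

```python
import math

def is_confirmed(xs, expected):
    """True if `expected` lies within one standard error of the mean of xs."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    se = std / math.sqrt(n)
    return abs(mean - expected) <= se

# Hypothetical fall times in seconds (illustrative only).
times = [0.62, 0.65, 0.63, 0.66, 0.64]

print(is_confirmed(times, 0.64))  # expected value near the sample mean
print(is_confirmed(times, 0.70))  # expected value far from the sample mean
```

The interval mean ± standard error used here is the same interval an error bar would span on a plot.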
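Example with data
Set of values: 2, 4, 4, 4, 5, 5, 7, 9
Mean:
\text{mean} = \frac{\sum \text{measurements}}{\#\,\text{samples}} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5
Standard Deviation:
s = \sqrt{\frac{3^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2 + 2^2 + 4^2}{8}} = 2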
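The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch that divides by the number of samples (8), as the slide does; the function names are chosen for this example only.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]

def mean(xs):
    """Add all results and divide by the number of measurements."""
    return sum(xs) / len(xs)

def std_dev(xs):
    """Square root of the average squared deviation from the mean (divides by n)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

data_mean = mean(values)
data_std = std_dev(values)
print(data_mean, data_std)
```

Python's standard library offers `statistics.pstdev`, which also divides by n, and `statistics.stdev`, which divides by n − 1 and so returns a slightly larger value for the same data.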
Error Types
False positive: Say things are different when they are the same.
False negative: Say things are the same when they are different.

                       | No effect (null hypothesis true) | Effect exists (null hypothesis false)
Reject null hypothesis | Type I error (false positive)    | Correct
Accept null hypothesis | Correct                          | Type II error (false negative)
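The four cells of this table can be written as a small lookup. This is only an illustrative sketch; `classify_decision` is a hypothetical helper, not part of any library.

```python
def classify_decision(null_is_true: bool, reject_null: bool) -> str:
    """Name the outcome of a hypothesis-test decision."""
    if reject_null and null_is_true:
        return "Type I error (false positive)"
    if not reject_null and not null_is_true:
        return "Type II error (false negative)"
    return "Correct"

# Print all four combinations of truth and decision.
for null_is_true in (True, False):
    for reject_null in (True, False):
        print(null_is_true, reject_null, classify_decision(null_is_true, reject_null))
```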
Group Discussion:
What happens to standard deviation as sample size increases?
What does that imply about sample error?
Define standard deviation and sample error in your own words.
Summary
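Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Variance: s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
Standard Deviation: s = \sqrt{s^2}
Sample Standard Error: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}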
Types of Graphs: Continuous vs. Categorical
Examples?
Options:
Times of ball rolling down ramp with increasing steepness
Sales of coffee, tea and soft drinks at a restaurant
Time it takes students to commute to UCI
SAT scores of varying ethnic groups
Density Curve
Low values of the standard deviation indicate a small spread (most values close to the mean).
High values indicate a large spread (many values far from the mean).
Normal Distribution
• Particularly important class of density curve
• Symmetric, unimodal, bell-shaped
• Mean, μ, is at the center of the curve
• Probabilities are the area under the curve
• Total area = 1
The Empirical Rule
In a normal distribution with mean μ and standard deviation of σ:
• 68% of observations fall within 1 σ of the mean
• 95% of observations fall within 2 σ of the mean
• 99.7% of observations fall within 3 σ of the mean
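These percentages can be checked numerically. For a normal distribution, the fraction of observations within k standard deviations of the mean equals erf(k/√2); the sketch below evaluates that expression with Python's `math.erf`. The function name `fraction_within` is chosen for this example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
```

The results round to the 68%, 95%, and 99.7% quoted in the rule.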
Central Limit Theorem
If X follows a normal distribution with mean μ and standard deviation σ, then x̄ is also normally distributed with mean μ and standard deviation σ/√n.
What if X is not normally distributed?
When sampling from any population with mean μ and standard deviation σ, when n is large, the sampling distribution of x̄ is approximately normal, with mean μ and standard deviation σ/√n.
As the number of measurements in each sample increases, the distribution of the sample means approaches a normal (Gaussian) distribution.
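Normal (Gaussian) distribution:
P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
Distribution of the mean of N measurements:
P(\bar{x}) = \frac{1}{\sqrt{2\pi\sigma^2 / N}} \, e^{-\frac{(\bar{x} - \mu)^2}{2\sigma^2 / N}}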
Visit this webpage to play with the numbers:
http://www.intuitor.com/statistics/CLAppClasses/CentLimApplet.htm
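Central Limit Theorem Summary
For samples of large size N, the distribution of the sample means will be:
P(\bar{x}) = \frac{1}{\sqrt{2\pi\sigma^2 / N}} \, e^{-\frac{(\bar{x} - \mu)^2}{2\sigma^2 / N}}
which is a normal distribution.
The normal distribution given by the CLT is independent of the type of distribution the data follow.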
Where else would this become problematic?
Where can it still be used, but issues should be considered?
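The Central Limit Theorem can be explored with a small simulation. The sketch below draws samples from a uniform population, which is clearly not bell-shaped, and looks at how the sample means are distributed. The population, the sample size of 30, the 5000 trials, and the fixed seed are all arbitrary choices made for this illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, trials):
    """Means of `trials` samples, each made of n draws from uniform(0, 1)."""
    return [statistics.fmean([random.random() for _ in range(n)])
            for _ in range(trials)]

n = 30
means = sample_means(n, 5000)

population_mean = 0.5
population_std = math.sqrt(1 / 12)        # standard deviation of uniform(0, 1)
predicted_std = population_std / math.sqrt(n)

center = statistics.fmean(means)
spread = statistics.pstdev(means)

# Fraction of sample means within one predicted standard deviation of the center.
within_one = sum(abs(m - population_mean) <= predicted_std for m in means) / len(means)

print(center, spread, predicted_std, within_one)
```

If the theorem holds for this case, the center should sit near 0.5, the spread near σ/√n, and roughly 68% of the sample means should fall within one predicted standard deviation.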
Effective Statistics
You might have strong association, but how do you prove causation? (that x causes y?)
Good evidence for causation: a well-designed experiment in which all other variables that cause changes in the response variable are controlled.
The Scientific/Statistical Process
1. Formulate a scientific question
2. Decide on the population you are interested in
3. Select a sample
4. Observational study or experiment?
5. Collect data
6. Analyze data
7. State your conclusion
Ways to collect information from a sample
Anecdotal evidence
Available data
Observational study
Experiment
Some Cautions
Statistics cannot account for poor experimental design.
There is no sharp border between “significant” and “non-significant” correlation, only increasing and decreasing evidence.
Lack of significance may be due to a poorly designed experiment.
z-test
• All normal distributions are the same if we standardize our data:
  • Units of size σ
  • Mean μ as center
• If x is an observation from a normal distribution, the standardized value of x is called the z-score: z = (x − μ)/σ
• Z-scores tell how many standard deviations away from the mean an observation is
z-test procedure
• To use: find the mean, standard deviation, and standard error
• Use these statistics along with the observed value to find the Z value
• Consult the z-score table to find P(Z) for the determined z
Equation for hypothesis testing:
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
Example
Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure scholastic aptitude, who has the higher score? The SAT scores for 1.4 million students in a recent graduating class were roughly normal with a mean of 1026 and standard deviation of 209. The ACT scores for more than 1 million students in the same class were roughly normal with a mean of 20.8 and standard deviation of 4.8.
Example Continued
Jacob – ACT
Score: 16
Mean: 20.8
Standard Dev.: 4.8
Emily – SAT
Score: 670
Mean: 1026
Standard Dev.: 209
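The comparison can be carried out by standardizing both scores, that is, subtracting each test's mean and dividing by its standard deviation. A minimal sketch using only the numbers given in the example follows; the function name `z_score` is chosen for this illustration.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

z_jacob = z_score(16, mean=20.8, std=4.8)     # ACT
z_emily = z_score(670, mean=1026, std=209)    # SAT

print(round(z_jacob, 2), round(z_emily, 2))
print("Jacob" if z_jacob > z_emily else "Emily", "has the higher standardized score")
```

Both scores are below their test's mean, but Jacob's is about 1 standard deviation below while Emily's is about 1.7 below, so Jacob has the higher standardized score.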
“Backwards” z-test
What if we are given a probability, P(Z), and we are interested in finding the observed value corresponding to that probability?
Find the z-score
Set up the probability (could be two-sided), for example P(−z0 < Z < z0)
Convert the score to x by x = μ + zσ
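Instead of a printed table, the inverse lookup can be done with `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later). The sketch below handles the two-sided case and converts back with x = μ + zσ. The 95% level is an arbitrary choice, and reusing the SAT mean and standard deviation from the earlier example is only for illustration; `central_interval` is a hypothetical helper name.

```python
from statistics import NormalDist

def central_interval(p, mu, sigma):
    """Return (z0, low, high) with P(-z0 < Z < z0) = p and x = mu + z * sigma."""
    # The leftover probability 1 - p is split equally between the two tails.
    z0 = NormalDist().inv_cdf((1 + p) / 2)
    return z0, mu - z0 * sigma, mu + z0 * sigma

z0, low, high = central_interval(0.95, mu=1026, sigma=209)
print(round(z0, 2), round(low), round(high))
```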
Necessary assumptions for t-Test
1. Population is normally distributed.
2. Sample is randomly selected from the unknown population.
3. Standard deviation of the unknown population is the same as that of the known population.
So, we can take the sample standard deviation as an estimate of the standard deviation of the known population.
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
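A one-sample t statistic, t = (x̄ − μ)/(s/√n), might be computed as in the sketch below. The fish lengths and the reference mean are invented for illustration. This sketch assumes the usual convention for t-tests, in which s is the sample standard deviation with an n − 1 divisor (`statistics.stdev`).

```python
import math
import statistics

def t_statistic(xs, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s using the n - 1 divisor."""
    n = len(xs)
    s = statistics.stdev(xs)
    return (statistics.fmean(xs) - mu) / (s / math.sqrt(n))

# Hypothetical fish lengths in cm (illustrative only).
lengths = [10, 12, 11, 13, 14]
t = t_statistic(lengths, mu=10)
degrees_of_freedom = len(lengths) - 1

print(round(t, 3), degrees_of_freedom)
```

The resulting t value and the degrees of freedom are what you would look up in a t table, or hand to a spreadsheet function, to get a probability.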
[Figure: "T Test Accumulating Data (N) Progressively." x-axis: # of samples included in analysis from each lake (1 to 82); y-axis: probability that the fish are the same in both lakes (0 to 1). Plotted series: probability that fish populations are the same average length in each lake.]
This is typical of the kind of data many of you may generate. Let’s take a quick look at how this t-test is calculated from the data, using Excel.
z versus t procedures
Use z procedures if you know the population standard deviation.
Use t procedures if you don’t know the population standard deviation.
Usually we don’t know the population standard deviation, unless told otherwise.
Central Limit Theorem
χ2-test (Goodness-of-fit) User’s Guide
• χ2-test tells us whether distributions of categorical variables differ from one another
• Can use to determine if your data conforms to a functional fit.
• Compares multiple means to multiple expected values.
• Can only use when you have multiple data sets that cannot be combined into one mean.
• Use when comparing means to expected values.
χ2-test
\chi^2 = \sum_{i=1}^{d} \frac{(X_i - \mu_i)^2}{(\Delta X_i)^2}
Xi is each individual mean
µi is each expected value
ΔXi = uncertainty in Xi
d = # of mean values
• χ2/d table gives probability that data matches expected values.
• In χ2/d, d is the count of independent measurements.
χ2 (Goodness-of-fit) Test Procedure
Find averages and uncertainty for each average.
Calculate χ2 using averages, uncertainties, and expected values.
Count number of independent variables.
Use table to find probability of fit accuracy based on χ2/d and number of independent variables (d).
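The first three steps of this procedure can be sketched in code. Each term is the squared difference between a mean and its expected value, divided by the squared uncertainty of that mean. All numbers below are invented for illustration, and the final table lookup is left out.

```python
def chi_squared(means, expected, uncertainties):
    """Sum over i of ((X_i - mu_i) / dX_i) squared."""
    return sum(((x - mu) / dx) ** 2
               for x, mu, dx in zip(means, expected, uncertainties))

# Hypothetical averages, expected values, and uncertainties (illustrative only).
means = [1.0, 2.1, 2.9]
expected = [1.0, 2.0, 3.0]
uncertainties = [0.1, 0.1, 0.1]

chi2 = chi_squared(means, expected, uncertainties)
d = len(means)                      # count of independent measurements
print(round(chi2, 3), round(chi2 / d, 3))
```

A χ2/d value near 1 means the averages differ from the expected values by about as much as their uncertainties, which is what a reasonable fit looks like.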
Example
• Launch a bottle rocket with several different volumes of water.
• Measure height of flight multiple times for each volume.
• You decide you have a fit of:
y = 0.204\,(\mathrm{m/ml})\,V - 10^{-4}\,(\mathrm{m/ml^2})\,V^2
• Plot of fit with data on left.
Example
7 degrees of freedom
Probability of fit ≈ 50%
50% of the time, chance alone could produce a larger χ2 value.
No reason to reject fit.
• This does not mean that other fits might not match the data better, so try other fits and see which one is closest.
Interpreting Results
The probability indicates how similar the data are to the expected values.
Large P means the data are similar to the expected values.
Small P means the data are different from the expected values.
Summary
Propagation of uncertainty
Mean
Accuracy vs. Precision
Error
Standard deviation
Central Limit Theorem
Fit Tests
z-test
t-test
χ2-test