Inference Using Formulas
Statistics: Unlocking the Power of Data Lock5
Inference Using Formulas
STAT 101
Dr. Kari Lock Morgan
Chapter 6
• t-distribution
• Formulas for standard errors
• Normal and t based inference
• Matched pairs
Confidence Interval Formula
$\text{sample statistic} \pm z^{*} \cdot SE$
• sample statistic: from the original data
• SE: from the bootstrap distribution
• z*: from N(0,1)
IF SAMPLE SIZES ARE LARGE…
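As a minimal sketch of the interval formula above, the Python below computes sample statistic ± z* · SE with the standard library only. The function name `normal_confidence_interval` and the numbers (a sample proportion of 0.60 with a bootstrap SE of 0.05) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def normal_confidence_interval(statistic, se, confidence=0.95):
    """Return (lower, upper) for statistic ± z* · SE."""
    # z* is the N(0,1) value that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se
    return statistic - margin, statistic + margin


# Hypothetical inputs: statistic from the original data, SE from a bootstrap
lower, upper = normal_confidence_interval(0.60, 0.05)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For 95% confidence z* is about 1.96, so the margin of error here is roughly two standard errors.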
Formula for p-values
$z = \frac{\text{sample statistic} - \text{null value}}{SE}$
• sample statistic: from the original data
• null value: from H0
• SE: from the randomization distribution
Compare z to N(0,1) for p-value
IF SAMPLE SIZES ARE LARGE…
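The standardized test statistic can be sketched in Python as follows. The helper `z_test_p_value`, its `alternative` argument, and the example numbers (statistic 0.60, null value 0.50, SE 0.05) are assumptions made for illustration, not part of the slides.

```python
from statistics import NormalDist


def z_test_p_value(statistic, null_value, se, alternative="two-sided"):
    """Return (z, p-value), comparing z to N(0,1)."""
    z = (statistic - null_value) / se
    upper_tail = 1 - NormalDist().cdf(z)  # P(Z > z)
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical inputs: statistic from the data, null value from H0
z, p_value = z_test_p_value(0.60, 0.50, 0.05)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```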
Standard Error
• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
• We can!!!
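Standard Error Formulas
Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions | Normal | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean | t, df = n – 1 | $\sqrt{\frac{\sigma^2}{n}}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Correlation | t, df = n – 2 | $\sqrt{\frac{1-\rho^2}{n-2}}$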
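The standard error formulas for these five parameters might be written in Python as below. This is a sketch: the function names are hypothetical, and the correlation formula, sqrt((1 − ρ²)/(n − 2)), is the standard one that goes with df = n − 2.

```python
import math


def se_proportion(p, n):
    return math.sqrt(p * (1 - p) / n)


def se_diff_proportions(p1, n1, p2, n2):
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_mean(sd, n):
    return math.sqrt(sd**2 / n)


def se_diff_means(sd1, n1, sd2, n2):
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


def se_correlation(rho, n):
    return math.sqrt((1 - rho**2) / (n - 2))


# Hypothetical inputs, chosen so the arithmetic is easy to check by hand
print(se_proportion(0.5, 100))
print(se_mean(10, 25))
print(se_correlation(0, 27))
```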
SE Formula Observations
• n is always in the denominator (larger sample size gives smaller standard error)
• Standard error is related to the square root of 1/n
• Standard error formulas use population parameters… (uh oh!)
• For intervals, plug in the sample statistic(s) as your best guess at the parameter(s)
• For testing, plug in the null value for the parameter(s), because you want the distribution assuming H0 is true
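Two of these observations can be checked numerically for a single proportion. The sketch below uses made-up numbers (a sample proportion of 0.60 from n = 100 and a null value of 0.50): it shows that quadrupling n halves the standard error, and that the interval and the test plug different values into the same formula.

```python
import math


def se_proportion(p, n):
    return math.sqrt(p * (1 - p) / n)


# SE scales with sqrt(1/n): each fourfold increase in n halves the SE
for n in (100, 400, 1600):
    print(n, round(se_proportion(0.5, n), 4))

# Hypothetical sample and null hypothesis
p_hat, p0, n = 0.60, 0.50, 100
se_for_interval = se_proportion(p_hat, n)  # plug in the sample statistic
se_for_test = se_proportion(p0, n)         # plug in the null value
print(round(se_for_interval, 4), round(se_for_test, 4))
```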
Null Values
• Single proportion: H0: p = p0 => use p0 for p
• Difference in proportions: H0: p1 = p2 => use the overall sample proportion from both groups (called the pooled proportion) as an estimate for both p1 and p2
• Means: Standard deviations have nothing to do with the null, so just use sample statistic s
• Correlation: H0: ρ = 0 => use ρ = 0
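For the difference in proportions, the pooled proportion might be used like this. The function name `pooled_two_proportion_z` and the counts (60 of 100 in one group, 45 of 100 in the other) are hypothetical, and the sketch assumes a two-sided alternative.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Test H0: p1 = p2 using the pooled proportion in the SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion: overall sample proportion from both groups
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) / n1
                   + p_pooled * (1 - p_pooled) / n2)
    z = (p1_hat - p2_hat - 0) / se  # null value for p1 - p2 is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_pooled, se, z, p_value


# Hypothetical counts of successes and sample sizes
p_pooled, se, z, p_value = pooled_two_proportion_z(60, 100, 45, 100)
print(f"pooled = {p_pooled:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```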
t-distribution
• For quantitative data, we use a t-distribution instead of the normal distribution
• This arises because we have to estimate the standard deviations
• The t-distribution is very similar to the standard normal, but with slightly fatter tails (to reflect the uncertainty in the sample standard deviations)
Degrees of Freedom
• The t-distribution is characterized by its degrees of freedom (df)
• Degrees of freedom are based on sample size
  • Single mean: df = n – 1
  • Difference in means: df = min(n1, n2) – 1
  • Correlation: df = n – 2
• The higher the degrees of freedom, the closer the t-distribution is to the standard normal
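To see the fatter tails shrink as df grows, the sketch below compares P(T > 2) for a few degrees of freedom with the same tail area under N(0,1). It stays within the Python standard library by integrating the t density numerically with Simpson's rule; in practice a statistics package would normally be used instead. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half the area lies above 0; subtract the area on [0, t]
    return 0.5 - total * h / 3


normal_tail = 1 - NormalDist().cdf(2)
for df in (5, 30, 1000):
    print(df, round(t_upper_tail(2, df), 4))
print("normal", round(normal_tail, 4))
```

The tail areas decrease toward the normal value as df increases, which is the sense in which the t-distribution approaches the standard normal.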
t-distribution
Aside: William Sealy Gosset
Matched Pairs
• A matched pairs experiment compares units to themselves or to another similar unit
• Data are paired (two measurements on one unit, twin studies, etc.)
• Look at the difference for each pair, and analyze the differences as a single quantitative variable
• Matched pairs experiments are particularly useful when responses vary a lot from unit to unit; pairing can decrease the standard deviation of the response (and so decrease the standard error)
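A matched pairs analysis might look like the following in Python. The `before` and `after` measurements are invented for illustration. The sketch reduces each pair to a difference, treats the differences as a single quantitative variable (t-distribution, df = n − 1), and, for contrast, computes the standard error that would result from ignoring the pairing.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical paired measurements on the same six units
before = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
after = [10.0, 14.0, 10.0, 11.0, 12.0, 13.0]

# One difference per pair, analyzed as a single quantitative variable
differences = [b - a for b, a in zip(before, after)]
n = len(differences)
d_bar = mean(differences)
se_paired = stdev(differences) / math.sqrt(n)
t_stat = (d_bar - 0) / se_paired  # H0: mean difference = 0
df = n - 1

# For contrast: the difference-in-means SE if the pairing were ignored
se_unpaired = math.sqrt(variance(before) / n + variance(after) / n)

print(f"mean difference = {d_bar:.3f}, SE = {se_paired:.3f}, "
      f"t = {t_stat:.2f}, df = {df}")
print(f"SE ignoring pairing = {se_unpaired:.3f}")
```

With these made-up numbers the units differ from each other much more than the pairs differ within themselves, so the paired standard error comes out smaller than the unpaired one.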
Golden Balls: Split or Steal?
• Both people split: split the money
• One split, one steal: stealer gets all the money
• Both steal: no one gets any money
Would you split or steal?
a) Split
b) Steal
http://www.youtube.com/watch?v=p3Uos2fzIJ0
Van den Assem, M., Van Dolder, D., and Thaler, R., “Split or Steal? Cooperative Behavior When the Stakes Are Large,” available at SSRN: http://ssrn.com/abstract=1592456, 2/19/11.
To Do
• Do Project 1 (due Friday, 3pm)
• Read Chapter 6
• Do HW 5 (due Wednesday, 3/19)