Inference Using Formulas

Statistics: Unlocking the Power of Data Lock5

STAT 101

Dr. Kari Lock Morgan

Chapter 6

• t-distribution
• Formulas for standard errors
• Normal and t based inference
• Matched pairs


Confidence Interval Formula

$$\text{sample statistic} \pm z^* \times SE$$

• sample statistic: from the original data
• SE: from the bootstrap distribution
• z*: from N(0,1)

IF SAMPLE SIZES ARE LARGE…
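As a rough illustration of the interval formula, here is a minimal Python sketch. The function name `normal_confidence_interval` and the example numbers (a sample proportion of 0.52 with SE 0.025) are made up for illustration; the sketch assumes the sample is large enough for the normal approximation to hold.

```python
from statistics import NormalDist


def normal_confidence_interval(statistic, se, level=0.95):
    """Return (lower, upper) for: sample statistic ± z* × SE."""
    # z* cuts off the middle `level` of N(0,1), e.g. about 1.96 for 95%
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    return statistic - margin, statistic + margin


# Hypothetical numbers, not from the slides
low, high = normal_confidence_interval(0.52, 0.025)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Changing `level` (for example to 0.90 or 0.99) changes only z*; the statistic and SE stay the same.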


Formula for p-values

$$z = \frac{\text{sample statistic} - \text{null value}}{SE}$$

• sample statistic: from the original data
• null value: from H0
• SE: from the randomization distribution

Compare z to N(0,1) for the p-value

IF SAMPLE SIZES ARE LARGE…
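The same idea for tests can be sketched in a few lines of Python. The function name `z_test_p_value`, its `alternative` argument, and the example numbers are assumptions for illustration, not part of the slides.

```python
from statistics import NormalDist


def z_test_p_value(statistic, null_value, se, alternative="two-sided"):
    """Standardize the statistic and compare it to N(0,1)."""
    z = (statistic - null_value) / se
    upper_tail = 1 - NormalDist().cdf(z)  # P(Z >= z)
    if alternative == "greater":
        p = upper_tail
    elif alternative == "less":
        p = 1 - upper_tail
    else:  # two-sided: double the smaller tail
        p = 2 * min(upper_tail, 1 - upper_tail)
    return z, p


# Hypothetical numbers: statistic 0.56, null value 0.50, SE 0.03
z, p = z_test_p_value(0.56, 0.50, 0.03)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```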


Standard Error

• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?

• We can!!!


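Standard Error Formulas

Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions | Normal | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean | t, df = n – 1 | $\sqrt{\frac{\sigma^2}{n}}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Correlation | t, df = n – 2 | $\sqrt{\frac{1-\rho^2}{n-2}}$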
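For reference, the five standard error formulas can be written as small Python functions. This is a sketch: the function names are invented here, and the correlation formula, sqrt((1 − ρ²)/(n − 2)), is the usual one paired with df = n – 2 and is assumed rather than taken from the slide text.

```python
from math import sqrt


def se_proportion(p, n):
    # sqrt(p(1 - p) / n)
    return sqrt(p * (1 - p) / n)


def se_diff_proportions(p1, n1, p2, n2):
    # sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_mean(sigma, n):
    # sqrt(sigma^2 / n), i.e. sigma / sqrt(n)
    return sigma / sqrt(n)


def se_diff_means(sigma1, n1, sigma2, n2):
    # sqrt(sigma1^2/n1 + sigma2^2/n2)
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def se_correlation(rho, n):
    # sqrt((1 - rho^2) / (n - 2)); assumed formula, see lead-in
    return sqrt((1 - rho**2) / (n - 2))


# Hypothetical inputs
print(se_proportion(0.5, 100))
print(se_mean(10, 25))
```

In practice the population values p, σ, and ρ are unknown, so sample statistics or null values are plugged in.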


SE Formula Observations

• n is always in the denominator (larger sample size gives smaller standard error)

• Standard error is related to the square root of 1/n

• Standard error formulas use population parameters… (uh oh!)

• For intervals, plug in the sample statistic(s) as your best guess at the parameter(s)

• For testing, plug in the null value for the parameter(s), because you want the distribution assuming H0 true
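A small numeric sketch of these observations for a single proportion, using hypothetical values p̂ = 0.60 and p0 = 0.50: the SE for an interval plugs in p̂, the SE for a test plugs in p0, and quadrupling n halves either one.

```python
from math import sqrt


def se_proportion(p, n):
    return sqrt(p * (1 - p) / n)


p_hat, p0 = 0.60, 0.50  # hypothetical sample proportion and null value

for n in (100, 400, 1600):
    se_interval = se_proportion(p_hat, n)  # interval: plug in the sample statistic
    se_test = se_proportion(p0, n)         # test: plug in the null value
    print(f"n = {n:4d}  SE for interval = {se_interval:.4f}  SE for test = {se_test:.4f}")
```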


Null Values

• Single proportion: H0: p = p0 => use p0 for p

• Difference in proportions: H0: p1 = p2 => use the overall sample proportion from both groups (called the pooled proportion) as an estimate for both p1 and p2

• Means: Standard deviations have nothing to do with the null, so just use sample statistic s

• Correlation: H0: ρ = 0 => use ρ = 0
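To make the pooled proportion concrete, here is a minimal sketch of a two-sided test for a difference in proportions. The function name and the counts (60 of 100 versus 45 of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion in the SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall sample proportion from both groups
    # Under H0 both groups share one p, so the pooled value stands in for p1 and p2
    se = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1_hat - p2_hat) / se      # null value of p1 - p2 is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, se, z, p_value


# Hypothetical counts: 60 of 100 in group 1, 45 of 100 in group 2
pooled, se, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"pooled = {pooled:.3f}, SE = {se:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```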


t-distribution

• For quantitative data, we use a t-distribution instead of the normal distribution

• This arises because we have to estimate the standard deviations

• The t distribution is very similar to the standard normal, but with slightly fatter tails (to reflect the uncertainty in the sample standard deviations)
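A quick simulation can illustrate why estimating the standard deviation fattens the tails. The sketch below draws many small samples (n = 5) from N(0,1) and standardizes each sample mean twice: once with the true σ = 1 and once with the sample s. The function name, seed, sample size, and number of repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt


def tail_fractions(n=5, reps=20000, cutoff=1.96, seed=1):
    """Fraction of standardized sample means beyond ±cutoff, using sigma vs. s."""
    rng = random.Random(seed)
    z_extreme = t_extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = xbar / (1 / sqrt(n))  # standardize with the true sigma = 1
        t = xbar / (s / sqrt(n))  # standardize with the estimated s
        z_extreme += abs(z) > cutoff
        t_extreme += abs(t) > cutoff
    return z_extreme / reps, t_extreme / reps


z_frac, t_frac = tail_fractions()
print(f"beyond ±1.96 using sigma: {z_frac:.3f}, using s: {t_frac:.3f}")
```

With the true σ, about 5% of values should land beyond ±1.96; with s in its place, noticeably more do, which is the extra tail area the t-distribution accounts for.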


Degrees of Freedom

• The t-distribution is characterized by its degrees of freedom (df)

• Degrees of freedom are based on sample size
  • Single mean: df = n – 1
  • Difference in means: df = min(n1, n2) – 1
  • Correlation: df = n – 2

• The higher the degrees of freedom, the closer the t-distribution is to the standard normal
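The degrees-of-freedom rules, and the way the t-distribution approaches the standard normal, can be sketched with the standard library alone. The helper names are made up; `t_pdf` uses the standard density formula for the t-distribution, evaluated here at x = 3 as an arbitrary point in the tail.

```python
from math import exp, gamma, pi, sqrt


def df_single_mean(n):
    return n - 1


def df_diff_means(n1, n2):
    return min(n1, n2) - 1


def df_correlation(n):
    return n - 2


def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal N(0,1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


# Tail density at x = 3 shrinks toward the normal value as df grows
for df in (2, 5, 30, 100):
    print(f"df = {df:3d}  t density at 3 = {t_pdf(3, df):.5f}")
print(f"normal density at 3 = {normal_pdf(3):.5f}")
```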


t-distribution


Aside: William Sealy Gosset


Matched Pairs

• A matched pairs experiment compares units to themselves or another similar unit

• Data is paired (two measurements on one unit, twin studies, etc.).

• Look at the difference for each pair, and analyze as a single quantitative variable

• Matched pairs experiments are particularly useful when responses vary a lot from unit to unit; can decrease standard deviation of the response (and so decrease the standard error)
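A minimal sketch of a matched pairs analysis, using made-up before/after measurements on five units: take the difference for each pair, then treat the differences as a single quantitative variable. The unpaired standard error is computed only for contrast.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after measurements on the same five units
before = [142, 160, 128, 171, 150]
after = [138, 153, 125, 166, 146]

# One difference per pair, analyzed as a single quantitative variable
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

se_paired = stdev(diffs) / sqrt(n)  # s_d / sqrt(n)
t_stat = mean(diffs) / se_paired    # tests H0: mean difference = 0, df = n - 1

# For contrast: the SE if the two columns were wrongly treated as independent groups
se_unpaired = sqrt(stdev(before) ** 2 / n + stdev(after) ** 2 / n)

print(f"paired SE = {se_paired:.3f}, unpaired SE = {se_unpaired:.3f}, t = {t_stat:.2f}")
```

Because the units differ a lot from one another but change by similar amounts, the paired standard error comes out much smaller than the unpaired one.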


Golden Balls: Split or Steal?

• Both people split: split the money
• One split, one steal: stealer gets all the money
• Both steal: no one gets any money

Would you split or steal?

a) Split
b) Steal

http://www.youtube.com/watch?v=p3Uos2fzIJ0

Van den Assem, M., Van Dolder, D., and Thaler, R., “Split or Steal? Cooperative Behavior When the Stakes Are Large,” available at SSRN: http://ssrn.com/abstract=1592456, 2/19/11.


To Do

• Do Project 1 (due Friday, 3pm): http://stat.duke.edu/courses/Spring13/sta101.002/project1.pdf

• Read Chapter 6

• Do HW 5 (due Wednesday, 3/19)
