
STA 101 Final Review

Statistics 101

Thomas Leininger

June 24, 2013

Announcements

All work (besides projects) should be returned to you and should be entered on Sakai.

Office Hour: 2–3pm today (Old Chem 114)

Final Exam: 9am–12pm HERE
Bring a calculator; no cell phones, laptops, tablets, etc.
Allowed one 8 1/2 × 11 inch cheat sheet with notes on both sides.
You must create this yourself.


Topics for today

different conditions and hypotheses in each test (and a list of tests): when to use Z, T, F stats, 2 vs 1 sample tests, degrees of freedom

pooled variance, pooled proportion (when to use)

confidence intervals: are they always two-sided? How to interpret a CI for the difference between 2 means

chi-square – what we are testing, when to use, how to approach hypotheses

ANOVA (filling in the chart)

Regression – interpreting linear lines and writing them, correlation, residuals

Type I, II error

Bayesian probability – won't be on the exam (but conditional probability might)

MLR (multiple linear regression) – won't be on the exam


Review


What you need to know about HTs:

Format for answering a hypothesis test question:

1. State the null and alternative hypotheses
2. Check conditions
3. Calculate the test statistic (T, Z, etc.) and standard error (if needed)
4. Calculate the p-value (double the tail area if the hypothesis is two-sided)
5. Reject or fail to reject the null hypothesis
6. Interpret your decision in context of the problem

Know how to interpret a p-value in context
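
As a rough illustration of steps 3 to 5, here is a minimal Python sketch of a one-sample Z test for a mean. The function name `z_test_mean` and all numbers are hypothetical (they are not from the slides), and the sketch assumes the conditions have already been checked and that n is large enough to use Z rather than T.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, s, n, mu0, two_sided=True):
    """Test statistic and p-value for H0: mu = mu0 (large n, so Z is used)."""
    se = s / sqrt(n)                       # step 3: standard error
    z = (xbar - mu0) / se                  # step 3: test statistic
    tail = 1 - NormalDist().cdf(abs(z))    # area in one tail beyond |z|
    # step 4: double the tail area for a two-sided alternative.
    # The one-sided branch assumes the sample mean falls on the side named by HA.
    p_value = 2 * tail if two_sided else tail
    return z, p_value

# Hypothetical data: n = 100, sample mean 52, sample SD 10, H0: mu = 50
z, p = z_test_mean(xbar=52, s=10, n=100, mu0=50)
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"   # step 5
print(f"z = {z:.2f}, p-value = {p:.4f}, decision: {decision}")
```

Step 6 (interpreting the decision in context) still has to be written in words; the code only produces the numbers.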



What you need to know about CIs:

Format for answering a confidence interval question:

1. Check conditions
2. Find and state the critical value ($z^\star$, $t^\star_{df}$)
3. Calculate the standard error
4. Calculate the confidence interval
5. Interpret your confidence interval in context of the problem
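
The CI steps can be sketched the same way. This is a hedged example for a mean with a large sample, so the critical value is z* rather than t*; the function name `z_interval_mean` and the numbers are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean(xbar, s, n, conf_level=0.95):
    """Confidence interval for a mean using a z critical value (large n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # step 2: critical value
    se = s / sqrt(n)                                         # step 3: standard error
    me = z_star * se                                         # margin of error
    return xbar - me, xbar + me                              # step 4: interval

# Hypothetical data: n = 100, sample mean 52, sample SD 10
lo, hi = z_interval_mean(xbar=52, s=10, n=100)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```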



Different types of tests

General conditions for HTs/CIs with means or proportions:

Independence: random samples (<10% of population sampled)

Nearly normal data: either we know the population is normal or we have to use the CLT

Conditions for CLT:

sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization

sample size ≥ 30. Else, you should use a t-distribution.



One sample test for mean

Conditions for one sample test:

Independence: random samples (<10% of population sampled)

sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization

sample size ≥ 30. Else, you should use a t-distribution.

Note: if paired data, use differences as a one sample test

Conditions for two sample test:

Independence: random samples (<10% of population sampled)

sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization

both sample sizes ≥ 30. Else, you should use a t-distribution (if either or both are smaller than 30).



Recap - inference for one proportion

Population parameter: $p$, point estimate: $\hat{p}$

Conditions:
  independence: random sample
  at least 10 successes and failures; if not → randomization

Standard error: $SE = \sqrt{\frac{p(1-p)}{n}}$
  for CI: use $\hat{p}$
  for HT: use $p_0$
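
To make the "CI uses the sample proportion, HT uses the null value" distinction concrete, here is a small sketch. The counts (60 successes out of 100) and the null value 0.5 are made up for illustration, and `se_proportion` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def se_proportion(p, n):
    """Standard error of a sample proportion, sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical data: 60 successes in n = 100 trials, H0: p = 0.5
n, successes, p0 = 100, 60, 0.5
p_hat = successes / n

# Success-failure condition for the HT is checked with the null value p0
condition_ok = n * p0 >= 10 and n * (1 - p0) >= 10

se_ht = se_proportion(p0, n)       # hypothesis test: plug in p0
se_ci = se_proportion(p_hat, n)    # confidence interval: plug in p-hat

z = (p_hat - p0) / se_ht
z_star = NormalDist().inv_cdf(0.975)            # 95% critical value
ci = (p_hat - z_star * se_ci, p_hat + z_star * se_ci)
print(f"z = {z:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```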



Recap - comparing two proportions

Population parameter: $(p_1 - p_2)$, point estimate: $(\hat{p}_1 - \hat{p}_2)$

Conditions:
  independence within groups: random sample and 10% condition met for both groups
  independence between groups
  at least 10 successes and failures in each group; if not → randomization

$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

  for CI: use $\hat{p}_1$ and $\hat{p}_2$
  for HT:
    when $H_0: p_1 = p_2$: use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
    when $H_0: p_1 - p_2 =$ (some value other than 0): use $\hat{p}_1$ and $\hat{p}_2$ (this is pretty rare)
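
For the CI case, a minimal sketch of the unpooled standard error might look like the following. The two groups (60 of 100 and 45 of 100 successes) are invented numbers, and `diff_proportions_ci` is a hypothetical helper, not something from the course materials.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, conf_level=0.95):
    """CI for p1 - p2 using the unpooled SE with each group's sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical data: 60/100 successes in group 1, 45/100 in group 2
lo, hi = diff_proportions_ci(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({lo:.4f}, {hi:.4f})")
```

Because this hypothetical interval lies entirely above 0, it would suggest that the first group's proportion is higher.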



Reference - standard error calculations

             one sample                        two samples

mean         $SE = \frac{s}{\sqrt{n}}$         $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

proportion   $SE = \sqrt{\frac{p(1-p)}{n}}$    $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

When working with means, it's very rare that $\sigma$ is known, so we usually use $s$.

When working with proportions,
  if doing a hypothesis test, $p$ comes from the null hypothesis
  if constructing a confidence interval, use $\hat{p}$ instead



When to use pooled standard error

For a two sample HT/CI for a difference of means, we can pool our information about the variance if $s_1$ and $s_2$ can be assumed to be roughly equal

If we can do this, then we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where

$s_{pool}^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$

The degrees of freedom for the t-distribution are now $df = n_1 + n_2 - 2$
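
The pooled formula is easy to get wrong by hand, so here is a short sketch that computes the pooled variance, the resulting standard error, and the degrees of freedom. The sample values (s1 = 4, n1 = 10, s2 = 5, n2 = 12) and the function names are assumptions made for illustration.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weights n - 1."""
    return (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)

def pooled_se(s1, n1, s2, n2):
    """SE of the difference in means with s1^2 and s2^2 replaced by the pooled variance."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return sqrt(sp2 / n1 + sp2 / n2)

# Hypothetical samples with roughly equal standard deviations
s1, n1, s2, n2 = 4, 10, 5, 12
sp2 = pooled_variance(s1, n1, s2, n2)
se = pooled_se(s1, n1, s2, n2)
df = n1 + n2 - 2
print(f"pooled variance = {sp2:.2f}, SE = {se:.4f}, df = {df}")
```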



When to use pooled proportion

When doing a HT for a two sample test for a difference of proportions, the null hypothesis is that $p_1 = p_2$ or $p_1 - p_2 = 0$

Since we assume that $H_0$ is true in a HT, we assume $p_{pool} = p_1 = p_2$, which means our standard error is

$SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$

We estimate $p_{pool}$ with $\hat{p}_{pool} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$
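
A minimal sketch of the pooled-proportion test for H0: p1 = p2 follows. The counts are hypothetical (60 of 100 versus 45 of 100), `two_proportion_z_test` is an illustrative name, and the two-sided alternative is an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z test of H0: p1 = p2 using the pooled proportion in the SE."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, z, p_value

# Hypothetical data: 60/100 successes in group 1, 45/100 in group 2
p_pool, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"p_pool = {p_pool:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```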



More on confidence intervals

How to interpret a CI for the difference between two means



ANOVA: filling in chart

Exercise 5.40

        Educational attainment
        Less than HS   HS      Jr Coll   Bachelor's   Graduate   Total
Mean    38.67          39.6    41.39     42.55        40.85      40.45
SD      15.81          14.97   18.1      13.62        15.51      15.17
n       121            546     97        253          155        1,172

            Df      Sum Sq    Mean Sq   F value   Pr(>F)
degree      XXXXX   XXXXX     501.54    XXXXX     0.0682
Residuals   XXXXX   267,382   XXXXX
Total       XXXXX   XXXXX
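
One way to check your filled-in chart is to compute the missing cells from the standard ANOVA identities (df = k − 1 and n − k, Mean Sq = Sum Sq / Df, F = MSG / MSE). The sketch below uses only the values printed in the exercise; the variable names are my own.

```python
# Known values from the exercise
k = 5                 # number of education groups
n_total = 1172        # total sample size
ms_group = 501.54     # Mean Sq for degree
ss_resid = 267382     # Sum Sq for Residuals

# Degrees of freedom
df_group = k - 1
df_resid = n_total - k
df_total = n_total - 1

# Sums of squares and mean squares
ss_group = ms_group * df_group     # Sum Sq = Mean Sq * Df
ms_resid = ss_resid / df_resid     # Mean Sq = Sum Sq / Df
ss_total = ss_group + ss_resid

# F statistic: ratio of between-group to within-group mean squares
f_value = ms_group / ms_resid

print(f"degree:    Df={df_group}, Sum Sq={ss_group:.2f}, F={f_value:.3f}")
print(f"Residuals: Df={df_resid}, Mean Sq={ms_resid:.2f}")
print(f"Total:     Df={df_total}, Sum Sq={ss_total:.2f}")
```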



Chi-square tests

Goodness of Fit: 1 variable

H0 : There is no inconsistency between the observed and the expected counts.

HA : There is an inconsistency between the observed and the expected counts.

Test of Independence: 2 variables

H0 : Variable 1 and Variable 2 are independent.

HA : Variable 1 and Variable 2 are not independent.
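
Both tests use the same statistic, the sum of (observed − expected)² / expected; they differ in where the expected counts come from and in the degrees of freedom. The following sketch uses made-up counts and hypothetical helper names to show both cases.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Goodness of fit (hypothetical): 3 categories, claimed proportions 25% / 50% / 25%
observed = [30, 50, 20]
expected = [0.25 * 100, 0.50 * 100, 0.25 * 100]
gof_stat = chi_square_stat(observed, expected)
gof_df = len(observed) - 1                        # df = k - 1

# Test of independence (hypothetical 2x2 table of counts)
table = [[20, 30], [30, 20]]
exp_table = expected_counts(table)
ind_stat = chi_square_stat(
    [o for row in table for o in row],
    [e for row in exp_table for e in row],
)
ind_df = (len(table) - 1) * (len(table[0]) - 1)   # df = (R - 1)(C - 1)

print(f"GOF: chi-square = {gof_stat:.2f}, df = {gof_df}")
print(f"Independence: chi-square = {ind_stat:.2f}, df = {ind_df}")
```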



Regression

Exercise 7.28

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.012701   0.012638  -1.005    0.332
bac$Beers    0.017964   0.002402   7.480 2.97e-06

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06

[Figure: scatterplot of BAC (grams per deciliter) versus cans of beer]

Write the equation of the regression line. Interpret the slope and intercept in context.

Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.

What is $R^2$? Interpret $R^2$ in context. What is the correlation?
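
The numeric parts of these questions can be read off the output; the sketch below shows how the pieces relate. It uses only the values printed in the regression output, and it relies on the standard facts that the slope's t value is estimate / standard error and that, in simple linear regression, the correlation is the square root of R² with the sign of the slope.

```python
from math import copysign, sqrt

# Values copied from the regression output
intercept = -0.012701
slope = 0.017964
se_slope = 0.002402
r_squared = 0.7998

# Equation of the regression line
equation = f"BAC-hat = {intercept} + {slope} * beers"

# t statistic for H0: slope = 0 (the output's 7.480 comes from unrounded values)
t_slope = slope / se_slope

# Correlation: square root of R^2, carrying the sign of the slope
r = copysign(sqrt(r_squared), slope)

print(equation)
print(f"t = {t_slope:.2f}, R^2 = {r_squared}, r = {r:.3f}")
```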



Regression

Conditions for regression

linearity

nearly normal residuals

constant variability (of residuals)

Residuals

Predict BAC content for someone who has had 5 cans of beer:

$\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$

Observed BAC is 0.10

Residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$
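
The prediction and residual can be checked with a few lines of Python. The coefficients and the observed value of 0.10 are taken from the slide; `predict_bac` is just an illustrative helper name.

```python
intercept = -0.012701
slope = 0.017964

def predict_bac(beers):
    """Predicted BAC from the fitted line."""
    return intercept + slope * beers

predicted = predict_bac(5)
observed = 0.10
residual = observed - predicted    # residual = observed - predicted

print(f"predicted = {predicted:.6f}, residual = {residual:.3f}")
```

A positive residual means the observed BAC is above the line, so the model underpredicts for this person.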



Type I, Type II error

                     Decision
                     fail to reject H0     reject H0
Truth   H0 true      ✓                     Type 1 Error
        HA true      Type 2 Error          ✓

Type 1 error is rejecting H0 when you shouldn't have (H0 is true), and the probability of doing so is α (significance level)

Type 2 error is failing to reject H0 when you should have (HA is true), and the probability of doing so is β (more complicated to calculate)

Power of a test is the probability of correctly rejecting H0 when HA is true, which is 1 − β

In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs.
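
To see the trade-off numerically, here is a hedged sketch that computes power (and therefore β) for a one-sided Z test. Everything in it is an assumption for illustration: the hypotheses H0: μ = 50 versus HA: μ > 50, a known σ of 10, n = 100, and a specific true mean of 52. β can only be computed once a particular true value under HA is chosen.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of the test H0: mu = mu0 vs HA: mu > mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability that the sample mean exceeds the cutoff under the true mean
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

power_05 = power_one_sided_z(50, 52, 10, 100, alpha=0.05)
beta_05 = 1 - power_05                      # Type 2 error probability
power_01 = power_one_sided_z(50, 52, 10, 100, alpha=0.01)

print(f"alpha = 0.05: power = {power_05:.3f}, beta = {beta_05:.3f}")
print(f"alpha = 0.01: power = {power_01:.3f}, beta = {1 - power_01:.3f}")
```

Lowering α from 0.05 to 0.01 makes Type 1 errors rarer but, with everything else fixed, raises β.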



Calculating sample sizes

For our CIs, we can calculate the minimum sample size needed to provide a certain margin of error

For a desired (maximum) ME, call it m, we start with

ME = critical value × SE (a function of n) ≤ m

Solve for n; the result should always look like n ≥ 123.45

Then round n up to the next whole number (123.45 → 124).

For a two-sample CI, you would need both sample sizes to be at least this large.
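
Solving the inequality for n gives n ≥ (z* · s / m)² for a mean and n ≥ z*² · p(1 − p) / m² for a proportion. The sketch below applies those two formulas with made-up inputs; the function names are hypothetical, and using p = 0.5 when no estimate is available is a common conservative assumption rather than something stated on the slide.

```python
from math import ceil
from statistics import NormalDist

def min_n_mean(sd, m, conf_level=0.95):
    """Smallest n with z* * sd / sqrt(n) <= m."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z_star * sd / m) ** 2)      # always round up

def min_n_proportion(m, p=0.5, conf_level=0.95):
    """Smallest n with z* * sqrt(p(1-p)/n) <= m; p = 0.5 gives the largest n."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z_star**2 * p * (1 - p) / m**2)

# Hypothetical targets: ME of 2 for a mean with SD about 10; ME of 3% for a proportion
n_mean = min_n_mean(sd=10, m=2)
n_prop = min_n_proportion(m=0.03)
print(n_mean, n_prop)
```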

