STA 101 Final Review
Statistics 101
Thomas Leininger
June 24, 2013
Announcements
All work (besides projects) should be returned to you and should be entered on Sakai.
Office Hour: 2–3pm today (Old Chem 114)
Final Exam: 9am–12pm HERE
Bring a calculator; no cell phones, laptops, tablets, etc.
Allowed one 8 1/2 × 11 inch cheat sheet with notes on both sides. You must create this yourself.
Statistics 101 (Thomas Leininger) STA 101 Final Review June 24, 2013 2 / 19
Topics for today
different conditions and hypotheses in each test (and a list of tests): when to use Z, T, F stats, 2 vs 1 sample tests, degrees of freedom
pooled variance, pooled proportion (when to use)
confidence intervals – is it always two-sided?, how to interpret CI of difference between 2 means
chi-square – what we are testing, when to use, how to approach hypotheses
ANOVA (filling in the chart)
Regression – interpreting linear lines and writing them, correlation, residuals
Type I, II error
Bayesian probability – won’t be on there (but cond’l probability might)
MLR – won’t be on there
Review
What you need to know about HTs:
Format for answering a hypothesis test question:
1. State the null and alternative hypotheses
2. Check conditions
3. Calculate the test statistic (T, Z, etc.) and standard error (if needed)
4. Calculate the p-value (double if two-sided hypothesis)
5. Reject or fail to reject the null hypothesis
6. Interpret your decision in context of the problem
Know how to interpret a p-value in context
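As a rough illustration of steps 3 to 5, here is a minimal Python sketch of a one-sample Z test for a mean. The function name `one_sample_z_test` and all input numbers are hypothetical (not from the slides), and the sketch assumes the conditions have already been checked and n ≥ 30.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, s, n, mu0, two_sided=True):
    """Steps 3-4 for H0: mu = mu0 (assumes conditions hold and n >= 30)."""
    se = s / sqrt(n)                      # standard error of the sample mean
    z = (xbar - mu0) / se                 # test statistic
    tail = 1 - NormalDist().cdf(abs(z))   # area in one tail beyond |z|
    # Double the tail area for a two-sided alternative. The one-sided case
    # assumes the sample mean falls in the direction stated by HA.
    p_value = 2 * tail if two_sided else tail
    return z, p_value

# Hypothetical sample: mean 52, SD 10, n = 100, testing H0: mu = 50.
z, p = one_sample_z_test(xbar=52.0, s=10.0, n=100, mu0=50.0)

# Step 5: compare the p-value to the significance level.
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p:.4f}, {decision}")
```

Steps 1, 2 and 6 (hypotheses, conditions, interpretation in context) still have to be written out in words.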
What you need to know about CIs:
Format for answering a confidence interval question:
1. Check conditions
2. Find and state the critical value ($z^*$, $t^*_{df}$)
3. Calculate the standard error
4. Calculate the confidence interval
5. Interpret your confidence interval in context of the problem
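Steps 2 to 4 can be sketched in Python as follows. This is only an illustration: the helper name `mean_ci_z` and the sample values are hypothetical, and it uses a z critical value, which assumes a large sample (n ≥ 30); a small sample would need a t critical value from a table instead.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(xbar, s, n, conf_level=0.95):
    """Confidence interval for a mean using a z critical value (n >= 30)."""
    # Step 2: critical value z* leaves (1 - conf_level)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Step 3: standard error of the sample mean.
    se = s / sqrt(n)
    # Step 4: point estimate plus/minus margin of error.
    margin = z_star * se
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 52, SD 10, n = 100.
lo, hi = mean_ci_z(xbar=52.0, s=10.0, n=100)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```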
Different types of tests
General conditions for HTs/CIs with means or proportions:
Independence: random samples (<10% of population sampled)
nearly normal data: either we know the population is normal or we have to use CLT
Conditions for CLT:
sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
sample size ≥ 30. Else, you should use a t-distribution.
One sample test for mean
Conditions for one sample test:
Independence: random samples (<10% of population sampled)
sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
sample size ≥ 30. Else, you should use a t-distribution.
Note: if paired data, use differences as a one sample test
Conditions for two sample test:
Independence: random samples (<10% of population sampled)
sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
both sample sizes ≥ 30. Else, you should use a t-distribution (if either or both are smaller than 30).
Recap - inference for one proportion
Population parameter: $p$, point estimate: $\hat{p}$
Conditions:
independence: random sample
at least 10 successes and failures; if not → randomization
Standard error: $SE = \sqrt{\frac{p(1-p)}{n}}$
for CI: use $\hat{p}$
for HT: use $p_0$
Recap - comparing two proportions
Population parameter: $(p_1 - p_2)$, point estimate: $(\hat{p}_1 - \hat{p}_2)$
Conditions:
independence within groups: random sample and 10% condition met for both groups
independence between groups
at least 10 successes and failures in each group; if not → randomization
$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
for CI: use $\hat{p}_1$ and $\hat{p}_2$
for HT:
when $H_0: p_1 = p_2$: use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
when $H_0: p_1 - p_2 =$ (some value other than 0): use $\hat{p}_1$ and $\hat{p}_2$ (this is pretty rare)
Reference - standard error calculations
             one sample                         two samples
mean         $SE = \frac{s}{\sqrt{n}}$          $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion   $SE = \sqrt{\frac{p(1-p)}{n}}$     $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
When working with means, it's very rare that σ is known, so we usually use s.
When working with proportions,
if doing a hypothesis test, p comes from the null hypothesis
if constructing a confidence interval, use $\hat{p}$ instead
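The four standard errors in this reference can be written as small Python helpers. This is a sketch for checking hand calculations; the function names and the example inputs are hypothetical, not part of the course materials.

```python
from math import sqrt

def se_mean(s, n):
    """One-sample mean: s / sqrt(n)."""
    return s / sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    """Two-sample difference of means (unpooled)."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def se_prop(p, n):
    """One proportion: pass p0 for a hypothesis test, p-hat for a CI."""
    return sqrt(p * (1 - p) / n)

def se_diff_props(p1, n1, p2, n2):
    """Difference of two proportions (pass the pooled value twice when H0: p1 = p2)."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical inputs chosen so the arithmetic is easy to verify by hand.
se_a = se_mean(10, 100)                   # 10 / 10
se_b = se_diff_means(3, 36, 4, 64)        # sqrt(9/36 + 16/64) = sqrt(0.5)
se_c = se_prop(0.5, 100)                  # sqrt(0.25 / 100)
se_d = se_diff_props(0.5, 100, 0.5, 100)  # sqrt(0.0025 + 0.0025)
print(se_a, se_b, se_c, se_d)
```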
When to use pooled standard error
For a two sample HT/CI for a difference of means, we can pool our information about the variance if $s_1$ and $s_2$ can be assumed to be roughly equal
If we can do this, then we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where
$s_{pool}^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$
The degrees of freedom for the t-distribution are now $df = n_1 + n_2 - 2$
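A short Python sketch of the pooled calculation, using hypothetical sample SDs and sizes (not from the slides). The pooled variance replaces both sample variances in the two-sample standard error.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weights n_i - 1."""
    return (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)

# Hypothetical samples with roughly equal spread.
s1, n1 = 4.0, 10
s2, n2 = 5.0, 12

s2_pool = pooled_variance(s1, n1, s2, n2)   # (16*9 + 25*11) / 20
se = sqrt(s2_pool / n1 + s2_pool / n2)      # SE with s2_pool in both terms
df = n1 + n2 - 2                            # degrees of freedom for t
print(f"s2_pool = {s2_pool:.2f}, SE = {se:.4f}, df = {df}")
```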
When to use pooled proportion
When doing a HT for a two sample test for a difference of proportions, the null hypothesis is that $p_1 = p_2$ or $p_1 - p_2 = 0$
Since we assume that $H_0$ is true in a HT, we assume $p_{pool} = p_1 = p_2$, which means our standard error is
$SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$
We estimate $p_{pool}$ with $\hat{p}_{pool} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$
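To make the pooled-proportion test concrete, here is a minimal Python sketch for H0: p1 = p2 against a two-sided alternative. The function name and the success counts are hypothetical, and the sketch assumes the success-failure condition holds in both groups.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided Z test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1_hat - p2_hat) / se                  # null value of p1 - p2 is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, se, z, p_value

# Hypothetical data: 60 successes out of 100 vs. 40 successes out of 100.
p_pool, se, z, p_value = two_prop_z_test(60, 100, 40, 100)
print(f"p_pool = {p_pool}, SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```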
More on confidence intervals
How to interpret a CI for the difference between two means
ANOVA—filling in chart
Exercise 5.40
Educational attainment
        Less than HS   HS      Jr Coll   Bachelor's   Graduate   Total
Mean    38.67          39.6    41.39     42.55        40.85      40.45
SD      15.81          14.97   18.1      13.62        15.51      15.17
n       121            546     97        253          155        1,172

            Df      Sum Sq    Mean Sq   F value   Pr(>F)
degree      XXXXX   XXXXX     501.54    XXXXX     0.0682
Residuals   XXXXX   267,382   XXXXX
Total       XXXXX   XXXXX
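One way to fill in the missing entries is to work from the three numbers the chart does give (Mean Sq for degree, Sum Sq for Residuals, and the group sizes). The Python sketch below follows the usual ANOVA relationships (Mean Sq = Sum Sq / Df, F = MSG / MSE); the variable names are my own.

```python
# Values read off the exercise.
k = 5               # number of education groups
n_total = 1172      # total sample size
msg = 501.54        # Mean Sq for degree (given)
sse = 267382        # Sum Sq for Residuals (given)

# Degrees of freedom.
df_group = k - 1            # groups minus 1
df_resid = n_total - k      # total n minus number of groups
df_total = n_total - 1      # equals df_group + df_resid

# Remaining entries follow from Mean Sq = Sum Sq / Df.
ssg = msg * df_group        # Sum Sq for degree
mse = sse / df_resid        # Mean Sq for Residuals
sst = ssg + sse             # Total Sum Sq
f_value = msg / mse         # F statistic

print(df_group, df_resid, df_total)
print(f"SSG = {ssg:.2f}, MSE = {mse:.2f}, SST = {sst:.2f}, F = {f_value:.3f}")
```

The p-value (0.0682) is already given in the chart, so it is not recomputed here.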
Chi-square tests
Goodness of Fit: 1 variable
H0 : There is no inconsistency between the observed and the expected counts.
HA : There is an inconsistency between the observed and the expected counts.
Test of Independence: 2 variables
H0 : Variable 1 and Variable 2 are independent.
HA : Variable 1 and Variable 2 are not independent.
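For the goodness of fit case, the test statistic compares observed and expected counts cell by cell. The sketch below uses made-up counts and null proportions (nothing here comes from the slides) to show the arithmetic; the p-value would then be looked up from a chi-square table with the computed degrees of freedom.

```python
# Hypothetical observed counts and the proportions claimed under H0.
observed = [30, 50, 20]
null_props = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [n * p for p in null_props]   # expected count in each cell under H0

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit: df = number of cells - 1.
df = len(observed) - 1
print(f"expected = {expected}, chi-square = {chi_sq:.2f}, df = {df}")
```

For a test of independence the statistic has the same form, with expected counts computed from row and column totals and df = (rows - 1) × (columns - 1).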
Regression
Exercise 7.28
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.012701 0.012638 -1.005 0.332
bac$Beers 0.017964 0.002402 7.480 2.97e-06
[Figure: scatterplot of BAC (grams per deciliter) vs. cans of beer]
Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06
Write the equation of the regression line. Interpret the slope and intercept in context.
Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.
What is $R^2$? Interpret $R^2$ in context. What is the correlation?
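The numeric parts of these questions can be read off the R output. The Python sketch below recomputes the t value from the slope and its standard error, takes the correlation as the square root of R-squared with the sign of the slope, and halves the two-sided p-value for the one-sided alternative (a positive association). The variable names are my own; the numbers are copied from the output above.

```python
from math import sqrt

# Values copied from the regression output.
intercept = -0.012701
slope, slope_se = 0.017964, 0.002402
r_squared = 0.7998
p_two_sided = 2.97e-06

# t value = estimate / standard error (matches the "t value" column).
t_value = slope / slope_se

# Correlation: square root of R^2, carrying the sign of the slope.
r = sqrt(r_squared) if slope > 0 else -sqrt(r_squared)

# R reports a two-sided p-value; for HA: slope > 0 with a positive
# t value, the one-sided p-value is half of it.
p_one_sided = p_two_sided / 2 if t_value > 0 else 1 - p_two_sided / 2

print(f"BAC-hat = {intercept} + {slope} * beers")
print(f"t = {t_value:.2f}, r = {r:.3f}, one-sided p-value = {p_one_sided:.3g}")
```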
Regression
Conditions for regression
linearity
nearly normal residuals
constant variability (of residuals)
Residuals
Predict BAC content for someone who has had 5 cans of beer:
$\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$
Observed BAC is 0.10
Residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$
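The same prediction and residual can be checked with a few lines of Python, using the fitted coefficients from the regression output. The function name `predict_bac` is just an illustrative label.

```python
# Fitted coefficients from the regression output.
intercept = -0.012701
slope = 0.017964

def predict_bac(beers):
    """Predicted BAC from the fitted line for a given number of cans."""
    return intercept + slope * beers

observed = 0.10                  # observed BAC for the person with 5 beers
predicted = predict_bac(5)
residual = observed - predicted  # residual = observed - predicted

print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```

A positive residual means the observation lies above the regression line, so the model underestimates this person's BAC.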
Type I, Type II error
                              Decision
                   fail to reject H0    reject H0
Truth   H0 true           ✓             Type 1 Error
        HA true      Type 2 Error            ✓
Type 1 error is rejecting H0 when you shouldn't have, and the probability of doing so is α (significance level)
Type 2 error is failing to reject H0 when you should have, and the probability of doing so is β (more complicated to calculate)
Power of a test is the probability of correctly rejecting H0, and the probability of doing so is 1 − β
In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs.
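To see the trade-off in numbers, here is a hedged Python sketch that computes β and power for one simple case: an upper-tailed Z test of a mean with known σ and a specific true mean under HA. All inputs are hypothetical, and the formula applies only to this setup, not to tests in general.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the upper-tailed Z test of H0: mu = mu0 when the true mean is mu_a."""
    z_crit = NormalDist().inv_cdf(1 - alpha)    # reject H0 when Z > z_crit
    shift = (mu_a - mu0) / (sigma / sqrt(n))    # true mean in SE units above mu0
    return 1 - NormalDist().cdf(z_crit - shift)

# Hypothetical setup: H0: mu = 50, true mean 52, sigma = 10, n = 100.
power_05 = power_one_sided_z(50, 52, 10, 100, alpha=0.05)
beta_05 = 1 - power_05                          # Type 2 error probability

# Lowering alpha (fewer Type 1 errors) lowers power, i.e. raises beta.
power_01 = power_one_sided_z(50, 52, 10, 100, alpha=0.01)

print(f"alpha=0.05: power={power_05:.3f}, beta={beta_05:.3f}")
print(f"alpha=0.01: power={power_01:.3f}, beta={1 - power_01:.3f}")
```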
Calculating sample sizes
For our CIs, we can calculate the minimum sample size needed to provide a certain margin of error
For a desired (maximum) ME, call it m, we start with
ME = critical value × SE (a function of n) ≤ m
Solve for n, should always look like n ≥ 123.45
Then round n up to the next whole number (123.45 → 124).
For a two-sample CI, you would need both sample sizes to be at least this large.
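A minimal Python sketch of the sample size calculation for the two one-sample cases. Requiring z* × SE ≤ m and solving for n gives n ≥ (z*σ/m)² for a mean and n ≥ z*² p(1−p)/m² for a proportion. The function names and the inputs (a guessed SD of 10, a guessed proportion of 0.5) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def min_n_mean(sd, m, conf_level=0.95):
    """Smallest n with z* * sd / sqrt(n) <= m."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z_star * sd / m) ** 2)         # always round up

def min_n_prop(p, m, conf_level=0.95):
    """Smallest n with z* * sqrt(p(1-p)/n) <= m."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z_star ** 2 * p * (1 - p) / m ** 2)

# Hypothetical targets: margin of 2 units for a mean, 3 points for a proportion.
n_mean = min_n_mean(sd=10, m=2)
n_prop = min_n_prop(p=0.5, m=0.03)
print(n_mean, n_prop)
```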