STATISTICS 200 - personal.psu.edu/drh20/200/lectures/lecture21.pdf

Post on 05-May-2018


STATISTICS 200 Lecture #21 Tuesday, November 1, 2016 Textbook: 9.7, 9.8, 11.1, 11.2, 11.3, 11.4

Objectives:

• Apply sampling distribution for one sample mean to confidence intervals.
• Apply sampling distribution for difference of two sample means to confidence intervals.
• Apply sampling distribution for sample mean of (paired) differences to confidence intervals.
• Recognize similarities between one mean and mean of paired differences.

We have begun a strong focus on Inference

Proportions:
• One population proportion
• Two population proportions

Means (this week):
• One population mean
• Difference between means
• Mean difference

Example 2 from Thursday: We ask each of 31 students “how many regular ‘text’ friends do you have?”

Clicker Question: What kind of variable is this? A. Categorical B. Quantitative

Survey results: n = 31 X-bar = 6 friends s = 2.0 friends

Calculate a 95% Confidence Interval: How can we estimate the population mean number of regular “text” friends for all STAT 200 students using these data?

Confidence Interval Formula

Generic formula:
sample estimate ± (margin of error)
sample estimate ± (multiplier × standard error)

Survey results: n = 31 X-bar = 6 friends s = 2.0 friends

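6.00 ± 2.04 × 2.0/sqrt(31) = 6.00 ± 0.73

(The multiplier 2.04 is t* for df = n − 1 = 30.)

Thus, the 95% CI is 6.00 ± 0.73, or about 5.3 to 6.7 friends.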
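The arithmetic on this slide can be checked with a short script. This is only a sketch: the function name `t_interval` is made up for illustration, and the multiplier 2.04 is simply copied from the slide rather than looked up from a t table.

```python
import math

def t_interval(estimate, multiplier, std_dev, n):
    """Return (lower, upper) for estimate ± multiplier × std_dev / sqrt(n)."""
    standard_error = std_dev / math.sqrt(n)   # standard error of the sample mean
    margin = multiplier * standard_error      # margin of error
    return estimate - margin, estimate + margin

# Survey values from the slide: n = 31, X-bar = 6, s = 2.0.
# The multiplier 2.04 is taken from the lecture (assumed t* for df = 30).
lower, upper = t_interval(estimate=6.0, multiplier=2.04, std_dev=2.0, n=31)
print(f"95% CI: {lower:.2f} to {upper:.2f} friends")
```

Rounded to one decimal place, the result matches the 5.3 to 6.7 friends quoted in the lecture.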

Confidence Interval Interpretation

Calculated Interval: 6.0 ± 0.7 friends (5.3 to 6.7 friends)

We are 95% confident that the…
a. sample mean
b. sample proportion
c. population mean
d. population proportion
e. range of values for the
…number of regular “text” friends for STAT 200 students is between 5.3 and 6.7 friends.

Confidence Interval Conclusion

95% C.I.: 5.3 to 6.7 friends

In the population, we may conclude, with 95% confidence, that on average, STAT 200 students have
A. more than 6 friends.
B. more than 4 friends.
C. fewer than 5 friends.
D. fewer than 6 friends.

Are all sampling distributions normal? No

When do we have to be cautious?
1. with small sample sizes
2. where the original population is not normal in shape

One-Sample t procedure is valid if one of the conditions for normality is met:

• Sample data suggest a normal shape
or
• We have a large sample size (n ≥ 30)

In either case: Sampling distribution will look normal in shape

Example: Compare predicted GPAs of males and females in STAT 200

Q: What do you think your actual GPA will be when you graduate?

Students in this class: Representative sample (?) of all STAT 200 students, with nf=157 and nm=130.

• Parameter of interest: µf − µm
• Estimate of the parameter: X-bar_f − X-bar_m
• Statistics collected from the sample:
  X-bar_f = 3.470, sf = 0.286
  X-bar_m = 3.456, sm = 0.304

What do we need in order to create a CI for µf − µm?

Formula for CI for µf − µm:

sample estimate ± (multiplier × standard error)

• sample estimate: X-bar_f − X-bar_m
• multiplier: Roughly 2.0 for 95% confidence
• standard error: ???

What is the standard error of X-bar_f − X-bar_m?

sqrt((S.E.#1)^2 + (S.E.#2)^2)

Here are the data, summarized:

X-bar_f = 3.470, sf = 0.286, nf = 157
X-bar_m = 3.456, sm = 0.304, nm = 130

Thus,
• Estimate = 3.470 − 3.456, which is 0.014.
• Multiplier = roughly 2 (more on this later…)
• SE (estimate) = sqrt(0.286^2/157 + 0.304^2/130) = 0.035
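As a check on the standard error above, here is a minimal sketch that combines the two individual standard errors and then applies the rough multiplier of 2. The function name `two_sample_se` is hypothetical, and the resulting interval is only approximate because the multiplier is approximate.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Standard error of a difference of two independent sample means:
    sqrt(SE1^2 + SE2^2), where each SE is s / sqrt(n)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Summary statistics from the slide (predicted GPAs).
x_f, s_f, n_f = 3.470, 0.286, 157   # females
x_m, s_m, n_m = 3.456, 0.304, 130   # males

estimate = x_f - x_m
se = two_sample_se(s_f, n_f, s_m, n_m)
multiplier = 2.0                    # rough value for 95% confidence, as in the lecture

lower = estimate - multiplier * se
upper = estimate + multiplier * se
print(f"estimate = {estimate:.3f}, SE = {se:.3f}")
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval includes 0, so these data alone would not support a claim that the two population means differ.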

Example: Are smokers and non-smokers different heights on average?

Summary of class data from Minitab:

Variable  SmokeCig    N  N*    Mean  SE Mean  StDev
Height    No        264   5  67.275    0.249  4.041
          Yes        19   0  68.211    0.736  3.207

Based on these data, a 95% CI would be roughly:

(A) 264 − 19 ± 2 × sqrt(4.041^2 + 3.207^2)
(B) 67.28 − 68.21 ± 2 × sqrt(4.041^2 + 3.207^2)
(C) 264 − 19 ± 2 × sqrt(0.249^2 + 0.736^2)
(D) 67.28 − 68.21 ± 2 × sqrt(0.249^2 + 0.736^2)

However, there is a slight problem with the multiplier of 2…
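The following sketch applies the lecture's generic formula (difference of sample means ± multiplier × sqrt(SE1^2 + SE2^2)) to the Minitab summary for the height example. It first confirms that Minitab's "SE Mean" column equals StDev / sqrt(N), then combines the two standard errors. The multiplier of 2 is the rough value from the slide, which the lecture itself flags as imperfect, so treat the interval as approximate.

```python
import math

# Minitab summary from the slide: (N, Mean, SE Mean, StDev) for each group.
n_no, mean_no, se_no, sd_no = 264, 67.275, 0.249, 4.041       # non-smokers
n_yes, mean_yes, se_yes, sd_yes = 19, 68.211, 0.736, 3.207    # smokers

# "SE Mean" is the standard deviation divided by sqrt(N).
se_no_check = sd_no / math.sqrt(n_no)
se_yes_check = sd_yes / math.sqrt(n_yes)

# Difference of sample means and its standard error.
estimate = mean_no - mean_yes
se_diff = math.sqrt(se_no**2 + se_yes**2)

multiplier = 2.0  # rough 95% multiplier used on the slide
lower = estimate - multiplier * se_diff
upper = estimate + multiplier * se_diff
print(f"SE checks: {se_no_check:.3f}, {se_yes_check:.3f}")
print(f"estimate = {estimate:.3f}, SE = {se_diff:.3f}")
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```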

Example: Computer versus TV

25 students in a liberal arts course were given a survey that asked them how many hours per week they watched television and how many hours per week they used a computer. The goal is to determine if there is a difference in the mean number of hours spent per week on computers versus TV.

Consider the statements below:
• The two samples are dependent
• The experimental unit is a student
• The response variable is quantitative
• This is a randomized experiment

Clicker Question: How many of those statements are TRUE?
A. 0
B. 1
C. 2
D. 3

Example: Computer versus TV

student  Computer  TV
1        30        20
2        20        25
3        10        10
4        10        5
…        …         …
25       20        15.0

(experimental) unit: student

response variable: Number of hours

Variation you want to…
• reduce: the variation from student to student
• explain: the variation due to type of screen

What if we construct an independent samples CI?

Difference = mu (Computer) - mu (TV)
95% CI for difference: (-3.29, 9.17)

Conclusion: Since the C.I. contains 0, we cannot claim that a difference exists.

Problem:
• The two samples are dependent (paired)
• Two measurements on each unit

When the two-sample t procedure is incorrectly used,
• it captures unwanted variation found with the two individual standard deviations
• it is less able to find significance

Instead use: Paired t procedure

Data used for paired analysis

student  Computer  TV     Comp - TV
1        30        20     10
2        20        25     -5
3        10        10     0
4        10        5      5
…        …         …      …
25       20        15.0   5

What do you notice when examining the signs of the differences?

Summary Statistics for Samples

        Computer     TV          Comp - TV
Mean    17.04        14.10       2.94 (mean of the differences)
StDev   s1 = 12.36   s2 = 9.26   sd = 5.34

Confidence Interval (key: treat it like a single mean)

Calculate a 95% confidence interval to estimate the population mean difference in hours spent on a computer vs watching TV.

Formula: d-bar ± t* × sd/sqrt(n)

d-bar = 2.9, sd = 5.34, n = 25 students, df = n − 1 = 24

Calculation: 2.9 ± [2.06 × 5.34/sqrt(25)] = 2.9 ± 2.2 = 0.7 to 5.1
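To see why the paired procedure helps here, the sketch below recomputes the paired interval from the slide's summary statistics and compares its standard error with the one an independent-samples analysis would use. The multiplier 2.06 is copied from the slide (t* for df = 24), and the variable names are illustrative only.

```python
import math

n = 25                    # students, each measured twice
d_bar, s_d = 2.9, 5.34    # mean and standard deviation of the differences (Comp - TV)
t_star = 2.06             # multiplier from the slide, assumed t* for df = n - 1 = 24

# Paired analysis: treat the differences as a single sample.
se_paired = s_d / math.sqrt(n)
margin = t_star * se_paired
lower, upper = d_bar - margin, d_bar + margin

# Standard error an independent-samples analysis would use instead,
# built from the two individual standard deviations.
s1, s2 = 12.36, 9.26
se_independent = math.sqrt(s1**2 / n + s2**2 / n)

print(f"paired SE = {se_paired:.3f}, independent SE = {se_independent:.3f}")
print(f"paired 95% CI: {lower:.1f} to {upper:.1f}")
```

The paired standard error is much smaller because the student-to-student variation cancels out in the differences, which is why the paired interval excludes 0 while the independent-samples interval does not.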

If you understand today’s lecture… 11.25, 11.30, 11.32, 11.33, 11.45, 11.46, 11.52, 11.53, 11.55

Objectives:
• Apply sampling distribution for one sample mean to confidence intervals.
• Apply sampling distribution for difference of two sample means to confidence intervals.
• Apply sampling distribution for sample mean of (paired) differences to confidence intervals.
• Recognize similarities between one mean and mean of paired differences.