Chapter 15: Inference in Practice (PSLS/2e)


Chapter 15

Inference in Practice

PSLS/2e Chapter 15 1

Effective use of inferential methods requires more than knowing the facts. It requires understanding the reasoning behind the process.

z Procedures

• If we know the standard deviation σ before the data are collected, the confidence interval for µ is:

$\bar{x} \pm z^{*} \dfrac{\sigma}{\sqrt{n}}$

• To test H0: µ = µ0, we use this statistic:

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

• These are called z procedures because they rely on critical values from the Z ~ N(0,1) density function.
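The two z procedures can be sketched in a few lines of Python. This is a minimal illustration, assuming σ is known; the function names and the numbers in the usage lines (x-bar = 272, µ0 = 260, σ = 60, n = 100) are made up for the example and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (low, high) for the population mean when sigma is known."""
    # z* leaves (1 - confidence) / 2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical inputs, for illustration only.
low, high = z_confidence_interval(xbar=272, sigma=60, n=100, confidence=0.95)
z, p = z_test(xbar=272, mu0=260, sigma=60, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

Both functions use the same standard error, σ/√n, which is why the interval and the test lead to consistent conclusions.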

Conditions for z Procedures

1. Data must resemble an SRS from the population.
– Ask: “Where did the data come from?”
– Bad samples (see next slide) invalidate the methods.

2. The population must be Normal … BUT … a fact known as the Central Limit Theorem tells us that the sampling distribution of x-bar will be approximately Normal even if the population is not Normal, if the sample is “large enough.”
– In practice, z procedures are robust in large samples.

3. The population standard deviation σ must be known before the data are collected. Chapter 17 will introduce procedures that can be used when σ is not known.

Examples of Bad Samples

• Convenience samples: selecting the members of the population that are easiest to reach
– Example: in a sample of mall shoppers, teenagers and retired people will be over-represented
• Voluntary response samples: people who choose themselves by responding to a broad appeal
– Example: online polls are scientifically useless (people who take the trouble to respond are not representative of the larger population)
• Under-coverage: some groups in the population are left out or under-represented
– Example: using telephone listings to select subjects (not everyone has a listed phone number)
• If the data do not come from an SRS or a randomized experiment, conclusions are open to challenge.
• Always ask where the data came from.
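A small simulation can make the effect of a bad sample concrete. The sketch below is purely hypothetical: the population of satisfaction scores and the response mechanism (less satisfied people are more likely to answer) are assumptions invented for illustration, not data from the slides.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 50,000 satisfaction scores centered near 5.
population = [random.gauss(5, 2) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Simple random sample: every individual is equally likely to be chosen.
srs = random.sample(population, 500)


def responds(score):
    """Assumed mechanism: lower satisfaction means a higher chance of responding."""
    probability = min(1.0, max(0.0, (10 - score) / 20))
    return random.random() < probability


# Voluntary response sample: people select themselves.
volunteers = [score for score in population if responds(score)]

srs_mean = statistics.fmean(srs)
volunteer_mean = statistics.fmean(volunteers)
print(f"population mean:         {true_mean:.2f}")
print(f"SRS mean (n = 500):      {srs_mean:.2f}")
print(f"volunteer mean (n = {len(volunteers)}): {volunteer_mean:.2f}")
```

Under these assumptions the voluntary response sample is far larger than the SRS yet lands further from the population mean, which is why a larger sample does not repair a biased selection process.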

Normality Assumption and the Central Limit Theorem

Normality can be assumed when n is large because of the Central Limit Theorem.

• Sample size less than 15: Normality can be assumed if the data are symmetric, have a single peak, and have no outliers. If the data are highly skewed, avoid z [and t] procedures.

• Sample size at least 15: Normality can be assumed unless the data are strongly skewed or have outliers.

• Large samples (roughly n ≥ 40; guidelines range from about 30 to 60): Normality can be assumed even for skewed distributions.
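The Central Limit Theorem can be seen in a simulation. The sketch below assumes a strongly right-skewed Exponential(1) population (chosen only for illustration) and measures how skewed the distribution of x-bar remains at a few sample sizes; the helper names are hypothetical.

```python
import random
import statistics

random.seed(2)


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, reps=5_000):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


skew_by_n = {n: skewness(sample_means(n)) for n in (5, 15, 40)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}  skewness of x-bar = {skew:.2f}")
```

The skewness of x-bar shrinks toward 0 as n grows, which is the sense in which larger samples make the Normal approximation more trustworthy even when the population itself is skewed.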


Can Normality be assumed?

Moderately sized data set (n = 20) with strong skew. Normality cannot be assumed.

Do NOT use z [or t] procedures.

Can Normality be assumed?

Extremely large data set (n ≈ 1000)

The data have a strong positive skew.

But since the sample is large, the Central Limit Theorem applies and we can assume Normality of the sampling distribution of x-bar.

Do use z [or t] procedures.

Can Normality be assumed?

n is moderate.

The distribution has no clear departures from Normality. Therefore, we can trust z [and t] procedures.

Additional Caution: GIGO

• Garbage In, Garbage Out
• A study is only as good as the quality of its data
• CIs and P-values are valueless when the INFORMATION is of POOR QUALITY
• Example: Self-reported data can be inaccurate and biased

Additional Caution: P-values

• P-values (significance tests) are often misunderstood
• Even large differences can fail to be significant if the sample is small
• Statistical significance does NOT tell us whether a finding is important: statistical significance is NOT the same as practical significance
• A P-value is NOT the probability that H0 is true; it is the probability, computed assuming H0 is true, of observing data at least as extreme as the data actually observed
• Failure to reject H0 is NOT the same as accepting H0
• Although α = 0.05 is a common cut-off, there is NO set border between “significant” and “insignificant” results; surely God loves P = .06 nearly as much as P = .05.
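The dependence of significance on sample size can be checked numerically. The sketch below uses a two-sided z test with made-up numbers (µ0 = 260, σ = 60, and the two sample means are hypothetical) to contrast a large difference in a small sample with a tiny difference in a very large sample.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided P-value of the z test for H0: mu = mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Large observed difference (20 points) but only 9 observations.
p_small_n = two_sided_p(xbar=280, mu0=260, sigma=60, n=9)

# Tiny observed difference (1 point) with 40,000 observations.
p_large_n = two_sided_p(xbar=261, mu0=260, sigma=60, n=40_000)

print(f"20-point difference, n = 9:      P = {p_small_n:.4f}")
print(f"1-point difference,  n = 40,000: P = {p_large_n:.4f}")
```

With these inputs the 20-point difference is not significant at α = 0.05 while the 1-point difference is, so the P-value alone says nothing about whether an effect is large enough to matter.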

Margin of Error (m)

• When estimating µ with confidence level C, the margin of error is:

$m = z^{*} \dfrac{\sigma}{\sqrt{n}}$

• The margin of error (half the length of the CI) indicates the precision of the estimate
• z* is fixed by the confidence level, and σ is a fixed property of the population
• To increase precision, increase the sample size:

↑ n → ↓ m → ↑ precision
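The relationship m = z*σ/√n can be tabulated to show how precision improves with n. This is a minimal sketch; the function name is hypothetical, and σ = 60 with 90% confidence is used only as a convenient illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error m = z* * sigma / sqrt(n) for a z confidence interval."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star * sigma / sqrt(n)


for n in (25, 100, 400, 1600):
    m = margin_of_error(sigma=60, n=n, confidence=0.90)
    print(f"n = {n:4d}  m = {m:.2f}")
```

Because n enters through a square root, each fourfold increase in the sample size only halves the margin of error.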

Choosing a Sample Size

To determine the sample size required to achieve margin of error m when estimating µ, use:

$n = \left( \dfrac{z^{*} \sigma}{m} \right)^{2}$
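The sample size rule n = (z*σ/m)², rounded up to the next integer, can be wrapped in a small helper. The function name and the inputs in the usage lines (σ = 10, 95% confidence) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n whose z-interval margin of error does not exceed `margin`."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Round up: rounding down would leave the margin slightly too large.
    return ceil((z_star * sigma / margin) ** 2)


n_for_2 = required_sample_size(sigma=10, margin=2, confidence=0.95)
n_for_1 = required_sample_size(sigma=10, margin=1, confidence=0.95)
print(f"m = 2 requires n = {n_for_2}")
print(f"m = 1 requires n = {n_for_1}")
```

The helper computes z* from the confidence level rather than taking it as an argument, so the only inputs are the quantities a study planner actually chooses or knows.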

Example: National Assessment of Educational Progress (NAEP) Math Scores

NAEP math scores predict success following high school.

Suppose that we want to estimate the population mean NAEP score with 90% confidence and want the margin of error to be no more than ±5 points.

We know that NAEP math scores have σ = 60.

What sample size will be required to enable us to create such an interval?

Example

NAEP Quantitative Scores

$n = \left( \dfrac{z^{*} \sigma}{m} \right)^{2} = \left( \dfrac{(1.645)(60)}{5} \right)^{2} = 389.67$

If you round down, your margin of error will be bigger. If you round up, your margin of error will be smaller (a good thing).

Always round UP to the next integer. Study 390 individuals so that m is no greater than 5.
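The rounding advice can be verified directly. Evaluating the formula with the example's inputs (z* = 1.645, σ = 60, m = 5) gives n ≈ 389.67; the sketch below compares the margin of error obtained by rounding down with the one obtained by rounding up. The variable and function names are illustrative.

```python
from math import ceil, floor, sqrt

# Inputs from the example: 90% confidence (z* = 1.645), sigma = 60, m = 5.
z_star, sigma, m_target = 1.645, 60, 5

n_exact = (z_star * sigma / m_target) ** 2
n_down, n_up = floor(n_exact), ceil(n_exact)


def margin(n):
    """Margin of error achieved with n observations."""
    return z_star * sigma / sqrt(n)


print(f"exact n = {n_exact:.2f}")
print(f"n = {n_down}: m = {margin(n_down):.4f}")
print(f"n = {n_up}: m = {margin(n_up):.4f}")
```

Rounding down leaves the margin of error just above the 5-point target, while rounding up brings it just under, which is why the required sample size is always rounded up.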

Example: Decrease margin of error m

Now suppose we want to estimate the population mean NAEP score with 90% confidence and want the margin of error not to exceed 3 points (recall that σ = 60).

What sample size will be required to enable us to create such an interval?

Case Study

NAEP Quantitative Scores

$n = \left( \dfrac{z^{*} \sigma}{m} \right)^{2} = \left( \dfrac{(1.645)(60)}{3} \right)^{2} = 1082.41$

Therefore, resolve to study 1083 individuals so that the margin of error does not exceed 3 points.

Note that lowering the margin of error to 3 points required a much larger sample size!
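The growth in sample size follows from the inverse-square relationship between n and m. The sketch below reuses the example's inputs (z* = 1.645, σ = 60) for the 5-point and 3-point targets; the 1-point target and the function name are hypothetical additions for illustration.

```python
from math import ceil

# Inputs from the example: 90% confidence (z* = 1.645), sigma = 60.
z_star, sigma = 1.645, 60


def required_n(m):
    """Sample size needed for margin of error m, rounded up."""
    return ceil((z_star * sigma / m) ** 2)


sizes = {m: required_n(m) for m in (5, 3, 1)}
for m, n in sizes.items():
    print(f"m = {m} points  ->  n = {n}")
```

Because n is proportional to 1/m², shrinking the margin of error from 5 to 3 points multiplies the required sample size by about (5/3)² ≈ 2.8.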