1. Exams 2. Sampling Distributions 3. Estimation + Confidence Intervals.


Page 2:

INFERENTIAL STATISTICS

Samples are only estimates of the population

Sample statistics will be slightly off from the true values of their population’s parameters

Sampling error: The difference between a sample statistic and a population parameter

Probability theory: Permits us to estimate the accuracy or representativeness of the sample

Page 3:

The “Catch-22” of Inferential Statistics

When we collect a sample, we know nothing about the population’s distribution of scores

We can calculate the mean (x-bar) & standard deviation (s) of our sample, but μ and σ are unknown

The shape of the population distribution (normal?) is also unknown

Page 4:

Example: Number of serious crimes committed in the year prior to prison for inmates entering the prison system

Population: μ = ??? (N = thousands)

Sample: N = 150, x-bar = 9.6

Probability theory allows us to answer: What is the likelihood that a given sample statistic accurately represents a population parameter?

Page 5:

Sampling Distribution (a.k.a. “Distribution of Sample Outcomes”)

“OUTCOMES” = proportions, means, etc.

From repeated random sampling, a mathematical description of all possible sampling event outcomes, and the probability of each one

Permits us to make the link between sample and population…

Answer the question: “What is the probability that a sample finding is due to chance?”
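To make "all possible sampling event outcomes and the probability of each one" concrete, here is a minimal sketch that enumerates a sampling distribution exactly. The five-score population and the sample size of 2 are made-up values chosen only so that every possible sample can be listed; they are not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical tiny population of five scores (illustrative values only).
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of size 2, drawn without replacement.
samples = list(combinations(population, sample_size))

# Sampling distribution: each possible sample mean with its probability.
counts = Counter(Fraction(sum(s), sample_size) for s in samples)
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"sample mean = {float(mean):4.1f}   probability = {float(prob):.1f}")

# Mean of the sampling distribution versus the population mean.
mean_of_means = sum(mean * prob for mean, prob in sampling_distribution.items())
population_mean = Fraction(sum(population), len(population))
print(float(mean_of_means), float(population_mean))
```

With real populations the list of possible samples is far too long to enumerate, which is why the sampling distribution is described with probability theory instead.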

Page 6:

Relationship between Sample, Sampling Distribution & Population

POPULATION
• Empirical (exists in reality) but unknown

SAMPLING DISTRIBUTION (distribution of sample means or proportions)
• Nonempirical (theoretical or hypothetical)
• Laws of probability allow us to describe its characteristics (shape, central tendency, dispersion)

SAMPLE
• Empirical & known (e.g., distribution shape, mean, standard deviation)

Page 7:

Sampling Distribution: Characteristics

Central tendency
• Sample means will cluster around the population mean
• Since samples are random, the sample means should be distributed equally on either side of the population mean
• The mean of the sampling distribution is always equal to the population mean

Shape: Normal distribution
• Central Limit Theorem: Regardless of the shape of a raw score distribution (sample or population) of an interval-ratio variable, the sampling distribution will be approximately normal, as long as sample size is ≥ 100
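The two characteristics above can be checked with a small simulation. This is only a sketch under stated assumptions: the skewed population (exponential, mean near 10), the seed, and the number of repeated samples are all invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly right-skewed population (exponential, mean near 10).
population = [random.expovariate(1 / 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def sample_means(n, repeats=2_000):
    """Draw `repeats` random samples of size n and return their means."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]

means = sample_means(n=100)

# Share of sample means within 1.96 standard deviations of their own mean.
# For a normal curve this share is about 0.95.
center = statistics.fmean(means)
spread = statistics.stdev(means)
share = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(f"population mean:       {population_mean:.2f}")
print(f"mean of sample means:  {center:.2f}")
print(f"share within 1.96 SDs: {share:.3f}")
```

Even though the raw scores are far from normal, the sample means should center on the population mean and behave roughly like a normal curve at N = 100.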

Page 8:

Sampling Distribution: Characteristics

Dispersion: Standard Error (SE)
• Measures the spread of sampling error that occurs when a population is sampled repeatedly
• Same thing as the standard deviation of the sampling distribution
• Tells how much error, on average, should exist between the sample mean & the population mean

Formula: SE = σ / √N

However, because σ usually isn’t known, s (sample standard deviation) is used to estimate the population standard deviation

Page 9:

Sampling Distribution Standard Error

Law of Large Numbers: The larger the sample size (N), the more probable it is that the sample mean will be close to the population mean
• In other words: a big sample works better (should give a more accurate estimate of the pop.) than a small one
• Makes sense if you study the formula for standard error: N is in the denominator, so the SE shrinks as N grows
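A quick way to study the formula is to hold the standard deviation fixed and vary N. The sketch below uses SE = s / √N as given on the previous slide; the value s = 10 and the sample sizes are hypothetical.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)

s = 10  # hypothetical sample standard deviation
se_by_n = {n: standard_error(s, n) for n in (25, 100, 400, 1600)}

for n, se in se_by_n.items():
    print(f"N = {n:5d}   SE = {se:.2f}")
```

Each time N is quadrupled the standard error is cut in half, so accuracy improves with sample size but with diminishing returns.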

Page 10:

1. Estimation

[Diagram: Statistical Methods branch into Descriptive Statistics and Inferential Statistics; Inferential Statistics branches into Estimation and Hypothesis Testing. Current topic: ESTIMATION]

Page 11:

Introduction to Estimation

Estimation procedures
• Purpose: To estimate population parameters from sample statistics
• Using the sampling distribution to infer from a sample to the population
• Most commonly used for polling data

2 components:
• Point estimate
• Confidence intervals

Page 12:

Estimation

Point Estimate: Value of a sample statistic used to estimate a population parameter

Confidence Interval: A range of values around the point estimate

[Figure: a confidence interval drawn as a range around the point estimate. Point estimate = .58; lower confidence limit = .546; upper confidence limit = .614]

Page 13:

Example: CNN Poll (CNN.com; Feb 20, 2009): Slight majority thinks stimulus package will improve economy

“The White House's economic stimulus plan isn't a surefire winner with the American public, but a majority does think the recovery plan will help. According to a new poll, fifty-three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy.”

“On an individual level, there was less hope for improvement. According to the poll, 67 percent said it would not help them personally.”

“The Poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.”

Page 14:

Estimation

POINT ESTIMATES (another way of saying sample statistics)

CONFIDENCE INTERVAL (reported in polls as the “MARGIN OF ERROR”): Indicates that over the long run, 95 percent of the time, the true pop. value will fall within +/- 3 percentage points of the sample estimate

Point estimates & confidence interval should be reported together

“…but a majority does think the recovery plan will help, according to a new poll. Fifty-three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy.

… The Poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.”

Page 15:

Estimation: 1. Pick Confidence Level

Confidence LEVEL
• Probability that the unknown population parameter falls within the interval

Alpha (α)
• The probability that the parameter is NOT within the interval
• Confidence level = 1 - α

Conventionally, confidence level values are almost always 95% or 99%

Page 16:

Procedure for Constructing an Interval Estimate

2. Divide the probability of error equally into the upper and lower tails of the distribution (2.5% error in each tail with 95% confidence level), then find the corresponding Z score

[Figure: normal curve of Z scores with area 0.95 between Z = -1.96 and Z = +1.96, and area .025 in each tail]
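Steps 1 and 2 can be sketched in a few lines: pick a confidence level, split alpha across the two tails, and look up the Z score. The helper name `critical_z` is made up for this illustration; `NormalDist` comes from Python's standard `statistics` module and stands in for the Z table.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Z score that leaves alpha / 2 in each tail of the standard normal curve."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> Z = {critical_z(level):.3f}")
```

The 95% level gives the familiar 1.96, and the 99% level gives about 2.58, the value usually read from a printed Z table.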

Page 17:

Procedure for Constructing an Interval Estimate

3. Construct the confidence interval

Proportions (like the stimulus poll example):
• Sample point estimate (convert % to a proportion): “Fifty-three percent said the plan will improve economic conditions…” → 0.53
• Sample size (N) = 1,046
• Formula 7.3 in Healey: Numerator = (your proportion) (1 - proportion)
• 95% confidence level (replicating results from article)
• 99% confidence level: intervals widen as level of confidence increases
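As a rough check of the article's "plus or minus 3 percentage points", the sketch below builds the interval as p ± Z · √(p(1 − p) / N), using the sample proportion in the numerator as this slide describes. The function name is hypothetical, and the exact form of Formula 7.3 in Healey is not reproduced in the transcript, so treat the formula here as an assumption.

```python
import math

def proportion_ci(p, n, z):
    """Return (lower, upper) confidence limits for a sample proportion."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Values from the CNN poll: 53% of 1,046 respondents, 95% level (Z = 1.96).
low, high = proportion_ci(p=0.53, n=1046, z=1.96)
margin = (high - low) / 2

print(f"margin of error: {margin:.3f}")
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

The margin works out to roughly .03, which matches the sampling error the article reports.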

Page 18:

Example 1: Estimate for the economic recovery poll
• p = .53 (53% think it will help)
• Z = 1.96 (95% confidence interval)
• N = 1046 (sample size)

What happens when we…
• Recalculate for N = 10,000
• N back to original, recalculate for p = .90
• Back to original, but change confidence level to 99%
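The three "what happens when" variations can be compared side by side. This is a sketch that assumes the same proportion formula as above (sample proportion in the numerator) and the rounded table value Z = 2.58 for the 99% level; the scenario labels are just dictionary keys chosen for this example.

```python
import math

def margin_of_error(p, n, z):
    """Half-width of the confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# (p, N, Z) for the original poll and the three variations on the slide.
# Z = 2.58 is the rounded table value for a 99% confidence level.
scenarios = {
    "original": (0.53, 1046, 1.96),
    "N = 10,000": (0.53, 10_000, 1.96),
    "p = .90": (0.90, 1046, 1.96),
    "99% level": (0.53, 1046, 2.58),
}

margins = {name: margin_of_error(*args) for name, args in scenarios.items()}
for name, margin in margins.items():
    print(f"{name:<12} +/- {margin * 100:.1f} percentage points")
```

A larger N or a proportion further from .50 narrows the interval, while a higher confidence level widens it.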

Page 19:

Example 2

Houston Chronicle (2008): A University of Texas poll to be released today shows Republican presidential candidate John McCain and GOP Sen. John Cornyn leading by comfortable margins in Texas, as expected. But the statewide survey of 550 registered voters has one very surprising finding: 23 percent of Texans are convinced that Democratic presidential nominee Barack Obama is a Muslim. The Obama-is-a-Muslim confusion is caused by fallacious Internet rumors and radio talk-show gossip. McCain went so far at one of his town hall meetings to grab a microphone from a woman who claimed that Obama was an Arab.

1. GIVEN THIS INFO, IDENTIFY A POINT ESTIMATE & CALCULATE THE CONFIDENCE INTERVAL (ASSUMING A 95% CONFIDENCE LEVEL).

2. CALCULATE THE CONFIDENCE INTERVAL ASSUMING A 99% CONFIDENCE LEVEL
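One possible way to work both parts is sketched below. It assumes the point estimate is the reported 23% (p = .23) with N = 550, uses the sample proportion in the numerator, and gets Z from the confidence level with `NormalDist` rather than a table, so the 99% limits may differ in the third decimal from a hand calculation with Z = 2.58.

```python
import math
from statistics import NormalDist

p, n = 0.23, 550  # point estimate and sample size reported in the article

intervals = {}
for level in (0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # alpha / 2 in each tail
    margin = z * math.sqrt(p * (1 - p) / n)
    intervals[level] = (p - margin, p + margin)

for level, (low, high) in intervals.items():
    print(f"{level:.0%} confidence interval: {low:.3f} to {high:.3f}")
```

The 99% interval comes out wider than the 95% interval, as expected when the confidence level is raised.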

Page 20:

A Good Estimate is Unbiased

Sample means and proportions (like the .53 [53%] & .23 [23%]) are UNBIASED estimates of the population parameters
• We know that the mean of the sampling distribution = the pop. mean

Other sample statistics (such as standard deviation) are biased
• The standard deviation of a sample tends, on average, to be smaller than the standard deviation of the population

Bottom line: A good estimate is UNBIASED
• Trustworthy estimator of the pop. parameter
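The bias of the sample standard deviation can be seen in a small simulation. This is a sketch under invented assumptions: a hypothetical normal population (mean 50, SD 10), very small samples of N = 5 to make the effect visible, and the standard deviation computed with N in the denominator.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 normally distributed scores (mean 50, SD 10).
population = [random.gauss(50, 10) for _ in range(50_000)]
sigma = statistics.pstdev(population)

n, repeats = 5, 5_000
sample_sds = []
for _ in range(repeats):
    sample = random.sample(population, n)
    sample_sds.append(statistics.pstdev(sample))  # divides by N, not N - 1

average_sample_sd = statistics.fmean(sample_sds)

print(f"population standard deviation: {sigma:.2f}")
print(f"average sample standard deviation: {average_sample_sd:.2f}")
```

Any single sample can have a standard deviation above or below σ, but the average across many samples falls short of it, which is what "biased" means here.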

Page 21:

A Good Estimate is Efficient

Efficiency
• Refers to the extent to which the sampling distribution is clustered about its mean
• Efficiency depends largely on sample size: as the sample size increases, the sampling distribution gets tighter (more narrow)

BOTTOM LINE: THE LESS SPREAD (THE SMALLER THE S.E.), THE BETTER

Page 22:

Estimation of Population Means

EXAMPLE: A researcher has gathered information from a random sample of 178 households. Construct a confidence interval to estimate the population mean at the 95% level:
• An average of 2.3 people reside in each household.
• Standard deviation is .35.
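The household example can be worked as a sketch using the standard error formula from the earlier slide, with s standing in for σ: x-bar ± Z · (s / √N). The function name is hypothetical, and a textbook formula that divides by √(N − 1) instead would give a very slightly wider interval.

```python
import math

def mean_ci(mean, s, n, z=1.96):
    """Return (lower, upper) confidence limits for a sample mean."""
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin

# Values from the example: x-bar = 2.3, s = .35, N = 178, 95% level.
low, high = mean_ci(mean=2.3, s=0.35, n=178)
print(f"95% confidence interval: {low:.2f} to {high:.2f} people per household")
```

The interval is narrow (about .05 on either side of 2.3) because the standard deviation is small relative to the sample size.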

Page 23:

PROCEDURE FOR CONSTRUCTING AN INTERVAL ESTIMATE

A random sample of 429 college students was interviewed.
• They reported they had spent an average of $178 on textbooks during the previous semester. If the standard deviation (s) of these data is $15, construct an estimate of the population mean at the 95% confidence level.
• They reported they had missed an average of 2.8 days of class per semester because of illness. If the sample standard deviation is 1.0, construct an estimate of the population mean at the 99% confidence level.

Two individuals are running for mayor of Duluth. You conduct an election survey of 100 adult Duluth residents 1 week before the election and find that 45% of the sample support candidate Long Duck Dong, while 40% plan to vote for candidate Singalingdon. Using a 95% confidence level, based on your findings, can you predict a winner?
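For the mayoral question, one rough approach is to build a 95% interval for each candidate and see whether the two intervals overlap. The sketch below assumes that overlap check as the decision rule, which is a simplification rather than a formal test of the difference between the two proportions, and it uses the sample proportion in the numerator.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (lower, upper) confidence limits for a sample proportion."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

n = 100
leader = proportion_ci(0.45, n)   # candidate with 45% support in the sample
trailer = proportion_ci(0.40, n)  # candidate with 40% support in the sample

# The intervals overlap if the leader's lower limit is at or below
# the trailing candidate's upper limit.
intervals_overlap = leader[0] <= trailer[1]

print(f"45% candidate: {leader[0]:.3f} to {leader[1]:.3f}")
print(f"40% candidate: {trailer[0]:.3f} to {trailer[1]:.3f}")
print("intervals overlap:", intervals_overlap)
```

With only 100 respondents each margin is close to 10 percentage points, so the intervals overlap and the sample does not support predicting a winner.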

Page 24:

What influences confidence intervals?

The width of a confidence interval depends on three things:
• Confidence level: can be raised (e.g., to 99%) or lowered (e.g., to 90%); a higher level gives a wider interval
• N: we have more confidence in larger sample sizes, so as N increases, the interval narrows
• Variation: more variation = more error (for proportions, % agree closer to 50%; for means, higher standard deviations)