Statistics for Social and Behavioral Sciences Session #14: Estimation, Confidence Interval (Agresti...

Statistics for Social and Behavioral Sciences

Session #14: Estimation, Confidence Interval

(Agresti and Finlay, Chapter 5)

Prof. Amine Ouazad

Statistics Course Outline

PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)

PART II. DESCRIBING DATA (Weeks 2-4)

PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)

PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)

This is where we talk about Zmapp and Ebola!

Firenze or Lebanese Express now

Last 2 Sessions

• A statistic is a random variable.

• The distribution of a statistic is called its sampling distribution.

• In particular, the mean of a variable in a sample is a statistic.

• The expected value of the sample mean is equal to the true mean.

• The standard deviation of the sample mean is called the standard error.

• Central Limit Theorem: with a large sample size, the sampling distribution of the mean of X is normal, and the empirical rule applies. The standard error is σ_X / √N.

Last 2 Sessions

• For a proportion (X is 0,1): σ_X = √( p (1-p) ). As we typically do not observe the true proportion p but only the sample proportion p̂, we approximate σ_X by √( p̂ (1-p̂) ).

• For other variables (X is not 0,1): As we do not observe the true standard deviation σ_X but rather the sample standard deviation s_X, we approximate σ_X by s_X and thus approximate the standard error by s_X / √N.

• We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators?
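The two standard-error formulas recalled above can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers (a sample proportion of 0.40 among 100 respondents, a sample standard deviation of 1.2 among 50 respondents) are made up and do not come from the course.

```python
import math

def standard_error_proportion(p_hat, n):
    """Approximate standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def standard_error_mean(s_x, n):
    """Approximate standard error of a sample mean: s_x / sqrt(n)."""
    return s_x / math.sqrt(n)

# Hypothetical numbers, for illustration only.
se_prop = standard_error_proportion(0.40, 100)
se_mean = standard_error_mean(1.2, 50)

print(f"Standard error of the proportion: {se_prop:.4f}")
print(f"Standard error of the mean:       {se_mean:.4f}")
```

Both functions divide by √N, so quadrupling the sample size halves the standard error.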

Outline

1. Back to Zomato
   Just applying the formulas we know

2. Estimators:
   Point Estimator
   Biased vs Unbiased Estimators
   Efficient vs Inefficient Estimators
   Interval Estimator

Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F

Back to Zomato

1. What statistical issue would preclude us from using the Central Limit Theorem?

2. Assuming we can use the CLT, what is the Margin of Error on Cafe Firenze and Lebanese Express’s ratings? Think !!

• Questions:

1. When rating a restaurant, what are the possible choices for the user?
2. What is 3.4 on this rating?
3. What are we trying to estimate?
4. What is the formula for the standard error of ratings?
   • Is a rating X a 0,1 variable?
5. What is the standard deviation σ_X of ratings?
6. Finally, what is the standard error of the rating 3.4?
7. And what is the margin of error for the rating 3.4? (MoE = twice the standard error)

Recap: Central Limit Theorem

• Central Limit Theorem: with a large sample size, the distribution of the sample mean is normal, with mean equal to the true mean and with standard deviation (= standard error) equal to σ_X / √N.

• X is not 0,1: Approximate the true standard deviation σ_X using the sample standard deviation s_X.

• X is 0,1: Approximate σ_X = √( p (1-p) ), where p is the true proportion, using the sample proportion p̂ in place of p.
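One way to see the Central Limit Theorem at work is to simulate many samples and look at how their means are spread. The sketch below is only an illustration under stated assumptions: the sample size, the number of repetitions, and the true proportion of 0.3 are hypothetical values chosen for the demonstration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so that the run is reproducible

N = 100        # sample size (hypothetical)
REPS = 5000    # number of simulated samples (hypothetical)
P_TRUE = 0.3   # true proportion of the simulated 0/1 variable (hypothetical)

def sample_mean(n, p):
    """Mean of n independent 0/1 draws, each equal to 1 with probability p."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

means = [sample_mean(N, P_TRUE) for _ in range(REPS)]

# Spread of the simulated sample means versus the formula sqrt(p (1 - p)) / sqrt(N)
simulated_se = statistics.pstdev(means)
theoretical_se = math.sqrt(P_TRUE * (1 - P_TRUE) / N)

# Share of sample means within two standard errors of the true mean
within_2se = sum(abs(m - P_TRUE) <= 2 * theoretical_se for m in means) / REPS

print(f"Simulated standard error:   {simulated_se:.4f}")
print(f"Theoretical standard error: {theoretical_se:.4f}")
print(f"Share within 2 standard errors: {within_2se:.3f}")
```

The simulated spread should be close to the theoretical standard error, and roughly 95% of the sample means should fall within two standard errors of the true mean, as the empirical rule suggests.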

Café Firenze’s case

Back to Zomato

• If we had all the ratings of individual users:
– John 3 “Hated it, service is poor”
– Abdullah 4 “Great venue”
– Anthony 5 “Perfect, loved the al dente pasta”
– Claire 3 “Ok for a downtown lunch”
– Al Bloom 3 “The Italian restaurant of the world”
– John Sexton 3 “Can achieve more”
– Ayesha 3 “There are alternatives”

• The average is 3.4, and we would find s_X = …………….
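If the seven individual ratings listed above were available, the computation could be sketched as follows. The code computes the sample standard deviation with both denominator N and denominator N − 1, since both conventions exist; the variable names are illustrative. With only seven ratings the Central Limit Theorem is a rough approximation at best, so the standard error and margin of error here are for practice only.

```python
import math

# The seven ratings listed on the slide
ratings = [3, 4, 5, 3, 3, 3, 3]
n = len(ratings)

mean = sum(ratings) / n
sum_sq_dev = sum((x - mean) ** 2 for x in ratings)

sd_n = math.sqrt(sum_sq_dev / n)                # denominator N
sd_n_minus_1 = math.sqrt(sum_sq_dev / (n - 1))  # denominator N - 1

se = sd_n / math.sqrt(n)  # standard error of the mean, s_X / sqrt(N)
moe = 2 * se              # margin of error, taken as twice the standard error

print(f"Mean rating:             {mean:.2f}")
print(f"s_X (denominator N):     {sd_n:.3f}")
print(f"s_X (denominator N - 1): {sd_n_minus_1:.3f}")
print(f"Standard error:          {se:.3f}")
print(f"Margin of error:         {moe:.3f}")
```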

Zomato Problemo

• The website only reports the sample mean of ratings…

• We thus have to figure out a conservative estimate of σ_X (the largest possible).

• What is the highest possible σ_X?
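One possible way to obtain such a conservative value is sketched below. It relies on a general fact: a variable bounded between a lowest value a and a highest value b, with mean m, has a variance of at most (m − a)(b − m), reached when every value sits at one of the two extremes. The rating scale of 1 to 5 and the number of ratings (100) are assumptions made for this illustration; the slides do not state them.

```python
import math

def max_possible_sd(mean, low, high):
    """Largest standard deviation of a variable bounded in [low, high] with the given mean.

    The variance is at most (mean - low) * (high - mean), which is reached
    when every value is either `low` or `high`.
    """
    return math.sqrt((mean - low) * (high - mean))

# Assumption: ratings range from 1 to 5 (the scale is not stated on the slides).
LOW, HIGH = 1, 5
reported_mean = 3.4

sd_max = max_possible_sd(reported_mean, LOW, HIGH)

# Hypothetical number of ratings, for illustration only.
n_ratings = 100
conservative_se = sd_max / math.sqrt(n_ratings)
conservative_moe = 2 * conservative_se  # margin of error = twice the standard error

print(f"Largest possible standard deviation: {sd_max:.3f}")
print(f"Conservative standard error:         {conservative_se:.3f}")
print(f"Conservative margin of error:        {conservative_moe:.3f}")
```

Because the bound assumes the most dispersed ratings possible, the resulting margin of error can only overstate the uncertainty, never understate it.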

Outline

1. Back to Zomato
   Just applying the formulas we know

2. Estimators:
   Point Estimate
   Biased vs Unbiased Estimators
   Efficient vs Inefficient Estimators
   Interval Estimate

Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F

Parameters and their point estimates

Parameter (“true” value) | Point estimate
Population mean μ (example: population mean rating of Cafe Firenze) | Sample mean m (example: sample mean rating of Cafe Firenze)
Population median | Sample median
Population standard deviation σ_X (example: population standard deviation of ratings of Cafe Firenze) | Sample standard deviation s_X (example: sample standard deviation of ratings of Cafe Firenze)
Population variance σ_X² | Sample variance s_X²
Population p-th percentile | Sample p-th percentile

• This is called a “point estimate” because we give a single number (a “point” on the axis).

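Biased vs Unbiased Estimator

• We have seen that to get the standard error of the sample mean, we need to have an estimate of σ_X.

• So far we have used (denominator N): s_X = √( Σ (x_i − m)² / N )

• And the textbook has given (denominator N − 1): s_X = √( Σ (x_i − m)² / (N − 1) )

• These are two different estimators of the same quantity σ_X.

• The textbook’s estimator of σ_X is unbiased. (Strictly speaking, it is the squared version, the variance estimator with denominator N − 1, that is exactly unbiased for σ_X².)

These two formulas are “point estimates”.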

Efficient vs Inefficient Estimator

• Among all possible estimators, an estimator is efficient if it has the smallest standard error.

• The standard error of the slides’ estimator, √( Σ (x_i − m)² / N ), is smaller than the standard error of the textbook’s estimator, √( Σ (x_i − m)² / (N − 1) ).

• The slides’ version is efficient, while the textbook’s version is unbiased. There is a conundrum.

These two formulas are “point estimates”.
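The trade-off between the two estimators (denominator N versus denominator N − 1) can be explored with a small simulation. The sketch below is an illustration under assumptions: the population is taken to be normal with true standard deviation 1, and the sample size and number of repetitions are hypothetical. It tracks the average of each variance estimate (to look at bias) and the spread of each standard deviation estimate (to look at efficiency).

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the run is reproducible

N = 10         # sample size (hypothetical)
REPS = 20000   # number of simulated samples (hypothetical)
SIGMA = 1.0    # true standard deviation of the simulated population

var_n, var_n1, sd_n, sd_n1 = [], [], [], []
for _ in range(REPS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    var_n.append(ss / N)                    # variance, denominator N
    var_n1.append(ss / (N - 1))             # variance, denominator N - 1
    sd_n.append(math.sqrt(ss / N))
    sd_n1.append(math.sqrt(ss / (N - 1)))

# Bias: compare the average estimate with the true variance SIGMA ** 2
mean_var_n = statistics.mean(var_n)
mean_var_n1 = statistics.mean(var_n1)

# Efficiency: compare the spread (standard error) of the two estimators
spread_sd_n = statistics.pstdev(sd_n)
spread_sd_n1 = statistics.pstdev(sd_n1)

print(f"Average variance estimate, denominator N:     {mean_var_n:.3f}")
print(f"Average variance estimate, denominator N - 1: {mean_var_n1:.3f}")
print(f"Spread of s_X, denominator N:     {spread_sd_n:.4f}")
print(f"Spread of s_X, denominator N - 1: {spread_sd_n1:.4f}")
```

In this sketch the N − 1 version should average close to the true variance, while the N version averages lower but varies less from sample to sample.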

What do you actually need to remember?

• “Good” estimators are unbiased and efficient.
– The sample mean is an unbiased and efficient estimator of the population mean.

• “Less good” estimators may be either unbiased or efficient.
– The sample standard deviation with denominator N-1 is unbiased but inefficient.
– The sample standard deviation with denominator N is biased but efficient.
– We keep using the formula we learnt…

Parameters and Interval Estimate

• An interval estimate is an interval of numbers around the point estimate, which includes the parameter with a chosen probability, typically 90%, 95%, or 99%.

• Example: “the interval estimate [156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] includes the population average height with probability 95%.”

Parameters and Interval Estimate

• An interval estimate that includes the parameter with probability 95% is called a 95% confidence interval.

• The expression “95% confidence interval” is widely used.

• Example: “[156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] is a 95% confidence interval for the population average height.”
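What “includes the parameter with probability 95%” means can be illustrated by simulation: draw many samples, build an interval around each sample mean, and count how often the interval contains the true mean. The sketch below is an illustration under assumptions. The population is taken to be normal with mean 156.2 cm and standard deviation 2.5 cm, and each sample has 100 observations; these values are hypothetical, chosen so that the margin of error comes out near 0.49 cm. The margin of error is taken as 1.96 standard errors, roughly twice the standard error.

```python
import math
import random

random.seed(2)  # fixed seed so that the run is reproducible

MU, SIGMA = 156.2, 2.5  # hypothetical population mean and standard deviation (cm)
N = 100                 # sample size (hypothetical)
REPS = 2000             # number of simulated samples (hypothetical)

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = sum(sample) / N
    s_x = math.sqrt(sum((x - m) ** 2 for x in sample) / N)
    moe = 1.96 * s_x / math.sqrt(N)
    # Does this sample's interval contain the true mean?
    if m - moe <= MU <= m + moe:
        covered += 1

coverage = covered / REPS
print(f"Share of intervals that include the true mean: {coverage:.3f}")
```

The interval changes from sample to sample while the true mean stays fixed; the 95% refers to the share of samples whose interval captures it.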

How do we build a 95% confidence interval?

• Goal: estimate the population average μ.

• From previous session: [μ – MoE ; μ + MoE] includes the sample mean m with probability 95%.

• The sample mean m is within MoE of μ exactly when μ is within MoE of m.

• We conclude: the interval [m – MoE ; m + MoE] includes the population mean μ with probability 95%.

[m – MoE ; m + MoE] is a 95% confidence interval for μ.

MoE = 1.96 × Standard Error

Standard Error = s_X / √N
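The recipe above can be written as a short function. This is a minimal sketch, not a definitive implementation: the function name is illustrative, s_X is computed with denominator N, and the ten heights are made-up values. A sample this small is used only to keep the example readable; the Central Limit Theorem calls for a large sample.

```python
import math

def confidence_interval_95(values):
    """Return (lower, upper), an approximate 95% confidence interval for the mean.

    Uses MoE = 1.96 * s_X / sqrt(N), with s_X computed with denominator N.
    """
    n = len(values)
    m = sum(values) / n
    s_x = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    moe = 1.96 * s_x / math.sqrt(n)
    return m - moe, m + moe

# Hypothetical heights in cm, made up for illustration.
heights = [154.0, 158.5, 155.2, 157.1, 156.8, 153.9, 159.0, 156.3, 155.5, 157.7]

lower, upper = confidence_interval_95(heights)
print(f"95% confidence interval: [{lower:.2f} cm ; {upper:.2f} cm]")
```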

Wrap up

• Central Limit Theorem: with a large sample size, the sampling distribution of the sample mean of X is normal, and the empirical rule applies. The standard error is the standard deviation of the sampling distribution, σ_X / √N.

• For a proportion: σ_X = √( p (1-p) ). As we typically do not observe the true proportion p but only the sample proportion p̂, we approximate σ_X by √( p̂ (1-p̂) ).

• For other variables: As we do not observe the true standard deviation σ_X but rather the sample standard deviation s_X, we approximate the standard error by s_X / √N.

• We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators? Estimators can be unbiased, and efficient.

Coming up:

Readings:

• This week and next week:
– Chapter 5 entirely: estimation, confidence intervals.
– Understand the confidence interval, the point estimate.

• Online quiz on Thursday.

• Deadlines are sharp and attendance is followed.

• Tonight is the midterm election!!

• Watch: http://www.msnbc.com/jose-diaz-balart/watch/is-2014-the-margin-of-error-midterms--349919811638

For help:

• Amine Ouazad
Office 1135, Social Science [email protected] hour: Tuesday from 5 to 6.30pm.

• GAF: Irene [email protected] recitations. At the Academic Resource Center, Monday from 2 to 4pm.
