Statistics for Social and Behavioral Sciences
Session #14: Estimation, Confidence Interval
(Agresti and Finlay, Chapter 5)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
PART II. DESCRIBING DATA (Weeks 2-4)
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)
This is where we talk about Zmapp and Ebola!
Firenze or Lebanese Express now
Last 2 Sessions
• A statistic is a random variable.
• The distribution of a statistic is called its sampling distribution.
• In particular the mean of a variable in a sample is a statistic.
• The expected value of the sample mean is equal to the true mean.
• The standard deviation of the sample mean is called the standard error.
• Central Limit theorem: with a large sample size, the sampling distribution of the mean of X is normal, and the empirical rule applies. The standard error is σX / √N.
Last 2 Sessions
• For a proportion (X is 0,1): σX = √( p (1-p) ). As we typically do not observe the true proportion p but only the sample proportion, we use the sample proportion in place of p.
• For other variables (X is not 0,1): As we do not observe the true standard deviation σX but rather the sample standard deviation sX, we approximate σX by sX and thus approximate the standard error by sX / √N.
• We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators?
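The two standard-error approximations recapped above can be sketched in Python. The function names and the input numbers (a sample standard deviation of 0.8 and a sample proportion of 0.4, each from 100 observations) are illustrative assumptions, not values from the course.

```python
import math


def standard_error_mean(sample_sd: float, n: int) -> float:
    """Approximate standard error of a sample mean: sX / sqrt(N)."""
    return sample_sd / math.sqrt(n)


def standard_error_proportion(sample_proportion: float, n: int) -> float:
    """Approximate standard error of a sample proportion:
    sqrt(p (1 - p)) / sqrt(N), with the sample proportion used for p."""
    return math.sqrt(sample_proportion * (1 - sample_proportion)) / math.sqrt(n)


# Hypothetical inputs, for illustration only.
se_mean = standard_error_mean(sample_sd=0.8, n=100)
se_prop = standard_error_proportion(sample_proportion=0.4, n=100)
print(se_mean, se_prop)
```

Both functions divide by √N, so quadrupling the sample size halves the standard error.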
Outline
1. Back to Zomato
Just applying the formulas we know
2. Estimators:
Point Estimator
Biased vs Unbiased Estimators
Efficient vs Inefficient Estimators
Interval Estimator
Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F
Back to Zomato
1. What statistical issue would preclude us from using the Central Limit Theorem?
2. Assuming we can use the CLT, what is the Margin of Error on Cafe Firenze and Lebanese Express’s ratings? Think !!
• Questions:
1. When rating a restaurant, what are the possible choices for the user?
2. What is 3.4 on this rating?
3. What are we trying to estimate?
4. What is the formula for the standard error of ratings?
• Is a rating X a 0,1 variable?
5. What is the standard deviation σX of ratings?
6. Finally what is the standard error of the rating 3.4?
7. And what is the margin of error for the rating 3.4? (MoE = twice the standard error)
Recap: Central Limit Theorem
• Central Limit Theorem: with large sample size, the distribution of the sample mean is normal, with mean the true mean and with standard deviation (=standard error) equal to: σX / √N
• X is not 0,1: Approximate the true standard deviation σX using the sample standard deviation sX.
• X is 0,1: Approximate σX = √( p (1-p) ), where p is the true proportion, using the sample proportion for p.
Café Firenze’s case
Back to Zomato
• If we had all the ratings of individual users:
– John 3 “Hated it, service is poor”
– Abdullah 4 “Great venue”
– Anthony 5 “Perfect, loved the al dente pasta”
– Claire 3 “Ok for a downtown lunch”
– Al Bloom 3 “The Italian restaurant of the world”
– John Sexton 3 “Can achieve more”
– Ayesha 3 “There are alternatives”
• The average is 3.4, and we would find sX=…………….
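If the individual ratings were available, the computation could be sketched as below. This is a minimal sketch that uses the seven ratings listed on the slide and the denominator-N formula for the sample standard deviation; the variable names are illustrative.

```python
import math

# The seven individual ratings listed on the slide.
ratings = [3, 4, 5, 3, 3, 3, 3]

n = len(ratings)
mean = sum(ratings) / n

# Sample standard deviation with denominator N.
s_x = math.sqrt(sum((x - mean) ** 2 for x in ratings) / n)

# Approximate standard error of the mean rating: sX / sqrt(N).
standard_error = s_x / math.sqrt(n)

print(round(mean, 1), s_x, standard_error)
```

With only seven ratings the sample is far too small for the Central Limit Theorem to be a comfortable approximation; the sketch only shows the mechanics.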
Zomato Problemo
• The website only reports the sample mean of ratings…
• We thus have to figure out a conservative value of σX (the largest possible).
• What is the highest possible σX?
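One way to reason about the largest possible standard deviation is sketched below. It relies on two assumptions that the slides do not state: that ratings lie on a 1-to-5 scale, and that the restaurant received 400 ratings (a hypothetical count). For a variable bounded between `low` and `high` with a known mean, the standard deviation is largest when every rating sits at one of the two extremes, which gives √( (high – mean)(mean – low) ).

```python
import math


def max_sd(mean: float, low: float, high: float) -> float:
    """Largest possible standard deviation of a variable bounded in
    [low, high] with the given mean (all ratings at the two extremes)."""
    return math.sqrt((high - mean) * (mean - low))


# Assumptions: a 1-to-5 rating scale and a hypothetical count of 400 ratings.
LOW, HIGH = 1, 5
N_RATINGS = 400

sd_bound = max_sd(3.4, LOW, HIGH)
conservative_se = sd_bound / math.sqrt(N_RATINGS)
conservative_moe = 2 * conservative_se  # MoE = twice the standard error

print(sd_bound, conservative_se, conservative_moe)
```

Because the true standard deviation cannot exceed this bound, the resulting margin of error errs on the wide side.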
Outline
1. Back to Zomato
Just applying the formulas we know
2. Estimators:
Point Estimate
Biased vs Unbiased Estimators
Efficient vs Inefficient Estimators
Interval Estimate
Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F
Parameters and their point estimates
Parameters (« True » values) | Point Estimate
Population mean μ (example: population mean rating of Cafe Firenze) | Sample mean m (example: sample mean rating of Cafe Firenze)
Population median | Sample median
Population standard deviation σX (example: population standard deviation of ratings of Cafe Firenze) | Sample standard deviation sX (example: sample standard deviation of ratings of Cafe Firenze)
Population variance σX² | Sample variance sX²
Population p-th percentile | Sample p-th percentile
• This is called a “point estimate” because we give a single number (a “point” on the axis).
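Each row of the table corresponds to a single number computed from the sample. A minimal sketch with Python's standard `statistics` module, reusing the seven Cafe Firenze ratings from the earlier slide as the sample (the dictionary keys are illustrative labels):

```python
import statistics

# Sample: the seven individual ratings listed earlier in the slides.
ratings = [3, 4, 5, 3, 3, 3, 3]

point_estimates = {
    "mean": statistics.fmean(ratings),
    "median": statistics.median(ratings),
    "standard deviation": statistics.pstdev(ratings),  # denominator N
    "variance": statistics.pvariance(ratings),         # denominator N
    # Third quartile = 75th percentile (default "exclusive" method).
    "75th percentile": statistics.quantiles(ratings, n=4)[2],
}

for name, value in point_estimates.items():
    print(name, value)
```

Percentiles have several computing conventions, so other software may report slightly different values for small samples.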
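Biased vs Unbiased Estimator
• We have seen that to get the standard error of the sample mean, we need to have an estimate of σX.
• So far we have used: sX = √( Σ (Xi – m)² / N )
• And the textbook has given: sX = √( Σ (Xi – m)² / (N – 1) )
• These are two different estimators of the same quantity σX.
• The textbook’s estimator of σX is unbiased (strictly, it is the squared version, the variance estimator with denominator N – 1, that is exactly unbiased for σX²).
These two formulas are “point estimates”.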
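Bias can be checked exactly on a toy example by listing every possible sample. The sketch below assumes a hypothetical population in which the ratings 1 to 5 are equally likely, draws samples of size 3 with replacement, and averages each variance estimator over all 125 equally likely samples. The check is done on the variance, where the unbiasedness of the N – 1 denominator holds exactly.

```python
import itertools
import statistics

# Hypothetical population: ratings 1..5, each equally likely.
population = [1, 2, 3, 4, 5]
true_variance = statistics.pvariance(population)

N = 3  # sample size
# Every equally likely sample of size N, drawn with replacement.
samples = list(itertools.product(population, repeat=N))

# Expected value of an estimator = its average over all possible samples.
expected_var_n = statistics.fmean(
    [statistics.pvariance(s) for s in samples]      # denominator N
)
expected_var_n_minus_1 = statistics.fmean(
    [statistics.variance(s) for s in samples]       # denominator N - 1
)

print(true_variance, expected_var_n, expected_var_n_minus_1)
```

The denominator-N estimator comes out too small on average by the factor (N – 1)/N, which is the bias the N – 1 denominator removes.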
Efficient vs Inefficient Estimator
• Among all possible estimators, an estimator is efficient if it has the smallest standard error.
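• The standard error of sX = √( Σ (Xi – m)² / N ) (the slides’ version)
• is smaller than the standard error of sX = √( Σ (Xi – m)² / (N – 1) ) (the textbook’s version).
• The slides’ version is efficient, while the textbook’s version is unbiased. There is a conundrum.
These two formulas are “point estimates”.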
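The comparison of standard errors can also be checked by enumeration. The sketch below assumes the same kind of toy setting (a hypothetical population with ratings 1 to 5 equally likely, samples of size 3 drawn with replacement) and measures how much each standard-deviation estimator varies from sample to sample. It compares only these two estimators, not all possible ones.

```python
import itertools
import math
import statistics

# Hypothetical population: ratings 1..5, each equally likely.
population = [1, 2, 3, 4, 5]
N = 3  # sample size
samples = list(itertools.product(population, repeat=N))

# Value of each estimator on every possible sample.
sd_n = [statistics.pstdev(s) for s in samples]          # denominator N
sd_n_minus_1 = [statistics.stdev(s) for s in samples]   # denominator N - 1

# Standard error of an estimator = standard deviation of its sampling distribution.
se_sd_n = statistics.pstdev(sd_n)
se_sd_n_minus_1 = statistics.pstdev(sd_n_minus_1)

print(se_sd_n, se_sd_n_minus_1, se_sd_n / se_sd_n_minus_1)
```

On every sample the denominator-N value is the N – 1 value multiplied by √((N – 1)/N), so its standard error is smaller by that same factor.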
What do you actually need to remember?
• “Good” estimators are unbiased and efficient.
– The sample mean is an unbiased and efficient estimator of the population mean.
• “Less good” estimators may be either unbiased or efficient.
– The sample standard deviation with denominator N-1 is unbiased but inefficient.
– The sample standard deviation with denominator N is biased but efficient.
– We keep using the formula we learnt…
Parameters and Interval Estimate
• An interval estimate is an interval of numbers around the point estimate, which includes the parameter with a chosen probability, typically 90%, 95%, or 99%.
• Example: “the interval estimate [156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] includes the population average height with probability 95%.”
Parameters and Interval Estimate
• An interval estimate that includes the parameter with probability 95% is called a 95% confidence interval.
• The expression “95% confidence interval” is widely used.
• Example: “[156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] is a 95% confidence interval for the population average height.”
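What “includes the parameter with probability 95%” means can be illustrated with a small simulation. The sketch below draws many samples from a hypothetical normal population of heights (mean 156.2 cm, standard deviation 5 cm, sample size 400; all three are assumptions), builds an interval around each sample mean with a margin of error of 1.96 standard errors, and counts how often the interval contains the true mean.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population and design, for illustration only.
TRUE_MEAN, TRUE_SD = 156.2, 5.0
N = 400
REPLICATIONS = 2000

covered = 0
for _ in range(REPLICATIONS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = sum(sample) / N
    s_x = math.sqrt(sum((x - m) ** 2 for x in sample) / N)
    moe = 1.96 * s_x / math.sqrt(N)
    # Does this sample's interval contain the true mean?
    if m - moe <= TRUE_MEAN <= m + moe:
        covered += 1

coverage = covered / REPLICATIONS
print(coverage)
```

The share of intervals that contain the true mean should land close to 0.95. Any single interval either contains it or not; the 95% describes the procedure across repeated samples.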
How do we build a 95% confidence interval?
• Goal: estimate the population average μ.
• From previous session: [μ – MoE ; μ + MoE] includes the sample mean m with probability 95%.
• We conclude: the interval [m – MoE ; m + MoE] includes the population mean μ with probability 95% (m is within MoE of μ exactly when μ is within MoE of m).
[m – MoE ; m + MoE] is a 95% confidence interval for μ.
MoE = 1.96 × Standard Error
Standard Error = σX / √N (approximated by sX / √N)
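The construction can be written as a short function. This is a minimal sketch: the function name is illustrative, and the inputs (sample standard deviation 5 cm, 400 observations) are hypothetical values chosen so that the margin of error matches the 0.49 cm of the height example.

```python
import math


def confidence_interval_95(sample_mean: float, sample_sd: float, n: int):
    """95% confidence interval for the population mean:
    [m - MoE ; m + MoE] with MoE = 1.96 * sX / sqrt(N)."""
    standard_error = sample_sd / math.sqrt(n)
    moe = 1.96 * standard_error
    return sample_mean - moe, sample_mean + moe


# Hypothetical inputs: sX = 5 cm and N = 400 give a standard error of 0.25 cm.
low, high = confidence_interval_95(sample_mean=156.2, sample_sd=5.0, n=400)
print(low, high)
```

Replacing 1.96 with another multiplier would give a different confidence level; the function is kept to the 95% case covered in this session.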
Wrap up
• Central Limit theorem: with a large sample size, the sampling distribution of the sample mean of X is normal, and the empirical rule applies. The standard error is the standard deviation of the sampling distribution, σX / √N.
• For a proportion: σX = √( p (1-p) ). As we typically do not observe the true proportion p but only the sample proportion, we use the sample proportion in place of p.
• For other variables: As we do not observe the true standard deviation σX but rather the sample standard deviation sX, we approximate the standard error by sX / √N.
• We are interested in estimating parameters, but we only observe statistics. Can we use statistics as estimators? Estimators can be unbiased, and efficient.
Coming up: Readings:
• This week and next week:
– Chapter 5 entirely – estimation, confidence intervals.
– Understand the confidence interval, the point estimate.
• Online quiz on Thursday.
• Deadlines are sharp and attendance is followed.
• Tonight is the midterm election!!
• Watch: http://www.msnbc.com/jose-diaz-balart/watch/is-2014-the-margin-of-error-midterms--349919811638
For help:
• Amine Ouazad. Office 1135, Social Science. [email protected]. Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene [email protected] recitations. At the Academic Resource Center, Monday from 2 to 4pm.