We’ll use this data: AirlineComplaints.xls.
Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences about parameters.
Today we’ll use the very same principles, but we’ll do tests to use sample proportions of categorical variables to make inferences about parameters.
E.g., a random sample of households in a poor neighborhood in Lima, Peru, finds that 38% of the households are headed by single women.
Is this neighborhood proportion representative of the population proportion in Lima’s poor neighborhoods, or is it different enough to be statistically significant?
E.g., since a non-government organization began providing enterprise training & subsidies to women in a poor neighborhood in Buenos Aires, a random sample finds that the neighborhood’s proportion of income-earning adult women has increased from 34% to 39%.
Is this after vs. before difference in proportions due to sampling variability? Or is it large enough to be statistically significant?
The same random sample finds that 41% of the sampled women who participated in the program & 36% of the sampled women who didn’t participate are now engaged in income-earning activities.
Is this difference in proportions due to sampling variability? Or is it large enough to be statistically significant?
Sample proportion: a binomial count within a sample divided by the sample size n.
It summarizes a categorical variable (e.g., yes vs. no; lived vs. died).
Sample Proportion
Premises
Random sample of independent observations.
Binomial (i.e. ‘success’/’failure’) count.
The population must be at least 10 times larger than the sample.
There must be at least 10 observations for p & at least 10 observations for 1 – p.
If these sample assumptions are met, then the standardized difference between the two proportions being compared (e.g., observed proportion vs. benchmark; two-sample proportions; after vs. before proportion) is approximately standard normal in distribution.
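The two count-based premises above can be checked mechanically; random sampling and independence cannot, and have to be judged from the study design. The following is a minimal Python sketch rather than part of the original Stata workflow. The function name `check_proportion_premises` and the population size of 5000 are hypothetical, chosen only for illustration; the counts (49 successes out of 200) come from the honors-math example in these notes.

```python
def check_proportion_premises(successes, n, population_size=None):
    """Return rule-of-thumb checks for a one-sample proportion test."""
    failures = n - successes
    checks = {
        # At least 10 observed 'successes' and 10 observed 'failures'.
        "at_least_10_successes": successes >= 10,
        "at_least_10_failures": failures >= 10,
    }
    # The population-size rule can only be checked if the size is known.
    if population_size is not None:
        checks["population_10x_sample"] = population_size >= 10 * n
    return checks


# 49 honors-math students out of 200; the population size is an assumed value.
result = check_proportion_premises(successes=49, n=200, population_size=5000)
print(result)
```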
Moore/McCabe use an alternative to the ‘traditional’ confidence interval for proportions that is more accurate; it is based on the ‘Wilson estimate.’
The Stata command is:
. ci binaryvar, binomial wilson
As we’ll later discuss, there are other options besides ‘wilson’.
Wilson estimate of the population proportion based on sample data:
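$\tilde{p} = \dfrac{X + 2}{n + 4}$
Standard error of the proportion (i.e. based on sample data):
$SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}\,(1 - \tilde{p})}{n + 4}}$
Approximate level C confidence interval for the proportion:
$\tilde{p} \pm z^{*}\, SE_{\tilde{p}}$
Large-sample (n > 5) significance test for a population proportion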
The Stata command to find a ‘Wilson’ confidence interval for a population proportion based on sample data:
. ci hmath, binomial wilson
. ci hmath, b w level(90)
. ci hmath, b w l(99)
Other binomial options: exact, agresti, jeffreys.
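As a hand check of the Wilson ("plus four") estimate, standard error, and confidence interval described above, here is a minimal Python sketch. The function name `plus_four_interval` is hypothetical, and the counts (X = 49, n = 200) are taken from the honors-math example in these notes. One caveat: Stata's `wilson` option reports the Wilson score interval, which is closely related to the plus-four interval but not identical, so the two sets of limits should agree only approximately.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(x, n, level=0.95):
    """Plus-four estimate, its standard error, and a level-C interval."""
    p_tilde = (x + 2) / (n + 4)                   # add 2 successes, 2 failures
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))  # standard error of p-tilde
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z*
    return p_tilde, se, (p_tilde - z_star * se, p_tilde + z_star * se)


p_tilde, se, (lower, upper) = plus_four_interval(49, 200)
print(f"p~ = {p_tilde:.4f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Changing `level` to 0.90 or 0.99 mirrors the `level(90)` and `l(99)` options in the Stata commands.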
To Repeat: The Steps
Step 1: Ask if the binomial assumptions are fulfilled (including that both the expected #failures & the expected #successes are at least 10).
Step 2: Do a frequency table or bar graph of the binary variable & display the variable’s sample proportion.
Step 3: If all checks out okay, state the null hypothesis & the alternative hypothesis.
Step 4: Conduct the hypothesis test.
. use hsb2, clear
. gen hmath=math>=60 & math<.
. la var hmath "Honors math (>=60)"
Example
. tab hmath
Honors math (>=60) |   Freq.   Percent     Cum.
                 0 |     151     75.50    75.50
                 1 |      49     24.50   100.00
             Total |     200    100.00
. ci hmath, binomial wilson
Variable Obs Mean Std. Err. [95% Conf Interval]
hmath | 200 .245 .0304118 .1905687 .3090424
Ho: hmath=.265; Ha: hmath~=.265.
Is hmath significantly different from .265 (two-sided test, i.e. does the hypothesized value .265 fall outside the confidence interval)?
Does using other command options (or no option except ‘binomial’) make a difference?
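One way to explore this question by hand is to compute two of the intervals directly and see whether .265 falls inside each. The Python sketch below is an illustration, not Stata's implementation: the function names are hypothetical, and the formulas are the standard large-sample (Wald) interval and the Wilson score interval, applied to the sample proportion .245 with n = 200.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, level=0.95):
    """Traditional large-sample interval: p_hat +/- z* x SE."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def wilson_score_interval(p_hat, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


p0 = 0.265  # hypothesized proportion under Ho
for name, (lo, hi) in [
    ("wald", wald_interval(0.245, 200)),
    ("wilson", wilson_score_interval(0.245, 200)),
]:
    print(f"{name}: ({lo:.7f}, {hi:.7f})  contains {p0}: {lo <= p0 <= hi}")
```

The Wilson score limits reproduce the `ci hmath, binomial wilson` output shown above. Both intervals contain .265, so the choice of interval does not change the conclusion here.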
A Two-Sided CI Hypothesis Test
How to Conduct a Large-Sample Hypothesis Test for a Population Proportion: prtest
‘prtest’ allows testing one- or two-sided hypotheses.
Check the premises & data.
Test the hypothesis.
. prtest hmath = .265
One-sample test of proportion                hmath: Number of obs = 200
Variable |     Mean   Std. Err.         z    P>|z|    [95% Conf. Interval]
hmath    |     .245   .0304118    8.05609   0.0000    .1853941   .3046059
Ho: proportion(hmath) = .265
Ha: hmath < .265        Ha: hmath ~= .265       Ha: hmath > .265
z = -0.641              z = -0.641              z = -0.641
P < z = 0.2608          P > |z| = 0.5216        P > z = 0.7392
Conclusion: Fail to reject Ho.
Recall that conclusions are always uncertain.
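To see where the z statistic and the three p-values in the prtest output come from, the calculation can be sketched by hand in Python. The function name `one_sample_prop_ztest` is hypothetical. The key assumption is the usual one for a one-sample proportion z test: the standard error is computed from the hypothesized proportion (.265), not from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_ztest(p_hat, n, p0):
    """Large-sample z test of Ho: p = p0; returns z and three p-values."""
    se0 = sqrt(p0 * (1 - p0) / n)      # standard error assuming Ho is true
    z = (p_hat - p0) / se0
    p_lower = NormalDist().cdf(z)      # Ha: p < p0
    p_upper = 1 - p_lower              # Ha: p > p0
    p_two_sided = 2 * min(p_lower, p_upper)  # Ha: p != p0
    return z, p_lower, p_two_sided, p_upper


z, p_lower, p_two_sided, p_upper = one_sample_prop_ztest(0.245, 200, 0.265)
print(f"z = {z:.3f}")
print(f"P<z = {p_lower:.4f}  P>|z| = {p_two_sided:.4f}  P>z = {p_upper:.4f}")
```

The two-sided p-value is far above .05, which is why the conclusion is to fail to reject Ho.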
How to Conduct a Test Comparing Two Proportions
Ho: female hmath = male hmath
Ha: female hmath ~= male hmath
Check the premises & the data:
. tab female hmath
female=1 |     math>=60
male=0   |      0       1 |  Total
---------+----------------+-------
male     |     68      23 |     91
female   |     83      26 |    109
---------+----------------+-------
Total    |    151      49 |    200
State the hypotheses.
Ho: female hmath = male hmath
Ha: female hmath ~= male hmath
Step 3: test the hypothesis.
. prtest hmath, by(female)
Two-sample test of proportion
male: Number of obs = 91
female: Number of obs = 109
Variable |      Mean   Std. Err.         z    P>|z|    [95% Conf. Interval]
male     |  .2527473   .0455571    5.54792   0.0000    .1634569   .3420376
female   |  .2385321   .0408212    5.84334   0.0000     .158524   .3185402
diff     |  .0142151   .0611704                       -.1056767    .134107
         | under Ho:   .0610714    .232763   0.8159
Ho: proportion(male) - proportion(female) = diff = 0
Ha: diff < 0            Ha: diff ~= 0           Ha: diff > 0
z = 0.233               z = 0.233               z = 0.233
P < z = 0.5920          P > |z| = 0.8159        P > z = 0.4080
Conclusion: Fail to reject Ho.
Substantive conclusions? Next research steps?
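The "under Ho" row of the two-sample output can be reproduced by hand from the cell counts in the cross-tabulation (23 of 91 males and 26 of 109 females in honors math). The Python sketch below assumes the standard pooled-proportion z test; the function name `two_sample_prop_ztest` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_prop_ztest(x1, n1, x2, n2):
    """Pooled two-sample z test of Ho: p1 - p2 = 0, from success counts."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)     # common proportion assumed under Ho
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, se0, z, p_two_sided


# Males: 23 of 91 in honors math; females: 26 of 109.
diff, se0, z, p_two_sided = two_sample_prop_ztest(23, 91, 26, 109)
print(f"diff = {diff:.7f}, SE under Ho = {se0:.7f}")
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```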
Here’s an after vs. before example.
Does a summer math course significantly increase the proportion of students who qualify for honors math?
Check the sample premises.
Display the data proportion.
Test the hypothesis:
Ho: post-test honors proportion = pre-test honors proportion (i.e. difference = 0)
Ha: post-test honors proportion > pre-test honors proportion (i.e. difference > 0)
. prtesti 191 .271 200 .245
Two-sample test of proportion
x: Number of obs = 191
y: Number of obs = 200
------------------------------------------------------------------------------
Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .271 .0321612 .2079653 .3340347
y | .245 .0304118 .1853941 .3046059
-------------+----------------------------------------------------------------
diff | .026 .044263 -.0607539 .1127539
| under Ho: .0442491 0.59 0.557
------------------------------------------------------------------------------
Ho: proportion(x) - proportion(y) = diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
z = 0.588 z = 0.588 z = 0.588
P < z = 0.7216 P > |z| = 0.5568 P > z = 0.2784
Test conclusion?
Results are always uncertain.
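The prtesti output above uses two different standard errors: an unpooled one for the confidence interval of the difference, and a pooled one ("under Ho") for the test. The Python sketch below works through both from the summary figures (n = 191 with proportion .271; n = 200 with proportion .245). The function name `compare_proportions` is hypothetical, and the formulas are the standard large-sample ones.

```python
from math import sqrt
from statistics import NormalDist


def compare_proportions(p1, n1, p2, n2, level=0.95):
    """CI for p1 - p2 (unpooled SE) and one-sided test of Ha: diff > 0."""
    diff = p1 - p2
    # Unpooled SE: used for the confidence interval of the difference.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)
    # Pooled SE: used for the test, because Ho says the proportions are equal.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_upper = 1 - NormalDist().cdf(z)  # Ha: diff > 0
    return se_unpooled, ci, se_pooled, z, p_upper


se_unpooled, ci, se_pooled, z, p_upper = compare_proportions(0.271, 191, 0.245, 200)
print(f"95% CI for diff = ({ci[0]:.7f}, {ci[1]:.7f})")
print(f"z = {z:.3f}, P>z = {p_upper:.4f}")
```

Because the alternative here is one-sided (post-test proportion greater than pre-test), the relevant p-value is the `P > z` column.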