Posted on 30-Mar-2018
Survey commands in Stata
Carlo Azzarri, DECRG
Sample survey: Albania 2005 LSMS
4 strata (Central, Coastal, Mountain, Tirana)
455 Primary Sampling Units (PSU)
8 households (HHs) per PSU * 455 PSUs = 3,640 HHs
. svyset PSU [pw=popw], str(stratum)
pweight: “popw”
VCE: linearized
Single unit: missing
Strata 1: “stratum”
SU 1: “PSU”
FPC 1: <zero>
svy supports many estimation commands:
mean, proportion, ratio, total
cnreg, cnsreg, glm, intreg, nl, regress (OLS), tobit, treatreg, truncreg
stcox, streg
probit, logit, biprobit, cloglog, …, clogit, mlogit, mprobit, ologit, oprobit, slogit
nbreg, poisson, zip, zinb
IV regression, ivprobit, ivtobit
heckman, heckprob
svy command: general syntax
average, proportion, model estimation
Examples
. mean TOTPCCONS [pw=popw], over(urban)
Mean estimation Number of obs = 3638
Urban: urban = Urban
Rural: urban = Rural
--------------------------------------------------------------
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
TOTPCCONS |
Urban | 12094.48 203.4169 11695.66 12493.3
Rural | 8160.521 125.9879 7913.507 8407.535
--------------------------------------------------------------
average
. svy: mean TOTPCCONS, over(urban)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 4 Number of obs = 3638
Number of PSUs = 455 Population size = 3068195
Design df = 451
Urban: urban = Urban
Rural: urban = Rural
--------------------------------------------------------------
| Linearized
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
TOTPCCONS |
Urban | 12094.48 296.8412 11511.12 12677.84
Rural | 8160.521 209.3881 7749.024 8572.018
--------------------------------------------------------------
svy command: average
. estat effects, deff srssubpop (in Stata 10…)
Urban: urban = Urban
Rural: urban = Rural
------------------------------------------------
| Linearized
Over | Mean Std. Err. DEFF
-------------+----------------------------------
TOTPCCONS |
Urban | 12094.48 296.8412 2.82798
Rural | 8160.521 209.3881 3.5515
------------------------------------------------
This value means that the variance of the estimated urban mean is 2.8 times larger than it would be if a sample of the same size had been selected by simple random sampling (SRS).
svy command: average (DEFF)
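Because DEFF is a ratio of variances, its square root (often called DEFT) is the factor by which the standard error is inflated. The sketch below only re-does that arithmetic on the urban row reported above; the variable names are illustrative.

```python
import math

# Urban row reported by `svy: mean` and `estat effects, deff srssubpop`
se_design = 296.8412  # linearized std. err. of the urban mean of TOTPCCONS
deff = 2.82798        # design variance / SRS variance

# DEFF compares variances, so the std. err. inflation factor is its square root
deft = math.sqrt(deff)

# Std. err. that an SRS of the same size would have, as implied by DEFF
se_srs = se_design / deft

print(f"DEFT = {deft:.4f}")
print(f"implied SRS std. err. = {se_srs:.2f}")
```

In words: clustering and stratification make the urban standard error roughly 1.7 times what an SRS of the same size would give.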
Differences?
mean is the same
std. error is higher
C.I. widens: urban C.I. 11,696-12,493 (w/out sampling design) vs. 11,511-12,678 (w/ sampling design)
a statistically significant difference between groups becomes less likely, because wider C.I.s are more likely to overlap (not in this case)
design effect
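The widening of the interval can be traced almost entirely to the larger standard error. The sketch below takes the urban numbers from the two tables above, recovers the critical value each interval implies, and computes the design degrees of freedom with the usual rule for stratified cluster samples (number of PSUs minus number of strata), which matches the "Design df = 451" in the `svy` header. The dictionary layout and function name are illustrative.

```python
# Urban mean of TOTPCCONS as reported by Stata
naive = {"se": 203.4169, "ci": (11695.66, 12493.3)}    # mean ... [pw=popw]
design = {"se": 296.8412, "ci": (11511.12, 12677.84)}  # svy: mean

# Design degrees of freedom for a stratified cluster sample: PSUs minus strata
n_psu, n_strata = 455, 4
design_df = n_psu - n_strata


def implied_critical_value(est):
    """C.I. half-width divided by the std. err., i.e. the t quantile used."""
    lo, hi = est["ci"]
    return (hi - lo) / 2 / est["se"]


widening = (design["ci"][1] - design["ci"][0]) / (naive["ci"][1] - naive["ci"][0])
se_ratio = design["se"] / naive["se"]

print("design df:", design_df)
print(f"critical value, naive:  {implied_critical_value(naive):.3f}")
print(f"critical value, design: {implied_critical_value(design):.3f}")
print(f"C.I. width ratio {widening:.3f} vs std. err. ratio {se_ratio:.3f}")
```

Both critical values are about 1.96; the slightly larger one under the design reflects the smaller number of degrees of freedom (451 instead of several thousand).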
. ttest TOTPCCONS [aw=popw], by(urban)
. reg TOTPCCONS urban [aw=popw]
(sum of wgt is 3.0682e+06)
Source | SS df MS Number of obs = 3638
-------------+------------------------------ F( 1, 3636) = 358.24
Model | 1.3869e+10 1 1.3869e+10 Prob > F = 0.0000
Residual | 1.4076e+11 3636 38713907 R-squared = 0.0897
-------------+------------------------------ Adj R-squared = 0.0894
Total | 1.5463e+11 3637 42516582.5 Root MSE = 6222.1
------------------------------------------------------------------------------
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban | -3933.959 207.8452 -18.93 0.000 -4341.463 -3526.454
_cons | 16028.44 340.3616 47.09 0.000 15361.12 16695.76
------------------------------------------------------------------------------
average (test)
. svy: mean TOTPCCONS, over(urban)
. lincom [TOTPCCONS]Urban-[TOTPCCONS]Rural
( 1) [TOTPCCONS]Urban - [TOTPCCONS]Rural = 0
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 3933.959 365.0498 10.78 0.000 3216.549 4651.368
------------------------------------------------------------------------------
3,934 = 12,094 (urban) - 8,160 (rural)
svy command: average (test)
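The `lincom` estimate is just the difference of the two survey means; what the design changes is its standard error. The sketch below reproduces the point estimate and the t statistics, using the std. err. from `lincom` and, for comparison, the one from the weighted `reg` that ignores the design.

```python
urban_mean, rural_mean = 12094.48, 8160.521
se_design = 365.0498  # lincom after svy: mean
se_naive = 207.8452   # reg TOTPCCONS urban [aw=popw]

diff = urban_mean - rural_mean
t_design = diff / se_design
t_naive = diff / se_naive

print(f"difference = {diff:.3f}")
print(f"t (design) = {t_design:.2f}, t (naive) = {t_naive:.2f}")
```

The difference is still highly significant, but the t statistic is roughly halved once the sampling design is taken into account.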
. svy: reg TOTPCCONS urban
Survey: Linear regression
Number of strata = 4 Number of obs = 3638
Number of PSUs = 455 Population size = 3068194.7
Design df = 451
F( 1, 451) = 116.13
Prob > F = 0.0000
R-squared = 0.0897
------------------------------------------------------------------------------
| Linearized
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban | -3933.959 365.0498 -10.78 0.000 -4651.368 -3216.549
_cons | 16028.44 631.5923 25.38 0.000 14787.21 17269.67
------------------------------------------------------------------------------
model estimation (w/ dummy)
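The intercept of 16,028 is not one of the group means. The fitted values reproduce the two survey means exactly if `urban` is coded 1 = Urban and 2 = Rural; that coding is an inference from the numbers, not something the slides state. Under it, the slope is the rural-minus-urban gap, which is why its std. err. (365.0498) equals the one from `lincom`.

```python
# Coefficients from `svy: reg TOTPCCONS urban`
cons, b_urban = 16028.44, -3933.959

# ASSUMPTION (inferred, not stated in the source): urban = 1 for Urban, 2 for Rural
fitted = {code: cons + b_urban * code for code in (1, 2)}

print(f"fitted at urban=1: {fitted[1]:.3f}")  # compare urban mean 12094.48
print(f"fitted at urban=2: {fitted[2]:.3f}")  # compare rural mean 8160.521
```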
. proportion poor [pw=popw], over(urban)
Proportion estimation Number of obs = 3638
no: poor = no
yes: poor = yes
Urban: urban = Urban
Rural: urban = Rural
--------------------------------------------------------------
Over | Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
no |
Urban | .8881416 .009669 .8691843 .9070988
Rural | .7575375 .013601 .7308709 .7842041
-------------+------------------------------------------------
yes |
Urban | .1118584 .009669 .0929012 .1308157
Rural | .2424625 .013601 .2157959 .2691291
--------------------------------------------------------------
proportion
. svy: mean poor, over(urban)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 4 Number of obs = 3638
Number of PSUs = 455 Population size = 3068195
Design df = 451
Urban: urban = Urban
Rural: urban = Rural
--------------------------------------------------------------
| Linearized
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
poor |
Urban | .1118584 .0108146 .0906052 .1331116
Rural | .2424625 .0195528 .2040366 .2808883
--------------------------------------------------------------
svy command: proportion
. estat effects, deff srssubpop (in Stata 10…)
Urban: urban = Urban
Rural: urban = Rural
------------------------------------------------
| Linearized
Over | Mean Std. Err. DEFF
-------------+----------------------------------
poor |
Urban | .1118584 .0108146 2.35214
Rural | .2424625 .0195528 3.40943
------------------------------------------------
A DEFF of 2.35 means that a simple random sample would need only 1/2.35 (about 43%) as many observations as this cluster sample to measure the urban poverty headcount (PHC) with the same precision.
svy command: proportion (DEFF)
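Put the other way round, a clustered sample of n observations carries about as much information as an SRS of n / DEFF observations. The sketch below applies that ratio to the urban PHC design effect; the sample size of 1,000 is purely hypothetical and only illustrates the arithmetic.

```python
def srs_equivalent_n(n_cluster: float, deff: float) -> float:
    """Observations an SRS would need to match a clustered sample of size n_cluster."""
    return n_cluster / deff


deff_urban_phc = 2.35214  # from estat effects, deff srssubpop
n_hypothetical = 1000     # HYPOTHETICAL clustered sample size, for illustration only

print(round(srs_equivalent_n(n_hypothetical, deff_urban_phc), 1))
```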
. ttest poor [aw=popw], by(urban)
. reg poor urban [aw=popw]
(sum of wgt is 3.0682e+06)
Source | SS df MS Number of obs = 3638
-------------+------------------------------ F( 1, 3636) = 104.20
Model | 15.286226 1 15.286226 Prob > F = 0.0000
Residual | 533.38963 3636 .146696818 R-squared = 0.0279
-------------+------------------------------ Adj R-squared = 0.0276
Total | 548.675856 3637 .15085946 Root MSE = .38301
------------------------------------------------------------------------------
poor | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban | .1306041 .0127943 10.21 0.000 .1055193 .1556888
_cons | -.0187456 .0209516 -0.89 0.371 -.0598237 .0223325
------------------------------------------------------------------------------
proportion (test)
. svy: mean poor, over(urban)
. lincom [poor]Urban-[poor]Rural
( 1) [poor]Urban - [poor]Rural = 0
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -.1306041 .0223587 -5.84 0.000 -.1745441 -.086664
------------------------------------------------------------------------------
svy command: proportion (test)
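As with consumption, the urban-rural gap in the poverty headcount is identical with and without the design (up to the sign convention); only its precision changes. The sketch below recomputes the gap from the survey proportions and compares the two t statistics.

```python
# Urban-rural gap in the poverty headcount
gap = 0.1118584 - 0.2424625  # lincom estimate: -.1306041
se_naive = 0.0127943         # reg poor urban [aw=popw]
se_design = 0.0223587        # lincom after svy: mean

t_naive = abs(gap) / se_naive
t_design = abs(gap) / se_design

print(f"gap = {gap:.7f}")
print(f"|t| naive = {t_naive:.2f}, |t| design = {t_design:.2f}")
print(f"std. err. ratio = {se_design / se_naive:.2f}")
```

The design-based standard error is about 1.75 times the naive one, so the t statistic falls from about 10 to under 6.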
. reg TOTPCCONS TOTPCINCOME [pw=popw]
------------------------------------------------------------------------------
| Robust
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
TOTPCINCOME | .1676896 .0150931 11.11 0.000 .1380979 .1972814
_cons | 7680.692 200.8623 38.24 0.000 7286.878 8074.506
------------------------------------------------------------------------------
. svy: reg TOTPCCONS TOTPCINCOME
(running regress on estimation sample)
------------------------------------------------------------------------------
| Linearized
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
TOTPCINCOME | .1676896 .0150648 11.13 0.000 .1380838 .1972954
_cons | 7680.692 238.9625 32.14 0.000 7211.074 8150.31
------------------------------------------------------------------------------
model estimation (actual sample)
[Figure: Standard Errors. S.E. of the prediction plotted against TOTPCINCOME, w/ actual sample]
model estimation (4 times actual sample)
[Figure: Standard Errors. S.E. of the prediction plotted against TOTPCINCOME, two-stage stratified SRS w/ 4 times the actual sample]
. reg TOTPCCONS TOTPCINCOME [pw=popw]
------------------------------------------------------------------------------
| Robust
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
TOTPCINCOME | .1676896 .007545 22.23 0.000 .1529005 .1824787
_cons | 7680.692 100.4104 76.49 0.000 7483.875 7877.509
------------------------------------------------------------------------------
. svy: reg TOTPCCONS TOTPCINCOME
(running regress on estimation sample)
------------------------------------------------------------------------------
| Linearized
TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
TOTPCINCOME | .1676896 .0150648 11.13 0.000 .1380838 .1972954
_cons | 7680.692 238.9625 32.14 0.000 7211.074 8150.31
------------------------------------------------------------------------------
model estimation (4 times actual sample)
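Comparing these tables with the actual-sample ones: with four times the observations, the robust standard errors fall by almost exactly sqrt(4) = 2, as if every added observation were independent, while the linearized `svy` standard errors are unchanged. The sketch below only checks those ratios from the reported numbers; the slides do not say exactly how the larger sample was constructed.

```python
import math

# Robust std. errs. from `reg TOTPCCONS TOTPCINCOME [pw=popw]`
robust_actual = {"TOTPCINCOME": 0.0150931, "_cons": 200.8623}
robust_4x = {"TOTPCINCOME": 0.007545, "_cons": 100.4104}

# Linearized std. errs. from `svy: reg TOTPCCONS TOTPCINCOME` (same in both tables)
svy_actual = {"TOTPCINCOME": 0.0150648, "_cons": 238.9625}
svy_4x = {"TOTPCINCOME": 0.0150648, "_cons": 238.9625}

for term in robust_actual:
    shrink_robust = robust_actual[term] / robust_4x[term]
    shrink_svy = svy_actual[term] / svy_4x[term]
    print(f"{term}: robust SE shrinks by {shrink_robust:.3f}, "
          f"svy SE by {shrink_svy:.3f}, sqrt(4) = {math.sqrt(4):.3f}")
```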
Main message
“Respondents in the same cluster are likely to be somewhat similar to one another.” As a result, in a clustered sample “selecting an additional member from the same cluster adds less new information than would a completely independent selection” (Health Survey for England: The Health of Young People '95-97)
Point estimates (statistics and parameters) do not differ as long as weights are used, but standard errors do. So always take the sampling design into account; otherwise inference will be inaccurate or wrong.
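One way to put a number on "somewhat similar" is Kish's approximation DEFF ≈ 1 + (m - 1) * rho, where m is the number of interviews per cluster and rho the intra-cluster correlation. The sketch below inverts it with m = 8 HHs per PSU for the DEFFs reported above. This is a rough illustration under that approximation: it attributes the whole design effect to clustering and ignores the contribution of weighting and stratification.

```python
def implied_icc(deff: float, cluster_size: float) -> float:
    """Invert Kish's approximation deff ~ 1 + (m - 1) * rho to get rho."""
    return (deff - 1) / (cluster_size - 1)


m = 8  # households per PSU in the Albania 2005 LSMS

reported_deffs = [
    ("TOTPCCONS, urban", 2.82798),
    ("TOTPCCONS, rural", 3.5515),
    ("poor, urban", 2.35214),
    ("poor, rural", 3.40943),
]

for label, deff in reported_deffs:
    print(f"{label}: implied rho ~ {implied_icc(deff, m):.2f}")
```

Even modest within-cluster correlation (rho around 0.2 to 0.35 here) is enough to more than double the variance when eight households are interviewed per PSU.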