Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean...

21
Survey commands in STATA Carlo Azzarri DECRG

Transcript of Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean...

Page 1: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

Survey commands in STATA

Carlo AzzarriDECRG

Page 2: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

Sample survey: Albania 2005 LSMS

4 strata (Central, Coastal, Mountain, Tirana)

455 Primary Sampling Units (PSU)

8 HHs by PSU * 455 = 3,640 HHs

Page 3: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svyset PSU [pw=popw], str(stratum)

pweight: “popw”

VCE: linearized

Single unit: missing

Strata 1: “stratum”

SU 1: “PSU”

FPC 1: <zero>

.svy supports many estimation commands:mean, proportion, ratio, totalcnreg, cnsreg, glm, intreg, nl, ols, tobit, treatreg, truncregstcox, stregprobit, logit, biprobit, cloglog,…clogit, mlogit, mprobit, oloig oprobit, slogitnbreg, poisson, zip, zinp. Ivols, ivprobit, ivtobit. Heckman, heckprob

svy command: general syntax

Page 4: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

averageproportionmodel estimation

Examples

Page 5: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. mean TOTPCCONS [pw=popw], over(urban)

Mean estimation Number of obs = 3638

Urban: urban = Urban

Rural: urban = Rural

--------------------------------------------------------------

Over | Mean Std. Err. [95% Conf. Interval]

-------------+------------------------------------------------

TOTPCCONS |

Urban | 12094.48 203.4169 11695.66 12493.3

Rural | 8160.521 125.9879 7913.507 8407.535

--------------------------------------------------------------

average

Page 6: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svy: mean TOTPCCONS, over(urban)

(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 4 Number of obs = 3638

Number of PSUs = 455 Population size = 3068195

Design df = 451

Urban: urban = Urban

Rural: urban = Rural

--------------------------------------------------------------

| Linearized

Over | Mean Std. Err. [95% Conf. Interval]

-------------+------------------------------------------------

TOTPCCONS |

Urban | 12094.48 296.8412 11511.12 12677.84

Rural | 8160.521 209.3881 7749.024 8572.018

--------------------------------------------------------------

svy command: average

Page 7: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. estat effects, deff srssubpop (in Stata 10…)

Urban: urban = Urban

Rural: urban = Rural

------------------------------------------------

| Linearized

Over | Mean Std. Err. DEFF

-------------+----------------------------------

TOTPCCONS |

Urban | 12094.48 296.8412 2.82798

Rural | 8160.521 209.3881 3.5515

------------------------------------------------

This value means that the sample variance is 2.8 times bigger than it would be if the survey were based on the same sample size but selected randomly

svy command: average (DEFF)

Page 8: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

differences?mean is the samestd. error is higherC.I. widensurban C.I. 11,696-12,493 (w/out sampling design)

11,511-12,678 (w/ sampling design)statistical difference between groups less likely because of overlap (not in this case)design effect

Page 9: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. ttest TOTPCCONS [aw=popw], by(urban)

. reg TOTPCCONS urban [aw=popw]

(sum of wgt is 3.0682e+06)

Source | SS df MS Number of obs = 3638

-------------+------------------------------ F( 1, 3636) = 358.24

Model | 1.3869e+10 1 1.3869e+10 Prob > F = 0.0000

Residual | 1.4076e+11 3636 38713907 R-squared = 0.0897

-------------+------------------------------ Adj R-squared = 0.0894

Total | 1.5463e+11 3637 42516582.5 Root MSE = 6222.1

------------------------------------------------------------------------------

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

urban | -3933.959 207.8452 -18.93 0.000 -4341.463 -3526.454

_cons | 16028.44 340.3616 47.09 0.000 15361.12 16695.76

------------------------------------------------------------------------------

average (test)

Page 10: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svy: mean TOTPCCONS, over(urban)

. lincom [TOTPCCONS]Urban-[TOTPCCONS]Rural

( 1) [TOTPCCONS]Urban - [TOTPCCONS]Rural = 0

------------------------------------------------------------------------------

| Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

(1) | 3933.959 365.0498 10.78 0.000 3216.549 4651.368

------------------------------------------------------------------------------

3,934 = 12,094 (urban) - 8,160 (rural)

svy command: average (test)

Page 11: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svy: reg TOTPCCONS urban

Survey: Linear regression

Number of strata = 4 Number of obs = 3638Number of PSUs = 455 Population size = 3068194.7

Design df = 451F( 1, 451) = 116.13Prob > F = 0.0000R-squared = 0.0897

------------------------------------------------------------------------------| Linearized

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+----------------------------------------------------------------

urban | -3933.959 365.0498 -10.78 0.000 -4651.368 -3216.549_cons | 16028.44 631.5923 25.38 0.000 14787.21 17269.67

------------------------------------------------------------------------------

model estimation (w/ dummy)

Page 12: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. proportion poor [pw=popw], over(urban)

Proportion estimation Number of obs = 3638

no: poor = noyes: poor = yes

Urban: urban = Urban

Rural: urban = Rural

--------------------------------------------------------------Over | Proportion Std. Err. [95% Conf. Interval]

-------------+------------------------------------------------no |

Urban | .8881416 .009669 .8691843 .9070988Rural | .7575375 .013601 .7308709 .7842041

-------------+------------------------------------------------yes |

Urban | .1118584 .009669 .0929012 .1308157Rural | .2424625 .013601 .2157959 .2691291

--------------------------------------------------------------

proportion

Page 13: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svy: mean poor, over(urban)

(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 4 Number of obs = 3638

Number of PSUs = 455 Population size = 3068195

Design df = 451

Urban: urban = Urban

Rural: urban = Rural

--------------------------------------------------------------

| Linearized

Over | Mean Std. Err. [95% Conf. Interval]

-------------+------------------------------------------------

poor |

Urban | .1118584 .0108146 .0906052 .1331116

Rural | .2424625 .0195528 .2040366 .2808883

--------------------------------------------------------------

svy command: proportion

Page 14: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. estat effects, deff srssubpop (in Stata 10…)

Urban: urban = Urban

Rural: urban = Rural

------------------------------------------------

| Linearized

Over | Mean Std. Err. DEFF

-------------+----------------------------------

poor |

Urban | .1118584 .0108146 2.35214

Rural | .2424625 .0195528 3.40943

------------------------------------------------

Only 1/2.35 as many observations would be needed to measure the urban PHC if a simple random sample were used (instead of the cluster sample with the design effect of 2.35)

svy command: proportion (DEFF)

Page 15: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. ttest poor [aw=popw], by(urban)

. reg poor urban [aw=popw](sum of wgt is 3.0682e+06)

Source | SS df MS Number of obs = 3638-------------+------------------------------ F( 1, 3636) = 104.20

Model | 15.286226 1 15.286226 Prob > F = 0.0000Residual | 533.38963 3636 .146696818 R-squared = 0.0279

-------------+------------------------------ Adj R-squared = 0.0276Total | 548.675856 3637 .15085946 Root MSE = .38301

------------------------------------------------------------------------------poor | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------urban | .1306041 .0127943 10.21 0.000 .1055193 .1556888_cons | -.0187456 .0209516 -0.89 0.371 -.0598237 .0223325

------------------------------------------------------------------------------

proportion (test)

Page 16: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. svy: mean poor, over(urban)

. lincom [poor]Urban-[poor]Rural

( 1) [poor]Urban - [poor]Rural = 0

------------------------------------------------------------------------------

| Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

(1) | -.1306041 .0223587 -5.84 0.000 -.1745441 -.086664

------------------------------------------------------------------------------

svy command: proportion (test)

Page 17: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. reg TOTPCCONS TOTPCINCOME [pw=popw]

------------------------------------------------------------------------------

| Robust

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

TOTPCINCOME | .1676896 .0150931 11.11 0.000 .1380979 .1972814

_cons | 7680.692 200.8623 38.24 0.000 7286.878 8074.506

------------------------------------------------------------------------------

. svy: reg TOTPCCONS TOTPCINCOME

(running regress on estimation sample)

------------------------------------------------------------------------------

| Linearized

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

TOTPCINCOME | .1676896 .0150648 11.13 0.000 .1380838 .1972954

_cons | 7680.692 238.9625 32.14 0.000 7211.074 8150.31

------------------------------------------------------------------------------

model estimation (actual sample)

Page 18: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

model estimation (actual sample)0

100

02

000

300

04

000

S.E

. of

the

pre

dict

ion

0 10 000 0 200 000 3 000 00TOT PCIN CO ME

w/ a ctual s am ple

Standard Errors

Page 19: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

model estimation (4 times actual sample)0

100

02

000

300

04

000

S.E

. of

the

pre

dict

ion

0 10 000 0 200 000 3 000 00TOT PCIN CO ME

Two -sta ge strat ified SRS

w/ 4 t im es the actual sample

Standard Errors

Page 20: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

. reg TOTPCCONS TOTPCINCOME [pw=popw]

------------------------------------------------------------------------------| Robust

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf.Interval]-------------+----------------------------------------------------------------TOTPCINCOME | .1676896 .007545 22.23 0.000 .1529005 .1824787

_cons | 7680.692 100.4104 76.49 0.000 7483.875 7877.509------------------------------------------------------------------------------

. svy: reg TOTPCCONS TOTPCINCOME(running regress on estimation sample)

------------------------------------------------------------------------------| Linearized

TOTPCCONS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+----------------------------------------------------------------TOTPCINCOME | .1676896 .0150648 11.13 0.000 .1380838 .1972954

_cons | 7680.692 238.9625 32.14 0.000 7211.074 8150.31------------------------------------------------------------------------------

model estimation (4 times actual sample)

Page 21: Survey commands in STATA - World Banksiteresources.worldbank.org/INTPOVRES/Resources/477227...Mean estimation Number of obs = 3638 ... Proportion estimation Number of obs = 3638 no:

Main message“Respondents in the same cluster are likely to be somewhat similar to one another”. As a result, in a clustered sample “selecting an additional member from the same cluster adds less new information than would a completely independent selection” (Health Survey for England: The Health of Young People '95 – 97)

Statistics and parameters do not differ (as long as weights are used), but standard errors do, so……always take sampling design into account, otherwise inaccurate/wrong inference