Lynn Lethbridge SHRUG November, 2010. What is Bootstrapping? A method to estimate a statistic’s...
Lynn Lethbridge
SHRUG November, 2010
What is Bootstrapping?
A method to estimate a statistic’s sampling distribution
Bootstrap samples are drawn repeatedly with replacement from the original data
From each new sample, the statistic is re-calculated and saved in a dataset (e.g., 200 bootstraps yield 200 statistics)
The standard error of the statistic is calculated as the standard deviation of the bootstrap statistics
Bootstrapping is not used for the point estimate
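The steps above can be sketched in a few lines. This is a minimal Python illustration, not the SAS used in the talk; the data values, the function name `bootstrap_se`, and the seed are made up for the example. It draws 200 resamples with replacement, recomputes the mean on each, and takes the standard deviation of those 200 values as the standard error. The point estimate is still taken from the original sample.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=200, seed=2010):
    """Return (SE, replicates): SD of `statistic` over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Each resample has the same size as the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates), estimates

# Hypothetical sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]

point_estimate = statistics.mean(sample)  # from the original sample, not the bootstrap
se, boot_stats = bootstrap_se(sample, statistics.mean)
print(f"point estimate = {point_estimate:.3f}, bootstrap SE = {se:.3f}")
```

Any function of a sample can be passed as `statistic`, which is what makes the approach useful for statistics without a convenient variance formula.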
When to Use Bootstrapping
Distribution has no clear analytical solution
e.g., Gini coefficient, poverty intensity
Test for sensitivity
Complex survey design (not a simple random sample)
e.g., Statistics Canada surveys are a stratified, multistage design
Households within clusters within strata are selected
Observations will not be independent, so variance calculated the usual way will be underestimated
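As an illustration of the first case, here is a Python sketch (not from the talk) that bootstraps the standard error of a Gini coefficient. The income values, function names, and seed are hypothetical. The Gini coefficient is computed from the mean absolute difference over all ordered pairs of observations.

```python
import random
import statistics

def gini(values):
    """Gini coefficient: sum of |a - b| over all ordered pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)

def bootstrap_gini_se(values, n_boot=500, seed=1):
    """SD of the Gini coefficient over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    replicates = [gini(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

# Hypothetical incomes, for illustration only.
incomes = [12, 15, 18, 22, 25, 31, 40, 52, 75, 140]

gini_point = gini(incomes)
gini_se = bootstrap_gini_se(incomes)
print(f"Gini = {gini_point:.3f}, bootstrap SE = {gini_se:.3f}")
```

This simple resampling treats the observations as independent draws. It does not handle the complex survey design case, where the agency-supplied bootstrap weights are the appropriate tool.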
Two Programs
One is ‘traditional’ bootstrapping: re-sampling from the original sample
The second is bootstrapping using Statistics Canada survey data
Statistics Canada does the re-sampling heavy lifting in most of its surveys
Use the bootstrap weights provided to calculate the standard error
Program 1
Project where we examined the effect of trade on ‘poverty intensity’ in Canada/US
Used state/province level measures in regression analysis
Used bootstrapping to measure robustness of results given a different mix of policies
Our dataset consists of 61 unique observations of states and provinces. Re-sample to see if results are affected if we had a different make-up of regions
    /** run the regression with original sample to get point estimates */
    proc reg data=orig.pov97
             outest=work.estpoint(keep=intercept lmurate aveuiben tradeimp tradeexp sambearn can);
      model sst = lmurate aveuiben tradeimp tradeexp sambearn can;
      weight invse;
      title " 1997";
    run;

    proc transpose data=work.estpoint
                   out=work.estpoint2(drop=_label_ rename=(col1=coef));
    run;
    /* put sample size in a macro */
    proc means data=orig.pov97 noprint;
      var year;
      output out=work.out n=totnum;
    run;

    data _null_;
      set work.out;
      call symput ('totnum', totnum);
    run;
    /** make a temporary file of original dataset */
    data work.pov97;
      set orig.pov97;
    run;

    /* initiate bootstrap dataset */
    data work.boot97fin;
      set _null_;
    run;

    options nonotes;

    /* create macro for number of bootstraps */
    %let bt=1000;
    %macro boot;
    /** construct new sample of 61 observations - randomly drawn with replacement */
    data work.boot;
      do i=1 to &totnum;
        _p=ceil(ranuni(i+&x)*&totnum);
        do obsnum=_p to _p;
          set work.pov97 point=obsnum;
          if _error_ then abort;
          output;
        end;
      end;
      stop;
    run;
    /* estimate coefficients from bootstrap sample*/
    proc reg data=work.boot noprint
             outest=work.est(keep=intercept lmurate aveuiben tradeimp tradeexp sambearn can);
      model sst = lmurate aveuiben tradeimp tradeexp sambearn can;
      weight invse;
      title " 1997";
    run;

    /** add coefficients to dataset */
    data work.boot97fin;
      set work.boot97fin work.est;
    run;
    %mend boot;
    /** invoke the boot macro 1000 times */
    %macro reps;
      %do x=1 %to &bt;
        %boot;
      %end;
    %mend reps;
    %reps;
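The same resample-and-re-estimate loop can be sketched in Python (a deliberately simplified stand-in, not the talk's SAS). Everything here is an assumption made for illustration: the data are synthetic, the model has a single regressor, no weights are used, and the names `ols_slope` and `bootstrap_slopes` are hypothetical. Whole rows are resampled with replacement, the slope is re-estimated on each resample, and the 1000 slopes are collected.

```python
import random
import statistics

def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    mean_x = statistics.fmean(x for x, _ in pairs)
    mean_y = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

def bootstrap_slopes(pairs, n_boot=1000, seed=1997):
    """Re-estimate the slope on n_boot resamples of whole (x, y) rows."""
    rng = random.Random(seed)
    n = len(pairs)
    slopes = []
    for _ in range(n_boot):
        # Draw n row indices with replacement, keeping x and y together.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        slopes.append(ols_slope(resample))
    return slopes

# Synthetic data: 61 rows with a true slope of 2 (made up for the example).
data_rng = random.Random(61)
pairs = [(i / 10, 0.5 + 2.0 * (i / 10) + data_rng.gauss(0, 1)) for i in range(61)]

point_slope = ols_slope(pairs)        # point estimate from the original rows
slopes = bootstrap_slopes(pairs)      # one slope per bootstrap sample
slope_se = statistics.stdev(slopes)   # bootstrap standard error
print(f"slope = {point_slope:.3f}, bootstrap SE = {slope_se:.3f}")
```

Resampling whole rows keeps each region's outcome paired with its own covariates, which matches the idea of asking how the results would change with a different make-up of regions.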
    options notes;

    /** calculate the standard deviation of each bootstrapped coefficient */
    proc means data=work.boot97fin n mean std;
      output out=work.std std=intercept lmurate aveuiben tradeimp tradeexp sambearn can;
    run;

    proc transpose data=work.std(drop=_type_ _freq_)
                   out=work.std2(drop=_label_ rename=(col1=se));
    run;
    /** merge point estimates together with standard errors and calculate statistics */
    data work.final;
      merge work.estpoint2 work.std2;
      t=coef/se;
      pvalue=(1-probnorm(abs(t)))*2;
    run;

    proc print data=work.final;
    run;
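The test statistic in this step is the point estimate divided by the bootstrap standard error, with a two-sided p-value taken from the standard normal distribution. A small Python sketch of the same arithmetic follows (the function name `z_test` is hypothetical). The inputs are the lmurate coefficient and its bootstrap standard error as reported in the deck's final output.

```python
from statistics import NormalDist

def z_test(coef, se):
    """Return (t, two-sided p-value) using the standard normal, mirroring (1-probnorm(abs(t)))*2."""
    t = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# lmurate: coefficient from the original sample, SE from the bootstrap.
t_stat, p_value = z_test(0.062098, 0.01782)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```

Using the normal rather than a t distribution is the choice made in the SAS code. With a small sample the two can differ noticeably, so this is best read as an approximation.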
    Parameter Estimates

                      Parameter     Standard
    Variable   DF      Estimate        Error   t Value   Pr > |t|
    Intercept   1       0.05648      0.02317      2.44     0.0181
    lmurate     1       0.06210      0.01433      4.33     <.0001
    aveuiben    1   -0.00009479   0.00003002     -3.16     0.0026
    tradeimp    1      -0.07186      0.12541     -0.57     0.5690
    tradeexp    1       0.02107      0.13190      0.16     0.8737
    sambearn    1      -0.06155      0.04973     -1.24     0.2212
    can         1      -0.03489      0.02739     -1.27     0.2081
    1997
    The MEANS Procedure

    Variable    Label           N            Mean        Std Dev
    ------------------------------------------------------------
    Intercept   Intercept    1000       0.0581707      0.0305142
    lmurate                  1000       0.0616976      0.0178248
    aveuiben                 1000    -0.000101532    0.000037820
    tradeimp                 1000      -0.0258204      0.1743886
    tradeexp                 1000      -0.0355008      0.1880651
    sambearn                 1000      -0.0635708      0.0673242
    can                      1000      -0.0228619      0.0402765
    ------------------------------------------------------------
    Obs   _NAME_           coef        se          t    pvalue
      1   intercept    0.056482   0.03051    1.85102   0.06417
      2   lmurate      0.062098   0.01782    3.48378   0.00049
      3   aveuiben    -0.000095   0.00004   -2.50627   0.01220
      4   tradeimp    -0.071862   0.17439   -0.41208   0.68028
      5   tradeexp     0.021066   0.18807    0.11202   0.91081
      6   sambearn    -0.061547   0.06732   -0.91419   0.36062
      7   can         -0.034891   0.04028   -0.86628   0.38634
Program 2
Project using the National Longitudinal Survey of Children and Youth (NLSCY)
Examined the effect of having a child with disabilities on the health of mothers and fathers
Ordered probit utilizing Statistics Canada NLSCY bootstrap weights to estimate standard errors
Weighting
Many survey datasets include sampling weights so results will represent the population
The mechanics of using bootstrap weights are the same as for sampling weights
Each individual in the survey has a sample weight and all the bootstrap weights
Re-estimate your model or statistic over and over using a different weight each time
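The mechanics can be sketched in Python with a weighted mean standing in for the model (an illustration only, not the talk's SAS). All data below are synthetic, and the helper `fake_bootstrap_weights` is a hypothetical stand-in that builds weights from simple resampling counts. It is not the method a statistical agency uses to derive its bootstrap weights. The point is only the loop: one estimate per bootstrap weight, then the standard deviation of those estimates.

```python
import random
import statistics

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def replicate_weight_se(values, bootstrap_weights):
    """Re-estimate the statistic once per bootstrap weight; SE is the SD of the estimates."""
    estimates = [weighted_mean(values, w) for w in bootstrap_weights]
    return statistics.stdev(estimates)

def fake_bootstrap_weights(sample_weight, n_sets, rng):
    """Hypothetical stand-in: scale each sample weight by how often the row is redrawn."""
    n = len(sample_weight)
    weight_sets = []
    for _ in range(n_sets):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        weight_sets.append([w * c for w, c in zip(sample_weight, counts)])
    return weight_sets

# Synthetic survey: 200 respondents, each with an outcome and a sample weight.
rng = random.Random(42)
values = [rng.gauss(50, 10) for _ in range(200)]
sample_weight = [rng.uniform(50, 150) for _ in values]
bootstrap_weights = fake_bootstrap_weights(sample_weight, n_sets=100, rng=rng)

point = weighted_mean(values, sample_weight)          # point estimate uses the sample weight
se = replicate_weight_se(values, bootstrap_weights)   # SE uses the bootstrap weights
print(f"weighted mean = {point:.2f}, bootstrap SE = {se:.2f}")
```

With real survey files the bootstrap weight columns are supplied with the data, so only `weighted_mean` and `replicate_weight_se` would have counterparts in practice.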
Bootstrap Weight Derivation
Re-sampling -> A Miracle Occurs -> Bootstrap Weights
    /** macros to indicate the dependent variable and independent variables */
    %let depvar=momhealth00;
    %let indepvars=hhdis00 momage00 momlthigh00 momcertdip00 momunivdeg00 momimm
                   eqinc00 hhchlt500 kids01700 momvg94 momg94 momfp94 momsmokesdaily00;

    /** separate macro for the independent variables and intercept */
    %let allrhs=intercept_2 intercept_3 intercept_4 intercept_5 &indepvars;
    /*** get point estimates using sample weight */
    proc logistic data=nlscy.age615validboot descending
                  outest=work.point(keep=&allrhs);
      model &depvar= &indepvars / link=normit maxiter=50 rsq;
      weight dwtcwd1l / norm;
      where validdis=1;
      title " mom 2000 ";
    run;

    /** transpose the data which contains the point estimates */
    proc transpose data=work.point
                   out=work.pointtrans(drop=_label_ rename=(col1=coef));
    run;
    /** put data into memory */
    data work.age615validboot;
      set nlscy.age615validboot;
    run;

    /** create empty dataset for coefficients */
    data work.probitboot;
      set _null_;
    run;

    %global bt;
    %let bt=1000; /** 1000 bootstrap weights provided;*/
    %macro boot;
      options nonotes;
      %do i=1 %to &bt;
        proc logistic data=work.age615validboot noprint descending
                      outest=work.est(keep=&allrhs);
          model &depvar =&indepvars / link=normit maxiter=50 rsq;
          weight bsw&i / norm;
          where validdis=1;
          title " mom 2000 ";
        run;

        data work.probitboot;
          set work.probitboot work.est;
        run;
      %end;
      options notes;
    %mend boot;
    %boot;
    /** calculate the standard deviation */
    proc means data=work.probitboot n mean std;
      output out=work.std std=&allrhs;
    run;

    proc transpose data=work.std(drop=_type_ _freq_)
                   out=work.std2(drop=_label_ rename=(col1=se));
    run;
    data work.final;
      merge work.pointtrans work.std2;
      /** Wald chi square */
      z=coef/se;
      chi=z*z;
      pvaluechi=1-probchi(chi,1);
    run;

    proc print;
      title " married moms";
    run;
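The Wald statistic here is the squared ratio of the coefficient to its bootstrap standard error, referred to a chi-square distribution with one degree of freedom. A short Python sketch of that calculation follows (the function name `wald_chi_square` is hypothetical). For one degree of freedom the upper-tail probability equals `erfc(sqrt(chi / 2))`, which is also the two-sided normal p-value for `z`. The inputs are the hhdis00 coefficient and its bootstrap standard error from the deck's final output.

```python
import math
from statistics import NormalDist

def wald_chi_square(coef, se):
    """Return (chi-square, p-value) for a 1-df Wald test, mirroring 1-probchi(chi,1)."""
    z = coef / se
    chi = z * z
    p = math.erfc(math.sqrt(chi / 2))  # upper tail of chi-square with 1 df
    return chi, p

# hhdis00: coefficient from the sample-weighted fit, SE from the bootstrap weights.
chi, p_chi = wald_chi_square(0.30519, 0.09732)

# Cross-check: identical to a two-sided test of z against the standard normal.
p_normal = 2 * (1 - NormalDist().cdf(abs(0.30519 / 0.09732)))
print(f"chi-square = {chi:.3f}, p = {p_chi:.5f}, two-sided normal p = {p_normal:.5f}")
```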
    Analysis of Maximum Likelihood Estimates

                                        Standard        Wald
    Parameter         DF    Estimate       Error  Chi-Square  Pr > ChiSq
    Intercept 5        1     -2.9050      0.1513    368.5150      <.0001
    Intercept 4        1     -2.0956      0.1451    208.6086      <.0001
    Intercept 3        1     -1.0202      0.1429     50.9855      <.0001
    Intercept 2        1      0.2247      0.1424      2.4906      0.1145
    hhdis00            1      0.3052      0.0427     51.1371      <.0001
    momage00           1     0.00579     0.00314      3.4098      0.0648
    momlthigh00        1      0.1499      0.0583      6.6078      0.0102
    momcertdip00       1     -0.0731      0.0384      3.6231      0.0570
    momunivdeg00       1     -0.1781      0.0433     16.9065      <.0001
    momimm             1      0.3377      0.0419     64.9256      <.0001
    eqinc00            1    -2.95E-6    6.018E-7     24.0756      <.0001
    hhchlt500          1     -0.1872      0.0876      4.5628      0.0327
    kids01700          1     -0.1262      0.0161     61.0665      <.0001
    momvg94            1      0.6181      0.0350    312.6018      <.0001
    momg94             1      1.1116      0.0458    589.8279      <.0001
    momfp94            1      1.5644      0.0912    294.0294      <.0001
    momsmokesdaily00   1      0.1706      0.0430     15.7629      <.0001
    The MEANS Procedure

    Variable               N             Mean          Std Dev
    ----------------------------------------------------------
    Intercept_5         1000       -2.9650753        0.3107804
    Intercept_4         1000       -2.1470196        0.2770212
    Intercept_3         1000       -1.0465351        0.2621726
    Intercept_2         1000        0.2091371        0.2622451
    hhdis00             1000        0.2846419        0.0973226
    momage00            1000        0.0057067        0.0055820
    momlthigh00         1000        0.1293874        0.0932894
    momcertdip00        1000       -0.0739417        0.0772243
    momunivdeg00        1000       -0.1852935        0.0980241
    momimm              1000        0.3191519        0.1181139
    eqinc00             1000     -3.090889E-6     1.1721765E-6
    hhchlt500           1000       -0.1760001        0.1143188
    kids01700           1000       -0.1148346        0.0351904
    momvg94             1000        0.6399775        0.0754143
    momg94              1000        1.1403891        0.1000578
    momfp94             1000        1.6089774        0.1664408
    momsmokesdaily00    1000        0.1618192        0.0882162
    ----------------------------------------------------------
    Obs   _NAME_                  coef        se       chi   pvaluechi
      1   intercept_2         -2.90503   0.31078    87.376     0.00000
      2   intercept_3         -2.09565   0.27702    57.228     0.00000
      3   intercept_4         -1.02021   0.26217    15.143     0.00010
      4   intercept_5          0.22473   0.26225     0.734     0.39147
      5   hhdis00              0.30519   0.09732     9.834     0.00171
      6   momage00             0.00579   0.00558     1.076     0.29961
      7   momlthigh00          0.14987   0.09329     2.581     0.10815
      8   momcertdip00        -0.07309   0.07722     0.896     0.34390
      9   momunivdeg00        -0.17806   0.09802     3.300     0.06930
     10   momimm               0.33771   0.11811     8.175     0.00425
     11   eqinc00             -0.00000   0.00000     6.346     0.01176
     12   hhchlt500           -0.18722   0.11432     2.682     0.10149
     13   kids01700           -0.12618   0.03519    12.857     0.00034
     14   momvg94              0.61807   0.07541    67.169     0.00000
     15   momg94               1.11157   0.10006   123.417     0.00000
     16   momfp94              1.56445   0.16644    88.349     0.00000
     17   momsmokesdaily00     0.17064   0.08822     3.742     0.05307