Population Health Surveys Bootstrap Hands-on Workshop. Yves Beland, CCHS senior methodologist; Larry MacNabb, CCHS dissemination manager; developed by François Brisebois, CCHS/NPHS senior methodologist.
Date posted: 18-Dec-2015

Population Health Surveys: Bootstrap Hands-on Workshop

Yves Beland, CCHS senior methodologist

Larry MacNabb, CCHS dissemination manager

developed by

François Brisebois, CCHS/NPHS senior methodologist

Purpose of the presentation

Justify the use of the bootstrap technique, understand the theory behind it, and get familiar with it

Dispel misconceptions about using the bootstrap technique for variance estimation

Outline

Context
NPHS / CCHS complex survey design
Variance estimation / Bootstrap 101
Data support / using the bootvar program
Why bootstrap?
CV lookup tables
Historical info about variance estimation for NPHS
Variance estimation with other software programs
Future for STC Health Surveys (re. bootstrap)

Context

A data user is interested in producing some results:

1- Compute an estimate (total, ratio, etc.)

2- Compute the precision of the estimate (variance, coefficient of variation (CV), etc.)

Context

1- Compute an estimate

Not a problem! Use the survey weight provided with the NPHS/CCHS files

Context

1- Compute an estimate (cont’d)

Why use the survey weight?

NPHS Estimates for Diabetes - Canada

              # People    % People
Unweighted         620         4.1
Weighted       865,910         3.5

Source: 1998 Master Health file

Conclusion: ALWAYS USE THE WEIGHTS

Context

2- Compute the precision of an estimate

This is a problem!

NPHS Estimates for Diabetes - Canada: STANDARD DEVIATIONS (% People)

                     Estimate    Std Dev.
Unweighted                4.1       0.162
Weighted                  3.5       0.151
Bootstrap weights         3.5       0.177

Source: 1998 Master Health file

Context

2- Compute the precision of the estimate (cont’d)

Scaled weights:

Scaled weight = weight / mean(weight)

Used to overcome problems with the computation of the variance for some statistics in SAS

Reference: paper by G. Roberts et al.
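As a small illustration of the scaled-weight formula, the sketch below divides made-up weights by their mean. It is written in Python rather than SAS, and the function name `scale_weights` and the data are hypothetical, not part of any Statistics Canada program.

```python
def scale_weights(weights):
    """Divide each weight by the mean weight, as in the slide's formula."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Made-up survey weights for four respondents (not NPHS/CCHS values)
weights = [100, 200, 300, 400]
scaled = scale_weights(weights)

print(scaled)       # each weight relative to the mean weight of 250
print(sum(scaled))  # scaled weights add up to the sample size
```

Because every weight is divided by the same constant, weighted proportions are unchanged; only the apparent sample size seen by the software changes.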

Context

2- Compute the precision of the estimate (cont’d)

Why such a difference?

Answer: The complex survey design is the main cause (other factors to be discussed later)

Note: CCHS and NPHS have slightly different frames, but both are considered complex survey designs

Complex survey design

1- Each province is divided into strata

[Diagram: Province A divided into Stratum #1 and Stratum #2]

Complex survey design

2- Selection of clusters within each stratum

[Diagram: clusters selected within Stratum #1 and Stratum #2 of Province A]

Complex survey design

3- Selection of households within each cluster

[Diagram: households selected within the clusters of Stratum #1 and Stratum #2 of Province A]

Complex survey design

How does the sample design affect the precision of estimates?

Stratification decreases variability (more precise)

Clustering increases variability (less precise)

Overall, the multistage design has the effect of increasing variability (less precise than simple random sampling, SRS)

Complex survey design

So why use a multistage cluster sample design anyway?

Pros:

  Efficient for interviewing (less travel, less costly)

  Better coverage of the entire region of interest

Cons:

  Problems for variance estimation

Bootstrap Method

Variance estimation with a complex multistage cluster sample design:

The exact formula for variance estimation is too complex; an approximate approach is required

NOTE: accounting for the design in variance estimation is as crucial as using the sampling weights when estimating a statistic

Bootstrap Method

Approximate methods for variance estimation:

  Taylor linearization

  Re-sampling methods:

    Balanced Repeated Replication

    Jackknife

    Bootstrap

Bootstrap Method

Principle:

You want to estimate how precise your estimate of the number of smokers in Canada is

You could draw 500 totally new samples and compare the 500 estimates you would get from these samples. The variance of these 500 estimates would indicate the precision.

Problem: drawing 500 new samples is $$$

Solution: Use your sample as a population, and take many smaller subsamples from it.

Bootstrap 101

How bootstrap weights are created (the secret is finally revealed!!!)

Starting point: full data file (example presented for a given stratum)

ID   Wgt   Cluster
A     10      1
B     10      1
C     10      1
D     10      2
E     10      2
F     10      2
G     10      3
H     10      3
I     10      4
J     10      4

Select n-1 clusters among n within each stratum (with replacement); B1 = # of times the cluster is selected

ID   Wgt   Cluster   B1
A     10      1       1
B     10      1       1
C     10      1       1
D     10      2       1
E     10      2       1
F     10      2       1
G     10      3       0
H     10      3       0
I     10      4       1
J     10      4       1

Repeat the process 500 times (*BOOTSTRAP REPLICATES*)

ID   Wgt   Cluster   B1   B2   ...   B500
A     10      1       1    0   ...      3
B     10      1       1    0   ...      3
C     10      1       1    0   ...      3
D     10      2       1    1   ...      0
E     10      2       1    1   ...      0
F     10      2       1    1   ...      0
G     10      3       0    0   ...      0
H     10      3       0    0   ...      0
I     10      4       1    2   ...      0
J     10      4       1    2   ...      0

Apply the survey weight (Wgt) (*BOOTSTRAP WEIGHTS*)

ID   Wgt   Cluster   B1   B2   ...   B500
A     10      1      10    0   ...     30
B     10      1      10    0   ...     30
C     10      1      10    0   ...     30
D     10      2      10   10   ...      0
E     10      2      10   10   ...      0
F     10      2      10   10   ...      0
G     10      3       0    0   ...      0
H     10      3       0    0   ...      0
I     10      4      10   20   ...      0
J     10      4      10   20   ...      0

Adjust for the fact that we picked n-1 among n (factor = n / (n-1) = 4/3 = 1.33)

ID   Wgt   Cluster   B1   B2   ...   B500
A     10      1      13    0   ...     40
B     10      1      13    0   ...     40
C     10      1      13    0   ...     40
D     10      2      13   13   ...      0
E     10      2      13   13   ...      0
F     10      2      13   13   ...      0
G     10      3       0    0   ...      0
H     10      3       0    0   ...      0
I     10      4      13   27   ...      0
J     10      4      13   27   ...      0

USING THE BOOTSTRAP WGTS: Estimate the number of smokers

ID   Wgt   Cluster   Smoke   B1   B2   ...   B500
A     10      1        X     13    0   ...     40
B     10      1        X     13    0   ...     40
C     10      1              13    0   ...     40
D     10      2              13   13   ...      0
E     10      2              13   13   ...      0
F     10      2              13   13   ...      0
G     10      3        X      0    0   ...      0
H     10      3               0    0   ...      0
I     10      4              13   27   ...      0
J     10      4        X     13   27   ...      0

Estimated smokers: 40 (Wgt), 39 (B1), 27 (B2), ..., 80 (B500)

T = 40

$\mathrm{Var} = \sum_{i=1}^{500} (B_i - \bar{B})^2 / 499$
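The worked example above can be sketched in Python as follows. This is only an illustration of the simplified one-stratum recipe on the slide, not the program used to produce the real bootstrap weight files; the helper names, the random seed, and the use of unrounded weights are assumptions.

```python
import random

# One stratum from the slide: unit ID -> (survey weight, cluster, smoker flag)
units = {
    "A": (10, 1, 1), "B": (10, 1, 1), "C": (10, 1, 0),
    "D": (10, 2, 0), "E": (10, 2, 0), "F": (10, 2, 0),
    "G": (10, 3, 1), "H": (10, 3, 0),
    "I": (10, 4, 0), "J": (10, 4, 1),
}
clusters = sorted({c for _, c, _ in units.values()})
n = len(clusters)


def bootstrap_weights(counts):
    """Weights for one replicate, given how often each cluster was drawn."""
    factor = n / (n - 1)  # adjusts for drawing n-1 clusters among n
    return {uid: w * counts.get(c, 0) * factor
            for uid, (w, c, _) in units.items()}


def draw_counts(rng):
    """Draw n-1 clusters with replacement and count the selections."""
    counts = {}
    for c in rng.choices(clusters, k=n - 1):
        counts[c] = counts.get(c, 0) + 1
    return counts


def smokers(weights):
    """Weighted total of smokers."""
    return sum(weights[uid] * s for uid, (_, _, s) in units.items())


# Replicates B1, B2 and B500 exactly as drawn on the slide
slide_counts = [{1: 1, 2: 1, 4: 1}, {2: 1, 4: 2}, {1: 3}]
slide_estimates = [smokers(bootstrap_weights(c)) for c in slide_counts]

# 500 random replicates and the bootstrap variance of the total
rng = random.Random(2024)  # arbitrary seed, for reproducibility only
estimates = [smokers(bootstrap_weights(draw_counts(rng))) for _ in range(500)]
mean_b = sum(estimates) / len(estimates)
variance = sum((b - mean_b) ** 2 for b in estimates) / (len(estimates) - 1)

print(slide_estimates, round(variance, 1))
```

With unrounded weights the three slide replicates give 40, 26.7 and 80; the slide shows 39, 27 and 80 because it rounds each weight (13, 27, 40) before summing.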

Bootstrap 101

How bootstrap replicates are built (cont’d)

The “real” recipe:

1- Subsampling of clusters (SRS) within strata

2- Apply (initial design) weight

3- Adjust weight for selection of n-1 among n

4- Apply all standard adjustments (nonresponse, share, etc.)

5- Post-stratification to population counts
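To make step 5 concrete, here is a hedged Python sketch that post-stratifies two replicates of the one-stratum example. It assumes a single post-stratum whose population count equals the sum of the final weights (100) and skips step 4 entirely; both are simplifying assumptions, chosen because they reproduce the values 14, 29 and 33 shown in the later confidentiality table. The function names are hypothetical.

```python
# Cluster membership and final weight from the one-stratum example
cluster_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2,
              "F": 2, "G": 3, "H": 3, "I": 4, "J": 4}
final_weight = 10
n_clusters = 4


def adjusted_weights(counts):
    """Steps 1-3: weight x times selected x n / (n-1)."""
    factor = n_clusters / (n_clusters - 1)
    return {uid: final_weight * counts.get(c, 0) * factor
            for uid, c in cluster_of.items()}


def poststratify(weights, control_total):
    """Step 5: rescale so the weights add up to the control total."""
    total = sum(weights.values())
    return {uid: w * control_total / total for uid, w in weights.items()}


control_total = final_weight * len(cluster_of)  # assumed population count: 100

b2 = poststratify(adjusted_weights({2: 1, 4: 2}), control_total)
b500 = poststratify(adjusted_weights({1: 3}), control_total)

print(round(b2["D"]), round(b2["I"]), round(b500["A"]))  # 14 29 33
```

In production the adjustment classes and control totals are more detailed than this, which is one reason the weighting steps have to be repeated inside every replicate rather than applied once to the final weight.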

Bootstrap 101

How bootstrap replicates are built (cont’d)

The bootstrap method is intended to mimic the approach used for the sampling and weighting processes

Be careful: some software programs say they include the bootstrap technique; what they really do is skip steps #4 and #5 and use the final weight directly in step #2

Bootstrap 101

STC methodologists create the bootstrap weight files.

Can you create your own bootstrap weight file? No

Why? Because to do so you need to know:

  The design information, i.e. strata and clusters (to generate the bootstrap subsamples)

  The definition of all adjustment classes (including post-stratification)

Bootstrap 101

The bootstrap weight files are:

  Available for all files (except PUMF, for confidentiality reasons)

  Distributed with the data files, as separate files

The bootstrap weight files contain:

  IDs (REALUKEY/SAMPLEID, PERSONID)

  Final sampling weight (WTxx)

  500 bootstrap weights (BSW1--BSW500)

Bootstrap - Support

NPHS/CCHS provides data users with SAS & SPSS macro programs to compute bootstrap variances

  Macros simplify the computation of bootstrap variance estimates for totals, ratios, differences of ratios, regressions (linear and logistic), and basic generalized linear models

  Come with documentation & examples

  French and English

  Referred to as “bootvar”

Example: Step by Step

Let’s get to work!

Goal: Interested in estimating the number of diabetics (total)

NPHS 1998-99 Dummy file (see information sheet)

Diabetes (CCC6_1J), some totals and ratios
NPHS 1998-99 Dummy Health File

                            #       % of population
Total cases of diabetes     DIAB    DIAB / TOTAL

Example: Step by Step

STEP #1: Create your « analysis data file »

  Read NPHS/CCHS data file

  Prepare dummy variables necessary for your analysis

  Keep only necessary variables (include geography desired)

  Run the analysis to get point estimates only (not necessary but recommended)

STEP #2: Compute your variances with bootvar

  Location of INPUT files:

    Your « analysis data file »

    The bootstrap weights file

  Geography desired

  Number of bootstrap weights to use

  Specify the desired analysis:

    Totals, ratios, diff of ratios

    Regression (linear & logit)

    Generalized linear modeling

Example: Step by Step

Step #1: On your own

(but can use the examples provided as a starting point)

Step #2: Use the provided Bootvar program

STEP #1

Read input file
Create dummy variables
Keep only necessary variables
Run the analysis to get point estimates

Create dummy variables

For qualitative/categorical variables, we need to identify which value(s) we are interested in. This is done by creating a dummy variable:

Dummy variable = 1 for characteristic of interest
               = 0 otherwise

STEP #1

Create dummy variable: example #1

During the past 12 months, how often did you drink alcoholic beverages? (ALC8_2)

1 = Less than once a month
2 = Once a month
3 = 2 to 3 times a month
4 = Once a week
5 = 2 to 3 times a week
6 = 4 to 6 times a week
7 = Every day

Interested in categories 1 to 4 (once a week or less):

DRINK = 1 if ALC8_2 is 1, 2, 3 or 4
      = 0 otherwise
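The DRINK recode can be sketched as a one-line rule. The version below is in Python rather than SAS or SPSS, and the function name and sample responses are made up for illustration.

```python
def drink_dummy(alc8_2):
    """1 if ALC8_2 is in categories 1 to 4 (once a week or less), else 0."""
    return 1 if alc8_2 in (1, 2, 3, 4) else 0


# Made-up responses to ALC8_2
responses = [1, 4, 5, 7, 2]
print([drink_dummy(r) for r in responses])  # [1, 1, 0, 0, 1]
```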

STEP #1

Create dummy variable: example #2

Diabetes (CCC8_1J)      Sex (DHC8_SEX)
1 = Yes                 1 = Male
2 = No                  2 = Female
6 = Not applicable
7 = Don’t know
9 = Not stated

Interested in “males having diabetes”:

mdiab = 1 if CCC8_1J = 1 and DHC8_SEX = 1
      = 0 otherwise
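A two-variable dummy follows the same pattern. In this Python sketch the function name and records are hypothetical; note that the non-response codes (6, 7, 9) fall into the "otherwise" branch and are coded 0, as the rule on the slide implies.

```python
def mdiab(ccc8_1j, dhc8_sex):
    """1 for males (sex code 1) who report diabetes (code 1), else 0."""
    return 1 if ccc8_1j == 1 and dhc8_sex == 1 else 0


# Made-up (CCC8_1J, DHC8_SEX) pairs
records = [(1, 1), (1, 2), (2, 1), (9, 1), (7, 2)]
flags = [mdiab(diab, sex) for diab, sex in records]
print(flags)  # [1, 0, 0, 0, 0]
```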

STEP #1

Create dummy variable: example #2

How to use the dummy variable to get an estimate

Total:

MDIAB   WT56   (product)
  0      100        0
  0      200        0
  1      300      300
  1      400      400
  1      500      500
  0      600        0
  0      700        0
  0      800        0

ESTIMATE = 1200

In SAS:

proc freq;
  tables mdiab;
  weight wt56;
run;

STEP #1

Create dummy variable: example #2

How to use the dummy variable to get an estimate

Ratio:

MDIAB   TOTAL   WT56   (num)   (den)
  0       1      100       0     100
  0       1      200       0     200
  1       1      300     300     300
  1       1      400     400     400
  1       1      500     500     500
  0       1      600       0     600
  0       1      700       0     700
  0       1      800       0     800

ESTIMATE =              1200    3600

1200 / 3600 = 33%
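The total and ratio calculations from the two tables above reduce to weighted sums of dummy variables. The Python sketch below reuses the slide's eight records; the helper name `weighted_total` is an assumption.

```python
# The eight records from the slide
mdiab = [0, 0, 1, 1, 1, 0, 0, 0]
total = [1] * 8
wt56 = [100, 200, 300, 400, 500, 600, 700, 800]


def weighted_total(dummy, weights):
    """Sum of dummy x weight over all records."""
    return sum(d * w for d, w in zip(dummy, weights))


numerator = weighted_total(mdiab, wt56)    # 1200
denominator = weighted_total(total, wt56)  # 3600
ratio = numerator / denominator

print(numerator, denominator, f"{ratio:.0%}")  # 1200 3600 33%
```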

STEP #1

See example in SPSS

Diabetes (CCC6_1J), some totals and ratios
NPHS 1998-99 Dummy Health File

                              #         % of population
Diabetes (Nfld, Man & BC)     169,700   3.1

STEP #1

Now your turn! (exercise #1)

Add asthma (CCC8_1C) to the table

Use existing program (step1.sas) and add SPSS code to create a dummy variable for asthma; and then get the results

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File

                              #         % of population
Diabetes (Nfld, Man & BC)     169,700   3.1
Asthma (Nfld, Man & BC)       ASTHMA    ASTHMA / TOTAL

Result:

                              #         % of population
Diabetes (Nfld, Man & BC)     169,700   3.1
Asthma (Nfld, Man & BC)       446,800   8.1

Step #2: Bootvar Program

Created by methodologists in 1997 (first used with NPHS cycle 2 data)

Version 1.0

  One single program (over 1,000 lines of code) divided into 4 sections

  Users have to adapt the program to their requests; changes in 3 sections

  SAS: bootvar.sas / bootvarf.sas

  SPSS: beta version available only on request (bvr_b.sps)

Step #2: Bootvar Program

Version 2.0

Justifications:

  Compatible with SAS 8+

  Centralizes the code that users have to modify

  Can be used with both NPHS and CCHS data files

Now consists of 2 programs:

  One contains the code users need to modify for their requests

  One contains the code users do not have to modify (macros)

Step #2: Bootvar Program

Version 2.0

SAS version:

  bootvare_v20.sas / bootvarf_v20.sas

  macroe_v20.sas / macrof_v20.sas

SPSS version:

  bootvare_v21.sps / bootvarf_v21.sps

  macroe_v21.sps / macrof_v21.sps

STEP #2: Use of bootvar

Point estimates have already been obtained; let us now estimate the sampling variability of those estimates

Go through the bootvar program (bootvare_v21.sps)

STEP #2: Use of bootvar

See example in SPSS

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File
Nfld, Man & B.C. only

            #         95% C.I.               % of pop.   95% C.I.
Diabetes    169,700   (133,400 ; 205,900)    3.1         (2.4 ; 3.8)
Asthma      446,800                          8.1

STEP #2

Now your turn! (exercise #2)

Compute confidence intervals for asthma

Use bootvare_v21.sps and adjust it to obtain the desired results (use the already set up step2.sps program for this exercise)

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File
Nfld, Man & B.C. only

            #         95% C.I.               % of pop.   95% C.I.
Diabetes    169,700   (133,400 ; 205,900)    3.1         (2.4 ; 3.8)
Asthma      446,800   ?                      8.1         ?

Result:

            #         95% C.I.               % of pop.   95% C.I.
Diabetes    169,700   (133,400 ; 205,900)    3.1         (2.4 ; 3.8)
Asthma      446,800   (381,700 ; 511,900)    8.1         (6.9 ; 9.3)
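The slides do not say how the 95% intervals are derived from the bootstrap variance. Assuming the usual normal approximation (estimate ± 1.96 × standard error), the Python sketch below backs out the standard error and CV implied by the published intervals. The 1.96 multiplier and all function names are assumptions, not bootvar documentation.

```python
Z_95 = 1.96  # normal quantile for a 95% interval (assumed, not stated on the slides)


def implied_std_error(lower, upper, z=Z_95):
    """Standard error implied by a symmetric normal-approximation interval."""
    return (upper - lower) / (2 * z)


def confidence_interval(estimate, std_error, z=Z_95):
    """Normal-approximation interval around a point estimate."""
    return estimate - z * std_error, estimate + z * std_error


def cv_percent(estimate, std_error):
    """Coefficient of variation, in percent."""
    return 100 * std_error / estimate


diab_se = implied_std_error(133_400, 205_900)
diab_cv = cv_percent(169_700, diab_se)
asthma_se = implied_std_error(381_700, 511_900)
asthma_cv = cv_percent(446_800, asthma_se)

print(round(diab_cv, 1), round(asthma_cv, 1))  # 10.9 7.4
```

Under this assumption the diabetes total has a CV of about 10.9% and the asthma total about 7.4%; small discrepancies against the published bounds are expected because the slide rounds to the nearest hundred.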

Bootstrap - More

Why 500 bootstrap weights?

  Size of file (for dissemination)

  Time of computation (for an average PC)

  Accuracy

Use more bootstrap weights?

  Faster PC

  Accuracy for small domains and more complex analysis methods

Bootstrap - More

Confidentiality revealed from the bootstrap weights

With the cluster identifier:

ID   Wgt   Cluster   B1   B2   ...   B500
A     10      1      13    0   ...     33
B     10      1      13    0   ...     33
C     10      1      13    0   ...     33
D     10      2      13   14   ...      0
E     10      2      13   14   ...      0
F     10      2      13   14   ...      0
G     10      3       0    0   ...      0
H     10      3       0    0   ...      0
I     10      4      13   29   ...      0
J     10      4      13   29   ...      0

With the cluster identifier withheld:

ID   Wgt   Cluster   B1   B2   ...   B500
A     10      ?      13    0   ...     33
B     10      ?      13    0   ...     33
C     10      ?      13    0   ...     33
D     10      ?      13   14   ...      0
E     10      ?      13   14   ...      0
F     10      ?      13   14   ...      0
G     10      ?       0    0   ...      0
H     10      ?       0    0   ...      0
I     10      ?      13   29   ...      0
J     10      ?      13   29   ...      0
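One way to read the tables above: even with the cluster column withheld, units from the same cluster move together across replicates, so the grouping can be recovered. The Python sketch below groups units by the ratio of each bootstrap weight to the final weight, using only the three replicates shown. The grouping key is an assumption made for this toy example; with real files, where adjustments vary by unit, the pattern of zero weights across all 500 replicates would be the more robust signal.

```python
from collections import defaultdict

FINAL_WEIGHT = 10

# Bootstrap weights as released (B1, B2, B500), cluster column withheld
bsw = {
    "A": (13, 0, 33), "B": (13, 0, 33), "C": (13, 0, 33),
    "D": (13, 14, 0), "E": (13, 14, 0), "F": (13, 14, 0),
    "G": (0, 0, 0), "H": (0, 0, 0),
    "I": (13, 29, 0), "J": (13, 29, 0),
}

groups = defaultdict(list)
for uid, weights in bsw.items():
    # Units in the same cluster share the same weight multipliers
    pattern = tuple(w / FINAL_WEIGHT for w in weights)
    groups[pattern].append(uid)

recovered = sorted(groups.values())
print(recovered)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H'], ['I', 'J']]
```

The recovered groups match clusters 1 to 4 in the first table, which illustrates why bootstrap weights are not released with the public-use file.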

Bootstrap - More

Confidentiality revealed from the bootstrap weights (cont’d)

How can PUMF users estimate their exact variances?

  Remote access

    Dummy file provided (same structure as master files but containing dummy data)

    Test programs and send them by e-mail

  Research Data Centre

  Regional Offices

Why Bootstrap?

Other techniques examined: Taylor, Jackknife

Taylor:

  Need to define a linear equation for each statistic examined

Jackknife:

  Cannot disseminate because of confidentiality

  Number of replicates depends on the number of strata (large number of strata in 1996 makes it impossible to disseminate)

Why Bootstrap?

Bootstrap:

  Handles survey designs with many strata more easily

  Sets of 500 bootstrap weights can be distributed to data users

  Recommended (over the jackknife) for estimating the variance of nonsmooth functions like quantiles, LICO

  Reference: “Bootstrap Variance Estimation for the National Population Health Survey”, D. Yeo, H. Mantel, and T.-P. Liu. 1999, ASA Conference.

Bootvar: exercise #3

Results for diabetes broken down by sex and province

Diabetes, some totals and ratios
NPHS 1998-99 Dummy Health File

Variables to create:

                  #         95% C.I.               % of pop.        95% C.I.
Nfld, Man & BC    169,700   (133,400 ; 205,900)    3.1              (2.4 ; 3.8)
Nfld              DIAB                             TOTAL
  Males           MDIAB                            MDIAB / MTOTAL
  Females         FDIAB                            FDIAB / FTOTAL
Manitoba
  Males
  Females
B.C.
  Males
  Females

Results:

                  #         95% C.I.               % of pop.   95% C.I.
Nfld, Man & BC    169,700   (133,400 ; 205,900)    3.1         (2.4 ; 3.8)
Nfld               24,900   (18,200 ; 31,500)      4.6         (3.4 ; 5.9)
  Males             9,800   (4,600 ; 14,700)       3.7         (1.7 ; 5.6)
  Females          15,100   (10,000 ; 20,100)      5.6         (3.7 ; 7.4)
Manitoba           32,300   (20,400 ; 44,200)      3.0         (1.9 ; 4.1)
  Males            15,800   (7,300 ; 24,400)       3.0         (1.3 ; 4.5)
  Females          16,500   (8,000 ; 25,000)       3.0         (1.5 ; 4.7)
B.C.              112,500   (79,300 ; 145,600)     2.9         (2.0 ; 3.7)
  Males            68,700   (43,500 ; 93,900)      3.5         (2.2 ; 4.8)
  Females          43,700   (22,200 ; 65,300)      2.2         (1.1 ; 3.4)

Bootvar: Tricks

If you need to create a dummy variable for a characteristic based on many variables:

Example: Males with diabetes

  First, create dummy variables for each individual variable (males, diabetes)

  Then, create the dummy variable for the characteristic by multiplying the individual dummy variables

Example:

  Males = 1,0 (MALES)

  Diabetes = 1,0 (DIAB)

  Males having diabetes (MDIAB) = MALES * DIAB

MALES  *  DIAB  =  MDIAB
  1        0         0
  1        1         1
  0        0         0
  0        1         0
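The multiplication trick works because a product of 0/1 flags is 1 only when every flag is 1. A short Python check of the truth table above (the helper name `dummy_product` is hypothetical):

```python
def dummy_product(*dummies):
    """Combine 0/1 dummy variables by multiplication."""
    result = 1
    for d in dummies:
        result *= d
    return result


# (MALES, DIAB) rows from the table above
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
mdiab_values = [dummy_product(males, diab) for males, diab in rows]
print(mdiab_values)  # [0, 1, 0, 0]
```

The same helper extends to three or more conditions, for example males with diabetes in a given province.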

Bootvar: Tricks

Use the REGION parameter in bootvar to specify a “stratification” variable (doesn’t have to be a geographic variable!)

Example: REGION = sex will produce results by sex

CV look-up tables

What is it?

  Approximate sampling variability tables

  Produced for Canada, each province, and by age groups for Canada (also by Health Regions for cycle 2)

  Useful only for categorical estimates

  Totals & ratios only

CV look-up tables

Approximate Sampling Variability Tables for MANITOBA - Selected Members

NUMERATOR OF                              ESTIMATED PERCENTAGE
PERCENTAGE
('000)       0.1%     1.0%     2.0%     5.0%    10.0%    15.0%    20.0%    25.0%    30.0%    35.0%    40.0%  ...
   1        103.6    103.2    102.6    101.1     98.4     95.6     92.7     89.8     86.7     83.6     80.3
   2     ********     72.9     72.6     71.5     69.5     67.6     65.6     63.5     61.3     59.1     56.8
   3     ********     59.6     59.3     58.3     56.8     55.2     53.5     51.8     50.1     48.3     46.4
   4     ********     51.6     51.3     50.5     49.2     47.8     46.4     44.9     43.4     41.8     40.2
   5     ********     46.1     45.9     45.2     44.0     42.7     41.5     40.2     38.8     37.4     35.9
   6     ********     42.1     41.9     41.3     40.2     39.0     37.9     36.7     35.4     34.1     32.8
   7     ********     39.0     38.8     38.2     37.2     36.1     35.0     33.9     32.8     31.6     30.4
   8     ********     36.5     36.3     35.7     34.8     33.8     32.8     31.7     30.7     29.6     28.4
   9     ********     34.4     34.2     33.7     32.8     31.9     30.9     29.9     28.9     27.9     26.8
  10     ********     32.6     32.5     32.0     31.1     30.2     29.3     28.4     27.4     26.4     25.4
  11     ******** ********     30.9     30.5     29.7     28.8     28.0     27.1     26.2     25.2     24.2
  12     ******** ********     29.6     29.2     28.4     27.6     26.8     25.9     25.0     24.1     23.2
  13     ******** ********     28.5     28.0     27.3     26.5     25.7     24.9     24.1     23.2     22.3
  14     ******** ********     27.4     27.0     26.3     25.5     24.8     24.0     23.2     22.3     21.5
  15     ******** ********     26.5     26.1     25.4     24.7     23.9     23.2     22.4     21.6     20.7
  16     ******** ********     25.7     25.3     24.6     23.9     23.2     22.4     21.7     20.9     20.1
  17     ******** ********     24.9     24.5     23.9     23.2     22.5     21.8     21.0     20.3     19.5
  18     ******** ********     24.2     23.8     23.2     22.5     21.9     21.2     20.4     19.7     18.9
  19     ******** ********     23.5     23.2     22.6     21.9     21.3     20.6     19.9     19.2     18.4
  20     ******** ********     22.9     22.6     22.0     21.4     20.7     20.1     19.4     18.7     18.0
  21     ******** ********     22.4     22.1     21.5     20.9     20.2     19.6     18.9     18.2     17.5
  22     ******** ******** ********     21.5     21.0     20.4     19.8     19.1     18.5     17.8     17.1
  23     ******** ******** ********     21.1     20.5     19.9     19.3     18.7     18.1     17.4     16.7
  24     ******** ******** ********     20.6     20.1     19.5     18.9     18.3     17.7     17.1     16.4
  25     ******** ******** ********     20.2     19.7     19.1     18.5     18.0     17.3     16.7     16.1
  30     ******** ******** ********     18.4     18.0     17.5     16.9     16.4     15.8     15.3     14.7
  35     ******** ******** ********     17.1     16.6     16.2     15.7     15.2     14.7     14.1     13.6
  40     ******** ******** ********     16.0     15.6     15.1     14.7     14.2     13.7     13.2     12.7
  45     ******** ******** ********     15.1     14.7     14.2     13.8     13.4     12.9     12.5     12.0
  50     ******** ******** ********     14.3     13.9     13.5     13.1     12.7     12.3     11.8     11.4

Sampling Variability Guidelines

Type of estimate   CV (%)       Guidelines
Acceptable         0.0-16.5     General unrestricted release.
Marginal           16.6-33.3    General unrestricted release but with warning cautioning users of the high sampling variability. Should be identified by letter M.
Unacceptable       > 33.3       No release. Should be flagged with letter U.
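The release guidelines amount to a simple threshold rule on the CV. The Python sketch below encodes the cut-offs from the table; the function name and the rounding to one decimal place (to match how the table states its bounds) are assumptions.

```python
def release_guideline(cv_percent):
    """Classify a CV (in percent) using the cut-offs in the guidelines table."""
    if cv_percent < 0:
        raise ValueError("CV cannot be negative")
    cv = round(cv_percent, 1)  # the table states its bounds to one decimal place
    if cv <= 16.5:
        return "Acceptable"
    if cv <= 33.3:
        return "Marginal (M)"
    return "Unacceptable (U)"


for cv in (11.2, 18.7, 33.4):
    print(cv, release_guideline(cv))
```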

CV look-up tables

Comparison between bootstrap (BTS) CV and CV from lookup table

For number of people having diabetes:

                    T        CV table    BTS
Manitoba total      32K      18%         18.7%
Manitoba males      16K      25.7%       27.6%
Manitoba females    16.5K    25.3%       26.4%

CV look-up tables

Comparison between bootstrap (BTS) CV and CV from lookup table

Other examples (from master - general file)

Number of people experiencing food insecurity:

  Manitoba total: T = 40K, CV table = 11.9%, BTS = 19.8%

Number of people in the lowest income quintile:

  Manitoba total: T = 118K, CV table = 6.4%, BTS = 11.2%

Bootvar: Regression models

Logistic regression model

log(p / (1 - p)) = intercept + b1*X1 + b2*X2, where p = P(Y = 1)

→ Y has to be qualitative (categorical) (for now assume it is dichotomous, i.e. 0,1)

→ Xi can be quantitative or qualitative variables

Bootvar: Regression models

Logistic regression model

Example: Diabetes vs sex and age

→ Categorical variables need to be dichotomized (“dummied”; 1 variable for each category except 1)

→ Sex: if sex=2 then FEMALE = 1; else FEMALE = 0;

→ Age: create a variable for people over 60 (if age > 60 then OVER60=1; else OVER60=0)

→ The model is:

DIAB = intercept + b1*FEMALE + b2*OVER60

Bootvar: Regression models

Logistic regression model Example: Diabetes vs sex and age

DIAB = intercept + b1*FEMALE + b2*OVER60

In bootvar, use %logreg macro

%logreg(yvar,xvar);

%logreg(DIAB,FEMALE OVER60);

Bootvar: Regression models

Linear regression model

Y = intercept + b1*X1 + b2*X2

→ Y is quantitative

→ Xi can be qualitative (categorical) or quantitative

Bootvar: Regression models

Linear regression model

Example: BMI (body mass index) vs sex and age

→ Categorical variables need to be dichotomized (“dummied”; 1 variable for each category except 1)

→ Sex: if sex=2 then FEMALE = 1; else FEMALE = 0;

→ Age: use it as quantitative (single year of age)

→ The model is:

BMI = intercept + b1*FEMALE + b2*AGE

Bootvar: Regression models

Linear regression model Example: BMI vs sex and age

BMI = intercept + b1*FEMALE + b2*AGE

In bootvar, use %regress macro

%regress(yvar,xvar);

%regress(BMI,FEMALE AGE);
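To show what a bootstrap variance for regression coefficients involves, here is a hedged pure-Python sketch: fit the weighted model once with the final weight, refit it with each bootstrap weight, and take the variance of each coefficient across replicates. This is not the implementation of the `%regress` macro; the records, the four bootstrap weight sets (real files carry 500) and all function names are made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wls(y, xs, w):
    """Weighted least squares; an intercept column is added to xs."""
    rows = [[1.0] + list(r) for r in xs]
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, y))
            for j in range(p)]
    return solve(xtwx, xtwy)


def bootstrap_variance(values):
    """Variance of replicate estimates around their mean, divisor B - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Made-up records: (FEMALE, AGE, BMI)
records = [(0, 25, 24.0), (1, 30, 22.5), (0, 45, 27.0), (1, 50, 25.5),
           (0, 60, 28.5), (1, 65, 27.0), (0, 35, 25.0), (1, 40, 24.5)]
final_w = [100, 120, 80, 90, 110, 95, 105, 100]
# Four made-up bootstrap weight sets
bsw = [
    [133, 160, 107, 120, 0, 0, 140, 133],
    [0, 0, 107, 120, 0, 0, 280, 267],
    [267, 320, 0, 0, 147, 127, 0, 0],
    [133, 160, 0, 0, 147, 127, 140, 133],
]

xs = [(female, age) for female, age, _ in records]
y = [bmi for _, _, bmi in records]

coef = wls(y, xs, final_w)                  # intercept, b1 (FEMALE), b2 (AGE)
replicates = [wls(y, xs, w) for w in bsw]   # one fit per bootstrap weight
variances = [bootstrap_variance([rep[j] for rep in replicates])
             for j in range(len(coef))]

print(coef, variances)
```

The variance here is centred on the mean of the replicate estimates, following the formula on the Bootstrap 101 slide; the same loop applies to totals, ratios or logistic coefficients once the point-estimate function is swapped.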

Bootvar: testing

For version 2.0/2.1: Simply set 2 < B < 500

For version 1.0: See documentation!

Historical info about variance estimation for NPHS

Cycle 1: Use of Jackknife technique

  Could not disseminate with public-use microdata files; only custom requests

Cycle 2 & +: Use of bootstrap technique

  Can not disseminate ….; custom requests or remote access

All cycles: CV look-up tables for large domains (provinces, age groups)

  only good for totals, ratios, and differences of ...

Variance estimation with other software programs

WesVar (SPSS)

SAS

SUDAAN

STATA

Future for Stats Can Health Surveys (vs. bootstrap)

NPHS Cycle 4 (2000-2001) data processing & weighting

Promote the use of longitudinal data

Bootstrap pgms: finalize version 2.0 (SAS & SPSS)

CCHS Cycle 1.1 bootstrap weights

Bootstrap also used for variance estimation (same programs as for NPHS)

Contacts

Health Pgm Surveys Manager: Lorna Bailie

NPHS Manager: France Bilocq

CCHS Manager: Marc Hamel

CCHS Dissemination manager: Larry MacNabb

Senior Methodologists: François Brisebois, Mylène Lavigne, Yves Béland

Data Access Services Manager: Mario Bédard

Custom Services Requests: Garry Macdonald