Supported by a grant from The Robert Wood Johnson Foundation


Page 1

Evaluating Methods of Standard Error Estimation for Use with the Current Population Survey’s Public Use Data

The Hawaii Coverage For All Technical Workshop

Honolulu, Hawaii, February 7, 2003

Presented by:

Michael Davern, Ph.D.
University of Minnesota

Division of Health Services Research and Policy
School of Public Health

Supported by a grant from The Robert Wood Johnson Foundation

Page 2

This paper is a Work in Progress

• Paper is co-authored with:
  – James Lepkowski, University of Michigan
  – Gestur Davidson, University of Minnesota/SHADAC
  – Arthur Jones Jr., US Census Bureau
  – Lynn A. Blewett, University of Minnesota/SHADAC

• Estimates have not cleared final Census Bureau review
  – Estimates are therefore PRELIMINARY
  – We hope to present the paper at AAPOR in May 2003

Page 3

The Problem:

• The CPS is a complex survey
  – Sample design information is necessary to estimate appropriate standard errors
  – Important components of the sampling design are not released to the public

• Public use data are widely used by policy-makers and academics
  – Significance tests in this research are likely biased, because the standard errors are estimated without the full sample design information
  – These significance tests provide important rules for “evidence” in the policy analysis and academic literature

Page 4

The Result:

• Thus, what constitutes “evidence” in policy analysis and academic journals, and the inferences drawn from that evidence, may not be valid

• In other words, what we know from research using Census Bureau public use data products may not be accurate enough to be useful

• In a quick search, we found over 50 articles in top social science journals that used Census Bureau public use data

Page 5

The Analysis:

• We identified four approaches to estimating standard errors on the public use data:
  – The Simple Random Sample (SRS) approach
  – The Generalized Variance Parameter (GVP) approach (the Census Bureau’s standard)
  – Robust variance estimation (also known as the sandwich or Huber-White estimator)
  – Taylor series with a stratum and a cluster variable defined

Page 6

The Data:

• The CPS uses a complex sampling design with the following features:
  – The country is divided into Primary Sampling Units (PSUs)
    • A PSU is a county or a group of contiguous counties
    • “Self-representing” PSUs are metropolitan areas that are selected with certainty
    • Non-self-representing PSUs are sampled through a stratification process within each state
  – Within PSUs, groups of housing units, called Ultimate Sampling Units (USUs), are identified

Page 7

The Data:

  – On average, 4 housing units are selected from a USU using a systematic sampling method
  – Information is collected on everyone within a selected household
  – Due to the rotation schedule, about 45 percent of the households interviewed in a given month of the CPS were also interviewed in the same month of the previous year
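Systematic selection, as named on this slide, can be sketched generically as below. This is not the CPS’s actual within-PSU procedure, which the slides do not spell out; the function name and the 37-unit frame are hypothetical. It uses a fractional interval N/m with a random start, computed in integer arithmetic so every listed unit has inclusion probability exactly m/N.

```python
import random


def systematic_sample(units, m, rng=None):
    """Draw m units from an ordered frame by systematic sampling.

    Hypothetical helper. Index j is (r + j * N) // m with a random start
    r drawn uniformly from 0..N-1, which is the fractional interval N/m
    done in integers; every unit has inclusion probability m / N.
    """
    n_frame = len(units)
    if not 1 <= m <= n_frame:
        raise ValueError("m must be between 1 and the frame size")
    rng = rng or random.Random()
    r = rng.randrange(n_frame)  # random start, measured in 1/m steps
    return [units[(r + j * n_frame) // m] for j in range(m)]


# Hypothetical frame of 37 listed housing units; draw 4 of them.
frame = [f"HU{i:02d}" for i in range(37)]
print(systematic_sample(frame, 4, random.Random(2003)))
```

Because consecutive indices differ by at least N // m, the selected units are distinct and stay in frame order, spread evenly across the listing.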

Page 8

The Variables and Standard Error Estimation

• We estimate state rates of health insurance coverage and poverty, and state average income

• We estimate the standard errors for these rates and averages as follows:
  – SRS uses normalized weights and conventional calculations to determine standard errors
  – The GVP approach uses the parameters in the Census Bureau’s Source and Accuracy Statement to correct for the complex sampling design (this is the technique used by the Census Bureau)
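The two baseline calculations above can be sketched as follows, under assumptions about details the slides leave open. The SRS version takes “conventional calculations” to mean the textbook binomial formula applied to a weighted proportion. The GVP version uses the percentage formula of the form found in Census Source and Accuracy statements, se(p) = sqrt((b / x) p (100 - p)), where x is the base population and b is a published parameter (adjusted by any state factor). The function names are hypothetical, and b is an input here; no published parameter is reproduced.

```python
import math


def srs_se_proportion(y, w):
    """Weighted proportion and its SRS-style standard error.

    Weights are rescaled to sum to the sample size n, then the textbook
    formula sqrt(p * (1 - p) / n) is applied, ignoring strata and clusters.
    """
    n = len(y)
    total_w = sum(w)
    w_norm = [wi * n / total_w for wi in w]  # normalized weights sum to n
    p = sum(wi * yi for wi, yi in zip(w_norm, y)) / n
    return p, math.sqrt(p * (1 - p) / n)


def gvp_se_percent(p_pct, base, b):
    """GVP standard error, in percentage points, of a percentage p_pct.

    base: estimated population in the denominator of the percentage.
    b: GVP parameter from a Source and Accuracy statement (already
       multiplied by any state factor); supplied by the caller.
    """
    return math.sqrt(b / base * p_pct * (100 - p_pct))
```

Neither function looks at strata or clusters: the SRS version ignores the design entirely, and the GVP version folds it into the single parameter b.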

Page 9

Standard Error Estimation

  – Robust standard errors use the person weights to account for the degree of heterogeneity in the probability of selection
  – Taylor series on the public use file uses the “lowest” level of identifiable geography as the stratum variable and the household as the cluster variable
    • The lowest level of identifiable geography is one of:
      – (1) the largest 250 MSAs
      – (2) other counties with over 100,000 population
      – (3) the non-MSA and non-identified counties within a state
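The robust and public-use Taylor series estimators differ mainly in how observations are grouped into clusters. A minimal sketch, assuming a weighted mean (a proportion when y is 0/1) and the usual with-replacement linearization formula; the function name and data are hypothetical, and this is not the authors’ code. Passing one cluster per person gives a Huber-White (sandwich) standard error; passing household IDs within geography strata gives the public-use Taylor series.

```python
import math
from collections import defaultdict


def linearized_se_mean(y, w, strata, clusters):
    """Weighted mean and its Taylor-linearized standard error.

    Linearized scores z_i = w_i * (y_i - ybar) / sum(w) are summed to
    cluster totals within each stratum, and the with-replacement formula
    sum_h n_h / (n_h - 1) * sum_c (z_hc - zbar_h)^2 is applied.
    Strata containing a single cluster are skipped (a simplification).
    """
    total_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # totals[stratum][cluster] accumulates the linearized scores.
    totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, clusters):
        totals[h][c] += wi * (yi - ybar) / total_w
    var = 0.0
    for cluster_totals in totals.values():
        n_h = len(cluster_totals)
        if n_h < 2:
            continue
        zbar = sum(cluster_totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in cluster_totals.values())
    return ybar, math.sqrt(var)


# Hypothetical data: 8 people in 4 two-person households, one geography
# stratum, with identical outcomes inside each household.
y = [1, 1, 0, 0, 1, 1, 0, 0]
w = [1.0] * 8
geo = [0] * 8
hh = [0, 0, 1, 1, 2, 2, 3, 3]

_, se_robust = linearized_se_mean(y, w, geo, range(8))  # each person its own cluster
_, se_taylor = linearized_se_mean(y, w, geo, hh)        # households as clusters
print(round(se_robust, 4), round(se_taylor, 4))
```

With perfectly correlated outcomes inside households, the household-clustered standard error comes out larger than the robust one, which is the pattern the Discussion slides attribute to intra-cluster correlation.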

Page 10

The Standard Error “Standard”

• The ultimate cluster method is the current standard way to estimate standard errors for survey data
  – Taylor series combined with identified ultimate cluster and stratum variables
  – The ultimate cluster for the CPS is the PSU
  – We used the Census Bureau’s internal data, which contain the PSU identifiers

• In the Taylor series, the state is the stratum and the PSU is the cluster (except for DC)
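For a weighted mean, the ultimate cluster estimator is conventionally written as below. Here h indexes strata (states), c indexes ultimate clusters (PSUs), i indexes persons, and n_h is the number of PSUs in stratum h. This is the textbook with-replacement form; the slides do not state which exact variant was used.

```latex
\bar{y} = \frac{\sum_{h,c,i} w_{hci}\, y_{hci}}{\sum_{h,c,i} w_{hci}}, \qquad
z_{hc} = \sum_{i} \frac{w_{hci}\,(y_{hci} - \bar{y})}{\sum_{h,c,i} w_{hci}}, \qquad
\bar{z}_h = \frac{1}{n_h} \sum_{c=1}^{n_h} z_{hc}

\widehat{V}(\bar{y}) = \sum_{h} \frac{n_h}{n_h - 1} \sum_{c=1}^{n_h} \left( z_{hc} - \bar{z}_h \right)^2
```

The same formula underlies the public-use Taylor series; only the stratum and cluster definitions change (lowest identifiable geography and household instead of state and PSU).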

Page 11

These Results are Preliminary and Subject to Internal Census Bureau Review

Please do not cite our work without permission

Page 12

Table 1

State Health Insurance Coverage Rates and Standard Error Computation Comparisons by Year: 2001

| State | 2001 Coverage Estimate | Simple Random Sample (SRS) Standard Error | Robust Variance Estimation (% change from SRS) | Taylor Series on Public Use File (% change from SRS) | Generalized Variance Estimation (% change from SRS) | Taylor Series on Internal Census File (% change from SRS) |
|---|---|---|---|---|---|---|
| United States | 85.39% | 0.08% | 21.83% | 81.49% | -7.34% | 567.74% |
| Hawaii | 90.4% | 0.53% | 5.68% | 56.74% | -15.39% | -4.75% |
| Rhode Island | 92.3% | 0.45% | 10.97% | 58.56% | -21.62% | -2.24% |
| Vermont | 90.4% | 0.52% | 22.29% | 49.80% | -21.60% | 1.69% |
| Illinois | 86.4% | 0.39% | 9.12% | 58.40% | -12.05% | 335.20% |
| New York | 84.5% | 0.34% | 7.70% | 50.80% | -17.40% | 524.75% |
| California | 80.5% | 0.30% | 8.96% | 73.53% | -5.51% | 542.96% |
| Average change | | | 8.18% | 54.18% | -16.92% | 137.71% |

Source: 2002 Current Population Survey Annual Demographic Supplements
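The last four columns compare each method with the SRS baseline. A minimal sketch of that comparison, assuming “percent change” means 100 × (SE_method - SE_SRS) / SE_SRS; inverting it recovers the approximate standard error each method implies. The function names are hypothetical, and because the published SRS standard errors are rounded to 0.01 points, implied values are approximate.

```python
def percent_change(se_method, se_srs):
    """Percent change of a method's standard error relative to the SRS standard error."""
    return 100.0 * (se_method - se_srs) / se_srs


def implied_se(se_srs, pct_change):
    """Standard error implied by an SRS standard error and a reported percent change."""
    return se_srs * (1.0 + pct_change / 100.0)


# Hawaii, Table 1: SRS standard error of 0.53 percentage points.
for method, pct in [("Taylor series, public use file", 56.74),
                    ("Taylor series, internal file", -4.75)]:
    print(method, round(implied_se(0.53, pct), 3))
```

For Hawaii this gives roughly 0.83 points from the public-use Taylor series versus roughly 0.50 points from the internal-file “standard”.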

Page 13

Table 2

State Poverty Rates and Standard Error Computation Comparisons by Year: 2001

| State | 2001 Poverty Rate | Simple Random Sample (SRS) Standard Error | Robust Variance Estimation (% change from SRS) | Taylor Series on Public Use File (% change from SRS) | Generalized Variance Estimation (% change from SRS) | Taylor Series on Internal Census File (% change from SRS) |
|---|---|---|---|---|---|---|
| United States | 11.67% | 0.07% | 19.92% | 107.39% | 101.68% | 335.16% |
| Arizona | 14.6% | 0.64% | 4.59% | 91.37% | 95.69% | -12.83% |
| Iowa | 7.4% | 0.44% | 8.65% | 61.38% | 81.98% | 10.09% |
| Indiana | 8.5% | 0.45% | 8.45% | 78.05% | 72.58% | 49.52% |
| Hawaii | 11.4% | 0.57% | 4.84% | 86.49% | 84.15% | 191.51% |
| Michigan | 9.4% | 0.37% | 6.50% | 70.03% | 78.60% | 407.62% |
| Pennsylvania | 9.6% | 0.34% | 7.41% | 82.21% | 79.53% | 444.93% |
| New York | 14.2% | 0.32% | 4.28% | 76.41% | 79.77% | 516.39% |
| Average change | | | 6.56% | 76.66% | 80.82% | 189.57% |

Source: 2002 Current Population Survey Annual Demographic Supplements
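To make the “evidence” concern concrete, the sketch below runs a simple two-sample z test on two Table 2 rows, first with the SRS standard errors and then with the internal-file standard errors implied by the reported percent changes. It assumes the two state samples are independent and ignores rounding in the published figures; the Arizona-Hawaii pairing is chosen only for illustration and is not a comparison made in the paper.

```python
import math


def z_diff(est1, se1, est2, se2):
    """z statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Table 2 poverty rates and SRS standard errors (percentage points).
az_rate, az_srs = 14.6, 0.64
hi_rate, hi_srs = 11.4, 0.57

# Internal-file SEs implied by the reported changes: Arizona -12.83%, Hawaii +191.51%.
az_int = az_srs * (1 - 12.83 / 100)
hi_int = hi_srs * (1 + 191.51 / 100)

z_srs = z_diff(az_rate, az_srs, hi_rate, hi_srs)
z_int = z_diff(az_rate, az_int, hi_rate, hi_int)
print(f"SRS z = {z_srs:.2f}; internal-file z = {z_int:.2f}")
```

Under these assumptions, the difference clears the conventional two-sided 5% cutoff of 1.96 with the SRS standard errors but not with the internal-file ones.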

Page 14

Table 3

Average State Earned Income and Standard Error Computation Comparisons by Year: 2001

| State | 2001 Average Income ($) | Simple Random Sample (SRS) Standard Error ($) | Robust Variance Estimation (% change from SRS) | Taylor Series on Public Use File (% change from SRS) | Generalized Variance Estimation (% change from SRS) | Taylor Series on Internal Census File (% change from SRS) |
|---|---|---|---|---|---|---|
| United States | 29,089 | 120 | 0.59% | 23.04% | 199.18% | 206.66% |
| Hawaii | 26,607 | 717 | 0.03% | 4.40% | 69.15% | 7.44% |
| Arkansas | 22,448 | 708 | 3.28% | 2.89% | 209.33% | 19.25% |
| Oklahoma | 23,850 | 741 | -1.84% | -2.40% | 161.53% | 37.23% |
| Delaware | 31,758 | 993 | 9.62% | 9.32% | 35.49% | 256.51% |
| New York | 30,778 | 677 | 8.99% | 17.55% | 160.20% | 315.80% |
| Washington | 29,413 | 680 | 4.70% | 8.39% | 352.57% | 339.78% |
| Average change | | | 5.82% | 7.16% | 152.73% | 122.73% |

Source: 2002 Current Population Survey Annual Demographic Supplements

Page 15

Findings

• Health insurance coverage, on average:
  – Robust is 8% larger than SRS
  – Taylor series on the public use file is 54% larger than SRS
  – GVP is 17% smaller than SRS
  – Taylor series on the internal file is 138% larger than SRS

Page 16

Findings

• Percent in poverty, on average:
  – Robust is 7% larger than SRS
  – Taylor series on the public use file is 77% larger than SRS
  – GVP is 81% larger than SRS
  – Taylor series on the internal file is 190% larger than SRS

Page 17

Findings

• Individual (adult) income, on average:
  – Robust is 6% larger than SRS
  – Taylor series on the public use file is 7% larger than SRS
  – GVP is 153% larger than SRS
  – Taylor series on the internal file is 123% larger than SRS

Page 18

Discussion

• GVPs are all over the board compared to the standard error “standard”
  – GVP standard errors are too high for income, too low for poverty, and much too low for health insurance

• Robust standard error estimates are consistently too small
  – The main cause of standard error inflation is not differential probability of selection but intra-cluster correlation

• To the extent that outcomes are highly correlated within households, the Taylor series is better than the other three public use file estimates
  – Poverty and health insurance have high intra-household correlations, but individual income does not
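Why intra-household correlation drives the gap can be illustrated with Kish’s approximate design effect, deff = 1 + (m - 1)ρ, where m is the average number of sampled persons per cluster and ρ is the intra-cluster correlation; the standard error then grows by a factor of sqrt(deff). This is a standard textbook approximation, not a calculation from the paper, and the household size and ρ values below are hypothetical.

```python
def kish_deff(mean_cluster_size, icc):
    """Kish's approximate design effect for cluster sampling: 1 + (m - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def se_inflation_pct(deff):
    """Percent by which the standard error exceeds the SRS standard error."""
    return 100.0 * (deff ** 0.5 - 1.0)


# Hypothetical: about 2.5 sampled persons per household, with a high
# intra-household correlation versus a low one.
for label, rho in [("high rho", 0.8), ("low rho", 0.1)]:
    d = kish_deff(2.5, rho)
    print(label, round(d, 2), round(se_inflation_pct(d), 1))
```

With one person per cluster, or with ρ = 0, the design effect is 1 and clustering adds nothing, which is why a low-correlation outcome such as individual income shows little difference between the robust and Taylor series results.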

Page 19

Discussion

• Larger states are likely to have more PSUs in the Census internal file than are recognized in the public use file (where we see only their aggregation)

• By construction, the larger number of PSUs means that more within-PSU homogeneity is recognized:
  – States with more PSUs in the internal data have much higher standard errors (using the “standard”) than are currently being estimated
  – Greater homogeneity within PSUs or households reduces the “effective” sample size (there is less independent information than the full sample size would suggest)

• As expected, the consequences are especially large for the health insurance and poverty estimates
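The link between the percent changes in the tables and the “effective” sample size can be sketched as follows, assuming the design effect is the squared ratio of a method’s standard error to the SRS standard error, so deff = (1 + pct/100)^2 and n_eff = n / deff. The example uses Hawaii’s poverty figure from Table 2 (191.51% for the Taylor series on the internal file); the sample size of 1,000 is hypothetical, since state sample sizes are not given in the slides, and the function names are illustrative.

```python
def deff_from_pct_change(pct_change):
    """Design effect implied by a percent change in SE relative to SRS."""
    return (1.0 + pct_change / 100.0) ** 2


def effective_n(n, deff):
    """Effective sample size: the SRS size that would give the same variance."""
    return n / deff


# Hawaii poverty, Table 2: internal-file Taylor series SE is 191.51% above SRS.
d = deff_from_pct_change(191.51)
print(round(d, 2))                   # implied variance inflation factor
print(round(effective_n(1000, d)))   # for a hypothetical n of 1,000 persons
```

A standard error roughly three times the SRS value corresponds to a variance inflation of about 8.5, so 1,000 sampled persons carry about as much independent information as 118 drawn by simple random sampling.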

Page 20

Conclusion

• The Census Bureau is not going to release PSU identifiers to the public

• The data are widely used for important policy and academic research
  – Work done on the public use file has biased standard errors, so its inferences may not meet the statistical standard for evidence

• Therefore, I feel it is the responsibility of the Census Bureau to improve its GVPs or come up with a better substitute
  – What is currently offered is inadequate

Page 21

SHADAC Contact Information

www.shadac.org
2221 University Avenue, Suite 345

Minneapolis, Minnesota 55414

(612) 624-4802

Principal Investigator: Lynn Blewett, Ph.D.

Co-Principal Investigator: Kathleen Call, Ph.D.

Center Director: Kelli Johnson, M.B.A.

Senior Research Associate: Timothy Beebe, Ph.D.

Research Associate: Michael Davern, Ph.D.