
“OnTheMap”: The Census Bureau’s New Tool for Residence-Workplace Analysis

Fredrik Andersson and Jeremy Wu
May 7, 2007

Daytona Beach, FL

Overview of Presentation

1. Live demo of OnTheMap (Jeremy)
2. The Disclosure Avoidance Protocol for OnTheMap (Fredrik)
3. Analytical validity and confidentiality protection (Fredrik)
4. Data Access (Fredrik)

1. Demonstration of OnTheMap

www.census.gov (Local Employment Dynamics)
http://lehd.did.census.gov

OnTheMap v.1: LEHD’s online dynamic mapping tool


• 17 states online; completed 12/06
• Where do workers live? Where do people work?
• Companion reports on age, earnings, and industry
• First partially synthetic data product
• User-selected areas; the block is the base unit for display and the block group is the base unit for reports
• Modular geographic layers, such as community colleges and ZIP codes
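Because the block is the display unit and the block group is the reporting unit, block-level counts have to be rolled up to block groups. A minimal sketch, assuming 15-digit block codes laid out as state (2) + county (3) + tract (6) + block (4) digits, where the first digit of the block code identifies the block group; the function names and sample codes are hypothetical and are not taken from OnTheMap itself:

```python
from collections import defaultdict


def block_to_block_group(block_geoid: str) -> str:
    """Return the 12-digit block group code for a 15-digit block code.

    Assumed layout: state(2) + county(3) + tract(6) + block(4). The block
    group is the first digit of the block, so it is the 12-digit prefix.
    """
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block code, got {block_geoid!r}")
    return block_geoid[:12]


def aggregate_to_block_groups(block_counts: dict[str, int]) -> dict[str, int]:
    """Sum worker counts from blocks up to their block groups."""
    totals = defaultdict(int)
    for geoid, workers in block_counts.items():
        totals[block_to_block_group(geoid)] += workers
    return dict(totals)


# Hypothetical block-level worker counts, for illustration only
counts = {"060411302001000": 3, "060411302001001": 2, "060411302002000": 4}
print(aggregate_to_block_groups(counts))
```

Truncating the code keeps the rollup a pure string operation, so no separate block-to-block-group lookup table is needed under this assumed layout.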

Optional Layers

Where are Workers Residing in Sausalito, CA Employed?

Concentric Circle Report

OnTheMap Version 2

• Up to 44 LED partner states
• Add 2004 data to 2002-2003 data
• Cross-state patterns for all states
• Enhanced multi-year reports
• Additional geographies
• Will become available in phases between April and September 2007

2. The Disclosure Avoidance Protocol for OnTheMap

The Challenge: Maximize Analytical Validity of Data Subject to Strict Confidentiality Protection Constraints

[Figure: analytical validity of data (vertical axis) versus degree of confidentiality protection (horizontal axis), comparing synthetic data with cell suppression]

Basic Facts about the Disclosure Protection System for OnTheMap

Goal: “to protect confidentiality while preserving analytical validity of data”
– No cell suppression
– Synthetic place-of-residence data
– Workplace data protected by the QWI disclosure protection system (“dynamically consistent noise infusion”)

First-ever data product released by a statistical agency (Feb 2006) that relies on synthetic data methods as its primary disclosure avoidance technique

Disclosure Avoidance

Bayesian statistical techniques are used to create a partially synthetic version of the confidential data:
– Block-of-origin counts are sampled from a posterior predictive distribution conditional on destination block and worker characteristics (earnings, industry, age, ownership sector)
– The posterior predictive distribution is derived by combining the likelihood (“true data”) with a prior
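The sampling step can be sketched for a single workplace cell (one destination block crossed with one combination of worker characteristics). This is a minimal illustration, assuming a Dirichlet prior over home-block shares and multinomial counts for the “true data”, so that the posterior is again a Dirichlet distribution; the presentation does not specify the exact model, and the function name and prior pseudo-counts are hypothetical:

```python
import numpy as np


def synthesize_home_blocks(true_counts, prior_counts, rng):
    """Draw one synthetic set of home-block counts for one workplace cell.

    Assumed model: a Dirichlet prior (prior_counts, all > 0) combined with a
    multinomial likelihood (true_counts) gives a Dirichlet posterior over
    home-block shares. A posterior predictive draw samples shares from that
    posterior, then assigns the cell's workers to home blocks using them.
    """
    true_counts = np.asarray(true_counts, dtype=float)
    prior_counts = np.asarray(prior_counts, dtype=float)
    n_workers = int(true_counts.sum())       # cell total is kept unchanged
    posterior = prior_counts + true_counts   # Dirichlet posterior parameters
    shares = rng.dirichlet(posterior)        # draw home-block shares
    return rng.multinomial(n_workers, shares)


rng = np.random.default_rng(2007)
# Hypothetical cell: 40 workers whose true home blocks are A, B, C, D
true_counts = [30, 10, 0, 0]
prior_counts = [1.0, 1.0, 1.0, 1.0]
# Ten independent draws play the role of ten implicates
implicates = [synthesize_home_blocks(true_counts, prior_counts, rng) for _ in range(10)]
```

Because the prior puts positive weight on blocks C and D, some implicates can place workers there even though no true worker lives there, which is the source of the protection.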

So, what does this really mean?

Creation of Synthetic Data

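[Figure: distribution of place of residence across home blocks A, B, C, and D (vertical axis 0% to 50%), showing the likelihood distribution, the prior distribution, and the posterior predictive distribution (PPD) for a large population and for a small population]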

Fictional example: Distribution of place of residence for workers in a specific block, industry, earnings category, age category, ownership sector

Q: Why not sample directly from the likelihood? What’s the role of a prior?
Q: How are the priors constructed?
Q: How much weight is given to the prior?

Key Implication

The relative weight of the prior when sampling from the posterior distribution is inversely related to the size of the population being synthesized:
– For larger populations, the synthetic place-of-residence data closely mimic the underlying data
– For small populations, the synthetic place-of-residence data are relatively more “noisy” to protect confidentiality
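Under a hypothetical Dirichlet-multinomial model (an assumption, not a model stated in the presentation), the inverse relationship can be made explicit. With prior pseudo-counts summing to A and n workers in the cell, the posterior mean share for each home block is a weighted average of the prior share and the observed share, with weight A / (A + n) on the prior. A small sketch of that arithmetic; the function name and numbers are illustrative only:

```python
import numpy as np


def posterior_mean_shares(true_counts, prior_counts):
    """Return (prior weight, posterior mean shares) under a Dirichlet prior.

    posterior mean = w * prior_share + (1 - w) * observed_share,
    with w = A / (A + n), A = sum of prior pseudo-counts, n = cell size.
    """
    true_counts = np.asarray(true_counts, dtype=float)
    prior_counts = np.asarray(prior_counts, dtype=float)
    n = true_counts.sum()
    a = prior_counts.sum()
    w = a / (a + n)
    prior_share = prior_counts / a
    observed_share = true_counts / n if n > 0 else prior_share
    return w, w * prior_share + (1 - w) * observed_share


prior = [5.0, 5.0, 5.0, 5.0]                                  # A = 20 (hypothetical)
w_small, _ = posterior_mean_shares([3, 1, 0, 0], prior)       # n = 4
w_large, _ = posterior_mean_shares([300, 100, 0, 0], prior)   # n = 400
print(round(w_small, 3), round(w_large, 3))
```

With these illustrative numbers the prior carries most of the weight for the 4-worker cell and less than 5% for the 400-worker cell.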

Important to keep in mind when making inferences using OnTheMap

How “noisy” an estimate is can be assessed using all 10 implicates of the synthetic data available on the Virtual RDC
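One way to use the 10 implicates is to compute the same statistic on each implicate and measure how much the results disagree. The sketch below assumes the combining rule commonly used for partially synthetic data (Reiter’s rule): total variance equals the average within-implicate variance plus the between-implicate variance divided by the number of implicates. The presentation does not prescribe this rule, and the function name is hypothetical:

```python
from statistics import fmean, variance


def combine_implicates(estimates, within_variances):
    """Combine one statistic computed on each of m synthetic implicates.

    q_bar: point estimate (mean across implicates)
    b_m:   between-implicate variance (how much the implicates disagree)
    u_bar: mean of the within-implicate variance estimates
    t_p:   total variance for partially synthetic data, u_bar + b_m / m
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two implicates and one variance per implicate")
    q_bar = fmean(estimates)
    b_m = variance(estimates)
    u_bar = fmean(within_variances)
    t_p = u_bar + b_m / m
    return q_bar, b_m, t_p
```

A large b_m relative to q_bar signals that the estimate rests on a small, heavily protected population.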

3. Analytical Validity & Confidentiality Protection

Residence patterns in the synthetic data mimic those in the confidential data well

Level of protection increases as the population in the work block decreases

Key properties of the data, such as commute distance, are preserved in the synthetic data
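A check of this kind can be sketched by computing the worker-weighted mean commute distance from origin-destination records under both the confidential and the synthetic data and comparing the two. This assumes straight-line (great-circle) distance between block centroids, which may differ from the measure used in the original analysis; the centroid table, block labels, and records are hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_commute_km(od_pairs, centroids):
    """Worker-weighted mean commute distance.

    od_pairs: iterable of (home_block, work_block, workers)
    centroids: dict mapping block label -> (lat, lon)
    """
    total_dist, total_workers = 0.0, 0
    for home, work, workers in od_pairs:
        (hlat, hlon), (wlat, wlon) = centroids[home], centroids[work]
        total_dist += workers * haversine_km(hlat, hlon, wlat, wlon)
        total_workers += workers
    if total_workers == 0:
        raise ValueError("no workers in od_pairs")
    return total_dist / total_workers


# Hypothetical centroids (lat, lon) and OD records, for illustration only
centroids = {"A": (37.86, -122.49), "B": (37.79, -122.40)}
confidential = [("A", "B", 12), ("B", "B", 3)]
synthetic = [("A", "B", 10), ("B", "B", 5)]
print(mean_commute_km(confidential, centroids), mean_commute_km(synthetic, centroids))
```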

4. Data Access

OnTheMap Data

Public use data:
• Origin-Destination (OD) matrix
• Residence Area Characteristics
• Workplace Area Characteristics
• Quarterly Workforce Indicators (QWI)

Not distributed:
• TIGER files
• Geographic shape files, etc.
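The “Where do workers live?” question amounts to filtering the Origin-Destination matrix on one workplace block and ranking home blocks by worker count. A minimal sketch, assuming each OD record is a (home block, work block, workers) triple; the actual file layout is not described here, so the field order, function name, and block labels are hypothetical:

```python
from collections import Counter


def top_home_blocks(od_rows, work_block, k=5):
    """Rank home blocks by the number of workers employed in work_block.

    od_rows: iterable of (home_block, work_block, workers) triples
    (hypothetical record layout).
    """
    counts = Counter()
    for home, work, workers in od_rows:
        if work == work_block:
            counts[home] += workers
    return counts.most_common(k)


# Hypothetical OD records
od_rows = [
    ("h1", "w1", 3),
    ("h2", "w1", 5),
    ("h1", "w1", 1),
    ("h3", "w2", 9),
]
print(top_home_blocks(od_rows, "w1"))  # [('h2', 5), ('h1', 4)]
```

Swapping the roles of the home and work fields answers the companion question, “Where do people work?”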

OnTheMap Data

• There are 10 implicates; only the first is used in OnTheMap at this time

• 2002-2004
• OnTheMap v2 for 17 states to be released May 31; these and future data to be made available within 6 weeks of release

• OnTheMap v1 data will be withdrawn in June

Cornell CISER Site

http://vrdc.ciser.cornell.edu/onthemap/doc/

• No project approval needed

• Email Virtualrdc@cornell.edu to register

• Read documentation and descriptions

• Very limited support

• Not affiliated with the Census Bureau

Getting Your Feedback

Join the OnTheMap listserv:

http://lists.census.gov/mailman/listinfo/lehd-onthemap

Or send an email with Yes in the subject line to

dsd.local.employment.dynamics@census.gov

Contact Us

Program Manager: Jeremy.S.Wu@census.gov

General Comments/Suggestions: Fredrik.Andersson@census.gov

dsd.local.employment.dynamics@census.gov

Website: http://www.census.gov (Local Employment Dynamics)

http://lehd.did.census.gov