Using IPUMS.org
Katie Genadek
Minnesota Population Center
University of Minnesota
[email protected]
The IPUMS projects are funded by the National Science Foundation and the National Institutes of Health
00:00
Overview
• What is IPUMS?
• Microdata and Summary Data
• IPUMS-USA
• IPUMS-CPS
• Online Analysis System
• Online Demonstration
• Questions
00:44
What is IPUMS?
Integrated - consistent codes, labels, and documentation
Public Use - anonymized, downloadable
Microdata - individual-level
Series - pooled data over time and place
1:26
But, What is IPUMS Data?
Individual level:
• Demographic Data
• Census Data
• Survey Data
• Health Data
• Historical Data
• Migration Data
• Time Use Data
Summary level:
• Demographic Data
• Census Data
• Historical Data
• Mapping Data
2:09
MPC Data Projects
http://www.ipums.org/
2:41
MICRODATA AND SUMMARY DATA
Microdata:
4:40
Microdata versus Summary Data
Microdata:
• Shows full range of responses for individuals
• Enable custom tables and sophisticated analyses
• Suppression: geography, truncation, and item-level suppression
Summary Data:
• Premade or published tables of aggregate characteristics
• Enable examination of small geographic areas
• Suppression: limited content, grouped intervals, and cell suppression
4:40
Summary Data
5:44
IPUMS Data Structure
H910000240000000088001001000220100
P910000020101032120010010010011504
P910000010201036220010010010011999
P910201000301011220060010010011999
P910201000301009120060010010011999
P910201000301007120060010010011999
P910201000301006120060010010011999
P910201000301004220060010010011999
P910201000301003220060010010011999
P910201000301002220060010010011999
H910000240000000088001001000110100
P910000020101030110010290510511310
P910000010201021210010290290171999
P910201000301001110060010290291999
H910000240000000088001001000220100
P910000020101045120010010010011100
P910000010201025220010010010011820
P910201000301007220060010010011999
H910000240000000088001001000220100
P910000020101049120010010010011100
P910000010201049220010010010011820
P910201000301019220060010010011820
P910201000301015220060010010012820
Household record (shaded) followed by a person record for each member of the household
For each type of record, columns correspond to specific variables
Variables labeled on the slide: Relationship, Age, Sex, Race, Birthplace, Mother’s birthplace, Occupation
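The household-then-persons layout described on this slide can be read with a few lines of code. The following is a minimal sketch, assuming only what the slide shows: each record starts with "H" (household) or "P" (person), and person records belong to the most recent household record. The function name `group_households` is hypothetical, and no column positions are decoded, because the slide does not give them; the IPUMS codebook for a given extract is the place to find those.

```python
def group_households(lines):
    """Group hierarchical records into households with their person records."""
    households = []
    for raw in lines:
        record = raw.strip()
        if not record:
            continue  # skip blank lines
        record_type = record[0]  # first character marks the record type
        if record_type == "H":
            # A household record starts a new group.
            households.append({"household": record, "persons": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            # A person record belongs to the most recent household.
            households[-1]["persons"].append(record)
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households


# Two of the households shown on the slide, one record per line.
sample_lines = [
    "H910000240000000088001001000110100",
    "P910000020101030110010290510511310",
    "P910000010201021210010290290171999",
    "P910201000301001110060010290291999",
    "H910000240000000088001001000220100",
    "P910000020101045120010010010011100",
    "P910000010201025220010010010011820",
    "P910201000301007220060010010011999",
]

households = group_households(sample_lines)
for number, household in enumerate(households, start=1):
    print(f"household {number}: {len(household['persons'])} person records")
```

Keeping each household's persons attached to it makes it easy to build household-level measures (for example, household size) before flattening to one row per person.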
5:54
IPUMS-USA
Microdata Data:
6:50
IPUMS-USA
• Database includes public use microdata samples:
– U.S. decennial censuses (1850-2000)
– Complete-count dataset for 1880
– Linked Samples 1850 – 1930
– Samples from Puerto Rico (1910-2008)
– American Community Survey (2000-2009)
• The first MPC data project
• Most widely used database ~ 30,000 users
6:53
Census Samples
Census Year   Sample Density   Number of persons in dataset
1850          1%               198,000
1860          1%               354,000
1870          1%               428,000
1880          100%             50,300,000
1900          6%               5,189,000
1910          1.4%             1,265,000
1920          1%               1,037,000
1930          5%               6,060,000
1940          1%               1,351,000
1950          1%               1,922,000
1960          1%               1,780,000
1970          6%               12,180,000
1980          9%               20,403,000
1990          6%               15,000,000
2000          6%               16,885,000
8:31
The American Community Survey
• Replaced the long form of the Decennial Census
– Demonstration stage: 2000 to 2004
– Full implementation 2005, group quarters added 2006
• Rolling sample design
Microdata samples:
• Full survey responses for 1% of US population
• Yearly samples, multi-year samples
9:18
ACS Samples
Year   Sample Density   Number of persons in dataset
2000   1 in 750         372,000
2001   1 in 230         1,200,000
2002   1 in 260         1,075,000
2003   1 in 230         1,200,000
2004   1 in 240         1,194,000
2005   1 in 100         2,878,000
2006   1 in 100         2,970,000
2007   1 in 100         3,100,000
2008   1 in 100         3,001,000
2009   1 in 100         3,030,700
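A "1 in N" sample density means each sampled person stands in for roughly N people, so multiplying the two columns gives a back-of-the-envelope population figure. The sketch below uses three rows from the table; the function name `implied_population` is hypothetical, and this is only a rough sanity check on the table, not a substitute for the person-level weights discussed later in the talk.

```python
# (year) -> (N in "1 in N", number of persons in dataset), copied from the table
acs_samples = {
    2000: (750, 372_000),
    2005: (100, 2_878_000),
    2009: (100, 3_030_700),
}


def implied_population(one_in_n, persons_in_sample):
    """Rough population implied by a flat '1 in N' sampling rate."""
    return one_in_n * persons_in_sample


for year, (one_in_n, persons) in sorted(acs_samples.items()):
    print(year, f"{implied_population(one_in_n, persons):,}")
```

The densities in the table are rounded, so the implied totals are approximate.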
10:03
Census and ACS Variable Topics
• Basic demographic
• Marriage
• Family structure
• Fertility
• Ethnicity
• Disability
• Education
• Work
• Income
• Migration
• Housing Characteristics
10:13
Geography Limitations
• No confidentiality restrictions for samples prior to 1940 – no geographic limitation
• Samples from 1940-1970
– Limited and inconsistent geographic identifiers
• Recent samples:
– State
– Some Metropolitan Areas
– County Groups
– Public Use Microdata Areas (PUMAs)
10:43
What are PUMAs?
• Public Use Microdata Areas (PUMAs)
• Each contains at least 100,000 persons
• Boundaries do not always align with jurisdictional boundaries
• Detailed contents and maps available
• GIS shape files for PUMAs available
11:26
IPUMS-CPS
Microdata Data:
11:53
Current Population Survey (CPS)
• Administered starting 1940
• Monthly survey administered by the Bureau of Labor Statistics
• Household survey was designed to measure unemployment
• Source of the official Government statistics on employment and unemployment
• In 2009 - 57,000 households interviewed monthly
11:55
Current Population Survey
March Supplement
• All March respondents
• Additional respondents from February, March and November monthly samples
• Data are collected for Armed Forces members residing with their families
• March Annual Social and Economic Supplement is the most widely used by social scientists and policymakers
12:20
Current Population Survey
March Supplement
• Labor force participation and unemployment
• Work experience and educational attainment
• Sources of income including non-cash benefits
• Program participation
• Tax filing status
• Health Insurance
• Migration
12:51
IPUMS-CPS
• All March Data (Back to 1962)
• Basic Monthly Surveys
– Samples from 2000-2008 (back to 1976 soon)
– Data for every month
– ~50,000 households surveyed each month
– Fewer variables than March supplement
• Demographic information
• Family characteristics
• Employment status
• Education information
13:16
ONLINE ANALYSIS SYSTEM
Obtaining Data:
14:26
Online Analysis System
• High-speed tabulation software developed at UC-Berkeley
• Allows for analysis of microdata without a statistical package
• All analysis performed online
• Can analyze multiple years of data
• Help guides on webpage
14:26
Features
• Data analysis capabilities
– Frequencies and cross tabulations (including charts)
– Comparisons of means (with complex std errors)
– Correlation matrix
– Comparisons of correlations
– Regression (ordinary least squares)
– Logit and probit regression
– List values of individual cases
15:02
Where is this online tabulator?
• Follow the link ‘Analyze Data Online’ from the homepage of:
– usa.ipums.org/usa/
– cps.ipums.org/cps/
• Select all samples of year of interest in USA
• Open IPUMS-USA or CPS in additional tab for documentation
15:41
USE THIS DATA
Obtaining Data:
16:00
Microdata for Analysis
• Documentation is Important!!!
– Use the IPUMS documentation
– Be aware of top/bottom codes, NIU codes, and missing data codes
– Know the universe – who got asked the question
• Weights – make estimates representative
– See additional weights presentation
• Sample size is important
– Check analysis without weights
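The points above about NIU codes, weights, and checking unweighted results can be sketched in a few lines. Everything here is an assumption made for illustration: the records, the variable names `income` and `weight`, and the NIU code value 9999999 are hypothetical. The real codes and weight variables for any given variable are listed in the IPUMS documentation.

```python
NIU_CODE = 9999999  # hypothetical "not in universe" code; check the codebook


def in_universe(records, var, niu_code):
    """Drop cases that were never asked the question."""
    return [record for record in records if record[var] != niu_code]


def weighted_mean(records, var, weight_var):
    """Mean in which each case counts in proportion to its weight."""
    total_weight = sum(record[weight_var] for record in records)
    return sum(record[var] * record[weight_var] for record in records) / total_weight


def unweighted_mean(records, var):
    """Plain sample mean, useful as a check on the weighted result."""
    return sum(record[var] for record in records) / len(records)


# Hypothetical person records.
records = [
    {"income": 20000, "weight": 100.0},
    {"income": 40000, "weight": 300.0},
    {"income": NIU_CODE, "weight": 200.0},  # not asked the income question
]

asked = in_universe(records, "income", NIU_CODE)
print("cases in universe:", len(asked))
print("unweighted mean:", unweighted_mean(asked, "income"))
print("weighted mean:", weighted_mean(asked, "income", "weight"))
```

Leaving the NIU case in would pull the mean toward 9999999, which is why the universe check comes first. Reporting the number of in-universe cases alongside the estimate shows how many real observations it rests on.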
16:01
Microdata for Analysis
• Allows more complex analysis than summary data
• Geographic Restrictions
– State Level Analysis
– Metro Area level Analysis
• Time series – change over time
• Not downloading tons of tables
18:43
IPUMS is Awesome
• Comprehensive online documentation
• Integration makes analyzing change over time possible
• Data analysis system allows you to access the data and analyze it online
• All of the data are available for free online
• User support is available by e-mail to help you as needed
19:31
Social Explorer - Shout Out
• Produces online maps and data reports
• Based on boundary files made available through NHGIS
• Map changes in census data over time
• http://www.socialexplorer.com/
20:23
DISCUSSION OF “WEIGHTING” AND ONLINE DEMO OF IPUMS
Obtaining Data:
20:54