356 Barrows Hall Jon Stiles, Social Sciences Data Laboratory (D-Lab)
356 Barrows Hall http://dlab.berkeley.edu
Jon Stiles, Social Sciences Data Laboratory (D-Lab)
D-Lab: New (~18 months old) lab supporting social science researchers
Workshops: Tools (e.g. Stata, R, Python, nVivo); Methods (e.g. GIS, HLM, survey sampling); Data (e.g. Census, ICPSR, Roper, Health)
Consulting: 10-12 consultants, web-based scheduling
Infrastructure: Workstations, Meeting/Teaching Space, Cold room
Community: Working Groups
Resources for Berkeley Researchers
Today: Brief Overview of Census Data and Resources
Huge data collection budget ($5-12 Billion annually)
Even more money allocated on basis of data collection (~ $400 Billion annually)
Most widely used social science data
– High quality sample frames
– Large sample sizes, small geographies
– Consistency
Why is the Census Bureau important?
Broad Data Collections
Population & Housing Census - every 10 years
Economic Census - every 5 years
Census of Governments - every 5 years
American Community Survey - annually
Many additional surveys - both Demographic & Economic
Economic Indicators - each indicator is released on a specific schedule
Broad Data Collections
Population & Housing Census
– Every 10 years
– Full enumeration
– Mixed mode (mail-in, CATI, in-person)
– Long form/short-form (2000 and earlier)
– Multiple data releases
Census 2010: Content
10 Questions
– Name
– Sex
– Age
– Relationship (to Household Head)
– Hispanic Origin
– Race
– Owner/Renter Status
Plus
– Whether each member sometimes lives/stays elsewhere
– Total number living in residence
– Probe for unreported persons
– Telephone contact
Census 2010: Product Detail
P.L. 94-171 (Redistricting Data)
State and sub-state counts down to the block level are shown for the total population and the population 18 years and over for 63 race groups; and not Hispanic or Latino origin by 63 race groups. Also shown are housing unit counts by occupancy status (occupied units, vacant units).
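The figure of 63 race groups can be reproduced by counting every non-empty combination of six race categories. The sketch below is illustrative only: the six category labels are an assumption, since the slide gives just the total.

```python
from itertools import combinations

# Assumed labels for the six race categories; the slide states only the total (63).
RACE_CATEGORIES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some Other Race",
]

def race_groups(categories):
    """Return every non-empty combination of the given categories."""
    groups = []
    for size in range(1, len(categories) + 1):
        groups.extend(combinations(categories, size))
    return groups

groups = race_groups(RACE_CATEGORIES)
print(len(groups))  # 2**6 - 1 = 63
```

Six of these groups are single races ("alone"); the remaining 57 are combinations of two or more races.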
Census 2010: Product Detail
Summary File 1 (SF1)
About 300 tables
Counts and cross tabulations
Counts for detailed race, Hispanic or Latino groups, and American Indian/Alaska Native tribes (to the tract)
Tables repeat for major race groups alone, two or more races, Hispanic or Latino, White not Hispanic or Latino
Geography: block, census tract
http://www.census.gov/population/www/cen2010/glance/files/SF1_Final_1.5_Internet.xls
Census 2010: Product Detail
Summary File 2
Detailed tables on age, sex, households, families, relationship to householder, housing units, and group quarters.
Tables are repeated by 141 race groups, 98 American Indian and Alaska Native tribes/tribal groupings, and 39 Hispanic or Latino origin groups.
Where’s all the interesting stuff?
In 2000 (and earlier) censuses, the census used more than one form:
A “short” form, which asked basic demographic data, just like the 2010 census form (AKA – 100% data)
A “long” form, which collected both the items on the short form and a broader set of items about income, education, ancestry, language, disability, employment, etc. (AKA – sample data)
Now the decennial census focuses solely on basic demographic data, and social and economic data are collected in the American Community Survey (ACS)
Broad Data Collections
American Community Survey
Continuous
Replacement for the “long form” of the decennial census.
HH sample fully implemented in January 2005, annual sample of around 3 million.
Multi-mode: mail, CATI, CAPI
Multiple data releases
– 1 year, 3 year, 5 year, PUMS
ACS Content - Basic
ACS: Design of the Sample
Annual sample size of 3 million addresses
Series of monthly samples of 250,000 addresses
HU sample in each of the 3,141 counties
Areas with smaller populations sampled at higher rates than those with larger populations
HU address sampling rate set by block
Final sampling rate varies between 1.6% and 10%
No HU address can be sampled more than once in 5 years
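As a quick check on the figures above, the arithmetic can be sketched as follows. The annual sample and the range of sampling rates come from the slide; the 2,000-address area is a hypothetical example, and the helper names are illustrative.

```python
# Figures taken from the slide.
ANNUAL_SAMPLE = 3_000_000          # addresses sampled per year
MIN_RATE, MAX_RATE = 0.016, 0.10   # range of final sampling rates

def monthly_sample(annual_sample, months=12):
    """Split an annual address sample into equal monthly samples."""
    return annual_sample // months

def expected_sampled_addresses(housing_units, rate):
    """Expected number of addresses drawn from an area at a given rate."""
    return housing_units * rate

print(monthly_sample(ANNUAL_SAMPLE))  # 250000

# Hypothetical area with 2,000 housing-unit addresses (not from the slide).
for rate in (MIN_RATE, MAX_RATE):
    print(rate, round(expected_sampled_addresses(2000, rate)))
```

The same area yields far more sampled addresses at the upper rate than at the lower one, which is how smaller areas end up with usable sample sizes.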
Distribution Formats
Like former decennial census data, released in both aggregate and microdata formats
Because of change to continuous sampling, however, aggregate data released at different geographic levels with differing collection frames
Demographic (Household) Surveys
Survey of Income and Program Participation
Survey of Program Dynamics
American Housing Survey
Current Population Survey
Consumer Expenditure Survey (CES)
And more...
CCRDC
California Census Research Data Center
The CCRDC is a joint project of the U.S. Census Bureau, UC Berkeley and Stanford to enable qualified researchers with approved projects to access non-public Census Bureau data.
Center for Economic Studies (CES) on the web: http://www.census.gov/ces/
CCRDC on the web: http://www.ccrdc.ucla.edu/
Stanford RDC: https://iriss.stanford.edu/Securedata
I’d love to talk about your data needs
Jon Stiles [email protected]
(510) 664-4157
350F Barrows Hall
http://dlab.Berkeley.edu
Supplementary slides
Behind the scenes: Sampling Frames (Household Surveys)
Master Address File (MAF)
– Official inventory of known living quarters
– Linked to TIGER
Housing Units
– Based on Census 2000 MAF and updates from the USPS’ Delivery Sequence File
Group Quarters
– Updates from the administrative records and the FSCPE
Behind the scenes: Sampling Frames (Business Surveys)
Business Register
– Census Bureau’s master business list
– Industry classification - NAICS
– Geographic classification - states, counties, etc.
– Legal form & tax status
Establishments
– Places of Business
Enterprises
– Firms
Behind the scenes: Sampling Frames (Geography/Other)
Topologically Integrated Geographic Encoding and Referencing system (TIGER)
Boundary and Annexation Survey (BAS)
– Annual; legally defined geographies
Population Estimates
– Based on vital statistics data, IRS migration, Medicare enrollment data
ACS: Sample Design
GQ facilities sample for each state
Two strata
– Small (15 or fewer residents)
– Large (more than 15 residents)
Small
– Data collected on all residents
– Facility eligible once in 5 years
Large
– Groups of ten residents sub-sampled
– Number of groups determined by size of facility
– Facility eligible every year
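The two-stratum rule for group quarters (GQ) facilities can be sketched as a simple lookup. The function and field names are illustrative, not Census Bureau terminology, and the number of ten-person groups drawn from a large facility is left out because the slide does not give the formula.

```python
SMALL_MAX_RESIDENTS = 15  # threshold from the slide

def gq_stratum(residents):
    """Classify a GQ facility as 'small' or 'large' by resident count."""
    return "small" if residents <= SMALL_MAX_RESIDENTS else "large"

def gq_rules(residents):
    """Return the collection rules the slide lists for the facility's stratum."""
    if gq_stratum(residents) == "small":
        return {
            "stratum": "small",
            "collect": "all residents",
            "eligible": "once in 5 years",
        }
    return {
        "stratum": "large",
        "collect": "sub-sampled groups of ten residents",
        "eligible": "every year",
    }

print(gq_rules(12)["collect"])   # all residents
print(gq_rules(240)["collect"])  # sub-sampled groups of ten residents
```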
Supplementary Resources
Population Projections and Estimates
Small Area Income and Poverty Estimates Small Area Health Insurance Estimates
Geographic shapefiles & resources
ACS Content Tests
2006
– Health Insurance
– Marital History
– Veteran's Service-connected Disability
2007
– Field of Degree (BA)
2010
– Computer Ownership-Internet Access
– Parental Place of Birth
2011-2013
– Testing of Internet Response mode
Survey of Income and Program Participation
The Survey of Income and Program Participation (SIPP) program, initiated in 1983, is a longitudinal, multi-panel survey primarily of adults in households in the United States.
Sampled households are interviewed at least nine times at four-month intervals and followed over the life of the panel. New samples (panels) are drawn periodically (annually 1984-1993; then 1996, 2001, 2004, 2008), ranging in size from around 13,000 HHs to around 40,000 HHs.
The SIPP attempts to interview all members age 15 and older in the household during the first wave of interviewing. Subsequent interviews may be in-person or by phone, with the same interviewer speaking to the same respondents.
New members who join the household are interviewed after they join; departing members are interviewed at their new address.
Survey of Income and Program Participation
SIPP information falls into two categories: the core information, and other questions (found in "topical modules") that produce in-depth information on specific subjects and are asked at only one or two interviews.
SIPP core content covers demographic characteristics, work experience, earnings, program participation, transfer income, and asset income.
Current Population Survey
The Current Population Survey (CPS) is a monthly survey of about 50,000 to 65,000 households conducted by the Bureau of the Census for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years.
The CPS is the primary source of information on the labor force characteristics of the U.S. population. The sample is scientifically selected to represent the civilian noninstitutional population.
Households are in the survey eight times: four consecutive months, eight months off, and then a final four months.
Estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and other indicators. They are available by a variety of demographic characteristics including age, sex, race, marital status, and educational attainment. They are also available by occupation, industry, and class of worker.
Supplemental questions to produce estimates on a variety of topics including school enrollment, income, previous work experience, health, employee benefits, and work schedules are also often added to the regular CPS questionnaire.
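The rotation pattern described above (four months in, eight months out, four months in) can be made concrete with a short sketch. Months are counted from the household's first interview (month 0); the function name is illustrative.

```python
def cps_months_in_sample(first_month=0):
    """Month indices in which a household is interviewed under the
    4-in, 8-out, 4-in rotation described on the slide."""
    first_spell = [first_month + i for i in range(4)]        # 4 consecutive months
    second_spell = [first_month + 12 + i for i in range(4)]  # after 8 months off
    return first_spell + second_spell

print(cps_months_in_sample())  # [0, 1, 2, 3, 12, 13, 14, 15]
```

Because the second spell starts exactly twelve months after the first, each household is interviewed in the same four calendar months in two consecutive years.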
Current Population Survey
Annual Social and Economic Supplement (ASEC) - formerly called the Annual Demographic Survey or March Supplement
Voting and Registration (November)
School Enrollment (October)
Food Security; every year since 1995
Computer Ownership
Fertility and Marital History
Fertility and Birth Expectations
Contingent Workers and Alternative Employment
Displaced Workers
Job Tenure and Occupational Mobility
Race and Ethnicity
Tobacco Use
Work Experience
Work Schedules
American Housing Survey
Provides information on the size and composition of the housing inventory in the United States, neighborhood characteristics, characteristics of occupants, household characteristics, income, housing and neighborhood quality, housing costs, equipment and fuels, size of housing unit, and recent movers.
The AHS returns to the same housing units year after year to gather data; therefore, this survey is ideal for analyzing the flow of households through housing.
Sample of ~65,000
Collected for HUD
Separate national (fixed sample for ~50,000, followed since 1985) and metropolitan samples (~3,200 – 4,800 per area, every 6 years, 12-14 areas/year)
More detailed data, less geographic detail, than census
Consumer Expenditure Survey
The Consumer Expenditure Survey (CES) provides information on the buying habits of American consumers and also furnishes data to support periodic revisions of the Consumer Price Index. A new sample is drawn annually, and includes about 60,000 households.
The survey consists of two separate components: (1) a quarterly Interview Survey in which each consumer unit in the sample is interviewed every three months over a fifteen-month period, and (2) a Diary Survey completed by the sample consumer units for two consecutive one-week periods.
The quarterly interview gathers retrospective data on purchases, and focuses on regular and large expenses.
The Diary Survey contains consumer information on small, frequently-purchased items such as food, beverages, food consumed away from home, gasoline, housekeeping supplies, nonprescription drugs and medical supplies, and personal care products and services. Participants are asked to maintain expense records, or diaries, of all purchases made each day for two consecutive one-week periods.
Selected Other Data
National Crime Victimization Survey
– 48,000 addresses in 809 PSUs in US
– Operating since 1972
– 7 interviews over 3 ½ year period
National Corrections Reporting Program
– Prison admissions and discharges. Variables include incarceration history, current offenses, and total time served. Background information on individuals includes year of birth, sex, age, race, Hispanic origin, and educational attainment.
A variety of surveys for NCHS, e.g.
– National Health Interview Survey
– National Hospital Discharge Survey
– National Survey of Ambulatory Surgery
National Survey of College Graduates
– Baseline survey based on Census: 1993 from 1990 census, 2003 from 2000 census
– Follow-up surveys every 2 years (4 total per decade)
Geographic grain and Margin of Error
SF2 - Detailed Asian Categories
Asian
– Asian Indian
– Bangladeshi
– Bhutanese
– Burmese
– Cambodian
– Chinese
– Chinese, except Taiwanese
– Taiwanese
– Filipino
– Hmong
– Indonesian
– Japanese
– Korean
– Laotian
– Malaysian
– Nepalese
– Pakistani
– Sri Lankan
– Thai
– Vietnamese
– Other Asian
SF2 - Detailed Hispanic/Latino Categories
Hispanic or Latino (of any race)
– Mexican
– Puerto Rican
– Cuban
– Other Hispanic or Latino
Other Hispanic or Latino
– Dominican
– Central American: Costa Rican, Guatemalan, Honduran, Nicaraguan, Panamanian, Salvadoran
– South American: Argentinean, Bolivian, Chilean, Colombian, Ecuadorian, Paraguayan, Peruvian, Uruguayan, Venezuelan
– Spaniard
– All other Hispanic or Latino
SF 2 - 42 American Indian Categories
American Indian
– Apache; Arapaho; Blackfeet; Canadian and French American Indian; Central American Indian; Cherokee; Cheyenne; Chickasaw; Chippewa; Choctaw; Colville; Comanche; Cree; Creek; Crow; Delaware; Hopi
– Houma; Iroquois; Kiowa; Lumbee; Menominee; Mexican American Indian; Navajo; Osage; Ottawa; Paiute; Pima; Potawatomi; Pueblo; Puget Sound Salish; Seminole; Shoshone; Sioux
– South American Indian; Spanish American Indian; Tohono O'Odham; Ute; Yakama; Yaqui; Yuman; American Indian tribes, Other
Alaska Native
– Alaskan Athabascan; Aleut; Inupiat; Tlingit-Haida; Tsimshian; Yup'ik