Digging into Census Data to Find Good Stories



Digging into Census Data to Find Good Stories. Sean Lahman – [email protected], Rochester Democrat & Chronicle, SeanLahman.com/resources.

Transcript of Digging into Census Data to Find Good Stories

  • SEAN LAHMAN, [email protected]
    ROCHESTER DEMOCRAT & CHRONICLE
    SEANLAHMAN.COM/RESOURCES

    Digging into Census Data to Find Good Stories

  • Digging for Stories

  • Computer Assisted Reporting

    "Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times.

    But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting."

    Tim Berners-Lee, inventor of the World Wide Web
    London, November 2010

  • Overview of the U.S. Census

  • Since 1790

  • 10 Questions in 10 Minutes

    How many people live here?
    Age, gender, race of each
    Home ownership?

  • Data Release Schedule

    Dec 2010: Totals for all states
    February & March: Redistricting data for each state
    May: Demographic profiles
    June-August: Summary File 1
    Late 2011 / early 2012: Summary File 2

  • The End

  • Shortest. Census. Form. Ever.

    The more questions the Census Bureau asked, the fewer responses they received.

  • American Community Survey

    Samples roughly 3 million people per year
    1-year, 3-year, and 5-year data released
    Population thresholds: 65,000 and 20,000

  • Census data is really many data sets

    Decennial Census
    American Community Survey
    Small Area Income and Poverty Estimates
    American Housing Survey
    Consumer Expenditure Survey
    Public Use Microdata Samples
    Survey of Construction
    and more

  • Levels of Complexity

    Age
    Gender
    Race
    Location

  • Geographical Groupings

  • Census Tracts

  • 1790 Map

  • Getting the Data

    http://www2.census.gov/census_2010/

  • American Fact Finder

  • AFF2

  • AFF3

  • AFF4

  • AFF5

  • AFF - new

  • AFF-new1

  • AFF-new2

  • AFF-new3

  • DataFerrett

    http://thedataweb.org/ or http://dataferrett.census.gov

  • DF1

  • IRE

  • Other Resources

    New York State Data Center: http://esd.ny.gov/
    Lewis Mumford Center, University at Albany: http://www.albany.edu/mumford/
    University of Virginia: http://mapserver.lib.virginia.edu/
    NYC Department of City Planning: http://gis.nyc.gov/dcp/pa/address.jsp
    University of Michigan: http://www.icpsr.umich.edu/icpsrweb/ICPSR/
    University of Minnesota (PUMS): http://usa.ipums.org/usa/

  • Finding Story Ideas

    "People don't really tell you this, but often the amount of time you spend finding a decent story is more than the time it takes to produce the story." -- Ira Glass

  • Civics for Journalists

    Who is your audience?
    Context
    Need reliable numbers other than the Census Bureau

  • Redistricting

    Redistricting is an intensely political process
    There is a vocabulary to redistricting
    Section 5
    Gerrymandering
    Huge opportunity to share historical info with readers
    Communities of Interest

  • Population

    Urban sprawl, neighborhood decay, suburban flight
    Population density maps
    Growth/decline of population in neighborhoods
    Average household size (changes, comparisons between neighborhoods)

  • Age and Gender

    (More detailed age breakdowns available in May)
    Graying or seeing a baby boom?
    Concentrations of older residents vs. children
    Gender differences in neighborhoods
    Generational conflicts (mixing of generations)

  • Race and Ethnicity

    (Top level race data in first release, more detail in summer release)
    Look at race breakdowns by age (growth of minority population tends to be bottom up)
    Surge in Hispanic population is a national story (14 to 35 million from 1980-2000)
    Segregation & diversity (see the Mumford Center at SUNY Albany, or USAT diversity index)

  • Relationships

    Types of households (available in May)
    Seniors living alone
    Grandchildren living in household
    Consider mapping same-sex couples, single mothers

  • Housing

    Vacant housing units: what is the trend, and where?
    Owner-occupied vs. rentals
    How many people own their homes free and clear? Where are they?
    Foreclosures: where and how many?

  • Other Ideas

    Redistricting on the local level
    Compare racial composition of school population with citizens 18+ (voters)
    New ACS variables: healthcare coverage, degree field, veterans' disability, marital history
    Housing double-up as an indicator of economic impact

  • New York Ideas

    Look at the extent to which the recession and mortgage meltdown slowed the migration away from NY.
    NY is an example of a state that is aging, but also seeing a decline in child population (

  • Demographic Resources

    Our Patchwork Nation, Dante Chinni: identified clusters based on income, education, population growth, unemployment, foreclosure rates, Starbucks vs. Wal-Mart
    Who We Are Now, Sam Roberts
    The Big Sort, Bill Bishop
    A Field Guide to Sprawl, Dolores Hayden

  • American Community Survey

  • American Community Survey

  • American Community Survey

  • American Community Survey

  • American Community Survey

  • Data Visualization

  • Shan Carter

    Data Visualization

  • Data Visualization as a new medium

  • Protovis-1

  • Protovis-2

  • Science Fair Poster

  • Science Fair Poster 2

  • Martini Glass

  • Data Slide Show

  • Drill Down Story

  • Another Drill Down

  • Not Just for End Users

  • Demographic Explosion

    Crime data
    Campaign finance
    Voter registration
    Public payrolls
    Student test scores

  • Netflix

    http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html

  • Mapping Tools

    Adobe Flash
    Google Maps API
    MAPublisher
    GeoCommons
    Social Explorer
    Census Bureau Thematic and Reference Maps

  • Adobe Flash

    Des Moines Register
    James Wilkerson

    http://data.desmoinesregister.com/dmr/iowa-census/redistricting-map/

  • MAPublisher with Illustrator

    Tampa Tribune
    Kevin Wiatrowski

    http://www2.tbo.com/static/maps/tbocom-special-report-data-maps-map-builder/

  • Google Maps API

  • GeoCommons

  • Social Explorer

    http://SocialExplorer.com

  • Census Bureau Maps Tool

  • Digging for Stories

  • Questions

    SeanLahman.com/resources

    *There are a lot of movies about what it's like being a reporter. For me and folks of my generation, All the President's Men was the seminal film.

    It's about a young reporter sent to cover a barely newsworthy court case that turned into a story that toppled a presidency and changed journalism. What Woodward and Bernstein really managed to do was reconcile two professional imperatives that journalists hold dear but that usually seem to be at odds: objectivity and muckraking.

    What Bob Woodward and Carl Bernstein did to pursue their story: working the phones, pounding the pavement, and developing anonymous sources.

    Those are still important skills today, but you need more.

    *CAR was once considered a specialty, but it's becoming an important tool set for all reporters.

    While newspaper business models have struggled, journalism has been in a state of huge transformation equal to the transformative impact of Woodward and Bernstein.

    The spread of the Web has helped make massive amounts of public data accessible.

    More of a need for analysis, context, data mashups, maps, interactives. Deliver content that isn't just articles.

    Census data is just one example of those large public data sets.

    Many of the tools and strategies we talk about here today have broader applications.

    *So let's start with a quick overview of the US Census, which is in essence simply a headcount. It's not a survey. Not an estimate. But a concerted effort at making an actual count of every single person living in the United States.

    *The Census is required by the US Constitution (Article I, Section 2, right near the top), and it spells out the two reasons why the Census is required: for figuring out the number of representatives each state has in Congress, which we call reapportionment and which in turn triggers redistricting, and for divvying up tax money.

    *The process has been refined over 220 years, and now is fairly straightforward. The one-page census form gets mailed to every household in the United States, and if folks don't send the form back, Census workers go and knock on the door. The form asks just a few basic questions: Who lives here? How old are they? What's their gender and race? And do you own or rent the place? Those ten questions give them all of the info they need to compile the detailed reports the Constitution requires, and that's the end of the Census process

    *Wrap up by talking about the staggered schedule for release of data

    Dec 2010: Totals for all states

    February & March: Redistricting data for each state
      all geographic levels
      about a week's notice before releases, no embargo
      totals, adults; totals for race/Hispanic groups
      one housing occupancy table: vacant or occupied
      mostly raw data

    May: Demographic profiles and group quarters
      age, race, relationship, household, rent/own
      many geographic levels
      possible media embargo
      mostly PDFs

    June-August: Summary File 1
      most of rest
      full breakdown on everything
      all geographic levels
      250 tables, ~330 with racial versions
      mostly data files

    Late 2011/early 2012: Summary File 2
      SF1 repeated separately for race/ethnic groups
      all geographic levels
      data files

    *except this just signals the beginning of what is likely to be a long and nasty political fight over redistricting. In NY and elsewhere, the process becomes a protracted legal battle to redraw Congressional districts, state legislature districts, and even local districts (such as county legislatures or city council districts). And it's not simply about drawing new lines on a map, because for folks in New York, it means drawing those lines into fewer districts.

    That process leads to things like this monstrosity, New York's 28th district. Upstate lost a congressional seat after the 2000 Census, and the process led to this, which folks call the ear muff because of its shape. It encompasses the city of Rochester, some sparsely populated rural areas along the coast, and the northern suburbs of Buffalo. It also includes the Rochester suburb of Fairport. Why? Because that's where long-time Congresswoman Louise Slaughter lives.

    Sometime within the next year, they'll draw some new districts, and I can't begin to imagine what they'll look like. But I do know one thing: two of our current congressional representatives are going to see their seats disappear.

    In New York and Ohio, it means a loss of 2 seats. Texas was the biggest winner with a net gain of four. Florida will add 2, and 14 other states will gain or lose one seat each. (8 states lost 1, 6 gained 1)
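    A quick arithmetic check on those figures, as a sketch: the House stays at 435 seats, so the gains and losses quoted above have to cancel out. The grouping below uses only the counts from these notes; the variable and function names are illustrative.

```python
# Seat changes after the 2010 Census, grouped as in the notes above.
# Each entry is (seats changed per state, number of states in that group).
changes = [
    (+4, 1),  # Texas
    (+2, 1),  # Florida
    (+1, 6),  # six states gained one seat
    (-1, 8),  # eight states lost one seat
    (-2, 2),  # New York and Ohio
]


def net_change(groups):
    """Seats gained minus seats lost across all groups."""
    return sum(delta * states for delta, states in groups)


def states_affected(groups):
    """How many states saw their seat count change."""
    return sum(states for _, states in groups)


print(net_change(changes))       # 0
print(states_affected(changes))  # 18
```

    If a table of seat changes does not net to zero, something was mistyped, which makes this a cheap sanity check before publishing.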

    *So the Census form asks just 10 questions, and the process yields only very basic information. It's supposed to be the information age, so why is so little data being collected?

    After the 1990 Census, it became clear that there was a lot of resistance to filling out the so-called long form. Federal agencies and state governments were calling on the Census Bureau to provide them with demographic information, but that became more difficult as the response rate kept dropping.

    They started testing a survey program in 1996, and the first data sets were released in 2001. This program was called the American Community Survey, and it's the source of most of the data that we talk about when we talk about Census data.

    *ACS surveys are sent to a representative sample, roughly one percent of the population each year.

    Asks 69 detailed questions about: housing, income, family relationships, job status, household expenses, education, languages spoken, migration, health insurance, marital history, veteran status, disabilities, where you work and how you get there, etc.

    Most useful for percentages, not raw counts (it's a survey)

    Almost infinite ways to slice it: 600-800 tables for each geography

    For areas that are larger than 65,000 people, the sample size is enough to produce meaningful results every year. For areas between 20,000 and 65,000, it takes 3 years to get meaningful numbers, and for areas smaller than 20,000, it's five years.
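    The population cutoffs above can be written as a small lookup. This is a minimal sketch: the function name is made up for illustration, and treating an area of exactly 65,000 or 20,000 as belonging to the larger group is an assumption, since the notes do not say which side the boundary falls on.

```python
def shortest_acs_period(population):
    """Return the shortest ACS estimate period available for an area.

    Cutoffs follow the notes above; how exact boundary values are
    handled (>=) is an assumption.
    """
    if population < 0:
        raise ValueError("population cannot be negative")
    if population >= 65_000:
        return "1-year"
    if population >= 20_000:
        return "3-year"
    return "5-year"


# Hypothetical populations, for illustration only.
for pop in (210_000, 36_000, 8_500):
    print(pop, shortest_acs_period(pop))
```

    In practice this tells you how stale the numbers for a small town will be compared with those for a big city in the same story.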

    Will talk more about the ACS later.

    *The Census Bureau also collects data for other agencies such as the Bureau of Labor Statistics, the Social Security Administration, the National Center for Education Statistics, and others.

    Constantly updating the numbers, improving the processes, and adding new data points (new questions).

    The Census Bureau is a warehouse for data, but not just raw data. The Census Bureau releases white papers and other reports, and has a large staff of demographers and statisticians willing to speak to these issues on the record.

    *It's also important to understand that there are levels of complexity to the data, and each additional question on a Census form or each data point in a survey creates additional permutations.

    By asking three basic questions (age, gender, and race) we can give a count of how many males and females there are, how many people are over and under 18, and how many people identify with each racial group. But then you can start to combine those elements and find out how many men there are over the age of 65, how many Asian men over the age of 65, etc.
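    To make the combinations described above concrete, here is a minimal sketch using a handful of invented records (not Census data). The age bands, including counting age 65 itself as "65 and over", are assumptions chosen for illustration.

```python
from collections import Counter

# Invented individual records: (sex, age, race). Not real Census data.
people = [
    ("M", 70, "Asian"),
    ("F", 34, "White"),
    ("M", 67, "Asian"),
    ("M", 12, "Black"),
    ("F", 81, "Asian"),
    ("M", 45, "White"),
]


def age_group(age):
    """Collapse an age into three illustrative bands."""
    if age < 18:
        return "under 18"
    if age >= 65:
        return "65 and over"
    return "18 to 64"


# One question gives one simple count...
by_sex = Counter(sex for sex, _, _ in people)
# ...two questions combined give a cross-tabulation...
by_sex_age = Counter((sex, age_group(age)) for sex, age, _ in people)
# ...and three combined give a finer one.
by_all = Counter((sex, age_group(age), race) for sex, age, race in people)

# Each added question multiplies the number of possible cells.
sexes = {sex for sex, _, _ in people}
ages = {age_group(age) for _, age, _ in people}
races = {race for _, _, race in people}
possible_cells = len(sexes) * len(ages) * len(races)

print(by_sex["M"])                            # 4
print(by_sex_age[("M", "65 and over")])       # 2
print(by_all[("M", "65 and over", "Asian")])  # 2
print(possible_cells)                         # 18
```

    Three short questions already produce 18 possible cells here, which is why the real tables run to hundreds per geography.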

    Then you add in the element of geography, and all of a sudden you have something pretty powerful. Now you can find out how many Asian men over the age of 65 live in Rochester compared to Syracuse.

    *Geography. Most important grouping.

    Nation, State, County, Towns, Census Tracts. But also school districts, legislative districts, metropolitan areas, etc.

    Each household exists within many different geographical groupings.

    *Census tracts were drawn in the 1940s to represent areas with roughly the same number of people (4,000-6,000). They were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. Over time there have been demographic changes, and most have grown. Some have been split over time.

    Smallest unit of measure, which can be very useful, but could also subject you to noise, depending on your region.

    *Bear in mind that geographies change over time when comparing one set of data to another.

    New York had 15 counties in 1790.

    *Now that you've had a crash course in the Census, how do you get at the data?

    For day-one releases, go right to the source. But that's raw data, and not easy to use.

    *Best way is to go right to the source. Their tool is called the American FactFinder.

    Not going to do a full tutorial, but just a quick look at how you find and download data.

    Two versions: will show you both. Right now, some data sets are only accessible through the old version, and some only through the new version.

    Factfinder.Census.gov or from front page of Census.gov

    In the left-hand nav, select decennial census, ACS, or other datasets, then come over here and select detailed tables.

    *Remember geographies

    Geography is the first thing you have to pick.

    Choose County Subdivisions, which means cities and towns within a county.

    For this example I chose New York, Monroe County, and all subdivisions.

    *Once we've chosen a geography, we need to pick a report.

    Choose B01001: Sex by Age

    Hundreds of canned tables from Census for each geography

    *Here you see a preview of what the data looks like

    Columns for three towns: Brighton, Chili, and Clarkson

    Can choose download from menu

    *Download it in a variety of formats

    *Here's the newer version, released in 2011.

    You'll need to use this to get the 2010 Census data

    *Instead of starting with a geography, we start with a topic. Click on topics, and it shows you a list of applicable data tables

    *Select total population, and I selected NJ since NY wasn't out when I created this slide show

    Breakdown of population by race for each town in NJ

    *Select download, and as with the other version, several options for file formats.

    Either version of FactFinder offers:

    Access to raw data
    Requires some demographic knowledge to get at the right data
    Requires some database/analytical skill to do something with it
    Merging different sets or different tables (longitudinal) could be challenging
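    As a rough idea of what merging two downloaded tables involves, here is a minimal sketch using only Python's standard library. The two CSV snippets, their column names (geo_id, name, total_pop), and the numbers are all invented for illustration; real downloads use different headers and need cleanup first.

```python
import csv
import io

# Invented stand-ins for two downloaded tables (not real Census figures).
POP_2000 = """geo_id,name,total_pop
A,Town A,1000
B,Town B,2500
"""

POP_2010 = """geo_id,name,total_pop
A,Town A,1100
B,Town B,2400
C,Town C,500
"""


def load(text):
    """Index the rows of a CSV table by geography ID."""
    return {row["geo_id"]: row for row in csv.DictReader(io.StringIO(text))}


def percent_change(old_text, new_text):
    """Percent population change for geographies present in both tables."""
    old, new = load(old_text), load(new_text)
    result = {}
    # Only compare geographies that exist in both years; boundaries
    # and IDs can change between releases.
    for geo_id in sorted(old.keys() & new.keys()):
        before = int(old[geo_id]["total_pop"])
        after = int(new[geo_id]["total_pop"])
        result[new[geo_id]["name"]] = round(100 * (after - before) / before, 1)
    return result


print(percent_change(POP_2000, POP_2010))
# {'Town A': 10.0, 'Town B': -4.0}
```

    Town C is dropped because it appears in only one table, which is the kind of mismatch that makes longitudinal comparisons tricky.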

    *Can be tough to use, but lets you merge data from different sources
    Search tools help you narrow in on what you're looking for
    Good tutorials, toll-free help desk
    Tools to create basic maps and other web-ready output

    **Other sources that take the census data and create ready-to-use data tables

    USA Today is doing this for Gannett papers, and they're sharing that data with others through IRE.

    Other collectives that are doing this sort of thing.

    *Mumford Center: urban studies
    UVA: historical census browser, lets you do longitudinal studies
    Mich: Inter-university Consortium for Political and Social Research (collection of half a million sets of social science research data)
    Minn: PUMS, Public Use Microdata Series, samples of actual individual responses

    *We've talked about where to get data. How do we find stories in the data?

    This quote from Ira Glass, one of the great storytellers of this generation, is something I've definitely found to be true in my own personal experience. Can't tell you how many story ideas I've had that didn't amount to anything, or turned out not to be as interesting as I thought. And conversely, the story that I wrote that got the most traction last year was one I initially thought wasn't worthy of reporting at all.

    The process of investigating data means you'll do a lot of work before you start writing, maybe before you even know what you'll be writing about. Those of us who do database reporting do a lot of experimentation, and that means a lot of false starts and dead ends.

    *Start with the Census process, because that's a story unto itself.

    1) Who is your audience?
    Are they politically sophisticated, or have they not heard about this subject since high school?

    2) Context
    How do you enrich raw data to tell a story?
    A number in isolation is meaningless
    Juxtapose it with another number that gives it meaning, and don't just compare it to previous

    3) Need reliable numbers other than the Census Bureau
    the numbers are massaged
    need sources to tell us why these numbers are wrong
    each state has a political class (lawyers, consultants) who will fight the battle
    state level resources
    demographers (at local universities)
    academics (for high concept, historical sense)
    politicians
    political reporters

    *Section 5: under the Voting Rights Act, some areas need DOJ approval of redistricting. That includes 3 counties in NY: Bronx, Kings (Brooklyn), and New York (Manhattan).

    Gerrymandering, the deliberate manipulation of districts for electoral advantage, happens whenever the process is controlled by the legislature. Some states have skewed towards one party (Tex, Mich, Pa, Fla), which has national implications. Others (like NY, NJ, and Calif) skew towards preserving incumbents, which reduces the number of competitive districts.

    How many incumbents were defeated in 2010? (5 of 28 in NY, 56 of 435 nationally; 55 of 61 re-elected to State Senate)

    COMMUNITIES OF INTEREST: identify the things that folks have in common *politically*
    What issues will be affected?

    *Census tract maps

    **Diversity: the index from Philip Meyer, and later Paul Overberg, uses the traditional breakdown. You can find an explanation here: http://www.unc.edu/~pmeyer/carstat/tools.html#updating
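    For a feel of what a diversity index measures, here is a minimal sketch of a simplified version: the chance that two people picked at random from an area belong to different groups. This is a stand-in for illustration, not necessarily the exact calculation in the explanation linked above, which may treat race and Hispanic origin differently. The group names and counts are invented.

```python
def diversity_index(counts):
    """Probability that two randomly chosen residents are in different groups.

    counts maps a group label to a population count. Computed as one minus
    the sum of squared group shares (a simplified index, see the caveat above).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need a positive total population")
    return 1 - sum((n / total) ** 2 for n in counts.values())


# Invented neighborhoods, for illustration only.
evenly_split = {"Group A": 500, "Group B": 500}
one_group = {"Group A": 1000, "Group B": 0}

print(diversity_index(evenly_split))  # 0.5
print(diversity_index(one_group))     # 0.0
```

    A score near zero means nearly everyone falls in one group, and the score rises as the population is spread more evenly across more groups.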

    Ron Campbell of the Orange County Register offers another explanation: http://www.ocregister.com/articles/index-291897-squared-dissimilarity.html

    ****Domestic migration rate over the last 3 years is the lowest since WWII

    *Look outside the Census data for story ideas

    Who We Are Now: big picture

    The Big Sort: people cluster into self-segregated neighborhoods

    *Let's take a look at a few examples of data sets that you can get from the Census Bureau, specifically from the American Community Survey. Just to give you a smattering of what's available, and the kinds of stories that you might draw from them.

    Born in NY, Born in Other State, Native born outside US, Foreign born

    *Geographic mobility

    **The Census Bureau defines a linguistically isolated household as one in which no one 14 years old and over speaks only English or speaks a non-English language and speaks English "very well." In other words, all members of the household 14 years old and over have at least some difficulty with English.
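    That definition is easier to follow as a rule applied to one household at a time. The sketch below is a literal reading of the sentence above; the Person fields and the ability labels other than "very well" are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    only_english: bool          # speaks only English at home
    english_ability: str = ""   # hypothetical labels, e.g. "very well", "well", "not well"


def is_linguistically_isolated(household):
    """True if no member aged 14+ speaks only English or speaks it "very well".

    Note: a household with no members aged 14 or over comes out True under
    this literal reading, an edge case the quoted definition does not address.
    """
    members_14_plus = [p for p in household if p.age >= 14]
    return not any(
        p.only_english or p.english_ability == "very well"
        for p in members_14_plus
    )


# Invented households, for illustration only.
isolated = [Person(40, False, "not well"), Person(38, False, "well"), Person(9, False, "very well")]
not_isolated = [Person(40, False, "not well"), Person(16, False, "very well")]

print(is_linguistically_isolated(isolated))      # True
print(is_linguistically_isolated(not_isolated))  # False
```

    The first household counts as isolated even though the 9-year-old speaks English very well, because only members 14 and over are considered.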

    **Those are just a half-dozen examples, and there are hundreds more. You could pull a different data table down every day and write a new story and not run out until the summer of 2012. But I have to implore you not to do that, not to simply take a data table like this one and regurgitate it.

    *The last thing the internet needs is more data. What it needs, what our readers need, is more analysis. Don't just show them patterns, explain them. Just because something is complicated does not mean it is interesting.

    *So I'd like to spend a few minutes talking about data visualization. That's an important part of digging for stories, for reasons which I hope will become clear if they aren't already.

    If you haven't seen this documentary, I urge you to do so. It's called Journalism in the Age of Data, and it was produced by Geoff McGhee (the URL is datavisualization.stanford.edu). Geoff used to be a reporter for the NY Times, and for ABC News and Le Monde in Paris. He won a Pulitzer Prize in 2006 for public service journalism, but he was really interested in studying the methods that journalists were using to share data in innovative ways. He left the Times and went to Stanford to do this research, and this film was the result of that work. This 50-minute film serves as a great introduction to what Geoff sees as a new storytelling medium, and he's not alone.

    *Here's a basic example of an interactive graphic using Census data, showing population for Hillsborough County, Florida, broken down by age and gender.

    *And here's a much more complex data visualization, which overlays crime reports in near real-time over a map of the city of Oakland. Both this map and the Hillsborough County population chart were created with an off-the-shelf tool called Protovis.

    Protovis has become very popular with news organizations, maybe the most popular data viz tool, and because it's an open source toolkit, you can take an example like this, borrow it, and pour in your own data.

    *When we talk about data visualization, you may think that's the domain of graphic artists, and for a long time it has been. They're great at producing what I like to call the Science Fair Poster, a collection of charts and images and text all thrown together, like this one from the NY Times. These help to prove the old adage about a picture being worth a thousand words, or in this case maybe ten pictures being worth ten thousand words.

    *Here's another example using Census data. It's a chart from the New York Times that looks at the subject of interracial marriages using raw data, line charts, bubble charts, and maps.

    When we move online, the Science Fair Poster is a less useful model because of space constraints. But the web offers a more important element: interactivity.

    *This is an interactive that was made popular by Hans Rosling, who went viral with a YouTube video demonstrating this chart in 3-D. It shows the relationship between income (across the bottom of the chart) and life expectancy (on the vertical axis). Each circle represents a different country, and the size is based on that country's population. And the color coding corresponds to what continent the country is on. Blue is sub-Saharan Africa, yellow is the Americas, and so on.

    This is an example of what we call a Martini Glass graphic. It starts with a tight narrative path early (like the stem of the glass) and opens later for free exploration (like the body of the glass).

    *Then there's the data slide show, which is a merger of the Science Fair Poster and the Martini Glass. It's basically a set of static images which work okay on their own, but tell a story as part of a series.

    *What's becoming more and more common is this sort of thing, which we call a drill down story.

    It starts with a large amount of data at once and lets you explore. The Oakland Crimespotting site was a great example of that. Here's another from the Washington Post, which created some search tools and a treemap as an interface to a database of the president's schedule.

    The Guardian used the same approach when they built an interface for the WikiLeaks release of US embassy cables: http://www.guardian.co.uk/world/interactive/2010/nov/28/us-embassy-cables-wikileaks

    (Also see The Jobless Rate for People Like You from the NY Times: http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html)

    **It's not so much about creating interactives for readers as it is about finding compelling ways to tell stories. And creating maps and charts yourself is a great way to find stories in the data, even if you don't end up showing those to readers. In my opinion, it's the best way to distinguish yourself from reporters at other outlets who are just regurgitating those flat tables. It makes a real difference to your readers when you can go beyond the numbers and find something that's relevant to their day-to-day lives.

    *There has been an explosion of demographic data in the last 10 years, more info available than at any time in human history. And so while the focus of what we're talking about here today is Census data, keep in mind that there are other massive datasets out there that can be mined for good stories. And many of the tools and techniques we've been talking about apply just as well.

    **Quick survey of mapping tools, from the most complex to the simplest.

    *******Back to these guys: a lot has changed since the Washington Post in 1972.

    Better access to data, great tools, bigger audience for analytical storytelling

    Use those things in your Census coverage, and you can make a difference with your reporting.