Opening Large Data Sets

description

In this presentation, Eric Gundersen shows some real-life examples of the awesomeness achieved by opening up public data sets and making this information widely accessible, and talks about how to do this. This presentation was given as part of the "Building Governmental Transparency" event hosted by the Center for American Progress on Friday, March 19, 2010. More details and video at http://www.americanprogress.org/events/2010/03/sunshine.html.

Transcript of Opening Large Data Sets

Page 1: Opening Large Data Sets

via flickr: by www.pictobank.com

opening large data sets

Thursday, March 11, 2010

Page 2: Opening Large Data Sets

twitter.com/ericg

Page 3: Opening Large Data Sets

Data Sources:

• Original Polling Center Master list of 6,969 polling centers from the Independent Election Commission (IEC).

• IEC's preliminary election results from September 16th, a 2,500-page PDF.

• The Electoral Complaints Commission's (ECC) complaint data (which aggregates only to the provincial level).

Page 4: Opening Large Data Sets

we needed a data browser

Page 5: Opening Large Data Sets

www.AfghanistanElectionData.com

Page 6: Opening Large Data Sets

Page 7: Opening Large Data Sets

The system geocodes votes down to the district level. The political boundaries for this map covered 400 districts. Density point visualization shows results based on the "Highlighted Station" criteria, in this case % of stations affected.
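The "% of stations affected" measure described on this slide can be sketched as a simple per-district aggregation. This is a minimal illustration under stated assumptions, not the site's actual implementation: the record layout, the function name `percent_affected_by_district`, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical station records: (district name, whether the station was flagged).
# These values are invented for illustration; they are not taken from IEC data.
stations = [
    ("District A", True),
    ("District A", False),
    ("District A", True),
    ("District B", False),
    ("District B", False),
]

def percent_affected_by_district(records):
    """Return {district: percent of that district's stations that are flagged}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for district, affected in records:
        totals[district] += 1
        if affected:
            flagged[district] += 1
    return {d: 100.0 * flagged[d] / totals[d] for d in totals}

result = percent_affected_by_district(stations)
```

A map layer could then size or color each district's density point from the values in `result`.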

Page 8: Opening Large Data Sets

Complex analysis: This Afghan ethnic distribution base layer is overlaid with districts won by Karzai (red dots) and Abdullah (green dots). Dot size indicates the number of votes. Ethnic data is digitized from the Soviet Atlas Narodov Mira.

Page 9: Opening Large Data Sets

Interacting with the data: you can quickly drill down to any region, as the map zooms.

Page 10: Opening Large Data Sets

• Percent Population Urban by District; Population by District (2003-2004), AIMS CSO Population Statistics.

• Settled Population by Province (2006-2007) Afghanistan Human Development Report 2007, Center for Policy and Human Development, Kabul University

• Estimated votes, via IEC’s Master Polling Center list

Page 11: Opening Large Data Sets

Population: 22,700

Estimated voters: 53,039

Difference: 30,339
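The comparison on this slide is a plain subtraction of population from estimated voters. A minimal sketch of that check, using the slide's own figures, is below; the function name `voter_discrepancy` is hypothetical and only illustrates the arithmetic.

```python
def voter_discrepancy(population, estimated_voters):
    """Return how many more estimated voters there are than residents.

    A positive result means the voter estimate exceeds the population.
    """
    return estimated_voters - population

# Figures taken from the slide.
population = 22_700
estimated_voters = 53_039

difference = voter_discrepancy(population, estimated_voters)
print(difference)  # 30339
```

Running the same check over every district would surface places where the estimated voter count is implausibly high relative to the population figures.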

Page 12: Opening Large Data Sets

Total votes: 15,023

Page 13: Opening Large Data Sets

Drill down in context: “Highlighted Station” selection continues to work within both provinces + districts

Page 14: Opening Large Data Sets

Per polling center data: see the affected stations and votes within a polling center

Page 15: Opening Large Data Sets

Page 16: Opening Large Data Sets

Page 17: Opening Large Data Sets

Page 18: Opening Large Data Sets

photo credit boston.com

security matters

Page 19: Opening Large Data Sets

Page 20: Opening Large Data Sets

Page 21: Opening Large Data Sets

via flickr: by www.pictobank.com

geography matters

Page 22: Opening Large Data Sets

Page 23: Opening Large Data Sets

Page 24: Opening Large Data Sets

Road data: OSM provides better street data than AIMS

Page 25: Opening Large Data Sets

Page 26: Opening Large Data Sets

via wikimedia.org

Page 27: Opening Large Data Sets

Page 28: Opening Large Data Sets

Snow line: 1,800 meters according to FAO
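One way such a threshold could be combined with elevation data is to flag locations at or above the snow line. The sketch below is an assumption about how that might look, not something shown in the slides: the location names and elevations are invented, the helper `above_snow_line` is hypothetical, and treating exactly 1,800 meters as "above" is an arbitrary choice.

```python
# Snow line from the slide (attributed to FAO), in meters.
SNOW_LINE_M = 1_800

# Hypothetical (name, elevation in meters) pairs; values are invented.
locations = [
    ("Location 1", 950.0),
    ("Location 2", 2400.0),
    ("Location 3", 1800.0),
]

def above_snow_line(points, threshold=SNOW_LINE_M):
    """Return names of points whose elevation is at or above the threshold."""
    return [name for name, elevation in points if elevation >= threshold]

flagged = above_snow_line(locations)
```

In practice the elevations would come from a source such as SRTM rather than hand-entered values.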

Page 29: Opening Large Data Sets

Page 30: Opening Large Data Sets

Map Data Sources:

• Elevation information is from the SRTM (Shuttle Radar Topography Mission)

• Road information from OpenStreetMap

• Provincial and district data are from AIMS (Afghanistan Information Management Services)
