Transparency and Inference for Big Data
Workshop Overview: Transparency and Inference for Big Data
Micah Altman, Director of Research
MIT Libraries
Prepared for the Census-MIT Big Data Workshop Series
MIT, December 2015
Roadmap
Workshop series: challenges of big data for official statistics
What to expect today and tomorrow
Big Data Challenges: Acquisition, Access, Governance, Protection, Analysis
Credits & Disclaimers
DISCLAIMER: These opinions are my own. They are not the opinions of MIT, Brookings, any of the project funders, or (with the exception of co-authored, previously published work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
Collaborators & Co-Conspirators
Workshop Series Organizers
US Census: Cavan Capps, Ron Prevost
MIT: Micah Altman
Workshop Co-Organizers (US Census): Peter Miller, Benjamin Reist, Michael Thieme
Research Support: Supported by the U.S. Census Bureau
Related Work
Main Project: Census-MIT Big Data Workshop Series
projects.informatics.mit.edu/bigdataworkshops
Related publications (reprints available from informatics.mit.edu):
Altman M, Capps C, Prevost R. Using New Forms of Information for Official Economic Statistics - Examining the Commodity Flow Survey: Executive Summary from the 1st Workshop in the MIT Big Data Workshop Series. SSRN: Social Science Research Network. Working Paper.
Altman M, O'Brien D, Vadhan S, Wood A. 2014. Big Data Study: Request for Information.
Altman M, Wood A, O'Brien D, Gasser U, Vadhan S. Towards a Modern Approach to Privacy-Aware Government Data Releases. Berkeley Technology Law Journal. Forthcoming.
Altman M, McDonald MP. 2014. Public Participation GIS: The Case of Redistricting. Proceedings of the 47th Annual Hawaii International Conference on System Sciences.
Online
Website: projects.informatics.mit.edu/bigdataworkshops
Twitter hashtag: #cmbigdata
E-mail:
Micah Altman: [email protected]
Cavan Capps: [email protected]
Ron Prevost: [email protected]
Workshop Series: Big Data and Official Statistics
Trends and Challenges
Trends:
  Increasingly data-driven economy
  Individuals are increasingly mobile
  Technology changes data uses
  Stakeholder expectations are changing
  Agency budgets and staffing remain flat
The next generation of official statistics:
  Utilize broad sources of information
  Increase granularity, detail, and timeliness
  Reduce cost and burden
  Maintain confidentiality and security
Multi-disciplinary challenges: Computation, Statistics, Informatics, Social Science, Policy
Workshops and Outcomes
Acquisition Challenges: Using New Forms of Information for Official Economic Statistics [August 3-4]
Privacy Challenges: Location Confidentiality and Official Surveys [November 30-December 1]
Inference Challenges: Transparency and Inference [December 7-8]
Expected outcomes:
  Workshop reports (September, January)
  Integrated white paper (February)
  Identification of new opportunities for statistical agencies
  Input to the Census Big Data Research Program
Themes from Workshop 1: Big Data Sources
Broad new sources of information have the potential to enhance official statistics through:
  increased granularity and detail
  increased timeliness
  reduced burdens
Incorporating big data creates challenges:
  acquisition challenges
  management, confidentiality, and governance challenges
  analytic challenges
Incorporating big data into statistical agencies will require adaptation:
  Agencies will need to broaden from data collection to information provisioning.
  Agencies will require different sources of data to support different types of decisions.
  Agencies will need to develop more extensive relationships with business stakeholders.
  Agencies have the potential to take on new roles with respect to big data sources, as:
    standards leaders
    certification authorities
    clearinghouses
    infrastructure for durable, trusted access
Themes from Workshop 2: Big Data Privacy
Value of Census reputation:
  The Census Bureau's reputation is a primary concern.
  Reputation affects both the willingness to participate and the cost of participation.
  Reliability and transparency are needed for official statistics to serve their policy purpose:
    to ensure accountability of processes and programs
    to create a public data good, whose results can be accepted across multiple sectors
    to support reliable inferences for a range of purposes
Consider data needs in terms of computations:
  Sources of big data may not be willing to distribute data directly.
  Sources of big data may not be able to distribute all data directly; such data are typically distributed internally and reaggregated.
Access through computation:
  Custom or private APIs could provide the analytics needed.
  Where privacy and security are challenges, secure multi-party computation (SMPC) methods could be used in place of trusted systems.
Characterizing risks and harms:
  Official statistics reflect an implicit harm/benefit balance, although it is not explicitly framed in law.
  There is a need to move from binary measures (identification) to formal measures.
  The Census Bureau could be a leader: many countries, industries, and states use aggregation or suppression with no formal risk/harm characterization.
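To make the "access through computation" theme concrete, here is a minimal, hypothetical sketch of one well-known secure multi-party computation technique: a secure sum based on additive secret sharing. It assumes semi-honest (honest-but-curious), non-colluding data holders and simulates all parties in one process. The function names, the modulus, and the example counts are illustrative assumptions, not a method presented at the workshop.

```python
import secrets

# Illustrative modulus (an assumption): any modulus larger than the true
# total works for non-negative integer inputs.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int, modulus: int = MODULUS) -> list[int]:
    """Split `value` into n_parties additive shares that sum to value mod `modulus`."""
    # All but the last share are drawn uniformly at random.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to `value`.
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values: list[int], modulus: int = MODULUS) -> int:
    """Simulate a semi-honest secure sum among len(private_values) data holders."""
    n = len(private_values)
    # Holder i splits its private value and sends share j to holder j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Holder j adds the shares it received. Each received share is uniformly
    # random on its own, so without collusion holder j learns nothing about
    # any other holder's individual value.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums reveals only the total.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Hypothetical counts held by three separate data providers.
    counts = [120, 45, 310]
    print(secure_sum(counts))  # 475
```

Each provider contributes to the aggregate without any party seeing another's raw value. Production SMPC protocols add authenticated channels and protections against malicious or colluding parties, which this sketch deliberately omits.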
What to Expect: Today, Tomorrow, & Beyond
Workshop Schedule
Monday
  12:00 Lunch and Introductions
  1:00 Workshop Overview
  1:15 Overview of SIPP
  2:15 Overview of Census Needs for Reliable and Transparent Inference
  3:00 Coffee
  3:30 Preliminary Discussion of Workshop Questions
  4:00 Challenges in Extracting Information from Big Data
  4:45 Transparency Challenges
  5:15 Discussion & Provocations
  6:00 Transportation to Hotel
  7:00-10:00 Hosted Dinner
Tuesday
  8:30 Breakfast
  9:00 Recap / Review of Day 1
  9:15 Overview of Census Uses: Implications for Inference
  10:15 Discussion: Key Challenges and Opportunities
  11:30 Lunch
  1:00 Emerging Approaches to Using Big Data in Official Statistics
  2:00 Discussion: Potential approaches to reliable, transparent & reproducible inference with Big Data
  3:00 Coffee
  3:30 Synthesis and next steps
  4:30 Taxis leave for airport
  5:00 (Optional) Beer/snacks and informal chat for those staying over in Boston
Workshop Questions
What are the errors and biases in the collection, cleaning, editing, assembly, linking, and other operations that affect Big Data utility?
How can bias, construct validity, and reliability be measured and evaluated?
What methods are most promising for discovering relationships that are substantively interesting, statistically reliable, and causally plausible?
What are methods for ensuring transparency and replicability with big data sources?
How do we detect dependencies among data sources?
How can the integrity and authenticity of official statistics be maintained when integrating big data from outside sources?
How should we assess the quality of Big Data information for different official statistics uses?
Use Cases
Survey of Income and Program Participation (SIPP)
Use cases may focus the discussion, but they should not limit it.
What Will Be Shared
Chatham House Rule:
"When a meeting, or part thereof, is held under the Chatham House Rule, participants are free to use the information received, but neither the identity nor the affiliation of the speakers, nor that of any other participant, may be revealed."
Please do not name individuals or companies in social media, etc.
What's Public:
  Ideas and information shared (we will be taking notes and recording, but only for summary reports)
  Formal presentations
  Attendance and participant list (unless opted out)
  Attribution, when requested and verified (opt-in)
Future Outputs:
  Draft summary report from the workshop [December], circulated to participants for comments
  Public summary of the report [January], including corrections and attribution where requested
  White paper: series summary and synthesis [February], to appear on the project site
Suggested Readings
Reimsbach-Kounatze, C. 2015. "The Proliferation of 'Big Data' and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis." OECD Digital Economy Papers, No. 245. OECD Publishing.
Lazer, D., R. Kennedy, G. King, and A. Vespignani. 2014. "The Parable of Google Flu: Traps in Big Data Analysis." Science 343 (14 March): 1203-1205. Copy at http://j.mp/1ii4ETo
Kreuter, F., M. Berg, P. Biemer, P. Decker, C. Lampe, J. Lane, C. O'Neil, and A. Usher. 2015. AAPOR Report on Big Data. Mathematica Policy Research.
National Research Council (NRC). 2013. Frontiers in Massive Data Analysis. National Academies Press.
Questions?
E-mail: [email protected]
Web: informatics.mit.edu
Creative Commons License
This work by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.