How can Big Data contribute to the Open Data process? DANE Big Data Event, Bogotá, October 2013


How can Big Data contribute to the Open Data process?

DANE Big Data Event, Bogotá, October 2013


DANE, Big Data, October 2013 2

1. What is “Big Data”?

2. Big Data sources, challenges & opportunities

3. Big Data and official statistics

4. Current Big Data initiatives

5. What is “Open Data”?

6. Current Open Data initiatives

7. Incorporating Big Data into official statistics Open Data programmes - the OECD perspective

Presentation contents


BIG DATA


Big data are data sources that can generally be described as: “high volume, velocity and variety of data that demand cost-effective, innovative forms of processing for enhanced insight and decision making.”

Gartner

• Big data is characterized as data sets of increasing volume, velocity and variety

• Big data is often largely unstructured, meaning that it has no pre-defined data model and/or does not fit well into conventional relational databases

• The private sector may take advantage of the Big data era and produce more and more statistics that attempt to beat official statistics on timeliness and relevance

What is “Big Data”?


What is “Big Data”? The data deluge


New wealth of digital data - 90% of the world’s digital data has been created in just the last two years and is doubling every 20 months.

Big data comes in a number of forms:

• “Data exhaust” collected passively from devices (phones, credit cards, web searches, etc.) as sensors of human behaviour

• Online information (blogs, tweets, news articles...) as sensors of human sentiment

• Physical sensors (pollution, light emission, etc.) as remote sensors of human activity

• Citizen reporting – information actively produced via phone surveys, hotlines, etc.

What is “Big Data”? The data deluge


• Administrative (electronic medical records, hospital visits, insurance records, bank records, food banks, etc.)

• Commercial/Transactional (credit card transactions, on-line transactions, etc.)

• Sensors (satellite imaging, road sensors, climate sensors, etc.)

• Tracking devices (mobile telephones, GPS, etc.)

• Behavioural (online searches, online page view, etc.)

• Opinion (comments on social media, etc.)

Big Data - Sources


• Legislative - with respect to the access and use of data.

• Privacy - managing public trust and acceptance of data re-use and its link to other sources.

• Financial - potential costs of sourcing data vs. benefits.

• Management - policies and directives about the management and protection of the data.

• Methodological - data quality and suitability of statistical methods.

• Technological - issues related to information technology.

Big Data - Challenges


• Collecting data in real time or near real time maximizes the potential of data

• Big data has potential as an input for official statistics, either for use on its own or in combination with more traditional data sources such as sample surveys and administrative registers

• Big data has the potential to produce more relevant and timely statistics than traditional sources of official statistics

• By incorporating relevant Big data sources into their official statistics process, NSOs are best positioned to measure their accuracy

Big Data - Opportunities


1. NSOs as brokers of Big Data?

2. NSOs to provide a “Quality Stamp”?

3. Combining Big data with official statistics

4. Replacing official statistics by Big data

5. Filling new data gaps, i.e. developing new 'Big data - based' measurements to address emerging phenomena (not known in advance or for which traditional approaches are not feasible)

6. Visualization methods

7. Text mining

8. High Performance Computing.

Big Data Opportunities - areas for experimentation


“What does Big Data mean for official statistics?”

Big Data and Official Statistics


Big Data and Official Statistics


Statistical organisations are encouraged to address formally Big data issues in their annual and multi-annual work programmes by:

• undertaking research and pilot projects in selected areas

• allocating appropriate resources for that purpose.

Big Data & Official Statistics


• Collaboration of NSOs with private data source owners is of critical importance, and it touches upon sensitive issues such as privacy, trust and corporate competitiveness, as well as the legislative framework of the NSOs.

• Using Big data requires statisticians with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT and a determination to extract valuable ‘knowledge’ from data: “data scientists” (quote stats from US)

• NSOs should develop the necessary internal analytical capability through specialised training.

Big Data & Official Statistics


Example: Twitter. Using an algorithm that could tell actual sickness apart from other uses of the common word ‘sick’, researchers were able to plot and predict when people in a certain area were at risk of picking up a flu bug.

Big Data – Examples
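The idea behind the Twitter example can be illustrated with a toy sketch. This is not the researchers' algorithm, which the slide does not describe; it is a minimal rule-based stand-in, and the cue word lists, function names and sample tweets are all hypothetical.

```python
import re

# Hypothetical cue lists for illustration only; a real system would learn
# such signals from labelled tweets rather than hard-code them.
ILLNESS_CUES = {"flu", "fever", "cough", "doctor", "headache", "throat"}
SLANG_CUES = {"of", "trick", "beat", "song", "ride"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mentions_illness(tweet):
    """True when 'sick' appears with more illness cues than slang cues."""
    tokens = tokenize(tweet)
    if "sick" not in tokens:
        return False
    illness = sum(t in ILLNESS_CUES for t in tokens)
    slang = sum(t in SLANG_CUES for t in tokens)
    return illness > slang


def flu_signal_by_area(tweets):
    """Count illness-related tweets per area from (area, text) pairs."""
    counts = {}
    for area, text in tweets:
        if mentions_illness(text):
            counts[area] = counts.get(area, 0) + 1
    return counts


# Made-up sample tweets, not real data.
sample = [
    ("Bogotá", "Home sick with a fever and a cough"),
    ("Bogotá", "So sick of this traffic"),
    ("Medellín", "That was a sick guitar song"),
    ("Medellín", "Feeling sick, off to the doctor"),
]
print(flu_signal_by_area(sample))
```

Counting the classified tweets per area and per day would give the kind of series that can be plotted over time.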


Big Data – Examples


Real Estate

• Collect real time real estate data, incl. location, product characteristics and price information

• Gather data by extracting information from Real estate sites ads in major agglomerations

• Collection mechanism: Search engine with semantic capability collects and structures data

• Data analysis with existing statistical tools to produce aggregated indicators

• Enrich the GOV metropolitan database with the compiled indicators

Traffic

• Collect a sample of data tracking movements of mobile users over a territory

• Compile from that data and create transportation performance indicators (especially: reliability)

• Including evolution over time

• Collection mechanism: Sample of #100K users across several countries who downloaded an App on mobile access quality

• Data analysis with existing statistical tools to produce aggregated indicators

Big Data – Examples (OECD)
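As a rough illustration of the "aggregated indicators" step in the real estate example, the sketch below computes a median price per square metre by city. The record layout, field names and figures are hypothetical; the slide does not specify which indicators or tools the OECD used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records standing in for listings extracted from real estate
# sites; field names and values are invented for illustration.
listings = [
    {"city": "City A", "price": 450000, "area_m2": 50},
    {"city": "City A", "price": 600000, "area_m2": 60},
    {"city": "City A", "price": 330000, "area_m2": 30},
    {"city": "City B", "price": 200000, "area_m2": 50},
    {"city": "City B", "price": 300000, "area_m2": 60},
]


def median_price_per_m2(records):
    """Group listings by city and return the median price per square metre."""
    by_city = defaultdict(list)
    for record in records:
        if record["area_m2"] > 0:  # skip listings with unusable surface data
            by_city[record["city"]].append(record["price"] / record["area_m2"])
    return {city: median(values) for city, values in by_city.items()}


indicators = median_price_per_m2(listings)
print(indicators)
```

A median is used here because scraped ads tend to contain outliers and duplicates; that choice is an assumption of this sketch, not something stated on the slide.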


Political tension

• Produce indicators on Political tension for African Economic Outlook

• Collect a sample of qualitative data (articles, …) qualifying political tension in African countries. Based on keywords: strike, demonstration, kidnapping,…

• Text mining on countries or topics could raise interest as part of the data collection process

Employment

• Supplement survey data on employment with job offerings and applications collected from the Internet

• Building indicators by analysing legal documents (labour codes, labour legislation, court judgements, ….)

Internet

• Quality of Internet network infrastructure and security

• Ranking of languages most used on the Internet

• Study the effectiveness of current intellectual property protection laws

Big Data – Examples (OECD)
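The keyword-based approach in the political tension example could be sketched as follows. This is a minimal illustration only: the three keywords come from the slide, while the scoring rule (average keyword hits per article), the function names and the sample articles are assumptions made for this sketch.

```python
import re
from collections import Counter

# Keywords taken from the slide; the slide's list is open-ended ("...").
KEYWORDS = ("strike", "demonstration", "kidnapping")


def keyword_hits(text, keywords=KEYWORDS):
    """Count occurrences of each keyword, including its simple plural."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(counts[k] + counts[k + "s"] for k in keywords)


def tension_indicator(articles):
    """Return average keyword hits per article for each country.

    `articles` is an iterable of (country, text) pairs.
    """
    hits = Counter()
    totals = Counter()
    for country, text in articles:
        hits[country] += keyword_hits(text)
        totals[country] += 1
    return {country: hits[country] / totals[country] for country in totals}


# Made-up headlines for two placeholder countries, not real data.
articles = [
    ("Country A", "Transport strike enters second week as demonstrations spread"),
    ("Country A", "Harvest forecasts improve"),
    ("Country B", "Kidnapping reported near the border"),
    ("Country B", "New trade agreement signed"),
    ("Country B", "Schools reopen after holidays"),
    ("Country B", "Rainfall above average"),
]
print(tension_indicator(articles))
```

Normalising by the number of articles keeps countries with heavier news coverage from scoring higher simply because more text was collected.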


• Objectives

– Collect a sample of data tracking movements of mobile users over a territory

– Compile from that data and create transportation performance indicators (especially: reliability)

– Including evolution over time

• Proof of concept envisaged

– Solution provider identified (Sensorly, start-up specialised in mobile data)

– Privacy should not be an issue since Sensorly collects data based on an opt-in mechanism and does not rely on mobile operators data.

– Collection mechanism: Sample of #100K users across several countries who downloaded an App on mobile access quality

– Data analysis with existing statistical tools to produce aggregated indicators

Big Data – Examples (OECD)


Big Data – Examples (MIT)


Big Data – Examples (Health data)


Any questions?


OPEN DATA


From Wikipedia:

• Open data is the idea that data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.

What is “Open Data”?


From the Open Data Handbook:

Open data as defined by the Open Definition: open data is data that can be freely used, reused and redistributed by anyone.

What is “Open Data”?


From OECD Open Data Project:

Definition of ‘Open’ from the 2011 OECD Publishing Review:

To make OECD data machine-readable, retrievable, indexable and re-usable

What is “Open Data”?


1. Completeness: Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. Metadata that defines and explains the raw data should be included as well, along with formulas and explanations for how derived data was calculated.

2. Primacy: Datasets released should be primary source data. This includes the original information collected, details on how the data was collected and the original source documents recording the collection of the data.

3. Timeliness: Datasets should be available to the public in a timely fashion. Whenever feasible, information collected should be released as quickly as it is gathered and collected.

4. Ease of Physical and Electronic Access: Datasets should be as accessible as possible. There should be no barriers such as completing forms or submitting requests or systems that require browser-oriented technologies (e.g., Flash, Javascript, cookies or Java applets).

5. Machine readability: Information should be stored in widely-used file formats that easily lend themselves to machine processing. These files should be accompanied by documentation related to the format and how to use it in relation to the data.

Open Data: Ten Principles for Opening Up Government Information (Sunlight Foundation)


6. Non-discrimination: Barriers to use of data can include registration or membership requirements. Any person should be able to access the data at any time without having to identify him/herself or provide any justification for doing so.

7. Use of Commonly Owned Standards: Data should be stored in freely available formats that can be accessed without the need for a software license, making the data available to a wider pool of potential users.

8. Licensing: Maximal openness means making data available without restrictions on use as part of the public domain.

9. Permanence: Information should be available online in archives in perpetuity. Data should remain online, with appropriate version-tracking and archiving over time.

10. Usage Costs: Data should be available free of charge.

Open Data: Ten Principles for Opening Up Government Information (Sunlight Foundation)


Examples of open data initiatives


Open Data examples – Data.Gov.uk


Open Data examples – Data.Gov


Open Data examples


Open Data examples – World Bank


OECD Open Web Services


Introducing the OECD DELTA Programme…

Incorporating Big Data into official statistics Open Data programmes - the OECD perspective


DELTA Programme – Making OECD data Open, Accessible, Free

• Accessible: Find, Understand, Use

• Open: Machine-readable, Indexable, Re-usable

• Free: Available without charge


• To make OECD data machine-readable, retrievable, indexable and re-usable.

• To increase the dissemination and impact of OECD data via open data services for OECD statistical data.

• To encourage re-use of OECD data and re-use by OECD of external innovation via open innovation process and communities.

The Open Data project - goals


Data content

• All datasets within the OECD.Stat data warehouse with standardised structural format and content necessary for machine-to-machine “Open” access.

The Open Data project - scope


i) SDMX/JSON

JavaScript Object Notation (JSON) is a text-based open standard designed for human-readable data interchange. It is a widely-used open data format on web sites today. JSON has a number of advantages, including:

• Simplicity - simple and ‘lightweight’ format with a smaller grammar that can map directly onto the data structures used in today’s programming languages.

• Interoperability - has the same interoperability potential as XML.

• Openness - has the same open capabilities as XML.

• Readability - is much easier for humans to read than XML. It is easier to write and is easier for machines to read and write.

Open data web services – Data Formats
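The "maps directly onto data structures" point can be seen in a few lines of Python. The payload below is a made-up example and does not follow the actual SDMX-JSON message layout; the dataset name, field names and values are placeholders.

```python
import json

# Hypothetical payload for illustration; NOT the real SDMX-JSON structure.
payload = """
{
  "dataset": "EXAMPLE",
  "observations": [
    {"country": "AAA", "year": 2012, "value": 1.5},
    {"country": "AAA", "year": 2013, "value": 2.5}
  ]
}
"""

# JSON objects become dicts, arrays become lists, numbers become int/float.
data = json.loads(payload)
values = [obs["value"] for obs in data["observations"]]
mean_value = sum(values) / len(values)

# Serialising back to text is a single call.
round_trip = json.dumps(data, sort_keys=True)

print(type(data).__name__, type(data["observations"]).__name__)
print(mean_value)
```

No schema or parser configuration is needed before the values can be used in a calculation, which is the practical meaning of "simplicity" here.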


ii) Excel/CSV

Excel and CSV are already widely used exchange standards so including them as output formats was a fairly obvious decision.

iii) Open Data (OData)

OData is an open protocol for sharing data.

Open data web services – Data Formats
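For the CSV output format, a minimal sketch of writing and reading a small table with Python's standard library is shown below. The column names and values are placeholders and do not reflect the layout of any actual OECD export.

```python
import csv
import io

# Placeholder rows; column names and values are invented for illustration.
rows = [
    {"country": "AAA", "year": 2012, "value": 1.5},
    {"country": "AAA", "year": 2013, "value": 2.5},
]

# Write the rows as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["country", "year", "value"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back; every field is returned as a string.
parsed = list(csv.DictReader(io.StringIO(csv_text)))

print(csv_text)
print(parsed[0])
```

CSV carries no type information, so a consumer has to convert fields such as `year` and `value` back to numbers itself.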


iv) Future formats could include Google Data (a REST-inspired technology), Google Dataset Publishing Language (DSPL) or Google KML, a geospatial file format.

Open data web services – Data Formats


Any questions?