Linked Statistical Data 101 - European Commission · Linked Statistical Data 101 ESS Workshop on...

Linked Statistical Data 101 ESS Workshop on dissemination of official statistics as open data 18-19 January 2017, Malta Oscar Corcho Escuela Técnica Superior de Ingenieros Informáticos Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net/ [email protected]


Page 1:

Linked Statistical Data 101

ESS Workshop on dissemination of official statistics as open data

18-19 January 2017, Malta

Oscar Corcho

Escuela Técnica Superior de Ingenieros Informáticos

Universidad Politécnica de Madrid

Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net/

[email protected]

Page 2:

Contents

• Foundations of (Linked) Open Data

- For public administrations, in general

- For statistical offices, in particular

• Linked Statistical Data by example

- A use case from IAEST (Aragón Statistical Office)

• A bit of technical background

- W3C RDF DataCube

• Preparing the discussion on benefits for different types of stakeholders


Page 4:

What is Open Data?

• Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike

• Key aspects:

- Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form.

- Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.

- Universal participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups

[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]

Page 5:

Relevant Legislation. Europe and Spain

• Open Access Initiative (2001). Scientific information; > 510 organisations

• Aarhus Convention (1998). Right to participate and access; 41 countries and the EU

• PSI (public sector information) Directives. PSI reuse (2003/98/EC and 2013/37/EU)

• Convention on access to official documents (2009)

- 12 countries

• Law 37/2007. PSI reuse (transposition of Directive 2003/98/EC)

- Amended by Law 18/2015 (BOE 10/07/2015, Directive 2013/37/EU)

• Law 11/2007. Citizen access to public services, and rights to good quality services

• RD 4/2010. Esquema Nacional de Interoperabilidad (National Interoperability Framework)

- Open standards, technology neutral, open source

• RD 1495/2011. Implements Law 37/2007 for national agencies

• Norma Técnica de Interoperabilidad (Technical Interoperability Standard; 19/02/2013, BOE 4/3/2013)

[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]

Page 6:

An Explosion of Open Data Portals

Page 7:

Open Data and how to publish it

1) On a noticeboard

- For those with a lot of free time available

- Or those who happen to be there at the right time

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 8:

Open Data and how to publish it

2) On a Web page or mobile app

- For people, but not downloadable

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 9:

Open Data and how to publish it

3) In files

- These can be downloaded and used by humans in information systems (XML, HTML, CSV, GTFS, etc.)

- Luckily, it is not a scanned PDF

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 10:

Open Data and how to publish it

4) Via Web Services

- They can be used by systems (sometimes persons)

- They allow generating added value

- Ease of integration in the application logic

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 11:

All together…, Shaken, not stirred…

Page 12:

What is Linked Data?

1. Use URIs to identify resources

2. Use HTTP URIs, so that they can be found

3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL)

4. Include links to other URIs.

• http://www.w3.org/DesignIssues/LinkedData.html

Page 13:

Open Data and how to publish it

5) Via APIs (semantically enhanced) and linked

- To be used by systems (and sometimes persons)

- It allows generating added-value services

- Standardised formats (JSON, JSON-LD, RDF)

- Standardised models (vocabularies, ontologies)

Page 14:

Data representation formats

- Difficult to reuse

- √ Reusable. Not open

- √ Reusable, open. Difficult to link together

- √ Reusable, open, complete, easier to link

And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.

Page 15:

Recap: The 5-star categorisation from Tim Berners-Lee (TBL)

Page 16:

Contents

• Foundations of (Linked) Open Data

- For public administrations, in general

- For statistical offices, in particular

• Linked Statistical Data by example

- A use case from IAEST (Aragón Statistical Office)

• A bit of technical background

- W3C RDF DataCube

• Preparing the discussion on benefits for different types of stakeholders

Page 17:

Which type of data and which (re)users?

[Figure: types of data (infrastructure such as cartography, streets, directories, codes…; microdata; macrodata; metadata) and types of (re)users (analysts, journalists, citizens, researchers)]

[source: Alberto González Yanes (ISTAC)]

Page 18:

Our use case: Aragón

• IAEST

- Instituto Aragonés de Estadística

• Good open data ecosystem

- Aragón Open Data

• http://opendata.aragon.es/

- Zaragoza

• http://datos.zaragoza.es/

Page 19:

Statistics about municipalities

[Figure: reports and templates from Oracle BI; current Web application for local statistics]

Page 20:

Statistics about municipalities

• On the IAEST website

- http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento

• On Aragón Open Data

- http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas

Page 21:

What have we done?

[Figure: reports and templates from Oracle BI; current Web application for local statistics]

Page 22:

General architecture

[Figure: architecture diagram with the labels: transformation process, Linked Data, publication process, SPARQL, Elda, API]

This is not the purpose of my talk

https://github.com/aragonopendata/local-data-aragopedia

Page 23:

URIs for datasets

• Let’s look for the dataset on “Number of homes per owner per municipality”

- Número de hogares por tipo de propietario por municipio

• The dataset has a URI

- http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
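Since the dataset is identified by an HTTP URI, a client can ask for a machine-readable description of it. The following is a minimal sketch of such a request using content negotiation. It only prepares the request and does not send it. Whether the server answers `text/turtle` requests for this URI is an assumption, not something stated on the slide, and `build_rdf_request` is a hypothetical helper name.

```python
from urllib.request import Request

DATASET_URI = "http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM"


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) a GET request asking for an RDF representation."""
    # The Accept header states which representation the client prefers.
    # Assumption: the server supports content negotiation for this media type.
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_rdf_request(DATASET_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual HTTP call.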

Page 24:

URIs for each observation

• And now we can point to specific observations in this dataset

- In 2001, the number of buildings owned by one person in the municipality of Ilche

• http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
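The observation URI embeds the code of the dataset it belongs to, so a client can move from a single fact back to its dataset. A small sketch of that navigation follows. The path layout (`recurso/iaest/observacion/<dataset code>/<observation id>`) is inferred from the two URIs on these slides and is an assumption, and `split_observation_uri` is a hypothetical helper.

```python
import uuid
from urllib.parse import urlparse

OBSERVATION_URI = (
    "http://opendata.aragon.es/recurso/iaest/observacion/"
    "01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133"
)


def split_observation_uri(uri: str) -> tuple[str, str]:
    """Return (dataset code, observation identifier) from an observation URI."""
    segments = urlparse(uri).path.strip("/").split("/")
    # Assumed layout: recurso/iaest/observacion/<dataset code>/<observation id>
    if len(segments) != 5 or segments[:3] != ["recurso", "iaest", "observacion"]:
        raise ValueError(f"unexpected observation URI layout: {uri}")
    return segments[3], segments[4]


dataset_code, observation_id = split_observation_uri(OBSERVATION_URI)
dataset_uri = f"http://opendata.aragon.es/recurso/iaest/dataset/{dataset_code}"
print(dataset_uri)
print(uuid.UUID(observation_id))  # the last segment parses as a UUID
```

In practice the link from observation to dataset is also stated explicitly in the data (via `qb:dataSet`), so parsing the URI is a convenience rather than a requirement.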


Page 25:

URIs for each observation

Page 26:

And links to other URIs in Aragón

• The municipality of Ilche

- http://opendata.aragon.es/recurso/territorio/Municipio/Ilche

- This information is owned by another department of the Government of Aragón

Page 27:

And links to codelists

• Types of owners

- http://opendata.aragon.es/kos/iaest/clase-de-propietario

• The community

• A person

• A company

• A public organisation


Page 28:

SPARQL endpoint

The female population of Zaragoza aged 0-15 grew until 2013 and then decreased

PREFIX qb: <http://purl.org/linked-data/cube#>

# Population per year for women aged 0-15 in the municipality of Zaragoza
SELECT DISTINCT ?year ?personas
WHERE
{
  ?x a qb:Observation .
  ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> .
  # Dimension values are codes taken from the IAEST code lists
  ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> .
  ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year

Examples at

https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
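A query like the one on this slide can be parameterised and sent to an endpoint as an HTTP GET with a `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the query text and the request URL. The endpoint address is an assumption (the slide does not give it), and `build_query` and `build_get_url` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the slide does not state the endpoint address; this value is a placeholder.
ENDPOINT = "http://opendata.aragon.es/sparql"

QUERY_TEMPLATE = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?year ?personas
WHERE {{
  ?x a qb:Observation ;
     qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <{area}> ;
     <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <{age_group}> ;
     <http://opendata.aragon.es/def/iaest/dimension#sexo> <{sex}> ;
     <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}}
ORDER BY ?year
"""


def build_query(area: str, age_group: str, sex: str) -> str:
    """Fill the three dimension values (all URIs) into the query template."""
    return QUERY_TEMPLATE.format(area=area, age_group=age_group, sex=sex)


def build_get_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = build_query(
    area="http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza",
    age_group="http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15",
    sex="http://opendata.aragon.es/kos/iaest/sexo/mujeres",
)
url = build_get_url(ENDPOINT, query)
print(url)
```

Changing the `area` argument to another municipality URI gives the same series for that municipality, which is the practical benefit of shared identifiers across datasets.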

Page 29:

Contents

• Foundations of (Linked) Open Data

- For public administrations, in general

- For statistical offices, in particular

• Linked Statistical Data by example

- A use case from IAEST (Aragón Statistical Office)

• A bit of technical background

- W3C RDF DataCube

• Preparing the discussion on benefits for different types of stakeholders

Page 30:

W3C Data Cube

http://www.w3.org/TR/vocab-data-cube/

Page 31:

W3C Data Cube

Page 32:

DataSets and Observations

Page 33:

Observations in a dataset

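[Figure: RDF graph of one observation and its dataset, read as triples]

iaest-data:01-010003M/22001/030-045 rdf:type qb:Observation

iaest-data:01-010003M/22001/030-045 qb:dataSet iaest:01-010003M

iaest-data:01-010003M/22001/030-045 sdmx:refArea aod:Abiego

iaest-data:01-010003M/22001/030-045 iaest:superficieUtil iaest-codelist:superficie030-045

iaest-data:01-010003M/22001/030-045 iaest:numeroHogares "1"^^xsd:int

iaest:01-010003M rdf:type qb:DataSet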
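Read as subject, predicate, object triples, the observation on this slide can be held in a plain list and queried by pattern, which is what a SPARQL engine does at a much larger scale. The sketch below uses only the prefixed names visible on the slide. The `match` helper is hypothetical and stands in for a real RDF library.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

OBS = "iaest-data:01-010003M/22001/030-045"

# The triples shown on the slide, kept as prefixed names (no namespace expansion).
TRIPLES: list[Triple] = [
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", "iaest:01-010003M"),
    (OBS, "sdmx:refArea", "aod:Abiego"),
    (OBS, "iaest:superficieUtil", "iaest-codelist:superficie030-045"),
    (OBS, "iaest:numeroHogares", '"1"^^xsd:int'),
    ("iaest:01-010003M", "rdf:type", "qb:DataSet"),
]


def match(
    s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms; None is a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Find all observations, then look up the area and the measured value of each.
observations = [s for s, _, _ in match(p="rdf:type", o="qb:Observation")]
for obs in observations:
    area = next(match(s=obs, p="sdmx:refArea"))[2]
    households = next(match(s=obs, p="iaest:numeroHogares"))[2]
    print(obs, area, households)
```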

Page 34:

DataCube Structure Definition

Page 35:

Describing the dataset

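[Figure: RDF graph linking the dataset to its structure definition]

Class level:

qb:DataSet qb:structure qb:DataStructureDefinition

qb:DataStructureDefinition qb:component qb:ComponentSpecification

qb:ComponentSpecification qb:componentProperty qb:ComponentProperty

Instance level:

iaest:01-010003M rdf:type qb:DataSet

iaest:01-010003M qb:structure iaest-dsd:01-010003M

iaest-dsd:01-010003M rdf:type qb:DataStructureDefinition

iaest-dsd:01-010003M qb:component [ qb:dimension sdmx:refArea ]

iaest-dsd:01-010003M qb:component [ qb:dimension iaest:superficieUtil ]

iaest-dsd:01-010003M qb:component [ qb:measure iaest:numeroHogares ]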
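One practical use of the structure definition is checking that each observation carries a value for every declared dimension and measure, which the Data Cube specification expresses as integrity constraints. The sketch below models the structure from this slide as a plain dictionary. The dictionary layout and the `missing_components` helper are illustrative assumptions, not part of the vocabulary.

```python
# Components declared by the data structure definition on the slide.
DSD: dict[str, list[str]] = {
    "dimensions": ["sdmx:refArea", "iaest:superficieUtil"],
    "measures": ["iaest:numeroHogares"],
}

# The example observation, as property -> value pairs.
observation: dict[str, str] = {
    "sdmx:refArea": "aod:Abiego",
    "iaest:superficieUtil": "iaest-codelist:superficie030-045",
    "iaest:numeroHogares": '"1"^^xsd:int',
}


def missing_components(obs: dict[str, str], dsd: dict[str, list[str]]) -> list[str]:
    """List the declared dimensions and measures that the observation lacks."""
    required = dsd["dimensions"] + dsd["measures"]
    return [prop for prop in required if prop not in obs]


print(missing_components(observation, DSD))  # nothing missing for the slide's example

# An observation without its area would be reported as incomplete.
incomplete = {k: v for k, v in observation.items() if k != "sdmx:refArea"}
print(missing_components(incomplete, DSD))
```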

Page 36:

Dimensions

[Figure: RDF graph of the component properties. qb:DimensionProperty and qb:MeasureProperty are subclasses (rdfs:subClassOf) of qb:ComponentProperty. The dimensions sdmx:refArea and iaest:superficieUtil have rdfs:range esadm:Municipio and iaest:SuperficieUtil respectively; the measure iaest:numeroHogares has rdfs:range xsd:int. Other labels in the figure: qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:Observation, qb:structure, qb:component, qb:componentProperty, qb:dataSet, qb:concept]

Page 37:

SKOS Codelists

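[Figure: RDF graph of a code list. sdmx:CodeList is a subclass (rdfs:subClassOf) of skos:ConceptScheme. iaest:SuperficieUtil is linked by qb:codeList to the code list iaest-codelist:SuperficieUtil, whose top concepts (skos:hasTopConcept) include iaest-codelist:superficie030-045, iaest-codelist:superficie046-060 and iaest-codelist:superficie180-mas. Other labels in the figure: skos:Concept, rdf:type]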
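Because the allowed values of a coded dimension are published as a code list, a consumer can check that an observation only uses codes from that list. The sketch below contains only the three codes visible on the slide, and the real list presumably has more. The `invalid_codes` helper and the bad code in the example are hypothetical.

```python
# Codes visible on the slide for the "superficie util" dimension (likely incomplete).
CODE_LISTS: dict[str, set[str]] = {
    "iaest:superficieUtil": {
        "iaest-codelist:superficie030-045",
        "iaest-codelist:superficie046-060",
        "iaest-codelist:superficie180-mas",
    }
}


def invalid_codes(obs: dict[str, str], code_lists: dict[str, set[str]]) -> dict[str, str]:
    """Return the coded properties whose value is not in the matching code list."""
    return {
        prop: value
        for prop, value in obs.items()
        if prop in code_lists and value not in code_lists[prop]
    }


good = {"sdmx:refArea": "aod:Abiego", "iaest:superficieUtil": "iaest-codelist:superficie030-045"}
# "no-such-code" is a made-up value used only to trigger the check.
bad = {"sdmx:refArea": "aod:Abiego", "iaest:superficieUtil": "iaest-codelist:no-such-code"}

print(invalid_codes(good, CODE_LISTS))
print(invalid_codes(bad, CODE_LISTS))
```

Properties without a code list, such as `sdmx:refArea` here, are left unchecked by this sketch.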

Page 38:

Contents

• Foundations of (Linked) Open Data

- For public administrations, in general

- For statistical offices, in particular

• Linked Statistical Data by example

- A use case from IAEST (Aragón Statistical Office)

• A bit of technical background

- W3C RDF DataCube

• Preparing the discussion on benefits for different types of stakeholders

Page 39:

Why Linked Statistical Data? (I)

• Facilitate data (re)use by developers outside our organisation

• Data access APIs (according to standards)

• Do they prefer CSVs, PC-Axis, SDMX, RDF?

• Fine-grained data granularity (refer to specific facts)

• Integration with other data sources from other public or private organisations

- E.g., Government of Aragón for municipalities

• Allow for queries across datasets

- E.g., tell me how many municipalities may benefit from this funding that I am making available with these restrictions: number of registered companies lower than 5 and unemployed population higher than 15%
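The funding question above amounts to a join of two datasets on the shared municipality URI. A minimal sketch of that join follows, with the thresholds taken from the slide. The municipality names and all figures are invented placeholders, not real statistics, and `eligible` is a hypothetical helper.

```python
# Hypothetical figures keyed by municipality URI; none of these values are real data.
BASE = "http://opendata.aragon.es/recurso/territorio/Municipio/"
registered_companies = {BASE + "ExampleA": 3, BASE + "ExampleB": 12, BASE + "ExampleC": 4}
unemployment_rate = {BASE + "ExampleA": 0.18, BASE + "ExampleB": 0.21, BASE + "ExampleC": 0.09}


def eligible(
    companies: dict[str, int],
    unemployment: dict[str, float],
    max_companies: int = 5,
    min_unemployment: float = 0.15,
) -> list[str]:
    """Municipalities present in both datasets that satisfy both restrictions."""
    # The shared URI is the join key, so no name matching is needed.
    shared = companies.keys() & unemployment.keys()
    return sorted(
        uri
        for uri in shared
        if companies[uri] < max_companies and unemployment[uri] > min_unemployment
    )


result = eligible(registered_companies, unemployment_rate)
print(len(result), result)
```

With Linked Data the same join can be expressed in a single SPARQL query over both datasets, since both use the same URI for each municipality.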

Page 40:

Why Linked Statistical Data? (II)

• Internal benefits as well

- Codelists are made available and more visible internally

- Methodology and metadata explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)
