Linked Statistical Data 101
ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta
Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net/
Contents
• Foundations of (Linked) Open Data
- For public administrations, in general
- For statistical offices, in particular
• Linked Statistical Data by example
- A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
- W3C RDF DataCube
• Preparing the discussion on benefits for different
types of stakeholders
What is Open Data?
• Open data is data that can be freely used, re-used
and redistributed by anyone - subject only, at most, to
the requirement to attribute and share alike
• Key aspects:
- Availability and access: the data must be available as a
whole and at no more than a reasonable reproduction cost,
preferably by downloading over the Internet. The data must
also be available in a convenient and modifiable form.
- Re-use and redistribution: the data must be provided
under terms that permit re-use and redistribution including
the intermixing with other datasets.
- Universal participation: everyone must be able to use, re-use and
redistribute - there should be no discrimination
against fields of endeavour or against persons or groups
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
Relevant Legislation. Europe and Spain
• Open Access Initiative (2001). Scientific information; > 510 orgs
• Aarhus Convention (1998). Right to participate and access; 41
countries and the EU
• PSI Directives. PSI reuse (2003/98/EC and 2013/37/EU)
• Council of Europe Convention on Access to Official Documents (2009)
- 12 countries
• Law 37/2007. PSI reuse (transposition of directive 2003/98/EC)
- Modified by Law 18/2015 (BOE 10/07/2015), transposing directive 2013/37/EU
• Law 11/2007. Citizens' electronic access to public services, and right to
good-quality services
• RD 4/2010 Esquema Nacional de Interoperabilidad (National Interoperability Framework)
- Open standards, technology neutral, open source
• RD 1495/2011. Implements Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad, Technical Interoperability Standard (19/02/2013, BOE 4/3/2013)
[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]
An Explosion of Open Data Portals
Open Data and how to publish it
1) On a notice board
- For those with a lot of free time available
- Or those who happen to be there at the right time
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
2) On a Web page or mobile app
- For people, but not downloadable
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
3) In files
- These can be downloaded and used by people and in
information systems (XML, HTML, CSV, GTFS, etc.)
- Luckily, not as a scanned PDF
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
4) Via Web Services
- They can be used by systems (and sometimes by people)
- They allow generating added value
- Easy to integrate into the application logic
Adapted from: Antonio Rodríguez Pascual (IGN)
All together… shaken, not stirred…
What is Linked Data?
1. Use URIs to identify resources
2. Use HTTP URIs, so that they can be looked up
3. Use dereferenceable URIs, that is, provide useful data when they are looked up (RDF, JSON, SPARQL)
4. Include links to other URIs
• http://www.w3.org/DesignIssues/LinkedData.html
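Principle 3 usually relies on HTTP content negotiation: the same URI can return an HTML page to a browser or RDF to a program, depending on the `Accept` header of the request. A minimal sketch with Python's standard library, assuming the server honours an `Accept` header for Turtle (`text/turtle`); whether opendata.aragon.es negotiates this particular media type is an assumption, and the request is only built here, not sent.

```python
from urllib.request import Request

# Media types a client might ask for when dereferencing a Linked Data URI.
# Which of these a given server actually supports is an assumption.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
}


def build_dereference_request(uri: str, fmt: str = "turtle") -> Request:
    """Build (but do not send) a GET request asking for an RDF representation."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only HTTP(S) URIs can be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]}, method="GET")


if __name__ == "__main__":
    req = build_dereference_request(
        "http://opendata.aragon.es/recurso/territorio/Municipio/Ilche"
    )
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # Actually fetching it would be: urllib.request.urlopen(req)
```

Keeping the URI fixed and varying only the header is what lets one identifier serve both people and programs.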
Open Data and how to publish it
5) Via (semantically enhanced) APIs, as Linked Data
- To be used by systems (and sometimes persons)
- It allows generating added-value services
- Standardised formats (JSON, JSON-LD, RDF)
- Standardised models (vocabularies, ontologies)
[Figure: Difficult to reuse → √ Reusable. Not open → √ Reusable, open. Difficult to link together → √ Reusable, open, complete, easier to link.]
Data representation formats
And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
Recap: the 5-star categorisation from Tim Berners-Lee (TBL)
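The five stars are cumulative: each level presupposes the previous ones. A small sketch of that rule, assuming a dataset can be described by five yes/no properties; the `Publication` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical descriptors of how a dataset is published.
    open_licence: bool      # 1 star: on the Web, any format, under an open licence
    structured: bool        # 2 stars: machine-readable structured data (not a scanned table)
    non_proprietary: bool   # 3 stars: non-proprietary format (e.g. CSV instead of Excel)
    uses_uris: bool         # 4 stars: URIs to denote things (RDF, SPARQL)
    linked: bool            # 5 stars: links to other data to provide context


def stars(p: Publication) -> int:
    """Count stars, stopping at the first level that is not met."""
    levels = [p.open_licence, p.structured, p.non_proprietary, p.uses_uris, p.linked]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count


if __name__ == "__main__":
    csv_on_portal = Publication(True, True, True, False, False)
    print(stars(csv_on_portal))  # 3
```

Stopping at the first unmet level reflects the cumulative reading: RDF without an open licence still earns no stars.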
Which type of data and which (re)users?
[Figure: types of data (infrastructure: cartography, streets, directories, codes…; microdata; macrodata; metadata) and their (re)users (analysts, journalists, citizens, researchers)]
[source: Alberto González Yanes (ISTAC)]
Our use case: Aragón
• IAEST
- Instituto Aragonés de Estadística (Aragón Institute of Statistics)
• Good open data
ecosystem
- Aragón Open Data
• http://opendata.aragon.es/
- Zaragoza
• http://datos.zaragoza.es/
Statistics about municipalities
[Figure: reports and templates from Oracle BI; current Web application for local statistics]
• On the IAEST website
- http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
• On Aragón Open Data
- http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
What have we done?
[Figure: general architecture: transformation process, Linked Data, publication process, SPARQL endpoint, API (Elda)]
The architecture is not the purpose of my talk; details at
https://github.com/aragonopendata/local-data-aragopedia
URIs for datasets
• Let’s look for the dataset on “Number of homes by type of owner per municipality”
- Número de hogares por tipo de propietario por municipio
• The dataset has a URI
- http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
URIs for each observation
• And now we can point to specific observations in this
dataset
- In 2001, the number of buildings owned by one person in the
municipality of Ilche
• http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
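The last path segment of this observation URI is a UUID whose third group starts with `3`, which in RFC 4122 marks a name-based (MD5) UUID. A minimal sketch of minting such identifiers deterministically with Python's `uuid.uuid3`, so that the same observation always receives the same URI when data are re-published. The namespace, the name string built from dataset code and dimension values, and the dimension keys in the example are all hypothetical; this will not reproduce the actual IAEST identifiers.

```python
import uuid

BASE = "http://opendata.aragon.es/recurso/iaest"

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
OBSERVATION_NS = uuid.uuid3(uuid.NAMESPACE_URL, BASE + "/observacion/")


def dataset_uri(dataset_id: str) -> str:
    return f"{BASE}/dataset/{dataset_id}"


def observation_uri(dataset_id: str, dimension_values: dict[str, str]) -> str:
    """Derive a stable observation URI from its dataset and dimension values."""
    # Sort the keys so the same observation always yields the same name string.
    name = dataset_id + "|" + "|".join(
        f"{k}={dimension_values[k]}" for k in sorted(dimension_values)
    )
    obs_id = uuid.uuid3(OBSERVATION_NS, name)
    return f"{BASE}/observacion/{dataset_id}/{obs_id}"


if __name__ == "__main__":
    print(dataset_uri("01-010013TM"))
    # Hypothetical dimension keys and values.
    print(observation_uri("01-010013TM",
                          {"refPeriod": "2001", "refArea": "Ilche", "propietario": "persona"}))
    # The identifier in the published URI is indeed a version-3 UUID:
    print(uuid.UUID("00794aab-964f-35c7-8e7c-156c9bc60133").version)  # 3
```

A name-based UUID trades readability for stability: unlike a random UUID, it can be recomputed from the observation itself.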
And links to other URIs in Aragón
• The municipality of Ilche
- http://opendata.aragon.es/recurso/territorio/Municipio/Ilche
- This information is owned by another department of the
Government of Aragón
And links to codelists
• Types of owners
- http://opendata.aragon.es/kos/iaest/clase-de-propietario
• The community
• A person
• A company
• A public organisation
SPARQL endpoint
The female population of Zaragoza aged 0-15 grew until 2013 and then decreased
PREFIX qb: <http://purl.org/linked-data/cube#>
select distinct ?year ?personas
where
{
?x a qb:Observation .
?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza>.
?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres>.
?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
} ORDER BY ?year
Examples at
https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
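A program can send the query above to the endpoint through the SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter) and ask for the standard SPARQL JSON results format. A minimal standard-library sketch; the endpoint URL is a placeholder because the slide does not give it, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: substitute the real SPARQL endpoint of Aragón Open Data.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """SPARQL 1.1 Protocol query via GET, asking for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str, *variables: str) -> list[tuple[str, ...]]:
    """Extract variable values from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        tuple(binding[v]["value"] for v in variables)
        for binding in data["results"]["bindings"]
        if all(v in binding for v in variables)  # skip rows with unbound variables
    ]

# Usage (not executed here):
#   with urllib.request.urlopen(build_sparql_request(query)) as resp:
#       for year, personas in rows(resp.read().decode("utf-8"), "year", "personas"):
#           print(year, personas)
```

Every value comes back as a string in the `value` field; converting `?personas` to a number is left to the caller.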
W3C Data Cube
http://www.w3.org/TR/vocab-data-cube/
DataSets and Observations
Observations in a dataset
iaest-data:01-010003M/22001/030-045 rdf:type qb:Observation ;
    qb:dataSet iaest:01-010003M ;
    sdmx:refArea aod:Abiego ;
    iaest:superficieUtil iaest-codelist:superficie030-045 ;
    iaest:numeroHogares "1"^^xsd:int .
iaest:01-010003M rdf:type qb:DataSet .
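Prefixed names are only an abbreviation, so the observation on this slide (iaest-data:01-010003M/22001/030-045 in dataset iaest:01-010003M) can also be written as plain N-Triples: one triple per line, full IRIs, each line ending in ` .`. A minimal standard-library sketch; the `rdf:`, `qb:`, `xsd:` namespaces are the real ones and `sdmx:refArea` is mapped to the SDMX dimension namespace used in the SPARQL query, but the expansions of the `iaest`, `iaest-data`, `iaest-codelist` and `aod` prefixes are not given on the slide, so the URIs used for them are hypothetical placeholders.

```python
# Real vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
QB = "http://purl.org/linked-data/cube#"
XSD = "http://www.w3.org/2001/XMLSchema#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"

# Hypothetical placeholders: the slide does not give these prefix expansions.
IAEST = "http://example.org/iaest/"
IAEST_DATA = "http://example.org/iaest-data/"
IAEST_CODE = "http://example.org/iaest-codelist/"
AOD = "http://example.org/aod/"


def iri(uri: str) -> str:
    """N-Triples form of an absolute IRI."""
    return f"<{uri}>"


def typed_literal(value: str, datatype: str) -> str:
    """N-Triples typed literal, escaping the characters the syntax requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


obs = iri(IAEST_DATA + "01-010003M/22001/030-045")
dataset = iri(IAEST + "01-010003M")
triples = [
    (obs, iri(RDF + "type"), iri(QB + "Observation")),
    (obs, iri(QB + "dataSet"), dataset),
    (obs, iri(SDMX_DIM + "refArea"), iri(AOD + "Abiego")),
    (obs, iri(IAEST + "superficieUtil"), iri(IAEST_CODE + "superficie030-045")),
    (obs, iri(IAEST + "numeroHogares"), typed_literal("1", XSD + "int")),
    (dataset, iri(RDF + "type"), iri(QB + "DataSet")),
]

if __name__ == "__main__":
    print(to_ntriples(triples), end="")
```

N-Triples is verbose but line-oriented, which makes it convenient for bulk loading and diffing published observations.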
DataCube Structure Definition
Describing the dataset
iaest:01-010003M rdf:type qb:DataSet ;
    qb:structure iaest-dsd:01-010003M .
iaest-dsd:01-010003M rdf:type qb:DataStructureDefinition ;
    qb:component [ rdf:type qb:ComponentSpecification ; qb:dimension sdmx:refArea ] ,
        [ rdf:type qb:ComponentSpecification ; qb:dimension iaest:superficieUtil ] ,
        [ rdf:type qb:ComponentSpecification ; qb:measure iaest:numeroHogares ] .
(qb:dimension and qb:measure are specialisations of qb:componentProperty, which points to a qb:ComponentProperty.)
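The structure definition is what lets software check observations mechanically. The RDF Data Cube specification defines integrity constraints for this, among them that every observation has a value for each dimension declared in its structure definition (IC-11) and, when no measure dimension is used, a value for every declared measure (IC-14). The specification expresses these as SPARQL ASK queries; below is a plain-Python sketch of the same two checks, assuming the structure definition and observations have already been loaded into the hypothetical `DSD` and `Observation` classes, keyed by prefixed property names.

```python
from dataclasses import dataclass, field


@dataclass
class DSD:
    dimensions: set[str]
    measures: set[str]


@dataclass
class Observation:
    uri: str
    values: dict[str, object] = field(default_factory=dict)


def check(dsd: DSD, observations: list[Observation]) -> list[str]:
    """Report observations missing a declared dimension (cf. IC-11) or measure (cf. IC-14)."""
    problems = []
    for obs in observations:
        present = set(obs.values)
        for dim in sorted(dsd.dimensions - present):
            problems.append(f"{obs.uri}: missing dimension {dim}")
        for meas in sorted(dsd.measures - present):
            problems.append(f"{obs.uri}: missing measure {meas}")
    return problems


if __name__ == "__main__":
    dsd = DSD(dimensions={"sdmx:refArea", "iaest:superficieUtil"},
              measures={"iaest:numeroHogares"})
    obs = Observation("iaest-data:01-010003M/22001/030-045",
                      {"sdmx:refArea": "aod:Abiego",
                       "iaest:superficieUtil": "iaest-codelist:superficie030-045"})
    for problem in check(dsd, [obs]):
        print(problem)  # reports the missing measure iaest:numeroHogares
```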
Dimensions
sdmx:refArea rdf:type qb:DimensionProperty ;
    rdfs:range esadm:Municipio .
iaest:superficieUtil rdf:type qb:DimensionProperty ;
    rdfs:range iaest:SuperficieUtil .
iaest:numeroHogares rdf:type qb:MeasureProperty ;
    rdfs:range xsd:int .
qb:DimensionProperty and qb:MeasureProperty are subclasses (rdfs:subClassOf) of qb:ComponentProperty; the figure also shows qb:concept and the qb:Observation, qb:dataSet, qb:structure, qb:component and qb:componentProperty links of the previous slides.
SKOS Codelists
sdmx:CodeList rdfs:subClassOf skos:ConceptScheme .
iaest-codelist:SuperficieUtil rdf:type sdmx:CodeList ;
    skos:hasTopConcept iaest-codelist:superficie030-045 ,
        iaest-codelist:superficie046-060 , … ,
        iaest-codelist:superficie180-mas .
iaest-codelist:superficie030-045 rdf:type skos:Concept .
iaest:SuperficieUtil qb:codeList iaest-codelist:SuperficieUtil .
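The code names in this list encode the interval of useful floor area they stand for (030-045, 046-060, …, 180-mas, where "mas" means "or more"). A sketch that maps a floor-area value to its code by parsing those local names; it assumes the intervals are in square metres, and because the slide elides the intermediate codes, only the three codes shown are included, so values in the missing intervals return `None`.

```python
import re
from typing import Optional

# Only the codes visible on the slide; the intermediate ones are elided ("…").
CODES = [
    "superficie030-045",
    "superficie046-060",
    "superficie180-mas",
]

_PATTERN = re.compile(r"superficie(\d{3})-(\d{3}|mas)$")


def code_for_area(area: float, codes: list[str] = CODES) -> Optional[str]:
    """Return the code whose interval contains `area` (assumed m²), or None."""
    for code in codes:
        m = _PATTERN.match(code)
        if m is None:
            continue  # not an interval code
        low = int(m.group(1))
        high = None if m.group(2) == "mas" else int(m.group(2))
        if area >= low and (high is None or area <= high):
            return code
    return None


if __name__ == "__main__":
    print(code_for_area(50))  # superficie046-060
```

Deriving the intervals from the code names keeps the code list as the single source of truth; a more robust design would store the bounds as explicit properties of each skos:Concept.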
Why Linked Statistical Data? (I)
• Facilitate data (re)use by developers outside our
organisation
• Standards-based data access APIs
• Do they prefer CSV, PC-Axis, SDMX or RDF?
• Fine-grained data (refer to specific facts)
• Integration with other data sources from other public
or private organisations
- E.g., Government of Aragón for municipalities
• Allow for queries across datasets
- E.g., tell me how many municipalities may benefit from this
funding that I am making available with these restrictions:
fewer than 5 registered companies and an unemployed
population above 15%
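Such a question becomes answerable when both datasets identify municipalities with the same URIs, because those URIs act as the join key. A plain-Python sketch of the join and filter, assuming the two indicators have already been retrieved (for instance with SPARQL) into dictionaries keyed by municipality URI; the function name is hypothetical and the default thresholds are the ones in the example.

```python
def eligible_municipalities(
    companies: dict[str, int],
    unemployment_pct: dict[str, float],
    max_companies: int = 5,
    min_unemployment_pct: float = 15.0,
) -> list[str]:
    """Municipality URIs with fewer than `max_companies` registered companies
    and an unemployed population above `min_unemployment_pct` percent."""
    # Only municipalities present in both datasets can be evaluated:
    # the shared URIs are the join key.
    shared = companies.keys() & unemployment_pct.keys()
    return sorted(
        uri for uri in shared
        if companies[uri] < max_companies
        and unemployment_pct[uri] > min_unemployment_pct
    )
```

`len(eligible_municipalities(companies, unemployment_pct))` answers the "how many" part. In a single SPARQL query the same join is expressed by reusing one variable for the municipality in the patterns of both datasets.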
Why Linked Statistical Data? (II)
• Internal benefits as well
- Codelists are made available and more visible internally
- Methodology and metadata explicitly described as part of
the RDF DataCube data (e.g., reference years in datasets)