Linked Statistical Data 101
![Page 1: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/1.jpg)
Linked Statistical Data 101
ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta
Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net/ [email protected]
![Page 2: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/2.jpg)
Contents
• Foundations of (Linked) Open Data
- For public administrations, in general
- For statistical offices, in particular
• Linked Statistical Data by example
- A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
- W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders
![Page 3: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/3.jpg)
![Page 4: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/4.jpg)
What is Open Data?
• Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike
• Key aspects:
- Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form.
- Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
- Universal participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
![Page 5: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/5.jpg)
Relevant Legislation. Europe and Spain
• Open Access Initiative (2001). Scientific information; > 510 orgs
• Aarhus Convention (1998). Right to participate and access; 41 countries and the EU
• PSI Directives. PSI reuse (2003/98/EC and 2013/37/EU)
• Convention about access to official documentation (2009)
- 12 countries
• Law 37/2007. PSI reuse (transposition of directive 2003/98/EC)
- Modified in law 18/2015 (BOE 10/07/2015, directive 2013/37/EU)
• Law 11/2007. Citizen access to public services, and rights to good quality services
• RD 4/2010 Esquema Nacional de Interoperabilidad
- Open standards, technology neutral, open source
• RD 1495/2011. It develops Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)
[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]
![Page 6: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/6.jpg)
An Explosion of Open Data Portals
![Page 7: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/7.jpg)
Open Data and how to publish it
1) On a posterboard
- For those with a lot of free time available
- Or those who happen to be there at the right time
Adapted from: Antonio Rodríguez Pascual (IGN)
![Page 8: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/8.jpg)
Open Data and how to publish it
2) On a Web page or mobile app
- For people, but not downloadable
Adapted from: Antonio Rodríguez Pascual (IGN)
![Page 9: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/9.jpg)
Open Data and how to publish it
3) In files
- These can be downloaded and used by humans and by information systems (XML, HTML, CSV, GTFS, etc.)
- Luckily, it is not a scanned PDF
Adapted from: Antonio Rodríguez Pascual (IGN)
![Page 10: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/10.jpg)
Open Data and how to publish it
4) Via Web Services
- They can be used by systems (sometimes persons)
- They allow generating added value
- Ease of integration in the application logic
Adapted from: Antonio Rodríguez Pascual (IGN)
![Page 11: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/11.jpg)
All together…, Shaken, not stirred…
![Page 12: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/12.jpg)
What is Linked Data?
1. Use URIs to identify resources
2. Use HTTP URIs, so that they can be found
3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL)
4. Include links to other URIs.
• http://www.w3.org/DesignIssues/LinkedData.html
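Rules 2 and 3 can be illustrated with a short Python sketch: a client asks for an HTTP URI and uses the Accept header to say which representation it wants. This is only a sketch under assumptions. The helper name `build_request` is hypothetical, the media types are common choices rather than something the slide specifies, and whether a given server honours them is not confirmed here. The request is built but not sent.

```python
import urllib.request

# Media types a Linked Data client commonly asks for. That the server supports
# them is an assumption; the slide only mentions "RDF, JSON, SPARQL".
TURTLE = "text/turtle"
JSON_LD = "application/ld+json"


def build_request(uri: str, media_type: str = TURTLE) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET that asks for `uri` as `media_type`."""
    if not uri.startswith(("http://", "https://")):
        # Rule 2: use HTTP URIs so that they can be looked up.
        raise ValueError("Linked Data URIs should be HTTP(S) URIs")
    return urllib.request.Request(uri, headers={"Accept": media_type})


req = build_request("http://opendata.aragon.es/recurso/territorio/Municipio/Ilche")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Sending the request (for example with `urllib.request.urlopen(req)`) would then return whichever representation the server chooses to provide.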
![Page 13: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/13.jpg)
Open Data and how to publish it
5) Via APIs (semantically enhanced) and linked
- To be used by systems (and sometimes persons)
- It allows generating added-value services
- Standardised formats (JSON, JSON-LD, RDF)
- Standardised models (vocabularies, ontologies)
![Page 14: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/14.jpg)
Data representation formats
[Figure: format icons, each with a label]
- Difficult to reuse
- √ Reusable. Not open
- √ Reusable, open. Difficult to link together
- √ Reusable, open, complete, easier to link
And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
![Page 15: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/15.jpg)
Recap: The 5-star categorisation from Tim Berners-Lee (TBL)
![Page 16: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/16.jpg)
![Page 17: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/17.jpg)
Which type of data and which (re)users?
[Figure: types of data (INFRASTRUCTURE: cartography, streets, directories, codes…; MICRODATA; MACRODATA; METADATA), split into PUBLIC and NON PUBLIC, and their (re)users (ANALYSTS, JOURNALISTS, CITIZENS, RESEARCHERS)]
[source: Alberto González Yanes (ISTAC)]
![Page 18: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/18.jpg)
Our use case: Aragón
• IAEST - Instituto Aragonés de Estadística
• Good open data ecosystem
- Aragón Open Data
• http://opendata.aragon.es/
- Zaragoza
• http://datos.zaragoza.es/
![Page 19: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/19.jpg)
Statistics about municipalities
[Figure labels: Reports and templates from Oracle BI; Current Web application for local statistics]
![Page 20: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/20.jpg)
Statistics about municipalities
• At IAEst Web
- http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
• At OpenDataAragón
- http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
![Page 21: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/21.jpg)
What have we done?
[Figure labels: Reports and templates from Oracle BI; Current Web application for local statistics]
![Page 22: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/22.jpg)
General architecture
[Figure: architecture diagram with the labels Transformation process, Linked Data, Publication process, SPARQL, Elda, API]
This is not the purpose of my talk
https://github.com/aragonopendata/local-data-aragopedia
![Page 23: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/23.jpg)
URIs for datasets
• Let’s look for the dataset on “Number of homes per type of owner per municipality”
- Original title: Número de hogares por tipo de propietario por municipio
• The dataset has a URI
- http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
![Page 24: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/24.jpg)
What is behind that URI?
This is not the purpose of my talk
![Page 25: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/25.jpg)
URIs for each observation
• And now we can point to specific observations in this dataset
- In 2001, the number of buildings owned by one person in the municipality of Ilche
• http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
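The last segment of that observation URI has the shape of a UUID, and its version digit is 3 (name-based, MD5). The slides do not say how these identifiers are generated, so the following Python sketch is only one plausible approach: it derives a stable identifier from the dimension values of an observation. The helper `mint_observation_uri`, the choice of namespace, the way the name string is built and the value "una-persona" are all assumptions made for illustration.

```python
import uuid
from typing import Sequence

# Base of the observation URIs, as shown on the slide.
OBSERVATION_BASE = "http://opendata.aragon.es/recurso/iaest/observacion/"


def mint_observation_uri(dataset_id: str, dimension_values: Sequence[str]) -> str:
    """Hypothetical helper: derive a stable observation URI from its dimension values."""
    # Sorting makes the identifier independent of the order of the dimensions.
    name = "|".join(sorted(dimension_values))
    # uuid3 hashes (namespace, name) with MD5, so equal input gives an equal id.
    obs_id = uuid.uuid3(uuid.NAMESPACE_URL, dataset_id + "/" + name)
    return f"{OBSERVATION_BASE}{dataset_id}/{obs_id}"


# The identifier on the slide parses as a version-3 UUID.
slide_id = uuid.UUID("00794aab-964f-35c7-8e7c-156c9bc60133")
print(slide_id.version)

# "una-persona" is an invented code for "owned by one person".
uri = mint_observation_uri("01-010013TM", ["2001", "Ilche", "una-persona"])
print(uri)
```

Deterministic identifiers of this kind mean that regenerating the dataset yields the same URIs, so links that third parties have made to individual observations keep working.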
![Page 26: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/26.jpg)
URIs for each observation
![Page 27: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/27.jpg)
And links to other URIs in Aragón
• The municipality of Ilche
- http://opendata.aragon.es/recurso/territorio/Municipio/Ilche
- This information is owned by another department of the Government of Aragón
![Page 28: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/28.jpg)
And links to codelists
• Types of owners
- http://opendata.aragon.es/kos/iaest/clase-de-propietario
• The community
• A person
• A society
• A public organisation
![Page 29: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/29.jpg)
SPARQL endpoint
The female population of Zaragoza aged 0-15 grew until 2013 and then declined
# qb: is declared explicitly so that the query does not rely on endpoint defaults
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?year ?personas
WHERE {
  ?x a qb:Observation .
  ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
  # Fix the dimensions: municipality, age group and sex
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> .
  # The measure: number of persons
  ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year
Examples at https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
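The same query can be parameterised so that the municipality, age group and sex become arguments. The Python sketch below builds the query text and the GET URL defined by the SPARQL 1.1 Protocol (the query travels in the `query` parameter). The function names are hypothetical, and `ENDPOINT` is a placeholder: the address of the real endpoint is not given on this slide. Nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
IAEST_DIM = "http://opendata.aragon.es/def/iaest/dimension#"
IAEST_MEASURE = "http://opendata.aragon.es/def/iaest/medida#"
IAEST_KOS = "http://opendata.aragon.es/kos/iaest/"


def population_query(dataset: str, municipality: str, age_group: str, sex: str) -> str:
    """Return the SPARQL text for the time series of one municipality, age group and sex."""
    return f"""PREFIX qb: <{QB}>
SELECT DISTINCT ?year ?personas
WHERE {{
  ?x a qb:Observation ;
     qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/{dataset}> ;
     <{SDMX_DIM}refPeriod> ?year ;
     <{SDMX_DIM}refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/{municipality}> ;
     <{IAEST_DIM}edad-grandes-grupos> <{IAEST_KOS}edad-grandes-grupos/{age_group}> ;
     <{IAEST_DIM}sexo> <{IAEST_KOS}sexo/{sex}> ;
     <{IAEST_MEASURE}personas> ?personas .
}}
ORDER BY ?year"""


def request_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: a query can be sent as the `query` parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})


ENDPOINT = "http://example.org/sparql"  # placeholder, not the real endpoint address
query = population_query("03-030005TM", "Zaragoza", "0-a-15", "mujeres")
url = request_url(ENDPOINT, query)
print(url)
```

Because every dimension value is a URI taken from a codelist, changing the question (another municipality, another age group) only means swapping one identifier.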
![Page 30: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/30.jpg)
![Page 31: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/31.jpg)
W3C Data Cube
http://www.w3.org/TR/vocab-data-cube/
![Page 32: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/32.jpg)
W3C Data Cube
![Page 33: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/33.jpg)
DataSets and Observations
![Page 34: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/34.jpg)
Observations in a dataset
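[Diagram, reconstructed as triples from the labels in the figure]
Schema level:
qb:Observation qb:dataSet qb:DataSet
Instance level:
iaest:01-010003M rdf:type qb:DataSet
iaest-data:01-010003M/22001/030-045 rdf:type qb:Observation
iaest-data:01-010003M/22001/030-045 qb:dataSet iaest:01-010003M
iaest-data:01-010003M/22001/030-045 sdmx:refArea aod:Abiego
iaest-data:01-010003M/22001/030-045 iaest:superficieUtil iaest-codelist:superficie030-045
iaest-data:01-010003M/22001/030-045 iaest:numeroHogares “1”^^xsd:int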
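To make the shape of an observation concrete, the statements in this diagram can be written as (subject, predicate, object) tuples and queried with a few lines of Python. This is a toy sketch that uses plain strings for the prefixed names rather than an RDF library; the helper names `describe` and `observations_of` are hypothetical.

```python
from collections import defaultdict

# The diagram's statements as (subject, predicate, object) tuples, using the
# prefixed names from the slide as plain strings.
OBS = "iaest-data:01-010003M/22001/030-045"
TRIPLES = [
    ("iaest:01-010003M", "rdf:type", "qb:DataSet"),
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", "iaest:01-010003M"),
    (OBS, "sdmx:refArea", "aod:Abiego"),
    (OBS, "iaest:superficieUtil", "iaest-codelist:superficie030-045"),
    (OBS, "iaest:numeroHogares", 1),  # "1"^^xsd:int on the slide
]


def describe(subject, triples):
    """Group the predicate/object pairs of one subject, like a row of a table."""
    row = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            row[p].append(o)
    return dict(row)


def observations_of(dataset, triples):
    """All subjects linked to `dataset` through qb:dataSet."""
    return [s for s, p, o in triples if p == "qb:dataSet" and o == dataset]


for obs in observations_of("iaest:01-010003M", TRIPLES):
    print(obs, describe(obs, TRIPLES))
```

Each observation is thus one cell of the statistical table: its dimension values say where the cell sits, and its measure carries the number.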
![Page 35: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/35.jpg)
DataCube Structure Definition
![Page 36: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/36.jpg)
Describing the dataset
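[Diagram, reconstructed as triples from the labels in the figure]
Schema level:
qb:DataSet qb:structure qb:DataStructureDefinition
qb:DataStructureDefinition qb:component qb:ComponentSpecification
qb:ComponentSpecification qb:componentProperty qb:ComponentProperty
Instance level:
iaest:01-010003M rdf:type qb:DataSet
iaest:01-010003M qb:structure iaest--dsd:01-010003M
iaest--dsd:01-010003M rdf:type qb:DataStructureDefinition
iaest--dsd:01-010003M qb:component [ qb:dimension sdmx:refArea ]
iaest--dsd:01-010003M qb:component [ qb:dimension iaest:superficieUtil ]
iaest--dsd:01-010003M qb:component [ qb:measure iaest:numeroHogares ]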
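One practical use of the data structure definition is checking observations against it. The Python sketch below lists the components named on this slide and reports which of them an observation lacks; it also builds the tuple of dimension values that identifies an observation within its dataset. It is a simplified illustration, not the integrity-constraint machinery of the W3C specification, and the helper names are hypothetical.

```python
# Components declared by the data structure definition on the slide.
DSD = {
    "dimensions": ["sdmx:refArea", "iaest:superficieUtil"],
    "measures": ["iaest:numeroHogares"],
}


def missing_components(observation: dict, dsd: dict) -> list:
    """Return the declared dimensions and measures that the observation lacks."""
    required = dsd["dimensions"] + dsd["measures"]
    return [prop for prop in required if prop not in observation]


def observation_key(observation: dict, dsd: dict) -> tuple:
    """The dimension values together identify an observation within its dataset."""
    return tuple(observation[dim] for dim in dsd["dimensions"])


complete = {
    "sdmx:refArea": "aod:Abiego",
    "iaest:superficieUtil": "iaest-codelist:superficie030-045",
    "iaest:numeroHogares": 1,
}
# Same observation with one dimension left out, to show the check failing.
incomplete = {"sdmx:refArea": "aod:Abiego", "iaest:numeroHogares": 1}

print(missing_components(complete, DSD))
print(missing_components(incomplete, DSD))
print(observation_key(complete, DSD))
```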
![Page 37: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/37.jpg)
Dimensions
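[Diagram, reconstructed as triples from the labels in the figure. It extends the previous diagram (qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:ComponentSpecification, qb:componentProperty, qb:ComponentProperty, qb:Observation, qb:dataSet). The edge label qb:concept also appears, but its endpoints cannot be recovered from the transcript]
qb:DimensionProperty rdfs:subClassOf qb:ComponentProperty
qb:MeasureProperty rdfs:subClassOf qb:ComponentProperty
sdmx:refArea rdf:type qb:DimensionProperty
sdmx:refArea rdfs:range esadm:Municipio
iaest:superficieUtil rdf:type qb:DimensionProperty
iaest:superficieUtil rdfs:range iaest:SuperficieUtil
iaest:numeroHogares rdf:type qb:MeasureProperty
iaest:numeroHogares rdfs:range xsd:int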
![Page 38: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/38.jpg)
SKOS Codelists
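[Diagram, reconstructed as triples from the labels in the figure]
sdmx:CodeList rdfs:subClassOf skos:ConceptScheme
iaest:SuperficieUtil qb:codeList iaest-codelist:SuperficieUtil
iaest-codelist:SuperficieUtil rdf:type sdmx:CodeList
iaest-codelist:SuperficieUtil skos:hasTopConcept iaest-codelist:superficie030-045
iaest-codelist:SuperficieUtil skos:hasTopConcept iaest-codelist:superficie046-060
iaest-codelist:SuperficieUtil skos:hasTopConcept iaest-codelist:superficie180-mas
iaest-codelist:superficie030-045 rdf:type skos:Concept
…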
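A codelist also gives a simple validation rule: every value used for a coded dimension should be a concept of the corresponding scheme. The Python sketch below applies that rule to the surface-area codelist. Only the three codes visible on the slide are listed, since the slide elides the rest; the second observation contains an invented, deliberately wrong code, and the function name is hypothetical.

```python
# Concept scheme from the slide. The slide elides some codes, so only the
# three that are shown are listed here.
SCHEME = "iaest-codelist:SuperficieUtil"
TOP_CONCEPTS = {
    SCHEME: [
        "iaest-codelist:superficie030-045",
        "iaest-codelist:superficie046-060",
        "iaest-codelist:superficie180-mas",
    ]
}


def invalid_values(observations, dimension, scheme, top_concepts):
    """Return the values of `dimension` that are not concepts of `scheme`."""
    allowed = set(top_concepts[scheme])
    return [
        obs[dimension]
        for obs in observations
        if dimension in obs and obs[dimension] not in allowed
    ]


observations = [
    {"iaest:superficieUtil": "iaest-codelist:superficie030-045"},
    {"iaest:superficieUtil": "iaest-codelist:superficie30-45"},  # invented typo
]
invalid = invalid_values(observations, "iaest:superficieUtil", SCHEME, TOP_CONCEPTS)
print(invalid)
```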
![Page 39: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/39.jpg)
![Page 40: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/40.jpg)
Why Linked Statistical Data? (I)
• Facilitate data (re)use by developers outside our organisation
- Data access APIs (according to standards)
- Do they prefer CSVs, PCAxis, SDMX, RDF?
- Fine-grained data granularity (refer to specific facts)
• Integration with other data sources from other public or private organisations
- E.g., Government of Aragón for municipalities
• Allow for queries across datasets
- E.g., tell me how many municipalities may benefit from this funding that I am making available with these restrictions: number of registered companies lower than 5 and unemployed population higher than 15%
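The funding example is a join of two datasets on the municipality URI they share. The Python sketch below shows the idea with invented figures: the municipality names A, B and C and all the numbers are hypothetical, and in practice the two dictionaries would be filled from the results of two SPARQL queries (or the join would be written directly in one query).

```python
# Hypothetical observations from two datasets, keyed by the shared municipality
# URI. Names and figures are invented for illustration only.
BASE = "http://opendata.aragon.es/recurso/territorio/Municipio/"
registered_companies = {BASE + "A": 3, BASE + "B": 12, BASE + "C": 4}
unemployment_rate = {BASE + "A": 0.18, BASE + "B": 0.22, BASE + "C": 0.09}


def eligible(companies: dict, unemployment: dict,
             max_companies: int = 5, min_rate: float = 0.15) -> list:
    """Municipalities with fewer than `max_companies` companies and unemployment above `min_rate`."""
    shared = companies.keys() & unemployment.keys()  # join on the common URI
    return sorted(
        m for m in shared
        if companies[m] < max_companies and unemployment[m] > min_rate
    )


result = eligible(registered_companies, unemployment_rate)
print(len(result), result)
```

The join only works because both datasets identify municipalities with the same URIs, which is the practical payoff of linking to a shared territorial resource.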
![Page 41: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/41.jpg)
Why Linked Statistical Data? (II)
• Internal benefits as well
- Codelists are made available and more visible internally
- Methodology and metadata explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)
![Page 42: Linked Statistical Data 101](https://reader036.fdocuments.in/reader036/viewer/2022062900/58e54f451a28ab3a468b6383/html5/thumbnails/42.jpg)