Development of guidelines for publishing statistical data as linked open data
MERGING STATISTICS AND GEOSPATIAL INFORMATION IN MEMBER STATES ‐ POLAND
Mirosław Migacz
INSPIRE Conference 2016
Barcelona, 26 IX 16
Overall objective
Support decision‐making processes by providing standardized, usable and open georeferenced statistical data.
What is linked open data?
• Internet – a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents are mainly human‐readable and cannot be understood by machines.
• Linked open data – data published in machine‐readable formats, described and connected using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).
source: https://joinup.ec.europa.eu/community/ods/description (CC 2.0)
Linked open data
• URI – for names
• RDF – to describe data
• SPARQL – to query for data
source: https://joinup.ec.europa.eu/community/ods/description (CC 2.0)
Uniform Resource Identifier (URI)
to “make a long story short”:
object described by an internet address
A country, e.g. Belgium
http://publications.europa.eu/resource/authority/country/BEL
A dataset, e.g. Countries Named Authority List
http://publications.europa.eu/resource/authority/country/
In official statistics it can look like this:
http://teryt.stat.gov.pl/32/18/05/3 ‐ gmina Węgorzyno
source: https://joinup.ec.europa.eu/community/ods/description (CC 2.0)
RDF & SPARQL
Resource Description Framework (RDF) is a framework for representing data and resources on the Web
RDF breaks every piece of information down in triples:
• Subject – a resource, which may be identified with a URI.
• Predicate – a URI‐identified reused specification of the relationship.
• Object – a resource or literal to which the subject is related.
source: https://joinup.ec.europa.eu/community/ods/description (CC 2.0)
Example (subject, predicate, object):
http://example.org/place/Brussels is the capital of “Belgium”
OR
http://example.org/place/Brussels is the capital of http://example.org/place/Belgium
SPARQL is a standardised language for querying RDF data.
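The triple structure and the idea of querying it can be sketched in plain Python. This is only an illustration, not an RDF library: the predicate URI and the second triple are hypothetical additions, and `match` merely imitates a single SPARQL triple pattern in which `None` plays the role of a variable.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicate URI, invented for this illustration only.
CAPITAL_OF = "http://example.org/property/isCapitalOf"

triples = [
    ("http://example.org/place/Brussels", CAPITAL_OF, "http://example.org/place/Belgium"),
    # Second triple added only so the pattern has something to filter out.
    ("http://example.org/place/Warsaw", CAPITAL_OF, "http://example.org/place/Poland"),
]


def match(data: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in data:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly corresponds to the SPARQL query:
#   SELECT ?city WHERE { ?city <.../isCapitalOf> <http://example.org/place/Belgium> }
capitals = [s for s, _, _ in match(triples, p=CAPITAL_OF,
                                   o="http://example.org/place/Belgium")]
print(capitals)
```

A real deployment would hold the triples in an RDF store and use an actual SPARQL engine; the sketch only shows why URIs in all three positions make the data joinable.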
Five stars of linked open data
source: https://joinup.ec.europa.eu/community/ods/description (CC 2.0)
Make your stuff available on the Web (whatever format) under an open license.
Make it available as structured data (e.g., Excel instead of image scan of a table)
Use non‐proprietary formats (e.g., CSV instead of Excel)
Use URIs to denote things, so that people can point at your stuff
Link your data to other data to provide context
Now
powiat łobeski (LAU 1)
3218
4.4.32.64.18
lobeski
4326418
Aim
powiat łobeski
http://nts.stat.gov.pl/4/4/32/64/18
Specific objectives
• identification of statistical units for which data can be published, with harmonization of their geometries for respective years
• building standardized URIs for statistical units
• identification and analysis of potential data sources
• plan for transformation of existing data sources into open formats
• creation of RDF metadata for data sources
• feasibility analysis for publishing linked open data
Identification of data sources
• Three major databases:
  • Local Data Bank
    • biggest set of statistical information available for a wide range of years
    • updated monthly
  • Demography Database
    • integrated data source for state and structure of population, vital statistics and migrations
  • Development monitoring system STRATEG
    • a system for facilitating and monitoring the development policy
    • key measures to monitor execution of strategies at local, regional, transregional and EU level.
Identification of data sources
• Other data sources:
  • publications
  • tables
  • communiques
  • announcements
  • articles
Identification of data sources
• Metadata:
  • thematic category,
  • format (PDF, DOC, XLS, CSV),
  • spatial reference (country, NUTS, LAU, functional areas, urban areas),
  • temporal reference (years)
  • presence of identifiers (TERYT, NTS, NUTS)
  • update cycle
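One way to hold the metadata items listed above is a small record type. The sketch below is an assumption about how such an inventory could be structured, not the project's actual schema: the `title` field and all example values are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DataSourceMetadata:
    """One inventory entry; fields follow the metadata list on the slide."""
    title: str                      # hypothetical extra field to name the source
    thematic_category: str
    file_format: str                # e.g. "PDF", "DOC", "XLS", "CSV"
    spatial_reference: List[str]    # e.g. ["country", "NUTS", "LAU"]
    temporal_reference: List[int]   # years covered
    identifiers: List[str]          # e.g. ["TERYT", "NTS", "NUTS"]
    update_cycle: str

    def has_identifiers(self) -> bool:
        """True if the source carries at least one territorial identifier."""
        return bool(self.identifiers)


# Placeholder values for illustration only, not a description of a real source.
record = DataSourceMetadata(
    title="Example table",
    thematic_category="demography",
    file_format="CSV",
    spatial_reference=["LAU"],
    temporal_reference=[2014, 2015],
    identifiers=["NTS"],
    update_cycle="yearly",
)
print(asdict(record))
```

Keeping the inventory as structured records makes later steps, such as filtering sources by format or by presence of identifiers, straightforward.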
Preliminary analysis of data sources
• Key aspects:
  • openness
  • redundancy of information
  • popularity (based on view and download statistics)
• Inclusion / exclusion of the data source
Statistical units harmonization
• Basis:
  • NTS (Nomenclature of Territorial Units for Statistical Purposes)

| Name | NTS level | NUTS/LAU | Identifier |
|---|---|---|---|
| Region | 1 | NUTS 1 | 1.6 |
| Voivodship | 2 | NUTS 2 | 2.6.22 |
| Subregion | 3 | NUTS 3 | 3.6.22.40 |
| Powiat | 4 | LAU 1 | 4.6.22.40.11 |
| Gmina | 5 | LAU 2 | 5.6.22.40.11.01.1 |
Statistical units harmonization
• Input data:
  • administrative boundaries since 2002 for LAU 2 (gmina), excluding 2007
• Harmonization process:
  • structure standardization
  • standardization of identifiers (creating NTS identifiers)
  • aggregation to higher level units (LAU 1 ‐> NUTS 1)
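The aggregation step relies on the fact that an NTS identifier embeds the identifiers of all its parent units. The sketch below is an assumption based only on the example identifiers in the NTS table: it derives the higher-level identifier for a unit and groups lower-level units by it, which is the kind of key a GIS dissolve operation would use. The second gmina identifier is hypothetical, and no geometry processing is shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Number of territorial components after the leading level digit,
# read off the example identifiers:
# 1.6 -> 1, 2.6.22 -> 2, 3.6.22.40 -> 3, 4.6.22.40.11 -> 4, 5.6.22.40.11.01.1 -> 6
COMPONENTS_PER_LEVEL = {1: 1, 2: 2, 3: 3, 4: 4, 5: 6}


def ancestor(nts_id: str, level: int) -> str:
    """Return the NTS identifier of the unit's ancestor at the given level."""
    parts = nts_id.split(".")
    own_level = int(parts[0])
    if not 1 <= level <= own_level:
        raise ValueError(f"level {level} is not at or above level {own_level}")
    return ".".join([str(level)] + parts[1:1 + COMPONENTS_PER_LEVEL[level]])


def group_by_ancestor(nts_ids: Iterable[str], level: int) -> Dict[str, List[str]]:
    """Group units by their ancestor at the given level (a dissolve key)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for nts_id in nts_ids:
        groups[ancestor(nts_id, level)].append(nts_id)
    return dict(groups)


gminas = ["5.6.22.40.11.01.1", "5.6.22.40.11.02.2"]  # second one is hypothetical
print(ancestor("5.6.22.40.11.01.1", 4))   # powiat-level identifier
print(group_by_ancestor(gminas, 4))
```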
Statistical units harmonization
• Non‐standard statistical units:
  • functional areas
  • urban areas
• Groups of NTS units
• Derive mostly from strategic documents
• Changes of geometries in time to be determined
Statistical units URIs
• NTS as basic classification

| Name | NTS level | NUTS/LAU | Identifier | URI (http://nts.stat.gov.pl/...) |
|---|---|---|---|---|
| Region | 1 | NUTS 1 | 1.6 | …1/6 |
| Voivodship | 2 | NUTS 2 | 2.6.22 | …2/6/22 |
| Subregion | 3 | NUTS 3 | 3.6.22.40 | …3/6/22/40 |
| Powiat | 4 | LAU 1 | 4.6.22.40.11 | …4/6/22/40/11 |
| Gmina | 5 | LAU 2 | 5.6.22.40.11.01.1 | …5/6/22/40/11/01/1 |

http://nts.stat.gov.pl/5/6/22/40/11/01/1
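The URI pattern shown in the slides replaces the dots of an NTS identifier with slashes and prepends the base address. A minimal sketch of that conversion follows; the function names are hypothetical and the validation rule (digits only) is an assumption drawn from the examples.

```python
NTS_BASE = "http://nts.stat.gov.pl/"


def nts_to_uri(nts_id: str) -> str:
    """Turn a dotted NTS identifier into a URI under NTS_BASE."""
    parts = nts_id.strip().split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a dotted numeric NTS identifier: {nts_id!r}")
    return NTS_BASE + "/".join(parts)


def uri_to_nts(uri: str) -> str:
    """Inverse of nts_to_uri: recover the dotted identifier from a URI."""
    if not uri.startswith(NTS_BASE):
        raise ValueError(f"URI is outside {NTS_BASE}: {uri!r}")
    return uri[len(NTS_BASE):].strip("/").replace("/", ".")


print(nts_to_uri("5.6.22.40.11.01.1"))
print(uri_to_nts("http://nts.stat.gov.pl/4/4/32/64/18"))
```

Because the mapping is reversible, existing identifier columns can be kept in the databases and converted to URIs only at publication time.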
Data transformation plan
• Test workflow from ontology to SPARQL endpoint
• Decide what will be published as Open Data
  • three major databases
  • other data sources
• Create ontology
• Map to existing databases
• Export to RDF data store
• Publish on linked data server
• Workflow tested on STRATEG database
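The "map" and "export" steps can be illustrated by turning a database row into N-Triples lines. This is a hand-written sketch, not the mapping mechanism used in the project: the row layout, the `http://example.org/` vocabulary and observation URIs, and the sample value are all hypothetical placeholders.

```python
from typing import Dict, List, Optional, Union

NTS_BASE = "http://nts.stat.gov.pl/"
VOCAB = "http://example.org/stat#"            # hypothetical vocabulary namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: Union[str, int, float], datatype: Optional[str] = None) -> str:
    """Format a value as an N-Triples literal, optionally typed with an XSD type."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{escaped}"^^<{XSD}{datatype}>'
    return f'"{escaped}"'


def row_to_ntriples(row_id: int, row: Dict[str, Union[str, int, float]]) -> List[str]:
    """Describe one statistical observation as four N-Triples statements."""
    obs = f"<http://example.org/observation/{row_id}>"   # hypothetical URI scheme
    unit = "<" + NTS_BASE + str(row["nts_id"]).replace(".", "/") + ">"
    return [
        f"{obs} <{VOCAB}refArea> {unit} .",
        f"{obs} <{VOCAB}indicator> {literal(row['indicator'])} .",
        f"{obs} <{VOCAB}year> {literal(row['year'], 'gYear')} .",
        f"{obs} <{VOCAB}value> {literal(row['value'], 'decimal')} .",
    ]


# Placeholder row: the value is made up and is not a real statistic.
sample_row = {"nts_id": "4.4.32.64.18", "indicator": "population", "year": 2015, "value": 100}
lines = row_to_ntriples(1, sample_row)
print("\n".join(lines))
```

The resulting lines could be loaded into any RDF store that accepts N-Triples; a production setup would more likely reuse an established vocabulary instead of the placeholder namespace.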
Ontology ‐ methods and tools
• Ontop ‐ platform to query databases as Virtual RDF Graphs using SPARQL
  • SPARQL 1.0 support
  • Supports interface for ontology development
  • Intuitive/powerful mapping language
  • Support for free and commercial DBMS
  • SPARQL end‐point
Mapping ontology on database
SPARQL query on mapped data
SPARQL endpoint tools for the web
• Apache Jena Fuseki
  • Fuseki is a SPARQL server. It allows REST‐style SPARQL queries.
  • RDF data generated by Ontop is imported into Apache Jena
• Pubby
  • A Linked Data Frontend for SPARQL Endpoints
  • Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.
  • Provides data at a given linked data URI
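A REST‐style SPARQL query is an HTTP request that carries the query text in a `query` parameter, as defined by the SPARQL Protocol. The sketch below only builds such a request with the Python standard library and does not send it; the endpoint address and the dataset name `strateg` are assumptions about a local test installation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical local endpoint: host, port and dataset name are assumptions.
ENDPOINT = "http://localhost:3030/strateg/query"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request following the SPARQL Protocol (query via URL parameter)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# To actually run the query against a live server, one could do:
#   from urllib.request import urlopen
#   import json
#   with urlopen(request) as response:
#       results = json.load(response)
```

A frontend such as Pubby sits on top of the same endpoint and issues comparable queries whenever a resource URI is dereferenced.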
Fuseki SPARQL endpoint query
Query result facilitated by Pubby
Further work
• Consultation of the designed workflow during a study visit at the Madrid University of Technology
• Setting up an internal test linked data server to implement web tools
• Creating ontologies and workflows for databases and other data sources
Summary – results so far
• Harmonized geometries for statistical units
• Identified data sources with comprehensive metadata
• Preliminary data transformation plan with tools tested
Poland’s data opening strategy
• launched this year
• aimed at opening data resources of government institutions in line with the 5‐star linked open data goals
• the grant results (guidelines) are in line with the strategy
• increased probability of acquiring financing for a fully fledged implementation
INSPIRE Thematic Clusters
https://themes.jrc.ec.europa.eu – collaboration platform
Statistical Cluster:
statistical units
population distribution (demography)
human health and safety
Informal meeting of Cluster members during the coffee break (15:30‐16:00)
Questions?