Geonovum Testbed – Topic 4 “Spatial data on the Web using the current SDI”, INSPIRE Conference 2016


Page 1:

Geonovum Testbed – Topic 4 “Spatial data on the Web using the current SDI”

INSPIRE Conference 2016

Page 2:

The starting point …

[Diagram of the current SDI. Labels:]

• datasets and services: Agrarisch Areaal; Adressen en Gebouwen (BAG); Metadata; WFS; CSW
• users: GIS Experts and Developers
• data: any model, any CRS; XML, rich queries
• metadata: ISO 19115, ISO 19139/XML

Page 3:

What are we trying to do?

• presence on the Web of data
  • crawlability and linkability, i.e. make each resource hosted by a WFS or CSW available via a persistent URI and ensure that all resources can be reached via links from a “landing page” for the data set

• harmonisation of data discovery
  • classification of the resources using vocabularies supported by the main search engines on the Web
  • discovery of both spatial and non‐spatial data by the same search engine

• data access based on current Web practices
  • representations of data for consumption by humans (HTML), developers (JSON‐LD, GeoJSON, GML) and search engine crawlers (HTML with structured data annotations), accessible via HTTP(S)

• connecting data with other data on the Web
  • establishing and maintaining links between data

Page 4:

What have we built

[Diagram: the SDI from “The starting point” (Agrarisch Areaal; Adressen en Gebouwen (BAG); Metadata; WFS; CSW; GIS Experts and Developers; any model, any CRS; XML, rich queries; ISO 19115, ISO 19139/XML), extended with the following labels:]

• additional dataset: Bekendmakingen
• new components: “linked data proxy” (shown twice); GeoNetwork
• targets: Indexed Web; Linked Data Web?; Web APIs?
• consumers: Open Data Portals; Search Engine Crawlers; “Web Developers”
• data: schema.org, WGS 84; HTML & JSON-LD, content negotiation, “follow your nose”
• metadata: DCAT, schema.org
• links between datasets

Page 5:

Architectural principles used in the proxy design 

• All resources are identified using persistent HTTP URIs
• All resources should be discoverable via search engines
• All interaction is using the HTTP protocol, consistent with its design
• APIs to access data should be self‐describing and support immediate use
• Resources can be accessed and understood by developers and citizens
• Resources are either explicitly linked using HTTP URIs or data is structured so that links can be established dynamically

Page 6:

Metadata Paul

Page 7:

Search for an address in Google

Feature from Address WFS reported as first hit

Page 8:

HTML page of the address – live from the WFS

Feature available under a persistent URI as HTML, GeoJSON, JSON‐LD, GML
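Serving one persistent URI in several formats usually relies on HTTP content negotiation. The following is a minimal sketch of how a server might choose a representation from the `Accept` header. The media types, the function names and the fallback to HTML are assumptions made for illustration, not taken from the ldproxy source code.

```python
# Media types assumed here for the four representations named on the slide.
REPRESENTATIONS = {
    "text/html": "HTML",
    "application/geo+json": "GeoJSON",
    "application/ld+json": "JSON-LD",
    "application/gml+xml": "GML",
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header, default="text/html"):
    """Pick the media type to serve for the given Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in REPRESENTATIONS:
            return media_type
        if media_type == "*/*":
            return default
    # Hypothetical policy: fall back to HTML instead of answering 406.
    return default
```

A browser sending `text/html,application/xhtml+xml,*/*;q=0.8` would get the HTML page, while a client asking for `application/ld+json` would get JSON-LD from the same URI.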

Page 9:

HTML annotated with schema.org
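As an illustration of what a schema.org annotation for an address could look like, the sketch below builds a `Place` with a `PostalAddress` and `GeoCoordinates` and embeds it in an HTML page as a JSON-LD script element. The slide does not say whether the demo server uses microdata, RDFa or embedded JSON-LD, so the embedding style is an assumption, as are the function names and all sample values.

```python
import json
from html import escape


def address_jsonld(uri, street, postal_code, locality, lat, lon):
    """Describe an address as a schema.org Place (WGS 84 coordinates)."""
    return {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": uri,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "postalCode": postal_code,
            "addressLocality": locality,
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


def html_page(title, data):
    """Wrap the annotation in a minimal HTML page for crawlers and humans."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        f"</head><body><h1>{escape(title)}</h1></body></html>\n"
    )


if __name__ == "__main__":
    # Made-up sample values; the URI is a placeholder, not a real feature.
    example = address_jsonld(
        "http://example.org/addresses/1",
        "Example Street 1", "0000 AA", "Valkenswaard", 51.35, 5.46,
    )
    print(html_page("Example Street 1, Valkenswaard", example))
```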

Page 10:

Landing page of the ldproxy.net demo server

Page 11:

A dataset (a WFS)

Page 12:

A dataset (a feature type in a WFS)

Page 13:

… paged access to all features
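Paged access is what lets a crawler reach every feature by following links from the landing page. A rough sketch of how "previous" and "next" links for such pages could be computed is shown below. The query parameter names `page` and `pageSize`, the function name and the example URL are hypothetical and not taken from ldproxy.

```python
from math import ceil
from urllib.parse import urlencode


def page_links(base_url, page, page_size, total):
    """Return self/prev/next links for one page of a feature collection."""
    last = max(1, ceil(total / page_size))
    if not 1 <= page <= last:
        raise ValueError(f"page must be between 1 and {last}")

    def url(n):
        # Hypothetical parameter names; a real service may use others.
        return f"{base_url}?{urlencode({'page': n, 'pageSize': page_size})}"

    links = {"self": url(page)}
    if page > 1:
        links["prev"] = url(page - 1)
    if page < last:
        links["next"] = url(page + 1)
    return links
```

With 60 features and 25 features per page there are three pages; the first has no "prev" link and the last has no "next" link, so a crawler starting at page 1 visits every page exactly once.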

Page 14:

All data is accessed live from a WFS

Page 15:

Providing subsets of large data sets (here: by municipality)

Page 16:

Select the municipality

http://www.ldproxy.net/bag/inspireadressen/?fields=addressLocality&distinctValues=true

Page 17:

Dynamically establish links to other resources (here: a dataset with public announcements)

http://www.ldproxy.net/bag/inspireadressen/?addressLocality=Valkenswaard
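The two request URLs used for the municipality subsets follow a simple query-string pattern: one lists the distinct values of a property, the other filters the features by one of those values. The sketch below rebuilds both URLs with the standard library. The function names are made up for illustration, and whether the server accepts properties other than `addressLocality` in these positions is an assumption.

```python
from urllib.parse import urlencode

# Base URL of the address dataset as shown on the slides.
BASE = "http://www.ldproxy.net/bag/inspireadressen/"


def distinct_values_url(field):
    """URL that lists the distinct values of one property."""
    return f"{BASE}?{urlencode({'fields': field, 'distinctValues': 'true'})}"


def filter_url(field, value):
    """URL that selects the features with the given property value."""
    return f"{BASE}?{urlencode({field: value})}"


if __name__ == "__main__":
    print(distinct_values_url("addressLocality"))
    print(filter_url("addressLocality", "Valkenswaard"))
```

Because the filter is just a property name and a value, a link from another resource that carries the same value (for example a municipality name) can be built on the fly, which is one way to read "dynamically establish links".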

Page 18:

A public announcement

Page 19:

Note: Workshop tomorrow morning at 09:00 about the work of the Working Group

Page 20:

Indexing is slow: 0.2% of the pages (features) after 5 months

Page 21:

Indexing of the schema.org annotation is minimal

Page 22:

Similar across search engines

Page 23:

Google is experimenting with the discovery of datasets

https://developers.google.com/search/docs/data-types/datasets
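Dataset discovery by search engines builds on schema.org markup of type `Dataset`. As a hedged sketch, the function below assembles such a description with a few common properties (`name`, `description`, `url`, `keywords`, `distribution`). Which properties a given search engine actually evaluates is not covered here, and the function name and all sample values are placeholders.

```python
import json


def dataset_jsonld(name, description, url, keywords, distributions):
    """Describe a dataset as a schema.org Dataset with its distributions.

    distributions is a list of (media_type, content_url) pairs.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": media_type,
                "contentUrl": content_url,
            }
            for media_type, content_url in distributions
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = dataset_jsonld(
        "Example addresses",
        "Addresses published live from a WFS through a proxy.",
        "http://example.org/addresses/",
        ["addresses", "INSPIRE"],
        [
            ("application/geo+json", "http://example.org/addresses/data.geojson"),
            ("application/gml+xml", "http://example.org/addresses/data.gml"),
        ],
    )
    print(json.dumps(example, indent=2))
```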

Page 24:

Resources

• Detailed report: http://geo4web-testbed.github.io/topic4/ (many additional aspects and issues covered, not discussed today)

• Source code: https://github.com/interactive-instruments/ldproxy

• Docker image: https://hub.docker.com/r/iide/ldproxy/

Page 25:

Summary of the Findings

• To a large extent we were successful – SDIs can be leveraged to make data available on the Web
• But there are open questions and issues, e.g.
  • Data integration across datasets: Properly structured data using common vocabularies will decrease the cost of data integration (here: linking) significantly
  • SDI consistency: Metadata in WFS capabilities and dataset metadata in catalogues are often incomplete or inconsistent; further issues are persistent identifiers in WFSs and dead links
  • Performance & data compactness: Response times and data size have an impact on indexing / ranking by search engines and on usability in general
  • Search engines are largely a black box when it comes to spatial data: how is structured spatial data used?
  • schema.org is a divergent kind of vocabulary from the Linked Data perspective
  • Different vocabularies are needed for different use cases / communities, e.g. schema.org vs GeoSPARQL
  • Content negotiation based on media types does not support different vocabularies / user communities

Page 26:

Docker

Page 27:

Docker

Page 28:

Docker image of ldproxy

Page 29:

Set up a local ldproxy service using docker

https://github.com/interactive-instruments/ldproxy/blob/master/docs/00-getting-started.md

Page 30:

Adapt an ldproxy service configuration

• change feature type label “lands2:watertorens” to “Watertorens”
• disable Foto_groot in the overviews
• disable properties OBJECTID and Foto_thumb everywhere
• have a better feature name than "watertorens.1"
• change "WOONPLAATS" to "Woonplaats"

also documented at … https://github.com/interactive-instruments/ldproxy/blob/master/docs/00-getting-started.md

Page 31:

Service Manager without ldproxy services

Page 32:

… with one ldproxy service

Page 33:

The “landing page” for the ldproxy service

Page 34:

The page for a feature type

Page 35:

The page of a feature

Page 36:

Try it yourself, even without setting up your own proxy

http://services.live.geocat.net:7080