Linked Open Data for environment protection in Smart Regions

Architecture of SmartOpenData infrastructure

Deliverable D2.3 :: Public

Keywords: RM-ODP, Architecture design, Linked Open Data, SDI, Open Data

D2.3 Architecture of SmartOpenData infrastructure SmartOpenData project (Grant no.: 603824)

Version 2.0 of 77 © SmartOpenData Consortium 2014



Table of Contents

1 Executive Summary
2 Introduction
3 Technological outcomes for the project
3.1 HABITATS Reference Laboratory
3.2 Plan4Business Open Data Repository
3.3 LOD2 project
3.4 GeoKnow project
3.5 SemaGrow project
3.6 DCAT application profile for data portals in Europe
3.7 CKAN domain model
3.7.1 CKAN and GeoNetwork for an integrated Open Data Portal
3.8 INSPIRE metadata and DCAT-AP
4 RM-ODP methodology
4.1 RM – ODP methodology
4.2 View point approach
4.3 Generic and pilot design
5 Enterprise viewpoint
5.1 Basic pilots
5.2 Common search and indexing facilities
5.2.1 Spain (and Portugal)
5.2.2 Ireland
5.2.3 Italy
5.2.4 Czech
5.2.5 Slovakia
5.3 Cross-pilot functional requirements
5.3.1 Plan4business
5.3.2 Tourism
6 Information viewpoint
6.1 Local data
6.1.1 GIS Files
6.1.2 General purpose files
6.2 Basic data types used in SmartOpenData
6.3 RDF Data integration
6.4 Ontologies and vocabularies
6.5 Registers and Registries
6.6 Information structure and content with a clear focus on the metadata and data models
6.7 Tasks related to SmartOpenData data
6.8 Data flows
7 Computational viewpoint
7.1 Local data
7.2 Server side
7.3 Client side
7.4 Non-functional requirements
8 Engineering viewpoint
8.1 Generic architecture
8.2 Server side
8.2.1 Relation database
8.2.2 Data Integration
8.2.3 Structured and non-structured data extraction
8.2.4 Converting data to RDF
8.2.5 SPARQL EndPoint
8.2.6 Support for standard vocabularies
9 Technology viewpoint
9.1 Tools for technical implementations
9.2 Spatial data serialization tools
9.2.1 GeoKnow TripleGeo
9.3 Non-spatial data serialization tools
9.3.1 D2RQ
9.3.2 db2triples
9.3.3 TripleStores and SPARQL/GeoSPARQL endpoints
9.3.4 Parliament
9.3.5 Strabon
9.3.6 OpenSahara uSeekM IndexingSail Sesame Sail plugin
9.3.7 Openlink Virtuoso Universal Server
9.3.8 Ontotext OWLIM
9.3.9 SPARQ ED
9.4.10 SIRENDB
9.3.11 Sefarad-Faceted Search
9.4 High-level technical specification – generic level
9.5 Examples for architecture implementation of the pilots
9.5.1 Czech pilot
9.5.2 Slovakian pilot
9.5.2 Irish pilot
9.5.3 Spanish pilot
9.5.4 Italian pilot
9.5.5 Tourist cross border pilot
9.6 Relation with semantic indexing
10 Conclusions and Recommendations
10.1 Linkage with main technical components of SmartOpenData
Annex A:



List of Figures

Figure 1 HABITATS Reference Laboratory Architecture
Figure 2 Plan4business Open Data Repository Architecture
Figure 3 GeoKnow High-Level Architectural Overview
Figure 4 SemaGrow Architecture
Figure 5 OpenDataNetwork System & Nodes Architecture
Figure 6 RM ODP viewpoints
Figure 7 SmartOpenData data flow
Figure 8 Generic architecture
Figure 9 High-level technical specification – generic level
Figure 10 Component diagram of SmartOpenData
Figure 11 Czech Pilot Architecture
Figure 12 Slovaks pilot architecture
Figure 13 Irish pilot
Figure 14 Spanish pilot: Agroforestry Management
Figure 15 Portuguese-Spanish pilot: Water Management
Figure 16 Italian pilot
Figure 14 WP4 implementation and linkage with pilots



List of Tables

Table 1 Pilot Data source
Table 2 Available components for SmartOpenData
Table 3 SmartOpenData LOD functionalities



Contractual Date of Delivery to the EC: April 2014

Actual Date of Delivery to the EC:

Editor(s): Karel Charvat (HSRS)

Contributor(s): TRAGSA, UPM, MAC, Sindice, SDati, HSRS, IMCS, SAZP, FMI

DOCUMENT HISTORY

Version | Version date | Responsible | Description
1.0 | 28 March 2014 | HSRS | Initial draft, TOC, call for contributions from others.
1.1 | 10 May 2014 | HSRS with contribution from all pilot partners | First version of report
1.2 | 27 May 2014 | HSRS with contribution from SAZP, FMI | Last version
2.0 | 30 June 2014 | HSRS with contribution from TRAGSA | Final version

The information and views set out in this publication are those of the author(s) and do not

necessarily reflect the official opinion of the European Communities. Neither the European

Union institutions and bodies nor any person acting on their behalf may be held responsible

for the use which may be made of the information contained therein.

Copyright © 2014, SmartOpenData Consortium.



1 Executive Summary

Deliverable D2.3 is part of WP2, task T2.3. It builds on the previous work of tasks T2.1 and T2.2 and on deliverables D2.1 and D2.2. The content was defined at the Prague meeting of pilot partners held on 22 and 23 May. D2.3 defines a reference infrastructure model and a high-level technical specification for the project, including its main components and connection points to other tools and systems. The design follows the Reference Model of Open Distributed Processing (RM-ODP) methodology. The reference architecture aims to meet the technical and user requirements established in T2.1 and T2.2, addressing interoperability and multilingualism aspects, the metrics engine and interfaces. It defines platform-neutral components and also provides suggestions for concrete implementation. In addition, the document indicates the linkages between the architecture (WP2), the data model (WP3), the SmartOpenData Semantic Front-end Facilities (WP4) and the demonstration pilots (WP5). This will ensure overall consistency in the development of the main technical project components.



2 Introduction

The objective of Task 2.3 is to define a generic, platform-neutral architecture that at the same time supports the pilot solutions. The task defines a reference infrastructure model and a high-level technical specification for the project, including its main components and connection points to other tools and systems. The Reference Model of Open Distributed Processing (RM-ODP) methodology was used. The reference architecture aims to meet the technical and user requirements established in T2.1 and T2.2. It defines platform-neutral components and provides suggestions for concrete implementation. The objective of this task is not to design a monolithic solution integrating the whole life cycle of Linked Open Data, but to define the basic architecture components of the Linked Open Data chain and to identify potential solutions to the concrete challenges of the pilots.

Additional requirements came from the analysis of cross-links identified among the individual pilots, and from the SMEs among the project partners, who are potential users of Linked Open Data.



3 Technological outcomes for the project

Deliverables D2.2 (Requirements of the SmartOpenData Infrastructure) and D3.1 (Review of geographic resources metadata and related metadata standards) analyzed existing solutions, projects and platforms. For the design of the SmartOpenData architecture we selected solutions whose outcomes can be re-used to meet the SmartOpenData requirements and expectations. The first two solutions come from already finished projects (HABITATS and Plan4business) and represent the traditional approach to SDI building. We expect to re-use the repositories prepared by these two projects and to make them available as Linked Open Data. For both solutions there is also a plan to extend the classical GI modules with Linked Open Data (LOD) functionality.

The other solutions are oriented towards LOD, and the SmartOpenData project expects to re-use their components and experience when implementing the SmartOpenData platform.

3.1 HABITATS Reference Laboratory

HABITATS (Social Validation of INSPIRE Annex III Data Structures in EU Habitats) is a predecessor project of SmartOpenData that focused on better utilization of environmental data. HABITATS extended the classical INSPIRE-based SDI architecture and defined a new solution that makes INSPIRE-based services more usable for people. The solution was defined in the form of the Habitats Reference Laboratory (RL).

The Habitats RL networking architecture was as follows¹:

1 HABITATS D4.3.2 Networking services and service toolkit, March 2013, www.habitats.cz/gallery



Figure 1 HABITATS Reference Laboratory Architecture

The important extensions of the classical INSPIRE architecture were the Application and Presentation layers, which allow user-focused apps to be built on top of an INSPIRE SDI. This approach will also be important for the future utilization of Linked Open Data as part of SmartOpenData.

Two other important outcomes will be reused and extended in SmartOpenData: the SuperCatalog, which provides harvesting and validation of INSPIRE- and GEOSS-based services in Europe and worldwide, and the concepts for integrating Linked Data with the INSPIRE-based View client. The intention of SmartOpenData is to transform this catalogue to RDF.

3.2 Plan4Business Open Data Repository

Plan4Business (A service platform for aggregation, processing and analysis of urban and regional planning data)² is an FP7 project that has developed an aggregation platform serving multiple providers and thus offering users a full catalogue of planning data such as transport infrastructure, regional plans, urban plans and zoning plans. The platform offers clients the data itself in integrated, harmonised and thus ready-to-use form, but also provides rich analysis and visualization services. Such services are offered via different interfaces, such as an API and an interactive web front-end (WebGIS).

The Plan4Business “WhatsthePlan” service³ is a public platform for harmonisation, integration, storing and analysis of spatial and non-spatial data. The platform contains a large data pool of planning data including pan-European datasets such as Urban Atlas, Corine Land Cover and Natura 2000; statistical information from EUROSTAT and selected countries; national datasets such as cadastral information and flood zones; and regional and local urban planning data. The platform is open to everyone and encourages users to share their data and expand the data coverage on horizontal and vertical levels.

Besides data harmonisation and integration, the platform enables various analyses based on its integrated datasets. The more spatial information is available, the better and more precise the analysis results that can be retrieved.

2 www.plan4business.eu

3 www.whatstheplan.eu

The platform targets different user groups.

• Tools for spatial data experts.

o The Integration Engine enables harmonisation of urban plans into the INSPIRE Land Use schema and publishes them in an interoperable way using OGC web services.

o The Map Creator enables users to prepare a map of their choice by overlaying data from the database with remote data connected through OGC web services.

• Spatial apps for general users. These apps, which are easy to use and show the capabilities of the platform, include:

o Location Evaluator, which generates a comprehensive report about a region, a municipality or a point of interest.

o Thematic Map Viewer, which navigates through thematic maps and predefined analyses from local to European level.
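Both expert tools rely on OGC web services to exchange data. As an illustration only, the sketch below builds a WFS 2.0.0 GetFeature request in key-value encoding. The endpoint URL and the feature type name are hypothetical placeholders and are not taken from the Plan4business platform; the axis order expected in `bbox` also depends on the server and the coordinate reference system.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, bbox, srs="EPSG:4326", max_features=100):
    """Build an OGC WFS 2.0.0 GetFeature request URL (key-value encoding)."""
    minx, miny, maxx, maxy = bbox
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0 parameter name for the feature type
        "bbox": f"{minx},{miny},{maxx},{maxy},{srs}",
        "count": max_features,   # WFS 2.0 limit on the number of returned features
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and feature type, for illustration only.
print(wfs_getfeature_url(
    "https://example.org/wfs",
    "lu:ExistingLandUseObject",
    (14.2, 49.9, 14.7, 50.2),
))
```

A client such as the Map Creator could overlay the features returned by such a request on top of data held in the local database.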

The Plan4business Open Data Repository uses the Micka catalogue⁴ developed by HSRS, also a partner in SmartOpenData. The catalogue stores metadata not only about existing datasets, but also about analyses, map compositions and integration services that can be performed. The Integration Engine, which accesses and harmonises data in the Storage Engine, is supported by the HUMBOLDT Alignment Editor (HALE)⁵. The Analysis Engine processes the requests submitted by users through the plan4business portal: it processes the query, accesses the data storage and retrieves the query results, which are then provided to users in standardised form. The architecture of the repository is as follows⁶:

4 http://geossregistries.info/geosspub/resource_details_ns.jsp?compId=urn:geoss:csr:resource:urn:uuid:92187605-201e-4175-a0ba-a15714041c53

5 www.esdi-community.eu/projects/hale

6 From http://www.google.ie/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=2&cad=rja&ved=0CDMQFjAB&url=http%3A%2F%2Fwww.plan4business.eu%2Fcgi-bin%2Fdownload.pl%3Ff%3D153.pdf&ei=1XuMUuG8JuiS7Qao64CgCg&usg=AFQjCNF3-xcWC2QORCI99KuZcKVLNckdIg



Figure 2 Plan4business Open Data Repository Architecture

The Plan4business Open Data Repository builds on and extends the HABITATS Reference Laboratory tools, with two major additions that will benefit SmartOpenData:

1. It has additional tools that partly provide LOD functionality.

2. It is collecting large-scale heterogeneous data. It holds a set of operational data from Europe: basic data; statistical data; the data that was in HABITATS; OpenStreetMap transformed to INSPIRE models; transport networks of some countries, for example Norway; spatial plans from some countries, for example the Irish Open Land Use data; Czech cadastre maps; etc.⁷

As the project has partners in common with SmartOpenData, the repository could be used early in the project, by transforming it to Open Data, for fast prototyping with LOD tools ahead of the large-scale implementation in WP3 (Data modeling and LOD alignment).

3.3 LOD2 project

The Commission’s European Open Data Support initiative uses the DCAT Application Profile (DCAT-AP), a common metadata vocabulary, to describe datasets for data portals in Europe⁸. The metadata harvesting and publishing platform that EU Open Data Portals use for collecting metadata of datasets from government data portals, transforming it into RDF, harmonizing it according to the DCAT-AP, and publishing it as Linked Open Government Data (LOGD) is based on the Linked Open Data Management Suite developed in the LOD2 project⁹.

7 Another source of transport geographic information is the portal http://geo.weastflows.eu/, from the Weastflows (west and east freight flows) Interreg IVB North West Europe (NWE) project, www.weastflows.eu, that aims to encourage a shift towards greener freight transport in NWE. MWRA is a partner in Weastflows.

8 https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final#download-linksl
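The transformation step of such a harvesting pipeline can be illustrated with a minimal sketch that serialises one harvested metadata record as Turtle using DCAT and Dublin Core terms. This is not the LOD2 Linked Open Data Management Suite; the record layout (`uri`, `lang`, `title`, `description`, `keywords`, `access_url`) and the example URIs are assumptions made for illustration, and only a few DCAT-AP properties are covered.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape characters that cannot appear raw inside a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dataset_to_turtle(record):
    """Serialise one harvested metadata record (a plain dict) as Turtle."""
    uri = record["uri"]
    lang = record["lang"]
    title = escape_literal(record["title"])
    description = escape_literal(record["description"])
    lines = [
        f"@prefix dcat: <{DCAT}> .",
        f"@prefix dct: <{DCT}> .",
        "",
        f"<{uri}> a dcat:Dataset ;",
        f'    dct:title "{title}"@{lang} ;',
        f'    dct:description "{description}"@{lang} ;',
    ]
    for keyword in record.get("keywords", []):
        lines.append(f'    dcat:keyword "{escape_literal(keyword)}"@{lang} ;')
    lines += [
        f"    dcat:distribution <{uri}/distribution> .",
        "",
        f"<{uri}/distribution> a dcat:Distribution ;",
        f"    dcat:accessURL <{record['access_url']}> .",
    ]
    return "\n".join(lines)


# Hypothetical record, for illustration only.
print(dataset_to_turtle({
    "uri": "http://example.org/dataset/land-use",
    "lang": "en",
    "title": "Open Land Use map",
    "description": "Harmonised land use polygons.",
    "keywords": ["land use", "INSPIRE"],
    "access_url": "http://example.org/download/land-use.zip",
}))
```

In a real pipeline an RDF library and a DCAT-AP validator would normally replace the hand-written serialisation shown here.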

The project is integrating and syndicating linked data with large-scale, existing applications and showcasing the benefits in the three application scenarios of media and publishing, corporate data intranets and eGovernment.

The LOD2 Stack suite of tools and components includes¹⁰:

• D2R Server is a tool for publishing the content of relational databases on the Semantic Web, a global information space consisting of linked data.

• Spatial Semantic Browser allows geographic visualization of RDF data.

• CubeViz is a facetted browser for statistical data utilizing the RDF Data Cube vocabulary, which is the state of the art in representing statistical data in RDF.

• R2R Framework enables Linked Data applications that discover data on the Web represented using unknown terms to search the Web for mappings and apply the discovered mappings to translate Web data to the application's target vocabulary.

• OntoWiki is a tool providing support for agile, distributed knowledge engineering scenarios. It facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents¹¹.

• ORE (Ontology Repair and Enrichment) tool allows knowledge engineers to improve an OWL ontology by fixing inconsistencies and making suggestions for adding further axioms to it.

• Sparqlify is a SPARQL-SQL rewriter that enables one to define RDF views on relational databases and query them with SPARQL. It is currently in alpha state and powers the Linked-Data Interface of the LinkedGeoData Server, i.e. it provides access to billions of virtual triples from the OpenStreetMap database.

9 https://github.com/nvdk/lodms-core/tree/virtuoso

10 LOD2 Stack Components & their APIs are described at wiki.lod2.eu/display/LOD2DOC/LOD2+Stack+Components

11 http://lod2.eu/Project/OntoWiki



• LIMES is a link discovery framework for the Web of Data. It implements time-efficient

approaches for large-scale link discovery based on the characteristics of metric

spaces. It is easily configurable via a web interface. It can also be downloaded as

standalone tool for carrying out link discovery locally.

• SigmaEE is a tool to explore and leverage the Web of Data. At any time, information

in Sigma is likely to come from multiple, unrelated Web sites - potentially any web

site that embeds information in RDF, RDFa or Microformats (standards for the Web

of Data).

• SIREn - Semantic Information Retrieval Engine - is a Lucene plugin to efficiently index

and query RDF, as well as any textual document with an arbitrary amount of

metadata fields.

• Silk Link Discovery Framework supports data publishers in setting RDF links from their data sources to other data sources on the Web¹².

• Sieve allows Web data to be filtered according to different data quality assessment

policies and provides for fusing Web data according to different conflict resolution

methods.

• LODrefine is OpenRefine with integrated extensions that enable you to reconcile and extend data with DBpedia, extract named entities and upload your data to the CrowdFlower crowdsourcing service.

• Virtuoso is an innovative, industry-standards-compliant platform for native data, information, and knowledge management. It implements and supports a broad spectrum of query languages, data access interfaces, protocols, and data representation formats, including SQL, SPARQL, ODBC, JDBC, HTTP, WebDAV, XML, RDF, RDFa, and many more¹³.

• PoolParty is a thesaurus management system and a SKOS editor for the Semantic

Web including text mining and linked data capabilities. The system helps to build and

maintain multilingual thesauri providing an easy-to-use interface.

• CKAN is a powerful data management system that makes data accessible – by

providing tools to streamline publishing, sharing, finding and using data. CKAN is

aimed at data publishers (national and regional governments, companies and

organizations) wanting to make their data open and available.

• DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia

resources in text, providing a solution for linking unstructured information sources to

the Linked Open Data cloud through DBpedia. DBpedia Spotlight performs named

entity extraction, including entity detection and Name Resolution (a.k.a.

disambiguation). It can also be used for building your solution for Named Entity

Recognition, amongst other information extraction tasks.
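Several of the tools listed above (LIMES, Silk) address link discovery, i.e. finding resources in two datasets that describe the same real-world entity. The sketch below shows the basic idea only: it compares labels pairwise with a generic string similarity from the Python standard library and proposes `owl:sameAs` links above a threshold. It does not reproduce the metric-space optimisations of LIMES or the link specification language of Silk, and the example URIs and threshold are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Propose owl:sameAs links between two {uri: label} dictionaries.

    Naive pairwise comparison: for each source resource, keep the best
    scoring target resource if its score reaches the threshold.
    """
    links = []
    for s_uri, s_label in source.items():
        scored = [(similarity(s_label, t_label), t_uri)
                  for t_uri, t_label in target.items()]
        if not scored:
            continue
        score, t_uri = max(scored)
        if score >= threshold:
            links.append((s_uri, OWL_SAME_AS, t_uri, score))
    return links


# Hypothetical resources, for illustration only.
source = {"http://example.org/a/1": "Sumava National Park",
          "http://example.org/a/2": "Lake Garda"}
target = {"http://example.org/b/9": "SUMAVA NATIONAL PARK",
          "http://example.org/b/10": "Burren"}
for link in discover_links(source, target):
    print(link)
```

The pairwise loop is quadratic in the number of resources, which is exactly the cost that dedicated link discovery frameworks are designed to avoid on large datasets.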

12 http://lod2.eu/Project/Silk

13 http://lod2.eu/Project/Virtuoso
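Tools such as D2R Server and Sparqlify expose relational data as RDF. The following sketch illustrates the underlying idea with a naive one-triple-per-column mapping, loosely in the spirit of a direct mapping, and materialises the result as N-Triples. It uses neither the D2RQ mapping language nor Sparqlify view definitions; the base URI, the table and the URI patterns are hypothetical, and all values are emitted as plain string literals without datatypes.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_literal(value):
    """Render a value as an N-Triples plain string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def table_to_ntriples(conn, table, key_column):
    """Map every row of a table to N-Triples lines (one triple per column).

    The table name is interpolated into the SQL text, so it must come
    from trusted configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> .")
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is encoded in the subject URI; NULLs yield no triple
            triples.append(f"{subject} <{BASE}vocab/{table}#{column}> {nt_literal(value)} .")
    return triples


# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, area_ha REAL)")
conn.execute("INSERT INTO site VALUES (1, 'Sumava', 680.6)")
for line in table_to_ntriples(conn, "site", "id"):
    print(line)
```

A rewriter such as Sparqlify avoids this materialisation step by translating SPARQL queries into SQL at query time, so the triples stay virtual.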



3.4 GeoKnow project¹⁴

The GeoKnow vision is to make geospatial data accessible on the Web of Data and turn the Web into a place where geospatial data can be published, queried, reasoned over, and interlinked according to the Linked Data principles¹⁵. This will move geospatial data beyond syntactic interoperability to actual semantic interoperability, and to services that can geospatially reason on the Web.

GeoKnow aims to repurpose SDI standards, enabling the existing vast body of geospatial

knowledge to be introduced in the Data Web. Further, it will apply the RDF model and the

GeoSPARQL standard as the basis for representing and querying geospatial data. In

particular, GeoKnow contributions will be in the following areas:

• Efficient geospatial RDF querying.

o Existing RDF stores lack performance and geospatial analysis capabilities compared to geospatially-enabled relational DBMS. GeoKnow will focus on introducing query optimisation techniques for accelerating geospatial querying by at least an order of magnitude.

• Fusion and aggregation of geospatial RDF data.

o Given a number of different RDF geospatial datasets for a given region containing similar knowledge (e.g. OSM, PSI and closed data¹⁶), GeoKnow will devise automatic fusion and aggregation techniques in order to consolidate them and provide a dataset of increased value, together with quantitative quality metrics for this new data resource.

• Visualization and authoring.

o GeoKnow will develop reusable mapping components, enabling the integration of geospatial RDF data as an additional data resource in web map publishing. Further, it will support expert and community-based authoring of RDF geospatial data within interactive maps, fully embracing crowdsourcing.

• Public-private geospatial data.

o To support value added services on top of open geospatial data, GeoKnow will

develop enterprise RDF data synchronisation workflows that can integrate open

geospatial RDF with closed, proprietary data. It will focus on the supply chain and

e-commerce use cases.

• GeoKnow Generator.

o This will consist of a full suite of tools supporting the complete lifecycle of

geospatial linked open data. The GeoKnow Generator will enable publishers to

triplify geospatial data, interlink them with other geospatial and non-geospatial

Linked Data sources, fuse and aggregate linked geospatial data to provide new

data of increased quality, visualise and author linked geospatial data in the Web.
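To make the role of the RDF model and GeoSPARQL more concrete, the sketch below triplifies a point feature using GeoSPARQL terms and builds a spatial query that selects features within a bounding box. It is an illustration under stated assumptions, not GeoKnow Generator code: the feature URI pattern and coordinates are hypothetical, and WKT coordinates are written in longitude-latitude order, which is the GeoSPARQL default when no CRS URI is given.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"


def point_to_turtle(feature_uri, lon, lat):
    """Describe a point feature with GeoSPARQL terms, serialised as Turtle."""
    return (
        f"@prefix geo: <{GEO}> .\n\n"
        f"<{feature_uri}> a geo:Feature ;\n"
        f"    geo:hasGeometry <{feature_uri}/geom> .\n"
        f"<{feature_uri}/geom> a geo:Geometry ;\n"
        f'    geo:asWKT "POINT({lon} {lat})"^^geo:wktLiteral .\n'
    )


def bbox_to_wkt(minx, miny, maxx, maxy):
    """Return a closed WKT polygon ring for a bounding box."""
    return (f"POLYGON(({minx} {miny}, {maxx} {miny}, {maxx} {maxy}, "
            f"{minx} {maxy}, {minx} {miny}))")


def within_query(bbox):
    """Build a GeoSPARQL query selecting features inside a bounding box."""
    wkt = bbox_to_wkt(*bbox)
    return f"""PREFIX geo: <{GEO}>
PREFIX geof: <{GEOF}>
SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))
}}"""


# Hypothetical feature and bounding box, for illustration only.
print(point_to_turtle("http://example.org/feature/42", 14.42, 50.09))
print(within_query((14.2, 49.9, 14.7, 50.2)))
```

The resulting query string can be sent to any endpoint that implements the GeoSPARQL simple features functions.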

GeoKnow aims to contribute to the following areas concerned with geospatial data:

14 http://geoknow.eu; see also “GeoKnow: Leveraging Geospatial Data in the Web of Data”, Garcia-Rojas, Athanasiou, Lehmann and Hladky, http://www.w3.org/2013/04/odw/odw13_submission_15.pdf

15 See for instance Soren Auer and Jens Lehmann. Making the web a data washing machine – creating knowledge out of interlinked data. Semantic Web Journal, 2010.

16 http://ec.europa.eu/information_society/policy/psi/index_en.htm



• Creation and maintenance of qualitative geospatial information from existing

unstructured data such as OpenStreetMap, Geonames and Wikipedia, anticipating

geospatial search and acquisition and aggregation of information resources.

• Reuse and exploitation of unforeseen discoveries found in geospatial data. GeoKnow will provide methods to acquire, analyse and categorise data that is rapidly evolving, immense, incomplete and potentially conflicting.

• Tools and methodologies for mapping and exposing existing structured geospatial information on the web of data, considering comprehensive and qualitative ontologies and efficient spatial indexing.

• Automatic fusing and aggregation of geospatial data by developing algorithms and

services based on machine learning, pattern recognition and heuristics.

• Tools for exploring, searching, authoring and curating the Spatial Data Web by using

Web 2.0 and machine learning techniques based on scalable spatial knowledge

stores.

All these contributions are integrated in the open source GeoKnow Generator framework

developed by the consortium.

Figure 3 GeoKnow High-Level Architectural Overview

The GeoKnow Generator will provide a comprehensive toolset of easy-to-use applications

covering a range of possible usage scenarios (e.g. mobility/traffic, energy/water, culture,

etc).

3.5 SemaGrow project

In order to achieve this ambitious vision and solve a difficult data management problem,

SemaGrow is addressing the following key challenges:

• Develop novel algorithms and methods for querying distributed triple stores that can

overcome the problems stemming from heterogeneity and from the fact that the



distribution of data over nodes is not determined by the needs of better load

balancing and more efficient resource discovery, but by data providers.

• Develop scalable and robust semantic indexing algorithms that can serve detailed

and accurate data summaries and other data source annotations about extremely

large datasets. Such annotations are crucial for distributed querying, as they support

the decomposition of queries and the selection of the data sources that each query

component will be directed to.

• Since it is, in the general case, not possible to align schemas and vocabularies so

perfectly that there is no loss of information, investigate how to minimize losses and

how to not accumulate them over successive schema translations.

The SemaGrow Large Scale Distributed Architecture Stack is as follows17:

Figure 4 SemaGrow Architecture

3.6 DCAT application profile for data portals in Europe

The DCAT Application profile for data portals in Europe (DCAT-AP)18 is a very new specification based on the Data Catalogue vocabulary (DCAT)19 for describing public sector datasets in Europe. Its basic use case is to enable searching for data sets across data portals and to make public sector data easier to find across borders and sectors. This can be achieved by the exchange of descriptions of data sets among data portals.

17 From http://www.semagrow.eu/sites/default/files/SemaGrow_D2.3.1-Large%20Scale%20Distributed%20Architecture.pdf

18 DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/description

19 DCAT: http://www.w3.org/TR/vocab-dcat/

Mandatory classes for the profile are:

• Agent (e.g. Organisation)

• Category (Subject of Dataset)

• Category scheme (Controlled vocabulary the theme comes from)

• Catalogue (Repository that hosts the dataset)

• Literal (Literal value)

• Resource (Anything described by RDF)

For example, for all datasets the mandatory properties are title and description; the recommended properties are contact point, dataset distribution, keyword/tag, publisher and theme/category.
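The split between mandatory and recommended dataset properties can be illustrated with a small check that a portal might run before exchanging a description. This is only a minimal sketch: the short property keys, the function names and the rule that a record is exchangeable once nothing mandatory is missing are assumptions made for illustration, not part of DCAT-AP itself.

```python
# Property groups as named in the text above; the short keys are
# hypothetical labels chosen for this sketch, not DCAT-AP term URIs.
MANDATORY = ("title", "description")
RECOMMENDED = ("contact_point", "distribution", "keyword", "publisher", "theme")


def check_dataset(record):
    """Return (missing_mandatory, missing_recommended) for one dataset record."""
    def absent(keys):
        # A property counts as absent when the key is missing or its value is empty.
        return [k for k in keys if not record.get(k)]
    return absent(MANDATORY), absent(RECOMMENDED)


def is_exchangeable(record):
    """Assumed rule: a record may be exchanged if no mandatory property is missing."""
    missing_mandatory, _ = check_dataset(record)
    return not missing_mandatory


# Hypothetical record with both mandatory properties and one recommended one.
example = {
    "title": "Forest / Non-Forest Map 2006",
    "description": "Pan-European Forest / Non Forest Map with target year 2006",
    "keyword": ["coniferous forest"],
}
```

Missing recommended properties are reported but do not block the exchange, which mirrors the distinction made in the profile.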

3.7 CKAN domain model

CKAN is an open source system for publishing data on the web. It provides tools to

streamline publishing, sharing, finding and using data.

The published metadata schemas are proprietary and provide some mapping to Dublin Core elements. Adding proprietary user-defined elements is also allowed. Harvesting of spatial metadata through OGC CSW 2.0.2 and ISO 19139 is available with extensions.

CKAN metadata profile:

• id: unique id

• name (slug): unique name that is used in urls and for identification

• title (dc:title): short title for dataset

• url (home page): home page for this dataset

• author (dc:creator): original creator of the dataset

• author_email:

• maintainer: current maintainer or publisher of the dataset

• maintainer_email:

• license (dc:rights): license under which the dataset is made available

• version: dataset version

• notes (description) (dc:description): description and other information about the

dataset

• tags: arbitrary textual tags for the dataset

• state: state of dataset in CKAN system (active, deleted, pending)

• resources: list of resources



• groups: list of groups this dataset is a member of

• “extras” - arbitrary, unlimited additional key/value fields
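The partial mapping to Dublin Core in the profile above can be sketched as a simple projection of a dataset record. The dictionary below uses the field names from the list; the sample values, the helper name `to_dublin_core` and the choice to represent a dataset as a plain Python dict are assumptions for illustration, not CKAN's own API.

```python
# Field-to-Dublin-Core pairs exactly as listed in the profile above;
# fields without a stated dc: equivalent are deliberately left out.
CKAN_TO_DC = {
    "title": "dc:title",
    "author": "dc:creator",
    "license": "dc:rights",
    "notes": "dc:description",
}


def to_dublin_core(dataset):
    """Project a CKAN-style dataset dict onto its Dublin Core elements."""
    return {dc: dataset[field] for field, dc in CKAN_TO_DC.items() if dataset.get(field)}


# Hypothetical dataset record; all values are invented for illustration.
dataset = {
    "id": "0001",
    "name": "forest-map-2006",
    "title": "Forest / Non-Forest Map 2006",
    "author": "Example Agency",
    "license": "cc-by",
    "notes": "Pan-European forest map.",
    "tags": ["forest", "land cover"],
    "state": "active",
    "resources": [],
    "groups": [],
    "extras": {"spatial_resolution": "25m"},
}

dc_record = to_dublin_core(dataset)
```

Fields such as tags, state or extras have no Dublin Core counterpart in the list, so they stay in the CKAN record and are not carried over.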

3.7.1 CKAN and GeoNetwork for an integrated Open Data Portal

The OpenDataNetwork Project20 is a joint project promoted by a number of Public Administrations located in the Italian Region of Tuscany, which developed a federated spatial data infrastructure for the publication of their data (geospatial as well as alphanumeric) as OpenData.

The OpenDataNetwork infrastructure is as follows:

Figure 5 OpenDataNetwork System & Nodes Architecture

The goal of the federation is to allow each partner administration to deploy the components

they would need to have a complete and ready-to-use infrastructure for the publication of

data as Open Data relying on Open Source products and accounting for the possibility to

reuse existing components (e.g. proprietary geospatial servers) in order to minimize the

impact. The Central Hub would then harvest the information from each Partner's

Deployment (taking particular care to avoid cycles in the harvesting set-up) in order

to act as the single entry point for the entire network. The possibility to have multiple

hierarchical levels is foreseen and allows implementation without further development (e.g.

a new County could join acting also as a Hub for its own cities which would not be harvested

by the Central Hub).
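The requirement to avoid cycles in the harvesting set-up can be checked mechanically before a new node joins the federation. The following is a minimal sketch using a depth-first search over a "who harvests whom" map; the node names and the dictionary representation are hypothetical and do not reflect how OpenDataNetwork actually stores its configuration.

```python
def find_harvest_cycle(harvests):
    """Return one cycle in the harvesting configuration as a list of node names, or None.

    `harvests` maps each node to the nodes it harvests from.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for source in harvests.get(node, ()):
            state = colour.get(source, WHITE)
            if state == GREY:
                # Reached a node that is already on the current path: a cycle.
                return path[path.index(source):] + [source]
            if state == WHITE:
                cycle = visit(source)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in harvests:
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical node names: a Central Hub, a County acting as a sub-hub, and its cities.
hierarchy = {
    "central_hub": ["county", "partner_a"],
    "county": ["city_1", "city_2"],
}
# Misconfiguration: one city is also set up to harvest the Central Hub.
broken = dict(hierarchy, city_1=["central_hub"])
```

In the valid hierarchy the cities are harvested only by their County, as described above; the misconfigured variant is reported as the loop central_hub, county, city_1, central_hub.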

The node Architecture puts together in each instance all of the building blocks that a Partner

could need to ingest and disseminate geospatial and alphanumeric data with particular

emphasis on geospatial data. The system uses existing standards for the interconnections in

order to account for the possibility to swap some components with other similar ones. E.g. it

would be possible to swap the catalogue implementation from GeoNetwork to a different

catalogue provided that it implements the OGC CSW protocol.

3.8 INSPIRE metadata and DCAT-AP

A proposal for the alignment of INSPIRE metadata with the DCAT Application Profile is under preparation by the Joint Research Centre of the European Commission, in the framework of Action 1.17 of the EU ISA Programme. The document is still under development and the working draft may change substantially before final release. The profile uses the INSPIRE registry21 together with DCAT-defined classes for the mapping. This profile is very important because it provides the bridge between INSPIRE and other European portals. It is intended to implement this profile and test its usability and stability within the scope of the SmartOpenData project.

20 www.opendatanetwork.it

Example (part of the file):

…
<rdf:Description rdf:about="http://someURI">
  <foaf:primaryTopicOf rdf:resource="metadataURI"/>
  <dct:language rdf:resource="http://publications.europa.eu/resource/authority/language/DEU"/>
  <dct:title xml:lang="en">Forest / Non-Forest Map 2006</dct:title>
  <dct:description xml:lang="en">Pan-European Forest / Non Forest Map with target year 2006, Data Source: Landsat ETM+ and Corine Land Cover 2006, Classes: forest, non-forest, clouds/snow, no data; Method: automatic classification performed with an in-house algorithm; spatial resolution: 25m. In addition, the forest map 2006 is extended to FTYPE2006 to include forest types (broadleaf, coniferous forest) that are mapped using MODIS composites.</dct:description>
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <dct:type rdf:resource="http://inspire.ec.europa.eu/codelist/resource-type/series"/>
  <dcat:landingPage rdf:resource="http://someurl.org"/>
  <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">namespace/code</dct:identifier>
  <dct:subject rdf:resource="http://inspire.ec.europa.eu/codelist/topic-category/geoscientificInformation"/>
  <dcat:theme rdf:resource="http://inspire.ec.europa.eu/theme/landCover"/>
  <dcat:theme>
    <skos:Concept>
      <rdfs:label xml:lang="en">coniferous forest</rdfs:label>
      <skos:inScheme>
        <skos:ConceptScheme>
          <rdfs:label xml:lang="en">GEMET - Concepts, version 2.4</rdfs:label>
          <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema-datatypes#date">2010-01-13</dct:issued>
        </skos:ConceptScheme>
      </skos:inScheme>
    </skos:Concept>
  </dcat:theme>
  <dct:spatial>
    <dct:Location>
      <locn:geometry rdf:datatype="http://www.opengis.net/rdf#GMLLiteral">
        <gml:Envelope srsName="http://www.opengis.net/def/crs/OGC/1.3/CRS84">
          <gml:lowerCorner>-10.58 34.56</gml:lowerCorner>
          <gml:upperCorner>34.59 70.09</gml:upperCorner>
        </gml:Envelope>
      </locn:geometry>
    </dct:Location>
  </dct:spatial>
…

21 http://inspire.ec.europa.eu/registry/
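A consumer of such a description typically needs only a few values from it, for instance the title, the INSPIRE theme and the bounding box. The sketch below reads them with Python's standard `xml.etree.ElementTree`. It works on a shortened copy of the example wrapped in an `rdf:RDF` root so that it parses on its own; the namespace URIs bound to the prefixes and the function name `summarise` are assumptions of this sketch, since the excerpt does not show the namespace declarations.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace bindings assumed for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "locn": "http://www.w3.org/ns/locn#",
    "gml": "http://www.opengis.net/gml",
}

FRAGMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:locn="http://www.w3.org/ns/locn#"
    xmlns:gml="http://www.opengis.net/gml">
  <rdf:Description rdf:about="http://someURI">
    <dct:title xml:lang="en">Forest / Non-Forest Map 2006</dct:title>
    <dcat:theme rdf:resource="http://inspire.ec.europa.eu/theme/landCover"/>
    <dct:spatial>
      <dct:Location>
        <locn:geometry rdf:datatype="http://www.opengis.net/rdf#GMLLiteral">
          <gml:Envelope srsName="http://www.opengis.net/def/crs/OGC/1.3/CRS84">
            <gml:lowerCorner>-10.58 34.56</gml:lowerCorner>
            <gml:upperCorner>34.59 70.09</gml:upperCorner>
          </gml:Envelope>
        </locn:geometry>
      </dct:Location>
    </dct:spatial>
  </rdf:Description>
</rdf:RDF>"""


def summarise(rdf_xml):
    """Extract title, theme URIs and bounding box from one dataset description."""
    dataset = ET.fromstring(rdf_xml).find("rdf:Description", NS)

    def corner(tag):
        # Corner values are "x y" pairs; with CRS84 that is longitude then latitude.
        return tuple(float(v) for v in dataset.find(f".//gml:{tag}", NS).text.split())

    resource = f"{{{NS['rdf']}}}resource"
    themes = [t.get(resource) for t in dataset.findall("dcat:theme", NS) if t.get(resource)]
    return {
        "title": dataset.find("dct:title", NS).text,
        "themes": themes,
        "bbox": corner("lowerCorner") + corner("upperCorner"),  # west, south, east, north
    }


info = summarise(FRAGMENT)
```

Themes given as nested `skos:Concept` elements carry no `rdf:resource` attribute and are skipped by this sketch; a full RDF parser would be the more robust choice for production use.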



4 RM-ODP methodology

4.1 RM – ODP methodology

The D2.3 architecture design is based on the Reference Model of Open Distributed Processing (RM-ODP)22. This model is the architecture reference model also used within ISO/TC 211 “Geographic Information – Reference model”23 and in the Open Geospatial Consortium Reference Model (ORM).

The use of RM-ODP will give us two opportunities:

• To define the basic design of the solution as platform neutral and to support different local implementations. This is important because the objective of this document is not to describe one unique technology solution, but to give general models which could be used by different organizations across Europe. These models are then applied to selected pilot cases that are part of the project, in order to demonstrate the feasibility of such a solution.

• To build on positive experiences of previous European research projects. This methodology is used by most European (mainly research) projects and some recommendations already exist; our objective is to extend these models to make them more oriented towards actual user needs.

The architecture design provides an overall conceptual framework for building geo-

processing services for biodiversity, sea region protection and for effective management and

utilization of sensitive areas.

4.2 View point approach

The RM-ODP divides the whole architecture design process into five generic and complementary

steps, which are called viewpoints on the system and its environment. These viewpoints are:

• The enterprise viewpoint, which focuses on the purpose, scope and policies for the

system. It describes the business requirements and how to meet them. It is based on

user scenarios and use cases.

• The information viewpoint focuses on the semantics of the information and the

information processing performed. It describes the information managed by the system

and the structure and content type of the supporting data. This viewpoint is related to

WP3, where data and metadata models are defined. The Information viewpoint extends

these models and also analyses the necessary operations.

22 RM-ODP (ISO/IEC 10746-1)

23 ISO 19101:2002



• The computational viewpoint provides functional decomposition of the system into

objects that interact at interfaces. It describes the functionality provided by the system

and its functional decomposition.

• The engineering viewpoint focuses on the mechanisms and functions required to

support distributed interactions between objects in the system. It describes the

distribution of processing performed by the system to manage the information and

provide the functionality.

• The technology viewpoint focuses on the choice of technology of the system. It describes

the technologies chosen to provide the processing, functionality and presentation of

information. In principle only this viewpoint is platform dependent. In our architecture

design in this first document we will analyze potential components for implementation.

Figure 6 RM ODP viewpoints24

For the architecture design of SmartOpenData we adapted the RM-ODP methodology to ensure that user needs are really taken into account and to give end users a real possibility of involvement in the design process. The design process will be iterative, with this as the first version, so all views will be modified on the basis of user validation and user experience. It is necessary to take into consideration that in many cases users are not familiar with the newest technologies and have no experience in the design of distributed systems. Another important aspect is that we do not design one concrete solution, but a number of relatively independent services. In order to allow reuse of some components, we have defined across the pilots which common generic functions fulfil the requirements of different scenarios and which components are specific to a single solution. The first version of the design focuses mainly on generic functionality; in the second version we will try to add more pilot-specific functions.

24 http://en.wikipedia.org/wiki/RM-ODP

The relation of all these five viewpoints could be described by the following scheme:

The Enterprise viewpoint of the architecture design is focused on the analysis of pilot scenarios and the definition of a limited number of generic use cases, which are implemented to support basic functionalities required by several scenarios and which also support the process of data and metadata harmonization based on outputs from WP3.

The Information viewpoint is focused on basic data and metadata sets which could be shared among different scenarios, in particular data and metadata related to the SmartOpenData specification.

The Computational viewpoint is focused on generic components which could be reused in several scenarios and which will form basic parts of the infrastructure.

The Engineering viewpoint defines a generic scheme which can be reused for all pilot implementations and which could be modified for different scenarios. This viewpoint is also platform independent.

The Technology viewpoint defines basic architecture modifications for single scenarios and suggests potential technical implementations. It gives only a basic definition and leaves several implementation options open. The concrete implementation will be modified during the implementation and validation process.

4.3 Generic and pilot design

An important part of the methodology is to separate the design of the generic architecture from the pilot-dependent architecture. On the basis of user requirements, generic use cases will be defined, and generic services, reusable for different pilot solutions, will be designed for these use cases. Pilot applications will then be defined as compositions of these generic services. The generic services will be available to developers so that they can implement specific user applications with them. Importantly, these applications can reuse existing components.25

25 D-4.2.2 HABITATS INSPIRE Networking Architecture, CIP-ICT-PSP-2009-3-250455, Social Validation of INSPIRE Annex III Data Structures in EU Habitats, 2010



5 Enterprise viewpoint

According to the SmartOpenData user-driven approach to standardization, the full impact of

results will be sparked off by the pilot service scenarios and their ability to prove feasibility

of the architecture and attract new participants to the communities of adoption. Each pilot is

therefore built on

a) existing concrete services currently carried out by project partners,

b) potential for data access through Linked Open Data and

c) enhancement through usage scenarios developed by user communities, in order to

meet the three criteria of relevance, openness and responsiveness.

5.1 Basic pilots

The five SmartOpenData pilots fall into the forward-looking categories described above

as follows:

Pilot I Spain (Public sector) Agroforestry management TRAGSA

The Portugal-Spain pilot focuses on building a web-based collaborative spatial data infrastructure

prototype with the main goal of promoting sustainable agroforestry management.

Pilot II Ireland (Research, Enterprises) Environmental research, Biodiversity MAC

The Irish pilot will focus on the use of SmartOpenData to provide open data and open

INSPIRE-compliant geo-spatial sources for environmental researchers particularly

focused on biodiversity and habitats, building on the participative social validation and

pilots of the HABITATS project in particular.

Pilot III Italy (Citizens) Water monitoring A.R.P.A. Sicilia

The Italian pilot in Sicily will explore the role of aggregating information from different Open

Data sources in order to support ARPA’s institutional mission of providing up to date

monitoring of water quality in Sicily.

Pilot IV Czech (Enterprises & Public sector) Forest sustainability UHUL FMI

The Czech pilot focuses on forest site classification, sustainable management and

utilization of forest road network using the National Forest Inventory and the Regional

Plans for Development datasets.

Pilot V Slovak (Citizens) Environmental data reuse SAZP

The Slovak pilot will include the proposal, development and deployment of two conceptually

different types of web applications in order to achieve reuse of environmental data and

information in line with European Open Data Strategy:

1. Spatial Web Crawler: Aiming to support search and discovery of available spatial

resources



2. Biodiversity MashUp Linked Open Data Extension: Providing evidence of linking

biodiversity-related resources and contributing to the decision support process.

Because most of the pilots are oriented mainly towards publishing data, we add two additional scenarios which focus mainly on the utilization of LOD. These two scenarios are:

• Territorial decision - with a focus on linking data coming from the Plan4business project with data from the SmartOpenData pilots, mainly forestry-related data and water quality environmental data. The Plan4business project collected a large amount of data, which is now being opened; the disadvantage is that data available for analysis has to be stored on one portal. Using the principles of LOD, it will be possible to link distributed resources.

• Discovery and utilization of tourist-related information coming from different repositories through one access point. The problem with current tourism-related information is that it is either stored in large repositories or distributed over many servers, and it is not easily visible. Environmental information collected mainly in the Italian, Slovak and Irish pilots is an interesting source of information for tourists.

Adding these two additional cross-pilot activities, suggested by SME partners, will support better utilization of available LOD by developers and providers of commercial services.

5.2 Common search and indexing facilities

In addition to the pilot requirements, SmartOpenData will offer external users searching, indexing and visualization facilities, which will

- Provide common facilities for exploiting environmental data.

- Improve environmental data searchability through the integration of big data infrastructure for structured and semi-structured search facilities.

- Provide a semantic front-end framework for environmental data visualization.

These tools are part of the WP4 design. WP4 is creating services that take advantage of such data and provide value to each community, illustrating how the availability of such services and the corresponding data can benefit them.

The SmartOpenData System is placed between the pilot data sources and the external data consumer and provides key functionalities. The most basic element is the harmonization of data sources. This element will offer an open data source layer that exposes the external data sources fully adapted to the open data standards supported by the project.

Over this open data source layer, three key functionalities are defined:

(i) distributed semantic indexing, which provides a service for searching and locating

data based on semantic information collected from all the available Data Sources;

(ii) distributed data access, which provides data collected from external data sources as an additional data source, for easier and more uniform data gathering by users in the identified scenarios;

(iii) administration and notification, which provides data providers with administration facilities for managing users, workflows and data.

These three functional components will communicate and be coordinated inside the SmartOpenData System, creating a distributed service system which can be accessed transparently from the scenarios. It is also important to note that services created in the scenarios will be able to directly access external data sources selected through the distributed semantic indexing functionality.

Pilot functional requirements

5.2.1 Spain (and Portugal)

The pilot will mainly focus on meeting the needs of the public sector regarding forest management and the normative requirements of land use planning. Another facet of the pilot relates to water and drinking water management. In both countries, public bodies from these areas that are involved in the pilot have defined their requirements and will provide data and assess project results as final users.

Spanish and Portuguese public bodies hold environmental and geospatial information that can be very valuable for management purposes. This information has huge potential, but it is usually not widely accessible and not sufficiently exploited, and needs to be reorganized accordingly.

The final goal of this pilot is to provide easy access to this information and to develop decision-making tools and services that are available and helpful to public and private agroforestry managers. Results and outcomes will be open, standardized and publicly accessible, including web services and information management applications. The technological approach will be based on Open Data (environmental monitoring data, cartographic services and remote sensing products) linked through INSPIRE-compliant semantic services. This Open Data strategy is necessary in order to optimize the use of public information, improve management and involve stakeholders in decision-making processes.

The Spanish pilot defined the following concrete use cases:

• Determination of the best forestry species - offer a list of the best species to plant in a particular plot, so that actors can take the best decisions for their needs. Actors may know which species are usually planted, but it is useful to offer a tool showing the most recommended species for each plot. In this use case the user obtains the best species by clicking on any plot. The user can then combine available data about cadastre, species and any other layer or variable of interest, and query and filter them to choose the best option.

• Determination of the best origin of seeds in forestry - offer the best plots for harvesting seeds for citizens to choose from, and determine whether a plot has been visited. This supports the best decisions by showing what is on offer and saves time for the actors. In this use case the user can combine available data about species, admission units and any other layer or variable of interest, and query, filter and display the candidate units. In this way, the user can check the information about seeds from anywhere as an aid to decision-making. Finally, keeping the visits to the different units up to date prevents visits that do not yield the expected result. The citizens who usually search for seeds are foresters and forest owners. They could also obtain information from the nurseries' data about the availability of any seed and its quality. This point is relevant: there are four classes of seeds by quality: Identified (known origin), Selected (chosen seeds), Qualified (seed from the best individuals) and Controlled (certified that the seed will grow well). The public bodies involved are the nurseries (in this case the "Viveiro de Ourense"), which are also admission units, the provincial officers who manage the visits, and the owners of the data used, such as the Data Bank for Nature.

The generic LOD functionalities required by the Spanish pilot are:

• Transformation (Relational data -> RDF)

• Storage

• Search

• Federated querying

• Visualization of LOD

• Visualization of LOD using conventional GI tools
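The first functionality in the list, transformation of relational data into RDF, can be illustrated with a very small sketch in which each table row becomes a subject and each non-null column becomes one triple. The base namespace, table and column names below are hypothetical, all values are emitted as plain string literals, and the code does not implement any particular mapping standard or tool chosen by the project.

```python
BASE = "http://example.org/"  # placeholder namespace, not a project URI


def escape_literal(value):
    """Escape a value for use inside a double-quoted N-Triples literal."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(table, key, rows):
    """Turn relational rows into N-Triples lines.

    One subject IRI per row (built from the key column) and one triple per
    non-null, non-key column.
    """
    lines = []
    for row in rows:
        subject = f"<{BASE}{table}/{row[key]}>"
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject; NULLs yield no triple
            predicate = f"<{BASE}{table}#{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical rows of a forest plot table.
plots = [
    {"plot_id": 17, "species": "Quercus robur", "area_ha": 2.5},
    {"plot_id": 18, "species": "Pinus pinaster", "area_ha": None},
]
triples = rows_to_ntriples("plot", "plot_id", plots)
```

The resulting lines can be loaded into a triple store, which then covers the storage, search and federated querying items of the list; typed literals and links to shared vocabularies would be natural next refinements.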

5.2.2 Ireland

The Irish pilot seamlessly bridges the major gap between the “worlds” of open data and INSPIRE geo-spatial sources, to validate a major value-add and impact on the work of environmental researchers. Some of the main issues to be addressed include (i) discovery, seamless use and mashing together of all available sources to address immediate research issues; (ii) overcoming the barriers (cultural, political, administrative) to opening up the data; (iii) overcoming technical incompatibilities of datasets in terms of technical standards, semantic structuring etc.; (iv) validation of the SmartOpenData platform in aggregation, analysis and visualization to support decision making for the various research and stakeholder requirements. This will use and complement the Biodiversity MashUp Linked Open Data Extension and Spatial Web Crawler of the Citizens pilot, but will focus more on researchers identifying and seamlessly using various sources at International, European, National and local levels. This will be complemented by various social networking and crowdsourcing mobile apps, developed as part of the Irish pilot, to engage stakeholders, at the local level in particular.

The pilot will begin at European level by seamlessly integrating the SmartOpenData platform with, collaborating with, and building on various open data and geo-spatial sources and initiatives that will have a particular value for biodiversity researchers, including (i) The European Biodiversity Observation Network, EUBON project26; (ii) European Environmental Agency (EEA) Biodiversity data centre (BDC)27; (iii) PESI28; (iv) FP7 EUBrazilOpenBio29; (v) LifeWatch European research infrastructure30; (vi) The Joinup Portal31; (vii) The UK Environmental Agency’s Datashare32; (viii) The Global Biodiversity Information Facility (GBIF)33. This pilot will also use SmartOpenData to mash disparate open data sources such as those listed at FreeGISdata34 and the GeoStore, which allows use subject to the terms and conditions of the Open Government Licence (OGL), and others, in various data formats. The implications of these for researchers will be explored in the pilot and supported using the platform.

26 www.earthobservations.org/geobon.shtml

27 www.eea.europa.eu/themes/biodiversity/dc

28 www.eu-nomen.eu/portal

29 www.eubrazilopenbio.eu

30 www.lifewatch.eu

31 joinup.ec.europa.eu/catalogue

32 www.geostore.com/environment-agency/WebStore

33 www.gbif.org

34 freegisdata.rtwilson.com



At a national level this pilot will focus on Ireland. Mid-West Regional Au, with the technical support of MAC, will work with Irish public agencies, Local Authorities and community groups in the implementation and use of the SmartOpenData platform in validation trials. These will include the use of sources such as (i) The Irish National Parks & Wildlife Services (NPWS), who have extensive open online maps and datasets35; (ii) The National Biodiversity Data Centre Ireland36; (iii) The Irish Opendata Portal37; (iv) The Irish Spatial Data Exchange (ISDE)38 and (v) The Marine Institute Ireland, who have extensive OGC/INSPIRE compliant geo-spatial data39, which are all searchable using the ISDE Browser40. In addition there are a number of interactive services available to the public, such as the vessel tracking service and Survey Planning System41.

The Irish pilot defined the following concrete use cases:

• SmartOpenData enabled ETIS Webservice for the Burren & European GeoParks

Network - The SmartOpenData enabled European Tourism Indicators System (ETIS)

webservice for sustainable management at destination level, will streamline and

enhance the current manual system by transforming the ETIS Excel dataset into

Linked Open Geospatial Data. The Burren Geopark’s solidity as a destination is

exemplified by its benchmarking and monitoring procedures. It has adopted the

recently launched European Tourism Indicator System for the Sustainable

Management of Destinations (ETIS) to monitor and measure performance and is

one of 100 destinations in Europe that are currently piloting this system. Further to

this, Failte Ireland, the national tourism development authority, has expressed

interest in using the Geopark’s work on the ETIS as a pilot for assessing larger-scale,

national projects.

• SmartOpenData enabled Farming for Conservation webservice - The SmartOpenData

platform will be used to make the BFCP data open and linked, with integrated spatial

content, while protecting its sensitive data. The Burren Farming for Conservation

Programme (BFCP) provides a template for the future management of the Burren

and other HNV areas. Appropriate management of the land ensures the maintenance

of and/or improvement in the conservation status of the Annex I habitats and leads to

an increase in the area of sustainably managed high nature value farmland within the

Burren. This will in turn lead to an improvement in water quality in aquatic

ecosystems and ensure the maintenance of and/or improvement in the high quality

Burren landscape and its cultural heritage. A series of databases has been set up and

is being maintained to manage and store the wide variety of data and additional

information generated by the BFCP. This includes Microsoft Excel format

spreadsheets of successful applicants and appellants, advisors, workers (available to

assist participants with Measure 2 work), contacts, farm plan checks and site visits,

farmer training bookings and attendance, receipt of signed plans, Measure 2

declaration forms (Form D1 and Form D2) and receipts requested, farmer and advisor

feedback, summary data from farm plans (including areas (ha), REPS/AEOS

35 www.npws.ie/mapsanddata

36 www.biodiversityireland.ie

37 opendata.ie

38 www.isde.ie

39 www.marine.ie/home/publicationsdata

40 catalogue.isde.ie

41 www.marine.ie/home/services/researchvessels



participation, payments, length/area/number for Measure 2 tasks and M1 scores),

payment checks, detailed Measure 1 assessment records for monitoring purposes.

GIS databases have been set up and include spatial data such as DAFM land parcels,

Ordnance Survey Ireland maps and aerial photos (for which a third party license

request via NPWS was applied for in June 2010), farm boundaries, field boundaries,

SAC designated areas, and national monuments. The SmartOpenData platform will

be used to make this data open and linked, with integrated spatial content, while

protecting its sensitive data. All practical precautions will be put in place to protect

sensitive data, such as BFCP participants’ contact and payment information. As the

NPWS data is predominantly INSPIRE compliant, the SmartOpenData platform will

enable its seamless interoperability with the BFCP datasets. The SmartOpenData

enabled webservice will support the BFCP Team (including Programme Manager and

Programme Scientist) in the production and maintenance of a set of detailed,

comprehensive and regularly updated datasets on all aspects of the farm plans and

the monitoring programmes. It could provide access to the DAFM and NPWS Land

Parcel Identification System (LPIS) based on-line planning system, aerial photographs

and digital maps through a third party agreement with OSI. The system will provide a

template for future management of the Burren and other HNV areas. The Farming for

Conservation webservice will have an easy-to-use user interface, and its users will be

the BFCP Team, Advisors and relevant Department people. Farmers may have limited

access (to their own farm’s data for instance, or input/editing of their farm plan).

Being linked and open, the webservice will enable seamless access by the NPWS and

relevant Government Departments (such as the Department of Arts, Heritage and the

Gaeltacht (DAHG) and Department of Agriculture, Food and the Marine (DAFM)).

Also, because it is linked and open, it will allow seamless operation with the

Department’s own ESRI ArcGIS systems using the appropriate open data plugins.

• App to Ground-Truth potential Protected Monument sites - The SmartOpenData

platform will enable the provision of an App service to mobilise a very motivated

community, by enabling visitors and people interested in their local heritage, to seek

out and ground truth potential Monument sites in the Burren and beyond. At the end

of the project, the Heritage Council and National Monuments Service will decide if

the crowd-sourced ground truthing observations (both positive and negative) should

be included as a permanent Volunteered Geographic Information (VGI) layer on the

National Monuments map on their Mapping Viewer. This will also initiate a

process for the digital preservation of the VGI data concerning the features that were

investigated, both those that are validated to be national monument sites, and those

that are not, to avoid people wasting their time and resources in investigating them

again. While the main users of the app service will be experienced archaeological

users and visitors to the Burren who are motivated to record the local heritage, the

app and process will be very educational and will probably be used by teachers and

students to discover and contribute to their local heritage. For instance, it could

complement the courses and practical local environmental work carried out by

BurrenBeo Trust. The Burren has been very well observed and recorded over many years, so

few new National Monument sites are likely to be found. However, other sites, such

as Lough Derg and the Slieve Aughtry, which are also in the Mid-West Region of

Ireland, are likely to yield many new national monuments. So further sites, beyond the



Burren, are likely to be supported by this Application quite early during the WP5 pilot

trials. The SmartOpenData App may also help Burren farmers (as well as Irish farmers

generally) to determine if their land might contain a potential National Monument

Site (especially field systems). The BFCP works closely with the National

Monuments Service and the existing legislation for National and Recorded

Monuments. In order to satisfy the legal requirement for farmers with regard to work

carried out at, or near, any recorded monument or place, a procedure has been

developed whereby the BFCP prepares and submits the notification to the

Monument Protection Officer for County Clare with the National Monument Service,

on behalf of the farmer. In addition, the BFCP arranges for the Field Monuments

Advisor to visit farmers who are proposing work in the vicinity of monuments or in

archaeologically sensitive areas so that he can talk to them about their archaeological

features, provide advice and generally raise their awareness of the significance of

archaeology in the Burren.

• SmartOpenData Platform input to the Irish OGP process and W3C “CSV on the Web”

standard - SmartOpenData Platform input to the Irish OGP Open Data Portal process

and W3C “CSV on the Web” standard, with a CKAN CSV-to-RDF extension and

reference implementation. The SmartOpenData platform will complement the

ongoing work of the Government Reform Unit by providing best practice tools,

insights, seamless access to data sources and validation (in the WP5 pilots) to address

objective 4 of the Irish Government’s OGP Initiative. The SmartOpenData platform

CSV-to-RDF tools, as developed and used in the other Irish use cases and

applications, will be adapted as a CKAN add-in, for use on the Irish OGP alpha site and

as a reference implementation of the W3C “CSV on the Web” WG standard. The WG will

finalise and issue its standard most likely at the end of 2014, so it will evolve in

parallel with the SmartOpenData WP5 trials. The CKAN extension will be trialed and

validated in the Irish Pilots and Irish OGP process implementation of the Irish Open

Data Portal alpha site.

The generic LOD functionalities required by Irish pilots are:


• Storage

• Search


• Visualization of LOD


• Publishing data from existing CSV databases as RDF

• Upgrading data

5.2.3 Italy

The Italian pilot in Sicily will explore the role of aggregating information from different Open

Data sources in order to support ARPA’s institutional mission of providing up to date

monitoring of water quality in Sicily. Some of the main issues to address are (i) Overcoming

the barriers (cultural, political, administrative) to opening up the data; (ii) Identifying the

optimal role of the general public in crowdsourcing environmental information; (iii)



Identifying the technical means of publication; (iv) Overcoming incompatibilities of datasets

in terms of technical standards (e.g. JSON vs XML), semantic structuring, etc.; (v) Optimum

ways of aggregation, analysis, and visualization to support decision-making for the different

stakeholders including the general public.

The three main stakeholder groups involved are: (i) Different offices within A.R.P.A. Sicilia

itself, in particular the link between the central office in Palermo and the local offices

distributed throughout the region; (ii) Key external stakeholders with a role in environmental

monitoring, including the ASL (local health care divisions), Ex-ATO (water supply at

sub-regional level) and City administrations; (iii) The general public providing information

through e.g. Geoblogging.

The Italian pilot defined the following concrete use cases:

• Quality of sub-surface water - publishing a list of existing or available datasets held by

ARPA or other public actors potentially useful for different purposes (work, hobby,

tourism, general information, etc.) of different stakeholders as LOD

• Quality of surface water (rivers, lakes and transition) - publishing a list of existing or

available datasets held by ARPA or other public actors potentially useful for different

purposes (work, hobby, tourism, general information, etc.) of different stakeholders

as LOD

• Presence of micro-algae in coastal waters (Ostreopsis ovata and other potentially toxic

dinoflagellates) - publishing a list of existing or available datasets held by ARPA or other

public actors potentially useful for different purposes (work, hobby, tourism, general

information, etc.) of different stakeholders as LOD

• Quality of treated urban waste water - publishing a list of existing or available

datasets held by ARPA or other public actors potentially useful for different purposes

(work, hobby, tourism, general information, etc.) of different stakeholders as LOD

• Air quality - publishing a list of existing or available datasets held by ARPA or other

public actors potentially useful for different purposes (work, hobby, tourism, general

information, etc.) of different stakeholders as LOD

The generic LOD functionalities required by Italian pilots are:


• Storage

• Search


• Visualization of LOD


• Publishing data from existing SQL databases as RDF
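
The "Publishing data from existing SQL databases as RDF" functionality can be sketched in the same spirit. The example below reads a table from an in-memory SQLite database and maps each row to a subject and each column to a predicate. The table station, its columns, the namespace http://example.org/ and the function names are hypothetical and chosen only for illustration; this is not a conforming implementation of any W3C relational-to-RDF mapping.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not a project URI
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_literal(value) -> str:
    """Render a value read from the database as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def table_to_ntriples(conn, table, key, base=BASE):
    """Map every row of `table` to triples: row -> subject, column -> predicate."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{base}{table}/{record[key]}>"
        for column, value in record.items():
            # The key only identifies the row; NULL cells produce no triple.
            if column == key or value is None:
                continue
            triples.append(f"{subject} <{base}{table}#{column}> {to_literal(value)} .")
    return triples


# Made-up monitoring stations, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT, ph REAL)")
conn.executemany("INSERT INTO station VALUES (?, ?, ?)",
                 [(1, "Example station", 7.5), (2, "Unnamed", None)])
triples = table_to_ntriples(conn, "station", "id")
for triple in triples:
    print(triple)
```

Table and column names are interpolated directly into the query, so in this sketch they must come from a trusted configuration and not from user input.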

5.2.4 Czech

The Czech pilot is focused on forest site classification, sustainable management and

utilization of the forest road network using the National Forest Inventory and the Regional Plans

for Development datasets. Data products and statistical outcomes will be widely open,



standardized and accessible by foresters and public bodies through web services and

applications.

The Czech National Forest Inventory (NFI) Geoportal will present valuable information

derived from NFI database. Our data infrastructure can support SME’s decision-making

process via provision of reliable, up-to-date information about forests and their wood as well

as other types of resources. The geoportal will also enhance public awareness about the

NFI2 project funded by the Czech Government, as well as awareness of the UHUL FMI

institute itself. Data and metadata will be aggregated and standardized according to

commonly used standards and SmartOpenData recommendations, so they will be accessible

on a shared project platform. The geoportal will be a gate to results of the NFI1 and NFI2

projects, which are derived by statistically sound methods using a huge, high-quality database

fusing field survey, photogrammetric interpretation and remote sensing data. The quantity and

quality of the data and the thematically broad scope of UHUL FMI activities are an asset for testing

semantic search approaches among our data services and other environmental organization

sources across the EU. UHUL FMI manages data covering the entire CZ territory; some of it

is freely available, however free access to the rest is limited to an aggregated level. It might

be useful to also include data with restricted access in semantic search tools and bring

it closer to SMEs, at least on the aggregated level.

The Czech pilot defined the next concrete use cases:

• Access to Czech NFI information - GENERIC USE CASE: Actors require up-to-date

information on forests, forestry and land-use structure in the Czech Republic (CZ).

This information is needed for a variety of reasons that differ among actors. The

purposes and properties of NFIs are widely known; a full description is beyond the

scope of this document. In the Czech Republic, the current, regularly published information

on forests is mostly significantly biased. This information comes either from

summarized forest management plans data (abbreviated as SFMP) or is based on

forest enterprise level statistics (questionnaire surveys). In the Czech Republic, the need for high-quality,

regularly updated and statistically sound information on forests and landscape has not

been addressed yet. The Czech NFI portal will address these issues in a generic way. It will

be based on NFI data and information, which will be transferred effectively to any

potential users. Users will be able to identify, search, filter and combine relevant

target parameters, the estimates of which would be delivered for specified domains

(geographic areas or attribute-based partitions e.g. land tenure, management

system, site index, potential and current soil degradation). The pilot aims to design

and implement ways in which this information can be found and presented to actors.

Detailed use-cases follow and extend this generic one.

• Access to Czech NFI information – general public - Actor requires information

confirming or rejecting the practice of sustainable use of forests in a particular area. More

concretely, possible indicators for sustainable development in forests

might be: species composition (estimates of species proportions on total forest

land in a given area are requested), soil degradation (information on changes in pH and nutrient

content since the previous inventory is required), and change in forest cover

between the current and previous inventory.

• Access to Czech NFI information – Ministry of Agriculture - Actor requires up-to-date

information on available wood resources and balance of harvest and growth in: the



whole CZ territory and its NUTS subdivisions up to NUTS3 level, private forests,

forests managed by the Czech state forest enterprise (approx. 1 200 000 ha). This

information is needed to address forest policy issues at the level of the whole sector

within CZ. Currently this information is based on severely biased data coming either

from summarized forest management plans data (abbreviated as SFMP, the amount

of wood resources underestimated by almost 40% when compared to Czech NFI1

inventory in 2001-2004) or from enterprise level surveys (amount of wood cut yearly)

- enterprises fill in a form, in which they themselves declare how much wood they

have harvested in a particular year (the existence of several sources of bias is well known).

When it comes to growth, no direct observations are involved in the calculations at

the moment – SFMP data are used as inputs to yield tables (stand level growth

model). The Czech NFI pilot would address this and similar issues in a generic way –

selecting and combining relevant target parameters, the estimates of which would be

delivered for specified domains (geographic areas or attribute-based subsets e.g.

land tenure, height above sea level etc.)

• Access to Czech NFI information – Ministry of Environment - Actor needs up-to-date

information on GHG (greenhouse gas) sinks and emissions from forest land to

include it in national-level calculations. These are used to compare emission limits

with the current state and possibly sell or buy emissions to/from other countries. At

this moment, this can be correctly done using the NFI data on soil GHG content and

its changes as well as GHG content in total biomass on forest land. These estimates

are obtained within NFI by means of a relevant sampling scheme carried out directly

in the field (assessing relevant variables, taking soil samples, etc.). Current practice

builds on the data sources mentioned in the previous example (SFMP, enterprise level

surveys, SFMP as input to growth model). How the complex issue of the soil

GHG balance is currently handled is not known to the authors. We are currently not aware of any

sampling-based, periodic survey collecting and processing soil data on forest land

over the whole country in a statistically sound manner. The Czech NFI pilot would therefore

address this and similar issues in a generic way – selecting and combining relevant

target parameters, the estimates of which would be delivered for specified domains

(geographic areas or attribute-based subsets e.g. land tenure, height above sea level

etc.)

• Access to Czech NFI information – JRC of the EU - Actor needs information on total

growing stock and its changes to include this in GHG emission scenarios

dedicated to forest land. This information is required in a standardized way –

following reference definitions elaborated by ENFIN (see

http://www.metla.fi/eu/cost/e43/) and for spatial units defined by NUTS and

INSPIRE reference grids. Currently this can be achieved only by providing source data

from NFI2 to the JRC or its partners. Using sophisticated estimation procedures

benefiting from proprietary auxiliary data is not possible, as proprietary auxiliary data

cannot be provided to third parties (JRC or its partners). The Czech NFI pilot would address

this and similar issues in a generic way – selecting and combining relevant target

parameters, the estimates of which would be delivered for specified domains

(geographic areas or attribute-based subsets e.g. land tenure, height above sea level

etc.)



• Access to Czech NFI information – saw or pulp mill company - The actor needs

information on the current and future availability of wood fulfilling requirements for

a specific assortment class (e.g. saw-wood, or pulp-wood). Information on the spatial

distribution of this resource is essential to allocate the company’s resources, e.g. the mills

and logistic centers for the company’s products. The Czech NFI portal would address this

and similar issues in a generic way – selecting and combining relevant target

parameters, the estimates of which would be delivered for specified domains

(geographic areas or attribute-based subsets e.g. land tenure, height above sea level

etc.).

• RPFD – Regional Plans of the Forest Development - RPFD contain global information

on forest state and on the role of the forest as a public interest subject including

strategic recommended management guidelines following the ecosystem

conception. They provide important background for the decision process within the

regionally differentiated state forest policy and for the promotion of public interests.

The generic LOD functionalities required by Czech pilots are:



5.2.5 Slovakia

The Slovak pilot will include the proposal, development and deployment of two conceptually

different types of web applications in order to achieve reuse of environmental data and

information in line with the European Open Data Strategy:

1. Spatial Web Crawler: A web-based application allowing queries to be defined for

information resources related to particular spatial and thematic domains that are available in

the deep web, retrieved through the indexing processes of the most widely used search

engines (e.g. Google), and making this content available to SDIs as well as the linked data

world. Nowadays a geospatial user within an SDI searches for geospatial information

using discovery clients that are components of geoportal-type applications (e.g. the INSPIRE

Geoportal, see footnote 42). If a data producer wants to promote related resources and make them

available in an SDI, they need to create metadata according to predefined rules (e.g. the

INSPIRE metadata regulation) and publish them using the CSW standard. Nevertheless,

there are still many data producers making their resources available without

documenting and publishing the metadata through the CSW interface. The pilot

proposes a solution that aims to design and implement a framework for

discovering and providing access to geospatial resources available in the deep web

that are not directly discoverable within existing SDIs (e.g. INSPIRE, GEOSS) or

by surface web search engines (e.g. Google). In addition, a semantic extension based

on Linked Data principles will be taken into consideration, as the metadata

content collected by this pilot framework will be published using standard RDF

format. This step should close the circle by interconnection across different levels of

the geospatial web and at the same time should lead towards current open data

principles.

42 http://inspire-geoportal.ec.europa.eu



2. Biodiversity MashUp Linked Open Data Extension: Extension of the existing Biodiversity MashUp

web 2.0 application (see footnote 43) with a web 3.0 semantic linked open data dimension. The initial version of

this MashUp application was the result of an effort to show the possibility of linking various biodiversity

related information, where selected bird species were combined with their spatial

distribution and descriptive information from various sources (national spatial data providers

with information from the Encyclopedia of Life). The main aim of the pilot is to make the current

application content available in RDF encoding, allowing the available content to be linked with other

sources so that it can be read and queried by human readers as well as computers.

The geographical and content scope can be extended to cover the countries

represented in the project consortium as well as relevant information resources (e.g. PESI, the Pan-European

Species Directories Infrastructure). In addition, the pilot will contribute to the

decision support process by helping citizens and decision makers to investigate what kind of

biodiversity potential as well as environmental risks can be identified in the area of their

interest.

The Slovak pilot defined the following concrete use cases:

• Creation of biodiversity and environmental risk related spatial linked data resources -

This use case addresses the identification of the essential procedural steps leading to the

generation of spatial linked resources in the biodiversity and environmental risk

domain from different public sector bodies.

• Discovery of available spatial information resources on the Geospatial Web -

The objective is to collect the interface endpoints of data access services for spatial

information resources available within the layers of the geospatial web, such as spatial data

infrastructures (available catalogues - CSW) and the mainstream web (OGC WxS available

in the mainstream and not registered at any CSW – identification based on the OGC

WxS discovered URL = linkage/URL CSW ISO APP) represented by search engines.

Basic metadata extracted will be provided to facilitate the discovery of available

resources through standardized human and machine-readable interfaces.

• View/Display of available spatial information resources - aiming to support the

visualization of discovered and other available spatial data resources. A web-based

application of geoportal type provides a unique GUI to discover and view spatial

information from resources progressively predefined by users.

• Biodiversity & contamination in my neighborhood - addressing possibilities to link

relevant biodiversity spatial and non-spatial linked data resources and their reuse via

mash up in the area of decision support for citizens as well as local governance

related activities.
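
The discovery use case above relies on recognising OGC service endpoints among crawled URLs. A minimal sketch of such a check follows: it reads the SERVICE query parameter of a URL and, if it names a known OGC service, derives a GetCapabilities request from which basic metadata could later be extracted. The function name, the list of services and the example URLs are assumptions for illustration and not part of the pilot design; the sketch also drops all other query parameters, which a real crawler may need to preserve.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Service identifiers the sketch recognises (an illustrative, not exhaustive, set).
KNOWN_SERVICES = {"WMS", "WFS", "WCS", "WMTS", "CSW"}


def to_capabilities_url(url: str):
    """Return (service, GetCapabilities URL) for an OGC endpoint, else None."""
    parts = urlsplit(url)
    # OGC key-value parameter names are case-insensitive, so normalise them.
    params = {key.lower(): value for key, value in parse_qsl(parts.query)}
    service = params.get("service", "").upper()
    if service not in KNOWN_SERVICES:
        return None
    query = urlencode({"service": service, "request": "GetCapabilities"})
    return service, urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Made-up crawled URLs, for illustration only.
crawled = [
    "http://example.org/ows?SERVICE=WMS&REQUEST=GetMap&LAYERS=a",
    "http://example.org/about.html",
    "http://example.org/csw?service=csw&request=GetRecords",
]
endpoints = [found for found in map(to_capabilities_url, crawled) if found]
for service, capabilities_url in endpoints:
    print(service, capabilities_url)
```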

The generic LOD functionalities required by Slovak pilots are:

• Transformation (GML -> RDF, relational data -> RDF, GMD (see footnote 44) -> RDF, or in other

words INSPIRE MD -> DCAT as described above)

• Storage

• Search
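
As an illustration of the GML -> RDF transformation listed above, the following sketch extracts gml:Point elements from a GML 3.2 fragment and emits one GeoSPARQL geo:asWKT triple per point in N-Triples syntax. The namespace http://example.org/geometry/, the function name and the sample coordinates are hypothetical. The sketch assumes that the coordinates are already in longitude-latitude order and handles points only; a real transformation would also need to deal with coordinate reference systems, axis order and other geometry types.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
GEO = "http://www.opengis.net/ont/geosparql#"
BASE = "http://example.org/geometry/"  # hypothetical namespace

# Made-up sample geometry, for illustration only.
SAMPLE_GML = """<gml:MultiPoint xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="mp1">
  <gml:pointMember>
    <gml:Point gml:id="site1"><gml:pos>17.1 48.1</gml:pos></gml:Point>
  </gml:pointMember>
  <gml:pointMember>
    <gml:Point gml:id="site2"><gml:pos>18.7 49.2</gml:pos></gml:Point>
  </gml:pointMember>
</gml:MultiPoint>"""


def gml_points_to_ntriples(gml_text: str, base: str = BASE) -> list:
    """Emit one geo:asWKT triple for every gml:Point found in the document."""
    root = ET.fromstring(gml_text)
    triples = []
    for point in root.iter(f"{{{GML}}}Point"):
        gml_id = point.get(f"{{{GML}}}id")
        pos = point.find(f"{{{GML}}}pos")
        if gml_id is None or pos is None or not pos.text:
            continue  # skip points the sketch cannot identify or read
        # Keep the first two ordinates; assumed to be longitude then latitude.
        x, y = [float(value) for value in pos.text.split()][:2]
        triples.append(
            f'<{base}{gml_id}> <{GEO}asWKT> "POINT({x} {y})"^^<{GEO}wktLiteral> .'
        )
    return triples


triples = gml_points_to_ntriples(SAMPLE_GML)
for triple in triples:
    print(triple)
```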




43 geop.sazp.sk:8080/geoserver-old/www/ps_mashup/index.html

44 http://www.isotc211.org/schemas/2005/gmd/



5.3 Cross-pilot functional requirements

5.3.1 Plan4business

The plan4business development team has designed and implemented the first operational

version of the plan4business service platform. There will be two main

implementations of the system. One is for non-expert users, enabling browsing the content,

viewing the thematic compositions and predefined analysis and core pan-European datasets

such as Urban Atlas and Corine Land Cover. The other implementation is targeted at expert

users who are additionally able to create their own map compositions, perform their own analysis,

integrate spatial data into the common data model based on INSPIRE, download certain

datasets or use the developed Application Programming Interface (API). The prototypes are

used mainly for testing purposes and no charges will apply. The portal for non-expert users is

publicly available. Your feedback is appreciated.

Plan4business defined the following concrete use cases:

• Publishing of Plan4business Open Data as RDF - Types of data that the

plan4business.eu platform will allow to aggregate include: Urban and regional

planning data from different countries, Land use data including GMES Urban Atlas

data, Open Street Map data as representative of traffic and as a key reference

dataset, Natura 2000 data as information about potential restriction coming from

environment protection, Market information (number of properties, content, etc.

and transactions (sale, rental), their number, content, areas of distribution, price

levels), Social and economic data (CSP, Eurostat data), Individual property data (legal

status, current use, resources, construction data) and cadastral parcels data,

including property types. All data sets are stored in a relational database together with

metadata. The objective is to make this data available as LOD.

• Provide thematic reports combining Plan4business data with LOD - The Location

Evaluator is an app for user friendly access to data from various sources including

statistical, analytical and cadastral information. A user can generate a comprehensive

report about a region, a municipality or a point of interest (currently limited to

buildings) in the Czech Republic through navigation in a map. Currently the Location

Evaluator works only with local data. The objective is to include LOD coming from the

pilots as part of the reports.

The generic LOD functionalities required by Plan4business are:


• Search

• Visualization of LOD with other data


5.3.2 Tourism

Active tourism (as opposed to passive tourism) is a special way to spend leisure

time. It is a new life philosophy that combines adventure, sports, experience, discovering,

events, and relations to nature, history, culture, habits or traditions. Active tourism is rapidly

growing in popularity because it offers an unusual experience that is very different from the typical

seaside resort holiday. Elements of active tourism (such as excursions or offer of sport activities) are



added to the traditional form of tourism. The new forms of tourism cover for example sport

activities (e.g. rafting), nature tourism (e.g. trekking or hiking), rural tourism, congress

tourism, adventure tourism (e.g. rock climbing) or experience tourism (e.g. mountaineering

expeditions). Data and information are the keywords of current society as well as

contemporary tourism and tourist industry. Both main subjects of tourist industry

(participants and providers) deal with data and information and need them mainly for

communication in each group and also between both groups of tourism subjects. Data and

information mean a huge number of various items related to selection of destination or offer

of services of tourist industry. Data and information do not mean just spatial data sets,

maps, web cameras, handouts or catalogues, but also personal information such as

recommendations, comments on social media channels, published private photos or stories.

Existing solutions for tourist industry based on information technologies (IT) are focused

mainly on one component of information such as global information, local or regional data

or social media and crowd-sourcing.

The Tourism scenario defined the following concrete use cases:

• Discovery of relevant data sources in relation to user position – the goal is to build a

catalogue of existing tourism-related LOD, which can be included in

analysis on the basis of user preferences

• Provide federated spatial search and visualization of LOD – for example, provide

search in a buffer around a planned route through distributed local repositories and

visualize the data in a list and also on the map
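
The buffer search mentioned in the second use case can be sketched as follows. The example keeps every point of interest that lies within a given distance of any segment of a planned route and runs the same test over several repositories. The repositories are modelled as plain in-memory lists, and all names, coordinates and the 500 m buffer are made up for illustration. Distances use a local flat-earth approximation, which is only reasonable for short distances; a real federated search would query remote endpoints instead.

```python
import math

EARTH_RADIUS_M = 6371000.0


def _to_local_xy(lon, lat, lon0, lat0):
    """Project (lon, lat) to metres on a plane centred at (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def distance_to_segment_m(p, a, b):
    """Approximate distance in metres from point p to segment a-b (lon, lat)."""
    ax, ay = _to_local_xy(a[0], a[1], p[0], p[1])
    bx, by = _to_local_xy(b[0], b[1], p[0], p[1])
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(ax, ay)
    # Parameter of the closest point on the segment, clamped to [0, 1];
    # p is the origin of the local plane.
    t = max(0.0, min(1.0, -(ax * dx + ay * dy) / length_sq))
    return math.hypot(ax + t * dx, ay + t * dy)


def federated_buffer_search(repositories, route, buffer_m):
    """Return (repository, name) for every point within buffer_m of the route."""
    hits = []
    for repo_name, points in repositories.items():
        for name, lon, lat in points:
            if any(distance_to_segment_m((lon, lat), a, b) <= buffer_m
                   for a, b in zip(route, route[1:])):
                hits.append((repo_name, name))
    return hits


# Made-up route and points of interest, for illustration only.
route = [(0.0, 0.0), (0.1, 0.0)]
repositories = {
    "repo-a": [("A", 0.05, 0.001), ("B", 0.05, 0.02)],
    "repo-b": [("C", 0.2, 0.0), ("D", 0.1, -0.002)],
}
hits = federated_buffer_search(repositories, route, buffer_m=500.0)
print(hits)
```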

The generic LOD functionalities required by Tourism are:

• Search


• Visualization of LOD with other data




6 Information viewpoint

The Information Viewpoint describes the way that SmartOpenData stores, collects,

updates, manipulates, manages, and distributes information. The major issues which are

analyzed and assessed are:

• Basic data types used in SmartOpenData

• Ontologies and vocabularies

• Registers and Registries

• Tasks related to SmartOpenData data

• Information structure and content with a clear focus on the metadata and data

models

6.1 Local data

Local data is the data that is stored in the end user’s domain, in different formats. This data

must be converted into a structured form so that it can be further processed by the tools present on the

client side and the server side.

The local data types include:

6.1.1 GIS Files

A GIS file format is a standard of encoding geographical information into a file. They are

created mainly by government mapping agencies (such as the USGS or National Geospatial-

Intelligence Agency) or by GIS software developers.

Raster

A raster data type is, in essence, any type of digital image represented by reducible and

enlargeable grids. Anyone who is familiar with digital photography will recognize the Raster

graphics pixel as the smallest individual grid unit building block of an image, usually not

readily identified as an artifact shape until an image is produced on a very large scale.

Raster data is stored in various formats; from a standard file-based structure of TIF, JPEG,

etc. to binary large object (BLOB) data stored directly in a relational database management

system (RDBMS) similar to other vector-based feature classes. Database storage, when

properly indexed, typically allows for quicker retrieval of the raster data but can require

storage of millions of significantly sized records.

The most popular Raster file formats are:

Esri grid - proprietary binary and metadataless ASCII raster formats used by Esri

GeoTIFF - TIFF variant enriched with GIS relevant metadata

JPEG2000 - open standard raster format. A compressed format, allows both lossy and lossless

compression.

Vector



In a GIS, geographical features are often expressed as vectors, by considering those features

as geometrical shapes. Different geographical features are expressed by different types of

geometry: Points, Lines and Polygons.

Each of these geometries is linked to a row in a database that describes their attributes. For

example, a database that describes lakes may contain a lake's depth, water quality, pollution

level. This information can be used to make a map to describe a particular attribute of the

dataset. For example, lakes could be coloured depending on level of pollution.
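
The link between geometries and attributes described above can be illustrated with a small GeoJSON sketch: each feature carries a polygon and a set of properties, and a colour is derived from the pollution attribute. The lake names, property names and class thresholds are invented for this example and do not come from any project dataset.

```python
import json

# Made-up lakes: each feature pairs a geometry with an attribute record.
LAKES_GEOJSON = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]},
         "properties": {"name": "Lake A", "depth_m": 12.0, "pollution_level": 0.2}},
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]},
         "properties": {"name": "Lake B", "depth_m": 4.5, "pollution_level": 0.9}},
    ],
})


def pollution_colour(level: float) -> str:
    """Classify a pollution level into a map colour (hypothetical thresholds)."""
    if level < 0.3:
        return "green"
    if level < 0.7:
        return "orange"
    return "red"


def colour_by_pollution(geojson_text: str) -> dict:
    """Return a {feature name: colour} mapping for a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    return {
        feature["properties"]["name"]:
            pollution_colour(feature["properties"]["pollution_level"])
        for feature in collection["features"]
    }


colours = colour_by_pollution(LAKES_GEOJSON)
print(colours)
```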

The most popular Vector file formats are:

Geography Markup Language (GML) - XML based open standard (by OpenGIS) for GIS data

exchange

GeoJSON - a lightweight format based on JSON, used by many open source GIS packages

Keyhole Markup Language (KML) - XML based open standard (by OpenGIS) for GIS data

exchange

Shapefile - open, hybrid vector data format using SHP, SHX and DBF files (by ESRI)

6.1.2 General purpose files

Other formats that must be considered are the ones used to store general purpose

information. This information can be related to GIS or to other scenarios such as legislation,

tourism, finance, etc.

This section is divided into open formats and proprietary formats.

6.1.2.1 Open Formats

The analysed open formats are: XML, JSON, YAML, and CSV.

XML is primarily an extensible mark-up language. It can store general data structures as well.

Language support for IDs means that complex graphs can be created, although it's best used

for trees. A document can be tested for correctness against a specification in XML Schema.

The main problem with this format is its extreme verbosity.

JSON is primarily a way to store simple object trees. There is no support for general graphs.

JSON has no concept of type beyond the primitives string, number (integer or float), boolean and null, and the

collection types array and object.

YAML is a superset of JSON (as of YAML 1.2). It has a notion of aliases that allow object graphs of arbitrary

complexity to be created. It also has a concept of metadata, such as tags, that can be used for proper

typing.

CSV format stores tabular data (numbers and text) in plain-text form. Plain text means that

the file is a sequence of characters, with no data that has to be interpreted as binary

numbers. A CSV file consists of any number of records, separated by line breaks of some

kind; each record consists of fields, separated by some other character or string, most

commonly a literal comma or tab. Usually, all records have an identical sequence of fields.
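
A practical consequence of the differences between these formats is how much type information survives a round trip. The sketch below writes the same record as JSON and as CSV using the Python standard library and reads it back: JSON keeps its primitive types, whereas CSV returns every field as text. The record and its field names are made up for illustration.

```python
import csv
import io
import json

# Made-up record with a string, a float, a boolean and a missing value.
record = {"station": "S1", "depth_m": 12.5, "monitored": True, "notes": None}

# JSON round trip: the primitive types are preserved.
from_json = json.loads(json.dumps(record))

# CSV round trip: every field comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
from_csv = next(csv.DictReader(io.StringIO(buffer.getvalue())))

print(from_json)
print(from_csv)
```

Any consumer of the CSV version therefore has to restore numbers, booleans and missing values itself, for example from accompanying metadata.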



6.1.2.2 Proprietary formats

A proprietary format is a file format of a company, organization, or individual that contains

data that is ordered and stored according to a particular encoding-scheme, designed by the

company or organization to be secret, such that the decoding and interpretation of this

stored data is only easily accomplished with particular software or hardware that the

company itself has developed. The specification of the data encoding format is not released,

or is subject to non-disclosure agreements.

The analysed proprietary formats are PDF and the MS Office formats (DOC, XLS and PPT):

PDF – Adobe's Portable Document Format. It has been an open standard since 2008 (ISO 32000-1), but some technologies indispensable for the application of ISO 32000-1 are still defined only by Adobe and remain proprietary.

DOC – Microsoft Word Document. Microsoft released a .DOC format specification under the Microsoft Open Specification Promise. However, this specification does not describe all of the features used by the DOC format, and reverse engineering remains necessary. Since 2008 the specification has been changed several times; the last change was made in November 2013.

XLS – Microsoft Excel spreadsheet file format.

PPT – Microsoft PowerPoint Presentation file format.

In Microsoft Office 2007 the binary file formats were replaced as the default format by the new XML-based Office Open XML formats, which are published as an open standard. Nevertheless, they are not complete, as there are binary blobs inside the XML files, and several pieces of behavior are not specified but refer to the observed behavior of specific versions of Microsoft products.

6.2 Basic data types used in SmartOpenData

Data in the SmartOpenData architecture are divided according to their origin and their usability. The basic types of data are listed in the following table.

Pilot | Dataset(s) | Format(s)
Ireland | ETIS Dataset | Excel / ESRI Shape
 | BFCP Operational Dataset | CSV / ESRI Shape
 | National Monuments Service dataset | CSV / ESRI Shape
 | ALL-ISLAND RESEARCH OBSERVATORY | ESRI Shape
 | National biodiversity Data Centre | ESRI Shape
 | Irish National Parks and Wildlife Services | ESRI Shape
Spain | SIOSE | PostGIS
 | SIGPAC | SQLite
 | REGA | ESRI Shape
 | Silvadat | Access
 | Admission units | ESRI Shape
 | Nature DataBank (Species, soil characteristics, physical characteristics: humidity, temperature…) | ESRI Shape
Czech | NFI Results | Various formats: XLS, PDF
 | Tabular Data | Various formats
 | GIS Data | INSPIRE Compliant
 | Near Infrared Orthophotos | ECW
 | nDSM - The Normalised Digital Surface Model | .tif
 | RPFD – Regional Plans of the Forest Development | SQL Server Spatial / ESRI Shape
 | Forest and Other land classes | .tif
Slovakia | Protected sites | GML
 | Sample dataset of habitats and biotopes | GML
 | Sample dataset of species distribution | GML
 | Environmental burdens in Slovakia | WFS, REST API, SOAP API
 | External SPARQL (GeoSPARQL) endpoints | RDF
 | External OGC WxS services endpoints | Based on OGC WxS service type (e.g. GeoTIFF, PNG, KML…)
 | GeoSpecies Knowledge Base | RDF, OWL
 | Other available endpoints | THREDDS Catalogs, Z39.50, ESRI GeoPortal REST interface
 | GeoNames | XML, JSON, RDF, CSV, TXT, RSS, KML
 | PESI Pan-European Species directories Infrastructure | SOAP/WSDL
 | Metadata and sample datasets from LTER | CSW, SOS, RDF
Portugal | CORINE Land Cover | PostGIS
 | Orthophotos | PostGIS
 | Kyoto Landuse | PostGIS
 | Administrative Units CAOP | PostGIS
 | Land Use Monitoring - LUM | PostGIS
 | LUM_Place | PostGIS
 | Watershed boundaries | PostGIS
 | Digital Elevation Model | PostGIS
 | Drainage Network | PostGIS
 | Forest Fires | PostGIS
 | Environmental Data | PostGIS
 | Hydrological monitoring | PostGIS
 | Water quality monitoring | PostGIS
Italy | Physical / chemical parameter measurements | Excel / SQL (LIMS)
Plan4business | CORINE Land Cover | PostGIS
 | Spatial plans | PostGIS
 | Urban Atlas | PostGIS
 | Czech Cadaster | PostGIS
 | RUIAN | PostGIS
 | Natura 2000 | PostGIS
 | Eurostat data | PostGIS
 | Czech Statistic Office data | PostGIS
 | German Statistical Office Data | PostGIS
 | OSM | PostGIS
 | Polish Statistical Office data | PostGIS
 | Flood risk maps of Czech Republic | PostGIS
Tourism | OSM | PostGIS

Table 1 Pilot Data source

6.3 RDF Data integration

Data integration is one of the main problems in distributed data sources. The most used

approach is to provide an integrated mediated schema for various data sources.

There are two basic approaches to the data integration problem, called procedural and

declarative.

The procedural approach is used in the Local data subsystem. In this approach data are integrated in an ad-hoc manner with respect to a set of predefined information needs. In this case, the basic issue is to design suitable software convertors that access the sources in order to fulfill the predefined information requirements. Such systems do not require an explicit notion of an integrated data schema, and rely on two kinds of software components: wrappers, which encapsulate sources and convert the underlying data objects to a common data model, and mediators, which obtain information from one or more wrappers or other mediators, refine this information by integrating and resolving conflicts among the pieces of information from the different sources, and provide the resulting information either to the user or to other mediators. The basic idea is to have one mediator for every query pattern required by the user, and generally there is no constraint on the consistency of the results of different mediators.
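The wrapper and mediator roles can be sketched as follows. This is a minimal illustration under stated assumptions, not the SmartOpenData implementation: the two sources, their field names (`site_id`, `area`, `code`, `area_m2`) and the conflict rule are all hypothetical.

```python
import csv
import io
import json

def csv_wrapper(text):
    """Wrapper: expose a CSV source as records in the common data model."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"site": row["site_id"], "area_ha": float(row["area"])}

def json_wrapper(text):
    """Wrapper: expose a JSON source in the same common data model."""
    for item in json.loads(text):
        # This source stores square metres; convert to hectares.
        yield {"site": item["code"], "area_ha": item["area_m2"] / 10_000}

def area_mediator(wrappers):
    """Mediator for one query pattern: 'area per site'.

    Merges records from all wrappers. When two sources disagree about a
    site, the first wrapper listed wins (an arbitrary conflict rule).
    """
    result = {}
    for wrapper in wrappers:
        for record in wrapper:
            result.setdefault(record["site"], record["area_ha"])
    return result

csv_source = "site_id,area\nSK001,12.5\nSK002,3.0\n"
json_source = (
    '[{"code": "SK002", "area_m2": 31000},'
    ' {"code": "SK003", "area_m2": 50000}]'
)

areas = area_mediator([csv_wrapper(csv_source), json_wrapper(json_source)])
print(areas)
```

A second query pattern would get its own mediator with its own conflict rule, which is why results of different mediators are not guaranteed to be mutually consistent.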

The declarative approach is used on the server side. Its aim is to model the data at the local and external sources by means of a suitable language, to construct a unified representation, to refer to such a representation when querying the global information system, and to derive the query answers by means of suitable mechanisms accessing the sources and/or the materialized views. The declarative approach provides a crucial advantage over the procedural one: although building a unified representation may be costly, it allows maintaining a consistent global view of the information sources, which represents a reusable component of the information integration systems.

The RDF data integration functionality can be implemented from a computational view by using different technologies:

▪ RDF can be used for integrated schema descriptions as well as for providing a unified view of data.

▪ Description Logic (DL)45 can be used to find any contradiction in the integrated schema (satisfiability of concepts in DL terms). It has well-defined semantics and decidable routines for basic services like satisfiability, which makes it suitable for knowledge representation and reasoning in this domain.

▪ RDQL46 or SPARQL can be used for re-formulation of queries. RDQL is a query language for RDF in Jena47 and provides a data-oriented query model. Jena provides the necessary APIs to work on RDF data and execute queries.

45 Franz Baader, Werner Nutt, Basic Description Logics, Description Logic Handbook, pp. 47-100

46 RDQL - A Query Language for RDF: http://www.w3.org/Submission/2004/SUBM-RDQL-20040109/

47 Apache Jena Homepage: https://jena.apache.org/



6.4 Ontologies and vocabularies

In addition to the references identified in Deliverable D3.1, and due to the complexity of the

problem, the SmartOpenData project has to identify and define a set of vocabularies and

ontologies which will be used for the linkage of data. These ontologies and vocabularies have to

cover the following areas:

• Agriculture – Agriculture and Forestry Ontology (AFO), AGRIFOREST Thesaurus,

AGROVOC Multilingual agricultural thesaurus, Network of Fisheries Ontologies

• Nature or environment – AGRIFOREST Thesaurus, Agro forestry Database, AGROVOC

Multilingual agricultural thesaurus, GEMET, CSA-NBII Biocomplexity Thesaurus,

Environmental Dictionary (EnDic), International Standard Statistical Classification of

Aquatic Animals and Plants, TaxonMap Ontology, LTER-Europe EnvEurope

Thesaurus48, Linked Science49

• Forestry – Agriculture and Forestry Ontology (AFO), AGRIFOREST Thesaurus, Agro

forestry Database, AGROVOC Multilingual agricultural thesaurus, Forestry Terms,

Global Forest Decimal Classification, Multilingual Glossary of Forest Genetic

Resources

• Water management – Environmental Dictionary (EnDic), International Standard

Statistical Classification of Aquatic Animals and Plants, Les Mots de l’ Eau, SEMIDE

thesaurus

• Planning – AGROVOC Multilingual agricultural thesaurus, GeoNames, Freebase,

DBpedia, Land cover and land use Terminology for integrated resources planning and

management, The Statistical Core Vocabulary, The Places Ontology, Ordnance Survey

Topography Ontology, Ontologie démographique de l'INSEE

• Tourism – Ontología de Turismo para el Ayuntamiento de Zaragoza, CSA-NBII

Biocomplexity Thesaurus (for biotourism), Theatre Ontology, Wine Ontology, Whisky

Ontology

• Species - International Standard Statistical Classification of Aquatic Animals and

Plants, AGROVOC, A classification of the bird species of South America, Agroforestree

Database, Animal Diversity Web - Kingdom Animalia Classification, Catalogue of Life

Annual Check List, Encyclopedia of Life, Fishbase, Grassland Index, GRIN Taxonomy

for Plants, IUCN Red List of Threatened Species, Listing of Useful Plants of the World,

Species Fungorum

• Administrative division - FAO Geopolitical Ontology, Eurostat - Linked Data

6.5 Registers and Registries

The potential for easier access to and exchange of digital spatial content and functionality rises significantly when the underlying models and ontologies are made available through standardised registers or registries50. A register provides an environment for the management of items (containing at least persistent identifiers and their definitions) that are often reused in various applications and domains. Registries are information systems maintaining one or more registers. Demand for the availability of this type of register is also increasing in line with initiatives focused on the standardisation of spatial data and services sharing with registry services support (e.g. INSPIRE51, UKGovLD Registry52), ensuring a high level of interoperability53.

48 http://vocabs.lter-europe.net/EnvThes3.html

49 http://linkedscience.org/vocabularies/

50 ISO 19135:2005 Geographic information – Procedures for item registration

SmartOpenData architecture will therefore reuse relevant registers and registry services in

order to provide the evidence about the feasibility and importance of these interoperability

components.

Several vocabularies can be used for geospatial data description and representation (for more

details see D3.1 Review of geographic resources metadata and related metadata standards):

• GeoSPARQL - GeoSPARQL is a standard for representation and querying of geospatial

linked data for the Semantic Web from the Open Geospatial Consortium. It attempts

to solve the problems with the heterogeneous and incompatible implementations for

representing and querying spatial data. It achieves this by defining an ontology that

follows the existing standards from the OGC with regard to spatial indexing in

relational databases. The GeoSPARQL specification consists of three components –

the definition of vocabulary to represent features, geometries and the relationships

between them; a set of spatial functions for use in SPARQL queries; and a set of

query transformation rules.

• NeoGeo - The NeoGeo Vocabulary project aims to develop a common

vocabulary for the representation of geodata. The NeoGeo Vocabulary is based on

the GML Simple Features Profile54. Simple geometries are described explicitly in RDF,

and more complex geometries arise from the aggregation of simple ones. This allows

reasoning and querying on these geometries.

• W3C Basic Geo - The W3C Basic Geo Vocabulary is an RDF vocabulary for representing

the latitude/longitude position of a geographical location, using WGS84 as the

reference datum. It is used within RDF documents, as well as a namespace within

non-RDF XML documents, for instance RSS 2.0 or Atom. There are many publishers,

applications and services using the format, such as DBpedia.

• Location Core Vocabulary - The ISA Programme Location Core Vocabulary is a

simplified, reusable and extensible data model that offers a minimum set of classes

and properties for describing any place by its name, address or geometry. The

vocabulary is primarily designed to aid the publication of data that is interoperable

with the EU INSPIRE Directive. The vocabulary is integrated with the Business and

Person Core Vocabularies of the EU ISA Programme.
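To make the difference between two of the vocabularies above concrete, the following sketch writes the same point once with the W3C Basic Geo properties and once as a GeoSPARQL geometry with a WKT literal. The namespace IRIs are those published by W3C and OGC; the function name `point_as_turtle`, the example resource IRI and the `/geometry` suffix are assumptions made for this illustration.

```python
def point_as_turtle(uri, lat, lon):
    """Return Turtle describing one point with both vocabularies."""
    return (
        "@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n"
        "@prefix geo: <http://www.opengis.net/ont/geosparql#> .\n"
        "\n"
        # W3C Basic Geo: two simple properties on the resource itself.
        f'<{uri}> wgs:lat "{lat}" ;\n'
        f'    wgs:long "{lon}" ;\n'
        # GeoSPARQL: the feature links to a separate geometry resource.
        f"    geo:hasGeometry <{uri}/geometry> .\n"
        "\n"
        # WKT lists the x coordinate (longitude) before y (latitude).
        f'<{uri}/geometry> geo:asWKT "POINT({lon} {lat})"^^geo:wktLiteral .\n'
    )

ttl = point_as_turtle("http://example.org/site/1", 48.15, 17.11)
print(ttl)
```

Basic Geo is limited to points, whereas the GeoSPARQL form extends to lines and polygons by changing only the WKT string.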

51 http://inspire.ec.europa.eu/registry/

52 https://github.com/UKGovLD/ukl-registry-poc/wiki

53 http://www.w3.org/2014/03/lgd/papers/lgd14_submission_60



6.6 Information structure and content with a clear focus on the metadata and data models

This part is the task of the future deliverables D3.2 Initial SmartOpenData data model and D3.3

Harmonisation of data to SmartOpenData model, where relevant ongoing projects and

initiatives will be taken into consideration (e.g. the Linking Geospatial Data Workshop

outcomes, or the ARE3NA RDF and PIDs for Location framework55).

6.7 Tasks related to SmartOpenData data

The tasks related to SmartOpenData data were identified in the previous chapter and include:

• Transformation (GML -> RDF, relational data -> RDF, GMD -> RDF)

• Storage

• Search

An additional task related to SmartOpenData is publishing data on the Web as they are.

6.8 Data flows

The following figure shows the SmartOpenData data flow.

Figure 7 SmartOpenData data flow

55 https://joinup.ec.europa.eu/community/are3na/event/rdf-and-pids-location-preliminary-results



7 Computational viewpoint

The computational viewpoint is a functional decomposition of the system into basic objects

that interact at interfaces. It describes the functionality provided by the system and its

functional decomposition. This viewpoint focuses on the components of the system, not

considering distribution aspects, which are managed within the Engineering and Technology

viewpoints.

The computational viewpoint must conform to the policies of the enterprise viewpoint.

Some roles, activities and behaviors identified in the enterprise viewpoint will be

computational objects, and are described in this section. Some of these may have definitions

visible in the information viewpoint.

The basic components recognized in previous chapters are:

7.1 Local data

▪ Data in proprietary formats (XLS, shapefile…)

▪ Data in open formats (csv, gml)

▪ Unstructured data (plain text, html pages, pdf, social networks content) - Scraping

▪ Structured data (xml, geojson, georss)

▪ Metadata (xml, txt, RDF (DCAT, VoID?))

▪ Proprietary formats such as Office (doc, xls), OpenOffice etc.

7.2 Server side

o Publishing data on the Web as they are

   ▪ Generic convertor for translating local data into the database. Functional requirements:

   ▪ Mapping from input to target schema/model

   ▪ ETL transformation functionality

   ▪ Extraction of non-structured data

   ▪ Support of existing registries

   ▪ Support for local and external metadata and data

   ▪ Metadata editor

o Relational storage

   ▪ Relational database repository (if needed) for classic GIS / text applications

   ▪ Structured data (xml, geojson, georss, sparql, geosparql)

   ▪ Harmonised metadata (SDI & LoD compliant)



o Transformation (GML -> RDF, relational data -> RDF, metadata (GMD -> RDF))

o Conversion to RDF (incl. support for SMoD Data model)

   ▪ RDFise & Linking

   ▪ Persistent URIs & Standard Vocabularies - INSPIRE Registry or MDR

   ▪ Some external data may be incorporated into the process

   ▪ Creating links to other sources

   ▪ Functional requirements:

      • Mapping from input to target schema/ontology

      • Transformation functionality (data/metadata)

      • Linkage with local and external meta/data

   ▪ Non-functional requirements:

      • Support for persistent identification

      • Support of existing registries, ontologies

      • Support for new ontologies development and publishing

o Publishing data from existing SQL databases as RDF

   ▪ Ability to access SQL data as RDF

o RDF Storage

o Triple store repository with structured linked open metadata/(geo)data (xml, geojson, georss, sparql, geosparql). RDF file to be directly used in the next step. Triples can be serialized as RDF/XML, JSON-LD, N-Triples, Notation3, Turtle

   ▪ Support for features

   ▪ Support for coverages, rasters

   ▪ Harmonized metadata (SDI & LoD compliant)

   ▪ Cached triple store repository

   ▪ SPARQL endpoint

   ▪ GeoSPARQL endpoint

o Distributed semantic indexing infrastructure

o Administration and notification framework

7.3 Client side

o Supports read and write mode for repositories and (where possible) for

external data and metadata

o Support for Semantic Front-end Facilities

o Distributed Semantic Indexing infrastructure

o Visualization Framework



o Administration and Notification Service

o Allows creation of new links (triples)

o Metadata harvesting

o Process outputs of the processor

o Support for Semantic Front-end Facilities

o Visualization Framework

o Table Tree view

o Analytics UI

o GeoSPARQL/SPARQL Workbench

o Client displays the results

o GeoJSON / kml for geographic data

o HTML pages

o Linkage with reference/most used open source CMSs

7.4 Non-functional requirements

▪ Support for persistent identification

▪ Support of existing registries, ontologies

▪ Support for new ontologies development and publishing



8 Engineering viewpoint

The engineering viewpoint focuses on the mechanisms and functions required to support

distributed interactions among objects in the system. It describes the distribution of

processing performed by the system to manage the information and provide the

functionalities. This section defines the SmartOpenData conceptual Architecture.

As mentioned in the introduction, the focus is not on defining a complex platform; we therefore

define only the basic relations between components.

8.1 Generic architecture

A short narrative description of the elements and groups of elements depicted in the schema below.

Figure 8 Generic architecture



8.2 Server side

The server side offers two functionalities: storage of relational data and a SPARQL

endpoint.

8.2.1 Relational database

The relational database will store spatial and non-spatial, structured and non-structured data. It

will guarantee access using conventional SDI tools.

8.2.2 Data Integration

It involves practices, architectural techniques and tools for achieving consistent access to

and delivery of data across a wide range of subject areas and structure types in an

enterprise. Data integration capabilities are at the heart of the information-centric

infrastructure and will power the frictionless sharing of data across all organizational and

system boundaries.

Related to data integration, server-side processors provide ETL tools. These ETL tools can be

classified into four categories:

• Pure ETL tools: These tools are independent of the database and the Business

Intelligence tool with which they will be used. Companies do not have to rely on any other

product for the functionality these tools offer, and the tools also allow migration to a

different database without changing the integration process.

• Database integrated: These tools are supplied as an option integrated into

database software; some of the functionality is built into the database and is not

available separately in the ETL tool itself.

• Business Intelligence Integrated: These are the tools from the same supplier as the BI

software. In many cases these are separate products and the supplier will claim that

they can be used independently of the BI tool.

• Niche Product: These are the tools that don’t fit well into any of the above

mentioned groups, but still have considerable ETL functionality in them.

Also, an extension of ETL tools can be used to manage spatial data. Spatial ETL tools provide

the data processing functionality of traditional Extract, Transform, Load (ETL) software, but

with a primary focus on the ability to manage spatial data (which may also be called

geographic, map or location data).

A Spatial ETL system may translate data directly from one format to another, or via an

intermediate format; the latter being more common when transformation of the data is to

be carried out.
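The translation via an intermediate format can be sketched with standard-library Python as follows. This is an illustrative assumption of how such a pipeline might look, not a description of any particular Spatial ETL product; the column names (`name`, `lat`, `lon`) and the three function names are hypothetical.

```python
import csv
import io
import json

def extract(csv_text):
    """Extract: read source records from delimited text."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: build an intermediate, format-neutral feature model."""
    return [
        {"name": r["name"], "x": float(r["lon"]), "y": float(r["lat"])}
        for r in rows
    ]

def load_geojson(features):
    """Load: serialize the intermediate model as a GeoJSON FeatureCollection."""
    return json.dumps({
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON positions are ordered [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [f["x"], f["y"]]},
                "properties": {"name": f["name"]},
            }
            for f in features
        ],
    })

source = "name,lat,lon\nBratislava,48.1486,17.1077\nPrague,50.0755,14.4378\n"
geojson_text = load_geojson(transform(extract(source)))
print(geojson_text)
```

Because the transform step works only on the intermediate model, a different target format needs only a different load function.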

Many existing GIS applications are now incorporating Spatial ETL tools within their products;

the ArcGIS Data Interoperability Extension being a good example of this.



8.2.3 Structured and non-structured data extraction

This functionality refers to the problem of automatically extracting the database values from

the web pages without any learning examples or other similar human input.

Data extraction tools often combine the notion of a template (a model that describes how

values are encoded into pages) and extraction algorithms (which use sets of words that have

similar occurrence patterns in the input pages to construct the template). The constructed

template is then used to extract values from the pages.

Most of the related work uses a “wrapper-based” system for extracting data. In a wrapper

based system, extraction is a two step process. In the first step, a wrapper for the given set

of pages is generated. In the second step, the wrapper is used to extract the data from the

web pages. A wrapper is just a program that extracts the data from the set of pages. Note

that a wrapper is specific to the set of pages — the wrapper for the pages of one web site

will be different from the wrapper for the pages of a second web site.
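The second step, applying a wrapper to a page, can be sketched as below. The wrapper here is written by hand for one assumed page layout (one record per `<tr>`, one value per `<td>`), so the sketch does not cover the automatic wrapper generation of the first step. The class name, the helper `extract_records` and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser

class TableWrapper(HTMLParser):
    """Wrapper for pages that encode one record per <tr>, one value per <td>."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._row = None        # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th> and stay empty
                self.records.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

def extract_records(page):
    """Apply the wrapper to one page and return the extracted records."""
    wrapper = TableWrapper()
    wrapper.feed(page)
    wrapper.close()
    return wrapper.records

page = (
    "<html><body><table>"
    "<tr><th>Site</th><th>Type</th></tr>"
    "<tr><td>Devin</td><td>Protected area</td></tr>"
    "<tr><td>Tatra</td><td>National park</td></tr>"
    "</table></body></html>"
)
records = extract_records(page)
print(records)
```

A page from another web site with a different layout would need a different wrapper, as noted above.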

8.2.3.1 Schema matching and mapping

The terms schema matching and mapping are often used interchangeably. In this document,

we differentiate the two as follows: schema matching is the process of identifying that two

objects are semantically related, while mapping refers to the

transformations between the objects.

Automating these two approaches has been one of the fundamental tasks of data

integration. In general it is not possible to determine fully automatically the different

correspondences between two schemas, primarily because of the differing and often not

explicated or documented semantics of the two schemas.
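The distinction between matching and mapping can be sketched as follows. The matcher below is a deliberately naive assumption: it compares field names only by string similarity, using the standard `difflib` module, which is exactly the kind of heuristic that cannot capture undocumented semantics. The function names, the sample field names and the 0.6 threshold are all illustrative.

```python
from difflib import SequenceMatcher

def match_schemas(source_fields, target_fields, threshold=0.6):
    """Matching: propose the most similar target field for each source field."""
    matches = {}
    for s in source_fields:
        best, best_score = None, 0.0
        for t in target_fields:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score > best_score:
                best, best_score = t, score
        # Below the threshold no correspondence is proposed at all.
        matches[s] = best if best_score >= threshold else None
    return matches

def apply_mapping(record, matches):
    """Mapping: transform a record by renaming matched fields."""
    return {matches[k]: v for k, v in record.items() if matches.get(k)}

matches = match_schemas(
    ["site_name", "area_ha", "owner"],
    ["SiteName", "AreaHa", "geometry"],
)
mapped = apply_mapping(
    {"site_name": "Devin", "area_ha": 12.5, "owner": "unknown"}, matches
)
print(matches)
print(mapped)
```

In practice the proposed matches would be reviewed by a person, since similar names do not guarantee the same meaning.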

8.2.3.2 Metadata integration and edition

One of the main elements for the success in the development of SDI or any other type of

information infrastructure is the appropriate annotation of resources to be accessed and

distributed by means of metadata. Metadata constitute the mechanism to characterize data

and services (e.g., descriptions of the content, quality, condition, authorship and any other

features) in order to enable other users and applications to make use of such data and

services.

However, due to the heterogeneity of contents in information infrastructures, it is not

possible to consider a unique metadata model or schema.

The diversity of metadata standards has been a critical issue for the development of SDIs.

During the last fifteen years, standardization bodies have proposed different metadata

standards such as the Content Standard for Digital Geospatial Metadata (CSDGM) [2] or ISO

19115 Geographic Information – Metadata [3]. Additionally, apart from the standards, it is

also common to find application profiles and extensions of these standards.

There are a variety of free and commercial software tools available to support metadata

development. These tools offer a range of features and capabilities:



8.2.4 Converting data to RDF

This function converts application data from an application-specific format into RDF for use

with RDF tools and integration with other data. Converters may be part of a one-time

migration effort, or part of a running system which provides a semantic web view of a given

application.
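A minimal sketch of such a converter is shown below, assuming a CSV source and N-Triples output. The base namespace `http://example.org/`, the `resource/` and `property/` path segments and the function names are hypothetical choices for this example; a real conversion would map columns to terms from established vocabularies instead.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

def escape_literal(value):
    """Escape the characters that N-Triples does not allow raw in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def csv_to_ntriples(text, id_column):
    """Convert each CSV row to triples: one subject per row, one triple per field."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or value == "":
                continue  # the identifier is used in the subject IRI only
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "".join(line + "\n" for line in lines)

ntriples = csv_to_ntriples("id,name,habitat\n7,Lynx lynx,forest\n", "id")
print(ntriples)
```

All values are emitted as plain string literals here; datatypes and links to other resources would be added in a later step.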

The converting function is usually done by scrapers. Scraping is a computer software

technique of extracting information from unstructured data. It is a field with active

developments sharing a common goal with the semantic web vision, an ambitious initiative

that still requires breakthroughs in text processing, semantic understanding, artificial

intelligence and human-computer interactions. Scraping, instead, favors practical solutions

based on existing technologies that are often entirely ad hoc.

For example, OpenRefine56 is a convertor that permits loading data, cleaning it up,

reconciling it against a master database, and augmenting it with data coming from other RDF or

Web sources.

The convertor modules are located in Figure XX between the Data subsystems and the

server subsystems, as they can be used from both sides.

8.2.5 SPARQL EndPoint

A SPARQL endpoint is a conformant SPARQL protocol service as defined in the SPROT

specification. A SPARQL endpoint enables users (human or other) to query a knowledge base

via the SPARQL language. Results are typically returned in one or more machine-processable

formats. Therefore, a SPARQL endpoint is mostly conceived as a machine-friendly interface

towards a knowledge base. Both the formulation of the queries and the human-readable

presentation of the results should typically be implemented by the calling software, and not

be done manually by human users.
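The role of the calling software can be sketched as follows: it formulates the request and turns the machine-processable result into something simpler to present. The sketch builds a SPARQL Protocol request URL (query passed in the `query` parameter of an HTTP GET) and flattens a result document in the SPARQL JSON results format. No network call is made; the endpoint URL, the sample response and the function names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

def build_query_url(endpoint, query):
    """Build a SPARQL Protocol request that passes the query via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})

def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        # A variable may be unbound in a solution, so check before reading.
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]

query = (
    "SELECT ?site ?label WHERE { "
    "?site <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10"
)
url = build_query_url("http://example.org/sparql", query)

# A hand-written response standing in for what an endpoint would return.
sample_response = json.dumps({
    "head": {"vars": ["site", "label"]},
    "results": {"bindings": [
        {
            "site": {"type": "uri", "value": "http://example.org/site/1"},
            "label": {"type": "literal", "value": "Example site"},
        }
    ]},
})
rows = rows_from_results(sample_response)
print(url)
print(rows)
```

A client would send the URL with an `Accept: application/sparql-results+json` header and render `rows` as a table or map for the human user.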

8.2.5.1 RDF Storage

An RDF store allows storage of RDF data and schema information, and provides methods to

access that information. Thus, the two primary components of an RDF store are a repository

and a middleware that builds on top of that repository. The middleware can be further

divided into components as the access methods can be categorized into methods for adding,

deleting, querying and exporting data.
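The split between repository and middleware can be sketched as follows. This is a toy in-memory illustration, not how Sesame or any other store is implemented: the class names are hypothetical, terms are kept as strings already written in N-Triples syntax, and querying is limited to matching a single triple pattern.

```python
class MemoryRepository:
    """Repository: where triples physically live (here, a Python set)."""

    def __init__(self):
        self._triples = set()

    def insert(self, triple):
        self._triples.add(triple)

    def remove(self, triple):
        self._triples.discard(triple)

    def scan(self):
        return iter(self._triples)

class RdfStore:
    """Middleware: add, delete, query and export on top of any repository."""

    def __init__(self, repository):
        self._repo = repository

    def add(self, s, p, o):
        self._repo.insert((s, p, o))

    def delete(self, s, p, o):
        self._repo.remove((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        return sorted(
            t for t in self._repo.scan()
            if all(want is None or want == got for want, got in zip(pattern, t))
        )

    def export(self):
        """Serialize all triples as N-Triples text."""
        return "".join(f"{s} {p} {o} .\n" for s, p, o in self.query())

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
S1 = "<http://example.org/site/1>"
S2 = "<http://example.org/site/2>"

store = RdfStore(MemoryRepository())
store.add(S1, LABEL, '"Site one"')
store.add(S2, LABEL, '"Site two"')
store.add(S1, TYPE, "<http://example.org/Site>")
store.delete(S2, LABEL, '"Site two"')

labels = store.query(p=LABEL)
dump = store.export()
print(dump)
```

Replacing `MemoryRepository` with a file- or database-backed class offering the same three methods would leave `RdfStore` unchanged.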

Different repositories are imaginable, e.g. main memory, files or databases, but the access

methods should remain the same. Thus, it is reasonable to encapsulate the access to the

repository in an own layer, which provides well defined interfaces to the upper layers and

can be exchanged if another repository is used. The inference support also resides in this

layer as close to the repository as possible. Sesame57 implements such a layer and calls it the

Storage and Inference Layer (SAIL).

56 OpenRefine Webpage: https://github.com/OpenRefine/OpenRefine

57 Sesame Webpage: http://www.openrdf.org/



For persistent storage the data can be serialized to files, but for large amounts of data the

use of a database management system is more reasonable. Examining currently existing RDF

stores we found that they are using relational and object-relational database management

systems (RDBMS and ORDBMS).

8.2.6 Support for standard vocabularies

In order to expose INSPIRE data and metadata as geospatial Linked Data, server-side processors must

provide automatic and semi-automatic tools to repurpose/reuse geospatial information

from INSPIRE SDIs. This consists of:

▪ Exposing catalogue services through SPARQL endpoints

▪ On-the-fly and custom transformations of INSPIRE metadata as VoID (Vocabulary of

Interlinked Datasets) with support for standard vocabularies (e.g. GEMET)

▪ On-the-fly and custom transformations of geospatial data sets as geospatial RDF, based on

standard (e.g. INSPIRE Data Specifications) or custom data schemas.



9 Technology viewpoint

The Technology Viewpoint describes the technological specifications for the physical

deployment of the system implementation. In particular, it focuses on:

• the choice of technology in the system;

• how specifications are implemented;

• specification of relevant technologies;

• Support for testing.

The Technology Viewpoint will be analyzed deeply in WP5. The current version provides

indication of foreseen implementation for main components of the architecture. This list is

not complete and will be updated during the second phase of design.

9.1 Tools for technical implementations

Low-level tools and libraries oriented to working with RDF data: data serialization formats and

the libraries that process them in a number of popular programming environments (source: D3.1).

Format | C# | JavaScript | Java | PHP | Python | Ruby
RDF/XML | dotNetRDF http://www.dotnetrdf.org/, SemWeb http://razor.occams.info/code/semweb/ | rdflib.js https://github.com/linkeddata/rdflib.js | Apache Jena https://jena.apache.org/documentation/rdf/ | EasyRDF http://www.easyrdf.org/ | RDFLib https://github.com/RDFLib/rdflib | rdf-rdfxml* http://ruby-rdf.github.io/rdf-rdfxml/
N-Triples | - | rdflib.js | Apache Jena | | RDFLib | rdf-turtle* https://github.com/ruby-rdf/rdf-turtle
TriG | - | - | Apache Jena | | RDFLib | -
RDFa | - | - | Apache Jena | | RDFLib | rdf-rdfa* http://ruby-rdf.github.io/rdf-rdfa/
Notation3 | SemWeb | rdflib.js | Apache Jena | | RDFLib | rdf-n3* http://ruby-rdf.github.io/
Turtle | - | rdflib.js | Apache Jena | EasyRDF | RDFLib | rdf-turtle
JSON/JSON-LD | json-ld.net https://github.com/NuGet/json-ld.net | https://github.com/digitalbazaar/jsonld.js | Apache Jena, JSONLD-JAVA https://github.com/jsonld-java/jsonld-java | EasyRDF, php-json-ld https://github.com/digitalbazaar/php-json-ld and JsonLD https://github.com/lanthaler/JsonLD | PyLD https://github.com/digitalbazaar/pyld, Fiona https://pypi.python.org/pypi/Fiona (able to add JSON-LD context to GeoJSON data) | JSON-LD reader/writer* https://github.com/ruby-rdf/json-ld/
rdf/JSON | - | - | Apache Jena | EasyRDF | RDFLib | -
Microdata | - | - | - | - | RDFLib | rdf-microdata* https://github.com/ruby-rdf/rdf-microdata

* The project is a part of Ruby RDF (Linked Data for Ruby)

Table 2 Available components for SmartOpenData

9.2 Spatial data serialization tools

Tools able to serialize data from commonly used GIS formats (e.g. ESRI Shapefile,

PostgreSQL/PostGIS RDBMS, and others).

9.2.1 GeoKnow TripleGeo

TripleGeo is a command-line (CLI) application for transforming geospatial data into triples.

The current version of the TripleGeo utility can access geometries from:

• ESRI shapefiles, a widely used file-based format for storing geospatial features.

• Geographical data stored in GML (Geography Markup Language) and KML (Keyhole

Markup Language).

• INSPIRE-aligned datasets for seven Data Themes (Annex I) in GML format: Addresses,

Administrative Units, Cadastral Parcels, GeographicalNames, Hydrography, Protected

Sites, and Transport Networks (Roads).



• Spatially-enabled DBMSs: Oracle Spatial, PostGIS, MySQL, and IBM DB2 with Spatial

extender.

Output formats: RDF/XML (default), RDF/XML-ABBREV, N-TRIPLES, N3, and TURTLE (TTL)

Concerning geospatial representations, triples can be exported according to:

- the GeoSPARQL standard for several geometric types (including points, linestrings,

and polygons)

- the WGS84 RDF Geoposition vocabulary for point features

- the Virtuoso RDF vocabulary for point features.

Resulting triples are written into a local file, so that they can be readily imported into

a triple store.

A web application, TripleGeo-Service, is also available for running TripleGeo as a web

service for data transformation.

9.3 Non-spatial data serialization tools

For the complete list of available tools for transforming RDBMS data to RDF, see

http://www.w3.org/TR/rdb2rdf-implementations/

9.3.1 D2RQ

The D2RQ Platform is a system for accessing relational databases as virtual, read-only RDF

graphs. It offers RDF-based access to the content of relational databases without having to

replicate it into an RDF store. Using D2RQ you can:

o query a non-RDF database using SPARQL

o access the content of the database as Linked Data over the Web

o create custom dumps of the database in RDF formats for loading into an RDF store

o access information in a non-RDF database using the Apache Jena API

9.3.2 db2triples

db2triples, developed by Antidot, is an example of an implementation of the R2RML and Direct

Mapping specifications.
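The idea behind the Direct Mapping can be sketched as follows. This is a simplified illustration written for this text, not the db2triples API: each row becomes a subject IRI built from the table name and primary key, each column becomes a predicate, and NULL values produce no triple. The base IRI, the sample table and the function name are assumptions, and details of the W3C specification such as percent-encoding, foreign keys and the full datatype mapping are omitted.

```python
import sqlite3

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def direct_map(conn, table, pk, base="http://example.org/db/"):
    """Return N-Triples lines for every row of a table (simplified mapping)."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{pk}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        # Row IRI: <base>Table/pk=value
        subject = f"<{base}{table}/{pk}={record[pk]}>"
        triples.append(f"{subject} {RDF_TYPE} <{base}{table}> .")
        for column, value in record.items():
            if value is None:
                continue  # NULL values do not generate a triple
            if isinstance(value, int):
                obj = f'"{value}"^^<{XSD_INTEGER}>'
            else:
                text = str(value).replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{text}"'
            # Column IRI: <base>Table#column
            triples.append(f"{subject} <{base}{table}#{column}> {obj} .")
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Site (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
conn.execute("INSERT INTO Site VALUES (1, 'Devin', NULL)")
triples = direct_map(conn, "Site", "id")
print("\n".join(triples))
```

R2RML serves the same purpose but lets the publisher choose the target vocabulary instead of deriving IRIs from table and column names.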

9.3.3 TripleStores and SPARQL/GeoSPARQL endpoints

There are (almost) no complete implementations of GeoSPARQL, but there are a few partial or

vendor implementations. Currently the following implementations exist:

9.3.4 Parliament

Parliament has an almost complete implementation of GeoSPARQL by using JENA and a

modified ARQ query processor.



9.3.5 Strabon

Strabon is a semantic spatiotemporal RDF store. You can use it to store linked geospatial

data that changes over time and pose queries using two popular extensions of SPARQL.

Strabon supports spatial datatypes enabling the serialization of geometric objects in OGC

standards WKT and GML. It also offers spatial and temporal selections, spatial and temporal

joins, a rich set of spatial

functions similar to those offered by geospatial relational database systems and support for

multiple Coordinate Reference Systems. Strabon can be used to model temporal domains

and concepts such as events, facts that change over time etc. through its support for valid

time of triples, and a rich set of temporal functions. Strabon is built by extending the well-

known RDF store Sesame and extends Sesame’s components to manage thematic, spatial

and temporal data that is stored in the backend RDBMS.

Strabon also supports the querying of static geospatial data expressed in RDF using a subset

of the recent OGC standard GeoSPARQL which consists of the core, geometry extension and

geometry topology extension. The implementation of the other components of GeoSPARQL

is underway.
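As an illustration of the kind of query that the GeoSPARQL geometry topology extension enables, the following Python sketch builds a query string selecting features whose geometry lies within a bounding box. The helper names and the example coordinates are assumptions made for illustration. The prefixes, `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral` and `geof:sfWithin` come from the GeoSPARQL standard. Whether a given endpoint accepts the query depends on how much of GeoSPARQL it implements.

```python
GEOSPARQL_PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)


def bbox_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon for a longitude/latitude bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def features_within(bbox):
    """Build a query selecting features whose geometry lies within `bbox`."""
    wkt = bbox_wkt(*bbox)
    return (
        GEOSPARQL_PREFIXES
        + "SELECT ?feature ?wkt WHERE {\n"
        + "  ?feature geo:hasGeometry ?geom .\n"
        + "  ?geom geo:asWKT ?wkt .\n"
        + f'  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))\n'
        + "}"
    )


# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat).
query = features_within((12.0, 48.5, 19.0, 51.1))
print(query)
```

The query string can then be submitted to any SPARQL endpoint that supports these GeoSPARQL terms.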

9.3.6 OpenSahara uSeekM IndexingSail Sesame Sail plugin

uSeekM IndexingSail uses a PostGIS installation as a backend to deliver GeoSPARQL. It provides an almost complete implementation of GeoSPARQL along with some functions under its own vendor prefixes.

9.3.7 OpenLink Virtuoso Universal Server

Since version 7.1, OpenLink Virtuoso also contains some geospatial functions and reasoning based on vendor prefixes. Although these are not compatible with GeoSPARQL, they are usable for geospatial applications.

9.3.8 Ontotext OWLIM

OWLIM has a partial geospatial implementation based on vendor prefixes. It is also not compatible with GeoSPARQL, but is sufficient for basic usage. Ontotext states that OWLIM will support GeoSPARQL in the near future with version 5.7 (currently 5.4). At the time of writing, however, it seems that they are two versions behind schedule.

9.3.9 SparQLed

SparQLed, the SPARQL Editor, is an open source project. It is also one of the components of the commercial Sindice Suite, which helps large enterprises build private linked data clouds. It is designed to give users all the help they need to write SPARQL queries to extract information from interconnected datasets.

With SQL, the advantage lies in having a schema which users can look at and understand

how to write a query. RDF, on the other hand, has the advantage of providing great power

and freedom, because information in RDF can be interconnected freely. The comprehensive

SparQLed environment provides fully assisted SPARQL query editing that includes full



syntactic assistance with syntax highlighting, auto completion and data-driven assistance.

Analytics technology is behind the suggestions it provides on writing queries.

9.3.10 SIRENDB

SIREn is a schemaless structured document search system that combines free-text search with structured search over arbitrary JSON data. The system is an extension of Solr, a leading search engine technology. SirenDB is the SindiceTech Solr plugin for rich structured data and arguably the most advanced structured data search engine available.

9.3.11 Sefarad-Faceted Search

Sefarad is a web application whose purpose is to provide a semantic front end to Linked Open Data (LOD) datasets. Sefarad allows users to configure a dashboard to visualize different perspectives of such datasets. Two predefined screens are available: (i) search, where semantic faceted search can be carried out and the results are shown in the widgets, and (ii) control panel, where statistics such as Key Performance Indicators (KPIs) about the dataset are visualized.

9.4 High-level technical specification – generic level

The high-level technical specification is extended here with concrete common tools to be used in all pilots. This specification serves as an underlying framework indicating the common and specific tools used across the pilots.



Figure 9 High-level technical specification – generic level

The set of evaluable components for each layer is shown in the following figure.



Figure 10 Component diagram of SmartOpenData

9.5 Examples for architecture implementation of the pilots

Pilot specifications are prepared on the basis of the generic models. Work on these concrete pilot specifications will continue during WP5. Specifications are currently ready for the Czech, Slovak, Irish and Spanish pilots and for the tourist pilot.



9.5.1 Czech pilot

Figure 11 Czech Pilot Architecture

9.5.2 Slovakian pilot



Figure 12 Slovak pilot architecture

9.5.2 Irish pilot

The Irish pilot will be a modification of the generic scheme. It will require the following from the SmartOpenData platform (as described in D2.2, in alphabetical order):

1. Scalable crowdsourced/VGI real-time data collection with an Open API.

2. Scalable Excel/CSV to RDF transformation tool(s), preferably as a CKAN plugin, in line with W3C “CSV on the web” WG (see D2.1).

3. Scalable general GI to LOD transformation, harmonisation and semantic indexing infrastructure service, with persistent URIs, preferably using the INSPIRE Registry (see D2.1).

4. Scalable INSPIRE GI schema to LOD transformation and harmonisation service, with persistent URIs, in particular INSPIRE Annex I theme “Protected Sites” (similar to GeoKnow D2.7.1).

5. Scalable RDF Triple Storage service for the LOD (such as Virtuoso, see D2.1).

6. Visualisation framework (of GI and non-GI components).
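The CSV to RDF requirement above can be sketched in Python as follows. This is a minimal illustration, not a CKAN plugin and not an implementation of the W3C "CSV on the web" recommendations. The base URI, the column-to-predicate mapping and the sample rows are hypothetical. One URI is minted per row from a key column, so the same input always yields the same identifier.

```python
import csv
import io
import re

# Hypothetical base URI and column-to-predicate mapping.
BASE = "http://example.org/id/site/"
PREDICATES = {
    "name": "http://purl.org/dc/terms/title",
    "county": "http://example.org/def/county",
}


def slug(text):
    """Lower-case `text` and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def csv_to_ntriples(csv_text, key_column):
    """Convert CSV rows to N-Triples, minting one URI per row from `key_column`."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{slug(row[key_column])}>"
        for column, predicate in PREDICATES.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty or missing cells
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{subject} <{predicate}> "{escaped}" .')
    return lines


# Hypothetical sample input with one empty cell.
sample = "name,county\nBurren National Park,Clare\nKillarney National Park,\n"
triples = csv_to_ntriples(sample, "name")
print("\n".join(triples))
```

Deriving the URI from a name is only one option. A registry-backed identifier, as suggested by the reference to the INSPIRE Registry, would be more robust against renamed records.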



Figure 13 Irish pilot

9.5.3 Spanish pilot

o Agroforestry Management

Figure 14 Spanish pilot: Agroforestry Management



o Water and Drinking water Management

Figure 15 Portuguese-Spanish pilot: Water Management

9.5.4 Italian pilot

Figure 16 Italian pilot



9.5.5 Tourist cross border pilot

Figure 17 Tourist Linked Open Data

9.6 Relation with semantic indexing

This part is developed in WP4 and will offer search across all SmartOpenData pilots. It will include:

• Distributed Semantic Indexing is composed of:

– ETL/Indexing Pipeline

– Entity Extraction Pipeline

– Semantic Server

– Siren DB

• Distributed Data Access is composed of:

– SPARQL-ED

– Sefarad-Faceted Search

• Administration and Notification is composed of:

– Notification

– Administration

The basic scheme and its linkage with the pilot implementations are shown in the following figure.



Figure 17 WP4 implementation and linkage with pilots



10 Conclusions and Recommendations

This deliverable defines a reference infrastructure model and high-level technical specification for SmartOpenData, including its main components and connection points to other tools and systems. The RM-ODP (Reference Model of Open Distributed Processing) methodology was used to define the SmartOpenData reference architecture that meets the technical and user requirements established throughout T2.1 and T2.2. These address interoperability and multilingualism aspects, the metrics engine and interfaces. The reference architecture both defines platform-neutral components and provides suggestions for concrete implementation. The aim was not to design a monolithic solution for all Linked Open Data, but to define the basic architecture components of the Linked Open Data chain and potential solutions for solving the concrete problems of the five pilots in WP5.

10.1 Linkage with main technical components of SmartOpenData

Additional requirements came from analysis of cross links between the individual pilots and

also from project partners, SMEs and potential users of Linked Open Data.

The RM-ODP divides all processes of architecture design into five generic and complementary “viewpoints” (Enterprise, Information, Computational, Engineering and Technology) of the system and its environment. The conclusions of each viewpoint for SmartOpenData were as follows:

1. The Enterprise viewpoint focuses on the analysis of pilot scenarios and the definition of a limited number of generic use cases. These are implemented to support basic functionalities required by several scenarios and to support the process of data and metadata harmonisation based on outputs from WP3. This viewpoint concluded that the SmartOpenData LOD functionalities required by the pilots and cross-functional themes are as follows:

Required generic LOD functionalities; Pilots: ES, IE, IT, CZ, SK; Cross functional: Business, Tourism

Transformation (Relational -> RDF) X X X X X

Transformation (GML -> RDF) X

Transformation (GMD -> RDF) X

Storage X X X

Search X X X X X

Federated querying X X X X

Visualization of LOD with other data X X X X X

Visualization of LOD using conventional GI tools X X X X X

Upgrading X

Publishing data from existing SQL databases as RDF X X X

Publishing data from existing CSV databases as RDF X

Table 3 SmartOpenData LOD functionalities



2. The Information viewpoint describes the way that SmartOpenData stores, collects,

updates, manipulates, manages, and distributes information. The major issues which

were analysed and assessed are:

• Basic data types used in SmartOpenData

• Ontologies and vocabularies

• Registers and Registries

• Tasks related to SmartOpenData data

• Information structure and content with a clear focus on the metadata and data models.

Alignment of INSPIRE metadata with the DCAT-AP is under preparation by the Joint

Research Centre of the European Commission, in the framework of Action 1.17 of the EU

ISA Programme. The profile uses the INSPIRE registry58

together with DCAT defined

classes for the mapping. This profile is very important because it provides a bridge between INSPIRE and other European portals. The intention is to implement this profile and test its usability and stability within the scope of the SmartOpenData project.

3. The Computational viewpoint focuses on generic components which could be reused in several scenarios and which will form basic parts of the infrastructure. Basic components were identified in various areas of the SmartOpenData system, including:

• Local data

• Server side

o Publishing data on the Web as they are.

o Relational storage

o Transformation (GML -> RDF, relational data -> RDF, metadata (GMD -> RDF))

o Conversion to RDF (incl. support for the SmOD data model)

o Publishing data from existing SQL databases as RDF

o RDF Storage

• Client side

4. The Engineering viewpoint focused on the mechanisms and functions required to

support distributed interactions among objects in the system. It defines the

SmartOpenData conceptual Architecture to address the distribution of processing

performed by the system to manage the information and provide the functionalities.

5. The Technology viewpoint describes the technological specifications for the

physical deployment of the system implementation, including:

• the choice of technology in the system;

• how specifications are implemented;

• specification of relevant technologies;

• support for testing.

Many potential tools and solutions to address the requirements of the SmartOpenData infrastructure were identified for reuse from projects such as HABITATS, Plan4Business, GeoKnow, LOD2 and SemaGrow, along with use of the DCAT-AP and CKAN metadata profiles. The HABITATS RL and Plan4Business solutions are from finished projects and represent a traditional approach to geographic information, but will be integrated with reuse of the LOD tools from the other projects to provide implementation of the SmartOpenData system.

58 http://inspire.ec.europa.eu/registry

The Technology Viewpoint will be analyzed in more depth in WP3 and WP4. The current

version only gives initial ideas for implementation of some components of the architecture.

This list is not complete and will be updated during the second phase of design.
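To make the alignment of INSPIRE metadata with DCAT-AP discussed under the Information viewpoint more concrete, the following Python sketch serialises a metadata record as a `dcat:Dataset` in Turtle. The three-element mapping, the record keys, the dataset URI and the sample values are assumptions chosen for illustration and do not reproduce the profile under preparation. Only the `dcat:` and `dct:` namespace URIs and the terms `dcat:Dataset`, `dct:title`, `dct:description` and `dcat:keyword` are taken from the published vocabularies.

```python
# Assumed, illustrative mapping from INSPIRE metadata elements to DCAT / DCT terms.
INSPIRE_TO_DCAT = {
    "resource_title": "dct:title",
    "resource_abstract": "dct:description",
    "keyword": "dcat:keyword",
}

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def quote(text):
    """Return `text` as a double-quoted Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dcat_turtle(dataset_uri, record):
    """Serialise the mapped elements of `record` as one dcat:Dataset."""
    statements = ["a dcat:Dataset"]
    for element, prop in INSPIRE_TO_DCAT.items():
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]  # treat a single value like a one-item list
        if values:
            objects = ", ".join(quote(v) for v in values)
            statements.append(f"{prop} {objects}")
    body = " ;\n    ".join(statements)
    return f"{PREFIXES}\n<{dataset_uri}>\n    {body} .\n"


# Hypothetical metadata record.
record = {
    "resource_title": "Protected Sites",
    "resource_abstract": "Boundaries of protected sites.",
    "keyword": ["protected sites", "biodiversity"],
}
turtle = to_dcat_turtle("http://example.org/dataset/protected-sites", record)
print(turtle)
```

Keeping the mapping in a single dictionary means it can be replaced by the official profile's mapping once that is available, without changing the serialisation code.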



Annex A:

AJAX - Asynchronous JavaScript and XML

API - Application Programming Interface

CC - Creative Commons

CMS - Content Management System

Copernicus - the European Earth Observation Programme, used to be known as GMES

CRS - Coordinate Reference System

CSS - Cascading Style Sheets

CSW - Catalogue Service for the Web

DBMS - DataBase Management System

DCAT-AP - Data Catalogue vocabulary Application Profile

DOM - Document Object Model

DOW - SmartOpenData Description of Work, Annex I to the Grant Agreement.

EC – European Commission

ECMS - Enterprise Content Management System

EU – European Union

FI - Future Internet

FLOSS - Free/Libre and Open Source Software

GA - SmartOpenData Grant Agreement.

GeoJSON - Geographic JavaScript Object Notation

GEOSS - Global Earth Observation System of Systems

GI - Geospatial/Geographic Information

GIS - Geographic/Geospatial Information Systems

GLOD - Geospatial Linked Open Data.

GMES - Global Monitoring for Environment and Security – now known as Copernicus

GML – Geography Markup Language

GPS - Global Positioning System

HTML - Hypertext Markup Language

IDE - Integrated Development Environment

INSPIRE – INfrastructure for SPatial InfoRmation in Europe

IOT - Internet of Things

IPR - Intellectual Property Rights



ISO – International Organisation for Standardisation

ISO 19115 – ISO 19115:2003-Geographic Information Metadata

ISO 19118 – ISO 19118 Geographic Information-Encoding

ISO 19139 – ISO/TS 19139-Geographic Information-Metadata -XML schema implementation

JSON - JavaScript Object Notation

KML – Keyhole Markup Language

LBS - Location-based Services

LOD - Linked Open Data

LOGD – Linked Open Government Data

LR - Language Resources

MDA - Model Driven Architecture

MDR - Metadata Registry of the Publications Office of the EU.

MT - Machine Translation

MVC - Model View Controller

MVP - Model View Presenter

NGO - Non-Governmental Organisation

NLP – Natural Language Processing

ODC - Open Data Commons

OGC - Open Geospatial Consortium

OSM - OpenStreetMap

OWL - Web Ontology Language

PM - Person Month

PPP - Public-Private-Partnership

QoS - Quality of Service

RDBMS - Relational DataBase Management System

RDF - Resource Description Framework

RDFS - Resource Description Framework Schema

REST - Representational State Transfer

RL - Reference Laboratory

RM-ODP - Reference Model of Open Distributed Processing

ROI - Return on Investment

RSS - Rich Site Summary (originally RDF Site Summary)

SaaS - Software as a Service

SDI – Spatial Data Infrastructure



SEIS – Shared Environmental Information System

SGML - Standard Generalized Markup Language

SKOS - W3C Simple Knowledge Organization System

SLA – Service Level Agreement

SME - Small to Medium Enterprise59

SmOD - SmartOpenData

SMT - Statistical MT

SOA – Service Oriented Architecture

SOAP - Simple Object Access Protocol

SPARQL - SPARQL Protocol and RDF Query Language

SQL - Structured Query Language

SQL/MM - SQL Multimedia and Application Packages (as defined by ISO 13249)

SRID - Spatial Reference system IDentifier

SRS - Spatial Reference System

SSRI - Social Spaces for Research and Innovation

SVG - Scalable Vector Graphics

UI - User Interface

UML – Unified Modelling Language

URI – Uniform Resource Identifier

URL - Uniform Resource Locator

URM – Uniform Resource Management

W3C - World Wide Web Consortium

WCS – Web Coverage Service

WFS – Web Feature Service

WMC – Web Map Context

WMS – Web Map Service

WPS – Web Processing Service

WWW - World Wide Web

XHTML - eXtensible HyperText Markup Language

XML - eXtensible Markup Language

59 Defined by the European Commission at http://ec.europa.eu/enterprise/policies/sme/facts-figures-analysis/sme-definition/
