Making the Web of Data Available Via Web Feature...

21
Making the Web of Data Available Via Web Feature Services Jim Jones, Werner Kuhn, Carsten Keßler and Simon Scheider Abstract Interoperability is the main challenge on the way to efficiently find and access spatial data on the web. Significant contributions regarding interoperability have been made by the Open Geospatial Consortium (OGC), where web service standards to publish and download spatial data have been established. The OGCs GeoSPARQL specification targets spatial data on the Web as Linked Open Data (LOD) by providing a comprehensive vocabulary for annotation and querying. While OGC web service standards are widely implemented in Geographic Information Systems (GIS) and offer a seamless service infrastructure, the LOD approach offers structured techniques to interlink and semantically describe spatial information. It is currently not possible to use LOD as a data source for OGC web services. In this chapter we make a suggestion for technically linking OGC web services and LOD as a data source, and we explore and discuss its benefits. We describe and test an adapter that enables access to geographic LOD datasets from within OGC Web Feature Service (WFS), enabling most current GIS to access the Web of Data. We discuss performance tests by comparing the proposed adapter to a reference WFS implementation. J. Jones · S. Scheider Institute for Geoinformatics, University of Münster, Münster, Germany e-mail: [email protected] S. Scheider e-mail: [email protected] W. Kuhn Center for Spatial Studies University of California, Santa Barbara, USA e-mail: [email protected] C. Keßler (B ) CARSI, Department of Geography, Hunter College, City University of New York, New York, USA e-mail: [email protected] J. Huerta et al. (eds.), Connecting a Digital Europe Through Location and Place, 341 Lecture Notes in Geoinformation and Cartography, DOI: 10.1007/978-3-319-03611-3_20, © Springer International Publishing Switzerland 2014

Transcript of Making the Web of Data Available Via Web Feature...

Making the Web of Data Available Via WebFeature Services

Jim Jones, Werner Kuhn, Carsten Keßler and Simon Scheider

Abstract Interoperability is the main challenge on the way to efficiently find andaccess spatial data on the web. Significant contributions regarding interoperabilityhave been made by the Open Geospatial Consortium (OGC), where web servicestandards to publish and download spatial data have been established. The OGCsGeoSPARQL specification targets spatial data on the Web as Linked Open Data(LOD) by providing a comprehensive vocabulary for annotation and querying.WhileOGC web service standards are widely implemented in Geographic InformationSystems (GIS) and offer a seamless service infrastructure, the LOD approach offersstructured techniques to interlink and semantically describe spatial information. Itis currently not possible to use LOD as a data source for OGC web services. Inthis chapter we make a suggestion for technically linking OGC web services andLOD as a data source, and we explore and discuss its benefits. We describe and testan adapter that enables access to geographic LOD datasets from within OGC WebFeature Service (WFS), enabling most current GIS to access the Web of Data. Wediscuss performance tests by comparing the proposed adapter to a reference WFSimplementation.

J. Jones · S. ScheiderInstitute for Geoinformatics, University of Münster, Münster, Germanye-mail: [email protected]

S. Scheidere-mail: [email protected]

W. KuhnCenter for Spatial Studies University of California, Santa Barbara, USAe-mail: [email protected]

C. Keßler (B)

CARSI, Department of Geography, Hunter College, City University of New York,New York, USAe-mail: [email protected]

J. Huerta et al. (eds.), Connecting a Digital Europe Through Location and Place, 341Lecture Notes in Geoinformation and Cartography, DOI: 10.1007/978-3-319-03611-3_20,© Springer International Publishing Switzerland 2014

342 J. Jones et al.

1 Introduction

Linked Open Data (LOD) is an approach for creating typed links between data fromdifferent sources in the Web. These typed links are based on objects, which havetheir meaning explicitly defined by terms in shared LOD vocabularies (Heath andBizer 2011). With the advent of LOD vocabularies, these objects and their linkscan be built in a machine-readable way, enabling computers to perform queries andreasoning on datasets. The LOD approach is based on the Linked Data Principles,1

which define essential steps for publishing data in the Web and for making it part ofa single global dataset (Bizer et al. 2009). These principles help to enable interoper-ability and discoverability of datasets, creating a rich network of information. Dueto these characteristics, LOD has become a key solution when it comes to efficientlypublishing data on the Web.2

The LOD cloud is growing very rapidly, and some of its most important centralhubs contain vast amounts of geographic information. The DBPedia initiative,3 forexample, systematically extracts information from Wikipedia,4 publishes it as LODand links it to other datasets (Auer et al. 2007). Part of this information is a geo-coordinate for every localizable phenomenon described in Wikipedia. Successfulefforts on implementing geographic LOD have also been carried out by governmentagencies, such as the Ordnance Survey of Great Britain,5 which contributes signifi-cantly to the growth of the Web of geographic LOD based datasets (Goodwin et al.2008).

Despite the benefits and efforts around LOD and also its inarguably increas-ing acceptance, the specific requirements of publishing geographic information onthe Web have been addressed by standardized web services so far. An example isthe Web Feature Service (WFS), a standard for providing geographic features on theWeb, widely implemented in most Geographic Information Systems (GIS), but notsupporting LOD.Despite their difference, both techniques, LODand geographicwebservices, have their specific benefits and shortcomings for publishing and accessinggeographic information on the Web. It has been argued before that combining bothworlds has a great potential for boosting accessibility and interoperability of geo-graphic information (Janowicz et al. 2010). For example, making Linked Open Dataavailable in a geo service standard will turn all geo-service compatible GIS tools,whether they consist of simple desktop clients or distributed service implementa-tions, into powerful geographic analysis tools of the LOD cloud. This combines thestrengths of spatial datamanipulation in aGISwith the potential of accessing datasetsthat are interlinked in the Web of Data.

This chapter addresses one of the open challenges for reaching this goal. Wepropose a way to efficiently access geographic LOD datasets via WFS. The main

1 http://www.w3.org/DesignIssues/LinkedData.html2 http://lod-cloud.net3 http://dbpedia.org/About4 http://www.wikipedia.org5 http://www.ordnancesurvey.co.uk

Making the Web of Data Available 343

idea is to use current Geographic Information Service standards and re-implementthem in order to consume geographic LOD datasets published on the Web. Theremainder of the chapter is structured as follows: Sect. 2 gives an overview of LinkedGeographic Data, showing how it is described in different vocabularies. Section3describes the Web Feature Service standard, and explores its capabilities through itsstandard operations. Section4 outlines the requirements and introduces our solution.Section5 evaluates the performance of our implementation against theWFS referenceimplementation. Section6 reviews related work, followed by conclusions and anoutlook on future work in Sect. 7.

2 Linked Geographic Data

LOD datasets are described using the Resource Description Framework6 (RDF),specified by the World Wide Web Consortium (Brickley and Guha 2004). RDF isa technology for describing resources and their interrelations in subject-predicate-object form. These so-called RDF Triples are commonly stored using an optimizedstorage and retrieval technology called Triple Store. Most Triple Stores organizeRDF Triples in sub-sets called Named Graphs.7 Named Graphs aggregate data, sothat, for example, RDF Triples from distinct sources can be easily identified.

There have been several efforts to use LOD with geographic data. Suggestionsinclude vocabularies for describing geographic data, together with storage and querytechniques (Battle and Kolas 2011). Among the existing vocabularies for describinggeographic LOD datasets is the Basic Geo Vocabulary8 (WGS84 lat/long), whichprovides a namespace for describing geographic entities by coordinates pairs. Thisvocabulary is thus limited to points using WGS84 as a geodetic reference datum.Listing 1 shows an example using the WGS84 Vocabulary.

Listing 1 An example of a feature described with the WGS84 lat/long Vocabulary.

@prefix wgs84_pos: <www.w3.org /2003/01/ geo/wgs84_pos#>.@prefix my: <http :// ifgi.lod4wfs.de/resource/>.@prefix gn: <http ://www.geonames.org/ontology#>.

my:GEOMETRY_1 a gn:Feature ;wgs84_pos:lat "1.71389" ;wgs84_pos:long "69.3857" .

6 http://www.w3.org/RDF/7 http://www.w3.org/TR/rdf11-concepts/#section-dataset8 http://www.w3.org/2003/01/geo/

344 J. Jones et al.

An alternative to describe geographic LOD is the GeoSPARQL Vocabulary,9

defined by the Open Geospatial Consortium10 (OGC). It offers not only classes andproperties for describing geographic LOD, but also spatial relations for querying geo-graphic datasets (e.g. intersects, touches, overlaps, etc.). Listing 2 shows an exampleof a geographic LOD dataset using the GeoSPARQLVocabulary, with the same pointas in Listing 1. Geometries are defined by the class Geometry and the coordinatescan be encoded in an RDF literal of typeWell Know Text (WKT) using a single RDFproperty, namely asWKT.

Listing 2 An example of a feature described with the GeoSPARQL Vocabulary.

@prefix geo: <http ://www.opengis.net/ont/geosparql /1.0# >.@prefix my: <http :// ifgi.lod4wfs.de/resource/>.@prefix sf: <http ://www.opengis.net/ont/sf#>.

my:GEOMETRY_1 a geo:Geometry ;geo:asWKT "POINT ( -69.3857 1.71389)"^^ sf:wktLiteral .

Due to the use of WKT literals, which correspond to OGC simple features(Herring 2011), GeoSPARQL enables an efficient way to describe many differentkinds of geometry (e.g. polygons, lines, points, multipoint, etc.). Another importantaspect of the GeoSPARQL vocabulary is the flexibility regarding coordinate refer-ence systems. The latter are encoded as a literal type. This enables the use of manydifferent coordinate reference systems by adding their corresponding URI to theWKT literal (see Listing 3). If no specific reference system is provided in the WKTliteral, the WGS84 Longitude-Latitude11 reference system is assumed by default.

Listing 3 An example of a feature described with the GeoSPARQL Vocabulary stating a specificCoordinate Reference System.

@prefix geo: <http ://www.opengis.net/ont/geosparql /1.0# >.@prefix my: <http :// ifgi.lod4wfs.de/resource/>.@prefix sf: <http ://www.opengis.net/ont/sf#>.

my:GEOMETRY_1 a geo:Geometry ;geo:asWKT "<http ://www.opengis.net/def/crs/EPSG /0/4326 >POINT ( -69.3857 1.71389)"^^ sf:wktLiteral .

GeoSPARQL also offers the possibility to use the Geography Markup Language(GML) to encode geometries. In this case, the data type (GMLLiteral), property(asGML) and theURL for the geometry type (e.g.http://www.opengis.net/def/gml/ Polygon) have to be changed accordingly.

9 http://www.opengis.net/doc/IS/geosparql/1.010 http://www.opengeospatial.org/11 http://www.opengis.net/def/crs/OGC/1.3/CRS84

Making the Web of Data Available 345

Fig. 1 Web feature service standard operations overview

3 Web Feature Service

The Web Feature Service12 (WFS) is a platform-independent web service standardfor vector-based geographic feature requests on the Web, defined by the OGC.A feature contains one or many geometries, optionally with attribute values. Itscommunication interface is established by HTTP requests encoded as key-valuepairs, to which the server responds with XML documents. The standard operationsof WFS are based on the GetCapabilities, DescribeFeatureType andGetFeature requests, as shown in Fig. 1.

3.1 GetCapabilities Request

The GetCapabilities request lists the WFS versions that the server can workwith, the geometries available on the WFS server, together with their metadata (e.g.title, maintainers, abstract, provider’s contact information, spatial reference system,etc.). It also informs the client which encodings are available for delivering therequested geometries (e.g. GML, GML2, JSON, CSV, etc.). Finally, the XML-basedCapabilities Document also indicates which spatial functions are supported for eachfeature type. Listing 4 shows an example of how a GetCapabilities request can besent to a WFS server.

Listing 4 GetCapabilities Request Example.

http ://[ SERVER_ADDRESS]/wfs?SERVICE=WFS&REQUEST=GetCapabilities

3.2 DescribeFeatureType Request

As shown in Fig. 1, the next step after receiving the Capabilities Document from theWFS server is to perform the DescribeFeatureType request. This request, as

12 http://www.opengeospatial.org/standards/wfs

346 J. Jones et al.

shown in Listing 5, enables the client to select a feature—previously listed in theCapabilities Document—and specify in which WFS encoding version it should bedelivered. The response of this request is an XML document containing all fields ofthe requested features attribute table and their data types.

Listing 5 DescribeFeatureType Request Example.

http ://[ SERVER_ADDRESS]/wfs?SERVICE=WFS&VERSION =1.0.0&REQUEST=DescribeFeatureType&TYPENAME=FEATURE_ID&SRSNAME=EPSG :4326

3.3 GetFeature Request

The last step to obtain features from a WFS is to perform the GetFeature operation.In this operation the client asks for a feature in a specific WFS encoding version,as shown in Listing 6. Finally, the client receives an XML document containing thefeature and its attribute table.

Listing 6 GetFeature Request Example.

http ://[ SERVER_ADDRESS]/wfs?SERVICE=WFS&VERSION =1.0.0&REQUEST=GetFeature&TYPENAME=FEATURE_ID&SRSNAME=EPSG :4326

Although the DescribeFeatureType and GetFeature requests syntacti-cally only differ in the REQUEST parameter, they play different roles in the WebFeature Service standard, namely request information about a certain feature andretrieve the feature itself, respectively.

Another implementation of WFS—the Web Feature Service Transaction(WFS-T)—allows creating, deleting and updating features, but these functionalitiesare currently not addressed in this work. The WFS characteristics of: a) providinga platform-independent layer for querying geographic features requests on the Web,b) the capability of attaching attributes to the geographic features, and c) being astandard widely used as a vector data source, make WFS one of the most suitablestandards for this work.

4 Linked Open Data for Web Feature Services(LOD4WFS Adapter)

Linked Open Data offers a structured approach to describe and interlink raw dataon the Web, and the Web Feature Service standard offers a standardized andwidely used way to deliver geographic features through web services. The union ofthese two technologies could increase the accessibility of geographic LOD datasets

Making the Web of Data Available 347

Fig. 2 LOD4WFS adapter overview

significantly. However, there is currently no common way for them to communicate.Filling this gap between LOD and WFS will allow current GIS to access geographicLOD datasets, thus enabling users to exploit the interactive tools of GIS to visualizeand analyse them. Having LOD as a data source can also open new functionalitiesfor WFS, namely the possibility of integrating different data sources, which is cur-rently not supported by conventional WFS implementations that usually host theirdata sources in geographic databases or Shapefiles. This would enable, for instance,having access to the municipalities of a certain country from server A and havingits river basins from server B in a single request. From this scenario emerged theidea of creating an adapter to enable access from WFS to LOD. Figure2 gives anoverview of how such anLOD4WFS Adapter13 would enable access fromGIS clientsto geographic LOD datasets via WFS.

The adapter implements a service, compliant to the OGC WFS specification,which listens to WFS requests and converts these requests into the SPARQL QueryLanguage for RDF.14 After the SPARQLQuery is processed, the LOD4WFSAdapterreceives the RDF15 result set from the Triple Store, encodes it as a WFS XML doc-ument, and returns it to the client (e.g. a GIS). This approach enables current GIS totransparently have access to geographic LOD datasets, using their implementation ofWFS, without any adaptation whatsoever being necessary. In order to reach a highernumber ofGIS, the currentlymost common implementationofWFShas been adoptedfor the LOD4WFS Adapter, namely the OGC Web Feature Service ImplementationSpecification 1.0.0 (Vretanos 2002). The LOD4WFS Adapter enables access to geo-graphic LODdatasets in two different ways, whichwewill call Standard Data Accessand Federated Data Accesses in the following.

4.1 Standard Data Access

The Standard Data Access feature was designed in order to enable access to allgeographic LOD datasets contained in a triple store. This feature basically worksas an explorer, looking for geographic LOD datasets from a certain Triple Storeand making them available via WFS. Due to the possibility of describing different

13 https://github.com/jimjonesbr/lod4wfs14 http://www.w3.org/TR/rdf-sparql-query/15 http://www.w3.org/RDF/

348 J. Jones et al.

types of geometries (polygons, lines, points) andmany different coordinate referencesystems, which are characteristic requirements of aWFS, we chose the GeoSPARQLVocabulary as an input requirement for the Standard Data Access feature. Listing7 shows how geometries and their related attributes are expected to be structured.The geometries are encoded as WKT literals and the attributes of features are linkedto the instance of the geo:Geometry class via RDF Schema16 and Dublin CoreMetadata Element Set17 vocabularies. However, there are no constraints on whichvocabularies or properties may be used for describing attributes.

Listing 7 LOD dataset example: Turtle RDF encoding of a dataset, including ID and description.

@prefix geo: <http ://www.opengis.net/ont/geosparql /1.0# >.@prefix my: <http :// ifgi.lod4wfs.de/resource/>.@prefix sf: <http ://www.opengis.net/ont/sf#>.@prefix dc: <http :// purl.org/dc/elements /1.1/ >.@prefix rdf: <http ://www.w3.org /1999/02/22 -rdf -syntax -ns#>.@prefix xsd: <http ://www.w3.org /2001/ XMLSchema#>.

my:FEATURE_RECIFE a geo:Feature ;rdf:ID "2611606"^^ xsd:integer ;dc:description "Recife "^^xsd:string ;

geo:hasGeometry my:GEOMETRY_REFICE .

my:GEOMETRY_RECIFE a geo:Geometry ;geo:asWKT "<http ://www.opengis.net/def/crs/EPSG/ 0/4326 > POLYGON ((

-35.0148559599999984 -8.0564907399999992 ,-34.9939074400000010 -8.0493884799999993 ,...-35.0148559599999984 -8.0564907399999992)) "^^sf:wktLiteral .

4.1.1 Required Metadata

In order to make the datasets discoverable via the Standard Data Access feature,additional metadata must be added to the datasets. We make use of Named Graphsfor this purpose. Every Named Graph in the LOD data source must contain onlyobjects of the same feature type. This approach facilitates the discoverability ofFeatures, speeding up queries that list the Features available in the triple store. Incase a Named Graph contains multiple feature types, the features can be split intodifferent layers using the Federated Data Access (see Sect. 4.2). Finally, each NamedGraph needs to be described by certain RDF properties, namely abstract, titleand subject from the Dublin Core Terms Vocabulary.18 This information helpsthe adapter to classify all Features available in a Triple Store, so that they can befurther on discovered by the WFS client through the WFS Capabilities Document(see Listing 8). Alternatively, the LOD4WFS Adapter could also use a query basedon other RDF types to construct the Capabilities Document.

16 http://www.w3.org/TR/rdf-schema/17 http://dublincore.org/documents/dces/18 http://dublincore.org/documents/dcmi-terms/

Making the Web of Data Available 349

Listing 8 Named Graph Example.

@prefix dct: <http :// purl.org/dc/terms/>.@prefix xsd: <http ://www.w3.org /2001/ XMLSchema#>.

<http :// ifgi.lod4wfs.de/graph/municipalities > dct:title "BrazilianMunicipalities "^^xsd:string ;dct:abstract "Municipalities of the Brazilian FederalStates ."^^ xsd:string ;dct:subject "municipalities boundaries "^^xsd:string .

It is important to emphasize that these RDF properties are used simply as a proofof concept for the proposed adapter, therefore other vocabularies and properties couldbe used instead.

4.2 Federated Data Access

The Federated Data Access feature offers the possibility of accessing geographicLOD datasets based on predefined SPARQL Queries. Differently than the StandardData Access, the Federated Data Access feature is able to execute SPARQL Queriesto multiple SPARQL Endpoints, thus enabling WFS features to be composed of datafromdifferent sources.As a proof of concept ofwhat can be achieved, Listing 9 showsan example of a federated query, combining data fromDBpedia andOrdnance Surveyof Great Britain. The SPARQL Query is executed against the Ordnance Survey’sSPARQL Endpoint,19 retrieving the GSS Code20 and geographic coordinates fromdistricts of Great Britain—the coordinates are provided by the Ordnance Surveyusing the WGS84 lat/long Vocabulary, but this example converts them to WKTliterals using the function CONCAT. Afterwards, the retrieved entries are filtered bymatching the districts’ names with DBpedia entries from the east of England, whichare written in English language. The result of this SPARQL Query can be furtheron listed as a single WFS feature via the LOD4WFS Adapter, thereby providinga level of interoperability between datasets that is currently unachievable by anyimplementation of WFS, whether using Shapefiles or geographic databases.

Listing 9 Federated Data Access – SPARQL Query Example.

PREFIX xsd: <http :// www.w3.org /2001/ XMLSchema#>PREFIX rdfs: <http ://www.w3.org /2000/01/rdf -schema#>PREFIX dbpo: <http :// dbpedia.org/ontology/>PREFIX dbp: <http :// dbpedia.org/resource/>PREFIX wgs84: <http ://www.w3.org /2003/01/ geo/wgs84_pos#>PREFIX os: <http :// data.ordnancesurvey.co.uk/ontology/admingeo/>

SELECT ?abstract ?name ?gss(CONCAT ("POINT(", xsd:string (?long), " ",

19 http://data.ordnancesurvey.co.uk/datasets/os-linked-data/explorer/sparql20 http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode

350 J. Jones et al.

xsd:string (?lat), ")") AS ?wkt)WHERE

{? subject rdfs:label ?name .?subject wgs84:lat ?lat .?subject wgs84:long ?long .?subject os:gssCode ?gss .?subject a os:DistrictSERVICE <http :// dbpedia.org/sparql/> {

?entry rdfs:label ?place .?entry dbpo:abstract ?abstract .?entry dbpo:isPartOf dbp:East_of_EnglandFILTER langMatches(lang(?place), "EN")FILTER langMatches(lang(? abstract), "EN")FILTER ( STR(?place) = ?name )}

}

The LOD4WFS Adapter provides a web interface that allows users to write,validate and store SPARQL Queries (see Sect. 4.3.2).

4.3 LOD4WFS Software Architecture

The LOD4WFS Adapter, which was entirely developed in the Java programminglanguage, is divided into 6 main system modules: WFS Interface, Web Interface,Request Validator,Query Manager,Connection Manager andRDF2WFS Converter.Figure3 shows an overview of the application modules.

4.3.1 Web Interface

The Web Interface is responsible for receiving HTTP requests and translating themto the WFS Interface. It also provides access to a web-based system for maintainingSPARQL Queries created via Federated Data Access and changing the system’ssettings (e.g. default SPARQL Endpoint). This interface was developed using theJava-based HTTP server Jetty,21 enabling the application to be deployed without theneed of an external servlet container.

4.3.2 WFS Interface

The WFS Interface implements a listener for the standard operations defined inthe OGCWFS Specification, namely GetCapabilities, DescribeFeatureType and GetFeature. Its main goal is to create an agnostic communicationlayer that enables anyWFS client implementation to send requests and receive queryresults.

21 http://www.eclipse.org/jetty/

Making the Web of Data Available 351

Fig. 3 LOD4WFS modules

Table 1 Validated WFS operations

Operation Values

SERVICE WFS by defaultREQUEST GetCapabilities, DescribeFeatureType or GetFeatureSRSNAME Spatial reference system of a feature available in the system, e.g. EPSG:4326TYPENAME ID of a feature available in the system, provided by at the capabilities documentVERSION 1.0.0 by default

4.3.3 Request Validator

This module is responsible for validating the HTTP request received by the WFSInterface, making sure all operations sent by the WFS client are properly fulfilled.Table1 shows the operations implemented by the Request Validator.

352 J. Jones et al.

In case of invalid or unknown requests are sent (e.g. non-existing feature or wrongversion), an exception report is delivered, according to the Web Feature ServiceImplementation Specification.

4.3.4 Query Manager

Once the requests have been approved by the Request Validator, they must be trans-lated and processed. The Query Manager is responsible for parsing requests sent bythe WFS client and for translating them into SPARQL queries. It is also responsiblefor mapping each feature to its data access technique (Standard Data Access or Fed-erated Data Access), which have their requests translated differently. The requestsare translated as follows:

GetCapabilities

Standard Data Access–Selects all named graphs (Containers of Features) from thetriple store, together with the geometry type of the containing Feature.Federated Data Access–Lists all customized SPARQL Queries stored via the WebInterface.

DescribeFeatureType

Standard Data Access–Lists all properties attached to a selected Feature togetherwith their range.Federated Data Access–Lists the variables expected from the customized SPARQLQueries.

GetFeature

Standard Data Access–Selects all geometries of a selected Feature together with thevalues of their related properties.Federated Data Access–Executes the customized SPARQL Query of the requestedfeature to its predefined SPARQL Endpoint.

4.3.5 Connection Manager

The Connection Manager module is responsible for establishing communicationfrom the LOD4WFS Adapter to Triple Stores. Its main goal is to execute SPARQLqueries, previously composed by the Query Manager, and forwards its results to the

Making the Web of Data Available 353

RDF2WFS Converter for further processing. It is based on the Apache Jena API22

for building Semantic Web applications.

4.3.6 RDF2WFS Converter

Once theSPARQLQueryhas beenprocessed and its results are returned to the system,the RDF2WFS module converts it to standard WFS documents. Depending on therequest performed by theWFS client (GetCapabilities,DescribeFeatureTypeor GetFeature) it creates anXMLdocumentwith theSPARQLQuery resultand delivers it back to the WFS client.

5 Solution Evaluation

In order to evaluate the performance of the proposed adapter, this section presentstests to compare it to the reference implementation of OGC WFS, namely the soft-ware server for geospatial data GeoServer.23 The test compares the server responsetime for the GetFeature request in both LOD4WFS and GeoServer WFS imple-mentations. Its main goal is to measure the time each of the services takes to processa GetFeature request, perform the query on the storage management system andsend the XML document back to the client. For setting up GeoServer, the databasePostgreSQL,24 with its spatial extension PostGIS,25 was chosen as feature storagefor the WFS (Scenario A). For the LOD4WFS Adapter, three different Triple Storeswere tested, namely Parliament,26 Fuseki27 and OWLIM-Lite28 (Scenario B). TheGetFeature requests were performed using the command line tool cURL.29 Thestandard installations of all software involved in the tests were kept. Figure4 showsan overview of how the test environment is structured.

5.1 Test Environment

All tests were performed using a virtual machine as specified in Tables2 and 3.

22 http://jena.apache.org/23 http://geoserver.org/display/GEOS/Welcome24 http://www.postgresql.org/25 http://postgis.net/26 http://parliament.semwebcentral.org/27 http://jena.apache.org/documentation/serving_data/28 http://www.ontotext.com/owlim29 http://curl.haxx.se/

354 J. Jones et al.

Fig. 4 Test environment overview

Table 2 Hardware environment

Processor Intel(R) Xeon(R)CPU E5530 @ 2.40GHz, dual core

Network card 82545EM Gigabit ethernet controller (copper)Capacity: 1GB/s

Memory Clock: 66MHz8GB

Table 3 Software environment

Software Version

Operating system Ubuntu serverLinux 3.2.0-58-generic (amd64)Version 12.04 LTSFile system: ext4

Apache tomcat 6.0.35GeoServera 2.4.4PostgreSQL 9.1PostGIS 1.5.3OWLIM-Litea 4.0Java runtime Sun microsystems Inc.: 1.6.0_27

(OpenJDK 64-Bit server VM)cURL 7.29.0a Embedded at OpenRDF Sesame 2.7.0 and hosted with Apache Tomcat

Making the Web of Data Available 355

Table 4 Test datasets

Brazilian municipalities dataset

Number of geometries 5799Dataset size 11.2MBAmazon rivers datasetNumber of geometries 18690Dataset size 45MBAmazon vegetation datasetNumber of geometries 39082Dataset size 173.2MB

Table 5 Datasets overview for scenario A

Dataset Table records Table sizea

Brazilian municipalities 5799 16MBAmazon rivers 18690 55MBAmazon vegetation 39083 183MBa Including indexes

5.2 Test Datasets

The datasets used for the tests (see Table4) were created by the Brazilian Instituteof Geography and Statistics30 (IBGE). They all contain polygon geometries and areavailable in Shapefile format.31

To test Scenario A, the dataset was stored in the PostgreSQL database and furtheron added to the GeoServer as a data source for feature layers (see Table5). This wasnecessary to enable access to the features through the GeoServer WFS interface.

In order to use the same dataset for Scenario B, the dataset had to be converted toLOD, fulfilling the characteristics previously discussed in Sect. 4.1. For this purpose,a script (shp2rdf ) in the R programming language32 was developed for readingShapefiles and creating an LOD dataset in N-Triples syntax (Beckett 2014). Thescript uses the rgdal33 and rgeos34 packages.

After the conversion, the same RDF N-Triples files (see Table6) were loadedinto the Parliament (Scenario B.1), Fuseki (Scenario B.2) and OWLIM-Lite (Sce-nario B.3) Triple Stores. The datasets in all test scenarios could also be successfully

30 http://ibge.gov.br/31 ftp://geoftp.ibge.gov.br/mapas_interativos/32 http://www.r-project.org/33 http://cran.r-project.org/web/packages/rgdal/rgdal.pdf34 http://cran.r-project.org/web/packages/rgeos/rgeos.pdf

356 J. Jones et al.

Table 6 Datasets overviewfor scenario B

Dataset Total triples File size

Brazilian municipalities 86988 28.7MBAmazon rivers 359206 113.8MBAmazon vegetation 703497 416.1MB

downloaded and displayed using the WFS clients of GIS QGIS35 and ArcMap.36

The converted datasets can be found at the following SPARQL Endpoint.37

5.3 Test Procedure

The loaded datasets were queried via HTTP GetFeature requests using cURL.The GetFeature request was performed 10 times in each test scenario for eachdataset, afterwards the arithmetic mean value of the time elapsed was calculated. Toavoid the network speed to affect the test results, the download speed was limited to500kilobytes per second, so that all test scenarios have the same download perfor-mance. Listing 10 shows an example of how the requests per cURL were sent to thetest server. Table7 summarizes the tests performed in each test scenario.

Listing 10 Sample HTTP Request Sent via cURL.

$ curl --limit -rate 500k ’http ://[ SERVER_ADDRESS:PORT]/wfs?SERVICE=WFS&VERSION =1.0.0& REQUEST=GetFeature&TYPENAME=FEATURE_ID ’-o feature.xml;$

5.4 Results and Discussion

The results demonstrated a non-substantial efficiency difference between the testscenarios. Querying the Brazilian municipalities dataset, all tested scenarios showeda similar response time, having Scenario A as the most efficient one, being 1.33%faster than the second fastest scenario, namely Scenario B.1 (see Table7-I). Theefficiency difference querying this dataset was limited to the milli-second scale,though (see Fig. 5).

The tests querying the Amazon rivers dataset showed again a similar performancebetween the test scenarios using triple stores. Among them, Scenario B.3 had aslightly better performance than Scenario B.1 and B.2. Scenario A had again the best

35 http://www.qgis.org/36 http://esri.de/products/arcgis/about/arcmap.html37 http://data.uni-muenster.de/open-rdf/repositories/lod4wfs

Making the Web of Data Available 357

Table 7 Performance of GetFeature requests

Test scenario Avg. Time (mm:ss.ms) Standard deviation

I. Brazilian municipalities datasetA – (GeoServer WFS with PostgreSQL) 00:38.217 0.1916B.1 – (LOD4WFS with Parliament) 00:38.731 0.0961B.2 – (LOD4WFS with Fuseki) 00:38.877 0.1283B.3 – (LOD4WFS with OWLIM-Lite) 00:38.857 0.0762II. Amazon rivers datasetA – (GeoServer WFS with PostgreSQL) 02:36.542 0.0802B.1 – (LOD4WFS with Parliament) 02:38.110 0.1181B.2 – (LOD4WFS with Fuseki) 02:38.150 0.2795B.3 – (LOD4WFS with OWLIM-Lite) 02:38.076 0.0872III. Amazon vegetation datasetA – (GeoServer WFS with PostgreSQL) 08:35.681 0.0642B.1 – (LOD4WFS with Parliament) 08:44.013 0.2447B.2 – (LOD4WFS with Fuseki) 08:44.771 0.1037B.3 – (LOD4WFS with OWLIM-Lite) 08:39.079 0.0868

Fig. 5 Performance comparison for the Brazilian municipalities dataset

performance among all test scenarios (see Table7-II), being 0.97% faster than thesecond fastest test scenario, namely Scenario B.3.

Tests querying the Amazon vegetation dataset showed a bigger performancedifference between the test scenarios involving triple stores (see Table7-III). Sce-nario B.3 demonstrated to have a more efficient response time than Scenarios B1and B.2 when querying bigger datasets, being 0.94% faster than the second fastesttriple store based test scenario, namely Scenario B.1. Among all test scenarios, Sce-nario A demonstrated again a better performance than all others test scenarios (seeFig. 7), being 0.65% faster than Scenario B.3.

Though the test results showed no expressive difference between the test scenar-ios, it demonstrated that the combination of GeoServer with the relational databasePostgreSQL still provides a slightly faster platform for enabling access to geographic

358 J. Jones et al.

Fig. 6 Performance comparison for the Brazilian rivers dataset

Fig. 7 Performance comparison for the Brazilian vegetation dataset

vector data. The results showed also, considering the given test environment, that theefficiency difference between the LOD4WFS approach and GeoServer with Post-greSQL gets smaller when bigger datasets are requested. The approach proposedby the LOD4WFS relies on the respective triple store’s efficiency, which has beenshown to be slower than a relational database in our test scenarios. However, themain point we want to stress in this work is the great benefit of having LOD datasetsas data source for WFS. This approach provides not only an innovative and com-petitive way for serving data to current web service standards, but also offers thepossibility of combining multiple data sources and creating new datasets on demand(see Sect. 4.2), which is currently not provided by any WFS implementation.

It is also important to mention that the results presented in these tests represent theperformance of specific system versions in a single-user environment (see Sect. 5.1),therefore reproducing the tests with other releases will inevitably lead to differentresults.

Making the Web of Data Available 359

6 Related Work

Significant efforts have been made to introduce and enhance the usage ofsemantics (Kuhn 2005) in geospatial information and web services. Among them arethe works on geographical Linked Data (Goodwin et al. 2008), Semantic Geospa-tial Web services (Roman and Klien 2007), semantic enablement for spatial datainfrastructures (Janowicz et al. 2010), structured alignment methods to geospatialontologies (Cruz andSunna 2008), semantic-based automatic composition of geospa-tial Web service chains (Yue et al. 2007) and a framework for semantic knowledgetransformation of geospatial data (Zhao et al. 2009). The technological challengesand benefits of adding a spatial dimension to the Web of Data have been also dis-cussed by Auer et al. (2009), where spatial data was systematically extracted fromthe collaborative project OpenStreetMap38 and converted to RDF. Efforts on yield-ing geographic information in OGCweb services and embedding them as LOD havebeen conducted by Roth (2011) with the Geographic Feature Pipes.

Other authors have suggested to use the OGC WFS standard as an interfacefor providing access to semantic data; Staub (2007) and Donaubauer et al. (2007)have proposed an extension of the existing WFS standard to create a model-driveninterface. These works, however, require modification of the OGC WFS standard.In contrast, we use the WFS standard as it is specified by OGC, so that current GIScan access it without any modification.

7 Conclusions and Future Work

This chapter presents an alternative way of accessing geographic LOD datasets fromcurrent GIS. We have explored the possibility of using the OGC WFS standard asan intermediate layer between geographic LOD datasets and GIS. We developedan application (LOD4WFS Adapter) that acts as a service for: 1) listening to WFSrequests and translating them to SPARQL Queries; and 2) transforming the RDFresult set intoWFSstandard documents. Performance tests of theLOD4WFSAdapteragainst the reference implementation of OGC WFS (GeoServer) were conducted.The test environment involved three different triple stores and a relational database.The preliminary tests showed that our LOD4WFS Adapter can compete with thereference implementation for WFS services, while providing significantly largerflexibility in accessing and integrating data sources on the Web.

This chapter demonstrates that using LOD as data source for WFS is perfectlyfeasible andhas agreat potential. It combines thebenefits of awidelyusedweb servicestandard with the interoperability offered by LOD. This improves accessibility ofgeographical information on the Web of Data for GIS. Future work includes:

First, the implementation of WFS spatial operations. This would allow theLOD4WFS Adapter to translate supported WFS spatial operations (e.g. contains,

38 http://www.openstreetmap.org/

360 J. Jones et al.

intersects) to SPARQL using the Geographic Query Language for RDF (GeoSPARQL). Currently only a few Triple Stores implement GeoSPARQL (e.g. Parlia-ment, Oracle Spatial RDFSemanticGraph,39 Strabon40). This situationmay improveonce standard Triple Stores will adopt GeoSPARQL and corresponding OGC stan-dards for spatial queries.

The second enhancement is the transaction operation (WFS-T). Currently, theLOD4WFS Adapter implements only requests of geographic information, and doesnot allow any data manipulation. Implementing the operations defined by WFS-Twould enable WFS clients not only to query geographic LOD datasets, but also toinsert, edit and delete existing features. The third enhancement we intend is the pos-sibility of accessing geographic LOD datasets encoded as GML and other commonformats, e.g. GeoRSS,41 or GeoJSON.42 Currently, only WKT is supported.

Finally, we intend to perform more detailed comparisons of the LOD4WFSAdapter and conventional WFS implementations. In order to achieve this, we planto perform stress tests and to evaluate the application behavior in both single andmulti-user environments using different operating systems.43

Acknowledgments This work is funded by the German Research Foundation (DFG) through theLinked Data for eScience Services (LIFE) Project, KU 1368/11-1.

References

Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) Dbpedia: a nucleus for a webof open data. In: The semantic web. Springer, Heidelberg, p 722–735

Auer S, Lehmann J, Hellmann S (2009) LinkedGeoData: adding a spatial dimension to the web ofdata. In: The semantic web—ISWC 2009. Springer, Heidelberg, p 731–746

Battle R, Kolas D (2011) Geosparql: enabling a geospatial semantic web. SemanticWeb J 3(4):355–370

BeckettD (2014)N-Triples.A line-based syntax for anRDFgraph.W3Cproposed recommendation.http://www.w3.org/TR/n-triples/

Bizer C, Heath T, Berners-Lee T (2009) Linked data—the story so far. Int J Semantic Web Inf Syst5(3):1–22

Brickley D, Guha RV (2004) RDF vocabulary description language 1.0: RDF schema. W3C rec-ommendation. http://www.w3.org/TR/rdf-schema/

Cruz IF, Sunna W (2008) Structural alignment methods with applications to geospatial ontologies.Trans GIS 12(6):683–711

Donaubauer A, Straub F, Schilcher M (2007) mdWFS: a concept of web-enabling semantic trans-formation. In: Proceedings of the 10th AGILE conference on geographic information science

39 http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html40 http://www.strabon.di.uoa.gr/41 http://www.georss.org/42 http://geojson.org/43 http://lodum.de/life

Making the Web of Data Available 361

Goodwin J, Dolbear C, Hart G (2008) Geographical linked data: the administrative geography ofgreat britain on the semantic web. Trans GIS 12(1):19–30

Heath T, Bizer C (2011) Linked data: evolving the web into a global data space. Synth lect semanticweb theor technol 1(1):1–136

Herring JR (2011) Simple feature access—part 1: common architecture. OpenGIS implementa-tion standard for geographic information, OGC 06–103r4. http://portal.opengeospatial.org/files/?artifact_id=18241

Janowicz K, Schade S, Bröring A, Keßler C, Maué P, Stasch C (2010) Semantic enablement forspatial data infrastructures. Trans GIS 14(2):111–129

Kuhn W (2005) Geospatial semantics: why, of what, and how. J Data Semant III. Lecture notes incomputer science, vol 3534, pp 1–24

Roman D, Klien E (2007) Swing-a semantic framework for geospatial services. In: The geospatialweb. Springer, Heidelberg, p 229–234

Roth M (2011) Geographic feature pipes. Diploma thesis, Institute for Geoinformatics, Universityof Münster, Germany

Staub P (2007) A model-driven web feature service for enhanced semantic interoperability. OSGeoJ 3(1)

Vretanos PA (2002) Web feature service implementation specification. OpenGIS implementa-tion standard for geographic information, OGC 02–058. http://portal.opengeospatial.org/files/?artifact_id=7176

Yue P, Di L, Yang W, Yu G, Zhao P (2007) Semantics-based automatic composition of geospatialweb service chains. Comput Geosci 33(5):649–665

Zhao P, Di L, Yu G, Yue P, Wei Y, Yang W (2009) Semantic web-based geospatial knowledgetransformation. Comput Geosci 35(4):798–808