Feratel mapping technical_notes

STI INNSBRUCK

TECHNICAL NOTES ON MAPPING OF FERATEL CONTENT TO SCHEMA.ORG

Zaenal Akbar, Ioan Toma

STI Innsbruck, University of Innsbruck, Technikerstraße 21a, 6020 Innsbruck, Austria

2014-11-12

Semantic Technology Institute Innsbruck


Contents

1. Introduction

2. Feratel Mapping

2.1. Service Providers

2.2. Shop Items

2.3. Event

2.4. Infrastructure

2.5. Destination Packages

3. Feratel Plugin

4. Technical Notes

4.1. Missing Relationships

4.2. Missing Required Properties

4.3. ID as Item Values

References

1. Introduction

This document presents technical notes on the mapping of Feratel content to Schema.org. It covers the latest implementation updates of the conceptual mapping of Feratel content to Schema.org [1], as well as the latest updates of the Feratel Plugin [2], a web service-based system that marks up the XML responses obtained from Feratel API endpoints by means of an XSL Transformation [3].

The Deskline 3.0 Standard Interface (DSI) [4] is the interface for interacting with Feratel content. It offers various functionalities, such as retrieving basic data for various content types, searching for availabilities, booking, and saving requests. Two functionalities are relevant to our work on content annotation:

1. Basic Data. Provides detailed data on Service Providers, Shop Items, Events, and Infrastructure items.

2. Search. Provides brief data on Service Providers and their products, and on Destination Packages and their details.

Each functionality is offered through a specific API endpoint, with its own XML format for API requests and responses.

2. Feratel Mapping

Based on the conceptual mapping of Feratel content to Schema.org [1], the mappings were implemented in accordance with the Schema.org specification, in particular the domain and range of each property.

Table 1 Specification for property http://schema.org/startDate

Property: http://schema.org/startDate

Domains: http://schema.org/Event, http://schema.org/Role, http://schema.org/Season, http://schema.org/Series, http://schema.org/TVSeason, http://schema.org/TVSeries

Ranges: http://schema.org/Date

As shown in Table 1, a value of the property “startDate” is expected to be of type Date, and the property can be used only on the entities Event, Role, Season, Series, TVSeason, and TVSeries.


Table 2 Specification for property http://schema.org/organizer

Property: http://schema.org/organizer

Domains: http://schema.org/Event

Ranges: http://schema.org/Organization, http://schema.org/Person

As indicated in Table 2, a value of the property “organizer” must be an Organization or a Person. In the conceptual mapping, an Event is linked to a PostalAddress through the property “organizer”, which does not match this range. Therefore, in our mapping implementation for Events, an Organization entity is inserted between those two classes so that the specification is respected, as shown in Figure 5.
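The range check and the inserted Organization entity can be illustrated with a minimal Python sketch. It is not the XSLT used in the implementation: the dictionary layout, the `type` key, and the function names `conforms` and `attach_organizer` are assumptions made for illustration, and only the ranges listed in Tables 1 and 2 are taken from the text.

```python
# Allowed value types (ranges) per property, as listed in Tables 1 and 2.
RANGES = {
    "startDate": {"Date"},
    "organizer": {"Organization", "Person"},
}


def conforms(prop, value_type):
    """Return True if value_type is in the range of prop."""
    return value_type in RANGES.get(prop, set())


def attach_organizer(event, address):
    """Link an address to an event through the "organizer" property.

    A PostalAddress is outside the range of "organizer", so it is wrapped
    in an intermediate Organization (hypothetical dictionary layout).
    """
    value = address
    if not conforms("organizer", address["type"]):
        value = {"type": "Organization", "address": address}
    event["organizer"] = value
    return event


# Example data is invented for illustration.
event = attach_organizer(
    {"type": "Event", "name": "Example Concert"},
    {"type": "PostalAddress", "addressLocality": "Innsbruck"},
)
```

After the call, `event["organizer"]` is an Organization whose `address` holds the PostalAddress, which mirrors the structure described for Figure 5.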

2.1. Service Providers

A service provider is an accommodation provider such as a hotel. Besides offering accommodation, a provider could also offer additional services such as ski passes, spa entries, and guided hiking tours. Information about service providers and their services can be obtained from the Basic Data endpoint and the Search endpoint (including the additional services that might be offered by a provider).


Figure 1 Entity Relationship for the Basic Data of Service Providers

As shown in Figure 1, about 12 entities can be extracted from the basic data of a service provider. A LodgingBusiness has multiple PostalAddress entities (representing Object, Landlord, Owner, and KeyHolder), an Offer can have multiple PriceSpecification entities, and a Review can have multiple UserComments entities.


Figure 2 Entity Relationship for Search of Service Providers

Figure 2 shows the entities extracted from the service provider search data, while the entities extracted from the additional services search data are shown in Figure 3.

Figure 3 Entity Relationship of Search for Additional Services


2.2. Shop Items

The entities extracted from the basic data of Shop Items (which include brochures, articles, and guides) are shown in Figure 4.

Figure 4 Entity Relationship for Basic Data of Shop Items

2.3. Event

Content about events can be obtained from the Basic Data endpoint and the Search endpoint. Figure 5 shows the entities extracted from the event basic data. Of the four available addresses (Organizer, Booking, Info, and Venue), the Venue address is connected through the “location” property, while the other three are connected through the “organizer” property.


Figure 5 Entity Relationship for Basic Data of Event

Figure 6 Entity Relationship for Search of Event

Only two entities were extracted from the event search data, as shown in Figure 6.


2.4. Infrastructure

Figure 7 Entity Relationship for Basic Data of Infrastructure

As shown in Figure 7, four entities were extracted from the infrastructure basic data. Each LocalBusiness can have two PostalAddress entities (ExternalAddress and InternalAddress).

2.5. Destination Packages

Figure 8 Entity Relationship for Basic Data of Destination Packages

As shown in Figure 8, about four entities were extracted from the destination packages basic data, where an Offer can have multiple PriceSpecification entities.

Figure 9 An Entity from Search of Destination Packages

Only one entity was extracted from the destination packages search data, as shown in Figure 9.


3. Feratel Plugin

The Feratel Plugin is a web service-based system that inserts the Schema.org vocabulary into XML responses from Feratel API endpoints. The system comprises two main components:

1. Dispatcher: responsible for organizing the communication flow between the Client, the Feratel API, and the Annotator.

2. Annotator: responsible for annotating any XML input with the Schema.org vocabulary according to the predefined mapping and producing an annotated XML output.

Figure 10 Diagram of Feratel Plugin Implementation

As shown in Figure 10, the Dispatcher intercepts a request from the Client (1) and forwards it to the designated Feratel API endpoint (2), receives the response (3) and forwards it to the Annotator (4), then receives the result from the Annotator (5) and forwards it back to the Client (6).
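The six-step flow can be sketched in a few lines of Python. This is only an illustration of the control flow, not the plugin's code: `make_dispatcher`, `fake_feratel`, and `fake_annotator` are hypothetical names, the stubs stand in for the real Feratel API call and the XSLT-based Annotator, and the `annotated="true"` attribute is a placeholder rather than the actual Schema.org markup.

```python
from typing import Callable


def make_dispatcher(
    call_feratel: Callable[[str], str],
    annotate: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a dispatcher that relays a client request (step 1)."""

    def dispatch(request_xml: str) -> str:
        # Steps 2-3: forward the request to the Feratel API, get the response.
        response_xml = call_feratel(request_xml)
        # Steps 4-5: forward the response to the Annotator, get the result.
        annotated_xml = annotate(response_xml)
        # Step 6: hand the annotated XML back to the client.
        return annotated_xml

    return dispatch


# Hypothetical stand-ins so the sketch runs without network access.
def fake_feratel(request_xml: str) -> str:
    return "<Response><Name>Hotel Example</Name></Response>"


def fake_annotator(response_xml: str) -> str:
    # Placeholder for the real annotation step.
    return response_xml.replace("<Response>", '<Response annotated="true">', 1)


dispatch = make_dispatcher(fake_feratel, fake_annotator)
result = dispatch("<Request/>")
```

Passing the two collaborators in as functions keeps the Dispatcher independent of how the API is reached and how the annotation is produced, which matches the separation of components described above.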

Using the plugin requires only one simple step on the client side: instead of pointing to the Feratel API directly, a client uses our endpoints to receive an annotated XML response of Feratel content.


4. Technical Notes

During the mapping and plugin implementation, we encountered a few shortcomings that are open to improvement in the future.

4.1. Missing Relationships

While the mapping tries to map as much of the Feratel content to Schema.org as possible, a few adaptations were necessary to comply with the Schema.org specification.

As shown in Figure 1 to Figure 10, several entities were extracted successfully but have no connection to the other entities. One of the following two conditions can cause this situation:

1. There is no property in Schema.org that could represent a suitable relation between the entities.

2. A suitable property is available in Schema.org, but only for relations between certain entities. For example, the property “geo” can link the entity Place only to the entity GeoCoordinates or GeoShape.
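As a rough illustration of how such unconnected entities could be listed, the Python sketch below treats a mapping result as a set of (subject, property, object) links. The function name `unconnected` and the sample entities and links are hypothetical and are not taken from the figures.

```python
def unconnected(entities, links):
    """Return the entities that appear in no (subject, property, object) link."""
    linked = set()
    for subject, _prop, obj in links:
        linked.add(subject)
        linked.add(obj)
    return [entity for entity in entities if entity not in linked]


# Invented example: one entity was extracted but nothing links to or from it.
entities = ["LodgingBusiness", "PostalAddress", "Rating"]
links = [("LodgingBusiness", "address", "PostalAddress")]
isolated = unconnected(entities, links)
```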

4.2. Missing Required Properties

Each entity in Schema.org must be accompanied by a few basic properties. If these properties are missing, an error is raised during the extraction of structured data from the content. We detect these errors using the Yandex Structured Data Validator [5] and the Google Structured Data Testing Tool [6].


Figure 11 Structured Data Extraction with Yandex Validator

Figure 11 shows a structured data extraction performed with the Yandex Structured Data Validator on an annotated XML response of the additional services search data of Service Providers. It shows that the “address” property is missing, and a warning is also raised for the missing “telephone” property.
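A check of this kind could also be run locally before sending content to an external validator. The Python sketch below is an assumption-laden illustration: the `REQUIRED` and `RECOMMENDED` tables, the choice of LodgingBusiness as the entity type, and the split into errors and warnings are hypothetical and do not reproduce the rules of the Yandex or Google tools.

```python
# Hypothetical rule tables; the real validators define their own rules.
REQUIRED = {"LodgingBusiness": ["address"]}
RECOMMENDED = {"LodgingBusiness": ["telephone"]}


def validate(entity_type, properties):
    """Return (errors, warnings) as lists of missing property names."""
    errors = [p for p in REQUIRED.get(entity_type, []) if p not in properties]
    warnings = [p for p in RECOMMENDED.get(entity_type, []) if p not in properties]
    return errors, warnings


# Invented entity that has a name but neither address nor telephone.
errors, warnings = validate("LodgingBusiness", {"name": "Hotel Example"})
```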

4.3. ID as Item Values

Several items in the XML responses from the Feratel API are provided as IDs only, as shown in the following response:

    …
    <Town>5c7e1d37-0060-4811-a996-a8b36094d01e</Town>
    <District>65f056cd-4bc9-4272-a1d6-3fadf7aac9d9</District>
    ...
    <Stars Id="EBF4EE39-F7E0-45FE-A410-46D75C3B769C" />
    <Categories>
        <Item Id="AB4F2086-F06D-4DAC-8B99-09EDA5577C67" />
    </Categories>
    <Classifications>
        <Item Id="50A3AF54-33DB-4612-8848-B9CF0A65C558" />
        <Item Id="89C1A7D7-0222-4DEE-AA35-D6A01496B0BC" />
    </Classifications>
    <MarketingGroups>
        <Item Id="9B3F881D-D73F-4772-AD7B-99DCAE16BB59" />
    </MarketingGroups>
    ...

Technically, this problem can be solved by sending another request to the Feratel API to look up the values for those IDs, or by maintaining a local database of ID-to-value mappings. First, however, we have to decide whether we want to alter the XML response structure by adding these external values to the original XML response, and which additional values should be included.
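The local-mapping option could look roughly like the following Python sketch, which uses the standard library `xml.etree.ElementTree`. Everything beyond the two IDs copied from the response above is an assumption: the `ID_VALUES` table and its values are invented, the `<Provider>` wrapper element is hypothetical, and writing the resolved value into a new `Value` attribute is only one possible way of altering the response.

```python
import xml.etree.ElementTree as ET

# Hypothetical local ID-to-value table; the values are invented placeholders.
# Keys are lower-cased because the response mixes upper- and lower-case IDs.
ID_VALUES = {
    "5c7e1d37-0060-4811-a996-a8b36094d01e": "Example Town",
    "ebf4ee39-f7e0-45fe-a410-46d75c3b769c": "Example star rating",
}


def resolve_ids(xml_text, id_values):
    """Add a Value attribute to every element whose ID is in id_values."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Case 1: the ID is the element text, e.g. <Town>...</Town>.
        text = (element.text or "").strip().lower()
        if text in id_values:
            element.set("Value", id_values[text])
        # Case 2: the ID is an attribute, e.g. <Stars Id="..." />.
        item_id = (element.get("Id") or "").lower()
        if item_id in id_values:
            element.set("Value", id_values[item_id])
    return root


# The <Provider> wrapper is an assumption to make the fragment well-formed.
sample = (
    "<Provider>"
    "<Town>5c7e1d37-0060-4811-a996-a8b36094d01e</Town>"
    '<Stars Id="EBF4EE39-F7E0-45FE-A410-46D75C3B769C" />'
    "</Provider>"
)
resolved = resolve_ids(sample, ID_VALUES)
```

IDs that are absent from the table are left untouched, so the original response structure is preserved wherever no value is known.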


References

[1] Ioan Toma, Zaenal Akbar, “Conceptual Mapping of Feratel Content to Schema.org”, August 2014

[2] Zaenal Akbar, Ioan Toma, Christoph Fuchs, Corneliu Valentin Stanciu, Lanzanasto Norbert, “Feratel Schema.org Plugin Implementation”, May 2014

[3] W3C, “XSL Transformations (XSLT)”, http://www.w3.org/TR/xslt

[4] Simone Schanitz, “Documentation Deskline 3.0 Standard Interface (DSI), version 1.0.58”, June 2014

[5] Yandex Structured Data Validator, https://webmaster.yandex.com/microtest.xml

[6] Google Structured Data Testing Tool, http://www.google.com/webmasters/tools/richsnippets
