D1.4 - Event, Weather and Multilingual Data Services · EW-Shopp GA number: 732590 H2020...
D1.4 - Event, Weather and Multilingual Data Services
Deliverable n.: 1.4
Date: 27 December 2018
Status: Final
Version: 1.0
Authors: Aljaž Košmerlj (JSI), Matteo Palmonari (UNIMIB), Flavio De Paoli (UNIMIB)
Contributors: JSI, UNIMIB
Reviewers: Matej Žvan (BT), Dumitru Roman (SINTEF)
Distribution: Public
Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1
EW-Shopp GA number: 732590 H2020-ICT-2016-2017/H2020-ICT-2016-1
History of Changes
Version | Date | Description | Revised by
0.1 | 30/10/2018 | Tentative Table of Contents | Aljaž Košmerlj (JSI)
0.2 | 18/12/2018 | Wrote Chapter 4 and Section 5.2 | Aljaž Košmerlj (JSI)
0.3 | 19/12/2018 | Wrote Section 2.2 | Flavio De Paoli, Matteo Palmonari (UNIMIB)
0.4 | 19/12/2018 | Improved Section 2.2 and added Section 4.1 | Flavio De Paoli, Matteo Palmonari (UNIMIB)
0.9 | 27/12/2018 | Finalized document | Aljaž Košmerlj (JSI)
1.0 | 28/12/2018 | Final check by coordinator and minor edits | Matteo Palmonari (UNIMIB)
Executive Summary
This deliverable describes the event, weather and multilingual data services of the EW-Shopp project. These services supply contextual information to the business data provided by the project business partners as well as the cross-lingual linking of datasets in different languages. The text builds on specifications and descriptions from previous deliverables and details extensions and additions made based on experience from the deployment of business case pilots. Chief amongst these additions is the introduction of the custom events ontology for the description of custom business-impacting events. The ontology stems from the alignment of custom event data that business partners use or plan to use to the Schema.org vocabulary, so as to maximize interoperability with the increasing volume of event data published using this vocabulary.
The deliverable expands on specifications and descriptions from deliverable D1.3. It describes services used in the deployment of pilots outlined in deliverable D4.2. The results of the evaluation of the services based on their performance in the pilots are reported in deliverable D2.3.
Table of Contents
History of Changes
Executive Summary
List of figures
List of tables
Chapter 2 Introduction
  2.1 Relationship to Other Deliverables
  2.2 Abbreviations and Acronyms
  2.3 Document Structure
Chapter 3 Event Data
  3.1 Updates Since D1.3
  3.2 Custom Events
    3.2.1 Objectives of an Ontology for Custom Events
    3.2.2 Methodology
    3.2.3 Schema.org Event Model
    3.2.4 Partners' Event Data
    3.2.5 Use Cases for Interoperable Descriptions of Custom Event Data
    3.2.6 Guidelines for the Design of the Ontology
    3.2.7 The EW-Shopp Ontology Based on Schema.org
Chapter 4 Weather Data
  4.1 Updates Since D1.3
  4.2 Alternative Sources of Weather Data
Chapter 5 Multilingual Data Linking Services
  5.1 Updates Since D1.3
  5.2 Handling Keywords in the JOT Dataset
Chapter 6 Conclusion
References
List of figures
Figure 1. Main types used in the custom event ontology and their mutual relations
List of tables
Table 1. Abbreviations and acronyms
Table 2. Short references for project partners
Table 3: Summary of the tools in the EW-Shopp toolkit
Table 4 - Ceneje event data properties
Table 5 - Big Bang event data properties
Table 6 - CDE event data properties
Table 7 - Facebook event data properties
Table 8 - EW-Shopp properties
Table 9 – Property mappings
Chapter 2 Introduction
The EW-Shopp project aims to support modern e-commerce businesses by giving them the means to place their business data into context. Roughly, this context can be split into environmental and social. For the environmental aspect, the project focuses on weather and its effect on consumers. For the social influences, the project offers tools to explore the impact of events as well as tools to link business data across languages in multilingual settings.
The project event, weather and multilingual data services were all developed early in the project, since they were needed for the development of the pilots described in deliverable D4.2 [7]. Because of this, the deliverable with the specification of these data services, D1.3 [3], already described their technical details and APIs. This document therefore focuses on updates since D1.3 and does not unnecessarily repeat content.
For event data the largest development is the introduction of custom events. These are events that impact businesses but are not covered in news media and cannot be detected using data from Event Registry, a news media monitoring platform and the primary source of project event data. Since these events are business-specific, we introduce an ontology for their description.
Weather data services have received few functional updates, with most of the work focusing on improving their stability and removing bugs. In this document we explore potential sources of weather data for after the project, when data from the European Centre for Medium-Range Weather Forecasts may not be available.
For the multilingual data services there are also few updates to report, but we introduce a new task that arose during the development of the JOT business case. JOT data contains large amounts of keywords that need to be clustered based on their semantics to enable efficient processing. We describe the problem and the clustering approach.
2.1 Relationship to Other Deliverables
This deliverable describes the updated versions of the EW-Shopp data services introduced and described in D1.3 [3]. The services use data formats specified by deliverable D1.2 [2] and fulfil the interoperability requirements from deliverable D1.1 [1]. The services were used in the deployment of pilots described in deliverable D4.2 [7]. Based on the pilots' outcome, the performance of the services was evaluated in deliverable D2.3 [4].
2.2 Abbreviations and Acronyms
Abbreviations and acronyms used in the document are explained in Table 1.
Table 1. Abbreviations and acronyms
Abbreviation | Description
API | Application Programming Interface
BC | Business Case
CSV | Comma Separated Values
EAN | European Article Number
EC | European Commission
ECMWF | European Centre for Medium-Range Weather Forecasts
EU | European Union
HTTP | Hypertext Transfer Protocol
ID | Identifier
JSON | JavaScript Object Notation
JSON-LD | JavaScript Object Notation for Linked Data
MARS | Meteorological Archival and Retrieval System
RDF | Resource Description Framework
RDFS | Resource Description Framework Schema
OWL | Web Ontology Language
REST | Representational State Transfer web services
URI | Uniform Resource Identifier
URL | Uniform Resource Locator
UTF | Unicode Transformation Format
W3C | World Wide Web Consortium
Table 2 shows the project partners along with their short references for easier mentions throughout the document.
Table 2. Short references for project partners
No. | Beneficiary (partner) name as in [GA] | Short reference
1 | UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA | UNIMIB
2 | CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO | CE
3 | BROWSETEL (UK) LIMITED | BT
4 | GfK EURISKO SRL | GfK
5 | BIG BANG, TRGOVINA IN STORITVE, DOO | BB
6 | MEASURENCE LIMITED | ME
7 | JOT INTERNET MEDIA ESPAÑA SL | JOT
8 | ENGINEERING – INGEGNERIA INFORMATICA SPA | ENG
9 | STIFTELSEN SINTEF | SINTEF
10 | INSTITUT JOZEF STEFAN | JSI
Finally, Table 3 contains a summary of the tools and components of the EW-Shopp toolkit which are mentioned in this document.
Table 3: Summary of the tools in the EW-Shopp toolkit.
Component Name | Short description
DataGraft | DataGraft is a cloud-based platform for data hosting and interactive data transformations. In the toolkit it has the role of the data wrangler component together with Grafterizer, its data transformation interface. It is developed and maintained by SINTEF.
ASIA | A tool for the semantic enrichment of data available in tabular formats. It is supported by ABSTAT, a tool to profile knowledge graphs represented in RDF based on linked data summarization mechanisms. It is included as a plugin in DataGraft and is developed and maintained by UNIMIB.
QMiner | QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data. In the toolkit it has the role of the data analyser component. It is developed and maintained by JSI.
Knowage | Knowage is a business intelligence suite with strong support for producing high-quality reports of the transformed, enriched and analysed information obtained from the toolkit. In the toolkit, it has the role of the data-reporting component. It is developed and maintained by ENG.
2.3 Document Structure
This document has the following structure. Chapter 2 contains the introduction together with the list of abbreviations and acronyms used in the document text and this structure overview. The following three chapters, Chapter 3, Chapter 4 and Chapter 5, describe the event, weather and multilingual data services respectively. Chapter 6 closes the deliverable with concluding remarks.
Chapter 3 Event Data
Events represent the social context of the shoppers' journey: the occurrences and observances that may encourage them to spend more, direct them to particular products, drive them to request customer support en masse or distract them from shopping altogether. This chapter describes the data sources and formats of event data in the EW-Shopp project.
3.1 Updates Since D1.3
The source of global events in the EW-Shopp project is Event Registry, a platform for monitoring mass news media. Since it is an established platform and already has an extensive and comprehensive API, little to no extension was needed for the purposes of the project. The API and its data format are described in detail in deliverable D1.3 [3].
Event Registry is a very rich data source, but it only allows us to observe the world through the lens of news media. During the development of the project pilots it became increasingly clear that this is not sufficient. There is a wide range of events not covered in the news that can have a massive effect on consumer behaviour. Marketing campaigns with special offer events and discounts are perhaps the clearest example of this. Both the price change from the discount offered and the increased visibility from additional advertisement can move the market. Another example, for the case of call centre management, is the date when receipts for mobile phone subscription packages are sent to the subscribers. Customers commonly have questions regarding their receipts, which increases call volume. If the company issuing the receipts has by some chance made a systematic error when calculating receipt amounts, the call centre is flooded with callers.
The previous paragraph presented two examples of events we have named custom events. There are many more events of this nature, and we call them custom since they are specific to each business and typically need to be at least partially tailored to its needs. To support custom events in the EW-Shopp toolkit we introduce an ontology for describing them in Section 3.2.
3.2 Custom Events
This section is devoted to the definition of the EW-Shopp Ontology to model custom events of interest to the project.
3.2.1 Objectives of an Ontology for Custom Events
The definition of the EW-Shopp custom event ontology has the goal of harmonizing the description of events that are provided or used by partners in the EW-Shopp project. In EW-Shopp, events are used to enrich information about other measures that describe a business phenomenon of interest, in order to build predictive models. These measures differ across business cases. We refer to D4.2 [7] and D2.3 [4] for a detailed description of the data preparation and analytics workflows that need to be supported in the project.
The ontology for internal events has the aim of defining a shared terminology to describe events and of supporting the integration of data about these events with existing data, so as to build integrated data that are fed to the analytical modelling steps in the workflows. Since in EW-Shopp we refer to this integration step as semantic enrichment, the aim of the internal event ontology is to support semantic enrichment of a dataset with event data. More generally, we can set the goal of this event ontology as supporting event-based analytics workflows in the industry.
3.2.2 Methodology
The methodology adopted to design the EW-Shopp custom event ontology is inspired by a methodology for the agile and simplified design of ontologies proposed by Silvio Peroni [8], one of the most recent methodologies proposed for ontology design. In particular, this methodology proposes a cycle consisting of the following three phases: M1) collection of domain information with the help of domain experts, definition of usage scenarios and test cases, definition of a modelet (ontology piece) based on these principles and meeting the usage requirements, definition of test cases and release of the modelet; M2) integration of the test cases with the current ontology; M3) refactoring of the current ontology. The methodology also includes several recommendations in its sub-steps: using a glossary (terms to be considered) for the definition of the test cases, reusing ontology design patterns and existing ontologies, keeping the modelets and the ontology simple and close to the requirements specified in the test set, and following best practices for entity names.
The work on the definition of the EW-Shopp internal event ontology has therefore been organized in the following phases (we include references to the above-mentioned methodology).
1. (M1) State of the art: a comprehensive review of the literature and available tools was already conducted in D1.1 [1]. This preliminary study allowed us to identify the recurrent patterns for modelling events, and to rank ontologies by popularity and completeness. To this review we added the analysis of event descriptions in Schema.org (enclosed in this document). The outcome is that Schema.org is the most popular event ontology and the most complete according to EW-Shopp requirements. This ontology in fact provides several patterns for modelling events and related information (a guideline recommended in the adopted methodology), as discussed in Section 3.2.3.
2. (M1) Sample event data collection: the actual definition of the EW-Shopp ontology started by collecting event data samples from partners to identify the main concepts and data of interest for each partner. Samples are tables with data extracted from actual datasets and are presented in Section 3.2.4.
3. (M1) Samples schema alignment: sample tables have been compared to identify common concepts (properties for the description of events) and preliminary data type definitions (reported in Section 3.2.4).
4. (M1) Use and test cases definition: the usage of ontology-compliant event descriptions, with consequent test cases, is well defined in EW-Shopp: it consists in the enrichment of corporate data with custom event data relevant for their analysis, as defined in the business cases.
5. (M1) Definition of guidelines for the definition of the ontology: based on the review of the state of the art and on the analysis of samples of event data used by the partners, we have derived a set of guidelines that have inspired the definition of the ontology, which are reported in Section 3.2.6.
6. Ontology definition: Schema.org has been adopted as the starting ontology, defining mappings where possible and adding new concepts to meet the EW-Shopp needs. The main advantage is that compliance is kept with existing tools and systems that already adopt Schema.org as reference ontology. The results of this phase are discussed in Section 3.2.7. This definition phase followed these sub-steps:
   6.1. (M2) definition of the subset of Schema.org of interest based on the vocabulary used in the sample schemas;
   6.2. (M2-M3) for each event data source: mapping of each data schema to Schema.org and extension of the ontology with the source properties not covered by Schema.org;
   6.3. (M3) refactoring of the ontology and finalization of the first version.
The results of these phases are further described in the next subsections.
3.2.3 Schema.org Event Model
According to the definition given in the Schema.org official documentation¹, the data model used is very generic and inspired by RDF Schema². Schema.org defines
1. a set of types arranged in a multiple inheritance hierarchy where each type may be a sub-class of multiple types.
2. a set of properties:
   I. each property may have one or more types as its domains. The property may be used for instances of any of these types.
   II. each property may have one or more types as its ranges. The value(s) of the property should be instances of at least one of these types.
¹ Schema.org event model: http://schema.org/Event
² RDF schema: http://www.w3.org/TR/rdf-schema/
The decision to allow multiple domains and ranges was purely pragmatic and related to the difficulty of specifying multiple possible domains and ranges with ontology web languages like RDFS and OWL. While the computational properties of systems with a single domain and range, like the ones based on RDFS, are easier to understand, in practice this forces the creation of a lot of artificial types, which are there purely to act as the domain/range of some properties. OWL, by contrast, supports the specification of multiple domains and ranges by specifying as a domain (or range) the disjunction of several classes or datatypes (here we use the term type to refer to a class or a data type). However, OWL was found too complicated to be understood by a large number of practitioners. For this reason, domains and ranges are specified in Schema.org using the meta-predicates domainIncludes and rangeIncludes. In the following, when we say that a property P has domain “C or D”, we indicate that P has C and D as recommended domains, i.e., that <P, domainIncludes, C> and <P, domainIncludes, D>. These specifications are not strong enough to support inference but are intended as pragmatic recommendations about the usage of the properties. This implies that there is no logical enforcement of domains and ranges in Schema.org, and that the specifications of properties' domains and ranges provided in Schema.org can be overridden or slightly changed in an ontology that reuses these properties without causing actual inconsistencies.
Finally, it should be noted that properties in Schema.org are used polymorphically, with both classes and literals as ranges (we will refer to this feature as polymorphic property usage). So it may happen that a property, e.g., schema:identifier, has Text and Product as ranges, which means that the value of such a property may be a piece of text (e.g., a value of the xsd:string datatype) or a URI identifying an instance of Product.
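The non-enforcing nature of domainIncludes and rangeIncludes can be illustrated with a minimal Python sketch. It is not part of the EW-Shopp tooling: the names P, C, D, E, Text and Product are placeholders taken from or modelled on the examples in the text, and the helper functions are hypothetical. The point is only that a usage outside the recommended types yields an advisory warning, not an inconsistency.

```python
# Illustrative recommendations written as <property, meta-predicate, type>
# triples. P, C, D, Text and Product are placeholder names (assumptions),
# not a copy of the actual Schema.org definitions.
RECOMMENDATIONS = [
    ("P", "domainIncludes", "C"),
    ("P", "domainIncludes", "D"),
    ("P", "rangeIncludes", "Text"),
    ("P", "rangeIncludes", "Product"),
]


def recommended_types(prop, meta_predicate):
    """Collect the types recommended for a property's domain or range."""
    return {t for p, m, t in RECOMMENDATIONS if p == prop and m == meta_predicate}


def usage_warnings(prop, subject_type, value_type):
    """Return advisory warnings; nothing is rejected, because the
    recommendations are not logically enforced."""
    warnings = []
    if subject_type not in recommended_types(prop, "domainIncludes"):
        warnings.append(f"{subject_type} is not a recommended domain of {prop}")
    if value_type not in recommended_types(prop, "rangeIncludes"):
        warnings.append(f"{value_type} is not a recommended range of {prop}")
    return warnings


print(usage_warnings("P", "C", "Text"))  # both types are recommended
print(usage_warnings("P", "E", "Text"))  # E is outside the recommended domains
```

An ontology that reuses P with a different domain would, under this reading, simply extend or replace the list of recommendations.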
The canonical machine representation of Schema.org is in RDFa³. Representations in JSON-LD, Microdata, and OWL⁴ are also available.
Schema.org was not designed to become a universal ontology. Instead, it is expected to be used alongside other vocabularies that share the basic data model and the use of underlying standards like JSON-LD, Microdata and RDFa as proposed by Schema.org. We observe that JSON-LD (particularly in combination with Schema.org) is a W3C-supported language that has gained a significant uptake among practitioners and programmers. Polymorphic usage of properties can be used with JSON-LD, which imposes fewer restrictions than RDF when used under OWL specifications. These observations will be considered in the design of the custom event ontology described in this deliverable (see Section 3.2.7).
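As an illustration of how JSON-LD and Schema.org combine, the following Python sketch builds a small event description and serializes it. All values (event name, dates, place) are invented for the example; only the keywords @context and @type and the Schema.org terms Event, Place, name, startDate, endDate and location are taken from the respective vocabularies. The second variant shows polymorphic property usage, with location given as plain text instead of a Place.

```python
import json

# Hypothetical event description; the values are made up for illustration.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Spring discount campaign",
    "startDate": "2018-03-01",
    "endDate": "2018-03-15",
    "location": {"@type": "Place", "name": "Ljubljana"},
}

# Polymorphic usage: the same property carries a plain text value.
event_with_text_location = dict(event, location="Ljubljana")

# JSON-LD is plain JSON, so the standard json module is enough to serialize it.
print(json.dumps(event, indent=2))
print(json.dumps(event_with_text_location, indent=2))
```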
As of today, the event model has been adopted by many systems (between 100,000 and 250,000, as reported on the official site), including a popular WordPress plugin that adds a complete JSON-LD based schema (structured data) to event posts generated with other event plugins. The plugin advertises the following features:
• Standard Google Event Rich Snippet schema.
• Event schema on the event detail page.
• Automatic creation of the Event Rich Snippet for an event, with no manual work.
• Works with leading event calendar plugins like Event Manager, All in one Event Calendar, EventOn, WP Event Aggregator, Import Facebook Events, Import Eventbrite Events and Import Meetup Events.
• Events Manager tickets can be shown in the Google schema.
• For All In One Event Calendar by Time.ly, support for Event List, Agenda, Day, Month, Week, Posterboard and Stream views (Pro).
Maturity, flexibility and popularity of the Schema.org model are the main reasons for its adoption for developing the EW-Shopp event ontology.
³ Canonical representation of Schema.org: https://schema.org/docs/schema_org_rdfa.html
⁴ OWL representation of Schema.org: https://schema.org/docs/schemaorg.owl
3.2.4 Partners' Event Data
3.2.4.1 BC1 - Pilot 1 - Ceneje
Table 4 reports the properties that have been found in the event data samples collected from Ceneje.
Table 4 - Ceneje event data properties
Property | Description
DateTime | Event date and time
ProductDescription | Product name
EanCode | EAN code, if it exists
CenejeProductId | Ceneje internal product id
CategoryId | Ceneje internal category id
SellerId | Seller id
SellerProductId | Seller product id
PriceChanged | 1 or 0, indicating whether the price changed compared to the previous day
Price | Price
Change | Price change in %
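The relation between the Price, PriceChanged and Change properties can be sketched as follows. This is an assumption about how the values are derived, not a description of Ceneje's actual computation: the sketch takes Change to be the relative difference to the previous day's price, with negative values for price drops, and the function name is hypothetical.

```python
def price_change(previous_price, current_price):
    """Derive (PriceChanged, Change) from two consecutive daily prices.

    Assumption: Change is the percentage difference relative to the
    previous day's price; a negative value means the price dropped.
    """
    if previous_price <= 0:
        raise ValueError("previous_price must be positive")
    price_changed = 1 if current_price != previous_price else 0
    change_pct = (current_price - previous_price) / previous_price * 100.0
    return price_changed, round(change_pct, 2)


# A price going from 200.00 to 180.00 is a 10 % drop.
print(price_change(200.0, 180.0))
# An unchanged price yields PriceChanged = 0 and Change = 0.
print(price_change(50.0, 50.0))
```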
3.2.4.2 BC1 - Pilot 2 - Big Bang
Table 5 reports the properties that have been found in the event data samples collected from Big Bang.
Table 5 - Big Bang event data properties
Property | Description
ProductID | Product ID
ProductDescription | Product description / model name
EANCode | EAN code
ProductGroupLevel4 | Product group
ActivityID | Unique id of activity
ActivityID_new | Unique id of activity
ActivityType | Activity type according to classification
ActivityTypeDesc | Activity type description
ActivityTitle | Short activity description
ChannelID | Mkt channel id
ChannelDesc | Mkt channel description
ProductCatalogID | ID of the Catalog (distributed to households)
ProductCatalogDesc | Catalog #
BeginDate | Beginning of the activity
EndDate | End of the activity
PriceDiscount | Whether model includes discount in % or lower price in EUR
Discount | Discount in %
Price | Price on the price list (selling price in action = price with discount, tax incl)
3.2.4.3 BC1 - Pilot 3 - Browsetel - CDE
Table 6 reports the properties proposed by CDE to describe internal events.
Table 6 - CDE event data properties
Property | Description
ID | Event ID
NAME | Short description
DESCRIPTION | Description
START_DATE | Start date
[END_DATE] | End date (optional)
START_TIME | Start time
END_TIME | End time
CLASSIFICATION_ID | ID to classification definition
QUANTITY | Quantitative value (e.g., number of calls)
QUANTITY_UNIT_ID | Type associated with values (e.g., counting in positive integer)
PRODUCT_ID | ID to product classification
LOCATION_ID | ID to location definition
CLASSIFICATION_DEFINITION |
CLASSIFICATION_CODE | Classification code
CLASSIFICATION_DESCRIPTION | Description
PRODUCTDEFINITION | Product ID
EAN_CODE | EAN code
PRODUCT_DESCRIPTION | Description
LOCATIONDEFINITION | Catalog #
LOCATION_NAME | Name
LOCATION_DESCRIPTION | Description
GIS_X | Longitude
GIS_Y | Latitude
3.2.4.4 BC3 - Measurence
This case is different from the ones of other partners since Measurence is interested in capturing events associated with campaigns that are promoted as Facebook events. Therefore, the EW-Shopp model should capture the peculiar properties from the Facebook event data model. Table 7 reports the properties of interest taken from the Facebook API descriptions⁵.
Table 7 - Facebook event data properties
Property | Description
id | The event ID
name | Event name
description | Long-form description
start_time | Start time
end_time | End time, if one has been set
interested_count | Number of people interested in the event
attending_count | Number of people attending the event
PlaceID | Event Place information
PLACEDEFINITION |
name | Name
city | City
country | Country
country_code | Country code
latitude | Latitude
longitude | Longitude
region | Region
street | Street
⁵ Facebook event API: https://developers.facebook.com/docs/graph-api/reference/event/
3.2.5 Use Cases for Interoperable Descriptions of Custom Event Data
Enrichment with custom event data has already been done with ad-hoc coding strategies for the implementation of BC1 pilot services (as developed by the companies Browsetel, Ceneje, and Big Bang). Deliverable D2.3 [4] reports on the evaluation of the capacity of the toolkit to replicate the data workflows using the tools developed in the project.
The integration of Facebook events for usage in BC3 (developed by Measurence) is ongoing, while event data used in BC4 will rely on Event Registry.
Test cases for the custom event data ontology consist in the successful development of data enrichment workflows that use representations of custom events based on the vocabulary of this ontology. In particular, the ontology has the goal of making it possible for partners to share their custom events via APIs, which return event data in JSON-LD format based on the EW-Shopp event ontology. The ontology will work as a recommendation for partners about the terminology to use for sharing and exchanging custom event data. We plan to extend the ASIA tool to consume these representations, possibly defining widgets in the ASIA GUI to fetch events, similar to the widget developed for weather data and described in D2.3 [4].
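To make the notion of an enrichment workflow more concrete, the following Python sketch adds custom event information to rows of business data by matching on dates. It is a simplified illustration under stated assumptions, not the ASIA implementation: the function name, the row fields (date, unitsSold) and the output columns (activeEvents, eventNames) are hypothetical, while name, startDate and endDate follow the Schema.org event vocabulary.

```python
from datetime import date


def enrich_with_events(rows, events):
    """Add, to each business data row, the custom events active on its date."""
    enriched = []
    for row in rows:
        active = [
            e["name"]
            for e in events
            if e["startDate"] <= row["date"] <= e["endDate"]
        ]
        enriched.append(
            {**row, "activeEvents": len(active), "eventNames": "; ".join(active)}
        )
    return enriched


# Hypothetical business data and custom events.
sales = [
    {"date": date(2018, 3, 1), "unitsSold": 12},
    {"date": date(2018, 3, 20), "unitsSold": 5},
]
campaigns = [
    {
        "name": "Spring campaign",
        "startDate": date(2018, 2, 25),
        "endDate": date(2018, 3, 10),
    }
]

for row in enrich_with_events(sales, campaigns):
    print(row)
```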
3.2.6 Guidelines for the Design of the Ontology
Based on the goal of the event ontology, i.e., supporting event-based analytics workflows in the industry, and on the previous steps of the adopted methodology, we have drawn the following guidelines to drive the design of the ontology:
1. Harmonization re-using shared ontologies. To make the ontology valuable and extensible beyond the specific data preparation and analytics workflows supported in the project, we will try to use the terminology of existing ontologies to harmonize the terminology used to describe events.
2. Limited nesting of event descriptions. After the semantic enrichment step, event data will appear in columns of a table that contains the enriched data; as a consequence, when used in the analytical modelling steps, event descriptions are flattened into a table; the event ontology should, therefore, natively support the enrichment step.
3. Intuitive rendering of properties as table attributes in event descriptions. Because of (2), the column headers should intuitively describe the content of the column; while seeking to harmonize the terminology used to describe the event, i.e., reducing the number of different terms used to describe similar properties of the events, the terminology must keep the data understandable for users who will work with them in the analytical modelling steps of the workflow. As a consequence, some of the terminology used by partners to describe their events will be preserved in the ontology.
4. Polymorphic property usage and heuristic specifications of domains and ranges. We found that the reasons that motivated the polymorphic property usage and the heuristic specification of domains and ranges, i.e., as a recommendation more than as a normative specification, also apply to the context where this event ontology is used. For example, in this case too, the event ontology would mostly be used to specify the meaning of properties used in data exchanged in the JSON-LD format. When event data appear in an enriched dataset, events will be modelled either in JSON or in a tabular format; in the first case, JSON-LD is fully JSON compliant; in the second case, ontology types will not appear, while property names will be the headers of the columns.
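Guidelines 2 to 4 can be illustrated with a short Python sketch that flattens a nested JSON-LD event description into a single table row. It is only one possible reading of the guidelines: the dot-separated column naming and the function name are assumptions, and the sample values are invented. JSON-LD keywords such as @type are dropped, so ontology types do not appear in the table and property names become column headers.

```python
def flatten(description, prefix=""):
    """Flatten a nested JSON-LD description into one table row (a dict).

    Keys starting with "@" (JSON-LD keywords such as @type and @context)
    are skipped; nested property names are joined with "." (an assumption).
    """
    row = {}
    for key, value in description.items():
        if key.startswith("@"):
            continue
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Hypothetical event description with two levels of nesting.
event = {
    "@type": "Event",
    "name": "Catalogue distribution",
    "startDate": "2018-03-01",
    "location": {
        "@type": "Place",
        "name": "Milan",
        "address": {"@type": "PostalAddress", "postalCode": "20126"},
    },
}

print(flatten(event))
```

The deeper the nesting, the longer and less intuitive the column headers become, which is one way to read the preference for limited nesting.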
3.2.7 The EW-Shopp Ontology Based on Schema.org
We introduce a property-driven ontology, which means that the primary goal is to harmonize the properties used to describe events. When data are collected as JSON data, JSON-LD can be used to reuse the ontology properties; when data are collected as, or factored into, a table, properties can provide the header for each column. For this reason we mostly specify the properties of the ontology, identifying a minimal number of types that are relevant because they are used as types of subjects or objects (values) for these properties.
Figure1.Maintypesusedinthecustomeventontologyandtheirmutualrelations
According to thedecisionofadoptingSchema.orgas referenceontology,we identifiedamong theavailablepropertiesthosethatcanbemappedtotheonesinusebypartners.Forthosethatdonotrepresent the concepts of interest we introduced new properties as specialization of existingSchema.orgproperties,sotokeepthehighestcompliancypossible.
Table 8 reports the properties taken from Schema.org (with schema prefix and highlighted withorangebackground), and theones introducedbyEW-Shopp (withewsprefixandhighlightedwithlight orange background). In the notes, we report: references to Schema.org types fromwhich aproperty is derived (when possible),wherewith “derived from a type”wemean that the type isspecifiedamongthedomainsoftheproperty,and,foranewproperty,thepropertyofSchema.orgwhichthepropertyisasubpropertyof.
TheEW-ShoppEventOntologyisanontologyspecifiedinRDF.Themaintypesusedintheontologyand theirmutual relations are depicted in Figure 1.We omit from the figure the properties thateither have data types as domains (e.g., integers, floats, etc.), with the only exception of time-relatedinformationthatiscrucialforeventrepresentation;otherpropertiesthathaveliteralvaluesordescribemoredetailed information,e.g.,ofpostaladdressesareomittedanddescribed later inthissubsection.ThedarkorangecolourindicatestypesandpropertiesspecifiedinSchema.org,thelightorangecolour indicates typesandproperties introduced in theEW-Shoppontology (with the“ews:”prefix),thegreencolourindicatesdatatypesandthepurplecolourindicatesthegenericURItype(consideredequivalenttoThing)andonetypefromanotherontology.WeomittheprefixesofalltypesandpropertiesthatareeitherreusedfromSchema.orgorbasedonxsd:types(i.e.,TimeandDateTime).
[Figure 1 diagram not reproduced. Node labels: Event, Place, PostalAddress, Product, ews:MarketingEvent, URI | skos:Concept, Date | DateTime. Edge labels: location, category, address, ews:product, startDate, endDate, subClassOf.]

The ontology has the following properties:

• It is based on an extension of the Schema.org ontology.
• Like Schema.org, it uses polymorphic properties and heuristic domain/range specifications (with domainIncludes and rangeIncludes); these features make it difficult to properly depict multiple domain and range specifications in Figure 1 (where we represent multiple range specifications as single nodes with several labels separated by the "|" symbol and only report the main types used as ranges).
• The main types considered in the ontology are derived from Schema.org, where they are listed among the most frequently used types (see footnote 6). These types are:
  o schema:Event, which is the type associated with all events;
  o schema:Product, which is the type associated with products;
  o schema:Place, which is the type associated with locations.
• Additional types used in the ontology are:
  o ews:MarketingEvent, which is the only new type introduced in the ontology and is defined as a subclass of schema:Event;
  o skos:Concept, which is defined as a possible type for the property schema:category, introduced to associate a category with an event; the type schema:CategoryCode is pending in the Schema.org definition and has not so far been used in domains similar to the ones addressed in EW-Shopp; for this reason we reused a type (i.e., an OWL class) defined in SKOS, a W3C-recommended language for defining simple categorization systems;
  o schema:PostalAddress, which is used because it is the recommended value for the schema:address property that is attributed to locations (instances of schema:Place); in practice, a postal address is a placeholder used to aggregate more specific address information specified using a number of properties; leveraging the non-normative specification of domains and ranges in Schema.org, we also consider descriptions where these properties (e.g., schema:postalCode) are directly attached to places without using an instance of postal address as an intermediary.
Schema.org does not provide properties to describe measures of aspects of events, e.g., the number of attendees, so we introduced several properties to describe these measures. In this case, we preferred to keep the terminology as close as possible to that used by the partners to specify these measures; however, we linked these properties to Schema.org by specifying their superproperties in Schema.org.
Table 8 - EW-Shopp properties

NAME | RANGE | DESCRIPTION | NOTES

EW-Shopp custom event definition (properties that describe instances of schema:Event)
schema:identifier | Text or URI | An identifier of an item | schema:Thing
schema:name | Text | The name of the item. | schema:Thing
schema:description | Text | A description of the item. | schema:Thing
ews:source | Text | A description of the source of the event |
ews:channelCode | Text | A code associated with a channel in a marketing event | ews:MarketingEvent
ews:channelDescription | Text | A description associated with a channel in a marketing event |
schema:startDate | Date or DateTime | The start date and time of the item (in ISO 8601 date format). | schema:Event
schema:endDate | Date or DateTime | The end date and time of the item (in ISO 8601 date format). | schema:Event
schema:category | URI | A category for an item | schema:Thing (subproperty of schema:about; rec. range is skos:Concept)
ews:quantity | xsd:int | A number identifying a generic quantity | Subproperty of ews:simpleMeasure
ews:quantyUnitId | URI or Text | The specification of the unit in which a quantity is measured | Subproperty of schema:identifier
ews:interestedAudience | xsd:int | The number of people interested in an event | Subproperty of ews:simpleMeasure
ews:attendingAudience | xsd:int | The number of people who plan to attend an event | Subproperty of ews:simpleMeasure
ews:priceChanged | Boolean | A measure that assigns a boolean value to specify if the price of a product has changed or not | Subproperty of ews:booleanMeasure
schema:discount | Text or Boolean | Any discount applied (to an Order) | schema:Order
ews:priceChange | xsd:float | Price change in % | Subproperty of ews:simpleMeasure
schema:price | xsd:float | The offer price of a product, or of a price component when attached to PriceSpecification and its subtypes. | schema:Offer
ews:product | URI or Product | The product the event refers to, if we are describing events about products | Subproperty of schema:about
schema:location | Place or PostalAddress or Text | The location of, for example, where the event is happening, an organization is located, or where an action takes place. | schema:Event
ews:simpleMeasure | xsd:float or xsd:int | A measure used to | Subproperty of schema:value
ews:booleanMeasure | xsd:float or xsd:int | A measure that assigns a boolean value | Subproperty of schema:value

EW-Shopp classification definition
schema:description | Text | A description of the item. | schema:Thing

EW-Shopp product definition (properties that describe instances of schema:Product)
schema:gtin13 | Text | The GTIN-13 code of the product, or the product to which the offer refers. This is equivalent to 13-digit ISBN codes and EAN UCC-13. | schema:Product
schema:description | Text | A description of the item. | schema:Thing
schema:seller | URI | An entity which offers (sells / leases / lends / loans) the services / goods. A seller may also be a provider. | schema:BuyAction or schema:Offer or schema:Order
schema:sku | Text | The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for a product or service, or the product to which the offer refers. | schema:Product or schema:Offer
ews:catalogId | Text | Specify the identifier | Subproperty of schema:identifier
schema:description | Text | A description of the item. | schema:Thing
schema:category | URI | Specified as subproperty of schema:about; range is skos:Concept | schema:Product or schema:Thing

EW-Shopp location definition (properties that describe instances of schema:Place and PostalAddress)
schema:name | Text | The name of the item. | schema:Thing
schema:description | Text | A description of the item. | schema:Thing
schema:addressLocality | Text | The locality. For example, Mountain View. | schema:PostalAddress
schema:addressCountry | Country or Text | The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code. | schema:PostalAddress
schema:addressCountry | Country or Text | The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code. | schema:PostalAddress
schema:latitude | Number or Text | The latitude of a location. For example 37.42242 (WGS 84). | schema:GeoCoordinates
schema:longitude | Number or Text | The longitude of a location. For example -122.08585 (WGS 84). | schema:GeoCoordinates
schema:addressRegion | Text | The region. For example, CA. | schema:PostalAddress
schema:streetAddress | Text | The street address. For example, 1600 Amphitheatre Pkwy. | schema:PostalAddress
schema:postalCode | Text | The postal code. For example, 94043. | schema:PostalAddress
schema:address | Text or PostalAddress | The address, possibly specified as a structured PostalAddress specification. | schema:Place or schema:Person or schema:Organization or schema:GeoShape or schema:GeoCoordinates

6 https://schema.org/docs/gs.html#schemaorg_types
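As an illustration of how the properties in Table 8 might be reused when event data is collected as JSON, the following sketch builds a JSON-LD description of a marketing event in Python. The namespace URI bound to the "ews" prefix, the helper name, the identifiers and all values are placeholders invented for this example; the deliverable does not specify them.

```python
import json

# Hypothetical namespace for the "ews:" prefix; the real URI is not given here.
EWS_NS = "http://example.org/ew-shopp/ns#"


def marketing_event(identifier, name, start, end, product_uri, attending):
    """Return a JSON-LD dict for an ews:MarketingEvent using Table 8 properties."""
    return {
        "@context": {"schema": "https://schema.org/", "ews": EWS_NS},
        "@type": "ews:MarketingEvent",
        "schema:identifier": identifier,
        "schema:name": name,
        "schema:startDate": start,  # ISO 8601 date or date-time
        "schema:endDate": end,
        "ews:product": {"@id": product_uri},  # range: URI or Product
        "ews:attendingAudience": attending,   # range: xsd:int
    }


# All values below are made up for illustration.
event = marketing_event(
    identifier="campaign-001",
    name="Winter TV campaign",
    start="2018-12-01",
    end="2018-12-24",
    product_uri="http://example.org/products/42",
    attending=1200,
)
print(json.dumps(event, indent=2, ensure_ascii=False))
```

When the same data is kept in a table instead, the prefixed property names used as keys above could serve directly as column headers.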
3.2.7.1 Mapping Between EW-Shopp Ontology Properties and Partners' Properties

The following Table 9 reports the mappings between the properties defined in the EW-Shopp ontology and the properties discussed in the previous section (the intended semantics of a mapping between two properties is that they represent equivalent relations). A full description of the mappings can be found in an online spreadsheet (see footnote 7).
Table 9 - Property Mappings

EW-Shopp BT CE BB ME (Facebook)

CUSTOM EVENT DEFINITION
schema:identifier ID ActivityID id
schema:name NAME ActivityTitle name
schema:description DESCRIPTION description
ews:source SOURCE
ews:channelCode ChannelID
ews:channelDescription ChannelDesc
schema:startDate START DateTime BeginDate start_time
schema:endDate [END] EndDate end_time
schema:category CLASSIFICATION_CODE CategoryId ActivityType
ews:quantity QUANTITY
ews:quantyUnitId QUANTITY_UNIT_ID
ews:interestedAudience interested_count
ews:attendingAudience attending_count
ews:priceChanged PriceChanged
schema:discount PriceDiscount
ews:priceChange Change Discount
schema:price Price Price
ews:product PRODUCT_ID CenejeProductId ProductID
schema:location LOCATION_ID Place/LocationID
ews:simpleMeasure
ews:booleanMeasure

CLASSIFICATION DEFINITION
schema:description CLASSIFICATION_DESCRIPTION ActivityTypeDesc

PRODUCT DEFINITION
schema:gtin13 EAN_CODE EanCode EANCode
schema:description PRODUCT_DESCRIPTION ProductDescription ProductDescription
schema:seller SellerId
schema:sku SellerProductId
ews:catalogId ProductCatalogID
schema:description PRODUCT_DESCRIPTION ProductDescription ProductDescription
schema:category ProductGroupLevel4

LOCATION DEFINITION
schema:name LOCATION_NAME name
schema:description LOCATION_DESCRIPTION
schema:addressLocality city
schema:addressCountry country
schema:addressCountry country_code
schema:latitude GIS_X latitude
schema:longitude GIS_Y longitude
schema:addressRegion region
schema:streetAddress street
schema:postalCode zip
schema:address

7 https://docs.google.com/spreadsheets/d/1DgaWlVJiI2ZvXT_z8B3kGx4W6XcmwWGdOw7SGL0HbK8/edit?usp=sharing
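A mapping such as the one in Table 9 can be applied mechanically to harmonize partner records. The sketch below renames the fields of one event record to ontology properties. It assumes that the lower-case names in Table 9 (id, name, start_time, and so on) belong to the ME (Facebook) column; the function name and the sample record are invented for illustration.

```python
# Partner field name -> ontology property, copied from Table 9 under the
# assumption that the lower-case names are the ME (Facebook) column.
ME_TO_ONTOLOGY = {
    "id": "schema:identifier",
    "name": "schema:name",
    "description": "schema:description",
    "start_time": "schema:startDate",
    "end_time": "schema:endDate",
    "interested_count": "ews:interestedAudience",
    "attending_count": "ews:attendingAudience",
}


def harmonize(record, mapping):
    """Rename mapped fields to ontology properties and drop unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Made-up sample record; "cover" has no mapping and is dropped.
raw_event = {
    "id": "123",
    "name": "Open-air concert",
    "start_time": "2018-12-01T20:00:00",
    "attending_count": 350,
    "cover": "poster.png",
}
harmonized = harmonize(raw_event, ME_TO_ONTOLOGY)
print(harmonized)
```

Dropping unmapped fields is a design choice of this sketch; a real workflow might instead keep them under their original names or report them for review.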
3.2.7.2 Encoding and Data Formats

All data, and in particular textual data, will be represented using the Unicode UTF-8 character encoding to support interoperability across languages at the alphabet level.

Chapter 4 Weather Data
Weather is a major factor in the environmental context of the shopper's journey. To analyse and model its effects on shopper behaviour, the project has an agreement with the European Centre for Medium-Range Weather Forecasts (ECMWF, see footnote 8) to access its MARS (see footnote 9) weather data archive. As this is their operational archive, the project can obtain historic weather state data, historic weather forecast data as well as current weather forecast data. A wrapper API around the ECMWF API was developed for project purposes. It was presented in deliverable D1.3 [3] and has been used for the development of all the project pilots.

8 https://www.ecmwf.int/
9 Meteorological Archival and Retrieval System
4.1 Updates Since D1.3

ECMWF provides a Python API for access to its weather data archive. It is intended and developed for use in meteorological institutions, which means it is not well suited for business use. For example, it uses a set of internal codes to denote the individual weather parameters to access (temperature, pressure, etc.), and it retrieves the data in GRIB, a binary format commonly used in meteorology.
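To illustrate the kind of internal codes mentioned above, the sketch below translates readable parameter names into ECMWF parameter codes and assembles a MARS-style request dictionary. The helper name, the choice of parameters and the request values are assumptions made for this example and are not taken from the project's wrapper; the request is only built, not sent.

```python
# Readable names -> ECMWF parameter codes (GRIB table 128).
PARAM_CODES = {
    "2m_temperature": "167.128",
    "2m_dewpoint_temperature": "168.128",
    "mean_sea_level_pressure": "151.128",
}


def build_mars_request(names, date, north, west, south, east):
    """Assemble a MARS-style request dict for surface analysis fields."""
    unknown = [n for n in names if n not in PARAM_CODES]
    if unknown:
        raise KeyError(f"no parameter code known for: {unknown}")
    return {
        "class": "od",        # operational archive
        "stream": "oper",
        "type": "an",         # analysis, i.e. weather state
        "levtype": "sfc",     # surface level
        "param": "/".join(PARAM_CODES[n] for n in names),
        "date": date,
        "time": "00/06/12/18",
        "area": f"{north}/{west}/{south}/{east}",  # bounding box N/W/S/E
        "grid": "0.25/0.25",
    }


# Illustrative bounding box and date.
request = build_mars_request(
    ["2m_temperature", "mean_sea_level_pressure"], "2018-07-01", 47, 13, 45, 17
)
print(request["param"])
```

The result of such a request would still arrive as GRIB and need decoding, which is the second inconvenience a wrapper has to hide.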
To enable the use of weather data in project workflows, a wrapper API was developed in Python, which streamlines the retrieval of business-related weather parameters (i.e. those that may reasonably be expected to influence shopper behaviour). For example, it retrieves the relative humidity of the air but skips the temperature of lake water. The API also supports aggregation over time and geographical regions.
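Aggregation over time and regions can be pictured with a small sketch. It assumes that each measurement has already been assigned to a region and averages the values per day and region; the function name, record layout and numbers are invented for illustration and do not reflect the wrapper's actual interface.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_regional_mean(observations):
    """Average values per (day, region).

    observations: iterable of (ISO 8601 timestamp, region code, value).
    Returns a dict mapping (ISO date, region code) to the mean value.
    """
    buckets = defaultdict(list)
    for timestamp, region, value in observations:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        buckets[(day, region)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up temperature readings in degrees Celsius.
readings = [
    ("2018-07-01T00:00:00", "SI", 20.0),
    ("2018-07-01T12:00:00", "SI", 30.0),
    ("2018-07-02T12:00:00", "SI", 28.0),
    ("2018-07-01T12:00:00", "IT", 32.0),
]
aggregated = daily_regional_mean(readings)
print(aggregated)
```

Other aggregates (minimum, maximum, sum for precipitation) would follow the same grouping with a different reduction function.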
The API is available in a public GitHub repository (see footnote 10). Documentation with examples of use is provided on its wiki page (see footnote 11). Since it was necessary for the development of the pilots, its development was prioritized early in the project and it was in a very mature state when it was described in D1.3 [3]. In the past year, there were few to no functional updates to it, only bug fixes and stability improvements.

4.2 Alternative Sources of Weather Data

The agreement the project has with ECMWF to access their data is for research purposes and will end with the project. To use weather-based analytics in the toolkit after the project, an appropriate source of weather data will need to be used. Here we provide a set of options collected after a survey of the field.

ECMWF: The simplest option would be to reach a new agreement with ECMWF and continue using their data. Some appropriate level of compensation would of course need to be negotiated. The main consumers of ECMWF data are research institutions and weather forecast agencies, whose needs differ from those of the workflows developed in the project. In our early talks with ECMWF they were open to the idea of exploring new possibilities for exploiting their data. The project will reopen this dialogue and present the results collected from the pilots to jointly examine possibilities for future collaboration.
National weather forecast agencies: Most European countries have their own national weather agencies that produce local forecasts, conduct meteorological research and provide weather forecast data to other institutions in the country. Most of these countries are also member or cooperating states in ECMWF and share data with them. They are a good option to provide weather data, as the format would likely be very close to that of ECMWF and little modification would be needed. On the downside, an individual agreement would need to be made with the agency of each country to be covered.

10 https://github.com/JozefStefanInstitute/weather-data
11 https://github.com/JozefStefanInstitute/weather-data/wiki
OpenWeatherMap (see footnote 12): OpenWeatherMap is a service inspired by the well-known OpenStreetMap (see footnote 13) project that provides access to global weather data, both historic data and forecasts, via a REST API. They offer a free-tier service that allows 60 requests per minute for data on either the current weather state or a forecast in 3-hour steps for the next 5 days, which can reasonably cover the prediction needs of several pilots. They also offer commercial user accounts with more requests and daily predictions for up to 16 days ahead. Access to historic data for the past 5 years is also available at a cost, so they can also provide bulk learning data. The weather parameters they offer correspond to a large degree to the ones used in the project, so the service may be a good match even for models built on ECMWF data.

Weather Underground (see footnote 14): This service is a subsidiary of the IBM-owned The Weather Company. They offer a subscription-based weather data API where it is possible to obtain global weather state and forecast data for up to 15 days ahead. At an extra cost, historic data from July 2011 is also available.

Dark Sky (see footnote 15): Like Weather Underground, they offer a subscription-based weather data API for weather state, forecasts for up to 7 days ahead and historic data going back decades.

AccuWeather (see footnote 16): Another well-known service offering a subscription-based API. Forecast data is available for 15 days ahead. Some historic weather data is also available at a separate cost, but it is unclear how far back it goes.
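To make the free-tier figures quoted for OpenWeatherMap concrete, the sketch below builds a forecast request URL and works out the numbers implied by the limits: 5 days in 3-hour steps give 40 forecast points, and 60 requests per minute allow one request per second. The endpoint path and query parameter names follow the public OpenWeatherMap documentation as we understand it and should be checked against the current version; the API key and coordinates are placeholders, and no request is sent.

```python
from urllib.parse import urlencode

# 5 day / 3 hour forecast endpoint; verify against the provider's current docs.
FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"


def forecast_url(lat, lon, api_key, units="metric"):
    """Build the forecast request URL for a pair of coordinates."""
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key, "units": units})
    return f"{FORECAST_URL}?{query}"


def forecast_steps(days=5, step_hours=3):
    """Number of forecast points covering `days` at `step_hours` resolution."""
    return days * 24 // step_hours


def min_seconds_between_calls(calls_per_minute=60):
    """Smallest spacing between requests that respects the rate limit."""
    return 60.0 / calls_per_minute


url = forecast_url(46.05, 14.51, "YOUR_API_KEY")  # placeholder key
print(url)
print(forecast_steps(), min_seconds_between_calls())
```

A pilot that refreshes forecasts for a few hundred locations once per day would stay well within such a limit, provided calls are spaced by at least the computed interval.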
Chapter 5 Multilingual Data Linking Services

The tools for multilingual data linking ensure the interoperability of the EW-Shopp toolkit across different languages. Modern e-commerce businesses typically operate across diverse geographical regions. In order to leverage data and insights across language borders, some means of interlinking this data must be provided. This chapter covers the services developed within EW-Shopp to support such interlinking. As with the previous data services described in this deliverable, they were developed early to support deployment of the pilots and were already described in deliverable D1.3 [3]. Here we focus on the updates made since then. We also describe the keyword clustering task that emerged during development of the JOT business case pilot.

12 https://openweathermap.org/
13 https://www.openstreetmap.org/
14 https://www.wunderground.com/
15 https://darksky.net/dev
16 https://developer.accuweather.com/
5.1 Updates Since D1.3

Multilinguality is covered in EW-Shopp by supporting cross-lingual linking with the ASIA data enrichment tool. We refer to deliverable D3.2 [5] (Chapter 3) for details about data linking in ASIA.

ASIA supports cross-lingual instance-level annotations by plugging in cross-lingual reconciliation services. Cross-lingual reconciliation services are based on multilingual indexes for the reference data (the data used for reconciliation). The services that cover multilinguality and are available for ASIA are:

• Wikifier, which covers Wikipedia entities in 130 languages (described earlier in this deliverable); see deliverable D1.3 [3] for details.
• GeoNames, which covers labels of spatial entities in a large variety of languages. The covered languages change from entity to entity, but usually include the local language of a toponym. This means that a data provider that provides company data for a given jurisdiction, where toponyms are named using the local language, would be able to reconcile the toponyms against GeoNames.
• Wikidata, which provides different reconciliation services (one per language) to import as needed.

Wikidata and Wikifier are currently not used in EW-Shopp data enrichment workflows, but Wikifier is used by the Event Registry. The key multilingual data reconciliation service for EW-Shopp data enrichment workflows is GeoNames, which helps reconcile location toponyms against the GeoNames knowledge base, where locations are associated with geographic coordinates, which in turn can be used to fetch weather data from the MARS APIs.
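The chain from toponym to coordinates described above can be sketched as follows. The example builds a query for the public GeoNames search web service and extracts the coordinates of the best match from a JSON response. This is not how ASIA performs reconciliation internally; the function names, the user name and the sample response (hand-written in the shape the service returns, with illustrative values) are assumptions for this example, and no request is sent.

```python
import json
from urllib.parse import urlencode


def geonames_search_url(toponym, username):
    """Build a GeoNames search query that asks for the single best match."""
    query = urlencode({"q": toponym, "maxRows": 1, "username": username})
    return f"http://api.geonames.org/searchJSON?{query}"


def first_coordinates(response_text):
    """Return (latitude, longitude) of the first hit, or None if there is none."""
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])


# Hand-written sample response; coordinates are illustrative.
SAMPLE_RESPONSE = (
    '{"totalResultsCount": 1, "geonames": '
    '[{"name": "Ljubljana", "lat": "46.05108", "lng": "14.50513"}]}'
)

print(geonames_search_url("Ljubljana", "demo_user"))
print(first_coordinates(SAMPLE_RESPONSE))
```

The resulting pair of coordinates is what a weather request needs in order to select the grid points or bounding box for a location.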
Finally, we also remark that all the data linking services will use the Unicode UTF-8 character encoding to support interoperability across languages at the alphabet level.

5.2 Handling Keywords in the JOT Dataset

The task in the JOT business case is to analyse and predict the dynamics of impressions of Google keywords in target regions based on environmental and social context, i.e. weather and events. Their pilot is unique among all project pilots in the sheer volume of text data (for details see deliverables D4.1 [6] and D4.2 [7]). The JOT pilot dataset contains information about the time and regional distribution of millions of Google keywords. So far, the approach in the project has been to model them individually, but pilot experiments have shown that this does not scale to the volume of data involved.
A more feasible and practical approach would be to model groups of related keywords together. This would ensure better quality of data and a stronger signal for the models as well as reduce the number of models needed. Indeed, the original idea of the pilot was to use Google keyword categories, but unfortunately it was not possible to obtain them below the top level, which is too coarse.
To overcome this, a method for clustering the keywords based on their semantics is needed. A state-of-the-art approach for this is to use word embeddings, i.e. mappings of individual words into vectors of real numbers. This reduces the dimension of the word space from the number of all words to some fixed, much smaller number. By using a large corpus of text, the embedding can be built from data in such a way that vectors of semantically related words are closer together in the embedding space than those of unrelated words.
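The notion of "closer together in the embedding space" is usually measured with cosine similarity. The sketch below computes it for three hand-made three-dimensional vectors; real embeddings have hundreds of dimensions, and the words and numbers here are invented purely to illustrate the computation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy vectors, not taken from any trained embedding.
TOY_EMBEDDING = {
    "rain": [0.9, 0.1, 0.0],
    "umbrella": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
}

related = cosine(TOY_EMBEDDING["rain"], TOY_EMBEDDING["umbrella"])
unrelated = cosine(TOY_EMBEDDING["rain"], TOY_EMBEDDING["laptop"])
print(related > unrelated)
```

A clustering method can then group keywords whose vectors have high mutual similarity; which clustering algorithm is used is left open here.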
An advantage of this approach is that the word embeddings can be pre-built and then reused for several problems. Since computing them is a very intensive batch job that demands a lot of data and hardware resources, some institutions have also started offering free, open-source embeddings for public use. For clustering of Google keywords, the embeddings released with the fastText (see footnote 17) text classification library developed by Facebook [9] are well suited. They offer embeddings for 157 languages [10], including Spanish and German, which are relevant for the JOT case. The embeddings are built on a combination of Common Crawl and Wikipedia data (for details see [10]), which are both large and well-curated datasets, promising good-quality embeddings.
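Pre-built embeddings of this kind are commonly distributed as plain-text ".vec" files: a header line with the number of words and the vector dimension, followed by one line per word with its space-separated components. The following sketch reads that format from any text stream. The function name is our own, and the two-word sample stands in for a real file, which would be opened with UTF-8 encoding and typically read with a limit on the number of words.

```python
import io


def load_vec(handle, limit=None):
    """Read word vectors in the textual .vec format from an open text stream.

    Returns (dimension, {word: [float, ...]}); lines with an unexpected
    number of fields are skipped.
    """
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the vocabulary size
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return dim, vectors


# Tiny made-up sample in the same layout as a real embedding file.
sample = io.StringIO("2 3\nhola 0.1 0.2 0.3\nmundo 0.4 0.5 0.6\n")
dim, vectors = load_vec(sample)
print(dim, sorted(vectors))

# With a downloaded file (file name is an assumption):
# with open("cc.es.300.vec", encoding="utf-8") as fh:
#     dim, vectors = load_vec(fh, limit=100000)
```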
One final non-trivial technical detail is how to use the word embeddings to represent the Google keywords. Though Google uses the term "keywords", they are in fact multi-word phrases. This means we need a way to represent the entire keyword using the embeddings of its words. Our plan is to use the state-of-the-art approach described in [11], where a weighted average of the word vectors is computed and then corrected using a dimensionality reduction method.
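A minimal sketch in the spirit of [11] is given below, under our own simplifying assumptions: each word vector is weighted by a / (a + p(w)), where p(w) is the word's relative frequency, the weighted vectors are averaged per keyword, and the projection onto the dominant direction of all keyword vectors is removed. The toy vectors, frequencies and function names are invented; the dominant direction is found by plain power iteration rather than a library SVD, and this is not the project's implementation.

```python
import math


def weighted_average(words, vectors, word_prob, a=1e-3):
    """Average the vectors of known words, down-weighting frequent words."""
    known = [w for w in words if w in vectors]
    if not known:
        return None
    total = [0.0] * len(vectors[known[0]])
    for w in known:
        weight = a / (a + word_prob.get(w, 0.0))
        for i, x in enumerate(vectors[w]):
            total[i] += weight * x
    return [x / len(known) for x in total]


def dominant_direction(rows, iterations=100):
    """Unit vector along the first right singular vector of `rows`."""
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        proj = [sum(r[i] * u[i] for i in range(dim)) for r in rows]
        nxt = [sum(p * r[i] for p, r in zip(proj, rows)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u


def embed_keywords(keywords, vectors, word_prob):
    """Return (corrected keyword vectors, removed direction)."""
    rows = [weighted_average(k.split(), vectors, word_prob) for k in keywords]
    rows = [r for r in rows if r is not None]
    u = dominant_direction(rows)
    corrected = []
    for r in rows:
        d = sum(x * y for x, y in zip(r, u))
        corrected.append([x - d * y for x, y in zip(r, u)])
    return corrected, u


# Toy data, invented for illustration.
VECTORS = {
    "cheap": [0.2, 0.1, 0.0], "flights": [0.9, 0.1, 0.1],
    "to": [0.3, 0.3, 0.3], "madrid": [0.7, 0.3, 0.0],
    "running": [0.0, 0.8, 0.3], "shoes": [0.1, 0.9, 0.2],
}
WORD_PROB = {"to": 0.02, "cheap": 0.001, "flights": 0.0005,
             "madrid": 0.0002, "running": 0.0008, "shoes": 0.0006}
KEYWORDS = ["cheap flights", "flights to madrid", "running shoes"]

keyword_vectors, removed = embed_keywords(KEYWORDS, VECTORS, WORD_PROB)
print(len(keyword_vectors), len(removed))
```

The corrected keyword vectors can then be compared or clustered in the same way as single-word vectors.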
This section presents the planned approach for handling the large number of Google keywords in the JOT business case to enable scalable analytics. At the time of writing, the approach is being implemented (see footnote 18). At this point it is unclear whether the embedding-based clustering functionality should be offered as part of the EW-Shopp toolkit or remain a case-specific pre-processing step. That depends at least in part on the final performance of this approach for the JOT case as well as on the effort needed to incorporate it into the toolkit environment.
Chapter 6 Conclusion

This deliverable presented the event, weather and multilingual data services developed and used in the EW-Shopp project. These services were first introduced in the earlier deliverable D1.3 [3], where their specifications and APIs are already described. This document focuses on the updates and extensions to the services based on experience from development and deployment of the business case pilots.

Of all the extensions described, the largest is the development of the ontology for custom events in Section 3.2. This ontology provides businesses using the EW-Shopp toolkit with the means to describe any events impacting their business dynamics that are not captured by the Event Registry data source.

Some aspects of the data services remain open. For weather data, it is unclear which of the possible alternative data sources listed in Section 4.2 is the best. It is possible that different sources may suit different businesses. Also, the keyword clustering tool for the JOT data described in Section 5.2 is still under development. Though the methodological details of the approach are clear, its technical implementation is still emerging and, with it, its role in the EW-Shopp toolkit. These issues remain a topic of ongoing work and dialogue with the business partners and will be revisited in following deliverables.

17 https://fasttext.cc/
18 The need to cluster similar keywords emerged during the development of JOT pilot services, as documented in D4.2.
References

[1] D1.1: Interoperability Requirements Specification

[2] D1.2: Spatial, Temporal and Product Data Format Specification

[3] D1.3: Event, Weather and Multilingual Data Services Specification

[4] D2.3: EW-Shopp Platform Evaluation Assessment

[5] D3.2: EW-Shopp Components as a Service: Transformation, Linking and Analytics

[6] D4.1: Business Case Requirements

[7] D4.2: Pilots Deployment

[8] Peroni, S., 2016. A simplified agile methodology for ontology development. In OWL: Experiences and Directions – Reasoner Evaluation (pp. 55-69). Springer, Cham.

[9] Joulin, A., Grave, E., Bojanowski, P. and Mikolov, T., 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.

[10] Grave, E., Bojanowski, P., Gupta, P., Joulin, A. and Mikolov, T., 2018. Learning word vectors for 157 languages. arXiv preprint arXiv:1802.06893.

[11] Arora, S., Liang, Y. and Ma, T., 2016. A simple but tough-to-beat baseline for sentence embeddings.