
D2.3 - EW-Shopp Platform evaluation assessment

Deliverable n: 2.3
Date: 31 December 2018
Status: Release
Version: 1.0
Authors: Lorenzo Sutton (ENG), Michele Ciavotta (UNIMIB), Aljaž Košmerlj (JSI), Nikolay Nikolov (SINTEF), Olga Melnyk (MEASURENCE), Patricija Orel (BB)
Contributors: ALL
Distribution: Public

Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1


History of Changes

Version | Date | Description | Contributors
0.1 | 22.10.2018 | Definition of Table of Contents | ENG
0.2 | 31.10.2018 | Update of the ToC and structure | ENG
0.3 | 29.11.2018 | Updated sections based on Business Cases flows, agreed upon and consolidated at the General Assembly and assigned partner responsibility for each section | ENG
0.4 | 13.12.2018 | Integrated BC1_P1 ingestion part, BC3 ingestion and analytics, all other analytics, BC4 enrichment | JSI, MEASURENCE, CENEJE, UNIMIB
0.41 | 17.12.2018 | Several inputs to enrichment in all BCs, BC4 ingestion description | UNIMIB, JOT, BT
0.42 | 18.12.2018 | Integrated BC2_P2 ingestion part | BB
0.5 | 20.12.2018 | Finalised enrichment and visualisation sections for all the BCs, integrated part of the BC4 enrichment. General integrations and polishing | ENG, UNIMIB, SINTEF
0.51 | 20.12.2018 | Various edits and minor integrations/corrections from partners; version ready for internal pre-review | ENG
0.6 | 29.12.2018 | Update according to internal quality review by SINTEF | ENG, SINTEF
0.7 | 30.12.2018 | Pre-final version for Coordinator review | ENG
1.0 | 31.12.2018 | Final version | UNIMIB


Executive Summary

In this deliverable an evaluation of the EW-Shopp set of tools at month 24 of the project is carried out. This release will be followed by a final evaluation at month 36. EW-Shopp is a strongly business-oriented Innovation Action and is therefore business-driven, building its toolset on existing technologies with the aim of helping businesses offer value-added weather- and events-based services. To this end the EW-Shopp consortium includes five companies (‘business partners’) operating in different business sectors who are carrying out a total of five EW-Shopp Pilots to effectively craft and demonstrate the EW-Shopp services in the real world. This means that, especially as the project progresses into its final phases, EW-Shopp is strongly driven by the visions and business needs of our business partners and therefore by the requirements, innovations and outcomes of the EW-Shopp Pilots. Given such a user-oriented approach, evaluation at this stage means assessing if and how the EW-Shopp toolset complies with the requirements of each Pilot and whether any improvements or corrective measures are needed in the next period.

In this context, the EW-Shopp Pilots cover a wide spectrum of business environments and sectors, ranging from consumer retail to B2B sales and from internet advertising to real-world data analytics for business performance metrics. While this is a clear advantage for the project, as it provides a broad array of requirements and test-beds for EW-Shopp, from the evaluation point of view it means finding a consistent methodology to assess scenarios which are quite different. To this end the first step we adopted has been to agree upon a common dataflow methodology which is on the one hand flexible enough to describe all Pilots, and on the other hand consistent enough to systematically evaluate all of them (and all of the EW-Shopp tools involved). The dataflow (described in detail in this report) foresees 4 logical-functional steps which cover the whole Pilot ‘food chain’: from Acquisition (ingestion) of the raw data from different sources (such as company data, open data, etc.), through Enrichment (for instance reconciling, filtering, and literally enriching diverse datasets with each other) and Analytics (where some of the innovative EW-Shopp algorithms for predictions and models are employed), to Visualisation (where the data is actually shown to final users in an interactive, intuitive and effective manner).

Applying the agreed dataflow methodology, we have assessed that all EW-Shopp Pilot requirements as of month 24 are successfully met and that the tools comply with each of the logical-functional steps defined. Overall, EW-Shopp is providing a usable and useful toolset able to provide added value to each of its Business Cases and users. Of course the degree to which each of the tools is involved in each Pilot varies from case to case, as does the scope of each step, which is natural because of the different use cases carried out. For each of the Pilots there is still room for improvement, especially in adding certain data (such as events’ data in some Pilots), fine-tuning the types and granularity of data (e.g. time scope), improving some of the predictive models or providing additional functionality (e.g. improving and better integrating some of the visualisation). At the time of writing EW-Shopp partners are already working on such improvements with the aim of fully matching all of the Business Cases requirements and delivering a successful final evaluation by the end of the project.


Table of contents

List of figures
List of tables
Acronyms
Introduction and Methodology
1.1 Objectives and Methodology
1.2 Relationship to Other Deliverables
1.3 Document Structure
EW-Shopp Platform Assessment at Month 24
1.4 Business Case 1 – Pilot 1 Evaluation
1.4.1 Logical and functional workflow of the Pilots
1.4.2 EW-Shopp Tools Compliance
1.4.3 Criticalities and corrective measures
1.5.1 Logical and functional workflow of the Pilot
1.7 Business Case 3 Evaluation
1.8 Business Case 4 Evaluation
Conclusions and Outlook


List of figures

Figure 1 EW-Shopp general dataflow model and main steps: Ingestion, Enrichment, Analytics and Visualisation
Figure 2 Business Case 1 – Pilot 1 Logical flow
Figure 3 Diagram of Ceneje data gathering
Figure 4 Business Case 1 – Pilot 2 Logical flow
Figure 5 Representation of Big Bang data gathering and storage
Figure 6 Product selection interface. Notice the incremental search
Figure 7 Screenshots of dashboard chart elements
Figure 8 BC1 – Widget showing a dynamic ranking of sellers based on maximum price. The date (or date range) can be dynamically set by clicking on any other widget reporting dates
Figure 9 Business Case 1 – Pilot 3 Logical flow
Figure 10. Browsetel pipeline for Mercator dataset
Figure 11 Business Case 3 – Pilot 1 Logical flow
Figure 12. Measurence pipeline for daily visitors
Figure 13 BC3 Weekly store analysis vs. weather cockpit example
Figure 14 Business Case 4 – Pilot 1 Logical flow
Figure 15 - Data ingestion flow for BC4
Figure 16. Data structure of the BC4 keyword data in the enrichment database
Figure 17. Enrichment pipeline details
Figure 18. Enrichment pipeline details
Figure 19. ASIA reconciliation preview
Figure 20. Example of a GeoNames extension that adds a new column with the parent ADM1
Figure 21. Weather extension configuration form
Figure 22. Weather extension preview
Figure 23. BC4 weekly keyword performance cockpit
Figure 24. BC4 weekly keyword cockpit - selection of a single keyword (above) and two keywords (below)
Figure 25. BC4 weekly keyword cockpit – filtering by days
Figure 26. BC4 weekly keyword cockpit – combining day and keyword filters
Figure 27. BC4 regional analysis cockpit
Figure 28. Example screenshots of various filters applied to the regional analysis dashboard

List of tables

Table 1: List of acronyms used throughout the document
Table 2. Short references for project partners


Acronyms

Table 1: list of acronyms used throughout the document

Abbreviation | Description
API | Application Programming Interface
BC | Business Case
BI | Business Intelligence
DSL | Domain Specific Language
ECMWF | European Centre for Medium-Range Weather Forecasts
FTP | File Transfer Protocol
GLCI-RDF | Google GeoTargets in RDF
GRIB | GRIdded Binary or General Regularly-distributed Information in Binary form. A data format commonly used in meteorology to store historical and forecast weather data
HDT | Header Dictionary Triples
IPR | Intellectual Property Rights
JAR | Java ARchive
JSON | JavaScript Object Notation
LOD | Linked Open Data
LOV | List Of Values
MARS | The ECMWF Meteorological Archival and Retrieval System
RDF | Resource Description Framework
SQL | Structured Query Language
TSV | Tab-Separated Values. A text format for storing data in a tabular structure
UI | User Interface
VDP | Visual Data Profiling
VPN | Virtual Private Network
WP | Work Package

Short references may be used to refer to project beneficiaries, also frequently referred to as partners. References are listed in Table 2.

Table 2. Short references for project partners

No. | Beneficiary (partner) name as in [GA] | Short name
1 | UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA | UNIMIB
2 | CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO | CE
3 | BROWSETEL (UK) LIMITED | BT
4 | GFK EURISCO SRL | GFK
5 | BIG BANG, TRGOVINA IN STORITVE, DOO | BB
6 | MEASURENCE LIMITED | ME
7 | JOT INTERNET MEDIA ESPAÑA SL | JOT
8 | ENGINEERING – INGEGNERIA INFORMATICA SPA | ENG
9 | STIFTELSEN SINTEF | SINTEF
10 | INSTITUT JOZEF STEFAN | JSI


Introduction and Methodology

1.1 Objectives and Methodology

This deliverable D2.3 provides the first evaluation of the EW-Shopp main components as of Month 24, after early integration and initial implementation of the Business Cases. Therefore, the main objective of the deliverable is to document how the EW-Shopp tools support the BC requirements, in particular from a functional and logical point of view. EW-Shopp deals with often complex and multiform dataflows which include multiple data providers and many actors/stakeholders. Therefore, it is important to break down the dataflow steps and understand clearly both the technical and ‘organizational’ issues involved. An initial version of the dataflow we now propose, applied to Business Case 1, was already presented in D4.2 - Pilots deployment; however, in that version, while the logical flow was already quite clear, we hadn’t yet consolidated the current structure, which clearly separates (and therefore modularizes) all of the steps. Additionally, as is shown in more detail in Chapter 3, this structure and its principles are now consistently applied to all of the EW-Shopp Business Cases.

This is the first version of the EW-Shopp toolkit evaluation. Given the nature of EW-Shopp as an Innovation Action and its Business Cases coming from different sectors and usage scenarios, we decided at this stage that the evaluation be specifically scenario- and business-oriented rather than technical. The rationale for this choice is: a) the fact that EW-Shopp is incorporating and customizing (as well as improving) various existing tools; b) the fact that at this stage (M24) assessing the capability of the various tools to work in an integrated manner and to support all of the dataflow stages is a strong priority; c) likewise, ensuring that business partners are satisfied with the capabilities and judge that these are in line with their business and usage scenarios.

Because the Business Cases are quite different in objectives and nature, finding a common methodology for evaluation is challenging: to this end, the first step was to consolidate a common dataflow and apply it to all of the Business Cases. In this way, the methodology is flexible and open to different scenarios, while its baseline logic is consistent. From an evaluation point of view this means that we analyse in detail each step (described below) for each Business Case, assess whether the business requirements are fulfilled and in turn plan corrective and/or improvement measures if needed. For each Pilot in the three Business Cases we present the detailed dataflow and then systematically assess each step (Ingestion, Enrichment, Analytics and Visualisation), and in particular minimum features and tools compliance. Additionally, for each Pilot we also analyse criticalities and corrective measures. Such corrective measures will be implemented in the next period and incorporated in the next release of the EW-Shopp Toolkit.


Figure 1 below shows the general dataflow logical model which has been developed within EW-Shopp and is here systematically analysed for each Pilot. In particular, we define 4 key steps which are all supported by the EW-Shopp tools and involve all of the different actors. The main steps in the adopted dataflow model are:

1. Ingestion. This step, typically involving business partners, covers the collection (ingestion) of all the relevant data needed for the Business Case: in particular, company/business data from the various business partners and stakeholders (e.g., historic data, real-time sensor data, data from ERPs, etc.), but also other data from different providers. In EW-Shopp these data include weather data provided by specific providers/databases (e.g., MARS historical data),1 as well as events’ data. Some preliminary data filtering/manipulation is also performed at this stage, for instance anonymization in case certain data is confidential or protected by IPR.

2. Enrichment. This is one of the core steps carried out in EW-Shopp and represents all of the activities related to data manipulation. Given the heterogeneity of the input data coming from the previous step, here the datasets are harmonised and prepared for later development of analytic models, predictions etc. Depending on the use case, the enrichment phase foresees, for instance, data filtering, reconciliation, transformation, aggregation, etc.

3. Analytics. In this phase of the dataflow, analyses, models and predictions are produced based on the data prepared in the previous steps. Several algorithms and tools are used depending on the scenario and Business Case.

4. Visualisation. This step is specifically dedicated to the visualisation and navigation of the data output by the above analyses and models, by means of specific Business Intelligence reports and visualisation cockpits.2 This means that in this phase the actual results are presented to final users in a meaningful, understandable and effective way. The objective of this step is, therefore, to abstract data models and provide the ‘user interface’ for presentation of data.

1 See https://www.eea.europa.eu/data-and-maps/data/external/time-series-of-weather-parameters and specifically EW-Shopp D1.3 and D1.4.
2 We adopt here the terminology ‘cockpit’ because it is used within the Knowage tool, adopted by EW-Shopp for visualisation. A cockpit is essentially an interactive data visualisation and navigation dashboard integrated within the Knowage platform. See also D3.3 for more details.


Figure 1 EW-Shopp general dataflow model and main steps: Ingestion, Enrichment, Analytics and Visualisation
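The four steps of the dataflow model can be read as a chain of functions, each consuming the output of the previous one. The following is only a minimal sketch under our own assumptions: the function names, the record layout, the sample values and the threshold rule standing in for a predictive model are all hypothetical and are not part of the EW-Shopp tools.

```python
# Minimal sketch of the four-step dataflow; all names and values are hypothetical.

def ingest():
    """Ingestion: collect business data and weather data (made-up sample values)."""
    business = [
        {"date": "2018-07-01", "pageviews": 120},
        {"date": "2018-07-02", "pageviews": 310},
    ]
    weather = {"2018-07-01": 24.5, "2018-07-02": 33.0}  # max temperature per day
    return business, weather


def enrich(business, weather):
    """Enrichment: extend each business record with the weather of the same day."""
    return [{**row, "max_temp": weather.get(row["date"])} for row in business]


def analyse(rows, threshold):
    """Analytics: a placeholder rule standing in for a real predictive model."""
    return [{**row, "spike": row["pageviews"] >= threshold} for row in rows]


def visualise(rows):
    """Visualisation: render results as text lines instead of a dashboard."""
    return [
        f"{r['date']}: {r['pageviews']} pageviews, {r['max_temp']} C, spike={r['spike']}"
        for r in rows
    ]


def run_pipeline(threshold=200):
    business, weather = ingest()
    return visualise(analyse(enrich(business, weather), threshold))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Because each step only depends on the output of the previous one, any step could in principle be replaced by a different tool or by a manual operation, which is the modularity the model aims for.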

In addition to being effective for the evaluation of heterogeneous Business Cases, adoption of a common baseline dataflow model within EW-Shopp also presents four key advantages, which respond to baseline requirements of all Business Cases, in particular:

• Multi-actor by design. The EW-Shopp model allows multiple stakeholders to collaborate and manage/use/manipulate the data along the various steps, spanning from business partners to technology providers, data analysts and operators. Additionally, each of the four stages can have a responsible person or point of reference, not necessarily from the same organisation. This multi-actor model has already been tried out internally in the actual data management work among EW-Shopp partners working on the various Pilots.

• Flexibility in terms of automation vs. human in the loop. The data flows in EW-Shopp are heterogeneous in terms of data and complex in terms of flows. In some steps of the flow high levels of automation are foreseeable, but at the same time at other points human intervention is needed or desirable (e.g., for quality control, customisation, etc.). Our model enforces neither and allows both to coexist along the workflow.

• The model is essentially agnostic about time granularity and time discreteness. This means that a full dataflow could happen continuously in near real-time (provided of course that it is also automatic), or at fixed time intervals (e.g., once a day). The model allows both scenarios.

• Tools modularity, allowing for the potential inclusion within flows of tools external to EW-Shopp. The first, basic requirement for such a feature is at the ingestion stage, where business partners may need to use their own (potentially legacy) tools to extract data from their own or from some business partners’ systems. Additionally, at other stages of the flow this also provides a degree of flexibility which is highly compatible with EW-Shopp’s open source approach, allowing, for instance, the use of domain-specific tools for analytics tasks.

1.2 Relationship to Other Deliverables

This deliverable is related to all other deliverables from WP2, WP3 and WP4. Concerning D2.2 EW-Shopp Platform - v1, in the current deliverable we refer to the platform components that were described there. Because the present document is delivered 12 months after D2.2, it reflects the updates to the platform and tools, and certain components that were included in D2.2 as preliminary are not mentioned here. D2.4 EW-Shopp Platform - v2 (to be delivered at M36) will reflect the current updates as well as the outcomes of the present evaluation.

Concerning WP3, D3.3 EW-Shopp components as a service: data visualization, navigation and quality assessment, which is being delivered in parallel to D2.3, is somewhat complementary in that it describes the current implementation of the data visualization and navigation, and data assessment services in the EW-Shopp toolkit. To avoid overlaps, the two deliverables cross-reference each other where appropriate. In particular, details regarding the Knowage platform features and concepts, such as data source, dataset and cockpit, are provided in D3.3.

D4.2 Pilots Deployment (delivered at M18) describes in detail the release of the five EW-Shopp Business Case Pilots and, for each of these Pilots, reports the data integration and analytics processes, results obtained through these processes, challenges and expected evolutions. We therefore refer to D4.2 directly for details about the Pilots (in particular in Section 3), unless relevant changes have occurred since its release.

1.3 Document Structure

Following on from the above methodology, the rest of the document is divided as follows:

In Chapter 2 we provide the systematic and detailed walkthrough of each EW-Shopp Pilot following the adopted dataflow described above and focus on the evaluation of each step, therefore matching the compliance of the EW-Shopp tools to the requirements addressed by each step. For each Pilot, current criticalities and proposed corrective measures to guide the further development of EW-Shopp are also provided in this chapter.

Finally, in Chapter 3 we draw the main conclusions and takeaways from this evaluation and provide an outlook for the next period.


EW-Shopp Platform Assessment at Month 24

1.4 Business Case 1 – Pilot 1 Evaluation

The goal of this Pilot is to create widgets for the Ceneje website which deliver useful information to the end users, so as to increase traffic and engagement. Information comes from the analysis of Ceneje data and weather forecasts. The analysis focused on three different categories (Air Conditioning, Dryers and TV) with interesting and non-homogeneous results.

1.4.1 Logical and functional workflow of the Pilots

In Figure 2 below we report the logical and functional workflow for Business Case 1 - Pilot 1 according to the EW-Shopp dataflow methodology described in Chapter 1.


Figure 2 Business Case 1 – Pilot 1 Logical flow


1.4.1.1 Ingestion

In the context of the EW-Shopp Pilots, Ceneje is using a subset of the datasets to be used in the final product. The datasets used are limited to specific categories and the web portal widgets are limited to the Ceneje.si web page only.

Content | Selection
Purchase intent: data about consumer clicks on the retailers' offers on the Ceneje.si web page, which redirect to the retailers' websites where the consumer can purchase a product. | Limited to a selected number of categories: Air Conditioning, Dryers and TV
Products price history: data about prices from all the retailers. Every price is treated as an event. | Limited to a selected number of categories
Weather data (MARS) | Coherent with time series of the above content and forecast

The figure below shows a diagram of how Ceneje data is gathered:

Figure 3 Diagram of Ceneje data gathering

Internal data like purchase intent and products' price history are stored in internal corporate databases. Weather data is provided through the MARS dataset. Details about MARS data regarding both specification and release of the weather service are respectively provided in D1.3 and D1.4.

All the datasets are integrated with each other, so as to be used in the following Enrichment and Analytics steps (this is explained in the related sections).

1.4.1.2 Enrichment

The aim of this Pilot is to model pageviews and deeplinks for certain product categories based on weather observations. The goal is to predict when (i.e., in what weather conditions) the customer interest will spike (i.e., significantly increase in a short time). The dataset provided for this Pilot contains data about pageviews and deeplinks related to Slovenia and no cleaning operations are needed before extending the data and training the predictive models.

However, new features are computed and appended as new columns to the original dataset, by means of splitting and aggregation operations. Among those features, the set of the target prediction variables (referred to as the Y vector) is calculated and added.

Finally, the weather extension operation has been performed over these data.

Weather data are provided by the ECMWF service. Since the ECMWF servers have to deal with a huge number of download and analysis requests, they implement a priority queue system to handle all the requests. Due to the queue length, querying the ECMWF and obtaining the results in real-time is not feasible. For that reason, a huge dump of data has been downloaded at once, in GRIB format.3

The GRIB format is a standard binary format for storing weather data. This format is grid-based, i.e., the weather observations are available only for the intersection points (which are geographical coordinates) on the grid. If the desired point is not available on the grid, it is possible to interpolate the four nearest points to obtain the approximated observation for the point. In the case of areas (e.g., regions or countries), we interpolate all the observations of all the grid points that fall into the selected area and obtain the approximated weather observation for it.
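The interpolation of the four nearest grid points can be illustrated with a bilinear scheme. This is a sketch under our own assumptions: the text does not state which interpolation method is used, so bilinear weighting for single points and a plain mean for areas are illustrative choices, and the function names and grid layout are hypothetical.

```python
import math


def interpolate_point(values, lat0, lon0, step, lat, lon):
    """Bilinear interpolation from the four grid points surrounding (lat, lon).

    values[i][j] holds the observation at (lat0 + i * step, lon0 + j * step).
    ASSUMPTION: bilinear weighting; the actual method is not stated in the text.
    """
    y = (lat - lat0) / step
    x = (lon - lon0) / step
    # Index of the lower-left grid cell, clamped so that i + 1 and j + 1 exist.
    i = min(max(int(math.floor(y)), 0), len(values) - 2)
    j = min(max(int(math.floor(x)), 0), len(values[0]) - 2)
    ty, tx = y - i, x - j
    low = values[i][j] * (1 - tx) + values[i][j + 1] * tx
    high = values[i + 1][j] * (1 - tx) + values[i + 1][j + 1] * tx
    return low * (1 - ty) + high * ty


def area_mean(values, in_area):
    """Approximate an area by averaging all grid points for which in_area(i, j) holds.

    ASSUMPTION: a plain arithmetic mean is used as the area-level combination.
    """
    selected = [
        values[i][j]
        for i in range(len(values))
        for j in range(len(values[0]))
        if in_area(i, j)
    ]
    if not selected:
        raise ValueError("no grid point falls into the selected area")
    return sum(selected) / len(selected)
```

For a 2 x 2 grid with values 10, 20, 30 and 40, the point in the centre of the cell receives the average of the four corners, while a point lying exactly on a grid node receives that node's value.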

Even though it is very space-efficient, GRIB file processing is not time-efficient at query time, i.e., querying the GRIB file in real-time would take some time. Based on these considerations, the decision was made to pre-process the GRIB dump and transform it into JSON or TSV files (i.e., files that contain all the observations for all the grid points), which can be queried faster. From this point on, we refer to weather data meaning the already pre-processed files, which contain the weather features useful in the EW-Shopp project scope4.
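The benefit of pre-processing can be sketched as follows: once the observations are flattened into TSV rows, they can be loaded into an in-memory index and looked up directly. The column names, the key layout and the helper names below are hypothetical; the sketch does not read GRIB files and does not reproduce the actual file layout used in the project.

```python
import csv
import io

# Hypothetical column layout for the pre-processed files.
COLUMNS = ["date", "lat", "lon", "parameter", "value"]


def observations_to_tsv(observations):
    """Serialise (date, lat, lon, parameter, value) tuples as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(observations)
    return buffer.getvalue()


def load_index(tsv_text):
    """Build a dictionary keyed by (date, lat, lon, parameter) for direct lookups."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    index = {}
    for row in reader:
        key = (row["date"], float(row["lat"]), float(row["lon"]), row["parameter"])
        index[key] = float(row["value"])
    return index
```

A lookup such as `index[("2018-07-01", 46.0, 14.5, "temperature")]` then becomes a single dictionary access instead of a scan of the binary dump.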

Since the datasets are related to Slovenia only, within this Pilot scope we obtain the weather by interpolating 16 predefined points, which correspond to the weather stations in Slovenia. In addition, we fetched the weather forecasts for the next 6 days, so the analytics models can be trained on this time window.

3 This workaround has been adopted in all the Pilots, i.e., we downloaded the full weather data dump for all the countries relevant for the Pilots in advance, and we extract the weather features needed case by case.
4 Please refer to https://github.com/JozefStefanInstitute/weather-data/wiki/Weather-features for a detailed explanation about weather features available in ASIA.

After that, we aggregated weather observations using different operators: max, min, sum, mean, diff, and total. These aggregations have been computed for different weather parameters and for several day and time ranges.
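As an illustration of this kind of windowed aggregation, the sketch below applies a set of named operators to the values of one weather parameter inside a day range. The function and variable names are hypothetical. The text does not define the diff and total operators: here diff is assumed to be the last value minus the first value of the window, and total is left out because its definition is not given.

```python
from statistics import mean

# Named aggregation operators over a list of values.
AGGREGATORS = {
    "max": max,
    "min": min,
    "sum": sum,
    "mean": mean,
    # ASSUMPTION: "diff" is taken as last minus first value in the window.
    "diff": lambda values: values[-1] - values[0],
}


def aggregate_window(series, first_day, last_day, operators):
    """Aggregate one weather parameter over the inclusive day range.

    series is a list of (day_offset, value) pairs sorted by day_offset.
    """
    window = [value for day, value in series if first_day <= day <= last_day]
    if not window:
        raise ValueError("no observations in the requested day range")
    return {name: AGGREGATORS[name](window) for name in operators}
```

Calling the function once per parameter and per day range yields one feature per (parameter, range, operator) combination.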

The aggregated weather features have not been appended to the original dataset provided by Ceneje; instead, they have been stored in QMiner and used directly to train models, as explained below.

1.4.1.3 Analytics

The result of this Pilot is a widget service that cultivates web-store user engagement through “sense of urgency” information. Some of the messages displayed by the widget inform the shoppers of upcoming weather-related sales trends that may impact the availability of products. In terms of analytics, this means predicting spikes in interest and demand for products. Namely, we are predicting sudden increases in pageviews and deeplinks for certain product types (air conditioning units, clothes dryers and TVs) on the Ceneje website based on weather conditions.

This prediction task is formalized by the input/output pair:

Input: Historic data on pageviews and deeplinks for target product types - air conditioning units, clothes dryers and TVs. The data are a time series of aggregated daily values for the entire country of Slovenia linked to weather conditions. The data is given in tabular format with one row of data per day.

Output: A classification model predicting if an interest spike will occur given a new vector of weather data for the target day.

To build a classification model, the historic data had to be labelled. This means a domain expert had to specify the criterion for determining what volume of pageviews and deeplinks on a daily level constituted a spike. This threshold value depends on the product type as, for example, a different volume of televisions is sold compared to clothes dryers. After determining these thresholds, historic data could be split into positive and negative instances, forming a standard classification task. A Support Vector Machine (SVM) classifier was built using the EW-Shopp analytics tools.
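The labelling step can be sketched as a per-product-type threshold applied to the daily volume. The field names, the threshold values and the sample rows in the usage note are hypothetical and only illustrate the splitting into positive and negative instances; training the SVM itself is not shown.

```python
def label_spikes(daily_rows, thresholds):
    """Mark each daily row as a spike (True) or not (False).

    daily_rows: list of dicts with 'product_type' and 'volume' keys.
    thresholds: dict mapping product type to the expert-defined spike threshold.
    """
    labelled = []
    for row in daily_rows:
        limit = thresholds[row["product_type"]]
        labelled.append({**row, "spike": row["volume"] >= limit})
    return labelled


def split_instances(labelled_rows):
    """Split labelled rows into positive and negative training instances."""
    positives = [row for row in labelled_rows if row["spike"]]
    negatives = [row for row in labelled_rows if not row["spike"]]
    return positives, negatives
```

With hypothetical thresholds such as `{"tv": 500, "dryers": 80}`, a day with 90 dryer pageviews is labelled as a spike while a day with 90 TV pageviews is not, which reflects the dependence of the threshold on the product type.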

Due to the seasonal nature of some product groups in the Pilot – namely the air conditioning units and the clothes dryers – their models were also built using only the data from their high season. For air conditioning units this meant limiting the data to summer months from June to August and for dryers to early autumn from August to November. All other parameters remained the same. By limiting the modelling to high season, we observed a significant increase in model prediction accuracy.
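Restricting the training data to high season amounts to a simple month filter. In the sketch below the month ranges follow the text (June to August for air conditioning units, August to November for dryers), while the product type keys, the row layout and the function names are hypothetical.

```python
from datetime import date

# High-season months per product type, as described in the text.
HIGH_SEASON_MONTHS = {
    "air_conditioning": range(6, 9),  # June to August
    "dryers": range(8, 12),           # August to November
}


def in_high_season(product_type, day):
    """Return True if the day falls in the product's high season.

    Product types without a defined season (e.g., TVs) keep all their data.
    """
    months = HIGH_SEASON_MONTHS.get(product_type)
    return True if months is None else day.month in months


def filter_high_season(rows, product_type):
    """Keep only the rows whose 'date' falls in the product's high season."""
    return [row for row in rows if in_high_season(product_type, row["date"])]
```

The filtered rows can then be used to train the seasonal model with all other parameters unchanged.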

1.4.1.4 Visualisation

Because this Pilot is mainly concerned with a B2C scenario, i.e., web portal user engagement (see D4.2 for details), the visualisation and user interface are provided directly on the online store web portal. In particular, Graphical Widgets are integrated in target web portals as additional UX features where content is delivered based on predictive information set in exactly defined stages of the purchase process. Deployment (as described in D4.2) happens on the https://www.ceneje.si web portal.

A back-office visualisation scenario which shares the domain and data used in the Pilot is carried out for Pilot 2 and is described below in Section 1.5.1.4.

1.4.2 EW-Shopp Tools Compliance

The Enrichment step has been addressed using ad-hoc scripts that query utility libraries available in EW-Shopp. For example, the weather observations are downloaded daily by using the Python Weather library provided by JSI within the project.

The weather observations have been stored in QMiner, and directly queried or merged with the original dataset using the same tool.

1.4.3 Criticalities and corrective measures

If we want to replicate the Enrichment task of this Pilot using DataGraft + ASIA, some problems arise:

1. In this Pilot some complex averaging and cumulating operations have been performed. These operations need “group by” actions, which are straightforward to implement efficiently using ad-hoc scripts (e.g., by loading the whole dataset in memory and "querying" it for each of the group by operations' input). The same operations are easily repeatable using the Grafterizer tool, which follows a pipeline-based approach (a set of steps which operate as pipe-and-filter) for editing data. However, since a multitude of such operations are required at the same time, the single-table-manipulation approach implemented in Grafterizer is not practical, as it would require a separate Grafterizer pipeline to be implemented for each of the group-by operations.

2. Ceneje data do not contain spatial identifiers, i.e., we know from external information (metadata) that the data are related to Slovenia; the current release of ASIA allows users to extend data only if a column is already reconciled against GeoNames (spatial-based weather extension). Even if it is currently possible to implement this operation with a workaround (adding a new column with the value “Slovenia”, which can be first reconciled against GeoNames, and then extended with the weather data), we plan to provide users with a new Weather extension (named temporal-based weather extension) starting from a column that contains temporal information (e.g., the date).

3. For this Pilot, a custom set of interpolation points has been provided, i.e., the weather stations available in Slovenia. The capability of selecting an arbitrary set of points to interpolate is not implemented in the current release of ASIA; we plan to add this feature in the temporal-based weather extension function explained above (instead of selecting a single place, the same function should allow a set of several points).


This Pilot required aggregations on weather data; ASIA currently does not provide aggregation functions on weather parameters. We also have to add the “aggregation” parameter to the weather configuration form (see Section 2.5.1.2 for details about the configuration form).
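The in-memory “group by” approach mentioned in the first point, and the constant-column workaround mentioned in the second, can be sketched with a few lines of script. This is only an illustration under our own assumptions: the field names and helper functions are hypothetical and do not reproduce the actual ad-hoc scripts, Grafterizer or ASIA.

```python
from collections import defaultdict


def group_by(rows, key_fields, value_field, reducer):
    """Group in-memory rows by the given key fields and reduce one value field."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row[value_field])
    return {key: reducer(values) for key, values in groups.items()}


def add_constant_column(rows, name, value):
    """Workaround sketch: add a column with the same value (e.g., a country) to every row."""
    return [{**row, name: value} for row in rows]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)
```

Since the whole dataset stays in memory, several different groupings (for example a weekly sum per category and an average per category) can be computed one after the other on the same rows, without building a separate pipeline for each of them.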


The goal of this Pilot is to develop data-driven services that Retailers, Brand manufacturers, Service providers and B2B buyers can use to optimize Category management and Marketing management. Using data and models that are common to Pilot 1 and 2, the Pilot provides useful information to manage Pricing policy within the TV category and a sell-out forecast for Air Conditioning.

1.5.1 Logical and functional workflow of the Pilot

In Figure 4 below we report the logical and functional workflow for Business Case 1 - Pilot 2 according to the EW-Shopp dataflow methodology described in Chapter 1.


1.5.1.1 Ingestion

The following table explains the datasets to be used in the final product by Big Bang:

| Content/Source | Selection |
|---|---|
| Demand data - INTERNAL. Aggregated business features like traffic, conversion rate and basket value. | Traffic is limited to a location/channel. |
| Product range (stock keeping unit level) - INTERNAL. For every product in the line-up there is a 4-level categorization together with different product characteristics. | / |
| Product sell-out (stock keeping unit level) - INTERNAL. Daily sell-out for every product (by location, channel etc.). | / |
| Event Data. Event Data are of two types: Marketing Activities (stock keeping unit level) - INTERNAL; Calendar Events (daily basis) - EXTERNAL. | Marketing activities are limited to a selected number of products. Calendar Events are limited to selected dates. |
| Weather data - EXTERNAL | From MARS database (see D1.3 and D1.4 for details) |

Figure 5 shows how Big Bang data are gathered and stored. All internal and external data are stored in internal corporate databases.

Figure 5 Representation of Big Bang data gathering and storage


1.5.1.2 Enrichment

In this Pilot, Big Bang product data (BBP) have been joined with product data provided by Ceneje (CEP). Both datasets are provided in CSV format and underwent a cleaning phase before the extension step.

The main fields in the two datasets were (omitting some metadata fields):

BBP:
• Date: the date of the record; also contains the time of day when the record was made
• Product EAN [5] number: the international identifier number of the product - the same number encoded in the barcode
• Brand: the name of the product brand
• Price: the price of the product on the day from the field Date
• Sales: number of sales of the product on the day from the field Date

CEP:
• Date: the date of the record
• Product EAN numbers: Ceneje collects the product information differently than Big Bang and the same product record can have multiple corresponding EAN numbers. This happens for example when the same product is available in multiple colors that each have their own EAN, or in similar cases when the same product is available in minor variations that for some reason have different EAN numbers. For the Pilot purposes it is fine not to differentiate between these.
• Seller: the id of the store
• Pageviews: number of pageviews of the product on the day from the field Date; pageviews are not store-specific as all stores’ prices are listed on the same page on the Ceneje website
• Deeplinks: number of deeplinks of the product corresponding to the seller with the id in the Seller field on the day from the field Date; this counts how many shoppers clicked the link for the seller on the Ceneje website.

The following cleaning and transformation steps have been implemented over the two datasets:

- [BBP] Dates' normalization: in the BBP datasets, dates are written along with time information; this step removes the time information from all the date fields.

- [CEP] Missing values replacement: for some products, the information about the seller id is missing; this transformation step fills the missing values with the “Unknown” value.
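The two cleaning steps can be illustrated with a minimal Python sketch. It is not the Pilot's actual script: the timestamp format and the field names (`date`, `seller`) are assumptions made for the example.

```python
from datetime import datetime


def normalize_bbp_date(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop the time-of-day part of a BBP timestamp (format is an assumption)."""
    return datetime.strptime(value, fmt).date().isoformat()


def fill_missing_seller(record, placeholder="Unknown"):
    """Return a copy of a CEP record with an empty seller id replaced."""
    cleaned = dict(record)
    if not cleaned.get("seller"):
        cleaned["seller"] = placeholder
    return cleaned


# Hypothetical sample rows, only to show the effect of each step.
bbp_rows = [{"date": "2018-03-05 14:32:10", "ean": "1234567890123"}]
cep_rows = [{"date": "2018-03-05", "seller": ""},
            {"date": "2018-03-05", "seller": "42"}]

bbp_clean = [{**row, "date": normalize_bbp_date(row["date"])} for row in bbp_rows]
cep_clean = [fill_missing_seller(row) for row in cep_rows]
```

Both helpers return new records rather than modifying the input, so the raw data stay available for checks.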

After the cleaning phase, the two datasets are merged into a single table. First, all records from Ceneje that do not correspond to the seller ID of Big Bang are dropped. Then, the records from BBP and CEP are linked on date and EAN numbers. This means that for each date we know for each product its brand, its price in the Big Bang store, how many times it was viewed on the Ceneje website, how many times the link to Big Bang was clicked on the Ceneje website and how many times it was sold in the Big Bang store.

[5] European Article Number; see https://en.wikipedia.org/wiki/International_Article_Number


Since records in CEP have multiple EAN values, we need to check all of them: a record from BBP is linked to a record from CEP if its EAN is among the EANs of the CEP record. In theory, with this approach multiple BBP records could correspond to a single CEP record and we would need to aggregate their sales values, but in the data we had, each CEP record corresponded to at most one BBP record. A small number of records in both datasets did not have any matching EAN in the other dataset and were discarded after confirming with the business partners that this was acceptable.
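The linking rule described above (same date, BBP EAN contained in the CEP record's EAN list) can be sketched as follows. This is an illustrative sketch only: the record layout and field names are hypothetical, and it relies on the observation that each CEP record matches at most one BBP record, so no sales aggregation is performed.

```python
def link_records(bbp_rows, cep_rows, big_bang_seller_id):
    """Join BBP and CEP rows on (date, EAN); unmatched rows are discarded."""
    # Index the Big Bang rows of CEP under every EAN they list.
    index = {}
    for cep in cep_rows:
        if cep["seller"] != big_bang_seller_id:
            continue  # drop records of other sellers
        for ean in cep["eans"]:
            index[(cep["date"], ean)] = cep

    merged = []
    for bbp in bbp_rows:
        cep = index.get((bbp["date"], bbp["ean"]))
        if cep is None:
            continue  # no matching EAN in the other dataset
        merged.append({
            "date": bbp["date"],
            "ean": bbp["ean"],
            "brand": bbp["brand"],
            "price": bbp["price"],
            "sales": bbp["sales"],
            "pageviews": cep["pageviews"],
            "deeplinks": cep["deeplinks"],
        })
    return merged


# Hypothetical sample data.
bbp = [{"date": "2018-03-05", "ean": "111", "brand": "X", "price": 499.0, "sales": 3},
       {"date": "2018-03-05", "ean": "999", "brand": "Y", "price": 99.0, "sales": 1}]
cep = [{"date": "2018-03-05", "eans": ["110", "111"], "seller": "BB",
        "pageviews": 120, "deeplinks": 7},
       {"date": "2018-03-05", "eans": ["999"], "seller": "OTHER",
        "pageviews": 50, "deeplinks": 2}]

linked = link_records(bbp, cep, big_bang_seller_id="BB")
```

Building the index once keeps the join linear in the number of records instead of comparing every BBP row with every CEP row.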

In the merged dataset some extra aggregation features were created for the prediction task. For each record we added the average daily value of pageviews, deeplinks and sales over the 7 days preceding the record date, to give the analytics stage a baseline to work from. Target prediction values (the so-called y vector) were also created, those being the number of pageviews, deeplinks and sales for 1, 2, 3, 4, 5, 6 and 7 days ahead.
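As a rough illustration of these two feature families, the sketch below computes the trailing 7-day mean and the 1 to 7 days-ahead targets for one metric. It assumes a gap-free daily series stored as a plain list (one value per consecutive day); the function names are invented for the example.

```python
def trailing_mean(values, i, window=7):
    """Mean of the `window` values strictly before index i (None if too little history)."""
    if i < window:
        return None
    return sum(values[i - window:i]) / window


def targets_ahead(values, i, horizons=range(1, 8)):
    """Value observed h days after index i for each horizon h (None past the end)."""
    return {h: (values[i + h] if i + h < len(values) else None) for h in horizons}


# Hypothetical daily sales for 15 consecutive days.
sales = list(range(1, 16))
baseline = trailing_mean(sales, 7)   # mean of days 0..6
y = targets_ahead(sales, 7)          # values at days 8..14
```

Records without a full 7-day history or without all future values get `None`, leaving it to the caller to drop or impute them.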

Finally, the same weather enrichment process described for Pilot 1 is performed.

1.5.1.3 Analytics

The aim of Pilot 2 is to predict the dynamics of shopper activity using weather context data. We are predicting the daily number of pageviews and deeplinks on the Ceneje website and the daily number of sales in the Big Bang store at the stock keeping unit (SKU) level (i.e., the level of a specific product). This is a different prediction task than that of Pilot 1, where we identify which days have a spike in activity. Another difference from Pilot 1 is that we are taking information about pageviews and deeplinks into account when predicting sales, the rationale being that pageviews and deeplinks indicate shopper interest that may end in a purchase.

This prediction task is formalized by the input/output pair:

Input: Historic data on pageviews, deeplinks and sales for target product types - air conditioning units, clothes dryers, TVs and mobile phones. The data is a time series of aggregated daily values for the whole of Slovenia linked to weather conditions. Since pageview and deeplink data come from Ceneje and sales data come from Big Bang, the data about the same product from both companies were linked using EAN numbers. The data is given in tabular format with one row of data per day.

Output: A regression model predicting the volume of pageviews/deeplinks/sales given a new vector of weather data and business context data (for sales only - see paragraph below) for the target day.

A Support Vector Regression (SVR) model is built using the EW-Shopp analytics tools to solve the prediction task described above. Due to their similar nature, all three models (for pageviews, deeplinks and sales) have a nearly identical analytical pipeline [6] specification. The main difference is that the sales model also takes into account the information about the average recent volume of pageviews and deeplinks. Experiments with predicting sales were done both with and without pageviews and deeplinks data and the latter model was found to perform better.

[6] https://github.com/JozefStefanInstitute/ew-shopp-public/tree/master/analytics/pipeline


1.5.1.4 Visualisation

The aim of the Visualisation service for BC1 - Pilot 2 is to provide ‘back office’ business users such as sales managers, retailer managers, business development managers, etc., with a means of viewing the historical pricing evolution of certain products or product categories, such as TVs. To this end a cockpit has been developed in Knowage to show the price evolution of a specific TV model over a time range. The data sets used for this cockpit are mainly queries on the EW-Shopp Pilot-specific data sets (see D4.2), which have been loaded into an independent MariaDB database used as the data source.

Users are able to select the specific model through an intuitive selection widget, which also provides incremental search. From a technical point of view this has been implemented through a Knowage List of Values (LOV) added to the cockpit. Product search parameters can also be saved in a so-called ‘bookmark’ so that the user can quickly retrieve them again.

Figure 6 Product selection interface. Notice the incremental search

Once the product has been selected, a line graph is shown within the cockpit with information about the product pricing over the time span, correlated to each seller. For each time point the minimum, average and maximum prices are shown. Then, the user can select one or more specific sellers from a list. Additionally, a graph showing a ranking of maximum price per seller is also displayed.


Figure 7 Screenshots of dashboard chart elements

[note: in the above screenshot, IDs are shown instead of seller names for confidentiality reasons]

Additionally, the visualisation users need to be able to see a list of sellers selling the product ranked either by maximum price or by number of pageviews; to this end, the user must also be able to specify the date range or a particular date. Thanks to Knowage data associations (see D3.3) the user can do this by simply clicking on any widget that includes a data point with a timestamp (including, for example, the chart showing the price variations): if the user clicks on a certain day, for example one where the maximum price dropped drastically, the rank table will report data just for that day.

Figure 8 BC1 - widget showing a dynamic ranking of sellers based on maximum price. The date (or date range) can be dynamically set by clicking on any other widget reporting dates



All the cleaning, transformation, merging and extension operations have been performed using ad-hoc scripts, which stored and queried all the data directly to/from the QMiner database.


Since the enrichment process implemented in this Pilot is similar to the one described for Pilot 1, the same criticalities and corrective measures apply.

Regarding visualisation, users would additionally like to be able to see two seller ranking tables related to two different dates, to compare prices, performance etc., on different days. The ranking should also report the top 3 sellers (e.g., ranked by pageview share), and an ‘all others’ category with the average of the rest of the sellers. To enable these functionalities, additional data sets that aggregate the seller share information need to be constructed. This will be done in the next period, together with user testing of the cockpits. Another feature currently not present and desired by users is a ‘widget’ to select date ranges (currently dates are selected through standard selection list widgets). To enable this, ad-hoc date formatting data sets will be created (through queries) and then linked to widgets and appropriate associations.


Browsetel/CDE have contact history records for the past few years and, within the scope of the project, the contact history is enriched with weather elements in order to predict the influence of weather and external events on the number of incoming calls and the success rate of outbound campaigns.

Workforce planning in Browsetel/CDE's Contact Centres is done by Contact Centre Administrators using the dedicated COCOS Workforce Management Tool. We have added the Prediction Custom Report to the existing Management tools as an Add-On Module.

The solution will be evaluated through a custom report in which Administrators will analyse the difference between the predicted traffic and the real traffic.




1.6.1.1 Ingestion

Browsetel/CDE is using its contact history, agent workforce, and custom events datasets in the final product. The Contact History database contains all Contact Centre events with the date and time and, in some cases, the location of the event. Some additional parameters are added, like Contact reason and Duration. The Agent Workforce database stores Agent availability through time and their performance. Custom events are events gathered from different sources that may influence the prediction algorithms.

All the data is gathered within existing applications and prepared for post-processing using Export functionalities and APIs. The integration with QMiner is implemented for the purpose of data prediction calculations.

1.6.1.2 Enrichment

Browsetel provides data about the operations of three different call centres: Mercator, NKBM, and Simobil. Since the datasets have the same schema, the pipelines designed within this Pilot are the same. For this reason, we discuss in detail only one of them (Mercator).

The macro-pipeline designed for the Mercator dataset is depicted in Figure 10. Each block is explained in detail in the following subsections. The dataset is provided by Browsetel already cleaned, thus a proper data cleaning phase is not required. The dataset is generated by Browsetel’s COCOS CEP Customer engagement platform via APIs (see D4.2).

Figure 10. Browsetel pipeline for Mercator dataset

1.6.1.2.1 Aggregate historical data

This transformation task consists of adding some statistical features to the dataset, which are useful for the subsequent analytics. The objective of this Pilot is to train several predictive models; thus, we append some statistics to the dataset by grouping/aggregating/averaging some column values. In detail, the prediction model will be trained using a time window that ranges from 1 to 7 days in the past, for each date where at least one call event occurred.

The first aggregation in place is the sum of the total number of calls made for each day in the dataset. Furthermore, for each day, we are also interested in the total number of answered calls (i.e., calls with a conversation time greater than 0 seconds) and in the total conversation time for each date. These 3 sums are then used to compute the actual features needed for the analytics:

• The average answer rate for the previous 7 days;
• The average number of calls made in the previous 7 days;
• The average conversation time for the previous 7 days;
• The average total conversation time for the previous 7 days.
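The aggregation above can be sketched in plain Python as follows. The exact feature definitions are not given in this section, so the sketch assumes that the answer rate is answered calls divided by total calls and that the average conversation time is total conversation time divided by answered calls; the input layout (one `(day, conversation_seconds)` pair per call) is also hypothetical.

```python
from collections import defaultdict


def daily_sums(calls):
    """calls: iterable of (day, conversation_seconds). Returns the three daily sums."""
    sums = defaultdict(lambda: {"calls": 0, "answered": 0, "talk": 0})
    for day, seconds in calls:
        entry = sums[day]
        entry["calls"] += 1
        if seconds > 0:  # an answered call has a conversation time above 0 seconds
            entry["answered"] += 1
        entry["talk"] += seconds
    return dict(sums)


def window_features(sums, previous_days):
    """Average the daily figures over the given previous days that have data."""
    days = [sums[d] for d in previous_days if d in sums]
    if not days:
        return None
    n = len(days)
    return {
        "avg_answer_rate": sum(d["answered"] / d["calls"] for d in days) / n,
        "avg_calls": sum(d["calls"] for d in days) / n,
        "avg_conversation_time": sum(
            (d["talk"] / d["answered"]) if d["answered"] else 0.0 for d in days) / n,
        "avg_total_conversation_time": sum(d["talk"] for d in days) / n,
    }


# Hypothetical call log: two calls on each of two days.
log = [("2017-01-01", 0), ("2017-01-01", 60), ("2017-01-02", 30), ("2017-01-02", 90)]
features = window_features(daily_sums(log), ["2017-01-01", "2017-01-02"])
```

In the Pilot the window would be the 7 days before each record date; two days are used here only to keep the example small.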

[Figure 10 pipeline blocks: Input + cleaning result; Aggregate historical data; Build the Y vector for prediction; Extend the dataset with weather columns]


1.6.1.2.2 Build the Y vector for prediction

As the last transformation step, the Y vector is computed: for 7 different time windows (the next 1 day, the next 2 days, and so on until the largest window, which covers the next 7 days), the following parameters are appended to the dataset:

• The answer rate;
• The total number of calls;
• The average conversation time;
• The total conversation time.

Hence, the Y vector contains 28 parameters (4 parameters for each of the 7 windows) to use for training the prediction models.
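A minimal sketch of how such a Y vector could be assembled is shown below. It assumes one dictionary of daily sums per consecutive day (keys `calls`, `answered`, `talk`, all hypothetical names) and the same ratio definitions as assumed for the historical features; the column naming scheme is invented for the example.

```python
def y_vector(daily, i, max_window=7):
    """Targets for the windows covering the next 1..max_window days after index i."""
    y = {}
    for h in range(1, max_window + 1):
        window = daily[i + 1:i + 1 + h]
        if len(window) < h:
            break  # not enough future days for this window
        calls = sum(d["calls"] for d in window)
        answered = sum(d["answered"] for d in window)
        talk = sum(d["talk"] for d in window)
        y[f"answer_rate_next_{h}d"] = answered / calls if calls else 0.0
        y[f"total_calls_next_{h}d"] = calls
        y[f"avg_conversation_time_next_{h}d"] = talk / answered if answered else 0.0
        y[f"total_conversation_time_next_{h}d"] = talk
    return y


# Hypothetical series: eight identical days.
days = [{"calls": 10, "answered": 5, "talk": 500} for _ in range(8)]
targets = y_vector(days, 0)
```

With 7 windows and 4 parameters per window, a day with a full week of future data yields the 28 target columns mentioned above.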

1.6.1.2.3 Extend the dataset with weather columns

For this Pilot, the time window of interest covers one year (2017). Only the daily weather is fetched, with hourly observations (i.e., 24 observations per day). No forecasts are required.

In addition, even if the datasets provided by BT (CDE) contain the coordinates of the places where the call receiver lives (all in Slovenia), the weather data have been downloaded only for the city of Ljubljana, which is the largest city in Slovenia; this decision fits the Slovenian case well, because the total area of the country is not large. Thus, the weather of Ljubljana approximates the weather conditions of the whole country well enough.

The set of all the available weather parameters is then appended to the input dataset (2d, 2t, sd, sp, tcc, ws, rh, sund, tp, ssr, and sf).

1.6.1.3 Analytics

In this Pilot, analytics are meant to support Browsetel call centre operations. To ensure good service levels, the call centre needs to have enough staff manning the phone lines. Since extra staff is costly, it is in the call centre management's interest to plan the minimal number of people needed. To do so, they need an accurate prediction of the expected volume of “incoming calls”. The call centre also performs marketing campaigns, which means the staff call telephone numbers from a list (provided by the client) and present the marketing campaign content. We name such calls “outgoing calls”. In this case, the parameter to optimize is the expected rate of answered calls, which allows efforts (i.e., staff) to be concentrated on times when this rate is expected to be high. The experience of call centre managers is that the volume of incoming calls and the answer rate of outgoing calls are influenced by weather, because under some conditions (e.g., sunny summer days) people are more likely to be engaged in activities and are unavailable for calls.

The two prediction tasks are highly related, and we formalize them with the following input/output:

Input: Historic data about the incoming and outgoing calls. The volume of calls is aggregated on a daily level and linked to weather data. Outgoing calls also contain the information on whether they were answered. The data is given in tabular format with one row of data per day.

Output: A regression model predicting the number of incoming calls or the answer rate of outgoing calls given a new vector of weather data for the target day.


Both prediction tasks are solved by building a Support Vector Regression model using the EW-Shopp analytics tools. The analytics pipeline specification for both is nearly identical, with just the target variable being different. It is important to note that a different model instance should be built for each incoming call client. Browsetel handles call centre needs for different clients (supermarket chains, mobile telephony operators, banks, etc.) and each has a different distribution of expected calls, which is why separate models are needed. Since the analytics pipeline specification is the same for all of them, it is simple to automate pipeline generation.

1.6.1.4 Visualisation

Visualisation was implemented as a set of additional custom reports in the existing COCOS Workforce Management Application. No external visualization tools are planned since COCOS is an established Cloud Contact Centre Suite.


We have integrated with the Weather API provided by JSI; we use QMiner for predictive analytics and integrate its results as custom reports in our existing Contact Centre Workforce Management Tools.

All the cleaning, transformation, merging and extension operations have been performed using ad-hoc scripts, which stored and queried all the data directly to/from the QMiner database.


First, in order to improve the integration of the Predictive analytics tools, they should operate in real time. Also, in order to have more detailed analyses for the BC, the calculation should be done on a smaller time frame (i.e., a few hours instead of a day). But the most important improvement is the integration of custom events in the prediction models: in fact weather alone cannot predict all the changes in customer behaviour, while events will provide added value because customer behaviour can change quite radically when certain events occur (e.g., incoming calls surge if some change occurs in the company served by the customer service). Concerning the Enrichment task, also for this Pilot it was decided to extend the dataset with weather information for the entire area of Slovenia, information currently not available within the dataset (which instead contains the coordinates of different places).

1.7 Business Case 3 Evaluation

Measurence uses WiFi sensors to capture smartphone signals and establish the number of clients of the Point of Sales. The goal of the Pilot is to assess any correlation between seasonal/weather variations and customer visits to the Point of Sales, in order to predict the traffic and optimize the Point of Sales organization.


1.7.1.1 Ingestion

Measurence’s plug & play WiFi sensors are placed in the locations chosen for the Pilot (three sensors per location), collecting WiFi packets from all devices that use the WiFi network (mainly people’s smartphones). Measurence’s software filters, anonymizes and normalizes those packets, transforming them into “presence events” directly in the sensors. That software connects to the backend services and transmits those events to Measurence’s analytical platform. The existing WiFi Analytic Pipeline estimates people’s traffic outside and inside the location, and calculates the number of returning and engaged visitors and the duration of visits with a minimal granularity of half an hour. The event data for our Pilot was taken from the corresponding Facebook pages of the car dealer locations and the Kim Scout shop in Milan. It consists of information about the date of the event and the name of the event (for the future model we also plan to test the number of interested people, if relevant). These data were ingested directly into the initial dataset with people’s traffic. The weather data were prepared and transformed by UNIMIB (next section).

1.7.1.2 Enrichment

Along with the anonymization and normalization processes, which are executed directly on the device, sensor data provided by Measurence underwent an additional pre-processing phase before feeding the Pilot pipeline. This phase replaces all missing data generated by the sensors (the amount of replaced data depends on the sensor position and on how long the sensor was not working) and produces the datasets used as input for the Pilot pipeline (Figure 12).

Figure 12. Measurence pipeline for daily visitors

Measurence provided 3 datasets for each shop, containing data about daily, weekly, and monthly visits (all built from an initial dataset that contains the half-hourly visits). The schemas of these datasets are different, but the transformation/enrichment operations are the same for all of them (although they use different columns). Based on this observation, here we discuss only the daily visitors dataset, with some remarks to help the reader map the weekly and monthly cases to the daily one.

[Figure 12 pipeline blocks: Input + cleaning result; Extend the dataset with weather columns; Aggregate weather features; Aggregate the daily number of visitors; Normalize the daily number of visitors]

After the above cleaning phase, the weather parameters have been added to the dataset. For the Pilot scope, at design time Measurence needed only a few weather parameters: 10fg6 (10 metre wind gust in the last 6 hours), 2t (temperature), ptype (precipitation type), tcc (total cloud coverage), tp (total precipitation), vis (visibility). All these parameters are fetched at 3-hour intervals for each day (0:00 - 6:00 - 9:00 - 12:00 - 15:00 - 18:00 - 21:00; the 3:00 observation was skipped because it was not relevant for this Pilot), except for 10fg6, which is provided only every 6 hours. From the above set of parameters, Measurence decided to rely only on a smaller dataset, which contains:

- The total precipitation at 21:00;
- The temperature at 9:00, 12:00, 15:00, and 18:00;
- The total cloud coverage at 12:00.

In addition, a few transformations have been performed over the weather parameters:

- Temperature parameter (2t) conversion from Kelvin to Celsius (x - 273.15, where x is the temperature expressed in Kelvin);

- Daily temperature averaging, which is the average of all the temperature columns (in this way, the temperature during the night is not considered).
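The two transformations can be written down directly; the sketch below is only illustrative and assumes the four daytime readings are held in a dictionary keyed by observation time (the key format is hypothetical).

```python
def kelvin_to_celsius(kelvin):
    """Convert a 2t reading from Kelvin to Celsius (x - 273.15)."""
    return kelvin - 273.15


def daytime_mean_celsius(readings, hours=("09:00", "12:00", "15:00", "18:00")):
    """Average of the selected temperature columns, in Celsius."""
    return sum(kelvin_to_celsius(readings[h]) for h in hours) / len(hours)


# Hypothetical readings in Kelvin for one day.
day = {"09:00": 283.15, "12:00": 288.15, "15:00": 290.15, "18:00": 285.15}
mean_c = daytime_mean_celsius(day)
```

Because only the 9:00 to 18:00 columns enter the mean, night-time temperatures do not influence the daily value.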

Finally, other new features have been added to support the analytics task:

- Number of daily visitors of a shop, taking into account only opening hours: it corresponds to the sum of half-hour visitors during opening hours (the sensors count the number of visitors every half-hour);

- Normalized daily visitors amount: it is the total number of daily visitors normalized on the average number of visitors for the same day of the week (e.g., Monday, Tuesday, etc.). This normalized value helps to avoid the "selection effect" relative to a certain day of the week (for example, during the weekend some shops consistently have more visitors than on other days of the week).
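The weekday normalization can be sketched as below, reading "normalized on the average" as a division by the mean number of visitors observed on the same weekday (an assumption; the text does not give the formula). The input is a hypothetical dictionary from date to daily visitor count.

```python
from collections import defaultdict
from datetime import date


def normalize_by_weekday(daily_visitors):
    """Divide each day's visitors by the mean for that day of the week."""
    per_weekday = defaultdict(list)
    for day, visitors in daily_visitors.items():
        per_weekday[day.weekday()].append(visitors)
    means = {wd: sum(v) / len(v) for wd, v in per_weekday.items()}
    return {
        day: (visitors / means[day.weekday()] if means[day.weekday()] else 0.0)
        for day, visitors in daily_visitors.items()
    }


# Hypothetical counts: two Mondays and two Saturdays.
visits = {
    date(2018, 1, 1): 10, date(2018, 1, 8): 30,     # Mondays
    date(2018, 1, 6): 100, date(2018, 1, 13): 300,  # Saturdays
}
normalized = normalize_by_weekday(visits)
```

After normalization a quiet Monday and a quiet Saturday get comparable values even though their raw counts differ by an order of magnitude.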

All the above operations have been repeated for the weekly and monthly datasets, byaveraging/groupingthedailyobservations.

1.7.1.3 Analytics

In the Measurence Pilot the goal is to model the dynamics of customer movement in (brick and mortar) stores based on data from an Internet-of-Things platform. Such a model would support several optimizations of store operation, such as staff planning, marketing activity scheduling, inventory management and others. Pilot data is from several car dealerships from all over Italy and the Kim Scout shop in Milan.

The prediction task can be formalized as follows:

Input: Historic data of visits to individual dealerships linked to weather conditions for their locations. The data is given in tabular format with one row of data per day.

Output: A model predicting the number of visitors for a target shop given a new vector of weather data for its location.

To gain more insight, a set of statistical tools (from the Python programming language libraries scipy.stats, matplotlib, math, pandas and numpy) was used to analyse the data. Statistical approaches can be more robust and therefore more impervious to the data problems listed above. Using the statistical tools, it was possible to:

- Determine the most visited day in the week;
- Compare daily/weekly/monthly visitation levels over the year;
- Compare the influence of weather (temperature, precipitation and cloudiness) in 5 dealers' locations in Northern/Southern Italy;
- Determine that the difference in visitor behaviour between the sunniest days and rainy/cloudy days depends on the location;
- Determine that no significant change in visitation occurs at rapid changes in temperature;
- Analyse the impact of internal events (open weekends and other marketing events) by adding information about them from dealership social media to the dataset, which showed promising results.

1.7.1.4 Visualisation

Regarding visualisation, given that ME already provides visualisation in their product, in the EW-Shopp Business Case the idea is to incorporate weather data. The main need is to have a cockpit for each store to visualise both daily visitors and daily weather features, in particular temperature, cloudiness and precipitation. Additionally, the end users also want to be able to see the weather forecast (again based on the above parameters) for the next week, in order to possibly plan marketing initiatives, etc. In the current implementation, the actual store and weather data were taken from historical data, so the cockpit assumes that for a given week, week + 1 is the ‘forecast’ week. The data set is based on an aggregation of Measurence daily sensor data for the monitored stores (anonymized here with generic names “Store1” etc. for confidentiality reasons) and weather data from the MARS database. The cockpit presents two main charts showing daily visitors and daily temperature together, both from the ‘current’ and the ‘future’ week. For the latter, visitors are zero for each day. In this way the user can visually see the temperature evolution for the forecast as well. Other widgets were also developed, showing in the upper part data regarding visitors and temperature for the past week (historical), and the weather forecast (including all relevant parameters) for the next week. The cockpit is created for each store and the specific store can be selected from a List of Values (LOV - see D3.3 for details). Figure 13 shows a screenshot of a complete store cockpit.

Figure 13 BC3 weekly store analysis vs. weather cockpit example



EW-Shopp tools add an additional layer of intelligence to the data: weather and events data. During the Pilot Measurence expected to build the analytical model and consider/find different correlations between the number of receipts/visitors/walk-bys and different weather attributes like temperature, precipitation, wind, cloudiness, etc. Measurence used both its own facilities and algorithms written in Python, especially for weather data analysis, and visualized data on the interactive dashboard. All the cleaning, transformation, merging and extension operations have been performed using ad-hoc Python scripts. EW-Shopp modules were used as well: DataGraft (for data enrichment and manipulation) and QMiner (analytics). The end application of the Business Case will use Knowage (for reporting and visualization) to find the most effective way of integrating data and delivering the analytics to the customers. The training (Pilot) dataset is composed of CSV files; the next necessary step is to automate the process and take advantage of the EW-Shopp platform API for the data integration process.


It was found that the accuracy of weather predictions is not sufficient to produce a precise forecast of visitors for the next days; however, the weather can explain some traffic peculiarities in the past data. Thus, we decided to show the weather data together with traffic data as indicative information on our dashboard (which was also evaluated by the Pilot service customers). We also found that eventregistry.org data is not relevant for the Pilot locations (there is no correlation between global news events and traffic in the considered stores). Therefore, the Business Case will not use this data in future analysis. At the same time, the impact of local marketing events on people’s traffic is significant, so the Business Case will train the machine learning model with QMiner on the past data to make predictions of people traffic for future events.

Statistical tools are not a part of the EW-Shopp analytical tools. Due to their usefulness in this Pilot they are being considered for inclusion; however, this would require an extension of the QMiner library that the tools are based on. The interesting results obtained using the internal events data indicate that this is a promising direction for development. These events appear to be a potentially richer source of signal for customer dynamics and could enable modelling using machine learning tools. We are investing effort into developing a data model [8] for these events and using them to build predictions.

Experiments with building a model with EW-Shopp tools using machine learning methodology were unsuccessful, as model performance was not satisfactory. This can happen for different reasons, the two most common being that there is not enough signal in the data from which a reliable prediction can be produced (either due to the biased nature of the data caused by the limits of WiFi technology or because the dataset is too small), or that the modelling method is not appropriate.

The priority for the next period is to automate the process of building the dashboard for 50 or more of Measurence’s locations using the EW-Shopp platform components. Once the automation process is set up, the weekly results will be fed into a dedicated MariaDB database to be used as a Knowage data source, in turn making it possible to set up automated weekly cockpits similar to the ones developed. From a visualisation point of view, it will also be useful to build one or more cockpits enabling, e.g., store or geographical analyses.

[8] See deliverable D1.4: Event, Weather and Multilingual Data Services

1.8 Business Case 4 Evaluation

The aim of this Pilot is to develop data-driven services to improve the impact of Digital Marketing Campaigns by establishing the best moment to launch a campaign and predicting its performance as a function of the weather. Several analytics models have been established on different enriched datasets to achieve the final goal.


In Figure 14, we report the logical and functional workflow for Business Case 4 - Pilot 1 according to the EW-Shopp data flow methodology described in Chapter 1.


1.8.1.1 Ingestion

This Business Case involves the exploitation of several data sources for service generation: marketing campaign statistics, weather forecasts and external events. At this stage, the initial Pilot is using marketing and weather-related data for the campaign scheduler service. We chose to develop analytics for analysing the performance of keywords aggregated on a regional level for Germany. As a basis we use JOT's Google AdWords keyword data for 2016 and 2017, which contain daily performance metrics of each managed keyword in their AdWords portfolio: the group that the keyword belongs to, the type of match, and the number of clicks/impressions (see Deliverable D4.2 for details on the imported data). These keyword data are available daily per country/region/city/date. The AdWords data were delivered as a set of archives (ZIP files) to an FTP server that was set up for the project and represent Google AdWords dumps, one archive for each month of the analysed year. Each archive contains TSV files, each holding one day's worth of keyword data for the extracted set of keywords. To enable processing in the Enrichment Database and the tools of the EW-Shopp platform, the data were decompressed, transformed from TSV to CSV and split into chunks. These steps were also included in the data workflow deployed on the Processing Component for further onboarding of keyword data.
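The TSV-to-CSV conversion and chunking step can be illustrated with the standard `csv` module. This is a sketch under stated assumptions, not the deployed workflow: it works on in-memory text, repeats the header in every chunk, and the chunk size and column names are invented for the example.

```python
import csv
import io


def _to_csv(header, rows):
    """Serialize a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def tsv_to_csv_chunks(tsv_text, chunk_rows):
    """Convert TSV text to a list of CSV texts with at most chunk_rows data rows each."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == chunk_rows:
            chunks.append(_to_csv(header, current))
            current = []
    if current:  # last, possibly shorter, chunk
        chunks.append(_to_csv(header, current))
    return chunks


# Hypothetical one-day dump with three keyword rows.
dump = "keyword\tclicks\nshoes\t12\nboots\t3\nsandals\t7\n"
parts = tsv_to_csv_chunks(dump, chunk_rows=2)
```

Repeating the header in each chunk keeps every chunk independently loadable, which is convenient when chunks are processed in parallel.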

The second dataset that was necessary for the Pilot delivers the weather enrichment data. To obtain these data, we extracted the coordinates for the polygons of all German regions as well as the bounding box for Germany. As described in Deliverable D1.2, we use ECMWF [9] as the primary source of weather forecasts and measurements. Using the bounding box of Germany, we can obtain the binary GRIB file [10] for the entire country. This can then be queried using the ECMWF weather access library that was developed to support the EW-Shopp platform [11] to obtain weather on a regional level. The library enables accessing the ECMWF APIs to obtain GRIB files for a set of coordinates and interpolating the weather data within a certain region. We set up a daily automatic process on the deployment of the Processing Component (see Deliverable D2.2) for the Pilot that uses the APIs of the aforementioned library to get the data for Germany, extract the weather data (actual and up to 7-day forecast) for the regions of Germany and import it into the Enrichment Database. These data are used by the predictive model of the Business Case to determine the best dates to launch keywords in the future. Additionally, as a base for developing the predictive models for the keywords, we extracted (in batch) and imported into the Enrichment Database the weather data (actual and up to 7-day forecast) for each day of 2016 and 2017.

To sum up, the flow and procedures for data ingestion are the following (see Figure 15):

1. Marketing campaign statistics are collected directly from the Google AdWords platform by API to JOT’s cloud storage. A set of selected data for the development of the predictive models (in the case of the Pilot, data about Germany in 2016 and 2017; in general, defined by country and temporal window) are uploaded to an FTP server (hosted on Amazon Cloud). Then, the raw data are made available to the Processing Component and prepared to be pre-processed using Grafterizer 2.0. This includes the preparation of the data to be imported into the Enrichment Database by mapping tabular input data to a set of JSON documents.

2. Weather data are collected from ECMWF using the API and ingested in the system to be integrated with marketing data (during the Enrichment phase described in the next section).

Figure 15 - Data ingestion flow for BC4

[9] https://www.ecmwf.int/en/forecasts/datasets
[10] https://www.predictwind.com/grib-files/
[11] https://github.com/JozefStefanInstitute/weather-data

1.8.1.2 Enrichment

The first step of the Enrichment phase of the Business Case is to pre-process the data using the Grafterizer 2.0 tool. In order to create the pre-processing scripts, we manually extracted a representative sample of the input dumps. The full dataset is pre-processed using a scaled-up version of the scripts that is deployed on the Processing Component (see Deliverable D2.2). The pre-processing steps are meant to prepare the data for importing into the Enrichment Database and to filter out erroneous data. Any data that does not contain a valid AdWord or AdWord group gets filtered out. Furthermore, we format the city name strings and date strings so that they can be matched later with the enrichment data. For storage and integration purposes, we also create unique identifiers for each match entity (i.e., the set of clicks, impressions, etc. for a given country/region/city/date) by concatenating the keyword's AD group, keyword ID, city of occurrence, region of occurrence, and date of occurrence (defining features of the Match entity; see Figure 16).
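The filtering and identifier construction can be sketched as follows. The actual pre-processing is done with Grafterizer 2.0; this Python version only illustrates the logic, and the field names, the separator and the validity check (non-empty keyword and AD group) are assumptions.

```python
def is_valid(row):
    """Keep only rows that carry both a keyword and an AD group (assumed check)."""
    return bool(row.get("keyword_id")) and bool(row.get("ad_group"))


def match_id(row, separator="|"):
    """Concatenate the defining features of a match entity into one identifier."""
    parts = (row["ad_group"], row["keyword_id"], row["city"], row["region"], row["date"])
    return separator.join(str(part).strip() for part in parts)


# Hypothetical input rows; the second lacks an AD group and is filtered out.
rows = [
    {"ad_group": "G1", "keyword_id": "K42", "city": "Berlin",
     "region": "Berlin", "date": "2017-05-01", "clicks": 8},
    {"ad_group": "", "keyword_id": "K43", "city": "Bonn",
     "region": "Nordrhein-Westfalen", "date": "2017-05-01", "clicks": 2},
]
matches = {match_id(row): row for row in rows if is_valid(row)}
```

Using an explicit separator avoids collisions between identifiers whose parts would otherwise run together.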

Figure 16. Data structure of the BC4 keyword data in the Enrichment Database

Finally, we created readable English language labels, based on the input data, that can be used when exploring the data (name of the respective entity for keywords, regions and AD groups, and a generated string for each match based on the keyword name, date of the match and place of occurrence).


After the cleaning phase (pre-processing), the next step is to enrich JOT data with weather features. The ASIA tool comes with a functionality to do that in an easy way, given a column that is already reconciled against a geospatial Knowledge Graph (KG). In the current release of ASIA, we consider GeoNames as the reference geospatial KG (i.e., the weather extension function requires a column that contains identifiers of GeoNames entities). In addition, this Pilot requires weather features related to spatial regions instead of cities; thus an additional extension sub-step is needed, since JOT data come with city names. In the next subsections we describe the enrichment sub-steps (cf. Figure 18) in detail.

Figure 17. Enrichment pipeline details

Figure 18. Enrichment pipeline details

1.8.1.2.1 Reconcile StrCity column to StrCity_GN column

The ASIA tool provides users with a function to reconcile column values against a KG¹². Given a column whose values are mentions of real-world entities (e.g., people, organizations, places, etc.), and a KG that describes those entities, ASIA is able to create a new column with reconciled entities, i.e., it associates each mention with the URI of the entity that represents it in the selected KG. Considering a cell with the “New York” value and the DBpedia KG as an example, ASIA will put the URI http://dbpedia.org/resource/New_York on the same row in the reconciled column.
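To make the idea of a reconciled column concrete, the following toy sketch maps mentions to entity URIs through an in-memory index. It only illustrates the input/output shape: the `KG_INDEX` dictionary and the `reconcile_column` helper are hypothetical and do not reflect how ASIA or its reconciliation services are implemented.

```python
# Hypothetical in-memory stand-in for a reconciliation service over a KG.
KG_INDEX = {
    "new york": "http://dbpedia.org/resource/New_York",
}


def reconcile_column(values, kg_index):
    """Return one URI (or None when no entity matches) per input mention."""
    return [kg_index.get(value.strip().lower()) for value in values]


reconciled = reconcile_column(["New York", "Atlantis"], KG_INDEX)
```

Unmatched mentions are kept as `None` so that the reconciled column stays aligned row by row with the original one.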

To enable the weather extension functionalities, ASIA requires a column to be previously reconciled against a specific geospatial KG, that is GeoNames.

12 See http://inside.disco.unimib.it/index.php/asia/ for details about all ASIA’s functionalities


Datasets provided by JOT in this Pilot contain data about digital marketing campaigns. Along with strictly campaign-related information (e.g., campaign keywords and performance indicators), these datasets also provide information about the place where an action (e.g., a click on a digital banner ad) has been performed. For each row, both the city and the region to which it belongs are given (respectively, in the StrCity and StrRegion columns). The main goal of this Pilot is to investigate how the weather impacts marketing campaigns at the region level. Thus, the weather information, which is available at the city level¹³, must be aggregated at the region level.

As a first step, we started reconciling the StrCity column to GeoNames. Doing this step is straightforward using ASIA: once the column to reconcile is selected, ASIA asks for the reconciliation service to use and returns a preview that allows the user to evaluate the results quickly (Figure 19). Indeed, the reconciled column displays the URI of the reconciled entity, which allows users to check the description page of that entity (e.g., a user can check if the http://sws.geonames.org/6555728 entity was correctly reconciled to the mention Freiburg by visiting the description page of that entity at http://www.geonames.org/6555728/freiburg-im-breisgau.html).

Figure 19. ASIA reconciliation preview

Once satisfied with the result quality, the user can apply the reconciliation and a new column will appear in the original dataset. The new column is also annotated automatically with the inferred type (all the reconciled entities are of this type), speeding up the annotation processing¹⁴. Also, ASIA stores (internally) the new column as a reconciled column, and this enables the extension functionality on that column.

13 The ECMWF service is used within the EW-Shopp project as the only provider of weather data. Data about weather are stored in GRIB format, which is a standard format for representing weather information. The GRIB format represents weather data on a grid, where intersection points are specific coordinate pairs. Given the coordinates of a city, the city-level weather data are computed by interpolating the information available for the nearest 4 points in the grid.
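Footnote 13 states that city-level weather values are obtained by interpolating the four nearest grid points, without naming the scheme. One common choice for a regular grid is bilinear interpolation; the sketch below assumes that scheme and uses made-up coordinates and values, so it should not be read as the implementation of the EW-Shopp weather library.

```python
def bilinear_interpolate(lat, lon, grid_points):
    """Interpolate a value at (lat, lon) from the four surrounding grid nodes.

    grid_points maps (lat, lon) tuples of the four corner nodes to their values.
    """
    lat0, lat1 = sorted({point[0] for point in grid_points})
    lon0, lon1 = sorted({point[1] for point in grid_points})
    # Relative position of the target inside the grid cell, in [0, 1].
    ty = (lat - lat0) / (lat1 - lat0)
    tx = (lon - lon0) / (lon1 - lon0)
    # Interpolate along longitude on both latitudes, then along latitude.
    south = grid_points[(lat0, lon0)] * (1 - tx) + grid_points[(lat0, lon1)] * tx
    north = grid_points[(lat1, lon0)] * (1 - tx) + grid_points[(lat1, lon1)] * tx
    return south * (1 - ty) + north * ty


# Made-up temperatures at the four grid nodes around a city.
cell = {
    (48.0, 7.5): 10.0,
    (48.0, 8.0): 12.0,
    (48.5, 7.5): 14.0,
    (48.5, 8.0): 16.0,
}
city_value = bilinear_interpolate(48.25, 7.75, cell)
```

A city at the centre of the cell receives the mean of the four corner values, while a city lying exactly on a grid node receives that node's value unchanged.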

1.8.1.2.2 Extend StrCity_GN column to parentADM1 column

ASIA provides users with distinct extension features that allow a reconciled column to be extended with additional information from:

1. the same KG used for the reconciliation, e.g., once a column is reconciled against GeoNames, a user can add more columns with information from GeoNames;

2. an external service, e.g., once a column is reconciled against a specific KG, ASIA allows a user to extend that column with data from compatible third-party services. As an example, in ASIA the only KG compatible with the weather extension is GeoNames.

The objective of this Pilot is to run analytics at the region level; thus, we need weather data related to regions instead of cities. But since we have already reconciled cities against GeoNames, we can exploit the former extension feature in order to fetch additional data from GeoNames and add a new column with the administrative level we are interested in. Indeed, GeoNames represents the regions of a country as entities of type A.ADMX (X-order Administrative Division, where X ranges from 1 to 4 and changes country by country; e.g., German regions are of type A.ADM1, because regions are the first-level administrative division in Germany). Entities of a specific administrative level are linked with entities of all the higher levels by means of the parentADMX property: for example, entities of type A.ADM4 are linked with their next higher administrative level (of type A.ADM3) through the property parentADM3, with the second-level administrative division through the property parentADM2, and with the highest-level administrative division through the property parentADM1¹⁵.
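The parentADMX linkage can be pictured as a property lookup on each reconciled entity. The sketch below is a simplified illustration under stated assumptions: the `ENTITIES` dictionary and its `example:` identifiers are invented placeholders, not real GeoNames records, and `extend_with_property` is a hypothetical helper rather than an ASIA function.

```python
# Invented placeholder entities; real GeoNames identifiers are URIs such as
# http://sws.geonames.org/<id>.
ENTITIES = {
    "example:city-a": {
        "type": "A.ADM4",
        "parentADM3": "example:district-1",
        "parentADM2": "example:province-1",
        "parentADM1": "example:region-1",
    },
    "example:city-b": {
        "type": "A.ADM4",
        "parentADM3": "example:district-2",
        "parentADM2": "example:province-2",
        "parentADM1": "example:region-2",
    },
}


def extend_with_property(column, entities, prop):
    """Build a new column holding the value of `prop` for each reconciled entity."""
    return [entities.get(uri, {}).get(prop) for uri in column]


city_column = ["example:city-a", "example:city-b", "example:unknown"]
region_column = extend_with_property(city_column, ENTITIES, "parentADM1")
```

Because every entity carries links to all of its higher levels, a single lookup on parentADM1 is enough to move from cities to regions without walking the hierarchy step by step.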

With this knowledge, this extension task also becomes straightforward using ASIA. When the user clicks on a reconciled column and chooses the “GeoNames Extension” feature, ASIA asks for the properties of interest and then starts fetching nodes in the KG that are linked with each of the reconciled entities via the selected property. As in the reconciliation phase, a preview is shown here too, and the user can check the results before applying them to her dataset (Figure 20).

14 Details about the annotation functionality of ASIA are given in http://inside.disco.unimib.it/index.php/asia/
15 Each entity in GeoNames is linked with all its higher-level administrative divisions; thus, given an entity, it is possible to reconstruct its administrative hierarchy by reading all the parentADMX properties linked to it.


Figure 20. Example of a GeoNames Extension that adds a new column with the parentADM1

At the end of this procedure, the starting dataset contains an additional column with identifiers of regions related to campaigns that can be used to fetch weather data from the weather service¹⁶.

1.8.1.2.3 Extend parentADM1 column to weather columns

Now that we have reconciled regions, we can easily fetch the weather information needed for predicting future keyword activity. ASIA provides users with a form to properly customise the set of weather parameters to retrieve, as depicted in Figure 21. In order to get weather information, at least two fundamental variables are needed: time and space. The spatial variable comes directly from the reconciled column, while the temporal variable can be manually selected, or read from another column. In detail, a user can choose:

• A specific date, or a column with a list of dates to consider (in this case, each row will be extended with weather observations based on the date read on the same row); if the second option applies, dates must be compliant with the ISO 8601 standard;

• A list of x aggregators (e.g., min, max, avg, etc.) to apply to weather observations;

• A list of y weather parameters, which are the features relevant for the analytics (e.g., temperature, wind speed, etc.)¹⁷;

• A list of w offsets, which are used to compute weather forecasts, e.g., if the offset is 1 and the date is today, we are interested in the forecast for tomorrow.

16 Even if it would be possible to reconcile region names directly, for the sake of completeness, and in order to show all the capabilities of ASIA, we decided to reconcile city names first, and then to obtain the reconciled region entities by exploiting the extension function of ASIA.
17 Please refer to https://github.com/JozefStefanInstitute/weather-data/wiki/Weather-features for a detailed explanation about weather features available in ASIA
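The relation between a row's date and an offset can be illustrated with a few lines of Python. This sketch reads the offset as a number of days ahead of the ISO 8601 date on the row, which matches the "offset 1 means tomorrow" example above; the `forecast_target` helper is hypothetical and not part of ASIA.

```python
from datetime import date, timedelta


def forecast_target(row_date: str, offset: int) -> str:
    """Return the ISO 8601 day that a forecast with the given offset refers to."""
    base = date.fromisoformat(row_date)  # raises ValueError on non-ISO input
    return (base + timedelta(days=offset)).isoformat()


targets = [forecast_target("2017-12-30", offset) for offset in range(3)]
```

Parsing with `date.fromisoformat` also doubles as a cheap compliance check: rows whose dates are not in ISO 8601 form fail early instead of being silently mismatched.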

Figure 21. Weather Extension configuration form

Subsequently, a set of x·y·w new weather columns is previewed and then added to the dataset. In this Pilot, we set the above parameters as follows:

• Dates read from the Date column (which are compliant with the ISO 8601 format, thanks to the pre-processing explained before);

• No aggregators;

• Keep all the available parameters (14 parameters, that are 10u, 10v, 2d, 2t, rh, sd, sf, sp, ssr, sund, tcc, tp, vis, and ws);

• All the available offsets (8 offsets, from 0 to 7).

Hence, the enrichment phase for this Pilot ends by appending 112 new columns (14 parameters × 8 offsets) to the original dataset (Figure 22 shows the preview of this extension¹⁸).

18 We use a pre-defined pattern to set the header for each new column, that is WF_paramID_datetime_offset.
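The column count can be checked by enumerating every parameter/offset combination. In the sketch below the header follows the WF_paramID_datetime_offset pattern, but what exactly is substituted for the datetime part is not specified in this section; using the name of the date column (`Date`) is purely an assumption for illustration.

```python
from itertools import product

# The 14 weather parameters and 8 offsets selected for the Pilot.
PARAMS = ["10u", "10v", "2d", "2t", "rh", "sd", "sf", "sp",
          "ssr", "sund", "tcc", "tp", "vis", "ws"]
OFFSETS = range(8)


def weather_headers(params, offsets, datetime_label="Date"):
    """Build one header per (parameter, offset) pair.

    datetime_label is a placeholder for the datetime part of the pattern.
    """
    return [f"WF_{param}_{datetime_label}_{offset}"
            for param, offset in product(params, offsets)]


headers = weather_headers(PARAMS, OFFSETS)
```

With no aggregators selected, each parameter/offset pair yields exactly one column, which gives the 112 columns reported for the Pilot.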


Figure 22. Weather Extension preview

1.8.1.3 Analytics

The goal of analytics in the BC4 Pilot is to predict future keyword activity, namely the volume of daily keyword impressions, from context weather data. Historic impression data is used for building a model. The number of impressions is given for each keyword on a daily level for each geographical region and has been linked to its respective weather data in the previous enrichment step.

The prediction task can be formalized as follows:

Input: A time series of historic daily impression data for a keyword K in a region R linked to weather conditions for the region. The data is given in tabular format with one row of data per day.

Output: A regression model predicting the number of impressions of keyword K in the region R given a new vector of weather data for the region.

Using the EW-Shopp analytics tools, the given prediction task is solved by building a Support Vector Regression (SVR) model in QMiner. The analytics pipeline¹⁹ is identical for all keywords, so the same specification JSON file can be used for all of them, making automation of this process simple. Note that from a technical standpoint it does not matter over what region the impression data is aggregated. If necessary, the per-region data could be aggregated to the entire country in the preceding steps and no change would be needed in the analytics specification. Such a naïve approach is not likely to give good results for this Pilot though, as weather is unlikely to be homogeneous over a large country such as Germany.

19 https://github.com/JozefStefanInstitute/ew-shopp-public/tree/master/analytics/pipeline

The approach described above solves the Pilot prediction task, but unfortunately the experience gained from the Pilot implementation shows that predictions for single keywords are impractical to use. There are too many different keywords and the approach does not scale. We are exploring possibilities of clustering the keywords into groups by their semantics (e.g., football-related keywords or cell-phone-related keywords). This would allow aggregation of impressions over the entire group and reduce the number of models needed. Per-group modelling could still be performed using the same approach as the per-keyword modelling described in this section.
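Assuming a keyword-to-group assignment is already available (how to obtain it is exactly what is still being explored), the aggregation of impressions over a group could look like the sketch below. The group names, field names and sample rows are invented for the example.

```python
from collections import defaultdict


def aggregate_by_group(rows, keyword_to_group):
    """Sum daily impressions per (group, region, date); unmapped keywords are skipped."""
    totals = defaultdict(int)
    for row in rows:
        group = keyword_to_group.get(row["keyword"])
        if group is None:
            continue
        totals[(group, row["region"], row["date"])] += row["impressions"]
    return dict(totals)


# Invented keyword groups and impression rows.
keyword_to_group = {"fussball trikot": "football", "fussball schuhe": "football",
                    "handy vertrag": "cell-phone"}
rows = [
    {"keyword": "fussball trikot", "region": "Bayern", "date": "2017-06-01", "impressions": 120},
    {"keyword": "fussball schuhe", "region": "Bayern", "date": "2017-06-01", "impressions": 80},
    {"keyword": "handy vertrag", "region": "Bayern", "date": "2017-06-01", "impressions": 50},
    {"keyword": "unmapped", "region": "Bayern", "date": "2017-06-01", "impressions": 999},
]
group_series = aggregate_by_group(rows, keyword_to_group)
```

The output keeps the one-row-per-day shape of the per-keyword input, which is why the same modelling approach could be reused with one model per group instead of one per keyword.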

1.8.1.4 Visualisation

Regarding visualisation for BC4, the need is to present in a concise and visually effective way the ‘performance’ of certain keywords in certain regions. In this case a visualisation was set up for German regions and a set of relevant keywords in German. The intended users are JOT account managers involved in campaigns and interested in getting insights about keyword performance in the various regions covered by the campaign. Starting from weekly data, the cockpit should present a snapshot of daily keyword performance (measured as the sum of impressions per day). Additionally, the user should be able to drill into regions, as well as to filter by, e.g., day or keyword.

To this end a unified weekly cockpit was created with related data sets (for an explanation of a data set please see D3.3). The main visualisation is a heat map chart where the X axis shows days of the week, the Y axis shows keywords and the Z axis shows the impressions per day (represented by a colour scale from yellow, fewest impressions, to red, most impressions). Additionally, a global daily performance chart (line) is also displayed: this shows the overall performance for each day. Finally, a cross-table listing keywords and total impressions per keyword is displayed in the same dashboard. In this way a compact and quick, yet rather complete visual snapshot of weekly performance is provided (e.g., the cross-table values are colour-coded). Figure 23 shows a screenshot of the cockpit when the user opens it.

While this visualization is already rather effective, as is it would be rather static. To this end a set of dynamicity and data navigation/exploration features are embedded in the cockpit. First, as for all Knowage cockpits, appropriate associations (see D3.3 for details) have been created for all data sets used: in this way, whenever the user clicks on any data point of a widget, all of the other visual elements and widgets are updated accordingly. If, for example, the user clicks on a keyword in the cross table, the whole cockpit dashboard will update and show data relative to that specific keyword only: for example, in Figure 24 the user selected first a single and then two keywords: notice how all visual elements reflect the user interaction. Additionally, a specific checkbox widget for filtering keywords (including multiselect) has been added.
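The data behind the heat map is essentially a pivot of impressions by keyword and day of the week. The following sketch shows one way such cells could be computed; it is independent of Knowage, and the field names and sample rows are assumptions.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def heatmap_cells(rows):
    """Sum impressions per (keyword, weekday) cell, across all regions."""
    cells = defaultdict(int)
    for row in rows:
        weekday = WEEKDAYS[date.fromisoformat(row["date"]).weekday()]
        cells[(row["keyword"], weekday)] += row["impressions"]
    return dict(cells)


# Invented rows for one keyword in two regions over two days.
rows = [
    {"keyword": "winterjacke", "region": "Bayern", "date": "2018-12-17", "impressions": 30},
    {"keyword": "winterjacke", "region": "Hessen", "date": "2018-12-17", "impressions": 20},
    {"keyword": "winterjacke", "region": "Bayern", "date": "2018-12-18", "impressions": 45},
]
cells = heatmap_cells(rows)
```

Each cell value would then drive the colour scale (Z), and filtering by keyword or day amounts to restricting the rows before the pivot is computed.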


Figure 23. BC4 weekly keyword performance cockpit

Figure 24. BC4 weekly keyword cockpit - selection of a single keyword (above) and two keywords (below)

In a similar fashion, users might want to filter by day(s) to drill down into the performance of the different keywords on a daily basis, or compare two or more different days, etc.: to this end a multiselect checkbox has been added. In Figure 25, an example where the user selected different days is shown. Of course both filters can be combined (an example screenshot is shown in Figure 26).


Figure 25. BC4 weekly keyword cockpit – filtering by days

Figure 26. BC4 weekly keyword cockpit – combining day and keyword filters

In addition to the weekly keyword cockpit, a dedicated cockpit which focuses on regional performance was provided, enabling users to focus on regional analysis of the week. The cockpit, which is shown in Figure 27, presents a heat map similar to the previous one but in this case showing the date on the X axis, regions on the Y axis and again activations as Z (shown through a colour scale). A cross-table linking regions, keywords and activations is provided (with colour coding) as well as a simple tree map visualization where regions are displayed and area is linked to the sum of activations. As in the previous cockpit, in addition to all widgets being interactive and linked through data associations, a series of checkboxes to filter the visualization and drill into certain data are provided (see Figure 28 for some examples).


Figure 27. BC4 regional analysis cockpit

Figure 28. Example screenshots of various filters applied to the regional analysis dashboard



This Pilot also adheres to the EW-Shopp dataflow methodology as shown above. The entire Enrichment process has been implemented using Grafterizer+ASIA. Regarding visualisation, the cockpits have been realised with Knowage.


As far as the Enrichment is concerned, no criticalities are to be reported. From the visualisation point of view, while this is not critical, we would like to further explore cockpits and data navigation interfaces for users, in particular the more ‘manager’ profile, e.g., with a weekly (or monthly) performance dashboard or KPIs. Regarding analytics and related visualisation, as explained above we will try to cluster keywords rather than using single keywords, and therefore the idea is to upgrade the cockpits to actually be based on keyword groups (e.g., ‘categories’). Additionally, we would like to better integrate the predictions and automate the update of the cockpits. This will require updating the dataflow so that, e.g., on a weekly basis, the predictions of the model are fed to the database used as Knowage data source and in turn a ‘last week’ and ‘next week’ cockpit is automatically updated.

Conclusions and Outlook

In this deliverable D2.3 we reported the month 24 evaluation of the EW-Shopp tools. The approach adopted (and described in Chapter 1) was to utilize an agreed logical-functional dataflow which is consistent for all EW-Shopp Pilots, although these Pilots are quite different from each other and show very specific features and characteristics such as business scenarios, datasets employed, time scopes and final target users. For each Pilot the dataflow was described and the EW-Shopp coverage and compliance was assessed in detail, providing, where needed, an analysis of current shortcomings and related corrective measures.

Overall we can conclude that the evaluation for all Pilots was successful, with the natural improvements required given the time frame and the fact that a further release of the complete toolset (formerly referred to as ‘Platform’) is foreseen in one year’s time (i.e., at month 36). To this end the EW-Shopp consortium is already at work to implement such corrective measures and improve the tools through various internal iterations, including all the required actors and actions. An updated version of this deliverable reporting the final evaluation will also be released at month 36.
