Value-driven Approach Designing Extended Data Warehouses

13
Value-driven Approach Designing Extended Data Warehouses Laboratoire d’Informatique et d’Automatique pour les Systèmes DOLAP’2019, Lisbon, Marsh 26, 2019 Nabila BERKANI & Selma KHOURI ESI Algiers, Algeria (n_berkani, s_khouri)@esi.dz Ladjel BELLATRECHE LIAS/ISAE-ENSMA Poitiers, France [email protected] Carlos ORDONEZ University of Houston USA [email protected]

Transcript of Value-driven Approach Designing Extended Data Warehouses

Value-driven Approach Designing Extended Data Warehouses

Laboratoire d’Informatique et d’Automatique pour les Systèmes

DOLAP’2019,Lisbon,Marsh26,2019

NabilaBERKANI&SelmaKHOURI

ESIAlgiers,Algeria

(n_berkani,s_khouri)@esi.dz

Ladjel BELLATRECHE

LIAS/ISAE-ENSMAPoitiers,France

[email protected]

CarlosORDONEZ

University ofHoustonUSA

[email protected]

Impact of Big Data on DW

2

2017-Present1999-2014 2015-Present 1998-2015

DaWak Conference DOLAPWorkshop

2016:Thinking

30 years of existence: Maturity

3

DataSourcesRequirements

Mappings

MultidimensionalModeling

Field

Origin

Year

Author

Discipline

Temp1

Temp2

Join

Filter

LoadtoDSA

LoadtoDSA

ExtractStore

Extract-Transform-Load

Variety

QuestofValue

Deployment

Extract

Exploitation

Store

Relational

MappingsSources/RequirementsInstanceExtractionDWSchemaDefinitionCross-phasedesign

Designers,DataPreparators,Architects Administrators,

“Deployers”

☛ ActorsoftheDesign

DataAnalysts

☛ ActorsofExploitation

1. Designlife-cyclewellidentified

2.DiversityofActors

àAugmentationofDWbyBigDataVs

Agenda

4

qValue&Variety(2Vs)

qAugmentingDWbyLinkedOpenData

q2Vs-drivenDesignApproach

qCaseStudy

qSummary

Value: # places

5

DecisionMakers

IntegrationN.W.Paton

Deployment Exploitation

Sources

Requirements

Queries,Statistics,…

Visualisation

Decision

Analysis

Valueà userfeedback:N.Konstantinou

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

CM

Value

Valueà offeredservices[D.Bork]

Valueà usageofmodernarchitectures:Teradata

Valueà newprogrammingparadigms:Spark

§ FR:Valueàmoney [A.G.Sutcliffe’2018]§ NfR:Valueà satisfactionofqualities(security,privacy,…)

àinterdependencies betweenvalue(phases)&value(operationalDW)

Valueà integrationofnewresources(LOD,…)

àRecenteffortsonbuildingvalueontologies:T.P.Sales,F.A.Baião,G.Guizzardi,J.P.A.Almeida,N.Guarino,J.Mylopoulos:TheCommonOntology

ofValueandRisk.ER2018:121-135

Value increases Variety

6

Person in Charge (PiC) of Value

VALEUR

q Examplesofrequirementsrelatedtovalue1 :• Media:Hasthecoverageofmediachangedovertime?

• Politics:SpeechesEUparliamentthatcontain« human

rights »bycountry

• Finance:EvolutionofDebatesrelatedtoGreececrisisby

countryq Measurementofthevaluedepends onthestudied

domain

➡ InteractionbetweendesignersandPiC ofvalue:multidisciplinary inDW

à Usage of Linked Open Data :• Traditional Management of Variety

+• Variety of Formalisms (graphs)

VARIETY

InternalSources ExternalSources+

Designer

LibrariesDB Newspapers

1http://www.talkofeurope.eu/data

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

Augmenting DW by 2Vs

7

DataSourcesRequirements

Mappings

MultidimensionalModeling

Field

Origin

Year

Author

Discipline

Temp1

Temp2

Join

Filter

LoadtoDSA

LoadtoDSA

ExtractStore

ETL(Variety)

Deployment(Variety)

Extract

Exploitation(Value)

MappingsSources/RequirementsInstanceExtractionDWSchemaDefinitionCross-phasedesign

Designers,DataPreparator,ArchitectsAdministrators,

“Deployer”

☛ ActorsoftheDesign

DataAnalysts,“PiC ofvalue”

☛ ActorsofExploitation2.DiversityofActors

High Variety of Sources || Global Processing

Storen

Relational

Store1

Graph

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

+

Formalisationq Inputs:1. Setofinternalsources:SInt ={SI1,SI2,…,SIm}2. Setofexternalresources:SExt =={SE1,SE2,…,SEn}3. Eachsource(internal/external)Si has:

§ Itsownphysicalformat(Fi)§ ItsconceptualmodelCMi

§ IsrelatedtoadisciplineD(medicine,engineering,etc.)

4. Setofrequirementstobesatisfied5. [Optional]:AnoperationalDW([Ravat etal.2017]),where:

§ ItsconceptualmodelCMDW

§ Itsformat(s)Format(SDW)={f1,f2,…,fk}à polystore storage

q Objective:§ DefinitionofallphasesofDWaugmentingitsvalue

q Challenges:§ MetricsofValue

Value(DW)=Operator(1≤i≤n+m)[Weight(Si ,D)*Value(Si)];Si ∈ Sint∪ Sext[Ballouetal.]*

*Ballou,D.P.,&Tayi,G.K.(1999).Enhancing dataquality indatawarehouse environments.Com oftheACM, 42(1),73-78

8

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

# Scenarios

(b)ParallelDesign (c)Query-drivenDesign

ETL

Users

Requirements

LOD

Synchronise

InternalSources

ETLLOD

ETL

LOD

Users

InternalSources

(a)SerialDesign

Requirements

DW

ETL

Requirements

LOD

InternalSources

Users

On-demandLODgraphs

materialized

GraphQuery

MDQuery

PoolofResults

QueryResults

visualized

Merge

Materialize

DW

LODCubeInternalCube

ETL-LOD

OLAPQueries

DataCube

ED

ExtractTemp1

Temp2

Join

Filter

LoadtoDSA

LoadtoDSAExtract

Store

InternalETL ExternalETLTemp1

Temp2

Join

Filter

LoadtoDSA

LoadtoDSAExtract

Store

9

3Scenarios

☛ LOD is seen as source ☛ Two Parallel ETL ☛ On-demand ETL: data extracted from the existing DW and LOD,then potentially loaded into DWà (requirement satisfaction)

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

1. Pivotschema: genericschemavs.LODschema(graph)2. Redefinitionofoperators(overloading)3. Synchronisation ofinternalandexternaldata:3scenarios

Challenges?

Value Metrics

qThreemetricsrelatedto:

1. Requirementsatisfaction

𝑽𝒂𝒍𝒖𝒆(𝑹𝒆𝒒, 𝑺𝒊) =𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒓𝒆𝒔𝒑𝒐𝒏𝒔𝒆𝒔𝒐𝒇𝒓𝒆𝒒𝒖𝒊𝒓𝒆𝒎𝒆𝒏𝒕𝒐𝒏𝑺𝒊𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒓𝒆𝒔𝒑𝒐𝒏𝒔𝒆𝒔𝒐𝒇𝒂𝒍𝒍𝒓𝒆𝒒𝒖𝒊𝒓𝒆𝒎𝒆𝒏𝒕𝒔

2. Conceptualmodelling(multidimensionalconcepts)

𝑽𝒂𝒍𝒖𝒆 𝑪𝒐𝒏𝒄𝒆𝒑𝒕𝒔, 𝑺𝒊 =𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒄𝒐𝒏𝒄𝒆𝒑𝒕𝒔𝒐𝒇𝑫𝑾𝒔𝒄𝒉𝒆𝒎𝒂𝒃𝒚𝒊𝒏𝒕𝒆𝒈𝒓𝒂𝒕𝒊𝒏𝒈𝑺𝒊

𝒕𝒐𝒕𝒂𝒍𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒕𝒉𝒆𝑫𝑾𝒄𝒐𝒏𝒄𝒆𝒑𝒕𝒔

3. TargetDWpopulation

𝑽𝒂𝒍𝒖𝒆(𝑰𝒏𝒔𝒕𝒂𝒏𝒄𝒆𝒔, 𝑺𝒊) =𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒊𝒏𝒔𝒕𝒂𝒏𝒄𝒆𝒔𝒐𝒇𝑫𝑾𝒃𝒚𝒊𝒏𝒕𝒆𝒈𝒓𝒂𝒕𝒊𝒏𝒈𝑺𝒊

𝒕𝒐𝒕𝒂𝒍𝒏𝒖𝒎𝒃𝒆𝒓𝒐𝒇𝒊𝒏𝒔𝒕𝒂𝒏𝒄𝒆𝒔𝒐𝒇𝒕𝒉𝒆𝑫𝑾

Value(DW) =Operator(1≤i≤n+m)[weight(Si,D)*Value(Si)],whereSi ∈ Sint∪ SExt

10

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

Case Study

11

§ 4internalsourcesgeneratedfromLUBMbenchmark§ 15initialrequirements

☛ UniversityResearchAnalysis

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

q Analysis:§ 6requirementsarenotsatisfiedbyinternalsources(Oracle12crelease1)

àExternalsource:Dbpedia

Experiments

12

MetricsSources

Dimensions/Measures

Value(S*)MD Value(S*)Req. Value(S*)Instances Instances Responsetime

Internal Sources 6/1 31% 6% 10% 550K 1.1Serial Design 10/7 71% 80% 94% 7,7x106 3.2ParallelDesign 11/8 73% 84% 85% 3,1x106 2.6Query-drivendesign 12/8 74% 96% 84% 2,9x106 1.7

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

*Allsourceshavethesameweight*Operator:Avg

AugmentedSchema

Summary✔2Vs for the DW renaissance✔Value = pool of multidisciplinary expertise✔DW life cycle design revisited (new formalization)✔3 augmented scenarios☛Veracity & 2V☛More automation (query rewriting)☛Value Query Language (Thank Patrick)

13

§ Variety & Value§ LOD & DW§ 2Vs Design Approach§ Case Study§ Summary

Special issue on: Business Intelligence and Analytics for Value Creation inthe Era of Big Data and Linked Open Data: International Journal ofInformation Management, Elsevier (Q1; IF=4.810)