New Frontiers in Business Intelligence: Distribution and Personalization

Matteo Golfarelli, University of Bologna, Italy

2

Summary

• The challenges of BI 2.0

• Motivating scenario and envisioned architecture: Business Intelligence Networks

• Distribution
  – Research issues
  – A mapping language
  – Query reformulation

• Personalization → Patrick Marcel

3

From BI 1.0 to BI 2.0

• Business intelligence (BI) has transformed the role of computer science in companies from a technology for storing data into a discipline for detecting key business factors in a timely manner and effectively solving strategic decision problems

• In today's changeable and unpredictable market scenarios, the needs of decision makers evolve rapidly

• To meet these new, more sophisticated user needs, a new generation of BI systems (BI 2.0) is emerging

4

Issues in BI 1.0

• Performance optimization (query plans, materialized views, indexing, etc.)

• Logical design

• Conceptual design methodologies and formalisms

• ETL modeling and automation

• Testing the DW

• ...

5

Issues in BI 2.0

• BI as a service

• On-demand BI

• Real-time BI

• Situational BI

• Collaborative BI

• Pervasive BI

• ...

6

Motivating scenario

• Companies today see cooperation as one of the major means of increasing flexibility and innovating, so as to survive in an uncertain and changing market

• Companies need strategic information about the outside world, for instance about trading partners and related business areas

• It is estimated that more than 80% of waste in inter-company and supply-chain processes is due to a lack of communication between the companies involved

7

Motivating scenario

• In such a distributed business scenario, where multiple partner companies or organizations cooperate towards a common goal, traditional BI systems are no longer sufficient to maximize the effectiveness of decision-making processes

• Two significant new requirements arise:
  – Cross-organization monitoring and decision making: accessing local information is no longer enough; users need to transparently and uniformly access information scattered across several heterogeneous BI platforms
  – Pervasive and personalized access to information: users require that information be easily and promptly accessible through devices with different computation and visualization capabilities, with sophisticated and customizable presentations

8

Envisioned architecture

• Business Intelligence Network (BIN): a dynamic, collaborative network of peers, each hosting a local, autonomous BI platform

1. Each peer relies on a local multidimensional schema that represents the peer's view of the business, and offers monitoring and decision support functionalities to the other peers

2. Users transparently access business information distributed over the network in a pervasive and personalized fashion

3. Access is secure, depending on the access control and privacy policies adopted by each peer

4. Participants are collaborative, though to different degrees

5. Inclination to collaboration does not reduce the autonomy of participants, who are not subject to a shared schema

6. A BIN is decentralized and scalable, so as to cope with changes in the number of participants, the complexity of business models, and the workload

9

Envisioned architecture

[Figure: peer i in a Business Intelligence Network with peers 1 to N. Each peer comprises a local BI platform, a local MD schema, mappings, and modules for query reformulation, query forwarding, local query processing, query result reconciliation, and access policies resolution]

10


[Figure: BIN peer architecture]

Local query processing: interacts with the peer’s BI platform to obtain results from the local data

11


[Figure: BIN peer architecture]

Query reformulation: uses the semantic mappings established towards the peer's neighbors to reformulate queries accordingly

12


[Figure: BIN peer architecture]

Query forwarding: applies query routing policies to select the most relevant peers to which a query is forwarded
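A routing policy could, for instance, score each neighbor by how many of the query's attributes are covered by the mappings towards it, and forward the query only to the k best-scoring neighbors. The minimal Python sketch below assumes exactly that; the coverage score, the `route` function, and the sample peers are hypothetical and are not a policy defined in these slides.

```python
from typing import Dict, List, Set


def route(query_attrs: Set[str],
          neighbor_mappings: Dict[str, Set[str]],
          k: int) -> List[str]:
    """Return up to k neighbors, best first (hypothetical coverage-based policy).

    A neighbor's score is the fraction of the query's attributes for which a
    mapping towards that neighbor exists; neighbors with zero coverage are
    never selected.
    """
    if not query_attrs:
        return []
    scored = []
    for peer, mapped in neighbor_mappings.items():
        coverage = len(query_attrs & mapped) / len(query_attrs)
        if coverage > 0:
            scored.append((coverage, peer))
    # Highest coverage first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [peer for _, peer in scored[:k]]


if __name__ == "__main__":
    # Hypothetical attributes mapped from Rome towards three neighbors.
    mappings = {
        "Florence": {"region", "year", "cost"},
        "Naples": {"year"},
        "Milan": set(),
    }
    print(route({"region", "year", "cost"}, mappings, k=2))  # ['Florence', 'Naples']
```

Coverage is only one possible signal; a richer policy might also weigh mapping granularity or the cost of reaching a peer.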

13


[Figure: BIN peer architecture]

Query result reconciliation: collects and integrates the results coming from the peers

14


[Figure: BIN peer architecture]

Access policies resolution: sets policies for data sharing depending on the degree of trust between participants

15

A typical user interaction

[Figure: a network of peers located in Milan, Bologna, Florence, Naples, and Rome, with their DBs and DWs]

A user formulates an OLAP query q by accessing the local multidimensional schema of her peer

16



She can annotate q with a preference that enables her to rank the returned information according to her specific interests

17



To enhance the decision making process, q is forwarded to the network and reformulated on the other peers in terms of their own multidimensional schemata

18



Each involved peer locally processes the reformulated query and returns its (possibly partial or approximate) results to the querying peer

19



The results are integrated, ranked according to the preference expressed by the user, and returned to the user in the lexicon used to formulate q
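The five steps of this interaction can be mimicked end to end in a few lines. This is a minimal Python sketch under strong simplifying assumptions: mappings are plain attribute renamings (the actual mapping language also handles granularity and expressions), peers hold disjoint facts whose group-by values are shared, and the user's preference is simply "higher totals first". `Peer`, `reformulate`, `process_locally`, `answer`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Query = Tuple[str, str]  # (group-by attribute, measure) in the querying peer's lexicon


@dataclass
class Peer:
    name: str
    rows: List[Dict[str, object]]  # hypothetical local fact rows
    mapping: Dict[str, str]        # querying-peer attribute -> local attribute


def reformulate(query: Query, peer: Peer) -> Optional[Query]:
    """Rewrite the query in the peer's lexicon; None if some attribute is unmapped."""
    group_by, measure = query
    if group_by in peer.mapping and measure in peer.mapping:
        return peer.mapping[group_by], peer.mapping[measure]
    return None


def process_locally(local_query: Query, peer: Peer) -> Dict[object, float]:
    """SUM the measure per group-by value over the peer's own rows."""
    group_by, measure = local_query
    totals: Dict[object, float] = defaultdict(float)
    for row in peer.rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


def answer(query: Query, peers: List[Peer], top_n: int) -> List[Tuple[object, float]]:
    integrated: Dict[object, float] = defaultdict(float)
    for peer in peers:  # forwarding: every peer here, no routing policy
        local_query = reformulate(query, peer)
        if local_query is None:
            continue  # this peer cannot contribute
        for group, value in process_locally(local_query, peer).items():
            integrated[group] += value  # integration: assumes disjoint facts
    # Ranking by the (hypothetical) preference "higher totals first".
    ranked = sorted(integrated.items(), key=lambda gv: (-gv[1], str(gv[0])))
    return ranked[:top_n]


if __name__ == "__main__":
    rome = Peer("Rome", [{"year": 2010, "cost": 100.0}, {"year": 2011, "cost": 50.0}],
                {"year": "year", "cost": "cost"})
    florence = Peer("Florence", [{"year": 2010, "totalCost": 30.0},
                                 {"year": 2011, "totalCost": 90.0}],
                    {"year": "year", "cost": "totalCost"})
    milan = Peer("Milan", [{"year": 2010, "beds": 4.0}], {"year": "year"})
    print(answer(("year", "cost"), [rome, florence, milan], top_n=2))
    # [(2011, 140.0), (2010, 130.0)]
```

Milan is skipped because no mapping exists for the measure, which mirrors the fact that peers may return only partial results.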

20

Research issues

• Query reformulation across peers is challenging due to the presence of aggregation and to the possibility that information is represented at different granularities in each peer

• To optimize query answering across the network, query routing strategies that forward queries only to the most promising peers are needed

• The strategic nature of the exchanged information and its multidimensional structure require advanced security approaches

• Mechanisms for controlling data provenance and quality should be devised, so as to provide users with information they can rely on

• A unified, integrated view of the heterogeneous information collected must be returned to users through object fusion techniques

21

Query reformulation

• Mapping language:
  – Handling the asymmetry between dimensions and measures
  – Specifying the relationship between two attributes of different multidimensional schemata in terms of their granularity
  – Considering aggregation operators to avoid the risk of inconsistent query reformulations
  – Expressing mappings also at the instance level to transcode data

(Golfarelli et al., 2010)

22

Query reformulation

• Mapping language:

[Figure: conceptual schemata of the HOSPITALIZATION fact @Rome (measures cost, durationOfStay; dimensions organ, disease, date, ward, patient) and of the ADMISSIONS fact @Florence (measures totStayCost, totExamCost, totLength, numAdmissions; dimensions diagnosis, date, ward, patientCity, patientBirthYear, patientGender)]

23

Query reformulation

• Mapping language: mappings can be annotated with encoding functions

[Figure: the HOSPITALIZATION schema @Rome and the ADMISSIONS schema @Florence, connected by semantic mappings of type same, equi-level, roll-up, and drill-down]
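One way to picture the mapping language is as typed links between attributes of two peers, optionally carrying an encoding function for instance-level transcoding. The Python sketch below is illustrative only: `Kind`, `Mapping`, and `reformulate_group_by` are hypothetical names, and the comments on each mapping kind are this sketch's reading, not definitions from the slides. It assumes that a roll-up mapping from A to B means B is finer than A (as in the region/patientCity rewriting example), so grouping on B and then rolling up is exact for SUM, whereas a drill-down mapping (B coarser) can only give approximate results. Apart from the region/patientCity roll-up and the cost "same" mapping, the sample mappings and gender codes are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Kind(Enum):
    SAME = "same"              # measures with the same meaning (possibly via an expression)
    EQUI_LEVEL = "equi-level"  # assumed: attributes at the same granularity
    ROLL_UP = "roll-up"        # assumed: target is finer than source
    DRILL_DOWN = "drill-down"  # assumed: target is coarser than source


@dataclass(frozen=True)
class Mapping:
    source: str  # attribute/measure at the querying peer
    target: str  # attribute/measure (or expression) at the neighbor
    kind: Kind
    encoding: Optional[Callable[[str], str]] = None  # optional instance-level transcoding


def reformulate_group_by(attr: str, mappings: List[Mapping]) -> Optional[Tuple[str, bool]]:
    """Choose a target attribute for a group-by on `attr`.

    Returns (target, exact) or None if no dimension mapping exists.
    Equi-level is preferred over roll-up, and both over drill-down.
    """
    order = [Kind.EQUI_LEVEL, Kind.ROLL_UP, Kind.DRILL_DOWN]
    candidates = [m for m in mappings if m.source == attr and m.kind in order]
    if not candidates:
        return None
    best = min(candidates, key=lambda m: order.index(m.kind))
    return best.target, best.kind is not Kind.DRILL_DOWN


# Illustrative Rome -> Florence mappings (partly hypothetical, see lead-in).
ROME_TO_FLORENCE = [
    Mapping("region", "patientCity", Kind.ROLL_UP),
    Mapping("year", "year", Kind.EQUI_LEVEL),
    Mapping("gender", "patientGender", Kind.EQUI_LEVEL,
            encoding=lambda g: {"M": "male", "F": "female"}[g]),  # invented codes
    Mapping("cost", "totStayCost + totExamCost", Kind.SAME),
]

if __name__ == "__main__":
    print(reformulate_group_by("region", ROME_TO_FLORENCE))   # ('patientCity', True)
    print(reformulate_group_by("segment", ROME_TO_FLORENCE))  # None
```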

24

Query reformulation

• Framework:
  – To translate semantic mappings we use a logical formalism called source-to-target tuple-generating dependencies (s-t tgds) (ten Cate & Kolaitis, 2010)

[Figure: reformulation framework. The MD schemata at peer i and peer j undergo schema translation; the semantic mappings between them undergo mapping translation into s-t tgds; an OLAP query undergoes query translation into a relational query]

25

Example: Schema translation

[Figure: the ADMISSIONS schema @Florence and the HOSPITALIZATION schema @Rome]

Relational schema @Rome:
HospFT(organ, disease, date, ward, patient, cost, durationOfStay)
OrganDT(organ)
DiseaseDT(disease)
DateDT(date, week, month, year)
WardDT(ward, LHD)
PatientDT(patient, birthDate, city, region, segment, gender)

Relational schema @Florence:
AdmFT(diagnosis, date, ward, patientCity, patientBirthYear, patientGender, totStayCost, totExamCost, totLength, numAdmissions)
DiagnosisDT(diagnosis, category)
DateDT(date, month, year)
WardDT(ward, unit)
PatientCityDT(patientCity, patientNation)
PatientBirthYearDT(patientBirthYear)
PatientGenderDT(patientGender)
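The star schemata produced by schema translation can be instantiated directly as relational tables. A minimal sketch using Python's standard `sqlite3` module; column types and key declarations are assumptions, since the slides list only relation and attribute names. Each peer gets its own in-memory database because both peers define relations named DateDT and WardDT. `open_peer`, `table_names`, and `column_names` are hypothetical helpers.

```python
import sqlite3

# Column types and keys are assumptions; the slides list only attribute names.
ROME_DDL = """
CREATE TABLE OrganDT   (organ TEXT PRIMARY KEY);
CREATE TABLE DiseaseDT (disease TEXT PRIMARY KEY);
CREATE TABLE DateDT    (date TEXT PRIMARY KEY, week TEXT, month TEXT, year INTEGER);
CREATE TABLE WardDT    (ward TEXT PRIMARY KEY, LHD TEXT);
CREATE TABLE PatientDT (patient TEXT PRIMARY KEY, birthDate TEXT, city TEXT,
                        region TEXT, segment TEXT, gender TEXT);
CREATE TABLE HospFT    (organ TEXT REFERENCES OrganDT, disease TEXT REFERENCES DiseaseDT,
                        date TEXT REFERENCES DateDT, ward TEXT REFERENCES WardDT,
                        patient TEXT REFERENCES PatientDT,
                        cost REAL, durationOfStay REAL);
"""

FLORENCE_DDL = """
CREATE TABLE DiagnosisDT        (diagnosis TEXT PRIMARY KEY, category TEXT);
CREATE TABLE DateDT             (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE WardDT             (ward TEXT PRIMARY KEY, unit TEXT);
CREATE TABLE PatientCityDT      (patientCity TEXT PRIMARY KEY, patientNation TEXT);
CREATE TABLE PatientBirthYearDT (patientBirthYear INTEGER PRIMARY KEY);
CREATE TABLE PatientGenderDT    (patientGender TEXT PRIMARY KEY);
CREATE TABLE AdmFT (diagnosis TEXT REFERENCES DiagnosisDT, date TEXT REFERENCES DateDT,
                    ward TEXT REFERENCES WardDT, patientCity TEXT REFERENCES PatientCityDT,
                    patientBirthYear INTEGER REFERENCES PatientBirthYearDT,
                    patientGender TEXT REFERENCES PatientGenderDT,
                    totStayCost REAL, totExamCost REAL, totLength REAL, numAdmissions INTEGER);
"""


def open_peer(ddl: str) -> sqlite3.Connection:
    """Create one in-memory database per peer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)


def column_names(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    rome, florence = open_peer(ROME_DDL), open_peer(FLORENCE_DDL)
    print(table_names(rome))
    print(column_names(florence, "AdmFT"))
```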

26

Example: Query translation

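• Total hospitalization costs by region and year

$\pi_{\mathit{region},\,\mathit{year},\,\mathrm{SUM}(\mathit{cost})}(\mathit{HospFT} \bowtie \mathit{DateDT} \bowtie \mathit{PatientDT})$

$q(R, Y, \mathrm{SUM}(C)) \leftarrow \mathit{HospFT}(\_, \_, D, \_, P, C, \_),\ \mathit{DateDT}(D, \_, \_, Y),\ \mathit{PatientDT}(P, \_, \_, R, \_, \_)$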

[Figure: the HOSPITALIZATION schema @Rome]
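In SQL terms, the generalized projection with SUM becomes a GROUP BY, and the variables D and P shared between atoms of the rule become join conditions. A minimal sketch with Python's `sqlite3`, run against a tiny Rome database; the rows are invented and only the relations the query touches are created. `QUERY_SQL` and `sample_rome_db` are hypothetical names.

```python
import sqlite3

# GROUP BY realizes the generalized projection; the joins mirror variables D and P.
QUERY_SQL = """
SELECT p.region, d.year, SUM(h.cost)
FROM HospFT AS h
JOIN DateDT AS d ON h.date = d.date          -- shared variable D
JOIN PatientDT AS p ON h.patient = p.patient -- shared variable P
GROUP BY p.region, d.year
ORDER BY p.region, d.year
"""


def sample_rome_db() -> sqlite3.Connection:
    """Minimal Rome tables with invented rows (only the relations the query uses)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DateDT (date TEXT PRIMARY KEY, week TEXT, month TEXT, year INTEGER);
        CREATE TABLE PatientDT (patient TEXT PRIMARY KEY, birthDate TEXT, city TEXT,
                                region TEXT, segment TEXT, gender TEXT);
        CREATE TABLE HospFT (organ TEXT, disease TEXT, date TEXT, ward TEXT,
                             patient TEXT, cost REAL, durationOfStay REAL);
    """)
    conn.executemany("INSERT INTO DateDT VALUES (?, ?, ?, ?)", [
        ("2010-03-01", "2010-W09", "2010-03", 2010),
        ("2011-05-02", "2011-W18", "2011-05", 2011),
    ])
    conn.executemany("INSERT INTO PatientDT VALUES (?, ?, ?, ?, ?, ?)", [
        ("p1", "1970-01-01", "Rome", "Lazio", "adult", "F"),
        ("p2", "1950-06-15", "Latina", "Lazio", "senior", "M"),
    ])
    conn.executemany("INSERT INTO HospFT VALUES (?, ?, ?, ?, ?, ?, ?)", [
        ("heart", "d1", "2010-03-01", "w1", "p1", 1200.0, 5),
        ("lung", "d2", "2010-03-01", "w1", "p2", 800.0, 3),
        ("heart", "d1", "2011-05-02", "w2", "p1", 500.0, 2),
    ])
    return conn


if __name__ == "__main__":
    print(sample_rome_db().execute(QUERY_SQL).fetchall())
    # [('Lazio', 2010, 2000.0), ('Lazio', 2011, 500.0)]
```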

27

Example: Mapping translation

$\forall S, E, C\ \big(\mathit{AdmFT}(\_, \dots, S, E, \_, \_) \wedge C = S + E \rightarrow \mathit{HospFT}(\_, \dots, C, \_)\big)$

[Figure: the HOSPITALIZATION schema @Rome and the ADMISSIONS schema @Florence, linked by a same mapping]
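Operationally, this tgd can be read as a rule that, for every AdmFT fact, asserts a HospFT fact whose cost is S + E. The Python sketch below applies that single rule to in-memory tuples, i.e. one chase step. It is a simplification: positions the rule leaves unspecified are filled with `None` as a stand-in for labeled nulls, whereas in the full framework other mappings would constrain them. The tuple types reuse the relation names from the schema translation; `apply_cost_tgd` and the sample fact are hypothetical.

```python
from typing import List, NamedTuple, Optional


class AdmFT(NamedTuple):  # Florence fact table, attribute order as in the schema translation
    diagnosis: str
    date: str
    ward: str
    patientCity: str
    patientBirthYear: int
    patientGender: str
    totStayCost: float  # S in the tgd
    totExamCost: float  # E in the tgd
    totLength: float
    numAdmissions: int


class HospFT(NamedTuple):  # Rome fact table
    organ: Optional[str]
    disease: Optional[str]
    date: Optional[str]
    ward: Optional[str]
    patient: Optional[str]
    cost: float  # C in the tgd
    durationOfStay: Optional[float]


def apply_cost_tgd(source: List[AdmFT]) -> List[HospFT]:
    """For each AdmFT fact, emit a HospFT fact with cost C = S + E.

    Attributes the tgd does not determine are set to None, a simplified
    stand-in for the labeled nulls a chase would introduce.
    """
    return [
        HospFT(None, None, None, None, None,
               fact.totStayCost + fact.totExamCost, None)
        for fact in source
    ]


if __name__ == "__main__":
    facts = [AdmFT("flu", "2010-03-01", "w1", "Prato", 1970, "F", 1000.0, 250.0, 4.0, 1)]
    print(apply_cost_tgd(facts)[0].cost)  # 1250.0
```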

28

Example: The rewriting

• The group-by is reformulated using the roll-up mapping from region to patientCity, while measure cost is derived using the "same" mapping

$\pi_{\mathit{year},\,\mathit{patientCity},\,\mathrm{SUM}(\mathit{totStayCost}+\mathit{totExamCost})}(\mathit{AdmFT} \bowtie \mathit{DateDT} \bowtie \mathit{PatientCityDT})$

[Figure: the ADMISSIONS schema @Florence]
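To see why grouping by patientCity can still answer a query on region, the sketch below runs the rewritten query on invented Florence data and then rolls the per-city totals up to region. The city-to-region table `CITY_TO_REGION` is a hypothetical instance-level mapping, and performing the final roll-up at the querying peer is an assumption of this sketch. Summing per-city totals is exact here because SUM is distributive. `REWRITTEN_SQL`, `sample_florence_db`, and `roll_up_to_region` are hypothetical names.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# SQL form of the rewritten query on the Florence peer.
REWRITTEN_SQL = """
SELECT d.year, a.patientCity, SUM(a.totStayCost + a.totExamCost)
FROM AdmFT AS a
JOIN DateDT AS d ON a.date = d.date
JOIN PatientCityDT AS c ON a.patientCity = c.patientCity
GROUP BY d.year, a.patientCity
"""

# Hypothetical instance-level roll-up from Florence cities to Rome's regions.
CITY_TO_REGION = {"Florence": "Tuscany", "Prato": "Tuscany"}


def sample_florence_db() -> sqlite3.Connection:
    """Minimal Florence tables with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DateDT (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
        CREATE TABLE PatientCityDT (patientCity TEXT PRIMARY KEY, patientNation TEXT);
        CREATE TABLE AdmFT (diagnosis TEXT, date TEXT, ward TEXT, patientCity TEXT,
                            patientBirthYear INTEGER, patientGender TEXT,
                            totStayCost REAL, totExamCost REAL, totLength REAL,
                            numAdmissions INTEGER);
    """)
    conn.execute("INSERT INTO DateDT VALUES ('2010-03-01', '2010-03', 2010)")
    conn.executemany("INSERT INTO PatientCityDT VALUES (?, ?)",
                     [("Florence", "Italy"), ("Prato", "Italy")])
    conn.executemany("INSERT INTO AdmFT VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", [
        ("flu", "2010-03-01", "w1", "Florence", 1970, "F", 1000.0, 250.0, 4.0, 1),
        ("flu", "2010-03-01", "w1", "Prato", 1960, "M", 400.0, 100.0, 2.0, 1),
    ])
    return conn


def roll_up_to_region(rows: List[Tuple[int, str, float]],
                      city_to_region: Dict[str, str]) -> List[Tuple[Tuple[str, int], float]]:
    """Sum per-(year, city) totals into per-(region, year) totals."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for year, city, cost in rows:
        totals[(city_to_region[city], year)] += cost
    return sorted(totals.items())


if __name__ == "__main__":
    per_city = sample_florence_db().execute(REWRITTEN_SQL).fetchall()
    print(roll_up_to_region(per_city, CITY_TO_REGION))
    # [(('Tuscany', 2010), 1750.0)]
```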

29

Personalization

• The goal of personalization is to deliver information that is relevant to an individual or a group of individuals in the most appropriate format and layout
  – Recommendation: the system suggests new queries to support users in navigating the cube (Giacometti et al., 2009)
  – Personalized visualization: the user specifies constraints that are used to determine a preferred visualization according to a user profile (Bellatreche et al., 2005)
  – Ranking: query results are organized in a total or partial order so that the user sees only the “most relevant” tuples (Golfarelli et al., 2011)
  – Contextualization: the query is enhanced by adding predicates that depend on the context (Jerbi et al., 2008)
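As an illustration of ranking by a partial order, the sketch below composes two base preferences with Pareto dominance and returns the results in layers, the first layer being the non-dominated ("most relevant") tuples. Pareto composition is a generic technique chosen here for illustration, not the myOLAP algorithm or syntax; the two preferences (higher cost first, more recent year first), the function names, and the rows are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Score = Callable[[Row], float]  # a base preference: higher score is better


def dominates(a: Row, b: Row, prefs: Sequence[Score]) -> bool:
    """a dominates b if it is at least as good on every preference and better on one."""
    pairs = [(p(a), p(b)) for p in prefs]
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_layers(rows: List[Row], prefs: Sequence[Score]) -> List[List[Row]]:
    """Split rows into layers of the Pareto partial order, best layer first."""
    remaining = list(rows)
    layers: List[List[Row]] = []
    while remaining:
        best = [r for r in remaining
                if not any(dominates(o, r, prefs) for o in remaining)]
        layers.append(best)
        chosen = {id(r) for r in best}
        remaining = [r for r in remaining if id(r) not in chosen]
    return layers


if __name__ == "__main__":
    rows = [
        {"region": "Lazio", "year": 2011, "cost": 500.0},
        {"region": "Lazio", "year": 2010, "cost": 2000.0},
        {"region": "Tuscany", "year": 2010, "cost": 1750.0},
        {"region": "Tuscany", "year": 2011, "cost": 300.0},
    ]
    prefs = [lambda r: r["cost"], lambda r: r["year"]]
    for layer in pareto_layers(rows, prefs):
        print([(r["region"], r["year"]) for r in layer])
    # [('Lazio', 2011), ('Lazio', 2010)]
    # [('Tuscany', 2010), ('Tuscany', 2011)]
```

Because dominance is a strict partial order, every non-empty remainder has at least one non-dominated tuple, so the loop always terminates; tuples that are incomparable end up in the same layer.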

30

Thank you for your attention

Questions?

31

Related readings

• Abiteboul, S. Managing an XML warehouse in a P2P context. In Proc. CAiSE, 2003

• Banek, M., Vrdoljak, V., Min Tjoa, A., & Skocir, Z. Automated integration of heterogeneous data warehouse schemata. IJDWM, 4(4), 2008

• Bellatreche, L., Giacometti, A., Marcel, P., Mouloudi, H., & Laurent, D. A personalization framework for OLAP queries. In Proc. DOLAP, 2005

• Chomicki, J. Preference formulas in relational queries. ACM TODS, 28(4), 2003

• Cui, Y., & Widom, J. Lineage tracing for general data warehouse transformations. VLDB Journal, 12(1), 2003

• da Silva, P. P., McGuinness, D. L., & McCool, R. Knowledge provenance infrastructure. IEEE Data Engineering Bulletin, 26(4), 2003

• Dubois, D., & Prade, H. On the use of aggregation operations in information fusion processes. Fuzzy Sets and Systems, 142(1), 2004

• Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., & Spyratos, N. Efficient rewriting algorithms for preference queries. In Proc. ICDE, 2008

• Giacometti, A., Marcel, P., & Negre, E. Recommending MDX queries. In Proc. DaWaK, 2009

• Golfarelli, M., Rizzi, S., & Biondi, P. myOLAP: An approach to express and evaluate OLAP preferences. To appear in IEEE TKDE, 2011

• Golfarelli, M., Mandreoli, F., Penzo, W., Rizzi, S., & Turricchia, E. Towards OLAP query reformulation in peer-to-peer data warehousing. In Proc. DOLAP, 2010

• Halevy, A. Y., Ives, Z. G., Madhavan, J., Mork, P., Suciu, D., & Tatarinov, I. The Piazza peer data management system. IEEE TKDE, 16(7), 2004

• Hoang, T. A. D., & Binh Nguyen, T. State of the art and emerging rule-driven perspectives towards service-based business process interoperability. In Proc. Int. Conf. on Computing and Communication Technologies, 2009

32

Related readings

• Jerbi, H., Ravat, F., Teste, O., & Zurfluh, G. Management of context-aware preferences in multidimensional databases. In Proc. ICDIM, 2008

• Kalnis, P., Ng, W. S., Ooi, B. C., Papadias, D., & Tan, K.-L. An adaptive peer-to-peer network for distributed caching of OLAP results. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2002

• Kehlenbeck, M., & Breitner, M. H. Ontology-based exchange and immediate application of business calculation definitions for online analytical processing. In Proc. DaWaK, 2009

• Kießling, W. Foundations of preferences in database systems. In Proc. VLDB, 2002

• Mandreoli, F., Martoglia, R., Penzo, W., & Sassatelli, S. SRI: Exploiting semantic information for effective query routing in a PDMS. In Proc. ACM Int. Workshop on Web Information and Data Management, 2006

• Mecca, G., Papotti, P., & Raunich, S. Core schema mappings. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2009

• Stefanidis, K., Pitoura, E., & Vassiliadis, P. Modeling and storing context-aware preferences. In Proc. ADBIS, 2006

• Sung, S., Liu, Y., Xiong, H., & Ng, P. Privacy preservation for data cubes. Knowledge and Information Systems, 9(1), 2006

• Tatarinov, I., & Halevy, A. Y. Efficient query reformulation in peer-data management systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2004

• ten Cate, B., & Kolaitis, P. G. Structural characterizations of schema-mapping languages. Comm. ACM, 53(1), 2010

• Torlone, R. Two approaches to the integration of heterogeneous data warehouses. Distributed and Parallel Databases, 23(1), 2008

• Xin, D., Han, J., Cheng, H., & Li, X. Answering top-k queries with multidimensional selections: The ranking cube approach. In Proc. VLDB, 2006
