
  • New Frontiers in Business Intelligence: Distribution and Personalization

    Matteo Golfarelli, University of Bologna - Italy

  • Summary

    - The challenges of BI 2.0
    - Motivating scenario and envisioned architecture: Business Intelligence Networks
    - Distribution
      - Research issues
      - A mapping language
      - Query reformulation
    - Personalization → Patrick Marcel

  • From BI 1.0 to BI 2.0

    - Business intelligence (BI) transformed the role of computer science in companies from a technology for storing data into a discipline for detecting key business factors in a timely manner and effectively solving strategic decision problems
    - In today's volatile and unpredictable market scenarios, the needs of decision makers evolve rapidly
    - To meet these new, more sophisticated user needs, a new generation of BI systems (BI 2.0) is emerging

  • Issues in BI 1.0

    - Performance optimization (query plans, materialized views, indexing, etc.)
    - Logical design
    - Conceptual design methodologies and formalisms
    - ETL modeling and automation
    - Testing the DW
    - ...

  • Issues in BI 2.0

    - BI as a service
    - On-demand BI
    - Real-time BI
    - Situational BI
    - Collaborative BI
    - Pervasive BI
    - ...

  • Motivating scenario

    - Companies today see cooperation as one of the main ways to increase flexibility and to innovate, so as to survive in an uncertain and changing market
    - Companies need strategic information about the outside world, for instance about trading partners and related business areas
    - It is estimated that more than 80% of waste in inter-company and supply-chain processes is due to a lack of communication between the companies involved

  • Motivating scenario

    - In such a distributed business scenario, where multiple partner companies or organizations cooperate towards a common goal, traditional BI systems are no longer sufficient to maximize the effectiveness of decision-making processes
    - Two significant new requirements arise:
      - Cross-organization monitoring and decision making: accessing local information is no longer enough; users need to transparently and uniformly access information scattered across several heterogeneous BI platforms
      - Pervasive and personalized access to information: users require that information can be accessed easily and promptly through devices with different computation and visualization capabilities, and with sophisticated and customizable presentations

  • Envisioned architecture

    - Business Intelligence Network (BIN): a dynamic, collaborative network of peers, each hosting a local, autonomous BI platform
      1. Each peer relies on a local multidimensional schema that represents the peer's view of the business, and it offers monitoring and decision support functionalities to the other peers
      2. Users transparently access business information distributed over the network in a pervasive and personalized fashion
      3. Access is secure, depending on the access control and privacy policies adopted by each peer
      4. Participants are collaborative, though to different degrees
      5. The inclination to collaborate does not reduce the autonomy of participants, who are not subject to a shared schema
      6. A BIN is decentralized and scalable because the number of participants, the complexity of business models, and the workload can change

  • Envisioned architecture

    [Figure: peer i within the Business Intelligence Network, alongside peer 1 and peer N. The peer comprises a local BI platform, a local MD schema, mappings, and the five components described below.]

    - Local query processing: interacts with the peer's BI platform to obtain results from the local data
    - Query reformulation: uses the semantic mappings established towards the peer's neighbors to reformulate queries accordingly
    - Query forwarding: applies query routing policies to select the most relevant peers to forward a query to
    - Query result reconciliation: collects and integrates the results coming from the peers
    - Access policies resolution: sets policies for data sharing depending on the degree of trust between participants

  • A typical user interaction

    [Figure: a network of peers in Milan, Bologna, Florence, Naples, and Rome, with data warehouses (DW) and databases (DB) attached]

    1. A user formulates an OLAP query q by accessing the local multidimensional schema of her peer
    2. She can annotate q with a preference that enables her to rank the returned information according to her specific interests
    3. To enhance the decision-making process, q is forwarded to the network and reformulated on the other peers in terms of their own multidimensional schemata
    4. Each involved peer locally processes the reformulated query and returns its (possibly partial or approximate) results to the querying peer
    5. The results are integrated, ranked according to the preference expressed by the user, and returned to the user in the lexicon used to formulate q
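
The interaction steps above can be sketched as a toy in-memory network. Everything in this sketch is an assumption made for illustration: the `Peer` class, the per-peer `lexicon` dictionary standing in for semantic mappings, the sample rows, and the choice to reconcile results by summing. Routing policies and access control are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer: local rows plus a lexicon from the querying peer's names to local names."""
    name: str
    rows: list                                   # local facts, as dicts keyed by local attribute names
    lexicon: dict = field(default_factory=dict)  # querying-peer name -> local name

    def reformulate(self, group_by, measure):
        # Query reformulation: express the query in terms of the local schema.
        return self.lexicon.get(group_by, group_by), self.lexicon.get(measure, measure)

    def process(self, group_by, measure):
        # Local query processing: SUM(measure) grouped by one attribute.
        local_group, local_measure = self.reformulate(group_by, measure)
        totals = {}
        for row in self.rows:
            key = row[local_group]
            totals[key] = totals.get(key, 0) + row[local_measure]
        return totals


def answer(query_peer, neighbors, group_by, measure, preference):
    """Forward the query, integrate the partial results, and rank them."""
    merged = {}
    for peer in [query_peer, *neighbors]:        # query forwarding (no routing policy here)
        for key, value in peer.process(group_by, measure).items():
            merged[key] = merged.get(key, 0) + value  # result reconciliation by summing
    # Ranking: order the integrated tuples by the user's preference score, best first.
    return sorted(merged.items(), key=preference, reverse=True)


bologna = Peer("Bologna", [{"region": "Emilia-Romagna", "cost": 100},
                           {"region": "Tuscany", "cost": 40}])
florence = Peer("Florence", [{"area": "Tuscany", "expense": 70}],
                lexicon={"region": "area", "cost": "expense"})

ranked = answer(bologna, [florence], "region", "cost", preference=lambda kv: kv[1])
print(ranked)
```

The user at the Bologna peer only ever names `region` and `cost`; the Florence peer translates those names before touching its own data, which mirrors the requirement that results come back in the lexicon used to formulate q.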

  • Research issues

    - Query reformulation on peers is challenging because of aggregation and because each peer may represent information at a different granularity
    - To optimize query answering across the network, query routing strategies are needed that forward queries only to the most promising peers
    - The strategic nature of the exchanged information and its multidimensional structure require advanced approaches to security
    - Mechanisms for controlling data provenance and quality should be devised, so that users receive information they can rely on
    - A unified, integrated view of the heterogeneous information collected must be returned to users through object fusion techniques

  • Query reformulation

    - Mapping language:
      - Handling the asymmetry between dimensions and measures
      - Specifying the relationship between two attributes of different multidimensional schemata in terms of their granularity
      - Considering aggregation operators to avoid the risk of inconsistent query reformulations
      - Also expressing mappings at the instance level to transcode data

    (Golfarelli et al., 2010)

  • Query reformulation

    - Mapping language:

    [Figure: two multidimensional schemata to be mapped. HOSPITALIZATION @Rome has measures cost and durationOfStay and attributes organ, disease, date, week, month, year, ward, LHD, patient, birthDate, gender, segment, city, region. ADMISSIONS @Florence has measures totStayCost, totExamCost, totLength, numAdmissions and attributes diagnosis, category, date, month, year, ward, unit, patientBirthYear, patientGender, patientCity, patientNation.]

  • Query reformulation

    - Mapping language: mappings can be annotated with encoding functions

    [Figure: the HOSPITALIZATION schema @Rome and the ADMISSIONS schema @Florence connected by semantic mappings labeled same, roll-up, drill-down, and equi-level (twice)]
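
One way to picture such a mapping language is as a small data structure. The sketch below is an assumption made for illustration, not the syntax of Golfarelli et al. (2010): the `Mapping` class, its field names, and the direction chosen for the encoding function (values at the target peer are converted into values at the source peer) are hypothetical. Only the four mapping kinds and the attribute names come from the slides, and the gender pairing is invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Kind(Enum):
    SAME = "same"
    EQUI_LEVEL = "equi-level"
    ROLL_UP = "roll-up"
    DRILL_DOWN = "drill-down"


@dataclass(frozen=True)
class Mapping:
    kind: Kind
    source: Tuple[str, ...]            # attributes or measures at the source peer
    target: Tuple[str, ...]            # attributes or measures at the target peer
    encode: Optional[Callable] = None  # optional instance-level transcoding, target -> source

    def transcode(self, *values):
        # Apply the encoding function if one is attached; otherwise pass values through.
        if self.encode is None:
            return values[0] if len(values) == 1 else values
        return self.encode(*values)


mappings = [
    # From the slides: cost @Rome corresponds to totStayCost + totExamCost @Florence.
    Mapping(Kind.SAME, ("cost",), ("totStayCost", "totExamCost"),
            encode=lambda stay, exam: stay + exam),
    # From the slides: a roll-up mapping relates region @Rome to patientCity @Florence.
    Mapping(Kind.ROLL_UP, ("region",), ("patientCity",)),
    # Hypothetical pairing and encoding, for illustration only.
    Mapping(Kind.EQUI_LEVEL, ("gender",), ("patientGender",),
            encode=lambda g: {"male": "M", "female": "F"}[g]),
]

print(mappings[0].transcode(120.0, 30.0))
print(mappings[2].transcode("female"))
```

Keeping the mapping kind separate from the encoding function reflects the split on the slides: the kind describes how the two schemata relate in granularity, while the encoding works at the instance level.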

  • Query reformulation

    - Framework: to translate semantic mappings we use a logical formalism called source-to-target tuple generating dependencies (s-t tgds) (ten Cate & Kolaitis, 2010)

    [Figure: the translation framework, with boxes labeled Semantic Mappings, Mapping translation, s-t tgds, Schema translation (twice), OLAP Query, Query translation, and Relational Query]

  • Example: Schema translation

    [Figure: the HOSPITALIZATION schema @Rome and the ADMISSIONS schema @Florence]

    @Rome:
      HospFT(organ, disease, date, ward, patient, cost, durationOfStay)
      OrganDT(organ)
      DiseaseDT(disease)
      DateDT(date, week, month, year)
      WardDT(ward, LHD)
      PatientDT(patient, birthDate, city, region, segment, gender)

    @Florence:
      AdmFT(diagnosis, date, ward, patientCity, patientBirthYear, patientGender, totStayCost, totExamCost, totLength, numAdmissions)
      DiagnosisDT(diagnosis, category)
      DateDT(date, month, year)
      WardDT(ward, unit)
      PatientCityDT(patientCity, patientNation)
      PatientBirthYearDT(patientBirthYear)
      PatientGenderDT(patientGender)
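
The relations listed on this slide follow a star-schema pattern: one fact table holding the dimension keys and the measures, plus one dimension table per hierarchy. The sketch below rebuilds the HOSPITALIZATION relations from a dictionary description of that schema. The `to_star_schema` function and the dictionary layout are assumptions made for illustration, not the authors' translation algorithm.

```python
def to_star_schema(prefix, dimensions, measures):
    """Return relation signatures: one fact table and one table per dimension.

    `dimensions` maps each dimension key to the further attributes of its hierarchy.
    """
    # Fact table: one column per dimension key, followed by the measures.
    relations = [f"{prefix}FT({','.join([*dimensions, *measures])})"]
    for key, levels in dimensions.items():
        # Dimension table: named after the key, holding the key and its hierarchy attributes.
        table = key[0].upper() + key[1:] + "DT"
        relations.append(f"{table}({','.join([key, *levels])})")
    return relations


hospitalization = to_star_schema(
    prefix="Hosp",
    dimensions={
        "organ": [],
        "disease": [],
        "date": ["week", "month", "year"],
        "ward": ["LHD"],
        "patient": ["birthDate", "city", "region", "segment", "gender"],
    },
    measures=["cost", "durationOfStay"],
)
for relation in hospitalization:
    print(relation)
```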

  • Example: Query translation

    - Total hospitalization costs by region and year:

      π_{region, year, SUM(cost)} (HospFT ⋈ DateDT ⋈ PatientDT)

      q(R, Y, SUM(C)) ← HospFT(_,_,D,_,P,C,_), DateDT(D,_,_,Y), PatientDT(P,_,_,R,_,_)

    [Figure: the HOSPITALIZATION schema @Rome]
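
To make the translated query concrete, the sketch below runs its SQL counterpart on a tiny in-memory SQLite database shaped like the HospFT, DateDT, and PatientDT relations. The sample rows are invented for illustration, and only the columns the query touches are populated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DateDT (date TEXT PRIMARY KEY, week TEXT, month TEXT, year INTEGER);
    CREATE TABLE PatientDT (patient TEXT PRIMARY KEY, birthDate TEXT, city TEXT,
                            region TEXT, segment TEXT, gender TEXT);
    CREATE TABLE HospFT (organ TEXT, disease TEXT, date TEXT, ward TEXT,
                         patient TEXT, cost REAL, durationOfStay INTEGER);
""")
conn.executemany("INSERT INTO DateDT (date, year) VALUES (?, ?)",
                 [("2009-03-01", 2009), ("2010-05-12", 2010)])
conn.executemany("INSERT INTO PatientDT (patient, region) VALUES (?, ?)",
                 [("p1", "Lazio"), ("p2", "Tuscany")])
conn.executemany("INSERT INTO HospFT (date, patient, cost) VALUES (?, ?, ?)",
                 [("2009-03-01", "p1", 100.0),
                  ("2009-03-01", "p1", 50.0),
                  ("2010-05-12", "p2", 80.0)])

# q(R, Y, SUM(C)) <- HospFT(_,_,D,_,P,C,_), DateDT(D,_,_,Y), PatientDT(P,_,_,R,_,_)
rows = conn.execute("""
    SELECT p.region, d.year, SUM(f.cost)
    FROM HospFT f
    JOIN DateDT d ON f.date = d.date
    JOIN PatientDT p ON f.patient = p.patient
    GROUP BY p.region, d.year
    ORDER BY p.region, d.year
""").fetchall()
print(rows)
```

The shared variables D and P in the rule body become the two join conditions, and the head variables R and Y become the group-by columns.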

  • Example: Mapping translation

    ∀S,E,C (AdmFT(_,...,S,E,_,_), C=S+E → HospFT(_,...,C,_))

    [Figure: the same mapping linking measure cost of HOSPITALIZATION @Rome to measures totStayCost and totExamCost of ADMISSIONS @Florence]
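
Read operationally, this dependency says that every AdmFT tuple implies a HospFT tuple whose cost is the sum of the stay cost and the exam cost. A minimal sketch of that reading follows, with tuples as dictionaries. The function name and sample values are hypothetical, and attributes that the dependency leaves unconstrained are simply set to `None`.

```python
def apply_cost_tgd(adm_tuples):
    """For each AdmFT tuple, emit the HospFT fragment required by the 'same' mapping on cost."""
    hosp_tuples = []
    for adm in adm_tuples:
        hosp_tuples.append({
            "cost": adm["totStayCost"] + adm["totExamCost"],  # C = S + E
            "durationOfStay": None,                           # unconstrained by this dependency
        })
    return hosp_tuples


adm_ft = [
    {"totStayCost": 900.0, "totExamCost": 150.0, "totLength": 6, "numAdmissions": 2},
    {"totStayCost": 400.0, "totExamCost": 0.0, "totLength": 3, "numAdmissions": 1},
]
print(apply_cost_tgd(adm_ft))
```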

  • Example: The rewriting

    - The group-by is reformulated using the roll-up mapping from region to patientCity, while the measure cost is derived using the same mapping:

      π_{year, patientCity, SUM(totStayCost+totExamCost)} (AdmFT ⋈ DateDT ⋈ PatientCityDT)

    [Figure: the ADMISSIONS schema @Florence]
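
The rewriting can be pictured as substitution driven by the mappings: each group-by attribute and measure of the original query is replaced by its counterpart at the target peer. The sketch below does this for the running example and emits SQL text. The `reformulate` function, the dictionary encoding of the mappings, and the join conditions are assumptions made for illustration, and the sketch skips the granularity and aggregation checks that make the real problem hard.

```python
def reformulate(group_by, measure, attr_map, measure_map, fact, joins):
    """Rewrite SUM(measure) GROUP BY group_by in terms of the target peer's schema."""
    target_attrs = [attr_map[a] for a in group_by]  # e.g. region -> patientCity
    target_expr = measure_map[measure]              # e.g. cost -> totStayCost + totExamCost
    join_sql = " ".join(f"JOIN {table} ON {cond}" for table, cond in joins)
    cols = ", ".join(target_attrs)
    return (f"SELECT {cols}, SUM({target_expr}) "
            f"FROM {fact} {join_sql} GROUP BY {cols}")


sql = reformulate(
    group_by=["year", "region"],
    measure="cost",
    attr_map={"year": "DateDT.year", "region": "AdmFT.patientCity"},
    measure_map={"cost": "totStayCost + totExamCost"},
    fact="AdmFT",
    joins=[("DateDT", "AdmFT.date = DateDT.date"),
           ("PatientCityDT", "AdmFT.patientCity = PatientCityDT.patientCity")],
)
print(sql)
```

Because patientCity is finer than region, the answer from this peer comes back at city granularity, which is one reason the returned results may be partial or approximate with respect to the original query.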

  • Personalization

    - The goal of personalization is to deliver information that is relevant to an individual or a group of individuals in the most appropriate format and layout
      - Recommendation: the system suggests new queries to support users in navigating the cube (Giacometti et al., 2009)
      - Personalized visualization: the user specifies constraints that are used to determine a preferred visualization according to a user profile (Bellatreche et al., 2005)
      - Ranking: query results are organized in a total or partial order so that the user sees only the "most relevant" tuples (Golfarelli et al., 2011)
      - Contextualization: the query is enhanced by adding predicates that depend on the context (Jerbi et al., 2008)

  • Thank you for your attention

    Questions?

  • Related readings

    - Abiteboul, S. Managing an XML warehouse in a P2P context. In Proc. CAISE, 2003
    - Banek, M., Vrdoljak, V., Min Tjoa, A., & Skocir, Z. Automated integration of heterogeneous data warehouse schemata. IJDWM, 4(4), 2008
    - Bellatreche, L., Giacometti, A., Marcel, P., Mouloudi, H., & Laurent, D. A personalization framework for OLAP queries. In Proc. DOLAP, 2005
    - Chomicki, J. Preference formulas in relational queries. ACM TODS, 28(4), 2003
    - Cui, Y., & Widom, J. Lineage Tracing for General Data Warehouse Transformations. JVLDB, 12(1), 2003
    - da Silva, P.P., McGuinness, D.L., & McCool, R. Knowledge Provenance Infrastructure. IEEE Data Engineering Bulletin, 26(4), 2003
    - Dubois, D., & Prade, H. On the use of aggregation operations in information fusion processes. International Journal on Fuzzy Sets and Systems, 142(1), 2004
    - Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., & Spyratos, N. Efficient rewriting algorithms for preference queries. In Proc. ICDE, 2008
    - Giacometti, A., Marcel, P., & Negre, E. Recommending MDX Queries. In Proc. DaWaK, 2009
    - Golfarelli, M., Rizzi, S., & Biondi, P. myOLAP: An approach to express and evaluate OLAP preferences. To appear in IEEE TKDE, 2011
    - Golfarelli, M., Mandreoli, F., Penzo, W., Rizzi, S., & Turricchia, E. Towards OLAP Query Reformulation in Peer-to-Peer Data Warehousing. In Proc. DOLAP, 2010
    - Halevy, A. Y., Ives, Z. G., Madhavan, J., Mork, P., Suciu, D., & Tatarinov, I. The Piazza Peer Data Management System. IEEE TKDE, 16(7), 2004
    - Hoang, T. A. D., & Binh Nguyen, T. State of the art and emerging rule-driven perspectives towards service-based business process interoperability. In Proc. Int. Conf. on Computing and Communication Technologies, 2009
    - Jerbi, H., Ravat, F., Teste, O., & Zurfluh, G. Management of context-aware preferences in multidimensional databases. In Proc. ICDIM, 2008
    - Kalnis, P., Ng, W. S., Ooi, B. C., Papadias, D., & Tan, K.-L. An adaptive peer-to-peer network for distributed caching of OLAP results. In Proc. SIGMOD Conference, 2002
    - Kehlenbeck, M., & Breitner, M. H. Ontology-based exchange and immediate application of business calculation definitions for online analytical processing. In Proc. DaWaK, 2009
    - Kießling, W. Foundations of preferences in database systems. In Proc. VLDB, 2002
    - Mandreoli, F., Martoglia, R., Penzo, W., & Sassatelli, S. SRI: exploiting semantic information for effective query routing in a PDMS. In Proc. ACM Int. Workshop on Web Information and Data Management, 2006
    - Mecca, G., Papotti, P., & Raunich, S. Core Schema Mappings. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2009
    - Stefanidis, K., Pitoura, E., & Vassiliadis, P. Modeling and Storing Context-Aware Preferences. In Proc. ADBIS, 2006
    - Sung, S., Liu, Y., Xiong, H., & Ng, P. Privacy preservation for data cubes. Knowledge and Information Systems, 9(1), 2006
    - Tatarinov, I., & Halevy, A. Y. Efficient Query Reformulation in Peer-Data Management Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2004
    - ten Cate, B., & Kolaitis, P. G. Structural characterizations of schema-mapping languages. Comm. ACM, 53(1), 2010
    - Torlone, R. Two approaches to the integration of heterogeneous data warehouses. Int. Journ. on Distributed and Parallel Databases, 23(1), 2008
    - Xin, D., Han, J., Cheng, H., & Li, X. Answering Top-k Queries with Multidimensional Selections: The Ranking Cube Approach. In Proc. VLDB, 2006