
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), T. Sellis (1,4), Y. Vassiliou (1)

(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas, yv}@dblab.ece.ntua.gr

(2) University of Ioannina, Ioannina, Hellas (Greece) [email protected]

(3) HP Labs, Palo Alto, California, USA [email protected]

(4) Institute for the Management of Information Systems (Greece) [email protected]

Rule-based Management of Schema Changes at ETL sources

MEDWa ‘09, Riga, September 2009 2

Outline

• Motivation

• Graph-based representation of ETL processes

• Regulating ETL Evolution

• Hecataeus Internals

• Conclusions


Motivation

Data Warehouse Environment


Data Warehouse Schema Evolution

Data warehouses are evolving environments, e.g.:

• A dimension is removed or renamed

• The structure of a dimension table is updated

• A fact table is completely decoupled from a dimension

• The measures of a fact table change

• An ETL source is modified, etc.

Evolving ETL sources…

• Schema changes on the sources of ETL processes. Design constructs are:
  – Added, Removed, Modified

• ETL processes are affected:
  – Syntactically, i.e., they become invalid
  – Semantically, i.e., they must conform to the new source database semantics

• Adaptation of ETL flows is:
  – a time-consuming task
  – treated in most cases manually by the administrators/developers

We would like to know...

• Which part of the process is affected, and how, if, e.g., an attribute is deleted?

• Can we predict and handle the impact of changes?

• To what extent can readjustment be automated?


Hecataeus Framework

• Mechanism for performing what-if analysis for potential changes of ETL sources

• Graph-based representation of ETL workflows

• Annotation of the graph with rules for adapting ETL processes to source schema evolution

• Evolution events are mapped to changes on the graph constructs
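One way to picture the what-if analysis is as a reachability question on a dependency graph: given a changed source construct, which downstream constructs depend on it? The sketch below is only an illustration under that assumption; the class `DependencyGraph`, its methods, and the node names `Q1.Name` and `V1.Name` are hypothetical and are not taken from Hecataeus.

```python
from collections import deque


class DependencyGraph:
    """Hypothetical dependency graph: consumers depend on providers."""

    def __init__(self):
        # provider -> set of constructs that directly depend on it
        self._consumers = {}

    def add_dependency(self, consumer, provider):
        self._consumers.setdefault(provider, set()).add(consumer)

    def affected_by(self, changed):
        """Return every construct that transitively depends on `changed`."""
        affected = set()
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            for consumer in self._consumers.get(node, ()):
                if consumer not in affected:
                    affected.add(consumer)
                    queue.append(consumer)
        return affected


graph = DependencyGraph()
graph.add_dependency("Q1.Name", "EMP.Name")   # query attribute reads source attribute
graph.add_dependency("Q1.Emp#", "EMP.Emp#")
graph.add_dependency("V1.Name", "Q1.Name")    # a view built on top of the query

print(sorted(graph.affected_by("EMP.Name")))
```

A breadth-first traversal is used here only because it is simple; any traversal that visits all transitive dependents would answer the same question.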


Graph-based representation of ETL processes

ETL Workflow representation


Query representation


Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours

FROM EMP, WORKS

WHERE EMP.Emp# = WORKS.Emp#

GROUP BY EMP.Emp#

[Figure: graph representation of Q, with Join and GB (group-by) labels]
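As a rough illustration, the query Q could be encoded as a list of labeled edges. The labels `S`, `from` and `map-select` appear in the example slide of this deck, and `Join` and `GB` on this slide; reading `S` as a schema (ownership) edge, the exact set of edges, and the helper functions `targets` and `source_attributes_used` are assumptions made for this sketch, not the tool's actual data model.

```python
# Each edge is (source node, target node, label).
EDGES = [
    # S edges: a relation or query owns its attributes (assumed reading of "S")
    ("EMP", "EMP.Emp#", "S"),
    ("WORKS", "WORKS.Emp#", "S"),
    ("WORKS", "WORKS.Hours", "S"),
    ("Q", "Q.Emp#", "S"),
    ("Q", "Q.T_Hours", "S"),
    # FROM EMP, WORKS
    ("Q", "EMP", "from"),
    ("Q", "WORKS", "from"),
    # SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
    ("Q.Emp#", "EMP.Emp#", "map-select"),
    ("Q.T_Hours", "WORKS.Hours", "map-select"),
    # WHERE EMP.Emp# = WORKS.Emp#
    ("Q", "EMP.Emp#", "Join"),
    ("Q", "WORKS.Emp#", "Join"),
    # GROUP BY EMP.Emp#
    ("Q", "EMP.Emp#", "GB"),
]


def targets(source, label):
    """Targets of all edges leaving `source` with the given label."""
    return [t for s, t, l in EDGES if s == source and l == label]


def source_attributes_used(query):
    """Source attributes the query touches through SELECT, WHERE or GROUP BY."""
    used = set()
    for out_attr in targets(query, "S"):
        used.update(targets(out_attr, "map-select"))
    for label in ("Join", "GB"):
        used.update(targets(query, label))
    return used


print(targets("Q", "from"))
print(sorted(source_attributes_used("Q")))
```

With such an encoding, asking which source attributes a query relies on becomes a lookup over edge labels rather than a re-parse of the SQL text.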


Regulating ETL Evolution

Graph Annotation with rules

We annotate

• a set of graph elements
  · Query Node: Q1
  · Attribute Node: EMP.E_TITLE
  · View Node: Emps_Prjs, etc.

• with a set of rules
  · Propagate
  · Block
  · Prompt

• for reacting to a set of evolution events
  · Add Attribute
  · Delete Attribute
  · Rename View, etc.

According to the prevailing policy, the proper action is taken and the graph evolves accordingly.
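The annotation can be thought of as a lookup from a (graph element, evolution event) pair to a rule. The following is a minimal sketch under that assumption; the `PolicyStore` class, its method names, and the choice of Prompt as the fallback when no rule is annotated are hypothetical and not confirmed by the slides.

```python
from enum import Enum


class Action(Enum):
    PROPAGATE = "propagate"
    BLOCK = "block"
    PROMPT = "prompt"


class PolicyStore:
    """Hypothetical store of rules keyed by (graph element, evolution event)."""

    def __init__(self, default=Action.PROMPT):
        self._rules = {}
        # Assumption: fall back to asking the user when nothing is annotated.
        self._default = default

    def annotate(self, node, event, action):
        self._rules[(node, event)] = action

    def action_for(self, node, event):
        return self._rules.get((node, event), self._default)


policies = PolicyStore()
policies.annotate("Q1", "Add Attribute", Action.PROPAGATE)
policies.annotate("EMP.E_TITLE", "Delete Attribute", Action.BLOCK)

print(policies.action_for("Q1", "Add Attribute"))
print(policies.action_for("Emps_Prjs", "Rename View"))
```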

Example

Event: Add attribute Phone to relation EMP

Policy: On attribute addition to EMP, then propagate

Before the event:

Q: SELECT EMP.Emp#, EMP.Name

FROM EMP

After the event (Status: Add_Child):

Q: SELECT EMP.Emp#, EMP.Name, Phone

FROM EMP

[Figure: graph of relation EMP (attributes Emp#, Name) and query Q (attributes Emp#, Name), connected by S, map-select and from edges; after the event, a Phone attribute node with its own S and map-select edges is added to both EMP and Q]
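The example can be replayed in a few lines of code. This is a simplified sketch, not the Hecataeus implementation: the `Relation` and `Query` classes and the `add_attribute` function are hypothetical, only the status value `Add_Child` and the policy name come from the slide, and the generated SQL qualifies every column with the relation name, whereas the slide shows `Phone` unqualified.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    name: str
    attributes: list


@dataclass
class Query:
    name: str
    source: Relation
    select: list
    status: str = ""

    def sql(self):
        cols = ", ".join(f"{self.source.name}.{a}" for a in self.select)
        return f"SELECT {cols} FROM {self.source.name}"


def add_attribute(relation, attribute, query, policy):
    """Apply an 'add attribute' event to the relation and react per policy."""
    relation.attributes.append(attribute)
    if policy == "propagate":
        # The query adopts the new attribute, as in the slide's example.
        query.select.append(attribute)
        query.status = "Add_Child"
    elif policy == "block":
        # Hypothetical handling: the query is left untouched.
        query.status = "Blocked"
    else:
        raise ValueError(f"unsupported policy: {policy}")


emp = Relation("EMP", ["Emp#", "Name"])
q = Query("Q", emp, ["Emp#", "Name"])
add_attribute(emp, "Phone", q, "propagate")
print(q.sql())
print(q.status)
```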


Hecataeus Internals

System architecture

[Figure: system architecture. Components shown: DDL files and SQL scripts, DB Catalog, Parser, Create DB Schema, Evolution Manager, Workload representation, Evolution Semantics, Validate Workload, Graph Viewer, DB Schema representation, Import/Export Scenarios (XML, jpeg), Graph Visualization, Metric Manager]

Evolution Manager Architecture


Conclusions

Research in DB Evolution

• DB Schema Evolution
  – OODB evolution
  – Schema versioning

• DW Schema Evolution
  – Taxonomy of evolution events
  – Versioning
  – Materialized Views Evolution
  – View adaptation & synchronization

• Evolution with respect to Model Mappings

Summarizing

• The problem of adapting ETL workflows to evolving data sources

• Graph-based representation of ETL activities

• Graph enrichment with semantics for evolution events

• Graph annotation with rules for handling evolution events a priori

• Hecataeus: a framework for performing and evaluating evolution scenarios in DW environments

Thank you ...


http://www.cs.uoi.gr/~pvassil/projects/hecataeus/

Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios