Transformation Lifecycle Management with Nautilus
Melanie Herschel, Torsten Grust, and Tim Belhomme
University of Tübingen, Germany
Workshop on Quality in Databases 2011, collocated with VLDB
August 29, 2011 | QDB 2011 | Tim Belhomme | University of Tübingen
What is Nautilus?
“The deepest parts of the ocean are totally unknown to us [...] What goes on in those distant depths? What creatures inhabit, or could inhabit, those regions twelve or fifteen miles beneath the surface of the water? It's almost beyond conjecture.” Jules Verne, 20,000 Leagues Under the Sea, Chapter 2.
What happens within a transformation?
What data?
How is data combined?
Developing data transformations - state of the art
• Manual trial-and-error process
• No systematic tool exists that supports the complete Analyze-Fix-Test (AFT) cycle.
• New requirements lead to further cycles.
Why Transformation Lifecycle Management?
• Tool-supported help for developing and evolving transformations.
• Management, sharing, or documentation of a transformation throughout its entire lifecycle.
• Faster development or reaction to requirement changes.
• Easier transformation development for non-experts.
Agenda
• Workflow
• Architecture
• Ongoing work
The Complete Workflow
[Figure: sequence diagram over time of the artifacts exchanged between the SQL developer and Nautilus during the Analyze, Fix, and Test phases.]
Analyze
(1) debugging scenario
(2) explanations
(3) explanation annotations
Fix
(4) query modification request
(5) query modifications
(6) modification annotations
Test
(7) modification decision
(8) modification impact
(9) impact annotation
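The nine workflow artifacts and their phases can be written down as a small lookup structure. This is only an illustrative sketch: the names `Phase`, `WORKFLOW`, `steps_in`, and `next_step` are hypothetical and not part of Nautilus, and the wrap-around from step 9 back to step 1 is an assumption based on the remark that new requirements lead to further cycles.

```python
from enum import Enum


class Phase(Enum):
    ANALYZE = "Analyze"
    FIX = "Fix"
    TEST = "Test"


# (step number, artifact, phase), following the numbering on the slide.
WORKFLOW = [
    (1, "debugging scenario", Phase.ANALYZE),
    (2, "explanations", Phase.ANALYZE),
    (3, "explanation annotations", Phase.ANALYZE),
    (4, "query modification request", Phase.FIX),
    (5, "query modifications", Phase.FIX),
    (6, "modification annotations", Phase.FIX),
    (7, "modification decision", Phase.TEST),
    (8, "modification impact", Phase.TEST),
    (9, "impact annotation", Phase.TEST),
]


def steps_in(phase):
    """Return the artifact names that belong to one phase, in order."""
    return [name for _, name, p in WORKFLOW if p is phase]


def next_step(step_no):
    """Return the number of the following step.

    Assumption: after step 9 a new AFT cycle begins again at step 1.
    """
    return step_no % len(WORKFLOW) + 1
```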
Sample Workflow
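[Figure: a transformation made up of queries Q1 to Q4 over DB, shown with the table below; a phase bar lists Analyze, Fix, Test.]
City       ZIP
Berlin     10179
Hamburg    20095
München    80331
Stuttgart ?
• σ inhabitants >= 1,000,000: Stuttgart ∈ DB, #inhabitants < 1 Mio
• JOIN -> LEFT OUTER JOIN: Stuttgart ∈ DB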
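The "Stuttgart ?" question can be reproduced on a toy database. The sketch below is not how Nautilus computes explanations; it only illustrates the situation on the slide using Python's standard `sqlite3` module. The table `city`, the helper `why_not`, and the inhabitant counts are made up for illustration (the ZIP codes are the ones shown in the slides).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT PRIMARY KEY, zip TEXT, inhabitants INTEGER);
    -- Inhabitant counts are rough, illustrative values.
    INSERT INTO city VALUES
        ('Berlin',    '10179', 3400000),
        ('Hamburg',   '20095', 1800000),
        ('München',   '80331', 1300000),
        ('Stuttgart', '70567',  600000);
""")

# Hypothetical query containing the selection from the slide.
QUERY = "SELECT name, zip FROM city WHERE inhabitants >= 1000000 ORDER BY name"


def why_not(conn, name, threshold=1_000_000):
    """Give a simple explanation for why `name` is missing from the result."""
    row = conn.execute(
        "SELECT inhabitants FROM city WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return f"{name} is missing from the source data"
    if row[0] < threshold:
        return (f"{name} is in DB but blocked by the selection: "
                f"inhabitants = {row[0]} < {threshold}")
    return f"{name} passes the selection"


result = conn.execute(QUERY).fetchall()
explanation = why_not(conn, "Stuttgart")
```

Here `result` contains Berlin, Hamburg, and München only, and `explanation` states that Stuttgart is present in the source data but has fewer than one million inhabitants, which matches the annotation on the slide.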
Sample Workflow
[Figure: same scenario as on the previous slide (queries Q1 to Q4 over DB, City/ZIP table, question "Stuttgart ?", selection σ inhabitants >= 1,000,000).]
Analyze
(1) Debugging scenario
• Queries to be analyzed and source data
• Description of the result
• Constraints
(2) Explanations
• Explain existing data
• Explain missing data
(3) Explanation annotation
• Guidance and input for the fixing phase
• Suggestion from the user
Sample Workflow
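[Figure: same scenario as before (queries Q1 to Q4 over DB, City/ZIP table with Berlin 10179, Hamburg 20095, München 80331). The selection σ inhabitants >= 1,000,000 is no longer shown, and a modified query Q2’ is added.]
Stuttgart ?
Stuttgart ∈ DB, #inhabitants < 1 Mio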
Sample Workflow
[Figure: same scenario as before, with the modified query Q2’.]
Fix
(4) Query modification request
• Request to generate modifications
(5) Query modifications
• Computing modifications
• w.r.t. previous annotations and constraints
• Rewritten SQL with highlighted changes
(6) Query modification annotation
• Analogous to explanation annotation
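"Rewritten SQL with highlighted changes" can be pictured as a line diff between Q2 and Q2’. The slides do not show the SQL text of either query, so both statements below are hypothetical stand-ins; the diff itself uses `difflib.ndiff` from the Python standard library.

```python
import difflib

# Hypothetical SQL for Q2 and its rewrite Q2' (not taken from the slides).
Q2 = """SELECT name, zip
FROM city
WHERE inhabitants >= 1000000"""

Q2_PRIME = """SELECT name, zip
FROM city"""


def highlight_changes(old_sql, new_sql):
    """Return a line diff: '- ' marks removed lines, '+ ' marks added lines.

    Unchanged lines keep a two-space prefix; ndiff's '? ' hint lines are dropped.
    """
    diff = difflib.ndiff(old_sql.splitlines(), new_sql.splitlines())
    return [line for line in diff if not line.startswith("? ")]


changes = highlight_changes(Q2, Q2_PRIME)
for line in changes:
    print(line)
```

In this stand-in example the only highlighted line is the removed `WHERE` clause, which corresponds to dropping the selection that blocked Stuttgart.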
Sample Workflow
[Figure: queries Q1 to Q4 and the modified query Q2’ over DB; the table gains additional rows.]
City       ZIP
Berlin     10179
Hamburg    20095
München    80331
Stuttgart  70567
Frankfurt  ...
...
Side effects in Q3 and Q4 caused by Q2 → Q2’
Sample Workflow
[Figure: same scenario as on the previous slide, including the side effects in Q3 and Q4 caused by Q2 → Q2’.]
Test
(7) Modification decision
• Submit modification(s) to the query based on the annotations
(8) Modification impact
• Test the query
• Calculating the impact (verifying constraints and reporting statistics)
(9) Impact annotation
• Acceptable changes
• Effect on further AFT cycles and debugging scenario
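A modification impact report can be approximated by running the affected queries before and after the change and comparing their results. The sketch below assumes Q2 is exposed as a view `q2` and that a downstream query Q3 reads from it; the SQL, the helper names (`install_q2`, `snapshot`, `impact`), and the inhabitant counts are all hypothetical. Frankfurt's ZIP code is elided on the slide and is left as NULL here.

```python
import sqlite3


def install_q2(conn, sql):
    """(Re)define the view q2 so downstream queries see the given version."""
    conn.execute("DROP VIEW IF EXISTS q2")
    conn.execute(f"CREATE VIEW q2 AS {sql}")


def snapshot(conn, q2_sql, queries):
    """Run every query in `queries` against the given definition of q2."""
    install_q2(conn, q2_sql)
    return {name: set(conn.execute(sql).fetchall()) for name, sql in queries.items()}


def impact(conn, old_q2, new_q2, queries):
    """Per query, list the rows added and removed by switching old_q2 -> new_q2."""
    before = snapshot(conn, old_q2, queries)
    after = snapshot(conn, new_q2, queries)
    return {
        name: {
            "added": sorted(after[name] - before[name]),
            "removed": sorted(before[name] - after[name]),
        }
        for name in queries
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT PRIMARY KEY, zip TEXT, inhabitants INTEGER);
    -- Inhabitant counts are rough, illustrative values.
    INSERT INTO city VALUES
        ('Berlin',    '10179', 3400000),
        ('Hamburg',   '20095', 1800000),
        ('München',   '80331', 1300000),
        ('Stuttgart', '70567',  600000),
        ('Frankfurt', NULL,     700000);
""")

Q2 = "SELECT name, zip FROM city WHERE inhabitants >= 1000000"
Q2_PRIME = "SELECT name, zip FROM city"
QUERIES = {
    "Q2": "SELECT name, zip FROM q2",
    "Q3": "SELECT COUNT(*) FROM q2",  # hypothetical downstream query
}

report = impact(conn, Q2, Q2_PRIME, QUERIES)
# Example constraint: the modification must not lose previously returned rows.
no_rows_lost = not report["Q2"]["removed"]
```

The report shows the intended change (Stuttgart now appears) together with the side effects: Frankfurt also appears, and the downstream count in Q3 changes. Deciding whether such side effects are acceptable is what the impact annotation step is for.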
Nautilus Architecture
[Figure: Nautilus architecture diagram. Components shown:]
• GUI
• DB
• Metadata repository
• Eclipse Views & Editors
• Explanation manager
• Development cycle manager
• Query modification manager
• Debugging scenario manager
• Explanation generator
• Explanation annotator
• Explanation annotation analyzer
• Explanation ranker
• Modification generator
• Modification annotator
• Modification annotation analyzer
• Modification ranker
• AFT-inference engine
• Modification impact analyzer
• Modification impact annotator
Conseil - Hybrid explanations
Goal: Ideally, point out both the problem of missing source data and the problem of problematic query operators in a non-monotonic query.
Idea:
• Conseil follows a positive example to generate explanations.
• A positive example is an existing tuple with a high similarity to the missing answer.
• Annotate the canonical logical tree of the positive example and the missing tuple with passing properties.
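The two notions used above, a positive example and passing properties, can be illustrated with a toy model. This is a loose sketch and not Conseil's algorithm: the similarity measure (fraction of agreeing attributes), the linear operator chain standing in for the canonical logical tree, and all attribute values are assumptions made for illustration.

```python
def similarity(a, b):
    """Fraction of shared attributes on which two tuples agree (assumed measure)."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys) if keys else 0.0


def positive_example(missing, existing):
    """Pick the existing tuple that is most similar to the missing answer."""
    return max(existing, key=lambda t: similarity(t, missing))


# A linear chain of operators stands in for the canonical logical tree.
PLAN = [
    ("σ country = 'DE'", lambda t: t["country"] == "DE"),
    ("σ inhabitants >= 1,000,000", lambda t: t["inhabitants"] >= 1_000_000),
]


def passing_properties(t, plan):
    """Mark each operator as 'passing' or 'blocking' for tuple t."""
    return [(label, "passing" if pred(t) else "blocking") for label, pred in plan]


# Illustrative data; regions and inhabitant counts are made up.
existing = [
    {"name": "Berlin", "country": "DE", "region": "east", "inhabitants": 3_400_000},
    {"name": "Hamburg", "country": "DE", "region": "north", "inhabitants": 1_800_000},
    {"name": "München", "country": "DE", "region": "south", "inhabitants": 1_300_000},
]
missing = {"name": "Stuttgart", "country": "DE", "region": "south", "inhabitants": 600_000}

example = positive_example(missing, existing)
example_props = passing_properties(example, PLAN)
missing_props = passing_properties(missing, PLAN)
```

With this data, München is chosen as the positive example; it passes both operators, while Stuttgart passes the first and is blocked at the selection on inhabitants.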
Conseil - Idea (continuation)
Idea (continued):
• Transform the tree into a passing tree by changing blocking nodes to passing nodes.
• The choice of pointing out the problematic operator or the problematic data tuple(s) is based on a cost model.
• While generating explanations, dependencies on previously taken decisions can lead to non-optimal global costs (Branch & Bound).
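To see why dependencies between decisions call for a global search, consider the following toy cost model. It is an assumption made for illustration, not Conseil's actual cost model: every blocking node can be explained either by blaming the operator or by blaming the data, each at a given cost, and switching the kind of explanation between consecutive nodes incurs a hypothetical penalty. A small branch-and-bound search then finds the cheapest overall assignment.

```python
def branch_and_bound(nodes, switch_penalty):
    """Find the cheapest assignment of 'operator' or 'data' to each blocking node.

    nodes: list of (label, cost_operator, cost_data).
    switch_penalty: extra cost when consecutive nodes get different kinds
                    (a hypothetical way to model dependencies between decisions).
    Returns (best_cost, list_of_choices).
    """
    # Optimistic bound for the remaining nodes: cheapest option each, no penalty.
    suffix_min = [0.0] * (len(nodes) + 1)
    for i in range(len(nodes) - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(nodes[i][1], nodes[i][2])

    best = [float("inf"), None]

    def search(i, cost, choices):
        if cost + suffix_min[i] >= best[0]:
            return  # bound: this branch cannot beat the best solution so far
        if i == len(nodes):
            best[0], best[1] = cost, list(choices)
            return
        for kind, option_cost in (("operator", nodes[i][1]), ("data", nodes[i][2])):
            penalty = switch_penalty if choices and choices[-1] != kind else 0.0
            choices.append(kind)
            search(i + 1, cost + option_cost + penalty, choices)
            choices.pop()

    search(0, 0.0, [])
    return best[0], best[1]


# Illustrative costs, not taken from the slides.
NODES = [("join", 1.0, 2.0), ("selection", 3.0, 2.5), ("projection", 1.0, 4.0)]

independent = branch_and_bound(NODES, switch_penalty=0.0)
dependent = branch_and_bound(NODES, switch_penalty=2.0)
```

Without dependencies, picking the locally cheapest option per node is optimal (blaming the data at the selection). With the penalty, that local choice would cost more overall, and the search settles on blaming the operator at every node instead.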
Summary & Outlook
Transformation Lifecycle Management with Nautilus
• Semi-automatic tool to support the three phases (AFT) of the currently manual development process
• For various components of the architecture, solutions for implementing the desired functionality were presented.
• A possible benchmark is proposed to evaluate different steps of the AFT cycle.
Outlook
• Evaluate algorithms relevant to different steps with the benchmark.
• Boost the debugging scenario expressiveness and the explanation scope.
• Involve developers through user studies to measure usability.
• Build a real system and evaluate it.
Nautilus
http://nautilus-system.org
Thank you for your attention.