Comparison and Versioning of Scientific Workflows
Eduardo Ogasawara¹, Pablo Rangel¹, Leonardo Murta², Cláudia Werner¹, Marta Mattoso¹
¹ Federal University of Rio de Janeiro
² Fluminense Federal University
Summary
• Scientific Workflows
• Versioning of Scientific Workflows
• Diff/Merge of Scientific Workflows
• Conclusion
Part 1: Scientific Workflows and Scientific Workflow Management Systems
Experiment Scenario
Laboratory
1. In vitro experiment
2. Data analyzed by program X
3. Large volume of data produced...
4. ...which needs to be processed by program Y in a cluster
5. Results are analyzed by program Z
In silico experiments assisted by scientific workflows
In Silico Experiment Process
Sharing and collaborating on scientific workflows
Comparison with software development
• Programmer’s IDE: version control system with a repository that includes diff/merge facilities for collaborative software development
• E-Scientist’s IDE: absence of a repository offering adequate version control and diff/merge infrastructure
Goals
• Define a version model for scientific workflows
• Define a diff/merge strategy for scientific workflows
Part 2: Versioning of Scientific Workflows
Versioning of scientific workflows
• A software process can be compared to software (Fusaro et al., 1998) → workflows can be compared to software
• Configuration management (CM) for workflows demands:
  • A repository with access control to register workflows and to separate stable versions from versions under development
  • A mechanism to represent and store versions
  • A workspace concept to support the modeling and maintenance of workflows
Version model (Conradi, 1998)
• Product space: objects to be versioned
• Version space: version identification
Product space
• Coarse-grained units
  • Structural information of the workflow: the graph decomposition of the workflow
  • Inherit from “VersionedElement”
  • Workflow, Activity, Relationship, Ports
• Fine-grained units
  • Internal information of each class
Version space
• Each ConfigurationItem is composed of Versions
• Each VersionedElement has a version identifier
• Each Version has next* and previous versions
• A version may have branches and may be merged
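The version space described above can be sketched as a small data model. This is a minimal illustration, not the authors' implementation: the field names (`identifier`, `previous`, `next`), the methods `derive` and `merge`, and the version labels are all assumptions. `previous` is modeled as a list only because a merged version has two predecessors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity comparison; the history graph contains cycles
class Version:
    """One version of a versioned element (hypothetical field names)."""
    identifier: str
    previous: List["Version"] = field(default_factory=list)  # two entries after a merge
    next: List["Version"] = field(default_factory=list)      # several entries when branching

    def derive(self, identifier: str) -> "Version":
        """Create a successor; deriving twice from one version creates a branch."""
        child = Version(identifier, previous=[self])
        self.next.append(child)
        return child

    def merge(self, other: "Version", identifier: str) -> "Version":
        """Create a version whose predecessors are self and other."""
        merged = Version(identifier, previous=[self, other])
        self.next.append(merged)
        other.next.append(merged)
        return merged


@dataclass(eq=False)
class ConfigurationItem:
    """A versioned element (for example an Activity) with its version history."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def add(self, version: Version) -> Version:
        self.versions.append(version)
        return version


activity_a = ConfigurationItem("A")
v1 = activity_a.add(Version("1"))
v2 = activity_a.add(v1.derive("2"))        # main line
v1_1 = activity_a.add(v1.derive("1.1"))    # branch starting at version 1
v3 = activity_a.add(v2.merge(v1_1, "3"))   # branch merged back
```

In this sketch a branch is simply a version with more than one entry in `next`, and a merge is a version with more than one entry in `previous`.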
Workflow representation in version space
[Figure: interplay between version and product space. Version 1 of the workflow contains activities A and C and the relationship A→C, each at version 1. Legend: OR node, AND node, OR edge, AND edge, sample selections.]
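Workflow representation in version space
[Figure: workflow evolution. Version 1 of the workflow contains activities A and C and the relationship A→C; Version 2 also contains activity B and the relationship C→B. The version graph is shown next to the workflow structure of each version. Legend: OR node, AND node, OR edge, AND edge, sample selections.]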
Part 3: Diff/Merge of Scientific Workflows
Collaborative scenario
[Figure: User 1 and User 2 each check out Baseline (v1) into their own workspace and change the workflow independently.]
User 1 checks in
User 1 tries to check in first. No problem: the check-in creates the current version (v2).
[Figure: Baseline (v1), Current version (v2), and the User 2 workspace, which was checked out from the baseline.]
User 2 tries to check in
[Figure: Baseline (v1), Current version (v2), and the User 2 workspace, which was checked out from the baseline.]
When User 2 tries to check in, a diff/merge needs to be executed.
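The check-in sequence above can be sketched as a toy repository that rejects a check-in whose workspace is not based on the newest version. The class `Repository`, its methods, and the activity names are hypothetical and serve only to show when a diff/merge becomes necessary.

```python
class Repository:
    """Toy repository that stores whole workflow versions (hypothetical API)."""

    def __init__(self, baseline):
        self.versions = [frozenset(baseline)]  # index 0 holds v1

    @property
    def head(self):
        """Number of the newest version; numbering starts at 1."""
        return len(self.versions)

    def check_out(self):
        """Return the head version number and a private copy of its activities."""
        return self.head, set(self.versions[-1])

    def check_in(self, based_on, workspace):
        """Accept the workspace only if nobody checked in since the check-out."""
        if based_on != self.head:
            return False  # the caller has to diff/merge against the head first
        self.versions.append(frozenset(workspace))
        return True


repo = Repository({"A", "B", "C"})
base1, user1 = repo.check_out()
base2, user2 = repo.check_out()
user1.add("X")  # User 1 changes the workflow
user2.add("F")  # User 2 changes it independently

first = repo.check_in(base1, user1)   # accepted, creates v2
second = repo.check_in(base2, user2)  # rejected, v1 is no longer the head
```

A real tool would run the merge at the point where this sketch returns `False`, then retry the check-in against v2.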
Diff/merge
• Configuration management tools usually support:
  • 2-way merge
  • 3-way merge
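2-way merge example:
  Version 1: line 1 | line 2 | line 3
  Version 2: line 1’ | line 2
  Result:    <line 1> or <line 1’>? | line 2 | <line 3> or nothing?
3-way merge example:
  Version 1: line 1 | line 2 | line 3
  Version 2: line 1’ | line 2
  Common ancestor: line 1 | line 2
  Result:    line 1’ | line 2 | line 3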
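The difference between the two merge kinds can be sketched in a few lines. This is a simplification that aligns lines by position, which real diff tools do not do, and it assumes that the common ancestor in the 3-way case is (line 1, line 2). The function names are illustrative.

```python
from itertools import zip_longest


def merge_2way(mine, theirs):
    """Without an ancestor, every position that differs is a conflict."""
    merged = []
    for a, b in zip_longest(mine, theirs):
        merged.append(a if a == b else ("CONFLICT", a, b))
    return merged


def merge_3way(base, mine, theirs):
    """With an ancestor, a side that still equals the base yields to the other side."""
    merged = []
    for o, a, b in zip_longest(base, mine, theirs):
        if a == b:
            choice = a
        elif a == o:
            choice = b              # only "theirs" changed this position
        elif b == o:
            choice = a              # only "mine" changed this position
        else:
            choice = ("CONFLICT", a, b)
        if choice is not None:      # None means the line is absent from the result
            merged.append(choice)
    return merged


mine = ["line 1", "line 2", "line 3"]
theirs = ["line 1'", "line 2"]
base = ["line 1", "line 2"]         # assumed common ancestor

two_way = merge_2way(mine, theirs)
three_way = merge_3way(base, mine, theirs)
```

With these inputs the 2-way merge leaves two open questions (the first and the third position), while the 3-way merge resolves both by comparing each side with the ancestor.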
2-way merge
[Figure: 2-way merge of the User 2 workspace with the current version (v2), showing the conflicts to be solved during a check-in.]
All conflicts need to be solved by the user.
3-way merge
[Figure: 3-way merge of the User 2 workspace with the current version (v2), using Baseline (v1), and the conflicts to be solved during a check-in.]
Fewer conflicts need to be solved by the user. The conflict involving activities C and X is solved.
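Why the baseline reduces the number of conflicts can be sketched on a toy workflow. Each workflow is a dictionary that maps an activity to the set of its successors; this representation and the three sample workflows are hypothetical and do not reproduce the figure. How the authors' tool resolves the C and X case is not stated, so the rule used here (a side that equals the baseline yields to the other side) is an assumption.

```python
def two_way_conflicts(mine, theirs):
    """Without a baseline, every activity that differs is a conflict."""
    return {k for k in mine.keys() | theirs.keys() if mine.get(k) != theirs.get(k)}


def three_way_merge(base, mine, theirs):
    """Merge per activity; only activities changed differently on both sides conflict."""
    merged, conflicts = {}, set()
    for k in base.keys() | mine.keys() | theirs.keys():
        o, a, b = base.get(k), mine.get(k), theirs.get(k)  # None means absent
        if a == b:
            value = a
        elif a == o:
            value = b          # only "theirs" changed this activity
        elif b == o:
            value = a          # only "mine" changed this activity
        else:
            conflicts.add(k)   # both sides changed it in different ways
            continue
        if value is not None:
            merged[k] = value
    return merged, conflicts


baseline = {"A": {"B"}, "B": {"C"}, "C": set()}
current = {"A": {"B", "G"}, "B": {"X"}, "X": set(), "G": set()}  # v2: C replaced by X
user2 = {"A": {"B", "E"}, "B": {"C"}, "C": set(), "E": set()}    # C left untouched

two_way = two_way_conflicts(user2, current)
merged, conflicts = three_way_merge(baseline, user2, current)
```

In this example the 2-way comparison reports six differing activities, whereas the 3-way merge accepts the replacement of C by X automatically and leaves only activity A, which both users changed, for manual resolution.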
3-way subgraph diff/merge
• Workflows have a dual nature: they are a model and executable code at the same time
• The goal is to support a syntax merge, which means that a candidate conflict is not just a coarse-grained unit, but a subgraph grown from the initial conflicting coarse-grained unit
3-way subgraph diff/merge
[Figure: the merged workflow graph shown at three exploration steps.]
• Step 1: exploring from E: no simplification
• Step 2: exploring from B: simplification by syntax merge
• Step 3: re-exploring from E: decrease in conflict size
3-way subgraph diff/merge: two final conflicts
[Figure: the two remaining conflict subgraphs in the merged workflow graph.]
• Conflict one: exploring from E in the forward direction
• Conflict two: exploring from E in the backward direction
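One way to read the forward and backward exploration is as a directed graph traversal that starts at a conflicting activity and collects the conflicting activities connected to it. The sketch below is an interpretation under that assumption, not the authors' algorithm; the edge list and the set of conflicting activities are hypothetical and are not taken from the slides.

```python
from collections import deque


def explore(edges, conflicting, seed, forward=True):
    """Collect conflicting activities reachable from seed in one direction."""
    neighbours = {}
    for src, dst in edges:
        a, b = (src, dst) if forward else (dst, src)  # reverse edges when going backward
        neighbours.setdefault(a, []).append(b)

    found, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            # Stop at activities that are not in conflict.
            if nxt in conflicting and nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    return found


# Hypothetical merged workflow and conflict set.
edges = [("A", "B"), ("B", "D"), ("D", "H"), ("K", "J"),
         ("J", "E"), ("E", "F"), ("F", "G")]
conflicting = {"E", "F", "G", "J", "K"}

conflict_one = explore(edges, conflicting, "E", forward=True)
conflict_two = explore(edges, conflicting, "E", forward=False)
```

Each returned set is one candidate conflict subgraph: here the forward exploration yields E, F and G, and the backward exploration yields E, J and K.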
Part 4: Conclusions
Conclusions
• Contributions
  • Version model for scientific workflows
  • Syntax diff/merge for scientific workflows
• Prototype under development:
  • Evaluates the presented concepts
  • Developed on top of Java Workflow Editor (http://www.enhydra.org) using the PostgreSQL DBMS
Eduardo Ogasawara [email protected]
Visit our Web site http://gexp.nacad.ufrj.br
Comparison and Versioning of Scientific Workflows
Thank you!