On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
Transcript of On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
![Page 1: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/1.jpg)
Member of the Helmholtz Association
On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
*Jülich Supercomputing Centre (JSC), Forschungszentrum Juelich, Germany
+Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$School of Computer Science, University of Manchester, UK
#Reference Center on Environmental Information, Campinas SP, Brazil
~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
8th Workshop On Workflows in Support of Large-Scale Science
![Page 2: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/2.jpg)
Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories
Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 2
![Page 3: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/3.jpg)
Ecological Niche Modeling
Perform species adaptation to environmental changes (BioVeL Project)
[Figure with elements numbered 1 to 5]
![Page 4: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/4.jpg)
Ecological Niche Modeling Workflow
[Workflow diagram]
• Steps: createModel, testModel, calcAUC
• Inputs: Environmental Layer, Occurrence Data, Geographic Mask, Parameter
• Output: AUC
![Page 5: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/5.jpg)
Designing workflow (from scratch)
Reusing workflow
Execution
Sharing & Analysis
in silico experiment
REFINE
Planning
![Page 6: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/6.jpg)
Ecological Niche Modeling Workflow
[Workflow diagram]
• Steps: createModel, testModel, calcAUC
• Inputs: Environmental Layer, Occurrence Data, Geographic Mask
• Parameters: Gamma, Cost, NumberOfPseudoAbsences
• Algorithms: SVM, Maxent, GARP
• Output: AUC
![Page 7: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/7.jpg)
Ecological Niche Modeling Workflow
[Figure: workflow diagram (steps createModel, testModel, calcAUC; inputs Environmental Layer, Occurrence Data, Geographic Mask; parameters Gamma, Cost, NumberOfPseudoAbsences; algorithms SVM, Maxent, GARP; output AUC), overlaid with a cloud of candidate parameter values and the label BLAST]
Select Algorithms
Select Parameters
![Page 8: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/8.jpg)
Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps
But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive
![Page 9: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/9.jpg)
Designing workflow (from scratch)
Reusing workflow
Execution
Sharing & Analysis
in silico experiment
REFINE
Optimization
Planning
![Page 10: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/10.jpg)
Intelligent automated optimization techniques
Goal:
• Automated way to find workflow settings that optimize the output
• Define workflow output(s) as fitness value
• Use fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use heuristic search algorithm to find best
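The bullets above can be read as a simple loop: run the workflow with candidate settings, read the chosen output as a fitness value, and keep the best candidate. The following is a minimal sketch in Python, not the Taverna implementation. `toy_workflow_fitness` is a hypothetical stand-in for a real workflow run, the parameter bounds are assumptions, and plain random search is used here only as the simplest possible heuristic.

```python
import random


def toy_workflow_fitness(gamma: float, cost: float) -> float:
    """Hypothetical stand-in for one workflow run; returns a fitness in (0, 1]."""
    # Peak of 1.0 at gamma=2, cost=8. Purely illustrative, not real AUC values.
    return 1.0 / (1.0 + (gamma - 2.0) ** 2 + 0.1 * (cost - 8.0) ** 2)


def random_search(evaluate, bounds, trials, seed=0):
    """Sample settings uniformly within the bounds and keep the fittest one."""
    rng = random.Random(seed)
    best_settings, best_fitness = None, float("-inf")
    for _ in range(trials):
        settings = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        fitness = evaluate(**settings)
        if fitness > best_fitness:
            best_settings, best_fitness = settings, fitness
    return best_settings, best_fitness


# Assumed parameter ranges, chosen only for this example.
bounds = {"gamma": (0.0, 10.0), "cost": (0.0, 16.0)}
best_settings, best_fitness = random_search(toy_workflow_fitness, bounds, trials=200)
print(best_settings, round(best_fitness, 3))
```

Any heuristic with the same shape (propose settings, evaluate fitness, keep the best) could replace `random_search`.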
![Page 11: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/11.jpg)
How does it work?
Taverna Optimization Layer
[Architecture diagram: WMS, Framework, API, Plugins (Parameter Optimization, Component Optimization)]
• Development of optimization framework that extends Taverna workflow management system
• Abstracts optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaption of new optimization methods
• Optimization plugins can be added independently
![Page 12: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/12.jpg)
Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
Genetic Algorithm Parameter Optimization Plugin
[Figure: generations 1, 2, ..., x, each reporting its best fitness]
Best Fitness: 0.34
Best Fitness: 0.42
Best Fitness: 0.48
Best Fitness: 0.49
Display the optimization result
![Page 13: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/13.jpg)
Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost
Idea: Capture optimization meta-data next to traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate optimization process
![Page 14: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/14.jpg)
Research Objects
• Aligned with W3C standards
• Aggregates various resources
• Describes scientific processes in machine-readable format
• Specified by several ontologies
ore:aggregates
…
![Page 15: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/15.jpg)
![Page 16: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/16.jpg)
Optimization Research Object Ontology
ro:ResearchObject
opt:OptimizationResearchObject
• opt:SearchSpace: Describes the dependencies and parameter constraints
• opt:Algorithm: Describes the optimization algorithm and its parameters
• opt:Fitness: Describes the fitness functions
• opt:Workflow: The workflow that was optimized
• opt:Generation: Defines the population size and generation number for an Optimization Run
• opt:TerminationCondition: Describes the termination condition defined by the user
• opt:OptimizationRun: Represents one result set: sub-workflow, parameters and obtained fitness values
Diagram legend: rdfs:subClassOf, rdf:Property, ore:aggregates
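To make the aggregation structure concrete, here is a small sketch that builds the corresponding triples in plain Python and prints them in N-Triples form. It rests on several assumptions: the `opt:` namespace URI is a placeholder derived from the talk's links and may not match the published ontology, the `example.org` resource names are hypothetical, and the research object is assumed to aggregate one resource of each class via `ore:aggregates`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
# Assumption: placeholder namespace; the real opt: URI may differ.
OPT = "http://purl.org/net/ro-optimization#"
# Hypothetical base URI for the example resources.
EX = "http://example.org/ro/svm-opt/"

# Hypothetical local names mapped to the opt: classes named on the slide.
AGGREGATED = {
    "search-space": "SearchSpace",
    "algorithm": "Algorithm",
    "fitness": "Fitness",
    "workflow": "Workflow",
    "generation-1": "Generation",
    "termination": "TerminationCondition",
    "run-1": "OptimizationRun",
}


def build_triples():
    """Return (subject, predicate, object) URI triples for one research object."""
    research_object = EX + "research-object"
    triples = [(research_object, RDF_TYPE, OPT + "OptimizationResearchObject")]
    for local_name, class_name in AGGREGATED.items():
        resource = EX + local_name
        triples.append((research_object, ORE_AGGREGATES, resource))
        triples.append((resource, RDF_TYPE, OPT + class_name))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = build_triples()
print(to_ntriples(triples))
```

A real implementation would more likely use an RDF library and the properties defined by the ontology itself; this sketch only shows the aggregation pattern.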
![Page 17: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/17.jpg)
Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7
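As an illustration of how these two rates enter a genetic algorithm, the sketch below evolves a small population using the mutation rate 0.1 and crossover rate 0.7 from the slide. Everything else is an assumption made for the example: the toy fitness function, the parameter bounds, tournament selection, uniform crossover, reset mutation, and elitism. It is not the plugin's actual implementation.

```python
import random

MUTATION_RATE = 0.1   # value from the slide
CROSSOVER_RATE = 0.7  # value from the slide


def evolve(fitness, bounds, population_size=6, generations=20, seed=1):
    """Run a minimal genetic algorithm; return the best fitness per generation."""
    rng = random.Random(seed)

    def tournament(scored):
        # Pick the fitter of two distinct random individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        best_fit, best_ind = max(scored, key=lambda pair: pair[0])
        best_per_generation.append(best_fit)
        next_population = [list(best_ind)]  # elitism: carry the best forward
        while len(next_population) < population_size:
            child = list(tournament(scored))
            if rng.random() < CROSSOVER_RATE:
                other = tournament(scored)
                # Uniform crossover: take each gene from either parent.
                child = [c if rng.random() < 0.5 else o
                         for c, o in zip(child, other)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < MUTATION_RATE:
                    child[i] = rng.uniform(lo, hi)  # reset mutation within bounds
            next_population.append(child)
        population = next_population
    return best_per_generation


def toy_fitness(individual):
    """Hypothetical fitness with a peak of 1.0 at (2, 8)."""
    gamma, cost = individual
    return 1.0 / (1.0 + (gamma - 2.0) ** 2 + 0.1 * (cost - 8.0) ** 2)


history = evolve(toy_fitness, bounds=[(0.0, 10.0), (0.0, 16.0)])
print([round(value, 3) for value in history])
```

Because the best individual is always carried into the next generation, the best fitness per generation never decreases in this sketch.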
![Page 18: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/18.jpg)
Search Space
Gamma:
• Double
• 0 - 10
• Cost/2 < Gamma (fictional)
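A search space of this kind, with typed ranges plus a dependency between parameters, can be sketched as follows. The Gamma range and the (fictional) constraint Cost/2 < Gamma come from the slide. The Cost range and the use of rejection sampling are assumptions made only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    lower: float
    upper: float


def satisfies_constraints(settings):
    # The (fictional) dependency from the slide: Cost/2 < Gamma.
    return settings["Cost"] / 2 < settings["Gamma"]


def sample_valid(parameters, rng, max_tries=1000):
    """Rejection sampling: draw uniformly until the constraint holds."""
    for _ in range(max_tries):
        settings = {p.name: rng.uniform(p.lower, p.upper) for p in parameters}
        if satisfies_constraints(settings):
            return settings
    raise RuntimeError("no valid settings found within max_tries")


space = [
    Parameter("Gamma", 0.0, 10.0),  # range from the slide
    Parameter("Cost", 0.0, 16.0),   # hypothetical range, not from the slide
]
rng = random.Random(42)
samples = [sample_valid(space, rng) for _ in range(5)]
for settings in samples:
    print(settings)
```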
![Page 19: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/19.jpg)
Optimization Run
• Origin of result
• Parameter setting
• Fitness value
![Page 20: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/20.jpg)
![Page 21: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/21.jpg)
![Page 22: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/22.jpg)
Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)
Genetic Algorithm Parameter Optimization Plugin
[Figure: generations 1, 2, ..., x, each reporting its best fitness]
Best Fitness: 0.34
Best Fitness: 0.42
Best Fitness: 0.48
Best Fitness: 0.49
Display the optimization result

Fitness per iteration:

| Iteration | Generation 1 | Generation 2 | Generation 3 |
| --- | --- | --- | --- |
| 1 | 0.05 | 0.05 | 0.05 |
| 2 | 0.22 | 0.22 | 0.22 |
| 3 | 0.27 | 0.34 | 0.34 |
| 4 | 0.19 | 0.19 | 0.19 |
| 5 | 0.31 | 0.31 | 0.31 |
| 6 | 0.34 | 0.33 | 0.46 |
![Page 23: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/23.jpg)
Example Result

| Name | Value |
| --- | --- |
| Gamma | 2.36 |
| Cost | 8 |
| NumberOfPseudoAbsences | 363 |
| Fitness | 0.9207 |
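A result like the one above can be thought of as one record among many, from which the best is selected. The sketch below models this with a hypothetical `OptimizationRun` dataclass. Only the last record uses the values from the slide. The other records, and all generation and iteration numbers, are invented placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationRun:
    """One result set: where it came from, its parameters, and its fitness."""
    generation: int   # origin of the result (placeholder numbering)
    iteration: int
    parameters: dict
    fitness: float


runs = [
    # Invented placeholder records, not results from the talk.
    OptimizationRun(1, 1, {"Gamma": 9.1, "Cost": 2, "NumberOfPseudoAbsences": 50}, 0.41),
    OptimizationRun(2, 4, {"Gamma": 4.0, "Cost": 6, "NumberOfPseudoAbsences": 200}, 0.78),
    # Parameter and fitness values from the example result slide;
    # the generation and iteration numbers are hypothetical.
    OptimizationRun(3, 6, {"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363}, 0.9207),
]

best = max(runs, key=lambda run: run.fitness)
print(best.parameters, best.fitness)
```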
![Page 24: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/24.jpg)
Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists
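Two of these reuse scenarios, adopting explored ranges and resuming from earlier results, can be sketched in a few lines. The functions and the `prior_runs` values below are hypothetical and only illustrate the idea. They assume prior results are available as (settings, fitness) pairs.

```python
def explored_ranges(prior_runs):
    """Per-parameter (min, max) over all previously evaluated settings."""
    ranges = {}
    for settings, _fitness in prior_runs:
        for name, value in settings.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def seed_population(prior_runs, size):
    """Start a new optimization from the best previously evaluated settings."""
    ranked = sorted(prior_runs, key=lambda run: run[1], reverse=True)
    return [dict(settings) for settings, _fitness in ranked[:size]]


# Hypothetical prior results as (settings, fitness) pairs.
prior_runs = [
    ({"Gamma": 0.5, "Cost": 1.0}, 0.31),
    ({"Gamma": 2.4, "Cost": 8.0}, 0.92),
    ({"Gamma": 7.0, "Cost": 12.0}, 0.19),
]

print(explored_ranges(prior_runs))
print(seed_population(prior_runs, size=2))
```

Seeding the initial population this way lets a new run continue from known good settings instead of starting from scratch.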
![Page 25: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/25.jpg)
Conclusion
• Scientific workflows are hard to configure
• Optimization can help but meta-data get lost
• Extend Research Objects
• Build new Optimization Research Object Ontology
• Reuse of optimization meta-data to speed up optimization
• Shareable with the community in workflow infrastructures
• Outlook: How to learn from similar workflows?
![Page 26: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/26.jpg)
Links
http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object
![Page 27: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects](https://reader034.fdocuments.in/reader034/viewer/2022052410/554f4bb3b4c905524c8b49db/html5/thumbnails/27.jpg)
Questions? Thank you!