Using Provenance to Improve Workflow Design

Frederico Tosta, Leonardo Murta, Claudia Werner, Marta Mattoso

{ftoliveira, murta, werner, marta}@cos.ufrj.br

COPPE – Federal University of Rio de Janeiro - Brazil

UFRJ

Summary

• Motivation

• Introduction & Background

• Goal

• Approach & Implementation

• Conclusion

Motivation

Pieces of workflows that occurred in the past may occur again in the future.

Motivation

• The number of services and bioinformatics operations is growing: Taverna has over 3500 services (2007); VisTrails has over 1200 modules (2008).

[Figure: workflows (WF) and services]

Motivation

How can we find the pieces or services that are useful during the design of a new workflow in an automatic and systematic way?

Software Reuse

• Software reuse is the process of creating software systems from existing software [Krueger, 1992].

[Figure: software reuse, surrounded by quality, reliability, reduced cost, and productivity]

Recommendation Systems

• E-commerce: apply data mining techniques to the problem of helping users find the items they would like to purchase.

E-commerce concepts mapped into scientific experiment concepts:

E-commerce    Scientific Experiment
Customer      Scientist
Product*      Component / Actor
Cart          Workflow (Goble, 2007)
Preference    Context

* what is recommended by e-commerce sites

Goal

• Propose a proactive recommendation service that aims at suggesting frequent combinations of scientific programs for reuse.

Approach

[Figure: Design for reuse and recommendation. Elements: Design, Workflow specification, Provenance, DB]

Approach

[Figure: Design with reuse and recommendation. Elements: Design, Workflow specification, Provenance, DB, Proactive Recommendation]

Implementation

• Populating the database:

VisTrails workflows:
- Parse provenance XML files to extract the relations.

MySQL database:
- The relations are mapped into a database.
- Each relation contains the modules and how they are connected.
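The two steps above (parse provenance files, then map the relations into a database) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the XML layout in SAMPLE_XML, the function names and the table name "relation" are all hypothetical, and SQLite stands in for MySQL so that the sketch runs without a server. Real VisTrails provenance files use a richer schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified provenance document (an assumption for this sketch;
# it is NOT the actual VisTrails file format).
SAMPLE_XML = """
<workflow>
  <module id="1" name="HmmBuild"/>
  <module id="2" name="HmmCalibrate"/>
  <connection>
    <source module="1" port="StdOut"/>
    <destination module="2" port="HmmPath"/>
  </connection>
</workflow>
"""


def extract_relations(xml_text):
    """Return (source, destination, source_port, dest_port) tuples."""
    root = ET.fromstring(xml_text)
    # Map module ids to module names so connections can be stored by name.
    names = {m.get("id"): m.get("name") for m in root.iter("module")}
    relations = []
    for conn in root.iter("connection"):
        src = conn.find("source")
        dst = conn.find("destination")
        relations.append((
            names[src.get("module")],
            names[dst.get("module")],
            src.get("port"),
            dst.get("port"),
        ))
    return relations


def store_relations(db, relations):
    """Insert the extracted relations into a single (hypothetical) table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS relation ("
        "source TEXT, destination TEXT, source_port TEXT, dest_port TEXT)"
    )
    db.executemany("INSERT INTO relation VALUES (?, ?, ?, ?)", relations)
    db.commit()


# SQLite in memory is used here only as a stand-in for the MySQL database.
db = sqlite3.connect(":memory:")
relations = extract_relations(SAMPLE_XML)
store_relations(db, relations)
rows = db.execute("SELECT * FROM relation").fetchall()
```

Each stored row holds one connection between two modules, including the ports involved, which is the information the recommendation step needs.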

Implementation

VisTrails workflow design with recommendation

Source    Destination   Source Port     Dest Port
HmmBuild  HmmCalibrate  DestinationDir  SourceDir
HmmBuild  Cat           DestinationDir  Dir
HmmBuild  HmmCalibrate  DestinationDir  HmmPath
HmmBuild  HmmCalibrate  StdOut          HmmPath
HmmBuild  HmmCalibrate  StdOut          HmmPath

Ports 1 and 2 are the output ports DestinationDir and StdOut, respectively. Ports 3, 4 and 5 are the input ports SourceDir, HmmPath and Dir, respectively.

• Recommendation metric: From the example, we can infer that port StdOut of HmmBuild has been connected to port HmmPath of HmmCalibrate in 40% of previously designed workflows (2 of the 5 relations listed above).
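The 40% figure can be reproduced with a short frequency count over the five example relations. The sketch below is one possible reading of the metric, under the assumption that the denominator is the number of recorded connections leaving the source module; the function name `recommend` and the ranking format are hypothetical, not taken from the prototype.

```python
from collections import Counter

# The five example relations: (source, destination, source port, dest port).
RELATIONS = [
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "SourceDir"),
    ("HmmBuild", "Cat", "DestinationDir", "Dir"),
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
]


def recommend(relations, source):
    """Rank the connections that previously left `source` by relative frequency.

    Assumption: the share is computed over all recorded connections whose
    source module is `source`.
    """
    outgoing = [r for r in relations if r[0] == source]
    counts = Counter(outgoing)
    total = len(outgoing)
    # most_common() sorts by count, highest first; empty input yields [].
    return [(rel, n / total) for rel, n in counts.most_common()]


ranking = recommend(RELATIONS, "HmmBuild")
for (src, dst, src_port, dst_port), share in ranking:
    print(f"{src}.{src_port} -> {dst}.{dst_port}: {share:.0%}")
```

The top-ranked entry is the StdOut to HmmPath connection between HmmBuild and HmmCalibrate, with a share of 2/5, which matches the 40% stated above.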

Implementation

[Figure: VisTrails workflow design with recommendation]

Conclusion

• We expect that this approach may help to propagate the benefits of software reuse to the context of scientific workflows.

• Reduce the time to design workflows.

• Increase the quality of workflows designed.

Conclusion

• Limitations: The current version of our prototype recommends only the next component, based on previously used connections.

• Future work:
- Improve the approach by recommending a component based on the whole path.
- Specify a context for each workflow.
- Apply a weight to each relation based on workflow usage.
