Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

25
Date: 22/10/2014 Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ , Meredith N. Braskie , Derrek Hibar , Xue Hua , Neda Jahanshad , Paul Thompson , and Arthur W. Toga * Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute, USC Laboratory of Neuroimaging

description

eScience 2014, Guarujá (Brasil). Abstract: Workflow reuse is a major benefit of workflow systems and shared workflow repositories, but there are barely any studies that quantify the degree of reuse of workflows or the practical barriers that may stand in the way of successful reuse. In our own work, we hypothesize that defining workflow fragments improves reuse, since end-to-end workflows may be very specific and only partially reusable by others. This paper reports on a study of the current use of workflows and workflow fragments in labs that use the LONI Pipeline, a popular workflow system used mainly for neuroimaging research that enables users to define and reuse workflow fragments. We present an overview of the benefits of workflows and workflow fragments reported by users in informal discussions. We also report on a survey of researchers in a lab that has the LONI Pipeline installed, asking them about their experiences with reuse of workflow fragments and the actual benefits they perceive. This leads to quantifiable indicators of the reuse of workflows and workflow fragments in practice. Finally, we discuss barriers to further adoption of workflow fragments and workflow reuse that motivate further work.

Transcript of Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Page 1: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Date: 22/10/2014

Workflow Reuse in Practice:

A Study of Neuroimaging Pipeline Users

Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ, Meredith N. Braskieⱡ, Derrek Hibarⱡ, Xue Huaⱡ, Neda Jahanshadⱡ, Paul

Thompsonⱡ, and Arthur W. Togaⱡ

* Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute,

ⱡ USC Laboratory of Neuroimaging

Page 2: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Main Contributions

•Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab •Survey of workflow users

•Quantitative perspective on the identified benefits.

IEEE eScience 2014. Guarujá, Brasil

2

repurpose

reuse

repository

Create, collaborate

Page 3: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Background

• Workflows are software artifacts that capture computational experiments • Addition to paper publication • Provenance of results • Reuse

• Existing repositories of workflows

(Galaxy, myExperiment, the LONI Pipeline, CrowdLabs, etc.) • Sharing workflows • Exploring existing workflows

• PROBLEMS to address: •How does workflow reuse happen in a research lab environment? •Are workflow fragments more useful than workflows?

3 IEEE eScience 2014. Guarujá, Brasil

Page 4: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Use case: The LONI Pipeline

Workflow system for neuroimaging analysis http://pipeline.loni.usc.edu/explore/library-navigator/

IEEE eScience 2014. Guarujá, Brasil

4

Page 5: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Why LONI Pipeline?

•Need for reuse •Grouping Tools

•Manual annotation of workflow fragments •Workflow Miner

5 IEEE eScience 2014. Guarujá, Brasil

Page 6: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Approach

IEEE eScience 2014. Guarujá, Brasil

6

Discussions with scientists User survey

Collect responses from users

21 responses

Discuss results

Page 7: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Possible benefits of workflows and workflow fragments

•Sharing workflows with collaborators •Time savings

•Copy & paste fragments of workflows •Reuse existent workflows

•Teaching •Reduce the learning curve of new students

•Visualization

•Simplify workflows

•Design for modularity •Highlight the most relevant steps on a workflow

IEEE eScience 2014. Guarujá, Brasil

7

Page 8: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Possible benefits of workflows and workflow fragments (2)

•Design for understandability •Design for standardization •Debugging

•Provenance exploration

•Paper writing •Linking papers to pipelines

•Reproducibility and inspectability

IEEE eScience 2014. Guarujá, Brasil

8

Page 9: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Survey Analysis

9 IEEE eScience 2014. Guarujá, Brasil

Page 10: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Writing and Sharing Code

•Writing code is considered very important for this area of research. •Sharing code is not considered to be as important.

10 IEEE eScience 2014. Guarujá, Brasil

Page 11: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Adopting a Workflow System

The overwhelming majority of responders found the workflow system useful.

•Creation of workflows.

IEEE eScience 2014. Guarujá, Brasil

11

Page 12: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Adopting a workflow system: workflow size

•Workflows of fewer than 10 steps seem to be the most preferred by scientists

IEEE eScience 2014. Guarujá, Brasil

12

0

2

4

6

8

10

12

14

1 2 3 41-5 5-10 10-20 >20

Numberofworkflowcomponents

Page 13: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing workflows

•Respondents answered that creating workflows is very useful •Reuse of workflows was seen as less useful

•Reuse is not the only reason why workflows are created

•Reusing workflows from a user’s prior work is considered as useful as reusing workflows from others

IEEE eScience 2014. Guarujá, Brasil

13

Page 14: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing workflows (2)

According to the respondents, the major benefits of workflows include: • Time savings •Organizing and storing code • Having a visualization of the overall analysis •Facilitating reproducibility

IEEE eScience 2014. Guarujá, Brasil

14

Workflows save time 13

Easier to track and debug complex code 9

Convenient way to organize/store code 11

Help write more organized code 6

Help make code more modular/reusable 4

Help make methods more understandable 8

Visualization of overall analysis 11

Workflows facilitate reproducibility 10

Page 15: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing workflows (3)

•The overwhelming majority of respondents said workflows are useful for both non-programmers and for teaching new students

IEEE eScience 2014. Guarujá, Brasil

15

Non-programmers can use them 20

New students can easily learn 19

No need for others to re-implement code 14

Adoption of standard ways to do things 9

Page 16: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing workflows (4)

•Respondents did not offer very overwhelming reasons for not sharing workflows •Respondents did not offer very overwhelming reasons for not reusing workflows from others

IEEE eScience 2014. Guarujá, Brasil

16

Others would not want to use them 1

Others ask too many questions of the creators 2

Workflows from others are difficult to understand 3

It is difficult to understand how to prepare data for a workflow 3

Workflows from others are difficult to understand 4

It is difficult to understand how to prepare data for a workflow 2

Workflows created by others are too specific 1

It is hard to take workflows created by others and make them work 2

Page 17: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing groupings

•Reuse is not the only reason why groupings are created. Unlike workflows, reusing groupings from one’s own work is more useful than reusing groupings from others

IEEE eScience 2014. Guarujá, Brasil

17

Page 18: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing groupings (2)

•Most respondents agreed that groupings help simplify workflows. Groupings also make workflows more understandable by others •Other grouping benefits:

•Time savings •Help making modular and understandable code, more so than workflows •Seen as useful to non-programmers and students

IEEE eScience 2014. Guarujá, Brasil

18

Visualization of the analysis 10

To simplify workflows that are complex overall 12

To make workflows more understandable to others 12

Groupings save time 12

Help make code more modular/reusable 10

Help make methods more understandable 7

Page 19: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Reusing groupings (3)

Very few responses motivated any reasons for not sharing groupings or not reusing groupings from others

In general, workflows are considered generally more useful than groupings. On the other hand, more respondents said that groupings help make their code more modular and understandable

IEEE eScience 2014. Guarujá, Brasil

19

Others would not want to use them 0

Others ask too many questions of the creators 1

Workflows from others are difficult to understand 4

It is difficult to understand how to prepare data for a grouping 1

Groupings from others are difficult to understand 2

It is difficult to understand how to prepare data for a grouping 3

Groupings created by others are too specific 1

It is hard to take groupings created by others and make them work 4

Page 20: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Paper Writing

Workflows are not systematically linked to publications

•Most responders believe that the link between a workflow and a publication is kept in private laboratory notes, rather than in a publicly accessible manner

IEEE eScience 2014. Guarujá, Brasil

20

Page 21: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Discussion

Workflows have a clear benefit to the lab. There are important directions of future research suggested by this work: •Improve the use of groupings.

•If users had more assistance in specifying and finding groupings, it is possible that workflows and fragments would be more reused

•Debugging and checking results •Better mechanisms to handle checking intermediate execution results would allow users to define larger workflows

•Better documentation of workflows. •Documentation of workflows tends to be private and scattered, and not usually linked to papers

•Facilitating workflows publication and linking to papers •Papers provide important context and documentation for workflows

IEEE eScience 2014. Guarujá, Brasil

21

Page 22: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Conclusions

•Contributions:

•Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab •Quantitative survey of the benefits by workflow users

•Our work can be expanded by •Validating our findings with more respondents •Reflecting the experience level of the respondents on the questionnaire •Including statistics of the groupings usage on the workflows they create

•There are clear opportunities to develop best practices for designing workflow components and modularizing code, encouraging standards adoption, and facilitating understanding by other users

IEEE eScience 2014. Guarujá, Brasil

22

All materials used and the survey are available at:

http://purl.org/net/wfSurvey-eScience2014

Page 23: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

23

Who are we?

•Daniel Garijo, Oscar Corcho Ontology Engineering Group, UPM •Yolanda Gil Information Sciences Institute, USC •Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson

Arthur W. Toga. USC Laboratory of Neuro Imaging

IEEE eScience 2014. Guarujá, Brasil

Page 24: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

24

Questions?

IEEE eScience 2014. Guarujá, Brasil

Page 25: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

Date: 22/10/2014

Workflow Reuse in Practice:

A Study of Neuroimaging Pipeline Users

Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ, Meredith N. Braskieⱡ, Derrek Hibarⱡ, Xue Huaⱡ, Neda Jahanshadⱡ, Paul

Thompsonⱡ, and Arthur W. Togaⱡ

* Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute,

ⱡ USC Laboratory of Neuroimaging