Importance and Challenges of Reproducible Research


May 2016 © 2016 IEEE


Vladimir Kanchev
vladimir.kanchev@ieee.org


* http://www.software.ac.uk/blog/2014-03-21-reproducible-research-impossible-dream

Slide 2

Slide 3

Agenda

1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion

Slide 4


Slide 5

Personal Introduction

• Defense of my Ph.D. thesis at TU-Sofia is pending
• Research in image/MR image segmentation
• Publications in peer-reviewed journals
• Some experience in industry

Slide 6


Slide 7

Introduction to Reproducible Research – Definitions

Reproducible Research (RR) is an approach aiming at complementing classical printed scientific articles with everything required to independently reproduce the results they present*. "Everything" covers:
• data
• computer codes
• a precise description of how the code was applied to the data

* Delescluse, Matthieu, et al. "Making neurophysiological data analysis reproducible: Why and how?" Journal of Physiology-Paris 106.3 (2012): 159-170.

Introduction to Reproducible Research – Definitions

Another definition (Signal Processing): "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures."* (D. Donoho)

* D. Donoho et al., “Reproducible Research in Computational Harmonic Analysis,” Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 8–18.

Slide 6

Slide 9

Introduction to Reproducible Research – Definitions

• Replication – independent people collect new data to verify the research* (Roger Peng). It is considered the scientific gold standard.

• Reproduction – independent people analyze the same data and produce the same result* (Roger Peng). The focus is on the validity of the data analysis.

* http://simplystatistics.org/2011/12/02/reproducible-research-in-computational-science/

Introduction to Reproducible Research – Definitions

* Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226.

Slide 8

Slide 11

Introduction to Reproducible Research – History

The RR “movement” grew out of what economists have called replication since the early 1980s and developed into what is now called reproducible research in computational data analysis. Today it is influenced by the open science and open source movements.

Slide 12

Introduction to Reproducible Research – Relation to scientific method

Steps of a scientific method*:
1. Define a question
2. Observe – gather information and resources
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw a conclusion
7. Publish results
8. Retest (reproduction by other researchers)

The steps related to Reproducible Research are in italic type.

* Crawford, S., & Stucki, L. (1990). Peer review and the changing research record. J Am Soc Info Science, 41, 223–228.

* https://scischol102.wordpress.com/category/science/

Slide 11

Slide 14

Introduction to Reproducible Research – Relation to scientific method

Principles of a scientific method:
1. Empirically testable
2. Replicable
3. Objective
4. Transparent
5. Falsifiable
6. Logically consistent

Slide 15

Introduction to Reproducible Research – Scheme

* http://www.biostat.jhsph.edu/~rpeng/research.html (mod.)

Slide 16

Introduction to Reproducible Research – Current situation

Current situation with RR in different fields:
• Medicine (cancer research), social sciences (psychology), etc.
  Replication/reproducibility crisis – the results of many scientific experiments are difficult or impossible to replicate
• Natural sciences
• Computer science

* Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.

Slide 15

Slide 18

Introduction to Reproducible Research – Current situation

Reproducibility in Medical imaging & Computer vision & Machine learning:
• Public test sets are available
• Code for most methods is available (papers from major conferences and journals)
• High pressure/workload on researchers to make their work reproducible

Slide 19

Introduction to Reproducible Research – Current situation

Reproducibility in Medical imaging & Computer vision & Machine learning (cont.):
• Benchmark comparison with other methods is compulsory
• Experiment automation
• Differences between the Medical imaging field and the Computer vision & Machine learning fields
Example: IPOL journal

Slide 20

Introduction to Reproducible Research – Reasons

Reasons for the reproducibility/replication crisis:
• “Publish or perish” culture – pressure to obtain publishable results
• Reluctance to make method code public – improving its quality takes additional time and effort
• Most non-CS graduate students are not taught software engineering and statistics

* Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.

Slide 21

Slide 22

Introduction to Reproducible Research – Reasons

Other problems:
• Insufficient description of the experiment in the publications
• Test datasets and the method code of the paper are not publicly available – common in social sciences
• The statistical methods used are open to malpractice – p-hacking (data dredging), failing to report non-significant tests, inclusion/exclusion of points/results until the desired result is achieved
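The data dredging problem named above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not taken from the talk: each simulated "study" compares two groups of pure noise on 20 unrelated outcomes with a simple z-test (known standard deviation of 1) and reports only the smallest p-value. The function names, the seed and the sample sizes are illustrative choices.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for equal means.

    Assumes both samples have the same size and a known standard deviation.
    """
    n = len(sample_a)
    z = (mean(sample_a) - mean(sample_b)) / (sigma * (2.0 / n) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def dredge(rng, n_outcomes=20, n=20):
    """Test many unrelated outcomes on pure noise and keep the best p-value."""
    p_values = []
    for _ in range(n_outcomes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p_values.append(two_sided_p(group_a, group_b))
    return min(p_values)

rng = random.Random(2016)  # fixed seed so the simulation itself is repeatable
n_studies = 500
hits = sum(dredge(rng) < 0.05 for _ in range(n_studies))
false_positive_rate = hits / n_studies
print(f"studies reporting a 'significant' effect: {false_positive_rate:.2f}")
```

There is no real effect anywhere in this data, yet most simulated studies find one. With 20 independent tests at the 0.05 level, the chance of at least one false positive is 1 - 0.95^20, roughly 0.64, which is why unreported tests matter.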

Slide 23

Introduction to Reproducible Research – Reasons

Problems with method code:
• Reproducibility issues – missing method data and code, method code errors, not all figures and tables are reproduced
• Documentation issues – missing README file, bad code documentation
• Programming style issues – bad coding style

* Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global change research require open science by individual researchers. Global Change Biology, 18(7), 2102-2110.

Slide 24

Introduction to Reproducible Research – Guidance (Biostatistics journal)

Authors should provide all data and code needed to reproduce all results, images and tables, together with:
• README file
• Consistent coding style and documentation
• Test data sets
• Simulations and random numbers
• General advice

* Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics, 10(3), 405-408.
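For the "simulations and random numbers" item, one common practice is to fix and publish the random seed. The sketch below is an illustration only and is not part of the journal's guidance: `simulate_mean` and the seed value are hypothetical, and a private `random.Random` instance is used so that other code cannot disturb the sequence.

```python
import random

def simulate_mean(seed, n=1000):
    """Monte Carlo estimate of the mean of n standard normal draws."""
    rng = random.Random(seed)  # private generator, independent of global state
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n

SEED = 12345  # hypothetical value; record it alongside the published results
first_run = simulate_mean(SEED)
second_run = simulate_mean(SEED)
print(first_run == second_run)
```

Two runs with the same seed agree exactly, so a reader who has the code and the seed can regenerate the same simulated numbers on the same Python version.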

Slide 25

Slide 26


Slide 27

Software tools

Recommended programs for achieving reproducibility:
• LaTeX (with a TeX editor)
• Version control systems – Git
• Make – pipeline

Literate programming concept (Knuth).
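The idea behind a Make pipeline listed above is that each output is rebuilt only when it is missing or older than its inputs. The following is a rough Python sketch of that rule, not a replacement for Make; `needs_rebuild` and `build` are hypothetical helper names introduced here for illustration.

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)

def build(target, dependencies, action):
    """Run action() only when the target is out of date, as Make does.

    Returns True if the action was run and False if the target was current.
    """
    if needs_rebuild(target, dependencies):
        action()
        return True
    return False
```

A call such as `build("figure1.png", ["data.csv", "plot.py"], make_figure)` (all three names are made up) would regenerate the figure only after the data or the plotting script has changed. Chaining such rules gives a pipeline in which every figure and table can be traced back to its inputs.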

Slide 28

Software tools

Matlab programming language:
• Matlab file exchange
• Proprietary Matlab toolboxes – disadvantages
• Examples of RR toolboxes – Wavelab, Sparselab
• Matlab publish – no literate programming support

Slide 29

Software tools

R programming language:
• RStudio – development environment for the R programming language
• Graphics packages, such as ggplot2
• Packages such as knitr or rmarkdown – literate programming support

Slide 30

Software tools

Python programming language:
• Many open scientific libraries available – scipy, numpy, etc.
• IPython notebook
• Sumatra package – saves parameter values, code state, output results and files

* ISMB/ECCB 2013 Keynote
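The kind of information the Sumatra bullet refers to (parameter values, code state, outputs) can be illustrated with a hand-rolled run record. This sketch does not use or imitate Sumatra's own API; `make_run_record`, the parameter names and the placeholder strings are assumptions made for illustration, using only the Python standard library.

```python
import hashlib
import json
import platform
import sys

def sha256_of_text(text):
    """Hex digest that changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_run_record(parameters, code_text, output_text):
    """Collect what is needed to repeat a run and to check its output."""
    return {
        "parameters": parameters,
        "code_sha256": sha256_of_text(code_text),
        "output_sha256": sha256_of_text(output_text),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }

record = make_run_record(
    parameters={"threshold": 0.5, "iterations": 100},  # hypothetical values
    code_text="print('method placeholder')",  # stands in for the script source
    output_text="result placeholder\n",  # stands in for the result file
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing one such record per experiment makes it possible to tell later which code version and which parameters produced a given figure or table.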

Slide 31

Slide 32


Slide 33

The context – personal experience

Making a current research project reproducible at the end of the process is not the best way...

* http://www.idiap.ch/~marcel/professional/BTAS_SS_2015.html

The context – personal experience

Difficulties with:
• Exact reproduction of all figures and results
• Exact parameter values setting
• Time to improve code quality and add documentation
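Two of the difficulties above, parameter settings and exact reproduction of results, can be reduced by keeping all parameters in one place and checking regenerated numbers against stored reference values. The sketch below assumes a toy computation (exponential smoothing of a ramp) in place of a real method; the names `PARAMETERS`, `REFERENCE`, `run_experiment` and `matches_reference` are hypothetical.

```python
import math

# Hypothetical parameter set, kept in one place instead of scattered in scripts.
PARAMETERS = {"smoothing": 0.25, "n_points": 5}

# Reference values as they would be stored next to the paper.
REFERENCE = [0.25, 0.6875, 1.265625, 1.94921875, 2.7119140625]

def run_experiment(params):
    """Stand-in for the real method: exponential smoothing of the ramp 1..n."""
    alpha = params["smoothing"]
    value, results = 0.0, []
    for x in range(1, params["n_points"] + 1):
        value = alpha * x + (1.0 - alpha) * value
        results.append(value)
    return results

def matches_reference(results, reference, rel_tol=1e-9):
    """True if every regenerated number agrees with the stored one."""
    return len(results) == len(reference) and all(
        math.isclose(r, ref, rel_tol=rel_tol) for r, ref in zip(results, reference)
    )

print(matches_reference(run_experiment(PARAMETERS), REFERENCE))
```

A relative tolerance is used instead of exact equality because floating-point results can differ in the last digits between machines and library versions. Running such a check from the start of a project is cheaper than reconstructing parameter values at the end.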

Slide 34

Slide 35

The context – personal experience

Motivation for achieving reproducibility:
• Better visibility of research
• More citations and higher impact
• Increased trust in research quality (outside academia, e.g. from industry)
• Help from readers of the publication with the improvement of the developed method

Slide 36


Slide 37

The situation in Bulgaria and abroad

RR in Bulgaria:
• Its adoption by the scientific community is still at an early stage
• Its principles need to be taught at undergraduate and graduate level
• Paper code and test datasets, in general, are not available online in most fields

Slide 38

The situation in Bulgaria and abroad

Advancing RR implementation would:
• Increase the impact of research conducted by Bulgarian researchers abroad
• Improve reputation and applicability – especially for people from industry
• Make quality work stand out faster and steadily improve lower-quality papers

Slide 39

The situation in Bulgaria and abroad

Advancing RR implementation would (cont.):
• Profit from the fast development of scientific computing, machine learning, data science, and AI
• Attract more bright young people to research (open source movement and open data)

Slide 40

The situation in Bulgaria and abroad

RR abroad:
• A major issue in the social and biomedical sciences
• An important criterion for manuscript evaluation by reviewers in many CS fields
• One of the major requirements of funding agencies abroad for the evaluation of project proposals

Slide 41


Slide 42

Additional resources for research and RR methods

MOOC courses:
1. Data Science Specialization (www.coursera.org) (Johns Hopkins University) – course 5, Reproducible Research
2. Methods and Statistics in Social Sciences Specialization (www.coursera.org) (University of Amsterdam)
3. Research Methods: An Engineering Approach (www.edx.org) (Wits University)
4. Research Data Management and Sharing (www.coursera.org) (The University of North Carolina at Chapel Hill & The University of Edinburgh)

Slide 43

Additional resources for research and RR methods

Software tools for RR:
1. Software Carpentry (www.Software-carpentry.org) – basic computing skills for researchers
2. Bootcamps – one- or two-day courses teaching coding and professional skills for researchers
3. MOOC courses – www.coursera.org, www.edx.org, www.udacity.org – for programming skills in R, Python, Matlab

Slide 44

Additional resources for research and RR methods

Books:
1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.) (2014). Implementing Reproducible Research. CRC Press.
2. Gandrud, C. (2013). Reproducible Research with R and RStudio. CRC Press.
3. Subramanian, G. (2015). Python Data Science Cookbook. Packt Publishing Ltd.
4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data Visualization Cookbook. Packt Publishing Ltd.

Slide 45


Slide 46

Discussion

Topics for discussion:
• What do you think about reproducibility in general?
• Have you already encountered RR in your work?
• How might the application of reproducibility impact your work as researchers, engineers, or programmers?

Slide 47

End