Data Repositories and Science Gateways for Open Science Presenter: Roberto Barbera – UNICT and INFN EGI Community Forum Bari – 11 November 2015


Data Repositories and Science Gateways for Open Science

Presenter: Roberto Barbera – UNICT and INFN

EGI Community Forum Bari – 11 November 2015


Outline

Introductory concepts, definitions and driving considerations

A viable approach to Open Science

Summary and conclusions


The Scientific Method

• Examples of IR:
  • Classical Mechanics
  • Newton’s Gravitation Theory

• Examples of DR:
  • General Relativity
  • Standard Model of Particle Physics

G. Galilei


The Pillars of the Scientific Method

• Repeatability
  • The closeness of agreement between independent results obtained with the same method on identical test material, under the same conditions (same operator, same apparatus, same laboratory and after short intervals of time)
  • Affected by random errors

• Reproducibility
  • The closeness of agreement between independent results obtained with the same method on identical test material but under different conditions (different operators, different apparatus, different laboratories and/or after different intervals of time)
  • Affected by systematic errors
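The difference between the two pillars can be illustrated numerically. The sketch below is not from the slides: the measurement values and the names `repeatability_sd` and `between_lab_sd` are invented for illustration. It assumes two laboratories measuring the same quantity, with the second one carrying a constant offset that stands in for a systematic error.

```python
from statistics import mean, stdev

def repeatability_sd(runs_by_lab):
    """Pooled within-lab standard deviation (same conditions, random errors)."""
    # Pool the per-lab variances, weighting each by its degrees of freedom.
    weighted = sum((len(r) - 1) * stdev(r) ** 2 for r in runs_by_lab.values())
    dof = sum(len(r) - 1 for r in runs_by_lab.values())
    return (weighted / dof) ** 0.5

def between_lab_sd(runs_by_lab):
    """Standard deviation of the lab means (different conditions)."""
    return stdev([mean(r) for r in runs_by_lab.values()])

# Hypothetical data: lab_b repeats lab_a's readings shifted by +0.10.
runs = {
    "lab_a": [9.81, 9.80, 9.82, 9.81],
    "lab_b": [9.91, 9.90, 9.92, 9.91],
}

print("within-lab spread:", round(repeatability_sd(runs), 4))
print("between-lab spread:", round(between_lab_sd(runs), 4))
```

With these invented numbers each laboratory looks very repeatable on its own, while the spread between laboratories is much larger. That gap only becomes visible when the measurement is repeated under different conditions.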

Is science really reproducible?


Challenges in irreproducible research (http://www.nature.com/nature/focus/reproducibility/index.html)


The “reproducibility crisis”

Out of 18 microarray papers, results from 10 could not be reproduced

1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950


Repeatability and Reproducibility are not all


How e-Infrastructures support the (e-)Scientific Method

[Figure labels: Data Infrastructures; Open Access Doc. Repos.; Data Repos.; Semantic-web enrichment of linked data; Data preservation; HTC/HPC Clusters; Grids, Clouds]

Challenge: «walk» across the knowledge path both ways


Open Science


An INFN approach to Open Science: the “grand” view

Digital Repository of Research Products (pilot: www.openaccessrepository.it)

[Figure labels: arXiv; CNR S&T DL; CINECA; VQR; INFN Multimedia; ORCID; INFN Gray Lit.; SCOAP3]

SINGLE – MANDATORY - DEPOSIT

SCIENCE PRODUCTS REPRODUCIBILITY


The INFN Open Access Repository (www.openaccessrepository.it)

papers

data

Automatic ingestion in place from:

federated authentication


Alternative reputation systems: possibility to add researcher IDs


Examples of document and data resources

Data stored on:


Example of software resources: the ALICE Virtual Research Environment


Example of research “package”


The OAR Knowledge Workflow


The OAR Knowledge Workflow: ALEPH data search & discovery


The OAR Knowledge Workflow: ALEPH “packages” inspection

1. From the OAR, an “analysis” can be selected as easily as any other resource in the archive

2. Clicking on RUN PAGE, the researcher can either reproduce or extend that particular analysis using a Catania Science Gateway


The OAR Knowledge Workflow: ALEPH data analysis (1/2)

The Science Gateway collects from the OAR the metadata associated with the dataset(s) needed to run that particular analysis, and lets the user browse them


The OAR Knowledge Workflow: ALEPH data analysis (2/2)

Data are retrieved from

Using the JSAGA adaptor for all OCCI-compliant cloud middleware, the Science Gateway starts a dedicated VM already configured with all the experiment software

Both the CHAIN-REDS Cloud Testbed and the EGI Federated Cloud can be used as e-Infrastructures

Jobs run both on and
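The workflow on the last three slides (select an analysis in the OAR, collect its dataset metadata, start a pre-configured VM on a cloud) can be pictured as a small data model. This is only a sketch under stated assumptions, not the gateway's actual code: `AnalysisPackage`, `RunRequest`, `plan_run`, the field names and the placeholder DOI are all hypothetical, and no real JSAGA or OCCI call is made.

```python
from dataclasses import dataclass

# The two e-Infrastructures named on the slide.
CLOUDS = ("CHAIN-REDS Cloud Testbed", "EGI Federated Cloud")

@dataclass(frozen=True)
class AnalysisPackage:
    """Hypothetical view of an 'analysis' record stored in the OAR."""
    doi: str
    title: str
    dataset_ids: tuple
    vm_image: str  # image already configured with the experiment software

@dataclass(frozen=True)
class RunRequest:
    """Hypothetical description of what the gateway would launch."""
    package_doi: str
    datasets: tuple
    vm_image: str
    cloud: str
    mode: str  # "reproduce" or "extend"

def plan_run(package, cloud, mode="reproduce"):
    """Turn a selected package into a launch request, validating the inputs."""
    if cloud not in CLOUDS:
        raise ValueError(f"unknown cloud: {cloud}")
    if mode not in ("reproduce", "extend"):
        raise ValueError(f"unknown mode: {mode}")
    return RunRequest(package.doi, package.dataset_ids,
                      package.vm_image, cloud, mode)

# Placeholder values, for illustration only.
package = AnalysisPackage(
    doi="10.0000/example-analysis",
    title="Example ALEPH analysis",
    dataset_ids=("example-dataset-1",),
    vm_image="example-aleph-image",
)
print(plan_run(package, "EGI Federated Cloud"))
```

Keeping the package immutable reflects the idea that a published analysis is a fixed, citable object; extending it would produce a new package rather than modify the old one.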


Remember: repeatability and reproducibility are not all

Reusability and «extensibility» matter!


Reusability of ALICE data with the CHAIN-REDS Science Gateway (1/3)

1. From within the CHAIN-REDS Science Gateway, entitled researchers can start VMs already configured to re-use/extend ALICE data analyses

2. The VMs are deployed both on the CHAIN-REDS Cloud Testbed and on the EGI Federated Cloud using the features of the EGI AppDB


Reusability of ALICE data with the CHAIN-REDS Science Gateway (2/3)

1. The VM is available for a customizable amount of time, during which the user has full access to the dataset(s), analysis algorithm(s) and source code of the experiment

2. The user can access the VM using different protocols (e.g., SSH, VNC); by clicking on the SSH or VNC icons, the user can directly access the VM instantiated on the cloud from within the Science Gateway


Reusability of ALICE data with the CHAIN-REDS Science Gateway (3/3)

New stable analyses (and their results), generated by running the VM, may be registered in the OAR (with DOIs) to further extend the analysis catalogue shared within the Virtual Research Community


“Whose science is this?”

How can authorship be attributed to research products?


ORCID (www.orcid.org – becoming a “de facto” standard)

More than 1.74 million ORCID IDs so far
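As background that is not on the slide: ORCID documents that the last character of an iD is a check digit computed with the ISO/IEC 7064 MOD 11-2 algorithm, so a repository can catch mistyped iDs before linking them to a research product. The sketch below assumes the plain hyphenated 16-character form; the function names are illustrative, and 0000-0002-1825-0097 is the fictitious sample iD used in ORCID's own documentation.

```python
DIGITS = "0123456789"

def orcid_check_digit(base_digits):
    """Check character for the first 15 digits (ISO/IEC 7064 MOD 11-2)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in DIGITS for ch in base):
        return False
    return orcid_check_digit(base) == check

print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

This only verifies that an iD is well formed. Confirming that it belongs to a given author still requires the ORCID registry itself.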


ORCID: search & link your works in/from DataCite


ORCID: add your research products to your profile



Summary and conclusions

The Open Science vision can be implemented only if the “openness” paradigm becomes pervasive in research

The reproducibility of science outputs, and also their reusability and extensibility, are key to walking the “knowledge path” in both directions

The INFN Open Access Repository is a pilot knowledge preservation repository meant to serve both researchers and citizen scientists

What makes the INFN OAR different from other repositories is:
  • Its capability to connect to Science Gateways and exploit cloud resources worldwide to easily reproduce/extend scientific analyses
  • Its capability to provide full authorship (and hence credit, reputation and visibility) for all products of a scientist; this is key for a correct evaluation of research (…and of researchers)


Authors

R. Barbera (University of Catania and INFN, Italy)

S. Bianco (INFN LNF, Italy)

T. Boccali (INFN Pisa, Italy)

C. Carrubba (University of Catania, Italy)

G. Inserra (University of Catania, Italy)

M. Maggi (INFN Bari, Italy)

D. Menasce (INFN Milano Bicocca, Italy)

R. Ricceri (University of Catania, Italy)


Thank you!