Make the World a Better Place through Reproducible Research Roger D. Peng Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Wall of Wonder 2006-05-12

Make the World a Better Place through Reproducible Research

Roger D. Peng

Department of Biostatistics

Johns Hopkins Bloomberg School of Public Health

Wall of Wonder

2006-05-12

Trends in Scientific Research

• Signal-to-noise in many investigations is getting smaller

• Smaller relative risks, e.g. the relative risk of mortality is 1.005 per 10 ppb of ozone

• High-throughput measurement technologies

• Powerful computers
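
To give a sense of how small the quoted ozone effect is, the sketch below converts a relative risk into a percent increase in risk. The function names are hypothetical, and rescaling to a different exposure increment assumes a log-linear exposure-response relationship, which the slides do not state.

```python
def excess_risk_percent(relative_risk: float) -> float:
    """Percent increase in risk implied by a relative risk."""
    return (relative_risk - 1.0) * 100.0


def scale_relative_risk(relative_risk: float, per_unit: float, increment: float) -> float:
    """Rescale a relative risk to a different exposure increment.

    Assumes a log-linear exposure-response relationship (an assumption
    made for this illustration, not a claim from the slides).
    """
    return relative_risk ** (increment / per_unit)


rr_per_10ppb = 1.005  # figure quoted on the slide
print(f"{excess_risk_percent(rr_per_10ppb):.2f}% excess risk per 10 ppb")

# Hypothetical 20 ppb increment, under the log-linear assumption above.
rr_per_20ppb = scale_relative_risk(rr_per_10ppb, per_unit=10, increment=20)
print(f"{excess_risk_percent(rr_per_20ppb):.2f}% excess risk per 20 ppb")
```

An effect of about half a percent per 10 ppb is the kind of subtle signal that needs large databases to detect.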

Trends in Computing: Then...

...And Now

The Result?

• Large databases for investigating subtle associations

• Interactive computing with advanced statistical algorithms

• Sophisticated searches across models and variables to identify important risks

• Bigger and better studies

Replication: The Standard

• Scientific evidence is strengthened when important findings are replicated by independent investigators, data, methods, laboratories, instruments, etc.

• Replication is often not possible because of time, funding constraints

• Policy decisions must often be made with evidence at hand

Reproducible Research: A Minimum Standard

Published research where the following are made available:

• Analytic data

• Computer code implementing methods

• Documentation about code/data

All are distributed using standard means
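
One way to make such a release checkable is to ship a manifest alongside it. The sketch below is only an illustration: the `data/`, `code/`, `docs/` folder layout, the function names, and the demo file names are all hypothetical conventions, not something the talk prescribes. It lists each file with a SHA-256 checksum and reports which of the three components is missing.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path) -> dict:
    """List every file under data/, code/ and docs/ with size and checksum.

    The three-folder layout is a hypothetical convention for this sketch.
    """
    manifest = {}
    for section in ("data", "code", "docs"):
        folder = root / section
        files = sorted(p for p in folder.rglob("*") if p.is_file()) if folder.is_dir() else []
        manifest[section] = [
            {
                "path": p.relative_to(root).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ]
    return manifest


def missing_sections(manifest: dict) -> list:
    """Names of the sections that contain no files."""
    return [name for name, files in manifest.items() if not files]


# Demo on a throwaway directory with made-up files (no docs/ on purpose).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "analytic.csv").write_text("city,rate\nA,1.005\n")
    (root / "code").mkdir()
    (root / "code" / "model.R").write_text("# fit model\n")
    manifest = build_manifest(root)
    print(json.dumps(manifest, indent=2))
    print("missing:", missing_sections(manifest))
```

A reader who downloads the release can rebuild the manifest and compare checksums to confirm they have the same analytic data and code the authors used.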

Benefits of Reproducible Research

• Published findings can be verified

• Alternative analyses can be conducted

• Challenge uninformed criticisms (“put up or shut up”)

• Expedite exchange of ideas among investigators

Challenges to RR

• “If I give away my data, others will publish results and scoop me”

• “I own my data and ideas, other people don’t necessarily have any rights to them”

Why should I just give away my intellectual property?

Ya see, it’s what I call the “ownership society”

Property

[Automatt] [JRodrigues] [james.thompson] [nervsappy]

“Intellectual Property”

• “the intangible value created by human creativity and invention” – from JHSPH Office of Technology Transfer (emphasis added)

• How can something that is intangible be property?

There’s No Such Thing as “Intellectual Property”

• If I copy your book, you still have your book

• If I use your idea, you still have your idea

• If I copy your data, you still have your data

• If I use your statistical model, you still have your statistical model

• If I implement your algorithm, you still have your algorithm

• etc.

What are the Potential Gains and Losses from Sharing Data?

[Figure: research produced from a dataset under two scenarios]

• Share = Y(1): research done by you regardless of sharing, plus research done by others

• Don’t share = Y(0): research done by you regardless of sharing, plus research you would have done if you hadn’t shared data

D = Y(1) - Y(0)

(a) D = 0 (b) D < 0 (c) D > 0
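
The comparison can be sketched numerically. This is one reading of the diagram, with assumptions flagged: it treats each outcome as an additive count of research products, and the function names and the numbers are hypothetical. Under that reading, the research you do regardless of sharing cancels out, so D depends only on what others produce versus what you forgo.

```python
def sharing_effect(own_regardless: float, by_others: float, own_forgone: float) -> float:
    """Return D = Y(1) - Y(0) under an additive reading of the diagram.

    own_regardless: research you do whether or not you share
    by_others:      research others do with your data if you share
    own_forgone:    research you would have done only if you hadn't shared
    """
    y_share = own_regardless + by_others        # Y(1)
    y_no_share = own_regardless + own_forgone   # Y(0)
    return y_share - y_no_share                 # D


def classify(d: float) -> str:
    """Map D onto the three cases listed on the slide."""
    if d == 0:
        return "(a) D = 0"
    return "(b) D < 0" if d < 0 else "(c) D > 0"


# Hypothetical counts of papers, for illustration only.
d = sharing_effect(own_regardless=5, by_others=3, own_forgone=1)
print(d, classify(d))
```

Only one of Y(1) and Y(0) is ever observed for a given dataset, so in practice D has to be argued for rather than measured directly.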

What is a Dataset?

• Represents already published findings and ideas

• Contains potential findings and ideas yet to be discovered and exploited

• Datasets do not fit well into the framework of copyrights and patents

What Do We Need?

• Infrastructure

  – Tools for researchers, developers

  – Repositories for datasets

  – Rights framework for datasets

• Privacy preservation

• Handle computer language Babel

• Structured research modularity

“WWKD”

What Would Karl Do?

Models for Reproducibility

An Example

http://www.biostat.jhsph.edu/MCAPS/

Partial Rights for Data? A First Cut

• Full access: the data can be used for any purpose

• Attribution: the data can be used for any purpose with a specific citation

• Share-alike: the data can be used for any purpose but any “improvements” must be made available under the same license

• Reproduction-only: the data can only be used for reproducing published results and commenting via a letter to the editor
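
The four tiers above could be encoded so that a data repository can answer "may I use this dataset for X?" mechanically. The following is a rough sketch only: the enum, the `check_use` function, and the two purpose labels are hypothetical names invented for illustration, and the conditions are paraphrased from the bullets.

```python
from enum import Enum


class DataLicense(Enum):
    FULL_ACCESS = "full access"
    ATTRIBUTION = "attribution"
    SHARE_ALIKE = "share-alike"
    REPRODUCTION_ONLY = "reproduction-only"


# Conditions attached to each tier, paraphrased from the slide.
CONDITIONS = {
    DataLicense.FULL_ACCESS: [],
    DataLicense.ATTRIBUTION: ["cite the specified reference"],
    DataLicense.SHARE_ALIKE: ["release improvements under the same license"],
    DataLicense.REPRODUCTION_ONLY: [],
}


def check_use(tier: DataLicense, purpose: str):
    """Return (allowed, conditions) for a proposed use of the data.

    `purpose` is a hypothetical label: "reproduce" or "new_analysis".
    """
    if purpose not in ("reproduce", "new_analysis"):
        raise ValueError(f"unknown purpose: {purpose}")
    if tier is DataLicense.REPRODUCTION_ONLY and purpose != "reproduce":
        return False, []
    return True, CONDITIONS[tier]


print(check_use(DataLicense.ATTRIBUTION, "new_analysis"))
print(check_use(DataLicense.REPRODUCTION_ONLY, "new_analysis"))
```

Keeping the tiers in a small fixed vocabulary, rather than free-text terms, is what would let repositories and tools act on them automatically.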

Thank you!