BioMed Central’s open data initiatives Alliance for Permanent Access conference 7th November 2012 Iain Hrynaszkiewicz Publisher (Open Science), BioMed Central iain.hrynaszkiewicz@biomedcentral.com @iainh_z

Page 1

BioMed Central’s open data initiatives

Alliance for Permanent Access conference, 7th November 2012

Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central
iain.hrynaszkiewicz@biomedcentral.com

@iainh_z

Page 2

About BioMed Central

• Launched in 2000, largest global publisher of peer-reviewed open access journals (>240)

• >136,000 peer-reviewed open access articles published

• Part of Springer Science+Business Media since 2008

• Publish using Creative Commons (CC-BY) licenses

• Non-journal products include ISRCTN database

• Interested in innovation and recognise the growing need for data sharing and publication

http://blogs.biomedcentral.com/bmcblog/tag/Open-Data/

Page 3

BioMed Central and open data

• Increasing transparency in scientific research and scholarly communication is at the core of strategy

• Data are an increasingly integral part of scholarly communication, with many opportunities for increasing the pace of knowledge discovery

• Publishers, particularly open access publishers, are well-placed to share information across domain boundaries

http://www.biomedcentral.com/about/access

“By ‘open data’ BioMed Central means that these data are freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. BioMed Central encourages the use of fully open formats wherever possible.”

Page 4

BioMed Central open data initiatives

• Data journals and article types

• Open Data Award

• Data hosting, citation, deposition and linking

• Lab notebook-journal integration (LabArchives)

• Data licensing

• Guidance and best practice, e.g. human subjects – confidentiality and consent

• Data formats and standards – efficient reuse

• Facilitation of data/text mining research

Page 5

Problem: Lack of credit/recognition for data sharing and publication

• In science, credit is everything, but incentives for data publication are still emerging

• Datasets are not generally as discoverable and citable as journal articles – yet

• Requirements for data sharing are field/location-specific

• Need more empirical evidence of the benefits of data publication for individual scientists

Page 6

Solution #1: Journals and article types enabling data publication

Data notes: “[B]riefly describe a biomedical data set or database, with the data being readily accessible and attributed to a source” http://bit.ly/y3Jb3b

Data notes: “[E]xceptional datasets deposited in our GigaScience repository that have been selected for further peer review” http://bit.ly/yPBsAA

Research: E.g. The International Stroke Trial database http://www.trialsjournal.com/content/12/1/101

Page 7

Solution #2: Open Data Award

“We ... recognize researchers who have ... demonstrated leadership in the sharing, standardization, publication, or re-use of biomedical research data.”

http://www.biomedcentral.com/researchawards/opendata

Page 8

Solution #3: Enable and encourage/require data citation

“References ... Only articles, datasets and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited …”

“Dataset with persistent identifier

Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. http://dx.doi.org/10.5524/100012.”

http://blogs.biomedcentral.com/bmcblog/2012/01/19/citing-and-linking-data-to-publications-more-journals-more-examples-more-impact/
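The dataset reference on this slide follows a simple pattern: authors, year, title, publisher, then a DOI link. As a rough illustration only, the sketch below assembles that pattern in Python. The class name DatasetCitation and its field names are hypothetical and are not part of any BioMed Central or GigaScience tooling; the field values are copied from the slide's example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the parts of a dataset reference."""
    authors: List[str]
    year: int
    title: str
    publisher: str
    doi: str

    def format(self) -> str:
        # Authors are separated by "; " as in the slide's example reference.
        author_text = "; ".join(self.authors)
        # The slide's example links the DOI through http://dx.doi.org/.
        return (
            f"{author_text} ({self.year}): {self.title}. "
            f"{self.publisher}. http://dx.doi.org/{self.doi}."
        )


# Values taken from the example reference shown on the slide.
sorghum = DatasetCitation(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y",
        "Dong, S-S", "Liu, T-F", "Jiang, S", "Ramachandran, S",
        "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)

print(sorghum.format())
```

Keeping the DOI as a separate field, rather than storing only the full URL, is a design choice of this sketch: it lets the same record be rendered with a different resolver prefix if a journal's style requires one.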

Page 9

Problem: Where can data be stored – permanently?

• Publishers not best placed to run repositories for long term preservation of large datasets

• Mirrors of publisher content not able to accept arbitrary amounts of additional data

• Many data repositories exist, but most are domain/location specific, and there are many different funding models, license agreements and persistent identifiers in use

Page 10

Solution #1: Journal with integrated database

Page 11

GigaScience publishes ‘big-data’ studies from the entire spectrum of life sciences

www.gigasciencejournal.com

www.biomedcentral.com

Editor-in-Chief: Laurie Goodman, BGI (USA)

Editor: Scott Edmunds, BGI (China)

Assistant Editor: Alexandra Basford, BGI (China)

Benefits

• The BGI is covering all APCs for the first year after launch

• Novel publishing format – manuscript publication and data hosting

• Assignment of data DOIs allows separate data citation

Page 12

http://gigadb.org/

Page 13

GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”… (see more)

Page 14

http://gigadb.org/

Page 15

Anatomy of a GigaScience Publication

[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]

Page 16

Solution #2: Comprehensive author information on available data repositories

http://datacite.org/repolist

http://www.biomedcentral.com/about/supportingdata

Page 17

Solution #3: Research on repositories

http://publicationethics.org/files/u661/EthicalEditing_Autumn2012_final.pdf

We are looking for repositories with interests in clinical research data – can you help?

Page 18

Problem: Data are not consistently linked to publications

• Data deposition policies are not established in all fields

• Even where they are, links/accession numbers tend to be inconsistently presented and rarely cited

• Researchers may, independently of journal requirements, deposit data in repositories

• A missed opportunity to enhance the literature

Page 19

Solution #1: ‘Availability of supporting data’ article section

• A tool to put data deposition policies – encouraged or mandated – into practice

• Provides links in a consistent place within an article to supporting data, regardless of the location or format of the data

• Data must be permanently available (DOI or equivalent)

• ~50 journals including GigaScience, BMC series

http://www.biomedcentral.com/about/supportingdata

Page 20

Availability of supporting data

BMC Res Notes 2012, 5:21 http://www.biomedcentral.com/1756-0500/5/21/

GigaScience 2012, 1:3 http://www.gigasciencejournal.com/content/1/1/3

Page 21

Solution #3: Lab notebook integration

• BMC authors are entitled to LabArchives’ (http://www.labarchives.com/bmc) online lab notebook with 100 MB of free storage

• Features include:
- Data publishing with DOI assignment
- Citable, linkable data supporting publications
- Reusable/integrate-able data with CC0 waiver
- Integrated manuscript submission to BMC journals
- Additional free storage (standard is 25 MB)

http://blogs.openaccesscentral.com/blogs/bmcblog/entry/labarchives_and_biomed_central_a

Page 22

LabArchives partnership

Page 23

24 Oct 2012

Open data partnership leads to release of data from Nobel Prize-winning laboratory for public use

http://www.biomedcentral.com/presscenter/pressreleases/20121024c

Page 24

Problem: Licensing that restricts efficient data integration and (re)use

“The data should be released in standardized formats without intellectual property constraints.” Conway PH, VanLare JM: Improving Access to Health Care Data: The Open Government Strategy. JAMA 2010;304(9):1007-1008.

http://pantonprinciples.org/

http://www.isitopendata.org/

“[P]eople mis-use copyright licenses on uncopyrightable materials and data sets: the confusion of the legal right of attribution in copyright with the academic and professional norm of citation of one's efforts.” John Wilbanks, VP, Science, Creative Commons, http://bit.ly/djl5Fa August 11, 2010

“...any restrictions on use should be strongly resisted and we endorse explicit encouragement of open sharing.” Schofield et al.: Post-publication sharing of data and tools. Nature 2009, 461:171.

Page 25

Why Creative Commons CC0?

• interoperability: CC0 is human and machine-readable

• universality: CC0 is global and universal and widely recognized

• simplicity: no need for humans to make, and respond to, individual data requests – avoids “attribution stacking” with CC-BY licenses

Schaeffer P: Why does Dryad use CC0? http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/

http://creativecommons.org/publicdomain/zero/1.0/

Page 26

Solution: Stakeholder engagement and community collaboration, leadership

Page 27

Public consultation on implementing CC0 for data published in open access journals: closes 10th November 2012

http://blogs.biomedcentral.com/bmcblog/2012/09/10/put-the-open-in-open-data/

Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 2012, 5:494 http://www.biomedcentral.com/1756-0500/5/494

Page 28

Implementing CC0 in journals – how?

• Specify a date from which the new license would apply to data (CC-BY remains for other content)

• Only applies to data submitted to the journal

• Some relatively minor technical and operational implications

• Cultural change may be the biggest challenge

• Consultation is identifying common concerns, FAQs, and further definitions and use cases for open data in journal publications

Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 2012, 5:494 http://www.biomedcentral.com/1756-0500/5/494

Page 29

Problem: Lack of guidance, exemplars, incentives to make data reusable

• Sharing/publishing detailed human subjects data, in the absence of explicit consent, can potentially infringe privacy (ethically and legally)

• Data are more (re)usable if published in community endorsed, standard formats

• Standards and appropriate guidance do not yet exist in all domains

• Few incentives to follow data standards

Page 30

Solution #1: Work with journal editors to produce guidance where it is needed

BMJ 2010;340:c181

Co-published in: Trials 2010, 11:9

Page 31

Solution #2: Publish exemplars

Page 32

Solution #2: Publish exemplars

Page 33

Solution #3: Incentivize, promote and share best practice and standards

http://www.biomedcentral.com/bmcresnotes/series/datasharing

http://biosharing.org/standards_view

Page 34

Problem: Adding value to data of use to researchers, readers and publishers

• Text/data mining applications often are research project or research specific and not always attractive to commercial publishing platforms and their customers

• Value to the non-expert can be limited

• Makes business model/case challenging for publishers

Page 35

http://www.biomedcentral.com/about/datamining/

Page 36

www.casesdatabase.com

Page 37

www.casesdatabase.com – coming soon

Page 38

www.casesdatabase.com – coming soon

Page 39

www.casesdatabase.com – coming soon

Page 40

The future...

Image adapted from Gillam et al: The Healthcare Singularity and the Age of Semantic Medicine. In The Fourth Paradigm (2009)

Page 41

Questions?

Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central
iain.hrynaszkiewicz@biomedcentral.com

http://www.mendeley.com/profiles/iain-hrynaszkiewicz/

http://uk.linkedin.com/in/iainhz

@iainh_z