Ngsp

Next Generation Scientific Publishing: Challenges and Directions European Bioinformatics Institute 21 June 2013 Tim Clark Massachusetts General Hospital MassGeneral Institute of Neurodegenerative Disease Harvard Medical School © 2013 Massachusetts General Hospital


Transcript of Ngsp

Page 1: Ngsp

Next Generation Scientific Publishing:

Challenges and Directions

European Bioinformatics Institute 21 June 2013

Tim Clark

Massachusetts General Hospital

MassGeneral Institute of Neurodegenerative Disease

Harvard Medical School

© 2013 Massachusetts General Hospital

Page 2: Ngsp

Contents

•Historical background

•What is a scientific article?

•Some problems in scientific communication

•Next generation scientific publishing (NGSP)

•Taking NGSP forward

•Conclusion

Page 3: Ngsp

Historical background

Page 4: Ngsp

Linear document format

1665 2012

Page 5: Ngsp

Origins of linear format

•Linear format originated pre-1665 with personal correspondence amongst experimentalists & mathematicians.

•The 1665 scientific paper format was transported to the Web and to PDFs.

•Lives in a complex ecosystem

•Incomplete Web exploitation & transition

•Tension between linear & object formats

Page 6: Ngsp

circle @ Oxford 1640-59

circle @ Gresham College, London 1645-60

Royal Society 1660-present

“Invisible Colleges”

Page 7: Ngsp

Scientific journals

Royal Society 1660-present

Académie des Sciences 1666-present

Jan 1665

Mar 1665

Page 8: Ngsp

Then and now

Print culture: printing (c. 1450), General Post Office (1660), scientific journal (1665)

Web culture: IBM S/360 (1964), Internet (1980s), Web (1991)

Page 9: Ngsp
Page 10: Ngsp

Information Technology

the Web

the Internet

Page 11: Ngsp

Incomplete transition to Web

•Scientific article information model is limited, because it is mostly narrative.

•Critical information should ideally be computationally extractable and re-mixable.

•Yet as humans we require narratives.

•We need narratives + computable objects.

Page 12: Ngsp

What is a scientific article?

Page 13: Ngsp

Definition: A scientific article is a defeasible argument for assertions, based on a detailed narrative of observations, which are reproducible in principle, supported by exhibited data and supporting methods, and contextualized with other relevant findings in the domain. It exists in a complex ecosystem of technologies, people and activities.

Page 14: Ngsp

Defeasible argument

•May be challenged and proven wrong.

•May be “true” today but not tomorrow.

•Inference to best explanation (IBE), abductive reasoning (Peirce), etc.

•Defeasible reasoning is a big topic in AI.

Page 15: Ngsp

Exhibited data...

Philos Trans R Soc Lond 1(4):56

Brain. 2010 Nov;133(Pt 11):3336-3348.

(at least, enough to be convincing!)

Page 16: Ngsp

...and reproducible methods

Boyle’s air pump, from New Experiments (1660)

Illumina NGS system

Page 17: Ngsp

Scientific communications ecosystem

Page 18: Ngsp

Interlocking systems of activity

Page 19: Ngsp

Some problems in scientific communication c. 2013

Page 20: Ngsp

Some problems in the ecosystem

• Intractable publication volumes [1]

• Invalid, distorted and copied citations [3,4,5]

• Growing volume of retractions [5,6]

• 2/3 of retractions due to misconduct [7]

• Research non-reproducibility [8]

• Lack of transparency in publication process [9]

• Methods non-re-usability [10]

• Flawed assessment metrics [11-12]

Page 21: Ngsp

Non-reproducibility

11%

Begley CG and Ellis LM, Nature 2012, 483(7391):531-533

Page 22: Ngsp

Citation distortion

Adapted from supporting data, Greenberg SA, British Medical Journal 2009, 339:b2680

Page 23: Ngsp

The copied citation

• Citation analysis of one sample of publications (in ethnobotany) found that “the majority of citing texts do not consider the theoretical contributions made by the articles cited”.

• I.e., author of Work A makes statement, cites Work B, and then copies several references, unread, from Work B as well, assuming they are relevant too.

• Ramos et al. Scientometrics 2012, 92(3):711-719
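The copying pattern described above can be screened for mechanically. The following is a minimal Python sketch, assuming reference lists are already available as plain lists of identifiers; the function names and the toy data are hypothetical. An overlap is only a signal worth inspecting, not evidence that the references went unread.

```python
def shared_references(refs_a, refs_b):
    """Return the references that appear in both lists, sorted for stable output."""
    return sorted(set(refs_a) & set(refs_b))


def copy_candidates(refs_a, cited_work, reference_lists):
    """List references of Work A that also appear in the reference list
    of a work that A cites (excluding the cited work itself)."""
    refs_b = reference_lists.get(cited_work, [])
    return [r for r in shared_references(refs_a, refs_b) if r != cited_work]


# Toy data (hypothetical): Work A cites B, C, D and F; Work B cites C, D and E.
reference_lists = {"B": ["C", "D", "E"]}
refs_a = ["B", "C", "D", "F"]

candidates = copy_candidates(refs_a, "B", reference_lists)
print(candidates)
```

In this toy case C and D are flagged, since they occur both in Work A and in the reference list of Work B, which A cites.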

Page 24: Ngsp

Not to mention...

•Closed access publishing model

  •Walled garden systems,

  •Text mining & remixing prohibitions, and

  •Insane rising costs imposed on libraries.

•Open access publishing model

  •Researcher cost burden unaccounted for by funding agencies.

Page 25: Ngsp

Some efforts at coping

• Mandatory open access (US, UK, Universities)

• Data access: archiving and citation, institutional data policies, “data papers”, etc. (various)

• Methods: cataloging & annotation (NIF, publishers)

• Open annotation (W3C Community) & tools

• Velocity: Alzforum, StemBook, Open Wetware, blogs, webinars, Wikipedia coordination, etc.

• Velocity: preprint servers (ArXiv, DASH, PMC, etc.)

• Advocacy groups: FORCE11, DELSA, DORA, Amsterdam Manifesto, etc.

Page 26: Ngsp

Next Generation Scientific Publishing

Page 27: Ngsp

What does NextGen Scientific Publishing look like?

•There is transparency of all data & methods.

•Big data + small data (the very long tail).

•Articles are deconstructable * text-minable * remixable * computable.

•Information moves quickly and is verifiable.

•Open annotation for narrative + objects.

•There are no walled gardens: a service-oriented open-access economy.

Page 28: Ngsp

Data re-usability

• The main reason to exhibit data is not necessarily to reuse it... it is (minimally) to prove that

  1. you have it and are willing to show it,

  2. it is reasonable to think that you derived it as you say you did, and you openly share these methods.

• Data that is re-usable is special:

• Re-usable data is itself a research method with its own special requirements.

• See: Data Papers.

Page 29: Ngsp

Data papers

•Data should be surfaced in a re-usable way.

•Incentivize the extra effort required.

•Concept being developed by a few publishers with differing implementation ideas.

•Questions: what is reusability? at what level?

Page 30: Ngsp

Our Data Papers requirements

•Only inherently reusable data is published as a Data Paper

•Normalize identifiers

•Reverse normal “ratio” of text:data

•Amsterdam data citation principles

•All data is searchable w/ or w/o the paper

•Global metadata catalog in stable archive
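As one illustration of the "normalize identifiers" requirement, here is a minimal Python sketch that reduces several common spellings of a DOI to a single bare, lower-case form. The function name and the list of prefixes are assumptions made for this example, not part of any Data Papers specification; the only fact relied on is that DOIs are case-insensitive.

```python
from urllib.parse import unquote

# Prefixes under which the same DOI commonly appears (illustrative, not exhaustive).
_DOI_PREFIXES = (
    "info:doi/",
    "info:doi:",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "https://doi.org/",
    "http://doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return a bare, lower-case DOI string for a few common input forms."""
    text = unquote(raw.strip())      # undo percent-encoding such as %2F
    lowered = text.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    return text.lower()              # DOIs are case-insensitive


variants = [
    "info:doi/10.1371/journal.pone.0009979",
    "doi:10.1371/JOURNAL.PONE.0009979",
    "http://dx.doi.org/10.1371/journal.pone.0009979",
    "info%3Adoi%2F10.1371%2Fjournal.pone.0009979",
]
normalized = {normalize_doi(v) for v in variants}
print(normalized)
```

With a normalized key, the same dataset or article can be matched across catalogs regardless of how each source happened to write the identifier.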

Page 31: Ngsp
Page 32: Ngsp
Page 33: Ngsp

Methods re-usability

•Open methods are the basis of science.

•“Standing on the shoulders of giants” = reusing maths, software, instruments, reagents, models, protocols, etc.

•But method citations can be very obscure; you cannot reuse a secret.

•See: alchemy, necromancy, divination.

Page 34: Ngsp
Page 35: Ngsp

Computational semantics

•Entity-extraction: NIF, Utopia, etc.

•Topic-based: Threads

•Statement-based: SWAN, nanopublications

•Argument-based: micropublications

Page 36: Ngsp
Page 37: Ngsp

Open annotation

• Open model

• Annotate any web document

• Transferable, selectively sharable

• Highlights, comments, semantics, video

• Entities, topics, statements, arguments

• W3C Open Annotation Community

• http://www.w3.org/community/openannotation/

Page 38: Ngsp

Open annotation model
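To make the model concrete, here is a minimal Python sketch that builds one annotation as a JSON-LD-style dictionary: a comment (the body) attached to a quoted text span in a web document (the target). The property names follow my reading of the 2013 Open Annotation community draft and should be checked against the specification; the helper name, the example URL and the example text are hypothetical.

```python
import json

OA_NS = "http://www.w3.org/ns/oa#"            # Open Annotation namespace
CNT_NS = "http://www.w3.org/2011/content#"    # Content-in-RDF namespace


def make_annotation(target_url, exact, comment, prefix="", suffix=""):
    """Build an annotation: a textual body attached to a quoted span of a document."""
    return {
        "@context": {"oa": OA_NS, "cnt": CNT_NS},
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:commenting"},
        # Body: the comment itself, embedded as text.
        "oa:hasBody": {"@type": "cnt:ContentAsText", "cnt:chars": comment},
        # Target: a specific segment of the source, located by quoting its text.
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": target_url},
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:exact": exact,
                "oa:prefix": prefix,
                "oa:suffix": suffix,
            },
        },
    }


annotation = make_annotation(
    target_url="http://example.org/article.html",   # hypothetical document
    exact="improved learning",
    comment="Which figure supports this?",
    prefix="showed ",
    suffix=" and memory",
)
print(json.dumps(annotation, indent=2))
```

Because the annotation lives outside the annotated document and only points into it, it can be stored, transferred and shared independently of the publisher's site.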

Page 39: Ngsp

Complex annotation

Page 40: Ngsp

Discussion as annotation

Page 41: Ngsp

Annotation tools

Page 42: Ngsp
Page 43: Ngsp

Creating digital abstracts in Domeo

Page 44: Ngsp

Digital article summary

Page 45: Ngsp

Digital article summary

# Micropublication header: who authored the summary and which claim it argues.
{
  :MP3 rdf:type mp:Micropublication ;
    mp:name "MP(a3)" ;
    mp:description "Digital summary of Spillman et al. 2010" ;
    pav:authoredBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
    pav:createdBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
    pav:createdOn "2013-03-06T09:49:12-05:00"^^xsd:dateTime ;
    mp:argues :C3 ;
    mp:supportedBy <info:doi/10.1371/journal.pone.0009979> .
} .

# Micropublication content: statements, methods, data and the claim they support.
:MP3 = {
  :S1 rdf:type mp:Statement ;
    mp:hasContent "Rapamycin [is] an inhibitor of the mTOR pathway." ;
    mp:supportedBy <info:doi/10.1038/nature08221> .

  :S2 rdf:type mp:Statement ;
    mp:hasContent "PDAPP mice accumulate soluble and deposited Aβ and develop AD-like synaptic deficits as well as cognitive impairment and hippocampal atrophy." ;
    mp:supportedBy <info:doi/10.1073/pnas.96.6.3228> .

  :S3 rdf:type mp:Statement ;
    mp:hasContent "Rapamycin-fed transgenic PDAPP mice showed improved learning (Figure 1a) and memory (Figure 1b). We observed significant deficits in learning and memory in control-fed transgenic PDAPP animals." ;
    mp:supportedBy <http://www.jneurosci.org/content/20/11/4050> .

  :M1 rdf:type mp:Procedure ;
    mp:hasName "Rapamycin-supplemented mouse diet protocol" ;
    mp:hasContent "We fed a rapamycin-supplemented diet... or control chow to groups of PDAPP mice and littermate non-transgenic controls for 13 weeks. At the end of treatment (7 mo), learning and memory were tested using the Morris water maze." .

  :M2 rdf:type mp:Material ;
    mp:hasName "PDAPP J20" ;
    mp:hasDescription "Lennart Mucke's PDAPP J20 transgenic mice, as obtained from JAX, stock#006293" ;
    mp:describedBy <http://jaxmice.jax.org/strain/006293.html> .

  :D1 rdf:type mp:Data ;
    pav:retrievedFrom <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0009979#pone-0009979-g001> ;
    mp:supportedBy :M1, :M2 .

  # The claim rests on the three statements and the exhibited data.
  :C3 rdf:type mp:Claim ;
    mp:hasContent "Inhibition of mTOR by rapamycin can slow or block AD progression in a transgenic mouse model of the disease." ;
    mp:supportedBy :S1, :S2, :S3, :D1 .
} .
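One thing such a summary makes computable is the chain of support behind a claim. The Python sketch below copies the mp:supportedBy links from the listing above into a plain dictionary and walks from the claim down to whatever it ultimately rests on. The dictionary encoding and the function name are assumptions made for illustration; a real application would more likely parse the RDF with an RDF library.

```python
# mp:supportedBy edges transcribed from the listing above.
SUPPORTED_BY = {
    ":C3": [":S1", ":S2", ":S3", ":D1"],
    ":S1": ["info:doi/10.1038/nature08221"],
    ":S2": ["info:doi/10.1073/pnas.96.6.3228"],
    ":S3": ["http://www.jneurosci.org/content/20/11/4050"],
    ":D1": [":M1", ":M2"],
}


def support_leaves(node, graph, seen=None):
    """Return, in depth-first order, the nodes with no further support
    that are reachable from `node` by following supportedBy edges."""
    if seen is None:
        seen = set()
    if node in seen:          # guard against cycles and repeated visits
        return []
    seen.add(node)
    children = graph.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(support_leaves(child, graph, seen))
    return leaves


foundations = support_leaves(":C3", SUPPORTED_BY)
for item in foundations:
    print(item)
```

For claim :C3 the walk ends at three cited publications plus the method :M1 and the material :M2, which is the kind of question ("what does this claim ultimately rest on?") that a purely narrative article cannot answer without a human reader.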

Page 46: Ngsp

Mixing nano, micro, entities, topics

Page 47: Ngsp

Navigable citation networks

Figure from Greenberg SA, British Medical Journal 2009, 339:b2680

Page 48: Ngsp

Taking NGSP forward

Page 49: Ngsp

FORCE11: The Future of Research Communications and eScholarship

• Open community of scholars, librarians, archivists, publishers and research funders.

• Goal is to facilitate more rapid change & improvement in scholarly communications through effective use of information technologies.

• Founded 2011 at a workshop held at Leibniz Zentrum für Informatik, Schloss Dagstuhl, DE.

• Check it out & join online at http://force11.org

Page 50: Ngsp

Summary

•Incomplete transition of scientific publishing to the Web

•Big problems with the current system

•NextGen Scientific Publishing will be:

  •open, transparent, remixable, fast

  •and we will annotate it on the Web.

Page 51: Ngsp

Acknowledgements

• Lab: Paolo Ciccarese, Stephane Corlosquet, Sudeshna Das, Patti Davis, Emily Merrill, Marco Ocana

• Collaborators: Brad Allen, Neil Andrews, Anita Bandrowski, Phil Bourne, Suzanne Brewerton, Monika Byrne, Merce Crosas, Anita De Waard, Lisa Girard, Carole Goble, Tudor Grosza, Paul Groth, Keith Gutfreund, Hamed Hassanzadeh, Ivan Herman, Brad Hyman, Adrian Ivinson, Derek Marren, Maryann Martone, Pat McCaffery, Steve Pettifer, Brock Reeve, Rob Sanderson, Holly Schmidt, Herbert Van de Sompel and Thomas Wilkin; and our colleagues at the Mass. Alzheimer Disease Research Center

• Funding: Eli Lilly, Elsevier, Harvard Neuro Discovery Center, Harvard Stem Cell Institute, EMD Serono, NIH (NIA, NIDA), and two anonymous foundations.

• Very special thanks to: Carole Goble & Brad Hyman

Page 52: Ngsp

References

1. Hunter L, Cohen KB: Biomedical language processing: what's beyond PubMed? Molecular Cell 2006, 21(5):589-594.

2. Greenberg SA: How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680.

3. Greenberg SA: Understanding belief using citation networks. Journal of Evaluation in Clinical Practice 2011, 17(2):389-393.

4. Ramos M, Melo J, Albuquerque U: Citation behavior in popular scientific papers: what is behind obscure citations? The case of ethnobotany. Scientometrics 2012, 92(3):711-719.

5. Lawless J: The bad science scandal: how fact-fabrication is damaging UK's global name for research. In: The Independent. 2013.

6. Van Noorden R: Science publishing: The trouble with retractions. Nature 2011, 478:26-28.

7. Fang FC, et al: Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences 2012, 109(42):17028-17033.

8. Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. Nature 2012, 483(7391):531-533.

9. Marcus A, Oransky I: Bring On the Transparency Index. In: The Scientist. Midland, Ontario, CA: LabX Media Group; 2012.

10. Bandrowski AE, et al: A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database 2012: bas005.

11. Schekman R, Patterson M: Reforming research assessment. eLife 2013, 2.

12. Alberts B: Impact Factor Distortions. Science 2013, 340(6134):787.