Science20brussels osimo april2013
Science 2.0: discussing the best available evidence
David Osimo, Katarzyna Szkuta
Tech4i2 limited for DG RTD
23rd January 2013
Three stories
• Galaxy Zoo: Galaxy Zoo lets users classify galaxies. 150K volunteers had already classified more than 10 million images of galaxies, “as accurate as that done by astronomers”. The Galaxy Zoo project has produced 25+ scientific articles (since 2009).
• Synaptic Leap: a project to find an alternative drug treatment for schistosomiasis with fewer side effects. All data and experiments were published in an Electronic Lab Notebook, and a social network was activated. About 30 people, half of them from industry, participated. They identified a new process and resolving agent.
• Excel-gate: Reinhart & Rogoff, 2010: “as countries see debt/GDP going above 90%, growth slows dramatically”. The paper was used as the main theoretical justification for austerity. In 2013, after obtaining the original Excel file, Herndon et al. discovered a coding error, data gaps and unconventional weighting.
Science 2.0: much more than Open Access
Open access
Scientific blogs
Collaborative bibliographies
Alternative reputation systems
Citizen science
Open code
Open labbooks / workflows
Open annotation
Open data
Pre-print
Data-intensive
An emerging ecosystem of services and standards
Datadryad.org
Myexperiment.org
Runmycode.org
ArXiv
Sci-starter.com
Openannotation.org
Altmetric.com
Mendeley.com
Researchgate.com
Figshare.com
Roar.eprints.org
Growing at different speed
Trend | Status | Data
Pre-print | Mature | 694,000 articles in arXiv
Open access | Fast growing | Exponential growth of OA journals; 8-10% of scientific output is OA
Data-intensive | Fast growing | 52% of science authors deal with datasets larger than 1 GB
Citizen science | Medium growth | 650K Zooniverse users; 500 similar projects on SciStarter
Open data | Medium growth | 20% of scientists share data; 15% of journals require data sharing
Reference sharing | Medium growth | 2 million users of Mendeley reference-sharing tools
Open code | Sketchy growth | 21% of JASA articles make code available; 7% of journals require code
Open notebook | Sketchy growth | Isolated projects
Natural sciences outrank social sciences across all trends
Where the data goes now
> 50 M papers; 2 M scientists; 2 M papers/year
• The majority of data (90%?) is stored on local hard drives
• Some data (8%?) is stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 M; DataCite: 1.5 M
• A small portion of data (1-2%?) is stored in small, topic-focused data repositories: MiRB: 25 k; PetDB: 1.5 k; TAIR: 72.1 k; PDB: 88.3 k; SedDB: 0.6 k
Source: Anita De Waard 2013, http://www.slideshare.net/anitawaard/making-data-sharing-happen
Deep implications
• New scientific outputs and players: nanopublications, data and code; vertical disintegration of the value chain
• Greater role for inductive methods: everything becomes a Genome Project
• Scaling serendipity: big linked data, collaborative annotation, social networking and knowledge mining detect unexpected correlations on a massive scale
• Better science: reproducible and truly falsifiable research findings; earlier uncovering of mistakes
• More productive science: reusing data and products, crowdsourcing work, reducing time-to-publication
Europe can lead
• European scientific publishers are leading experimentation with new kinds of open and data-intensive services, e.g. Elsevier's “Article of the Future” project and AppsForScience competition, and data integration at Thieme (a small German publisher)
• Home to world-class science 2.0 startups: Mendeley and ResearchGate are global players in social networking for scientists; Digital Science recently acquired FigShare. Mendeley is used by about 2 million researchers and covers 65 million documents, vs 49 million in Thomson Reuters' commercial databases. Elsevier just bought Mendeley for 50 M Euros.
• Home to top citizen science initiatives (Galaxy Zoo was launched in Oxford; the ExCiteS group; the Citizen Cyberscience Centre)
• Funding agencies are active in new mandates on openness (e.g. Wellcome Trust, FP7): open access, open data
BUT the institutional framework is a bottleneck
• Researchers are reluctant to share data and code [1], and to provide open peer review
• Current career mechanisms are “publish or perish”. No reward for sharing.
• Publishing data and code requires additional work
• Publishing intermediate products can actually hinder publication/patenting: sharing is difficult in patent-intensive domains
• Funding mechanisms are too rigid, roadmap-based and evaluated on articles and patents
[1] Wicherts et al., 2011; Research Information Network, 2008; Campbell, 2002
Institutional failure and the case for public intervention
Benefits | Individual researchers | Institutions | Business | Publishers | Societal benefits
Open access | ++ | + | + | -- | ++
Open data | -- | -- | -- | + | ++
Open code | -- | -- | -- | = | ++
Citizen science | + | = | + | = | +
Alternative reputation systems | + | - | + | - | +
Data-intensive | + | + | + | + | ++
Social media | + | = | = | = | +
• Contradictions emerge between individual and societal benefits
• Research funders (and publishers) have high leverage on scientific institutions
How to grasp this opportunity?
• It’s not about adding a top-down, roadmap-based science 2.0 initiative to existing programmes
• It’s not about simply letting a thousand flowers bloom bottom-up
• It’s about nudging the right institutional re-arrangement (Perez) and the right system of incentives for the scientific value chain
Towards research policy 2.0
Recommendation | Inspiring example
Adopt more flexible reputation mechanisms for scientists | From 2013, NSF requires PIs to list research “products” rather than “publications”
Encourage sharing by regulation | Wellcome Trust mandatory data plan
Cover the costs of sharing intermediate output such as data | Gold open access publication costs to be covered in Horizon 2020
Develop innovative infrastructure, tools, methods and standards | Alternative reputation systems, Openannotation, Datadryad
Make IPR more flexible | Innocentive.com, Peertopatent.com
Increase open-ended funding systems | FET Open, UK Arts Council, inducement prizes
Collect better evidence | Dedicated data-gathering exercise (à la Pew)
Thanks
• Continue the discussion at science20study.wordpress.com
• Collect evidence and cases at groups.diigo.com/group/science-20
• Contact: @osimod
Backup
Emerging impact: a) more productive science
– Reusing the same datasets for multiple research projects: 50% of Hubble papers came from data re-users [1].
– Crowdsourcing work: “thousands recruited in months versus years and billions of data points per person, potential novel discovery in the patterns of large data sets, and the possibility of near real-time testing and application of new medical findings.” [2].
– “cut down the time it takes to go from lab to medicine by 10-15 years with Open Notebook Science”; “because of poor literature analysis tools 20-25% of the work done in his synthetic chemistry lab is unnecessary duplication or could be predicted to fail” [3]
– Faster circulation of high-quality ideas: 70% of publications discussed in blogs are from high-impact journals
– Open research solved one-third of a sample of problems that large and well-known R&D-intensive firms had been unsuccessful in solving internally [4]
[1] http://archive.stsci.edu/hst/bibliography/pubstat.html
[2] http://www.jmir.org/2012/2/e46/
[3] http://science.okfn.org/category/pubs/
[4] Lakhani et al., 2007
b) Better science
• Greater falsifiability (Popper): a move towards reproducible science, thanks to publishing data and code in addition to the article
• Mistaken findings are uncovered rapidly (Climategate 2009, or the microarray-based clinical trials underway at Duke University)
• Data sharing is associated with greater robustness of findings [1]. Sharing data and notes applies to failures as well as successes
• Especially important for computational science: “Computational science cannot be elevated to a third branch of the scientific method until it generates routinely verifiable knowledge” [2]
[1] Wicherts et al., 2011
[2] Donoho, Stodden, et al., 2009
c) Greater role of inductive methods
• “The end of theory”: “Here’s the evidence, now what is the hypothesis?”
• All science becomes computational. 38% of scientists spend more than 1/5 of their time developing software (Merali, 2010).
• Greater availability of data collection and datasets increases the usefulness of inductive methods, with the Genome Project as the new paradigm
d) Scaling serendipity
• From penicillin to the theory of relativity, serendipity has always been a core component of science
• Big linked data, collaborative annotation and knowledge mining of OA articles make it possible to detect unexpected correlations on a massive scale. Mendeley manages the bibliographies of 2 million scientists and uses them to suggest further reading.
• Emerging evidence that, for scholars, recommendation matters more than search in finding references. Social networking and recommendation systems allow scientists to “stumble upon” new evidence
• Successful solvers in open research solved problems at the boundary of, or outside, their fields of expertise [1]
[1] Lakhani et al., 2007
e) New outputs and players #beyondthepdf
• Nanopublications, datasets, code
• Integration of data and code with articles
• Reproducible papers and books
Emerging policies
• Funders and publishers have high leverage on researchers
• Increasing push towards Open Access from funders
• Journals and funding agencies increasingly require data submission and data management plans
• From 14 January 2013, NSF grant forms require PIs to list research “products” rather than “publications”
• Alternative metrics are emerging, such as altmetrics and download statistics
Towards research policy 2.0
Features
• Simplified proposals
• Rewarding solutions, not proposals
• Multi-stage
• Open priorities
• Flexible and open-ended (allowing for serendipity)
• Peer selection
• Reputation-based (funding not to the proposal but to the person)
• Multidisciplinarity by design
• Flexible IPR
• Short project time
• Accepting failure
• Transparency (open monitoring)
• Based on social network analysis
Examples
• Inducement prizes, e.g. http://www.heritagehealthprize.com
• Seed capital: http://www.ibbt.be/en/istart/our-istart-toolbox/iventure
• ERC: http://erc.europa.eu
• SBIR: http://www.sbir.gov
• FET Open: http://cordis.europa.eu/fp7/ict/fetopen/home_en.html
• SME: http://cordis.europa.eu/fetch?CALLER=PROGLINK_PARTNERS&ACTION=D&DOC=1&CAT=PROG&QUERY=012e7c324da6:39b1:49a0957c&RCN=862
• IBBT: www.ibbt.be
• Arts Council: http://www.artscouncil.org.uk/funding/grants-arts
• Banca dell’innovazione / Innovation Bank: http://italianvalley.wired.it/news/altri/perche-ci-serve-una-banca-nazionale-dell-innovazione.html