The Future of Research (Science and Technology)

The Future of Research (Science and Technology), Carole Goble, carole.goble@manchester.ac.uk, University of Manchester, UK, OMII-UK. British Library Board Awayday, 23rd September 2008


Talk by Carole Goble at the British Library Board Awayday 23rd September 2008


Page 1: The Future of Research (Science and Technology)

The Future of Research (Science and Technology)

Carole Goble
carole.goble@manchester.ac.uk

University of Manchester, UK
OMII-UK

British Library Board Awayday, 23rd September 2008

Page 2: The Future of Research (Science and Technology)
Page 3: The Future of Research (Science and Technology)

Acknowledgements

• David De Roure
• Michael McLennan
• Noshir Contractor
• Christine Borgman
• Tony Linde
• Cameron Neylon
• Duncan Hull
• Geoffrey Fox
• Malcolm Atkinson
• Jean-Claude Bradley
• Anne Trefethen
• Graham Cameron
• Phil Bourne
• Bertram Ludaescher
• Tim Wess
• Roger Barga
• Paul Fisher
• Jane Hunter
• Jeremy Frey
• Tony Hey
• Jim Hendler
• Bob Jones
• Liz Lyon
• Juliana Freire
• Domenico Talia
• Michael Nielsen
• Marco Roos
• Doug Kell
• Anthony Finkelstein
• Peter Murray-Rust
• Robert Tansley
• Michael Wilson
• Rob Tansley

Page 4: The Future of Research (Science and Technology)

http://www.genomics.liv.ac.uk/tryps/trypsindex.html

Trypanosomiasis in Africa

Andy Brass

Steve Kemp

Paul Fisher

Page 5: The Future of Research (Science and Technology)

Hypothesis driven research

Now we add
• Data driven
• Simulation / prediction driven
• Automated experiments
• Open “as you go” communication
• Team research
• New types of research output

Page 6: The Future of Research (Science and Technology)

Data intensive Science

• Data from observations
• Data from predictions through simulations and computer models
• Industrialised science

Page 7: The Future of Research (Science and Technology)

1070 databases, Nucleic Acids Research Jan 2008

(96 in Jan 2001)

• Proteomics
• Genomics
• Transcriptomics
• Protein sequence prediction
• Phenotypic studies
• Phylogeny
• Sequence analysis
• Protein Structure prediction
• Protein-protein interaction
• Metabolomics
• Model organism collections
• Systems Biology
• Epidemiology ….

Page 8: The Future of Research (Science and Technology)

Growth of data, regardless of discipline

• Raw, predicted, derived, combined, aggregated
• Curated to be annotated and enriched manually or automagically
• Interlinked

Page 9: The Future of Research (Science and Technology)

Large Hadron Collider

[Norbert Neumeister]

Page 10: The Future of Research (Science and Technology)

Why Data Intensive Science?

• New high throughput experimental methods (microarrays, combinatorial chemistry, sensor networks, earth observation, sky surveys, heroic experiments ….)
• Increasing scale, diversity and complexity of digital material processed separately and in combination.
• Commons based production
• über accessibility
• Heterogeneous, Autonomous and Volatile

Page 11: The Future of Research (Science and Technology)

Why Data Intensive Science?

• Small data.
• Spreadsheets.
• Personal lab books.
• Privately held.
• Increasingly publicly shared.
• Through the web
• Millions of them.
• Born digital.

Page 12: The Future of Research (Science and Technology)

Raw and Interpretive Data

• What is fact?
• Revision is constantly occurring. Even primary data can be revised.
• Science is interpretation
• Much of scientific data is secondary datasets of interpretative information.

[Diagram: primary data sets feed processed data, secondary curated data, secondary data and integrated data; capture details and processing details are recorded, and every stage is subject to update and revise cycles]

Page 13: The Future of Research (Science and Technology)
Page 14: The Future of Research (Science and Technology)

Data collection management

• Large scale community-wide global data centres – EBI, DDBJ, NCBI, NCI, CERN
• Institutional data centres and labs and individuals – precarious and uncertain.
• Role for data stewardship and preservation on behalf of the community
• Cloud data
• research-[email protected]

Page 15: The Future of Research (Science and Technology)

Not the end of theory!

• The prevalence of data and the rise of data intensive and data driven science add to the pool of hypothesis driven and theory driven research.

• It doesn’t replace it.

[Diagram: Data, Theory, Prediction, Hypothesis]

Page 16: The Future of Research (Science and Technology)

200

Genotype Phenotype

Metabolic pathways

Literature

[Paul Fisher]

Page 17: The Future of Research (Science and Technology)

• Large scale data collection from multiple sites throughout the world.

• The team’s own data and personal data sets.

• Analytical pipelines and automated workflows with intelligent intervention.

• Literature auto found and mined

• If manual: it's logged

• If automated: faster, systematic, repeatable, reduced bias, auto-logged, explicit, shareable

• Born digital
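
The "auto-logged, repeatable, shareable" property of an automated pipeline can be illustrated with a small sketch. This is only an illustration under stated assumptions: the helper `run_logged`, the `fingerprint` function and the two toy steps are hypothetical names invented here, not part of the talk or of any particular workflow system. The idea shown is that every step records what went in and what came out, so the run can be inspected, repeated and shared.

```python
import hashlib
import json


def fingerprint(value):
    """Return a short, stable hash of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_logged(steps, data, log):
    """Apply each (name, function) step in order and log every step.

    The log entry stores fingerprints of the input and output so that
    a later rerun can be compared against this one.
    """
    for name, func in steps:
        result = func(data)
        log.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data


# Hypothetical toy steps standing in for real analyses.
def drop_missing(values):
    return [v for v in values if v is not None]


def normalise(values):
    top = max(values)
    return [v / top for v in values]


PIPELINE = [("drop_missing", drop_missing), ("normalise", normalise)]

if __name__ == "__main__":
    log = []
    result = run_logged(PIPELINE, [2, None, 4, 8], log)
    print(result)
    print(json.dumps(log, indent=2))
```

Because the steps are deterministic, running the same pipeline on the same input yields an identical log, which is one simple way to check that an analysis is repeatable.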

Page 18: The Future of Research (Science and Technology)

http://www.myexperiment.org/workflows/172

Automated processing of library content

• PubMed contains ~17,787,763 articles to date

• Manual searching is tedious and frustrating

• It can be hard to find links between data and articles

• Conclusion? Machines will be reading the library.

• Link between cholesterol, patient trauma and parasite resistance in cattle revealed.

Paul Fisher
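
One very simple form of "machines reading the library" is counting which terms are mentioned together across many abstracts. The sketch below is an assumption-laden illustration, not the workflow referenced on this slide: the function name `cooccurrences` is hypothetical, the matching is plain substring search, and the three sentences are invented placeholders rather than real PubMed abstracts.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(abstracts, terms):
    """Count how many abstracts mention each pair of terms together."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(t for t in terms if t in lowered)
        counts.update(combinations(present, 2))
    return counts


# Invented placeholder sentences, NOT real abstracts or findings.
TOY_ABSTRACTS = [
    "Cholesterol levels change after trauma.",
    "Trauma response and cholesterol in cattle.",
    "Parasite resistance varies between cattle breeds.",
]
TERMS = ["cholesterol", "trauma", "parasite", "cattle"]

if __name__ == "__main__":
    for pair, n in cooccurrences(TOY_ABSTRACTS, TERMS).most_common():
        print(pair, n)
```

Pairs that co-occur often are candidates for a closer look by a researcher; real text mining would add proper tokenisation, synonym handling and statistical testing.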

Page 19: The Future of Research (Science and Technology)

Data driven research

• Was: Hypothesis to experiment to analyse the data
• Now: start with the data. There is so much data that is accessible.

[Diagram: cycle between Ideas / Hypothesis and Data, via Analysis / Deduction and Synthesis / Induction]

[Kell and Oliver]

Page 20: The Future of Research (Science and Technology)

Published. Eventually.

Page 21: The Future of Research (Science and Technology)

Reproducible, or rather “fully supported”
Transparent science, Composite research components

• Methods
• Lab Books
• Preprints
• Data
• Video
• Blogs
• Podcasts
• Codes
• Algorithms
• Models
• Presentations
• Ontologies
• Intermediate Results
• Related Articles
• Comments & Reviews
• Plans


Page 23: The Future of Research (Science and Technology)

Reproducible Science

means context, quality, trust

means easy access to the sources

Page 24: The Future of Research (Science and Technology)

Methods are Scientific commodities

• Scripts, workflows, simulations, experimental plans, statistical models, ...
• Repeatable, reproducible, comparable and reusable research.
• Sharing propagates expertise and builds reputation.

http://myexperiment.org

Page 25: The Future of Research (Science and Technology)

120 Simulation tools

1,200 Seminars, podcasts, etc.

77,000 Users worldwide

550 Contributors

Developed by the NSF Network for Computational Nanotechnology
Online since October 2002
[Michael McLennan]

http://nanoHUB.org

Page 26: The Future of Research (Science and Technology)

[Jean-Claude Bradley]

http://usefulchem.wikispaces.com/

Page 27: The Future of Research (Science and Technology)

BioLit

Seamless integration between data and publications

From the Public Library of Science people.

The Knowledge and Data Cycle

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

http://biolit.ucsd.edu

[Phil Bourne]

Page 28: The Future of Research (Science and Technology)

ICTP Trieste, December 10, 2007

[Phil Bourne]

Page 29: The Future of Research (Science and Technology)

[Phil Bourne]

Page 30: The Future of Research (Science and Technology)

The reproducible and interactive research documents*

Mixed stewardship research documents

The recombinant, compound research documents

The virtual research document

Multi-versioned, dynamic research document

*Papers, Books, whatever.

2020

Page 31: The Future of Research (Science and Technology)

Data, image, model, process, workflow, podcast, slideset*

Finding, citation, peer review, preservation, identity, versioning, security, privacy, copyright management, format authority

Authority on metadata descriptions
Propagation of descriptions

* Insert new research commodity type here

2020

Page 32: The Future of Research (Science and Technology)

What does this mean for library services?

Seamless interlinking of data, literature and other research commodities

Integrated search across external resources

Selective quality curation

Hell is other people’s (lack of) semantic metadata

2020

Page 33: The Future of Research (Science and Technology)

Collaboration

Page 34: The Future of Research (Science and Technology)

(Virtual) Team Research

• Research increasingly team-based
• Teams produce more highly cited research
• Team science is increasingly composed of co-authors located at different universities.
• “Virtual communities of scholars” produce higher impact work than comparable co-located teams or solo scientists.
• True for all fields and team sizes.

Studies of 19.9 million research articles over 5 decades as recorded in the Web of Science database, and an additional 2.1 million patent records from 1975-2005. Using the Web of Science database to analyze the collaboration arrangements of over 4,000,000 papers over a 30 year period. Sources: Wuchty, Jones, and Uzzi

Noshir Contractor

Page 36: The Future of Research (Science and Technology)

• Personal: log books and spreadsheets, file stores

• Group: shared data, methods, protocols, information, failures, insights, observations, know-how

• Born digital but not very digitally processable.

[Helen Hulme]

Page 37: The Future of Research (Science and Technology)

Virtual Research Environments 1

Collaboration Environments

Science Gateways to data and computing grids

Multi-authored document preparation

Page 38: The Future of Research (Science and Technology)

Multi-disciplinary

Proteomics

Classical Genetics / QTL studies

Animal Experts

Transcriptomics

Parasite Experts

Statistical modelling

Text Analysis

Image analysis

Health Epidemiology

Page 39: The Future of Research (Science and Technology)

Crossing boundaries
Interdisciplinary Support

• Expert finding

• Complementary experts swarming around a problem

• Transferring data, methods and know-how from one discipline to another, e.g. astronomy image analysis applied to cancer tissue microarrays

• How do you find relevant material that uses a different jargon in a different discipline organised to only suit its experts?

• Overlay and virtual journals are few and far between – e.g. the Virtual Journal of Quantum Information.

• Where is the overlay library?

Page 40: The Future of Research (Science and Technology)

Virtual Research Environments 2

Social Professional Networking

Expert finding

Page 41: The Future of Research (Science and Technology)

The BL’s Research Information Centre

Page 42: The Future of Research (Science and Technology)

Open Science

Collective Intelligence

Researcher participation

Commons based production

Sharing

Accelerated dissemination

Embedded in the researcher’s environment and work practices

Page 43: The Future of Research (Science and Technology)

“Long Tail” Science. “Hypo” Science

• Increased scale and diversity of scientific participation
  – The small research team.
  – Niche experts.
  – The citizen.

• Easier to work with, and get hold of, digital output.
  – Better tools.

• Scaling effects of peer review, social working and community curation.

Page 44: The Future of Research (Science and Technology)

Open content, services and software.

Social tools for the social process of science.

Page 45: The Future of Research (Science and Technology)

http://www.wikipathways.org/

Page 46: The Future of Research (Science and Technology)

[Duncan Hull]

Page 47: The Future of Research (Science and Technology)

Growth of open access scientists

digital natives, always online, hybrids

catalysts for change

[Phil Bourne]

Page 48: The Future of Research (Science and Technology)

Cameron Neylon’s chemistry notebook

Page 49: The Future of Research (Science and Technology)

Paul

Jo

Sharing reusable methods

Page 50: The Future of Research (Science and Technology)

Rewards / Fears

Competitive advantage.
Academic vanity.
Reputation.
Adoption.
Scrutiny.
Being scooped.
Misinterpretation.

New Reward Schemes

Page 51: The Future of Research (Science and Technology)

What is the role of the library?

Trusted curator
Trusted data manager
Quality arbiter
Knowledge disseminator
Format authority
Add value content provider
Metadata / controlled vocabulary provider
Add value service provider

2020

Page 52: The Future of Research (Science and Technology)

Services

Embedding into the Researcher’s Workflow

The Cloud

Page 53: The Future of Research (Science and Technology)

Personal Scientist-centric tooling

• We don’t come to the library, it comes to us.

• We don’t use just one library or one source.

• We don’t use just one tool!

• Library services embedded in our toolkits, workbenches, browsers, authoring tools.

Zotero Firefox plug-in

Page 54: The Future of Research (Science and Technology)

Hypothesis Construction from the Literature

Marco Roos, Scott Marshall, University of Amsterdam

Page 55: The Future of Research (Science and Technology)

http://info.scopus.com/scsearchapi/geoCitations/index.html

Page 56: The Future of Research (Science and Technology)

What does this mean for library services?
With not For

Opening up to researchers’ tools and research environments for discovery, management and curation of research commodities

Enabling and encouraging new services and new content to add new value

Removing obstacles to interoperating and sharing

Collaborate, don’t control

Page 57: The Future of Research (Science and Technology)

Give researchers tools and access to content

– They control their own software/data apparatus and their experiments.

– They are creative

Pervasive devices and the mixing up of virtual and real worlds

Page 58: The Future of Research (Science and Technology)

Prior to leaving home Paul, a Manchester graduate student, syncs his iPhone with the latest papers, delivered overnight by the library via a news syndication feed. On the bus he reviews the stream, selecting a paper close to his interest in HIV-1 proteases.

The data shows apparent anomalies with his own work, and the method, an automated script, looks suspect.

Being on-line he notices that a colleague in Madrid has also discovered the same paper through a blog discussion and they Instant Message, annotating the results together.

By the time the bus stops he has recomputed the results, proven the anomaly, made a rebuttal in the form of a pubcast to the Journal Editor, sent it to the journal and annotated the article with a comment and the pubcast.

Based on an original idea by Phil Bourne

Page 59: The Future of Research (Science and Technology)

http://research.microsoft.com/towards2020science/

Questions?

Page 60: The Future of Research (Science and Technology)

Extras

Page 61: The Future of Research (Science and Technology)

Other References

• Duncan Hull, Steve Pettifer, Doug Kell, Defrosting the digital library: bibliographic tools for the next generation web, to appear in PLoS Computational Biology
• Michael Nielsen, The Future of Science, http://michaelnielsen.org/blog/?p=448
• Philip Bourne, Will a biological database be different from a biological journal, PLoS Computational Biology 1(3), www.ploscompbiol.org
• James A. Evans, Electronic Publication and the Narrowing of Science and Scholarship, Science 18 July 2008: Vol. 321, no. 5887, pp. 395-399, http://www.sciencemag.org/cgi/content/abstract/321/5887/395
• James Hendler, Reinventing Academic Publishing, Editorials for IEEE Intelligent Systems
  http://www.mindswap.org/blog/2007/08/14/reinventing-academic-publishing-%E2%80%93-part-i/
  http://www.mindswap.org/blog/2007/11/23/reinventing-academic-publishing-%E2%80%93-part-ii/
  http://www.mindswap.org/blog/2008/01/03/reinventing-academic-publishing-%E2%80%93-part-iii/
• Cameron’s suggested open science blogs
  • http://www.earlham.edu/~peters/fos/2008/07/online-researchers-have-access-to-more.html
  • http://scienceblogs.com/clock/2008/07/electronic_publication_and_the.php
  • http://www.sennoma.net/main/archives/2008/07/an_open_access_partisans_view.php
  • http://openwetware.org/wiki/Science_2.0/Brainstorming
  • http://sciencex2.org/en/user/113/track
