The Future of Research (Science and Technology)
Carole [email protected]
University of Manchester, UK
OMII-UK
British Library Board Awayday 23rd September 2008
Acknowledgements
• David De Roure
• Michael McLennan
• Noshir Contractor
• Christine Borgman
• Tony Linde
• Cameron Neylon
• Duncan Hull
• Geoffrey Fox
• Malcolm Atkinson
• Jean-Claude Bradley
• Anne Trefethen
• Graham Cameron
• Phil Bourne
• Bertram Ludaescher
• Tim Wess
• Roger Barga
• Paul Fisher
• Jane Hunter
• Jeremy Frey
• Tony Hey
• Jim Hendler
• Bob Jones
• Liz Lyon
• Juliana Freire
• Domenico Talia
• Michael Nielsen
• Marco Roos
• Doug Kell
• Anthony Finkelstein
• Peter Murray-Rust
• Robert Tansley
• Michael Wilson
• Rob Tansley
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
Trypanosomiasis in Africa
Andy Brass, Steve Kemp, Paul Fisher
Hypothesis driven research
Now we add
Data driven
Simulation / prediction driven
Automated experiments
Open “as you go” communication
Team research
New types of research output
Data intensive Science
Data from observations
Data from predictions through simulations and computer models
Industrialised science
1070 databases, Nucleic Acids Research Jan 2008
(96 in Jan 2001)
• Proteomics
• Genomics
• Transcriptomics
• Protein sequence prediction
• Phenotypic studies
• Phylogeny
• Sequence analysis
• Protein Structure prediction
• Protein-protein interaction
• Metabolomics
• Model organism collections
• Systems Biology
• Epidemiology ….
Growth of data, regardless of discipline
• Raw, predicted, derived, combined, aggregated
• Curated to be annotated and enriched manually or automagically
• Interlinked
Large Hadron Collider
[Norbert Neumeister]
Why Data intensive Science?
• New high throughput experimental methods (microarrays, combinatorial chemistry, sensor networks, earth observation, sky surveys, heroic experiments ….)
• Increasing scale, diversity and complexity of digital material processed separately and in combination.
• Commons based production
• über accessibility
• Heterogeneous, Autonomous and Volatile
Why Data Intensive Science?
• Small data.
• Spreadsheets.
• Personal lab books.
• Privately held.
• Increasingly publicly shared.
• Through the web
• Millions of them.
• Born digital.
Raw and Interpretive Data
• What is fact?
• Revision is constantly occurring. Even primary data can be revised.
• Science is interpretation
• Much scientific data consists of secondary datasets of interpretative information.
[Diagram: Primary Data, Processed Data, Secondary Curated Data, Secondary Data, Integrated data; labels: Capture details, Processing details, Update, Revise]
Data collection management
• Large scale community-wide global data centres – EBI, DDBJ, NCBI, NCI, CERN
• Institutional data centres and labs and individuals – precarious and uncertain.
• Role for data stewardship and preservation on behalf of the community
• Cloud data
• research-
Not the end of theory!
• The prevalence of data and the rise of data intensive and data driven science add to the pool of hypothesis driven and theory driven research.
• They don’t replace it.
[Diagram: Data, Theory, Prediction, Hypothesis]
[Diagram: Genotype, Phenotype, Metabolic pathways, Literature; label “200”]
[Paul Fisher]
• Large scale data collection from multiple sites throughout the world.
• The team’s own data and personal data sets.
• Analytical pipelines and automated workflows with intelligent intervention.
• Literature auto found and mined
• If manual: it’s logged
• If automated: faster, systematic, repeatable, reduced bias, auto-logged, explicit, shareable
• Born digital
http://www.myexperiment.org/workflows/172
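The "auto-logged, repeatable" property of an automated workflow can be illustrated with a minimal sketch. This is not the workflow at the link above: the step names, the threshold and the `run_workflow` helper are hypothetical, chosen only to show how each step can record a fingerprint of its input and output as it runs.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(value):
    """Return a short, stable hash of any JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_workflow(steps, data):
    """Apply each (name, function) step in order and log what happened."""
    log = []
    for name, func in steps:
        result = func(data)
        log.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, log


# Hypothetical steps: the names and the threshold are illustrative only.
def drop_missing(values):
    """Remove readings that were never recorded."""
    return [v for v in values if v is not None]


def keep_high(values, threshold=2.0):
    """Keep readings at or above an assumed threshold."""
    return [v for v in values if v >= threshold]


STEPS = [("drop_missing", drop_missing), ("keep_high", keep_high)]

if __name__ == "__main__":
    final, provenance = run_workflow(STEPS, [1.5, None, 3.2, 2.0])
    print(final)
    for entry in provenance:
        print(entry["step"], entry["input"], "->", entry["output"])
```

Because the fingerprints depend only on the data, running the same steps on the same input gives the same log apart from the timestamps, which is one simple way to make a run explicit and shareable.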
Automated processing of library content
• PubMed contains ~17,787,763 articles to date
• Manually searching is tedious and frustrating
• Can be hard finding links between data and articles
• Conclusion? Machines will be reading the library.
• Link between cholesterol, patient trauma and parasite resistance in cattle revealed.
Paul Fisher
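One very simple way a machine can "read the library" is to count which terms of interest appear together in the same abstract. The sketch below is an assumption-laden toy, not the text mining used in the work described here: the three abstracts, their identifiers and the term list are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus: identifiers and texts are made up for illustration.
ABSTRACTS = {
    "doc1": "Cholesterol levels change during trypanosome infection in cattle.",
    "doc2": "Resistance to infection in cattle is linked to cholesterol metabolism.",
    "doc3": "Microarray analysis of liver tissue in mice.",
}

# Hypothetical terms of interest.
TERMS = ["cholesterol", "cattle", "resistance", "microarray"]


def terms_in(text, terms):
    """Return the set of terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {t for t in terms if t in words}


def cooccurrence(abstracts, terms):
    """Count, for each pair of terms, the documents mentioning both."""
    counts = Counter()
    for text in abstracts.values():
        found = sorted(terms_in(text, terms))
        counts.update(combinations(found, 2))
    return counts


if __name__ == "__main__":
    for pair, n in cooccurrence(ABSTRACTS, TERMS).most_common():
        print(pair, n)
```

Pairs that co-occur often are only candidates for a link; a real pipeline would need synonym handling, a much larger corpus and human judgement on the results.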
Data driven research
• Was: Hypothesis to experiment to analyse the data
• Now: start with the data. There is so much data that is accessible.
[Diagram: Ideas, Data, Hypothesis; Synthesis / Induction; Analysis / Deduction]
[Kell and Oliver]
Published. Eventually.
Reproducible, or rather “fully supported”
Transparent science, Composite research components
Methods, Lab Books, Preprints, Data, Video, Blogs, Podcasts, Codes, Algorithms, Models, Presentations, Ontologies, Intermediate Results, Related Articles, Comments & Reviews, Plans
Reproducible Science
means context, quality, trust
means easy access to the sources
Methods are Scientific commodities
• Scripts, workflows, simulations, experimental plans, statistical models, ...
• Repeatable, reproducible, comparable and reusable research.
• Sharing propagates expertise and builds reputation.
http://myexperiment.org
120 Simulation tools
1,200 Seminars, podcasts, etc.
77,000 Users worldwide
550 Contributors
Developed by the NSF Network for Computational Nanotechnology
Online since October 2002
[Michael McLennan]
http://nanoHUB.org
[Jean-Claude Bradley]
http://usefulchem.wikispaces.com/
BioLit
Seamless integration between data and publications
From the Public Library of Science people.
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
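Linking a paper back to the PDB starts with spotting structure identifiers in the text. The sketch below is not BioLit's implementation; it is a rough heuristic assuming the classic four-character PDB identifier (a digit followed by three letters or digits). The `find_pdb_ids` and `link_for` helpers and the URL pattern are illustrative assumptions.

```python
import re

# Classic PDB identifiers: one digit (1-9) followed by three alphanumerics.
PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text):
    """Return candidate PDB identifiers in order of first appearance.

    Candidates made only of digits (for example a year such as 2008)
    are skipped. This is a heuristic: an ordinal such as "10th" would
    still be picked up, so results need checking against the database.
    """
    ids = []
    for match in PDB_ID.findall(text):
        candidate = match.upper()
        if any(c.isalpha() for c in candidate) and candidate not in ids:
            ids.append(candidate)
    return ids


def link_for(pdb_id, base="https://www.rcsb.org/structure/"):
    """Build a link to a structure page; the URL pattern is an assumption."""
    return base + pdb_id


if __name__ == "__main__":
    sample = "Figure 2 shows the structure (PDB 1HSG) solved in 2008; see also 4hhb and 1HSG."
    for pdb_id in find_pdb_ids(sample):
        print(pdb_id, link_for(pdb_id))
```

A production system would validate each candidate against the database itself and anchor the link to the figure or paragraph where the identifier was found.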
The Knowledge and Data Cycle
http://biolit.ucsd.edu
[Phil Bourne]
ICTP Trieste, December 10, 2007
The reproducible and interactive research documents*
Mixed stewardship research documents
The recombinant, compound research documents
The virtual research document
Multi-versioned, dynamic research document
*Papers, Books, whatever.
2020
Data, image, model, process, workflow, podcast, slideset*
Finding, citation, peer review, preservation, identity, versioning, security, privacy, copyright management, format authority
Authority on metadata descriptions
Propagation of descriptions
* Insert new research commodity type here
What does this mean for library services?
Seamless interlinking of data, literature and other research commodities
Integrated search across external resources
Selective quality curation
Hell is other people’s (lack of) semantic metadata
Collaboration
(Virtual) Team Research
• Research increasingly team-based
• Teams produce more highly cited research
• Team science is increasingly composed of co-authors located at different universities.
• “virtual communities of scholars” produce higher impact work than comparable co-located teams or solo scientists.
• True for all fields and team sizes.
Studies of 19.9 million research articles over 5 decades as recorded in the Web of Science database, and an additional 2.1 million patent records from 1975-2005.
Using the Web of Science database to analyze the collaboration arrangements of over 4,000,000 papers over a 30 year period.
Sources: Wuchty, Jones, and Uzzi
Noshir Contractor
Distributed and Collaborative
.....skills-rich and time-poor
Biologists, Geneticists, Bioinformaticians, Immunologists, Microarray specialists, Computer Scientists, Mathematicians,
Physicists.....
[Helen Hulme]
• Personal: log books and spreadsheets, file stores
• Group: shared data, methods, protocols, information, failures, insights, observations, know-how
• Born digital but not very digitally processable.
[Helen Hulme]
Virtual Research Environments 1
Collaboration Environments
Science Gateways to data and computing grids
Multi-authored document preparation
Multi-disciplinary
Proteomics
Classical Genetics / QTL studies
Animal Experts
Transcriptomics
Parasite Experts
Statistical modelling
Text Analysis
Image analysis
Health Epidemiology
Crossing boundaries
Interdisciplinary Support
• Expert finding
• Complementary experts swarming around a problem
• Transferring data, methods and know-how from one discipline to another, e.g. astronomy image analysis applied to cancer tissue microarrays
• How do you find relevant material that uses a different jargon in a different discipline organised to only suit its experts?
• Overlay and virtual journals are few and far between – e.g. the Virtual Journal of Quantum Information.
• Where is the overlay library?
Virtual Research Environments 2
Social Professional Networking
Expert finding
The BL’s Research Information Centre
Open Science
Collective Intelligence
Researcher participation
Commons based production
Sharing
Accelerated dissemination
Embedded in the researcher’s environment and work practices
“Long Tail” Science. “Hypo” Science
• Increased scale and diversity of scientific participation
– The small research team.
– Niche experts.
– The citizen.
• Easier to work with, and get hold of, digital output.
– Better tools.
• Scaling effects of peer review, social working and community curation.
Open content, services and software.
Social tools for the social process of science.
http://www.wikipathways.org/
[Duncan Hull]
Growth of open access scientists
digital natives, always online, hybrids
catalysts for change
[Phil Bourne]
Cameron Neylon’s chemistry notebook
Paul
Jo
Sharing reusable methods
Competitive advantage.
Academic vanity.
Reputation.
Adoption.
Scrutiny.
Being scooped.
Misinterpretation.
New Reward Schemes
Rewards
Fears
What is the role of the library?
Trusted curator
Trusted data manager
Quality arbiter
Knowledge disseminator
Format authority
Add value content provider
Metadata / controlled vocabulary provider
Add value service provider
Services
Embedding into the Researcher’s Workflow
The Cloud
Personal Scientist-centric tooling
• We don’t come to the library, it comes to us.
• We don’t use just one library or one source.
• We don’t use just one tool!
• Library services embedded in our toolkits, workbenches, browsers, authoring tools.
Zotero Firefox plug-in
Hypothesis Construction from the Literature
Marco Roos, Scott Marshall, University of Amsterdam
http://info.scopus.com/scsearchapi/geoCitations/index.html
What does this mean for library services? With not For
Opening up to researchers’ tools and research environments for discovery, management and curation of research commodities
Enabling and encouraging new services and new content to add new value
Remove obstacles to interoperate and share
Collaborate, don’t control
Give researchers tools and access to content
– They control their own software/data apparatus and their experiments.
– They are creative
Pervasive devices and the mixing up of virtual and real worlds
Prior to leaving home, Paul, a Manchester graduate student, syncs his iPhone with the latest papers, delivered overnight by the library via a news syndication feed. On the bus he reviews the stream, selecting a paper close to his interest in HIV-1 proteases.
The data shows apparent anomalies with his own work, and the method, an automated script, looks suspect.
Being on-line he notices that a colleague in Madrid has also discovered the same paper through a blog discussion and they Instant Message, annotating the results together.
By the time the bus stops he has recomputed the results, proven the anomaly, made a rebuttal in the form of a pubcast to the Journal Editor, sent it to the journal and annotated the article with a comment and the pubcast.
Based on an original idea by Phil Bourne
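The first step of the scenario, picking papers of interest out of an overnight syndication feed, can be sketched in a few lines. This is only an illustration under stated assumptions: the feed is a hand-written RSS 2.0 fragment with invented titles and links, and `select_items` is a hypothetical helper rather than any existing library service.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content: titles and links are invented for illustration.
FEED = """<rss version="2.0">
  <channel>
    <title>Library overnight papers</title>
    <item>
      <title>Inhibitor binding in HIV-1 protease</title>
      <link>http://example.org/papers/1</link>
    </item>
    <item>
      <title>Sky survey data release</title>
      <link>http://example.org/papers/2</link>
    </item>
  </channel>
</rss>"""


def select_items(feed_xml, keywords):
    """Return (title, link) pairs whose title mentions any keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    selected = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(k in title.lower() for k in wanted):
            selected.append((title, link))
    return selected


if __name__ == "__main__":
    for title, link in select_items(FEED, ["HIV-1 protease"]):
        print(title, link)
```

Matching on title keywords is the crudest possible filter; the scenario implies something richer, such as matching against the reader's own data and publications.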
http://research.microsoft.com/towards2020science/
Questions?