Towards Incidental Collaboratories; Research Data Services
Research Data Services: Towards a Framework for Incidental Collaboratories
Anita de Waard, VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
Brief bio:
• Background:
  – Low-temperature physics (Leiden & Moscow)
  – Joined Elsevier in 1988 as publisher in solid state physics
  – 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
  – Identifying key knowledge elements in articles (linguistics thesis)
  – Building claim-evidence networks (through collaborations)
  – Helping build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations. Why?
  – Douglas Engelbart's thinking: connect minds!
  – My (non-biologist's) understanding of biology:
The big problem in biology:
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
• Interspecies variability: a specimen is not a species
• Gene expression variability: knowing genes is not knowing how they are expressed
• Microbiome: an animal is an ecosystem
• Systems biology: a whole is more than the sum of its parts
Reductionist science doesn't work for living systems!
Statistics to the rescue! With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples”
Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/[email protected]
• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – Add enough metadata to know what was done/seen
• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological ‘things’:
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.
• Keep:
  – Long-term preservation of data and software (Olive)
  – Fulfill Data Management Plan requirements
  – Allow gated access, if needed
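The Collect and Connect steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not a description of any actual RDS system: the names `ExperimentRecord` and `ExperimentStore`, the field names, the sample values and the `file:///` locations are all hypothetical. Long-term preservation (the Keep step) is not modeled beyond a simple gated-access flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    """Data stored at the level of one experiment, with minimal metadata."""
    record_id: str
    lab: str
    experiment_type: str   # what was done (hypothetical vocabulary)
    entities: frozenset    # biological 'things': species, anatomy, antibodies...
    data_uri: str          # where the data lives (hypothetical location)
    gated: bool = False    # gated access, if needed


class ExperimentStore:
    """A single interface over all collected experiment records."""

    def __init__(self):
        self._records = []

    def collect(self, record):
        """Collect: add one experiment record to the store."""
        self._records.append(record)

    def _visible(self, include_gated):
        # Gated records are hidden unless the caller is allowed to see them.
        return [r for r in self._records if include_gated or not r.gated]

    def connect_by_type(self, experiment_type, include_gated=False):
        """Connect: find experiments of a similar type."""
        return [r for r in self._visible(include_gated)
                if r.experiment_type == experiment_type]

    def connect_by_entity(self, entity, include_gated=False):
        """Connect: find experiments done with/on the same biological 'thing'."""
        return [r for r in self._visible(include_gated)
                if entity in r.entities]


# Illustrative records only; none of these values come from the talk.
store = ExperimentStore()
store.collect(ExperimentRecord(
    "exp-1", "lab-a", "immunostaining",
    frozenset({"mouse", "hypothalamus"}), "file:///data/exp-1"))
store.collect(ExperimentRecord(
    "exp-2", "lab-b", "patch-clamp",
    frozenset({"mouse", "olfactory bulb"}), "file:///data/exp-2", gated=True))
store.collect(ExperimentRecord(
    "exp-3", "lab-c", "immunostaining",
    frozenset({"rat", "spleen"}), "file:///data/exp-3"))

print([r.record_id for r in store.connect_by_entity("mouse")])
print([r.record_id for r in store.connect_by_type("immunostaining")])
```

Indexing records by the biological entities they involve, rather than by lab, is what would let two labs that never planned to work together discover that their experiments overlap.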
Enable ‘incidental collaboratories’:
Problem: biological research is quite insular
• Biology is small: because objects/equipment are 10^-5 to 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).
[Figure: research cycle: Prepare, Observe, Analyze, Ponder, Communicate]
Try to pop the ‘lab bubble’!
[Figure: three labs, each with a Prepare, Analyze, Communicate cycle (the first also includes Think), and each with its own Observations]
Labs go from being information islands to being ‘sensors in a network’.
Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper”
  Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
  Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries”
  Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them”
  Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
  Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
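As a rough sketch of what ‘timed/granular data export’ with per-individual attribution might look like, the snippet below attaches a creator and a release date to each data point and exports only those whose date has passed. The `DataPoint` type, its fields and the `exportable` helper are hypothetical names introduced here for illustration; the talk does not specify a mechanism.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPoint:
    name: str
    creator: str       # who (to the individual) created this data point
    release_on: date   # hypothetical per-item release date chosen by the lab


def exportable(points, today):
    """Return only the data points whose release date has been reached."""
    return [p for p in points if p.release_on <= today]


# Illustrative values only.
points = [
    DataPoint("recording-01", "researcher-a", date(2013, 1, 15)),
    DataPoint("recording-02", "researcher-b", date(2014, 6, 1)),
]

for p in exportable(points, today=date(2013, 3, 1)):
    print(p.name, "created by", p.creator)
```

Setting the release date per data point, rather than per lab or per project, is one way to make the export granular: a lab could share some results early while holding others back until publication.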
Elsevier Research Data Services: Goals
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization and provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
RDS Guiding Principles:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type:
  – Aspects where collaboration is needed are discussed
  – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
  – All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2-3 people) department; immediate communication; instant deployment of ideas
RDS Approach:
• Collaborate and build on relationships with data repositories (life science, earth science, others)
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
NIF Antibody Registry:
Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
Solution #1:
• Journals ask authors to provide antibody catalog nr
• Link to NIF Registry from manufacturers’/vendors’ sites
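The problem described above, antibody mentions that omit species, vendor or catalog number, could be caught with a simple completeness check at submission time. The sketch below is a hypothetical illustration: the field names, the `missing_fields` helper and the sample citations are assumptions, and it does not use or represent the NIF Antibody Registry's actual interface.

```python
# Fields the slide names as commonly missing from antibody mentions.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(citation):
    """Return the required fields that are absent or blank in a citation."""
    return [f for f in REQUIRED_FIELDS
            if not str(citation.get(f) or "").strip()]


def is_identifiable(citation):
    """A citation is treated as identifiable only if no field is missing."""
    return not missing_fields(citation)


# Hypothetical citations, invented for this example.
citations = [
    {"target": "GFAP", "species": "rabbit",
     "vendor": "ExampleVendor", "catalog_number": "EX-0001"},
    {"target": "NeuN", "species": "mouse", "vendor": "", "catalog_number": None},
]

for c in citations:
    gaps = missing_fields(c)
    status = "ok" if not gaps else "missing: " + ", ".join(gaps)
    print(c["target"], "->", status)
```

A journal submission form could run a check of this kind and ask authors to fill the gaps before the paper goes out for review.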
Solution #2:
• Pilot with a lab: let’s start with the Urban Lab
• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban’s command center
• By providing:
  – 7” tablets
  – Links to IgorPro
  – A dashboard UI
My questions to you:
• Thoughts on this approach:
  – In principle?
  – In practice?
• Do you see serious hurdles:
  – Are we overlapping with other initiatives; if so, are we complementary?
  – How does this connect to libraries/local repositories?
  – Are there sensitivities/pain points we are overlooking?
• Where to start:
  – How to collaborate?
  – Who to talk to: funding agencies, societies, who else?
  – Thoughts on data repositories/platforms to connect to?
Your questions to me?
[email protected] http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS