Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami
Transcript of Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami
![Page 1: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/1.jpg)
GigaScience: a Journal or a Database?
(Lessons learned from the Genomics “Tsunami”)
Scott Edmunds
www.gigasciencejournal.com
HUPO Congress 2011, Geneva
![Page 2: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/2.jpg)
BGI Introduction
• Formerly known as Beijing Genomics Institute
• Founded in 1999
• Now the largest genomic organization in the world
• Goal
  – Use genomics technology to impact society
  – Make leading-edge genomics highly accessible to the global research community
![Page 3: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/3.jpg)
Largest Sequencing Capacity in the World
Sequencers:
• 137 Illumina/HiSeq 2000
• 27 LifeTech/SOLiD 4
• 16 AB/3730xl + 110 MegaBACEs
• 2 Illumina iScan
Data Production:
• 5.6 Tb / day
• > 1500X of human genome / day
Multiple Supercomputing Centers:
• 157 TB Flops
• 20 TB Memory
• 12.6 PB Storage
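As a rough sanity check on the two data production figures above, the sketch below divides the daily output by the size of one human genome. It assumes that "Tb" means terabases and that a human genome is about 3.2 billion bases; neither value is stated on the slide, and the function name is made up for illustration.

```python
# Figure quoted on the slide.
DAILY_OUTPUT_BASES = 5.6e12   # 5.6 Tb/day, reading "Tb" as terabases (ASSUMPTION)

# Not on the slide: approximate haploid human genome size (ASSUMPTION).
HUMAN_GENOME_BASES = 3.2e9


def genome_equivalents_per_day(daily_bases: float, genome_bases: float) -> float:
    """Return how many genome-lengths of raw sequence are produced per day."""
    return daily_bases / genome_bases


fold = genome_equivalents_per_day(DAILY_OUTPUT_BASES, HUMAN_GENOME_BASES)
print(f"about {fold:.0f}X of a human genome per day")  # about 1750X
```

Under these assumptions the result is consistent with the "> 1500X" claim on the slide.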
![Page 4: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/4.jpg)
Mass spectrometry at BGI
• QTRAP 5500 (AB SCIEX): ESI-Triplequadrupole-LIT
• Orbitrap velos (Thermo Scientific): ESI-LTQ-Orbitrap
• maXis (Bruker): ESI-UHR-Q-TOF-MS
• ultrafleXtreme™ (Bruker): MALDI-TOF/TOF
Applications: MRM; high accuracy; label-free quantitation; iTRAQ quantitation; QC; MS image
Pictured: QTRAP 5500 (AB SCIEX); Orbitrap velos (Thermo Scientific); maXis Q-TOF (Bruker); ultraflex (Bruker)
![Page 5: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/5.jpg)
Products and Services Offered to Collaborators
• Protein Profiling for any species (tying in with 1000 PARGP)
• Techniques:
  – Quantitative analysis
  – Post-translational modification
  – Target Proteomics
  – Metabolomics
![Page 6: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/6.jpg)
![Page 7: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/7.jpg)
“Trans-Omics”
Objective to integrate data from:
• Genomics
• Transcriptomics
• Proteomics
• Metabolomics
![Page 8: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/8.jpg)
BGI Proteomics Dept Focus:
• RAW MS data storage and analysis
• Upstream analysis
• “Large-scale” screening/quantitative analysis
• Working on:
  – Automatic analysis pipelines/tools
  – Industrial usage/standards
![Page 9: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/9.jpg)
Lessons Learned:
What went right?
![Page 10: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/10.jpg)
Lessons Learned: 1. Having a cool project helps…
Bill Clinton: “We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.”
“Today we are learning the language in which God created life.”
![Page 11: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/11.jpg)
Lessons Learned: 2. Reproducibility is important…
Helped by stability of:
1. Platforms
2. Infrastructure
3. Standards
1st Gen 2nd Gen
![Page 12: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/12.jpg)
Lessons Learned: 3. Sharing is important…
![Page 13: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/13.jpg)
Lessons Learned: 3. Sharing is important…
![Page 14: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/14.jpg)
Lessons Learned: 3. Sharing is important…
Bermuda Accords 1996/1997/1998:
1. Automatic release of sequence assemblies within 24 hours.
2. Immediate publication of finished annotated sequences.
3. Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.
Fort Lauderdale Agreement, 2003:
1. Sequence traces from whole genome shotgun projects are to be deposited in a trace archive within one week of production.
2. Whole genome assemblies are to be deposited in a public nucleotide sequence database as soon as possible after the assembled sequence has met a set of quality evaluation criteria.
Toronto International data release workshop, 2009:
The goal was to reaffirm and refine, where needed, the policies related to the early release of genomic data, and to extend, if possible, similar data release policies to other types of large biological datasets – whether from proteomics, biobanking or metabolite research.
![Page 15: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/15.jpg)
Benefits of Data-sharing
Sharing Detailed Research Data Is Associated with Increased Citation Rate.
Piwowar HA, Day RS, Fridsma DB (2007) PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Every 10 datasets collected contribute to at least 4 papers in the following 3 years.
Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a
![Page 16: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/16.jpg)
[Figure: chart of rice and wheat by year, 1996 to 2008 (y-axis 0 to 700)]
Rice v Wheat: consequences of publicly available genome data.
![Page 17: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/17.jpg)
Duplicated genes most responsive to ecological challenges
The Ecoresponsive Genome of Daphnia pulex Colbourne et al., Science 4 February 2011:
200Mb Genome, 30,907 genes
![Page 18: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/18.jpg)
Daphnia Genome Consortium
wFleabase: Mar 2006
Genome release: July 2007
Genome Published: Feb 2011
>58 companion papers
https://daphnia.cgb.indiana.edu/Publications
![Page 19: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/19.jpg)
Problems?
Flickr cc: opensourceway
![Page 20: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/20.jpg)
Thomas Michael Dexter (Wellcome Trust): “Mapping the human genome has been compared with putting a man on the moon, but I believe it is more than that. This is the outstanding achievement not only of our lifetime, but in terms of human history.”
Lessons Learned: 4. Need to manage expectations…
June 2000
![Page 21: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/21.jpg)
Lessons Learned: 4. Need to manage expectations…
June 2010
![Page 22: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/22.jpg)
Lessons Learned: 5. Data, data, data
[Figure: sequencing cost ($ per Mbp) plotted against Moore’s Law, annotated “~100,000X”. Source: E Lander/Broad]
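To put the "~100,000X" annotation in perspective, the sketch below converts a fold change into a number of halvings and compares it with a Moore's Law style doubling. It reads the annotation as the fold reduction in sequencing cost. The ten-year period and the two-year doubling time are assumptions that are not stated on this slide, and the function names are illustrative only.

```python
import math

FOLD_DROP = 100_000           # "~100,000X" from the slide
YEARS = 10.0                  # ASSUMPTION: period covered, not stated on the slide
MOORE_DOUBLING_YEARS = 2.0    # ASSUMPTION: commonly quoted Moore's Law doubling time


def halvings(fold: float) -> float:
    """Number of successive halvings needed to reach a given fold reduction."""
    return math.log2(fold)


def halving_time_months(fold: float, years: float) -> float:
    """Average time per halving, in months, if the drop took `years` years."""
    return 12.0 * years / halvings(fold)


def moore_fold(years: float, doubling_years: float) -> float:
    """Fold improvement from steady doubling every `doubling_years` years."""
    return 2.0 ** (years / doubling_years)


print(f"halvings needed: {halvings(FOLD_DROP):.1f}")
print(f"months per halving: {halving_time_months(FOLD_DROP, YEARS):.1f}")
print(f"Moore's Law fold over the same period: {moore_fold(YEARS, MOORE_DOUBLING_YEARS):.0f}")
```

Under these assumptions a 100,000-fold drop needs roughly 16.6 halvings, while steady two-year doubling would give only a 32-fold change over the same decade.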
![Page 23: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/23.jpg)
Lessons Learned: 5. Data, data, data
[Figure labels: Data; Sequencing Output; Storage; Moore’s/Kryder’s Law]
![Page 24: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/24.jpg)
Lessons Learned: 5. Data, data, data
[Figure labels: Data; Sequencing Output; Publication; Dissemination?]
![Page 25: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/25.jpg)
Flickr cc: opensourceway
Can we keep up?
Lessons Learned: 5. Data, data, data
![Page 26: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/26.jpg)
Flickr cc: opensourceway
Do we have models for long-term funding?
Lessons Learned: 5. Data, data, data
Human Gene Mutation Database
?
Kyoto Encyclopedia of Genes and Genomes
![Page 27: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/27.jpg)
Growing/widening user base.
Lessons Learned: 5. Data, data, data
?
3rd Gen sequencers: “Democratizing sequencing”
![Page 28: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/28.jpg)
Lessons Learned: 5. Data, data, data
?
Curation, curation, curation?
The long tail of new “big-data” producers?
![Page 29: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/29.jpg)
Lessons Learned: 5. Data, data, data
?
Are there now too many hurdles?
![Page 30: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/30.jpg)
Lessons Learned: 5. Data, data, data
?
Are there now too many hurdles?
Technical:
• too large volumes
• too heterogeneous
• no home for many data types
• too time consuming
Economic:
• too expensive
• no long-term funding
Cultural:
• inertia
• no incentives to share
• unaware of how
![Page 31: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/31.jpg)
Potential solutions?
![Page 32: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/32.jpg)
Potential solutions: Better handling of data, data, data
Cloud?
![Page 33: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/33.jpg)
Potential solutions: Better handling of data, data, data
• What to save/what to throw away?
• Better Compression?
![Page 34: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/34.jpg)
Potential solutions: Better handling of metadata…
Cloud solutions?
Better tools for assessing data quality…
![Page 35: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/35.jpg)
Potential Solutions:
?
New incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)
Prepublication data sharing (Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
![Page 36: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/36.jpg)
Data citation: DataCite and DOIs
Digital Object Identifiers (DOIs) offer a solution
Most widely used identifier for scientific articles
Researchers, authors, publishers know how to use them
Put datasets on the same playing field as articles
Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840
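The dataset reference above has the same parts as an article citation, which is the point of the slide. A minimal sketch of that idea follows: a small record holding the citation fields, with one helper that renders the citation text and one that builds a resolvable link. The class and method names are hypothetical, and the default resolver prefix `https://doi.org/` is the general DOI resolver rather than anything named on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical record for citing a dataset the way an article is cited."""
    authors: str
    year: int
    title: str
    publisher: str
    doi: str

    def formatted(self) -> str:
        """Render the citation in the style used on the slide."""
        return f"{self.authors} ({self.year}). {self.title}. {self.publisher}. doi:{self.doi}"

    def resolver_url(self, resolver: str = "https://doi.org/") -> str:
        """Build a link that a DOI resolver can redirect to the dataset landing page."""
        return resolver + self.doi


# Fields taken from the example on the slide.
lake_maar = DatasetCitation(
    authors="Yancheva et al",
    year=2007,
    title="Analyses on sediment of Lake Maar",
    publisher="PANGAEA",
    doi="10.1594/PANGAEA.587840",
)

print(lake_maar.formatted())
print(lake_maar.resolver_url())  # https://doi.org/10.1594/PANGAEA.587840
```

Keeping the DOI as a separate field, rather than buried in free text, is what lets a citation index match every paper that used the dataset.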
![Page 37: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/37.jpg)
Data citation: DataCite and DOIs
>1 million DOIs since Dec 2009
Central metadata repository to link with WoS/ISI
- finally can track and credit use!
![Page 38: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/38.jpg)
How can we combine these?
Journals Databases?
![Page 39: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/39.jpg)
www.gigasciencejournal.com
Large-Scale Data Journal/Database
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
In conjunction with:
Now taking submissions…
![Page 40: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/40.jpg)
Criteria and Focus of Journal/Database
• Reproducibility/Reuse
• Utility/Usability
• Standards/Searchability/Scale/Sharing
• Data publishing/DOI
![Page 41: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/41.jpg)
Data publishing/DOI
• Data hosting will follow standard funding agency and community guidelines.
• DOI assignment available for submitted data to allow ease of finding and citing datasets, as well as for citation tracking.
• Datasets tracked by WOS/ISI allowing additional metrics/credit for use.
![Page 42: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/42.jpg)
Reproducibility/Reuse
• BGI Cloud Computing resources for handling and analyzing large-scale data.
• Integrated tools to promote more widespread access, viewing, and analysis of data.
• Encourage and aid use of workflow systems for methods (e.g. submission of Galaxy XML files).
![Page 43: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/43.jpg)
Special Series/Hub for cloud-based tools
• Technical notes: test tools in the BGI-Cloud.
• Tools + Test Data (BGI or user) in one place.
• Aids reproducibility.
• Aids reviewers (free)
• Aids authors:
  – visibility (PubMed, etc.)
  – hosting (included/free offers)
– contact us: [email protected]
Oledoe flickr cc
![Page 44: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/44.jpg)
Standards/Searchability/Sharing
• ISA-Tab compatibility to aid and promote best practice in metadata reporting.
• All supporting data must be publicly available.
• Ask for MIBBI compliance and use of reporting checklists.
• Part of the BioSharing network.
![Page 45: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/45.jpg)
Our first DOI:
To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
![Page 46: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/46.jpg)
![Page 47: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/47.jpg)
![Page 48: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/48.jpg)
“The way that the genetic data of the 2011 E. coli strain were disseminated globally suggests a more effective approach for tackling public health problems. Both groups put their sequencing data on the Internet, so scientists the world over could immediately begin their own analysis of the bug's makeup. BGI scientists also are using Twitter to communicate their latest findings.”
“German scientists and their colleagues at the Beijing Genomics Institute in China have been working on uncovering secrets of the outbreak. BGI scientists revised their draft genetic sequence of the E. coli strain and have been sharing their data with dozens of scientists around the world as a way to "crowdsource" this data. By publishing their data publicly and freely, these other scientists can have a look at the genetic structure, and try to sort it out for themselves.”
![Page 49: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/49.jpg)
![Page 50: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/50.jpg)
G10K Genomes Get DOI®s
doi:10.5524/100004
![Page 51: Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami](https://reader036.fdocuments.in/reader036/viewer/2022062511/54bf2e414a7959ac458b4588/html5/thumbnails/51.jpg)
www.gigasciencejournal.com
We want your data!
@gigascience