GigaDB and Galaxy: revolutionizing data dissemination, organization and analysis
Peter Li
GigaScience
www.gigasciencejournal.com
Journal and database for large-scale data
Editor-in-Chief: Laurie Goodman
Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Tam Sneddon
Data Platform: Peter Li
in conjunction with
Why another *omics journal?
Already many journals publishing research involving large data sets
Results reproducibility
Unrepeatability of scientific results
Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 149-155.
Out of 18 microarray papers, results from 10 could not be reproduced
How are we supporting data reproducibility?
Data sets: linked to the GigaScience paper
Analyses: linked to the GigaScience paper
Community tools for data reproduction and reuse
DOI
Paper DOI
Data set DOI
Linking of papers and data by citation of DOIs
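As an illustration of linking by DOI citation, here is a minimal Python sketch that normalises DOIs into resolver URLs and records which data sets a paper cites. The function names and the DOIs used are hypothetical placeholders, not real GigaScience or GigaDB identifiers; the only outside fact relied on is that https://doi.org/ resolves DOIs.

```python
from typing import Dict, List


def doi_to_url(doi: str) -> str:
    """Turn a DOI written in any common form into a resolver URL."""
    doi = doi.strip()
    # Strip common prefixes so that only the bare DOI remains.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return "https://doi.org/" + doi


def link_paper_to_data(paper_doi: str, dataset_dois: List[str]) -> Dict[str, List[str]]:
    """Map a paper's resolver URL to the resolver URLs of the data sets it cites."""
    return {doi_to_url(paper_doi): [doi_to_url(d) for d in dataset_dois]}


# Placeholder DOIs, for illustration only.
links = link_paper_to_data("doi:10.1234/example-paper", ["10.1234/example-dataset"])
```

Because both the paper and the data set carry their own DOI, the link can be followed in either direction from a reference list.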
http://gigadb.org
GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”…
Aspera data transfer
Faster download speeds
BGI Datasets Get DOI®s
Plants: Chinese cabbage, Cucumber, Foxtail millet, Pigeonpea, Potato, Sorghum
Microbe: E. coli O104:H4 TY-2482, T2D gut metagenome
Cell lines: Chinese Hamster Ovary, Mouse methylomes
Human:
• Asian individual (YH)
  - DNA Methylome
  - Genome Assembly
  - Transcriptome
• Cancer (14TB)
• Single cell bladder cancer
• HBV infected exomes
• Ancient DNA
  - Saqqaq Eskimo
  - Aboriginal Australian
Vertebrates:
• Darwin’s Finch
• Giant panda
• Macaque
  - Chinese rhesus
  - Crab-eating
• Mini-Pig
• Naked mole rat
• Parrot, Puerto Rican
• Penguin
  - Emperor penguin
  - Adelie penguin
• Pigeon, domestic
• Polar bear
• Sheep
• Tibetan antelope
Invertebrate:
• Ant
  - Florida carpenter ant
  - Jerdon’s jumping ant
  - Leaf-cutter ant
• Roundworm
• Schistosoma
• Silkworm
• Parasitic nematode
• Pacific oyster
Released pre-publication
Paper published in GigaScience
Currently: 39 public datasets*
10 citations in references*
Humans:
• Ancient DNA
  - Aboriginal Australian
  - Saqqaq Eskimo
• Asian individual (YH)
What about the analyses?
Data sets: linked to the GigaScience paper
Analyses: linked to the GigaScience paper
How will we make analyses available for downloading and execution?
Example workflow: Investigate the evolutionary relationships between proteins
Protein sequences
Bioinformatics data analyses as workflows
Query
Multiple sequence alignment
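To make the idea of an analysis as a workflow concrete, the sketch below chains the steps named above as plain Python functions. It is a toy under stated assumptions: `query_step`, `toy_align` and `run_workflow` are hypothetical names, the query is a simple prefix filter, and the "alignment" only pads sequences to equal length, so it is not a real multiple sequence alignment algorithm.

```python
from typing import Callable, List

# A workflow step takes a list of sequences and returns a list of sequences.
Step = Callable[[List[str]], List[str]]


def query_step(sequences: List[str], motif: str = "MK") -> List[str]:
    """Toy query: keep only sequences that start with the given motif."""
    return [s for s in sequences if s.startswith(motif)]


def toy_align(sequences: List[str]) -> List[str]:
    """Toy stand-in for alignment: right-pad with gaps to a common length."""
    width = max((len(s) for s in sequences), default=0)
    return [s.ljust(width, "-") for s in sequences]


def run_workflow(data: List[str], steps: List[Step]) -> List[str]:
    """Run each step in order, feeding the output of one into the next."""
    for step in steps:
        data = step(data)
    return data


proteins = ["MKV", "MKVLA", "TTT"]  # made-up input sequences
aligned = run_workflow(proteins, [query_step, toy_align])
```

Recording the ordered list of steps, rather than only the final result, is what lets someone else download and re-run the same analysis.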
Implement GigaScience workflows in a community-accepted format
http://galaxyproject.org
Over 20,000 main Galaxy server users
Over 500 papers citing Galaxy use
Over 55 Galaxy servers deployed
Open source
Tool list Tool parameterisation Results panel
Pilot project - Integrate BGI SOAP package into Galaxy
Enable SOAP tools to be used from within Galaxy workflows
Data analysis pipelines
SOAP1 SOAP2 SOAPdenovo1 SOAPdenovo2 SOAPsnp SOAPsplice
Integrate BGI SOAP package into Galaxy
Python wrapper for each SOAP tool (six wrappers in total)
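A Python wrapper of this kind typically translates tool parameters chosen in the interface into a command line and then runs the executable. The sketch below shows that pattern in general terms only; it is not the actual GigaScience wrapper code. The executable name `soap_tool` and the option names are hypothetical placeholders, not real SOAP flags.

```python
import subprocess
from typing import Dict, List, Optional


def build_command(executable: str,
                  subcommand: Optional[str],
                  options: Dict[str, object]) -> List[str]:
    """Build an argument list from an executable name and an option mapping."""
    cmd = [executable]
    if subcommand:
        cmd.append(subcommand)
    for flag, value in options.items():
        if value is True:
            cmd.append(flag)                 # boolean switch, no value
        elif value is False or value is None:
            continue                         # option not requested
        else:
            cmd.extend([flag, str(value)])   # flag followed by its value
    return cmd


def run_tool(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command, raising CalledProcessError if the tool exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical executable and option names, for illustration only.
cmd = build_command("soap_tool", "all",
                    {"--config": "assembly.config", "--kmer": 63,
                     "--verbose": True, "--skip": False})
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names supplied by users.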
GitHub open code repository
https://github.com/gigascience
Tool list Tool parameterisation Results panel
SOAPdenovo2 Galaxy workflow
http://www.myexperiment.org
Why publish in GigaScience?
Benefit
• Data hosted in GigaDB
• Allocation of DOIs to data
• Metadata in ISA-Tab format
• Galaxy tool integration
• Use of tools in Galaxy workflows
Added value
• No need to use own servers
• Citable data
• Aids reuse of data
• Supports reuse of tools
• Improves documentation
• Shows how tool can be used with other bioinformatics software
Thanks to:
• Tin-Lap Lee and Huayan Gao - CUHK
• Tam, Jesse, Scott, Nicole & Laurie - GigaScience