Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

39
CBIIT GigaGalaxy – A Galaxy- based Platform for Large-scale Genomics Analysis Tin-Lap, LEE School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Hong Kong SAR, China.

description

Tin-Lap Lee's ISCB-Asia talk on CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis. December 19th December 2012

Transcript of Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Page 1: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy – A Galaxy-based Platform for Large-scale Genomics Analysis

Tin-Lap, LEE

School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics,

The Chinese University of Hong Kong, Hong Kong SAR, China.

Page 2: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT• Jointly established between

The Chinese University of Hong Kong (CUHK) and BGI.

• “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics , computation biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”

Page 3: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Big Data Translates into Big Opportunities... and

Big Responsibilities

Page 4: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

The challenges for biomedical scientists

Page 5: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

The challenges for biomedical scientists

Page 6: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

http://galaxyproject.org/

Page 7: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy Highlights:

• Provides enhanced functionality in additional to the original Galaxy functions

Specialized instances

Speed: local servers with SBS-UCSC genome database mirror in Hong Kong

Reproducibility: Seamless integration with Taverna/myExperiment workflows

Data exchange and publishing: GigaScience journal portal/GigaDB

Customized functions and more…..

Page 8: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy

Benefits:

Simplifies complicated bioinformatics tasks, accelerate data processing and allow flexible analysis.

Significantly reduce software and hardware costs, encourage research collaboration.

Page 9: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

http://www.cuhk.edu.hk/cbiit/galaxy.html

Galaxy/CUHK-BGI

Page 10: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy Structure

ToolDevelopment PublishingBiomedical and bioinformatics research

Page 11: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

What is SOAP?• SOAP - a tool package that provides full solution to NGS data analysis by BGI.

http://soap.genomics.org.cn/

Page 12: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Why SOAP?• Galaxy has been using SAMtools for consensus sequence calling, but the

recent upgrade has left this part out, which is very limited to some biologists.

• SOAPsnp is the only other method that can call full consensus sequences besides SAMtools.

• The main galaxy site supports none of the SOAP tools, including SOAPsnp.

Page 13: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Galaxy Tool Shed

• Enables sharing of Galaxy tools across Galaxy servers around the world.

• SOAP package tools configured for use in Galaxy.– SOAPsnp/SOAPdenovo

Page 14: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

NGS mapping: SOAP1

Page 15: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

NGS mapping: SOAP2

Page 16: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

SOAPsnp

Page 17: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

SOAPpopindel

Page 18: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

NGS De Novo Assembly: SOAPdenovo

Page 19: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

NGS De Novo Assembly: SOAPdenovo2

Page 20: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy structure

BioinformaticsDevelopment PublishingBiomedical and bioinformatics research

Page 21: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

How does it work?

• myExperiment -a repository for workflows.

Taverna workflows.

New: Galaxy workflows.

• CBIIT GigaGalaxy integrationhttp://www.myexperiment.org

Page 22: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Taverna workflow

http://www.taverna.org.uk/

Page 23: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis
Page 24: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Galaxy workflow

Page 25: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Import (1)

Page 26: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Import (2)

Page 27: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Export (1)

Page 28: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Export (2)

Page 29: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

SOAPdenovo2 Galaxy workflow

Page 30: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

CBIIT GigaGalaxy structure

BioinformaticsDevelopment PublishingBiomedical and bioinformatics research

Page 31: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

www.gigasciencejournal.com

Large-Scale Data Journal/Database

Editor-in-Chief: Laurie Goodman, PhDEditor: Scott Edmunds, PhDCommissioning Editor: Nicole Nogoy, PhD

In conjunction with:

Now launched…

Page 32: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

GigaScience is go…

Page 33: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

www.gigaDB.org

Data Publishing

Page 34: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

40 Datasets with DOI®s

PlantsChinese cabbageCucumberFoxtail milletPigeonpeaPotatoSorghum

MicrobesE. Coli O104:H4 TY-2482Cell-LineChinese Hamster OvaryMouse Methylomes

Human Asian individual (YH) v1+v2- DNA Methylome - Genome Assembly- TranscriptomeCancer (14TB)Hep B infected exomesSingle Cell Bladder CancerAncient DNA - Saqqaq Eskimo - Aboriginal Australian

VertebratesGiant panda Macaque - Chinese rhesus - Crab-eatingMini-PigNaked mole rat ParrotPenguin - Emperor penguin- Adelie penguinPigeon, domesticPolar bearSheepTibetan antelope

InvertebrateAnt - Florida carpenter ant- Jerdon’s jumping ant- Leaf-cutter antRoundwormSchistosomaSilkworm

Released pre-publicationNon-BGIPaper in GigaScience

Coming soon…Microbiome data

Page 35: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

GigaDB v2 export to CBIIT GigaGalaxy

Page 36: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

How are we supporting data reproducibility?

Data sets

AnalysesGigaScience

paper

Linked to

Linked to

Community tools fordata reproduction and reuse

DOI

Page 37: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Data Modeling

Pipeline design

Validation

Applications

CBIIT GigaGalaxy

Big data from the

“Sequencing Coal Face”Data, Data, Data…

Tin-Lap Lee, CUHK

Page 38: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Acknowledgements• Lee Lab (CUHK)

– Huayan Gao

• GigaScience– Scott Edmunds– Peter Li– Tam Sneddon

• BGI-Hong Kong– Dennis Chan– Edmond Leung

• Galaxy team– Nate Coraor

• myExperiment– Finn Bacall– Dave De Roure

• NBIC– Kostas Karasavvas

BGI-Shenzhen- Ruiqiang Li- Ruibang Luo- Haofu Wu- SOAP team members

Page 39: Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomics analysis

Thank you