Genomics Virtual Lab: analyze your data with a mouse click. Igor Makunin, i.makunin@uq.edu.au, School of Agriculture and Food Sciences, UQ, April 8, 2015. Research Computing Centre @ UQ


Page 1: Genomics Virtual Lab: analyze your data with a mouse click Igor Makunin i.makunin@uq.edu.au School of Agriculture and Food Sciences, UQ, April 8, 2015.

Genomics Virtual Lab: analyze your data with a mouse click

Igor Makunin, i.makunin@uq.edu.au

School of Agriculture and Food Sciences, UQ, April 8, 2015

Research Computing Centre @ UQ

Page 2:

Genomics Virtual Laboratory

Genome-scale experiments are relatively cheap and very popular:
- the cost of high-throughput sequencing is going down
- data are readily available (genomes, transcripts, etc.)

Analysis of NGS data is a bottleneck (infrastructure, skills)

Genomics Virtual Lab: take the IT out of Bioinformatics
- web-based resources (biologist-friendly)
- DIY bioinformatics environment (for geeks)

GVL advantages:
- public resources (no charges to users)
- available immediately

Page 3:

GVL products and services

Genomics Virtual Lab: genome.edu.au

The main aim: to facilitate genomics research in Australia

Galaxy:
• Tutorials and protocols (nextGen sequencing)
• Galaxy for tutorials: galaxy-tut.genome.edu.au
• Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
• “Roll your own” Galaxy on the Australian government funded computer infrastructure (NeCTAR cloud)
  + IPython Notebook
  + RStudio

Deploy your own computer cluster (NeCTAR cloud)

Mirror of UCSC Genome Browser

RStudio

Learn | Use | Get | Info

Page 4:

Galaxy: what does it look like

[Screenshot of the Galaxy interface, with panels labelled Tools, Working window, and History]

Page 5:

Galaxy: possibilities

You can:
- analyze genome-scale nextGen sequencing data without bash scripting
- work with big datasets, genomic regions, sequences, etc.
- create and use workflows (record the steps of your analysis)
- share results and workflows with another user or make them available to anyone

Data import:
- upload through the web interface
- FTP (for big datasets)

Public data:
- UCSC Genome Browser
- UCSC Archaea
- Microbial data
- EBI SRA

Over 2,000 tools available through the Galaxy tool shed

Page 6:

Use: local Galaxy-qld server

GVL Galaxy in Queensland: galaxy-qld.genome.edu.au

- BWA, bowtie, bowtie2
- Velvet (microbial genome assembly)
- Trinity (de novo transcript assembly)
- tophat, tophat2 (RNA-Seq)
- DESeq, edgeR, Cufflinks (differential gene expression)
- Variant detection tools
- Metagenomics tools
- MACS, MACS2, SPP (ChIP-Seq)
- SAMtools
- Picard

Hundreds of users, thousands of jobs per month

Up to 1 TB per user (for UQ users)

Page 7:

Data manipulation on Galaxy-qld

GVL Galaxy in Queensland: galaxy-qld.genome.edu.au

Useful tools for data manipulation:

- FASTA manipulation
- MEME (identification of motifs)
- BLAST search
- Text manipulation: add column, merge, cut, trim, compute expression, etc.
- Filter and Sort
- Join, Subtract and Group
- Format conversion (genomics)
- Operate on Genomics Intervals (including Fetch closest feature)
- Statistics
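To illustrate what an interval operation such as "Fetch closest feature" computes, here is a minimal Python sketch. It is not Galaxy's implementation: the function names, the tuple layout (chromosome, start, end, then optional extra fields) and the half-open BED-style coordinates are all assumptions made for this example, and touching intervals are treated as distance 0.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bases between two half-open intervals; 0 if they overlap or touch."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def closest_feature(query, features):
    """Return (feature, distance) for the feature nearest to `query`.

    `query` is (chrom, start, end); each feature is a tuple whose first
    three fields are (chrom, start, end). Only features on the same
    chromosome are considered. Returns (None, None) if there are none.
    """
    chrom, start, end = query
    best, best_dist = None, None
    for feature in features:
        f_chrom, f_start, f_end = feature[0], feature[1], feature[2]
        if f_chrom != chrom:
            continue
        dist = interval_distance(start, end, f_start, f_end)
        if best_dist is None or dist < best_dist:
            best, best_dist = feature, dist
    return best, best_dist


# Hypothetical example data: three annotated genes and one ChIP-Seq peak.
genes = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 500, 600, "geneB"),
    ("chr2", 150, 160, "geneC"),
]
peak = ("chr1", 250, 260)
print(closest_feature(peak, genes))
```

This linear scan is fine for small inputs; for genome-scale interval sets, sorted inputs or an interval index would be the usual choice.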

Page 8:

Good user practice for Galaxy-qld

GVL Galaxy in Queensland: galaxy-qld.genome.edu.au

Register with your UQ email and get a bigger disk allocation.

Use ftp for big datasets – it is faster. Galaxy recognises .gz compression.
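As a rough sketch of this tip, a file can be gzip-compressed and then sent over FTP with Python's standard library. The helper names are hypothetical. The host, user name and password are parameters because the actual FTP address and login details for the server are not given here, and the assumption that plain FTP is accepted should be checked against the server's own instructions.

```python
import gzip
import shutil
from ftplib import FTP
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress `src` to `dst` (default: `src` + '.gz') and return the new path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dst


def upload_via_ftp(path, host, user, password):
    """Upload one file to the FTP server's default directory.

    `host`, `user` and `password` are placeholders supplied by the caller.
    """
    path = Path(path)
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh)


# Typical use (not run here; requires a real server and account):
# gz_path = gzip_file("sample_R1.fastq")
# upload_via_ftp(gz_path, host="<ftp host>", user="<login>", password="<password>")
```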

Do not store unneeded datasets. Delete temporary files such as SAM. Purge deleted datasets.

Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).

Create and use workflows for multi-step analysis.

Specify the quality score encoding for nextGen sequencing data (FASTQ files).
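If you are unsure which encoding a FASTQ file uses, the range of characters in the quality lines usually gives it away. The sketch below is a common heuristic, not a Galaxy tool. It assumes four-line FASTQ records from Illumina-style short reads, where Phred+33 qualities use characters below ASCII 59 and Phred+64 (or Solexa+64) qualities use characters above ASCII 74. The function name and return labels are made up for this example.

```python
def guess_fastq_encoding(lines, max_records=10000):
    """Guess the quality offset from the ASCII range of the quality strings.

    `lines` is any iterable of text lines from a 4-line-per-record FASTQ file.
    Returns 'phred33', 'phred64', 'ambiguous' or 'unknown'.
    """
    lo, hi = 255, 0
    records = 0
    for i, line in enumerate(lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        codes = [ord(c) for c in line.rstrip("\r\n")]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
        records += 1
        if records >= max_records:
            break
    if records == 0 or lo > hi:
        return "unknown"
    if lo < 59:   # characters this low cannot occur with a +64 offset
        return "phred33"
    if hi > 74:   # characters this high are not expected with a +33 offset
        return "phred64"
    return "ambiguous"


# Hypothetical single-record example with Phred+33 qualities.
example = ["@read1\n", "ACGT\n", "+\n", "!!5I\n"]
print(guess_fastq_encoding(example))
```

A result of "ambiguous" means the sampled reads only contain characters valid in both schemes, so the sequencing provider's documentation is the safer source.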

Page 9:

Mirror of UCSC Genome Browser

ucsc.genome.edu.au
- full mirror, regularly updated
- keeps user data for a long time

Page 10:

Use: RStudio

http://gvl-rstudio.genome.edu.au/rstudio/

Based on the GVL cluster

Genome data from Galaxy

Email [email protected] for registration

Page 11:

Genomics Virtual Lab: Learn

Genomics VL site: genome.edu.au

Easy-to-follow Galaxy tutorials (DIY, online)
A dedicated Galaxy server: galaxy-tut.genome.edu.au
Topics: RNA-Seq, variant detection, ChIP-Seq, microbial genome assembly …

Training through QFAB (with a nominal fee): qfab.org

Page 12:

GVL Get: roll your own Galaxy

Default NeCTAR allocation for the UQ users: 2 CPUs, 8 GB RAM

Start your own virtual computer cluster on the NeCTAR cloud

Start your own Galaxy on the NeCTAR cloud
- admin rights (can add tools)
- as powerful as needed (based on allocation)
- ability to add worker nodes
- IPython Notebook
- RStudio

Detailed instructions are available on the Genomics VL site
Follow announcements on the QFAB web site: qfab.org

Page 13:

Summary

GVL provides resources for genomics research:
- learn & Galaxy-tut
- local Galaxy-qld
- roll your own

We are interested in users and their feedback

What do you want to do?

Any special needs? (tools, datasets, resources)

What do you want to learn?

Do you want to share your workflows with other people or promote them?

Talk to us: Igor Makunin, i.makunin@uq.edu.au

Page 14:

Thank you!

GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au

Contributors and participants: