Genomics Virtual Lab: analyze your data with a mouse click
Igor Makunin, [email protected]
School of Agriculture and Food Sciences, UQ, April 8, 2015
Research Computing Centre @ UQ
Genomics Virtual Laboratory
Genome-scale experiments are relatively cheap and very popular:
- the cost of high-throughput sequencing is going down
- data are readily available (genomes, transcripts, etc.)
Analysis of NGS data is a bottleneck (infrastructure, skills)
Genomics Virtual Lab: take the IT out of bioinformatics
- web-based resources (biologist-friendly)
- DIY bioinformatics environment (for geeks)
GVL advantages:
- public resources (no charges to users)
- available immediately
GVL products and services
Genomics Virtual Lab: genome.edu.au
The main aim: facilitate genomics research in Australia
Galaxy:
• Tutorials and protocols (nextGen sequencing)
• Galaxy for tutorials: galaxy-tut.genome.edu.au
• Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
• “Roll your own” Galaxy on Australian government-funded computing infrastructure (NeCTAR cloud), plus IPython Notebook and RStudio
Deploy your own computer cluster (NeCTAR cloud)
Mirror of UCSC Genome Browser
RStudio
Learn, Use, Get, Info
Galaxy: what does it look like?
(Screenshot of the Galaxy interface: Tools panel, working window, History panel)
Galaxy: possibilities
You can:
- analyze genome-scale nextGen sequencing data without bash scripting
- work with big datasets, genomic regions, sequences, etc.
- create and use workflows (record the steps of your analysis)
- share results and workflows with a specific user or make them available to anyone
Data import:
- upload through the web interface
- FTP (for big datasets)
Public data:
- UCSC Genome Browser
- UCSC Archaea
- Microbial data
- EBI SRA
Over 2,000 tools are available through the Galaxy Tool Shed
Use: local Galaxy-qld server
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
- BWA, bowtie, bowtie2
- Velvet (microbial genome assembly)
- Trinity (de novo transcript assembly)
- tophat, tophat2 (RNA-Seq)
- DESeq, edgeR, Cufflinks (differential gene expression)
- Variant detection tools
- Metagenomics tools
- MACS, MACS2, SPP (ChIP-Seq)
- SAMtools
- Picard
Hundreds of users
Thousands of jobs per month
Up to 1 TB of storage per user (for UQ users)
Data manipulation on Galaxy-qld
Useful tools for data manipulation:
- FASTA manipulation
- MEME (identification of motifs)
- BLAST search
- Text manipulation: add column, merge, cut, trim, compute expression, etc.
- Filter and Sort
- Join, Subtract and Group
- Format conversion (genomics)
- Operate on Genomic Intervals (including Fetch closest feature)
- Statistics
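To show the kind of question a tool such as "Fetch closest feature" answers, here is a minimal Python sketch. It is not Galaxy's implementation. It assumes BED-style (chrom, start, end, name) tuples with 0-based, half-open coordinates, the helper names `gap` and `closest_feature` are hypothetical, and it does a naive linear scan per query, which suits a toy example but not genome-scale data.

```python
from typing import List, Optional, Tuple

# Assumption: (chrom, start, end, name), 0-based half-open coordinates as in BED files.
Interval = Tuple[str, int, int, str]


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap)."""
    if a[2] <= b[1]:      # a lies entirely to the left of b
        return b[1] - a[2]
    if b[2] <= a[1]:      # b lies entirely to the left of a
        return a[1] - b[2]
    return 0              # the intervals overlap


def closest_feature(query: Interval, features: List[Interval]) -> Optional[Interval]:
    """Nearest feature on the same chromosome; ties go to the leftmost feature.

    Hypothetical helper: a naive O(n) scan per query, for illustration only.
    """
    candidates = sorted((f for f in features if f[0] == query[0]),
                        key=lambda f: (f[1], f[2]))
    if not candidates:
        return None       # no feature on this chromosome
    return min(candidates, key=lambda f: gap(query, f))


genes = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 500, 800, "geneB"),
    ("chr2", 50, 90, "geneC"),
]
peaks = [("chr1", 300, 350, "peak1"), ("chr2", 400, 450, "peak2"), ("chr3", 10, 20, "peak3")]

for peak in peaks:
    hit = closest_feature(peak, genes)
    print(peak[3], hit[3] if hit else None, gap(peak, hit) if hit else None)
```

With these made-up intervals, peak1 is reported next to geneA at 100 bp, and peak3 gets `None` because no feature lies on chr3.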
Good user practice for Galaxy-qld
Register with your UQ email address to get a larger disk allocation.
Use FTP for big datasets; it is faster. Galaxy recognises .gz compression.
Do not store unneeded datasets. Delete temporary files such as SAM files. Purge deleted datasets.
Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).
Create and use workflows for multi-step analysis.
Specify the quality score encoding for nextGen sequencing data (FASTQ files).
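Specifying the encoding matters because the same ASCII character means a different Phred score under Phred+33 (Sanger, Illumina 1.8+) and Phred+64 (older Illumina). The Python sketch below guesses the offset from the lowest quality character seen and reads plain or .gz FASTQ files. It is a heuristic, not a Galaxy tool. The function names are hypothetical, it assumes simple 4-line FASTQ records, and a uniformly high-quality Phred+33 file can look like Phred+64. Confirm the encoding with your sequencing provider before setting the Galaxy datatype (for example fastqsanger for Phred+33).

```python
import gzip
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator


def open_fastq(path: str):
    """Open a FASTQ file as text, decompressing on the fly if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string of each record (assumes 4 lines per record, no wrapping)."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(qualities: Iterable[str]) -> str:
    """Heuristic guess of the quality encoding from the lowest ASCII code observed.

    Codes below 59 (';') only occur in Phred+33. If nothing below 64 ('@') is seen,
    Phred+64 is likely, though a very high-quality Phred+33 file can look the same.
    """
    lowest = min((ord(c) for q in qualities for c in q), default=None)
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "phred33"
    if lowest >= 64:
        return "phred64"
    return "ambiguous"    # 59..63 fits Solexa+64 as well as Phred+33


# Usage: write a one-record gzipped FASTQ to a temporary file and inspect it.
record = ["@read1\n", "ACGT\n", "+\n", "II#5\n"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fastq.gz")
    with gzip.open(path, "wt") as fh:
        fh.writelines(record)
    with open_fastq(path) as fh:
        # Inspect at most the first 10,000 records (40,000 lines).
        print(guess_phred_offset(quality_lines(islice(fh, 40000))))
```

Here the quality string contains `#` (ASCII 35), which only occurs in Phred+33, so the sketch prints `phred33`.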
Mirror of UCSC Genome Browser
ucsc.genome.edu.au
- full mirror, regularly updated
- user data are kept for a long time
Use: RStudio
http://gvl-rstudio.genome.edu.au/rstudio/
Based on the GVL cluster
Genome data from Galaxy
Email to: [email protected] for registration
Genomics Virtual Lab: Learn
Genomics VL site: genome.edu.au
Easy-to-follow Galaxy tutorials (DIY, online)
A dedicated Galaxy server: galaxy-tut.genome.edu.au
Topics: RNA-Seq, variant detection, ChIP-Seq, microbial genome assembly …
Training through QFAB (with a nominal fee): qfab.org
GVL Get: roll your own Galaxy
Default NeCTAR allocation for UQ users: 2 CPUs, 8 GB RAM
Start your own virtual computer cluster on the NeCTAR cloud
Start your own Galaxy on the NeCTAR cloud:
- admin rights (you can add tools)
- as powerful as needed (based on your allocation)
- ability to add worker nodes
- IPython Notebook
- RStudio
Detailed instructions are available on the Genomics VL site
Follow announcements on the QFAB web site: qfab.org
Summary
GVL provides resources for genomics research:
- learn & Galaxy-tut
- local Galaxy-qld
- roll your own
We are interested in users and their feedback
What do you want to do?
Any special needs? (tools, datasets, resources)
What do you want to learn?
Do you want to share / promote your workflows with other people?
Talk to us: Igor Makunin, [email protected]
Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au
Contributors and participants: