RNA-Seq analysis with Astrocyte - BioHPC Portal Home · 2016-09-28 · RNA-Seq analysis with...
RNA-Seq analysis with Astrocyte
Differential expression and transcriptome assembly
Beibei Chen Ph.D
BICF
9/28/2016
Agenda
• Launch Workflows using Astrocyte
• BICF Workflows
• BICF RNA-seq Workflow
• Experimental Design Affecting Your Analysis
• Required Inputs
• Astrocyte demo
• Outputs
• Common Errors
• Visualization of Results using Vizapps
• Vizapp Demo
Astrocyte – BioHPC Workflow Platform
Allows groups to give easy access to their analysis pipelines via the web
Standardized Workflows
Simple Web Forms
Online documentation & results visualization*
Workflows run on HPC cluster without developer or user needing cluster knowledge
Slide contribution: David Trudgian@BioHPC
astrocyte.biohpc.swmed.edu
Browse workflows
BICF Workflows
• RNASeq
– Differential Expression Analysis
• Germline Variation
– Fastq to annotated VCF (DNA, RNA)
• Somatic Variation
– Fastq to annotated VCF
Workflow for Genome Analysis
Annotation Sources
• Gene Annotation (Genes, Regulation and TFBS)
• dbSNP, ExAC
• ClinVar, GWAS Catalog
• COSMIC
• dbNSFP
– SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, FATHMM, VEST3, CADD, MetaLR, MetaSVM, PROVEAN, DANN, fathmm-MKL, fitCons
– PhyloP x 2, phastCons x 2, GERP++ and SiPhy
– Allele frequencies in 1000 Genomes Project phase 3 data, UK10K cohorts data, ExAC consortium data and the NHLBI Exome Sequencing Project ESP6500 data
• genesets (MSigDB)
• CIVIC
• BROAD Target
Everything's connected slide by Dündar et al. (2015)
General RNA-seq Workflow
Experimental Design Affecting Your Analysis
• Whole transcriptome vs mRNA
• Single end vs paired end
– Paired-end produces more accurate alignments
– Paired-end allows for alternative transcript analysis
– Single-end is cheaper
• Number of Reads
– 10-50M is a good range
• Read Length
– Longer reads produce better alignments, min 50 bp paired or 100 bp single
Experimental Design Affecting Your Analysis
• Number of Samples
• Your power to detect an effect depends on
– Effect size (difference between group means)
– Within-group variance
– Sample size
• The more samples the better; minimum 3 per group
• Five samples sequenced to 20M reads each offer more power than 2 samples sequenced to 50M reads
• Stranded
• Can distinguish expression of overlapping genes
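The dependence of power on effect size, within-group variance and sample size can be illustrated with a rough calculation. The sketch below is not part of the BICF workflow: it uses a normal approximation for a two-group comparison of means, treats the variance as known (so it is optimistic for very small groups), and does not model sequencing depth. The function name and example numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, within_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation with known variance; illustration only.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two group means.
    se_diff = within_sd * sqrt(2.0 / n_per_group)
    # Critical value for a two-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # How many standard errors the true difference is away from zero.
    shift = abs(effect_size) / se_diff
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Same effect size and variance, increasing number of samples per group.
for n in (2, 3, 5):
    print(n, round(approx_power(effect_size=1.0, within_sd=0.5, n_per_group=n), 3))
```

Under these assumptions, power rises with more samples per group, with a larger effect size, and with a smaller within-group variance.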
Strand-specific RNA-seq
image from GATC Biotech
How to decide strand
Reverse stranded
Stranded
RNASeq Analysis Pipeline
http://www.utsouthwestern.edu/labs/bioinformatics/services/data-analysis/rnaseq-pipeline.html
Create a new project
Add data to your project
For NGS experiments, this is recommended.
Make your design file
• Use tab as the delimiter
– In Excel, save as “Text (tab delimited)”
• If there is no SubjectID, use the same number/character for all rows
• If there is no FqR2, leave those cells empty
• No “-” anywhere in the contents
• No spaces anywhere in the contents
• Column names MUST be exactly the same as documented
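The rules above can be checked before uploading. The following is a minimal sketch, not a tool provided by Astrocyte. Only SampleGroup, SubjectID and FqR2 are column names mentioned in these slides; SampleID and FqR1 are placeholders assumed for illustration, so replace the list with the column names given in the workflow documentation.

```python
import csv
import io

# Assumed column list for illustration. SampleGroup, SubjectID and FqR2
# appear in the slides; SampleID and FqR1 are hypothetical placeholders.
REQUIRED_COLUMNS = ["SampleID", "SampleGroup", "SubjectID", "FqR1", "FqR2"]
OPTIONAL_EMPTY = {"FqR2"}  # may be left empty when there is no second read file


def check_design(text):
    """Return a list of problems found in the text of a design file."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    # A single header field containing commas or spaces suggests a wrong delimiter.
    if len(header) == 1 and ("," in header[0] or " " in header[0]):
        problems.append("header is not tab-delimited")
    for name in REQUIRED_COLUMNS:
        if name not in header:
            problems.append(f"missing column: {name}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if "-" in value:
                problems.append(f"line {line_no}: '-' in {name}")
            if " " in value:
                problems.append(f"line {line_no}: space in {name}")
            if value == "" and name not in OPTIONAL_EMPTY:
                problems.append(f"line {line_no}: empty {name}")
    return problems


# Small demonstration with a made-up single-end sample.
example = (
    "SampleID\tSampleGroup\tSubjectID\tFqR1\tFqR2\n"
    "S1\tA\t1\tS1_R1.fastq.gz\t\n"
)
print(check_design(example))
```

An empty list means none of the checked rules was violated; it does not guarantee that the workflow will accept the file.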
Comparisons
• Comparisons are based on SampleGroup
– All pair-wise comparisons are run
– Each comparison can be identified by its file name
• A_B.edgeR.txt
• Log fold change is for A/B
• If you want B/A, multiply logFC by -1
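As a sketch of what "all pair-wise comparisons" means for the output, the snippet below lists one A_B.edgeR.txt name per pair of groups and shows the sign flip for B/A. The group names are made up, and the alphabetical ordering within each pair is an assumption; check the actual file names to see which group the pipeline put first.

```python
from itertools import combinations


def comparison_files(sample_groups):
    """List one result file name per pair of distinct groups."""
    groups = sorted(set(sample_groups))
    # Assumption: alphabetical order within each pair. The real pipeline
    # may order A and B differently, so confirm against the output files.
    return [f"{a}_{b}.edgeR.txt" for a, b in combinations(groups, 2)]


def flip_log_fc(log_fc):
    """Convert a log fold change for A/B into the one for B/A."""
    return -1 * log_fc


print(comparison_files(["ctrl", "ctrl", "drug", "ko"]))
print(flip_log_fc(1.5))
```

The sign flip works because log(A/B) = -log(B/A); no other column needs to change for the reversed comparison.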
Select your data files and submit
Project is running
Timeline of the whole run
Download/visualize your results
The Vizapp needs about 30 seconds to start if there is no queue. You need to refresh the page.
You can also choose individual files to download to your local computer
Common errors and solutions
• Make sure the delimiter is tab
• Make sure the column names are the same as in the documentation
• Make sure the file names match
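One way to catch file-name mismatches before submitting is to compare the names in the design file against the files in the project. This is a sketch under assumptions: it takes the mismatch to be between design-file entries and uploaded files, the function is hypothetical, and FqR1 is an assumed column name (FqR2 is the one mentioned in the slides).

```python
def missing_fastq_files(design_rows, uploaded_names, columns=("FqR1", "FqR2")):
    """Return FASTQ names listed in the design file but not uploaded.

    design_rows: list of dicts, one per sample (e.g. from csv.DictReader).
    uploaded_names: file names present in the project.
    columns: assumed names of the FASTQ columns; empty cells are skipped.
    """
    uploaded = set(uploaded_names)
    missing = []
    for row in design_rows:
        for col in columns:
            name = (row.get(col) or "").strip()
            # The comparison is exact and case-sensitive.
            if name and name not in uploaded:
                missing.append(name)
    return missing


rows = [
    {"FqR1": "S1_R1.fastq.gz", "FqR2": ""},
    {"FqR1": "S2_R1.fastq.gz", "FqR2": "S2_R2.fastq.gz"},
]
print(missing_fastq_files(rows, ["S1_R1.fastq.gz", "S2_R1.fastq.gz"]))
```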
Common errors and solutions
• Not all files are uploaded
– This is caused by the proxy setting
– Use auto-detect proxy
Vizapp: QC general stat
Vizapp: QC MSD and PCA
Vizapp: Gene Compare
Vizapp: DEA
• Uses edgeR results
• Filter gene list by different parameters
• Sort by different columns
• Data table downloading
Vizapp: DEA heatmap
• Filter gene list by different parameters
• Choose different comparisons
• Supports user-defined gene lists (official gene symbols)
• Supports pathways
Vizapp: alternative splicing
Different transcripts’ expression in sample groups
Vizapp: QuSAGE
• Introduction to BioHPC
– 10/5/2016
– 10:30AM-Noon @ NL6.215
• Please attend so you can get an account to try this out
Acknowledgement
• Brandi Cantarel
• David Trudgian
• BioHPC team
• BICF team