RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix...

18
RNA-seq: Analysis options

Transcript of RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix...

Page 1: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

RNA-seq: Analysis options

Page 2: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

✓ Genome? ✓ Transcriptome

Sequence reads

DGE with R:DESeq2, EdgeR, limma:voom

(+reference transcriptome index)

Count matrix generated using

tximport

DGE or isoform-level DE with R: Sleuth

Pseudocounts with Kallisto, Sailfish, Salmon

FASTQ

Biological samples/Library preparation

Differential Expression Analysis Workflow #1

Page 3: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

multiple BAMs

Sequence reads

Quality control: FASTQC

DGE with R:DESeq2, EdgeR, limma:voom

FASTQ (+known GTF, optional)(+reference genome index)

(+reference transcriptome index)

Count matrix generated using

tximport

DGE or isoform-level DE with R: Sleuth

Pseudocounts with Kallisto, Sailfish, Salmon Quality control: Qualimap

Quality control: MultiQC

FASTQ

Biological samples/Library preparation

Alignment to Genome:HISAT2, STAR

✓ Genome? ✓ Transcriptome

Differential Expression Analysis Workflow #1

Page 4: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

multiple BAMs(+known GTF)

Sequence reads

Alignment to Genome:HISAT2, STAR

DGE with R:DESeq2, EdgeR,

limma:voom

Count reads associated with genes:

htseq-count, featureCounts

FASTQ(+known GTF, optional)

(+reference genome index)

✓ Genome ✓ GTF annotation file

(transcriptome)

Count matrix generated from BAM using featurecounts

Differential Expression Analysis Workflow #2

Page 5: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

multiple BAMs(+known GTF)

Sequence reads

Alignment to Genome:HISAT2, STAR

DGE with R:DESeq2, EdgeR,

limma:voom

Count reads associated with genes:

htseq-count, featureCounts

FASTQ

FASTQ (+known GTF, optional)(+reference genome index)

✓ Genome ✓ GTF annotation file

(transcriptome)

Count matrix generated from BAM using featurecounts

multiple BAMs

Quality control: Qualimap

Quality control: MultiQC

Differential Expression Analysis Workflow #2

Quality control: FASTQC

Page 6: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

multiple BAMs(+known GTF)

Sequence reads

Alignment to Genome:HISAT2, STAR

DGE with R:DESeq2, EdgeR,

limma:voom

Count reads associated with genes:

htseq-count, featureCounts

FASTQ

FASTQ (+known GTF, optional)(+reference genome index)

✓ Genome ✓ GTF annotation file

(transcriptome)

Count matrix generated from BAM using featurecounts

multiple BAMs

Quality control: Qualimap

Quality control: MultiQC

https://hbctraining.github.io/

Intro-to-rnaseq-hpc-O2/

https://hbctraining.github.io/

DGE_workshop/

Differential Expression Analysis Workflow #2

Quality control: FASTQC

Page 7: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Reference-based assembly• Genome is known

Alternative methods: transcriptome assembly

Page 8: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Reference-based assembly• Genome is known• Transcriptome not available or is not good enough

Alternative methods: transcriptome assembly

Page 9: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Reference-based assembly• Genome is known• Transcriptome not available or is not good enough• Cufflinks and Scripture are two reference-based transcriptome assemblers

Alternative methods: transcriptome assembly

Page 10: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Reference-based assembly• Genome is known• Transcriptome not available or is not good enough• Cufflinks and Scripture are two reference-based transcriptome assemblers• Additional annotation of any newly-discovered genes or isoforms will need

to be generated

Alternative methods: transcriptome assembly

Page 11: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

De novo assembly• Genome is not known, or is of poor quality

Alternative methods: transcriptome assembly

Page 12: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

De novo assembly• Genome is not known, or is of poor quality• Amount of data needed is greater than for a reference-based assembly

Alternative methods: transcriptome assembly

Page 13: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

De novo assembly• Genome is not known, or is of poor quality• Amount of data needed is greater than for a reference-based assembly• Oases, TransABySS, Trinity are examples of well-regarded transcriptome

assemblers, especially Trinity

Alternative methods: transcriptome assembly

Page 14: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

De novo assembly• Genome is not known, or is of poor quality• Amount of data needed is greater than for a reference-based assembly• Oases, TransABySS, Trinity are examples of well-regarded transcriptome

assemblers, especially Trinity• Newly-discovered genes or isoforms will need to be annotated using

homolog-based and other methodologies

Alternative methods: transcriptome assembly

Page 15: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Reference-based assembly

De novo assembly

Transcriptome Assembly

Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682

Page 16: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Sequence reads

Annotate the genes/transcripts

Pseudocounts with Kallisto, Sailfish, Salmon

DGE with R:DESeq2, EdgeR, limma:voom

DGE or isoform-level DE with R: Sleuth

Count matrix generated using

tximport

Differential Expression Analysis Workflow #3

Alignment to Genome:HISAT2, STAR

Merge assemblies from all samples

Reference-based transcriptome assembly

Quality control: FASTQC

Page 17: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

Sequence reads

de novo assembly with

Trinity

Annotate the genes/transcripts

Differential Expression Analysis Workflow #4

Pseudocounts with Kallisto, Sailfish, Salmon

DGE with R:DESeq2, EdgeR, limma:voom

DGE or isoform-level DE with R: Sleuth

Count matrix generated using

tximport

Quality control: FASTQC

Page 18: RNA-seq: Analysis options - GitHub Pages...GTF annotation file (transcriptome) Count matrix generated from BAM using featurecounts Differential Expression Analysis Workflow #2 multiple

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.