Transcriptome analysis


Page 1: Transcriptome  analysis

Transcriptome analysis

• With a reference
– Challenging due to size and complexity of datasets
– Many tools available, driven by biomedical research
– GATK and R/Bioconductor offer many options
– Start by mapping reads to the reference genome with a mapping/alignment tool (must deal with exon-intron junctions)
– Reconstruct transcripts from mapped reads (must deal with alternative splicing products)
– Calculate relative abundance of different transcripts
– Estimate biological significance based on annotation
– Example tools: Bowtie/TopHat, Cufflinks, Myrna
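Illustration of the "relative abundance" step: a minimal Python sketch that converts per-transcript read counts into transcripts per million (TPM). TPM is only one common abundance measure and is an assumption here, since the slides do not name a specific unit; the transcript names, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Length-normalise first: reads per kilobase of transcript.
    rates = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so that the values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical counts and lengths, for illustration only.
counts = {"tx_A": 100, "tx_B": 100, "tx_C": 0}
lengths_bp = {"tx_A": 1000, "tx_B": 2000, "tx_C": 1500}
abundance = tpm(counts, lengths_bp)

for name, value in abundance.items():
    print(name, round(value, 1))
```

Note that tx_A and tx_B have the same read count, but tx_A is half as long, so it ends up with twice the relative abundance.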

BIT 815: Analysis of Deep Sequencing Data

Page 2: Transcriptome  analysis

Workflow summary from a review “From RNA-seq reads to differential expression results”, by Oshlack et al, Genome Biol 11:220, 2010.

Note emphasis on statistical analysis methods; an equal emphasis should be placed on experimental design.

Page 3: Transcriptome  analysis

The ‘Tuxedo’ suite of programs: Bowtie, TopHat, Cufflinks and CummeRbund

See Trapnell et al., Nature Protocols 7:562-578, 2012, for details


Page 4: Transcriptome  analysis

• TopHat maps reads
• Cufflinks assembles transcripts
• Cuffmerge merges transcript data detected in different treatments
• Cuffdiff evaluates differential expression
• CummeRbund provides visualization tools
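The order of these steps can be sketched as a small Python helper that only builds the command lines and does not run them. The option letters follow the usage shown in the Trapnell et al. protocol; every file name, sample label, and the thread count are hypothetical, and the `assemblies.txt` file (a list of each sample's `transcripts.gtf`) is assumed to be written separately. CummeRbund is an R package that reads the Cuffdiff output directory, so it is not part of this sketch.

```python
def tuxedo_commands(samples, genome_index, genome_fasta, annotation_gtf, threads=4):
    """Build (but do not run) the command lines for one Tuxedo analysis.

    samples maps a condition label to a list of (replicate_name, fastq_path).
    """
    commands = []
    bams = {}
    for condition, replicates in samples.items():
        bams[condition] = []
        for rep_name, fastq in replicates:
            th_out = f"{rep_name}_thout"
            # TopHat: splice-aware mapping of reads to the genome index.
            commands.append(["tophat", "-p", str(threads), "-G", annotation_gtf,
                             "-o", th_out, genome_index, fastq])
            bam = f"{th_out}/accepted_hits.bam"
            # Cufflinks: assemble transcripts from this sample's alignments.
            commands.append(["cufflinks", "-p", str(threads),
                             "-o", f"{rep_name}_clout", bam])
            bams[condition].append(bam)
    # Cuffmerge: one merged annotation across all samples.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-p", str(threads), "assemblies.txt"])
    # Cuffdiff: replicates of a condition are comma-separated, conditions space-separated.
    commands.append(["cuffdiff", "-o", "diff_out", "-b", genome_fasta,
                     "-p", str(threads), "-L", ",".join(bams), "-u",
                     "merged_asm/merged.gtf"]
                    + [",".join(paths) for paths in bams.values()])
    return commands


# Hypothetical two-condition experiment with one replicate each.
samples = {"C1": [("C1_R1", "C1_R1.fq")], "C2": [("C2_R1", "C2_R1.fq")]}
cmds = tuxedo_commands(samples, "genome", "genome.fa", "genes.gtf")
for cmd in cmds:
    print(" ".join(cmd))
```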

Page 5: Transcriptome  analysis


Why merge data across treatments?

Page 6: Transcriptome  analysis


Differential transcript abundance mechanisms

Page 7: Transcriptome  analysis

Transcriptome analysis

• Without a reference
– First step is assembly
– Transcriptome assembly pipelines
    • Velvet/Oases: Oases is a post-assembly processor for Velvet
    • Trans-ABySS (BCGSC): based on the ABySS parallel assembler
    • Rnnotator: based on Velvet
    • Trinity (Broad Institute): a set of three programs
– Common strategy: assembly at multiple k-values, then merging of the resulting contigs, followed by refinement
– Once an assembly is available, continue with analysis as before
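A toy Python sketch of the "merge contigs from multiple k-values" idea: pool the contigs from every assembly, then drop any contig that is contained, in either orientation, in a longer contig already kept. This is a deliberately simplified stand-in and not the merging algorithm used by Oases, Trans-ABySS, or Rnnotator; the sequences and k-values are made up.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_assemblies(assemblies):
    """Pool contigs from several assemblies and remove redundant ones.

    assemblies maps a k-value to the list of contigs assembled at that k.
    """
    # The set removes exact duplicates; longest contigs are considered first.
    pooled = sorted({c.upper() for contigs in assemblies.values() for c in contigs},
                    key=lambda s: (-len(s), s))
    kept = []
    for contig in pooled:
        rc = reverse_complement(contig)
        # Skip contigs already represented inside a longer kept contig.
        if any(contig in k or rc in k for k in kept):
            continue
        kept.append(contig)
    return kept


# Hypothetical contigs from assemblies at three k-values.
assemblies = {
    21: ["ACGTACGTTAGC", "GGGCCC"],
    25: ["ACGTACGTTAGCTTAA", "GCTAACGT"],
    31: ["ACGTACGTTAGCTTAA"],
}
merged = merge_assemblies(assemblies)
print(merged)
```

Real pipelines also join overlapping contigs and resolve isoforms; containment filtering is only the simplest part of the merge.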


Page 8: Transcriptome  analysis

After Transcriptome Assembly…


• Some amount of analysis of differential splicing versus differential promoter activity is possible, but conclusions may be less robust in the absence of a reference

• The fraction of the total number of genes that can be discovered by RNA-seq depends on the diversity of tissue types and developmental stages analyzed, as well as the depth of sequencing
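The dependence on sequencing depth can be illustrated with a rough Python sketch. It assumes each transcript's read count is Poisson-distributed with mean (number of reads × relative abundance) and computes the expected fraction of transcripts seen at least once. The abundance profile is invented, and the model ignores tissue and developmental-stage diversity entirely, so it shows only the depth effect.

```python
import math


def expected_fraction_detected(rel_abundance, n_reads, min_reads=1):
    """Expected fraction of transcripts with at least min_reads reads.

    Assumes read counts per transcript are Poisson with mean n_reads * p.
    """
    def p_detect(p):
        lam = n_reads * p
        # P(X < min_reads) summed term by term, then take the complement.
        below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                    for k in range(min_reads))
        return 1.0 - below

    return sum(p_detect(p) for p in rel_abundance) / len(rel_abundance)


# Hypothetical, strongly skewed abundance profile spanning eight orders of magnitude.
raw = [10.0 ** -(i % 8) for i in range(1000)]
total = sum(raw)
rel = [r / total for r in raw]

depths = (10 ** 4, 10 ** 5, 10 ** 6, 10 ** 7)
curve = {n: expected_fraction_detected(rel, n) for n in depths}
for n, frac in curve.items():
    print(n, round(frac, 3))
```

Because abundances are so skewed, each tenfold increase in depth recovers only a modest additional share of rare transcripts.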

Page 9: Transcriptome  analysis

330 million SOLiD reads from a human cell line detect only about 67% of all annotated transcripts in the human genome.

Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Labaj et al, Bioinformatics 27:i383-91, 2011

Page 10: Transcriptome  analysis

Transcriptome analysis with RSEM: RNA-Seq by Expectation Maximization

Li & Dewey, BMC Bioinformatics 12:323, 2011


(a). Allows estimation of transcript abundance without a reference genome, based on alignments to assembled transcripts; the transcripts can instead be taken from a reference genome sequence if one is available.

(b). Uses the Bowtie aligner by default, and accounts for reads that map to multiple locations in the reference transcript collection rather than discarding them.

(c). For each sample, files of estimated gene and isoform abundance are produced, along with SAM files of alignments.

(d). The files of gene and isoform abundance can be used to evaluate differential expression using tools from R and Bioconductor.
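How expectation maximization handles multi-mapping reads, as in point (b), can be sketched with a toy Python example. Each read is split across the transcripts it aligns to in proportion to the current abundance estimates, and the estimates are then updated from those fractional counts. This is a bare-bones illustration and not RSEM's actual model, which is richer (for example, it accounts for transcript length); the isoform names and read sets are hypothetical.

```python
def em_abundance(read_hits, transcripts, n_iter=100):
    """Estimate transcript proportions from possibly multi-mapping reads.

    read_hits: one list per read, naming the transcripts that read aligns to.
    """
    # Start from equal proportions.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: assign each read fractionally, weighted by current theta.
        for hits in read_hits:
            norm = sum(theta[t] for t in hits)
            if norm == 0:
                continue
            for t in hits:
                expected[t] += theta[t] / norm
        # M-step: new proportions are the normalised expected counts.
        total = sum(expected.values())
        if total == 0:
            break
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Hypothetical reads: 6 unique to iso1, 2 unique to iso2, 4 shared by both.
reads = [["iso1"]] * 6 + [["iso2"]] * 2 + [["iso1", "iso2"]] * 4
theta = em_abundance(reads, ["iso1", "iso2"])
print(theta)
```

The shared reads end up divided 3:1 in favour of iso1, matching the ratio of the uniquely mapped reads, instead of being discarded or split evenly.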