Alternative Splicing Prediction -...
Alternative Splicing Prediction
Fundamentals of Genetics
"Central Dogma of Molecular Biology"
DNA --(Transcription)--> mRNA --(Translation)--> protein
With splicing:
DNA --(Transcription)--> pre-mRNA --(Splicing)--> mRNA --(Translation)--> protein
mRNA Splicing
DNA: introns and exons
pre-mRNA (5' to 3'): introns and exons
mRNA: exons joined, introns removed
pre-mRNA --(Splicing)--> mRNA
Alternative Splicing
pre-mRNA --> mRNA variants
● Exon Skipping
● Intron Retention
● Alt. 3'
● Alt. 5'
● Alt. Both
Why Study Alternative Splicing?
● One gene, multiple transcripts
● Regulation
● Protein Diversity
● Disease
~23,000 genes, ~20,000 genes
mRNA Sequences
Conventional sequencing
● up to full-length mRNA transcripts
● costly
[Figure: mRNA sequence, Genome, fl-cDNA, ESTs]
Gene Models
Description of known features
● Start, end, introns, exons, etc.
● "wet-lab" results
● computational results (ESTs)
RNA-Seq
"Next-Generation" sequencing
● short "reads"
● cheap, plentiful
[Figure: fl-cDNA, EST, RNA-Seq]
Ungapped Alignment
DNA sequence:  TGTTTTTTACCAGGAGTTGCCAAGAATTGGCCAATGCCTTCTTACGACC
mRNA sequence: GAATTGGCCAATGCCTTCTTAC
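An ungapped alignment places the whole read at one contiguous position in the reference. As a minimal sketch (exact matching only, with no mismatches or quality scores, which real aligners do handle), the lookup could be written as follows. The function name `ungapped_hits` is hypothetical and not part of any tool named in these slides.

```python
def ungapped_hits(read, reference):
    """Return every 0-based offset where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Resume one base later so overlapping matches are also reported.
        start = reference.find(read, start + 1)
    return hits


# Sequences taken from the slide above.
dna = "TGTTTTTTACCAGGAGTTGCCAAGAATTGGCCAATGCCTTCTTACGACC"
read = "GAATTGGCCAATGCCTTCTTAC"

for offset in ungapped_hits(read, dna):
    print(offset, dna[offset:offset + len(read)])
```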
Spliced Alignment
short read:        TGATTCAGTCATCACTTTAAGAGCCATGGAGT
Genomic reference: TGATTCAGTCATCA GT.....AG CTTTAAGAGCCATGGAGT
short read:        TGATTCAGTCATCA           CTTTAAGAGCCATGGAGT
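A spliced alignment splits the read in two and places the halves on either side of an intron. The sketch below illustrates the idea with a brute-force search that accepts only introns starting with GT and ending with AG. It is not how any aligner named in these slides works. The intron sequence is invented for illustration (the slide elides it as "....."), and the function name and the `min_anchor` and `min_intron` parameters are assumptions.

```python
def spliced_align(read, genome, min_anchor=8, min_intron=4):
    """Find (left_start, intron_start, intron_end) for a two-part alignment.

    Coordinates are 0-based; the intron is genome[intron_start:intron_end].
    Returns None when no split with a GT...AG intron is found.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        while left_start != -1:
            intron_start = left_start + split
            right_start = genome.find(right, intron_start + min_intron)
            while right_start != -1:
                intron = genome[intron_start:right_start]
                if intron.startswith("GT") and intron.endswith("AG"):
                    return left_start, intron_start, right_start
                right_start = genome.find(right, right_start + 1)
            left_start = genome.find(left, left_start + 1)
    return None


exon1 = "TGATTCAGTCATCA"        # from the slide
exon2 = "CTTTAAGAGCCATGGAGT"    # from the slide
intron = "GTAAGTCTTTCAG"        # hypothetical; the slide shows only GT.....AG
genome = exon1 + intron + exon2
read = exon1 + exon2

print(spliced_align(read, genome))
```

With these inputs the only accepted split falls at the exon boundary, so the reported intron is exactly the invented one.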
Anchor Regions
Anchor region: minimum length by which a read overlaps a junction on either side
Genomic reference: TGATTCAGTCATCA GT.....AG CTTTAAGAGCCATGGAGT
2nt anchor:        TGATTCAGTCATCA           CT
P(match by chance) = 1/4^2 = 1/16
8nt anchor:        TGATTCAGTCATCA           CTTTAAGA
P(match by chance) = 1/4^8 = 1/65,536
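The chance-match figures follow from treating each of the four bases as equally likely and independent, so an anchor of n nucleotides matches by chance with probability 1/4^n. That uniform-base model is a simplifying assumption, and the function name below is hypothetical. A short sketch that reproduces the two values on the slides:

```python
from fractions import Fraction


def chance_match_probability(anchor_len, alphabet_size=4):
    """Probability that a random sequence matches an anchor of this length.

    Assumes every base is equally likely and independent of its neighbours.
    """
    return Fraction(1, alphabet_size ** anchor_len)


for n in (2, 8):
    p = chance_match_probability(n)
    print(f"{n}nt anchor: P(match by chance) = {p}")
```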
File Formats
● Sequences: FASTQ
● Gene Models: GFF3
● Gene Models: GTF
● Alignments: SAM
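In SAM, a spliced alignment is recorded in the CIGAR string, where the `N` operation marks a region of the reference skipped by the read (an intron). As a sketch of how junctions could be recovered from the POS and CIGAR fields alone, under the SAM convention that POS is 1-based (the function name is hypothetical, and no SAM library is used):

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return introns as 1-based inclusive (start, end) reference intervals."""
    junctions = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in REF_CONSUMING:
            ref_pos += length
    return junctions


# A 32nt read: 14 matched bases, a 13nt intron, then 18 matched bases.
print(junctions_from_cigar(1, "14M13N18M"))
```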
Splice Graphs vs. Transcripts
Transcripts: fl-cDNA
Splice Graph: fl-cDNA, ESTs, RNA-Seq
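One common way to picture a splice graph is as a set of exons (nodes) joined by the introns observed between them (edges), so that several transcripts of one gene collapse into a single structure. The sketch below assumes that representation. The coordinates are made up, and the function name is hypothetical rather than SpliceGrapher's API.

```python
def build_splice_graph(transcripts):
    """Merge transcripts into a graph of exon nodes and intron edges.

    Each transcript is a list of (start, end) exon intervals on one strand.
    """
    nodes, edges = set(), set()
    for exons in transcripts:
        ordered = sorted(exons)
        nodes.update(ordered)
        # Each pair of consecutive exons is joined by one intron.
        edges.update(zip(ordered, ordered[1:]))
    return nodes, edges


# Two hypothetical transcripts; the second skips the middle exon.
transcripts = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
]
nodes, edges = build_splice_graph(transcripts)
print(sorted(nodes))
print(sorted(edges))
```

The two transcripts share three exon nodes, and the extra edge from the first exon straight to the last is what encodes the exon-skipping event.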
SpliceGrapher
● Gene Model
● ESTs
● RNA-Seq
RNA-Seq Data
● Ungapped alignments
● Spliced Alignments
Predicted Splice Graph
offset = 14nt
Challenges with RNA-Seq
● Short Reads
● Ambiguous Origins
● Variable Coverage
● Highly Localized Evidence
Validating Splice Sites
Splice Site SVM
● Gene Models
● EST Alignments
Accuracy ~87-97%
Genomic reference: TCATGTCTTCATGTTTGCGGTAAGAGGTAGTCATCACTTTAAGAG
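The slides do not show how candidate sites reach the classifier. As a minimal sketch, assuming the canonical GT (donor) and AG (acceptor) dinucleotides seen in the GT.....AG intron of the spliced-alignment slides, candidate positions in a reference sequence could be listed as below. This only enumerates candidates that a model such as the SVM would then score. It is not the SVM itself, and the function name is hypothetical.

```python
def candidate_sites(sequence, dimer):
    """Return 0-based positions where the two-base `dimer` occurs."""
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == dimer]


# Genomic reference from the slide above.
reference = "TCATGTCTTCATGTTTGCGGTAAGAGGTAGTCATCACTTTAAGAG"

donors = candidate_sites(reference, "GT")      # possible intron starts
acceptors = candidate_sites(reference, "AG")   # possible intron ends
print("GT candidates:", donors)
print("AG candidates:", acceptors)
```

Most such dinucleotides are not real splice sites, which is why a trained model is needed to separate true sites from chance occurrences.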
Other Approaches
● Splice graph prediction
  Sircah (EST only)
● Transcript prediction
  BowTie/TopHat/Cufflinks
  HashMatch/Supersplat/TAU
  Scripture
Results
AS Predictions for A. thaliana
Example 1 - Cufflinks
Example 1 - TAU
Example 1 - SpliceGrapher
Example 2 - Cufflinks
Example 2 - TAU
Example 2 - SpliceGrapher
Example 3 - Cufflinks
Example 3 - TAU
Example 3 - SpliceGrapher
Conclusions
● Uses gene models, ESTs, and RNA-Seq
● Conservative splice graph predictions
  Curated gene models establish context
  Accurate splice site models
● Visualization aids
Ongoing Analyses
Plants: A. thaliana, V. vinifera, B. distachyon, G. max, O. sativa
Mammals: B. taurus, H. sapiens
More Information
Funding from NSF award 0743097
Software: splicegrapher.sourceforge.net
Results: http://combi.cs.colostate.edu/SpliceGrapher