Transcript Alignment Assembly and Automated Gene Structure Improvements
Using PASA-2
Mathangi [email protected]
Rice Genome Annotation Workshop
May 23rd, 2007
About PASA
PASA is an open-source, free-to-download software program written by Brian Haas ([email protected]).
Reference: its original application is described in:
Haas, B.J., Delcher, A.L., Mount, S.M., Wortman, J.R., Smith, R.K., Jr., Hannick, L.I., Maiti, R., Ronning, C.M., Rusch, D.B., Town, C.D. et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res, 31, 5654-5666.
Topics Outline
• Overview of the PASA Pipeline
• Alignment Assembly Algorithm
• Annotation comparison
FL-cDNAs and ESTs
“Gold standard” for gene structure resolution
• Introns and exons via spliced alignment
Direct evidence for:
• Alternative splicing
• Untranslated regions (UTRs)
• Polyadenylation sites
The PASA Pipeline
Automates the incorporation of transcript alignments into gene structure annotations.
It was originally developed to refine gene structures in Arabidopsis as part of our Arabidopsis re-annotation effort.
Since that time, we’ve expanded the pipeline and applied it to a range of other organisms at TIGR, now with a special focus on Rice.
Influxes of mRNA Sequences After Initial Genome Releases
[Figure: number of mRNA sequences (log scale, 1,000 to 1,000,000) per year, 1999 to 2006, for human, mouse, Drosophila, and Arabidopsis, with genome release dates marked (Mar. 2000, Dec. 2000, Feb. 2001, Dec. 2002)]
Additionally Found Uses of PASA
• Automated generation of training sets for gene finders (Aedes, Aspergillus, Tetrahymena)
• Evaluation of EST libraries (Tetrahymena)
  - examine redundancy within an EST library
  - selection of clones for full-length sequencing
• Transitive gene structure annotation for closely related species (Aspergillus sp.)
• Comparing different annotation methods on the same contigs (Plasmodium vivax)
• Cataloging polyA sites for more detailed studies (Arabidopsis, Rice)
The PASA Pipeline [at a glance]
Align transcripts to genome
Assemble the alignments (PASA: Program to Assemble Spliced Alignments)
Compare alignment assemblies to existing annotations, suggest updates
PASA Pipeline
Seqclean
Align to Genome
Cluster overlapping alignments
PASA alignment assembly
subCluster PASA assemblies
Compare to annotation
Update annotation
Transcript Sequences
Seqclean (TIGR Gene Indices)
• vector removal
• poly-A identification, stripping
• trash low quality seqs
BLAT and sim4 spliced alignments
Valid alignment criteria:
• min 95% identity
• min 90% transcript length aligned (both configurable parameters)
• consensus splice sites
  - (GT, GC) donors
  - AG acceptor
Assign transcribed orientations using:
• Splice sites
• Polyadenylation sites
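The validity criteria above can be sketched as a small filter. This is only an illustration, not PASA's own code: the `SplicedAlignment` record, its field names, and the function name are hypothetical, and the thresholds are simply the values quoted on the slide (95% identity, 90% of the transcript length aligned). Splice-site dinucleotides are assumed to be given in the transcribed orientation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Consensus splice-site dinucleotides listed on the slide.
DONORS = {"GT", "GC"}
ACCEPTORS = {"AG"}


@dataclass
class SplicedAlignment:
    """Hypothetical record for one transcript-to-genome alignment."""
    percent_identity: float          # identity over the aligned region
    aligned_length: int              # transcript bases that aligned
    transcript_length: int           # full transcript length
    introns: List[Tuple[str, str]]   # (donor, acceptor) dinucleotides per intron


def is_valid_alignment(aln, min_identity=95.0, min_aligned=90.0):
    """Apply the identity, coverage, and splice-site checks in turn."""
    if aln.percent_identity < min_identity:
        return False
    percent_aligned = 100.0 * aln.aligned_length / aln.transcript_length
    if percent_aligned < min_aligned:
        return False
    # Every intron must have a consensus donor and acceptor.
    return all(donor.upper() in DONORS and acceptor.upper() in ACCEPTORS
               for donor, acceptor in aln.introns)
```

An unspliced alignment has an empty intron list and is judged on identity and coverage alone.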
Cluster overlapping alignments
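As a rough illustration of this step, overlapping alignment spans can be grouped with a single sorted sweep. The sketch below is an assumption about what "cluster overlapping alignments" means in the simplest case: it works on inclusive (start, end) genome coordinates from one contig and ignores strand; the function name is hypothetical.

```python
def cluster_overlapping(spans):
    """Group (start, end) genome spans into clusters of overlapping spans."""
    clusters = []
    cluster_end = None  # right-most coordinate of the current cluster
    for start, end in sorted(spans):
        if cluster_end is not None and start <= cluster_end:
            # Span touches the current cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            # Gap before this span: start a new cluster.
            clusters.append([(start, end)])
            cluster_end = end
    return clusters
```

Each resulting cluster would then be handed to the assembly step on its own.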
Annotation Comparison
FL-cDNAs and ESTs treated separately with different rules for incorporation
Annotation Updates
- exon modifications
- alt splice isoform additions
- gene merges
- gene splits
- new genes
Alignment Assembly
Maximize evidence supporting gene structures.
(Maximum evidence) ~ (Maximum # alignments)
Goal: find maximal assembly of compatible alignments.
Alignment Assembly using PASA: Program to Assemble Spliced Alignments
Maximally Assemble Compatible Alignments
[Figure: spliced alignments and the resulting assemblies, drawn 5’ to 3’]
PASA Algorithm
Containments preclude the simple chaining of compatible alignments (B is contained within A).
[Figure: alignments A, B, C]
~ : compatible
!~ : not compatible
A ~ B
B ~ C
A !~ C
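The slide does not define compatibility, so the sketch below assumes a common reading: two alignments are compatible when they overlap and have identical exon structure within the overlapping region (transcribed orientation is ignored here for brevity). The coordinates chosen for A, B, and C are made up to reproduce the relations shown; the helper names are hypothetical.

```python
def clip(exons, lo, hi):
    """Restrict a sorted list of (start, end) exons to the window [lo, hi]."""
    return [(max(s, lo), min(e, hi)) for s, e in exons if s <= hi and e >= lo]


def compatible(x, y):
    """True if x and y overlap and have identical structure where they overlap."""
    lo = max(x[0][0], y[0][0])
    hi = min(x[-1][1], y[-1][1])
    if lo > hi:
        return False  # no overlap, so there is nothing to compare
    return clip(x, lo, hi) == clip(y, lo, hi)


# Invented coordinates: B lies inside the second exon of A,
# while C reads through the region that A treats as an intron.
A = [(100, 200), (300, 400)]
B = [(320, 380)]
C = [(250, 400), (500, 600)]
```

With these coordinates, B agrees with both A and C, yet A and C disagree, which is why B cannot simply be chained between them.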
PASA Algorithm: Finding the Single Maximal Assembly
• Determine pairwise compatibilities
• Determine pairwise containments
  - Ca = # alignments contained in a, including a
• Sort list of alignments by left-most coordinate
• Chain compatible alignments, summing unique containments {create Left Path Graph, chain compatible alignments from left to right}
• Solve by dynamic programming
  - La = maximal chain of alignments originating from the left of alignment a and ending at a
• Find the maximal assembly as the chain with the maximal # alignments
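The left-to-right chaining can be sketched for the simplest case, in which no alignment is contained within another. The code below is a simplification under that assumption: it omits the containment counts (Ca) and the "summing unique containments" bookkeeping that PASA needs in general, and it takes the pairwise compatibilities as given. The function name and input layout are hypothetical.

```python
def maximal_left_chain(spans, compatible_pairs):
    """Largest chain of compatible alignments, assuming no containments.

    spans: list of (left, right) coordinates, one per alignment.
    compatible_pairs: set of frozenset({i, j}) for compatible alignments i, j.
    Returns alignment indices in left-to-right order.
    """
    if not spans:
        return []
    order = sorted(range(len(spans)), key=lambda i: spans[i][0])
    best = {}      # best[a]: size of the largest chain ending at a (the slide's La)
    previous = {}  # previous[a]: alignment chained immediately before a
    for pos, a in enumerate(order):
        best[a], previous[a] = 1, None
        for b in order[:pos]:
            if frozenset((a, b)) in compatible_pairs and best[b] + 1 > best[a]:
                best[a], previous[a] = best[b] + 1, b
    # Trace back from the alignment that ends the largest chain.
    a = max(order, key=lambda i: best[i])
    chain = []
    while a is not None:
        chain.append(a)
        a = previous[a]
    return chain[::-1]
```

For four staggered alignments where 0~1, 1~3, and 0~2 are the only compatible pairs, the chain is 0, 1, 3 and alignment 2 is left out, which is the situation the alternative-isoform step addresses.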
PASA Algorithm: Find Maximal Assemblies for Missing Alignments (Alt Spliced Isoforms)
• Create reciprocal {right path} graph {chain compatible alignments from right to left}
  - Ra = maximal chain of alignments originating from the right of alignment a and ending at a
• For each missing alignment a, find the maximal assembly containing a (restated as sum of left and right paths)
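A minimal sketch of combining the two paths, again restricted to the containment-free case and counting each alignment once. Subtracting one so that alignment a is not counted in both paths is our reading of "sum of left and right paths", not something the slide states; the function names are hypothetical.

```python
def chain_sizes(order, compatible_pairs):
    """Size of the largest chain ending at each alignment when scanning `order`."""
    best = {}
    for pos, a in enumerate(order):
        best[a] = 1 + max(
            (best[b] for b in order[:pos] if frozenset((a, b)) in compatible_pairs),
            default=0,
        )
    return best


def assembly_size_through(spans, compatible_pairs, a):
    """Size of the largest chain that is forced to include alignment `a`."""
    indices = range(len(spans))
    left_order = sorted(indices, key=lambda i: spans[i][0])
    right_order = sorted(indices, key=lambda i: spans[i][1], reverse=True)
    left = chain_sizes(left_order, compatible_pairs)    # the slide's La
    right = chain_sizes(right_order, compatible_pairs)  # the slide's Ra
    return left[a] + right[a] - 1  # a itself appears in both paths
```

For an alignment missing from the single maximal assembly, this gives the size of the best alternative assembly that contains it.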
Annotation Comparison
The PASA Pipeline [Capabilities]
Then (NAR, 2003):
• Update gene structures:
  - Changes in introns and exons
  - UTR additions
• Model additional gene structures:
  - Alternative splicing isoforms
  - New gene models
Now, PASA-2 (above plus the following enhancements):
• Gene merging
• Gene splitting
• Antisense classification
• Polyadenylation sites
Incorporation of PASA assemblies into the annotation
• FL-assemblies contain at least one FL-cDNA, expected to encode all exons, complete protein, possibly UTRs.
• non-FL-assemblies encode part of a gene:
  - part of one or more exons
  - potentially UTRs.
Full-length cDNAs Provide Complete Gene Structures (hence, full-length Assemblies too!)
[Figure: gapped alignment of a full-length cDNA, ending in a poly-A tail, to genomic DNA, with the poly-A site marked]
• cDNA-genome spliced alignment
• ORF reconstruction based on the joined exons.
• UTRs identified.
• Automated process
FL-assembly-based updates
[Figure: existing model compared with the FL-assembly-based model, which differs in its UTRs and in its introns/exons. Legend: CDS, cDNA]
FL-assembly-based model replaces the existing model
Non-FL-assembly-based updates
[Figure: a non-FL-assembly is *stitched* into the existing model]
Stitched product replaces existing model
Alternative Splicing (incompatible alignment assemblies)
Sets of mutually incompatible alignment assemblies:
• Multiple FL-assemblies
• FL-assembly(s) and non-FL-assembly(s)
• Non-FL-assemblies (*pre-existing gene model required)
Minimize Corruption or Pollution of Existing Annotations
Requirements of a FL-assembly:
• Min ORF size requirement
  - MIN_PERCENT_PROT_CODING (e.g. 40%)
  - MIN_FL_ORF_SIZE (e.g. 100 aa)
• Max # UTR exons (e.g. 2 or 3)
  - MAX_UTR_EXONS
Requirements of an annotation update: compared to the existing model, it must pass validation tests:
• Length test (e.g. must encode a protein at least 70% the length of the current one)
  - *Maybe trust FL assemblies more than ESTs; can set stringencies separately:
  - MIN_PERCENT_LENGTH_FL_COMPARE (involving FL-assemblies)
  - MIN_PERCENT_LENGTH_NONFL_COMPARE (involving non-FL assemblies)
• Homology test [Fasta alignment] (e.g. 70% identity, 70% length)
  - MIN_PERID_PROT_COMPARE (e.g. 70% identity)
  - MIN_PERCENT_ALIGN_LENGTH (e.g. 70% of the shorter protein length)
* all are user-configurable parameters; option names are shown in capitals
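The checks above can be sketched as two small predicates. The parameter names are those on the slide and the values are the slide's examples (with 2 picked from "2 or 3" for MAX_UTR_EXONS); everything else, including the function names, the argument layout, and treating the UTR exon limit as a single count, is an assumption made for illustration and not PASA's implementation.

```python
# Example values from the slide; all are user-configurable in PASA.
PARAMS = {
    "MIN_PERCENT_PROT_CODING": 40,
    "MIN_FL_ORF_SIZE": 100,            # amino acids
    "MAX_UTR_EXONS": 2,                # slide says "2 or 3"
    "MIN_PERCENT_LENGTH_FL_COMPARE": 70,
    "MIN_PERCENT_LENGTH_NONFL_COMPARE": 70,
    "MIN_PERID_PROT_COMPARE": 70,
    "MIN_PERCENT_ALIGN_LENGTH": 70,
}


def passes_fl_requirements(orf_aa, percent_coding, utr_exons, params=PARAMS):
    """FL-assembly requirements: ORF size, coding fraction, UTR exon count."""
    return (orf_aa >= params["MIN_FL_ORF_SIZE"]
            and percent_coding >= params["MIN_PERCENT_PROT_CODING"]
            and utr_exons <= params["MAX_UTR_EXONS"])


def passes_update_validation(new_len, old_len, percent_identity,
                             percent_aligned, is_fl, params=PARAMS):
    """Length test followed by homology test against the existing model.

    percent_aligned is the aligned fraction of the shorter protein.
    """
    key = ("MIN_PERCENT_LENGTH_FL_COMPARE" if is_fl
           else "MIN_PERCENT_LENGTH_NONFL_COMPARE")
    length_ok = 100.0 * new_len / old_len >= params[key]
    homology_ok = (percent_identity >= params["MIN_PERID_PROT_COMPARE"]
                   and percent_aligned >= params["MIN_PERCENT_ALIGN_LENGTH"])
    return length_ok and homology_ok
```

Keeping the FL and non-FL length thresholds as separate keys mirrors the slide's point that FL assemblies may be trusted more than ESTs.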
Enhancements: Gene Merging (FL-cDNA)
If FL-ORF_SPAN overlaps both gene1 and gene2 [, ... geneX] by at least MIN_PERCENT_OVERLAP_GENE_REPLACE, gene1 and gene2 [, ... geneX] are to be merged and replaced by the FL-assembly-based gene.
[Figure: gene1 and gene2, and the FL-assembly-based gene whose FL-ORF_SPAN overlaps both]
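The merge rule can be sketched as follows. This assumes that the overlap percentage is measured relative to each existing gene's length and that coordinates are inclusive; the slide gives no example value for MIN_PERCENT_OVERLAP_GENE_REPLACE, so any threshold used here is hypothetical, as are the function names.

```python
def percent_overlap(gene, span):
    """Percent of `gene` (start, end) covered by `span`; coordinates inclusive."""
    overlap = min(gene[1], span[1]) - max(gene[0], span[0]) + 1
    return 100.0 * max(overlap, 0) / (gene[1] - gene[0] + 1)


def genes_to_merge(orf_span, genes, min_percent_overlap):
    """Names of the genes that the ORF span would merge, or [] if fewer than two.

    genes: dict mapping gene name to (start, end).
    """
    hits = [name for name, coords in genes.items()
            if percent_overlap(coords, orf_span) >= min_percent_overlap]
    return hits if len(hits) >= 2 else []
```

A span that sufficiently overlaps only one gene returns an empty list, since that case is an ordinary update rather than a merge.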
Enhancements: Gene Merging (non-FL)
Same rule as before, using ORF_SPAN and MIN_PERCENT_OVERLAP_GENE_REPLACE
[Figure: a non-FL assembly is stitched into the overlapping models gene1 and gene2; the ORF_SPAN of the reconstructed gene overlaps both]
Enhancements: Gene Splitting
Requires that:
• multiple FL-assemblies from distinct subclusters map to the same gene,
• they have the same transcribed orientation, and
• the min and max of the new ORFs cover at least MIN_PERCENT_OVERLAP_GENE_REPLACE of the gene to be split.
[Figure: FL-assemblies aligned against the existing gene]
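A sketch of the three split conditions, under stated assumptions: "same transcribed orientation" is read as the assemblies agreeing with each other, coverage is measured against the existing gene's length with inclusive coordinates, and the `FLAssembly` record, the function name, and the threshold value in any call are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FLAssembly:
    """Hypothetical record for one FL-assembly mapped to the gene."""
    subcluster: str
    orientation: str            # "+" or "-"
    orf_span: Tuple[int, int]   # (start, end) of the ORF on the genome


def should_split(gene, assemblies, min_percent_overlap):
    """Check the subcluster, orientation, and coverage conditions in turn."""
    if len({a.subcluster for a in assemblies}) < 2:
        return False  # need FL-assemblies from distinct subclusters
    if len({a.orientation for a in assemblies}) != 1:
        return False  # orientations disagree
    lo = min(a.orf_span[0] for a in assemblies)
    hi = max(a.orf_span[1] for a in assemblies)
    covered = min(gene[1], hi) - max(gene[0], lo) + 1
    percent = 100.0 * max(covered, 0) / (gene[1] - gene[0] + 1)
    return percent >= min_percent_overlap
```

Note that the coverage test uses only the outer bounds of the new ORFs, so a gap between the two split products does not count against the split.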
Gene Merging and Gene Splitting
Homology (used loosely) between the existing gene and the replacement is not required.
Only require that the locus of interest continues to be covered by ORFs.
Why?
• Merged and split genes may appear very different from the existing [predicted] gene.
• One of the split products may look quite similar to the preexisting gene, but the other may not.
• Our experience is that the existing methodology of splitting and merging works quite well, and we haven’t needed to explore additional methods.
Want more aggressive updates?
Apart from merges and splits, individual gene updates must pass the homology test. Failures require manual inspection.
But many that fail homology may still provide reasonable, and improved, gene structure updates.
Option (flag): STOMP_HIGH_PERCENTAGE_OVERLAPPING_GENE
• If an update fails the homology test, consider the ORF_SPAN alone.
• If the ORF_SPAN overlap > MIN_PERCENT_OVERLAP_GENE_REPLACE, allow the update to occur.
Called STOMPing
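The decision can be sketched as a three-way outcome. The function name, argument names, and return labels are hypothetical; the overlap is assumed to be the percent of the existing gene covered by the update's ORF_SPAN.

```python
def decide_update(passes_homology, orf_span_percent_overlap,
                  min_percent_overlap, stomp=False):
    """Return the fate of a proposed single-gene update."""
    if passes_homology:
        return "update"
    # STOMP_HIGH_PERCENTAGE_OVERLAPPING_GENE: fall back to the ORF span alone.
    if stomp and orf_span_percent_overlap > min_percent_overlap:
        return "update (STOMP)"
    return "manual inspection"
```

With the flag off, every homology failure goes to manual inspection regardless of how much of the gene the ORF span covers.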
Trusting the FL-Status
Ideally, FL-transcripts are full length!
[Figure: existing gene and FL-assembly]
But, often:
[Figure: existing gene, FL-assembly, and the resulting update]
Solution: If a FL-assembly is compatible with an existing gene annotation, treat it as non-FL.
* option TRUST_FL_STATUS, by default, disabled.
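One way to read this rule is as a reclassification step applied before the update logic. The sketch below assumes that enabling TRUST_FL_STATUS simply switches the downgrade off; the function name and labels are hypothetical.

```python
def effective_assembly_class(is_fl, compatible_with_existing_gene,
                             trust_fl_status=False):
    """Class under which an assembly is processed during annotation updates."""
    if is_fl and compatible_with_existing_gene and not trust_fl_status:
        return "non-FL"  # do not let a possibly truncated FL-cDNA replace the model
    return "FL" if is_fl else "non-FL"
```

Treating a compatible FL-assembly as non-FL means it is stitched into the existing model instead of replacing it outright.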
Example Application of PASA to Rice
Results from Annotation Comparison (Counting PASA assemblies)
cgi-bin/status_report.cgi
Gene Comparison Summary (Counting Genes)
cgi-bin/status_report.cgi
Gene Structure Updates Summary
Examining Updates (clicking any link in the previous report)
Assembly Report Page
Examples of Classified Updates
FL adds/extends UTRs
FL extends protein
FL updates structure (passes homology test)
FL updates structure (fails homology, passes ORF span)
FL merges genes
FL splits gene
FL novel gene
EST extends UTRs
EST extends protein
EST updates structure (passes homology test)
EST updates structure (fails homology test, passes ORF span)
EST merges multiple genes
A tool for Studying Alternative Splicing
Unspliced Introns: 45%
Alt donor/acceptor: 32%
Start/end in intron: 34%
Exon skipping: 8.4%
Alternate exon: 7.4%
* categories overlap due to combinations
The distribution of splicing variations is similar to that described in Arabidopsis.
Evidence for >5000 genes alternatively spliced
PASA Pipeline Application Framework
• UI Tier: Web browser; Shell terminal
• App Tier: CGI scripts, run from Apache; Perl scripts, C++ program
• Data Tier: MySQL database; text files including Fasta-formatted sequence databases and config files
PASA Documentation
http://pasa.sf.net
Obtaining PASA
http://www.tigr.org/software
QUESTIONS?