Gene Prediction
Chengwei Luo, Amanda McCook, Nadeem Bulsara,
Phillip Lee, Neha Gupta, and Divya Anjan Kumar
Gene Prediction
• Introduction
• Protein-coding gene prediction
• RNA gene prediction
• Modification and finishing
• Project schema
Why gene prediction?
Exponential growth of sequences
Metagenomics: only ~1% of microbes grow in the lab
New sequencing technology
How to do it?
Protein-coding gene prediction
Phillip Lee & Divya Anjan Kumar
Homology Search
ab initio approach
Nadeem Bulsara & Neha Gupta
Homology Search is not Enough!
Biased and incomplete databases
Sequenced genomes are not evenly distributed across the tree of life, so they do not reflect its diversity.
PRODIGAL: Prokaryotic Dynamic Programming Gene Finding Algorithm
Developed at Oak Ridge National Laboratory and the University of Tennessee
EasyGene
Developed at University of Copenhagen
Statistical significance is the measure for gene prediction.
• A high-quality data set is extracted from the genome based on similarity to Swiss-Prot.
• The data set is used to estimate the HMM; statistical significance is then calculated from ORF score and length.
Problem:
• No standalone version available
Noncomparative Prediction
Fig: James A. Goodrich & Jennifer F. Kugel, Nature Rev. Mol. Cell Biol. (2006) 7:612
Comparative+Noncomparative
Effective sRNA prediction in V. cholerae
• Non-enterobacteria
• sRNAPredict2
• 32 novel sRNAs predicted
• 9 tested
• 6 confirmed
Jonathan Livny et al. Nucleic Acids Res. (2005) 33:4096
Software
*Rolf Backofen & Wolfgang R. Hess, RNA Biol. (2010) 7:1
Eva K. Freyhult et al. Genome Res. (2007) 17:117
Modification & finishing
• Consensus strategy to integrate ab initio results
• Broken gene recruiting
• TIS correcting
• IS calling
• Operon annotating
• Gene presence/absence analysis
Modification & finishing: Consensus strategy
[Diagram labels: pass, pass, fail]
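The slide gives only the pass/fail outcomes, not the voting rule. The sketch below is one possible reading: a gene call passes when at least `min_votes` ab initio predictors report it. The tool names, the `min_votes` threshold, and the choice to match calls by stop position and strand (so that calls differing only in their start still count as agreeing) are all assumptions made for illustration, not the project's actual rule.

```python
from collections import defaultdict

def consensus_calls(predictions, min_votes=2):
    """Split gene calls into pass/fail by how many predictors agree.

    predictions: dict of tool name -> list of (start, stop, strand) tuples.
    Calls are grouped by (stop, strand); this matching rule is an assumption.
    """
    votes = defaultdict(set)
    for tool, genes in predictions.items():
        for _start, stop, strand in genes:
            votes[(stop, strand)].add(tool)

    passed, failed = [], []
    for key, tools in sorted(votes.items()):
        (passed if len(tools) >= min_votes else failed).append(key)
    return passed, failed

# Hypothetical output from three predictors on the same contig.
example = {
    "tool_a": [(100, 400, "+"), (900, 1500, "-")],
    "tool_b": [(130, 400, "+"), (2000, 2600, "+")],
    "tool_c": [(100, 400, "+"), (900, 1500, "-")],
}
passed, failed = consensus_calls(example)
print("pass:", passed)
print("fail:", failed)
```

Here the call ending at 400 passes even though the tools disagree on its start; resolving which start to keep is left to the TIS correcting step.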
Broken gene recruiting
ab initio results
homology search
candidate fragments
Modification & finishing: TIS correcting
Start codon redundancy: ATG, GTG, TTG, CTG
Markov iteration, experimentally verified data
Leaderless genes
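Start codon redundancy means a single stop codon can have several plausible translation initiation sites upstream of it. The sketch below only enumerates those candidates; it does not score them (the Markov iteration mentioned above is not implemented). It assumes a forward-strand sequence, 0-based coordinates, and the stop codons TAA, TAG and TGA; the function name `candidate_starts` is hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG", "CTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed stop codon set

def candidate_starts(seq, stop_index):
    """List in-frame start codon positions upstream of a stop codon.

    seq: forward-strand DNA string.
    stop_index: 0-based position of the first base of the gene's stop codon.
    The walk goes upstream one codon at a time and ends at the previous
    in-frame stop codon or at the beginning of the sequence.
    """
    seq = seq.upper()
    starts = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return sorted(starts)

# Toy sequence: TAA GTG AAA ATG CCC TTG GGG TAG
toy = "TAAGTGAAAATGCCCTTGGGGTAG"
print(candidate_starts(toy, stop_index=21))  # -> [3, 9, 15]
```

Each returned position is a possible TIS for the same open reading frame; a corrector would then pick one of them using additional evidence.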