Post on 14-Jan-2016
Eukaryotic Gene Prediction
Rui Alves
How are eukaryotic genes different?
[Diagram: DNA → RNA Pol → mRNA → Ribosome → Protein]
[Diagram, with splicing: DNA → RNA Pol → mRNA → Spliceosome → mRNA → Ribosome → Protein]
Correctly Identifying Splicing sites is not a trivial task
How do we predict splicing sites?
• By Homology
• Ab initio
  – SS motifs
  – Codon usage
  – Exonic Splicing Enhancers
  – Intronic Splicing Enhancers
  – Exonic Splicing Silencers
  – Intronic Splicing Silencers
Homology Splice Site Prediction
Known spliced gene
Predicted spliced gene
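A minimal sketch of the homology idea, assuming the known spliced gene is available as an mRNA/cDNA string and that exact string matching is good enough for illustration. Real tools use spliced alignment that tolerates mismatches; the function name `map_exons` and the toy sequences are hypothetical.

```python
def map_exons(genomic, mrna, min_exon=3):
    """Greedily place a known spliced sequence on genomic DNA.

    Returns a list of (start, end) genomic coordinates, one per exon.
    Exact matching only: this is a simplification, not a real aligner.
    """
    exons = []
    g_pos = 0  # where to resume searching in the genomic sequence
    m_pos = 0  # how much of the mRNA has been placed so far
    while m_pos < len(mrna):
        best = None
        # Try the longest remaining mRNA prefix first, then shorter ones.
        for length in range(len(mrna) - m_pos, min_exon - 1, -1):
            hit = genomic.find(mrna[m_pos:m_pos + length], g_pos)
            if hit != -1:
                best = (hit, hit + length)
                break
        if best is None:
            raise ValueError("mRNA cannot be placed on the genomic sequence")
        exons.append(best)
        g_pos = best[1]
        m_pos += best[1] - best[0]
    return exons


# Invented example: two exons separated by one intron.
genomic = "ATGCCC" + "GTAAATTTAG" + "CGATCCTAA"
known_mrna = "ATGCCC" + "CGATCCTAA"
predicted_exons = map_exons(genomic, known_mrna)
```

The gaps between consecutive exons are the predicted introns. When an exon end and the following intron start share bases, greedy matching can shift a boundary by a few positions, which is one reason splice site motifs are used to refine the result.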
Splice Site Motifs
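A minimal sketch of motif scanning, assuming only the canonical GT (donor) and AG (acceptor) dinucleotides found at the ends of most introns. Real splice site motifs are longer and are usually scored with position weight matrices; the function names and the `min_len` parameter are hypothetical.

```python
def find_candidate_splice_sites(seq):
    """Return start positions of GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def candidate_introns(seq, min_len=4):
    """Pair every donor with every downstream acceptor.

    Each candidate is (start, end) such that seq[start:end] begins with GT
    and ends with AG. min_len is an illustrative cutoff, not a biological one.
    """
    donors, acceptors = find_candidate_splice_sites(seq)
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


introns = candidate_introns("CCGTAAAAAGCC")
```

Almost every sequence contains many GT and AG pairs, so this step alone over-predicts heavily; the remaining ab initio signals are what narrow the candidates down.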
Exonic Splicing Enhancers
Exonic Splicing Silencers
Genes & Development 18:1241-1250
Interaction between SE and SI
Rules for Splicing
• 3’ end likely target for repression
• Distance between SE and 3’ end < 100 bp
• Splicing efficiency depends on p(interaction SEC-3’ end)
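The distance rule can be sketched as a simple filter. This assumes that enhancer positions and the 3' end are given as coordinates on the same sequence; the function name and the example coordinates are hypothetical, and the probability of interaction is not modelled here.

```python
MAX_DISTANCE_BP = 100  # distance rule from the slide: SE to 3' end < 100 bp


def enhancers_in_range(enhancer_positions, three_prime_end):
    """Keep the splicing enhancers that lie within 100 bp of the 3' end."""
    return [
        pos for pos in enhancer_positions
        if abs(pos - three_prime_end) < MAX_DISTANCE_BP
    ]


# Invented coordinates: only the enhancers at 150 and 260 pass the rule.
nearby = enhancers_in_range([10, 150, 260], three_prime_end=200)
```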
Methods for splicing detection
[Diagram: a set of known spliced genes is divided into a training set and a test set. The training set is used to fit an algorithm (GA, NN, HMM, Bayesian, ME); the fitted algorithm is then applied to the test set to produce predictions.]
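A minimal sketch of the training set / test set workflow, assuming labelled sequence windows as input. A simple per-position consensus stands in for the GA, NN, HMM or Bayesian methods; all names and the example data are hypothetical.

```python
import random
from collections import Counter


def split_known_sites(examples, test_fraction=0.3, seed=0):
    """Shuffle labelled examples and split them into (training, test)."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_consensus(training):
    """'Train' by taking the most common base at each position of true sites."""
    positives = [seq for seq, is_site in training if is_site]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*positives))


def make_predictor(consensus, min_matches):
    """Call a window a splice site if enough positions match the consensus."""
    def predict(seq):
        return sum(a == b for a, b in zip(seq, consensus)) >= min_matches
    return predict


def accuracy(predict, examples):
    """Fraction of labelled examples that the predictor gets right."""
    hits = sum(1 for seq, is_site in examples if predict(seq) == is_site)
    return hits / len(examples)


# Invented data: 5 true donor-like windows and 5 background windows.
examples = [("GTAAGT", True)] * 5 + [("CCCCCC", False)] * 5
training, test = split_known_sites(examples)
predictor = make_predictor(train_consensus(training), min_matches=5)
test_accuracy = accuracy(predictor, test)
```

Accuracy is measured only on the test set, which the algorithm never saw during training; this is what makes the number a fair estimate of how well the predictions generalise.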
A Genetic Algorithm Method
[Table: columns are motifs DM1 … AMi … EM; rows are DM1, AM, EM, IM; entries are probabilities p(i)]
Shuffle rows and columns k times, and each time calculate the probability that a given combination of motifs gets spliced.
Select the m best combinations and continue to evolve the algorithm until it correctly predicts the training set.
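A minimal sketch of a genetic algorithm in this spirit. It is not the method from the slides: random mutation of a weight vector replaces the shuffling of the motif table, and the training data, motif order (DM, AM, EM, IM) and all names are invented for illustration.

```python
import random

# Hypothetical training set: (presence of DM, AM, EM, IM), spliced or not.
TRAINING = [
    ((1, 1, 1, 0), True),
    ((1, 1, 0, 0), True),
    ((1, 1, 0, 1), False),
    ((1, 0, 1, 0), False),
    ((0, 1, 1, 0), False),
    ((1, 1, 1, 1), False),
]


def fitness(weights):
    """Count training examples whose splicing outcome is predicted correctly."""
    *motif_weights, threshold = weights
    correct = 0
    for motifs, spliced in TRAINING:
        score = sum(w * m for w, m in zip(motif_weights, motifs))
        if (score > threshold) == spliced:
            correct += 1
    return correct


def mutate(weights, rng, scale=0.5):
    """Return a copy of the weights with Gaussian noise added to each entry."""
    return [w + rng.gauss(0.0, scale) for w in weights]


def evolve(k=20, m=5, max_generations=200, seed=0):
    """Create k variants per generation, keep the m best, stop when all fit."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(m)]
    history = []
    for _ in range(max_generations):
        children = [mutate(rng.choice(population), rng) for _ in range(k)]
        # Parents compete with children, so the best score never decreases.
        population = sorted(population + children, key=fitness, reverse=True)[:m]
        history.append(fitness(population[0]))
        if history[-1] == len(TRAINING):
            break  # the training set is predicted completely
    return population[0], history


best_weights, history = evolve()
```

Here k is the number of variants tried per generation and m the number of survivors, mirroring the k and m on the slide.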
A Neural Net Method
[Diagram: sequences and a weight table for splice elements feed the hidden nodes, which output the predicted splicing; the prediction error is used to produce a corrected weight table for splice elements.]
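A minimal sketch of the weight-correction loop, assuming one hidden layer, sigmoid units and plain backpropagation on a squared error. The toy data (presence of four splice elements per sequence) and all names are hypothetical; a real network would take encoded sequences as input.

```python
import math
import random

# Hypothetical data: presence of four splice elements -> spliced (1) or not (0).
DATA = [
    ((1, 1, 1, 0), 1),
    ((1, 1, 0, 0), 1),
    ((1, 1, 0, 1), 0),
    ((1, 0, 1, 0), 0),
    ((0, 1, 1, 0), 0),
    ((1, 1, 1, 1), 0),
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the predicted splicing probability."""
    inputs = list(x) + [1.0]  # constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, out


def mean_squared_error(w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def train(n_hidden=3, epochs=2000, rate=0.5, seed=0):
    """Start from a random weight table and correct it after every example."""
    rng = random.Random(seed)
    n_in = len(DATA[0][0]) + 1
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in DATA:
            inputs = list(x) + [1.0]
            hidden, out = forward(x, w_hidden, w_out)
            delta_out = (out - y) * out * (1.0 - out)
            # Hidden deltas use the output weights before they are corrected.
            delta_hidden = [
                delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for j in range(n_hidden)
            ]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= rate * delta_out * h
            for j in range(n_hidden):
                for i, v in enumerate(inputs):
                    w_hidden[j][i] -= rate * delta_hidden[j] * v
    return w_hidden, w_out


untrained = train(epochs=0)
trained = train(epochs=2000)
```

Comparing `mean_squared_error(*untrained)` with `mean_squared_error(*trained)` shows how much the corrected weight table improves on the initial one.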
Summary
• Eukaryotic genes are split into exons separated by introns
• Biological rules combined with mathematical and statistical approaches can be used to predict the boundaries of the exons and to predict the splice variants
How to find what genes a string of DNA contains
Rui Alves
Simple steps
• Go to a known gene prediction server (or google for one)
• Input sequence and wait for prediction
• Get prediction(s), either as cDNA or as a translated protein sequence, and do homology searches to identify them in a known database (e.g. NCBI or SWISSPROT)
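If a server returns the prediction as cDNA, it can be translated before the homology search. A minimal sketch, assuming the standard genetic code and a coding sequence that starts in frame; the function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a cDNA from its first base until the first stop codon."""
    cdna = cdna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


protein = translate("ATGGCCTAA")
```

The resulting protein string is what would be pasted into a protein homology search against a database such as SWISSPROT.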
Paper Presentation
The human genome (Science) vs. The human genome (Nature)
Nature: Pages 875-901
Science: Pages 1317-1337
Compare the differences in methods and results for the annotation
DO NOT SPEND TIME TALKING ABOUT THE SEQUENCING OR ASSEMBLY ITSELF
Do not go into the comparative genome analysis