Post on 14-Jan-2016

Eukaryotic Gene Prediction

Rui Alves

How are eukaryotic genes different?

[Diagram: DNA → RNA Pol → mRNA → Ryb → Protein]

How are eukaryotic genes different?

[Diagram: DNA → RNA Pol → mRNA → Spliceosome → mRNA → Ryb → Protein]

Correctly Identifying Splicing sites is not a trivial task

How do we predict splicing sites?

• By Homology

• Ab initio
  – SS motifs
  – Codon usage
  – Exonic Splicing Enhancers
  – Intronic Splicing Enhancers
  – Exonic Splicing Silencers
  – Intronic Splicing Silencers

Homology Splice Site Prediction

Known spliced gene

Predicted spliced gene
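As a rough illustration of the homology idea, the sketch below maps a known spliced transcript (cDNA) back onto a genomic sequence and reads the exon boundaries off the matches. The sequences, the function name `map_exons` and the `min_exon` parameter are invented for this example. The greedy exact-match strategy is an assumption made for brevity: real spliced-alignment programs tolerate mismatches and resolve ambiguous boundaries, which this sketch does not.

```python
def map_exons(genomic, cdna, min_exon=4):
    """Greedily map a spliced cDNA onto genomic DNA with exact matches.

    Returns a list of (start, end) genomic coordinates (0-based, end exclusive),
    one pair per exon. Hypothetical helper, not a published algorithm.
    """
    exons = []
    g_pos = 0  # search the genome only downstream of the previous exon
    c_pos = 0  # how much of the cDNA has been placed so far
    while c_pos < len(cdna):
        length = len(cdna) - c_pos
        hit = -1
        # Try the longest remaining cDNA prefix first, then shorten it.
        while length >= min_exon:
            hit = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if hit != -1:
                break
            length -= 1
        if hit == -1:
            raise ValueError("cDNA position %d cannot be placed on the genome" % c_pos)
        exons.append((hit, hit + length))
        g_pos = hit + length
        c_pos += length
    return exons


# Toy data: three exons separated by two GT...AG introns.
GENOMIC = "ATGGCC" "GTAAGTTTTCAG" "CATTACA" "GTGAGTCCCTAG" "TTTGGGTAA"
CDNA = "ATGGCC" "CATTACA" "TTTGGGTAA"

exons = map_exons(GENOMIC, CDNA)
# Introns are the gaps between consecutive exons.
introns = [(a[1], b[0]) for a, b in zip(exons, exons[1:])]
print(exons, introns)
```

The greedy step can overshoot a boundary by a base or two when the first base of the next exon happens to equal the first base of the intron, which is one reason real tools also check the splice site motifs.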

Splice Site Motifs
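A minimal sketch of motif-based scanning, assuming only the canonical rule that most introns begin with GT and end with AG. The sequence, the function name `candidate_introns` and the `min_len` cutoff are hypothetical. Real predictors score longer position-specific motifs around each site rather than bare dinucleotides.

```python
def candidate_introns(seq, min_len=10):
    """List every (start, end) span that begins with GT and ends with AG.

    Coordinates are 0-based with an exclusive end. min_len is an arbitrary
    illustrative cutoff, not a biological constant.
    """
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


# Toy sequence whose true introns are at (6, 18) and (25, 37).
SEQ = "ATGGCCGTAAGTTTTCAGCATTACAGTGAGTCCCTAGTTTGGGTAA"
spans = candidate_introns(SEQ)
print(len(spans), "candidate introns for 2 real ones")
```

Even on this short sequence the scan returns more candidates than real introns, which is why the dinucleotide rule has to be combined with the other signals listed earlier.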

Exonic Splicing Enhancers

Exonic Splicing Silencers

Genes & Development 18:1241-1250

Interaction between SE and SI

Rules for Splicing

• 3’ end likely target for repression

• Distance between SE and 3’ end < 100 bp

• Splicing efficiency p(interaction SEC-3’ end)

Methods for splicing detection

[Diagram: Training set of known spliced genes → Algorithm (GA, NN, HMM, Bayesian, ME) → Test set of known spliced genes → Predictions]
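The training/test workflow can be sketched as follows. This is only an illustration: the function names `split_genes` and `score_sites`, the 25% test fraction and the two metrics are assumptions, not something the slides specify.

```python
import random


def split_genes(genes, test_fraction=0.25, seed=0):
    """Shuffle known spliced genes and split them into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(genes)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def score_sites(predicted, known):
    """Compare predicted splice-site positions with annotated ones.

    Returns (sensitivity, precision): the fraction of known sites that were
    found, and the fraction of predictions that are correct.
    """
    predicted, known = set(predicted), set(known)
    true_pos = len(predicted & known)
    sensitivity = true_pos / len(known) if known else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, precision


train, test = split_genes(range(8))
print(len(train), "training genes,", len(test), "test genes")
print(score_sites(predicted=[10, 50, 90], known=[10, 50, 70, 120]))
```

Keeping the test genes out of training is what makes the reported scores a fair estimate of how the method behaves on unannotated sequence.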

A Genetic Algorithm Method

[Table: column header "Motif DM1 … AMi … EM"; row labels DM1, AM, EM, IM; cell entries p(i)]

Shuffle lines and columns k times and each time calculate the probability of a given combination of motifs getting spliced

Select the m best combinations and continue to evolve the algorithm until it predicts the training set
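The select-and-evolve loop can be illustrated with a toy genetic algorithm. This is a sketch under strong assumptions and not the method on the slide: each candidate is a list of per-motif weights, variation comes from random mutation rather than shuffling rows and columns of the motif table, and the training data are invented. The motif labels are borrowed from the table only as names.

```python
import random

MOTIFS = ["DM1", "AM", "EM", "IM"]  # labels reused from the slide; meaning assumed

# Invented training set: (motifs present at a candidate site, was it spliced?)
TRAIN = [
    ({"DM1", "AM", "EM"}, True),
    ({"DM1", "AM"}, True),
    ({"DM1", "IM"}, False),
    ({"AM", "IM"}, False),
    ({"EM"}, False),
    ({"DM1", "AM", "IM"}, False),
]


def predict(weights, present):
    """Call a site spliced when the summed weights of its motifs are positive."""
    return sum(w for m, w in zip(MOTIFS, weights) if m in present) > 0


def fitness(weights, data):
    """Fraction of training examples that the weights classify correctly."""
    return sum(predict(weights, p) == label for p, label in data) / len(data)


def evolve(data, pop_size=20, m_best=5, max_gen=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in MOTIFS] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(max_gen):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(pop[0], data))
        if history[-1] == 1.0:  # training set fully predicted
            break
        parents = pop[:m_best]  # keep the m best unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            child = list(rng.choice(parents))
            child[rng.randrange(len(child))] += rng.gauss(0, 0.5)  # mutate one weight
            children.append(child)
        pop = parents + children
    return pop[0], history


best, history = evolve(TRAIN)
print("best training accuracy:", history[-1])
```

Because the m best candidates are carried over unchanged, the best training accuracy can never decrease from one generation to the next.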

A Neural Net Method

[Diagram labels: Sequences; Weight table for splice elements; Hidden nodes; Predicted splicing; Corrected weight table for splice elements]

Summary

• Eukaryotic genes are split into exons separated by introns

• Biological rules combined with mathematical and statistical approaches can be used to predict the boundaries for the exons and to predict the splice variants

How to find what genes a string of DNA contains

Rui Alves

Simple steps

• Go to a known gene prediction server (or google for one)

• Input sequence and wait for prediction

• Get prediction(s), either as cDNA or as a translated protein sequence, and do homology searches to identify them in a known database (e.g. NCBI or Swiss-Prot)
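If the server returns only a cDNA, it can be translated locally before the homology search. The sketch below uses the standard genetic code and assumes the cDNA starts in frame at its first base. The function name `translate` and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna):
    """Translate a cDNA from its first base, stopping at the first stop codon.

    Codons containing anything other than A, C, G or T become 'X'.
    """
    cdna = cdna.upper()
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTGGAAATAA"))
```

The resulting protein string is what would be pasted into a protein homology search against the chosen database.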

Simple steps a)

• Go to a known gene prediction server (or google for one)

• Input sequence and wait for prediction

• Get prediction(s), either as cDNA or as a translated protein sequence and do homology searches to identify them

Paper Presentation

The human genome (Science) vs. The human genome (Nature)

Nature: Pages 875-901

Science: Pages 1317-1337

Compare the differences in methods and results for the annotation

DO NOT SPEND TIME TALKING ABOUT THE SEQUENCING OR ASSEMBLY ITSELF

Do not go into the comparative genome analysis