B.sc biochem i bobi u 4 gene prediction
-
Upload
rai-university -
Category
Education
-
view
64 -
download
0
Transcript of B.sc biochem i bobi u 4 gene prediction
![Page 1: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/1.jpg)
Gene and protein prediction
Course: B.Sc Biochemistry
Subject: Basic of Bioinformatics
Unit: IV
![Page 2: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/2.jpg)
• Gene: A sequence of nucleotides coding
for protein
• Gene Prediction Problem: Determine the
beginning and end positions of genes in a
genome
Gene Prediction: Computational Challenge
![Page 3: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/3.jpg)
Gene Prediction: Computational Challengeaatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg
![Page 4: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/4.jpg)
Gene Prediction: Computational Challengeaatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg
![Page 5: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/5.jpg)
Gene Prediction: Computational Challengeaatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg
Gene!
![Page 6: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/6.jpg)
Protein
RNA
DNA
transcription
translation
CCTGAGCCAACTATTGATGAA
PEPTIDE
CCUGAGCCAACUAUUGAUGAA
Central Dogma: DNA -> RNA -> Protein
1.
![Page 7: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/7.jpg)
• Central Dogma was proposed in 1958 by Francis Crick
• Crick had very little supporting evidence in late 1950s
• Before Crick’s seminal paper
all possible information transfers
were considered viable
• Crick postulated that some
of them are not viable
(missing arrows)
• In 1970 Crick published a paper defending the Central Dogma.
Central Dogma: Doubts
![Page 8: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/8.jpg)
• In 1961 Sydney Brenner and Francis Crick discovered frameshift mutations
• Systematically deleted nucleotides from DNA
– Single and double deletions dramatically altered protein product
– Effects of triple deletions were minor
– Conclusion: every triplet of nucleotides, each codon, codes for exactly one amino acid in a protein
Codons
![Page 9: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/9.jpg)
• Codon: 3 consecutive nucleotides
• 4 3 = 64 possible codons
• Genetic code is degenerative and redundant
– Includes start and stop codons
– An amino acid may be coded by more than
one codon
Translating Nucleotides into Amino Acids
![Page 10: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/10.jpg)
• In 1977, Phillip Sharp and Richard Roberts experimented with mRNA of hexon, a viral protein.
– Map hexon mRNA in viral genome by hybridization to adenovirus DNA and electron microscopy
– mRNA-DNA hybrids formed three curious loop structures instead of contiguous duplex segments
Discovery of Split Genes
![Page 11: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/11.jpg)
Exons and Introns
• In eukaryotes, the gene is a combination of coding segments (exons) that are interrupted by non-coding segments (introns)
• This makes computational gene prediction in eukaryotes even more difficult
• Prokaryotes don’t have introns - Genes in prokaryotes are continuous
![Page 12: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/12.jpg)
Central Dogma and Splicing
exon1 exon2 exon3intron1 intron2
transcription
translation
splicing
exon = codingintron = non-coding
Batzoglou
2.
![Page 13: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/13.jpg)
Gene Structure
3.
![Page 14: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/14.jpg)
Splicing Signals
Exons are interspersed with introns and
typically flanked by GT and AG
4.
![Page 15: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/15.jpg)
Consensus splice sites
Donor: 7.9 bits
Acceptor: 9.4 bits
5.
![Page 16: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/16.jpg)
Promoters
• Promoters are DNA segments upstream
of transcripts that initiate transcription
• Promoter attracts RNA Polymerase to the
transcription start site
5’Promoter 3’
6.
![Page 17: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/17.jpg)
Splicing mechanism
(http://genes.mit.edu/chris/)
7.
![Page 18: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/18.jpg)
Splicing mechanism
• Adenine recognition site marks intron
• snRNPs bind around adenine recognition
site
• The spliceosome thus forms
• Spliceosome excises introns in the mRNA
![Page 19: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/19.jpg)
• Statistical: coding segments (exons) have typical
sequences on either end and use different
subwords than non-coding segments (introns).
• Similarity-based: many human genes are similar
to genes in mice, chicken, or even bacteria.
Therefore, already known mouse, chicken, and
bacterial genes may help to find human genes.
Two Approaches to Gene Prediction
![Page 20: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/20.jpg)
UAA, UAG and UGA correspond to 3 Stop codons that (together with Start codon ATG) delineate Open Reading Frames
Genetic Code and Stop Codons
8.
![Page 21: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/21.jpg)
Six Frames in a DNA Sequence
• stop codons – TAA, TAG, TGA
• start codons - ATG
GACGTCTGCTTTGGAGAACTACATCAACCGGACTGTGGCTGTTATTACTTCTGATGGCAGAATGATTGTG
CTGCAGACGAAACCTCTTGATGTAGTTGGCCTGACACCGACAATAATGAAGACTACCGTCTTACTAACAC
GACGTCTGCTTTGGAGAACTACATCAACCGGACTGTGGCTGTTATTACTTCTGATGGCAGAATGATTGTG
GACGTCTGCTTTGGAGAACTACATCAACCGGACTGTGGCTGTTATTACTTCTGATGGCAGAATGATTGTG
GACGTCTGCTTTGGAGAACTACATCAACCGGACTGTGGCTGTTATTACTTCTGATGGCAGAATGATTGTG
CTGCAGACGAAACCTCTTGATGTAGTTGGCCTGACACCGACAATAATGAAGACTACCGTCTTACTAACAC
CTGCAGACGAAACCTCTTGATGTAGTTGGCCTGACACCGACAATAATGAAGACTACCGTCTTACTAACAC
CTGCAGACGAAACCTCTTGATGTAGTTGGCCTGACACCGACAATAATGAAGACTACCGTCTTACTAACAC
9.
![Page 22: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/22.jpg)
• Detect potential coding regions by looking at ORFs
– A genome of length n is comprised of (n/3) codons
– Stop codons break genome into segments between consecutive Stop codons
– The subsegments of these that start from the Start codon (ATG) are ORFs
• ORFs in different frames may overlap
3
n
3
n
3
n
Genomic Sequence
Open reading frame
ATG TGA
Open Reading Frames (ORFs)
![Page 23: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/23.jpg)
• Long open reading frames may be a gene
– At random, we should expect one stop codon every (64/3) ~= 21 codons
– However, genes are usually much longer than this
• A basic approach is to scan for ORFs whose length exceeds certain threshold
– This is naïve because some genes (e.g. some neural and immune system genes) are relatively short
Long vs.Short ORFs
![Page 24: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/24.jpg)
Testing ORFs: Codon Usage
• Create a 64-element hash table and count the frequencies of codons in an ORF
• Amino acids typically have more than one codon, but in nature certain codons are more in use
• Uneven use of the codons may characterize a real gene
![Page 25: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/25.jpg)
Codon Usage in Human Genome
10.
![Page 26: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/26.jpg)
Gene Prediction and Motifs
• Upstream regions of genes often contain motifs that can be used for gene prediction
-10
STOP
0 10-35
ATG
TATACTPribnow Box
TTCCAA GGAGGRibosomal binding site
Transcription start site
![Page 27: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/27.jpg)
Promoter Structure in Prokaryotes (E.Coli)
Transcription starts
at offset 0.
• Pribnow Box (-10)
• Gilbert Box (-30)
• Ribosomal
Binding Site (+10)
11.
![Page 28: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/28.jpg)
Ribosomal Binding Site
12.
![Page 29: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/29.jpg)
Splicing Signals
• Try to recognize location of splicing signals at
exon-intron junctions
– This has yielded a weakly conserved donor
splice site and acceptor splice site
• Profiles for sites are still weak, and lends the
problem to the Hidden Markov Model (HMM)
approaches, which capture the statistical
dependencies between sites
![Page 30: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/30.jpg)
Donor and Acceptor Sites:
GT and AG dinucleotides• The beginning and end of exons are signaled by donor
and acceptor sites that usually have GT and AC
dinucleotides
• Detecting these sites is difficult, because GT and AC
appear very often
exon 1 exon 2
GT AC
AcceptorSite
DonorSite
![Page 31: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/31.jpg)
Popular Gene Prediction Algorithms
• GENSCAN: uses Hidden Markov Models
(HMMs)
• TWINSCAN
– Uses both HMM and similarity (e.g.,
between human and mouse genomes)
![Page 32: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/32.jpg)
Books and Web References
• Books Name :
1. Introduction To Bioinformatics by T. K. Attwood
2. BioInformatics by Sangita
3. Basic Bioinformatics by S.Ignacimuthu, s.j.
• http://en.wikipedia.org/wiki/Gene_prediction
• http://www.humgen.nl/programs.html
• http://rulai.cshl.edu/reprints/nrg890_fs.pdf
Management 8/e - Chapter 8 32
![Page 33: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/33.jpg)
Image References
• 1. https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcQNwz43SompELD7S4Gadm73w8jNgs44zWxSX2wytTzJP0W6AuWBQw
• 2. http://svmcompbio.tuebingen.mpg.de/img/splice.png
• 3.http://bja.oxfordjournals.org/content/early/2009/05/29/bja.aep130/F2.large.jpg
• 4. http://www.web-books.com/MoBio/Free/images/Ch7E3.gif
• 5. https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQSRar31enTpHlRv9-lff0C3cFOr8lB73z5h1aF4TnI3EP2Ye96HQ
• 6. http://edoc.hu-berlin.de/dissertationen/kuehn-kristina-2006-02-23/HTML/image005.jpg
• 7. http://www.nature.com/nrn/journal/v4/n10/images/nrn1234-i1.jpg
![Page 34: B.sc biochem i bobi u 4 gene prediction](https://reader030.fdocuments.in/reader030/viewer/2022032421/55a735f31a28abd9528b45d6/html5/thumbnails/34.jpg)
• 8. https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTXSXyL-0Nz7wVnCEEYXi5hiriOOB63rmRr9-XjHVt_2RRg_P3C
• 9. http://cloud.gmod.org/gbrowse2/tutorial/figures/dna2.png• 10.https://encrypted-
tbn0.gstatic.com/images?q=tbn:ANd9GcTXSXyL-0Nz7wVnCEEYXi5hiriOOB63rmRr9-XjHVt_2RRg_P3C
• 11. https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQeaTfC5lrWR-otzQsi7ONsKcYgjzoea--vsN1vMxFwxE3by4qElw
• 12.https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSCziU2aI59p-Svlmwo0WbWzwnBff-6jrXshMhXemdMvYqRiK1G