Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y
Published in: Intl. Journal of Data Mining and Bioinformatics
Vol. 4, No. 4, 2010
Presented by:
Khaled Monsoor
Bioinformatics Masters Program
The University of Memphis
Mail: kmonsoor@memphis.edu
Date: Nov 05, 2010
What ?
Why ?
How ?
Result ?
Conclusion
Synthetic gene design with a large number of hidden stops
Like him …
Sleeping is a waste of precious time
What are the Hidden stops in genes ?
Can we “redesign” genes to include more Hidden stops ?
How can clever computer algorithms help us ?
Why ?
It is now feasible to construct artificial genomes.
Researchers at the J. Craig Venter Institute chemically synthesized the genome of Mycoplasma genitalium (reported in 2008) and, in 2010, created a cell controlled by a synthetic Mycoplasma mycoides genome
… So can we increase the efficiency of protein synthesis in ‘designed’ genes ?
How to increase efficiency …
Hidden stops can protect from frame shifts by terminating the shifted translation early
Without hidden stops, frame shifts can cause very long non-functional proteins
Dictates what a protein is composed of
Has evolved through millions of years
A protein is a sequence of amino acids
Contains 20 (twenty) amino acids
mRNA:
ATGTCCAAACCT
Protein:
M S K P
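A minimal sketch of this codon-by-codon reading, assuming the standard genetic code and the DNA alphabet (T instead of U) used on these slides. The names `CODON_TO_AA` and `translate` are made up for this illustration.

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    seq = seq.replace(".", "").upper()
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

print(translate("ATGTCCAAACCT"))  # MSKP
```

Reading the triplets one by one: ATG is M, TCC is S, AAA is K (Lysine), CCT is P.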
CCT, CCC, CCA, CCG all represent P (Proline)
A mutation in the 3rd position of these codons does not change the amino acid
A deletion creates a frame shift, which changes the entire downstream sequence
RNA: ….. CAT.CAT.CAT.CAT ….
Protein: …HHHH… (chain of Histidine)
Deletion of 3rd character (T): CAC.ATC.ATC.AT
Protein: HII
... Something totally different and bizarre !!!
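The deletion example can be replayed in a few lines. This is only a sketch, assuming the standard genetic code; `translate` and `delete_base` are hypothetical helper names, and the incomplete codon left at the end (AT) is simply ignored.

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def delete_base(seq, index):
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]

original = "CATCATCATCAT"
shifted = delete_base(original, 2)   # drop the 3rd character (T)

print(translate(original))  # HHHH
print(shifted)              # CACATCATCAT
print(translate(shifted))   # HII
```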
:-(
(start) (codon)^k (stop)
Start – ATG
Stop – TAA, TAG, TGA
Codon – any triplet not equal to TAA, TAG, or TGA
Example: ATG.ACC.AAT.CGG.TAA
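The gene structure above can be checked mechanically. A small sketch, assuming k may be zero and that dots are only visual separators; `is_well_formed_gene` is a name invented for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}

def is_well_formed_gene(seq):
    """Check the pattern: ATG, then any number of non-stop codons, then one stop codon."""
    seq = seq.replace(".", "").upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOPS
    # No stop codon may appear in frame before the final one.
    clean_body = not any(c in STOPS for c in codons[1:-1])
    return has_start and has_stop and clean_body

print(is_well_formed_gene("ATG.ACC.AAT.CGG.TAA"))  # True
print(is_well_formed_gene("ATG.TAA.CGG.TAA"))      # False
```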
Stop codon (but hidden)
Hidden stops can protect against frame shifts by terminating the resulting (frame-shifted) translation early
Without hidden stops, frame shifts can cause very long non-functional proteins, resulting NOT ONLY in a waste of time, amino acid resources (money), and ATP (energy), but possibly also in the production of a deadly toxin
Ref: Seligmann and Pollock, DNA and Cell Biology, 2004
How ?
• Design genes with maximum hidden stops
•Constraints:
1. None,
2. by matching GC content, and
3. by matching codon usage
Consider the protein MSDSKED
Both sequences encode this protein:
1. ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA
2. ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA
Sequence (1) is better! It has 4 hidden stops!
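A quick way to check the count: look at every triplet that does not start on a codon boundary. This sketch assumes hidden stops are counted only in the +1 and +2 frames of the same strand; `count_hidden_stops` is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}

def count_hidden_stops(seq):
    """Count stop triplets that start off the codon boundary (frames +1 and +2)."""
    seq = seq.replace(".", "").upper()
    return sum(
        1
        for i in range(len(seq) - 2)
        if i % 3 != 0 and seq[i:i + 3] in STOPS
    )

seq1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"
seq2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"

print(count_hidden_stops(seq1))  # 4
print(count_hidden_stops(seq2))  # 0
```

In sequence (1) the hidden stops are TGA (ATG|AGT), TGA (AGT|GAT), TAG (GAT|AGT) and TAA (AGT|AAA), each spanning two neighbouring codons.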
Goal:
• Given a protein, design a DNA sequence that encodes the protein with the maximum number of hidden stops
Idea:
Optimal design of whole sequence is based on optimal design of partial sequences
$H(i, j)$ = number of hidden stops in the optimal design up to the $i$th amino acid, $A_i$, when $A_i$ is coded by its $j$th codon
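This formula can be computed recursively (in linear time, $O(n)$, for a protein of $n$ amino acids)
$H(i, j) = \max_k \{ H(i-1, k) + I_{kj} \}$
Maximizing over all codons $k$ coding the previous amino acid, $A_{i-1}$
$I_{kj} = 1$ if the $k$th codon of $A_{i-1}$ followed by the $j$th codon of $A_i$ forms a hidden (out-of-frame) stop codon across their junction, and $0$ otherwise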
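The recurrence can be sketched as a small dynamic program. This is an illustration of the unconstrained case only, not the authors' implementation: it assumes the standard genetic code, appends a stop codon to the protein, and uses invented names (`junction_stops`, `design_max_hidden_stops`).

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

def codons_for(aa):
    """All codons that encode the given amino acid ('*' gives the stop codons)."""
    return [c for c, a in CODON_TO_AA.items() if a == aa]

def junction_stops(left, right):
    """I_kj: hidden stops formed across the junction of two adjacent codons."""
    pair = left + right
    return int(pair[1:4] in STOPS) + int(pair[2:5] in STOPS)

def design_max_hidden_stops(protein):
    """Return (max hidden stops, dotted DNA) for protein plus a final stop codon."""
    options = [codons_for(aa) for aa in protein + "*"]
    score = [0] * len(options[0])          # H(0, j)
    back = []                              # back[i-1][j] = best k for H(i, j)
    for i in range(1, len(options)):
        prev = options[i - 1]
        new_score, pointers = [], []
        for cur in options[i]:
            k = max(range(len(prev)),
                    key=lambda k: score[k] + junction_stops(prev[k], cur))
            new_score.append(score[k] + junction_stops(prev[k], cur))
            pointers.append(k)
        score = new_score
        back.append(pointers)
    # Trace back from the best final codon to recover one optimal design.
    j = max(range(len(score)), key=score.__getitem__)
    best = score[j]
    chosen = [options[-1][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        chosen.append(options[i][j])
    return best, ".".join(reversed(chosen))

best, dna = design_max_hidden_stops("MSDSKED")
print(best)  # 4
print(dna)
```

Each amino acid has at most six codons, so the work per position is bounded by a constant and the total time grows linearly with protein length.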
Protein → DNA: this is a 1-to-many mapping
Back translation should:
1. Satisfy constraints imposed by host genomes,
2. Serve specific design purpose
GC content = number of G & C in sequence
GC content relates to the stability of DNA
Algorithm’s objectives: 1. maximize number of hidden stops, 2. then, match GC content of host genome
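GC content is easy to measure on any candidate sequence. A small sketch, assuming GC content is reported both as a raw count and as a fraction of sequence length; `gc_count` and `gc_fraction` are hypothetical names, and the two inputs are the example sequences for MSDSKED.

```python
def gc_count(seq):
    """Number of G and C bases in the sequence (dots are ignored)."""
    seq = seq.replace(".", "").upper()
    return sum(1 for base in seq if base in "GC")

def gc_fraction(seq):
    """GC count divided by sequence length."""
    length = len(seq.replace(".", ""))
    return gc_count(seq) / length if length else 0.0

seq1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"
seq2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"

print(gc_count(seq1), gc_count(seq2))  # 7 9
print(gc_fraction(seq1), gc_fraction(seq2))
```

Two designs of the same protein can differ in GC content, which is why matching the host genome is treated as a second objective.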
Algorithm:
Construct the sequence with maximum number of hidden stops
“Fit” this sequence to the required Codon usage
Result:
Cannot both achieve the maximum number of hidden stops and match codon usage
Still “better” than wild-type genes
For Leucine, codon CUG is used 51% of the time in E. coli.
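Codon usage of this kind can be tabulated from any coding sequence. A rough sketch, assuming the standard genetic code and the DNA alphabet (so CUG appears as CTG); `codon_usage` is a made-up name and the input is just the second example sequence for MSDSKED, not real E. coli data.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_usage(seq):
    """For each amino acid, the fraction of its occurrences encoded by each codon."""
    seq = seq.replace(".", "").upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        usage[aa][codon] = n / totals[aa]
    return dict(usage)

usage = codon_usage("ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA")
print(usage["S"])  # {'TCC': 0.5, 'TCG': 0.5}
```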
Result ?
1. “Wild type” (genes from NCBI)
2. Random gene (constrained by Codon usage of “wild type”)
3. “Optimal” – design with no constraint (max stop codon)
4. Constrained by GC content of wild type
5. Constrained by Codon usage of wild type
[Figure: number of hidden stop codons]
Conclusion
While maintaining the GC content and codon usage of wild-type genes, the algorithms can propose genes with approximately 10% more hidden stops
With both constraints maintained, the distribution graphs of ‘wild-type’ and ‘designed’ genes keep the same shape, with a Pearson correlation of 98%
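For reference, a Pearson correlation between two distributions can be computed as below. This is only a sketch: the slides do not say exactly which values were correlated, `pearson` is a hypothetical helper, and the inputs are toy numbers, not data from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: fails with ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)

# Toy vectors only: a perfectly linear pair and a perfectly reversed pair.
print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

A value close to 1 means the two distributions rise and fall together; 98% corresponds to a coefficient of 0.98.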
As a lagging grad student,
I’ll try my best to answer
…
Thank you for attending this boring presentation … oh