Maximizing hidden stop codon on gene design

Post on 04-Jul-2015


Khaled Monsoor's presentation on the paper "Maximizing hidden stop codon on gene design"

Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y

Published in: Intl. Journal of Data Mining and Bioinformatics

Vol. 4, No. 4, 2010

Presented by:

Khaled Monsoor

Bioinformatics Masters Program

The University of Memphis

Mail: kmonsoor@memphis.edu

Date: Nov 05, 2010

What ?

Why ?

How ?

Result ?

Conclusion

Synthetic gene design with a large number of hidden stops

Like him …

Sleeping is a waste of precious time

What are the hidden stops in genes ?

Can we “redesign” genes to include more hidden stops ?

How can clever computer algorithms help us ?

Why ?

It is now feasible to construct artificial genomes.

Researchers at the J. Craig Venter Institute chemically synthesized the genome of Mycoplasma genitalium (2008), and in 2010 built a bacterial cell controlled by a synthetic Mycoplasma mycoides genome

…. To increase efficiency of protein synthesis in ‘designed’ genes ?

How to increase efficiency …

Hidden stops can protect against frame shifts by terminating the shifted translation early

Without hidden stops, frame shifts can cause very long non-functional proteins

The genetic code:

Dictates what a protein is composed of

Has evolved through millions of years

A protein is a sequence of amino acids

Contains 20 (twenty) amino acids

mRNA:

ATGTCCAAACCT

Protein:

M S K P

CCT, CCC, CCA, CCG all represent P (Proline)

A mutation in the 3rd position of these codons does not change the amino acid

A deletion creates a frame shift, which changes every codon downstream of it

RNA: ….. CAT.CAT.CAT.CAT ….

Protein: …HHHH… (chain of Histidine)

Deletion of the 3rd character (T): CAC.ATC.ATC.AT

Protein: HII

... something totally different !!!
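The frame-shift example above can be checked with a few lines of Python. This is only an illustrative sketch: the `translate` helper and the compact codon-table construction are not from the slides, and the table assumes the standard genetic code written with T in place of U.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from position 0; stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


original = "CATCATCATCAT"
shifted = original[:2] + original[3:]  # delete the 3rd character (T)

print(translate(original))  # HHHH
print(translate(shifted))   # HII
```

A single deleted base changes every codon after it, which is why an early off-frame stop is useful.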

:-(

(start) (codon)^k (stop)

Start – ATG

Stop – TAA, TAG, TGA

Codon – any triplet not equal to TAA, TAG, or TGA

Example: ATG.ACC.AAT.CGG.TAA

Stop codon (but hidden)

Hidden stops can protect against frame shifts by terminating the subsequent translation early

Without hidden stops, frame shifts can cause very long non-functional proteins, which not only waste time, amino acid resources (money) and ATP (energy) but may also produce a deadly toxin

Ref: Seligmann and Pollock, DNA and Cell Biology, 2004

How ?

• Design genes with the maximum number of hidden stops

•Constraints:

1. None,

2. by matching GC content, and

3. by matching codon usage

Consider the protein MSDSKED

Both sequences encode this protein:

1. ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA

2. ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA

Sequence (1) is better! It has 4 hidden stops, while sequence (2) has none!
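The count for the two sequences can be reproduced with a short Python sketch. It assumes that a hidden stop is any TAA, TAG or TGA triplet read in the +1 or +2 frame; the function name `count_hidden_stops` is made up for this illustration and is not from the paper.

```python
STOPS = {"TAA", "TAG", "TGA"}


def count_hidden_stops(dna):
    """Count stop triplets in the two shifted reading frames (+1 and +2)."""
    seq = dna.replace(".", "").upper()
    total = 0
    for offset in (1, 2):
        # Step through complete triplets of the shifted frame.
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                total += 1
    return total


design_1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"
design_2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"

print(count_hidden_stops(design_1))  # 4
print(count_hidden_stops(design_2))  # 0
```

The in-frame TAA at the end is the real stop codon, so it is not counted.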

Goal:

• Given a protein, design a DNA sequence that encodes the protein with the maximum number of hidden stops

Idea:

Optimal design of whole sequence is based on optimal design of partial sequences

$H(i, j)$ = number of hidden stops in the optimal design up to the $i$th amino acid, $A_i$, when $A_i$ is coded by its $j$th codon

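This formula can be computed recursively, in linear time $O(n)$ for a protein of $n$ amino acids, since each amino acid has at most 6 codons

$H(i, j) = \max_k \{ H(i-1, k) + I_{kj} \}$

Maximizing over all codons $k$ coding the previous amino acid, $A_{i-1}$

$I_{kj} = 1$ if the $k$th codon of $A_{i-1}$ followed by the $j$th codon of $A_i$ creates a hidden (off-frame) stop codon, and 0 otherwise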
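A minimal Python sketch of this recurrence for the unconstrained case follows. It is an illustration under stated assumptions, not the authors' implementation: the names `design` and `junction_stop` are hypothetical, the standard genetic code is assumed, and the final stop codon is treated as one more position that the design may choose freely.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

# Synonymous codons of each amino acid ('*' lists the three stop codons).
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def junction_stop(prev, cur):
    """I_kj: 1 if an off-frame stop spans the boundary between two codons."""
    pair = prev + cur
    return int(pair[1:4] in STOPS or pair[2:5] in STOPS)


def design(protein):
    """Return (max hidden stops, codon list) for the protein plus a final stop."""
    symbols = protein + "*"
    score = {c: 0 for c in CODONS_FOR[symbols[0]]}  # H(1, j) = 0
    back = []  # one dict per position: codon -> best previous codon
    for aa in symbols[1:]:
        new_score, pointers = {}, {}
        for cur in CODONS_FOR[aa]:
            # H(i, j) = max over k of H(i-1, k) + I_kj
            best_prev = max(score, key=lambda p: score[p] + junction_stop(p, cur))
            new_score[cur] = score[best_prev] + junction_stop(best_prev, cur)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace the optimal codon choices backwards.
    last = max(score, key=score.get)
    best = score[last]
    codons = [last]
    for pointers in reversed(back):
        codons.append(pointers[codons[-1]])
    codons.reverse()
    return best, codons


best, codons = design("MSDSKED")
print(best, ".".join(codons))
```

Each position compares at most 6 x 6 codon pairs, so the running time grows linearly with the protein length. When several designs tie for the maximum, this sketch returns just one of them.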

Protein → DNA: this is a 1-to-many mapping

Back translation should:

1. Satisfy constraints imposed by host genomes,

2. Serve specific design purpose

GC content = fraction of bases in the sequence that are G or C

GC content relates to the stability of DNA

Algorithm’s objectives: 1. maximize number of hidden stops, 2. then, match GC content of host genome
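As a small illustration of the quantity being matched, the sketch below computes GC content as a fraction of the sequence length. The helper name `gc_content` is an assumption of this example, and the two inputs are simply the two MSDSKED designs shown earlier, not host-genome data.

```python
def gc_content(dna):
    """Fraction of bases that are G or C (codon separators '.' are ignored)."""
    seq = dna.replace(".", "").upper()
    return sum(base in "GC" for base in seq) / len(seq)


design_1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"
design_2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"

print(round(gc_content(design_1), 3))  # 0.292
print(round(gc_content(design_2), 3))  # 0.375
```

Two designs of the same protein can differ noticeably in GC content, which is why it is handled as a separate constraint.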

Algorithm:

Construct the sequence with the maximum number of hidden stops

“Fit” this sequence to the required codon usage

Result:

Cannot both achieve the maximum number of hidden stops and match the codon usage

Still “better” than wild-type genes

For Leucine, codon CUG is used 51% of the time in E. coli.
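Codon usage of this kind can be tabulated from any coding sequence. The sketch below is only an illustration: `codon_usage` is a hypothetical helper, the toy sequence is invented rather than taken from E. coli, and codons are written in DNA letters (CTG corresponds to CUG in RNA).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(dna):
    """Share of each codon among all codons used for the same amino acid."""
    seq = dna.replace(".", "").upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    codon_counts = Counter(codons)
    aa_counts = Counter(CODON_TABLE[c] for c in codons)
    return {c: n / aa_counts[CODON_TABLE[c]] for c, n in codon_counts.items()}


# Toy sequence: Leucine appears 4 times, 3 of them as CTG.
usage = codon_usage("ATG.CTG.CTG.CTG.TTA")
print(usage["CTG"])  # 0.75
print(usage["TTA"])  # 0.25
```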

Result ?

1. “Wild type” (genes from NCBI)

2. Random gene (constrained by codon usage of “wild type”)

3. “Optimal”: design with no constraint (maximum number of hidden stops)

4. Constrained by GC content of wild type

5. Constrained by codon usage of wild type

[Figure. Vertical axis: number of hidden stop codons]

Conclusion

While maintaining the GC content and codon usage of wild-type genes, the algorithms can propose genes with approx. 10% more hidden stops

With both constraints maintained, the distribution graphs of ‘wild-type’ and ‘designed’ genes keep a 98% Pearson correlation

As a lagging grad student, I’ll try my best to answer

Thank you for attending this boring presentation … oh