Codon models R CGT CGC R D GAC GCC A Synonymous substitution Nonsynonymous substitution.


Codon models

Synonymous substitution: CGT (R) → CGC (R)

Nonsynonymous substitution: GAC (D) → GCC (A)
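
The distinction between the two substitution types can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the helper name `classify_substitution` is made up for this illustration and does not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_from, codon_to):
    """Label a codon change as identical, synonymous, or nonsynonymous."""
    # Accept RNA spelling (U) as well as DNA spelling (T).
    a = codon_from.upper().replace("U", "T")
    b = codon_to.upper().replace("U", "T")
    if a == b:
        return "identical"
    return "synonymous" if CODON_TABLE[a] == CODON_TABLE[b] else "nonsynonymous"


print(classify_substitution("CGT", "CGC"))  # both codons encode R
print(classify_substitution("GAC", "GCC"))  # D becomes A
```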

Naïve assumption: no selection against synonymous substitutions

[Figure: rate of synonymous substitutions plotted against sequence position; label: Selection]

Synonymous purifying selection (conservation)

Protein folding

Splicing regulatory elements

mRNA structure

Overlapping genes

Codon bias

[Figure: codon alignment of three species along the sequence position]

Amino acid   T   A     L   T   S   I     G   R
Species 1    ACT GCC   CTT ACA AGC ATC   GGG CGT
Species 2    ACG GCT   CTT ACA AGC ATC   GGT CGG
Species 3    ACA GCA   CTT ACA AGC ATC   GGA CGA
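
One way to read the three-species example is to count, for each codon column, how many distinct codons encode the same amino acid. The sketch below does this under two assumptions: the codons are as read off the slide, and the three blocks (T A, L T S I, G R) appear in that left-to-right order. The function name `codon_variability` is illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons transcribed from the slide; the block order is an assumption.
ALIGNMENT = {
    "Species 1": "ACT GCC CTT ACA AGC ATC GGG CGT".split(),
    "Species 2": "ACG GCT CTT ACA AGC ATC GGT CGG".split(),
    "Species 3": "ACA GCA CTT ACA AGC ATC GGA CGA".split(),
}


def codon_variability(rows):
    """For each codon column, return (encoded amino acids, number of distinct codons)."""
    result = []
    for column in zip(*rows):
        amino_acids = "".join(sorted({CODON_TABLE[c] for c in column}))
        result.append((amino_acids, len(set(column))))
    return result


for amino_acids, n_codons in codon_variability(list(ALIGNMENT.values())):
    print(amino_acids, n_codons)
```

Columns with a single amino acid and several codons vary only synonymously; columns with a single codon are conserved at the synonymous level as well.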

How should we model synonymous selection?

Testing for synonymous selection

H0: free from synonymous selection → constant Ks

H1: under synonymous selection → variable Ks

Likelihood ratio test:

$$2 \log \frac{L(D \mid M_1)}{L(D \mid M_0)} \sim \chi^2_1$$
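
A likelihood ratio test between H0 (constant Ks) and H1 (variable Ks) can be sketched as follows. This is a minimal illustration, assuming one degree of freedom and maximized log-likelihoods that have already been computed elsewhere; the function name and the numeric inputs are hypothetical, not values from the study.

```python
import math


def likelihood_ratio_test(log_l0, log_l1):
    """Return (statistic, p-value) for nested models, assuming 1 degree of freedom.

    log_l0: maximized log-likelihood under H0 (constant Ks)
    log_l1: maximized log-likelihood under H1 (variable Ks)
    """
    # Clamp at zero in case numerical optimization leaves log_l1 slightly below log_l0.
    statistic = max(2.0 * (log_l1 - log_l0), 0.0)
    # For a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene.
statistic, p_value = likelihood_ratio_test(-100.0, -98.0)
print(statistic, p_value)
```

H0 is rejected for a gene when the p-value falls below the chosen significance level.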

Research objective

Quantify and characterize the magnitude and role of synonymous purifying selection

Comparative sequence data

S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, S. castellii

> 20 million years

70%-90% coding DNA sequence identity

Comparative sequence data

5,135 datasets of multiple sequence alignments + phylogenies (5,182 of ~6,000 S. cerevisiae genes)

Obtained from Wapinski et al., Nature 2007

GATCGATTC

GATCGATTA

GATCGGTCC

GCTCGGTCC

GATAGACAT

?

Under synonymous selection: 54.4% (2,794)

Not under synonymous selection: 45.6% (2,341)

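[Pie chart: genes classified as under significant synonymous selection, under synonymous selection, or not under synonymous selection. Not under synonymous selection: 45.6% (2,341). The two selected classes account for 42% (2,154) and 12.4% (640).]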

Synonymous selection underlies codon bias

Different organisms prefer specific codons over others that encode the same amino acid

Arginine (R) codon usage in S. cerevisiae:

AGA 48%
AGG 21%
CGA 7%
CGC 6%
CGG 4%
CGU 14%

Codon bias maintains translational efficiency

Translation speed Translation accuracy

Codon adaptation index (CAI) quantifies codon bias

Sharp and Li. Nucleic Acids Res, 1987
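
The CAI of Sharp and Li is the geometric mean of the relative adaptiveness of each codon in a gene, where the relative adaptiveness of a codon is its usage divided by the usage of the most frequent codon for the same amino acid. The sketch below is a simplified illustration: it uses only the arginine percentages shown for S. cerevisiae as reference weights, whereas the original index derives weights from a reference set of highly expressed genes and covers all amino acids. The function names are made up for this example.

```python
import math

# Arginine codon usage (%) in S. cerevisiae, as listed on the slide (CGU written as CGT).
ARG_USAGE = {"AGA": 48, "AGG": 21, "CGA": 7, "CGC": 6, "CGG": 4, "CGT": 14}


def relative_adaptiveness(usage):
    """Weight of each codon: its usage divided by the usage of the preferred codon."""
    top = max(usage.values())
    return {codon: count / top for codon, count in usage.items()}


def cai(codons, weights):
    """Geometric mean of the weights of the codons that have a weight defined."""
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon with a defined weight")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(ARG_USAGE)
print(cai(["AGA", "AGA", "AGA"], weights))  # only the preferred codon
print(cai(["AGA", "CGG", "CGC"], weights))  # includes rare codons, so a lower score
```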

Genes under synonymous selection are codon biased

Synonymous selection underlies codon bias

Codon bias (synonymous selection) derives from protein structure

Translation speed Translation accuracy

S. cerevisiae mitochondrial NADP(+)-dependent isocitrate dehydrogenase (PDB: 2QFY)

Codon bias at the protein 3D structure

codon bias core > codon bias surface

codon bias interface > codon bias surface

MDR1 is a member of the ABC transporter family.

These transporters pump drugs out of the cell using ATP, which changes the conformation of the protein.

They have been shown to induce multidrug resistance in various cancers.

C3435T is a synonymous SNP that was reported to be a risk factor for several diseases, such as Parkinson’s disease, colon cancer, and renal epithelial tumors.

Possible explanations for the association:

1. Change in mRNA level

2. Change in splicing

3. Linkage disequilibrium with other causative SNPs

4. Something else

FACS analysis.

Purple: cells transfected with an empty vector

All other colors: cells transfected with a vector containing MDR1 (various haplotypes)

MDR1 pumps the drug (Bodipy) out of the cells.

The inhibitor works differently on the various haplotypes

Trypsin works differently on the various haplotypes

They showed that the synonymous substitutions changed the structure of the protein rather than its level.

This was shown by a differential response to specific antibodies.

This is important for linking SNPs to diseases.

Conservation of Ks in pol

[Plot: Ks rate (y-axis, 0 to 5) against site (x-axis, 750 to 950)]

Mayrose et al. Bioinformatics/ISMB (2007)

[Plot: Ks rate (y-axis, 0 to 4) against position (x-axis, 900 to 950); annotated regions: DNA flap, cPPT, CTS, and one region marked "?"]

Conservation of Ks in pol (zoom in)

cPPT (central polypurine tract): this region serves as a primer for the reverse transcriptase in the synthesis of the plus-strand DNA.

CTS = Central Termination Sequence. The CTS is involved in the nuclear import of the HIV-1 genome.

One conserved region in pol is of unknown function.

Kudla et al. showed that GFP levels are strongly affected by the secondary structure of the 5’ end of the mRNA. GFP is a protein whose gene can easily be inserted into a host genome and whose levels can then be easily quantified.

Stable mRNA vs. unstable mRNA

An unstable mRNA secondary structure at the 5’ end leads to higher GFP levels.

Mechanism: stable secondary structures at the 5’ end of the mRNA obstruct ribosome binding to the mRNA and result in lower protein levels.

Based on this, we hypothesized that the 5’ end of the mRNA should show signals of strong synonymous selection.

This is exactly what we found in our yeast data.

In addition, we found that codon bias is reduced in this region, which allows unstable mRNA structures.