Protein-Nucleic Acids Interaction: Perspective, Non-specific Interactions, Specific Interactions

Post on 18-Dec-2015


Protein-Nucleic Acids Interaction

• Perspective

• Non-specific interactions

• Specific interactions

Why study proteins?

Although nucleic acids (NA) are the message of the cell, proteins are the medium through which the message is expressed. One means little without the other.

What functions are DNA-protein interactions involved in?

DNA replication, DNA repair, DNA recombination, transcription, etc.

We concentrate on the molecular basis of protein-NA complexes known at high resolution.

Two effective techniques: X-ray crystallography and NMR spectroscopy (<25 kDa).

Both are equally valid but neither is sufficient without detailed kinetic, thermodynamic, and site-directed mutagenesis studies.

History of structure determination

The structure of DNA is regular: more than a list of the positions of the atoms, the double helix explained the stability of DNA and the Chargaff rules, and provided a model for how DNA stores genetic information.

Proteins are much less regular, so their structures are more difficult to understand. The first structures for NA binding proteins (NABP) were of the stable and abundant nucleases binding to single-stranded nucleotides.

Work on more complicated protein-NA complexes (e.g., repressors, polymerases, tRNA synthetases) required two advances: techniques for overexpressing normally scarce proteins, and for synthesizing large amounts of oligonucleotides (ON).

Only in the late 1970s did it become reasonable to try to determine the structure of an ON, the protein it interacts with, and the complex between the two.

Aaron Klug

"for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes" (1982)

Alex Rich (?): ssNA-binding protein

Roger Kornberg

"for his studies of the molecular basis of eukaryotic transcription" (2006)

The forces between proteins and nucleic acids

There are four major forces that occur when proteins and NA interact, but it is very difficult to ascribe precise changes in free energy of association to specific interactions between protein and NA.

Four major forces between proteins and nucleic acids

• Electrostatic forces: salt bridges

• Dipolar forces: hydrogen bonds

• Entropic forces: the hydrophobic effect

• Dispersion forces: base stacking

Electrostatic forces: salt bridges

Electrostatic forces are long range, not very structure-specific, and contribute substantially to the overall free energy of association.

Salt bridges are electrostatic interactions between groups of opposite charge. They typically provide ~40 kJ/mol of stabilization per salt bridge.

In protein-NA complexes, they occur between the ionized phosphates of the NA and either the ε-ammonium group of lysine, the guanidinium group of arginine, or the protonated imidazole of histidine.

Salt bridges are influenced by the concentration of salt in the solution: as it increases, the strength of the salt bridges decreases.

Salt bridges are much stronger in the absence of water molecules between the ionized groups (because water has a high dielectric constant, ε).

Strength F: proportional to $r^{-2}$, where r is the distance between the two charges.

$F = \dfrac{Q_1 Q_2}{r^2}$

where $Q_1$ and $Q_2$ are the magnitudes of the charges.
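To get a feel for the numbers, the following Python sketch evaluates Coulomb's law for a pair of opposite unit charges, with the dielectric constant ε written in explicitly. The conversion constant (about 1389 kJ·Å/mol for unit charges), the 3 Å separation, and the two ε values (80 for bulk water, 4 for a dry interface) are illustrative assumptions, not figures from the slides.

```python
# Coulomb interaction between two point charges.
# Assumed constant: e^2 / (4*pi*eps0) per mole, about 1389.35 kJ*Angstrom/mol.
COULOMB_KJ_ANGSTROM_PER_MOL = 1389.35

def coulomb_energy(q1, q2, r_angstrom, dielectric):
    """Energy (kJ/mol) of charges q1, q2 (units of e) separated by r (Angstrom)."""
    return COULOMB_KJ_ANGSTROM_PER_MOL * q1 * q2 / (dielectric * r_angstrom)

def coulomb_force_ratio(r1, r2):
    """Ratio F(r2) / F(r1) for the inverse-square law F = Q1*Q2 / r^2."""
    return (r1 / r2) ** 2

# Same ion pair, with and without water between the charges (assumed epsilon values).
for label, eps in (("bulk water", 80.0), ("dry interface", 4.0)):
    print(f"{label}: {coulomb_energy(+1, -1, 3.0, eps):.1f} kJ/mol")

# Doubling the separation cuts the force to one quarter.
print(coulomb_force_ratio(3.0, 6.0))
```

The point of the comparison is only qualitative: removing water from between the charges lowers ε and strengthens the interaction by the same factor.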

Changes in sequence in B-DNA perturb the average structure only subtly. Salt bridges alone, therefore, cannot distinguish one B-DNA sequence from another.

Patterns of salt bridges, however, could clearly be used to distinguish ss- from ds-NA, and B- from Z-DNA.

Dipolar forces: hydrogen bonds

Hydrogen bonds are dipolar, short-range interactions that contribute little to the stability of the complex but much to its specificity.

Hydrogen bonds occur between the amino acid side chains, the backbone amides and carbonyls of the protein, and the bases and backbone sugar-phosphate oxygens of the NA.

When the protein and NA molecules are not complexed, all their exposed hydrogen bond donors (X) and acceptors (Y) form linear hydrogen bonds to water. Hydrogen bonds are a result of dipole-dipole interactions:

X-H ···· Y-R (X, Y = nitrogen or oxygen)

Strength: proportional to $r^{-3}$, where r is the H···Y distance.

When the complex forms, there is little change in the free energy due to hydrogen bond formation if the linear hydrogen bonds to water are replaced by similar ones between the macromolecules.

By contrast, forming bent hydrogen bonds carries a free energy penalty of up to ~4 kJ/mol per hydrogen bond.

Thus hydrogen bonds are very important in making sequence-specific protein-NA interactions.
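The link between a small free energy penalty and specificity can be made concrete with the standard relation between free energy and binding constant, K = exp(-ΔG/RT). The sketch below assumes 298.15 K and uses the ~4 kJ/mol upper figure quoted above for one bent hydrogen bond; the function name and the choice of three mismatched bonds are illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def affinity_fold_change(ddg_kj_per_mol, temperature_k=298.15):
    """Factor by which the binding constant drops for a free energy penalty ddG."""
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))

one_bond = affinity_fold_change(4.0)         # one bent hydrogen bond
three_bonds = affinity_fold_change(3 * 4.0)  # three bent hydrogen bonds
print(f"one bond: {one_bond:.1f}-fold, three bonds: {three_bonds:.0f}-fold")
```

Under these assumptions a single bent hydrogen bond costs roughly a factor of 5 in affinity, and the penalties multiply, so a handful of poorly formed hydrogen bonds is enough to discriminate strongly against a non-cognate sequence.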

Entropic forces: the hydrophobic effect

Hydrophobic forces are short range, sensitive to structure, proportional to the size of the macromolecular interface, and contribute to the free energy of association.

The hydrophobic effect is due to the behaviour of water at an interface.

When molecules aggregate, the ordered water molecules at the interface are released and become part of the disordered bulk water, thus stabilizing the aggregate by increasing the entropy of the system.

Molecules of water left at the interface between a protein and a NA decrease the entropy of the system. Consequently, the surfaces of the protein and NA tend to be exactly complementary so that no unnecessary water molecules remain when the complex forms.

However, specific water molecules with defined functions in sequence recognition are often bound in the interface (e.g., met repressor).

Dispersion forces: base stacking

Dispersion forces have the shortest range but are very important in base stacking in double-stranded NA and in the interaction of protein with ssNA.

Base stacking is caused by two kinds of interaction: the hydrophobic effect and dispersion forces.

Molecules with no net dipole moment can attract each other by a transient dipole-induced dipole effect. It is very sensitive to the thermal motion of the molecules.

For dsNA, dispersion forces are clearly important in maintaining the structure by base stacking.

For ssNA, they also help it to bind proteins because aromatic side chains can intercalate between the bases of a ssNA.

Strength: proportional to ~$r^{-6}$, where r is the dipole-dipole distance.
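The three distance laws quoted in this part (inverse square for salt bridges, inverse cube for hydrogen bonds, roughly inverse sixth power for dispersion) can be compared directly. This small Python sketch, with illustrative names, shows how much of each interaction survives when the separation is doubled.

```python
# Exponents of the distance dependence quoted for each interaction.
FALLOFF_EXPONENTS = {
    "salt bridge (electrostatic)": 2,
    "hydrogen bond (dipolar)": 3,
    "dispersion": 6,
}

def relative_strength(exponent, distance_ratio):
    """Strength at distance_ratio * r0 relative to the strength at r0."""
    return distance_ratio ** (-exponent)

# Fraction of the original strength left after doubling the distance.
for name, n in FALLOFF_EXPONENTS.items():
    print(f"{name}: {relative_strength(n, 2.0):.4f}")
```

Doubling the distance leaves 1/4 of the electrostatic force, 1/8 of the dipolar interaction, and only 1/64 of the dispersion interaction, which is why the first is described as long range and the last as having the shortest range.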

Geometric constraints imposed by the nucleic acid

All NA have repeating polyanionic backbones and so all proteins that bind them have strategically placed arginines and lysines that create an electrostatic field to neutralize the negative charge.

Contacts to the bases are called "direct readout" because what contacts form depends directly on the sequence of the NA; distinguishing sequences by how the sequence affects the distortability or conformation of the NA is called "indirect readout".

Double-stranded B-DNA

Simple model-building predicted two of the many ways in which proteins interact with B-DNA by hydrogen-bonding:

1) an antiparallel β-sheet interacting with the phosphate backbone in the minor groove,

2) an α-helix interacting with bases in the major groove.

The pattern of hydrogen-bonding sites is more dissimilar for each base-pair in the major than in the minor groove.

Thus, to distinguish the cognate sequence from all others by direct readout alone, the protein must form more than one hydrogen bond to some of the base-pairs in the major groove.
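The difference between the two grooves can be checked with a small lookup table. The Python sketch below encodes the functional groups each Watson-Crick base-pair presents along a groove edge (A = acceptor, D = donor, H = non-polar hydrogen, M = methyl). These patterns are the standard textbook assignments, brought in here as an assumption; they are not listed on the slides.

```python
# Functional groups seen along each groove edge of a Watson-Crick base-pair.
# A = H-bond acceptor, D = H-bond donor, H = non-polar hydrogen, M = methyl.
# Standard textbook assignments (assumed here, not taken from the slides).
MAJOR_GROOVE = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
MINOR_GROOVE = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}

def distinguishable_classes(groove):
    """Group base-pairs that present an identical pattern in the given groove."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(classes.values())

def ambiguous_at_one_site(groove, position):
    """Base-pairs that look identical if only one site (by index) is probed."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern[position], []).append(pair)
    return sorted(group for group in classes.values() if len(group) > 1)

print(distinguishable_classes(MAJOR_GROOVE))   # four separate classes
print(distinguishable_classes(MINOR_GROOVE))   # only two classes
print(ambiguous_at_one_site(MAJOR_GROOVE, 0))  # one contact is not enough
```

With these assignments the major groove separates all four base-pairs, while the minor groove cannot tell A-T from T-A or G-C from C-G. Probing a single major-groove site still leaves some base-pairs indistinguishable, which is the reason more than one hydrogen bond per base-pair is needed.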

In specific protein B-DNA complexes, about 1/2 of the hydrogen bonds are to the bases and the other 1/2 to the phosphate backbone.

Single-stranded NA

Hydrophobic bases in ssNA are more exposed. ssNABPs have a more hydrophobic NA-binding surface than dsNABPs.

The hydrophobic surface often contains aromatic groups which interact more effectively with the NA bases, and also an electrostatic field that neutralizes the charge of the phosphate backbone.

Possibly because the structure of RNA varies more than that of DNA, proteins seem to recognize RNAs in more ways than they recognize DNAs.

RNAs, even more than DNAs, may be recognized by indirect readout: the correct RNA can distort to fit the protein whereas the incorrect one cannot.

The kinetics of forming protein-nucleic acid complexes

Two factors affect the rate of formation of all protein-NA complexes: random thermal diffusion and long-range, directional electrostatic attraction.

A "one-dimensional random walk" can account for the observed rate of formation of sequence-specific protein-DNA complexes in a genome.

The protein first binds non-specifically to the DNA and then diffuses or jumps along the DNA until it finds the appropriate sequence.

Thus, all sequence-specific DNA binding proteins may bind DNA in two ways: one for tight, sequence-specific binding and the other for looser, non-sequence specific binding.
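The sliding part of this search can be illustrated with a toy simulation. The Python sketch below is a minimal model under stated assumptions: the DNA is a row of integer positions, the non-specifically bound protein takes unbiased single-base steps, it never dissociates or jumps, and it is simply kept on the DNA at the ends. All names and numbers are hypothetical.

```python
import random

def sliding_search(dna_length, start, target, rng):
    """Count unbiased +/-1 steps along the DNA until the target site is reached.

    Positions are integers 0 .. dna_length - 1; a step off either end is
    clamped so the walker stays on the DNA.
    """
    position, steps = start, 0
    while position != target:
        position += rng.choice((-1, 1))
        position = max(0, min(dna_length - 1, position))
        steps += 1
    return steps

def mean_search_steps(dna_length, start, target, trials=200, seed=0):
    """Average number of steps over repeated, seeded searches."""
    rng = random.Random(seed)
    total = sum(sliding_search(dna_length, start, target, rng) for _ in range(trials))
    return total / trials

# Protein lands 10 bp away from its site on a 100 bp fragment.
print(mean_search_steps(100, 40, 50))
```

Even in this crude model the mean number of steps is far larger than the 10 bp separating the landing position from the target, which gives a feel for why jumps between DNA segments are invoked alongside sliding.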

Non-specific interactions

• Single-stranded nucleic acid binding proteins

• Non-sequence-specific nucleases

• Polynucleotide polymerases

The need for packaging

The fundamental building block of chromatin in eukaryotes is the nucleosome, a protein-DNA complex. The nucleosome core particle consists of 146 bp of DNA and eight small, highly basic histone proteins. The DNA wraps around the histone octamer to form a negative supercoil.

Bacteria also use small basic proteins to package DNA, such as the dimeric HU protein from E. coli, whose long β-strand arms presumably wrap around a double-stranded DNA molecule.

Viruses are highly symmetric particles that can pack their nucleic acid genome efficiently inside the protein capsid.

Protein subunits containing many basic amino acids interact with the viral nucleic acid in a non-sequence-specific manner.

In the helical TMV, some sequence-specific contacts are involved in directing assembly of the virus, but there are no such contacts in the icosahedral FHV.

Single-stranded nucleic acid binding proteins

ssDNA is formed during replication and most organisms produce proteins to bind it. These proteins form an important but diverse group but, with the exception of gene 5 protein from bacteriophage fd, there is little structural information on how they interact with NA.

A model has been suggested in which lysines and arginines neutralize the DNA phosphate backbone and the bases stack against aromatic amino acid side chains.

Non-sequence-specific nucleases

All organisms must degrade NA during their life cycle. There is no one enzyme designed for this purpose, but rather a large number of enzymes with different specificities. These include exo- and endonucleases and enzymes specific for ss- and ds-NA and for base sequences.

e.g., RNase and DNase

RNase and DNase have different reaction mechanisms because RNase uses the ribose 2'-hydroxyl group, not present in DNA, to attack the 5'-phosphate ester linkage.

Ribonuclease A, barnase, and binase

RNase A is not sequence specific because it only interacts with the base at the active site; all other contacts are electrostatic ones to the sugar-phosphate backbone.

Deoxyribonuclease I

DNase I cleaves different sequences with different rates because of sequence-dependent steric hindrance at the active site.

G-C tracts accommodate the catalytic loop better because they have wider minor grooves than A-T tracts.

Polynucleotide polymerases

There are four classes of template-directed polynucleotide polymerases: DNA- or RNA-dependent and DNA- or RNA-polymerizing.

All add nucleotides to the 3'-end of a growing polynucleotide chain but they differ widely in how accurately they replicate the NA (their fidelity) and how many nucleotides they add before dissociating (their processivity).

e.g., Pol I and RTase.

They have the same overall architecture for gripping a NA during polymerization. It is a domain that looks like a right hand, with palm, fingers, and thumb subdomains.

Part of the palm subdomain and the direction from which the NA approaches the active site are conserved in these two polymerases. The polymerases, their 3'-5' exonucleases, and RNase Hs may all use the same mechanism, which requires two divalent cations.

DNA-dependent DNA polymerases: E. coli DNA polymerase I (Pol I) and III

All cellular DNA-dependent DNA polymerases have a 3'-5' proof-reading exonuclease, require a primer to begin synthesis, and replicate their own NA the most faithfully.

The Klenow fragment of Pol I contains two widely-separated domains, one carrying the polymerase activity, and the other the 3'-5' proofreading exonuclease activity.

The DNA approaches the polymerase from the exonuclease side and bends by 90° to enter the polymerase site.

The protein does not read the DNA sequence at all. Instead, when an incorrect base is added, the DNA strands separate and the daughter strand is therefore more likely to reach over to the exonuclease, which then removes the incorrect base.

RNA-dependent DNA polymerases: HIV-1 reverse transcriptase (RTase)

RTase is a unique heterodimer because its two subunits have the same sequence yet fold differently. The p66 subunit folds into a polymerase domain and an RNase H domain.

The DNA in the complex is A-form near the polymerase active site. Near the active site, the palm and thumb hold the primer strand, while the palm and fingers hold the template strand.

Two polypeptide helices interact with the phosphate backbone and probably ensure that RTase tracks the DNA correctly.

Specific interactions

The placement of an α-helix in the major groove appears to be the most common way of recognizing a specific DNA sequence.

Other parts of the protein, which form hydrogen bonds and salt bridges to the DNA backbone, position the element on the DNA so that it can achieve recognition.

Direct readout of the DNA sequence, most often in the major groove, is an important part of sequence-specific binding but is by no means the only component.

The direct readout can involve hydrogen bonds (1) directly to side chains, (2) to the polypeptide backbone, or (3) through water molecules, or depend on hydrophobic interactions.

Indirect readout is also important: the correct DNA sequence may differ from canonical B-DNA in a way that increases the surface area buried, the electrostatic attraction, or the number of hydrogen bonds formed.

The protein can also change conformation upon binding, affecting the overall stability of the complex or the ability of the protein to recognize a specific sequence.

Oligomerization upon binding the correct sequence, as in met repressor, GCN4, and the glucocorticoid receptor, often increases affinity and specificity.

Consequently, there is no universal code by which proteins recognize DNA sequences.

Even in a single family, such as the HTH proteins, the recognition helix is presented to the DNA in many ways.

The need for specificity

For a cell to function at all, proteins must distinguish one NA from another very accurately.

Proteins that bind specific NA sequences also bind non-specific ones.

In some cases, like the transcriptional regulators, this binding is intrinsic to function; in others, like the tRNA synthetases, the binding is merely unproductive.

Quantitative specificity

The simplest model of a protein-DNA binding reaction is given by the equation:

$T + X_i \rightleftharpoons T \cdot X_i$

Here T is free protein, $X_i$ is any one particular site or non-site DNA, and $T \cdot X_i$ is the bound complex.

A convenient way to quantify the specificity of such a protein is to normalize the binding constant, $K_{eq} = \dfrac{[T \cdot X_i]}{[T][X_i]}$, to some user-defined reference value.

Binding constants can be normalized to an average of the set of $K_{eq}$ over all the $X_i$, yielding the “specific binding constant”, $K_s$.

Assuming that the positions in the binding site contribute independently to the binding energy, a table of the $K_s$ values (or, equivalently, $\Delta G_s = -RT \ln K_s$) for all the single mutants of a binding site allows one to estimate the affinity for any site or non-site sequence.
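The additive model can be sketched in a few lines of Python. The example below is only an illustration: the normalization uses a plain arithmetic mean (an assumption about what "average" means here), the temperature is taken as 298.15 K, and the 4-bp reference site and its single-mutant penalties are made-up numbers.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ/(mol*K)

def specific_constants(keq_by_site):
    """Normalize each K_eq to the average over all sites, giving K_s."""
    mean_k = sum(keq_by_site.values()) / len(keq_by_site)
    return {site: k / mean_k for site, k in keq_by_site.items()}

def delta_g(k, temperature_k=298.15):
    """Free energy (kJ/mol) corresponding to a binding constant: -RT ln K."""
    return -R_KJ * temperature_k * math.log(k)

def additive_delta_g(site, reference, ddg_table):
    """Estimate ddG of `site` relative to `reference` from single-mutant terms.

    ddg_table[(position, base)] is the free energy change (kJ/mol) of placing
    `base` at `position` in the reference site; positions are assumed independent.
    """
    return sum(
        ddg_table[(i, base)]
        for i, (base, ref_base) in enumerate(zip(site, reference))
        if base != ref_base
    )

# Hypothetical single-mutant penalties for a 4-bp reference site "ACGT".
DDG_TABLE = {(0, "G"): 3.0, (1, "T"): 5.5, (3, "A"): 1.5}
print(additive_delta_g("GCGA", "ACGT", DDG_TABLE))  # sums the terms for positions 0 and 3
```

The estimate is only as good as the independence assumption: if neighbouring positions influence each other, the single-mutant table no longer predicts multiply substituted sites.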

One standard method of measuring binding affinities uses a “gel shift” experiment.

Recall that: $K(X_i) = \dfrac{[T \cdot X_i]}{[T][X_i]}$.

Typically, the ratio of bound to unbound DNA is determined at several different protein concentrations and then curve-fitting is employed to give a best estimate of K(Xi).

Multiple determinations of the same constant indicate that this approach can give values that are accurate to within a factor of about 2.
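Rearranging the definition of K gives bound/unbound = K × [T], so one simple way to do the fit is a least-squares line through the origin. The Python sketch below takes that route; it assumes the free protein concentration is known (for example, protein in large excess over DNA) and uses made-up titration data. It is one possible fitting choice, not necessarily the procedure used in the work cited on these slides.

```python
def fit_binding_constant(protein_conc, bound_over_unbound):
    """Least-squares slope through the origin of ratio = K * [T]."""
    numerator = sum(t * r for t, r in zip(protein_conc, bound_over_unbound))
    denominator = sum(t * t for t in protein_conc)
    return numerator / denominator

# Made-up titration: free protein concentrations (M) and bound/unbound DNA ratios.
protein = [1e-9, 2e-9, 5e-9, 1e-8]
ratios = [0.11, 0.19, 0.52, 0.98]
k_est = fit_binding_constant(protein, ratios)
print(f"K ~ {k_est:.2e} per M")
```

Because the fit pools several protein concentrations, scatter in any single lane of the gel has less influence on the estimate than it would in a one-point determination.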

Gel shift experiment: Mnt protein binding to operator

The binding of the homotetrameric Mnt protein of Salmonella phage P22 to variants of its symmetric 17-base operator.

Changes in sequence affect binding affinity.

Stormo & Fields, TIBS, 1998

Interactions of mammalian proteins with cisplatin-damaged DNA. J.J. Turchi et al., 1999

Transcriptional regulators: the helix-turn-helix motif

• The prokaryotic complexes

• Eukaryotic complexes: the homeodomain

Structural Classification of HTH DNA-binding Domains and Protein–DNA Interaction Modes

R. Wintjens and M. Rooman, JMB, 1996

Exclusively eukaryotic transcriptional regulators: the zinc finger and leucine zipper

• The zinc finger proteins

The Cys2His2 zinc finger (e.g., Zif268)

The Cys4 nuclear receptors

The GAL4 zinc finger

• The leucine zipper

Physical basis of a protein-DNA recognition code: zinc finger modules

Choo & Klug, 1997

Other α-helical binding motifs

β-Sheet binding motifs

• The met repressor family

• The TFIID TATA-box binding protein

Restriction endonucleases: EcoRI and EcoRV

EcoRI and EcoRV have very different structures and interact with DNA differently: the former only in the major groove; the latter in both grooves.

However, both employ the same enzyme mechanism and catalytic residues and both achieve their high degree of sequence specificity similarly.

They complex with the cognate DNA in a highly co-operative and symmetric manner.

In the complex with cognate DNA, much of the free energy of binding has been used to drive the cognate DNA into an unfavorable conformation that places the scissile phosphodiester bond in the active site and completes the binding site for the essential Mg2+.

Enzymes that "flip-out" nucleosides

Nucleoside flipping involves rotation of backbone bonds to expose an out-of-stack base in dsDNA. It can then be a substrate for an enzyme-catalysed chemical reaction.

Rotations about backbone torsion angles, but not about the glycosylic bond, χ, appear to be required.

The phenomenon is fully established for two restriction methyltransferases (M.HhaI and M.HaeIII).

There is strong evidence for nucleoside flipping for two key repair enzymes (UDGase and E. coli photolyase).

Other examples are emerging and the phenomenon is likely to prove general for enzymes that require access to unpaired bases. It appears to need no external source of binding energy.

Knowing all this, one may design a high-throughput method to identify protein(s) that interact with a specific DNA sequence.