What is a Protein Data Bank

READING PDB FILES

Claire Shoemake

Definitions

• Protein is used interchangeably with receptor

• The implication is that the drug target (receptor) being considered is protein in nature

• Ligand: The small molecule bound to the protein. This could be an endogenous molecule, or a drug.

• Protein:ligand complex: The small molecule bound to its receptor. Normally the small molecule modulates receptor function (as an agonist or antagonist).

What is a Protein Data Bank (PDB) File?

• It is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank.

http://bip.weizmann.ac.il/oca-bin/ocamain

• Most of the information in that database pertains to proteins, and the pdb format accordingly provides for rich description and annotation of protein properties. However, proteins are often crystallized in association with other species such as water, ions, nucleic acids and drug molecules, which can therefore be described in the pdb format as well.

• The pdb file used as an example in this lecture is 1UZF (http://bip.weizmann.ac.il/oca-bin/send-pdb?id=1uzf), which describes the Angiotensin Converting Enzyme (ACE) bound to the ACE-inhibiting drug captopril.

(Figure: header of the pdb file, labelled with the protein classification and the PDB ID.)

1. Gives information regarding the content of the file.

2. Indicates that the protein is human. In this case human testicular ACE.

3. Indicates the nature of the tissue culture that is used to express, or grow, the protein described in this file.

4. Indicates the analytical technique, X-ray or NMR, that was used by the authors to resolve the protein crystal. In this case the crystal being considered is testicular ACE complexed to captopril.
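The classification and the PDB ID described in point 1 sit on the HEADER record, which is fixed-width. As a minimal sketch, assuming the standard column layout (classification in columns 11-50, deposition date in 51-59, ID code in 63-66), the record could be split as follows. The function name `parse_header` is hypothetical, the classification `HYDROLASE` is assumed, and the date in the sample line is a placeholder, not the real deposition date of 1UZF.

```python
def parse_header(line):
    """Split a HEADER record into its fixed-width fields."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    line = line.ljust(66)  # pad so the slices below never run short
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "pdb_id": line[62:66].strip(),           # columns 63-66
    }


# Illustrative record only: classification assumed, date is a placeholder.
sample = "{:<10}{:<40}{:<9}   {}".format("HEADER", "HYDROLASE", "01-JAN-00", "1UZF")
header = parse_header(sample)
print(header["classification"], header["pdb_id"])
```

Slicing by column rather than splitting on whitespace is the safer choice here, since a classification may itself contain spaces.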

X-ray Crystallography

http://en.wikipedia.org/wiki/X-ray_crystallography

• X-ray crystallography is a method of determining the arrangement of atoms within a crystal, in which a beam of X-rays strikes a crystal and diffracts into many specific directions.

• From the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal.

• From this electron density, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, and various other information.

• Since many materials can form crystals (such as salts, metals, minerals, semiconductors, as well as various inorganic, organic and biological molecules), X-ray crystallography has been fundamental in the development of many scientific fields.

• In its first decades of use, this method determined the size of atoms, the lengths and types of chemical bonds, and the atomic-scale differences among various materials, especially minerals and alloys. The method also revealed the structure and functioning of many biological molecules, including vitamins, drugs, proteins and nucleic acids such as DNA.

• X-ray crystal structures also serve as a basis for designing pharmaceuticals against diseases.

• In an X-ray diffraction measurement, a crystal is mounted on a goniometer and gradually rotated while being bombarded with X-rays, producing a diffraction pattern of regularly spaced spots known as reflections. The two-dimensional images taken at different rotations are converted into a three-dimensional model of the density of electrons within the crystal using the mathematical method of Fourier transforms, combined with chemical data known for the sample. Poor resolution (fuzziness) or even errors may result if the crystals are too small, or not uniform enough in their internal makeup.

5. Crystallographic team, also the authors of the paper that must be published in a peer-reviewed journal prior to deposition acceptance by the Protein Data Bank.

6. Details of the journal publication submitted by the crystallographic team. It is of vital importance to obtain a copy of this publication when attempting drug design projects, since it contains further information that may not be included in the pdb file.

It is necessary to choose the best possible crystallographic structure prior to embarking on a drug design project. This is because this structure serves as the starting point and template on which all successive steps depend.

One critical factor in crystallographic data selection is its resolution. Resolution is the smallest distance at which atoms may be reliably distinguished.

The higher the resolution, that is, the smaller the distance at which atoms may be reliably distinguished, the better the crystallographic structure.

Resolutions ranging from 2 to 3.5 Å are considered acceptable starting points for drug design projects.

This particular crystal structure was resolved at 2.0 Å.
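The resolution can be read from the file itself. The sketch below assumes the usual `REMARK   2 RESOLUTION.` line of the pdb format. The helper names `read_resolution` and `acceptable_for_design` are hypothetical, and the check treats anything at or better than 3.5 Å as acceptable, which is one reading of the range quoted above.

```python
import re

# Matches lines such as "REMARK   2 RESOLUTION.    2.00 ANGSTROMS."
RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS")


def read_resolution(lines):
    """Return the resolution in Angstroms, or None if no such REMARK is found."""
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def acceptable_for_design(resolution, worst=3.5):
    """Assumed reading of the rule above: a smaller value is better, 3.5 is the limit."""
    return resolution is not None and resolution <= worst


sample_lines = ["REMARK   2", "REMARK   2 RESOLUTION.    2.00 ANGSTROMS."]
resolution = read_resolution(sample_lines)
print(resolution, acceptable_for_design(resolution))
```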

About 85% of the models (entries) in the Protein Data Bank were determined by X-ray crystallography. (Most of the remaining 15% were determined by solution nuclear magnetic resonance.) Analysis of X-ray diffraction patterns from protein crystals produces an electron density map, into which an atomic model of the protein is fitted. Major errors sometimes occur when fitting models into low-resolution electron density maps.

The value of Free R is the best clue as to whether major errors may be present in a published model.

Obtaining diffraction-quality crystals of proteins remains very difficult, despite many recent advances. Of the new protein sequences targeted for X-ray crystallography, only about one in twenty is solved.

Free R is a statistical quantity introduced in 1992 by Axel T. Brünger to assess the quality of a model from X-ray crystallographic data.

It is calculated in the same manner as the R value, but from a subset of the data that is set aside for the calculation of free R and not used in the refinement of the model. It is a more reliable tool for assessing the model than the R value because it is not self-referential: as an estimation of errors, free R is free of any bias that may have been introduced during refinement. As a rule of thumb, free R should not exceed the R value by more than 0.05; that is, if the R value is 0.20, free R should not significantly exceed 0.25. Free R values exceeding 0.40 raise serious doubts about the model.
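The two rules of thumb for free R can be written as a small check. This is only a sketch: the function name `assess_free_r` and the wording of the three verdicts are assumptions made for illustration, while the thresholds (a gap of 0.05 and a ceiling of 0.40) are the ones quoted above.

```python
def assess_free_r(r_value, free_r, max_gap=0.05, doubtful_above=0.40):
    """Apply the free R rules of thumb and return a short verdict."""
    if free_r > doubtful_above:
        return "serious doubts"
    # Small tolerance so that 0.25 versus 0.20 is not rejected by rounding error.
    if free_r - r_value > max_gap + 1e-9:
        return "gap too large"
    return "consistent"


print(assess_free_r(0.20, 0.25))
print(assess_free_r(0.20, 0.30))
print(assess_free_r(0.35, 0.45))
```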

The R Value

• The R value is used to assess progress in the refinement of a model from X-ray crystallographic data, and can be used as one factor in evaluating the quality of a model. R is a measure of error between the observed intensities from the diffraction pattern and the predicted intensities that are calculated from the model. R values of 0.20 or less are taken as evidence that the model is reliable.

• As a rule of thumb, models with R values substantially exceeding (resolution/10) should be treated with caution. Thus, if the resolution of a model is 2.5 Å, that model's R value should not exceed 0.25. Completely erroneous models (e.g. random models) give R values of 0.40 to 0.60.

• However, R values themselves must be treated with caution. Unlike the Free R, acceptable R values can be achieved despite serious errors in the model
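As a worked illustration of the R value, the sketch below uses the conventional definition, R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, which is stated in terms of structure-factor amplitudes rather than the intensities mentioned above. That definition is an assumption brought in here, scaling between observed and calculated data is ignored, and the amplitudes are made-up numbers, not data from 1UZF. The helper names are hypothetical.

```python
def r_value(f_obs, f_calc):
    """Conventional R factor from observed and calculated amplitudes (no scaling)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    disagreement = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return disagreement / sum(abs(o) for o in f_obs)


def r_within_rule_of_thumb(r, resolution):
    """Rule of thumb from the text: R should not exceed resolution / 10."""
    return r <= resolution / 10 + 1e-9


# Made-up amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 70.0, 35.0]
r = r_value(f_obs, f_calc)
print(round(r, 3), r_within_rule_of_thumb(r, 2.0))
```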

Kleywegt, GJ, AT Brünger. 1996. Checking your imagination: applications of the free R value. Structure 4:897-904.

It is incumbent on the authors to submit experimental details to the Protein Data Bank. This allows their experimental conditions to be re-created, and their results to be reproduced.

The related entries section of the pdb file is valuable since it points the researcher to further structural information that may be available about the protein, or receptor, of interest.

In this case, three further depositions, with pdb IDs 1O86 (ACE + lisinopril), 1O8A (the unbound form of ACE), and 1UZE (ACE + enalaprilat), are available.

It is of interest from a drug design point of view to visualise and compare these depositions in order to identify whether or not the tertiary structure of ACE is in any way ligand dependent.

The primary amino acid sequence, i.e. the linear sequence of the unfolded protein, in this case of the testicular ACE enzyme, is listed in this section of the pdb file.

At this point of the file it is also possible to deduce that the protein is a monomer. This may be seen from the fact that the third column of the file always contains the letter A. This means that there is only one chain, labelled A, implying the monomeric status of the protein.
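The same deduction can be made programmatically from the SEQRES records. The sketch below is an illustration under assumptions: it splits each record on whitespace, so the chain ID is the third field as described above (this would fail for old files with a blank chain ID), and the sample records are shortened, hypothetical lines rather than the real 1UZF sequence block. A single chain in the file suggests, but does not by itself prove, a monomeric protein.

```python
def read_seqres(lines):
    """Map each chain ID to the list of residue names given in SEQRES records."""
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        fields = line.split()
        # fields: record name, row number, chain ID, chain length, residues...
        chains.setdefault(fields[2], []).extend(fields[4:])
    return chains


# Shortened, illustrative records (chain length is a placeholder).
sample = [
    "SEQRES   1 A    6  ASP GLN ALA",
    "SEQRES   2 A    6  GLN ALA SER",
    "HELIX records and others are ignored",
]
chains = read_seqres(sample)
print(sorted(chains), len(chains) == 1)
```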

The term heteroatom is used in pdb files to designate all atoms that do not form part of the protein, i.e. all atoms that do not form part of the primary structure of the protein. This part of the pdb file indicates all the heteroatoms (excluding water molecules) that form part of the protein (ACE):ligand (captopril) complex.

The areas highlighted in blue are searchable, and lead to windows in which the structures of the heteroatoms may be found.

In this case the presence of the Zn atom indicates that ACE is a metalloprotease; MCO is the code given by the authors for captopril. HOH indicates water.
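A list of the hetero groups in a file can also be gathered from the HETATM records. The sketch below assumes the fixed-width layout of the pdb format (residue name in columns 18-20, chain in column 22, residue number in columns 23-26) and skips water. The helper `hetatm` only builds shortened demo records without coordinates, and all serial and residue numbers are placeholders.

```python
from collections import Counter


def hetatm(serial, name, res_name, chain, res_seq):
    """Build a shortened HETATM record (no coordinates) for the demo below."""
    return "HETATM{:>5} {:<4} {:>3} {}{:>4}".format(serial, name, res_name, chain, res_seq)


def hetero_groups(lines, skip=("HOH",)):
    """Count distinct hetero groups per residue name, ignoring the names in skip."""
    seen = set()
    for line in lines:
        if line.startswith("HETATM"):
            res_name = line[17:20].strip()
            if res_name not in skip:
                seen.add((res_name, line[21], line[22:26].strip()))
    return Counter(name for name, _, _ in seen)


# Placeholder numbering; only the residue names follow the text above.
records = [
    hetatm(1, "C1", "NAG", "A", 1),
    hetatm(2, "C2", "NAG", "A", 1),
    hetatm(3, "C1", "NAG", "A", 2),
    hetatm(4, "ZN", "ZN", "A", 3),
    hetatm(5, "S", "MCO", "A", 4),
    hetatm(6, "O", "HOH", "A", 5),
]
groups = hetero_groups(records)
print(dict(groups))
```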

Helices and sheets constitute the secondary structure of a protein, or more clearly the nature of the folding that occurs along segments of the protein.

This section of the pdb file yields information regarding the secondary structure of the protein being described.

The areas highlighted in blue are searchable, parts of which are shown above. In this case, the entry shows which amino acids form helix 1 of ACE.

The coordinate section of the pdb file describes the coordinates of the atoms that are part of the protein.

For example, the first ATOM line on the left describes the alpha-N atom of the first residue of peptide chain A, which is an aspartate residue.

The first three floating point numbers are its x, y and z coordinates and are in units of Ångströms.

The next three columns are the occupancy, temperature factor, and the element name, respectively.

The red rectangles delineate individual amino acids. The atoms making up any one amino acid have the same number in column 5 of the coordinate file.

Thus, in this case, there are the coordinates of the first 6 amino acids in the primary amino acid sequence, specifically aspartate, glutamine, alanine, glutamine, alanine and serine.
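The fields of an ATOM line described above can be pulled apart by column position. This is a minimal sketch assuming the standard fixed-width layout of the pdb format. The function name `parse_atom` is hypothetical, and the sample line is built with placeholder numbering and coordinates, not copied from 1UZF.

```python
def parse_atom(line):
    """Split one ATOM (or HETATM) record into its fixed-width fields."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38, Angstroms
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78
    }


# Placeholder record: backbone N of an aspartate in chain A, invented coordinates.
sample = (
    "ATOM  " + "{:>5}".format(1) + " " + "{:<4}".format(" N") + " "
    + "ASP" + " " + "A" + "{:>4}".format(1) + " " * 4
    + "{:8.3f}{:8.3f}{:8.3f}".format(10.0, 20.0, 30.0)
    + "{:6.2f}{:6.2f}".format(1.0, 25.0) + " " * 10 + "{:>2}".format("N")
)
atom = parse_atom(sample)
print(atom["res_name"], atom["name"], atom["x"], atom["y"], atom["z"])
```

Column slicing is used instead of `split()` because adjacent numeric fields can run together without a separating space in real files.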

The temperature factor or B-factor can be thought of as a measure of how much an atom oscillates or vibrates around the position specified in the model. Atoms at side-chain termini are expected to exhibit more freedom of movement than main-chain atoms, and this movement amounts to spreading each atom over a small region of space.

Occupancy is one of several parameters included in refinement. The occupancy n_j of atom j is a measure of the fraction of molecules in the crystal in which atom j actually occupies the position specified in the model. If all molecules in the crystal are precisely identical, then occupancies for all atoms are 1.00.
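To make the two quantities concrete, the sketch below compares the mean B-factor of main-chain and side-chain atoms and lists atoms whose occupancy is below 1.00. The atom names follow the usual backbone naming (N, CA, C, O), while the B-factors and occupancies are invented values chosen for illustration, and the function names are hypothetical.

```python
BACKBONE = {"N", "CA", "C", "O"}


def mean_b_factors(atoms):
    """atoms: (name, b_factor, occupancy) tuples -> (main-chain mean, side-chain mean)."""
    main = [b for name, b, _ in atoms if name in BACKBONE]
    side = [b for name, b, _ in atoms if name not in BACKBONE]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(main), mean(side)


def partially_occupied(atoms):
    """Names of atoms whose occupancy is below 1.00."""
    return [name for name, _, occ in atoms if occ < 1.0]


# Invented values for one residue.
residue = [
    ("N", 20.0, 1.0), ("CA", 22.0, 1.0), ("C", 21.0, 1.0), ("O", 25.0, 1.0),
    ("CB", 30.0, 1.0), ("CG", 38.0, 0.5),
]
main_mean, side_mean = mean_b_factors(residue)
print(main_mean, side_mean, partially_occupied(residue))
```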

This part of the pdb file shows the last amino acid in the primary amino acid sequence of the protein. Its end is indicated by the TER entry encircled above.

The pdb file then continues to describe the first in the series of heteroatoms included in this entry, that is, those atoms which are not part of the protein molecule. The first is NAG, or N-acetylglucosamine. As indicated previously, two NAG molecules were crystallised in this protein:ligand complex.

The coordinates for the metal ion (Zn) and the bound ligand molecule (captopril), designated as previously indicated by the code identifier MCO, are indicated above.

For each atom in the chemical component, the CONECT records list how many and which other atoms that atom is bonded to. The list of CONECT records is concluded with an END record.
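The CONECT records can be turned into a bond table with a few lines. The sketch below assumes the fixed-width layout of the pdb format, in which the atom serial number occupies columns 7-11 and up to four bonded serial numbers follow in 5-column fields. The serial numbers in the sample are placeholders, and the function names are hypothetical.

```python
def parse_conect(line):
    """Return (atom serial, list of bonded serials) for one CONECT record."""
    if not line.startswith("CONECT"):
        raise ValueError("not a CONECT record")
    fields = [line[i:i + 5].strip() for i in range(6, min(len(line), 31), 5)]
    serials = [int(f) for f in fields if f]
    return serials[0], serials[1:]


def bond_table(lines):
    """Collect the bonded partners of every atom named in CONECT records."""
    bonds = {}
    for line in lines:
        if line.startswith("CONECT"):
            atom, partners = parse_conect(line)
            bonds.setdefault(atom, []).extend(partners)
    return bonds


# Placeholder serial numbers.
sample = ["CONECT 4701 4702 4703", "CONECT 4702 4701", "END"]
bonds = bond_table(sample)
print(bonds, len(bonds[4701]))
```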

Ligand Protein Contacts (LPC)

http://bip.weizmann.ac.il/oca-bin/lpc?PDB_ID=1uzf

Most pdb files contain ligand:protein contact information. This is of vital importance from a drug design point of view:

A clear idea of the amino acids which line the ligand binding pocket is obtained

Critical binding interactions between the ligand and the receptor may be identified

Unstable contacts may also be identified and improved upon in the context of the design project

In this case, the table above lists the amino acids of ACE which make contact with captopril. The bond length, the contact surface area, and the nature of the bond are also indicated. The table above left is a glossary which explains the terms used in that table.

Hydrogen bonds play an important role in binding ligands to the ligand binding pocket of a receptor. They are different from hydrophobic or van der Waals interactions. The latter are more numerous and are considered to be largely responsible for ligand stabilisation within a binding pocket. Hydrogen bonds, on the other hand, are associated with selectivity. This means that a ligand and its cognate receptor recognise each other on the basis of the hydrogen bonds they are capable of forging between them.

This is very important from a drug design point of view where selectivity is of paramount importance. Pdb files conveniently list the hydrogen bonds forged between protein and ligand in a separate table in the LPC section of the file.

In the above table, the first section on the extreme left describes the ligand atoms which are involved in hydrogen bond contacts with the protein amino acid side chains. In the first entry, for example, oxygen atom no. 1 (in the pdb entry) is forging a hydrogen bond with the hydroxyl group of tyrosine 520 of ACE. The protein atom section consequently describes the receptor atoms which forge hydrogen bond contacts with the ligand atoms. This hydrogen bond is 2.7 Å long and occupies a total surface area of 19.4 Å².
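The bond length quoted in such a table is simply the distance between the two atoms' coordinates. As a rough sketch, the code below computes that distance and flags a pair as a hydrogen bond candidate when it lies within a cutoff. The 3.5 Å cutoff is an assumed, commonly used heuristic and not the criterion applied by LPC, the coordinates are invented so that the distance comes out at 2.7 Å, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in Angstroms."""
    return math.dist(a, b)  # available from Python 3.8


def is_hbond_candidate(ligand_atom, protein_atom, max_length=3.5):
    """Distance-only test; a real analysis would also consider atom types and angles."""
    return distance(ligand_atom, protein_atom) <= max_length


# Invented coordinates for a ligand oxygen and a tyrosine hydroxyl oxygen.
ligand_o = (0.0, 0.0, 0.0)
tyr_oh = (2.7, 0.0, 0.0)
print(round(distance(ligand_o, tyr_oh), 2), is_hbond_candidate(ligand_o, tyr_oh))
```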

The classification section (Class in the table above) is discussed later on.

This table lists each atomic contact between the protein and the ligand. It is similar to that for the hydrogen bond interactions on the previous slide. It differs in that it is not restricted to hydrogen bond interactions, but includes all the bond types.

It also indicates the unstable interactions in red. Drug designers will often try to optimise these unstable contacts in order to create drug molecules that reside within a ligand binding pocket with improved stability.

These are the reference tables included in the LPC section of a pdb file. They indicate the nature of the interactions forged between protein and ligand (listed under the Class section), and in the case of the table on the right, there is also information regarding which types of interactions will give rise to stable or unstable contacts between the protein and the ligand.

This data may be viewed graphically using specialised software such as VMD.
