CS790 – Bioinformatics: Protein Structure and Function

Post on 04-Jan-2016



Protein Structure and Function 1CS790 – Bioinformatics

Disulfide Bonds

Two cysteines in close proximity can form a covalent bond between their side-chain sulfur atoms.

This is called a disulfide bond, disulfide bridge, or dicysteine bond.

Disulfide bonds significantly stabilize tertiary structure.
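To make "close proximity" concrete, here is a minimal sketch that flags candidate disulfide bonds from the coordinates of cysteine SG (sulfur) atoms, such as those read from a PDB file. It assumes the coordinates have already been collected into a dictionary keyed by residue number. The 2.5 Å cutoff and the name `find_disulfides` are illustrative assumptions, not a standard; a typical S-S bond is about 2.05 Å long.

```python
import math
from itertools import combinations

# Assumed cutoff (not a standard value): S-S bonds are ~2.05 A long, so SG atoms
# closer than 2.5 A are treated as disulfide-bonded in this sketch.
SS_CUTOFF = 2.5


def find_disulfides(cys_sg, cutoff=SS_CUTOFF):
    """Return pairs of cysteine residue numbers whose SG atoms lie within cutoff (A).

    cys_sg maps a residue number to the (x, y, z) coordinates of its SG atom.
    """
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(cys_sg.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((res_a, res_b))
    return pairs


# Hypothetical coordinates for three cysteines: 3 and 40 are bonded, 77 is free.
cysteines = {3: (0.0, 0.0, 0.0), 40: (2.04, 0.0, 0.0), 77: (10.0, 0.0, 0.0)}
print(find_disulfides(cysteines))  # [(3, 40)]
```

Comparing every pair is quadratic in the number of cysteines, which is fine because proteins rarely contain more than a few dozen.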


Determining Protein Structure

There are O(100,000) distinct proteins in the human proteome.

3D structures had been determined for about 14,000 proteins, from all organisms.
• This count includes duplicates, such as the same protein with different ligands bound.

Most coordinates are determined by X-ray crystallography.


X-Ray Crystallography

[Figure: protein crystal, about 0.5 mm across]

• The crystal is a mosaic of millions of copies of the protein.

• As much as 70% is solvent (water)!

• May take months (and a “green” thumb) to grow.


X-Ray Diffraction

The diffraction image is averaged over:
• Space (the many copies of the protein in the crystal)
• Time (the duration of the diffraction experiment)


Electron Density Maps

Resolution depends on the quality and regularity of the crystal.

The R-factor is a measure of the "leftover": the relative disagreement between the observed diffraction data and the data calculated from the model (lower is better).

• Solvent fitting
• Refinement
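The R-factor has a standard closed form, summed over the measured reflections: $R = \frac{\sum \big| |F_{obs}| - |F_{calc}| \big|}{\sum |F_{obs}|}$. The sketch below computes it for matched lists of structure-factor amplitudes. It assumes both lists are already on the same scale (real refinement programs fit a scale factor first), and `r_factor` is a hypothetical helper name.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs and f_calc are matched sequences of structure-factor amplitudes,
    one entry per measured reflection, assumed to be on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Two made-up reflections: the model misses by 1 and by 2 out of a total of 30.
print(r_factor([10.0, 20.0], [9.0, 22.0]))  # 0.1
```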


The Protein Data Bank

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228

http://www.rcsb.org/pdb/
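PDB coordinate records use fixed columns rather than whitespace separation (x in columns 31-38, y in 39-46, z in 47-54, occupancy in 55-60, temperature factor in 61-66), so they are best read by slicing. Below is a minimal sketch of a parser for ATOM/HETATM lines, assuming the standard PDB column layout. The `Atom` class and function names are illustrative; libraries such as Biopython provide complete parsers.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line):
    """Parse one ATOM/HETATM record by its fixed columns.

    Python slices are 0-based and end-exclusive, so PDB columns 31-38 become line[30:38].
    """
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def parse_atoms(text):
    """Parse every ATOM/HETATM record in a block of PDB text, skipping other records."""
    return [parse_atom_line(line) for line in text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


# Two records copied from the excerpt on the Protein Data Bank slide.
SAMPLE = """\
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
"""

for atom in parse_atoms(SAMPLE):
    print(atom.serial, atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

Slicing by column matters because adjacent fields can run together with no space between them in some files (for example, large negative coordinates).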


Practical Assignment #1

• Get entry 2APR from the PDB. This is an aspartic protease structure.
• Download RasMol or RasWin and load 2APR.
• Render the molecule as sticks with CPK coloring and print the image.
• Render the molecule as either a ribbon or a cartoon image, showing secondary structure. Rotate the molecule to show at least one beta sheet and one alpha helix. Print this image and turn it in as well.


The Protein Folding Problem

Central question of molecular biology:

"Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?"

Input: AAVIKYGCAL…
Output: $(\phi_1, \psi_1), (\phi_2, \psi_2), \ldots$ = backbone conformation as dihedral angles (no side chains yet)
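Each backbone angle in the output is a torsion (dihedral) angle defined by four consecutive backbone atoms: $\phi_i$ uses C(i-1), N(i), CA(i), C(i), and $\psi_i$ uses N(i), CA(i), C(i), N(i+1). A minimal pure-Python sketch of the usual atan2-based dihedral calculation follows; the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi of Gly 2, using coordinates from the PDB excerpt: C(Ala 1), N, CA, C (Gly 2).
c_ala1 = (23.572, 46.251, 111.545)
n_gly2 = (23.656, 45.723, 110.336)
ca_gly2 = (24.216, 44.393, 110.087)
c_gly2 = (25.653, 44.308, 110.579)
phi_gly2 = dihedral(c_ala1, n_gly2, ca_gly2, c_gly2)
print(f"phi(Gly2) = {phi_gly2:.1f} degrees")
```

Using atan2 rather than acos keeps the sign of the angle, which distinguishes, for example, right-handed from left-handed helical regions.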


Protein Folding – Biological Perspective

Central dogma of protein folding (Anfinsen's dogma): sequence specifies structure.

Denature: to "unfold" a protein back to a random coil configuration.
• β-mercaptoethanol: breaks disulfide bonds
• Urea or guanidine hydrochloride: denaturants

Anfinsen's experiments:
• Denatured ribonuclease
• Spontaneously refolded into its enzymatically active form

This has since been verified for numerous proteins.


Folding Intermediates

Levinthal's paradox: consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations.
• If it takes $10^{-13}$ s to convert from one structure to another, an exhaustive search would take $1.6 \times 10^{27}$ years!

Folding must therefore proceed by progressive stabilization of intermediates.
• Molten globules: most secondary structure is formed, but they are much less compact than the "native" conformation.
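The arithmetic behind Levinthal's estimate can be checked directly. This sketch uses the slide's assumptions (3 states per residue, $10^{-13}$ s per conformational change) plus an assumed 365.25-day year.

```python
# Levinthal's estimate with the slide's assumptions.
N_RESIDUES = 100
STATES_PER_RESIDUE = 3
SECONDS_PER_STEP = 1e-13
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed: Julian year

conformations = STATES_PER_RESIDUE ** N_RESIDUES  # exact integer, 3**100
search_seconds = conformations * SECONDS_PER_STEP
search_years = search_seconds / SECONDS_PER_YEAR

print(f"{conformations:.2e} conformations")  # 5.15e+47
print(f"{search_years:.2e} years")           # 1.63e+27
```

For comparison, the universe is only about $10^{10}$ years old, which is why a blind search cannot be how proteins fold.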


Ideas on Protein Folding

Hydrophobic collapse is believed to be a key driving force for protein folding.
• Hydrophobic core!

Proteins are, in fact, only marginally stable.
• The native state is typically only 5 to 10 kcal/mol more stable than the unfolded form.

Many proteins help in folding:

• Protein disulfide isomerase – catalyzes shuffling of disulfide bonds

• Chaperones – break up aggregates and (in theory) unfold misfolded proteins
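To see what "marginally stable" means, the 5 to 10 kcal/mol stability can be converted to an equilibrium constant $K = [\text{native}]/[\text{unfolded}] = e^{\Delta G / RT}$. A minimal sketch, assuming a simple two-state model (native or unfolded only) at 298.15 K; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.15         # assumed temperature in K (25 degrees C)


def folding_equilibrium(delta_g_kcal, temperature=T):
    """Two-state model: return K = [native]/[unfolded] and the unfolded fraction.

    delta_g_kcal is how much more stable the native state is, as a positive number.
    """
    k = math.exp(delta_g_kcal / (R_KCAL * temperature))
    return k, 1.0 / (1.0 + k)


for dg in (5.0, 10.0):
    k, frac_unfolded = folding_equilibrium(dg)
    print(f"dG = {dg:4.1f} kcal/mol -> K = {k:.1e}, unfolded fraction = {frac_unfolded:.1e}")
```

At 5 kcal/mol, K comes out at roughly 4.6 × 10^3, so about one molecule in several thousand is unfolded at any moment. That is stable, but only by a margin comparable to a handful of hydrogen bonds.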


The Hydrophobic Core

Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.

The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin.

The resulting "sticky patch" causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell and do not carry oxygen efficiently.

Sickle cell anemia was the first identified molecular disease.
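The notation E6V means that glutamate (E) at position 6 is replaced by valine (V). The minimal sketch below applies such a point mutation and compares hydrophobicity on the Kyte-Doolittle hydropathy scale, which is one common scale chosen here as an assumption. The sequence is only the first ten residues of the mature human β-globin chain (position 1 is Val, after the initiator Met is removed). `apply_mutation` is a hypothetical helper.

```python
import re

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def apply_mutation(seq, mutation):
    """Apply a point mutation written as <wild type><position><mutant>, e.g. 'E6V'.

    Positions are 1-based, as in the usual mutation notation.
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mut + seq[pos:]


hba_beta = "VHLTPEEKSA"  # first 10 residues of the mature HbA beta chain
hbs_beta = apply_mutation(hba_beta, "E6V")
delta = KYTE_DOOLITTLE["V"] - KYTE_DOOLITTLE["E"]
print(hbs_beta)                                         # VHLTPVEKSA
print(f"hydropathy change at position 6: {delta:+.1f}")  # +7.7
```

The large positive jump reflects a charged surface residue being replaced by one of the most hydrophobic residues on the scale.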


Sickle Cell Anemia

Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.


Computational Protein Folding

Two key questions:

• Evaluation: how can we tell a correctly folded protein from an incorrectly folded one?
  H-bonds, electrostatics, hydrophobic exposure, etc.

• Optimization: once we have an evaluation function, can we optimize it?
  Simulated annealing, evolutionary computation (EC), etc.
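Simulated annealing accepts every move that lowers the energy and accepts an uphill move with probability $e^{-\Delta E / T}$. The temperature T is gradually lowered, so the search becomes greedier over time. The sketch below is a generic version. The one-dimensional toy energy stands in for a real fold evaluation function, and the geometric cooling schedule and parameter values are arbitrary assumptions.

```python
import math
import random


def simulated_annealing(energy, neighbor, start, t_start=10.0, t_end=0.01,
                        steps=5000, seed=0):
    """Minimize `energy` using Metropolis acceptance and geometric cooling.

    Returns the best state seen and its energy.
    """
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    cooling = (t_end / t_start) ** (1.0 / steps)  # T shrinks from t_start to t_end
    t = t_start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        cand_e = energy(candidate)
        delta = cand_e - current_e
        # Always accept downhill moves; accept uphill moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = current, current_e
        t *= cooling
    return best, best_e


def toy_energy(x):
    """Toy stand-in for a fold score: lowest at x = 7."""
    return (x - 7) ** 2


def step(x, rng):
    """Move one unit left or right."""
    return x + rng.choice((-1, 1))


print(simulated_annealing(toy_energy, step, start=-40))  # (7, 0)
```

For protein folding, the state would be a conformation (for example, a list of backbone angles or lattice moves), `neighbor` would perturb it slightly, and `energy` would be the evaluation function.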


Evaluation of Protein Folds

Empirical potential functions
• Residue-based: spatial relationships among residues
• Stereochemistry-based: molecular interactions (covalent, electrostatic, etc.) with coefficients

Ab initio potential functions

Procheck, etc.

Full molecular dynamics
• Very computationally expensive


Threading: Fold Recognition

Given:

• A sequence: IVACIVSTEYDVMKAAR…

• A database of molecular coordinates (known folds)

Map the sequence onto each fold, then evaluate each mapping.
• Objective 1: improve the scoring function
• Objective 2: folding


Fold Optimization

Simple lattice models (HP models):
• Two types of residues: hydrophobic and polar
• 2-D or 3-D lattice
• The only force is hydrophobic collapse
• Score = number of HH contacts
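The HP-model score is easy to compute once a conformation is given as lattice coordinates. The minimal sketch below works on the 2-D square lattice. It first checks that the chain is connected and self-avoiding, then counts pairs of H residues that sit on adjacent sites but are not consecutive in the chain. Many treatments report the energy as the negative of this count. The function name is hypothetical.

```python
def hh_contacts(sequence, coords):
    """Count topological H-H contacts for a chain on a 2-D square lattice.

    sequence: string of 'H'/'P'; coords: list of (x, y) lattice points, one per residue.
    A contact is two H residues on adjacent sites that are not chain neighbours.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    position = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = position.get(nb)
            # j > i + 1 skips chain neighbours and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


# A 4-residue chain folded into a square: the two end H residues touch.
print(hh_contacts("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1
```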


Learning from Lattice Models

The "hydrophobic zipper" effect (Ken Dill, ~1997)


Secondary Structure Prediction

Easier than folding.
• Current algorithms can predict secondary structure with 70-80% accuracy.

Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
• Based on the frequencies of occurrence of residues in helices and sheets

PhD: neural-network based
• Uses a multiple sequence alignment
• Rost & Sander, Proteins, 1994, 19, 55-72


Secondary Structure Prediction

Multiple sequence alignment:
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

Predicted secondary structure for the first sequence (H/h: helix, E/e: strand, -: other):
AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…


A Peek at Protein Function

Serine proteases: cleave other proteins
• Catalytic triad: Asp, His, Ser


Three Serine Proteases

Chymotrypsin: cleaves the peptide bond on the carboxyl side of aromatic (ring) residues (Trp, Phe, Tyr) and of large hydrophobic residues (Met).

Trypsin: cleaves after Lys (K) or Arg (R)
• Both are positively charged

Elastase: cleaves after small residues (Gly, Ala, Ser, Cys)
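These specificities translate directly into an in-silico digest. The minimal sketch below cuts after every residue listed on the slide for each enzyme. It ignores known exceptions; for instance, trypsin cleavage is commonly reported to be blocked when the next residue is Pro. The dictionary and function names are hypothetical.

```python
# Cleavage rules as stated on the slide: cut on the carboxyl side of these residues.
# Simplification: exceptions (e.g. the proline rule for trypsin) are ignored.
CLEAVE_AFTER = {
    "chymotrypsin": set("WFYM"),
    "trypsin": set("KR"),
    "elastase": set("GASC"),
}


def digest(seq, enzyme):
    """Split seq after every residue that the enzyme recognizes."""
    sites = CLEAVE_AFTER[enzyme]
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in sites:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # trailing fragment with no cleavage site
    return fragments


print(digest("AGKLRWT", "trypsin"))       # ['AGK', 'LR', 'WT']
print(digest("AGKLRWT", "chymotrypsin"))  # ['AGKLRW', 'T']
```

Predicted fragment lists like these are the basis for comparing computed and measured digest patterns.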


Specificity Binding Pocket


Onward

Apo-proteins and prosthetic groups

Lab techniques for proteins
• Gels
• Xtal
• Digests

Some computational areas of interest
• Folding
• Docking, screening