Protein structure prediction.

Page 1:

Protein structure prediction.

Page 2:

Protein domains can be defined based on:

• Geometry: a group of residues with high contact density; the number of contacts within a domain is higher than the number of contacts between domains.

- chain-continuous domains

- chain-discontinuous domains

• Kinetics: domain as an independently folding unit.

• Physics: domain as a rigid body linked to other domains by flexible linkers.

• Genetics: the minimal fragment of a gene that is capable of performing a specific function.
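The geometric definition above can be illustrated with a minimal sketch that counts contacts within and between domains. The 8 Å cutoff, the use of one point per residue, and the toy coordinates are assumptions made for illustration, not part of the slide.

```python
import math
from itertools import combinations

def count_contacts(coords, domain_of, cutoff=8.0):
    """Count residue-residue contacts within and between domains.

    coords    -- one (x, y, z) point per residue (e.g. C-alpha atoms)
    domain_of -- one domain label per residue
    cutoff    -- contact distance in angstroms (assumed value)
    """
    within = between = 0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:                      # skip neighbours along the chain
            continue
        if math.dist(coords[i], coords[j]) < cutoff:
            if domain_of[i] == domain_of[j]:
                within += 1
            else:
                between += 1
    return within, between

# Toy example: two compact clusters of four residues, 30 angstroms apart.
coords = [(0, 0, 0), (4, 0, 0), (4, 4, 0), (0, 4, 0),
          (30, 0, 0), (34, 0, 0), (34, 4, 0), (30, 4, 0)]
domains = ["A"] * 4 + ["B"] * 4
within, between = count_contacts(coords, domains)
print(within, between)
```

A domain assignment is geometrically plausible when the first number is much larger than the second.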

Page 3:

Domains as recurrent units of proteins.

• The same or similar domains are found in different proteins.

• Each domain has a well-defined compact structure and performs a specific function.

• Proteins evolve through domain duplication and domain shuffling.

• Protein domains are classified by comparing their recurrent sequence, structural and functional features (e.g. the Conserved Domain Database).

Page 4:

Protein folds.

• Fold definition: two folds are similar if they have a similar arrangement of secondary structure elements, SSEs (architecture), and the same connectivity (topology). Sometimes a few SSEs may be missing.

• Fold classification: structural similarity between folds is detected using structure-structure comparison algorithms.

Page 5:

Definition of protein folds.

Protein fold: the arrangement of secondary structures into a unique topology/tertiary structure.

Examples of alpha/beta proteins:

• TIM beta/alpha-barrel: contains a parallel beta-sheet barrel, closed; n=8, S=8; strand order 12345678; surrounded by alpha-helices.

• NAD(P)-binding Rossmann-fold domains: core of 3 layers, a/b/a; parallel beta-sheet of 6 strands, order 321456.

Page 6:

Fold recognition.

Unsolved problem: direct prediction of protein structure from physico-chemical principles.

Solved problem: recognizing which of the known folds are similar to the fold of an unknown protein.

Fold recognition is based on the following observations/assumptions:

- The overall number of different protein folds is limited (1000-3000 folds).

- The native protein structure is in its ground state (minimum energy).

Page 7:

Protein structure prediction flowchart (from D. W. Mount).

[Flowchart figure; boxes in reading order:]

- Protein sequence

- Database similarity search

- Does the sequence align with a protein of known structure? (Yes/No)

- Protein family analysis

- Relationship to known structure? (Yes/No)

- Three-dimensional comparative modeling

- Predicted three-dimensional structural model

- Structural analysis

- Is there a predicted structure? (Yes/No)

- Three-dimensional structural analysis in laboratory

Page 8:

Protein structure prediction.

Prediction of a protein's three-dimensional structure from its sequence. Different approaches:

- Homology modeling (predicted structure has a very close homolog in the structure database).

- Fold recognition (predicted structure has an existing fold).

- Ab initio prediction (predicted structure has a new fold).

Page 9:

Homology modeling.

Aims to produce protein models with accuracy close to experimental and is used for:

- Protein structure prediction

- Drug design

- Prediction of functionally important sites (active or binding sites)

Page 10:

Steps of homology modeling.

1. Template recognition & initial alignment.

2. Backbone generation.

3. Loop modeling.

4. Side-chain modeling.

5. Model optimization.

Page 11:

1. Template recognition.

Recognition of similarity between the target and template.

Target – protein with unknown structure.

Template – protein with known structure.

Main difficulty: deciding which template to pick when multiple template structures are available.

Template structure can be found by searching for structures in PDB using sequence-sequence alignment methods.

Page 12:

Two zones of sequence alignment.

Two sequences are practically guaranteed to fold into the same structure if their alignment length and sequence identity fall into the “safe” zone.

[Figure: sequence identity (%, axis 0-100) plotted against alignment length (residues, axis 0-200). The region above the threshold curve is the homology modeling zone; the region below it is the twilight zone.]
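A hedged sketch of how the two zones could be checked numerically. The threshold curve used here is the length-dependent identity cutoff commonly attributed to Sander and Schneider; its constants (290.15, -0.562, 24.8%) are not given on the slide and are an assumption that should be verified against the original publication before use. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of aligned columns) for two
    aligned sequences of equal length; columns with a gap are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)

def identity_threshold(length):
    """Assumed threshold curve (percent identity) for alignments of
    10 or more residues; constants are an assumption, see lead-in."""
    if length <= 80:
        return 290.15 * length ** -0.562
    return 24.8

def in_safe_zone(identity, length):
    """True if the alignment falls into the homology modeling zone."""
    return length >= 10 and identity >= identity_threshold(length)

ident, n = percent_identity("ACDEFGHIKL", "ACDQFGH-KL")
print(round(ident, 1), n, in_safe_zone(ident, n))
```

Short alignments need much higher identity than long ones, which is why both axes appear in the figure.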

Page 13:

2. Backbone generation.

Once the alignment between target and template is ready, copy the backbone coordinates of those template residues that are aligned.

If two aligned residues are the same, copy their side chain coordinates as well.
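The copying rule can be sketched as follows. The data layout (one dictionary of atom coordinates per template residue), the function names and the toy coordinates are all hypothetical, chosen only to make the example self-contained.

```python
BACKBONE = ("N", "CA", "C", "O")

def copy_backbone(target_aln, template_aln, template_atoms):
    """Build a partial model by copying template coordinates.

    target_aln, template_aln -- aligned sequences, '-' marks a gap
    template_atoms -- one dict {atom_name: (x, y, z)} per template residue
    Returns one entry per target residue: a dict of copied atoms, or None
    where the target residue has no aligned template residue.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model = []
    t_idx = 0                                  # next template residue
    for tgt, tpl in zip(target_aln, template_aln):
        atoms = None
        if tpl != "-":
            atoms = template_atoms[t_idx]
            t_idx += 1
        if tgt == "-":
            continue                           # template residue absent in target
        if atoms is None:
            model.append(None)                 # left for loop modeling
        elif tgt == tpl:
            model.append(dict(atoms))          # identical: side chain copied too
        else:
            model.append({n: atoms[n] for n in BACKBONE if n in atoms})
    return model

def toy_residue(x):
    """Hypothetical coordinates for one residue, for illustration only."""
    return {"N": (x, 0.0, 0.0), "CA": (x + 1.5, 0.0, 0.0),
            "C": (x + 2.5, 1.0, 0.0), "O": (x + 2.5, 2.2, 0.0),
            "CB": (x + 1.5, -1.5, 0.0)}

template_atoms = [toy_residue(3.8 * k) for k in range(4)]   # template: A S T L
model = copy_backbone("A-TKV", "AST-L", template_atoms)
print([sorted(r) if r else None for r in model])
```

Entries that come back as None are the residues that the later loop modeling step has to build.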

Page 14:

3. Insertions and deletions.

AHYATPTTT
AH---TPSS

[Figure labels: insertion, deletion]

Insertions and deletions occur mostly between secondary structures, in the loop regions. Loop conformations are difficult to predict.

Approaches to loop modeling:

- Knowledge-based: searches the PDB for loops with known structure.

- Energy-based: an energy function is used to evaluate the quality of a loop; energy minimization or Monte Carlo.
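Before any loop can be modeled, the gapped regions of the alignment have to be located. A minimal sketch using the two aligned strings from this slide; which of the two is the target is not stated in the transcript, so the names `seq_a` and `seq_b` are neutral placeholders.

```python
import re

def gap_runs(aligned):
    """Return (start, end) column ranges (0-based, end exclusive)
    of every run of gap characters in an aligned sequence."""
    return [m.span() for m in re.finditer(r"-+", aligned)]

seq_a = "AHYATPTTT"
seq_b = "AH---TPSS"

for start, end in gap_runs(seq_b):
    # Residues of seq_a in these columns have no partner in seq_b.
    print(start, end, seq_a[start:end])
```

If `seq_a` were the target, the three unpaired residues would have no template coordinates and would be handed to a knowledge-based or energy-based loop builder.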

Page 15:

4. Side chain modeling.

Side chain conformations are called rotamers. In similar proteins, side chains have similar conformations.

If % identity is high, side chain conformations can be copied from template to target.

If % identity is not very high, side chains are modeled using rotamer libraries, and the candidate rotamers are scored with energy functions.

Problem: side chain conformations depend on the backbone conformation, which is predicted, not real.

[Figure: three candidate rotamers with energies E1, E2, E3; the selected rotamer has E = min(E1, E2, E3).]
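The selection rule E = min(E1, E2, E3) amounts to scoring every library rotamer and keeping the lowest-energy one. In this sketch the rotamers are represented only by a chi1 angle and the energies come from a made-up lookup table; a real energy function would depend on the surrounding atoms.

```python
def best_rotamer(rotamers, energy):
    """Return (rotamer, energy) for the lowest-energy candidate."""
    scored = [(energy(r), r) for r in rotamers]
    e_min, r_min = min(scored, key=lambda pair: pair[0])
    return r_min, e_min

# Hypothetical chi1 angles (degrees) and hypothetical energies.
toy_energies = {-60: 1.2, 60: 3.5, 180: 0.4}
rotamer, e = best_rotamer(toy_energies.keys(), toy_energies.get)
print(rotamer, e)
```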

Page 16:

5. Model optimization.

Energy optimization of the entire structure.

Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used:

[Figure: cycle between "Predict rotamers" and "Shift in backbone".]

Page 17:

Classwork I: Homology modeling.

- Go to NCBI Entrez, search for gi461699

- Run a BLAST search against PDB

- Repeat the same for gi60494508

- Compare the results

Page 18:

Fold recognition.

Goal: to find the protein with known structure that best matches a given sequence.

Since the similarity between the target and its closest template is not high, sequence-sequence alignment methods fail.

Solution: threading, a sequence-structure alignment method.

Page 19:

Threading: a method for structure prediction.

Sequence-structure alignment: the target sequence is compared to all structural templates from the database.

Requires:

- Alignment method (dynamic programming, Monte Carlo, …)

- Scoring function, which yields a relative score for each alternative alignment

Page 20:

Scoring function for threading.

• A contact-based scoring function depends on the amino acid types of two residues and the distance between them.

• A sequence-sequence alignment scoring function does not depend on the distance between two residues.

• If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
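The 8 Å contact rule can be turned into a contact list as sketched below. Using one point per residue (for example the C-alpha atom) and treating "non-adjacent" as a sequence separation of at least 2 are assumptions; the coordinates are a toy hairpin, not a real structure.

```python
import math

def contact_list(coords, cutoff=8.0, min_separation=2):
    """Return 1-based residue pairs (i, j), i < j, that are non-adjacent
    in sequence and closer than `cutoff` angstroms."""
    contacts = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i + 1, j + 1))
    return contacts

# Toy coordinates (angstroms) for a six-residue hairpin.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (11.4, 5.0, 0.0), (7.6, 5.0, 0.0)]
print(contact_list(coords))
```

The resulting pairs are exactly what a contact matrix marks with a star.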

Page 21:

Scoring function for threading.

$$S = \sum_{i,j}^{N} w(a_i, a_j); \qquad \text{example: } S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$$

[Figure: template structure with two contacts, Ala-Tyr and Ile-Trp.]

w is calculated from the frequency of amino acid contacts in PDB; a_i is the amino acid type of the target sequence aligned with position i of the template; N is the number of contacts.

Page 22:

Classwork I: calculate the score for target sequence “ATPIIGGLPY” aligned to the template structure which is defined by the contact matrix.

1 2 3 4 5 6 7 8 9 10

1 * * *

2

3 *

4 *

5 * *

6 *

7 *

8 *

9

10 * *

Values of w for pairs of amino acid types (upper triangle):

      A     T     P     Y     I     G     L
A  -0.2  -0.1   0.0  -0.1   0.5  -0.2   0.2
T         0.3  -0.1  -0.2  -0.3   0.1   0.0
P              -0.2  -0.4  -0.1   0.1  -0.2
Y                    -0.4  -0.2  -0.1  -0.2
I                           0.3   0.2   0.4
G                                 0.4   0.2
L                                       0.3
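A sketch of how such a score can be computed from the pair table above. The w values are copied from the table, read as an upper triangle of a symmetric matrix. The contact list is hypothetical: the column positions of the stars in the contact matrix are not recoverable from this transcript, so the pairs below only illustrate the calculation and are not the classwork answer.

```python
ORDER = "ATPYIGL"
UPPER = [
    [-0.2, -0.1, 0.0, -0.1, 0.5, -0.2, 0.2],   # A with A T P Y I G L
    [0.3, -0.1, -0.2, -0.3, 0.1, 0.0],         # T with T P Y I G L
    [-0.2, -0.4, -0.1, 0.1, -0.2],             # P with P Y I G L
    [-0.4, -0.2, -0.1, -0.2],                  # Y with Y I G L
    [0.3, 0.2, 0.4],                           # I with I G L
    [0.4, 0.2],                                # G with G L
    [0.3],                                     # L with L
]

# Expand the upper triangle into a symmetric lookup table.
W = {}
for r, row in enumerate(UPPER):
    for k, value in enumerate(row):
        a, b = ORDER[r], ORDER[r + k]
        W[a, b] = W[b, a] = value

def threading_score(sequence, contacts, w=W):
    """Sum w(a_i, a_j) over all contacts (i, j), positions are 1-based."""
    return sum(w[sequence[i - 1], sequence[j - 1]] for i, j in contacts)

# Hypothetical contacts, NOT taken from the slide's contact matrix.
contacts = [(1, 5), (1, 8), (1, 10), (3, 7), (4, 10), (5, 6)]
score = threading_score("ATPIIGGLPY", contacts)
print(round(score, 2))
```

Each contact is listed once with i < j, so the symmetric matrix entry for (j, i) is not counted a second time.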

Page 23:

Alignment algorithms.

• Dynamic programming.

“Frozen approximation”: the score of a contact depends on which target residues are aligned to both contacting positions, so standard traceback in the alignment matrix is not possible for interactions between two amino acids. The contact partner is therefore taken from the template, so that:

$$S = \sum_{i,j}^{N} w(a_i, b_j)$$

b is the amino acid type from the template, not from the target; now the score of every position does not depend on the alignment elsewhere in the sequence.

• Monte Carlo
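Under the frozen approximation, the score of placing a residue type at a template position can be precomputed, which yields a position-specific score table that ordinary dynamic programming can use. The sketch below assumes a toy potential `w` (favourable only for hydrophobic pairs), a made-up template sequence and made-up contacts; none of these come from the slides.

```python
HYDROPHOBIC = set("AVILMFWY")

def w(a, b):
    """Hypothetical toy contact potential, for illustration only."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def frozen_position_scores(template_seq, contacts, alphabet):
    """scores[i][a]: score of residue type a at template position i
    (0-based), with contact partners frozen to template residues b_j."""
    partners = {i: [] for i in range(len(template_seq))}
    for i, j in contacts:                      # 1-based contact pairs
        partners[i - 1].append(j - 1)
        partners[j - 1].append(i - 1)
    return [
        {a: sum(w(a, template_seq[j]) for j in partners[i]) for a in alphabet}
        for i in range(len(template_seq))
    ]

scores = frozen_position_scores("AVGLK", [(1, 4), (2, 5)], "IK")
print(scores[0], scores[1])
```

Because each entry depends only on the position and the residue type, the table plays the same role as a substitution profile in sequence-sequence alignment.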

Page 24:

Optimize the sum of residue-residue contact potentials by a Monte Carlo alignment algorithm.

Page 25:

CASP prediction competitions.

Page 26:

Threading model validation.

• Correct bond lengths and bond angles

• Correct placement of functionally important sites

• Prediction of global topology, not partial alignment (minimum number of gaps)

[Figure label: distance >> 3.8 Å]
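The 3.8 Å label presumably refers to the spacing of consecutive C-alpha atoms, which is about 3.8 Å in a connected chain; a much larger distance signals an unbridged gap in the model. A minimal check under that assumption, with a hypothetical tolerance and toy coordinates:

```python
import math

def chain_breaks(ca_coords, ideal=3.8, tolerance=0.5):
    """Return 1-based indices i where the distance between C-alpha i and
    C-alpha i+1 exceeds ideal + tolerance (both in angstroms)."""
    return [i + 1 for i in range(len(ca_coords) - 1)
            if math.dist(ca_coords[i], ca_coords[i + 1]) > ideal + tolerance]

# Toy model: residues 3 and 4 are 10 angstroms apart.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
      (17.6, 0.0, 0.0), (21.4, 0.0, 0.0)]
print(chain_breaks(ca))
```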

Page 27:

Placement of functionally important sites in threading.

Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase.

Page 28:

Classwork II: Homology modeling.

- Go to NCBI Entrez, search for gi461699

- Run a BLAST search against PDB

- Repeat the same for gi60494508

- Predict functionally important sites

Page 29:

GenThreader (http://bioinf.cs.ucl.ac.uk/psipred).

1. Predicts secondary structures for target sequence.

2. Makes sequence profiles (PSSMs) for each template sequence.

3. Uses threading scoring function to find the best matching profile.

Page 30:

Classwork III.

- Go to http://bioinf.cs.ucl.ac.uk/psipred

- Go over the options of the protein structure prediction program

- Predict structure for protein sequence (“gwu_thread_seq.txt”)

http://bioinf2.cs.ucl.ac.uk/psiout/29594540ad0cf784.gen.html

Page 31:

Protein engineering and protein design.

Protein engineering: altering a protein sequence to change protein function or structure.

Protein design: designing a de novo protein that satisfies a given requirement.

Page 32:

Protein engineering strategies.

Goals:

• Design proteins with a certain function

• Increase activity of enzymes

• Increase binding affinity and specificity of proteins

• Increase protein stability

• Design proteins which bind novel ligands

Page 33:

Protein engineering uses combinatorial libraries.

• Random mutagenesis introduces different mutations into many copies of the gene of interest.

• Active proteins are separated from inactive ones:

- in vivo (measuring the effect on the whole cell)

- in vitro (phage display: the gene is inserted into phage DNA, expressed, and selected if its product binds an immobilized target protein)

Page 34:

Specificity of Kunitz inhibitors can be optimized by protein engineering.

• Kunitz domains are specific inhibitors of trypsin-like proteinases; their structure is highly conserved despite only 33% sequence identity.

• Each Kunitz domain recognizes one or more proteinases through the binding loop (yellow in the figure).

• Phage display method found mutants of Kunitz inhibitors which have higher specificity than native ones.

• Modeling of mutant proteins showed that enhanced specificity is caused by increased complementarity between binding loop and the active site.

Page 35:

Native state can be stabilized by reducing the difference in entropy between folded and unfolded conformations.

[Figure: free energy G along the reaction coordinate, with the unfolded (U) and folded (F) states separated by ΔG.]

$$\Delta G = \Delta H - T\,\Delta S$$

Page 36:

Model system: lysozyme from bacteriophage T4.

• Lysozyme has the ability to lyse certain bacteria by hydrolyzing the β-linkage between N-acetylmuramic acid (NAM) and N-acetylglucosamine (NAG) of the peptidoglycan layer in the bacterial cell wall.

• The conformational transition in lysozyme involves the cooperative movement of its two lobes relative to each other.

Page 37:

Disulfide bridges increase protein stability.

• Stability is increased by reducing the number of unfolded conformations (since the enthalpic contribution will be the same for folded and unfolded states).

• Task: to find positions on the backbone where cysteines can be introduced to form disulfide bonds.
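The entropic argument can be written out as a short derivation. It assumes, as an idealization, that the cross-link changes neither the enthalpy of folding nor the entropy of the folded state; the symbols Ω (number of unfolded conformations), "wt" (wild type) and "SS" (cross-linked mutant) are introduced here for illustration.

```latex
\begin{align*}
\Delta G &= G_F - G_U = \Delta H - T\,\Delta S, \qquad \Delta S = S_F - S_U \\
S_U &= R \ln \Omega_U \\
\Delta\Delta G &= \Delta G^{\mathrm{SS}} - \Delta G^{\mathrm{wt}}
  = T\left(S_U^{\mathrm{SS}} - S_U^{\mathrm{wt}}\right)
  = RT \ln \frac{\Omega_U^{\mathrm{SS}}}{\Omega_U^{\mathrm{wt}}} < 0
  \quad \text{if } \Omega_U^{\mathrm{SS}} < \Omega_U^{\mathrm{wt}}
\end{align*}
```

With ΔG defined for folding (U to F), a more negative value means a more stable native state, so restricting the unfolded ensemble stabilizes the protein.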

Page 38:

Strategy of introducing a new disulfide bond.

B. Matthews, 1989:

• Analysis of disulfide bond geometries in existing structures.

• Analysis of all pairs of amino acids which are close in space.

• Energy optimization of candidate disulfide bonds.

• Analysis of the destabilizing effect of replacing native amino acids with Cys.

As a result, three disulfide bonds were introduced into lysozyme through mutagenesis experiments.
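The second step, finding residue pairs that are close in space, can be sketched as a simple distance filter. The C-beta distance cutoff of 4.5 Å, the minimum sequence separation and the coordinates are assumptions for illustration; the slide does not give the criteria actually used, and geometry and energy checks would still follow.

```python
import math
from itertools import combinations

def disulfide_candidates(cb_coords, max_cb_dist=4.5, min_separation=3):
    """Return (i, j, distance) for residue pairs whose C-beta atoms are
    within max_cb_dist angstroms and at least min_separation apart
    in sequence. cb_coords maps residue number -> (x, y, z)."""
    hits = []
    for (i, pi), (j, pj) in combinations(sorted(cb_coords.items()), 2):
        d = math.dist(pi, pj)
        if j - i >= min_separation and d <= max_cb_dist:
            hits.append((i, j, round(d, 2)))
    return hits

# Hypothetical C-beta coordinates (angstroms), not from a real structure.
cb = {5: (0.0, 0.0, 0.0), 12: (10.0, 0.0, 0.0),
      40: (3.0, 2.0, 1.0), 88: (11.0, 3.0, 2.0)}
hits = disulfide_candidates(cb)
print(hits)
```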

Page 39:

Stability of mutants compared to wild-type protein.

Measure of stability: the melting temperature Tm, at which 50% of the enzyme is inactivated during reversible heat denaturation. For the wild type, Tm = 42 °C.

• All mutants were more stable than the wild type.

• The longer the loop between the Cys residues, the larger the effect (the more restricted the unfolded state).

• The more disulfide bonds were introduced, the more stable the mutant.

From B. Matthews et al.

Page 40:

Attempts to fill cavities to stabilize lysozyme failed…

• Introducing a cavity the size of a –CH3 group destabilizes the protein by ~1 kcal/mol.

• T4 lysozyme has two cavities; the mutations Leu → Phe and Ala → Val destabilize the protein by ~0.5-1.0 kcal/mol.

• The new side chains (Val and Phe) adopt unfavorable conformations in the cavities.

Page 41:

Classwork IV: analyzing lysozyme mutants.

• Retrieve structure neighbors (1PQM and 1KNI) of 2LZM.

• Which mutant might have an increased stability and why?

Page 42:

Can structural scaffolds be reduced in size while maintaining function?

A. Braisted & J.A. Wells used the Z-domain (58 residues) of bacterial protein A:

• removed the third helix (truncated protein: 38 residues);

• mutated residues in the first and second helices;

• used phage display to select active forms;

• restored the binding of truncated protein.

Page 43:

Designing an amino acid sequence that will fold into a given structure.

• Inverse protein folding problem: designing a sequence which will fold into a given structure; much easier than the folding problem!

• B. Dahiyat & S. Mayo: designed a sequence of zinc finger domain that does not require stabilization by Zn.

• Wild type protein domain is stabilized by Zn (bound to two Cys and two His); mutant is stabilized by hydrophobic interactions.

Page 44:

Paracelsus challenge: convert one fold into another by changing no more than 50% of residues.

• This is a challenge because all proteins with > 30% identity seem to have the same fold.

• L. Regan et al.: Protein G (mainly beta-sheet) was converted to Rop protein (alpha-helical) by changing only 50% of its residues.