Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.


Page 1: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Lesson 7: Protein Structure Prediction

GHIKLSYTVNEQNLKPERFFYTSAVAIL

Page 2: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Outline:

• Motivation

• Structure prediction approaches

– Ab-initio

– Threading

– Homology modeling

• Hands ON

Page 3: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Protein 3D Structures

A protein’s structure has a critical effect on its function:

1. Binding pockets

PDB ID 1nw7

Page 4: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Protein 3D Structures

A protein’s structure has a critical effect on its function:

2. Areas of specific chemical/electrical properties

Page 5: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Protein 3D Structures

A protein’s structure has a critical effect on its function:

3. Importance of the global fold for function

Page 6: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Motivation to Acquire a Structure

• Identifying active and binding sites

• Characterization of the protein’s mechanism (catalysis & interactions)

• Searching for ligand of a given binding site

• Understanding the molecular basis of diseases

• Designing mutants

• Drug design

• And more...

Page 7: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Protein Structure Prediction

Why predict protein structure if we can use experimental tools to determine it?

• Experimental methods are slow and expensive

• Some structures could not be solved experimentally

• A representative family structure can suffice to deduce the structures of all sequences in the family

Page 8: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Structure Prediction Approaches

1. Homology (Comparative) Modeling

Based on sequence similarity with a protein for which a structure has been solved.

2. Threading (Fold Recognition)

Requires a structure similar to a known structure

3. Ab-initio fold prediction

Not based on similarity to a sequence/structure

Page 9: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Ab-initio

Structure prediction from “first principles”:

Given only the sequence, try to predict the structure based on physico-chemical properties (energy, hydrophobicity, etc.)

• When all else fails: works for novel folds

• Shows that we understand the process

Page 10: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

The Force Field (energy function)

A group of mathematical expressions describing the potential energy of a molecular system

• Each expression describes a different type of physico-chemical interaction between atoms in the system:

• Van der Waals forces

• Covalent bonds

• Hydrogen bonds

• Charges

• Hydrophobic effects

Non-bonded terms
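To make the idea of "a group of mathematical expressions" concrete, here is a minimal sketch of two non-bonded terms, a Lennard-Jones expression for van der Waals forces and a Coulomb expression for charges, summed over all atom pairs. The functional forms are the textbook ones; the parameter values (`epsilon`, `sigma`) and the function names are illustrative assumptions, not taken from any specific force field or from the lesson.

```python
import math

# Conventional Coulomb constant in kcal*angstrom/(mol*e^2); assumed unit system.
COULOMB_K = 332.0636


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, k=COULOMB_K):
    """Electrostatic term for two point charges (in e) at distance r (angstroms)."""
    return k * q1 * q2 / r


def nonbonded_energy(atoms, epsilon=0.1, sigma=3.4):
    """Sum both terms over all atom pairs.

    atoms: list of (x, y, z, charge) tuples.
    epsilon, sigma: hypothetical values shared by every pair, for illustration only.
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            xi, yi, zi, qi = atoms[i]
            xj, yj, zj, qj = atoms[j]
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            total += lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj)
    return total
```

A real force field would add bonded terms (bonds, angles, torsions) and use per-atom-type parameters; this sketch only shows how separate expressions add up to one potential energy.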

Page 11: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

• Current methods (e.g. Rosetta) primarily utilize the fact that although we are far from observing all protein folds, we probably have seen nearly all sub-structures:

Ab-initio

Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)

Local sequence-structure relationships:

• A library of known sub-structures (fragments less than 10 residues) is created.

• A range of possible conformations for each fragment in the query protein is selected.

Page 12: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Ab-initio

Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)

Non-local sequence-structure relationships:

• The primary nonlocal interactions considered include hydrophobic burial, electrostatics and main-chain hydrogen bonding.

Structures that are consistent with both the local and non-local interactions are generated by minimizing the non-local interaction energy in the space defined by the local structure distributions.

Page 13: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Ab-initio - Example

Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)

Page 14: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Fold Recognition (Threading)

Given a sequence and a library of folds, thread the sequence through each fold. Take the one with the highest score.

• The method will fail if the new protein does not belong to any fold in the library.

• The threading score is computed from known physico-chemical properties and statistics of amino acids.

Page 15: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Input:

1. Sequence (residue legend: H-bond donor, H-bond acceptor, glycine, hydrophobic)

2. Library of folds of known proteins

Threading: example

Page 16: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Scores of the three example folds: S = 20 (Z = 5), S = 5 (Z = 1.5), S = -2 (Z = -1)

Residue legend: H-bond donor, H-bond acceptor, glycine, hydrophobic

Threading: example
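The Z values on this slide express how far each fold's raw score S lies from the average score over the fold library. A minimal sketch of that normalization follows. The three-fold "library" and the fold names are hypothetical, so the resulting Z values will not reproduce the slide's numbers, which were presumably computed against a much larger library.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw threading scores to Z-scores: (S - mean) / standard deviation."""
    values = list(raw_scores.values())
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation over the library
    return {fold: (s - mu) / sd for fold, s in raw_scores.items()}


# Hypothetical library of three folds with the raw scores shown on the slide.
scores = {"fold_1": 20.0, "fold_2": 5.0, "fold_3": -2.0}
z = z_scores(scores)

# "Take the one with the highest score."
best_fold = max(z, key=z.get)
```

Ranking by Z rather than by S makes scores comparable between queries of different lengths and compositions.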

Page 17: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

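The three ingredients of a threading score:

• structural template

• neighbor definition

• energy function

Pairwise energy matrix $E_{ab}$ (a, b: amino acid types):

      A    C    D    E   ...
A    -3   -1    0    0   ...
C    -1   -4    1    2   ...
D     0    1    5    6   ...
E     0    2    6    7   ...
...

Sequence threaded onto template positions 1 to 10: A C C E C A D A A C

Total energy, summed over neighboring positions i, j: $E = \sum_{i,j} E_{a_i b_j}$, where $a_i$ and $b_j$ are the residue types at positions i and j.

ACCECADAAC: -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23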
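The sum on this slide can be sketched in a few lines: look up the pair energy for every pair of neighboring template positions and add them up. The energy values are the ones in the slide's matrix. The contact list is a hypothetical neighbor definition, chosen only so that the eight terms match the slide's sum; the actual neighbors come from the template figure, which is not recoverable here.

```python
# Pair energies from the slide's matrix (symmetric, so each pair is stored once).
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    key = (a, b) if (a, b) in PAIR_ENERGY else (b, a)
    return PAIR_ENERGY[key]


def threading_energy(sequence, contacts):
    """Sum pair energies over neighboring template positions (1-based indices)."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)


# Hypothetical neighbor definition for the 10-position template.
CONTACTS = [(1, 6), (1, 2), (2, 3), (3, 5), (5, 6), (5, 10), (8, 9), (6, 8)]

energy = threading_energy("ACCECADAAC", CONTACTS)
```

Threading a different sequence through the same template only changes the first argument; the template and its neighbor definition stay fixed.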

Threading: example

Page 18: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Find best fold for a protein sequence: Fold recognition (threading)

Query sequence: MAHFPGFGQSLLFGYPVYVFGD...

Potential folds 1) ... 56) ... n), with scores -10 ... -123 ... 20.5

Page 19: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Fold recognition: FFAS03

• The FFAS03 server provides an interface to the third generation of the profile-profile alignment and fold recognition algorithm FFAS.

• Profile-profile alignments utilize information present in sequences of homologous proteins to amplify the sequence conservation pattern defining the family.

• The result: detection of remote homologies beyond the reach of other sequence comparison methods.

Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. (2005) FFAS03: a server for profile-profile sequence alignments. Nucl. Acids Res. 33, W284-W288

Page 20: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Fold recognition: HHPRED

Profiles are based on Hidden Markov Models:

[Figure: HMM diagram with states that emit amino acids, annotated with probabilities]

Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.

Page 21: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Fold recognition: HHPRED

• Profile Hidden Markov Models (HMMs) are similar to sequence profiles, but in addition to the amino acid frequencies they contain information about the frequency of inserts and deletions.

• Using profile HMMs in place of simple sequence profiles should therefore further improve sensitivity.

• HHpred is the first server to employ HMM-HMM comparison, based on a novel statistical method. Using HMMs on both the query and the database side greatly enhances sensitivity and selectivity over sequence-profile based methods.

Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.
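The difference between a simple sequence profile and the extra information a profile HMM carries can be illustrated with a small sketch. For each alignment column it records the amino acid frequencies (the simple profile) and, in addition, the gap frequency as a crude stand-in for insert/delete information. This is an illustration under simplifying assumptions, not HHpred's model: a real profile HMM stores state-transition probabilities rather than a single gap fraction, and the toy alignment below is hypothetical.

```python
from collections import Counter


def column_profile(msa):
    """Per-column amino acid frequencies plus gap frequency.

    msa: list of equal-length aligned sequences, with '-' marking gaps.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        column = [seq[col] for seq in msa]
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        # Frequencies are taken over non-gap residues only.
        freqs = {aa: n / len(residues) for aa, n in counts.items()} if residues else {}
        profile.append({"freqs": freqs, "gap": column.count("-") / len(column)})
    return profile


# Hypothetical four-sequence alignment.
msa = ["GRMGQ", "GRMGR", "G-LGK", "GKMG-"]
profile = column_profile(msa)
```

Columns with a frequency near 1.0 for one residue are strongly conserved; columns with a high gap fraction are where inserts and deletions are tolerated.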

Page 22: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

I-TASSER - Hybrid Approach

• In a recent community-wide blind experiment, CASP7, I-TASSER generated the best 3D structure predictions among all automated servers.

• Based on secondary-structure threading and the iterative implementation of the Threading ASSEmbly Refinement (TASSER) program.

• To predict the biological function of the protein, the I-TASSER server matches the predicted 3D models to the proteins in 3 independent libraries, which consist of proteins of known enzyme classification (EC) number, gene ontology (GO) vocabulary, and ligand-binding sites.

Page 23: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

I-TASSER

Page 24: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Test Case: Rosetta vs. TASSER

Grey: Crystal structure of β2-adrenergic receptor

Purple: Rosetta prediction, starting from homology modeling

Green: TASSER prediction

Page 25: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Homology Modeling

Page 26: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Homology Modeling – Basic Idea

Example: triosephosphate isomerases, 44.7% sequence identity, 0.95 Å RMSD

1. A protein structure is defined by its amino acid sequence.

2. Closely related sequences adopt highly similar structures; distantly related sequences may still fold into similar structures.

3. The three-dimensional structure of proteins from the same family is more conserved than their primary sequences.
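The two numbers quoted for the example pair, percent sequence identity and RMSD, can be computed as in the minimal sketch below. It assumes one common convention for each: identity is counted over alignment columns where neither sequence has a gap, and RMSD is computed over matched atoms of two structures that are already superimposed. Other conventions exist (for example, dividing by the shorter sequence length), and no superposition step is included here.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, already superimposed atoms.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, e.g. C-alpha atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

For example, `percent_identity("MKYG-IV", "MRYGAIV")` compares six gap-free columns, five of which match.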

Page 27: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Homology modeling - a widespread technique

e.g. Fiser et al., 2004; Petrey et al., 2005; Zhang, 2008

Query protein sequence → Identify homologous protein (structural template) → Align query & template protein sequences → Build model → Evaluate model

Page 28: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

General Scheme

1. Searching for structures related to the query sequence

2. Selecting templates

3. Aligning query sequence with template structures

4. Building a model for the query using information from the template structures

5. Evaluating the model

Fiser A et al. Methods in Enzymology 374: 461-491(2004)

Page 29: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

General Scheme

Page 30: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Homology modeling requires handling structures & sequences

• Query - only the protein sequence is available, usually found in the UniProt database

• Template - after identification, both structural and sequence-related data should be found: UniProt (or NCBI databases), RCSB and PDBsum

Page 31: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures

• Sequence search against the PDB sequences

• Sequence-profile search

• Threading: sequence-structure fitness function

Page 32: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures

If a BLAST search against the PDB fails to recognize adequate templates, turn to fold recognition (threading) servers:

• FFAS03 - http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl

• HHPRED - http://toolkit.tuebingen.mpg.de/hhpred

• HMAP (available through the PUDGE pipeline) - http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE

• I-TASSER - http://zhang.bioinformatics.ku.edu/I-TASSER/

These servers not only find candidate templates, but also suggest a pairwise alignment and in some cases even construct the 3D model.

Page 33: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting Templates

How to select the right template?

• Higher sequence similarity - %ID

• Close subfamily - phylogenetic tree

• “Environment” similarity - solvent, pH, ligand, quaternary interactions

• The quality of the experimentally determined structure

• Purpose of modeling - e.g. protein-ligand model vs. geometry of active site

[Figure: tree relating Seq. 1 to Seq. 6]

Page 34: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting Templates

More than one template

• Two ways to combine multiple templates:

– Global model – the templates align with different domains of the target, with little overlap between them

– Local model – alignment with the same part of the target

Page 35: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

More than one template

The more the merrier - multiple structures with the same fold:

2. Selecting Templates

Page 36: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting Templates

Trial and error

• Generate a model for each candidate template and/or their combination.

• Evaluate the models by an energy or any other scoring function (will be discussed later…).

Page 37: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Aligning query and template sequences

• All comparative modeling programs depend on a target-template alignment.

• When the sequence similarity between the template and target proteins is high, simple pairwise alignments are usually fine (e.g. Needleman-Wunsch global alignment).

• Gaps or low/medium sequence similarity indicate that we should improve the alignment...
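For reference, the Needleman-Wunsch global alignment mentioned above can be sketched as follows. This is a minimal version with a simple match/mismatch score and a linear gap penalty; the scoring values are illustrative assumptions. Real protein alignments would use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

With these scores, `needleman_wunsch("GAT", "GT")` places a single gap opposite the `A`.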

Page 38: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Guidelines:

1. Create a multiple sequence alignment and extract the template-query pairwise alignment.

3. Aligning query and template sequences

Page 39: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Pairwise alignments – not enough!

Page 40: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Guidelines:

1. Create a multiple sequence alignment and extract the template-query pairwise alignment.

• Visual inspection of alignments - difficult to teach… a matter of experience…

[Figure labels: Template, Query]

3. Aligning query and template sequences

Page 41: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Guidelines:

1. Create a multiple sequence alignment and extract the template-query pairwise alignment.

2. Use secondary structure information to improve the pairwise alignment - avoid gaps in these regions!

[Figure labels: Query, Template]

3. Aligning query and template sequences

Page 42: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Guidelines:

1. Create a multiple sequence alignment and extract the template-query pairwise alignment.

2. Use secondary structure information to improve the pairwise alignment - avoid gaps in these regions!

3. Use prior biochemical and structural data

3. Aligning query and template sequences

Page 43: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Aligning query and template sequences

• Most importantly, make sure that both the query and the selected template are included in the MSA.

• Sequences that are more distant from the query than the template need not be included in the alignment.

Page 44: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Aligning query and template sequences

Query-template alignment via a profile-to-profile approach:

1. Construct an MSA for the query; it serves as a profile depicting the properties of the protein family.

2. Align the profile to profiles of all proteins of the PDB, using, e.g., FFAS03 or HHpred.

3. Compare pairwise alignments constructed via the different methods – hope to get a consensus prediction…

Page 45: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Aligning query and template sequences

Different levels of similarity between the template & query initiate various computational approaches:

Page 46: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model

Four methods to construct the 3D model:

By rigid body assembly

By segment matching

By satisfaction of spatial restraints

By searching the conformational space

Page 47: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model

Once you have an improved pairwise alignment between your query & template, use NEST or Modeller to build your model!

Page 48: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

>P1;1VM6
structureX:1VM6:1 :A :212 :A ::::
-----MKYGIVGYSGRMGQEIQKVFSE-KGHELVLKVDV------------------------NGVEEL-DSPDVVIDFSSPEALPKTVDLCKKYRAGLVLGTTALKEEHLQMLRELSKEVPVVQAYNFSIGINVLKRFLSELVKVLEDWDVEIVETHHRFKKDAPSGTAILLESAL---------------------GK----SVPIHSLRVGGVPGDHVVVFGNIGETIEIKHRAISRTVFAIGALKAAEFLVGKDPGMYSFEEVI----*

>P1;DAPB_ECOLI
sequence:DAPB_ECOLI:1 : :272 ::::
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNN*

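4. Building a model: PIR format needed as input

Template entry:

• Code (1VM6): must match the PDB file name

• “structureX”: indicates that this is the template

• “1 :A :212 :A”: residues that take part in the alignment (PDB indexing!) and chain

• “*”: end of alignment

Target entry:

• “sequence”: marks the target sequence

• DAPB_ECOLI: target name

• “1 : :272”: residues that take part in the alignment

• “*”: end of alignment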

Page 49: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model

NEST - incorporates a variety of programs to facilitate the model building

• Input:

1. Sequence alignment of a query to one (or more) template PDBs

2. The template PDB file(s) in the same directory

• Output: a 3D model in PDB format

• Capabilities:

1. Model building with artificial evolution

2. Sequence alignment tuning

3. Composite structure building / multiple templates

4. Structure refinement

Page 50: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model

NEST - based on “artificial evolution”:

• Changes to the template structure, such as residue mutations, insertions or deletions, are made one at a time.

• After each change, a slight energy minimization is performed to avoid atom clashes.

• This process is repeated until the target sequence is completely modeled.

• The resulting structure is subjected to minimization - energy is calculated based on a simplified potential function that includes van der Waals, hydrophobic, electrostatic, torsion-angle and hydrogen-bond terms.

Page 51: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller

• Spatial features, such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles, are transferred from the templates to the target.

• Thus, a number of spatial restraints on its structure are obtained.

• The 3D model is obtained by satisfying all the restraints as well as possible.

Page 52: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller

• Distance and dihedral angle restraints on the target are calculated from its alignment with template.

• Restraints are also obtained from a statistical analysis of the relationships in a large database of pairs of homologous structures.

• Various correlations were obtained, e.g. correlations between Cα-Cα distances. These relationships can be used directly as spatial restraints.

• Restraints and CHARMM energy terms are then combined into an objective function, which is optimized in 3D space.
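The idea of an objective function built from spatial restraints can be sketched as follows. Each distance restraint contributes a harmonic penalty that grows as the model distance departs from the value transferred from the template. This is a deliberately simplified stand-in: Modeller expresses restraints as probability density functions and adds CHARMM energy terms, neither of which is reproduced here. The function name and data layout are assumptions made for illustration.

```python
import math


def restraint_objective(coords, restraints):
    """Sum of harmonic penalties for a set of distance restraints.

    coords: dict mapping an atom identifier to its (x, y, z) position in the model.
    restraints: list of (atom_i, atom_j, target_distance, sigma) tuples, where
        target_distance would come from the template and sigma expresses how
        tightly the restraint should be satisfied.
    """
    total = 0.0
    for atom_i, atom_j, target, sigma in restraints:
        distance = math.dist(coords[atom_i], coords[atom_j])
        total += 0.5 * ((distance - target) / sigma) ** 2
    return total


# Toy model with three atoms; distances 1-2 = 3, 1-3 = 5.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (3.0, 4.0, 0.0)}
restraints = [(1, 2, 3.0, 0.1), (1, 3, 6.0, 0.5)]
objective = restraint_objective(coords, restraints)
```

Model building then amounts to moving the coordinates so that this value decreases; a value of zero would mean every restraint is satisfied exactly, which is rarely possible when restraints conflict.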

Page 53: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller

Model generation and refinement using satisfaction of spatial restraints. Can perform additional tasks:

• De novo modeling of loops

• Optimization of models using an objective function

• Multiple alignment

• Comparison of protein structures

Page 54: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller

Input files:

• Template PDB: 1VM6.pdb

• Template-target alignment in PIR format: alignment.ali

• Modeller script file: model-default.py

Page 55: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller - script for running on biocluster

# model-default.py
# Homology modelling by the automodel class

from modeller import *              # load the standard Modeller classes
from modeller.automodel import *    # load the automodel class

log.verbose()                       # request verbose output
env = environ()                     # create a new Modeller environment

a = automodel(env,
              alnfile='dapb_1vm6.pir',   # alignment of template and target
              knowns='1VM6',             # template or templates
              sequence='DAPB_ECOLI')     # query name in the PIR file

a.starting_model = 1                # index of the first model
a.ending_model = 1                  # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do the actual homology modelling

Page 56: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

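4. Building a model: Modeller - script for PDB numbering

# Read a PDB file and write its sequence, with residue numbering, to <code>.seq
from modeller import *

env = environ()
code = '2A79'                            # PDB code of the structure to read
mdl = model(env, file=code)              # load the structure
aln = alignment(env)
aln.append_model(mdl, align_codes=code)  # add its sequence to an alignment
aln.write(file=code+'.seq')              # write the sequence file

• Run “mod9v7 [numbering script]”

• The PDB sequence is written to “[pdb].seq”

• Find out the correct numbering for the template in the PIR file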

Page 57: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Modeller vs. NEST

NEST input:

1. PDB file

2. PIR file

Run: “nest [pir file]” (requires access to the Unix/Linux system)

Modeller input:

1. PDB file

2. PIR file

3. Modeller script

Run: “mod9v7 [script file]” (requires installation on Windows)

Page 58: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Building a model: Comparison of approaches

Page 59: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Model Evaluation

• The accuracy of the model depends on its sequence identity with the template:

Page 60: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Model Evaluation

The model can be assessed on two levels:

• Global - reliability of the model as a whole.

* Useful when several models are generated and one should be chosen as the best one.

* When different models are based on various templates, may help choose the best one.

• Local - reliability of the different regions, even specific residues, of the model.

* Useful for detecting local mistakes, which often originate from alignment errors.

Page 61: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Model Evaluation

Examples of assessment approaches:

1. Assessment of the model’s stereochemistry

2. Prediction of unreliable regions of the model - “pseudo energy” profile: peaks indicate errors

3. Consistency with experimental observations

4. Consistency with evolutionary conservation rates

5. And more…

Page 62: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Summary :

5 Basic Steps

Page 63: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Hands ON

Page 64: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

The Query Protein

Name: Dihydrodipicolinate reductase

Enzyme reaction:

Molecular process: Lysine biosynthesis (early stages)

Organism: E. coli

Sequence length: 273 aa

Page 65: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures

Page 66: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures

Get your sequence

>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL

http://www.uniprot.org/

Page 67: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures

Find templates with significant homology:

• BLAST against the sequences in the PDB

Also find more distant templates, using a profile-to-profile approach:

• FFAS03 server

• HHPRED server

Page 68: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures: Blast against the PDB

http://www.ncbi.nlm.nih.gov/BLAST/

Page 69: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures: Blast against the PDB

1. Paste sequence

2. Select the PDB database

3.

http://www.ncbi.nlm.nih.gov/BLAST/

Page 70: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

1. Searching For Structures: Blast against the PDB

http://www.ncbi.nlm.nih.gov/BLAST/

Page 71: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting templates

Page 72: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting templates: Blast against the PDB

The real structure of our protein

Closest homologous structure

Page 73: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting templates: Blast against the PDB

http://www.ncbi.nlm.nih.gov/BLAST/

The selected template:

1VM6, chain A

Page 74: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

2. Selecting templates: Who is our template?

www.ebi.ac.uk/thornton-srv/databases/pdbsum

PDB ID 1VM6 is UniProt entry ‘DAPB_THEMA’

Page 75: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment

Page 76: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Find query’s homologous sequences

1. Paste query sequence

2.

http://conseq.bioinfo.tau.ac.il/

Page 77: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Find query’s homologous sequences

Download the query’s alignment

Page 78: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Extract query-template pairwise alignment

1. Open: Start → Phylogeny → BioEdit

2. Open the alignment: File → Open → ‘query.aln’

3. Select the template: Edit → Search → Find in Titles → “DAPB_THEMA”

Page 79: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Extract query-template pairwise alignment

“DAPB_THEMA”

Page 80: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Extract query-template pairwise alignment

4. Add the query to the template selection: Ctrl + ‘query’

5. Invert selection: Edit → Invert title selection

6. Delete other sequences: Edit → Cut Sequence(s)

7. Minimize gaps: Alignment → Minimize Alignment

8. Save the pairwise alignment: File → Save as → “DAPB_ECOLI_1VM6.fas”
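The same extraction can be scripted instead of done by hand in BioEdit. The sketch below keeps only the query and template rows of an MSA and drops columns where both are gaps, which is what the "Minimize Alignment" step achieves for a two-sequence alignment. The function name, the dictionary layout and the toy alignment are illustrative assumptions; reading the `.aln` file itself is not shown.

```python
def extract_pairwise(msa, query_id, template_id):
    """Extract two rows from an MSA and remove columns that are gaps in both.

    msa: dict mapping sequence names to aligned sequences of equal length.
    Returns (aligned_query, aligned_template).
    """
    query, template = msa[query_id], msa[template_id]
    if len(query) != len(template):
        raise ValueError("aligned sequences must have the same length")
    kept = [(q, t) for q, t in zip(query, template) if not (q == "-" and t == "-")]
    return "".join(q for q, _ in kept), "".join(t for _, t in kept)


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical three-sequence alignment.
msa = {"query": "MH-DA-N", "DAPB_THEMA": "M--KY-G", "other": "MHQDAAN"}
query_aln, template_aln = extract_pairwise(msa, "query", "DAPB_THEMA")
fasta_text = to_fasta([("query", query_aln), ("DAPB_THEMA", template_aln)])
```

Columns that are gapped in only one of the two sequences are kept, since they carry the insertions and deletions the model builder needs.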

Page 81: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment: Extract query-template pairwise alignment

Save in “fasta” format

[Figure labels: query, DAPB_THEMA, file name]

Page 82: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment

Use fold recognition - FFAS03

Scores below -9.5 are significant

Page 83: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Use fold recognition - FFAS03

3. Alignment

http://ffas.ljcrf.edu/ffas-cgi/cgi/get_mu.pl?ses=&qdb=public&tdb=PDB0408&type=re&key=221830166.3750.0000000

Page 84: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Use fold recognition - HHPRED

http://toolkit.tuebingen.mpg.de/hhpred/histograms/8455009

3. Alignment

Page 85: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

3. Alignment

Use fold recognition - HHPRED

Page 86: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

• NEST and Modeller require a specific file format - unfortunately we will have to edit the pairwise alignment.

3. Alignment: Edit query-template pairwise alignment

>P1;1VM6
structureX:1VM6:1 :A :212 :A
-----MKYGIVGYSGRMGQEIQKVFSE-KGHELVLKVDV------------------------NGVEEL-DSPDVVIDFSSPEALPKTVDLCKKYRAGLVLGTTALKEEHLQMLRELSKEVPVVQAYNFSIGINVLKRFLSELVKVLEDWDVEIVETHHRFKKDAPSGTAILLESAL---------------------GK----SVPIHSLRVGGVPGDHVVVFGNIGETIEIKHRAISRTVFAIGALKAAEFLVGKDPGMYSFEEVI----*

>P1;DAPB_ECOLI
sequence:DAPB_ECOLI:1 : :272 :
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNN*
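Editing the alignment into PIR format by hand is error-prone, so a small helper can be sketched for it. The function below writes one PIR entry: a `>P1;code` line, a header line of ten colon-separated fields (type, code, first residue, first chain, last residue, last chain, and four fields left empty here), and the aligned sequence terminated by `*`. The helper name and its parameters are assumptions made for illustration, and the short sequences in the example are placeholders rather than the real alignment.

```python
def pir_entry(code, kind, aligned_seq, first="", chain_first="", last="", chain_last=""):
    """Build one PIR entry as text.

    code: entry name; for a template it should match the PDB file name.
    kind: 'structureX' for a template structure or 'sequence' for the target.
    first/last and chain_first/chain_last: residue range and chain identifiers.
    The last four header fields (name, source, resolution, R-factor) stay empty.
    """
    fields = [kind, code, str(first), chain_first, str(last), chain_last, "", "", "", ""]
    header = ":".join(fields)
    return f">P1;{code}\n{header}\n{aligned_seq}*\n"


# Placeholder sequences; in practice these are the two rows of the pairwise alignment.
template_entry = pir_entry("1VM6", "structureX", "--MKYG", 1, "A", 212, "A")
target_entry = pir_entry("DAPB_ECOLI", "sequence", "MHDANI", 1, "", 272, "")
pir_text = template_entry + "\n" + target_entry
```

The residue numbers for the template should follow the PDB file's own numbering, which is why a separate numbering check is worthwhile before building the model.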

Page 87: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building

Page 88: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building: Get the template structure

1. Paste the template’s PDB ID “1VM6”

2.

http://www.rcsb.org/pdb/home/home.do

Page 89: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Get the template structure: 1vm6 chain A

Save as: “1VM6.pdb”

4. Model Building

Notice: case sensitive!

Page 90: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Use LECS-

• As there is no webserver for building models, we will use our linux cluster

• We need to use two programs:

– A program to transfer the files from our computer to the linux cluster- WinSCP

– A terminal to run the commands- PuTTY

• Our username: nest

• Our password: uniprot1

4. Model Building

Page 91: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

http://www.salilab.org/modeller/download_installation.html

4. Model Building

Page 92: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building: Running Modeller

1. Put the PDB file, PIR alignment and Modeller script in a specific directory, e.g. c:\test

2. Desktop → Modeller:

Page 93: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building: Running Modeller

3. “cd c:\test”

4. “mod9v7 [modeller script name]”

Page 94: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building: Running Modeller

5. The run completed successfully:

Page 95: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

4. Model Building

Running Modeller: Output files:

• Model-structure, e.g. “DAPB_ECOLI.B99990001.pdb”

• Log file- very important- specifies the problems of the run

• Other, not important, files

Page 96: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Evaluation

Page 97: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Model Visualization

1. Open: Start → Bioinformatics → RasTop

2. Get the model: File → Open → DABP_ECOLI_final.pdb

5. Evaluation

Page 98: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Active Site Residues

5. Evaluation

Page 99: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Stereochemistry - ProCheck

5. Evaluation

Page 100: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Evaluation

Model Conservation

http://consurf.tau.ac.il

Page 101: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Evaluation

Model Conservation

http://consurf.tau.ac.il

Page 102: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

5. Evaluation

Model Conservation

http://consurf.tau.ac.il

Page 103: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Real Vs. Model Superimposition

Page 104: Lesson 7 Protein Structure Prediction GHIKLSYTVNEQNLKPERFFYTSAVAIL.

Useful Links

1. Searching for structures

– PDB-Blast at NCBI - http://blast.ncbi.nlm.nih.gov/Blast.cgi

– Meta server - 3D-Jury - http://bioinfo.pl/meta/

– FFAS03 - http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl

– HHPRED - http://toolkit.tuebingen.mpg.de/hhpred

– PUDGE pipeline - http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE

2. Selecting templates

3. Aligning query sequence with template structures

– MSA - MUSCLE, T-coffee and MAFFT at http://toolkit.tuebingen.mpg.de/sections/alignment

– Alignment editor - Bioedit - http://www.mbio.ncsu.edu/BioEdit/bioedit.html

4. Building a model

– Nest - http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:nest

– Modeller - http://salilab.org/modeller/modeller.html

5. Evaluating the model

– ConSurf - http://consurf.tau.ac.il

– PROCHECK - http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

– WHATCHECK - www.cmbi.kun.nl/swift/whatcheck/

– ProSA - https://prosa.services.came.sbg.ac.at/prosa.php

– ProQ - http://www.sbc.su.se/~bjornw/ProQ/ProQ.cgi

– At the Honig lab - http://luna.bioc.columbia.edu/Model_Quality_Assessment/cgi-bin/Model_Quality_Assessment.cgi