Computational Methods in Molecular Modelling Uğur Sezerman Biological Sciences and Bioengineering Program Sabancı University, Istanbul
Date posted: 19-Dec-2015

Computational Methods in Molecular Modelling

Uğur Sezerman
Biological Sciences and Bioengineering Program
Sabancı University, Istanbul


Motivation

Knowing the structure of a molecule enables us to understand its mechanism of function

Current experimental techniques
- X-ray crystallography
- NMR


PROTEIN FOLDING PROBLEM

STARTING FROM THE AMINO ACID SEQUENCE, FINDING THE STRUCTURE OF PROTEINS IS CALLED THE PROTEIN FOLDING PROBLEM


Forces driving protein folding

o It is believed that hydrophobic collapse is a key driving force for protein folding
- Hydrophobic core
- Polar surface interacting with solvent
o Minimum volume (no cavities)
o Van der Waals interactions
o Disulfide bond formation stabilizes the fold
o Hydrogen bonds
o Polar and electrostatic interactions


SECONDARY STRUCTURE PREDICTION

Figure source: Introduction to Protein Structure (Branden and Tooze)


Secondary Structure Prediction

AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…


Chou-Fasman Parameters

Name            Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine         A      142    83    66      0.06   0.076   0.035   0.058
Arginine        R       98    93    95      0.07   0.106   0.099   0.085
Aspartic Acid   D      101    54   146      0.147  0.11    0.179   0.081
Asparagine      N       67    89   156      0.161  0.083   0.191   0.091
Cysteine        C       70   119   119      0.149  0.05    0.117   0.128
Glutamic Acid   E      151    37    74      0.056  0.06    0.077   0.064
Glutamine       Q      111   110    98      0.074  0.098   0.037   0.098
Glycine         G       57    75   156      0.102  0.085   0.19    0.152
Histidine       H      100    87    95      0.14   0.047   0.093   0.054
Isoleucine      I      108   160    47      0.043  0.034   0.013   0.056
Leucine         L      121   130    59      0.061  0.025   0.036   0.07
Lysine          K      114    74   101      0.055  0.115   0.072   0.095
Methionine      M      145   105    60      0.068  0.082   0.014   0.055
Phenylalanine   F      113   138    60      0.059  0.041   0.065   0.065
Proline         P       57    55   152      0.102  0.301   0.034   0.068
Serine          S       77    75   143      0.12   0.139   0.125   0.106
Threonine       T       83   119    96      0.086  0.108   0.065   0.079
Tryptophan      W      108   137    96      0.077  0.013   0.064   0.167
Tyrosine        Y       69   147   114      0.082  0.065   0.114   0.125
Valine          V      106   170    50      0.062  0.048   0.028   0.053
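As a rough illustration of how the P(a) column can be used, the sketch below scans a sequence for helix nucleation windows. The rule applied here (at least four residues with P(a) > 100 in a window of six) is the commonly cited Chou-Fasman nucleation heuristic and is an assumption, since the slide lists only the parameters; the function name `helix_nuclei` is hypothetical.

```python
# P(a) helix propensities copied from the Chou-Fasman table above.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def helix_nuclei(seq, window=6, min_formers=4, threshold=100):
    """Return start indices of windows that hold enough helix formers.

    A residue counts as a helix former when its P(a) exceeds `threshold`.
    The window size and the minimum count are assumptions (see lead-in).
    """
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(1 for aa in seq[i:i + window] if P_ALPHA[aa] > threshold)
        if formers >= min_formers:
            starts.append(i)
    return starts


print(helix_nuclei("AAAAAAGGGGGG"))
```

A full predictor would also extend each nucleus in both directions and resolve overlaps with strand and turn predictions; this sketch stops at nucleation.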


Computational Approaches

o Ab initio methods
o Threading
o Comparative Modelling
o Fragment Assembly


Ab-initio protein structure prediction as an optimization problem

1. Define a function that maps protein structures to some quality measure.
2. Solve the computational problem of finding an optimal structure.

[Figure: energy plotted against conformation]

Chen Keasar, BGU


A dream function
o Has a clear minimum in the native structure.
o Has a clear path towards the minimum.
o Global optimization algorithm should find the native structure.

Chen Keasar, BGU


An approximate function
o Easier to design and compute.
o Native structure not always the global minimum.
o Global optimization methods do not converge.
o Many alternative models (decoys) should be generated.
o No clear way of choosing among them.

[Figure: decoy set]

Chen Keasar, BGU


Fold Optimization

Simple lattice models (HP-models)
o Two types of residues: hydrophobic and polar
o 2-D or 3-D lattice
o The only force is hydrophobic collapse
o Score = number of HH contacts


Scoring Lattice Models

o H/P model scoring: count the HH contacts
o Sometimes: penalize buried polar or surface hydrophobic residues
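The HH-contact score can be sketched as follows for a 2-D square lattice. The representation (a string of "H"/"P" plus one (x, y) lattice point per residue) and the function name `hp_score` are assumptions made for illustration; the optional burial penalty mentioned on the slide is not included.

```python
def hp_score(sequence, coords):
    """Count H-H contacts on a 2-D square lattice.

    A contact is a pair of H residues on neighbouring lattice points
    that are not consecutive in the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice point per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# A four-residue chain folded into a square: residues 0 and 3 touch.
print(hp_score("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```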


Learning from Lattice Models

Ken Dill ~ 1997

Hydrophobic zipper effect


[Figure: levels of model detail. Basic element: some residues, residue, half a residue, extended atom, atom, electrons & protons. Other labels: diamond lattice, fine square lattice, torsion angle lattice, fragments, continuous; Hinds & Levitt.]

Chen Keasar, BGU


What can we do with lattice models?

o For smaller polypeptides, exhaustive search can be used
- Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process
o For larger chains, other optimization and search methods must be used
- Greedy, branch and bound
- Evolutionary computing, simulated annealing
- Graph theoretical methods
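For short chains the exhaustive search mentioned above can be sketched as a depth-first enumeration of self-avoiding walks on a 2-D square lattice, keeping the walk with the most HH contacts. The function names and the 2-D restriction are assumptions for illustration; the number of walks grows exponentially with chain length, which is why larger chains need the other methods listed.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_hh(seq, path):
    """Number of H-H contacts between residues that are not chain neighbours."""
    index_at = {point: i for i, point in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):  # count each pair once
            j = index_at.get(neighbour)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                total += 1
    return total


def best_fold(seq):
    """Enumerate every self-avoiding walk and return (best score, walk)."""
    best = (-1, None)

    def extend(path):
        nonlocal best
        if len(path) == len(seq):
            score = count_hh(seq, path)
            if score > best[0]:
                best = (score, list(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:  # keep the walk self-avoiding
                path.append(nxt)
                extend(path)
                path.pop()

    extend([(0, 0)])
    return best


score, walk = best_fold("HPPHPPH")
print(score, walk)
```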


Inverse Protein Folding Problem

Given a structure (or a functionality), identify an amino acid sequence whose fold will be that structure (or that exhibits that functionality).

o Crucial problem in drug design.
o NP-hard under most models.


PROTEIN THREADING

Thread the given sequence onto the different structural families that exist in structural databases

Choose the optimum structure based on the potential energy function used (e.g. contact potential, free energy)


Threading: Fold recognition

Given:
o Sequence: IVACIVSTEYDVMKAAR…
o A database of molecular coordinates

Map the sequence onto each fold

Evaluate
o Objective 1: improve scoring function
o Objective 2: folding


Protein Fold Families (CATH,SCOP)

CATH website www.cathdb.info

                                                                   


Genetic Algorithm used as a search tool

We are searching for the minima of our fitness function, which is composed of profile and contact energy terms.

Value encoding has been used in this problem. Parents are represented as strings of positions. The population size is 50.

A sample parent (string of positions) is shown below:

1 2 3 4 5 10 11 12 13 14 23 24 25 26 27 28 29 30 31 32 55 56 57 58

A branch and bound algorithm has been used to produce random initial parents.

Mutation:

The mutation operator shifts a structure's position to the right or left by some units.

Crossover:

Two-point crossover is applied, in which selected suitable structures are exchanged between two parents.
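The two operators can be sketched as below. This is only an illustration under stated assumptions: a parent is stored as a list of (start, end) segments, one per structural unit, so the sample parent above becomes [(1, 5), (10, 14), (23, 32), (55, 58)]; the shift limit `max_shift`, the rule that units may not overlap, and the fallback to the first parent when a child is invalid are all hypothetical choices, not details given in the slides.

```python
import random


def mutate(parent, seq_len, max_shift=3, rng=random):
    """Shift one randomly chosen unit left or right without overlapping neighbours."""
    child = list(parent)
    k = rng.randrange(len(child))
    start, end = child[k]
    lowest = child[k - 1][1] + 1 if k > 0 else 1
    highest = child[k + 1][0] - 1 if k + 1 < len(child) else seq_len
    shift = rng.randint(-max_shift, max_shift)
    # Clamp the shift so the unit stays between its neighbours.
    shift = max(lowest - start, min(shift, highest - end))
    child[k] = (start + shift, end + shift)
    return child


def two_point_crossover(a, b, rng=random):
    """Exchange the units between two cut points; keep `a` if the child is invalid."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    child = a[:i] + b[i:j] + a[j:]
    ordered = all(child[k][1] < child[k + 1][0] for k in range(len(child) - 1))
    return child if ordered else list(a)


rng = random.Random(0)
parent_a = [(1, 5), (10, 14), (23, 32), (55, 58)]
parent_b = [(2, 6), (12, 16), (25, 34), (50, 53)]
print(mutate(parent_a, seq_len=60, rng=rng))
print(two_point_crossover(parent_a, parent_b, rng=rng))
```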


Our Aim

In this research, we have threaded a protein sequence of unknown structure onto more than 2200 SCOP family folds and sought the best-fitting structural family.

We also tried to find the optimum fit of the query sequence to a given fold.


Fitness Function

The energy function is a combination of:
o The sequence profile energy
o Contact potential energy (inter- and intra-structural residues are taken into account)

TotalEnergy = p1 (ProfileEnergy) + c1 (ContactEnergy)

The weights are chosen such that the profile and contact energy terms contribute equally.


Profile Energy

We perform structural alignment on all selected secondary structural units of the sequences.

Secondary structural units with the same number are selected.

The lengths of the units may differ.

-- P  E  E  L  L  L  R  W  A  N  F  H  L  E  N   (1aoa)
-- S  E  K  I  L  L  K  W  V  R  Q  T  -- -- --  (1qag)
N  S  E  K  I  L  L  S  W  V  R  Q  S  T  R  --  (1dxx)

Sixth helices of the selected all-alpha sequences


Profile Matrix calculated from a structure group

Rows: positions. Columns: residue names ("-" is a gap). Entries: profile scores.

Pos      A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y      -
  1  -0.33  -0.67   0.68   0.01  -1.33   0.01   0.34     -1   0.01  -1.33  -0.67   2.34  -0.67   0.01  -0.33   0.34   0.01     -1  -1.33  -0.67   4.01
  2   0.34     -2  -0.33     -1  -3.33  -0.67  -1.33     -3  -0.33  -3.33  -2.33   0.01   2.68  -0.33  -1.67   3.01   1.01  -2.33     -4  -2.33   0.01
  3     -1     -3   2.01   6.01     -3     -3   0.01     -4   1.01     -3     -2   0.01     -1   2.01   0.01     -1     -1     -3     -3     -2   0.01
  4     -1     -3   0.01   2.68  -3.67  -2.33   0.01  -3.33   4.34     -3     -2   0.01     -1   2.01   2.01  -0.33     -1     -3     -3     -2   0.01
  5  -1.33     -2     -4  -3.67   0.34     -4  -3.67   4.01     -3   3.01   2.34  -3.33  -3.33  -2.67  -3.67     -3     -1   3.01  -2.67     -1   0.01
  6     -2     -2     -4     -3   1.01     -4     -3   2.01     -3   5.01   3.01     -4     -4     -2     -3     -3     -1   1.01     -2     -1   0.01
  7     -2     -2     -4     -3   1.01     -4     -3   2.01     -3   5.01   3.01     -4     -4     -2     -3     -3     -1   1.01     -2     -1   0.01
  8   0.01     -2  -0.67  -0.67     -3     -1  -0.67  -3.33   1.01     -3     -2   0.34  -1.67   0.34   1.68   3.01   1.01  -2.33  -3.67  -1.67   0.01
  9     -3     -5     -5     -3   1.01     -3     -3     -3     -3     -2     -1     -4     -4     -1     -3     -4     -3     -3  15.01   2.01   0.01
 10   1.68     -1  -3.33  -2.33  -1.67  -2.67  -3.33   2.34  -2.33   0.01   0.34  -2.33  -2.33  -2.33  -2.67     -1   0.01   3.34     -3  -1.33   0.01
 11  -1.67  -3.33  -0.67   0.01  -3.33     -2   0.34  -3.67   2.01  -3.33     -2   1.68  -2.67   0.68   4.34  -0.33  -0.67     -3  -3.33  -1.33   0.01
 12  -1.67  -2.67  -1.67   0.34   0.01  -2.67   0.34     -2   0.01     -1   0.01  -1.33     -2   3.34  -0.33     -1  -1.33  -2.33  -0.33   0.68   0.01
 13  -0.33  -1.67  -0.67  -0.67     -2  -1.33   2.34  -2.67  -0.33  -2.33  -1.33   0.68  -1.33   0.01  -0.67   2.01   1.68     -2  -3.33  -0.67   0.01
 14  -0.67     -1  -1.67  -1.33  -0.33     -2  -1.67   0.34  -1.33   1.34   0.68  -1.33  -1.67     -1  -1.33  -0.33   1.34   0.34  -1.67     -1   2.01
 15     -1  -2.33   0.01   2.01     -2     -2   0.01  -2.67   1.34     -2  -1.33  -0.33  -1.33   1.01   2.34  -0.67  -0.67     -2     -2     -1   2.01
 16  -0.33  -0.67   0.68   0.01  -1.33   0.01   0.34     -1   0.01  -1.33  -0.67   2.34  -0.67   0.01  -0.33   0.34   0.01     -1  -1.33  -0.67   4.01
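A minimal sketch of scoring a sequence segment against such a profile: each residue looks up its score in the column for its position and the scores are summed. The two toy columns below reuse a few entries from the matrix above; the function name `profile_score`, the default of 0.0 for residues missing from a column, and the idea of turning the sum directly into a profile energy term are assumptions, since the slides do not give the exact formula.

```python
def profile_score(segment, profile, default=0.0):
    """Sum the position-specific scores of `segment` against `profile`.

    `profile` is a list with one dict per position, mapping a residue
    letter to its profile score.
    """
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    return sum(column.get(aa, default) for aa, column in zip(segment, profile))


# Two toy columns with a few entries taken from the matrix above.
toy_profile = [
    {"D": 2.01, "E": 6.01, "K": 1.01},
    {"F": 1.01, "W": 15.01, "Y": 2.01},
]
print(profile_score("EW", toy_profile))
print(profile_score("DF", toy_profile))
```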


Contact Potential Energy

Based on contact frequencies counted in a database of known structures and converted into energy values.

In this study, the contact potential energy is the sum of the energies of residue pairs that are closer than seven angstroms to each other.

Jernigan's and Dill's contact potential energy tables have been used.


Selected Benchmark Set

All-Alpha Set: 1aoa, 1dxx, 1qag
- Fold: Calponin-homology domain, CH-domain (core: 4 helices; bundle)
- Superfamily: Calponin-homology domain, CH-domain
- Family: Calponin-homology domain, CH-domain

All-Beta Set: 1acx, 1hzk, 1noa, 2mcm
- Fold: Immunoglobulin-like beta-sandwich (sandwich; 7 strands in 2 sheets)
- Superfamily: Actinoxanthin-like
- Family: Actinoxanthin-like

Alpha+Beta Set: 1dwn, 1e6t, 1frs, 1qbe, 1una
- Fold: RNA bacteriophage capsid protein (6-stranded beta-sheet followed by 2 helices; meander)
- Superfamily: RNA bacteriophage capsid protein
- Family: RNA bacteriophage capsid protein


Secondary structure prediction results for the family of all-alpha proteins

Eight helices of the following sequences are selected, each sequence is threaded onto the others, and the shifts from the real structures are shown below.

Template     Target sequences
sequences    1aoa                    1dxx                     1qag
1aoa         T T T T T T T 30        T T T -6 -1 -1 T 27      1 T T T T 12 T T
1dxx         T T T -4 1 5 4 9        T -3 T -5 T T T T        3 T T T 1 T 41 37
1qag         -1 T T -5 T 4 41 32     5 1 T -6 -1 T -13 -1     T T T T T T T T


Secondary structure prediction results for the family of all-beta proteins

Nine beta strands of the following sequences are selected, each sequence is threaded onto the others, and the shifts from the real structures are shown below.

Template     Target sequences
sequences    1acx                  1hzk                   1noa                    2mcm
1acx         T T T T T T T T T     1 T T T T -2 T T T     T T T T T -3 -1 T T     T T T 2 T T 1 2 4
1hzk         T T T T T T T T T     T T T T T T T T T      T T T T T 1 4 T T       T T T T -3 -3 T T T
1noa         T T T T T T 1 T T     T T T T T -1 T T T     T T T T T T 5 T T       T T T T -2 -2 T T T
2mcm         T T T T T T T T T     T T T T T T T T T      T T T 1 T T T T T       T T T 1 T -1 T T T


Secondary structure prediction results for the family of alpha+beta proteins

Template     Target sequences
sequences    1dwn                 1e6t                     1frs                     1qbe                1una
1dwn         T T 4 T T T T 4      T T T T T T T 5          T T T T T T T T          T T T T T T T 5     T T T -1 -1 T T 1
1e6t         T T T T T T T T      T T T T T T T T          T T T T T T T T          T T T T T T T T     T T T T T T T T
1frs         T T T T T T T T      T T T T T T T T          T T T T T T T T          T T T T T T T -1    T T T T T T 1 T
1qbe         -1 T T T 1 T T 3     T T -3 -11 T T T 4       T T -3 -11 T T T 1       T T T T T T T T     -1 T T T T T T 1
1una         T T T 1 1 T T T      T T T -5 T T T T         T T T -5 T T 1 T         1 T T T 2 T T T     T T T T T T T T


Conclusion for fitting to a given fold

We obtained very good results for all-beta and alpha+beta proteins.

All-alpha proteins generally gave good results, but some all-alpha structures were shifted.

The alpha shifts arose mainly because our all-alpha sequences had very different lengths and highly variable sequences, which lowered the contribution of the profile scores.


Fold Classification Results

[Figure: 1ubi threading results. Energy values (y-axis, -3000 to -1000) plotted against protein ID (x-axis, 0 to 600). Labelled points: 1e0q, 1f9j, 1ubi. Legend: other members of 1ubi's family.]


All Beta

[Figure: 1acx threading results. Energy values (y-axis, -3000 to -1000) plotted against protein ID (x-axis, 0 to 700). Labelled points: 1c01, 1zfo, 1klo, 1acx.]


All Alpha

[Figure: 1bhd threading results. Energy values (y-axis, -3000 to -1000) plotted against protein ID (x-axis, 0 to 700). Labelled points: 1hg6, 1dfu, 1qld, 2pcf, 1bhd.]


CONCLUSION

By optimising the fitting process with a genetic algorithm and using an appropriate target function, we obtained quite clear classifications at the family level.

It is also possible to use this method for superfamily classification by adjusting only the profile information and the weights.

We also applied the method to 6 CASP proteins and correctly classified their folds.


HOMOLOGY MODELLING

Using database search algorithms find the sequence with known structure that best matches the query sequence

Assign the structure of the core regions obtained from the structure database to the query sequence

Find the structure of the intervening loops using loop closure algorithms


Homology Modeling: How it works

o Find template

o Align target sequence with template

o Generate model:
- add loops
- add sidechains

o Refine model


Prediction of Protein Structures

Examples – a few good examples

[Figure: four pairs of structures, each labelled "actual" and "predicted"]


Prediction of Protein Structures

Not so good example


1esr


TURALIGN: Constrained Structural Alignment Tool For Structure Prediction


Motivation -1: Structure based Alignment

o Most alignment algorithms depend only on sequence (Needleman-Wunsch & Smith-Waterman)
- Functional sites are usually mismatched
- They fail to give the best alignment between highly divergent sequences having very similar structures


Motivation -2: Structure prediction of novel proteins

Using evolutionary information on sequence confirmation

Secondary structure predictions and possible locations of turns should be used for threading

Preservation of favorable contacts


Methods

o Motif Alignment Based on Dynamic Algorithm Approach
o Recursive Smith-Waterman Local Alignment Algorithm with Affine Gap Penalty
- Secondary Structure Similarity Matrix
- BLOSUM62
- Position Specific Entropy Information
o Filtering step using neighbourhood information
- Jernigan Contact Potential Matrix


Motif Alignment Using Dynamic Algorithm

Functional sites and motifs in the template protein can either be given as input to the program, or the ScanProsite* tool is used to detect the motifs.

*Gattiker, A. et al. Applied Bioinformatics 2002; 1(2): 107-108.


Recursive Smith-Waterman Local Alignment Algorithm with Affine Gap Penalty

[Figure: recursion diagram with labels pc, pL > 0.9 × pc and pR > 0.9 × pc at two levels; example scores 50 and 47]


Recursive Smith-Waterman Local Alignment Algorithm with Affine Gap Penalty

Build 3 matrices: A for the matches; B for the gaps on template; C for gaps on target.

S(i,j): pairwise similarity score
go: gap opening penalty
ge: gap extension penalty

Tracing back: include the paths that have score > 0.9 × Max

$$ A(i,j) = \max_{X \in \{A,B,C\}} \{\, X(i-1,j-1) + S(i,j) \,\} $$

$$ B(i,j) = \max \{\, A(i-1,j) - go - ge,\; B(i-1,j) - ge,\; C(i-1,j) - go - ge \,\} $$

$$ C(i,j) = \max \{\, A(i,j-1) - go - ge,\; B(i,j-1) - go - ge,\; C(i,j-1) - ge \,\} $$
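The three-matrix scheme can be sketched as follows. This is a minimal illustration, not the TURALIGN implementation: it assumes go and ge are positive penalties, that opening a gap costs go + ge and extending one costs ge, and that local alignment is obtained by letting a match restart from zero. The recursion over sub-alignments and the 0.9 × Max traceback are omitted; only the best score is returned. The function name and the toy similarity function are hypothetical.

```python
NEG_INF = float("-inf")


def affine_local_score(template, target, sim, go=10.0, ge=1.0):
    """Best local alignment score with affine gaps, using matrices A, B and C.

    A: template[i-1] is matched with target[j-1].
    B: a step in i only (a template residue faces a gap).
    C: a step in j only (a target residue faces a gap).
    """
    n, m = len(template), len(target)
    A = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    B = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    C = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        A[i][0] = 0.0
    for j in range(m + 1):
        A[0][j] = 0.0
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim(template[i - 1], target[j - 1])
            # The 0.0 lets a local alignment start at this cell.
            previous = max(A[i - 1][j - 1], B[i - 1][j - 1], C[i - 1][j - 1], 0.0)
            A[i][j] = previous + s
            B[i][j] = max(A[i - 1][j] - go - ge, B[i - 1][j] - ge, C[i - 1][j] - go - ge)
            C[i][j] = max(A[i][j - 1] - go - ge, B[i][j - 1] - go - ge, C[i][j - 1] - ge)
            best = max(best, A[i][j])
    return best


def toy_sim(a, b):
    """Toy similarity: +2 for identical residues, -1 otherwise."""
    return 2.0 if a == b else -1.0


print(affine_local_score("ACGT", "ACT", toy_sim, go=1.0, ge=0.5))
```

With these toy values, aligning ACGT to AC-T (three matches and one gap of length one) scores higher than stopping after the first two matches, so the gapped path wins.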


Recursive Smith-Waterman Local Alignment Algorithm with Affine Gap Penalty

SSS(i,j): Secondary Structure Similarity
SS(i,j): Sequence Similarity
TS(i,j): Turn Similarity

sc: Secondary Structure Similarity Coefficient
ac: Sequence Similarity Coefficient
tc: Turn Similarity Coefficient

$$ S(i,j) = sc \cdot SSS(i,j) + ac \cdot SS(i,j) + tc \cdot TS(i,j) $$


Secondary Structure Similarity

$$ SSS(i,j) = \sum_{k=1}^{3} S(T(i),k)\, P(k,j) $$

T(i): secondary structure of the template at position i
P(., j): secondary structure profile of the target at position j
sc: secondary structure similarity coefficient, the weight of SSS(i,j) in S(i,j)

Secondary Structure Similarity Matrix* S

       H     E     L
H      2   -15    -4
E    -15     4    -4
L     -4    -4     2

Target profile from secondary structure prediction servers (example for three positions predicted H, H, L):

       H     H     L
H:   0.7   0.5   0.0
E:   0.2   0.4   0.3
L:   0.1   0.1   0.6

*Wallqvist, A. et al. Bioinformatics. 2000 Nov;16(11):988-1002.
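As a small worked example, the sketch below evaluates the secondary structure similarity of one template position against one column of a predicted target profile, as the profile-weighted sum over the three states H, E and L. It returns the unweighted sum; multiplying by the coefficient sc is left to the caller. The function name `sss` is hypothetical.

```python
# Secondary structure similarity matrix from the slide (rows: template state).
SS_MATRIX = {
    "H": {"H": 2, "E": -15, "L": -4},
    "E": {"H": -15, "E": 4, "L": -4},
    "L": {"H": -4, "E": -4, "L": 2},
}


def sss(template_state, target_profile):
    """Profile-weighted similarity between a template state and a target column."""
    return sum(
        SS_MATRIX[template_state][state] * probability
        for state, probability in target_profile.items()
    )


# First profile column from the slide: H 0.7, E 0.2, L 0.1.
first_column = {"H": 0.7, "E": 0.2, "L": 0.1}
print(sss("H", first_column))  # 2*0.7 - 15*0.2 - 4*0.1
```

Even with a 0.7 helix probability the large H/E mismatch penalty (-15) pulls the score for a helical template position below zero, which shows how strongly this matrix discourages pairing helices with possible strands.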


Sequence Similarity

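Multiple sequence alignment of the template protein's family*:

...ALVKLI...
...A-IEII...
...AL-KLI...

$$ S(i) = -\sum_{a=1}^{20} P(a,i) \log P(a,i) $$

$$ C(i) = \frac{1}{1 + S(i)} $$

$$ SS(i,j) = C(i) \cdot BLOSUM62(i,j) $$

S(i): entropy at position i of the template
P: family profile matrix (P(a,i) is the frequency of amino acid a at position i)
C(i): conservation score at position i of the template
ac: sequence similarity coefficient, the weight of SS(i,j) in S(i,j)

*Glaser, F. et al. Bioinformatics 19:163-164 (2003)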
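The entropy and conservation scores can be sketched directly from the columns of a multiple sequence alignment. Two details here are assumptions not stated in the slides: gap characters are ignored when computing frequencies, and the natural logarithm is used. Function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy of the residue frequencies in one alignment column."""
    residues = [c for c in column if c != "-"]  # assumption: gaps are ignored
    total = len(residues)
    counts = Counter(residues)
    return -sum((k / total) * math.log(k / total) for k in counts.values())


def conservation(column):
    """Conservation score C = 1 / (1 + S): 1 for a fully conserved column."""
    return 1.0 / (1.0 + column_entropy(column))


# The alignment fragment shown on the slide.
msa = ["ALVKLI", "A-IEII", "AL-KLI"]
scores = [conservation(column) for column in zip(*msa)]
print([round(score, 3) for score in scores])
```

Fully conserved columns such as the first (A, A, A) get a conservation score of 1, so their BLOSUM62 contribution is kept at full weight, while variable columns are down-weighted.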


Turn Similarity

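$$ TS(i,j) = 4 \cdot T(i) \cdot P(T,j) $$

T(i) = 1 if template position i is a turn (T); 0 otherwise
P(., j): turn profile of the target at position j
tc: turn similarity coefficient, the weight of TS(i,j) in S(i,j)

Target profile from turn prediction servers (example for three positions predicted T, T, N):

       T     T     N
T:   0.7   0.5   0.0
N:   0.3   0.5   1.0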


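Gap Penalties

Gap aligned with a loop residue:

...L...
...-...

$$ go \rightarrow \tfrac{2}{3}\, go, \qquad ge \rightarrow \tfrac{2}{3}\, ge $$

Gap aligned with a helix or strand residue:

...H/E...
... - ...

gapSec = 20

And vice versa...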


Filtering

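o For each of the motif alignments, get the 25 best alignments
o Build a connectivity map of the template protein and thread it onto the target.

$$ \Gamma(i,j) = \begin{cases} 1 & \text{if } i \neq j \text{ and } R_{ij} \le 7.3\ \text{Å} \\ 0 & \text{if } i \neq j \text{ and } R_{ij} > 7.3\ \text{Å} \\ -\sum_{k,\, k \neq i} \Gamma(i,k) & \text{if } i = j \end{cases} $$

$$ CS = cs \sum_{i,\, j \neq i} \Gamma(i,j)\, J(i,j) $$

Γ: Kirchhoff matrix
R_ij: distance between residues i and j
J: Jernigan Contact Potential Matrix*

Get the best 25 alignments according to the score:

$$ TS = S + CS $$

*Miyazawa S, Jernigan R L. (1985) Macromolecules 18:534–552.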
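The connectivity map and the contact score can be sketched as below, assuming one representative 3-D coordinate per residue, the 7.3 Å cutoff from the slide, off-diagonal entries of 1 for contacts, and a sum over all ordered off-diagonal pairs (so each contact is counted twice). The toy contact energies are illustrative values, not entries of the Jernigan table, and the function names are hypothetical.

```python
import math


def kirchhoff(coords, cutoff=7.3):
    """Connectivity (Kirchhoff) matrix: 1 for contacts, minus the row sum on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = 1.0
        gamma[i][i] = -sum(gamma[i][k] for k in range(n) if k != i)
    return gamma


def contact_score(residues, coords, potential, cs=1.0, cutoff=7.3):
    """cs times the sum of contact potentials over all off-diagonal contacts."""
    gamma = kirchhoff(coords, cutoff)
    n = len(residues)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = frozenset((residues[i], residues[j]))
                total += gamma[i][j] * potential[pair]
    return cs * total


# Toy potential (illustrative values only) keyed by unordered residue pair.
toy_potential = {
    frozenset(("L",)): -1.0,
    frozenset(("L", "K")): -0.2,
    frozenset(("K",)): 0.1,
}
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(kirchhoff(toy_coords)[0])
print(contact_score("LLK", toy_coords, toy_potential))
```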


RESULTS

To test our program, we chose 3 families from the ASTRAL40* protein list:
- Citrate synthase: 1csh, 1iomA, 1k3pA
- Methionine aminopeptidase: 1b6a, 1xgsA
- Methyltransferase: 1fp2A, 1fp1D

Testing measure: RMSD between the predicted and actual structure of the target.
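For reference, RMSD over paired coordinates can be computed as in the sketch below. It assumes the two structures are already superimposed and that atoms are listed in the same order; an optimal superposition step (for example the Kabsch algorithm) is not included, and the slides do not say which atoms were compared.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3-D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
actual = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(predicted, actual))  # sqrt((0 + 4) / 2)
```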


RESULTS

For all the experiments done, our algorithm perfectly matched the functional sites and motifs given as input to the program.

Comparison          RMSD
1csh vs 1iomA       2.50
1csh vs 1k3pA       2.12
1k3pA vs 1iomA      3.03
1b6a vs 1xgsA       2.23
1fp2A vs 1fp1D      2.98

On average over the 5 experiments, the best result was RMSD = 2.57, obtained with ac: 0.4, sc: 0.4, tc: 0.2, cc: 0.


User Interface of TURALIGN


DOMAIN INTERACTIONS