Protein Structure Similarity
![Page 1: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/1.jpg)
1
Protein Structure Similarity
![Page 3: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/3.jpg)
3
Computation of Best Matches
Two “simultaneous” subproblems:
• Find maximal correspondence set C
• Find alignment transform T (only requires computing 6 parameters)
Chicken-and-egg issue: each subproblem is relatively simple:
– If we knew C, we could compute T
– If we knew T, we could get C by proximity
But the combination is hard!
![Page 4: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/4.jpg)
4
Find Alignment Transform
Two sets of points $A = \{a_1,\ldots,a_n\}$ and $B = \{b_1,\ldots,b_n\}$
Correspondence pairs $(a_i, b_i)$
Find $T = \arg\min_T \mathrm{RMSD}(A, T(B))$
O(n) closed-form solution
[Arun, Huang, and Blostein, 87] [Horn, 87] [Horn, Hilden, and Negahdaripour, 88]
![Page 5: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/5.jpg)
5
O(n) SVD-Based Algorithm
T combines translation t and rotation R, such that $T(b_i) = t + R(b_i)$
$\bar{b} = \left(\sum_{i=1}^{n} b_i\right)/n$ [mean of the $b_i$'s]
Place the origin of the coordinate system at $\bar{b}$
$\min_T \mathrm{RMSD}(A, T(B))$ simplifies to (up to some constants):
$$\min_{t,R} \; \sum_{i=1}^{n} \|a_i - t\|^2 \;-\; 2 \sum_{i=1}^{n} \langle a_i, R(b_i) \rangle$$
t and R can be computed separately
$t = \bar{a}$ [mean of the $a_i$'s]
[Arun, Huang, and Blostein, 87]
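A short worked expansion may help show why the translation and the rotation decouple. It is only a sketch under the slide's assumption that the origin sits at the mean of the $b_i$'s (so $\sum_i b_i = 0$); the norm and inner-product notation is added here for illustration.

```latex
\begin{aligned}
\sum_{i=1}^{n}\|a_i - t - R\,b_i\|^2
 &= \sum_{i}\|a_i - t\|^2 - 2\sum_{i}\langle a_i - t,\, R\,b_i\rangle + \sum_{i}\|R\,b_i\|^2 \\
 &= \sum_{i}\|a_i - t\|^2 - 2\sum_{i}\langle a_i,\, R\,b_i\rangle
    + 2\Big\langle t,\, R\sum_{i} b_i\Big\rangle + \sum_{i}\|b_i\|^2 \\
 &= \sum_{i}\|a_i - t\|^2 - 2\sum_{i}\langle a_i,\, R\,b_i\rangle + \text{const.}
\end{aligned}
```

The first term depends only on t and is minimized by the mean of the $a_i$'s; the second depends only on R, which is why the two can be computed separately.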
![Page 6: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/6.jpg)
6
O(n) SVD-Based Algorithm
$A_{3\times n} = [a_1-\bar{a}, \ldots, a_n-\bar{a}]$
$B_{3\times n} = [b_1-\bar{b}, \ldots, b_n-\bar{b}]$
Compute the SVD of the 3×3 correlation matrix $AB^T$: $AB^T = UDV^T$, where D is a diagonal matrix with decreasing non-negative entries (singular values) along the diagonal (with $T(b_i) = t + R(b_i)$, the product must be taken in this order for $R = USV^T$ to rotate B onto A)
If det(U)det(V) = 1 then S = I, else S = diag(1,1,-1)
$R = USV^T$
[Arun, Huang, and Blostein, 87]
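The SVD-based steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `align_transform` and `rmsd` are hypothetical, points are stored as rows of (n, 3) arrays, and the correlation matrix is formed as A Bᵀ in the slide's 3×n notation, which is the ordering for which R = U S Vᵀ rotates B onto A. Because the sketch works in the original coordinates instead of re-centering at the mean of B, the translation is returned as ā − R b̄.

```python
import numpy as np

def align_transform(A, B):
    """Return (R, t) minimizing sum_i ||a_i - (t + R b_i)||^2.

    A and B are (n, 3) arrays; row i of A is paired with row i of B.
    """
    a_mean = A.mean(axis=0)
    b_mean = B.mean(axis=0)
    Ac = (A - a_mean).T                    # 3 x n matrix of centered a_i
    Bc = (B - b_mean).T                    # 3 x n matrix of centered b_i
    U, D, Vt = np.linalg.svd(Ac @ Bc.T)    # SVD of the 3 x 3 correlation matrix
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # flip the last axis to avoid a reflection
    R = U @ S @ Vt
    t = a_mean - R @ b_mean                # translation in the original coordinates
    return R, t

def rmsd(A, B):
    """Root-mean-square deviation between paired rows of A and B."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))

# Usage: recover a known rotation about the z-axis and a known translation.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
A = B @ R_true.T + t_true
R, t = align_transform(A, B)
print(rmsd(A, B @ R.T + t))
```

The cost is linear in n: building the 3×3 matrix takes one pass over the points, and the SVD itself is constant-size.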
![Page 8: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/8.jpg)
8
[Arun, Huang, and Blostein, 87] rotation matrix
[Horn, 87] quaternion
![Page 9: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/9.jpg)
9
Trial-and-Error Approach to Protein Structure Comparison
Guess small correspondence set → Compute T → Apply T → Update correspondence set (correspondence from proximity) → Compute T → …
![Page 10: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/10.jpg)
10
Trial-and-Error Approach to Protein Structure Comparison
1. Set CS to a seed correspondence set (a small set sufficient to generate an alignment transform)
2. Compute the alignment transform T for CS and apply T to the second protein B
3. Update CS to include all pairs of features that are close to each other
4. If CS has changed, then return to Step 2; else return (CS,T)
![Page 11: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/11.jpg)
11
Trial-and-Error Approach to Protein Structure Comparison
- result = nil
- Iterate N times:
1. Set CS to a seed correspondence set (a small set sufficient to generate an alignment transform)
2. Compute the alignment transform T for CS and apply T to the second protein B
3. Update CS to include all pairs of features that are close to each other
4. If CS has changed, then return to Step 2; else result ← result ∪ {(CS,T)}
- Return result
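The loop above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the method of any particular tool: features are plain 3-D points, "close" means within a hypothetical `radius` of the nearest transformed point, seeds are supplied by the caller as lists of at least three index pairs, and the names `fit`, `refine`, and `trial_and_error` are invented for this example.

```python
import numpy as np

def fit(A, B):
    """Least-squares rigid transform (R, t) such that A ~ B @ R.T + t (SVD method)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    R = U @ S @ Vt
    return R, ca - R @ cb

def refine(A, B, seed, radius=1.0, max_iter=50):
    """Steps 2-4: alternate between computing T and updating CS until CS is stable."""
    cs = sorted(seed)
    for _ in range(max_iter):
        ia, ib = zip(*cs)
        R, t = fit(A[list(ia)], B[list(ib)])          # Step 2: transform for CS
        TB = B @ R.T + t                              # apply T to the second structure
        d = np.linalg.norm(A[:, None, :] - TB[None, :, :], axis=2)
        # Step 3: pair each a_i with its nearest transformed b_j if close enough
        new_cs = sorted((i, int(d[i].argmin()))
                        for i in range(len(A)) if d[i].min() <= radius)
        if new_cs == cs or len(new_cs) < 3:           # Step 4: stop when CS is unchanged
            break
        cs = new_cs
    return cs, (R, t)

def trial_and_error(A, B, seeds, radius=1.0):
    """Run the refinement once per seed and collect every (CS, T) result."""
    return [refine(A, B, seed, radius) for seed in seeds]

# Usage: B is a rigidly moved copy of A; three correct seed pairs recover everything.
rng = np.random.default_rng(1)
B = rng.normal(scale=10.0, size=(15, 3))
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
A = B @ R_true.T + np.array([3.0, -1.0, 2.0])
result = trial_and_error(A, B, seeds=[[(0, 0), (1, 1), (2, 2)]])
cs, (R, t) = result[0]
print(len(cs))
```

Real tools add constraints that this sketch omits, such as one-to-one matching and preservation of backbone order.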
![Page 12: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/12.jpg)
12
How to get seed correspondences?
![Page 13: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/13.jpg)
13
Seed Generation from Fragment
1. From distance matrices, e.g., DALI [Holm and Sander, 1996]
![Page 14: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/14.jpg)
14
Using Distance Matrices (DALI)
Distances are invariant to rigid-body transformations
DALI [Holm and Sander, 1996] looks for similar hexapeptides by searching for similar 7x7 Cα–Cα distance matrices
[Figure: example distance matrix]
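The invariance claim is easy to check numerically. The sketch below is only an illustration of the property, not of DALI itself: the fragment coordinates are made up, and `distance_matrix` and `rigid` are hypothetical helper names.

```python
import math

def distance_matrix(points):
    """All pairwise Euclidean distances between the given 3-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def rigid(points, angle, shift):
    """Rotate about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]

# A made-up 7-point fragment (coordinates are illustrative only).
frag = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.0, 1.5),
        (9.0, 7.2, 3.0), (12.5, 7.9, 2.0), (13.0, 11.0, 4.1)]
moved = rigid(frag, 1.1, (4.0, -2.0, 7.5))

D1, D2 = distance_matrix(frag), distance_matrix(moved)
max_diff = max(abs(D1[i][j] - D2[i][j]) for i in range(7) for j in range(7))
print(max_diff)  # only floating-point noise remains
```

Because the matrices agree regardless of placement, fragments can be compared before any alignment transform is known, which is what makes them usable as seeds.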
![Page 15: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/15.jpg)
15
Seed Generation from Fragment
1. From distance matrices, e.g., DALI [Holm and Sander, 1996]
2. From secondary structure elements (SSEs), e.g., LOCK [Singh and Brutlag, 1997]
3. From voting scheme (using geometric hashing), e.g., 3dSEARCH [Singh and Brutlag, 2000]
![Page 16: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/16.jpg)
16
LOCK
A.P. Singh and D.L. Brutlag. Hierarchical Protein Structure Superposition Using Both Secondary and Atomic Representations. Proc. ISMB, pp. 284-293, 1997.
LOCK2: J. Shapiro and D.L. Brutlag. FoldMiner: Structural Motif Discovery Using an Improved Superposition Algorithm. Protein Science, 13:278-294, 2004.
http://motif.stanford.edu/lock2/
![Page 17: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/17.jpg)
17
LOCK
Two levels of features: SSEs and Cα atoms
Stage 1 (SSE alignment): Initial alignment is computed using SSEs represented as vectors
Stage 2 (atom alignment): Alignment is refined using Cα atoms represented as points
![Page 18: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/18.jpg)
18
Rationale for LOCK
• Using types of features is an effective way to reduce combinatorial explosion and computation
• SSEs, which are responsible for most of the stability and functionality of proteins, are more meaningful and better conserved than types of atoms and amino acids
• If 2 structures are similar, some of their SSEs should form similar substructures
• Drawback: It narrows down the set of possible applications, e.g., it can’t find small motifs at the atomic level
![Page 19: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/19.jpg)
19
Vector-Based Representation
α-helices
β-strands
loops
One vector per SSE (helix, strand, loop)
![Page 20: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/20.jpg)
20
Vector-Based Representation
DSSP [Kabsch and Sander, 1983] classifies residues into helices/strands
For an α-helix starting at residue i:
$$X_{\text{origin}} = (0.74\,X_i + X_{i+1} + X_{i+2} + 0.74\,X_{i+3})/3.48$$
where $X_i$ is the position of the Cα atom of residue i (the angle between two consecutive residues is 100°, hence the factor 0.74)
Similar computation for $X_{\text{end}}$ and for β-strands
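The weighted average can be tried on an idealized helix to see why the weights work. In this sketch the helix geometry (radius 2.3 Å, rise 1.5 Å per residue, 100° per residue) consists of approximate textbook values assumed for illustration, and `helix_origin` is a hypothetical name; only the weights 0.74, 1, 1, 0.74 and the divisor 3.48 come from the slide.

```python
import math

def helix_origin(ca, i):
    """Weighted average of the C-alpha positions of residues i..i+3."""
    weights = (0.74, 1.0, 1.0, 0.74)        # weights from the slide; they sum to 3.48
    return tuple(
        sum(w * ca[i + k][axis] for k, w in enumerate(weights)) / 3.48
        for axis in range(3)
    )

# Idealized alpha-helix along the z-axis (assumed geometry, for illustration only).
RADIUS, RISE, TURN = 2.3, 1.5, math.radians(100.0)
ca = [(RADIUS * math.cos(k * TURN), RADIUS * math.sin(k * TURN), k * RISE)
      for k in range(8)]

origin = helix_origin(ca, 0)
off_axis = math.hypot(origin[0], origin[1])
print(off_axis, origin[2])
```

With a 100° turn per residue the four weighted positions nearly cancel in the plane perpendicular to the axis, so the result lands within a few thousandths of an ångström of the axis, at the height midway between residues i and i+3.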
![Page 21: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/21.jpg)
21
Scoring Similarity
Assume that i and p have been aligned. What is the score of the alignment of k and r?
Position-independent differences: |angle(i,k)-angle(p,r)|, |angle(i,j)-angle(p,q)|, |angle(j,k)-angle(q,r)|, |distance(i,k)-distance(p,r)|, |length(k)-length(r)|
Position-dependent differences: angle(k,r), distance(k,r)
Each difference $d_i$ is scored by
$$S(d_i) = \frac{2M_i}{1+(d_i/d_i^0)^2} - M_i$$
where $M_i$ is the maximal score and $d_i^0$ is the value of $d_i$ for which the score is 0
Scores are additive: $\text{Score} = \sum_i S(d_i)$
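The scoring function is simple enough to evaluate directly. The sketch below assumes the form S(d) = 2M / (1 + (d/d0)²) − M, which matches the two annotations on the slide (maximal score M at d = 0, score 0 at d = d0); the function names and the numeric values are illustrative only.

```python
def score(d, d0, M):
    """Score of one difference d: M at d = 0, zero at d = d0, negative beyond."""
    return 2.0 * M / (1.0 + (d / d0) ** 2) - M

def total_score(terms):
    """Scores are additive: sum over (d, d0, M) triples."""
    return sum(score(d, d0, M) for d, d0, M in terms)

# Illustrative values: an angle difference and a distance difference.
terms = [(0.0, 30.0, 10.0),   # perfect agreement, contributes M = 10
         (4.0, 4.0, 5.0)]     # difference equal to d0, contributes 0
print(total_score(terms))
```

Differences larger than d0 contribute negative scores, which is what allows a search to terminate a path once a new correspondence stops paying for itself.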
![Page 22: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/22.jpg)
22
Stage 1: SSE Alignment
1. For every pair of SSE vectors of protein A, find all pairs of vectors in B that align well using orientation-independent scores → seed correspondence sets
2. For each correspondence set:
– Find alignment transform (e.g., using start, middle, and end points of vectors) and apply it to B
– Find correspondence set with maximal score
(record transform T and correspondence set CS that yields maximal score)
![Page 23: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/23.jpg)
23
Stage 1: SSE Alignment
A = (i, j, k, l, m), B = (p, q, r, s, t)
Seed correspondence {(i,p),(j,q)}
[Figure: search tree rooted at the seed (i,p), (j,q); each branch extends the correspondence set with further pairs such as (k,r), (k,s), (k,t), (l,r), (l,s), (l,t), (m,r), (m,s), (m,t)]
• Simultaneous gaps in both structures are not allowed (not in SCOP2)
• Terminate a path when score of new correspondence is negative
• Re-compute new transform with each new correspondence (?)
![Page 24: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/24.jpg)
24
Stage 2: Atom (Core) Alignment
1. Construct correspondence pairs of atoms:
– Atom i of A corresponds to atom j of T(B) iff i is the closest atom in A to j and j is the closest atom in T(B) to i
– The distance between i and T(j) is below a threshold (3 Å)
2. Prune correspondence set to largest subset of correspondence pairs that follow backbone alignment constraint
3. Re-compute T to be the transform that minimizes the RMSD of the atoms in the correspondence set
4. Iterate 1-2-3 until RMSD converges
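Step 1 (mutual nearest neighbours within a distance cutoff) can be sketched in plain Python. This is a brute-force illustration with invented coordinates; the name `mutual_nearest_pairs` is hypothetical, the 3 Å cutoff is taken from the slide, and the backbone-order pruning of step 2 is not shown.

```python
import math

def mutual_nearest_pairs(A, TB, max_dist=3.0):
    """Pairs (i, j) where A[i] and TB[j] are each other's nearest atom and close enough."""
    nearest_b = [min(range(len(TB)), key=lambda j: math.dist(a, TB[j])) for a in A]
    nearest_a = [min(range(len(A)), key=lambda i: math.dist(A[i], b)) for b in TB]
    return [(i, j) for i, j in enumerate(nearest_b)
            if nearest_a[j] == i and math.dist(A[i], TB[j]) <= max_dist]

# Illustrative atoms: TB stands for the second structure after applying T.
A = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
TB = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

pairs = mutual_nearest_pairs(A, TB)
print(pairs)  # → [(0, 0), (1, 1)]
```

Here atom 2 of `TB` is closest to atom 1 of `A`, but that atom prefers atom 1 of `TB`, so no pair is formed; the last atoms are mutual nearest neighbours but lie 10 Å apart and are rejected by the cutoff.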
![Page 25: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/25.jpg)
25
Experimental Results
685 protein structures from PDB such that each pair has less than 25% sequence identity
3 families of folds (based on SCOP classification):
- myoglobins (11 structures), ~20% amino acid identity
- TIM barrels (50 structures)
- immunoglobulins (38 structures)
Goal: Given one query protein in each family, find the other members of the family (3×685 = 2055 alignments)
Method: For each query, sort the 685 structures by score (computed by LOCK). Select the top k proteins. Count members of family (true positives) and non-members (false positives)
![Page 26: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/26.jpg)
26
| Family | # True positives | # False positives |
|---|---|---|
| Myoglobins (11) | 11 | 0 |
| TIM-barrels (50) | 40 | 0 |
| | 45 | 1 |
| | 50 | 5 |
| Immunoglobulins (38) | 20 | 0 |
| | 25 | 1 |
| | 30 | 2 |
| | 35 | 11 |
| | 38 | 383 |
![Page 27: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/27.jpg)
27
Alignment of 11 Myoglobins
![Page 28: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/28.jpg)
28
Alignment of 50 TIM barrels
α-helices in red, β-strands in yellow
![Page 29: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/29.jpg)
29
Alignments of 31 Immunoglobulins
Only β-strands are shown
![Page 30: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/30.jpg)
30
ROC Curves
![Page 31: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/31.jpg)
31
Running Time
~ 1ms per seed correspondence
~ 1h to search 10,000 protein structures
~ 100s of days to compare all pairs of proteins in PDB
Geometric hashing to speed up stage 1
![Page 32: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/32.jpg)
32
Seed Generation from Fragment
1. From distance matrices, e.g., DALI [Holm and Sander, 1996]
2. From secondary structure elements (SSEs), e.g., LOCK [Singh and Brutlag, 1997]
3. From voting scheme (using geometric hashing), e.g., 3dSEARCH [Singh and Brutlag, 2000]
![Page 33: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/33.jpg)
33
Voting Scheme with Hash Table
Many-to-many comparison requires a better organization of computation to avoid repeating the same computation again and again
Pre-computation: Index proteins in hash table
Query phase: Voting scheme using hash table
Several variants on this theme:
– 3d-Lookup [Holm and Sander, 1995]
– 3dSEARCH [Singh 2002]
![Page 35: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/35.jpg)
35
Indexing Target Structures in Hash Table (3dSEARCH [Singh 2002])
Hash table: 3-D regular grid of cubic bins (~2Å)
For each target structure
For each pair of vectors (i,j)
1. Compute a coordinate system
2. Place an entry for each other vector k into the bin containing the coordinates of the midpoint of the vector (or average of coordinates of origin, middle, and end points). Store ID of coordinate system + k’s orientation and type (α or β) in the entry.
![Page 36: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/36.jpg)
36
[Figure: coordinate system with axes u and v overlaid on the grid]
Grid is same for all coordinate systems
![Page 38: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/38.jpg)
38
Indexing Target Structures in Hash Table (3dSEARCH [Singh 2002]), continued
• The grid is a sparsely occupied hash table
• A structure with n SSEs contributes n(n-1)(n-2) entries; each vector is represented (n-1)(n-2) times
• 10,000 structures with 10 SSEs each yield ~7M entries (10·9·8 = 720 entries per structure)
![Page 39: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/39.jpg)
39
Voting Using Hash Table
Given a query structure
For each pair of vectors (i,j)
1. Compute a coordinate system
2. For each other vector k
a. Retrieve the bin accessed by this vector and the neighboring bins
b. For every entry (vector) in those bins that has the same orientation and type as k, add a vote for the coordinate system stored in the entry
Sort target structures based on max number of votes received by any of its coordinate systems
→ Small number of target structures. Use LOCK for better alignment
Hours of pure LOCK are reduced to seconds
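The indexing and voting phases can be sketched together in plain Python. This is a toy illustration, not 3dSEARCH: each SSE is a `(start, end, type)` triple, the coordinate-system convention in `frame` is one arbitrary choice among many, only the midpoint and type of each vector are stored (orientation is omitted for brevity), and all names and coordinates are invented. Degenerate pairs, where the second midpoint lies on the first vector's axis, are not handled.

```python
import math
from collections import defaultdict
from itertools import permutations, product

BIN = 2.0  # bin edge length in angstroms, as on the slide

def sub(p, q): return tuple(a - b for a, b in zip(p, q))
def dot(p, q): return sum(a * b for a, b in zip(p, q))
def cross(p, q):
    return (p[1]*q[2] - p[2]*q[1], p[2]*q[0] - p[0]*q[2], p[0]*q[1] - p[1]*q[0])
def unit(p):
    n = math.sqrt(dot(p, p))
    return tuple(a / n for a in p)
def mid(sse):
    return tuple((a + b) / 2 for a, b in zip(sse[0], sse[1]))

def frame(si, sj):
    """Orthonormal frame from an ordered pair of SSE vectors (assumed convention)."""
    origin, u = mid(si), unit(sub(si[1], si[0]))
    r = sub(mid(sj), origin)
    v = unit(sub(r, tuple(dot(r, u) * a for a in u)))   # part of r orthogonal to u
    return origin, (u, v, cross(u, v))

def entries(sses):
    """Yield ((i, j), bin, type) for each ordered pair (i, j) and each other vector k."""
    for i, j in permutations(range(len(sses)), 2):
        origin, axes = frame(sses[i], sses[j])
        for k, sk in enumerate(sses):
            if k not in (i, j):
                r = sub(mid(sk), origin)
                yield (i, j), tuple(math.floor(dot(r, ax) / BIN) for ax in axes), sk[2]

def build_index(targets):
    """Pre-computation: hash every entry of every target structure by its bin."""
    table = defaultdict(list)
    for name, sses in targets.items():
        for pair, b, kind in entries(sses):
            table[b].append((name, pair, kind))
    return table

def vote(table, query):
    """Query phase: rank targets by the max votes of any of their coordinate systems."""
    by_frame, best = defaultdict(list), defaultdict(int)
    for pair, b, kind in entries(query):
        by_frame[pair].append((b, kind))
    for items in by_frame.values():
        votes = defaultdict(int)
        for b, kind in items:
            for off in product((-1, 0, 1), repeat=3):   # the bin and its neighbors
                nb = tuple(x + o for x, o in zip(b, off))
                for name, tpair, tkind in table.get(nb, ()):
                    if tkind == kind:
                        votes[(name, tpair)] += 1
        for (name, _), n in votes.items():
            best[name] = max(best[name], n)
    return sorted(best.items(), key=lambda kv: -kv[1])

# Usage: "same" is a rigidly moved copy of the query; "other" has different SSE types.
query = [((0, 0, 0), (0, 0, 10), "H"), ((8, 0, 0), (8, 0, 10), "H"),
         ((0, 9, 2), (6, 9, 2), "H"), ((12, 5, -3), (12, 11, 1), "H"),
         ((-4, -6, 7), (2, -6, 11), "H")]
def move(p):  # 90-degree rotation about z followed by a translation
    return (-p[1] + 10, p[0] - 5, p[2] + 3)
same = [(move(s), move(e), k) for s, e, k in query]
other = [(s, e, "E") for s, e, _ in query]
ranking = vote(build_index({"same": same, "other": other}), query)
print(ranking[0][0])
```

Because every entry is expressed in a frame attached to the structure itself, a rigidly moved copy falls into the same bins (up to rounding, which the neighbor lookup absorbs), so no alignment transform is needed during the query.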
![Page 40: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/40.jpg)
40
Advantages of Voting System
• Very efficient in practice for many-to-many comparisons
• Can establish correspondence between partial, disconnected substructures
• Parallel implementation is straightforward
• Independent of the order in which vectors are considered
• Drawback (?): May establish correspondences that do not satisfy the backbone sequence constraint
![Page 41: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/41.jpg)
41
Problem #4: Find Pharmacophore in Ligands
Given: Collection of N (= 5 to 10) small flexible ligands with similar activity (binding at same sites)
[Figure: Benzamidine binding to beta-Trypsin (3ptb)]
[Figure: Inhibitor binding to HIV protease]
![Page 42: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/42.jpg)
42
![Page 44: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/44.jpg)
44
Problem #4: Find Pharmacophore in Ligands
Given:
• Collection of N (= 5 to 10) small flexible ligands with similar activity (binding at same sites)
• A set of low-energy conformations (dozens to a few hundred) for each ligand
Find a substructure (pharmacophore) that has a match in at least one conformation of each ligand
![Page 45: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/45.jpg)
45
![Page 46: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/46.jpg)
46
[Figure: ligand structure]
![Page 47: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/47.jpg)
47
[Figure: ligand structure]
![Page 48: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/48.jpg)
48
[Figure: two ligand structures with the shared pharmacophore highlighted]
![Page 49: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/49.jpg)
49
Pharmacophore and Rational Drug Design
Pharmacophore identification is a form of “reverse engineering” to get a model of a binding site
A pharmacophore can be used to modify ligands into more potent drugs and/or to screen large databases of ligands for “leads”
![Page 50: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/50.jpg)
50
Three Simultaneous Problems
Conformations? Correspondence? Transform?
But ligands are small molecules
![Page 51: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/51.jpg)
51
Software
• DISCO [Martin et al., 1993]
• DISCOtech and GASP [Tripos, Inc.]
• CATALYST and HIPHOP [Accelrys et al.; Green et al., 1994; Barnum et al., 1996]
• RAPID: P.W. Finn, L.E. Kavraki, J.C. Latombe, R. Motwani, C. Shelton, S. Venkatasubramanian, and A. Yao. RAPID: Randomized Pharmacophore Identification for Drug Design. Computational Geometry: Theory and Applications, 10, pp. 263-272, 1998
![Page 52: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/52.jpg)
52
[Figure: ligands M1, M2, M3]
![Page 53: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/53.jpg)
53
Pairwise Comparison
Multi-Probe({M1,…,MN})
1) Extract invariants from M1 and M2 by calling Pair-Probe(P1,P2) on every pair of conformations of the two ligands
2) Test each candidate invariant S obtained at Step 1 against every ligand Mi, i = 3,…,N by calling Pair-Probe(S,P) on S and each conformation P of Mi
![Page 54: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/54.jpg)
54
Pair-Probe
n: smallest number of atoms/features in a ligand
α: given constant (0 < α ≤ 1)
P1 and P2: conformations of two distinct ligands (or a candidate invariant)
Pair-Probe(P1,P2)
Perform s times:
1) Pick a triplet of atoms at random from P1
2) Determine three atoms in P2 congruent to this triplet; compute the alignment transform T
3) Iterate: Apply T to P2; determine the atoms in P1 matching those in P2; update T
4) If the number of matching atoms exceeds αn, then return this atom set as a candidate invariant S
![Page 55: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/55.jpg)
55
Magnitude of s
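$\Pr[\text{picking 3 atoms in invariant}] \ge \alpha^3$
$\Pr[\text{failing to find invariant}] \le (1-\alpha^3)^s$
We want: $(1-\alpha^3)^s \le \gamma$ (γ is the acceptable probability of failure)
$\Rightarrow s \ge \ln(\gamma)/\ln(1-\alpha^3)$
Since $x < -\ln(1-x)$ for $0 < x < 1$, it suffices to take:
$s \ge \ln(1/\gamma)/\alpha^3$
For $\gamma = 10^{-2}$ and $\alpha = 0.3$, the bound gives $s \ge 171$, so $s \approx 180$ trials suffice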
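The bound on the number of trials can be evaluated numerically. In this sketch the names `alpha` (fraction of a ligand's atoms belonging to the invariant) and `gamma` (acceptable probability of failure) are labels chosen here for the two constants in the derivation, and both function names are hypothetical.

```python
import math

def trials_exact(alpha, gamma):
    """Smallest integer s with (1 - alpha^3)^s <= gamma."""
    return math.ceil(math.log(gamma) / math.log(1.0 - alpha ** 3))

def trials_bound(alpha, gamma):
    """Simpler sufficient value s >= ln(1/gamma) / alpha^3."""
    return math.ceil(math.log(1.0 / gamma) / alpha ** 3)

s_exact = trials_exact(0.3, 1e-2)
s_bound = trials_bound(0.3, 1e-2)
print(s_exact, s_bound)  # → 169 171
```

Both values are slightly below the figure of about 180 quoted on the slide, so 180 trials comfortably meet the 1% failure target. The cubic dependence on `alpha` dominates: halving `alpha` multiplies the number of trials by roughly eight.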
![Page 56: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/56.jpg)
56
Some Results
1TLP, 4TMN, 5TMN, 6TMN
63 to 69 atoms with 10 to 15 torsional degrees of freedom
Feature: every non-H atom → ~30 features of 6 types (atom types)
Invariant in active conformations: 7-atom pharmacophore + 7-atom scaffolding

| #conf | t(s) | #4 | #5 | #6 | #7 | #8 | #9 | #10 | #11 | #12 | #13 | #14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 800 | 44 | 20 | 10 | 5 | 2 | 1 | 0 | 0 | 1 | 0 | 0 |
![Page 57: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/57.jpg)
57
Fuel for Thoughts
![Page 58: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/58.jpg)
58
Idea: Many-to-many correspondence may be more robust
Example: Hausdorff distance [Huttenlocher et al., 1993]
![Page 59: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/59.jpg)
59
Hausdorff Distance
Two sets of points $A = \{a_1,\ldots,a_n\}$ and $B = \{b_1,\ldots,b_m\}$ in $\mathbb{R}^k$
$d_H(A,B) = \max_{a \in A} \min_{b \in B} \|a-b\|$
$D_H(A,B) = \max\{d_H(A,B),\, d_H(B,A)\}$
Variation for shape similarity: $\Delta_H(A,B) = \min_T D_H(A,T(B))$
But efficient algorithms only exist for planar sets of points
[Figure: point sets A and B, with $d_H(A,B)$ marked]
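The two definitions translate directly into a brute-force computation. The sketch below is a quadratic-time illustration for a fixed placement of the two sets; it does not attempt the minimization over transforms T, and the function names and points are invented for the example.

```python
import math

def directed_hausdorff(A, B):
    """d_H(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """D_H(A, B): the symmetric version, the max of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

print(directed_hausdorff(A, B))  # → 0.0
print(directed_hausdorff(B, A))  # → 4.0
print(hausdorff(A, B))           # → 4.0
```

The example shows why the directed distance is asymmetric: every point of A has an exact partner in B, but the extra point of B is 4 units from the nearest point of A. No one-to-one correspondence is required, which is the sense in which the measure is many-to-many.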
![Page 60: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/60.jpg)
60
Other Idea: Minimize cost of transforming A into B
Old idea: Graphics: Morphing distance
Computer vision: Earth Mover’s distance[Rubner, Tomasi, and Guibas, 1998]
Protein similarity: Isotopic distance [Erdmann, 2004]
![Page 61: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/61.jpg)
61
Structure Alignment Isotopies
Two curves are isotopic if one can be deformed into the other without self-collision
Example: Polygonal curve with n vertices
One may think of structure alignment as an isotopy deforming one structure into the other
Two structures are similar if the isotopy is “small”
M.A. Erdmann. Protein Similarity from Knot Theory: Geometric Convolution and Line Weavings, CMU Tech. Rep. CMU-CS-04-138.
![Page 62: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/62.jpg)
62
“Small” Isotopy
Model a structure as a set of polygonal lines (e.g., vertices are Cα atoms)
Two structures A and B are (T,δ)-isotopic if there exists an isotopy deforming A into T(B) in such a way that no vertex of A moves farther than some δ from its initial or final location
[Erdmann 2004]
![Page 63: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/63.jpg)
63
Similarity Measure
dT(A,B) = inf {δ | A is (T,δ)-isotopic to B}
d(A,B) = inf_T dT(A,B)
d is computable [Erdmann, 2004]
But as complex as path planning, hence exponential in the number of degrees of freedom
Possibility of approximating d using probabilistic roadmaps?
![Page 64: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/64.jpg)
64
Topology of Line Weavings
[Figure: helix axes of 1xis and 1nar]
M.A. Erdmann. Protein Similarity from Knot Theory: Geometric Convolution and Line Weavings, CMU Tech. Rep. CMU-CS-04-138.
![Page 65: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/65.jpg)
65
![Page 66: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/66.jpg)
66
2 topologically equivalent line weavings
3 equivalence classes for 4 lines [Erdmann 2004]
![Page 67: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/67.jpg)
67
![Page 68: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/68.jpg)
68
Another (incorrect) alignment of 1xis and 1nar
![Page 69: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/69.jpg)
69
2 non-equivalent line weavings
2 equivalence classes for 3 lines
![Page 70: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/70.jpg)
70
Why is topology interesting?
Two conformations may be geometrically close (small RMSD), yet may require a long continuous deformation to map one into the other (without steric clashes)
![Page 71: Protein Structure Similarity](https://reader034.fdocuments.in/reader034/viewer/2022051517/56815aa2550346895dc82846/html5/thumbnails/71.jpg)
71
Conclusion
• Automatic computation of structure similarity is essential due to the rapid growth of the PDB and other molecule (e.g., ligand) libraries
• As the growth of new protein structures outpaces that of new folds, detecting structural similarity will have to be much more fine-grained than it is today
• Biological discoveries will likely lie in local, possibly rare structure similarities, rather than in global fold-level classification
• Need for better understanding of applications and radically new approaches
• Still a lot of work ...