Identification of Domains using Structural Data
![Page 1: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/1.jpg)
Identification of Domains using Structural Data
Niranjan Nagarajan
Department of Computer Science
Cornell University
![Page 2: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/2.jpg)
Assorted Definitions of Domains
• Subsequences that can fold independently into a stable structure.
• Structurally compact substructures.
• Functionally well-defined building blocks.
• Evolutionarily conserved and reused fragments.
![Page 3: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/3.jpg)
Protein Structural Domain Identification
William R. Taylor
![Page 4: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/4.jpg)
Basic Algorithm
• Initial Assignment of Labels
– Sequential residue numbering
• Update of Labels
• Termination Condition
– Mean squared deviation of average between successive cycles < 10^-6, or number of iterations > (length of protein)/2
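The loop described by these three steps can be sketched roughly as follows. This is only an illustration: `run_label_evolution` and the `update` callable are hypothetical names, and the sketch assumes that "mean squared deviation" means the mean squared change in the labels between two successive cycles, which the slide does not state precisely.

```python
def run_label_evolution(n_residues, update, tol=1e-6):
    """Evolve residue labels until they settle or the cycle cap is exceeded.

    `update(labels, t)` is a hypothetical callable that returns the labels
    for cycle t given the labels of the previous cycle.
    """
    # Initial assignment: sequential residue numbering 1..n.
    labels = [float(i) for i in range(1, n_residues + 1)]
    cycles = 0
    # Stop once the number of iterations exceeds (length of protein)/2.
    while n_residues > 0 and cycles <= n_residues / 2:
        new_labels = update(labels, cycles + 1)
        cycles += 1
        # Assumed convergence measure: mean squared change in the labels.
        msd = sum((a - b) ** 2 for a, b in zip(new_labels, labels)) / n_residues
        labels = new_labels
        if msd < tol:
            break
    return labels, cycles
```

With an `update` that leaves the labels unchanged, the loop stops after a single cycle; with one that never settles, it stops as soon as the cycle count exceeds half the chain length.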
![Page 5: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/5.jpg)
Update Formula
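• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1)\,\mathrm{sign}\big(\sum_j f(S_i^t, S_j^t)\big)$ for all $i$.
• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.
• $f(S_i^t, S_j^t) =$
– $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
– $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
– $0$ otherwise.
• $\mathrm{step}(x) =$
– $1$ if $x < N/2$.
– $2(N-x)/N$ if $N/2 \le x < N$.
– $0$ otherwise.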
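As a rough illustration of one update cycle, the formula can be written out in plain Python. The function names, the coordinate representation (one `(x, y, z)` tuple per residue), and passing `n` in as a parameter are assumptions made for this sketch; the slide does not define `N`, so the caller has to supply it. Distinct residues are assumed to have distinct coordinates, so that `d_ij` is never zero.

```python
import math

def sign(x):
    # 1 if x > 0, -1 if x < 0, 0 if x == 0
    return (x > 0) - (x < 0)

def step(x, n):
    # Full step for the first half of the schedule, then a linear decay to 0.
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0

def f(s_i, s_j, d_ij, r):
    # Inverse-distance-weighted pull of neighbour j on residue i.
    if d_ij >= r or s_i == s_j:
        return 0.0
    return r / d_ij if s_j > s_i else -r / d_ij

def update_labels(labels, coords, r, t_next, n):
    """Return the labels for cycle t_next from the current labels."""
    new_labels = []
    for i, (s_i, p_i) in enumerate(zip(labels, coords)):
        total = 0.0
        for j, (s_j, p_j) in enumerate(zip(labels, coords)):
            if i != j:
                total += f(s_i, s_j, math.dist(p_i, p_j), r)
        # Every new label is computed from the labels of the previous cycle.
        new_labels.append(s_i + step(t_next, n) * sign(total))
    return new_labels
```

For three residues spaced one unit apart on a line with labels 1, 2, 3 and `r = 1.5`, the two end residues move toward the middle one and the middle residue, pulled equally in both directions, stays put.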
![Page 6: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/6.jpg)
Example
• Full lines indicate protein backbone.
• Neighboring residues within radius r are connected by dashed lines.
• Connections between i and i + 2 have been omitted for clarity.
• Label evolution is done without inverse distance weighting.
![Page 7: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/7.jpg)
Refinements
• Median-based smoothing with a window size of 21 to reclaim short loops of 10 or fewer residues.
• Small domains are reassigned using the weighted mean values of their neighbors (weights are given by f).
• Domain recalculation is repeated at most five times.
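The median smoothing step can be sketched as below. This is a minimal illustration, not the original implementation: `median_smooth` is a hypothetical name, the window is simply truncated at the chain ends, and `median_low` is used so that the result is always one of the existing labels; the slide specifies none of these details.

```python
from statistics import median_low

def median_smooth(labels, window=21):
    """Replace each label by the median of the labels in a centred window."""
    half = window // 2
    n = len(labels)
    smoothed = []
    for i in range(n):
        # Assumption: the window is truncated at the chain ends.
        neighbourhood = labels[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(median_low(neighbourhood))
    return smoothed
```

With a window of 21, a run of 10 or fewer residues carrying a different label is always outvoted by its surroundings, which is how a short loop gets absorbed into the enclosing domain, while a boundary between two long domains stays where it is.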
![Page 8: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/8.jpg)
Preserving β-sheets
• Matrix B of possible β-sheet interactions between residues generated based on distance data and heuristics.
• Weighted mean heuristic used to generate initial assignment of labels with the averaging being iterated to convergence.
• Post-processing is also applied to badly broken β-sheets.
![Page 9: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/9.jpg)
Self-testing with fake homologs
• Fake homologs generated by smoothing
– Replacing the central atom of each triple by the average of the triple.
– Process repeated five times.
• Domain assignments compared and similarity evaluated based on overlap score.
• r optimized for best overlap score.
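The smoothing used to produce a fake homolog might look like the sketch below. Several details are assumptions, since the slide does not give them: the triples are taken to be consecutive atoms along the chain, the average includes the central atom itself, all atoms are updated simultaneously in each round, and the two end atoms are left in place. `smooth_chain` is a hypothetical name.

```python
def smooth_chain(coords, rounds=5):
    """Smooth a chain of (x, y, z) points by repeated three-point averaging."""
    points = [tuple(p) for p in coords]
    for _ in range(rounds):
        smoothed = list(points)
        for i in range(1, len(points) - 1):
            # Replace the central atom of the triple (i-1, i, i+1) by the mean.
            smoothed[i] = tuple(
                (a + b + c) / 3.0
                for a, b, c in zip(points[i - 1], points[i], points[i + 1])
            )
        points = smoothed
    return points
```

Running the domain assignment on both `coords` and `smooth_chain(coords)` gives the pair of assignments whose overlap is then scored for a given `r`.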
![Page 10: Identification of Domains using Structural Data](https://reader031.fdocuments.in/reader031/viewer/2022020111/56812d94550346895d92b118/html5/thumbnails/10.jpg)
Extension to Multiple Structures
• Algorithm is simultaneously run on structures corresponding to a multiple sequence alignment.
• Labels are synchronized to the average of the labels at a position after each iteration.
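The synchronization step could be sketched as follows, assuming each structure's labels are stored as one list per structure with one entry per alignment column and `None` marking a gap. That representation, the decision to skip gaps when averaging, and the name `synchronize_labels` are all assumptions made for illustration.

```python
def synchronize_labels(label_sets):
    """Set every label in an alignment column to that column's mean label."""
    n_cols = len(label_sets[0])
    column_means = []
    for col in range(n_cols):
        # Assumption: gaps (None) do not contribute to the column average.
        values = [labels[col] for labels in label_sets if labels[col] is not None]
        column_means.append(sum(values) / len(values) if values else None)
    return [
        [None if labels[col] is None else column_means[col] for col in range(n_cols)]
        for labels in label_sets
    ]
```

Calling this after every update cycle keeps aligned residues in the different structures on a shared label, so the structures settle on a common set of domain boundaries.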