TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Thomas R. Ioerger, Department of Computer Science, Texas A&M University


Page 1: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Thomas R. Ioerger

Department of Computer Science

Texas A&M University

Page 2: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Main Stages of TEXTAL

electron density map

→ CAPRA

→ C-alpha chains

→ LOOKUP (build in side-chain and main-chain atoms locally around each CA)

→ model (initial coordinates)

→ Post-processing routines (example: real-space refinement)

→ model (final coordinates)

Reciprocal-space refinement/ML DM

Human Crystallographer (editing)

Page 3: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

F=<1.72,-0.39,1.04,1.55...> F=<1.58,0.18,1.09,-0.25...>

F=<0.90,0.65,-1.40,0.87...> F=<1.79,-0.43,0.88,1.52...>

Page 4: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

CAPRA: C-Alpha Pattern Recognition Algorithm

Page 5: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 6: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 7: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 8: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 9: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 10: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 11: TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Page 12: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Overview of CAPRA

• goal: predict CA chains from density map

• not just “tracing” - more than Bones

• desire 1:1 correspondence, ~3.8A apart

• based on principles of pattern recognition

– use neural net to estimate which pseudo-atoms in trace “look” closest to true C-alphas

– use feature extraction to capture 3D patterns in density for input to neural net

– use other heuristics for “linking” together into chains, including geometric analysis (s.s.)

Page 13: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

CAPRA: C-Alpha Pattern-Recognition Algorithm

• Tracer - remove lattice points from map (lowest density first) without breaking connectivity

• Neural network - for each pseudo atom, extract features, input to network, predict distances to CAs (1:10 in trace), trained on example points in real maps

• Linking - desire long chains, good CA predictions (not in side-chains), “structurally plausible” (e.g. linear, helical)

map → Density Trace → pseudo atoms → Neural Network → predictions of distance to true CA → Linking into C-alpha chains → C-alpha coordinates

Page 14: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Steps in CAPRA

Page 15: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Examples of CAPRA Steps

Page 16: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Tracer
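The tracer idea stated earlier (remove lattice points, lowest density first, without breaking connectivity) can be illustrated with a toy sketch. This is not the CAPRA implementation: the 6-connected neighbourhood, the density `cutoff` parameter, the rule that chain ends are kept, and the brute-force global connectivity check are all assumptions made here for brevity.

```python
from collections import deque

def neighbors(p):
    """6-connected lattice neighbours of a 3D grid point (assumed connectivity)."""
    x, y, z = p
    return [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
            (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]

def count_components(points):
    """Number of connected components in a set of grid points."""
    seen, n = set(), 0
    for start in points:
        if start in seen:
            continue
        n += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for q in neighbors(queue.popleft()):
                if q in points and q not in seen:
                    seen.add(q)
                    queue.append(q)
    return n

def trace(density, cutoff):
    """density: dict mapping grid point -> density value.
    Returns the grid points that survive thinning (the pseudo-atoms)."""
    kept = {p for p, d in density.items() if d >= cutoff}
    base = count_components(kept)
    for p in sorted(kept, key=lambda q: density[q]):  # lowest density first
        degree = sum(q in kept for q in neighbors(p))
        if degree < 2:
            continue  # keep chain ends and isolated points
        kept.discard(p)
        if count_components(kept) != base:
            kept.add(p)  # removal would break connectivity, so undo it
    return kept
```

Recounting components after every removal is slow; a production tracer would presumably use a local connectivity test instead.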

Page 17: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Neural Network

Page 18: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Feature Extraction

• characterize 3D patterns in local density

• must be “rotation invariant”

• examples:

– average density in region

– standard deviation, kurtosis...

– distance to center of mass

– moments of inertia, ratios of moments

– “spoke angles”

• calculated over spheres of 3A and 4A radius
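A minimal sketch of three of the listed features (average density, standard deviation, distance to center of mass), computed over a sphere around a point. The function name `sphere_features` and the input layout (a list of position/density pairs) are hypothetical; moments of inertia and spoke angles are left out. Because every quantity depends only on distances and density values, rotating the map about the sphere center leaves the result unchanged.

```python
import math

def sphere_features(points, center, radius):
    """points: list of ((x, y, z), density) samples from the map.
    Returns rotation-invariant features of the density within `radius` of `center`."""
    inside = [(p, d) for p, d in points if math.dist(p, center) <= radius]
    total = sum(d for _, d in inside)
    if not inside or total == 0:
        raise ValueError("no density inside the sphere")
    n = len(inside)
    mean = total / n
    variance = sum((d - mean) ** 2 for _, d in inside) / n
    # Density-weighted center of mass of the region.
    com = tuple(sum(p[i] * d for p, d in inside) / total for i in range(3))
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "com_dist": math.dist(com, center),
    }
```

Calling it once with radius 3.0 and once with radius 4.0 would give the two feature sets mentioned above.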

Page 19: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

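Forward Propagation:

$$act_j = \sum_i w_{i,j}\, out_i + bias_j$$

$$out_j = \frac{1}{1 + e^{-act_j}}$$

Backward Propagation:

$$\delta_j = out_j\,(1 - out_j) \sum_k \delta_k\, w_{j,k}$$

$$\Delta w_{j,k} \propto \delta_k\, out_j$$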
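As an illustration of forward and backward propagation, here is a small sketch of a network with one hidden layer of sigmoid units and a single sigmoid output. The network shape, the squared-error output delta, the learning rate `rate`, and the dictionary layout of `net` are assumptions for this example, not details from the slides. A sigmoid output lies between 0 and 1, so a predicted distance would need to be rescaled to that range.

```python
import math

def sigmoid(act):
    return 1.0 / (1.0 + math.exp(-act))

def forward(net, inputs):
    """Forward propagation: weighted sum plus bias, then sigmoid, layer by layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, out

def train_step(net, inputs, target, rate=0.5):
    """One backward-propagation update on a single example (modifies net in place)."""
    hidden, out = forward(net, inputs)
    # Output delta for squared error (an assumption about the loss).
    delta_out = (target - out) * out * (1.0 - out)
    # Hidden deltas: out_j * (1 - out_j) * sum over downstream deltas times weights.
    delta_hidden = [h * (1.0 - h) * delta_out * w
                    for h, w in zip(hidden, net["w_out"])]
    for j, h in enumerate(hidden):
        net["w_out"][j] += rate * delta_out * h
    net["b_out"] += rate * delta_out
    for j, d in enumerate(delta_hidden):
        for i, x in enumerate(inputs):
            net["w_hidden"][j][i] += rate * d * x
        net["b_hidden"][j] += rate * d
    return out
```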

Page 20: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Selection of Candidate C-alpha’s

• method:

– pick candidates in order of lowest predicted distance first,

– among all pseudo-atoms in trace,

– as long as not closer than 2.5A to an already-picked candidate

• notes:

– no 3.8A constraint; distance can be as high as 5A

– don’t rely on branch points (though often near)

– picked in random order throughout map

– initially covers whole map, including side-chains and disconnected regions (e.g. noise in solvent)
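The selection method above amounts to a greedy pass over the pseudo-atoms. A minimal sketch, assuming each pseudo-atom is a position paired with the network's predicted distance to the nearest true CA (the function name and data layout are hypothetical):

```python
import math

def select_candidates(pseudo_atoms, min_sep=2.5):
    """pseudo_atoms: list of ((x, y, z), predicted_distance_to_CA).
    Greedily accepts atoms in order of lowest predicted distance,
    skipping any atom closer than min_sep (in A) to one already accepted."""
    chosen = []
    for pos, pred in sorted(pseudo_atoms, key=lambda atom: atom[1]):
        if all(math.dist(pos, other) >= min_sep for other, _ in chosen):
            chosen.append((pos, pred))
    return chosen
```

For example, with pseudo-atoms spaced 1A apart along a line, only those at least 2.5A from every better-scoring pick survive.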

Page 21: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Linking into Chains

• initial connectivity of CA candidates based on the trace

• “over-connected” graph - branches, cycles...

• start by computing connected components (islands, or clusters)

• two strategies:

– for small clusters (<=20 candidates), find longest internal chain with “good” atoms

– for large clusters (>20 candidates), incrementally clip branch points using heuristics
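Computing the connected components and sorting them into small and large clusters can be sketched as follows. The adjacency-dictionary representation of the candidate graph and the function names are assumptions; the 20-candidate limit comes from the slide.

```python
def connected_components(adjacency):
    """adjacency: dict mapping each candidate to an iterable of its neighbours.
    Returns a list of components, each a list of candidates."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components

def split_by_size(adjacency, limit=20):
    """Separate clusters into small (<= limit) and large (> limit)."""
    small, large = [], []
    for component in connected_components(adjacency):
        (small if len(component) <= limit else large).append(component)
    return small, large
```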

Page 22: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Extracting Chains from Small Clusters

• exhaustive depth-first search of all paths

• scoring function:

– length

– penalty for inclusion of points with high predicted distance to true CA by neural net

– preference for following secondary structure (locally straight or helical)
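A sketch of the exhaustive depth-first search over all simple paths in a small cluster. The score used here covers only two of the three terms above (length, and a penalty for points with a high predicted distance); the secondary-structure preference is omitted, and the `bad_cutoff` and `penalty` values are placeholders rather than CAPRA's actual settings.

```python
def best_chain(adjacency, predicted, bad_cutoff=3.0, penalty=2.0):
    """adjacency: dict node -> iterable of neighbours (one small cluster).
    predicted: dict node -> predicted distance to a true CA.
    Returns (path, score) for the best-scoring simple path."""
    best = {"path": [], "score": float("-inf")}

    def score(path):
        bad = sum(predicted[n] > bad_cutoff for n in path)
        return len(path) - penalty * bad

    def extend(path, visited):
        s = score(path)
        if s > best["score"]:
            best["path"], best["score"] = list(path), s
        for nb in adjacency[path[-1]]:
            if nb not in visited:
                visited.add(nb)
                path.append(nb)
                extend(path, visited)
                path.pop()
                visited.remove(nb)

    for start in adjacency:
        extend([start], {start})
    return best["path"], best["score"]
```

The search is exponential in the worst case, which is presumably why it is reserved for clusters of at most 20 candidates.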

Page 23: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Secondary Structure Analysis

• generate all 7-mers (connected fragments of candidate CAs of length 7)

• evaluate “straightness”

– ratio of end-to-end distance to sum of link lengths

– straightness>0.8 ==> potential beta-strand

• evaluate “helicity”

– average absolute deviation of angles and torsions along 7-mer from ideal values (95° and 50°)

– helicity<20 ==> potential alpha-helix
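Both measures can be sketched for a single fragment of CA coordinates. Straightness is computed here as end-to-end distance divided by the summed link lengths, so it lies between 0 and 1 and the 0.8 threshold is meaningful. Pooling all angle and torsion deviations into one average is an assumption about how "helicity" is combined; the ideal values 95 and 50 degrees come from the slide.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def straightness(ca):
    """End-to-end distance over the sum of consecutive link lengths."""
    links = sum(math.dist(p, q) for p, q in zip(ca, ca[1:]))
    return math.dist(ca[0], ca[-1]) / links

def angle(a, b, c):
    """Angle at b, in degrees, between the links b->a and b->c."""
    v1, v2 = sub(a, b), sub(c, b)
    cos_t = dot(v1, v2) / math.sqrt(dot(v1, v1) * dot(v2, v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def torsion(a, b, c, d):
    """Signed dihedral angle, in degrees, defined by four consecutive points."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, dot(n1, n2)))

def helicity(ca, ideal_angle=95.0, ideal_torsion=50.0):
    """Mean absolute deviation of CA angles and torsions from the ideal values."""
    devs = [abs(angle(*ca[i:i + 3]) - ideal_angle) for i in range(len(ca) - 2)]
    devs += [abs(torsion(*ca[i:i + 4]) - ideal_torsion) for i in range(len(ca) - 3)]
    return sum(devs) / len(devs)
```

A 7-mer would then be flagged as a potential beta-strand if `straightness(ca) > 0.8` and as a potential alpha-helix if `helicity(ca) < 20`.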

Page 24: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Handling Large Clusters

• start by breaking cycles (near “bad” atoms)

• clip links at branch points till only linear chains remain

• clip the most “obvious” links first, e.g.

– if other two links are part of sec. struct.

– if clipped branch has “bad” atom nearby

– if clipped branch is small and other 2 are large
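One of the clipping heuristics (clip the smallest branch at a branch point) can be sketched as below. The sketch assumes cycles have already been broken and ignores the secondary-structure and "bad atom" criteria; the function names and the adjacency-set representation are hypothetical.

```python
def branch_size(adjacency, root, blocked):
    """Number of nodes reachable from root without passing through blocked."""
    seen = {root, blocked}
    stack, count = [root], 0
    while stack:
        node = stack.pop()
        count += 1
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return count

def clip_branches(adjacency):
    """adjacency: dict node -> set of neighbours, assumed acyclic.
    At each branch point, removes links to the smallest branches until
    at most two links remain. Modifies and returns adjacency."""
    for node in list(adjacency):
        while len(adjacency[node]) > 2:
            smallest = min(adjacency[node],
                           key=lambda nb: branch_size(adjacency, nb, node))
            adjacency[node].discard(smallest)
            adjacency[smallest].discard(node)
    return adjacency
```

After clipping, every node has at most two links, so each remaining component is a linear chain.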

Page 25: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Example of CA-chains for CzrA fit by CAPRA

Page 26: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Results for MVK

Page 27: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Results

protein      PDB id   final res   method   res used   sec. str.   size
CzrA                  2.3A        MAD/MR   2.8A                   94/104
IF5a         1bkb     1.75A       MAD      2.8A                   136/139
MVK          1kkh     2.4A        MAD      2.4A                   317/317
PCAa         1l1e     2.0A        MAD      2.8A                   262/287
P2 Myelin    1pmp     2.7A        MIR      2.7A                   131/131

protein      % built          RMS error   # chains   longest   # ins/del cross-overs
CzrA         84/104 (81%)     1.08A       5          53        0
IF5a         127/136 (93%)    0.78A       4          52        0
MVK          298/317 (95%)    0.83A       6          101       0
PCAa         212/262 (81%)    0.89A       11         50        1
P2 Myelin    111/131 (85%)    0.91A       6          63        2

Page 28: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Availability

• Textal web site:

– http://textal.tamu.edu:12321

– server-side processing

– free access to Capra

– beta-testing of Textal

• To contact us, email: [email protected]

Page 29: TEXTAL: A System for Automated Model Building Based on Pattern Recognition

Acknowledgements

• Funding

– National Institutes of Health

– Welch Foundation

• People

– Dr. James C. Sacchettini

– The rest of the TEXTAL Group:

• Tod Romo

• Kreshna Gopal

• Reetal Pai