Parallel Fine Sampling to Solve Large or Difficult Structures

• Manually exploring a large parameter space to find the right combination of parameters is time-consuming and frustrating. It often results in giving up on an otherwise solvable structure.

• Parallel exploration of parameter space is an effective approach to solve challenging structures efficiently and reliably
  – Systematically explore parameter space
  – Speed up with parallel execution on PC cluster


Structures Solved by Fine Grid Search

Target      Mol/ASU  Sites/Mol  Sites  Space Group  Resolution (Å)
MB3864A     4        6          24     P43          2.65
PE000293D   6        9          54     H3           2.15
PD06751F    6        14         84     P21212       1.90
TB1547G     8        12         96     P212121      2.20
PC06751C    6        20         120    P3121        2.70
FJ5490C     12       6          72     P1           2.00
FH7599A*    12       16         196    C2           2.00

*work in progress

PD06751F

• 454 aa/15 Met, 1.9 Å, P21212, hexamer in asu

• Space group choices narrowed down by systematic absences

• 1080 SHELXD jobs (200 trials each), parameters explored:
  – E value cutoff (1.1-1.5/0.1)
  – Number of sites (40-120/10)
  – Resolution cutoff (3.5-5.8/0.1)

• 3.5 hrs to finish all 1080 jobs on SDC cluster (220 CPUs)

• Of 1080 jobs, 39% find correct heavy atom solutions

• First correct solution within minutes, 84/84 sites found
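The size of the grid above follows directly from the three parameter ranges. The following is a minimal Python sketch of the enumeration, not the script used for this work; the helper `inclusive_range` and the dictionary keys (`emin`, `nsites`, `res`) are hypothetical names chosen for illustration, and how each combination is turned into a SHELXD input file is left out.

```python
from itertools import product

def inclusive_range(start, stop, step, digits=1):
    """Inclusive arithmetic range, built by index to avoid float drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, digits) for i in range(n)]

# Ranges quoted for PD06751F (start-stop/step).
e_cutoffs = inclusive_range(1.1, 1.5, 0.1)    # E value cutoff, 5 values
site_counts = list(range(40, 121, 10))        # number of sites, 9 values
res_cutoffs = inclusive_range(3.5, 5.8, 0.1)  # resolution cutoff, 24 values

# One job per parameter combination (key names are illustrative only).
jobs = [
    {"emin": e, "nsites": n, "res": r}
    for e, n, r in product(e_cutoffs, site_counts, res_cutoffs)
]

print(len(jobs))  # 5 * 9 * 24 = 1080
```

Each dictionary would correspond to one SHELXD job of 200 trials in the setup described above.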

PE00293D

• 285 aa/11 Met, H3, 2.15 Å, hexamer/asu, 2-wavelength MAD, PDB id: 2p10

• 760 SHELXD jobs (200 trials each), parameters explored:
  – E value cutoff (1.1-1.5/0.1)
  – Number of sites (20-90/10)
  – Resolution cutoff (4.0-5.8/0.1)

• 1 hr to finish all 760 jobs on SDC cluster (220 CPUs)

• Solutions are rare: only 12 of 760 jobs (1.6%) find correct heavy atom solutions, 53/54 sites found
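With so few successful parameter combinations, it is worth asking how likely an early batch of jobs is to contain one. The sketch below is a back-of-the-envelope estimate only. It assumes, as a simplification not stated for this target, that the jobs are started in uniformly random order and that the first 220 run at once, one per CPU. The function name `p_hit_in_first` is hypothetical.

```python
from math import comb

def p_hit_in_first(total, good, k):
    """Probability that k jobs drawn without replacement from `total`
    include at least one of the `good` (solution-finding) jobs."""
    return 1 - comb(total - good, k) / comb(total, k)

# 760 jobs, 12 of which find a correct solution, 220 started at once.
p = p_hit_in_first(760, 12, 220)
print(f"{p:.3f}")
```

Under these assumptions the first batch is very likely to include a successful job, which is consistent with stopping a run early once a solution appears.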

TB1547G

• 409 aa (13 Met)/monomer, P212121, 2 tetramers per asu

• Initially labeled as something else (TB5131A, 179 aa/2 Met)

• Treated as an unknown target

• POINTLESS and XPREP to narrow down space group choices, XPREP to generate FA values

• SHELXD grid search:
  – Sites 20-120 in steps of 10
  – Resolution cutoff 3.3-4.5 in steps of 0.1
  – E value cutoff 1.1-1.5 in steps of 0.1

• 520 parallel SHELXD jobs, each SHELXD job attempts 200 trials

• The job order is randomized to uniformly sample the search space initially

• Solutions usually appear within minutes, so jobs can be terminated early if necessary

• Each SHELXD job needs ~1 hr; ~2 hrs for all jobs to finish on SDC cluster (220 CPUs)

• Interpretation of density map gave correct identification of the target
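The randomized job order and early termination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cluster setup used here: `run_job` is a hypothetical stand-in that does not call SHELXD (its success rule is made up), the grid is a small toy one, and a local thread pool takes the place of the cluster scheduler.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(params):
    """Hypothetical stand-in for one SHELXD run. The success rule is
    invented so that the sketch has exactly one 'solution' to find."""
    return params["nsites"] == 90 and params["emin"] == 1.3

def randomized(jobs, seed=0):
    """Shuffled copy, so early jobs sample the whole grid uniformly."""
    order = list(jobs)
    random.Random(seed).shuffle(order)
    return order

def search(jobs, workers=4, seed=0):
    """Run jobs in random order; stop scheduling after the first hit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_job, p): p for p in randomized(jobs, seed)}
        for fut in as_completed(futures):
            if fut.result():
                for other in futures:
                    other.cancel()  # drops jobs that have not started yet
                return futures[fut]
    return None

# Toy grid: 5 E value cutoffs x 11 site counts (not the full search).
toy_jobs = [
    {"emin": e, "nsites": n}
    for e in (1.1, 1.2, 1.3, 1.4, 1.5)
    for n in range(20, 121, 10)
]

hit = search(toy_jobs)
print(hit)
```

Shuffling before submission means that stopping at any point still leaves a roughly uniform sample of the parameter space behind, rather than one fully explored corner of it.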

FH7599A: MR+MAD

• Estimated 10-20 monomers per asu, 100-300 heavy atom sites

• No highly homologous (>20% seq id) MR models

• FFAS or PSI-BLAST identified a remote sequence homolog TM0064 (14% seq id)

• TM0064 trimer poly-alanine is used as MR model, use of the trimer as MR template significantly improved signal to noise in MR procedure

• Density modification is critical for improving MR phases

• Improved DM phases + MAD data to locate ~200 heavy atom sites and MAD phasing

[Figure: FH7599A vs TM0064, rmsd 2.42 Å for 82% Cα]