Distributed Computing for Crystallography: experiences and opportunities
Dr. Kenneth Shankland & Tom Griffin
ISIS Facility, CCLRC Rutherford Appleton Laboratory
STFC
Background – Parallel Computing Options
• Supercomputer: expensive, state of the art, good results, dedicated
• Cluster: cheaper, can easily expand, dedicated
• Distributed grid: cheaper, power increases with time, can expand, not dedicated
• Many separate machines: easy to use, can easily expand, may be dedicated
Spare cycles concept
• Typical PC usage is about 10%
• Usage is minimal after 5pm
• Most desktop PCs are really fast
• Can we use (“steal?”) unused CPU cycles to solve computational problems?
Suitable apps
• CPU intensive
• Low to moderate memory use
• Licensing issues considered
• Not too much file output
• Coarse grained
• Command line / batch driven
The United Devices GridMP System
• Server hardware
  • Two dual-Xeon 2.8 GHz servers, RAID 10
• Software
  • Servers run RedHat Linux Advanced Server / DB2
  • Unlimited Windows (and other) clients
• Programming
  • Web Services interface – XML, SOAP
  • Accessed with C++ and Java
• Management console
  • Web-browser based
  • Can manage services, jobs, devices etc.
• Large industrial user base
  • GSK, J&J, Novartis etc.
GridMP Platform Object Model
[Diagram: a Docking application holds program modules (GOLD, LigandFit, MyDock) and a job ("Test ligands") whose datasets contain molecules 1…m and proteins 1…n; the GOLD 2.0 program module carries Windows and Linux executables (gold20win.exe, gold20_rh.exe).]
Adapting a program for GridMP
1) Think about how to split your data
2) Wrap your executable
3) Write the application service
   • Pre- and post-processing
4) Use the Grid
• Fairly easy to write
• Interface to grid via Web Services
• So far used: C++, Java, Perl, C# (any .NET language)
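Step 1 (splitting your data) can be sketched as follows; the package size and molecule names are illustrative, not values from the talk:

```python
# Sketch of data splitting: divide a list of inputs into fixed-size
# packages, one package per grid workunit.
def split_into_packages(items, package_size):
    """Return a list of packages, each holding up to package_size items."""
    return [items[i:i + package_size] for i in range(0, len(items), package_size)]

molecules = [f"molec_{i}" for i in range(1, 11)]   # 10 hypothetical inputs
packages = split_into_packages(molecules, 4)
# 10 items in packages of 4 -> 3 packages of sizes 4, 4, 2
```

The right package size trades off per-workunit overhead against load balancing across devices.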
Package your executable
[Diagram: the executable, DLLs, standard data files and environment variables are bundled into a program module, which is uploaded to, and resides on, the server; the bundle can optionally be compressed and encrypted.]
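The bundling step can be sketched with a plain zip archive; the file names below are illustrative (gold20win.exe echoes the GOLD example earlier, the rest are hypothetical):

```python
# Sketch of program-module packaging: compress the executable, its DLLs
# and standard data files into one archive ready for upload to the server.
import io
import zipfile

def build_package(files):
    """Compress a {name: bytes} mapping into an in-memory zip; return the bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

package = build_package({
    "gold20win.exe": b"<binary>",      # the wrapped executable
    "gold.dll": b"<binary>",           # supporting library (hypothetical)
    "params.dat": b"<standard data>",  # standard data file (hypothetical)
})
```

Encryption would be layered on top of the compressed bytes before upload.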
Create / run a job
[Diagram: on the client side, molecule packages (Pkg1, Pkg2) and protein packages (Pkg3, Pkg4) are uploaded over https; on the server side, the job is created, the cross product of the datasets generates the workunits, and the job is started.]
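The cross-product step that generates workunits can be sketched directly; package names follow the diagram:

```python
# Sketch of workunit generation: the server takes the cross product of
# the two datasets (molecule packages x protein packages), yielding one
# workunit per combination.
from itertools import product

molecule_pkgs = ["Pkg1", "Pkg2"]
protein_pkgs = ["Pkg3", "Pkg4"]
workunits = list(product(molecule_pkgs, protein_pkgs))
# 2 molecule packages x 2 protein packages -> 4 workunits
```

Each (molecule package, protein package) pair is dispatched to a device as an independent workunit, which is what makes docking coarse-grained and grid-friendly.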
Current status at ISIS
• 218 registered devices
• 321 total CPUs
• Potential power ~300 Gflops (cf. HPCx @ 500 Gflops)
Application: Structures from powders
CT-DMF2: solvated form of a polymorphic pharmaceutical from a crystallisation screen
– a = 12.2870(7) Å, b = 8.3990(4) Å, c = 37.021(2) Å, β = 92.7830(10)°
– V = 3816.0(4) Å³
– P21/c, Z′ = 2 (Nfragments = 6)
[Figure: powder diffraction pattern and the asymmetric unit.]
DASH
• Optimise molecular models against diffraction data
• Multi-solution simulated annealing
• Execute a number of SA runs (say 25), pick the best one
Grid adaptation – straightforward
• Run GUI DASH as normal up to the SA run point, create the .duff file
• Submit SA runs to the grid from your own PC:
  c:\dash-submit famot.grd
  uploading data to server…
  your job_id is 4300
• Retrieve and collate SA results from the grid to your own PC:
  c:\dash-retrieve 4300
  retrieving job data from server…
  results stored in famot.dash
• View results as normal with DASH
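The multi-solution strategy (run many independent SA runs, keep the best) can be sketched as follows; sa_run is a toy stand-in for one DASH SA run, not the real engine:

```python
# Sketch of multi-solution simulated annealing: launch N independent
# runs with different random seeds and keep the solution with the
# lowest profile chi-squared.
import random

def sa_run(seed):
    """Stand-in for one SA run: returns (chi_squared, solution_label)."""
    rng = random.Random(seed)
    return rng.uniform(10.0, 100.0), f"solution_{seed}"

results = [sa_run(seed) for seed in range(25)]   # e.g. 25 runs
best_chi2, best_solution = min(results)          # keep the best one
```

Because each run is independent, the 25 runs map one-to-one onto grid workunits, which is exactly the coarse-grained shape the grid favours.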
Example of speedup
• Execute 80 SA runs on famotidine with the number of SA moves set to 4 million
• Elapsed time 6 hrs 40 mins on a 2.8 GHz P4
• Elapsed time on grid 27 mins
• Speedup factor = 15 with only 24 PCs
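The quoted factor follows directly from the two elapsed times:

```python
# Checking the quoted speedup: 6 h 40 min on one 2.8 GHz P4
# versus 27 min on the grid.
single_machine_min = 6 * 60 + 40   # 400 minutes
grid_min = 27
speedup = single_machine_min / grid_min
# ~14.8, i.e. the quoted factor of ~15 on 24 PCs
```

Speedup a little below the PC count is expected: scheduling overhead and uneven run times keep some machines idle near the end of the job.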
Calculations embody ca. 6 months of CPU time.
On our current grid, runs would be completed in ca. 20 hours.
Algorithm ‘sweet spot’
• Large run – McStas
  • Submit program breaks up the –n##### neutron count
  • Uploads new command line + data + executable
• Parameter scan, fixed neutron count
  • Send each run to a separate machine
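The large-run split can be sketched as follows; the executable name mcstas_sim.exe and the counts are illustrative, only the –n flag mirrors the talk:

```python
# Sketch of the McStas split: divide a large run of N neutron histories
# into equal chunks, producing one command line per machine.
def split_run(total_neutrons, n_machines, params=""):
    """Return one command line per machine, each simulating an equal share."""
    per_machine = total_neutrons // n_machines
    return [f"mcstas_sim.exe -n {per_machine} {params}".strip()
            for _ in range(n_machines)]

cmds = split_run(10_000_000, 20)
# 20 command lines, each simulating 500,000 neutron histories
```

Since neutron histories are statistically independent, the partial results can simply be summed after the runs return.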
Instrument simulation
Full diffraction simulation for HRPD
[Figure: simulated HRPD diffraction pattern, normalised intensity vs time-of-flight (30,000–60,000 µs); cubic ZrW2O8, 100.00%.]
[Figure: calculated vs observed (full MC simulation) patterns with difference curve, normalised intensity vs time-of-flight (40,000–120,000 µs); cubic ZrW2O8, 100.00%.]
• CPU time 5537 hours (= 230 days); elapsed time on grid = 2.5 days
Problems / issues
• Hardware – very few
• Software – a few, but excellent support
• Security concerns – encryption and tampering
• System administrators are suspicious of us!
• End-user obtrusiveness
  • Perceived
  • Real (memory grab with povray)
  • Unanticipated
Programs in the wild
• Clinical trial patients → side effects; general population → drug interactions
• Test computer pool → runs OK; all connected PCs → program interactions