David Hoover Scientific Computing Branch, Division of Computer System Services CIT, NIH Swarms and...
Date posted: 19-Dec-2015
David Hoover
Scientific Computing Branch, Division of Computer System Services
CIT, NIH
Swarms and Bundles: Bioinformatics and Biostatistics
on Biowulf
Embarrassingly Parallel Problems
• GWAS, with huge numbers of SNPs
• Sequence analysis, assembly, and mapping
• Testing and validating statistical models
• Protein folding and threading
• Molecular docking and compound screening
• Tomographic reconstruction
Characterization of Surface Protein 3 from Malaria Parasite P. falciparum
Protein folding calculations with Rosetta++
100,000 CPU hours
Tsai et al., Mol. Biochem. Parasitology, online preprint 2008
How to run multiple independent processes in parallel
[Diagram: 16 independent processes, each reading its own input, running a command, and writing its own output]
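The pattern on this slide (one input, one command, one output, repeated 16 times) can be sketched in plain shell on a single machine. This is only an illustration, not the cluster workflow: the temporary directory, the file names, and the use of `tr` as a stand-in for a real analysis command are all assumptions.

```shell
# Sketch: run 16 independent processes, each with its own input and output.
workdir=$(mktemp -d)

# Create 16 small input files (stand-ins for real data).
for i in $(seq 1 16); do
    echo "sample $i" > "$workdir/input$i.txt"
done

# Start one background process per input; no process depends on another.
for i in $(seq 1 16); do
    # 'tr' is a placeholder for the real command (assumption).
    tr 'a-z' 'A-Z' < "$workdir/input$i.txt" > "$workdir/output$i.txt" &
done

# Block until every background process has finished.
wait
```

Because the processes never communicate, they can be started in any order and on any number of processors, which is what makes the problem embarrassingly parallel.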
Biowulf Cluster Batch System
[Diagram: 16 batch scripts, each submitted as a separate batch job (job1 through job16) that writes its own output file (job1.out through job16.out)]
Swarm
biowulf% swarm -f file
Node 1    Node 2    Node 3    Node 4
job1      job2      job3      job4
job1.out  job2.out  job3.out  job4.out
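The `file` passed to `swarm -f` is a plain text list of commands, one independent command per line. A minimal sketch of generating such a file follows; `mycommand` and the input/output file names are hypothetical placeholders, and the swarm call itself is left as a comment because it only exists on the cluster.

```shell
# Sketch: build a swarm command file with one independent command per line.
cmdfile=$(mktemp)

for i in $(seq 1 16); do
    # 'mycommand' and the file names are hypothetical placeholders.
    echo "mycommand < input$i.txt > output$i.txt" >> "$cmdfile"
done

# Show the generated command file.
cat "$cmdfile"

# On the cluster, the file would then be submitted as shown on the slide:
#   swarm -f "$cmdfile"
```

Generating the command file with a loop replaces writing 16 separate batch scripts by hand.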
Bundled Swarm
biowulf% swarm -f file -b 4
Node 1
job1
job1.out
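The effect of bundling can be imitated locally, assuming that `-b 4` groups four commands to run one after another within a single job. The sketch below is not how swarm is implemented; it uses the standard `split` utility, and the directory and file names are made up for the illustration.

```shell
# Sketch: group a 16-line command file into bundles of 4 commands each.
bundledir=$(mktemp -d)

# Build a 16-line command file (each line is a trivial stand-in command).
for i in $(seq 1 16); do
    echo "echo running command $i" >> "$bundledir/commands"
done

# split -l 4 writes consecutive 4-line chunks: bundle.aa, bundle.ab, ...
split -l 4 "$bundledir/commands" "$bundledir/bundle."

# Commands inside a bundle run serially; the bundles themselves run in parallel.
for b in "$bundledir"/bundle.*; do
    sh "$b" > "$b.out" &
done
wait
```

With 16 commands and a bundle size of 4, this produces 4 bundles instead of 16 separate jobs, which reduces per-job scheduling overhead when each command is short.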
Swarm Facts
• Written and maintained by Helix Systems Staff
• swarm introduced in late 2000
• 82% of all batch jobs run on the cluster since 2002 are swarm jobs
• ~60% of all wall time spent on swarm jobs
• swarm has been shared with clusters around the world
Swarm World Records
• Largest swarm: 683,445 commands
• Largest bundle: 24,000 commands per CPU
Future Challenges
• How to deal with larger multicore nodes?
[Diagram: Node 1, Node 2, Node 3]