OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
Cluster Computing Applications Project: Parallelizing BLAST
Research Alliance for Minorities (RAM), Computer Science and Mathematics Division
William Burke, York College, City University of New York
John Mugler and Stephen Scott, Oak Ridge National Laboratory
Parallelizing the BLAST Algorithm: Feasible or Not?
Bioinformatics research needs faster text string matching algorithms.
The purpose of this project is to analyze the BLAST algorithm:
Define the structure of BLAST.
State why it is a valuable bioinformatics tool.
Explore parallelizations of BLAST.
BLAST matches query string fragments against a target database.
This eliminates the need to run a full text string comparison.
It speeds up database search time.
Several methods of parallelizing BLAST have been explored.
Introduction
Cluster infrastructure
Open Source Cluster Application Resources (OSCAR)
Cluster Command and Control (C3)
eXtreme TORC (XTORC)
Cluster applications
Bioinformatics Toolsets
Basic Local Alignment Search Tool (BLAST)
Infrastructure Overview
Red Hat Linux 7.2
OSCAR 1.3
C3 - http://www.csm.ornl.gov/torc/C3/
LAM/MPI - http://www.lam-mpi.org/
Maui Scheduler - http://supercluster.org/maui/
MPICH - http://www-unix.mcs.anl.gov/mpi/mpich/
OpenSSH - http://www.openssh.com/
OpenSSL - http://www.openssl.org/
PBS - http://www.openpbs.org/
PVM - http://www.csm.ornl.gov/pvm/
SIS - http://www.sisuite.org/
Red Hat Linux 7.2
Installation
Configuration
Administration
Network Configuration.
Performance Monitoring.
Creating Scripts.
OSCAR 1.3 and C3 Tools
OSCAR configures the head node.
OSCAR builds and configures compute nodes.
C3 reduces time and effort to operate and manage a cluster.
eXtreme TORC
eXtreme TORC powered by OSCAR
•65 Pentium IV machines
•Peak performance: 129.7 GFLOPS
•Memory (RAM): 50.152 GB
•Disk capacity: 2.68 TB
•Dual interconnects
–Gigabit & Fast Ethernet
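Dividing the aggregate figures above by the node count gives a rough per-machine picture. The short calculation below is only a sketch: it assumes all 65 machines are identical, which the slide does not state, and the constant and function names are illustrative.

```python
# Aggregate XTORC figures as listed on the slide.
NODES = 65
PEAK_GFLOPS = 129.7
MEMORY_GB = 50.152
DISK_TB = 2.68


def per_node(total: float, nodes: int = NODES) -> float:
    """Average share of a cluster-wide total per machine (assumes identical nodes)."""
    if nodes <= 0:
        raise ValueError("node count must be positive")
    return total / nodes


if __name__ == "__main__":
    print(f"peak per node:   {per_node(PEAK_GFLOPS):.2f} GFLOPS")
    print(f"memory per node: {per_node(MEMORY_GB):.2f} GB")
    # 1 TB is taken as 1000 GB here (decimal units, an assumption).
    print(f"disk per node:   {per_node(DISK_TB) * 1000:.1f} GB")
```

Under that assumption each machine contributes roughly 2 GFLOPS of peak performance and a little under 1 GB of memory.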
The field of Bioinformatics needs faster string matching algorithms.
Applications Overview
BLAST: a bioinformatics tool.
http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
Parallelize BLAST’s algorithm.
BLAST: a Bioinformatics Tool
What is BLAST?
A heuristic algorithm for matching query strings against a database.
How does the BLAST algorithm work?
String fragmentation.
Statistical means for comparison.
How can you parallelize BLAST on a computational cluster?
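The string-fragmentation step can be illustrated in a few lines of Python. This is a minimal sketch, not NCBI's implementation: the function name `query_words` is hypothetical, and the default word length of 3 is an assumption chosen for illustration.

```python
from __future__ import annotations


def query_words(query: str, w: int = 3) -> list[tuple[int, str]]:
    """Return every overlapping length-w word in the query with its 0-based offset."""
    if w <= 0:
        raise ValueError("word length must be positive")
    # A query shorter than w yields an empty list.
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


if __name__ == "__main__":
    for offset, word in query_words("TPQGQR"):
        print(offset, word)
```

Each word, rather than the whole query, is then looked up in the database, which is what lets the search avoid a full string comparison.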
The BLAST Search Algorithm
Query word (W = 3)
QUERY: GSVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKWPLMDKNRIEERLNLVEAFVEDA
Neighborhood words, with neighborhood score threshold (T = 13):
PQG 18
PEG 15
PRG 14
PKG 14
PMG 13
PSG 13
PQN 12
Etc...
QUERY STRING     SLAALLNKCKTPQGQWLVNQWIKWPLMDKNRIEERLN 365
                 ----L--++K-P-G--+-----+-------------N
DATABASE STRING  GSWNLAALDKDPMGDKNRIEERLNLVEAIKWPLMDJN 330
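The neighborhood-word scores on this slide can be reproduced with a small scoring routine. The sketch below is an illustration under stated assumptions: the pair scores are a hand-picked subset chosen to match the slide's numbers (they agree with the corresponding BLOSUM62 entries), the candidate list is limited to the words shown on the slide, and the names `PAIR_SCORES`, `word_score`, and `neighborhood` are hypothetical. A real search would score every possible word against a full substitution matrix.

```python
from __future__ import annotations

# Partial substitution scores: only the residue pairs needed for the slide's words.
PAIR_SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "R"): 1, ("Q", "K"): 1,
    ("Q", "M"): 0, ("Q", "S"): 0, ("G", "N"): 0,
}


def pair_score(a: str, b: str) -> int:
    """Symmetric lookup; raises KeyError for pairs outside the illustrative subset."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def word_score(query_word: str, candidate: str) -> int:
    """Sum the position-by-position substitution scores of two equal-length words."""
    if len(query_word) != len(candidate):
        raise ValueError("words must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate))


def neighborhood(query_word: str, candidates: list[str], threshold: int) -> list[tuple[str, int]]:
    """Keep candidates scoring at least `threshold`, highest score first."""
    scored = [(c, word_score(query_word, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda cs: -cs[1])


if __name__ == "__main__":
    words = ["PQG", "PEG", "PRG", "PKG", "PMG", "PSG", "PQN"]
    for word, score in neighborhood("PQG", words, threshold=13):
        print(word, score)
```

With T = 13, PQN (score 12) falls below the threshold and is dropped, while the six words scoring 13 or more are kept.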
Parallelization of BLAST
NBLAST SLRI Bioinformatics Toolkit
ParAlign
MOBLAST
www.usenix.org/publications/library/proceedings/als2000/michalickova.html
DNA sequence matching processor
PARALIGN™
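One general way to parallelize a database search is to split the database into segments, search each segment independently, and merge the hits. The sketch below illustrates only that idea. It is not the method used by any of the tools listed above. The exact-word matcher is a toy stand-in for BLAST, threads stand in for cluster nodes, and all function names are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def split_database(sequences: list[str], parts: int) -> list[list[str]]:
    """Deal sequences round-robin into `parts` segments."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    return [sequences[i::parts] for i in range(parts)]


def search_segment(segment: list[str], word: str) -> list[tuple[str, int]]:
    """Toy search: return (sequence, offset) for every occurrence of `word`."""
    if not word:
        raise ValueError("word must be non-empty")
    hits = []
    for seq in segment:
        start = seq.find(word)
        while start != -1:
            hits.append((seq, start))
            start = seq.find(word, start + 1)
    return hits


def parallel_search(sequences: list[str], word: str, workers: int = 4) -> list[tuple[str, int]]:
    """Search each database segment in its own worker and merge the hits."""
    segments = split_database(sequences, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda seg: search_segment(seg, word), segments)
        merged = [hit for hits in partial for hit in hits]
    # Sort so the result does not depend on how the database was split.
    return sorted(merged)


if __name__ == "__main__":
    database = ["GSWNLAALDKDPMG", "TTPQGQR", "AAAA", "PMGPMG"]
    print(parallel_search(database, "PMG", workers=2))
```

On a real cluster each segment would live on a different compute node, and the merge step would also have to reconcile scores and statistics across segments, which this sketch omits.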
Conclusion
The BLAST algorithm has a diverse family of programs.
Several implementations exist for parallelizing the BLAST algorithm.
Future work will include further exploration of the various parallelized BLAST algorithms on clusters.
Acknowledgements
I would like to extend my thanks to Stephen L. Scott, John Mugler, Thomas Naughton, and Brian Luethke for their invaluable mentoring; Michaelangelo Salcedo for his guidance; and Debbie McCoy and Cheryl Hamby for their support in the RAM program.
Disclaimer
This research was performed under the Research Alliance for Minorities Program administered through the Computer Science and Mathematics Division, Oak Ridge National Laboratory. This Program is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science, U.S. Department of Energy. This work has been authored by a contractor of the U.S. Government under contract DE-AC05-00OR22725. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.