High Performance Computing on an IBM Cell Processor
Team May08-24: Kyle Byerly
Matt Rohlf
Bryan Venteicher
Shannon McCormick
Faculty Adviser: Zhao Zhang
Team Website: http://seniord.ece.iastate.edu/may0824
Introduction
Problem Statement
Biological researchers face ever-increasing computation times because the volume of data to be processed is growing exponentially. Commodity computing hardware currently cannot provide adequate performance.
User Interface
Biologists and bioinformaticists will use the ported application from the command line, exactly as they use the original.
Assumptions
• User has access to a PlayStation 3 running Linux
• User knows how to use the original application
Operating Environment
• Dry
• Temperature controlled (less than 70° F)
Deliverables
• Application ported to Cell/B.E.
• Benchmarks to document performance improvement
Project and Design Requirements
Design Objective
To parallelize and port a BioPerf application to the PlayStation 3 so that it takes full advantage of the performance of the Cell/B.E.
Functional Requirements
Nonfunctional Requirements
• Algorithm must be parallelizable
• Data must be able to be stored in the limited memory of the PlayStation 3
• Must run faster than the original
Engineering Specification
Input/Output:
• Text of DNA sequence / Parsimonious tree
Hardware:
• PlayStation 3, Cell/B.E.
Software:
• Fedora Linux, DNAPenny 3.6
User Interface:
• Command line
Design Method & Results
Design Method
Two possible ways to parallelize DNAPenny will be explored:
• Parallelize the entire algorithm
• Parallelize the performance-critical section of the algorithm
Test Plan
A script was created to ensure that the ported application produces the same output as the original application across a wide variety of input files.
Resources & Work Breakdown
Work Breakdown Structure
Other resources
• Open source software packages (gcc, gdb, gprof, vim, gnuplot, ssh, bash, lxr, svn, viewvc, diff, cscope)
• BioPerf suite (CLUSTALW, DNAPenny, and many others)
• Sample input data from NCBI GenBank
Financial Resources
Item                             w/ labor    w/o labor
PlayStation 3 (donated)          $0          $0
Estimated Labor (@ $10.00/hr)    $5645       $0
Totals                           $5645       $0
Closing Summary
The team has successfully ported DNAPenny to the Cell/B.E. The ported version of DNAPenny produces the same output for the same input faster than the original application running on a typical desktop PC. With the ported application, bioinformaticists will have a cheap and efficient way to analyze DNA sequences.
Functional Requirements
• Ported application shall run on the Cell/B.E.
• Ported application shall return the same results as the original application.
• The running time of the ported application shall be recorded for comparison to the original application.
The team believes that the Cell Broadband Engine (Cell/B.E.) found in the PlayStation 3 (PS3) will offer superior performance to commodity computing hardware at an affordable price. The team will port an application from the BioPerf suite to the Cell/B.E. running on Linux. BioPerf is a benchmark suite of representative bioinformatics applications for use with high-performance computing.
Proposed Concept Sketch / System Description
System Block Diagram
The system block diagram below shows an overview of the project. The same input data is fed to two versions of the application, the original code and the ported version. Both produce identical output data, and the ported version produces it faster.
[Work breakdown pie chart. Slice labels: Bryan, Kyle, Shannon, Matt. Slice sizes: 29%, 23%, 23%, 25%. Total Hours = 564.5]
Benchmarking Methods
A script was created to time the execution of each significant revision of the ported application alongside the original application. An additional script calculates the average run time and automatically generates graphs of the results.
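The timing and averaging steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the team's actual scripts: the function names (time_command, summarize, speedup) are hypothetical, wall-clock timing with three runs is an assumption, and graph generation is left out. The demo at the bottom times a trivial stand-in command; the real DNAPenny executables would be substituted for it.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3, stdin_text=""):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds.

    The run count and the use of wall-clock time are assumptions of this sketch.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(cmd, input=stdin_text, text=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of run times to the figures a benchmark graph would plot."""
    return {
        "runs": len(durations),
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


def speedup(original_mean, ported_mean):
    """Values above 1.0 mean the ported version ran faster on average."""
    return original_mean / ported_mean


if __name__ == "__main__":
    # Stand-in command (hypothetical); replace with the original and ported binaries.
    stand_in = [sys.executable, "-c", "pass"]
    original = summarize(time_command(stand_in))
    ported = summarize(time_command(stand_in))
    print("original:", original)
    print("ported:  ", ported)
    print("speedup: ", speedup(original["mean"], ported["mean"]))
```

Averaging over several runs smooths out noise from other processes on the machine, which matters when comparing revisions whose run times differ only slightly.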
A few examples of the generated graphs are shown below.
Literature Survey
• V. Sachdeva, M. Kistler, E. Speight and T.-H. K. Tzeng, Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications, March 2007.
• R. Desaraju, A Parallel Implementation Of A Parsimony-Based Method For Phylogenetic Inference, May 2005.
Risks
• Proposed implementation may not be faster
• Other teams may complete the same work before the team does
Prototype
DNAPenny was ported to the Cell/B.E., taking advantage of the parallel nature of the hardware. Another parallelized version of DNAPenny was created that runs on a standard desktop PC.
Parts / Vendor List
• PlayStation 3 was provided by the department
• All software used is open source
Test Procedure / Results
• Execute the test script and use the diff utility to verify that the ported application's output matches the original output.
• The output did not change.
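The output check described above might look something like the sketch below. It is an illustration only, not the team's script: it uses Python's difflib as a stand-in for the diff utility, it assumes both programs write their results to standard output, and the function names (capture_output, diff_outputs, outputs_match) are hypothetical. The demo uses a trivial stand-in command in place of the two DNAPenny builds.

```python
import difflib
import subprocess
import sys


def capture_output(cmd, stdin_text=""):
    """Run cmd and return everything it wrote to standard output (an assumption:
    a program that writes result files would need those files compared instead)."""
    result = subprocess.run(cmd, input=stdin_text, text=True,
                            capture_output=True, check=True)
    return result.stdout


def diff_outputs(original_text, ported_text):
    """Return unified-diff lines; an empty list means the outputs are identical."""
    return list(difflib.unified_diff(
        original_text.splitlines(keepends=True),
        ported_text.splitlines(keepends=True),
        fromfile="original", tofile="ported"))


def outputs_match(original_cmd, ported_cmd, stdin_text=""):
    """Run both commands on the same input and report whether the outputs agree."""
    delta = diff_outputs(capture_output(original_cmd, stdin_text),
                         capture_output(ported_cmd, stdin_text))
    sys.stdout.writelines(delta)  # show any differences for inspection
    return not delta


if __name__ == "__main__":
    # Stand-in command (hypothetical): echoes its input in upper case.
    stand_in = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print("match:", outputs_match(stand_in, stand_in, "acgt\n"))
```

Running the same comparison over every sample input file gives a simple regression check that can be repeated after each revision of the port.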