GPU Computing for the SWAMP Sequence Alignment
OCCBIO 2008 -S. Steinfadt and J. Baker 1
GPU Computing for the SWAMP Sequence Alignment
Shannon I. Steinfadt and Johnnie Baker
Parallel and Associative Computing Lab
Computer Science Department
Kent State University
3rd Annual Ohio Collaborative Conference on Bioinformatics (OCCBIO '08)
Get It, Got It, Good (or Better)
Have This
  ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
  gggccctcct ggctcccaac agcttctcag ttcccacttc
Want This
Use This
  NVIDIA C870 Tesla GPGPU
  518 Peak GFLOPS on Tesla
  170W peak, 120W typical
Sequence Alignment
Given two sequences:
  DNA nucleotides {C, T, G, A}
  Amino acids {a, r, n, d, c, q, e, g, h, i, l, k, m, f, p, s, t, w, y, v}
Align them to find the longest, most common subsequence
  gcggacgct ccacg-tgtc--c --c- tcgccgcgc cc-cgtctacc
  ||:|:|| ||::|-|::|--| --|-||:|:|::| ||-|:||
  gggccct cctggctcccaac agc ttctcagttc ccacttc
Similar Characters, Similar Structure, Similar Function
Sequence Alignment
Similar Characters, Similar Function, Similar Structure
Ancestral Relationships
Gene Functionality
Aid in Drug Discovery
Homologous Sequences
Aligning using the Smith-Waterman Algorithm
Cost Key
  Match +10
  Miss -3
  Insert a Gap -3
  Extend a Gap -1
Compares all possible combinations, but the dynamic programming has data dependencies
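The cost key and the data dependencies can be made concrete with a small matrix fill. This is a minimal Python sketch, not the authors' implementation. It assumes that "Insert a Gap -3" is charged for the first position of a gap and "Extend a Gap -1" for each further position; the function name `sw_fill` and the example sequences are illustrative only.

```python
# Cost key from the slide; the affine reading of the two gap costs is an assumption.
MATCH, MISS, GAP_INSERT, GAP_EXTEND = 10, -3, -3, -1
NEG = float("-inf")

def sw_fill(s1, s2):
    """Fill the Smith-Waterman matrices and return (H, best_score, best_position)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best local score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in s1
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in s2
    best, pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Data dependencies: each cell needs its west, north and
            # north-west neighbours before it can be computed.
            E[i][j] = max(H[i][j - 1] + GAP_INSERT, E[i][j - 1] + GAP_EXTEND)
            F[i][j] = max(H[i - 1][j] + GAP_INSERT, F[i - 1][j] + GAP_EXTEND)
            sub = MATCH if s1[i - 1] == s2[j - 1] else MISS
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            if H[i][j] > best:
                best, pos = H[i][j], (i, j)
    return H, best, pos

H, best, pos = sw_fill("CATTG", "CTG")
print(best, pos)  # 26 (5, 3)
```

Under these assumptions the best local score for the illustrative pair is 26: three matches (+30), one gap opened (-3) and extended once (-1).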
Traceback in the Smith-Waterman Algorithm
Cost Key
  Match +10
  Miss -3
  Insert a Gap -3
  Extend a Gap -1
Alignment:
  CATTG
  C--TG
1) Find the maximum computed value
2) Trace back until you reach '0's
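The two traceback steps can be sketched in Python as follows. This is a hedged illustration rather than the SWAMP code: it assumes the affine reading of the cost key (first gap position -3, each further position -1), and the function name `smith_waterman` and the input pair "CATTG" / "CTG" are chosen for illustration, not taken from the slides.

```python
MATCH, MISS, GAP_INSERT, GAP_EXTEND = 10, -3, -3, -1
NEG = float("-inf")

def smith_waterman(s1, s2):
    """Return (score, aligned_s1, aligned_s2) for the best local alignment."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a gap in s1
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a gap in s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + GAP_INSERT, E[i][j - 1] + GAP_EXTEND)
            F[i][j] = max(H[i - 1][j] + GAP_INSERT, F[i - 1][j] + GAP_EXTEND)
            sub = MATCH if s1[i - 1] == s2[j - 1] else MISS
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])

    # Step 1: find the maximum computed value and where it occurs.
    best, i, j = max((H[a][b], a, b) for a in range(n + 1) for b in range(m + 1))

    # Step 2: trace back until a 0 is reached, remembering which matrix we are in.
    top, bottom, state = [], [], "H"
    while True:
        if state == "H":
            if H[i][j] == 0:
                break
            sub = MATCH if s1[i - 1] == s2[j - 1] else MISS
            if H[i][j] == H[i - 1][j - 1] + sub:
                top.append(s1[i - 1]); bottom.append(s2[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == F[i][j]:
                state = "F"
            else:
                state = "E"
        elif state == "F":  # gap in s2: consume one character of s1
            top.append(s1[i - 1]); bottom.append("-")
            if F[i][j] == H[i - 1][j] + GAP_INSERT:
                state = "H"  # this position opened the gap
            i -= 1
        else:               # state "E": gap in s1, consume one character of s2
            top.append("-"); bottom.append(s2[j - 1])
            if E[i][j] == H[i][j - 1] + GAP_INSERT:
                state = "H"
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("CATTG", "CTG"))  # (26, 'CATTG', 'C--TG')
```

Tracking the current matrix (H, E or F) during the walk is one way to keep gap openings and extensions consistent with the scores; a simpler linear-gap traceback would not need it.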
Parallelizing the Algorithm
[Figure: matrix labeled with the sequence C A T T G]
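One common way to parallelize around the dynamic programming dependencies is to process the matrix one anti-diagonal at a time, since cells on the same anti-diagonal do not depend on each other. The slide text does not spell out the scheme, so treating it as an anti-diagonal wavefront is an assumption here. The Python sketch below runs sequentially; the inner loop marks the work that one PE or thread per cell could do in the same step. The name `sw_wavefront` and the input sequences are illustrative.

```python
MATCH, MISS, GAP_INSERT, GAP_EXTEND = 10, -3, -3, -1
NEG = float("-inf")

def sw_wavefront(s1, s2):
    """Fill the matrices anti-diagonal by anti-diagonal.

    Returns (best_score, widths), where widths[k] is the number of cells
    (PEs that would be in use) on the k-th anti-diagonal.
    """
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best, widths = 0, []
    for d in range(2, n + m + 1):  # d = i + j identifies one anti-diagonal
        cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
        widths.append(len(cells))
        # Every cell below reads only from diagonals d-1 and d-2,
        # so the iterations of this loop are independent of each other.
        for i, j in cells:
            E[i][j] = max(H[i][j - 1] + GAP_INSERT, E[i][j - 1] + GAP_EXTEND)
            F[i][j] = max(H[i - 1][j] + GAP_INSERT, F[i - 1][j] + GAP_EXTEND)
            sub = MATCH if s1[i - 1] == s2[j - 1] else MISS
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best, widths

print(sw_wavefront("CATTG", "CTG"))  # (26, [1, 2, 3, 3, 3, 2, 1])
```

The widths show why some PEs sit idle: the wavefront is narrow at the start and end of the matrix and full width only in the middle.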
SWAMP (Smith-Waterman using Associative Massive Parallelism)
[Figure legend: Used PEs, Unused PEs]
ASC: Associative Architecture
SIMD with special associative features
  Fine-grained parallelism
Designed for fast associative searches
  Content-based searches, not memory address
ASC Advantages
Quick data movement in SIMD
  Move raw data in parallel
  At each step, PEs follow the algorithmic steps for data movement in lock step
No message passing like MPI/PVM
  No store/forward
  No headers
  No explicit synchronizing
ASC: Associative Architecture
Very fast operations for:
  Finding Maximum / Minimum
  Finding if there are "Any Responders"
  "Pick One" active PE
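The associative operations listed above can be imitated sequentially to show what each one returns. This Python sketch is only an emulation: on ASC hardware every PE would test its own value at once, while here a list index stands in for a PE. All function names and the sample values are hypothetical.

```python
# One list entry per PE; each PE holds one value in its local memory.
pe_scores = [0, 10, 7, 26, 13, 26]

def associative_search(values, predicate):
    """Content-based search: each PE tests its own value; returns the responder mask."""
    return [predicate(v) for v in values]

def any_responders(mask):
    """True if at least one PE responded to the search."""
    return any(mask)

def pick_one(mask):
    """Select a single responding PE (here: the lowest-numbered one), or None."""
    return mask.index(True) if any(mask) else None

def find_maximum(values):
    """Maximum over all PEs; a constant-time primitive in the ASC model."""
    return max(values)

best = find_maximum(pe_scores)
responders = associative_search(pe_scores, lambda v: v == best)
print(best, any_responders(responders), pick_one(responders))  # 26 True 3
```

In a Smith-Waterman setting, this pattern corresponds to locating the maximum computed value and choosing one cell that holds it as the start of the traceback.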
ASC on Metal
ASC
  SIMD with Additional Features
ClearSpeed "SIMD" Accelerator (64-bit FP)
  50 GFLOPS peak performance
  25W average power dissipation
  Associative Functions
NVIDIA Tesla GPGPU Stream Processor (32-bit FP)
  518 Peak GFLOPS on Tesla Series
  170W peak, 120W typical
  Associative Functions
Associative Search
  Search via Content, not Memory Address
NVIDIA Tesla GPGPU x 2
GPGPU Internal Organization
Multiple Levels of Parallelism
  Up to 512 threads per block
  Communicate through shared memory
  Grids of thread blocks
SPMD Computation Model
  All data processed by the same program (kernel)
From "Scalable Parallel Programming with CUDA," in GPUs for Parallel Programming, Vol. 6, No. 2, March/April 2008, by John Nickolls et al.
ASC to GPGPU Mapping
ASC                           GPGPU                     Notes
PE                            Thread                    Local memory that belongs solely to PE / Thread
PE Interconnection Network    Per-block Shared Memory
All PEs                       Block                     Limited here to 512 separate threads per block

Multiple ASC Model (MASC)     GPGPU
Multiple Instruction Streams  Multiple Blocks
Multiple MASC programs        Multiple Grids
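To illustrate the mapping, the ASC "find maximum" operation could be expressed as a shared-memory tree reduction inside one block, with one thread per PE. The sketch below emulates that pattern sequentially in Python; it is not CUDA and not the authors' kernel. The name `block_max`, the padding scheme, and the requirement that the block size be a power of two are assumptions of this sketch.

```python
NEG = float("-inf")

def block_max(pe_values, threads_per_block=512):
    """Emulate a per-block max reduction; threads_per_block must be a power of two."""
    if len(pe_values) > threads_per_block:
        raise ValueError("one block holds at most threads_per_block PEs")
    # "Shared memory": one slot per thread, padded for threads without a PE value.
    shared = list(pe_values) + [NEG] * (threads_per_block - len(pe_values))
    stride = threads_per_block // 2
    while stride > 0:
        # On a GPU, threads 0..stride-1 would run this step together,
        # followed by a barrier such as __syncthreads() before the next stride.
        for tid in range(stride):
            shared[tid] = max(shared[tid], shared[tid + stride])
        stride //= 2
    return shared[0]

print(block_max([0, 10, 7, 26, 13]))  # 26
```

Each step halves the number of active threads, so a 512-thread block finishes in 9 steps; this stands in for the single-step maximum that the ASC model provides in hardware.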
Future Work
Work on extending SWAMP to SWAMP+, returning multiple non-overlapping sequences during the traceback
Questions ?
Contact Info:
Shannon Steinfadt
http://www.cs.kent.edu/~ssteinfa
References
CUDA Information
J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," ACM Queue Magazine, pp. 41–53, March/April 2008.
Parallel Sequence Alignment
S. A. Manavski and G. Valle, "CUDA Compatible GPU Cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment," BMC Bioinformatics, March 2008.
M. Farrar, "Striped Smith-Waterman Speeds Database Searches Six Times over Other SIMD Implementations," Bioinformatics, pp. 156–161, Jan. 2007.
T. Rognes and E. Seeberg, "Six-fold Speed-up of Smith-Waterman Sequence Database Searches Using Parallel Processing on Common Microprocessors," Bioinformatics 16(8): 699–706, 2000.
Others
W. Liu, B. Schmidt, G. Voss, A. Schröder, and W. Müller-Wittig, "Bio-Sequence Database Scanning on a GPU," Proc. 20th IEEE Int'l Parallel and Distributed Processing Symp., High Performance Computational Biology (HiCOMB) Workshop, 2006.