GPU Computing for the SWAMP Sequence Alignment


Page 1

GPU Computing for the SWAMP Sequence Alignment

Shannon I. Steinfadt and Johnnie Baker
Parallel and Associative Computing Lab
Computer Science Department
Kent State University

3rd Annual Ohio Collaborative Conference on Bioinformatics (OCCBIO '08)

Get It, Got It, Good (or Better)

ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
gggccctcct ggctcccaac agcttctcag ttcccacttc

Have This
Want This
Use This

NVIDIA C870 Tesla GPGPU
518 Peak GFLOPS on Tesla
170W peak, 120W typical

OCCBIO 2008 - S. Steinfadt and J. Baker

Page 2

Sequence Alignment

ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
gggccctcct ggctcccaac agcttctcag ttcccacttc

Given two sequences:
DNA nucleotides {C, T, G, A}
Amino acids {a, r, n, d, c, q, e, g, h, i, l, k, m, f, p, s, t, w, y, v}

Align them to find the longest, most common subsequence

gcggacgct ccacg-tgtc--c --c- tcgccgcgc cc-cgtctacc
gggccct cctggctcccaac agc ttctcagttc ccacttc
||:|:|| ||::|-|::|--| --|-||:|:|::| ||-|:||

Similar Characters
Similar Structure
Similar Function

Page 3

Sequence Alignment

Similar Characters
Similar Function
Similar Structure

Ancestral Relationships
Gene Functionality
Aid in Drug Discovery
Homologous Sequences

Aligning using Smith-Waterman Algorithm

Cost Key
Match +10
Miss -3
Insert a Gap -3
Extend a Gap -1

Compare all possible combinations, but it has dynamic programming data dependencies
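As a minimal sketch of the scoring step, the cost key can be plugged into an affine-gap recurrence. The function name `sw_fill`, the input sequences, and the reading of the gap costs (the first gap character costs -3, each further one -1) are assumptions made for illustration, not details taken from the slides.

```python
NEG_INF = float("-inf")

def sw_fill(a, b, match=10, miss=-3, gap_open=-3, gap_extend=-1):
    """Fill the Smith-Waterman score matrix H for sequences a and b.

    Assumed gap model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best local score ending at (i, j)
    E = [[NEG_INF] * cols for _ in range(rows)]  # best score ending in a gap in a
    F = [[NEG_INF] * cols for _ in range(rows)]  # best score ending in a gap in b
    for i in range(1, rows):
        for j in range(1, cols):
            # Left neighbour: open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            # Upper neighbour: same choice in the other direction.
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            # Upper-left neighbour: match or mismatch.
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else miss)
            # Local alignment never drops below zero.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
    return H

H = sw_fill("CATTG", "CTG")
best = max(max(row) for row in H)
print(best)
```

Every cell reads its left, upper, and upper-left neighbours, which is the data dependency the slide refers to: a cell cannot be scored until those three are known.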


Page 7

Traceback in the Smith-Waterman Algorithm

Cost Key
Match +10
Miss -3
Insert a Gap -3
Extend a Gap -1

Alignment:
CATTG
C--TG

1) Find the maximum computed value
2) Trace back until you reach a '0'
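The two traceback steps can be sketched as follows. This is an illustrative serial version, not the SWAMP implementation: the function name `smith_waterman`, the sequences CATTG and CTG, and the gap model (first gap character -3, each additional one -1) are assumptions.

```python
NEG_INF = float("-inf")

def smith_waterman(a, b, match=10, miss=-3, gap_open=-3, gap_extend=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    E = [[NEG_INF] * cols for _ in range(rows)]  # gap in a (consumes b)
    F = [[NEG_INF] * cols for _ in range(rows)]  # gap in b (consumes a)
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else miss)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:  # step 1: remember the maximum computed value
                best, bi, bj = H[i][j], i, j

    # Step 2: walk back from the maximum until a zero cell is reached.
    out_a, out_b = [], []
    i, j, state = bi, bj, "H"
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else miss
            if H[i][j] == H[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
            elif H[i][j] == F[i][j]:
                state = "F"
            else:
                state = "E"
        elif state == "F":
            out_a.append(a[i - 1]); out_b.append("-")
            if F[i][j] == H[i - 1][j] + gap_open:
                state = "H"  # this was the first character of the gap
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            if E[i][j] == H[i][j - 1] + gap_open:
                state = "H"
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

result = smith_waterman("CATTG", "CTG")
print(result)
```

Under these assumptions the call returns a score of 26 with CATTG aligned to C--TG: three matches (+30), one gap opened (-3), and one gap extension (-1).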

Page 8

Parallelizing the Algorithm

Page 9

Parallelizing the Algorithm

C A T T G
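The transcript keeps no text describing the parallel scheme on these slides. One standard way to parallelize around the dynamic programming dependencies is an anti-diagonal wavefront: all cells with the same i + j depend only on the two previous anti-diagonals, so they can be scored at the same time. The sketch below is a serial emulation of that idea under the same assumed gap model as the cost key (first gap character -3, each further one -1). The name `sw_wavefront` is hypothetical, and the sketch does not claim to reproduce SWAMP's actual data layout.

```python
NEG_INF = float("-inf")

def sw_wavefront(a, b, match=10, miss=-3, gap_open=-3, gap_extend=-1):
    """Score the matrix one anti-diagonal at a time.

    Returns (best_score, widths), where widths[k] is the number of cells
    on the k-th anti-diagonal, i.e. how many PEs could work in that step.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best, widths = 0, []
    for d in range(2, n + m + 1):  # d = i + j
        cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
        widths.append(len(cells))
        # The cells below are mutually independent; on SIMD hardware each
        # one could be handled by its own PE in a single lock-step pass.
        for i, j in cells:
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else miss)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best, widths

best, widths = sw_wavefront("CATTG", "CTG")
print(best, widths)
```

For sequences of lengths n and m the loop runs n + m - 1 wavefront steps instead of n * m sequential cell updates, and the widest step touches min(n, m) cells.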

Page 10

SWAMP (Smith-Waterman using Associative Massive Parallelism)

Used PEs
Unused PEs

ASC: Associative Architecture

SIMD with special associative features
    Fine-grained parallelism
Designed for fast associative searches
    Content-based searches, not memory address

Page 11

ASC Advantages

Quick data movement in SIMD
    Move raw data in parallel
    At each step, PEs follow the algorithmic steps for data movement in lock step
No message passing like MPI/PVM
    No store/forward
    No headers
    No explicit synchronizing

ASC: Associative Architecture

Very fast operations for:
    Finding Maximum / Minimum
    Finding if there are “Any Responders”
    “Pick One” active PE
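To make the associative operations concrete, here is a small serial emulation. It shows only what each operation returns, not how fast associative hardware performs it. All function names and the sample PE values are hypothetical and are not part of any ASC library.

```python
def find_max(values, active):
    """Maximum over the active PEs, or None if no PE is active."""
    candidates = [v for v, on in zip(values, active) if on]
    return max(candidates) if candidates else None

def search(values, key):
    """Associative search: each PE compares its own value with key.

    The result is a responder mask with one flag per PE.
    """
    return [v == key for v in values]

def any_responders(mask):
    """True if at least one PE responded to the last search."""
    return any(mask)

def pick_one(mask):
    """Index of one responding PE (here simply the first), or None."""
    for pe, flag in enumerate(mask):
        if flag:
            return pe
    return None

# One value per PE, for example the scores held along a matrix diagonal.
pe_values = [7, 26, 14, 26]
active = [True] * len(pe_values)

best = find_max(pe_values, active)     # maximum across all PEs
responders = search(pe_values, best)   # which PEs hold that maximum?
chosen = pick_one(responders) if any_responders(responders) else None
print(best, responders, chosen)
```

The same three calls cover the first traceback step of Smith-Waterman: find the maximum score, locate the PEs that hold it by content rather than by address, and pick one of them as the starting point.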

Page 12

ASC on Metal

ASC
    SIMD with Additional Features
    Associative Search: search via content, not memory address

ClearSpeed “SIMD” Accelerator (64-bit FP)
    50 GFLOPS peak performance
    25W average power dissipation
    Associative Functions

NVIDIA Tesla GPGPU Stream Processor (32-bit FP)
    518 Peak GFLOPS on Tesla Series
    170W peak, 120W typical
    Associative Functions

NVIDIA Tesla GPGPU x 2

Page 13

GPGPU Internal Organization

Multiple Levels of Parallelism
    Up to 512 threads per block
    Communicate through shared memory
    Grids of thread blocks
SPMD Computation Model
    All data processed by the same program (kernel)

From “Scalable Parallel Programming with CUDA,” in GPUs for Parallel Programming, Vol. 6, No. 2, March/April 2008, by John Nickolls et al.

ASC to GPGPU Mapping

ASC                             GPGPU
PE                              Thread (local memory that belongs solely to PE / Thread)
PE Interconnection Network      Per-block Shared Memory
All PEs                         Block (limited here to 512 separate threads per block)

Multiple ASC Model (MASC)       GPGPU
Multiple Instruction Streams    Multiple Blocks
Multiple MASC programs          Multiple Grids
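The arithmetic behind the PE-to-thread mapping can be sketched as below. The 512-thread limit is the figure quoted on the slide; the helper names are hypothetical, and the flat numbering follows the usual CUDA convention of block index times block size plus thread index.

```python
THREADS_PER_BLOCK = 512  # per-block limit quoted on the slide

def blocks_needed(num_pes, threads_per_block=THREADS_PER_BLOCK):
    """Number of thread blocks required to give every PE its own thread."""
    return (num_pes + threads_per_block - 1) // threads_per_block

def pe_to_thread(pe, threads_per_block=THREADS_PER_BLOCK):
    """Map a flat PE number to a (block index, thread index) pair."""
    return divmod(pe, threads_per_block)

def thread_to_pe(block, thread, threads_per_block=THREADS_PER_BLOCK):
    """Inverse mapping: recover the flat PE number from block and thread."""
    return block * threads_per_block + thread

print(blocks_needed(1000), pe_to_thread(1000), thread_to_pe(1, 488))
```

With one PE per thread, any problem that needs more than 512 PEs spills into a second block. Because the shared memory that stands in for the PE interconnection network is per block, PEs placed in different blocks would need some other path to exchange data.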

Page 14

Future Work

Work on extending SWAMP to SWAMP+, returning multiple non-overlapping sequences during the traceback

Questions ?

Contact Info:

Shannon Steinfadt

[email protected]

http://www.cs.kent.edu/~ssteinfa

Page 15

References

CUDA Information

J. Nickolls, I. Buck, M. Garland, K. Skadron, “Scalable Parallel Programming with CUDA,” ACM Queue Magazine, pp. 41–53, March/April 2008.

Parallel Sequence Alignment

S. A. Manavski and G. Valle, “CUDA Compatible GPU Cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment,” BMC Bioinformatics, March 2008.

M. Farrar, “Striped Smith-Waterman Speeds Database Searches Six Times over Other SIMD Implementations,” Bioinformatics, pp. 156–161, Jan. 2007.

T. Rognes and E. Seeberg, “Six-fold Speed-up of Smith-Waterman Sequence Database Searches Using Parallel Processing on Common Microprocessors,” Bioinformatics 16(8): 699–706, 2000.

Others

W. Liu, B. Schmidt, G. Voss, A. Schröder, and W. Müller-Wittig, “Bio-Sequence Database Scanning on a GPU,” Proc. 20th IEEE Int'l Parallel and Distributed Processing Symp. High Performance Computational Biology (HiCOMB) Workshop, 2006.