Introduction to SeqAn, an Open-source C++ Template Library
Transcript of Introduction to SeqAn, an Open-source C++ Template Library
![Page 1: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/1.jpg)
Sign up for FREE GPU Test Drive on remotely hosted clusters
Develop your codes on latest GPUs today
Test Drive NVIDIA GPUs! Experience The Acceleration
www.nvidia.com/GPUTestDrive
![Page 2: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/2.jpg)
Intro to SeqAn
An Open-Source C++ template library for biological sequence analysis
Knut Reinert, David Weese
Freie Universität Berlin
Berlin, Institute for Computer Science
Prof. Dr. Knut Reinert, Algorithmic Bioinformatics, Department of Mathematics and Computer Science
![Page 3: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/3.jpg)
This talk
Why SeqAn?
SeqAn as SDK
Generic Parallelization
SeqAn concept/content
![Page 4: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/4.jpg)
4 Nvidia Webinar, 22.10.2013
~ 15 years ago...
Data volume and cost: in 2000, the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars.
Throughput: 100 million bp per day
![Page 5: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/5.jpg)
Sequencing today...
Within roughly ten years, sequencing has become about 10 million times cheaper.
Illumina HiSeq: 100 billion bp per day
![Page 6: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/6.jpg)
Future of NGS data analysis
![Page 7: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/7.jpg)
Software libraries bridge gap
[Figure: algorithm libraries bridge the gap between computer scientists and experimentalists]
Stages: Theoretical considerations, Algorithm design, Prototype implementation, Maintainable tool, Analysis pipelines
Applications: RNA-Seq, ChIP-Seq, Structural variants, Metagenomics abundance, Sequence assembly, Cancer genomics
Library building blocks: FM-index, Suffix arrays, Multicore, Hardware acceleration, K-mer filter, Fast I/O, Secondary memory
![Page 8: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/8.jpg)
SeqAn Now
SeqAn and SeqAn tools have been cited more than 360 times.
Among the institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics; ...
SeqAn is under the BSD license and hence free for academic AND commercial use.
![Page 9: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/9.jpg)
SeqAn developers
[Figure: number of SeqAn developers per year, 2003 to 2012 (y-axis 0 to 16); legend: External, CSC, BMBF, DFG, IMPRS, FU]
![Page 10: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/10.jpg)
SeqAn main concepts
![Page 11: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/11.jpg)
length(str)
Value<T>::Type
String<Subclass>
![Page 12: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/12.jpg)
void swap(string & str)
{
    char help = str[1];
    str[1] = str[0];
    str[0] = help;
}
![Page 13: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/13.jpg)
template <typename T>
void swap(T & str)
{
    char help = str[1];
    str[1] = str[0];
    str[0] = help;
}
![Page 15: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/15.jpg)
template <typename T>
void swap(String<T> & str)
{
    T help = str[1];
    str[1] = str[0];
    str[0] = help;
}
![Page 16: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/16.jpg)
template <typename T>
void swap(T & str)
{
    // typename is required because value_type is a dependent type.
    typename T::value_type help = str[1];
    str[1] = str[0];
    str[0] = help;
}
![Page 17: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/17.jpg)
template <typename T>
void swap(T & str)
{
    // typename is required because Value<T>::Type is a dependent type.
    typename Value<T>::Type help = str[1];
    str[1] = str[0];
    str[0] = help;
}
![Page 18: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/18.jpg)
Metafunction
template <typename T>
struct Value
{
    typedef T Type;
};
![Page 19: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/19.jpg)
template <typename T>
struct Value
{
    typedef T Type;
};

template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};
![Page 20: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/20.jpg)
template <typename T>
struct Value
{
    typedef T Type;
};

template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};

template < >
struct Value< char * >
{
    typedef char Type;
};
![Page 21: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/21.jpg)
template < >
struct Value< char * >
{
    typedef char Type;
};

template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};

template < size_t N >
struct Value< char [N] >
{
    typedef char Type;
};
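The specializations above can be checked at compile time. The following is a minimal sketch, assuming an empty stand-in `String<T>` template in place of SeqAn's real string class; it only shows which `Type` each specialization selects.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for SeqAn's String class (assumption: only the template shape matters here).
template <typename T>
struct String {};

// Primary template: by default a type is its own value type, as on the slide.
template <typename T>
struct Value { typedef T Type; };

// Partial specialization: the value type of String<T> is T.
template <typename T>
struct Value< String<T> > { typedef T Type; };

// Full specialization for C strings.
template <>
struct Value<char *> { typedef char Type; };

// Partial specialization for character arrays of any length N.
template <std::size_t N>
struct Value<char[N]> { typedef char Type; };

// The compiler picks the most specialized matching template.
static_assert(std::is_same<Value<String<int> >::Type, int>::value, "String<int> -> int");
static_assert(std::is_same<Value<char *>::Type, char>::value, "char * -> char");
static_assert(std::is_same<Value<char[8]>::Type, char>::value, "char[8] -> char");
```

Because the lookup happens entirely at compile time, supporting a new string type only requires adding one more specialization; existing generic code stays untouched.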
![Page 24: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/24.jpg)
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
![Page 25: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/25.jpg)
Shim Function
template <typename T>
typename Value<T>::Type & value(T & str, int i)
{
    return str[i];
}
![Page 26: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/26.jpg)
Generic Algorithm
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
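Put together, the metafunction, the shim function, and the generic algorithm can be tried on standard containers. This is a sketch under assumptions: the default `Value` here reads `T::value_type` (unlike the slide's default), and the algorithm is renamed to the hypothetical `swapFirstTwo` to avoid clashing with `std::swap`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Metafunction: default case assumes an STL-like container (assumption, not SeqAn's default).
template <typename T>
struct Value { typedef typename T::value_type Type; };

// Character arrays have no nested typedefs, so they get their own specialization.
template <std::size_t N>
struct Value<char[N]> { typedef char Type; };

// Shim function: uniform element access for every supported type.
template <typename T>
typename Value<T>::Type & value(T & str, int i)
{
    return str[i];
}

// Generic algorithm: swaps the first two elements using only Value<> and value().
template <typename T>
void swapFirstTwo(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
```

The same function body then serves `std::string`, `std::vector<int>`, and plain `char` arrays; a type with a different access syntax would only need its own `value()` overload.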
![Page 27: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/27.jpg)
SeqAn Content - SDK
![Page 28: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/28.jpg)
SeqAn SDK Components - Tutorials
![Page 29: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/29.jpg)
SeqAn SDK Components – Reference Manual
![Page 30: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/30.jpg)
SeqAn SDK Components
• CDash/CTest to automatically compile and test across platforms
• Review Board to ensure code quality
• Code coverage reports
![Page 31: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/31.jpg)
SeqAn Content - algorithms & data structures
![Page 32: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/32.jpg)
Unified Alignment Algorithms
Versatile & Extensible DP-Interface
Standard DP-Algorithms
• Global & Semi-Global Alignments
• Local Alignments
Modified DP-Algorithms
• Split Breakpoint Detection
• Banded Chain Alignment
For Example ...
![Page 33: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/33.jpg)
Unified Alignment Algorithms
For Example ...
Banded Smith-Waterman with Affine Gap Costs:
DPBand<BandOn>(lowerDiag, upperDiag),
DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
Semi-Global Gotoh without Traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>
Split-Breakpoint Detection for Right Anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
Needleman-Wunsch with Traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
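The idea of selecting an alignment variant through compile-time tags can be illustrated without SeqAn. The sketch below is not the DPProfile interface; `GlobalTag`, `LocalTag`, and `alignmentScore` are hypothetical names, gaps are linear, and only the score is computed (no traceback).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags that select the algorithm at compile time.
struct GlobalTag {};  // Needleman-Wunsch style
struct LocalTag {};   // Smith-Waterman style

// Policy hooks: border initialisation, cell clamping, and where the result is read.
inline int initCell(int i, int gap, GlobalTag) { return i * gap; }
inline int initCell(int, int, LocalTag) { return 0; }
inline int clampCell(int v, GlobalTag) { return v; }
inline int clampCell(int v, LocalTag) { return std::max(v, 0); }
inline int finalScore(int lastCell, int, GlobalTag) { return lastCell; }
inline int finalScore(int, int best, LocalTag) { return best; }

// One DP recursion shared by both variants; the tag only changes the hooks above.
template <typename TTag>
int alignmentScore(std::string const & a, std::string const & b,
                   int match, int mismatch, int gap, TTag tag)
{
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = initCell(static_cast<int>(j), gap, tag);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = initCell(static_cast<int>(i), gap, tag);
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int v = std::max(diag, std::max(prev[j] + gap, cur[j - 1] + gap));
            cur[j] = clampCell(v, tag);
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);
    }
    return finalScore(prev[b.size()], best, tag);
}
```

A call such as `alignmentScore("ACGT", "AGT", 1, -1, -1, GlobalTag())` and the same call with `LocalTag()` share all of the recursion code; the variant is resolved by overload selection at compile time, so no runtime branching is needed inside the inner loop.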
![Page 34: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/34.jpg)
Support for Common File Formats
Important file formats for HTS analysis:
• Sequences: FASTA, FASTQ; indexed FASTA (FAI) for random access
• Genomic Features: GFF 2, GFF 3, GTF, BED
• Read Mapping: SAM, BAM (plus BAM indices)
• Variants: VCF
… or write your own parser: tutorials and helper routines for writing your own parsers.
SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}
BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}
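To give an idea of what "write your own parser" involves, here is a minimal FASTA reader built only on the C++ standard library. It is an illustrative sketch, not SeqAn's parser: `FastaRecord` and `readFasta` are hypothetical names, and compressed input, validation, and error reporting are left out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTA entry: the header line without '>' and the concatenated sequence lines.
struct FastaRecord
{
    std::string id;
    std::string seq;
};

// Reads all records from a FASTA stream; sequence lines of one record are joined.
inline std::vector<FastaRecord> readFasta(std::istream & in)
{
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        if (line.empty())
            continue;
        if (line[0] == '>')
        {
            FastaRecord rec;
            rec.id = line.substr(1);
            records.push_back(rec);
        }
        else if (!records.empty())
        {
            records.back().seq += line;
        }
    }
    return records;
}
```

Sequence data that appears before the first header line is silently skipped in this sketch; a production parser would more likely report it as a format error.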
![Page 35: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/35.jpg)
Journaled Sequences
• Store Multiple Genomes
• Save Storage Capacities
StringSet<TJournaled, Owner<JournalSet> > set;
setGlobalReference(set, refSeq);
appendValue(set, seq1);
join(set, idx, JoinConfig<>());
String<Dna, Journaled<Alloc<> > >
[Figure: sequences G1, G2, ..., GN shown relative to the reference sequence Ref]
![Page 36: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/36.jpg)
Fragment Store (Multi) Read Alignments
Read alignments can be easily imported:
std::ifstream file("ex1.sam");
read(file, store, Sam());
… and accessed as a multiple alignment, e.g. for visualization:
AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
![Page 37: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/37.jpg)
Unified Full-Text Indexing Framework
Available Indices
Suffix Trees:
• suffix array
• enhanced suffix array
• lazy suffix tree
Prefix Trie:
• FM-index
q-Gram Indices:
• direct addressing
• open addressing
• gapped
All indices support multiple strings and external memory construction/usage.
Index<TSeq, IndexEsa<> >
Index<StringSet<TSeq>, FMIndex<> >
Index Lookup Interface
All indices support the (sequential) find interface:
Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position " << position(finder) << endl;
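What an index-based lookup does behind such a find interface can be sketched with a plain suffix array. The code below is an illustration using only the standard library, with a naive construction that is fine for short texts; `buildSuffixArray` and `findAll` are hypothetical helper names, not SeqAn functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive suffix array: sort all suffix start positions lexicographically.
inline std::vector<std::size_t> buildSuffixArray(std::string const & text)
{
    std::vector<std::size_t> sa(text.size());
    for (std::size_t i = 0; i < sa.size(); ++i)
        sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Binary search for the block of suffixes that start with the pattern.
// Returns the hit positions in increasing text order.
inline std::vector<std::size_t> findAll(std::string const & text,
                                        std::vector<std::size_t> const & sa,
                                        std::string const & pattern)
{
    std::vector<std::size_t>::const_iterator lo = std::lower_bound(
        sa.begin(), sa.end(), pattern,
        [&text](std::size_t suf, std::string const & p) {
            return text.compare(suf, p.size(), p) < 0;
        });
    std::vector<std::size_t>::const_iterator hi = std::upper_bound(
        lo, sa.end(), pattern,
        [&text](std::string const & p, std::size_t suf) {
            return text.compare(suf, p.size(), p) > 0;
        });
    std::vector<std::size_t> hits(lo, hi);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

All suffixes that begin with the pattern are adjacent in the sorted order, so two binary searches delimit every hit at once; this is the property the suffix-array-based indices listed above build on.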
![Page 38: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/38.jpg)
SeqAn Performance
![Page 39: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/39.jpg)
Masai read mapper
![Page 40: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/40.jpg)
Masai read mapper
Algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree)
[Figure: the reads (ACGCTTCATCGCCCT…) are indexed in a radix tree of seeds; the genome (Chr. 1, Chr. 2, ..., Chr. X) is indexed by e.g. an FM-index]
![Page 41: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/41.jpg)
Read Mapping: Masai
Faster and more accurate than BWA and Bowtie2
Timings on a single core
![Page 42: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/42.jpg)
Easily exchange index…
![Page 43: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/43.jpg)
What about multi-core implementation?
Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any application making use of indices
![Page 44: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/44.jpg)
SeqAn going parallel
GOAL: Parallelize the finder interface of SeqAn so it works on CPU and accelerators like GPU
Will be replaced by hg18 and 10 million 20-mers
![Page 45: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/45.jpg)
SeqAn going parallel
Construct FM-index on reverse genome
Set # OMP threads
Call generic count function
![Page 46: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/46.jpg)
SeqAn going parallel: NVIDIA GPUs
SAME count function as on CPU!
Copy needles and index to GPU
![Page 47: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/47.jpg)
SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index
• i7, 3.2 GHz: 18.6 sec (1 X)
• i7, 3.2 GHz (…12...): 2.66 sec (7 X)
• Intel Xeon Phi 7120, 244 threads: 2.18 sec (8.5 X)
• NVIDIA Tesla K20: 0.4 s (47 X)
![Page 48: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/48.jpg)
SeqAn going parallel
Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index
[Figure: bar chart over i7, 3.2 GHz (…12...), Intel Xeon Phi 7120 (244 threads) and NVIDIA Tesla K20; times and speedups shown: 66.1 s (1 X), 9.0 s (7.3 X), 3.9 s (16.9 X), 3.2 s (20.7 X)]
![Page 49: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/49.jpg)
Part II: The details
![Page 50: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/50.jpg)
Parallelization on the GPU
![Page 51: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/51.jpg)
CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
• CUDA requires each function to be prefixed with domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
• We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro
• Static const arrays are not allowed in the way SeqAn defines them
• We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif
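The two adaptations can be pictured together in a few lines. The following sketch reuses the macro from the slide and replaces a static lookup table with conversion functions; `dnaToChar` and `charToDna` are hypothetical names chosen for illustration, and the 2-bit encoding A=0, C=1, G=2, T=3 is an assumption.

```cpp
#include <cassert>

// Same macro as on the slide: adds the CUDA qualifiers only when compiled by nvcc.
#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif

// Conversion function instead of a static const lookup table (hypothetical name).
// Maps an ordinal value 0..3 to the nucleotide character.
SEQAN_HOST_DEVICE char dnaToChar(unsigned char ordValue)
{
    switch (ordValue & 3)
    {
        case 0: return 'A';
        case 1: return 'C';
        case 2: return 'G';
        default: return 'T';
    }
}

// Inverse conversion; unknown characters map to 0 ('A') in this sketch.
SEQAN_HOST_DEVICE unsigned char charToDna(char c)
{
    switch (c)
    {
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return 0;
    }
}
```

With a regular C++ compiler the macro expands to plain `inline`, so the same source builds for the CPU; under nvcc each function is additionally compiled for the device.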
![Page 52: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/52.jpg)
Strings
• Instead of defining a new CUDA string we simply use the Thrust library:
• Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
• However, Thrust functions are callable only from host-side
• We made both vectors accessible from SeqAn
• SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
• We simply defined the required wrapper functions for these two vectors
![Page 53: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/53.jpg)
Standard Strings
• Up to here, all strings can only be used on the side (host or device) of their scope
[Figure: Host Memory and Device Memory, showing the buffers of thrust::host_vector, seqan::String (twice) and thrust::device_vector]
![Page 54: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/54.jpg)
Host-Device String
• How to access a device_vector from device-side?
• We could pass (POD) iterators to the kernel
• However, many SeqAn algorithms work on more complex containers
• We need the same interface of the container on the device side
• For strings we developed a so-called ContainerView (POD type)
• Provides a container interface given the begin/end pointers of vector buffer
• The view() function creates the ContainerView object for a given device_vector
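The view idea can be sketched in plain C++, with `std::vector` standing in for `thrust::device_vector`. `ContainerViewSketch`, `length`, `view`, and `reverseInPlace` below are illustrative stand-ins, not SeqAn's implementation; the point is that the view holds just two pointers and can therefore be copied by value, as a kernel argument would be.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// A trivially copyable view: just the begin/end pointers of someone else's buffer.
template <typename TValue>
struct ContainerViewSketch
{
    TValue * beginPtr;
    TValue * endPtr;

    TValue & operator[](std::size_t i) const { return beginPtr[i]; }
};

static_assert(std::is_trivially_copyable<ContainerViewSketch<char> >::value,
              "a view must be copyable bit by bit");

// Container interface offered by the view.
template <typename TValue>
std::size_t length(ContainerViewSketch<TValue> const & v)
{
    return static_cast<std::size_t>(v.endPtr - v.beginPtr);
}

// Creates a view over the buffer of a vector (std::vector used as a stand-in).
template <typename TValue>
ContainerViewSketch<TValue> view(std::vector<TValue> & buffer)
{
    return ContainerViewSketch<TValue>{buffer.data(), buffer.data() + buffer.size()};
}

// A generic algorithm that takes the view BY VALUE and still edits the original buffer.
template <typename TView>
void reverseInPlace(TView v)
{
    for (std::size_t i = 0, j = length(v); i + 1 < j; ++i)
    {
        --j;
        std::swap(v[i], v[j]);
    }
}
```

Copying the view copies only the two pointers, never the data, so an algorithm that receives it by value still operates on the buffer owned by the vector.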
![Page 55: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/55.jpg)
Host-Device String
• How to use a device_vector on the device
[Figure: Host Memory and Device Memory; view() creates a seqan::ContainerView for the buffer of a thrust::device_vector, and the kernel launch passes the seqan::ContainerView to the device]
![Page 56: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/56.jpg)
Device and View metafunctions
• For generic GPU programming:
• The Device metafunction returns the device-memory equivalent of a class
• The View metafunction returns the (POD) view type of a class
// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};

// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};
![Page 57: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/57.jpg)
Hello world
• A simple example to reverse a string on the GPU
// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string on device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", length(string));
    reverse(string);
}
![Page 58: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/58.jpg)
Porting complex data structures
• More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
• don't use pointers
• use only strings of POD types (String<Dna>, but not String<String<…> >)
• use only 1-dimensional StringSets (ConcatDirect)
• Nested classes are no problem
• View metafunction converts all member types into their view types
• view() function is called recursively on all members
![Page 59: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/59.jpg)
Example: FM Index
![Page 60: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/60.jpg)
The FM-index (BWT, LF-mapping)
![Page 61: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/61.jpg)
The FM-index (search ssi)
a3 = C('i') + Occ('i', 0) + 1 = 1 + 0 + 1 = 2
b3 = C('i') + Occ('i', 12) = 1 + 4 = 5
![Page 62: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/62.jpg)
The FM-index (backwards search)
a1 = C('s') + Occ('s', 8) + 1 = 8 + 2 + 1 = 11
b1 = C('s') + Occ('s', 10) = 8 + 4 = 12
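The interval updates above can be reproduced with a small, deliberately naive FM-index. This sketch assumes the example text is "mississippi" (which matches C('i') = 1 and C('s') = 8), uses '$' as a sentinel smaller than all text characters, and counts Occ by scanning the BWT; `FmIndexSketch` is a hypothetical name, and real implementations use sampled rank tables instead. It works with 0-based half-open intervals [lo, hi), so the slide's 1-based rows a..b correspond to lo = a - 1 and hi = b.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FmIndexSketch
{
    std::string bwt;                 // Burrows-Wheeler transform of text + '$'
    std::array<std::size_t, 256> C;  // C[c] = number of characters smaller than c

    explicit FmIndexSketch(std::string text)
    {
        text += '$';  // assumption: '$' does not occur in the text
        std::size_t const n = text.size();

        // Naive suffix array by sorting suffix start positions.
        std::vector<std::size_t> sa(n);
        for (std::size_t i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });

        // BWT: the character preceding each sorted suffix (cyclically).
        bwt.resize(n);
        for (std::size_t i = 0; i < n; ++i) bwt[i] = text[(sa[i] + n - 1) % n];

        // C table from character counts.
        std::array<std::size_t, 256> counts{};
        for (unsigned char ch : text) ++counts[ch];
        C[0] = 0;
        for (std::size_t c = 1; c < 256; ++c) C[c] = C[c - 1] + counts[c - 1];
    }

    // Occ(c, i): occurrences of c in bwt[0, i). Linear scan, for clarity only.
    std::size_t occ(char c, std::size_t i) const
    {
        return static_cast<std::size_t>(std::count(bwt.begin(), bwt.begin() + i, c));
    }

    // Backward search: process the pattern from its last character to its first.
    std::size_t count(std::string const & pattern) const
    {
        std::size_t lo = 0, hi = bwt.size();
        for (std::string::const_reverse_iterator it = pattern.rbegin();
             it != pattern.rend() && lo < hi; ++it)
        {
            unsigned char const c = static_cast<unsigned char>(*it);
            lo = C[c] + occ(*it, lo);
            hi = C[c] + occ(*it, hi);
        }
        return lo < hi ? hi - lo : 0;
    }
};
```

For the pattern "ssi" the intervals are [1, 5) after 'i', [8, 10) after "si", and [10, 12) after "ssi", which corresponds to the slide's rows 2..5 and 11..12 and gives two occurrences.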
![Page 63: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/63.jpg)
The FM-index in SeqAn
• The FM-index can be implemented using a number of string-based lookup tables
• ... as well as other indices, e.g. enhanced suffix array, q-gram index
• There is a space-time tradeoff between all these indices
• The FM index has the minimal memory requirements
![Page 64: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/64.jpg)
A generic FM-index
• SeqAn's FM-index consists of some nested classes storing Strings
[Figure: FM-index (host-only)]
![Page 65: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/65.jpg)
A generic FM-index
• The Device type of the FM index uses device_vector instead of String
• The view of this object (= device-part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
[Figure: GPU FM-index (host-part)]
![Page 66: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/66.jpg)
CPU vs. GPU
• Invoking an FM-index based search on CPU and GPU:
// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ======== On CPU ========
// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index,…);

template <typename TIndex>
void findCPU(TIndex & index,…);

// ========== On GPU ===========
// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex),…);

template <typename TIndex>
__global__ void findGPU(TIndex index,…);

The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function, which performs a backtracking algorithm on our generic index interface.
![Page 67: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/67.jpg)
Approximate search via backtracking
do {
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt)))
            delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt))
            delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));

    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
}
while (true);
![Page 68: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/68.jpg)
Outlook for GPU support
• Our next steps are:
• Provide parallelFor() to hide CUDA kernel call/OpenMP for-loop
• Develop classes for concurrent access (String, job queues)
• Port more indices and index iterators to be used with CUDA
• Port SeqAn's alignment module
• Develop a CPU/GPU version of the FM-index based read mapper Masai
• ...
• Follow our development:
• Sources: https://github.com/seqan/seqan/tree/develop
• Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
![Page 69: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/69.jpg)
Generic Parallelization
![Page 70: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/70.jpg)
Multicore parallelization
• We first introduced Tags to switch between serial and parallel algorithms:
struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

• Then we defined basic atomic operations required for thread safety:
template <typename T>
inline T atomicInc(T &x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T &x, Parallel)
{
    // Return the incremented value, as the serial variant does.
    return __sync_add_and_fetch(&x, 1);
}
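How a generic algorithm forwards such a tag can be shown in a self-contained form. The sketch below defines a minimal `Tag` template itself, relies on the GCC/Clang builtin `__sync_add_and_fetch` (an assumption about the compiler), and adds a hypothetical `countEqual` function that is not part of SeqAn.

```cpp
#include <cassert>
#include <vector>

// Minimal tag machinery, mirroring the slide.
template <typename T>
struct Tag {};

struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

// Serial variant: a plain increment is enough.
template <typename T>
inline T atomicInc(T & x, Serial)
{
    return ++x;
}

// Parallel variant: atomic increment (GCC/Clang builtin, assumed to be available).
template <typename T>
inline T atomicInc(volatile T & x, Parallel)
{
    return __sync_add_and_fetch(&x, 1);
}

// Hypothetical generic algorithm: the caller chooses the increment flavour via the tag.
template <typename TIter, typename TValue, typename TTag>
unsigned countEqual(TIter first, TIter last, TValue const & v, TTag tag)
{
    unsigned hits = 0;
    for (; first != last; ++first)
        if (*first == v)
            atomicInc(hits, tag);
    return hits;
}
```

The tag is an empty type, so passing it costs nothing at runtime; overload resolution picks the matching `atomicInc` at compile time. In this sketch the loop itself is sequential, so both tags yield the same count; the `Parallel` variant only matters once several threads share the counter.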
![Page 71: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/71.jpg)
Splitter
• To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …
Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
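One way to obtain exactly these boundaries is to give every part the same base length and hand the remainder to the first parts. The function below is a sketch of that scheme; `splitInterval` is a hypothetical name, and whether SeqAn's Splitter distributes the remainder the same way in every case is an assumption based only on the output shown above.

```cpp
#include <cassert>
#include <vector>

// Returns parts + 1 boundaries that split [begin, end) into `parts` subintervals
// whose lengths differ by at most one. Assumes parts > 0 and begin <= end.
inline std::vector<unsigned> splitInterval(unsigned begin, unsigned end, unsigned parts)
{
    unsigned const len = end - begin;
    unsigned const base = len / parts;  // minimum length of every part
    unsigned const rem = len % parts;   // the first `rem` parts get one extra element

    std::vector<unsigned> bounds;
    unsigned pos = begin;
    bounds.push_back(pos);
    for (unsigned i = 0; i < parts; ++i)
    {
        pos += base + (i < rem ? 1u : 0u);
        bounds.push_back(pos);
    }
    return bounds;
}
```

For `splitInterval(10, 20, 3)` the ten elements are split into parts of length 4, 3 and 3, giving the boundaries 10, 14, 17, 20 and therefore the intervals [10,14), [14,17), [17,20).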
![Page 72: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/72.jpg)
Splitter
• The Splitter can also be used with iterators directly
• The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals
• The parallel tag can be used to switch off the parallel behaviour
template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const &value, Tag<TParallelTag> parallelTag)
{
    // Split the iterator range into one subrange per job.
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}
![Page 73: Introduction to SeqAn, an Open-source C++ Template Library](https://reader034.fdocuments.in/reader034/viewer/2022051314/54bac6c34a795987508b4593/html5/thumbnails/73.jpg)
Thank you for your attention