SQL vs GPU. Richard Wilton, Johns Hopkins University, Department of Physics and Astronomy.
1
SQL vs GPU
Richard Wilton
Johns Hopkins University
Department of Physics and Astronomy
2
Rationale
• Terabyte-scale data is best managed in a DBMS
• Some classes of computation are best implemented using a GPU
3
How to do GPU computation in a terabyte-scale database
• Do the computation “outside” the database (export the data / compute / re-import the data)
• Do the computation “inside” the database
• Something in between
4
SQLCLR out-of-process server
[Diagram: SQL code calls a SQLCLR procedure or function in SQL Server, which communicates via IPC with special-case functionality in an out-of-process server]
Why?
• Avoid memory, threading, and permissions restrictions on SQLCLR implementations
• Load dynamic-link libraries
• Invoke native-code methods
• Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL computational resources
The basic concept
• Implement computational functionality in a separate process from the SQL Server process
• Access that functionality using interprocess communication (IPC)
5
SQLCLR out-of-process server implementation
[Diagram: SQL code calls a SQLCLR procedure in SQL Server, which communicates via IPC with CUDA functionality in an out-of-process server]
SQL
    declare @sql nvarchar(max)
    set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128'
    exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08'
SQLCLR implementation (in C#)
• Allocates IPC buffer
• Initializes IPC buffer with the result set from a specified SQL query
• Signals out-of-process server
• Waits for response from out-of-process server
• Returns data from the IPC buffer as a SQL result set
Out-of-process server implementation (in C#)
• Loads the CUDA implementation (dynamic link library)
• Invokes a method in the DLL with a reference to the IPC buffer
• Waits for completion
• Signals the SQLCLR implementation
• Stand-alone application or Windows service
CUDA implementation (in C++)
• Implements GPU functionality
• Uses the IPC buffer for both incoming and outgoing data
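The signal-and-wait handshake described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: two Python threads stand in for the SQL Server process and the out-of-process server, a `bytearray` stands in for the IPC buffer, and `gpu_kernel_stub` is a hypothetical placeholder for the call into the CUDA DLL. None of these names come from the original implementation, which is in C# and C++.

```python
import threading

# Assumption: a bytearray shared by two threads stands in for the IPC buffer
# shared by two processes. Byte 0 holds the payload length (so at most 63 bytes).
ipc_buffer = bytearray(64)
request_ready = threading.Event()   # set by the caller (the "SQLCLR" side)
response_ready = threading.Event()  # set by the server side


def gpu_kernel_stub(payload: bytearray) -> None:
    """Hypothetical placeholder for the CUDA DLL call: reverses the payload in place."""
    payload.reverse()


def server_loop() -> None:
    """Server side: wait for the signal, compute on the buffer, signal back."""
    request_ready.wait()
    length = ipc_buffer[0]
    payload = bytearray(ipc_buffer[1:1 + length])
    gpu_kernel_stub(payload)
    ipc_buffer[1:1 + length] = payload   # results overwrite the input data
    response_ready.set()


def sqlclr_call(query_result: bytes) -> bytes:
    """Caller side: fill the buffer, signal the server, wait, read the result."""
    ipc_buffer[0] = len(query_result)
    ipc_buffer[1:1 + len(query_result)] = query_result
    request_ready.set()
    response_ready.wait()
    return bytes(ipc_buffer[1:1 + len(query_result)])


server = threading.Thread(target=server_loop)
server.start()
result = sqlclr_call(b"ACGT")
server.join()
print(result)
```

As on the slide, the same buffer carries both the incoming and the outgoing data, so the caller only reads it after the server has signalled completion.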
6
A test case: genomic sequence data
• Nucleic acid sequence data
• Data is processed in a “loose” workflow by a sequence of software tools
• Example:
  – Align short sequences to a long reference sequence
  – Identify alignment regions of interest
  – Map regions of interest to annotation data (genome map, known variations, etc.)
7
A test case: genomic sequence data
• Typical data-management scenario:
  – All data is maintained in file-system files
  – Processing software reads and writes files
  – “Database-like” operations (e.g. joins) are performed using command-line tools (e.g. Perl scripts)
• Can we use a DBMS as a repository?
  – All data is maintained in the DBMS
  – Processing software reads and writes tables
  – “Database-like” operations (e.g. joins) are performed using SQL
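As a rough illustration of a “database-like” operation expressed in SQL, the sketch below joins alignment regions of interest to annotation data by coordinate overlap. The table names, column names, and sample rows are all hypothetical, and an in-memory SQLite database stands in for SQL Server so that the example runs on its own.

```python
import sqlite3

# Assumption: hypothetical schema and sample rows, SQLite instead of SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, chr_num INTEGER,
                         start_pos INTEGER, end_pos INTEGER);
    CREATE TABLE annotation (gene TEXT, chr_num INTEGER,
                             start_pos INTEGER, end_pos INTEGER);
    INSERT INTO region VALUES (1, 7, 100, 200), (2, 7, 5000, 5100), (3, 8, 100, 200);
    INSERT INTO annotation VALUES ('geneA', 7, 150, 400), ('geneB', 8, 900, 1200);
""")

# Two intervals on the same chromosome overlap when each one starts
# no later than the other one ends.
rows = conn.execute("""
    SELECT r.region_id, a.gene
    FROM region AS r
    JOIN annotation AS a
      ON a.chr_num = r.chr_num
     AND a.start_pos <= r.end_pos
     AND a.end_pos >= r.start_pos
    ORDER BY r.region_id
""").fetchall()
print(rows)
```

In the file-based scenario the same overlap test would typically be written as a script looping over two files; here it is a single declarative join.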
8
A test case: genomic sequence data
• Nucleic acid sequence data
• DBMS: data repository
• GPU: implementation of sequence alignment algorithms
  – Gapped alignment: Smith-Waterman
  – Non-gapped alignment: bit vector mapping through a hash table
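For readers unfamiliar with Smith-Waterman, here is a minimal CPU sketch of the scoring recurrence for gapped local alignment. It is not the GPU implementation from the talk: the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, and the function returns only the best score, without a traceback.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score using a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("GGACGTCC", "TTACGTAA")
print(score)
```

Each cell depends only on its left, upper, and upper-left neighbours, and each pair of sequences is scored independently of every other pair, which is one reason many alignments can run side by side on a GPU.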
9
A test case: genomic sequence data
• Results: so far, so good …
  – Sequence data can be managed using a DBMS
    • Use low-level (C#, C++) implementations to manipulate string and binary data
  – Sequence data from the DBMS can be processed using GPU implementations
    • GPU sequence alignment implementations can exploit task parallelism and execute 10x (or more) faster than equivalent CPU implementations
• But …
  – DBMS data abstractions (tables, columns) must be mapped to and from native GPU data formats and layouts
  – Data movement out of and into the DBMS is slow
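To make the mapping problem concrete, the sketch below converts table-like rows into the kind of flat, contiguous layout a GPU kernel could index directly, and back again. The layout (an array of 32-bit ids, an array of 32-bit offsets, and one concatenated byte buffer) is an assumption for illustration, not the format used in the talk, and the function names are hypothetical.

```python
import struct


def pack_sequences(rows):
    """Pack (id, sequence) rows into flat arrays: ids, offsets, sequence bytes."""
    ids, offsets, chunks = [], [0], []
    for seq_id, seq in rows:
        encoded = seq.encode("ascii")
        ids.append(seq_id)
        chunks.append(encoded)
        offsets.append(offsets[-1] + len(encoded))  # end of this sequence
    data = b"".join(chunks)
    # Little-endian 32-bit integers; sequence k occupies data[offsets[k]:offsets[k+1]].
    id_array = struct.pack(f"<{len(ids)}i", *ids)
    offset_array = struct.pack(f"<{len(offsets)}i", *offsets)
    return id_array, offset_array, data


def unpack_sequences(id_array, offset_array, data):
    """Inverse mapping: rebuild (id, sequence) rows from the flat arrays."""
    ids = struct.unpack(f"<{len(id_array) // 4}i", id_array)
    offsets = struct.unpack(f"<{len(offset_array) // 4}i", offset_array)
    return [(ids[k], data[offsets[k]:offsets[k + 1]].decode("ascii"))
            for k in range(len(ids))]


rows = [(101, "ACGT"), (102, "GGA"), (103, "T")]
packed = pack_sequences(rows)
restored = unpack_sequences(*packed)
print(restored == rows)
```

Both directions touch every row, which gives a feel for why moving data out of and back into the DBMS adds cost on top of the computation itself.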
10
Going forward…
• Compromise: maintain GPU-formatted data as BLOBs in the file system
  – BLOBs are in database “native” format
  – BLOBs correspond to database tables or table partitions
• Much programming and testing remain to be done…