SQLite – Running entirely on the GPU
Sky Morey
Chief Architect @DEG
degdigital.com
Library: GpuEx
The team was working on an n-body
simulator with a requirement to dynamically
add, remove, backup, restore, and query the
elements in the simulator. A SQL solution
was the natural fit.
How did we get here?
GTC :: April 2016
Part 1
– The Application
– Strategies
– Testing the Stack
Goals
– Try the NuGet packages
– Think about ways to use them in your own projects
Part 2
– Development Process
– Runtime Layer
– System Layer
– Data Layer
Agenda
Part 1
The Application
see it in action
Start with why
Working on an n-body
simulation
Had an engine running
with some nodes
Next needed to
– Insert, update and
delete nodes
– Backup and restore the
nodes
– Query the current state
Good fit for SQL
SQLite used as an
embedded database
SQLite has a small
code base
SQLite is in the
public domain
SQLite was best match
Block Diagram
[Figure: block diagram of the full stack, with CPU and GPU columns: dSql, JimTcl, TinyTcl; Runtime Host; Runtime, Runtime-JimTcl, Runtime-TinyTcl, Sentinel; System, SysEx, Win, Unix, Map, Gpu; System.Data, DataEx, Pager, VDBE, B-Tree; …]
NuGet Simplicity
GpuEx-TinyTcl
GpuEx-JimTcl
GpuEx-dSql
GpuEx-Runtime
GpuEx-Runtime.TinyTcl
GpuEx-Runtime.JimTcl
GpuEx-System
GpuEx-System.Data
Compute     Microarch  Library            x86  x64  Exceptions
None        CPU        {library}_cpu.lib  Yes  Yes
11, 12, 13  Tesla      {library}_11.lib   Yes  -    Runtime only
20, 21      Fermi      {library}_20.lib   Yes  Yes
30, 32      Kepler     {library}_30.lib   Yes  Yes
35, 37                 {library}_35.lib   Yes  Yes
50          Maxwell    {library}_50.lib   Yes  Yes
52, 53                 {library}_52.lib   Yes  Yes
60          Pascal     {library}_60.lib   Yes  Yes
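The table's naming scheme can be sketched as a small lookup. This is only an illustration: `libraryFileName` is a hypothetical helper, not part of GpuEx, and the rule "use the highest packaged build that does not exceed the device's compute capability" is an assumption read off the table rows. The library name passed in stands for the `{library}` placeholder.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a GpuEx API): returns the library file name for a
// compute capability written as major*10+minor (37 means 3.7).
// A value of 0 selects the CPU build.
std::string libraryFileName(const std::string& library, int compute) {
    if (compute == 0) return library + "_cpu.lib";
    // Lowest compute capability covered by each packaged build, ascending.
    static const int builds[] = {11, 20, 30, 35, 50, 52, 60};
    int chosen = -1;
    for (int b : builds) {
        if (b <= compute) chosen = b;  // keep the highest build not above the device
    }
    assert(chosen != -1 && "no packaged build for this compute capability");
    return library + "_" + std::to_string(chosen) + ".lib";
}
```

For example, `libraryFileName("Runtime", 37)` yields `Runtime_35.lib`, matching the "35, 37" row.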
Video :: NuGet Runtime
Video :: NuGet TinyTcl
Video :: NuGet DataEx
Video :: NuGet dSql
dSql Examples
Memory database
x86 or x64
DDL, DML
– Create table
– Insert
– Select
– Delete
– Drop table
File database
x64 only
DML
– Limit
– Join
– Aggregate
Video :: Memory Database
Video :: File Based Database
Strategies
the single-threaded limitation and using SQLite with CUDA
Intermixing – single/multi kernel calls
Execution plan – plan single, exec multi
Ganging – warp execution as single thread
Strategies
Strategy :: Intermixing
Intermix data probes
with application kernels
Kernel<<<N, 32>>>()
DataProbe<<<1, 1>>>()
Kernel<<<N, 32>>>()
Kernel<<<N, 32>>>()
Intermixing GPU-threaded application kernel calls with single-threaded SQLite kernel calls.
Strategy :: Execution Plan
Build plan single-threaded
Execute plan multi-threaded
plan = BuildPlan<<<1, 1>>>()
ExecutePlan<<<N, 32>>>(plan)
ExecutePlan<<<N, 32>>>(plan)
SQLite generates the execution plan single-threaded, while the plan itself executes GPU-threaded.
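A minimal host-side C++ sketch of the plan-once, execute-many idea follows. It is an emulation under stated assumptions, not the GpuEx implementation: `Plan`, `buildPlan`, `executePlanThread`, and `executePlan` are hypothetical names, the "plan" is just a range filter, and a plain loop over thread indices stands in for the `<<<N, 32>>>` launch that a GPU would run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plan: count rows whose value lies in [lo, hi].
struct Plan { int lo; int hi; };

// Runs once, standing in for plan = BuildPlan<<<1, 1>>>().
Plan buildPlan(int lo, int hi) { return Plan{lo, hi}; }

// Body of one emulated GPU thread: scans a strided share of the rows.
std::size_t executePlanThread(const Plan& plan, const std::vector<int>& rows,
                              std::size_t tid, std::size_t threads) {
    std::size_t matches = 0;
    for (std::size_t i = tid; i < rows.size(); i += threads)
        if (rows[i] >= plan.lo && rows[i] <= plan.hi) ++matches;
    return matches;
}

// Stands in for ExecutePlan<<<N, 32>>>(plan): the loop visits each thread
// index in turn, where a GPU would run them concurrently.
std::size_t executePlan(const Plan& plan, const std::vector<int>& rows,
                        std::size_t threads) {
    assert(threads > 0);
    std::size_t total = 0;
    for (std::size_t tid = 0; tid < threads; ++tid)
        total += executePlanThread(plan, rows, tid, threads);
    return total;
}
```

The plan is read-only during execution, which is what lets every thread share it without synchronization; the result does not depend on how many threads split the rows.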
Strategy :: Ganging
Singular warp default
Primary warp method
Per warp pattern
[Figure: gang threads T1 to T7 stepping through single instructions together; labels: “return single malloc”, “m”, “c”]
“Ganging” accelerates single-threaded kernels. A gang executes as a 32-thread warp: one primary thread
plus 31 supporting threads that speed up localized search or computation.
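The localized-search case can be sketched on the host as follows. This is an emulation, not GpuEx code: `kGangSize`, `laneSearch`, and `gangSearch` are hypothetical names, and the loop over lanes runs them one after another, whereas a real gang would run its 32 lanes in lockstep and combine results with warp-level primitives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGangSize = 32;  // one primary lane plus 31 supporting lanes

// Work done by one lane: check every 32nd element starting at `lane`.
// Returns the first matching index this lane sees, or values.size() if none.
std::size_t laneSearch(const std::vector<int>& values, int key, std::size_t lane) {
    for (std::size_t i = lane; i < values.size(); i += kGangSize)
        if (values[i] == key) return i;
    return values.size();
}

// The primary lane gathers the per-lane results and keeps the lowest index.
// Lanes are emulated sequentially here; on a GPU they would run together.
std::size_t gangSearch(const std::vector<int>& values, int key) {
    std::size_t best = values.size();
    for (std::size_t lane = 0; lane < kGangSize; ++lane) {
        std::size_t hit = laneSearch(values, key, lane);
        if (hit < best) best = hit;
    }
    return best;
}
```

From the caller's point of view `gangSearch` still behaves like a single-threaded linear search (lowest matching index, or the size when nothing matches), which is the property that lets single-threaded code adopt it without other changes.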
Testing the Stack
how do we know it works?
In-place and xUnit tests for unit testing
TCL scripts for unit and integration testing
Other patterns for automated testing
Testing Codebase
Try the NuGet packages
Think about ways to use them in your own
projects
End of Part 1
Part 2
The Development Process
how did this get built?
SQLite is a single-threaded application, so the port is
implemented as such
Converted from C to C++ and CUDA, and
segmented into three additive parts
– Runtime – represents lower level operations with
stdio/stdlib functions
– System – OS layer abstraction and sentinel
message bus
– System.Data – SQLite core data engine
Porting
Packaging for multiple architectures and compilation issues
– Project file changes and build scripts for multiple build targets and NuGet packaging, with variations of release\debug, win32\x64, and cpu\gpu20-35
– Multiple libraries, and hard-coded lib building for CUDA
– Multiple cu files combined to hide nvcc context switching costs, for a faster build
Build and Package
Video :: Packaging
Video :: Nvcc Context Switching Costs
Video :: Ellipse
Native file system access with Sentinel
– Sentinel is a host to device message bus that
solves GPU access to host resources
Future: GPU only device file system
– Host file system calls through an IPC to GPU files
– dcat, dcmp, dcp, dgrep, dls, dmkdir, dmore, dmv,
drm, drmdir
– dchgrp, dchmod, dchown
File System
Sentinel
[Figure: Sentinel message flow between GPU, MEMORY, and CPU for fopen, fprintf, and fclose]
FILE *f = _fopen("fopen.txt", "w");
_fprintfR(f, "The quick brown fox jumps over the lazy dog");
_fcloseR(f);
Messages:
fopen    fopen.txt, w             0  0x1234
fprintf  0x1234, The quick brow…  0  43
fclose   0x1234                   0  0
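The request and reply pattern in the Sentinel slide can be sketched in host-only C++. Everything here is an assumption made for illustration: the slides do not show the real message layout, so `Op`, `Message`, `Host`, and `Sentinel::send` are hypothetical, the bus is synchronous, and in-memory strings stand in for real files so the sketch has no side effects.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical message layout; the real Sentinel wire format is not shown.
enum class Op { Open, Print, Close };
struct Message {
    Op op;
    long handle;       // file handle (ignored for Open)
    std::string text;  // file name for Open, payload for Print
    long result;       // filled in by the host
};

// Host side: owns the resources that device code cannot reach directly.
class Host {
public:
    void serve(Message& m) {
        switch (m.op) {
        case Op::Open:
            m.result = next_;
            files_[next_++] = "";
            break;
        case Op::Print: {
            auto it = files_.find(m.handle);
            if (it == files_.end()) { m.result = -1; break; }
            it->second += m.text;
            m.result = static_cast<long>(m.text.size());  // characters written
            break;
        }
        case Op::Close:
            m.result = files_.count(m.handle) ? 0 : -1;  // contents kept for inspection
            break;
        }
    }
    const std::string& contents(long handle) const { return files_.at(handle); }
private:
    long next_ = 0x1234;  // first handle, chosen to match the slide
    std::map<long, std::string> files_;
};

// Bus: the caller posts a message, the host serves it, the result comes back.
class Sentinel {
public:
    explicit Sentinel(Host& host) : host_(host) {}
    long send(Op op, long handle, const std::string& text) {
        pending_.push_back(Message{op, handle, text, 0});
        return pump();
    }
private:
    // Host-side step: serve the oldest pending message and return its result.
    long pump() {
        Message m = pending_.front();
        pending_.pop_front();
        host_.serve(m);
        return m.result;
    }
    Host& host_;
    std::deque<Message> pending_;
};
```

Sending Open, Print, and Close in sequence reproduces the three results on the slide (0x1234, 43, 0). A real device-to-host bus would additionally need the device thread to wait on a shared-memory flag while the host polls, which this sketch leaves out.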
NuGet Packaging
Lemon parser generator, converted to a .NET tool with CUDA output
TCL ported to CUDA to run unit tests on hardware
Tools ported
The Runtime Layer
Layers
Block Diagram
[Figure: block diagram reduced to the Runtime layer, with CPU and GPU columns: Runtime Host, Runtime, Sentinel]
Separate heap for shuffling _printf, _throw, and _assert data to the host
C runtime replacement in GPU, like stdio/stdlib
– Memory management, alloc, realloc, free and debugging
– atof, atoi64, atoi, itoa methods
– toupper, isupper, isspace, isalnum, isalpha, isdigit, isxdigit, isidchar, tolower methods
– strcpy, strncpy, strcat, strchr, strstr, strcmp, strncmp, memcpy, memstr, memchr, memcmp, memmove, strlen, hextobyte methods
– snprintf methods
Methods prefixed with “_” to avoid naming collisions
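A few of the listed replacements can be sketched in plain C++ to show the flavor of the layer. These are illustrative bodies, not the GpuEx sources: the exact signatures are assumptions, and the functions sit in a hypothetical `runtime_sketch` namespace because underscore-prefixed names are reserved at global scope in standard C++. In a CUDA build such functions would carry the `__device__` qualifier.

```cpp
#include <cassert>
#include <cstddef>

namespace runtime_sketch {

// Length of a NUL-terminated string, without relying on <cstring>.
inline std::size_t _strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// ASCII whitespace test, without relying on <cctype>.
inline bool _isspace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v';
}

inline bool _isdigit(char c) { return c >= '0' && c <= '9'; }

// Decimal string to int: skips leading whitespace, accepts one optional sign,
// and stops at the first non-digit. Overflow is not handled in this sketch.
inline int _atoi(const char* s) {
    while (_isspace(*s)) ++s;
    bool negative = false;
    if (*s == '+' || *s == '-') negative = (*s++ == '-');
    int value = 0;
    while (_isdigit(*s)) value = value * 10 + (*s++ - '0');
    return negative ? -value : value;
}

}  // namespace runtime_sketch
```

None of these functions call into the host C library, which is the point of the layer: the same source can then be compiled for the device.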
Runtime
The System Layer
Layers
Block Diagram
[Figure: block diagram reduced to the Runtime and System layers, with CPU and GPU columns: Runtime Host, Runtime, Sentinel, System, SysEx, Win, Unix, Map, Gpu]
Host-only implementation of the Unix and
Windows systems
Device-only version of a “map system” which
sits in the GPU and shuffles messages back
to the host version using Sentinel
System
The Data Layer
Layers
Block Diagram
[Figure: block diagram with the System.Data layer added, with CPU and GPU columns: Runtime Host, Runtime, Sentinel, System, SysEx, Win, Unix, Map, Gpu, System.Data, DataEx, Pager, VDBE, B-Tree]
B-Tree
Pager
VDBE
Stack had to be big; it would
crash if not > 5 MB
System.Data
Conclusion :: NVidia Ask
Thank you for your time
Ask: Add a native SATA / NVMe interface to the GeForce line.