Maciej Maciejewski, Krzysztof Czuryło
LinuxCon, Berlin ’16
Salvador Dalí, The Persistence of Memory
Establishing the Open Industry NVM Programming Model
36+ Member Companies
http://snia.org/sites/default/files/NVMProgrammingModel_v1.pdf
SNIA Technical Working Group
Defined 4 programming modes required by developers
Spec 1.0 developed, approved by SNIA voting members and published
Interfaces for PM-aware file system accessing kernel PM support
Interfaces for application accessing a PM-aware file system
Kernel support for block NVM extensions
Interfaces for legacy applications to access block NVM extensions
Persistent memory programming model
[Diagram. User space: Management UI, Management Library, Applications. Access paths: Standard Raw Device Access; Standard File API (File System); Standard File API (pmem-Aware File System); Load/Store (MMU Mappings). Kernel space: NVDIMM Driver, File System, pmem-Aware File System. Hardware: NVDIMM.]
OPERATING SYSTEM
ACPI NFIT
E820
What can we do with it?
Memory
Memory - libvmmalloc
[Diagram. User space: Application, malloc, vmmalloc (constructor: vmem_pool_create; vmem_pool_malloc), libvmem, jemalloc. Kernel space: mmap of a temporary file. Memory: PM, DRAM.]
Memory - libvmem
[Diagram. User space: Application; malloc through libc (or other), backed by DRAM via mmap/sbrk; vmem_create and vmem_malloc through libvmem (vmem_pool_create, vmem_pool_malloc) on jemalloc (3.6.0). Kernel space: fallocate/mmap of a temporary file. Memory: Persistent Memory, DRAM.]
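The libvmem diagram treats a temporary file on persistent memory as a pool of volatile memory. The following is a minimal sketch of that idea using only POSIX calls, not libvmem's actual implementation. The names `tmp_pool_create`, `tmp_pool_alloc` and `tmp_pool_delete` are hypothetical, and the bump allocator is a deliberately simple stand-in for jemalloc.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pool descriptor: one mapping plus a bump pointer. */
struct tmp_pool {
    char *base;
    size_t size;
    size_t used;
};

/* Create an unlinked temporary file in `dir`, reserve `size` bytes and map it. */
int tmp_pool_create(struct tmp_pool *p, const char *dir, size_t size)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/pool.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file vanishes once the mapping is released */

    if (posix_fallocate(fd, 0, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    p->base = base;
    p->size = size;
    p->used = 0;
    return 0;
}

/* Hand out 16-byte aligned chunks; there is no free() in this sketch. */
void *tmp_pool_alloc(struct tmp_pool *p, size_t n)
{
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned < n || aligned > p->size - p->used)
        return NULL;
    void *ptr = p->base + p->used;
    p->used += aligned;
    return ptr;
}

/* Unmap the pool; the unlinked file is reclaimed by the file system. */
void tmp_pool_delete(struct tmp_pool *p)
{
    munmap(p->base, p->size);
}
```

Pointing `dir` at a directory on a pmem-aware file system would place the pool on persistent memory, while the contents remain volatile because the file is unlinked.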
Memory - https://github.com/memkind/memkind
Block storage
Byte persistency
Data persistency
Atomicity
Flushing to Persistence
    open(…);
    mmap(…);
    strcpy(pmem, "Hello");
    msync(pmem, 6, MS_SYNC);    // standard way
    pmem_persist(pmem, 6);      // libpmem way
Crossing the 8-Byte Store
    strcpy(pmem, "Hello, World!");
    pmem_persist(pmem, 14);
Result after a crash?
1. "\0\0\0\0\0\0\0\0\0\0..."
2. "Hello, W\0\0\0\0\0\0..."
3. "\0\0\0\0\0\0\0\0orld!\0"
4. "Hello, \0\0\0\0\0\0\0\0"
5. "Hello, World!\0"
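The outcomes listed above can be explored with a small model. This is a sketch under a simplifying assumption: each aligned 8-byte chunk reaches persistence atomically and independently of the others. The function name `crash_image` and the bitmask encoding are hypothetical. Under this model, outcomes 1, 2, 3 and 5 appear for masks 0, 1, 2 and 3; outcome 4 tears inside a chunk and would need a finer-grained model.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8 /* assumed size of an atomic store, in bytes */

/*
 * Build the image found after a crash if only the 8-byte chunks selected
 * by `mask` (bit i selects chunk i) of the new data reached persistence.
 * `before` is the old content, `after` the fully written content.
 */
void crash_image(const char *before, const char *after, size_t len,
                 unsigned mask, char *out)
{
    memcpy(out, before, len);
    for (size_t off = 0, i = 0; off < len; off += CHUNK, i++) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        if (mask & (1u << i))
            memcpy(out + off, after + off, n);
    }
}
```

Calling it with a zero-filled 16-byte `before` buffer and `after` set to "Hello, World!" enumerates the four chunk-level states for masks 0 through 3.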
Location
files/pools
mmap(2)
allocation mechanism
bookkeeping
replication & recovery
[Diagram. User space: Application using NVML. Access paths: Standard File API to a file on a PM-aware File System; Load/Store via MMU Mappings. Kernel space: PM-aware File System, NVDIMM Driver. Hardware: Persistent Memory.]
NVML Persistent Libraries
libpmem - Basic persistency handling
libpmemblk - Block access to persistent memory
libpmemlog - Log file on persistent memory (append-mostly)
libpmemobj - Transactional Object Store on persistent memory
libpmempool - Pool management utilities
librpmem - Replication
https://github.com/pmem/nvml/tree/master/src/examples
http://pmem.io/blog/
Applications
Modifying application allocations
Which objects to store in PM?
How to decide whether it is better to allocate/store in HBM, DRAM, PM, SSD or HDD?
Do all of them need to be persistent?
When to guarantee persistence?
Modifying application engine
    for (int i = 0; i < NUMBER_OF_ITERATIONS; i++) {
        result = calculateThis(i, result);
    }

    i_pm = 0; // at first run of the app only
    ...
    ...
    for (; i_pm < NUMBER_OF_ITERATIONS; i_pm++) {
        TX_BEGIN(pool) {
            result_pm = calculateThis(i_pm, result_pm);
        } TX_END
    }
    i_pm = 0;
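The persistent variant of the loop keeps its counter and running result in persistent memory, so a restarted process continues where it stopped. The same control flow can be sketched with a memory-mapped file and `msync` instead of libpmemobj. This is only an approximation: `run_resumable`, `struct loop_state`, `calculate_this` and the `budget` parameter (used to simulate an interruption) are hypothetical, and unlike `TX_BEGIN`/`TX_END` the sketch does not make the counter and result update atomic with respect to a crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical persistent state: loop counter and running result. */
struct loop_state {
    long i;
    long result;
};

/* Stand-in for calculateThis() from the slide. */
static long calculate_this(long i, long result)
{
    return result + i;
}

/*
 * Run at most `budget` iterations of a loop of `total` iterations, keeping
 * the state in the file at `path`. Returns 0 when the loop has finished
 * (result stored in *out, state reset), 1 if iterations remain, -1 on error.
 */
int run_resumable(const char *path, long total, long budget, long *out)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* A newly created file is zero-filled, so i and result start at 0. */
    if (ftruncate(fd, sizeof(struct loop_state)) != 0) {
        close(fd);
        return -1;
    }
    struct loop_state *st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    close(fd);
    if (st == MAP_FAILED)
        return -1;

    int rc = 0;
    while (st->i < total) {
        if (budget-- <= 0) {
            rc = 1; /* simulated interruption; state stays in the file */
            break;
        }
        st->result = calculate_this(st->i, st->result);
        st->i++;
        /* Flush after every step; not atomic, unlike a transaction. */
        if (msync(st, sizeof(*st), MS_SYNC) != 0) {
            rc = -1;
            break;
        }
    }
    if (rc == 0) {
        *out = st->result;
        st->i = 0;      /* reset so the next run starts from scratch */
        st->result = 0;
        if (msync(st, sizeof(*st), MS_SYNC) != 0)
            rc = -1;
    }
    munmap(st, sizeof(*st));
    return rc;
}
```

Calling `run_resumable` a second time with the same path picks up at the stored counter, which is the behaviour the persistent loop on the slide aims for.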
Redis key/value store
*All following results come from an old developer machine with Persistent Memory emulated on DDR3.
[Chart: Startup time. x-axis: object size (32 to 8192); y-axis: seconds (0 to 30); series: RDB, AOF, PM.]
[Chart: DRAM usage. x-axis: objects size (32 to 8192); y-axis: DRAM allocations [MB] (0 to 8000); series: AOF, RDB, PM.]
[Chart, built up over four slides: Running out of DRAM. x-axis: % of OS memory (36 to 141); y-axis: operations / sec (0 to 120000); series: No Persist, RDB, AOF, PM.]
[Chart, built up over three slides: Redis, Transactional API. x-axis ticks: 32 to 8192; y-axis ticks: 0 to 100000; series: pmem, AOF, PM 4x, AOF 4x, PM 10x, AOF 10x.]
[Chart: Total Data throughput. x-axis: Object size [B] (0 to 9000); y-axis: Bytes/second in billions (0 to 2); series: pmem, AOF, pmem 10x, AOF 10x.]
PETSc
Portable, Extensible Toolkit for Scientific Computation
A suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations.
It employs the Message Passing Interface (MPI) standard for all message-passing communication.
PETSc is the world’s most widely used parallel numerical software library for partial differential equations and sparse matrix computations with over 760 publications.
PETSc includes a large suite of parallel linear and nonlinear equation solvers that are easily used in application codes written in C, C++, Fortran and now Python.
PETSc provides mechanisms needed within parallel application code that allow the overlap of communication and computation.
https://www.mcs.anl.gov/petsc/
Sparse Matrix multiplication

| | Original | Persistent Memory |
|---|---|---|
| Time (sec) | 2.070e+03 | 2.109e+03 |
| Memory | 1.361e+10 | 1.048e+06 |

| | Original: Count | Original: Time (sec) | Persistent Memory: Count | Persistent Memory: Time (sec) |
|---|---|---|---|---|
| MatAssemblyBegin | 4 | 1.4782e-05 | 2 | 7.1526e-06 |
| MatAssemblyEnd | 4 | 2.8858e+00 | 2 | 2.6408e+00 |
| MatLoad | 2 | 1.6146e+01 | 2 | 3.6855e-03 |
| MatMatMultSym | 1 | 9.0468e+02 | 1 | 9.1570e+02 |
| MatMatMultNum | 1 | 1.1468e+03 | 1 | 1.1929e+03 |

Compute 2% slower
Preparation 680% faster
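Headline figures like these are ratios of the logged times. The helpers below are a small sketch of that arithmetic. The function names are hypothetical, and the slide does not state which rows feed the two percentages, so any grouping of rows is an assumption.

```c
/* How many times faster the persistent-memory run is for one phase. */
double speedup(double original_sec, double pm_sec)
{
    return original_sec / pm_sec;
}

/* Slowdown of the persistent-memory run relative to the original, in percent. */
double slowdown_percent(double original_sec, double pm_sec)
{
    return (pm_sec / original_sec - 1.0) * 100.0;
}
```

With the total run times from the slide, `slowdown_percent(2.070e+03, 2.109e+03)` evaluates to about 1.9, in line with the "2% slower" line.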
Sparse matrix solver

| Time [sec] | Original | Matrices in PM | All in PM |
|---|---|---|---|
| MatAssemblyBegin | 2.6226e-06 | 1.6443e-02 | 1.6618e-02 |
| MatAssemblyEnd | 1.1275e-01 | | |
| MatLoad | 1.7201e+00 | | |
| MatMult | 9.3791e-02 | 1.1111e-01 | 1.1177e-01 |
| VecSet | 5.0273e-03 | 4.6258e-03 | 9.6970e-03 |
| MatMult | 2.6623e+01 | 2.8851e+01 | 2.9268e+01 |
| MatView | 9.5606e-05 | 9.3222e-05 | 1.0443e-04 |
| VecMDot | 7.0743e+00 | 7.0108e+00 | 8.3750e+00 |
| VecNorm | 6.4105e-01 | 6.4236e-01 | 6.4869e-01 |
| VecScale | 7.0098e-01 | 7.0180e-01 | 7.0580e-01 |
| VecCopy | 1.5669e-02 | 1.5939e-02 | 1.5724e-02 |
| VecSet | 4.0993e-02 | 4.0953e-02 | 8.0090e-02 |
| VecAXPY | 6.5127e-02 | 6.6045e-02 | 6.8618e-02 |
| VecMAXPY | 1.0328e+01 | 1.0488e+01 | 8.7134e+00 |
| VecPointwiseMult | 1.1661e+00 | 1.2202e+00 | 1.3907e+00 |
| VecNormalize | 1.3422e+00 | 1.3443e+00 | 1.3546e+00 |
| KSPGMRESOrthog | 1.6762e+01 | 1.6844e+01 | 1.6556e+01 |
| KSPSetUp | 4.5981e-03 | 4.6084e-03 | 2.6381e+00 |
| KSPSolve | 4.6762e+01 | 4.9142e+01 | 6.7208e+01 |
| PCSetUp | 2.6226e-06 | 2.3842e-06 | 2.8610e-06 |
| PCApply | 1.2530e+00 | 1.3039e+00 | 1.4794e+00 |

Stage 1: File loading
Stage 2: Vector duplication and multiplication
Stage 3: Solver stage

GMRES Sparse Matrix solver

| | Original | Matrix data in PM | All in PM |
|---|---|---|---|
| Time [s] | 48.67 | 49.33 | 74.89 |
| Memory [MB] | 557 | 260 | 7 |
No universal recipe
Different data usage scenarios
A lot of architectural work
Even more coding
Credit: Uwe Kils http://www.ecoscope.com/iceberg/
SNIA Programming models
NVML
HW
RDMA
RAS
Language bindings
Replication
OS addressing space limit
OS boot up / hibernation
POSIX
Wear leveling
JVM memory management
Virtualization
Space management
TLB