Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7...
Transcript of Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7...
![Page 1: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/1.jpg)
Ben Walker
Darek Stojaczyk
![Page 2: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/2.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit
Agenda
• Direct Memory Access
• Hugepage Management
• Threading Model
2
![Page 3: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/3.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit
Direct memory Access
3
![Page 4: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/4.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 4
DIRECT memory ACCESS (DMA)
NVMe Memory MMU CPUpaddr vaddr
Memory MMU CPUvaddr
NVMeiova
IOMMU
![Page 5: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/5.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 5
HOW TO DO DMA in SPDKspdk_malloc()
• Backed by hugepages
• Pinned
• Shared between processes in a multi-process group
![Page 6: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/6.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit
HUGEPAGE MANAGEMENT
6
![Page 7: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/7.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 7
Pre-reserve big chunk of hugepages
Dynamic allocation within pre-reserved region
Legacy Memory ManagementSPDK application
Challenge:
• Need to calculate how much memory the program needs up front
HP HP HP
alloc allocalloc
HP HP
![Page 8: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/8.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 8
Implemented in DPDK 18.05, hardened in DPDK 18.08
Directly in response to SPDK (and other feedback)
Dynamically allocates hugepages as needed
NEW DYNAMIC MEMORY MANAGEMENT
HP HP HP
SPDK application
alloc allocalloc
Available since SPDK 18.10 and DPDK 18.05.1
![Page 9: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/9.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 9
register your own memoryspdk_mem_register(void *vaddr, size_t len)
spdk_mem_unregister(void *vaddr, size_t len)
Restrictions:
• If no IOMMU, must use hugepages
• The memory address and length need to be a multiple of 2MB
![Page 10: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/10.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 10
Memory maps• Translation tables
• virtual address uint64_t
• There are multiple registered maps
• Maps are notified when users register memory
map = spdk_mem_map_alloc(notify_cb, …)
spdk_mem_map_set_translation(map, vaddr, size, translation)
spdk_mem_map_translate(map, vaddr, *size)
![Page 11: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/11.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 11
Memory mapsName Key Value User
PCIe Virtual address Physical/busaddress
NVMe, I/OAT
RDMA Virtual address ibv_mr * NVMe-oF initiator and target
Vhost-user Virtual address Virtual address Vhost-user PMD
![Page 12: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/12.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit
Dynamic Threading
12
![Page 13: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/13.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 13
Reminder: Performance via ConcurrencyModern CPUs provide many cores
Modern I/O devices provide many independent queues
Goal: Architect software to match the hardware
…Core 0 Core NCore 1
…I/O Device I/O Device I/O Device
![Page 14: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/14.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 14
SPDK 18.07 and EarlierSystem thread running a loop
• Pinned to a specific CPU core
• Called a “reactor”
• Polls event ring for incoming work
• Communication via event passing
Implemented in lib/event
Reactor 1Reactor 0 Reactor N…
Core 0 Core NCore 1
Events Events Events
![Page 15: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/15.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 15
Challenges• Scaling Up/Down
• Rebalancing
• Integration
Reactor 1Reactor 0 Reactor N…
Core 0 Core NCore 1
Events Events Events
Always Spinning!
![Page 16: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/16.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit
Other Libraries
16
Threading Abstractions: Goals• Create Abstraction layer between the reactor framework and other SPDK libraries
• Allow applications to plug in their own framework
• Trickling in over the last year – fully abstracted in 18.10
Event Framework
Thread
![Page 17: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/17.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 17
Threading Abstractions: Concepts• spdk_thread
• A thread spinning in a loop, processing events. We tie data to spdk_thread.
• spdk_msg
• An event – a function pointer passed from one thread to another.
• spdk_poller
• A repeated event on a thread with an optional delay between calls.
• spdk_io_channel
• An abstraction for per-thread context.
![Page 18: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/18.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 18
Threading Abstractions: Polling• spdk_thread_poll
• Runs any expired pollers
• Runs all incoming messages
• Returns
• As of 19.01, reactors do the following on each loop:
• Call spdk_thread_poll on their single spdk_thread
• Check for incoming events (the old style events)
![Page 19: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/19.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 19
Lightweight Threading: The BasicsConcept Address Space Multiplex Method Scheduler
Process Isolated Pre-emptive OS
Thread Shared Pre-emptive OS
Fiber Shared Cooperative Language Run Time, Framework
Green Thread Either Either Not OS
Today, spdk_thread is a Thread. In 19.04, an spdk_threadbecomes both a Fiber and a Green Thread.
![Page 20: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/20.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 20
Lightweight Threading: The Implementation
Reactor 1Reactor 0 Reactor N
…
Core 0 Core NCore 1
Threads Threads Threads
spdk_thread
spdk_thread
spdk_thread_createSchedule thread
![Page 21: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/21.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 21
Lightweight Threading: RECAP• spdk_thread_lib_init
• Provide a “schedule” callback
• Reactors contain a list of spdk_threads
• Each loop, call spdk_thread_poll on each spdk_thread
• When thread created, call schedule callback. Get placed on reactor.
![Page 22: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/22.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 22
Lightweight Threading: Advanced Scheduling• spdk_thread_poll returns whether it did any “work”
• Each poller reports whether it accomplished anything (true/false)
• Any message also counts as work
Use Busy Indications to Reschedule Threads!
![Page 23: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/23.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 23
Lightweight Threading: Advanced Scheduling
Reactor 1Reactor 0 Reactor N
…
Core 0 Core NCore 1
Threads Threads Threads
spdk_thread spdk_thread
spdk_thread spdk_thread
zzzZZZZZ
![Page 24: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/24.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 24
Lightweight Threading: Advanced Scheduling
Reactor 1Reactor 0 Reactor N
…
Core 0 Core NCore 1
Threads Threads Threads
spdk_thread spdk_thread
spdk_thread spdk_thread
Slow down!
![Page 25: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/25.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 25
Lightweight Threading: Advanced Scheduling
Intel Speed Select Technology
• Intel SST-Base Frequency
• Intel SST-Core Power
Available on Cascade Lake N SKUs
![Page 26: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/26.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 26
Future Direction• Smarter thread schedulers
• More configuration parameters (trade latency vs power consumption)
• Thread affinity
![Page 27: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/27.jpg)
![Page 28: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/28.jpg)
SPDK, PMDK & Vtune™ Amplifier Summit 28
Backup
![Page 29: Ben Walker Darek Stojaczyk · HUGEPAGE MANAGEMENT 6. SPDK, PMDK & Vtune™ Amplifier Summit 7 Pre-reserve big chunk of hugepages Dynamic allocation within pre-reserved region Legacy](https://reader035.fdocuments.in/reader035/viewer/2022062508/60854edd1b1c724a9911889c/html5/thumbnails/29.jpg)