Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging,...
![Page 1: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/1.jpg)
Exploring System Challenges of Ultra-Low Latency Solid State Drives
Sungjoon Koh
Changrim Lee, Miryeong Kwon, and Myoungsoo Jung
Computer Architecture and Memory systems Lab
![Page 2: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/2.jpg)
Executive Summary
Motivation. Ultra-low latency (ULL) SSDs are emerging, but have not been characterized so far.
Contributions.
- Characterizing the performance behaviors of ULL SSD.
- Studying several system-level challenges of the current storage stack.
Key Observations.
- ULL SSD minimizes I/O interference (interleaved reads and writes).
- NVMe queue mechanisms need to be optimized for ULL SSDs.
- The polling-based I/O completion routine isn’t effective for current NVMe SSDs.
![Page 3: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/3.jpg)
Architectural Change of SSD
[Figure: host architecture with CPU, DRAM, MCH (North Bridge), and ICH (South Bridge). The SATA SSD attaches through SATA on the ICH; the NVMe SSD attaches through PCI Express, giving direct access and high bandwidth.]
![Page 4: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/4.jpg)
Evolution of SSDs
SATA SSD.
- Read: 0.5 GB/s
- Write: 0.5 GB/s
NVMe SSD.
- Read: 2.4 GB/s
- Write: 1.2 GB/s
Changes.
- Bandwidth almost reaches the maximum performance.
- Still, long latency (far from DRAM).
- New flash memory, called “Z-NAND”.
![Page 5: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/5.jpg)
New Flash Memory
Existing 3D NAND.
- Read: 45-120 μs
- Write: 660-5000 μs
Z-NAND [1].
- Read: 3 μs (15~20x)
- Write: 100 μs (6~7x)
Z-NAND [1] specification.
- Technology: SLC based 3D NAND, 48 stacked word-line layers
- Capacity: 64Gb
- Page size: 2kB/page
Z-NAND based archives “Z-SSD”
![Page 6: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/6.jpg)
Characterization Categories
Performance Analysis.
- Average latency.
- Long-tail latency.
- Bandwidth.
- I/O interference impact.
Polling vs. Interrupt
- Overall latency comparison.
- CPU utilization analysis.
- Memory requirement.
- Five-nines latency.
![Page 7: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/7.jpg)
Evaluation Settings
Benchmark: Flexible I/O Tester (FIO v2.99)
OS: Linux 4.14.10
CPU: Intel® Core™ i7-4790K (4-core, 4.00GHz)
Memory: DDR4 DRAM (16GB)
SSD
- ULL SSD: Z-SSD Prototype (800GB)
- NVMe SSD: Intel® SSD 750 Series (400GB)
[Photos: our testbed with Z-SSDs; the Z-SSD prototype.]
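The slides name FIO as the benchmark but do not show the job parameters. As a rough sketch only, the helper below builds a fio argument list for one raw-device run without executing anything. The device path, job name, runtime, and the choice of `libaio` for interrupt-driven runs and `pvsync2` with `hipri` for polled runs are assumptions for illustration, not the authors' configuration.

```python
from typing import List


def fio_command(device: str, rw: str, block_size: str,
                io_depth: int, polling: bool = False) -> List[str]:
    """Build a fio argument list for one run (nothing is executed here)."""
    if polling:
        # Assumption: synchronous preadv2/pwritev2 engine with the
        # high-priority flag, so completions are polled (effective depth 1).
        engine = ["--ioengine=pvsync2", "--hipri"]
    else:
        # Assumption: asynchronous engine with a configurable queue depth.
        engine = ["--ioengine=libaio", f"--iodepth={io_depth}"]
    return [
        "fio",
        "--name=ull-char",            # hypothetical job name
        f"--filename={device}",       # hypothetical device path
        f"--rw={rw}",                 # read, write, randread, randwrite, ...
        f"--bs={block_size}",
        "--direct=1",                 # bypass the page cache
        "--time_based",
        "--runtime=30",               # illustrative runtime in seconds
        "--percentile_list=99.999",   # report five-nines latency
        *engine,
    ]


if __name__ == "__main__":
    print(" ".join(fio_command("/dev/nvme0n1", "randread", "4k", 16)))
```

Sweeping `io_depth` and `rw` over such a helper would cover the latency, bandwidth, and intermixed-workload categories listed earlier.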
![Page 8: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/8.jpg)
Performance Analysis
![Page 9: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/9.jpg)
Overview
[Figure: the host issues 4KB requests through the request queue and NVMe driver to the SSD's NVMe controller, with increasing queue depth and with reads (Rd) and writes (Wr) intermixed.]
① Average latency & long-tail latency
② Bandwidth
③ Read latency under read & write intermixed workload
![Page 10: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/10.jpg)
Average Latency of ULL SSD
[Figure: average latency (μsec) vs. I/O depth. Panels: Sequential Read and Sequential Write, each comparing the NVMe SSD and the ULL SSD, plus a ULL SSD panel for SeqRd, RndRd, SeqWr, and RndWr.]
Annotations: 5.1x; 1.8x; “Split-DMA & Super-Channel”; 4KB DMA = 8 μs (= 3tR); tR, tDMA; 11 μs.
![Page 11: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/11.jpg)
Split-DMA & Super-Channel
[Figure: in the Z-SSD, the split DMA engine splits a 4KB request into two 2KB transfers and issues them over a super channel, which groups channels (Channel 0 through Channel 5 are shown) so the halves move in parallel; tDMA = 4 μs.]
Reference: Cheong, Woosung et al., “A flash memory controller for 15μs ultra-low-latency SSD using high-speed 3D NAND flash with 3μs read time”, ISSCC, 2018
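The arithmetic behind split DMA can be sketched with a simple additive model: flash read time plus the DMA time of one split. The inputs tR = 3 μs, 4KB DMA = 8 μs, and tDMA = 4 μs come from the slides; the additive model itself, and reading the "11 μs" annotation as tR plus an unsplit 4KB DMA, are assumptions.

```python
def read_latency_us(t_r_us: float, dma_4kb_us: float, channels: int) -> float:
    """Flash read time plus DMA time when a 4KB transfer is split evenly
    across `channels` channels that move data in parallel (assumed model)."""
    return t_r_us + dma_4kb_us / channels


if __name__ == "__main__":
    unsplit = read_latency_us(3.0, 8.0, channels=1)
    split = read_latency_us(3.0, 8.0, channels=2)
    print(f"single channel: {unsplit} us, split over two channels: {split} us")
```

Under this model, halving the DMA time matters because the transfer, not the flash read, dominates once tR is only 3 μs.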
![Page 12: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/12.jpg)
Long-tail Latency of ULL SSD
[Figure: 99.999th-percentile latency (msec) vs. I/O depth for SeqRd, RndRd, SeqWr, and RndWr on the ULL SSD and the NVMe SSD.]
Annotations: “Split DMA” & “Suspend/Resume”; resource conflict, insufficient internal buffer, internal tasks.
![Page 13: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/13.jpg)
Suspend/Resume DMA Technique
[Figure: timing diagram on two ways. Without suspend/resume, the read on Way 2 (CMD, tR, Data Out) waits while the write DMA on Way 1 is in progress. With suspend/resume [1], the write DMA is suspended, the read's data out proceeds, and the write DMA then resumes.]
Effect: reduces read latency and increases QoS.
Reference: Cheong, Woosung et al., “A flash memory controller for 15μs ultra-low-latency SSD using high-speed 3D NAND flash with 3μs read time”, ISSCC, 2018
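A toy timeline can make the suspend/resume trade-off concrete. The sketch below assumes a write DMA that starts at time 0 and a read whose data becomes ready after tR; the function names and the 20 μs write DMA length are hypothetical, while tR = 3 μs and a 4 μs data-out time follow the slides.

```python
from typing import Tuple


def read_done_without_suspend(write_dma_us: float, read_ready_us: float,
                              data_out_us: float) -> float:
    """The read's data out must wait until the write DMA has finished."""
    start = max(read_ready_us, write_dma_us)
    return start + data_out_us


def done_with_suspend(write_dma_us: float, read_ready_us: float,
                      data_out_us: float) -> Tuple[float, float]:
    """The write DMA is suspended as soon as the read data is ready and
    resumes after the data out. Returns (read_done, write_done)."""
    read_done = read_ready_us + data_out_us
    if read_ready_us < write_dma_us:
        # The write was still in flight, so it is delayed by the data out.
        write_done = write_dma_us + data_out_us
    else:
        write_done = write_dma_us
    return read_done, write_done


if __name__ == "__main__":
    print(read_done_without_suspend(20.0, 3.0, 4.0))
    print(done_with_suspend(20.0, 3.0, 4.0))
```

In this toy case the read completes much earlier, and the write pays only the length of the read's data out.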
![Page 14: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/14.jpg)
I/O Interference
Flush operations and metadata writes in the file system are intermixed with user requests.
[Figure: average read latency (μsec) vs. write fraction (%) for the NVMe SSD and the ULL SSD; labeled values on the plot: 27, 32, 31, 34, 37.]
- NVMe SSD: significant performance degradation in intermixed workloads. This is a great performance bottleneck of conventional SSDs.
- How about ULL SSD? Read latency remains almost constant (“suspend/resume”, … [1]).
- ULL SSD can be applied to a real-life storage stack without performance degradation.
![Page 15: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/15.jpg)
Queue Analysis
[Figure: normalized bandwidth vs. I/O depth for SeqRd, RndRd, SeqWr, and RndWr on the NVMe SSD and the ULL SSD; panels span I/O depths of 50-250, 4-20, and 50-250.]
NVMe SSD.
- Only 50% of max BW.
- Too long write latency.
- Requires more than 100 entries.
- Light queue mechanisms (e.g., NCQ) are not sufficient.
- Requires a rich queue mechanism.
ULL SSD.
- Almost max BW.
- Short write latency.
- Only 6 entries required.
- Well-aligned with light queue mechanisms (e.g., NCQ).
- NVMe needs to be lightened.
Annotation: I/O request rescheduling within queue.
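One way to see why a lower-latency device needs fewer queue entries is Little's law: the number of requests in flight equals throughput times latency. The sketch below applies it with purely illustrative inputs; the bandwidth target and the two latency values are assumptions chosen for the example, not measurements from the slides.

```python
def required_queue_depth(bandwidth_mb_per_s: int, latency_us: int,
                         request_bytes: int) -> int:
    """Little's law: bytes in flight = bandwidth x latency.
    MB/s times microseconds gives bytes directly (1e6 x 1e-6 = 1)."""
    bytes_in_flight = bandwidth_mb_per_s * latency_us
    # Ceiling division: a partially filled slot still needs a queue entry.
    return -(-bytes_in_flight // request_bytes)


if __name__ == "__main__":
    # Hypothetical inputs: the same bandwidth target with a long and a
    # short per-request latency, using 4KB requests.
    print(required_queue_depth(2400, 200, 4096))
    print(required_queue_depth(2400, 10, 4096))
```

With these assumed inputs the long-latency case needs over a hundred outstanding requests while the short-latency case needs only a handful, which is the qualitative pattern the slide reports.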
![Page 16: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/16.jpg)
Polling vs. Interrupt
Two different I/O completion methods
![Page 17: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/17.jpg)
Interrupt / Polling
Systems with short waiting times adopt a polling-based waiting strategy (even though it incurs a lot of overhead).
For example, “spin lock” and “network message passing” apply a polling-based waiting strategy.
Polling is currently implemented in the NVMe storage stack.
Is it really needed for current NVMe SSDs?
![Page 18: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/18.jpg)
Interrupt / Polling
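[Figure: timelines of the two completion methods.]
Interrupt. Submit request → context switch (CS) → sleep during command execution → ① SSD finishes → ② NVMe controller raises IRQ → ③ wake → ISR → CS → complete request.
Polling. Submit request → polling (“Done??”) during command execution → complete request.
Gain. Polling avoids the CS and ISR steps. With a low-latency SSD, command execution is shorter, so the gain is a larger portion of the total latency.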
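The argument that polling helps more as the device gets faster can be sketched with a small latency model. All cost values below are hypothetical, and the model assumes that only the post-completion ISR and wake-up context switch are saved by polling, with the initial context switch overlapping command execution.

```python
def interrupt_latency_us(device_us: float, isr_us: float, cs_us: float) -> float:
    """Device time plus the ISR and the wake-up context switch."""
    return device_us + isr_us + cs_us


def polling_latency_us(device_us: float, poll_us: float) -> float:
    """Device time plus the delay until the next poll sees the completion."""
    return device_us + poll_us


def polling_gain(device_us: float, isr_us: float = 1.0,
                 cs_us: float = 1.0, poll_us: float = 0.5) -> float:
    """Fraction of the interrupt-based latency that polling removes."""
    irq = interrupt_latency_us(device_us, isr_us, cs_us)
    return (irq - polling_latency_us(device_us, poll_us)) / irq


if __name__ == "__main__":
    for device_us in (100.0, 10.0):
        print(f"device {device_us} us: gain {polling_gain(device_us):.1%}")
```

The absolute saving is the same in both cases; only its share of the total changes, which is why the benefit grows as device latency shrinks.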
![Page 19: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/19.jpg)
Overall Performance
[Figure: average latency (μsec) of interrupt vs. polling for 4KB, 8KB, 16KB, and 32KB requests; read and write panels for the NVMe SSD and the ULL SSD.]
NVMe SSD. Polling decreases latency by only 0.9% (read) and 8.2% (write). Polling-based I/O services are not effective for current NVMe SSDs.
ULL SSD. Does polling-based I/O work on the ULL SSD? Polling decreases latency by 7.5% (read) and 13.2% (write).
Future lower-latency SSDs can achieve remarkable performance improvement with a polling-based I/O completion routine.
![Page 20: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/20.jpg)
System Challenges
[Figure: memory bound (%) for 4KB, 8KB, 16KB, and 32KB requests, interrupt vs. polling.]
[Figure: CPU utilization (%) over time, interrupt vs. polling; with polling the core is always working.]
[Figure: 99.999% latency (msec) of ULL writes for 4KB, 8KB, 16KB, and 32KB requests, interrupt vs. polling.]
[Figure: host CPU cores (Core 0, Core 1, ..., Core n) with SQ and CQ head/tail pointers; the host checks for CQ updates, and the NVMe controller memory space holds the SQ tail doorbell and the CQ head doorbell. A spin lock is used for head/tail pointer synchronization.]
<CPU Utilization>
- Polling does not release the CPU.
- High CPU utilization.
<Memory Bound>
- Memory bound = fraction of slots where the pipeline could be stalled due to load/store.
- High memory bound = frequent memory access.
Polling-based I/O services incur significant system-level overheads, which need to be addressed.
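To illustrate where the frequent memory accesses come from, the sketch below simulates a host polling an NVMe-style completion queue. It relies on the phase-tag convention, in which the controller flips the phase bit it writes on each pass through the ring; the class and field names are hypothetical, and doorbell writes, locking, and queue-overflow checks are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CqEntry:
    command_id: int = 0
    phase: int = 0  # entries start with phase 0; the first pass writes 1


class CompletionQueue:
    def __init__(self, depth: int) -> None:
        self.entries = [CqEntry() for _ in range(depth)]
        self.tail = 0        # controller side: next slot to fill
        self.ctrl_phase = 1  # phase the controller writes on this pass
        self.head = 0        # host side: next slot to consume
        self.host_phase = 1  # phase the host expects on this pass
        self.polls = 0       # memory reads spent checking the head entry

    def post(self, command_id: int) -> None:
        """Controller posts a completion (no overflow check in this sketch)."""
        entry = self.entries[self.tail]
        entry.command_id = command_id
        entry.phase = self.ctrl_phase
        self.tail += 1
        if self.tail == len(self.entries):
            self.tail = 0
            self.ctrl_phase ^= 1

    def poll(self) -> Optional[int]:
        """Host checks the head entry once; returns a command id or None."""
        self.polls += 1
        entry = self.entries[self.head]
        if entry.phase != self.host_phase:
            return None  # nothing new yet: this read was pure overhead
        self.head += 1
        if self.head == len(self.entries):
            self.head = 0
            self.host_phase ^= 1
        return entry.command_id


if __name__ == "__main__":
    cq = CompletionQueue(depth=2)
    while cq.polls < 1000:   # the device is still busy; every poll misses
        cq.poll()
    cq.post(7)
    print(cq.poll(), cq.polls)
```

Every unsuccessful `poll` is a load that does no useful work, which is the behavior the memory-bound and CPU-utilization figures quantify.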
![Page 21: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/21.jpg)
Conclusion
Motivation. Ultra-low latency (ULL) SSDs are emerging, but have not been characterized so far.
Contributions.
- Characterizing the performance behaviors of ULL SSD.
- Studying several system-level challenges of the current storage stack.
Key Insights.
- ULL SSDs can be effectively applied to a real-life storage stack (read/write mixed).
- NVMe queue mechanisms need to be optimized for ULL SSDs.
- The polling-based I/O completion routine isn’t effective for current NVMe SSDs.
![Page 22: Exploring System Challenges of Ultra-Low Latency Solid ... · Ultra-low latency (ULL) is emerging, but not characterized by far. Contributions. - Characterizing the performance behaviors](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea0a6246cb4c8202409ad93/html5/thumbnails/22.jpg)
Thank you
Q&A