Post on 28-Jul-2020
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 107/08/2019
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research
11th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’19)
Zhen Cao1, Geoff Kuenning2, Klaus Mueller1, Anjul Tyagi1, and Erez Zadok1
1Stony Brook University; 2Harvey Mudd College
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 207/08/2019
Outlinel Motivationl Related Workl Data Collectionl Design of ICEl Case Studyl Future Workl Conclusions
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 307/08/2019
Motivationl Analyzing storage systems is important and challenging
uStatistics, machine learning, etc.u2-D visualization.
l Key challengeuStorage systems are often affected by many factors
§ Tunable parameters, workloads, hardware, etc.
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 407/08/2019
# Workload I/O Size(KB)
HDD Results SSD ResultsBase
(% Diff)Optimized
(% Diff)Base
(% Diff)Optimized
(% Diff)1234
seq-rd-1th-1f
4 - 2.5 + 1.7 - 0.5 - 0.932 - 0.2 - 2.2 + 0.8 + 0.3
128 - 0.9 - 2.1 + 0.4 + 1.71024 - 0.9 - 2.2 + 0.2 - 0.3
5678
seq-rd-32th-32f
4 - 36.9 - 26.9 - 0.1 - 0.232 - 41.5 - 30.3 - 0.1 - 1.8
128 - 41.3 - 29.8 - 0.1 - 0.21024 - 41.0 - 28.3 - 0.0 - 2.1
• Workload• I/O size• HDD vs. SSD• Base vs. Optimized
A Partial Example
[FAST’17 & TOS’18]
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 507/08/2019
[FAST’17]
[HotStorage’15]
[FAST’17]For Peer Review
Table 9 – continued from previous page# Workload HDD Results SSD ResultsI/O Size
(KB) Ext4(%)
SBase(%Di�)
SOpt(%Di�)
Ext4(%)
SBase(%Di�)
SOpt(%Di�)
9 4 28 - 35.7# - 28.6# 95 0.0‡ 0.0‡10 32 30 - 43.3# - 30.0# 98 0.0‡ - 1.0†11 128 30 - 43.3# - 30.0# 98 0.0‡ - 1.0†12
seq-rd-32th-32f
1024 30 - 43.3# - 30.0# 98 0.0‡ - 1.0†13 4 1 0.0‡ 0.0‡ 13 - 30.8# - 38.5#14 32 4 0.0‡ 0.0‡ 45 - 17.8⇤ - 24.4⇤15 128 14 0.0‡ 0.0‡ 75 - 13.3⇤ - 12.0⇤16
rnd-rd-1th-1f
1024 59 + 3.4‡ 0.0‡ 89 - 3.4† 0.0‡17 4 1 0.0‡ 0.0‡ 69 - 82.6! - 24.6⇤18 32 10 - 60.0! - 20.0⇤ 94 - 55.3! - 1.1†19 128 21 - 33.3# - 9.5⇤ 98 - 27.6# 0.0‡20
rnd-rd-32th-1f
1024 27 - 33.3# - 11.1⇤ 98 - 9.2⇤ 0.0‡
21 4 92 - 26.1# - 1.1† 95 - 7.4⇤ 0.0‡22 32 92 - 17.4⇤ - 1.1† 95 - 2.1† 0.0‡23 128 92 - 16.3⇤ - 1.1† 95 - 1.0† 0.0‡24
seq-wr-1th-1f
1024 92 - 17.4⇤ - 1.1† 95 - 1.0† 0.0‡25 4 86 - 2.3† - 1.2† 95 0.0‡ 0.0‡26 32 86 - 2.3† - 1.2† 95 0.0‡ 0.0‡27 128 85 - 1.2† 0.0‡ 95 0.0‡ 0.0‡28
seq-wr-32th-32f
1024 85 - 1.2† 0.0‡ 95 0.0‡ 0.0‡29 4 2 0.0‡ 0.0‡ 46 0.0‡ - 26.1#30 32 14 0.0‡ 0.0‡ 93 0.0‡ - 9.7⇤31 128 28 0.0‡ 0.0‡ 95 0.0‡ 0.0‡32
rnd-wr-1th-1f
1024 50 0.0‡ 0.0‡ 95 0.0‡ - 1.0†33 4 2 0.0‡ 0.0‡ 46 0.0‡ - 26.1#34 32 14 0.0‡ 0.0‡ 93 0.0‡ - 12.9⇤35 128 28 0.0‡ 0.0‡ 95 0.0‡ 0.0‡36
rnd-wr-32th-1f
1024 50 0.0‡ - 2.0† 95 0.0‡ - 1.0†
37 �les-cr-1th 4 31 - 58.1! - 80.6! 42 - 61.9! - 83.3!38 �les-cr-32th 4 37 - 48.6# - 59.5! 54 - 57.4! - 61.1!39 �les-del-1th 4 2 0.0‡ - 50.0! 21 - 23.8⇤ - 57.1!40 �les-del-32th 4 3 - 33.3# 0.0‡ 66 - 74.2! - 33.3#41 �les-rd-1th - 5 0.0‡ 0.0‡ 42 - 33.3# - 61.9!42 �les-rd-32th - 5 0.0‡ 0.0‡ 48 - 39.6# - 54.2!
43 �le-server - 25 - 24.0⇤ 0.0‡ 95 - 41.0# - 2.1†44 mail-server - 11 - 45.4# - 9.1⇤ 76 - 60.5! - 17.1⇤45 web-server - 7 - 42.9# 0.0‡ 81 - 74.1! - 9.9⇤
5.2.2 Disk Bandwidth Utilization Observations. In our experiments we �rst measured disk band-width in MB/sec and then normalized it by the maximum possible disk bandwidth achieved with asequential I/O on a raw device without any �le system. Table 9 shows the results for the normalized
32
Page 33 of 61 ACM Transactions on Storage
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
HDD Results SSD Results# Workload I/O Size(KB) EXT4
(ops/sec)StackfsBase(%Diff)
StackfsOpt(%Diff)
EXT4(ops/sec)
StackfsBase(%Diff)
StackfsOpt(%Diff)
1 4 38382 - 2.45+ + 1.7+ 30694 - 0.5+ - 0.9+
2 32 4805 - 0.2+ - 2.2+ 3811 + 0.8+ + 0.3+
3 128 1199 - 0.86+ - 2.1+ 950 + 0.4+ + 1.7+
4
seq-rd-1th-1f
1024 150 - 0.9+ - 2.2+ 119 + 0.2+ - 0.3+
5 4 1228400 - 2.4+ - 3.0+ 973450 + 0.02+ + 2.1+
6 32 153480 - 2.4+ - 4.1+ 121410 + 0.7+ + 2.2+
7 128 38443 - 2.6+ - 4.4+ 30338 + 1.5+ + 1.97+
8
seq-rd-32th-1f
1024 4805 - 2.5+ - 4.0+ 3814.50 - 0.1+ - 0.4+
9 4 11141 - 36.9# - 26.9# 32855 - 0.1+ - 0.16+
10 32 1491 - 41.5# - 30.3# 4202 - 0.1+ - 1.8+
11 128 371 - 41.3# - 29.8# 1051 - 0.1+ - 0.2+
12
seq-rd-32th-32f
1024 46 - 41.0# - 28.3# 131 - 0.03+ - 2.1+
13 4 243 - 9.96* - 9.95* 4712 - 32.1# - 39.8#
14 32 232 - 7.4* - 7.5* 2032 - 18.8* - 25.2#
15 128 191 - 7.4* - 5.5* 852 - 14.7* - 12.4*
16
rnd-rd-1th-1f
1024 88 - 9.0* -3.1+ 114 - 15.3* -1.5+
17 4 572 - 60.4! -23.2* 24998 - 82.5! -27.6#
18 32 504 - 56.2! -17.2* 4273 - 55.7! -1.9+
19 128 278 - 34.4# -11.4* 1123 - 29.1# -2.6+
20
rnd-rd-32th-1f
1024 41 - 37.0# -15.0* 126 - 12.2* -1.9+
21 4 36919 -26.2# - 0.1+ 32959 - 9.0* + 0.1+
22 32 4615 - 17.8* - 0.16+ 4119 - 2.5+ + 0.12+
23 128 1153 - 16.6* - 0.15+ 1030 - 2.1+ + 0.1+
24
seq-wr-1th-1f
1024 144 - 17.7* -0.31+ 129 - 2.3+ - 0.08+
25 4 34370 - 2.5+ + 0.1+ 32921 + 0.05+ + 0.2+
26 32 4296 - 2.7+ + 0.0+ 4115 + 0.1+ + 0.1+
27 128 1075 - 2.6+ - 0.02+ 1029 - 0.04+ + 0.2+
28
seq-wr-32th-32f
1024 134 - 2.4+ - 0.18+ 129 - 0.1+ + 0.2+
29 4 1074 - 0.7+ - 1.3+ 16066 + 0.9+ - 27.0#
30 32 708 - 0.1+ - 1.3+ 4102 - 2.2+ - 13.0*
31 128 359 - 0.1+ - 1.3+ 1045 - 1.7+ - 0.7+
32
rnd-wr-1th-1f
1024 79 - 0.01+ - 0.8+ 129 - 0.02+ - 0.3+
33 4 1073 - 0.9+ - 1.8+ 16213 - 0.7+ - 26.6#
34 32 705 + 0.1+ - 0.7+ 4103 - 2.2+ - 13.0*
35 128 358 + 0.3+ - 1.1+ 1031 - 0.1+ + 0.03+
36
rnd-wr-32th-1f
1024 79 + 0.1+ - 0.3+ 128 + 0.9+ - 0.3+
37 files-cr-1th 4 30211 - 57! - 81.0! 35361 - 62.2! - 83.3!
38 files-cr-32th 4 36590 - 50.2! - 54.9! 46688 - 57.6! - 62.6!
39 files-rd-1th 4 645 + 0.0+ - 10.6* 8055 - 25.0* - 60.3!
40 files-rd-32th 4 1263 - 50.5! -4.5+ 25341 - 74.1! -33.0#
41 files-del-1th - 1105 - 4.0+ - 10.2* 7391 - 31.6# - 60.7!
42 files-del-32th - 1109 - 2.8+ - 6.9* 8563 - 42.9# - 52.6!
43 file-server - 1705 - 26.3# -1.4+ 5201 - 41.2# -1.5+
44 mail-server - 1547 - 45.0# -4.6+ 11806 - 70.5! -32.5#
45 web-server - 1704 - 51.8! +6.2+ 19437 - 72.9! -17.3*
Table 3: List of workloads and corresponding performance results. Green class (marked with +) indicates that the performanceeither degraded by less than 5% or actually improved; Yellow class (*) includes results with the performance degradation in the5–25% range; Orange class (#) indicates that the performance degradation is between 25–50%; And finally, the Red class (!) isfor when performance decreased by more than 50%.
8
[TOS’18]
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 607/08/2019
0 25 50 75
100 125 150
file-rre
ad-4
KB
-1th
file-rre
ad-3
2K
B-1
th
file-rre
ad-1
28K
B-1
th
file-rre
ad-1
MB
-1th
file-rre
ad-4
KB
-32th
file-rre
ad-3
2K
B-3
2th
file-rre
ad-1
28K
B-3
2th
file-rre
ad-1
MB
-32th
file-rw
rite-4
KB
-1th
file-rw
rite-3
2K
B-1
th
file-rw
rite-1
28K
B-1
th
file-rw
rite-1
MB
-1th
file-rw
rite-4
KB
-32th
file-rw
rite-3
2K
B-3
2th
file-rw
rite-1
28K
B-3
2th
file-rw
rite-1
MB
-32th
file-sre
ad-4
KB
-1th
file-sre
ad-3
2K
B-1
th
file-sre
ad-1
28K
B-1
th
file-sre
ad-1
MB
-1th
file-sre
ad-4
KB
-32th
file-sre
ad-3
2K
B-3
2th
file-sre
ad-1
28K
B-3
2th
file-sre
ad-1
MB
-32th
file-sw
rite-4
KB
-1th
file-sw
rite-3
2K
B-1
th
file-sw
rite-1
28K
B-1
th
file-sw
rite-1
MB
-1th
32file
s-sread-4
KB
-32th
32file
s-sread-3
2K
B-3
2th
32file
s-sread-1
28K
B-3
2th
32file
s-sread-1
MB
-32th
32file
s-swrite
-4K
B-3
2th
32file
s-swrite
-32K
B-3
2th
32file
s-swrite
-128K
B-3
2th
32file
s-swrite
-1M
B-3
2th
files-cre
ate
-1th
files-cre
ate
-32th
files-re
ad-4
KB
-1th
files-re
ad-4
KB
-32th
files-d
ele
te-1
th
files-d
ele
te-3
2th
Web-se
rver-1
00th
Mail-se
rver-1
6th
File
-serve
r-50th
Rela
tive P
erf
orm
ance
(%
)
0%
5%
10%
15%
20%
25%
30%
35%
40%
45%
2K-128-journal-
rel-ddln-sata
2K-128-ordered-
rel-cfq-sata
2K-128-w
rback-
rel-ddln-sata
2K-128-journal-
no-cfq-sas
2K-128-journal-
no-cfq-sata
2K-128-ordered-
no-cfq-sas
2K-512-w
rback-
no-ddln-sata
2K-2K
-wrback-
rel-cfq-sas
2K-2K
-ordered-
no-noop-sas
2K-2K
-ordered-
no-cfq-sas
2K-2K
-wrback-
no-noop-sata
4K-128-journal-
no-cfq-sata
4K-128-w
rback-
no-cfq-sas
4K-512-ordered-
rel-noop-sas
4K-512-ordered-
rel-noop-sata
4K-512-w
rback-
rel-noop-sata
4K-512-journal-
no-cfq-sata
4K-512-journal-
no-ddln-sata
4K-2K
-wrback-
rel-cfq-sas
4K-2K
-wrback-
no-noop-sas
0
500
1000
1500
2000
2500
Y1:
Ran
ge/
Avg.
Y2:
Avg. T
hro
ughput
(IO
PS
)
Y1: Mailserver Fileserver Webserver Y2: Mailserver Fileserver Webserver
[HotStorage’15]
[FAST’17]
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 707/08/2019
Challengesl Storage systems are often affected by many factors
uTunable parameters, workloads, hardware, etc.
l Lack of interpretabilityl Difficult to infuse domain knowledge
Proposed solution: Interactive Visual Analytics
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 807/08/2019
Key Contributionsl Prototyped Interactive Configuration Explorer (ICE)l Demonstrate ICE can help the analysis of storage
performancel ICE will be open-sourced
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 907/08/2019
Related Workl Interactive visual analytics have been successfully applied in
exploring and analyze real-world datasetsuPlotly, Tableau, etc.uParallel Coordinates, Parallel Sets, Data Context Maps, etc.
l Visualization in storage researchuMostly 2D techniques such as histograms, box plots, etc., and 3D
versions such as surface plotsuVisualizing block I/O workloads [Rodeh et al.]
l Other domainsuNetworkuDatabase query optimization
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1007/08/2019
Data Collectionl Settings
uHardware: 1 Intel Xeon quad-core 2.4GHz CPU, 24GB RAM, 4 drives
uBenchmarks: Filebench§Dbserver, mailserver, fileserver, webserver
l Parameter spacesufile system, inode size, block size, block group, journal
options, mount options, special options, I/O schedulers§ 6,222 unique combinations (over 500k data points collected)
u4 workload × 4 devices§Datasets published (link)
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1107/08/2019
Outlinel Motivationl Related Workl Data Collectionl Design of ICEl Case Studyl Future Workl Conclusions
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1207/08/2019
Design of ICEl Lessons from common 2D visualization
uDifficult to analyze multiple factors simultaneouslyuDifficult to infuse domain knowledge
l Lessons from existing interactive visual analyticsuDifficult to visualize categorical parameters [ATC’18]
l Design principlesuDesigned for storage analysisuEasy to use
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1307/08/2019
Max throughput
90th percentile
75th percentileMedian
25th percentile
10th percentileMin Throughput
Mean throughput
Throughput distribution
(Magenta region)
Max to 90th percentile
90th percentile to 75th percentile
75th percentile to 25th percentile
25th percentile to 10th percentile
10th percentile to Min
Design of ICE (1 of 3)
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1407/08/2019
Design of ICE (2 of 3)
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1507/08/2019
Design of ICE (3 of 3)
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1607/08/2019
Design of ICE (3 of 3)
1
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1707/08/2019
Design of ICE (3 of 3)
21
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1807/08/2019
Outlinel Motivationl Related Workl Data Collectionl Design of ICEl Case Studyl Future Workl Conclusions
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 1907/08/2019
ICE Case Studiesl Case Study 1: optimize throughput under a fixed
workloadl Case Study 2: optimize performance stabilityl Case Study 3: optimize with constraintsl Case Study 4: optimize performance stability &
achieve good throughput
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2007/08/2019
ICE: Case Study 4
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2107/08/2019
ICE: Case Study 4
1
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2207/08/2019
ICE: Case Study 4
1 2
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2307/08/2019
ICE: Case Study 4
1 2 3
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2407/08/2019
ICE: Case Study 4
1 2 34
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2507/08/2019
Outlinel Motivationl Related Workl Data Collectionl Design of ICEl Case Studyl Future Workl Conclusions
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2607/08/2019
Future Workl Scale ICE with hundreds of dimensionsl Support multi-objective analysisl Aid data collection and performance tuningl Apply other visualization techniques for storage and
system analysisuE.g., Context Maps
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2707/08/2019
Conclusionsl Propose to utilize interactive visual analytics in storage
researchl Prototyped Interactive Configuration Explorer (ICE)l Demonstrated effectiveness of ICEl Make ICE open-source
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research (HotStorage’19) 2807/08/2019
Zhen Cao, Geoff Kuenning, Klaus Mueller, Anjul Tyagi, and Erez Zadok
Graphs Are Not Enough: Using Interactive Visual Analytics in Storage Research