A Benchmark Suite for Distributed Stream Processing Systems
Maycon Viana Bordin
Claudio Geyer (Advisor)
April, 2017
HUGE amounts of data
are being generated in real time
500M tweets
are sent per day
4.75B shares
4.5B likes
420M status updates
300M photos
EVERY DAY.
They need to process…
large volumes of data
in real time
continuously
producing actionable information
Stream Processing
Data Stream
Data from the stream source may or may not be structured
The amount of data is usually unbounded in size
The input rate is variable and typically unpredictable
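The three properties above (records that may be unstructured, an unbounded amount of data, a variable input rate) can be illustrated with a small sketch. It is not part of the benchmark suite: the names `stream_source` and `running_word_count` are hypothetical, the arrival model is an arbitrary assumption, and the stream is cut with `islice` only so that the example terminates.

```python
import itertools
import random
from collections import Counter

def stream_source(seed=0):
    """Hypothetical unbounded source: yields (timestamp, line) pairs forever."""
    rng = random.Random(seed)
    words = ["storm", "spark", "stream", "tuple"]
    now = 0.0
    while True:
        # Variable input rate: the gap between two arrivals is random.
        now += rng.expovariate(100.0)
        # Unstructured payload: a free-text line of 1 to 5 words.
        line = " ".join(rng.choice(words) for _ in range(rng.randint(1, 5)))
        yield now, line

def running_word_count(stream):
    """Consume tuples one at a time; only the running counts stay in memory."""
    counts = Counter()
    for _, line in stream:
        counts.update(line.split())
        yield counts

# The source never ends, so take a finite prefix for demonstration.
prefix = itertools.islice(stream_source(), 1000)
final_counts = None
for final_counts in running_word_count(prefix):
    pass
```

Because the consumer never sees the whole input, its state is limited to the running counts, which is what lets it keep up with a source of unknown length.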
There are many platforms on the market
Problem:
How can we know which platform is better for a specific type
of application?
Problem:
Current stream processing benchmarks are composed
mostly of synthetic applications.
Problem:
Benchmarks for other Big Data platforms use more real-world
applications, e.g. BigDataBench and HiBench.
Goals:
Specific Goals:
Benchmarks for Stream Processing
Linear Road Benchmark [Ara04]
StreamBench [Lu14]
Yahoo Streaming Benchmark
BigDataBench [Wan14]
StreamBench [Wan16]
RIoTBench [Wan17]
HiBench [Hua10]
Comparison
Benchmark Architecture
API
Metrics
Scripts for automation…
Benchmark Applications
Benchmark Metrics
$\mathit{Latency} = T_{end} - T_{start}$
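As a worked illustration of the latency metric, the sketch below assumes that latency is measured per tuple as the time the tuple leaves the system minus the time it entered it. That reading, the function names, and the sample timestamps are all assumptions made for this example, not details taken from the benchmark.

```python
import statistics

def latencies(events):
    """events: iterable of (t_start, t_end) pairs in seconds, one per tuple."""
    return [t_end - t_start for t_start, t_end in events]

def summarize(values):
    """Return mean, median and max of a non-empty list of latencies."""
    ordered = sorted(values)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }

# Illustrative timestamps only: three tuples with their entry and exit times.
sample = [(0.0, 0.25), (1.0, 1.5), (2.0, 2.75)]
summary = summarize(latencies(sample))
```

Reporting a distribution summary rather than a single number is a design choice here, since a mean alone can hide the slow tuples that matter most in real-time processing.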
$\mathit{Throughput} = \dfrac{\text{Num. Processed Tuples}}{\mathit{Runtime}}$
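Throughput as defined above (processed tuples divided by runtime) reduces to a single division. The sketch below assumes the runtime is given in seconds, so the result is in tuples per second; the function name and the sample figures are illustrative only.

```python
def throughput(num_processed_tuples, runtime_seconds):
    """Tuples processed per second over the whole run."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return num_processed_tuples / runtime_seconds

# Illustrative figures: 1.2 million tuples processed in a 60-second run.
rate = throughput(1_200_000, 60.0)
```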
Comparison with the other Benchmarks
Results Set-up
Configurations, column 1:
n1_x1_x5_x6_x3
n1_x2_x5_x6_x3
n1_x3_x5_x6_x3
n2_x1_x5_x6_x3
n2_x2_x5_x6_x3
n2_x3_x5_x6_x3
n4_x1_x5_x6_x3
n4_x2_x5_x6_x3
n4_x3_x5_x6_x3
n8_x1_x5_x6_x3
n8_x2_x5_x6_x3
n8_x3_x5_x6_x3

Configurations, column 2:
n1_x1_x2_x1_x4_x2
n1_x2_x2_x1_x4_x2
n1_x4_x2_x1_x4_x2
n1_x8_x2_x1_x4_x2
n2_x1_x2_x1_x4_x2
n2_x2_x2_x1_x4_x2
n2_x4_x2_x1_x4_x2
n2_x8_x2_x1_x4_x2
n4_x1_x2_x1_x4_x2
n4_x2_x2_x1_x4_x2
n4_x4_x2_x1_x4_x2
n4_x8_x2_x1_x4_x2
n8_x1_x2_x1_x4_x2
n8_x2_x2_x1_x4_x2
n8_x4_x2_x1_x4_x2
n8_x8_x2_x1_x4_x2

Configurations, column 3:
n1_x4_x2_x2
n4_x2_x2_x2
n4_x8_x2_x2
Results Word Count: Storm
[Figure labels: n8_x4, n4_x2, n4_x2_x10_x12_x6, n2_x1_x5_x6_x3, n1_x2_x5_x6_x3]
Results Word Count: Spark
Results Log Processing: Storm
[Figure labels: n8_x3, n4_x1_x2_x1_x4_x2, n2_x1_x2_x1_x4_x2, n1_x4_x2_x1_x4_x2]
Results Log Processing: Spark
Results Traffic Monitoring: Storm
Results Traffic Monitoring: Spark
Conclusion
Future Work
Publications