A Benchmark Suite for Distributed Stream Processing Systems

Maycon Viana Bordin

Claudio Geyer, Advisor

April, 2017


HUGE amounts of data

are being generated in real-time


500M tweets

are sent per day


4.75B shares

4.5B likes

420M status updates

300M photos

EVERY DAY.


They need to process…

large volumes of data

in real-time

continuously

producing actionable information


Stream Processing


Data Stream


Data from the stream source may or may not be structured


The amount of data is usually unbounded in size


The input rate is variable and typically unpredictable
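
The stream properties listed above (unbounded size, variable and unpredictable input rate) can be illustrated with a small generator. This is a minimal Python sketch and not part of the benchmark suite: the name `stream_source`, the candidate rates, and the exponentially distributed gaps are all assumptions chosen for illustration.

```python
import itertools
import random
from typing import Iterator, Tuple


def stream_source(seed: int = 0) -> Iterator[Tuple[float, int]]:
    """Yield (arrival_time_seconds, sequence_number) pairs without end."""
    rng = random.Random(seed)
    now = 0.0
    for seq in itertools.count():  # unbounded: the loop never terminates
        # Hypothetical rates in tuples/second; picking one at random per
        # tuple makes the input rate vary over time.
        rate = rng.choice([10.0, 100.0, 1000.0])
        now += rng.expovariate(rate)  # random gap with mean 1/rate seconds
        yield now, seq


# A consumer can only ever look at a finite prefix of the stream.
first_five = list(itertools.islice(stream_source(), 5))
```

Because the generator never ends, any consumer has to process tuples incrementally instead of waiting for the full input.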


There are many platforms on the market


Problem:

How do we know which platform is better for a specific type of application?


Problem:

Current stream processing benchmarks are composed mostly of synthetic applications.


Problem:

Benchmarks for other Big Data platforms use more real-world applications, e.g. BigDataBench and HiBench.


Goals:


Specific Goals:


Benchmarks for Stream Processing


Linear Road Benchmark [Ara04]


StreamBench [Lu14]


Yahoo Streaming Benchmark


BigDataBench [Wan14]


StreamBench [Wan16]


RIoTBench [Wan17]


HiBench [Hua10]


Comparison


Benchmark Architecture


API


Metrics


Scripts for automation…


Benchmark Applications


Benchmark Metrics


$\mathit{Latency} = T_{end} - T_{start}$
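
As an illustration of the latency metric, the following Python sketch computes per-tuple latency as the difference between the time a tuple finished processing and the time it entered the system. The start timestamp, the `TupleTiming` class, and the sample values are assumptions made for this example, not taken from the benchmark's code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TupleTiming:
    t_start: float  # seconds; when the tuple entered the system (assumed)
    t_end: float    # seconds; when processing of the tuple finished


def latency(timing: TupleTiming) -> float:
    """Latency of a single tuple: end time minus start time."""
    return timing.t_end - timing.t_start


# Hypothetical timestamps for three tuples.
timings = [
    TupleTiming(0.0, 0.25),
    TupleTiming(1.0, 1.5),
    TupleTiming(2.0, 2.75),
]
latencies = [latency(t) for t in timings]
mean_latency = mean(latencies)
```

In practice a benchmark would more likely report percentiles alongside the mean, since a few slow tuples can dominate an average.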


$\mathit{Throughput} = \dfrac{\text{Num. Processed Tuples}}{\mathit{Runtime}}$
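
The throughput formula is a plain ratio, shown here as a short Python sketch. The function name, the choice of seconds as the time unit, and the sample numbers are assumptions for illustration only.

```python
def throughput(processed_tuples: int, runtime_seconds: float) -> float:
    """Tuples processed per second over the whole run."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return processed_tuples / runtime_seconds


# Hypothetical run: 1.2 million tuples processed in one minute.
rate = throughput(processed_tuples=1_200_000, runtime_seconds=60.0)
```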


Comparison with the other Benchmarks


Results: Set-up


n1_x1_x5_x6_x3, n1_x2_x5_x6_x3, n1_x3_x5_x6_x3
n2_x1_x5_x6_x3, n2_x2_x5_x6_x3, n2_x3_x5_x6_x3
n4_x1_x5_x6_x3, n4_x2_x5_x6_x3, n4_x3_x5_x6_x3
n8_x1_x5_x6_x3, n8_x2_x5_x6_x3, n8_x3_x5_x6_x3

n1_x1_x2_x1_x4_x2, n1_x2_x2_x1_x4_x2, n1_x4_x2_x1_x4_x2, n1_x8_x2_x1_x4_x2
n2_x1_x2_x1_x4_x2, n2_x2_x2_x1_x4_x2, n2_x4_x2_x1_x4_x2, n2_x8_x2_x1_x4_x2
n4_x1_x2_x1_x4_x2, n4_x2_x2_x1_x4_x2, n4_x4_x2_x1_x4_x2, n4_x8_x2_x1_x4_x2
n8_x1_x2_x1_x4_x2, n8_x2_x2_x1_x4_x2, n8_x4_x2_x1_x4_x2, n8_x8_x2_x1_x4_x2

n1_x4_x2_x2, n4_x2_x2_x2, n4_x8_x2_x2
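
Configuration names of this shape can be split into their numeric fields mechanically. The Python sketch below only separates the leading `n` field from the `x` fields that follow; the slides do not define what each field measures, so the function name `parse_config` and the output layout are assumptions and no meaning is attached to the values.

```python
import re
from typing import Dict, List, Union


def parse_config(name: str) -> Dict[str, Union[int, List[int]]]:
    """Split a name such as 'n4_x2_x10_x12_x6' into its n and x values."""
    parts = name.split("_")
    n_match = re.fullmatch(r"n(\d+)", parts[0])
    if n_match is None:
        raise ValueError(f"expected a leading n<number> field in {name!r}")
    x_values: List[int] = []
    for part in parts[1:]:
        x_match = re.fullmatch(r"x(\d+)", part)
        if x_match is None:
            raise ValueError(f"unexpected field {part!r} in {name!r}")
        x_values.append(int(x_match.group(1)))
    return {"n": int(n_match.group(1)), "x": x_values}


parsed = parse_config("n4_x2_x10_x12_x6")
short = parse_config("n8_x4")
```

Parsing the names this way makes it easy to group or sort result rows by any one field when comparing runs.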


Results: Word Count on Storm


Configurations labeled on this slide: n8_x4, n4_x2, n4_x2_x10_x12_x6, n2_x1_x5_x6_x3, n1_x2_x5_x6_x3


Results: Word Count on Spark


Results: Log Processing on Storm


Configurations labeled on this slide: n8_x3, n4_x1_x2_x1_x4_x2, n2_x1_x2_x1_x4_x2, n1_x4_x2_x1_x4_x2


Results: Log Processing on Spark


Results: Traffic Monitoring on Storm


Results: Traffic Monitoring on Spark


Conclusion


Future Work


Publications
