SCIENCE FICTION VS. FANTASY Speculative Fiction. SPECULATIVE FICTION Speculative fiction is science…
Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative...
Transcript of Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative...
![Page 1: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/1.jpg)
Deterministic Model for Distributed Speculative Stream Processing
DISCAN 2018
Igor Kuralenok, Artem Trofimov, Nikita Marshalkin, and Boris Novikov
JetBrains Research, Saint Petersburg State University
![Page 2: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/2.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
2
![Page 3: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/3.jpg)
Deterministic computations
Given a particular input, the same output will be produced after any number of reruns
For streaming it means: 𝐹 𝐼𝑛 … 𝐼0 : ∀𝑘, 𝐹(𝐼𝑘𝐼𝑘−1 … 𝐼0 ) = 𝐽𝑘
Usually, it is considered as: 𝐹 𝐼𝑛, 𝑆𝑛 : ∀𝑘, ∃𝑆𝑘 = 𝑆 𝐼𝑘−1 … 𝐼0 , 𝐹(𝐼𝑘, 𝑆𝑘) = 𝐽𝑘
3
![Page 4: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/4.jpg)
Why is determinism important?
Determinism is a desired property in many CS areas
• Natural for users (people got used to think sequentially)
• Computations are reproducible and predictable
• Implies consistency [Stonebreaker et al. The 8 requirements of real-time stream processing. ACM SIGMOD Record 2005]
4
![Page 5: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/5.jpg)
Determinism: simple way
Determinism can be easily achieved if
• All computations are sequential
• All transformations are pure functions
5
![Page 6: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/6.jpg)
Stream processing
• Shared-nothing distributed runtime
• Record-at-a-time model
• Latency is a key performance metric
6 6
![Page 7: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/7.jpg)
Determinism in stream processing
• It is considered that determinism is too difficult too achieve
• Systems usually provide low-level interfaces, which do not guarantee any level of determinism
• Trade-off between determinism and latency [Zacheilas et al. Maximizing Determinism in Stream Processing Under Latency Constraints. DEBS 2017]
7
![Page 8: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/8.jpg)
What is about batch processing?
• MapReduce is usually implemented deterministically
• Micro-batching (spark streaming, storm trident) is also deterministic
8
![Page 9: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/9.jpg)
The ultimate question of life, the universe, and everything
Is it possible to combine low-latency and determinism within distributed stream processing?
9
![Page 10: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/10.jpg)
The ultimate question of life, the universe, and everything
Is it possible to combine low-latency and determinism within distributed stream processing?
• In spite of asynchronous distributed processing
10
![Page 11: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/11.jpg)
The ultimate question of life, the universe, and everything
Is it possible to combine low-latency and determinism within distributed stream processing?
• In spite of asynchronous distributed processing
• Avoiding input buffering
11
![Page 12: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/12.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
12
![Page 13: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/13.jpg)
Dataflow
• Dataflow is a potentially unlimited sequence of data items • Timestamps can be assigned to data items to define an
order • Dataflow is expressed in the form of a graph • Vertices are operations, which are implemented by user-
defined functions • Edges declare an order between operations
13
![Page 14: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/14.jpg)
Physical deployment
14
![Page 15: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/15.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
15
![Page 16: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/16.jpg)
What do we require to achieve determinism?
• Total order and transformations as pure functions – We can define synthetic order by assigning timestamps at system entry
16
![Page 17: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/17.jpg)
What do we require to achieve determinism?
• Total order and transformations as pure functions – We can define synthetic order by assigning timestamps at system entry
• We need to care about the order only in the operations that are order-sensitive and before output
17
![Page 18: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/18.jpg)
What do we require to achieve determinism?
• Total order and transformations as pure functions – We can define synthetic order by assigning timestamps at system entry
• We need to care about the order only in the operations that are order-sensitive and before output
• Calculations are partitioned, and order between items from different partitions does not influence the result (if they will not be merged)
18
![Page 19: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/19.jpg)
Unrealistic requirement
• Total order preservation
19
![Page 20: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/20.jpg)
Unrealistic requirement
• Total order preservation
• Let’s try to rethink streaming computations
20
![Page 21: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/21.jpg)
Drifting state: idea
21
𝐼𝑘 → Op 𝐼𝑘 , 𝑆𝑘 → 𝐽𝑘
↑ ↓
𝑆𝑘 𝑆𝑘+1
![Page 22: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/22.jpg)
Drifting state: idea
22
𝐼𝑘 → Op 𝐼𝑘 , 𝑆𝑘 → 𝐽𝑘
↑ ↓
𝑆𝑘 𝑆𝑘+1 newState = combine(prevState, newItem) handler.update(newState) return newState
![Page 23: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/23.jpg)
Drifting state: idea
23
𝐼𝑘 → Op 𝐼𝑘 , 𝑆𝑘 → 𝐽𝑘
↑ ↓
𝑆𝑘 𝑆𝑘+1
What if we put state directly into the stream?
𝐼𝑘 , 𝑆𝑘 → Op 𝐼𝑘 , 𝑆𝑘 → 𝐽𝑘 , 𝑆𝑘+1
![Page 24: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/24.jpg)
Drifting state: implementation
• Any stateful transformation can be decomposed into map and windowed grouping operation with a cycle
• Map operation is stateless == order insensitive and pure
• Grouping operation is pure (and even does not contain user-define logic)
24
![Page 25: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/25.jpg)
Drifting state: optimistic grouping
• Grouping operation is pure, but order-sensitive • Buffers before each grouping can increase latency [Li et al.
Out-of-order processing: a new architecture for high-performance stream systems. VLDB 2008]
• Grouping can be implemented optimistically without blocking
25
![Page 26: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/26.jpg)
Drifting state: optimistic grouping
• Grouping operation is pure, but order-sensitive • Buffers before each grouping can increase latency [Li et al.
Out-of-order processing: a new architecture for high-performance stream systems. VLDB 2008]
• Grouping can be implemented optimistically without blocking
26
![Page 27: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/27.jpg)
Drifting state: the only buffer
• Optimistic approach produces invalid items • Invalid items must be filtered out before they are
sent to consumer • Punctuations (low watermarks) allow releasing
items
27
![Page 28: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/28.jpg)
Something is wrong here
• Dataflow graphs are cyclic
• It is unclear how to send low watermarks through the cycles
28
![Page 29: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/29.jpg)
Implementation notes: acker
29
![Page 30: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/30.jpg)
Implementation notes: modified acker
30
![Page 31: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/31.jpg)
Discussion: drifting state pros
• Determinism is closer than you think! • We moved from “state as a special item” to “state
as an ordinary item” – Business-logic becomes stateless – All guarantees that system provides regarding
ordinary items are satisfied for state – Any stateful dataflow graph can be expressed using
drifting state model – Transformations can be non-commutative, but should
be pure – Single buffer before output per dataflow graph
31
![Page 32: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/32.jpg)
Discussion: drifting state cons
• It is harder to write code
– A need for a convenient API
• Optimistic technique can potentially generate a lot of additional items
• How does drifting state behaves within real-world problem?
32
![Page 33: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/33.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
33
![Page 34: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/34.jpg)
Experiments: prototype
• FlameStream [https://github.com/flame-stream]
• Java + Akka, Zookeper
34
![Page 35: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/35.jpg)
Experiments: task
Incremental inverted index building
– Requires stateful operations
– Contains network shuffle
– Workload is unbalanced due to Zipf’s law
35
![Page 36: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/36.jpg)
Experiments: setup
• 10 EC2 micro instances
– 1 GB RAM
– 1 core CPU
• Wikipedia documents as a dataset
36
![Page 37: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/37.jpg)
Experiments: overhead
37
![Page 38: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/38.jpg)
Experiments: latency scalability
38
![Page 39: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/39.jpg)
Experiments: throughput scalability
39 Nodes
Do
cum
ents
/sec
![Page 40: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/40.jpg)
Experiments: comparison with conservative approach
40
• Posting lists update is order-sensitive operation
• Buffer elements before this operation
• Buffer is flushed on low watermarks
• Low watermarks are sent after each input element to minimize overhead
• [Li et al. Out-of-order processing: a new architecture for high-performance stream systems. VLDB 2008]
• Apache Flink as stream processing system
![Page 41: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/41.jpg)
Experiments: comparison with conservative approach (at most once)
41
10 nodes 5 nodes
![Page 42: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/42.jpg)
Experiments: comparison with conservative approach (at most once)
42
10 nodes 5 nodes
![Page 43: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/43.jpg)
Experiments: comparison with conservative approach
43
![Page 44: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/44.jpg)
Experiments: comparison with conservative approach
44
Do
cum
ents
/sec
Nodes
![Page 45: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/45.jpg)
Drifting state: conclusion
• Determinism and low-latency is achieved
• Overhead is low
• Throughput is not significantly degraded
• Model is suitable for any stateful dataflows
– If all transformations are pure
45
![Page 46: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/46.jpg)
• Map is not pure? – Determinism is lost by definition – Correctness is lost
• Multiple input nodes?
– Timestamp = timestamp@node_id – Latency ≥ out of sync time – Possible to sync in ~10ms
• Acker fails? – Replication – Separate ackers for timestamp ranges
What if…
46
![Page 47: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/47.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
47
![Page 48: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/48.jpg)
What do we need to add to achieve exactly-once?
• Input replay
• Restore consistent state in groupings
• Deduplicate items only at the barrier
48
![Page 49: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/49.jpg)
What do we need to add to achieve exactly-once?
• Input replay
• Restore consistent state in groupings
• Deduplicate items only at the barrier – Output items atomically
– Sink stores timestamp of the last received item
– Simply compare timestamps!
49
![Page 50: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/50.jpg)
Exactly-once: discussion
• Pure streaming
• Deduplication only at the barrier
• Snapshotting and outputting are independent (are not connected into transaction)
50
![Page 51: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/51.jpg)
Exactly-once: roadmap
51
![Page 52: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/52.jpg)
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
52
![Page 53: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/53.jpg)
Experiments: latency (50 ms between checkpoints)
53
50 ms between state snapshots
![Page 54: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/54.jpg)
Experiments: latency (1000 ms between checkpoints)
54
1000 ms between state snapshots
![Page 55: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/55.jpg)
Experiments: throughput
55
Do
cum
ents
/sec
Nodes
![Page 56: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/56.jpg)
Conclusions
• Single extra requirement: all transformations are pure
• Results look promising
• A lot of work
– Understand properties and limitations
– Real-life deployment
• We are open for collaboration
56
![Page 57: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/57.jpg)
Future work
• Real-life deployment
• Efficient determinism and exactly-once can be used for system-level acceptance testing
– [Trofimov. Consistency maintenance in distributed analytical stream processing. ADBIS DC 2018]
57
![Page 58: Deterministic Model for Distributed Speculative …Deterministic Model for Distributed Speculative Stream Processing DISCAN 2018 Igor Kuralenok, Artem Trofimov, Nikita Marshalkin,](https://reader034.fdocuments.in/reader034/viewer/2022050411/5f885964c88bbe7cb0214042/html5/thumbnails/58.jpg)
Papers
• Kuralenok et al. FlameStream: Model and Runtime for Distributed Stream Processing. BeyondMR@SIGMOD 2018
• Kuralenok et al. Deterministic model for distributed speculative stream processing. ADBIS 2018
• Long paper about exactly-once is on the anvil
58