How We Built an Event-Time Merge of Two Kafka-Streams with Spark Streaming
Ralf Sigmund, Sebastian Schröder
Otto GmbH & Co. KG
Agenda
• Who are we?
• Our Setup
• The Problem
• Our Approach
• Requirements
• Advanced Requirements
• Lessons learned
Who are we?
• Ralf
– Technical Designer, Team Tracking
– Cycling, Surfing
– Twitter: @sistar_hh
• Sebastian
– Developer, Team Tracking
– Taekwondo, Cycling
– Twitter: @Sebasti0n
Who are we working for?
• 2nd largest e-commerce company in Germany
• more than 1 million visitors every day
• our blog: https://dev.otto.de
• our jobs: www.otto.de/jobs
Our Setup

[Diagram: requests to otto.de pass through a Varnish reverse proxy to the verticals (Vertical 0 … Vertical n), each backed by services (Service 0 … Service n); a separate Tracking-Server receives the tracking data.]
The Problem

[Diagram: the same architecture. The verticals and services need to send large tracking messages to the Tracking-Server, but the Varnish reverse proxy has a limit on the size of messages.]
Our Approach

[Diagram: the verticals and their services (S0 … Sn) send their parts of a tracking message separately; a Spark service sits in front of the Tracking-Server and merges the parts before delivery.]
Requirements

Messages with the same key are merged

[Diagram: two input streams of messages, each carrying an event time t and a key id (e.g. t:1/id:1, t:3/id:2, t:2/id:3, t:0/id:2, t:4/id:0). Messages sharing the same id, such as the two messages with id:2 and the two with id:3, are combined into one merged message.]
Messages have to be timed out

[Diagram: the same streams. The stream clock is max(business event timestamp); a merged message is timed out, i.e. emitted downstream, 3 seconds of event time after its own timestamp, once no further parts can be expected.]
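The timeout rule sketched above can be written as a small predicate. This is an illustrative sketch in Python (the talk's service is Scala/Spark), and the field names are assumptions:

```python
from dataclasses import dataclass

# Assumed value from the slides: messages time out 3 seconds of event time
# after their own timestamp.
TIMEOUT_SECONDS = 3

@dataclass
class MergedMessage:
    msg_id: int
    event_time: int  # business event timestamp in seconds (hypothetical field)

def is_timed_out(msg: MergedMessage, stream_clock: int) -> bool:
    """A merged message is timed out once the stream clock, the maximum
    business event timestamp seen so far, is more than TIMEOUT_SECONDS
    past the message's own event time."""
    return stream_clock - msg.event_time > TIMEOUT_SECONDS
```

Note that the clock is driven by event time, not wall-clock time: if no new events arrive, nothing times out.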
Event streams might get stuck

[Diagram: each source topic has its own clock (clock A, clock B), the maximum event time seen on that topic. If one topic stalls, its clock stops advancing, so the combined clock is taken as min(clock A, clock B); timeouts never run ahead of the slower topic.]
There might be no messages

[Diagram: a topic without regular messages still advances its clock via heart-beat messages (e.g. t:5 heart beat), so the combined clock min(clock A, clock B) keeps moving (here: 4) and pending messages can still be timed out after 3 seconds.]
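The combined clock from the last two slides can be sketched as a pure function (Python for illustration; names are assumptions):

```python
# Sketch: the stream clock is the minimum over all source topics of the
# maximum event time seen on that topic. Heart-beat messages carry an
# event time but no payload, so an otherwise idle topic still advances
# its clock.

def topic_clock(event_times):
    """Maximum business event timestamp seen on one topic
    (regular messages and heart beats alike)."""
    return max(event_times)

def stream_clock(per_topic_event_times):
    """Combined clock: never runs ahead of the slowest topic."""
    return min(topic_clock(ts) for ts in per_topic_event_times)

# Example in the spirit of the slides: topic A saw t = 1, 4, 2;
# topic B only a heart beat with t = 5.
clock = stream_clock([[1, 4, 2], [5]])  # min(max(1,4,2), max(5)) = 4
```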
Requirements Summary
• Messages with the same key are merged
• Messages are timed out by the combined event time of both source topics
• Timed out messages are sorted by event time
Solution
Merge messages with the same Id
• updateStateByKey, which is defined on pairwise RDDs
• It applies a function to all new messages and existing state for a given key
• Returns a StateDStream with the resulting messages
UpdateStateByKey

[Diagram: a DStream[InputMessage] and the history RDD[MergedMessage] are combined per key (1, 2, 3) into a new DStream[MergedMessage].]

UpdateFunction: (Seq[InputMessage], Option[MergedMessage]) => Option[MergedMessage]
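The update function with that signature can be sketched as a pure function. This is Python for illustration (the talk's service is Scala), and the message fields (payload, event_time, parts) are assumptions:

```python
# Sketch of the per-key update function used with updateStateByKey:
# fold the micro batch's new messages for one key into the existing
# merged state for that key.

def update_fn(new_messages, state):
    """(Seq[InputMessage], Option[MergedMessage]) => Option[MergedMessage]"""
    if not new_messages and state is None:
        return None
    merged = state if state is not None else {"parts": [], "event_time": 0}
    for msg in new_messages:
        merged["parts"].append(msg["payload"])
        # the merged message carries the newest event time of its parts
        merged["event_time"] = max(merged["event_time"], msg["event_time"])
    return merged
```

In PySpark the equivalent would be passed to the stream as `pairs.updateStateByKey(update_fn)`.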
Flag And Remove Timed Out Msgs

[Diagram: in each micro batch, the update function flags merged messages (MM) whose event time has fallen behind the global business clock as timed out (TO); flagging needs that global business clock. The state stream is then split: filter (timed out) emits the timed-out messages downstream, filter !(timed out) keeps the rest as history for the next batch.]
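The flag-and-split step can be sketched as follows (Python for illustration; structure and names are assumptions):

```python
# Sketch: split the merged state into messages to emit (timed out) and
# messages to keep as history for the next micro batch.

TIMEOUT_SECONDS = 3

def flag_and_remove(state, global_clock):
    """Returns (timed_out, history): messages whose event time lies more
    than TIMEOUT_SECONDS behind the global business clock are emitted,
    the rest stay in state."""
    timed_out = [m for m in state if global_clock - m["event_time"] > TIMEOUT_SECONDS]
    history = [m for m in state if global_clock - m["event_time"] <= TIMEOUT_SECONDS]
    # per the requirements, timed out messages are emitted sorted by event time
    timed_out.sort(key=lambda m: m["event_time"])
    return timed_out, history
```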
Messages are timed out by event time
• We need access to all messages from the current micro batch and the history (state) => custom StateDStream
• Compute the maximum event time of both topics and supply it to the updateStateByKey function
Advanced Requirements

Effect of different request rates in source topics

[Chart: message rates of topic A and topic B over time, on 0 to 10K axes. When one topic lags, Spark catches up with 5K messages per micro batch, but data from the faster topic is read too early.]
Handle different request rates in topics
• Stop reading from one topic if the event time of the other topic is too far behind
• Store the latest event time of each topic and partition on the driver
• Use a custom KafkaDStream with a clamp method which uses the event time
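The event-time clamp idea can be sketched like this. A minimal sketch with assumed names and a hypothetical lead threshold; the real implementation lives inside a custom KafkaDStream on the driver:

```python
# Sketch: cap how far ahead a topic may be read relative to the event
# time of the slowest topic.

MAX_EVENT_TIME_LEAD = 3  # hypothetical: seconds a topic may run ahead

def clamp(latest_event_times, requested_offsets, current_offsets):
    """latest_event_times: per-topic max event time seen so far (driver-side).
    Returns the per-topic offsets to read up to in this micro batch; a topic
    that is too far ahead of the slowest topic is not advanced."""
    slowest = min(latest_event_times.values())
    clamped = {}
    for topic, requested in requested_offsets.items():
        if latest_event_times[topic] - slowest > MAX_EVENT_TIME_LEAD:
            # stop reading this topic until the other one catches up
            clamped[topic] = current_offsets[topic]
        else:
            clamped[topic] = requested
    return clamped
```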
Guarantee at least once semantics

[Diagram: every message written to the sink topic carries the current state, the source offsets it was built from (e.g. a-offset: 1, b-offset: 5) and the timeout state. On restart/recovery, the Spark merge service retrieves the last message from the sink topic and resumes reading both source topics from the offsets stored in it.]
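The recovery step can be sketched as follows (Python for illustration; the message layout and field names are assumptions):

```python
# Sketch: each message written to the sink topic embeds the source offsets
# it was derived from, so recovery only needs the last sink message.

def offsets_to_resume_from(last_sink_message):
    """On restart, read the last message from the sink topic and resume
    consuming each source topic from the offsets recorded in it. Replaying
    from these offsets may re-emit messages: at-least-once semantics."""
    if last_sink_message is None:
        return {"a": 0, "b": 0}  # fresh start: beginning of both topics
    return dict(last_sink_message["source_offsets"])

start = offsets_to_resume_from(
    {"payload": "...", "source_offsets": {"a": 1, "b": 5}})
```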
Everything put together
Lessons learned
• Excellent performance and scalability
• Extensible via custom RDDs and an extension to DirectKafka
• No event-time windows in Spark Streaming; see https://github.com/apache/spark/pull/2633
• Checkpointing cannot be used when deploying new artifact versions
• Driver/executor model is powerful, but also complex
THANK YOU.
Twitter: @sistar_hh, @Sebasti0n
Blog: dev.otto.de
Jobs: otto.de/jobs