No Data Loss Pipeline with Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Data loss and message reordering
● Data loss
  ○ producer.send(record) is called but the record did not end up in the consumer as expected
● Message reordering
  ○ send(record1) is called before send(record2)
  ○ record2 shows up in the broker before record1 does
  ○ matters in cases like DB replication
Kafka based data pipeline
Producer → Kafka Cluster (Colo 1) → Mirror Maker → Kafka Cluster (Colo 2) → Consumer
Today’s Agenda:
● No data loss
● No message reordering
● Mirror maker enhancement
  ○ Customized consumer rebalance listener
  ○ Message handler
Synchronous send is safe but slow...
producer.send(record).get()
Using asynchronous send with callback can be tricky
producer.send(record, callback)
Producer can cause data loss when
● block.on.buffer.full=false
● retries are exhausted
● sending messages without acks=all
Is this good enough?
producer.send(record, callback)
● block.on.buffer.full=true
● retries=Long.MAX_VALUE
● acks=all
● resend in callback when message send failed
Message reordering might happen if:
● max.in.flight.requests.per.connection > 1, AND
● retries are enabled
Producer → Kafka Broker timeline:
1. send message 0
2. send message 1
3. message 0 failed
4. retry message 0, so message 0 now lands after message 1
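The timeline above can be sketched as a toy simulation in plain Java (no Kafka client involved; `brokerLog` is a made-up illustration of the broker's resulting log, not a real API):

```java
import java.util.ArrayList;
import java.util.List;

public class ReorderDemo {
    // What ends up in the broker's log when msg0 and msg1 are in flight
    // at the same time (max.in.flight.requests.per.connection > 1) and
    // msg0's first attempt fails.
    static List<String> brokerLog(boolean msg0FirstAttemptFails) {
        List<String> log = new ArrayList<>();
        if (!msg0FirstAttemptFails) log.add("msg0"); // first attempt succeeds
        log.add("msg1");                             // msg1 was already in flight
        if (msg0FirstAttemptFails) log.add("msg0");  // retries > 0: msg0 is re-sent
                                                     // and lands after msg1
        return log;
    }

    public static void main(String[] args) {
        System.out.println(brokerLog(true));  // [msg1, msg0]
        System.out.println(brokerLog(false)); // [msg0, msg1]
    }
}
```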
Message reordering might also happen if:
● producer is closed carelessly
  ○ close producer in user thread, or
  ○ close without using close(0)
(Timeline: 1. the sender thread sends msg 0; 2. msg 0's callback gets an ack exception and the user thread starts closing the producer; 3. msg 1 is still sent before the close completes, ending up ahead of any retried msg 0.)
● close producer in the callback on error
● close producer with close(0) to prevent further sending after a previous message send failed
(Timeline: same scenario, but the callback calls close(0) directly in the sender thread, so msg 1 is never sent after msg 0 fails.)
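A minimal sketch of the close-in-callback pattern. The `FakeProducer` and `Callback` types below are stand-ins invented for illustration, not the kafka-clients API; they model only the one behavior that matters here, namely that close(0) stops all further sending:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOnErrorDemo {
    // Stand-ins for the producer API, for illustration only.
    interface Callback { void onCompletion(Object metadata, Exception exception); }

    static class FakeProducer {
        final List<String> delivered = new ArrayList<>();
        boolean closed = false;

        void send(String record, Callback cb) {
            if (closed) return;                       // nothing leaves a closed producer
            if (record.equals("bad")) {               // simulate a failed send
                cb.onCompletion(null, new Exception("send failed"));
            } else {
                delivered.add(record);
                cb.onCompletion(null, null);
            }
        }

        // close(0): stop immediately, without flushing buffered records.
        void close(long timeoutMs) { closed = true; }
    }

    static List<String> sendAll(String... records) {
        FakeProducer producer = new FakeProducer();
        Callback cb = (metadata, exception) -> {
            // Closing with close(0) inside the callback (i.e. from the
            // sender thread) stops any queued message from going out
            // after a failure, which would otherwise reorder the log.
            if (exception != null) producer.close(0);
        };
        for (String r : records) producer.send(r, cb);
        return producer.delivered;
    }

    public static void main(String[] args) {
        // "late" is never delivered: the producer closed itself when "bad" failed.
        System.out.println(sendAll("good", "bad", "late")); // [good]
    }
}
```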
To prevent data loss:
● block.on.buffer.full=true
● retries=Long.MAX_VALUE (for some use cases)
● acks=all
To prevent reordering:
● max.in.flight.requests.per.connection=1
● close producer in callback with close(0) on send failure
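Collected as Java producer properties, the no-data-loss and no-reordering settings above look roughly like this (a sketch using the config names from the talk; `retries` is an int config, so Integer.MAX_VALUE stands in for Long.MAX_VALUE, and later client versions replaced block.on.buffer.full with max.block.ms):

```java
import java.util.Properties;

public class SafeProducerConfig {
    static Properties safeProducerProps() {
        Properties props = new Properties();
        // Block instead of dropping records when the producer buffer fills up.
        props.put("block.on.buffer.full", "true");
        // Retry (effectively) forever rather than giving up and losing a record.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // Wait for all in-sync replicas to acknowledge each write.
        props.put("acks", "all");
        // One in-flight request per connection, so a retried message
        // cannot overtake a later one.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(safeProducerProps());
    }
}
```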
Not a perfect solution:
● Producer needs to be closed to guarantee message order, but e.g. in mirror maker, one message send failure to a topic should not bring down the whole pipeline.
● When the producer is down, messages in its buffer will still be lost
Correct producer settings are not enough
● acks=all can still lose data when unclean leader election happens
● Two replicas are needed at any time to guarantee data persistence
● replication factor >= 3
● min.isr (min.insync.replicas) = 2
● replication factor > min.isr
  ○ If replication factor = min.isr, the partition will go offline when one replica is down
Settings we use:
● replication factor = 3
● min.isr = 2
● unclean leader election disabled
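Expressed as broker-side properties, these settings look roughly like the sketch below (default.replication.factor governs auto-created topics; min.isr in the slides is shorthand for min.insync.replicas):

```java
import java.util.Properties;

public class BrokerDurabilityConfig {
    static Properties brokerDurabilityProps() {
        Properties props = new Properties();
        // Three copies of every partition.
        props.put("default.replication.factor", "3");
        // A write is only acked (with acks=all) once at least
        // 2 in-sync replicas have it.
        props.put("min.insync.replicas", "2");
        // Never elect an out-of-sync replica as leader, even if that
        // means the partition stays offline longer.
        props.put("unclean.leader.election.enable", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(brokerDurabilityProps());
    }
}
```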
● Consumer might lose messages when offsets are committed carelessly, e.g. committing offsets before the messages are fully processed
  ○ Disable automatic offset commit
  ○ Commit offsets only after the messages are processed
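A toy model of why commit order matters (plain Java, no Kafka client; the crash point is hard-coded for illustration): if the consumer crashes after committing but before processing, a commit-first strategy skips the unprocessed message on restart, while commit-after reprocesses it:

```java
import java.util.Arrays;
import java.util.List;

public class CommitOrderDemo {
    static final List<String> LOG = Arrays.asList("m0", "m1", "m2");

    // Returns the offset the consumer would resume from after a crash
    // that happens just before "m1" is processed.
    static int resumeOffset(boolean commitBeforeProcessing) {
        int committed = 0;
        for (int offset = 0; offset < LOG.size(); offset++) {
            if (commitBeforeProcessing) committed = offset + 1;  // commit first
            boolean crashed = LOG.get(offset).equals("m1");      // crash before processing m1
            if (crashed) break;
            // ... process LOG.get(offset) here ...
            if (!commitBeforeProcessing) committed = offset + 1; // commit after processing
        }
        return committed;
    }

    public static void main(String[] args) {
        // Commit-first resumes at offset 2, so m1 is never processed: data loss.
        System.out.println("commit-first resumes at " + resumeOffset(true));
        // Commit-after resumes at offset 1, so m1 is re-fetched: at-least-once.
        System.out.println("commit-after resumes at " + resumeOffset(false));
    }
}
```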
Mirror Maker Enhancement
● Consume-then-produce pattern
● Only commit consumer offsets of messages acked by the target cluster
● Default to no-data-loss and no-reordering settings
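One way to sketch "only commit offsets of messages acked by the target cluster" (a made-up helper, not mirror maker's actual implementation): track per-offset acks and only ever commit the first offset that is not yet acked, so everything below the commit point is guaranteed delivered:

```java
import java.util.BitSet;

public class AckedOffsetTracker {
    private final BitSet acked = new BitSet();

    // Called from the producer callback when the target cluster acks a message.
    void ack(int offset) { acked.set(offset); }

    // Highest safe commit point: the first unacked offset. Everything below
    // it has been acked by the target cluster, so committing it cannot lose
    // data even if the mirror maker dies right afterwards.
    int safeCommitOffset() { return acked.nextClearBit(0); }

    public static void main(String[] args) {
        AckedOffsetTracker tracker = new AckedOffsetTracker();
        tracker.ack(0);
        tracker.ack(2);                                // offset 1 still in flight
        System.out.println(tracker.safeCommitOffset()); // 1
        tracker.ack(1);
        System.out.println(tracker.safeCommitOffset()); // 3
    }
}
```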
Mirror Maker Enhancement
● Customized consumer rebalance listener
  ○ Can be used to propagate topic changes from the source cluster to the target cluster, e.g. partition number change, new topic creation
Mirror Maker Enhancement
● Customized message handler, useful for
  ○ partition-to-partition mirroring
  ○ filtering out messages
  ○ message format conversion
  ○ other simple message processing
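As a sketch of what a filtering message handler could look like (the `MessageHandler` interface below is a stand-in invented for illustration, not the actual mirror maker API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class FilteringHandlerDemo {
    // Stand-in for a mirror maker message handler hook: maps each consumed
    // message to zero or more messages to hand to the producer.
    interface MessageHandler {
        List<String> handle(String message);
    }

    // Drops messages carrying a "debug:" prefix instead of mirroring them.
    static final MessageHandler DROP_DEBUG = msg ->
            msg.startsWith("debug:") ? Collections.emptyList()
                                     : Collections.singletonList(msg);

    static List<String> mirror(List<String> consumed, MessageHandler handler) {
        return consumed.stream()
                .flatMap(m -> handler.handle(m).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mirror(Arrays.asList("a", "debug:x", "b"), DROP_DEBUG));
        // [a, b]
    }
}
```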
Mirror Maker Enhancement
● Startup/shutdown acceleration
  ○ parallelized startup and shutdown
  ○ a 26-node cluster with 4 consumers each takes about 1 min to start up and shut down
Q&A