Kafka in Production - WordPress.com · Apache Kafka Apache Kafka is an open-source stream...
Kafka in Production
Andrey Panasyuk, @defascat
Introduction
Remote Calls
Types
1. Synchronous calls
2. Asynchronous calls
Limitations
1. Peer-to-Peer
2. Retries
3. Load balancing
4. Durability
5. Backpressure
Message Queues
1. External tool
2. Asynchronous communication protocol
Let's get to Kafka!
Apache Kafka
Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log".
Wikipedia
Concepts. Data Flow
Concepts. Distributed Log
Concepts. Architecture
I’ve heard in other presentations. Let's get to it!
Kafka. Controller
1. One of the brokers
2. Managing state of partitions
3. Managing state of replicas
4. Partition manipulations
5. High-availability
Kafka + ZooKeeper
1. Cluster membership
2. Electing leader
3. Topic configuration
4. Offsets for a Group/Topic/Partition combination
Kafka. Guarantees
1. Delivery guarantees
   a. At least once (by default)
   b. At most once
   c. Exactly once
2. Fault-tolerance vs latency
   a. No ack
   b. Acks from leader
   c. Acks from followers
3. Message order in a single partition
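The fault-tolerance vs latency choice in item 2 corresponds to the producer's `acks` setting, which accepts `0`, `1`, or `all`. The fragment below is a minimal sketch of that mapping. The `AckLevels` class, the `AckLevel` enum, and the `producerConfig` helper are hypothetical names introduced for illustration, and only `java.util.Properties` is used so the example stands alone without the Kafka client on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper (not from the talk) mapping the slide's ack levels to the "acks" setting. */
final class AckLevels {

    /** The three trade-off points from the slide; the enum names are illustrative. */
    enum AckLevel {
        NO_ACK("0"),        // fire and forget: lowest latency, records may be lost
        LEADER("1"),        // the partition leader has written the record
        ALL_IN_SYNC("all"); // the leader waits for all in-sync replicas

        final String configValue;

        AckLevel(String configValue) {
            this.configValue = configValue;
        }
    }

    /** Builds producer properties for the given broker list and ack level. */
    static Properties producerConfig(String brokers, AckLevel level) {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", brokers);
        properties.setProperty("acks", level.configValue);
        return properties;
    }

    public static void main(String[] args) {
        // The broker address is a placeholder.
        System.out.println(producerConfig("localhost:9092", AckLevel.ALL_IN_SYNC));
    }
}
```

The resulting `Properties` object could be extended with serializer settings and passed to a `KafkaProducer` constructor.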
Kafka. Adding a broker
1. Adds a new machine into ISR
2. Starts rebalancing partitions (if automatic rebalance enabled)
   a. Too many partitions can cause an issue
3. Notifies consumers
4. Notifies producers
Kafka. Failure Scenarios
1. In-Sync-Replicas
2. Leader election
3. CAP
   a. Partition Tolerance
   b. Availability
   c. Consistency*
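One way to picture how in-sync replicas and leader election interact: when a leader fails, the new leader is taken from the replicas that are still in sync. The sketch below is a deliberate simplification, not Kafka's controller code. `LeaderElectionSketch` and `electLeader` are hypothetical names, and the rule "first assigned replica that is both in the ISR and alive wins" is an assumption made for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Simplified illustration of ISR-based leader election; all names are hypothetical. */
final class LeaderElectionSketch {

    /**
     * Returns the first assigned replica that is in the ISR and on a live broker.
     * An empty result means no in-sync replica is available, so the partition
     * stays offline unless an out-of-sync replica is allowed to take over.
     */
    static Optional<Integer> electLeader(List<Integer> assignedReplicas,
                                         Set<Integer> inSyncReplicas,
                                         Set<Integer> liveBrokers) {
        return assignedReplicas.stream()
                .filter(inSyncReplicas::contains)
                .filter(liveBrokers::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        // Replicas live on brokers 1, 2, 3. Broker 1 (the old leader) has failed,
        // and broker 3 had already fallen out of the ISR.
        Optional<Integer> leader = electLeader(List.of(1, 2, 3), Set.of(1, 2), Set.of(2, 3));
        System.out.println(leader); // prints Optional[2]
    }
}
```

The empty case is where the consistency and availability trade-off from the CAP item shows up: electing a replica outside the ISR keeps the partition available but can lose acknowledged messages.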
I’m a Java Developer. Show me the code!
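Kafka. Producer
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", brokers);
properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
// Topic "sync", keyed by userId, with the step data as the value
ProducerRecord<String, String> data = new ProducerRecord<>("sync", userId, steps);
producer.send(data);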
Kafka. Real-world Producers
1. Topic name validation
2. Adding metrics
3. Adding default metadata
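Topic name validation, the first item above, can be done on the client before anything is sent. The sketch below assumes Kafka's usual naming rules (ASCII letters, digits, `.`, `_` and `-`, at most 249 characters, and not `.` or `..`). The `TopicNames` class is a hypothetical helper, not the wrapper used in the talk, and a real deployment might add its own naming conventions on top.

```java
import java.util.regex.Pattern;

/** Hypothetical client-side topic name check; the rules are assumed from Kafka's naming limits. */
final class TopicNames {

    private static final int MAX_LENGTH = 249;
    private static final Pattern LEGAL_CHARS = Pattern.compile("[a-zA-Z0-9._-]+");

    /** Returns true when the name is non-empty, short enough, and uses only legal characters. */
    static boolean isValid(String topic) {
        if (topic == null || topic.isEmpty() || topic.length() > MAX_LENGTH) {
            return false;
        }
        if (topic.equals(".") || topic.equals("..")) {
            return false;
        }
        return LEGAL_CHARS.matcher(topic).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("sync"));      // prints true
        System.out.println(isValid("user sync")); // prints false (contains a space)
    }
}
```

Rejecting a bad name in the producer wrapper matters most on environments where topics are created automatically, since a typo would otherwise silently create a new topic.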
Kafka. Message availability
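Kafka. Consumer
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", brokers);
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("group.id", groupId);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList("sync"));
while (true) {
    // Clients older than 2.0 take the timeout in milliseconds instead: poll(100)
    consumer.poll(Duration.ofMillis(100))
        .forEach(r -> System.out.println(r.key() + ": " + r.value()));
}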
Kafka. Real-world Consumers
1. Metrics
2. Invalid message queue
3. Separating message processing in KafkaMessageProcessor
4. Different implementations
   a. 1 thread for all partitions vs 1 thread per 1 partition
   b. Autocommit
   c. Poll periods
   d. Batch support
   e. Rebalancing considerations
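Items 2 and 3 above fit together naturally: once processing sits behind its own interface, the polling loop can catch failures and divert the offending records instead of stopping. The sketch below shows that shape with plain collections. `ProcessingSketch`, `MessageProcessor`, `InvalidMessage`, and `processBatch` are hypothetical names, the talk's actual `KafkaMessageProcessor` is not shown in the slides, and publishing the collected failures to an invalid message queue topic is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: all names here are hypothetical, not taken from the talk's code base. */
final class ProcessingSketch {

    /** Business logic lives behind this interface, away from the polling loop. */
    interface MessageProcessor {
        void process(String key, String value) throws Exception;
    }

    /** A record that failed processing, kept together with the failure reason. */
    static final class InvalidMessage {
        final String key;
        final String value;
        final String reason;

        InvalidMessage(String key, String value, String reason) {
            this.key = key;
            this.value = value;
            this.reason = reason;
        }
    }

    /**
     * Runs the processor over one polled batch. A failure does not stop the batch;
     * failed records are collected so the caller can send them to an invalid message queue.
     */
    static List<InvalidMessage> processBatch(List<Map.Entry<String, String>> batch,
                                             MessageProcessor processor) {
        List<InvalidMessage> invalid = new ArrayList<>();
        for (Map.Entry<String, String> record : batch) {
            try {
                processor.process(record.getKey(), record.getValue());
            } catch (Exception e) {
                invalid.add(new InvalidMessage(record.getKey(), record.getValue(), e.toString()));
            }
        }
        return invalid;
    }
}
```

With this split, the same processor can be reused by the single-threaded and thread-per-partition variants listed under item 4, and one malformed message no longer blocks the partition behind it.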
Kafka. Serialization
public interface Deserializer<T> {
    public void configure(Map<String, ?> configs, boolean isKey);
    public T deserialize(String topic, byte[] data);
    public void close();
}
public interface Serializer<T> {
    public void configure(Map<String, ?> configs, boolean isKey);
    public byte[] serialize(String topic, T data);
    public void close();
}
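To make the two interfaces concrete, the sketch below shows a matching `serialize` and `deserialize` pair for an integer payload such as a step count. It is kept free of the Kafka client dependency, so the methods are static and only mirror the signatures above. `StepCountCodec` is a hypothetical name, and in a real client the same bodies would sit in classes declared as `implements Serializer<Integer>` and `implements Deserializer<Integer>`.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec whose methods mirror the serialize/deserialize signatures above. */
final class StepCountCodec {

    /** Encodes the value as 4 big-endian bytes; null stays null. */
    static byte[] serialize(String topic, Integer data) {
        if (data == null) {
            return null;
        }
        return ByteBuffer.allocate(Integer.BYTES).putInt(data).array();
    }

    /** Decodes 4 big-endian bytes back into the value; null stays null. */
    static Integer deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        if (data.length != Integer.BYTES) {
            throw new IllegalArgumentException("Expected 4 bytes but got " + data.length);
        }
        return ByteBuffer.wrap(data).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("sync", 12345);
        System.out.println(deserialize("sync", bytes)); // prints 12345
    }
}
```

The `topic` argument is unused here, but it is part of the interface so that one serializer class can vary its format per topic.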
Kafka. Consumer Failure
1. Wait for ZooKeeper timeout
2. Controller processes event from ZooKeeper
3. Controller notifies consumers
4. Consumers select new partition consumer
Do you really have all this mess working?
Kafka. Corporate Challenge Usages
1. User Sync Processing
2. Analytics
Kafka. Our Deployment
1. Yahoo kafka-manager
2. MirrorMaker
Kafka. Practices
1. Topics are created manually on prod, automatically on QA envs
2. Do not delete topics (KAFKA-1397, KAFKA-2937, KAFKA-4834, ...)
3. IMQ (invalid message queue) implementation
4. Use identical versions on all brokers
Kafka. Tuning
1. 20-100 brokers per cluster; hard limit of 10,000 partitions per cluster (Netflix)
2. Increase replica.lag.time.max.ms and replica.lag.max.messages (the latter exists only in brokers before 0.9)
3. Increase num.replica.fetchers
4. Reduce retention
5. Increase rebalance.max.retries, rebalance.backoff.ms
Monitoring And Alerting
1. Consumer metrics
2. Producer metrics
3. Kafka Broker metrics
4. ZooKeeper metrics
5. PagerDuty alerts
Current State. Message Input Rate
Current State. Producer Latency
Let's wrap this up!
Kafka. Extension Points
● Storages
   ○ Amazon S3 (Sink)
   ○ Files (Source)
   ○ Elasticsearch (Sink)
   ○ HDFS (Sink)
   ○ JDBC (Source, Sink)
   ○ C* (Sink)
   ○ PostgreSQL (Sink)
   ○ Oracle/MySQL/MSSQL (Sink)
   ○ Vertica (Source, Sink)
   ○ Ignite (Source, Sink)
● Protocols/Queues
   ○ MQTT (Source)
   ○ SQS (Source)
   ○ JMS (Sink)
   ○ RabbitMQ (Source)
● Others
   ○ Mixpanel (Sink)
Alternatives. ActiveMQ
1. Pros
   a. Simplicity
   b. Far richer features (standard protocols, TTLs, in-memory)
   c. DLQ
   d. Extension points
2. Cons
   a. Delivery guarantees
   b. Losing messages under high load
   c. Failure handling scenarios
   d. Throughput in transactional mode
Alternatives. RabbitMQ
● Pros
   ○ Simpler to start
   ○ More features
      ■ Ability to query/filter
      ■ Federated queues
      ■ Sophisticated routing
   ○ Plugins
● Cons
   ○ Scales mostly vertically
   ○ Assumes consumers are mostly online
   ○ Delivery guarantees are less rich
Kafka. Strengths and Weaknesses
1. Strengths
   a. Horizontal scalability
   b. Rich delivery guarantee models
   c. Disk persistence
2. Weaknesses
   a. Need for ZooKeeper
   b. Lack of any kind of backpressure
   c. Lack of useful features other queues have
   d. Lack of any kind of DLQ
   e. Limited number of extension points
   f. Complex internal protocols
   g. Overly smart clients