Distributed messaging with Apache Kafka
Saumitra Srivastav (@_saumitra_)
http://www.meetup.com/Bangalore-Apache-Kafka-Group/
Introduction
Kafka is a:
• distributed
• replicated
• persistent
• partitioned
• high-throughput
• pub-sub
messaging system.
Originally developed at LinkedIn and later open-sourced as an Apache project. Written in Scala.
Demo Application
Twitter stream analytics
[Diagram: demo architecture. Components shown: Twitter Streaming API; StreamProducer; Kafka Cluster (Broker-1, Broker-2, Broker-3); Solr-1 and Solr-2 (realtime search); Cassandra-1 and Cassandra-2 (data store for longer retention); Sentiment Analysis.]
Terminology
Topics: categories in which a feed of messages is maintained.
Producers: processes that publish messages to a Kafka topic.
Consumers: processes that subscribe to topics and process the feed of published messages.
Brokers: servers that form a Kafka cluster and act as the data transport channel between producers and consumers.
[Diagram: producers, a Kafka cluster of three brokers, and consumers.]
Simplified View of a Kafka System
[Diagram: components shown: Zookeeper; Broker 1, Broker 2, Broker 3; Producer 1, Producer 2; Consumer 1, Consumer 2, Consumer 3.]
Topics and Partitions
TOPIC – 1 (error log)
TOPIC – 2 (security log)
Partitions
• Each partition is an ordered, immutable sequence of messages.
• Messages are continuously appended to it.
• Each message in a partition is assigned a unique, sequential ID number called the offset.
• Any message in a partition can be accessed using this offset.
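The points above can be illustrated with a minimal in-memory sketch. This is not Kafka's implementation; the `Partition` class and its method names are hypothetical, chosen only to show how an append-only log hands out sequential offsets and serves reads by offset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Hypothetical in-memory stand-in for a single partition."""
    _log: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message to the end of the log and return its offset."""
        offset = len(self._log)  # offsets are sequential: 0, 1, 2, ...
        self._log.append(message)
        return offset

    def read(self, offset: int) -> bytes:
        """Return the message stored at the given offset."""
        if not 0 <= offset < len(self._log):
            raise IndexError(f"offset {offset} is out of range")
        return self._log[offset]

    def read_from(self, offset: int, max_messages: int) -> List[bytes]:
        """Return up to max_messages messages, in log order, starting at offset."""
        return self._log[offset:offset + max_messages]
```

Existing entries are never modified here; the only write path is `append`, which mirrors the "ordered, immutable sequence" described on the slide.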
Partitions
• Partitions serve two purposes:
1. Scaling
2. Parallelism
• Scaling: a topic can be divided into multiple partitions, and each partition can be on a different server.
• Parallelism: a consumer can consume from multiple partitions at the same time (while maintaining the ordering guarantee within each partition).
Distribution & Replication
• The partitions of the log are distributed over the servers in the Kafka cluster.
• Each server handles data and requests for some number of partitions.
• Each partition is replicated for fault tolerance.
• Each partition has one server which acts as the leader.
• The leader handles all read and write requests for the partition.
• Followers keep replicating the leader.
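The leader/follower arrangement above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `ReplicatedPartition` class is hypothetical, followers catch up synchronously on every write, and a surviving replica is promoted by simple ordering. A real cluster replicates asynchronously and elects leaders through its own coordination mechanism.

```python
class ReplicatedPartition:
    """Hypothetical sketch: one partition replicated across several brokers."""

    def __init__(self, replica_brokers):
        # One log per broker that holds a replica; the first broker starts as leader.
        self.replicas = {broker: [] for broker in replica_brokers}
        self.leader = replica_brokers[0]

    def write(self, message):
        """All writes go to the leader; followers then copy what they are missing."""
        leader_log = self.replicas[self.leader]
        leader_log.append(message)
        for broker, log in self.replicas.items():
            if broker != self.leader:
                log.extend(leader_log[len(log):])  # follower catches up
        return len(leader_log) - 1

    def read(self, offset):
        """All reads are served by the current leader."""
        return self.replicas[self.leader][offset]

    def fail(self, broker):
        """Remove a failed broker and promote a follower if the leader was lost."""
        del self.replicas[broker]
        if not self.replicas:
            raise RuntimeError("all replicas lost")
        if broker == self.leader:
            self.leader = next(iter(self.replicas))
```

With three replicas, two calls to `fail` still leave one complete copy of the log, which is the situation the fault-tolerance guarantee later in the deck describes.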
Producers
• Producers publish data to the topics of their choice.
• A producer can choose the partition of the topic to which each message is assigned.
• The partition can be selected in a round-robin manner for load balancing.
• Kafka doesn’t care about the serialization format. All it needs is a byte array.
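As a rough illustration of partition selection, the sketch below picks a partition round-robin when no key is given and by hashing the key otherwise. The `SketchProducer` class is hypothetical and is not the Kafka producer API, and `zlib.crc32` is only a stand-in for whatever hash a real client uses. Values must already be bytes, reflecting the point that serialization is left to the application.

```python
import itertools
import zlib
from typing import List, Optional


class SketchProducer:
    """Hypothetical producer that writes into in-memory partitions."""

    def __init__(self, num_partitions: int):
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]
        self._round_robin = itertools.cycle(range(num_partitions))

    def send(self, value: bytes, key: Optional[bytes] = None) -> int:
        """Store value in a partition and return the partition index used."""
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; serialize it before sending")
        if key is None:
            partition = next(self._round_robin)  # spread load evenly
        else:
            # Same key always maps to the same partition (assumed hash: CRC32).
            partition = zlib.crc32(key) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition
```

An application would serialize first, for example `producer.send(json.dumps({"text": "hello"}).encode("utf-8"))`, and could pass a key such as a user ID when all messages for that user should stay in one partition and therefore in order.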
Consumers
• Other messaging systems basically follow two models:
  • Queuing
  • Publish-subscribe
• Kafka uses the concept of a consumer group, which generalizes both of these models.
• Consumers label themselves with a consumer group name.
• Each message published to a topic is delivered to one consumer instance within each subscribing consumer group.
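The delivery rule above can be sketched with two small functions. The names `assign_partitions` and `deliver` are hypothetical, and the modulo-based assignment is an assumption for illustration; a real cluster uses its own assignment strategy. The sketch shows that within a group each partition has exactly one owner (queue behaviour), while every group receives its own copy (publish-subscribe behaviour).

```python
def assign_partitions(num_partitions, groups):
    """Give each partition to exactly one consumer in every group.

    groups maps a group name to a list of consumer names.
    Returns {group: {consumer: [partition, ...]}}.
    """
    assignment = {}
    for group, consumers in groups.items():
        owned = {consumer: [] for consumer in consumers}
        for partition in range(num_partitions):
            # Assumed strategy: spread partitions over consumers by modulo.
            owned[consumers[partition % len(consumers)]].append(partition)
        assignment[group] = owned
    return assignment


def deliver(partition, assignment):
    """Return, per group, the single consumer that receives this partition's messages."""
    return {
        group: next(c for c, parts in owned.items() if partition in parts)
        for group, owned in assignment.items()
    }
```

With groups `{"A": ["c1", "c2"], "B": ["c3"]}` and four partitions, a message in partition 1 goes to `c2` in group A and to `c3` in group B. Putting every consumer in one group gives a queue; giving each consumer its own group gives plain publish-subscribe.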
Consumer Groups
[Diagram: components shown: Zookeeper; Broker 1, Broker 2, Broker 3; Producer 1, Producer 2; Consumer 1, Consumer 2, and Consumer 3 divided between Consumer-Group A and Consumer-Group B.]
Consumer groups
[Diagram: Topic-1 is spread across Broker 1, Broker 2, and Broker 3, with partitions labeled P0, P2, P3, P4, and P5. Also shown: Zookeeper; Producer 1, Producer 2; Consumer 1, Consumer 2, and Consumer 3 divided between Consumer-Group A and Consumer-Group B.]
Message Persistence
• Unlike in other messaging systems, messages are not deleted on consumption.
• Messages are retained for a configurable period of time, after which they are deleted (even if they have NOT been consumed).
• Consumers can re-consume any chunk of older messages using the message offset.
• Kafka's performance is effectively constant with respect to data size, so retaining a large amount of data is not an issue.
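A small sketch of time-based retention, under stated assumptions: the `RetainedLog` class is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and messages are expired one by one. A real broker manages retention on its on-disk log files rather than per message.

```python
class RetainedLog:
    """Hypothetical log that keeps messages for a fixed retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # (offset, timestamp, message)
        self._next_offset = 0

    def append(self, message, now):
        """Store a message with its arrival time and return its offset."""
        offset = self._next_offset
        self._entries.append((offset, now, message))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention period, consumed or not."""
        self._entries = [
            entry for entry in self._entries
            if now - entry[1] < self.retention_seconds
        ]

    def read_from(self, offset):
        """Reading never deletes anything, so the same range can be read again."""
        return [(o, m) for o, _, m in self._entries if o >= offset]
```

Calling `read_from(0)` twice returns the same messages both times; only `expire` removes data, and it does so purely by age. Offsets of surviving messages do not change after older ones are deleted.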
Demo: Running a multi-broker Kafka cluster
Guarantees
1. Ordering guarantee
• Messages sent by a producer to a particular topic partition will be appended in the order they are sent.
• A consumer instance sees messages in the order they are stored in the log.
2. At-least-once delivery
3. Fault tolerance
For a topic with replication factor N, up to N-1 server failures will not cause the loss of any message committed to the log.
4. No corruption of data:
• over the network
• on the disk
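At-least-once delivery typically follows from the order in which a consumer processes a message and records its position. The sketch below is an illustration under assumptions, not the Kafka consumer API: the `consume` function and its `crash_at` parameter are hypothetical, and the log is a plain list. The offset is committed only after processing, so a crash between the two steps causes the message to be processed again on restart.

```python
def consume(log, committed, process, crash_at=None):
    """Process messages starting at the committed offset.

    Returns the new committed offset. If crash_at is set, a crash is
    simulated right after processing that offset but before committing it.
    """
    while committed < len(log):
        process(log[committed])      # side effect happens first
        if committed == crash_at:
            return committed         # simulated crash: offset NOT advanced
        committed += 1               # commit only after successful processing
    return committed
```

For a log `["m0", "m1", "m2"]`, a first run with `crash_at=1` processes `m0` and `m1` and returns a committed offset of 1. Restarting from offset 1 processes `m1` again and then `m2`. No message is lost, but `m1` is seen twice, so processing should be idempotent or tolerate duplicates.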
Demo: Consumer/Producer Java API
Misc Design features
1. Stateless broker
• Each consumer maintains its own state (offset)
2. Load balancing
3. Asynchronous send
4. Push/pull model instead of Push/Push
5. Consumer Position
6. Offline Data Load
7. Simple API
8. Low Overhead
9. Batch send and receive
10. No message caching in JVM
11. Rely on file system buffering
• mostly sequential access patterns
12. Zero-copy transfer: file->socket
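The file-to-socket transfer in item 12 can be demonstrated in Python with `socket.sendfile`, which uses the operating system's `sendfile` call where available and otherwise falls back to ordinary sends. This is only an analogue of the idea, not Kafka code; the function name `send_log_segment` and the use of a temporary file and a local socket pair are assumptions made so the example is self-contained. The payload is kept small so it fits in the socket buffer.

```python
import socket
import tempfile


def send_log_segment(payload: bytes):
    """Write payload to a file, then ship the file to a socket via sendfile.

    Returns (bytes_sent, bytes_received_on_the_other_end).
    """
    with tempfile.TemporaryFile() as segment:
        segment.write(payload)
        segment.flush()
        segment.seek(0)

        broker_side, consumer_side = socket.socketpair()
        with broker_side, consumer_side:
            # The file contents go to the socket without being read
            # into this process's own buffers when sendfile is available.
            sent = broker_side.sendfile(segment)
            broker_side.shutdown(socket.SHUT_WR)  # signal end of data

            chunks = []
            while True:
                chunk = consumer_side.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return sent, b"".join(chunks)
```

The application code never loops over the file's bytes itself; it hands the file object to the socket, which is the property that makes this pattern cheap for serving log data that is already on disk or in the page cache.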
Use Cases
1. Messaging
2. Website Activity Tracking
3. Metrics
4. Log Aggregation
5. Stream Processing
Thanks
Website: http://kafka.apache.org/
Doc: http://kafka.apache.org/documentation.html
Mailing Lists: [email protected]
Questions?