kafka-steaming-data, uploaded by bryan-jacobs
Kafka: Streaming Data Platform
Traditional Messaging System
• Queue
• Topic
• Messages are removed once consumed
• Out-of-order delivery is possible
What is Kafka?
• Messaging system
• Polyglot consumers / producers
• Topics and partitions
• Scalable
• Configurable message retention
• Guaranteed order (within a partition)
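The contrast with a traditional queue can be sketched in a few lines. This is a toy in-memory model, not the Kafka client API: the class name PartitionedLog, the crc32-based key hashing, and the sample keys are all assumptions chosen for illustration.

```python
from collections import deque
from zlib import crc32


class PartitionedLog:
    """Toy stand-in for a Kafka topic (hypothetical, not the real API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # The same key always maps to the same partition,
        # so ordering is preserved per key within that partition.
        partition = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # Reading removes nothing; each consumer tracks its own offset.
        return list(self.partitions[partition][offset:])


# Traditional queue: a message is gone once it has been consumed.
queue = deque(["login", "click"])
queue.popleft()

# Log: two independent consumers both see the full, ordered history.
log = PartitionedLog(num_partitions=3)
partition, first_offset = log.append("user-1", "login")
log.append("user-1", "click")
consumer_a = log.read(partition, first_offset)
consumer_b = log.read(partition, first_offset)
```

Because reads do not delete data, how long messages stay available is a retention setting rather than a side effect of consumption.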
Topic
Use Cases
• Ordered messaging
• Log aggregation
• Metrics
• Web activity tracking
• Stream processing
Kafka Brokers – Clusters and Replication
• Topics can be replicated
• Data is stored across multiple nodes
• Each broker in a cluster needs a unique broker.id (e.g. broker.id=0 for the first broker)
• Zookeeper tracks offsets, topic names, and partitions
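A rough sketch of how replicas of each partition might be spread across brokers. The function assign_replicas and its simple round-robin placement are assumptions for illustration; Kafka's actual placement logic is more involved.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Round-robin replica placement (illustrative, not Kafka's algorithm)."""
    if len(set(broker_ids)) != len(broker_ids):
        raise ValueError("each broker.id must be unique within the cluster")
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the broker count")
    assignment = {}
    for partition in range(num_partitions):
        # Place each copy of the partition on a different broker.
        assignment[partition] = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return assignment


# Three brokers with ids 0, 1, 2; each partition is stored on two of them.
layout = assign_replicas([0, 1, 2], num_partitions=3, replication_factor=2)
```

With this layout every partition survives the loss of any single broker, which is the point of replicating topics across nodes.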
Demo – Local Kafka
• Start Zookeeper: bin/zookeeper-server-start.sh config/zookeeper.properties
• Start Kafka: bin/kafka-server-start.sh config/server.properties
Demo – Command Line Tools
• bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
• bin/kafka-topics.sh --list --zookeeper localhost:2181
• bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Example Producer
• <CODE>
Example Consumer
• <CODE>
Deployment Options
• Standalone deployment
• Confluent.io
• Hortonworks
• AWS
Hortonworks Data Platform on AWS
Big data in a one-stop shop
Determine Cluster Sizing
• Implement a producer and consumer
• Use your data structures
• 3 Zookeeper nodes and 3 Kafka nodes
• Java heap = 2 GB
• Network saturation (1 gigabit / 10 gigabit)
• Avro data serialization
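The network saturation point can be estimated with simple arithmetic. This is a back-of-the-envelope sketch: the function name and the 1,000-byte message size are assumptions, and protocol overhead, replication traffic, and compression are ignored.

```python
def max_messages_per_second(link_gigabits, message_bytes):
    """Upper bound on message rate before the network link saturates."""
    # 1 gigabit/s = 1,000,000,000 bits/s = 125,000,000 bytes/s
    bytes_per_second = link_gigabits * 1_000_000_000 / 8
    return bytes_per_second / message_bytes


# Hypothetical 1,000-byte serialized messages on each link speed.
one_gigabit = max_messages_per_second(1, message_bytes=1_000)
ten_gigabit = max_messages_per_second(10, message_bytes=1_000)
```

Measuring the real size of your own serialized records (for example, Avro-encoded) and plugging it in gives a ceiling to compare against the throughput observed from a test producer.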
Producer for testing throughput
• <CODE>
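The slide's own producer code is not reproduced in this transcript. As a separate, minimal sketch of the general idea, a timing harness can wrap any send function; here measure_throughput is a hypothetical helper and a plain list append stands in for a real producer's send call.

```python
import time


def measure_throughput(send, payload, count):
    """Time `count` calls to `send(payload)` and report simple rates."""
    start = time.perf_counter()
    for _ in range(count):
        send(payload)
    elapsed = time.perf_counter() - start
    return {
        "messages": count,
        "bytes": len(payload) * count,
        "seconds": elapsed,
        # Guard against a zero reading on very coarse clocks.
        "messages_per_second": count / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in for a producer: collect the payloads in memory.
sent = []
result = measure_throughput(sent.append, b"x" * 100, 1000)
```

In a real test the stand-in would be replaced with the producer's send call and the payload with records built from your own data structures.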
Architectural Possibilities
• Streaming data platform
• Common interface
• High throughput
WARNING
• Kafka 0.8.x has a major bug…deletes data
• Make sure to use 0.9.0.x
Questions & Answers