Post on 08-Jan-2017
New Kafka Input Operator based on Kafka 0.9 Consumer API
Siyuan Hua, DataTorrent, Committer, Apache Apex
March 28th, 2016
0.8 vs 0.9 Overview
Apache Apex Meetup
Feature | 0.8 (Simple Consumer) | 0.9
LoC | 5900 | 2406
Fault-Tolerant | Yes (at least once, exactly once) | Yes (at least once, exactly once)
Scalability | Scale with Kafka (static and dynamic) | Scale with Kafka (static and dynamic)
Multi-Cluster/Topic | Yes | Yes
Throughput throttle | Yes | Yes
Idempotent | Yes | Yes
0.8 vs 0.9 Overview
Feature | 0.8 (Simple Consumer) | 0.9
Offset Management | Customized management | Implicit but out-of-the-box management
Partition Strategy | 1:1, 1:M, Dynamic (unstable), Customized | 1:1, 1:M, Customized
Dependency | Both public and internal API | Public API
Metrics | Old Counters API | New Apex @AutoMetric
Maturity | Used in production | Not mature yet
0.8 Consumer problem
● 0.8 has two consumers: the Simple Consumer and the High-level Consumer
● The High-level Consumer could not meet our requirements
● The High-level Consumer doesn’t support a customized assignor
● The Simple Consumer is not easy to use
● The Simple Consumer has no knowledge about the brokers
● The Simple Consumer doesn’t automatically handle metadata changes on the broker side
0.9 Consumer API
● 0.9 replaces the Simple Consumer and the High-level Consumer with a subscribe API and an assign API, combined in a single consumer class
● The assign API is a good replacement for the Simple Consumer in the new Kafka Input Operator
● You can explicitly assign partitions to each operator instance
● Once a consumer is assigned, it handles partition metadata changes internally
● Having both APIs mixed in one class can be confusing
Workflow
Partition Strategy
[Figures: 1 to 1 Partition; 1 to N Partition]
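The two strategies named on this slide can be sketched in plain Java. This is a minimal sketch, not the operator's code: `PartitionStrategySketch`, `oneToOne` and `oneToMany` are hypothetical names, Kafka partitions are reduced to integer ids of a single topic, and the round-robin distribution in the 1:N case is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the operator: Kafka partitions are
// represented by their integer ids for a single topic.
final class PartitionStrategySketch {

  // 1:1: one operator instance per Kafka partition.
  static List<Set<Integer>> oneToOne(List<Integer> partitionIds) {
    List<Set<Integer>> assignment = new ArrayList<>();
    for (Integer id : partitionIds) {
      Set<Integer> single = new LinkedHashSet<>();
      single.add(id);
      assignment.add(single);
    }
    return assignment;
  }

  // 1:N: a fixed number of operator instances share all Kafka partitions.
  // Round-robin distribution is an assumption of this sketch.
  static List<Set<Integer>> oneToMany(List<Integer> partitionIds, int operatorCount) {
    if (operatorCount <= 0) {
      throw new IllegalArgumentException("operatorCount must be positive");
    }
    // Never create more operator instances than there are partitions.
    int instances = Math.min(operatorCount, partitionIds.size());
    List<Set<Integer>> assignment = new ArrayList<>();
    for (int i = 0; i < instances; i++) {
      assignment.add(new LinkedHashSet<>());
    }
    for (int i = 0; i < partitionIds.size(); i++) {
      assignment.get(i % instances).add(partitionIds.get(i));
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<Integer> ids = List.of(0, 1, 2, 3, 4);
    System.out.println("1:1 -> " + oneToOne(ids));
    System.out.println("1:N -> " + oneToMany(ids, 2));
  }
}
```

With five partitions and two operator instances, `oneToMany` yields `[[0, 2, 4], [1, 3]]`; each inner set is the sticky assignment of one operator instance.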
Partition Strategy
public abstract class AbstractKafkaPartitioner {
  ...
  abstract List<Set<PartitionMeta>> assign(Map<String, Map<String, List<PartitionInfo>>> metadata)
  ...
  void partitioned(Map<Integer, Partition<AbstractKafkaInputOperator>> map)
  ...
  Response processStats(BatchedOperatorStats batchedOperatorStats)
}
Customized Partition Strategy
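To illustrate what a customized strategy might look like, here is a minimal, dependency-free sketch that mirrors only the shape of the `assign` method. Everything in it is an assumption made for illustration: `SketchPartitioner` and `OneOperatorPerTopic` are hypothetical classes, `PartitionMeta` is replaced by a plain `"cluster:topic:partitionId"` string, `PartitionInfo` by an integer id, and the nested map is assumed to mean cluster -> topic -> partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the partitioner contract. Each returned set is
// the group of Kafka partitions consumed by one operator instance.
abstract class SketchPartitioner {
  // Assumed meaning of the nested map: cluster -> topic -> partition ids.
  abstract List<Set<String>> assign(Map<String, Map<String, List<Integer>>> metadata);
}

// Custom strategy for illustration: one operator instance per topic.
final class OneOperatorPerTopic extends SketchPartitioner {
  @Override
  List<Set<String>> assign(Map<String, Map<String, List<Integer>>> metadata) {
    List<Set<String>> result = new ArrayList<>();
    for (Map.Entry<String, Map<String, List<Integer>>> cluster : metadata.entrySet()) {
      for (Map.Entry<String, List<Integer>> topic : cluster.getValue().entrySet()) {
        Set<String> group = new LinkedHashSet<>();
        for (Integer id : topic.getValue()) {
          // Plain string used in place of the operator's PartitionMeta type.
          group.add(cluster.getKey() + ":" + topic.getKey() + ":" + id);
        }
        if (!group.isEmpty()) {
          result.add(group);
        }
      }
    }
    return result;
  }
}
```

The design point is the one the slides make: the strategy only decides which partitions belong together, and the operator instances then consume exactly the set they were given.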
Partition Strategy
● Sticky partitions: each operator instance consumes only from the Kafka partitions assigned to it by the Application Master (AM)
● We cannot use the subscribe API with a customized assignor
● In the 0.9 operator, partition logic is decoupled from operator code
Offset management for 0.9 operator
Offset checkpointing:
● W = last offset in window i
● Offsets are checkpointed together with their window id
● The operator can resume from the checkpointed offsets of any window
[Figure: offset timeline showing W for windows i, j and k, the current offset, and the downstream operator window]
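The checkpointing idea on this slide can be modelled in a few lines of plain Java. This is a sketch under stated assumptions, not the operator's implementation: `WindowOffsetTracker` and its methods are hypothetical, partitions are plain integer ids, and falling back to the closest earlier window when the requested one has no snapshot is a choice made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-window offset checkpointing (not operator code).
final class WindowOffsetTracker {
  // Last consumed offset per Kafka partition id in the running window.
  private final Map<Integer, Long> current = new HashMap<>();
  // Snapshot of 'current' at the end of each window, keyed by window id.
  private final TreeMap<Long, Map<Integer, Long>> byWindow = new TreeMap<>();

  // Record that the message at 'offset' was consumed from 'partition'.
  void consumed(int partition, long offset) {
    current.put(partition, offset);
  }

  // Remember W, the last offset per partition, for the window that just ended.
  void endWindow(long windowId) {
    byWindow.put(windowId, new HashMap<>(current));
  }

  // Offsets recorded for 'windowId', or for the closest earlier window.
  // A consumer would seek to each returned offset + 1 to resume.
  Map<Integer, Long> resumeFrom(long windowId) {
    Map.Entry<Long, Map<Integer, Long>> entry = byWindow.floorEntry(windowId);
    if (entry == null) {
      return new HashMap<>();
    }
    return new HashMap<>(entry.getValue());
  }

  public static void main(String[] args) {
    WindowOffsetTracker tracker = new WindowOffsetTracker();
    tracker.consumed(0, 10L);
    tracker.endWindow(1L);
    tracker.consumed(0, 20L);
    tracker.endWindow(2L);
    System.out.println("resume after window 1: " + tracker.resumeFrom(1L));
  }
}
```

Because each snapshot is keyed by window id, replaying a window after a failure reads the same offset range again, which is what makes idempotent processing possible.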
Offset management for 0.9 operator
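Offset committed:
● W = last offset in window i
● Commit window i: the offset W of that window is committed
● The offset is saved in Kafka
● The offset topic contains the app name
[Figure: offset timeline showing W for window i, the current offset, and "Commit Window i"]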
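The commit step can be sketched as follows. This is an illustrative model only: `CommitSketch` is a hypothetical class, the `storedInKafka` map stands in for Kafka's offset storage, and whether the operator stores W or W + 1 is an assumption here. The sketch follows the Kafka consumer convention that a committed offset is the position of the next message to consume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of committing offsets when a window is committed.
final class CommitSketch {
  // Last consumed offset per partition, recorded at the end of each window.
  private final TreeMap<Long, Map<Integer, Long>> lastOffsetByWindow = new TreeMap<>();
  // Stand-in for offsets stored in Kafka (partition -> next offset to read).
  private final Map<Integer, Long> storedInKafka = new HashMap<>();

  void endWindow(long windowId, Map<Integer, Long> lastOffsets) {
    lastOffsetByWindow.put(windowId, new HashMap<>(lastOffsets));
  }

  // Called when the platform reports window 'windowId' as committed.
  void committed(long windowId) {
    Map<Integer, Long> offsets = lastOffsetByWindow.get(windowId);
    if (offsets != null) {
      for (Map.Entry<Integer, Long> e : offsets.entrySet()) {
        // Kafka convention: the committed offset is the next one to consume.
        storedInKafka.put(e.getKey(), e.getValue() + 1);
      }
    }
    // Snapshots up to and including this window are no longer needed.
    lastOffsetByWindow.headMap(windowId, true).clear();
  }

  Map<Integer, Long> storedOffsets() {
    return new HashMap<>(storedInKafka);
  }

  int pendingWindows() {
    return lastOffsetByWindow.size();
  }
}
```

Note that the stored offset trails the current read position: only windows that the platform has already committed are reflected in Kafka.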
Some important configuration (0.9)
● initialOffset
● topics
● clusters
● strategy
● maxTuplesPerWindow
● initialPartitionCount
● consumerProps
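Of the properties listed, `maxTuplesPerWindow` is the throughput throttle. A minimal sketch of how such a cap could behave is shown below; `WindowThrottle` and its methods are hypothetical, and the semantics (at most N tuples emitted per window, the remainder held back for later windows) are assumed from the property name rather than taken from the operator's source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical per-window throttle; not the operator's implementation.
final class WindowThrottle<T> {
  private final int maxTuplesPerWindow;
  private final Queue<T> pending = new ArrayDeque<>();
  private int emittedInWindow;

  WindowThrottle(int maxTuplesPerWindow) {
    if (maxTuplesPerWindow <= 0) {
      throw new IllegalArgumentException("maxTuplesPerWindow must be positive");
    }
    this.maxTuplesPerWindow = maxTuplesPerWindow;
  }

  // Buffer a tuple fetched from Kafka.
  void offer(T tuple) {
    pending.add(tuple);
  }

  // Reset the per-window budget.
  void beginWindow() {
    emittedInWindow = 0;
  }

  // Emit buffered tuples until the window budget is used up.
  List<T> emitTuples() {
    List<T> out = new ArrayList<>();
    while (emittedInWindow < maxTuplesPerWindow && !pending.isEmpty()) {
      out.add(pending.poll());
      emittedInWindow++;
    }
    return out;
  }

  public static void main(String[] args) {
    WindowThrottle<String> throttle = new WindowThrottle<>(2);
    throttle.offer("a");
    throttle.offer("b");
    throttle.offer("c");
    throttle.beginWindow();
    System.out.println("window 1: " + throttle.emitTuples());
    throttle.beginWindow();
    System.out.println("window 2: " + throttle.emitTuples());
  }
}
```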
MapR Streams support
● MapR Streams is compatible with the Kafka 0.9 client API
● The 0.9 Input Operator has been tested with the MapR sandbox, and all major features work without any code change
● Use the MapR Streams client library instead of the Kafka one
● Leave the “clusters” property empty, because MapR doesn’t require broker host name settings
● The special character “/” is supported in topic names, because a MapR Streams topic name is just the path to the topic file
● Multi-cluster is not supported
Demo 1
Demo 2
Performance: Kafka Input Operator
● 4 Kafka Brokers - 8 partitions
● 1 Zookeeper
● Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
● 256GB RAM
● 10 GigE between nodes
Load Generator
Q & A
Follow Apex meetups: http://apex.incubator.apache.org/announcements.html
Learn more about Apex: http://apex.incubator.apache.org/docs.html
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
ᵒ https://www.datatorrent.com/product/startup-accelerator/
We Are Hiring
• jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders