Posted on 05-Apr-2017
1
Data Pipelines Made Simple With Apache Kafka
Ewen Cheslack-Postava
Engineer, Apache Kafka Committer
2
Attend the whole series!
• What’s New in Apache Kafka 0.10.2 and Confluent 3.2
  Date: Thursday, March 9, 2017 | Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
  Speaker: Clarke Patterson, Senior Director, Product Marketing
• Monitoring and Alerting Apache Kafka with Confluent Control Center
  Date: Thursday, March 16, 2017 | Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
  Speaker: Nick Dearden, Director, Engineering and Product
• Data Pipelines Made Simple with Apache Kafka
  Date: Thursday, March 23, 2017 | Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
  Speaker: Ewen Cheslack-Postava, Engineer, Confluent
• Using Apache Kafka to Analyze Session Windows
  Date: Thursday, March 30, 2017 | Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
  Speaker: Michael Noll, Product Manager, Confluent
• Simplify Governance for Streaming Data in Apache Kafka
  Date: Thursday, April 6, 2017 | Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
  Speaker: Gwen Shapira, Product Manager, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
3
The Challenge: Streaming Data Pipelines
4
Simplifying Streaming Data Pipelines with Apache Kafka
5
Kafka Connect
6
Streaming ETL
7
Single Message Transforms for Kafka Connect
Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns

Modify events going out of Kafka:
• Route high-priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match the destination
• Remove unnecessary columns
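Conceptually, each transform on this slide takes one record and returns one modified record, and Connect applies the configured transforms in order. The stdlib-only Java sketch below models that idea with a hypothetical `SimpleRecord` (a topic plus a schemaless map value) and a hypothetical `TransformChain`. It does not use the real `org.apache.kafka.connect.transforms.Transformation` interface, and the three example steps (masking, dropping a column, priority routing) are illustrative helpers, not built-in transforms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for a Connect record: a topic plus a schemaless value.
final class SimpleRecord {
    final String topic;
    final Map<String, Object> value;

    SimpleRecord(String topic, Map<String, Object> value) {
        this.topic = topic;
        this.value = value;
    }
}

// Hypothetical chain: applies one-record-in, one-record-out steps in the configured order.
final class TransformChain {
    private final List<UnaryOperator<SimpleRecord>> steps = new ArrayList<>();

    TransformChain add(UnaryOperator<SimpleRecord> step) {
        steps.add(step);
        return this;
    }

    SimpleRecord apply(SimpleRecord record) {
        SimpleRecord current = record;
        for (UnaryOperator<SimpleRecord> step : steps) {
            current = step.apply(current); // each step sees only this one record
        }
        return current;
    }

    // Mask sensitive information: replace the field with an empty string (assumes a String field).
    static UnaryOperator<SimpleRecord> maskField(String field) {
        return r -> {
            Map<String, Object> copy = new LinkedHashMap<>(r.value);
            if (copy.containsKey(field)) {
                copy.put(field, "");
            }
            return new SimpleRecord(r.topic, copy);
        };
    }

    // Remove an unnecessary column.
    static UnaryOperator<SimpleRecord> dropField(String field) {
        return r -> {
            Map<String, Object> copy = new LinkedHashMap<>(r.value);
            copy.remove(field);
            return new SimpleRecord(r.topic, copy);
        };
    }

    // Route matching events (e.g. high priority) to a different topic; others keep their topic.
    static UnaryOperator<SimpleRecord> routeIf(String field, Object expected, String newTopic) {
        return r -> Objects.equals(r.value.get(field), expected)
                ? new SimpleRecord(newTopic, r.value)
                : r;
    }
}
```

The masking and dropping steps copy the value map instead of mutating it, so the input record is left untouched. No step needs any information beyond the record it is given, which is the "single message" property the rest of this talk relies on.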
8
Where Single Message Transforms Fit In
9
Built-in Transformations
• InsertField – Add a field using either static data or record metadata
• ReplaceField – Filter or rename fields
• MaskField – Replace a field with the valid null value for its type (0, empty string, etc.)
• ValueToKey – Set the key to one or more of the value’s fields
• HoistField – Wrap the entire event as a single field inside a Struct or a Map
• ExtractField – Extract a specific field from a Struct or Map and include only this field in the result
• SetSchemaMetadata – Modify the schema name or version
• TimestampRouter – Modify the topic of a record based on the original topic and timestamp; useful when a sink needs to write to different tables or indexes based on timestamps
• RegexpRouter – Modify the topic of a record based on the original topic, a replacement string, and a regular expression
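The masking and routing descriptions above are precise enough to sketch. The following stdlib-only Java models describe assumed behavior; they are not the Kafka classes. `maskedValue` covers only a few Java types. `timestampRoute` substitutes `${topic}` and `${timestamp}` into a format string, with `${topic}-${timestamp}` and `yyyyMMdd` assumed here to match Kafka's defaults and formatting assumed to be in UTC. `regexpRoute` rewrites the topic only when the whole topic matches the regular expression.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stdlib-only models of a few built-in transforms (assumed behavior, not the Kafka classes).
final class BuiltinSketches {

    // MaskField-style: the "valid null value" for a type; this sketch covers a few Java types only.
    static Object maskedValue(Object original) {
        if (original instanceof String) return "";
        if (original instanceof Integer) return 0;
        if (original instanceof Long) return 0L;
        if (original instanceof Double) return 0.0;
        if (original instanceof Boolean) return Boolean.FALSE;
        throw new IllegalArgumentException("type not covered by this sketch: " + original);
    }

    // TimestampRouter-style: fill ${topic} and ${timestamp} into a topic format (UTC assumed).
    static String timestampRoute(String topic, long timestampMs, String topicFormat, String timestampFormat) {
        SimpleDateFormat fmt = new SimpleDateFormat(timestampFormat);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        String ts = fmt.format(new Date(timestampMs));
        return topicFormat.replace("${topic}", topic).replace("${timestamp}", ts);
    }

    // RegexpRouter-style: if the whole topic matches, rewrite it with the replacement; else keep it.
    static String regexpRoute(String topic, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(topic);
        return m.matches() ? m.replaceFirst(replacement) : topic;
    }
}
```

For example, `timestampRoute("orders", 1488326400000L, "${topic}-${timestamp}", "yyyyMMdd")` returns `orders-20170301`. A sink such as Elasticsearch could then write each day's records to its own index.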
10
Configuring Single Message Transforms
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
transforms=MakeMap,InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source
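This sketch assumes the file source emits each line of `test.txt` as a plain String value, and that the two transforms in this configuration run in the order listed in `transforms`. The stdlib-only Java below models their effect on a schemaless value. The helper names (`hoist`, `insertStatic`, `transformLine`) are hypothetical. The real `HoistField$Value` and `InsertField$Value` classes also handle schemas and Structs, which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only model of the MakeMap and InsertSource transforms on a schemaless line value.
final class FileSourcePipelineSketch {

    // Like HoistField$Value with field=line: wrap the whole value as a single named field.
    static Map<String, Object> hoist(Object value, String field) {
        Map<String, Object> wrapped = new LinkedHashMap<>();
        wrapped.put(field, value);
        return wrapped;
    }

    // Like InsertField$Value with static.field / static.value: add a constant field.
    static Map<String, Object> insertStatic(Map<String, Object> value, String field, Object constant) {
        Map<String, Object> updated = new LinkedHashMap<>(value);
        updated.put(field, constant);
        return updated;
    }

    // transforms=MakeMap,InsertSource: applied in the listed order.
    static Map<String, Object> transformLine(String line) {
        Map<String, Object> made = hoist(line, "line");                // MakeMap
        return insertStatic(made, "data_source", "test-file-source");  // InsertSource
    }
}
```

For a line `foo`, the sketch produces a map with `line=foo` and `data_source=test-file-source`. With a JSON converter that has schemas disabled, the stored value should look roughly like `{"line":"foo","data_source":"test-file-source"}`.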
11
Why only single messages?
• Delivery guarantees!
  • Always provide at-least-once semantics
  • For supported connectors, provide exactly-once semantics
• No additional complication: transformations happen inline with import/export
12
When should I use each tool?
Kafka Connect & Single Message Transforms
• Simple, message at a time
• Transformation can be performed inline
• Transformation does not interact with external systems

Kafka Streams
• Complex transformations, including:
  • Aggregations
  • Windowing
  • Joins
• Transformed data stored back in Kafka, enabling reuse
• Write, deploy, and monitor a Java application
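The dividing line in this comparison is state. The stdlib-only Java sketch below uses plain collections, not the Kafka Streams API, and all names are hypothetical. `mapEach` is the kind of message-at-a-time logic a single message transform can express. `runningCounts` emits a per-key count whose value for a record depends on earlier records, which is why aggregations, windowing, and joins belong in Kafka Streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java contrast between stateless and stateful processing (not the Kafka Streams API).
final class StatelessVsStateful {

    // Message at a time: each output depends only on its own input, like a single message transform.
    static List<String> mapEach(List<String> events, Function<String, String> fn) {
        return events.stream().map(fn).collect(Collectors.toList());
    }

    // Aggregation: each output depends on earlier records with the same key,
    // so the processor must keep state across messages.
    static List<Long> runningCounts(List<String> keys) {
        Map<String, Long> state = new HashMap<>();
        List<Long> out = new ArrayList<>();
        for (String key : keys) {
            out.add(state.merge(key, 1L, Long::sum)); // merge returns the updated count
        }
        return out;
    }
}
```

For keys `u1, u2, u1`, `runningCounts` returns `[1, 1, 2]`. The same input key yields different outputs, which a stateless per-record transform cannot produce.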
13
Conclusion
Single Message Transforms in Kafka Connect
• Lightweight transformation of individual messages
• Configuration-only data pipelines
• Pluggable, with lots of built-in transformations
15
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality assured
More extensible developer experience
Easy upgrade path to Confluent Enterprise
16
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off at www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28