Kafka & Hadoop
Gwen Shapira / Software Engineer
©2014 Cloudera, Inc. All rights reserved.
About Me
• 15 years of moving data around
• Formerly a consultant
• Now a Cloudera engineer:
  – Flume
  – Sqoop
  – Kafka
There’s a book on that!
We are also blogging
Getting Data from Kafka to Hadoop
There are only bad options.
It's about finding the best one.
Camus
[Diagram: Camus workflow. Components shown: ZooKeeper (topic offsets), Setup, Kafka, processes running tasks that write in-process Avro files, audit counts, clean up, HDFS and other systems. Steps are labeled A through I.]
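The Camus flow can be approximated as an offset-tracked batch job: read the last committed offset for each partition, pull everything up to the current latest offset into an in-process file, count what was written for auditing, and only then publish the files and commit the new offsets. The sketch below models Kafka, HDFS and the offset store as in-memory dicts; all names (`run_camus_like_batch`, `offset_store`, the `_inprocess/` and `final/` paths) are hypothetical, and this is not Camus's actual code.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins: a topic maps partition -> list of messages;
# offset_store plays the role of the committed topic offsets.
Topic = Dict[int, List[str]]


def run_camus_like_batch(topic: Topic,
                         offset_store: Dict[int, int],
                         hdfs: Dict[str, List[str]],
                         run_id: int) -> Dict[int, int]:
    """Pull new messages per partition, write them, then commit offsets.

    Returns per-partition audit counts (messages written in this run).
    """
    staged: Dict[str, List[str]] = {}
    new_offsets: Dict[int, int] = {}
    audit: Dict[int, int] = {}

    # Setup: decide the offset range [start, end) for each partition.
    for partition, log in topic.items():
        start = offset_store.get(partition, 0)
        end = len(log)  # latest offset at the moment the job starts
        # Task: copy the range into an "in-process" file.
        staged[f"_inprocess/run{run_id}/p{partition}"] = log[start:end]
        new_offsets[partition] = end
        audit[partition] = end - start

    # Clean up: only after every task succeeded, publish files, then commit offsets.
    for tmp_path, records in staged.items():
        hdfs[tmp_path.replace("_inprocess/", "final/")] = records
    offset_store.update(new_offsets)
    return audit
```

Because offsets are committed only after the files are published, a failed run simply re-reads the same range next time. That is at-least-once behavior: a crash between publishing and committing would deliver that range twice.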
Missing in Action
• Kafka has no MapReduce layer
  – InputFormat, OutputFormat, Utils…
• Sqoop is a generic batch ingest framework
  – Why no Kafka support?
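To make the missing MapReduce layer concrete: much of a Kafka InputFormat's job would be turning each partition's unread offset range into input splits, one map task per split. The sketch below illustrates only that split computation in plain Python (it does not use the Hadoop API); `KafkaSplit`, `compute_splits` and `max_messages_per_split` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KafkaSplit:
    """Hypothetical input split: offsets [start, end) of one topic partition."""
    topic: str
    partition: int
    start: int
    end: int


def compute_splits(topic: str,
                   ranges: Dict[int, Tuple[int, int]],
                   max_messages_per_split: int) -> List[KafkaSplit]:
    """Cut each partition's [committed, latest) range into bounded splits."""
    if max_messages_per_split <= 0:
        raise ValueError("max_messages_per_split must be positive")
    splits: List[KafkaSplit] = []
    for partition in sorted(ranges):
        start, end = ranges[partition]
        # Walk the range in fixed-size steps; the last split may be shorter.
        for lo in range(start, end, max_messages_per_split):
            hi = min(lo + max_messages_per_split, end)
            splits.append(KafkaSplit(topic, partition, lo, hi))
    return splits
```

A partition with nothing new (start equal to end) yields no splits, so idle partitions cost no map tasks.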
Flume + Kafka = Flafka
How does it work?
Flume Agent
• Sources: Twitter, logs, webserver, Kafka…
• Interceptors: mask, re-format, validate…
• Selectors: DR, critical
• Channels: memory, file
• Sinks: HDFS, HBase, Solr, Kafka
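The agent can be read as a pipeline: a source produces events, interceptors rewrite or drop them, a selector picks the channel, and sinks drain the channels. The toy model below mirrors that flow in plain Python; it is not the Flume API. The interceptors (`mask_ip` for "mask", `require_body` for "validate") and the routing on a `priority` header are hypothetical stand-ins for the interceptor and selector stages.

```python
import re
from typing import Callable, Dict, List, Optional

# A Flume-style event: headers plus a body.
Event = Dict[str, object]
Interceptor = Callable[[Event], Optional[Event]]


def mask_ip(event: Event) -> Event:
    """Hypothetical 'mask' interceptor: hide IPv4 addresses in the body."""
    body = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "x.x.x.x", str(event["body"]))
    return {**event, "body": body}


def require_body(event: Event) -> Optional[Event]:
    """Hypothetical 'validate' interceptor: drop events with a blank body."""
    return event if str(event["body"]).strip() else None


def run_agent(events: List[Event],
              interceptors: List[Interceptor],
              channels: Dict[str, List[Event]]) -> None:
    """Apply interceptors in order, then route each surviving event to a channel."""
    for event in events:
        current: Optional[Event] = event
        for interceptor in interceptors:
            current = interceptor(current)
            if current is None:  # an interceptor dropped the event
                break
        if current is None:
            continue
        # Selector: route on an assumed 'priority' header.
        headers = current.get("headers") or {}
        target = "critical" if headers.get("priority") == "critical" else "default"
        channels[target].append(current)
```

Sinks (HDFS, HBase, Solr, Kafka) would then drain each channel independently, which is what lets critical traffic be handled separately from the rest.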
But I just want to get data from Kafka to HBase / HDFS
Flume Agent: no source, only channels and sinks
• Channels: Kafka Channel
• Sinks: HDFS, HBase, Solr
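With the Kafka Channel the agent needs no source: the channel reads from Kafka and the sink takes events from it in batches. Flume sinks take events inside a channel transaction and commit only after the batch is written; with a Kafka-backed channel, committing roughly corresponds to advancing the consumer offset. The sketch below models that take/commit/rollback cycle with in-memory stand-ins; `KafkaBackedChannel`, `drain_once` and `batch_size` are hypothetical names, not Flume classes.

```python
from typing import Callable, List


class KafkaBackedChannel:
    """Hypothetical channel: one Kafka partition's log plus a committed offset."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.committed = 0   # next offset to deliver after the last commit
        self._pending = 0    # events taken in the open transaction

    def take(self, batch_size: int) -> List[str]:
        start = self.committed + self._pending
        batch = self.log[start:start + batch_size]
        self._pending += len(batch)
        return batch

    def commit(self) -> None:
        self.committed += self._pending
        self._pending = 0

    def rollback(self) -> None:
        # Uncommitted events will be taken again on the next attempt.
        self._pending = 0


def drain_once(channel: KafkaBackedChannel,
               write: Callable[[List[str]], None],
               batch_size: int = 100) -> int:
    """One sink cycle: take a batch, write it, then commit (roll back on error)."""
    batch = channel.take(batch_size)
    try:
        if batch:
            write(batch)
    except Exception:
        channel.rollback()
        raise
    channel.commit()
    return len(batch)
```

If the write to HDFS fails, the offset does not move, so the same events are retried on the next cycle.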
Spark Streaming
[Diagram: Spark Streaming. A source feeds a raw input DStream that produces one RDD per batch (pre-first batch, first batch, second batch); each batch is processed in a single pass of Filter → Count → Print.]
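The model here is micro-batching: the input DStream cuts the stream into one RDD per batch interval, and the same filter → count → print pass runs over each batch. The plain-Python sketch below imitates that with lists standing in for RDDs; it does not use Spark, and the timestamp-based grouping, `batch_interval` and the keyword filter are assumptions for illustration. In PySpark the corresponding DStream chain would be along the lines of `lines.filter(...).count().pprint()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(records: Iterable[Tuple[float, str]],
                  batch_interval: float) -> Dict[int, List[str]]:
    """Group (timestamp, line) records into one list ('RDD') per batch interval.

    Unlike Spark, intervals with no records produce no batch here.
    """
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, line in records:
        batches[int(ts // batch_interval)].append(line)
    return dict(batches)


def filter_count_print(batches: Dict[int, List[str]], keyword: str) -> List[int]:
    """Run the same filter -> count -> print pass over every batch, in order."""
    counts = []
    for batch_id in sorted(batches):
        matching = [line for line in batches[batch_id] if keyword in line]  # filter
        count = len(matching)                                               # count
        print(f"batch {batch_id}: {count}")                                 # print
        counts.append(count)
    return counts
```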
Storm
[Diagram: Storm topology. Spout layer: spouts reading from the source. Fan-out layer 1: "Split words" bolts. Shuffle. Layer 2: "Count" bolts.]
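This topology is a word count: spouts emit sentences, a first layer of "split words" bolts breaks them into words, and a second layer of "count" bolts keeps running totals. The sketch below simulates that routing in plain Python rather than using the Storm API. It assumes sentences are spread round-robin across split bolts (a deterministic stand-in for Storm's shuffle grouping) and that words reach count bolts by a stable hash of the word (the role Storm's fields grouping plays), so every occurrence of a word lands on the same counter; the bolt counts are hypothetical parameters.

```python
import zlib
from collections import Counter
from typing import List


def word_count_topology(sentences: List[str],
                        num_split_bolts: int = 2,
                        num_count_bolts: int = 3) -> List[Counter]:
    """Simulate spout -> split bolts -> count bolts; return each count bolt's state."""
    # Layer 1 input: round-robin distribution of sentences to split bolts.
    split_inputs: List[List[str]] = [[] for _ in range(num_split_bolts)]
    for i, sentence in enumerate(sentences):
        split_inputs[i % num_split_bolts].append(sentence)

    # Layer 2 input: route each word by a stable hash, like a fields grouping on "word".
    count_bolts: List[Counter] = [Counter() for _ in range(num_count_bolts)]
    for bolt_input in split_inputs:
        for sentence in bolt_input:
            for word in sentence.lower().split():  # "split words" bolt
                target = zlib.crc32(word.encode("utf-8")) % num_count_bolts
                count_bolts[target][word] += 1     # "count" bolt
    return count_bolts
```

Routing words randomly instead would leave each count bolt with only partial totals for a word, which is why the hash-based routing is used here.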
Retro Thoughts
Schema
• Data often has a schema
• At least it should
• Kafka is unaware of it, which is good
• We need a way to figure out the schema for events
• Without including it in every event
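One way to meet the last two points is to register each schema once and prefix every message with a small schema ID instead of the schema itself; consumers look the ID up to decode the payload. Schema-registry serializers commonly use a layout like this. The sketch below puts a 1-byte magic value and a 4-byte big-endian ID in front of a JSON payload; the `SchemaRegistry` class, the header layout and JSON as the payload encoding are all assumptions for illustration (Avro would be the more typical payload format).

```python
import json
import struct
from typing import Dict, Tuple

MAGIC = 0                      # assumed format-version byte
HEADER = struct.Struct(">bI")  # magic byte + 4-byte big-endian schema ID


class SchemaRegistry:
    """Hypothetical in-memory registry: each distinct schema gets a small integer ID."""

    def __init__(self) -> None:
        self._by_id: Dict[int, dict] = {}
        self._ids: Dict[str, int] = {}

    def register(self, schema: dict) -> int:
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._by_id[new_id] = schema
        return self._ids[key]

    def lookup(self, schema_id: int) -> dict:
        return self._by_id[schema_id]


def encode(registry: SchemaRegistry, schema: dict, record: dict) -> bytes:
    """Prefix the payload with the schema ID rather than the full schema."""
    schema_id = registry.register(schema)
    return HEADER.pack(MAGIC, schema_id) + json.dumps(record).encode("utf-8")


def decode(registry: SchemaRegistry, message: bytes) -> Tuple[dict, dict]:
    """Read the header, fetch the schema by ID, then parse the payload."""
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC:
        raise ValueError(f"unknown magic byte {magic}")
    record = json.loads(message[HEADER.size:].decode("utf-8"))
    return registry.lookup(schema_id), record
```

Each event then carries only 5 extra bytes, and Kafka itself stays unaware of the schema.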
Kafka in Cloudera Manager
Visit us at Booth #305
Book signings • Theater sessions • Technical demos • Giveaways