Ingest and Stream Processing - What will you choose?
![Page 1: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/1.jpg)
1© Cloudera, Inc. All rights reserved.
13 June 2016
Ted Malaska | Principal Solutions Architect @ Cloudera
Pat Patterson | Community Champion @ StreamSets
Ingest and Stream Processing - What will you choose?
![Page 2: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/2.jpg)
About Ted and Pat
Ted Malaska
• Principal Solutions Architect @ Cloudera
• Apache HBase SparkOnHBase Contributor
• Contact
  • [email protected]
  • @TedMalaska
Pat Patterson
• Community Champion @ StreamSets
• Formerly Developer Evangelist at Salesforce
• Contact
  • [email protected]
  • @metadaddy
![Page 3: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/3.jpg)
Streaming Patterns
• Ingestion
• Low Millisecond Actions
• Near Real Time Complex Actions
![Page 4: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/4.jpg)
Parts Of Streaming
Producer Kafka Engine Destination
![Page 5: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/5.jpg)
Parts Of Streaming
Producer Kafka Engine Destination
At Least Once
Ordered
Partitioned
At Least Once Depends
Depends
![Page 6: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/6.jpg)
Destinations
• File Systems: example HDFS
  • Batch is good
  • Can only do exactly once if a file is closed in a single ack
  • Good for Scans
• Solr
  • Everything is document based, making exactly once possible
  • Batch is still good
  • Good for Search Queries
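The "closed in a single ack" point for file systems can be sketched as a write-then-rename commit. This is a minimal illustration that uses the local filesystem as a stand-in for HDFS; the function names, the `batch-NNNNNN.txt` naming scheme, and the `.tmp` suffix are assumptions made for the example, not something from the slides.

```python
import os
import tempfile


def commit_batch(out_dir, batch_id, records):
    """Write one batch so that it becomes visible in a single atomic step."""
    # Hypothetical naming scheme: one final file per batch id.
    final_path = os.path.join(out_dir, f"batch-{batch_id:06d}.txt")
    # Write to a temporary file in the same directory first, so a crash
    # mid-write never leaves a partial file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for record in records:
            f.write(record + "\n")
    # The rename is the single "ack". Replaying the same batch_id
    # overwrites the same file instead of creating a duplicate.
    os.replace(tmp_path, final_path)
    return final_path


def read_all(out_dir):
    """Read every committed record, ignoring uncommitted .tmp files."""
    lines = []
    for name in sorted(os.listdir(out_dir)):
        if name.endswith(".txt"):
            with open(os.path.join(out_dir, name)) as f:
                lines.extend(line.rstrip("\n") for line in f)
    return lines
```

Because the output file name is derived from the batch id, replaying a batch after a failure replaces the earlier copy rather than appending to it, which is why batch-oriented writes suit this kind of destination.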
![Page 7: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/7.jpg)
Destinations
• NoSQL: example HBase
  • Everything has a row key, making exactly once possible for writes
  • Increments can be applied twice, so be careful
  • Good for gets and puts
• Kudu
  • Everything has a row key, making exactly once possible for writes
  • Good for gets, puts, and scans
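The difference between keyed writes and increments under replay can be shown with a small in-memory model. `RowStore` and `replay` are hypothetical names for this sketch and do not correspond to the HBase or Kudu client APIs.

```python
class RowStore:
    """In-memory stand-in for a row-key store such as HBase or Kudu."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, value):
        # Overwrite by key: applying the same put twice leaves the same state.
        self.rows[row_key] = value

    def increment(self, row_key, amount):
        # Read-modify-write: applying the same increment twice double counts.
        self.rows[row_key] = self.rows.get(row_key, 0) + amount


def replay(store, operations, times=2):
    """Apply every operation `times` times, as an at-least-once engine might."""
    for _ in range(times):
        for op, key, value in operations:
            getattr(store, op)(key, value)
    return store.rows
```

Replaying `("put", "user1", 5)` twice leaves the row at 5, while replaying `("increment", "user1", 5)` twice leaves it at 10. That is the case the slide warns about.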
![Page 8: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/8.jpg)
Ingestion Destinations
• File Systems: example HDFS
  • Flume
  • Kafka Connect
• Solr
  • Flume
  • Any Streaming Engine
![Page 9: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/9.jpg)
Ingestion Destinations
• NoSQL: example HBase
  • Flume
  • Any Streaming Engine: Storm and Spark Streaming Tested
• Kudu
  • Flume
  • Kafka Connect
  • Any Streaming Engine: Spark Streaming Tested
![Page 10: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/10.jpg)
Tricks With Producers
• Send Source ID (requires Partitioning In Kafka)
  • Seq
  • UUID
  • UUID plus time
• Partition on SourceID
• Watch out for repartitions and partition failovers
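One way to read these tricks: tag each message with a source ID and a per-source sequence number, route by source ID so each source stays ordered in one partition, and let the consumer drop anything it has already seen. The sketch below is plain Python with no Kafka client. `Message`, `partition_for`, and `Deduplicator` are hypothetical names, and the CRC32 hash is a stand-in, not Kafka's own partitioner.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source_id: str
    seq: int
    payload: str


def partition_for(source_id, num_partitions):
    # Stable hash so every message from one source lands in one partition.
    # Python's built-in hash() is randomized per process for strings,
    # so crc32 is used here instead.
    return zlib.crc32(source_id.encode("utf-8")) % num_partitions


class Deduplicator:
    """Drops messages whose sequence number was already seen for a source."""

    def __init__(self):
        self.last_seq = {}

    def accept(self, msg):
        if msg.seq <= self.last_seq.get(msg.source_id, -1):
            return False  # duplicate or replayed message
        self.last_seq[msg.source_id] = msg.seq
        return True
```

The sequence check relies on per-source ordering, which only holds while a source maps to a single partition. Changing `num_partitions` changes the mapping, which is one reason to watch out for repartitions.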
![Page 11: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/11.jpg)
Streaming Engines
• Consumer
  • Flume, KafkaConnect, Streaming Engine
• Storm
• Spark Streaming
• Flink
• Kafka Streams
![Page 12: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/12.jpg)
Consumer: Flume, KafkaConnect
• Simple and Works
• Low latency
• High throughput
• Interceptors
• Transformations
• Alerting
• Ingestions
![Page 13: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/13.jpg)
Consumer: Streaming Engines
• Not so great at HDFS Ingestion
• But great for record storage systems
  • HBase
  • Cassandra
  • Kudu
  • Solr
  • Elasticsearch
![Page 14: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/14.jpg)
Storm
• Old Gen
• Low latency
• Low throughput
• At least once
• Around forever
• Topology Based
![Page 15: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/15.jpg)
Spark Streaming
• The Juggernaut
• Higher Latency
• High Throughput
• Exactly Once
• SQL
• MLlib
• Highly used
• Easy to Debug/Unit Test
• Easy to transition from Batch
• Flow Language
• 600 commits in a month and about 100 meetups
![Page 16: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/16.jpg)
Spark Streaming
[Diagram: each Source feeds a Receiver, which produces one RDD per batch; the RDDs from successive batches make up a DStream. In both the First Batch and the Second Batch, a Single Pass runs Filter, Count, and Print over that batch's RDD.]
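The micro-batch idea behind the diagram, one pass of filter, count, and print per batch, can be imitated in plain Python without Spark. This is only a sketch: it cuts batches by record count, whereas Spark Streaming cuts them by time interval, and all function names here are hypothetical.

```python
def micro_batches(records, batch_size):
    """Group an incoming record sequence into fixed-size batches."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def filter_count(batch, predicate):
    """One pass over one batch: filter, then count."""
    return sum(1 for record in batch if predicate(record))


def run(records, batch_size, predicate):
    """Process each batch independently and print its count."""
    counts = []
    for number, batch in enumerate(micro_batches(records, batch_size), start=1):
        count = filter_count(batch, predicate)
        print(f"batch {number}: {count}")
        counts.append(count)
    return counts
```

Each batch is processed on its own, so the per-batch counts carry no memory of earlier batches.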
![Page 17: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/17.jpg)
Spark Streaming
[Diagram: the same Source, Receiver, RDD, and DStream layout, with RDD partitions shown and a Single Pass of Filter and Count per batch. State is carried between batches as stateful RDDs (Stateful RDD 1, Stateful RDD 2) across the Pre-first Batch, First Batch, and Second Batch.]
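The stateful variant, where each batch is merged into state left behind by the previous one, can be sketched the same way. The function names are hypothetical, and a `Counter` stands in for the stateful RDD.

```python
from collections import Counter


def update_state(state, batch):
    """Merge one batch's per-key counts into the running state."""
    new_state = Counter(state)   # copy, so the previous state is untouched
    new_state.update(batch)      # add this batch's key counts
    return new_state


def run_stateful(batches):
    """Fold batches into state, recording the state after each batch."""
    state = Counter()            # the pre-first-batch state is empty
    history = []
    for batch in batches:
        state = update_state(state, batch)
        history.append(dict(state))
    return history
```

Each step produces a new state object from the old state plus the current batch rather than mutating the old one, which loosely mirrors how a new stateful RDD is derived per batch.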
![Page 18: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/18.jpg)
Flink
• I’m Better Than Spark, Why Doesn’t Anyone Use Me?
• Very much like Spark but not as feature rich
• Lower Latency
• Micro Batch -> ABS
  • Asynchronous Barrier Snapshotting
• Flow Language
• ~1/6th the commits and meetups
• But Slim loves it
![Page 19: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/19.jpg)
Flink - ABS
Operator
Buffer
![Page 20: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/20.jpg)
Flink - ABS
[Diagram: two Operators, each with a Buffer. Barrier 1A Hit; Barrier 1B Still Behind.]
![Page 21: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/21.jpg)
Flink - ABS
[Diagram: two Operators, each with a Buffer. At one operator: Barrier 1A Hit, Barrier 1B Still Behind. At the other: Both Barriers Hit, Check Point.]
![Page 22: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/22.jpg)
Flink - ABS
[Diagram: two Operators, each with a Buffer. Both Barriers Hit, Check Point. Barrier is combined and can move on; Buffer can be flushed out.]
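The barrier alignment walked through in these slides can be approximated with a small two-input operator. When a barrier arrives on one input, later records from that input are buffered; when the barrier has arrived on every input, the operator snapshots its state and flushes the buffer. This is a simplified sketch in plain Python, not Flink's implementation, and `AligningOperator` and its fields are hypothetical names.

```python
BARRIER = "BARRIER"


class AligningOperator:
    """Two-input operator that aligns checkpoint barriers before snapshotting."""

    def __init__(self, inputs=("A", "B")):
        self.inputs = set(inputs)
        self.blocked = set()     # inputs whose barrier has already arrived
        self.buffer = []         # records held back from blocked inputs
        self.processed = []      # records that are part of operator state
        self.checkpoints = []    # state snapshot taken at each checkpoint

    def receive(self, channel, item):
        if item == BARRIER:
            self.blocked.add(channel)
            if self.blocked == self.inputs:
                # Both barriers hit: snapshot state, then flush the buffer.
                # A real engine would also forward one combined barrier
                # downstream at this point.
                self.checkpoints.append(list(self.processed))
                self.processed.extend(self.buffer)
                self.buffer = []
                self.blocked = set()
        elif channel in self.blocked:
            # This input is past its barrier; hold its records back.
            self.buffer.append(item)
        else:
            self.processed.append(item)
```

Buffering the faster input keeps post-barrier records out of the snapshot, so the checkpoint reflects exactly the records that arrived before the barrier on every input.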
![Page 23: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/23.jpg)
Kafka Streams
• The new Kid on the Block
• When you only have Kafka
• Low Latency
• High Throughput
• Not exactly once
• Very Young
• Flow Language
• Very different hardware profile than others
• Not widely supported
• Not widely used
• Worries about separation of concerns
![Page 24: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/24.jpg)
Summary about Engines
• Ingestion
  • Flume and KafkaConnect
• Super Real Time and Special
  • Consumer
• Counting, MLlib, SQL
  • Spark
• Maybe future and cool
  • Flink and KafkaStreams
• Odd man out
  • Storm
![Page 25: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/25.jpg)
Abstractions
• Code Abstractions: Beam
• SQL Abstraction: SQL
• UI Abstraction: StreamSets
• Streaming Engines
![Page 26: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/26.jpg)
StreamSets Data Collector
Building a Higher Level, Open Source Tool
![Page 27: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/27.jpg)
StreamSets Company Background
• Traditional and Big Data Founders
• Top tier Investors
• Strategic Partners
• Momentum to Date
  • Founded 2014; exited stealth 9/15
  • ~30 employees
  • Double-digit enterprise customers
  • 10,000 downloads
![Page 28: Ingest and Stream Processing - What will you choose?](https://reader035.fdocuments.in/reader035/viewer/2022070509/589ef9d71a28ab06368b4fb5/html5/thumbnails/28.jpg)
Thank you!