Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
-
Upload
till-rohrmann -
Category
Technology
-
view
355 -
download
0
Transcript of Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
![Page 1: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/1.jpg)
Modern Stream Processing with Apache Flink®
Till Rohrmann@stsffap
GOTO Berlin 20171
![Page 2: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/2.jpg)
2
Original creators ofApache Flink®
dA Platform 2Open Source Apache Flink + dA Application Manager
![Page 3: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/3.jpg)
What changes faster? Data or Query?
3
Data changes slowlycompared to fastchanging queries
ad-hoc queries, data exploration, ML training and
(hyper) parameter tuning
Batch ProcessingUse Case
Data changes fastapplication logic
is long-lived
continuous applications,data pipelines, standing queries,
anomaly detection, ML evaluation, …
Stream ProcessingUse Case
![Page 4: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/4.jpg)
Batch Processing
4
![Page 5: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/5.jpg)
Stream Processing
5
![Page 6: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/6.jpg)
6
Apache Flink in a Nutshell
![Page 7: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/7.jpg)
Apache Flink in a Nutshell
7
Queries
Applications
Devices
etc.
Database
Stream
File/ObjectStorage
Stateful computations over streamsreal-time and historic
fast, scalable, fault tolerant, in-memory,event time, large state, exactly-once
HistoricData
Streams
Application
![Page 8: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/8.jpg)
8
Event Streams State (Event) Time Snapshots
The Core Building Blocks
real-time andhindsight
complexbusiness logic
consistency without-of-order data
and late data
forking /versioning /time-travel
![Page 9: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/9.jpg)
Powerful Abstractions
9
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dynamic tables)
Stream- & Batch Data Processing
High-levelAnalytics API
Stateful Event-Driven Applications
val stats = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum((a, b) -> a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {// work with event and state(event, state.value) match { … }
out.collect(…) // emit eventsstate.update(…) // modify state
// schedule a timer callbackctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions tonavigate simple to complex use cases
![Page 10: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/10.jpg)
Hardened at scale
10
Athena X Streaming SQLPlatform Service
Streaming Platform as a Service
Fraud detectionStreaming Analytics Platform
100s jobs, 1000s nodes, TBs statemetrics, analytics, real time MLStreaming SQL as a platform
![Page 11: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/11.jpg)
11
Distributed applicationinfrastructure
![Page 12: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/12.jpg)
Good old centralized architecture
12
The big meancentral database
$$$
The grumpyDBA
Application Application Application Application
![Page 13: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/13.jpg)
Write log and tables
13FromtheApacheCassandradocs
![Page 14: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/14.jpg)
Modern distributed app. architecture
14
EventSourcing
TurningtheDBinsideout
CQRS
![Page 15: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/15.jpg)
Modern distributed app. architecture
15
Application
Sensor
APIsApplication
Application
Application
The limit to what you can do is how sophisticatedcan you compute over the stream!
Boils down to: How well do you handle state and time
![Page 16: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/16.jpg)
16
A Flink-favored approach
![Page 17: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/17.jpg)
Event Sourcing + Memory Image
17
event logpersists events(temporarily)
event /command
Process
main memory
update localvariables/structures
periodically snapshot the
memory
![Page 18: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/18.jpg)
Event Sourcing + Memory Image
18
Recovery: Restore snapshot and replay events since snapshot
event logpersists events(temporarily)
Process
![Page 19: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/19.jpg)
Distributed Memory Image
19
Distributed application, many memory images.Snapshots are all consistent together.
![Page 20: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/20.jpg)
Stateful Event & Stream Processing
20
Scalable embedded state
Access at memory speed &scales with parallel operators
![Page 21: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/21.jpg)
Compute, State, and Storage
21
Classic tiered architecture Streaming architecture
databaselayer
computelayer
application state+ backup
compute+
stream storageand
snapshot storage(backup)
application state
![Page 22: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/22.jpg)
Performance
22
synchronous reads/writesacross tier boundary
asynchronous writesof large blobs
all modificationsare local
Classic tiered architecture Streaming architecture
![Page 23: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/23.jpg)
Consistency
23
distributed transactions
at scale typicallyat-most / at-least once
exactly onceper state =1 =1
Classic tiered architecture Streaming architecture
![Page 24: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/24.jpg)
Scaling a Service
24
separately provision additionaldatabase capacity
provision computeand state together
Classic tiered architecture Streaming architecture
provisioncompute
![Page 25: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/25.jpg)
Rolling out a new Service
25
provision a new database(or add capacity to an existing one)
simply occupies someadditional backup space
Classic tiered architecture
Streaming architecture
provision computeand state together
![Page 26: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/26.jpg)
What users built on checkpoints…§ Upgrades and Rollbacks§ Cross Datacenter Failover§ State Archiving§ Application Migration§ Spot Instance Region Chasing§ A/B testing§ …
26
![Page 27: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/27.jpg)
27
What is the next wave ofstream processing
applications?
![Page 28: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/28.jpg)
What changes faster? Data or Query?
28
Data changes slowlycompared to fastchanging queries
ad-hoc queries, data exploration, ML training and
tuning
Batch ProcessingUse Case
Data changes fastapplication logic
is long-lived
continuous applications,data pipelines, standing queries,
anomaly detection, ML evaluation, …
Stream ProcessingUse Case
![Page 29: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/29.jpg)
Analytics & Business Logic
29
Stream
Source analytics metricsBusiness
Logic
Stream Processor
events
ApplicationDatabase
![Page 30: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/30.jpg)
Blending Analytics & Business Logic
30
Stream
Source StreamingApplication
Stream Processor
events
RPCs
(Actor) Messages
…
![Page 31: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/31.jpg)
AthenaX by Uber
31
![Page 32: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/32.jpg)
32
Can one build an entire sophisticatedweb application (say a social network)
on a stream processor?
(Yes, we can!™)
![Page 33: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/33.jpg)
33
Social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation) on Kafka/Flink/Elasticsearch/Redis
More:https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
@
![Page 34: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/34.jpg)
34
The next wave of stream processing applications…
… is all types of statefulapplications that react to
data and time!
![Page 35: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/35.jpg)
35
Stateful stream applications
versioning, upgrading,rollback, duplicating,
migrating, …
Continuous applications
![Page 36: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/36.jpg)
The dA Platform Architecture
36
dA Platform 2
Apache FlinkStateful stream processing
KubernetesContainer platform
Logging
Streams from Kafka, Kinesis,
S3, HDFS, Databases, ...
dA Application
ManagerApplication lifecycle
management
Metrics
CI/CD
Real-timeAnalytics
Anomaly- &Fraud Detection
Real-timeData Integration
ReactiveMicroservices (and more)
![Page 37: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/37.jpg)
Versioned Applications, not Jobs/Jars
37
Stream ProcessingApplication
Version 3
Version 2
Version 1
Code and ApplicationSnapshot
upgrade
upgrade
New Application
Version 3a
Version 2afork /
duplicate
![Page 38: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/38.jpg)
Deployments, not Flink Clusters
38
Testing / QA Kubernetes Cluster Production Kubernetes Cluster
Threat MetricsApp. Testing
Activity MonitorApplication
Fraud DetectionApplication
Fraud DetectionApp. Testing
![Page 39: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/39.jpg)
Hooks for CI/CD pipelines
39
Kubernetes Cluster
ApplicationVersion 1
CI Service
dA Application
Manager
pushupdate
triggerCI
upgradeAPI
statefulupgrade
ApplicationVersion 2
![Page 40: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/40.jpg)
40
Thank you!@stsffap@ApacheFlink @dataArtisans
![Page 41: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017](https://reader031.fdocuments.in/reader031/viewer/2022021508/5a6475f17f8b9afc4d8b45c1/html5/thumbnails/41.jpg)
We are hiring!data-artisans.com/careers
41