Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It
Uploaded by dataartisans · Category: Data & Analytics
![Page 1: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/1.jpg)
Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans
![Page 2: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/2.jpg)
About me
- Database systems, TU Berlin, IBM, Microsoft
- Co-bootstrapped the Stratosphere project's runtime
- Apache Flink was created from a (partial) Stratosphere fork
- Members of the Apache Flink community founded data Artisans
- Now Flink PMC and CTO at data Artisans
![Page 3: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/3.jpg)
Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
![Page 4: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/4.jpg)
Streaming Subsumes Batch
[Figure: events with hourly timestamps (e.g. 2016-3-11 10:00 pm through 2016-3-12 3:00 am, …) laid out over time in a partitioned log]
![Page 5: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/5.jpg)
Streaming Subsumes Batch
[Figure: the same partitioned event log, annotated "Stream (low latency)" and "Stream (high latency)"]
![Page 6: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/6.jpg)
Streaming Subsumes Batch
[Figure: the same partitioned event log, annotated "Stream (low latency)", "Stream (high latency)", and "Batch (bounded stream)"]
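The "batch is a bounded stream" idea can be illustrated in a few lines of plain Scala. This is not Flink API. `Event` and `countPerHour` are hypothetical names. The sketch runs one incremental per-hour count unchanged over a bounded collection (batch) and over the first few elements of an unbounded iterator (stream).

```scala
// Hypothetical event type: an hour bucket and an opaque payload.
final case class Event(hour: String, payload: String)

// Incremental per-hour count: consumes events one at a time and emits the
// updated count after each event, so it never needs the input to end.
def countPerHour(events: Iterator[Event]): Iterator[(String, Int)] = {
  val counts = scala.collection.mutable.Map.empty[String, Int]
  events.map { e =>
    val updated = counts.getOrElse(e.hour, 0) + 1
    counts(e.hour) = updated
    (e.hour, updated)
  }
}

// Batch: a bounded stream. Keeping the last update per hour gives the final result.
val bounded = List(
  Event("2016-3-11 11:00pm", "a"),
  Event("2016-3-12 12:00am", "b"),
  Event("2016-3-11 11:00pm", "c")
)
val batchResult: Map[String, Int] = countPerHour(bounded.iterator).toList.toMap

// Stream: an unbounded source; only the first five updates are inspected.
val unbounded: Iterator[Event] =
  Iterator.from(0).map(i => Event(s"hour-${i % 2}", s"evt-$i"))
val firstUpdates: List[(String, Int)] = countPerHour(unbounded).take(5).toList

println(batchResult)
println(firstUpdates)
```

The same function serves both cases because it only ever looks at one event at a time. The bounded case simply ends.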
![Page 7: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/7.jpg)
Stream Processing Decouples
[Figure: apps a, b, and c sharing one database (state managed centrally), versus apps a, b, and c that each build their own state]
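The decoupled architecture can be sketched as several applications reading the same event log, each folding it into its own private state instead of sharing one database. This is plain Scala. The in-memory `log` and the two app functions are hypothetical stand-ins for a real log system and real applications.

```scala
// Hypothetical shared, append-only event log of (user, action) pairs.
val log: Vector[(String, String)] = Vector(
  ("alice", "login"), ("bob", "login"), ("alice", "purchase"), ("alice", "logout")
)

// App a: builds its own state, the number of actions per user.
def appA(events: Seq[(String, String)]): Map[String, Int] =
  events.foldLeft(Map.empty[String, Int]) { case (state, (user, _)) =>
    state.updated(user, state.getOrElse(user, 0) + 1)
  }

// App b: builds a different state from the same log, the users who purchased.
def appB(events: Seq[(String, String)]): Set[String] =
  events.foldLeft(Set.empty[String]) { case (state, (user, action)) =>
    if (action == "purchase") state + user else state
  }

val stateA = appA(log)
val stateB = appB(log)
println(stateA)
println(stateB)
```

Neither app depends on the other's state layout. Adding an app c means adding another fold over the same log.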
![Page 8: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/8.jpg)
Time Travel
- Process a period of historic data
- Process the latest data with low latency (tail of the log)
- Reprocess a stream (historic data first, then catch up with real-time data)
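All three access patterns reduce to choosing where in a replayable log a reader starts and stops. The following minimal sketch assumes a hypothetical in-memory log in which each record's offset is its position. It does not use any real log system's client API.

```scala
// Hypothetical append-only log: each record's offset is its position in the log.
final case class Record(offset: Long, value: Int)

val logRecords: Vector[Record] = Vector.tabulate(10)(i => Record(i.toLong, i * 10))

// Read from `fromOffset` to the newest record. Starting at an old offset covers
// historic data first and ends at the tail; a real reader would then keep
// polling the tail for new records.
def readFrom(log: Vector[Record], fromOffset: Long): Vector[Record] =
  log.filter(_.offset >= fromOffset)

// Read only a fixed period of historic data: offsets in [from, until).
def readRange(log: Vector[Record], from: Long, until: Long): Vector[Record] =
  log.filter(r => r.offset >= from && r.offset < until)

val latest      = readFrom(logRecords, 8L)      // tail of the log, low latency
val reprocessed = readFrom(logRecords, 0L)      // replay everything, then reach the tail
val historic    = readRange(logRecords, 2L, 5L) // one historic period

println(latest.map(_.value))   // Vector(80, 90)
println(historic.map(_.value)) // Vector(20, 30, 40)
```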
![Page 9: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/9.jpg)
[Figure: the tradeoff triangle of Latency, Volume/Throughput, and State & Accuracy]
![Page 10: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/10.jpg)
[Figure: the Latency / Volume/Throughput / State & Accuracy triangle, annotated]
- Latency: down to the milliseconds
- Volume/Throughput: 10s of millions of events/sec for stateful applications
- State & Accuracy: exactly-once semantics, event time processing
Apache Flink was the first open-source system to eliminate these tradeoffs.
![Page 11: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/11.jpg)
Streaming Architecture Blueprint
collect → log → analyze → serve & store
![Page 12: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/12.jpg)
Flink's Approach
- Stream SQL: high-level language
- Table API: declarative DSL
- Fluent API, Windows, Event Time: core API
- Stateful Stream Processing: building block
![Page 13: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/13.jpg)
Stateful Stream Processing
Source → Filter / Transform → State read/write → Sink
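The pipeline shape can be sketched in plain Scala, not the Flink API. The sketch assumes a hypothetical `Reading` record and a toy task of tracking the highest temperature per sensor. It filters, transforms, reads and writes per-key state, and emits to a sink only when the state changes.

```scala
import scala.collection.mutable

// Hypothetical input record.
final case class Reading(sensor: String, tempF: Double)

// Source: a fixed iterator standing in for an unbounded input.
val source: Iterator[Reading] = Iterator(
  Reading("room1", 68.0), Reading("room1", -999.0), Reading("room2", 50.0),
  Reading("room1", 77.0), Reading("room2", 41.0)
)

// State: the highest Celsius value seen so far per sensor.
val maxTempC = mutable.Map.empty[String, Double]

// Sink: collects the records emitted downstream.
val sink = mutable.ArrayBuffer.empty[(String, Double)]

source
  .filter(_.tempF > -459.67)                        // Filter: drop physically impossible readings
  .map(r => (r.sensor, (r.tempF - 32) * 5.0 / 9.0)) // Transform: Fahrenheit to Celsius
  .foreach { case (sensor, tempC) =>
    val previous = maxTempC.get(sensor)             // State read
    if (previous.forall(tempC > _)) {
      maxTempC(sensor) = tempC                      // State write
      sink += ((sensor, tempC))                     // Emit only when the maximum changes
    }
  }

println(sink)
```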
![Page 14: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/14.jpg)
Stateful Stream Processing
Scalable embedded state: access at memory speed, and scales with parallel operators
![Page 15: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/15.jpg)
Stateful Stream Processing
- Re-load state
- Reset positions in input streams
- Rolling back computation, re-processing
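Rollback works because state and input position are snapshotted together. Restoring both means every input affects the state exactly once, even if some inputs were processed twice. The following single-threaded sketch illustrates that idea under stated assumptions: a replayable input vector, a running sum as the state, and a simulated crash. It is not Flink's distributed checkpointing algorithm.

```scala
// Hypothetical replayable input; positions are vector indices.
val input: Vector[Int] = Vector(1, 2, 3, 4, 5, 6)

// A consistent snapshot: the operator state together with the input position it reflects.
final case class Checkpoint(position: Int, sum: Int)

// Run from a checkpoint, checkpointing every `interval` records, and simulate a
// crash just before processing position `failAt` (if given).
def run(from: Checkpoint, interval: Int, failAt: Option[Int]): Either[Checkpoint, Int] = {
  var sum = from.sum
  var pos = from.position
  var lastCheckpoint = from
  while (pos < input.length) {
    if (failAt.contains(pos)) return Left(lastCheckpoint) // crash: only the checkpoint survives
    sum += input(pos)
    pos += 1
    if (pos % interval == 0) lastCheckpoint = Checkpoint(pos, sum)
  }
  Right(sum)
}

val firstAttempt = run(Checkpoint(0, 0), interval = 2, failAt = Some(5))
val recovered = firstAttempt match {
  case Left(cp)     => run(cp, interval = 2, failAt = None) // re-load state, reset position
  case Right(total) => Right(total)
}

println(firstAttempt) // Left(Checkpoint(4,10))
println(recovered)    // Right(21)
```

The value at position 4 was added before the crash, rolled back with the state, and added again on replay. The final sum therefore counts it once.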
![Page 16: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/16.jpg)
Stateful Stream Processing
Restore to different programs: bugfixes, upgrades, A/B testing, etc.
![Page 17: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/17.jpg)
Versioning the state of applications
[Figure: savepoints along a time axis across application versions App. A, App. B, and App. C]
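Restoring one snapshot into different program versions can be sketched as follows. This is plain Scala, and the `Savepoint` case class, the `AppVersion` trait, and the word-count logic are all hypothetical. The sketch only models the idea that saved state is an immutable, versioned value. It does not model how Flink takes or stores savepoints.

```scala
// A savepoint, simplified: per-key state plus the input position it reflects.
final case class Savepoint(position: Int, counts: Map[String, Int])

// Hypothetical program versions that share the same state layout.
trait AppVersion {
  def update(counts: Map[String, Int], word: String): Map[String, Int]
}
object AppA extends AppVersion { // original: counts words exactly as they arrive
  def update(counts: Map[String, Int], word: String): Map[String, Int] =
    counts.updated(word, counts.getOrElse(word, 0) + 1)
}
object AppB extends AppVersion { // bugfix: normalizes case before counting
  def update(counts: Map[String, Int], word: String): Map[String, Int] = {
    val key = word.toLowerCase
    counts.updated(key, counts.getOrElse(key, 0) + 1)
  }
}

val words = Vector("flink", "state", "Flink", "STATE")

// Resume a program version from a savepoint and run it up to position `until`.
def runFrom(app: AppVersion, sp: Savepoint, until: Int): Savepoint = {
  val counts = words.slice(sp.position, until).foldLeft(sp.counts)((c, w) => app.update(c, w))
  Savepoint(until, counts)
}

// App. A runs for a while, then a savepoint is taken.
val sp1 = runFrom(AppA, Savepoint(0, Map.empty), until = 2)
// The same savepoint can seed several versions, e.g. for an A/B test.
val continuedA = runFrom(AppA, sp1, until = words.length)
val continuedB = runFrom(AppB, sp1, until = words.length)

println(continuedA.counts)
println(continuedB.counts)
```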
![Page 18: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/18.jpg)
![Page 19: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/19.jpg)
Event Time / Out-of-Order
[Figure: the Star Wars films in release order (processing time: Episode IV 1977, V 1980, VI 1983, I 1999, II 2002, III 2005, VII 2015) versus story order (event time: Episodes I through VII)]
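Handling out-of-order data in event time usually means grouping by the timestamp carried in each event, not by arrival order. A window is closed once a watermark says no earlier events are expected. The following plain-Scala sketch uses tumbling windows and a simplified watermark, defined as the maximum event time seen so far minus a fixed out-of-orderness bound. The bound, the window size, and the `Ev` type are assumptions for illustration, not Flink's exact firing rule.

```scala
// Hypothetical event: `eventTime` is when it happened; list order is arrival (processing) order.
final case class Ev(id: String, eventTime: Long)

val arrivals = List(Ev("a", 1), Ev("b", 4), Ev("c", 2), Ev("d", 7), Ev("e", 5), Ev("f", 12))

val windowSize = 5L    // tumbling windows [0,5), [5,10), [10,15), ...
val maxOutOfOrder = 3L // assumed bound on how late events may arrive

// Assign each event to its event-time window; emit a window once the watermark
// (max event time seen minus the bound) has passed the window's end.
def eventTimeWindows(events: List[Ev]): List[(Long, List[String])] = {
  val open = scala.collection.mutable.TreeMap.empty[Long, List[String]]
  val emitted = scala.collection.mutable.ListBuffer.empty[(Long, List[String])]
  var maxSeen = Long.MinValue
  for (e <- events) {
    val start = e.eventTime - (e.eventTime % windowSize)
    open(start) = open.getOrElse(start, Nil) :+ e.id
    maxSeen = math.max(maxSeen, e.eventTime)
    val watermark = maxSeen - maxOutOfOrder
    val ready = open.keys.filter(s => s + windowSize <= watermark).toList
    ready.foreach { s => emitted += ((s, open(s))); open -= s }
  }
  emitted.toList ++ open.toList // end of input: flush the remaining windows
}

println(eventTimeWindows(arrivals))
```

Event `c` arrives after `b` but still lands in the window [0,5). That window is only emitted once `f` pushes the watermark past 5.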
![Page 20: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/20.jpg)
(Stream) SQL & Table API
Table API
```scala
// convert stream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'time, 'tempF)

// define query on Table
val avgTempCTable: Table = sensorTable
  .groupBy('location)
  .window(Tumble over 1.days on 'rowtime as 'w)
  .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
```
SQL
```scala
sensorTable.sql("""
  SELECT day, location, avg((tempF - 32) * 0.556) AS avgTempC
  FROM sensorData
  WHERE location LIKE 'room%'
  GROUP BY day, location
""")
```
![Page 21: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/21.jpg)
What can you do with that?
- 10 billion events (2 TB) processed daily across multiple Flink jobs for the telco network control center
- Ad-hoc real-time queries, > 30 operators, processing 30 billion events daily, maintaining 100s of GB of state inside Flink with exactly-once guarantees
- Jobs with > 20 operators, running on > 5000 vCores in a 1000-node cluster, processing millions of events per second
![Page 22: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/22.jpg)
Flink's Streams playing at Batch
- TeraSort
- Relational Join
- Classic Batch Jobs
- Graph Processing
- Linear Algebra
![Page 23: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/23.jpg)
What can we expect next?
![Page 24: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/24.jpg)
Queryable State
![Page 25: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/25.jpg)
Streaming Architecture Blueprint
collect → log → analyze & serve → store
Other Services
![Page 26: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/26.jpg)
Full SQL on Streams
- Continuous queries, incremental results
- Windows, event time, processing time
- Consistent with SQL on bounded data
https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU
![Page 27: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/27.jpg)
Elastic Parallelism
- Maintaining exactly-once state consistency
- No extra effort for the user
- No need to carefully plan partitions
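One way to rescale without planning partitions is to hash keys into a fixed number of key groups. Contiguous ranges of those groups are then assigned to however many parallel instances currently run, so state moves as whole groups and keys never need rehashing. The sketch below mirrors the key-group idea Flink uses for rescaling keyed state, but it is simplified. It uses plain `hashCode` where Flink applies an additional murmur hash, and `numKeyGroups = 8` is an arbitrary assumption.

```scala
// Assumed fixed number of key groups (an upper bound on parallelism).
val numKeyGroups = 8

// Each key belongs to one fixed key group, independent of the current parallelism.
def keyGroupOf(key: String): Int = java.lang.Math.floorMod(key.hashCode, numKeyGroups)

// Contiguous ranges of key groups are assigned to parallel operator instances.
def instanceFor(keyGroup: Int, parallelism: Int): Int = keyGroup * parallelism / numKeyGroups

// Which key groups (and thus which state) each instance owns at a given parallelism.
def assignment(parallelism: Int): Map[Int, List[Int]] =
  (0 until numKeyGroups).toList.groupBy(kg => instanceFor(kg, parallelism))

val before = assignment(2)
val after  = assignment(4)

// For each new instance: the old instances (at parallelism 2) it pulls key groups from.
// Here each new instance reads from exactly one old instance.
val sources: Map[Int, Set[Int]] = after.map { case (inst, kgs) =>
  inst -> kgs.map(kg => instanceFor(kg, 2)).toSet
}

val kg = keyGroupOf("user-42")
println(s"user-42 -> key group $kg -> instance ${instanceFor(kg, 2)} at p=2, ${instanceFor(kg, 4)} at p=4")
println(sources)
```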
![Page 28: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/28.jpg)
Very large state
- Terabytes of state inside the stream processor
- Maintaining fast checkpoints and recovery
- E.g., long histories of windows, large join tables
- State at local memory speed
![Page 29: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/29.jpg)
![Page 30: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.fdocuments.in/reader035/viewer/2022070600/58a226061a28ab527c8b47fd/html5/thumbnails/30.jpg)
We are hiring!
data-artisans.com/careers