Kostas Kloudas - Extending Flink's Streaming APIs
Kostas Kloudas (@KLOUBEN_K)
Meetup @ ResearchGate, February 16, 2017
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Additions in Flink 1.2
• Re-scalable State
• Low-level Stream Operations
• Asynchronous I/O
• Table API and SQL
• Externalized Checkpoints
• Queryable State
• Mesos Integration
• …and, of course, Documentation
Low-level Stream Operations
Common Use-Case Skeleton A
On each incoming element:
• update some state
• register a callback for a moment in the future
When that moment comes:
• check a condition and perform a certain action, e.g. emit an element
The Flink 1.1 way
Use built-in windowing:
• + Expressive
• + A lot of functionality out-of-the-box
• - Not always intuitive
• - Overkill for simple cases
Write your own operator:
• - Too many things to account for in Flink 1.1
The Flink 1.2 way: ProcessFunction
Gives access to all basic building blocks:
• Events
• Fault-tolerant, consistent state
• Timers (event- and processing-time)
Simple yet powerful API:

```java
/**
 * Process one element from the input stream.
 */
void processElement(I value, Context ctx, Collector<O> out) throws Exception;

/**
 * Called when a timer set using {@link TimerService} fires.
 */
void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception;
```
The `out` parameter is a collector to emit result values.
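The Context passed to processElement() lets you:
1. get the timestamp of the element, and
2. interact with the TimerService to query the current time and register timers.
The OnTimerContext passed to onTimer() lets you:
1. do all of the above, and
2. query whether the firing timer is an event-time or a processing-time timer.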
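The timer part of this API can be illustrated without Flink. The sketch below is a simplified, hypothetical stand-in (`SimpleTimerService` and its methods are illustrative names, not Flink's `TimerService`); it assumes the event-time rule that a timer fires once the watermark reaches its timestamp, with due timers firing in timestamp order.

```java
import java.util.PriorityQueue;
import java.util.function.LongConsumer;

// Hypothetical, Flink-free stand-in for an event-time timer service.
// Registered timestamps sit in a min-heap; advancing the watermark fires
// every timer whose timestamp is <= the new watermark, smallest first.
class SimpleTimerService {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private long currentWatermark = Long.MIN_VALUE;

    // The current event time, as given by the last watermark.
    long currentWatermark() {
        return currentWatermark;
    }

    void registerEventTimeTimer(long time) {
        timers.add(time);
    }

    // Moves event time forward and calls onTimer for each timer that is now due.
    void advanceWatermark(long watermark, LongConsumer onTimer) {
        currentWatermark = watermark;
        while (!timers.isEmpty() && timers.peek() <= watermark) {
            onTimer.accept(timers.poll());
        }
    }
}
```

With timers registered for 100, 200 and 300, a watermark of 250 fires the first two and keeps the timer for 300 pending until event time reaches it.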
ProcessFunction: example
Requirements:
• maintain counts per incoming key, and
• emit the key/count pair if no element came for the key in the last 100 ms (in event time)
Implementation sketch:
• Store the count, key and last modification timestamp in a ValueState (scoped by key)
• For each record:
  • update the counter and the last modification timestamp
  • register a timer 100 ms from “now” (in event time)
• When the timer fires:
  • check the timer's timestamp against the last modification time for the key, and
  • emit the key/count pair if they match (i.e. the firing timer is the one registered 100 ms after the latest update)
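The implementation sketch can be checked with a small, Flink-free simulation. Everything here is a hypothetical stand-in: a `HashMap` plays the keyed `ValueState`, a `TreeMap` of timestamps plays the event-time timers, and `advanceWatermark` plays the arrival of a watermark. It shows why the timer callback compares against `lastModified + 100`: timers registered before a key's latest update are stale and must not emit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Flink-free simulation of "count per key, emit key/count after 100 ms of
// event-time inactivity". All class and method names are hypothetical.
class CountWithTimeoutSimulation {
    static final long TIMEOUT = 100;

    static final class CountWithTimestamp {
        String key;
        long count;
        long lastModified;
    }

    // Stand-in for the keyed ValueState: one entry per key.
    private final Map<String, CountWithTimestamp> state = new HashMap<>();
    // Pending event-time timers: timestamp -> keys that registered it.
    private final TreeMap<Long, Set<String>> timers = new TreeMap<>();
    final List<String> output = new ArrayList<>();

    // Counterpart of processElement() for an element with the given event time.
    void processElement(String key, long timestamp) {
        CountWithTimestamp current = state.computeIfAbsent(key, k -> {
            CountWithTimestamp fresh = new CountWithTimestamp();
            fresh.key = k;
            return fresh;
        });
        current.count++;
        current.lastModified = timestamp;
        // timer TIMEOUT ms after this update; a Set keeps one per key and time
        timers.computeIfAbsent(timestamp + TIMEOUT, t -> new LinkedHashSet<>()).add(key);
    }

    // Fires all timers with timestamp <= watermark, in timestamp order.
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, Set<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                onTimer(key, due.getKey());
            }
        }
    }

    // Counterpart of onTimer(): emit only for the timer of the latest update.
    private void onTimer(String key, long timestamp) {
        CountWithTimestamp result = state.get(key);
        if (timestamp == result.lastModified + TIMEOUT) {
            output.add(result.key + "=" + result.count);
        }
    }
}
```

For elements a@0, a@50 and b@60, a watermark of 120 fires the timer at 100 for key a without output (a was updated at 50, so that timer is stale); a later watermark of 200 emits a=2 and then b=1.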
```java
public class MyProcessFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // register our state with the state backend
    }

    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // update our state and register a timer
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // check the state for the key and emit a result if needed
    }
}
```
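```java
// The data type stored in the state
public class CountWithTimestamp {
    public String key;
    public long count;
    public long lastModified;
}

public class MyProcessFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    // per-key state: the key, its count and the last modification time
    private ValueState<CountWithTimestamp> state;

    @Override
    public void open(Configuration parameters) throws Exception {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("myState", CountWithTimestamp.class));
    }
}
```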
```java
public class MyProcessFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // retrieve the current count for this key (null for the key's first element)
        CountWithTimestamp current = state.value();
        if (current == null) {
            current = new CountWithTimestamp();
            current.key = value.f0;
        }
        // update the count and the last modification time (the element's event time)
        current.count++;
        current.lastModified = ctx.timestamp();
        state.update(current);

        // schedule a timer 100 ms (in event time) after this update
        ctx.timerService().registerEventTimeTimer(current.lastModified + 100);
    }
}
```
```java
public class MyProcessFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // get the state for the key that scheduled the timer
        CountWithTimestamp result = state.value();
        // only the timer of the latest update emits; older timers are stale
        if (timestamp == result.lastModified + 100) {
            out.collect(new Tuple2<String, Long>(result.key, result.count));
        }
    }
}
```
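Apply the function on a keyed stream, here keyed by the first tuple field:

```java
stream.keyBy(0).process(new MyProcessFunction());
```

If your stream is not keyed, you can always group on a dummy key. BEWARE: all elements then share one key, so the function runs with a parallelism of 1.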
ProcessFunction: miscellaneous
CoProcessFunction for low-level joins:
• applied on two input streams
• has two processElement() methods, one for each input stream
Upcoming releases may further enhance the ProcessFunction/CoProcessFunction.
Planning to transform all CEP operators to ProcessFunctions.
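To illustrate the two-input shape, here is a Flink-free sketch of a simple keyed join. The method names mirror `processElement1()`/`processElement2()` of Flink's CoProcessFunction, but the class, the maps standing in for keyed state (which Flink would scope to the current key automatically), and the "join with the other side's latest value" semantics are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-input operator: each input keeps its latest value per key,
// and every new element is joined with the other side's latest value, if any.
class LatestValueJoin {
    // Stand-ins for keyed state, one map per input.
    private final Map<String, String> latestLeft = new HashMap<>();
    private final Map<String, String> latestRight = new HashMap<>();
    final List<String> output = new ArrayList<>();

    // Called for every element of the first input stream.
    void processElement1(String key, String left) {
        latestLeft.put(key, left);
        String right = latestRight.get(key);
        if (right != null) {
            output.add(key + ": " + left + " + " + right);
        }
    }

    // Called for every element of the second input stream.
    void processElement2(String key, String right) {
        latestRight.put(key, right);
        String left = latestLeft.get(key);
        if (left != null) {
            output.add(key + ": " + left + " + " + right);
        }
    }
}
```

A real job would typically also register timers to clear the state of keys that go quiet, since both maps otherwise grow without bound.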
Asynchronous I/O
Common Use-Case Skeleton B
On each incoming element:
• extract some info from the element (e.g. key)
• query an external storage system (DB or KV-store) for additional info
• emit an enriched version of the input element
The Flink 1.1 way
Write a MapFunction that queries the DB:
• + Simple
• - Slow (synchronous access) and/or
• - Requires high parallelism (more tasks)
Write your own operator:
• - Too many things to account for in Flink 1.1
Synchronous Access
Communication delay can dominate application throughput and latency
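As a back-of-the-envelope illustration with assumed numbers (a 10 ms round trip to the external store, and up to C = 100 requests in flight), synchronous access caps a single task at one element per round trip, while overlapping requests raises that cap by up to a factor of C:

$$\text{throughput}_{\text{sync}} \le \frac{1}{\text{RTT}} = \frac{1}{10\,\text{ms}} = 100\ \text{elements/s}$$

$$\text{throughput}_{\text{async}} \le \frac{C}{\text{RTT}} = \frac{100}{10\,\text{ms}} = 10{,}000\ \text{elements/s}$$

Both are upper bounds that ignore processing cost and the external system's own limits.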
Asynchronous Access
The Flink 1.2 way: AsyncFunction
Requirement:
• a client that supports asynchronous requests
Flink handles the rest:
• integration of async I/O with the DataStream API
• fault tolerance
• order of emitted elements
• correct time semantics (event/processing time)
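A client that supports asynchronous requests returns a future instead of blocking. The sketch below uses the JDK's `CompletableFuture`; `AsyncKvClient`, `FakeAsyncKvClient` and `Enricher.enrichAll` are hypothetical names, and the in-memory map with a short sleep merely simulates a remote store. The point is that all lookups are issued before any result is awaited, so their round trips overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical asynchronous key-value client: get() returns immediately.
interface AsyncKvClient {
    CompletableFuture<String> get(String key);
}

// In-memory stand-in that simulates a network round trip on a thread pool.
final class FakeAsyncKvClient implements AsyncKvClient {
    private final Map<String, String> data;
    private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    FakeAsyncKvClient(Map<String, String> data) {
        this.data = data;
    }

    @Override
    public CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> {
            simulateLatency();
            return data.getOrDefault(key, "unknown");
        }, pool);
    }

    private static void simulateLatency() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}

final class Enricher {
    // Issues every lookup first, then waits; the round trips overlap.
    static List<String> enrichAll(AsyncKvClient client, List<String> keys) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String key : keys) {
            pending.add(client.get(key).thenApply(value -> key + " -> " + value));
        }
        List<String> enriched = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            enriched.add(f.join()); // results collected in input order
        }
        return enriched;
    }
}
```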
Simple API:

```java
/**
 * Trigger async operation for each stream input.
 */
void asyncInvoke(IN input, AsyncCollector<OUT> collector) throws Exception;
```

API call:

```java
// Example async function call; use orderedWait instead to keep the input order
DataStream<...> result = AsyncDataStream.unorderedWait(stream,
        new MyAsyncFunction(), 1000, TimeUnit.MILLISECONDS, 100);
```
[Diagram: the AsyncWaitOperator holds queued promises P1 to P4 and an Emitter thread; a new element E5 arrives]
The AsyncWaitOperator consists of:
• a queue of “Promises”
• a separate thread (Emitter)
For each new element E5, the AsyncWaitOperator:
• wraps E5 in a “promise” P5
• puts P5 in the queue
• calls asyncInvoke(E5, P5)
asyncInvoke(value, asyncCollector):
• a user-defined function
• value: the input element
• asyncCollector: the collector of the result (when the query returns)
For example, asyncInvoke(E5, P5) can issue the query through an asynchronous client and complete P5 from the callback:

```java
// thenAccept() is defined on CompletableFuture, not on java.util.concurrent.Future
CompletableFuture<String> future = client.query(E5);
future.thenAccept((String result) -> {
    P5.collect(Collections.singleton(new Tuple2<>(E5, result)));
});
```
Emitter:
• separate thread
• polls the queue for completed promises (blocking)
• emits elements downstream
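The walkthrough above can be modeled in plain Java for the ordered case. This is a hypothetical sketch, not Flink's AsyncWaitOperator: `OrderedAsyncWaitSketch` wraps each input in a `CompletableFuture` acting as the promise, queues it in arrival order, and hands it to a user callback playing the role of asyncInvoke; a separate emitter thread takes the oldest promise, blocks until it completes and emits its result. A semaphore bounding the queued promises stands in for the capacity parameter, and `expectedElements` only exists so the sketch can terminate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.function.BiConsumer;

// Hypothetical, Flink-free model of the queue-of-promises design in ordered mode.
final class OrderedAsyncWaitSketch<I, O> {
    private final BiConsumer<I, CompletableFuture<O>> asyncInvoke;
    private final Semaphore freeSlots; // bounds the number of in-flight promises
    private final BlockingQueue<CompletableFuture<O>> queue = new LinkedBlockingQueue<>();
    private final List<O> emitted = new ArrayList<>(); // written only by the emitter
    private final Thread emitter;

    OrderedAsyncWaitSketch(BiConsumer<I, CompletableFuture<O>> asyncInvoke,
                           int maxInFlight, int expectedElements) {
        this.asyncInvoke = asyncInvoke;
        this.freeSlots = new Semaphore(maxInFlight);
        this.emitter = new Thread(() -> {
            try {
                for (int i = 0; i < expectedElements; i++) {
                    CompletableFuture<O> head = queue.take(); // oldest promise (blocking)
                    emitted.add(head.join());                  // wait for it, then emit
                    freeSlots.release();                       // its slot is free again
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "emitter");
        emitter.setDaemon(true);
        emitter.start();
    }

    // Wraps the input in a promise, queues it in arrival order, then calls asyncInvoke.
    void add(I input) {
        freeSlots.acquireUninterruptibly(); // blocks while maxInFlight promises are pending
        CompletableFuture<O> promise = new CompletableFuture<>();
        queue.add(promise);
        asyncInvoke.accept(input, promise); // user code completes the promise later
    }

    // Waits for the emitter to finish and returns what it emitted, in order.
    List<O> awaitResults() {
        try {
            emitter.join(); // join() also makes the emitter's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the emitter", e);
        }
        return new ArrayList<>(emitted);
    }
}
```

Completing the promises of inputs 3, 1 and 2 in that order still yields r1, r2, r3, because the emitter always waits on the head of the queue.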
```java
DataStream<Tuple2<String, String>> result = AsyncDataStream.unorderedWait( // or orderedWait
        stream,
        new MyAsyncFunction(),  // our AsyncFunction
        1000,                   // timeout: max time until a request is considered failed
        TimeUnit.MILLISECONDS,
        100);                   // capacity: max number of in-flight requests
```
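The timeout marks a request as failed if it has not completed in time. The following sketch shows one way to attach such a deadline to a JDK `CompletableFuture` using a scheduled task (Java 9 and later also offer `CompletableFuture.orTimeout` for this). `RequestTimeouts.withRequestTimeout` is a hypothetical helper; it does not reproduce how Flink itself reacts to a timed-out request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RequestTimeouts {
    // One daemon thread that fires the deadlines.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-timer");
                t.setDaemon(true);
                return t;
            });

    // Fails `request` with a TimeoutException unless it completes within the timeout.
    static <T> CompletableFuture<T> withRequestTimeout(CompletableFuture<T> request,
                                                        long timeout, TimeUnit unit) {
        ScheduledFuture<?> deadline = TIMER.schedule(() -> {
            request.completeExceptionally(new TimeoutException("request timed out"));
        }, timeout, unit);
        // once the request is done either way, the pending deadline is not needed
        request.whenComplete((result, error) -> deadline.cancel(false));
        return request;
    }
}
```

A request that never answers surfaces as a `CompletionException` caused by a `TimeoutException` when joined, while a request that completes in time is unaffected.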
[Diagram: the Emitter drains promises P1 to P4 for elements E1 to E4]
Ideally, results would leave in the order in which their elements arrived...
Realistically, with unorderedWait, the output is ordered by which request finished first.
• unorderedWait: emit results in order of completion
• orderedWait: emit results in order of arrival
Always: watermarks never overtake elements, and vice versa.
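The watermark rule can be made concrete with a small, Flink-free simulation of unordered mode (a sketch under stated assumptions, not Flink's implementation): results are released in completion order, but only within the stretch between two watermarks; a watermark is emitted once every element before it has been emitted, and later elements wait for it. By convention of this sketch, item names starting with `W` are watermarks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of unordered emission with watermark barriers.
final class UnorderedWithWatermarks {

    static List<String> emit(List<String> arrival, List<String> completionOrder) {
        // Assign each element to the segment between two consecutive watermarks.
        Map<String, Integer> segmentOf = new HashMap<>();
        List<String> watermarks = new ArrayList<>();
        List<Integer> unemitted = new ArrayList<>(); // per segment: elements not yet emitted
        unemitted.add(0);
        for (String item : arrival) {
            if (item.startsWith("W")) {
                watermarks.add(item);
                unemitted.add(0);
            } else {
                int segment = watermarks.size();
                segmentOf.put(item, segment);
                unemitted.set(segment, unemitted.get(segment) + 1);
            }
        }

        List<String> output = new ArrayList<>();
        List<String> held = new ArrayList<>(); // completed, but behind an unemitted watermark
        int current = 0;
        for (String done : completionOrder) {
            held.add(done);
            boolean advanced = true;
            while (advanced) {
                advanced = false;
                // Elements of the current segment go out in completion order.
                for (Iterator<String> it = held.iterator(); it.hasNext(); ) {
                    String element = it.next();
                    if (segmentOf.get(element) == current) {
                        output.add(element);
                        it.remove();
                        unemitted.set(current, unemitted.get(current) - 1);
                    }
                }
                // A watermark passes only after every element before it was emitted.
                if (unemitted.get(current) == 0 && current < watermarks.size()) {
                    output.add(watermarks.get(current));
                    current++;
                    advanced = true;
                }
            }
        }
        return output;
    }
}
```

With arrivals E1, E2, W1, E3, E4 and completions E3, E2, E4, E1, the output is E2, E1, W1, E3, E4: E3 finishes first but is held back until W1 has passed. orderedWait would instead emit E1, E2, W1, E3, E4.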
Documentation
ProcessFunction: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html
AsyncIO: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
Thank you! @KLOUBEN_K @ApacheFlink @dataArtisans
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward
We are hiring!
data-artisans.com/careers