Unified Processing with Apache Beam
Cloud+Data NEXTCon 2017 · 16/09/2017
I am Sourabh
Hello!
I am a Software Engineer
I tweet at @sb2nov
What is Apache Beam?
Apache Beam is a unified programming model for expressing efficient and
portable data processing pipelines
Big Data
https://commons.wikimedia.org/wiki/File:Globe_centered_in_the_Atlantic_Ocean_(green_and_grey_globe_scheme).svg
LAUNCH!!
DATA CAN BE BIG
… REALLY BIG …
Tuesday, Wednesday, Thursday
UNBOUNDED, DELAYED, OUT OF ORDER
[Figure: timeline with ticks from 8:00 to 14:00 and several events labeled 8:00]
ORGANIZING THE STREAM
[Figure: events labeled 8:00]
DATA PROCESSING TRADEOFFS
Completeness, Latency, Cost ($$$)
WHAT IS IMPORTANT?
[Figure: Completeness, Low Latency, and Low Cost ($$$), each rated on a scale from Important to Not Important]
MONTHLY BILLING
[Figure: the same three axes, rated for monthly billing]
BILLING ESTIMATE
[Figure: the same three axes, rated for a billing estimate]
FRAUD DETECTION
[Figure: the same three axes, rated for fraud detection]
Beam Model
GENERATIONS BEYOND MAP-REDUCE
Clearly separates event time from processing time
Improved abstractions let you focus on your application logic
Batch and stream processing are both first-class citizens
Pipeline
PTransform
PCollection (bounded or unbounded)
EVENT TIME VS PROCESSING TIME
Watermarks describe event time progress.
"No timestamp earlier than the watermark will be seen"
Often heuristic-based.
Too slow? Results are delayed.
Too fast? Some data is late.
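To make the slow/fast tradeoff concrete, here is a minimal plain-Python sketch of a heuristic watermark. It does not use the Beam API; the class name `HeuristicWatermark`, the `allowed_lag` knob, and the sample timestamps are all assumptions made for illustration. The watermark trails the largest event time seen so far by a fixed lag, and any event that arrives behind it counts as late.

```python
from dataclasses import dataclass


@dataclass
class HeuristicWatermark:
    """Estimates event-time progress as (max event time seen) - (allowed lag)."""
    allowed_lag: int          # seconds of slack (hypothetical tuning knob)
    max_event_time: int = 0   # largest event timestamp observed so far

    @property
    def watermark(self) -> int:
        return self.max_event_time - self.allowed_lag

    def observe(self, event_time: int) -> bool:
        """Record an event; return True if it arrived behind the watermark."""
        is_late = event_time < self.watermark
        self.max_event_time = max(self.max_event_time, event_time)
        return is_late


def count_late(event_times, allowed_lag):
    """Count how many events a watermark with the given lag treats as late."""
    wm = HeuristicWatermark(allowed_lag)
    return sum(wm.observe(t) for t in event_times)


# Event timestamps (seconds) listed in arrival order, so they are out of order.
arrivals = [10, 12, 11, 30, 14, 29, 31, 5]
fast = count_late(arrivals, allowed_lag=0)    # aggressive watermark
slow = count_late(arrivals, allowed_lag=20)   # conservative watermark
print(fast, slow)
```

With no lag the watermark advances quickly and four of the eight events count as late. With a 20-second lag only one does, but any result that waits for the watermark is held back by those 20 seconds.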
ASKING THE RIGHT QUESTIONS
What is being computed?
Where in event time?
When in processing time?
How do refinements happen?
WHAT IS BEING COMPUTED?
scores: PCollection[KV[str, int]] = (input | beam.CombinePerKey(sum))
Element-Wise, Aggregating, Composite
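The three kinds of transform can be illustrated without Beam. This is a plain-Python sketch, not the Beam API: the helper names (`element_wise`, `aggregate_per_key`, `parse_score`, `total_scores`) and the "user,score" input format are assumptions chosen for the example.

```python
from collections import defaultdict


def element_wise(items, fn):
    """Element-wise: each input is transformed independently of the others."""
    return [fn(item) for item in items]


def aggregate_per_key(pairs, combine):
    """Aggregating: values sharing a key are gathered, then combined."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: combine(values) for key, values in grouped.items()}


def parse_score(line):
    """Turn 'user,score' into a (user, int(score)) pair."""
    user, score = line.split(",")
    return user, int(score)


def total_scores(lines):
    """Composite: a reusable unit built from the two kinds above."""
    return aggregate_per_key(element_wise(lines, parse_score), sum)


scores = total_scores(["alice,3", "bob,4", "alice,5"])
print(scores)
```

In Beam terms, the element-wise step roughly corresponds to a Map-style transform, the aggregation to CombinePerKey(sum), and `total_scores` to a composite transform that packages both.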
WHERE IN EVENT TIME?
scores: PCollection[KV[str, int]] = (input | beam.WindowInto(FixedWindows(2 * 60)) | beam.CombinePerKey(sum))
The choice of windowing is retained through subsequent aggregations.
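As a rough illustration of what fixed windowing followed by a per-key sum computes, here is a plain-Python sketch (not the Beam API). The function names and the sample events are assumptions. Each event is assigned to the two-minute window containing its event timestamp, and scores are summed per key and window.

```python
from collections import defaultdict

WINDOW_SIZE = 2 * 60  # seconds, matching FixedWindows(2 * 60) on the slide


def window_start(event_time, size=WINDOW_SIZE):
    """Start of the fixed window [start, start + size) containing event_time."""
    return event_time - (event_time % size)


def sum_per_key_and_window(events, size=WINDOW_SIZE):
    """Sum scores per (key, window start); events are (key, score, event_time)."""
    totals = defaultdict(int)
    for key, score, event_time in events:
        totals[(key, window_start(event_time, size))] += score
    return dict(totals)


events = [
    ("alice", 3, 10),
    ("bob", 4, 70),
    ("alice", 5, 119),   # still in the first window [0, 120)
    ("alice", 2, 120),   # first element of the second window [120, 240)
    ("bob", 1, 250),
]
windowed = sum_per_key_and_window(events)
print(windowed)
```

Grouping depends only on event timestamps, so the result is the same no matter in which order the events arrive.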
WHEN IN PROCESSING TIME?
scores: PCollection[KV[str, int]] = (input | beam.WindowInto(FixedWindows(2 * 60), trigger=trigger.AfterWatermark()) | beam.CombinePerKey(sum))
Triggers control when results are emitted.
Triggers are often relative to the watermark.
HOW DO REFINEMENTS HAPPEN?
scores: PCollection[KV[str, int]] = (input | beam.WindowInto(FixedWindows(2 * 60),
    trigger=trigger.AfterWatermark(early=trigger.AfterProcessingTime(1 * 60), late=trigger.AfterCount(1)),
    accumulation_mode=trigger.AccumulationMode.ACCUMULATING) | beam.CombinePerKey(sum))
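The effect of the accumulation mode can be sketched in plain Python (not the Beam API). The function `fire_panes` and the sample batches are assumptions: each inner list holds the scores that arrived for one key and window between two consecutive trigger firings (here an early, an on-time, and a late firing).

```python
def fire_panes(batches, accumulating):
    """Return the sum emitted at each trigger firing.

    accumulating=True:  each pane refines the previous one (running total).
    accumulating=False: each pane covers only the elements since the last firing.
    """
    panes = []
    running = 0
    for batch in batches:
        if accumulating:
            running += sum(batch)
            panes.append(running)
        else:
            panes.append(sum(batch))
    return panes


batches = [[3, 4], [5], [2]]  # early firing, on-time firing, late firing
accumulated = fire_panes(batches, accumulating=True)
discarded = fire_panes(batches, accumulating=False)
print(accumulated, discarded)
```

In accumulating mode the last pane is the complete answer, so a consumer can simply overwrite earlier results. In the discarding alternative the consumer has to add the panes together itself.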
CUSTOMIZING WHAT WHERE WHEN HOW
Classic Batch
Windowed Batch
Streaming
Streaming + Accumulation
For more information see https://cloud.google.com/dataflow/examples/gaming-example
Python SDK
SIMPLE PIPELINE
import re
import apache_beam as beam

# Pipeline construction is deferred.
with beam.Pipeline() as p:
    # lines is a PCollection, a deferred collection of all lines in the specified files.
    lines = p | beam.io.ReadFromText('/path/to/files')
    # The "pipe" operator applies a transformation (on the right) to a PCollection, reminiscent of bash.
    # This will be applied to each line, resulting in a PCollection of words.
    words = lines | beam.FlatMap(lambda line: re.findall(r'\w+', line))
    # Operations can be chained:
    #   totals = (words
    #             | beam.Map(lambda w: (w, 1))
    #             | beam.CombinePerKey(sum))
    # Composite operations are easily defined; this one performs the same count.
    totals = words | beam.combiners.Count.PerElement()
    # Finally, write the results somewhere. Pipelines are DAGs in general.
    # The labels keep the two write steps distinct within the pipeline.
    totals | 'WriteAll' >> beam.io.WriteToText('/path/to/output')
    # Largest is the slide's shorthand for a top-N combiner; it is not defined here.
    (totals | beam.CombinePerKey(Largest(100))
            | 'WriteTop' >> beam.io.WriteToText('/path/to/another/output'))
# The pipeline actually executes on exiting its context.
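How a "pipe" operator can build a deferred computation is easy to show in miniature. The following is a toy plain-Python sketch, not how Beam is implemented: the classes `Deferred` and `Transform` and the list-based helpers are assumptions for illustration. Applying `|` only records a step; nothing is computed until `run()` is called.

```python
import re


class Transform:
    """Wraps a function that maps a whole list of elements to a new list."""
    def __init__(self, fn):
        self.fn = fn


class Deferred:
    """A recorded chain of transforms over a source; evaluated only by run()."""
    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def __or__(self, transform):
        # Record the step and return a new node; the original is unchanged,
        # so one node can feed several branches (a DAG).
        return Deferred(self.source, self.steps + (transform,))

    def run(self):
        data = list(self.source)
        for step in self.steps:
            data = step.fn(data)
        return data


def FlatMap(fn):
    return Transform(lambda data: [out for item in data for out in fn(item)])


def Map(fn):
    return Transform(lambda data: [fn(item) for item in data])


def CombinePerKey(combine):
    def apply(data):
        grouped = {}
        for key, value in data:
            grouped.setdefault(key, []).append(value)
        return [(key, combine(values)) for key, values in grouped.items()]
    return Transform(apply)


lines = Deferred(["to be or not", "to be"])
words = lines | FlatMap(lambda line: re.findall(r'\w+', line))
totals = words | Map(lambda w: (w, 1)) | CombinePerKey(sum)  # nothing has run yet
result = totals.run()
print(result)
```

Because each `|` returns a new node, `words` can be reused as the input of a second branch without affecting `totals`, which is the sense in which pipelines are DAGs rather than straight chains.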
SIMPLE BATCH PIPELINE
import re
import apache_beam as beam

with beam.Pipeline() as p:
    lines = p | beam.io.ReadFromText('/path/to/files')
    words = lines | beam.FlatMap(lambda line: re.findall(r'\w+', line))
    totals = words | beam.combiners.Count.PerElement()
    totals | 'WriteAll' >> beam.io.WriteToText('/path/to/output')
    # Largest is the slide's shorthand for a top-N combiner; it is not defined here.
    (totals | beam.CombinePerKey(Largest(100))
            | 'WriteTop' >> beam.io.WriteToText('/path/to/another/output'))
WHAT ABOUT STREAMING?
SIMPLE STREAMING PIPELINE
import re
import apache_beam as beam

with beam.Pipeline() as p:
    # Only the source changes: read an unbounded stream and window it.
    # ReadFromPubSub yields bytes by default, so decode before applying the regex.
    lines = (p | beam.io.ReadFromPubSub(...)
               | beam.Map(lambda msg: msg.decode('utf-8'))
               | beam.WindowInto(...))
    words = lines | beam.FlatMap(lambda line: re.findall(r'\w+', line))
    totals = words | beam.combiners.Count.PerElement()
    totals | 'WriteAll' >> beam.io.WriteToText('/path/to/output')
    # Largest is the slide's shorthand for a top-N combiner; it is not defined here.
    (totals | beam.CombinePerKey(Largest(100))
            | 'WriteTop' >> beam.io.WriteToText('/path/to/another/output'))
Portability & Vision
Google Cloud Dataflow
WHAT DOES APACHE BEAM PROVIDE?
The Beam Model: What / Where / When / How
API (SDKs) for writing Beam pipelines
Runners for Existing Distributed Processing Backends:
Apache Apex
Apache Flink
Apache Gearpump
Apache Spark
Google Cloud Dataflow
InProcess / Local
Pipeline SDK
User-facing SDK that defines a language-specific API for the end user to specify the pipeline computation DAG. SDKs shown: Beam Java, Beam Python, other languages.
Runner API
Runner- and language-agnostic representation of the user's pipeline graph. It contains only nodes of Beam model primitives that all runners understand, which maintains portability across runners.
SDK Harness
Docker-based execution environments, shared by all runners, that run the user code in a consistent environment.
Fn API
API that the execution environments use to send and receive data and to report metrics about execution of the user code to the Runner.
Runner
Distributed processing environments that understand the Runner API graph and how to execute the Beam model primitives. Runners shown: Apache Flink, Apache Spark, Cloud Dataflow, Apache Gearpump, Apache Apex.
BEAM RUNNER CAPABILITIES
https://beam.apache.org/capability-matrix/
MORE BEAM?
Issue tracker (https://issues.apache.org/jira/projects/BEAM)
Beam website (https://beam.apache.org/)
Source code (https://github.com/apache/beam)
Developers mailing list ([email protected])
Users mailing list ([email protected])
Follow @ApacheBeam on Twitter
SUMMARY
● Beam helps you tackle big data that is:
  ○ Unbounded in volume
  ○ Out of order
  ○ Arbitrarily delayed
● The Beam model separates concerns of:
  ○ What is being computed?
  ○ Where in event time?
  ○ When in processing time?
  ○ How do refinements happen?