Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit
Uploaded by: zalando-technology
Category: Technology
Transcript of Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit
STREAM
PROCESSING WITH
APACHE FLINK IN
ZALANDO’S WORLD
OF MICROSERVICES
JAVIER LOPEZ
MIHAIL VIERU
AGENDA
● Zalando’s Microservices Architecture
● Saiki - Data Integration and Distribution at Scale
● Flink in a Microservices World
● Stream Processing Use Cases:
o Business Process Monitoring
o Continuous ETL
● Future Work
ABOUT US
Mihail Vieru, Big Data Engineer, Business Intelligence
Javier López, Big Data Engineer, Business Intelligence
One of Europe's largest online fashion retailers
15 countries
~19 million active customers
~3 billion € revenue 2015
1,500 brands
150,000+ products
11,000+ employees in Europe
ZALANDO TECHNOLOGY
1500+ TECHNOLOGISTS
Rapidly growing
international team
http://tech.zalando.com
VINTAGE ARCHITECTURE
VINTAGE BUSINESS INTELLIGENCE
Classical ETL process
[Diagram: several application stacks, each consisting of Business Logic and a Database, alongside a Data Warehouse (DWH); role labels: Dev, DBA, BI]
VINTAGE BUSINESS INTELLIGENCE
DWH Oracle
Exasol
RADICAL AGILITY
RADICAL AGILITY
AUTONOMY
MASTERY
PURPOSE
RADICAL AGILITY - AUTONOMY
Technologies Operations Teams
SUPPORTING AUTONOMY: MICROSERVICES
[Diagram: five microservices, each with its own Business Logic and Database behind a REST API]
SUPPORTING AUTONOMY: MICROSERVICES
[Diagram: Team A and Team B, each with Business Logic and a Database behind a REST API; label between them: public Internet]
Applications communicate using REST APIs
Databases hidden behind the walls of AWS VPC
SUPPORTING AUTONOMY: MICROSERVICES
[Diagram: Team A and Team B, each with Business Logic and a Database behind a REST API; label between them: public Internet]
Classical ETL process is impossible!
SUPPORTING AUTONOMY: MICROSERVICES
[Diagram: App A, App B, App C and App D, each with Business Logic and a Database behind a REST API, alongside a Business Intelligence box]
SAIKI
SAIKI DATA PLATFORM
[Diagram components: App A, App B, App C, App D; SAIKI; BI Data Warehouse]
SAIKI — DATA INTEGRATION & DISTRIBUTION
[Diagram components: App A, App B, App C, App D; REST API; SAIKI; Stream Processing via Apache Flink; Data Lake (AWS S3); Exporter; BI Data Warehouse; other targets, e.g. Forecast DB]
SAIKI — SUMMARY
[Table: BEFORE vs. AFTER, with rows Data sources Technologies, Data sources Connections, Data sources Extraction and Data Delivery; surviving cell labels: Δ, JDBC, REST]
FLINK IN A
MICROSERVICES WORLD
OPPORTUNITIES FOR NEXT GEN BI
Cloud Computing
- Distributed ETL
- Scale
Access to Real Time Data
- All teams publish data to central event bus
Hub for Data Teams
- Data Lake provides distributed access and fine-grained security
- Data can be transformed (aggregated, joined, etc.) before delivering it to data teams
Semi-Structured Data
- “General-purpose data processing engines like Flink or Spark let you define own data types and functions.” (Fabian Hueske, dataArtisans)
THE RIGHT FIT
STREAM PROCESSING
THE RIGHT FIT — STREAM PROCESSING ENGINE
Candidates:
Storm and Samza were ruled out because we also require batch processing.
SPARK VS. FLINK DIFFERENCES
Feature                     | Apache Spark 1.5.2                  | Apache Flink 0.10.1
Processing mode             | micro-batching                      | tuple at a time
Temporal processing support | processing time                     | event time, ingestion time, processing time
Latency                     | seconds                             | sub-second
Back pressure handling      | manual configuration                | implicit, through system architecture
State access                | full state scan for each microbatch | value lookup by key
Operator library            | neutral                             | ++ (split, windowByCount..)
Support                     | neutral                             | ++ (mailing list, direct contact & support from data Artisans)
SPARK VS. FLINK PERFORMANCE
source: Benchmarking Streaming Computation Engines at Yahoo!
APACHE FLINK
• true stream processing framework
• process events at a consistently high rate with low
latency
• scalable
• great community and on-site support from Berlin/
Europe
• university graduates with Flink skills
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
FLINK ON AWS - OUR APPLIANCE
MASTER ELB
• EC2, Docker: Flink Master
• EC2, Docker: Flink Shadow Master
WORKERS ELB
• EC2, Docker: Flink Worker
• EC2, Docker: Flink Worker
• EC2, Docker: Flink Worker
USE CASES
BUSINESS PROCESS
MONITORING
BUSINESS PROCESS
A business process is, in its simplest form, a chain of correlated events:
start event: ORDER_CREATED
completion event: ALL_PARCELS_SHIPPED
Business events from the whole Zalando platform flow through Saiki, which gives us the opportunity to process those streams in near real time.
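As an illustration only, the idea of a business process as a pair of correlated events can be sketched in plain Python (not Flink). The field names order_id and timestamp are assumptions made for this example; the slides do not show the actual event schema.

```python
from datetime import datetime, timedelta

def process_durations(events, start_type="ORDER_CREATED",
                      end_type="ALL_PARCELS_SHIPPED"):
    """Pair start and completion events by order ID; return elapsed time per order."""
    started = {}    # order_id -> timestamp of the start event
    durations = {}  # order_id -> time between start and completion
    for event in sorted(events, key=lambda e: e["timestamp"]):
        order_id = event["order_id"]  # hypothetical correlation key
        if event["type"] == start_type:
            started[order_id] = event["timestamp"]
        elif event["type"] == end_type and order_id in started:
            durations[order_id] = event["timestamp"] - started.pop(order_id)
    return durations

t0 = datetime(2016, 1, 1, 12, 0)
events = [
    {"type": "ORDER_CREATED", "order_id": "A", "timestamp": t0},
    {"type": "ORDER_CREATED", "order_id": "B", "timestamp": t0 + timedelta(minutes=5)},
    {"type": "ALL_PARCELS_SHIPPED", "order_id": "A", "timestamp": t0 + timedelta(hours=3)},
]
durations = process_durations(events)
```

Order B has no completion event yet, so it does not appear in the result; a streaming job would keep it in state until the completion event arrives.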
REAL-TIME BUSINESS PROCESS MONITORING
• Check if business processes in the Zalando platform work
• Analyze data on the fly:
o Order velocities
o Delivery velocities
o Control SLAs of correlated events, e.g. parcel sent out
after order
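A minimal sketch of the SLA control named in the list above, again in plain Python rather than Flink. The 48-hour SLA, the function name and the dictionary inputs are assumptions chosen for illustration, not values from the talk.

```python
from datetime import datetime, timedelta

def sla_violations(created_at, shipped_at, now, sla):
    """Return sorted order IDs that were shipped late or are overdue at `now`."""
    late = []
    for order_id, created in created_at.items():
        deadline = created + sla
        shipped = shipped_at.get(order_id)
        if shipped is None:
            # Not shipped yet: a violation only once the deadline has passed.
            if now > deadline:
                late.append(order_id)
        elif shipped > deadline:
            late.append(order_id)
    return sorted(late)

t0 = datetime(2016, 1, 1)
created_at = {"A": t0, "B": t0, "C": t0 + timedelta(hours=47)}
shipped_at = {"A": t0 + timedelta(hours=24)}
violations = sla_violations(
    created_at, shipped_at,
    now=t0 + timedelta(hours=50),
    sla=timedelta(hours=48),  # hypothetical SLA
)
```

Here order A was shipped in time, order B is overdue, and order C still has time left.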
ARCHITECTURE BPM
[Diagram components: Operational Systems (App A, App B, App C); Nakadi Event Bus; Kafka2Kafka; Unified Log; Stream Processing; Cfg Service; Alert Svc; UI; Elasticsearch; boundary labels: PUBLIC INTERNET, OAUTH; group label: Saiki BPM]
HOW WE USE FLINK IN BPM
• 1000+ Event Types; 1 Event Type -> 1 Kafka topic
• Analyze processes with correlated event types (Join &
Union)
• Enrich data based on business rules
• Sliding Windows (1min to 48hrs) for Platform Snapshots
• State for alert metadata
• Generation and processing of Complex Events (CEP lib)
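To make the sliding-window bullet above concrete, here is a small plain-Python sketch of how events are assigned to overlapping windows. It does not use the Flink API. Aligning window starts to multiples of the slide is an assumption of this sketch, and the timestamps are plain seconds.

```python
from collections import Counter

def sliding_window_counts(timestamps, size, slide):
    """Count events per window [start, start + size); starts are multiples of `slide`."""
    counts = Counter()
    for ts in timestamps:
        # Latest window start that is not after the event.
        start = ts - (ts % slide)
        # Walk back over every earlier window that still covers the event.
        while start > ts - size:
            counts[start] += 1
            start -= slide
    return dict(sorted(counts.items()))

# Three events, 120-second windows sliding every 60 seconds.
window_counts = sliding_window_counts([125, 185, 190], size=120, slide=60)
```

Each event lands in size / slide windows (two here), which is why long windows with short slides, such as 48 hours sliding by minutes, multiply the amount of state a job has to hold.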
STREAMING ETL
Extract Transform Load (ETL)
Traditional ETL process:
• Batch processing
• No real time
• ETL tools
• Heavy processing on the storage side
WHAT CHANGED WITH RADICAL AGILITY?
• Data comes in a semi-structured format (JSON payload)
• Data is distributed in separate Kafka topics
• There are peak times during which the data flow increases severalfold
• The number of data sources increased severalfold
ARCHITECTURE STREAMING ETL
[Diagram components: Operational Systems (App A, App B, App C); Nakadi Event Bus; Kafka2Kafka; Unified Log; Stream Processing; Exporter; Importer; Oracle DWH; group label: Saiki Streaming ETL]
HOW WE (WOULD) USE FLINK IN STREAMING ETL
• Transformation of complex payloads into simple ones for
easier consumption in Oracle DWH
• Combine several topics based on Business Rules (Union,
Join)
• Pre-Aggregate data to improve performance in the
generation of reports (Windows, State)
• Data cleansing
• Data validation
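The first and last two bullets above (simplifying complex payloads, cleansing, validation) could look roughly like the following plain-Python sketch. The payload shape and the required field names are invented for this example; the real event schemas are not shown in the slides.

```python
import json

def flatten(payload, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

def to_row(raw_json, required=("order_id", "customer_country")):
    """Turn a JSON event into a flat row, or return None if validation fails."""
    row = flatten(json.loads(raw_json))
    # Cleansing: trim stray whitespace from string values.
    row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
    # Validation: every required (hypothetical) field must be present and non-empty.
    if any(not row.get(field) for field in required):
        return None
    return row

good = to_row('{"order_id": " 42 ", "customer": {"country": "DE ", "segment": {"tier": "gold"}}}')
bad = to_row('{"customer": {"country": "DE"}}')
```

A flat row like this maps directly onto the columns of a relational table, which is the point of simplifying the payload before it reaches the DWH.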
FUTURE USE CASES
COMPLEX EVENT PROCESSING FOR BPM
Continuing the example business process:
• Multiple PARCEL_SHIPPED events per order
• Generate complex event ALL_PARCELS_SHIPPED,
when all PARCEL_SHIPPED events received
(CEP lib, State)
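The slide names Flink's CEP library and state for this. The sketch below is not the CEP library; it is a plain-Python stand-in that shows only the state-keeping logic. It assumes that ORDER_CREATED carries the expected number of parcels in a parcel_count field and that each PARCEL_SHIPPED has a parcel_id; both field names are hypothetical.

```python
def detect_all_parcels_shipped(events):
    """Emit ALL_PARCELS_SHIPPED once every announced parcel of an order has shipped."""
    expected = {}  # order_id -> parcel count announced by ORDER_CREATED
    shipped = {}   # order_id -> set of parcel IDs seen so far
    complex_events = []
    for event in events:
        order_id = event["order_id"]
        if event["type"] == "ORDER_CREATED":
            expected[order_id] = event["parcel_count"]  # hypothetical field
            shipped.setdefault(order_id, set())
        elif event["type"] == "PARCEL_SHIPPED":
            # A set makes duplicate deliveries of the same event harmless.
            shipped.setdefault(order_id, set()).add(event["parcel_id"])
        else:
            continue
        if order_id in expected and len(shipped[order_id]) == expected[order_id]:
            complex_events.append({"type": "ALL_PARCELS_SHIPPED", "order_id": order_id})
            # Clear per-order state once the complex event has been emitted.
            del expected[order_id], shipped[order_id]
    return complex_events

stream = [
    {"type": "ORDER_CREATED", "order_id": "A", "parcel_count": 2},
    {"type": "PARCEL_SHIPPED", "order_id": "A", "parcel_id": "p1"},
    {"type": "ORDER_CREATED", "order_id": "B", "parcel_count": 1},
    {"type": "PARCEL_SHIPPED", "order_id": "A", "parcel_id": "p1"},  # duplicate
    {"type": "PARCEL_SHIPPED", "order_id": "A", "parcel_id": "p2"},
]
generated = detect_all_parcels_shipped(stream)
```

Order B stays in state because its single parcel has not shipped yet.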
DEPLOYMENTS FROM OTHER BI TEAMS
Flink Jobs from other BI Teams
Requirements:
• manage and control deployments
• isolation of data flows
o prevent different jobs from writing to the same sink
• resource management in Flink
o share cluster resources among concurrently running jobs
StreamSQL would significantly lower the entry barrier
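One possible way to enforce the sink isolation requirement listed above is a check at deployment time. The class below is a hypothetical sketch, not part of Flink or of Saiki, and the job and sink names are made up.

```python
class SinkRegistry:
    """Tracks which job owns which sink so that two jobs never share one."""

    def __init__(self):
        self._owners = {}  # sink name -> owning job name

    def register(self, job, sink):
        owner = self._owners.get(sink)
        if owner is not None and owner != job:
            raise ValueError(f"sink {sink!r} is already owned by job {owner!r}")
        self._owners[sink] = job

    def release(self, job):
        """Free all sinks held by a job, e.g. when it is undeployed."""
        self._owners = {s: j for s, j in self._owners.items() if j != job}

registry = SinkRegistry()
registry.register("bpm-alerts", "elasticsearch/alerts")
try:
    registry.register("etl-orders", "elasticsearch/alerts")
    conflict = None
except ValueError as error:
    conflict = str(error)
```

A deployment service could run such a check before submitting a job to the cluster and reject the submission on conflict.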
OTHER FUTURE TOPICS
• New use cases for Real Time Analytics/ BI
o Sales monitoring
o Price monitoring
• Fraud detection for payments (evaluation)
• Contact customer according to variable event pattern
(evaluation)
CONCLUSION
Flink proved to be the right fit for our current stream
processing use cases. It enables us to build Zalando’s Next
Gen BI platform.
https://tech.zalando.de/blog/?tags=Saiki
THANK YOU