Flink in Zalando's World of Microservices
Posted on 10-Feb-2017
Agenda
Zalando’s Microservices Architecture
Saiki - Data Integration and Distribution at Scale
Flink in a Microservices World
Stream Processing Use Cases:
Business Process Monitoring
Continuous ETL
Future Work
About us
Mihail Vieru, Big Data Engineer, Business Intelligence
Javier López, Big Data Engineer, Business Intelligence
One of Europe's largest online fashion retailers:
• 15 countries
• 4 fulfillment centers
• ~18 million active customers
• 2.9 billion € revenue in 2015
• 150,000+ products
• 10,000+ employees
Zalando Technology
1100+ TECHNOLOGISTS
Rapidly growing
international team
http://tech.zalando.com
VINTAGE ARCHITECTURE
Vintage Business Intelligence: Classical ETL process
[Diagram: several applications (Business Logic + Database, on the Dev side) feed via classical ETL into the Data Warehouse (DWH), operated by DBAs and consumed by BI]
Vintage Business Intelligence
Very stable architecture that is still in use in the oldest components
Classical ETL process:
• Use-case specific
• Usually outputs data into a Data Warehouse
  • well structured
  • easy to use by the end user (SQL)
RADICAL AGILITY
AUTONOMY
MASTERY
PURPOSE
Radical Agility
Autonomy
Autonomous teams
• can choose their own technology stack
  • including the persistence layer
• are responsible for operations
• should use isolated AWS accounts
Supporting autonomy — Microservices
[Diagram: many microservices, each with its own Business Logic, Database and REST API]
Supporting autonomy — Microservices
[Diagram: Team A's and Team B's services (Business Logic + Database), each exposing a REST API over the public Internet]
Applications communicate using REST APIs.
Databases are hidden behind the walls of an AWS VPC.
Supporting autonomy — Microservices
[Diagram: Team A's and Team B's services communicating via REST APIs over the public Internet]
The classical ETL process is impossible!
Supporting autonomy — Microservices
[Diagram: Apps A-D, each with its own Business Logic, Database and REST API]
Supporting autonomy — Microservices
[Diagram: Business Intelligence alongside Apps A-D, each app with its own Business Logic, Database and REST API]
Supporting autonomy — Microservices
[Diagram: Apps A-D on one side, BI's Data Warehouse on the other, joined by a question mark]
SAIKI
Saiki Data Platform
[Diagram: Apps A-D feed the Saiki Data Platform, which feeds BI's Data Warehouse]
Saiki — Data Integration & Distribution
[Diagram: Buku added between the apps and the Data Warehouse]
[Diagram: events flow from the apps through Buku into AWS S3; Tukang loads them via a REST API into the Data Warehouse]
Saiki — Data Integration & Distribution

Old Load Process -> New Load Process:
• relied on Delta Loads -> relies on an Event Stream
• JDBC connections -> RESTful HTTPS connections
• data quality could be controlled by BI independently -> trust in the correctness of data lies with the delivery teams
• PostgreSQL-dependent -> independent of the source technology stack
• N-to-1 data stream -> N-to-M streams, no single data sink
Saiki — Data Integration & Distribution
[Diagram build-up: besides the Data Warehouse, other sinks such as a Forecast DB now consume for BI; Stream Processing via Apache Flink and a Data Lake are added alongside Buku, Tukang, the REST API and AWS S3]
FLINK IN A
MICROSERVICES
WORLD
Current state in BI
• Centralized ETL tools
(Pentaho DI/Kettle and Oracle Stored Procedures)
• Reporting data does not arrive in real time
• Data to data science & analytical teams is provided using
Exasol DB
• All data comes from SQL databases
Opportunities for Next Gen BI
• Cloud Computing
• Distributed ETL
• Scale
• Access to real time data
• All teams publishing data to central bus (Kafka)
Opportunities for Next Gen BI
• Hub for Data teams
• Data Lake provides distributed access and fine
grained security
• Data can be transformed (aggregated, joined, etc.)
before delivering it to data teams
• Unstructured data
• “General-purpose data processing engines like Flink or
Spark let you define own data types and functions.” -
Fabian Hueske
The right fit
Stream Processing
The right fit - Data processing engine (I)

Feature | Apache Spark 1.5.2 | Apache Flink 0.10.1
processing mode | micro-batching | tuple at a time
temporal processing support | processing time | event time, ingestion time, processing time
batch processing | yes | yes
latency | seconds | sub-second
back pressure handling | through manual configuration | implicit, through system architecture
state storage | distributed dataset in memory | distributed key/value store
state access | full state scan for each micro-batch | value lookup by key
state checkpointing | yes | yes
high availability mode | yes | yes
The right fit - Data processing engine (II)

Feature | Apache Spark 1.5.2 | Apache Flink 0.10.1
event processing guarantee | exactly once | exactly once
Apache Kafka connector | yes | yes
Amazon S3 connector | yes | yes
operator library | neutral | ++ (split, windowByCount, ...)
language support | Java, Scala, Python | Java, Scala, Python
deployment | standalone, YARN, Mesos | standalone, YARN
support | neutral | ++ (mailing list, direct contact & support from data Artisans)
documentation | + | neutral
maturity | ++ | neutral
Apache Flink in the Saiki Data Platform
• true stream processing framework
• process events at a consistently high rate with low latency
• scalable
• great community and on-site support from Berlin/Europe
• university graduates with Flink skills
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
Flink in AWS - Our appliance
MASTER ELB
• EC2 + Docker: Flink Master
• EC2 + Docker: Flink Shadow Master

WORKERS ELB
• EC2 + Docker: Flink Worker (×3)
USE CASES
BUSINESS PROCESS
MONITORING
Business Process
A business process is, in its simplest form, a chain of correlated events:

start event ORDER_CREATED -> completion event ALL_SHIPMENTS_DONE

Business events from the whole Zalando platform flow through Saiki => an opportunity to process those streams in near real-time.
Near Real-Time Business Process Monitoring
• Check whether the Zalando platform is technically working
• 1000+ Event Types
• Analyze data on the fly:
• Order velocities
• Delivery velocities
• Control SLAs of correlated events, e.g. parcel sent out
after order
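The SLA control over correlated events can be pictured with a small sketch. This is plain Python modelling only the correlation logic, not the actual Flink job; the event fields (`order_id`, `event_type`, `timestamp`) and the 48-hour SLA are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical SLA: all shipments must be done within 48h of the order.
SLA = timedelta(hours=48)

def check_sla(events):
    """Correlate start/completion events per order and flag SLA breaches."""
    started = {}   # order_id -> ORDER_CREATED timestamp
    breaches = []
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"])
        if e["event_type"] == "ORDER_CREATED":
            started[e["order_id"]] = ts
        elif e["event_type"] == "ALL_SHIPMENTS_DONE":
            start = started.pop(e["order_id"], None)
            if start is not None and ts - start > SLA:
                breaches.append(e["order_id"])
    return breaches
```

In the real pipeline this keyed lookup happens continuously over the Kafka streams rather than over a finished list of events.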
Architecture BPM
[Diagram: operational systems (Apps A-C) publish via OAuth over the public Internet to the Nakadi Event Bus; Kafka2Kafka copies events into the Unified Log; Saiki BPM (stream processing, Tokoh) consumes them, reads its configuration from a Cfg Service, writes results to Elasticsearch and alerts via ZMON]
How we use Flink in BPM
• 1 Event Type -> 1 Kafka topic
• Define a granularity level of 1 minute (TumblingWindows)
• Analyze processes with multiple event types (Join & Union)
• Generation and processing of Complex Events (CEP & State)
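The 1-minute granularity amounts to bucketing events by minute, which is what a tumbling window with a count aggregation does. A plain-Python sketch of that bucketing, not Flink's actual windowing API; the event shape is assumed:

```python
from collections import Counter
from datetime import datetime

def tumbling_minute_counts(events):
    """Count events per (event_type, minute) bucket — the effect of a
    1-minute tumbling window with a count aggregation."""
    counts = Counter()
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"])
        bucket = ts.replace(second=0, microsecond=0)  # truncate to the minute
        counts[(e["event_type"], bucket)] += 1
    return counts
```

In Flink the same effect comes from keying the stream by event type and applying a one-minute tumbling window, with the engine closing each bucket as the watermark passes.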
CONTINUOUS ETL
Extract Transform Load (ETL)
Traditional ETL process:
• Batch processing
• No real-time data
• ETL tools
• Heavy processing on the storage side
What changed with Radical Agility?
• Data comes in a semi-structured format (JSON payload)
• Data is distributed across separate Kafka topics
• There are peak times during which the data flow increases by several factors
• The number of data sources has increased by several factors
Architecture Continuous ETL
[Diagram: operational systems (Apps A-C) publish to the Nakadi Event Bus; Kafka2Kafka copies events into the Unified Log; Saiki Continuous ETL (stream processing) feeds Tukang and Orang, which load the Oracle DWH]
How we (would) use Flink in Continuous ETL
• Transformation of complex payloads into simple ones for easier consumption in the Oracle DWH
• Combine several topics based on Business Rules (Union, Join)
• Pre-aggregate data to improve performance in the generation of reports (Windows, State)
• Data cleansing
• Data validation
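The first bullet, turning a complex payload into a simple row, might look like this in outline. Plain Python with an invented payload shape; the real transformation would run inside the Flink job before Tukang loads the result:

```python
import json

def flatten_order(payload):
    """Flatten a nested order event into a flat row for relational loading."""
    doc = json.loads(payload)
    return {
        "order_id": doc["order_id"],
        "customer_id": doc["customer"]["id"],
        "country": doc["customer"]["address"]["country"],
        "item_count": len(doc["items"]),
        "total": sum(i["price"] for i in doc["items"]),
    }
```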
FUTURE WORK
Complex Event Processing for BPM
Continuing the example business process:
• multiple PARCEL_SHIPPED events per order
• generate the complex event ALL_SHIPMENTS_DONE once all PARCEL_SHIPPED events have been received (CEP lib, State)
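The complex-event generation can be sketched as keyed state counting shipments per order. Plain Python modelling the state logic only (the real job would use Flink's CEP library and keyed state); the `expected_parcels` field on the event is an assumption:

```python
def shipment_tracker():
    """Emit ALL_SHIPMENTS_DONE once every expected PARCEL_SHIPPED event
    for an order has arrived (state keyed by order_id)."""
    state = {}  # order_id -> (shipped_so_far, expected)

    def on_parcel_shipped(event):
        order, expected = event["order_id"], event["expected_parcels"]
        shipped, _ = state.get(order, (0, expected))
        shipped += 1
        if shipped == expected:
            state.pop(order, None)  # clear state for the completed order
            return {"order_id": order, "event_type": "ALL_SHIPMENTS_DONE"}
        state[order] = (shipped, expected)
        return None  # still waiting for more parcels

    return on_parcel_shipped
```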
Flink Jobs from other BI Teams
Requirements:
• manage and control deployments
• isolation of data flows
  • prevent different jobs from writing to the same sink
• resource management in Flink
  • share cluster resources among concurrently running jobs
StreamSQL would significantly lower the entry barrier.
Replace Kafka2Kafka
• Python app
  • extracts events from the Nakadi Event Bus via its REST API
  • writes them to our Kafka cluster
Idea: replace the Python app with Flink jobs, which would write directly to Saiki's Kafka cluster (under discussion).
Other Future Topics
• CD integration for Flink appliance
• Monitoring data flows via heartbeats
• Flink <-> Kafka
• Flink <-> Elasticsearch
• New use cases for real-time analytics / BI
• Sales monitoring
THANK YOU
Open Source @ZalandoTech
https://zalando.github.io/
https://tech.zalando.de/blog
https://github.com/zalando/saiki/wiki
QUESTIONS?