Batch Infrastructure
[Diagram components: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Cluster (HDFS, Hive, Yarn), Silver Cluster (HDFS, Hive, Yarn), Spark Cluster (Spark), ReAir, Airflow Scheduling, S3, Presto Cluster, AirPal, Caravel, Tableau]
Liyin Tang and Jingwei Lu
Streaming at Airbnb
[Diagram components: Event Logging, MySQL BINLOG, Spinal tap, Kafka, Spark Streaming, HBase, Cluster (HDFS, Hive, Yarn), Presto Cluster, Datadog, Druid]
Stateful Computation
[Diagram components: Source, DStream, RDD, DF, State Storage, Sink1, Sink2, …, Sink N]
Multiple Streams
[Diagram components: several Sources, each read as a DStream; Align by Time; DataFrames and State; Process A … Process N, each with a DataFrame and Sink1, Sink2, Sink3, …, SinkN]
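The "Align by Time" step is only a label on the slide, so the following is a minimal sketch of one way such alignment could work, not AirStream's implementation. It assumes each stream delivers `(epoch_seconds, payload)` pairs and that alignment means grouping events into fixed windows; the function name `align_by_time`, the stream names, and the window size are all hypothetical.

```python
from collections import defaultdict


def align_by_time(streams, window_seconds):
    """Bucket events from several streams into shared fixed-size windows.

    `streams` maps a stream name to a list of (epoch_seconds, payload) pairs.
    Returns {window_start: {stream_name: [payload, ...]}} ordered by window.
    """
    aligned = defaultdict(lambda: {name: [] for name in streams})
    for name, events in streams.items():
        for ts, payload in events:
            # Round the timestamp down to the start of its window.
            window_start = ts - (ts % window_seconds)
            aligned[window_start][name].append(payload)
    return dict(sorted(aligned.items()))


# Hypothetical input: two streams with events in two 30-second windows.
streams = {
    "assignments": [(100, "a1"), (125, "a2")],
    "bookings": [(105, "b1"), (131, "b2")],
}
windows = align_by_time(streams, window_seconds=30)
```

Each window then holds one slice per stream, which is the shape a downstream process would need in order to treat the slices as tables for the same time range.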
Streaming + Batch
[Diagram components: a streaming path (DStream, Align by Time, DataFrame, State) and a batch path (DataFrame), each feeding Process A with Sink1, Sink2, Sink3, …, SinkN]
AirStream Architecture
Sources: Stream #1 … Stream #N, Hive Tables, HBase Tables
Virtual Table Views for Computation
Computation: Spark SQL, Customized Computation
Sinks: HBase, Services, Streaming Sources, Druid, …
Simple Config
Same Computation for Batch processing
State Store
• Merge changes
• Provide fast lookup
• Fast persistent storage across streaming and batch jobs
Why HBase
Rich Functionalities
Rich Integration with Hadoop EcoSystem
Easy Management
Strong Community
Reliable and Scalable
HBase State Store Operators in AirStream
Full Table Scan
Simple Aggregation
Bulk Upload
Key/Prefix Lookup
Update
Key Space Design
• Hash partition key space for load balance
• Composite key for K -> V
• Support full key lookup
• Prefix lookup supported for all keys used in hash function
Row key layout: Hash | key1 | key2 | key3 (hash based on key prefix)
Lookup based on key prefix: Hash | key1 | key2
Example lookup: key1 = ‘value1’ and key2 = ‘value2’
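The key layout above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AirStream code: the bucket count, the MD5 hash, the `|` separator, and the function names are hypothetical. What it does take from the slide is that the hash is computed only from the key prefix (key1, key2), so a prefix lookup can recompute the same hash without knowing key3.

```python
import hashlib

NUM_BUCKETS = 16  # assumed number of hash partitions


def salt_bucket(prefix_values):
    """Stable bucket number derived only from the key prefix."""
    joined = "\x00".join(prefix_values).encode("utf-8")
    digest = hashlib.md5(joined).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def row_key(key1, key2, key3):
    """Full composite key: <hash(key1, key2)>|key1|key2|key3."""
    bucket = salt_bucket([key1, key2])
    return "|".join([f"{bucket:02d}", key1, key2, key3])


def scan_prefix(key1, key2):
    """Prefix to scan for `key1 = ... and key2 = ...`."""
    bucket = salt_bucket([key1, key2])
    return "|".join([f"{bucket:02d}", key1, key2]) + "|"


full_key = row_key("value1", "value2", "value3")
prefix = scan_prefix("value1", "value2")
```

Because key3 does not feed the hash, every row sharing (key1, key2) lands in the same bucket and sorts contiguously, which is what makes the prefix scan possible. A production key would also need an encoding that stays unambiguous when a key component contains the separator.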
Write Performance
• Partition based on key before write
• Use bulk upload for large-volume updates
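A minimal sketch of the "partition based on key before write" idea, assuming the writer knows the sorted region start keys of the target table. The function name and the sample split points are hypothetical; a real job would obtain the region boundaries from HBase and would typically do this grouping as a Spark repartition.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_for_write(records, split_points):
    """Group (row_key, value) pairs by target region and sort each group.

    `split_points` is the sorted list of region start keys, excluding the
    first region's empty start key. Region i covers [start_i, start_{i+1}).
    """
    partitions = defaultdict(list)
    for key, value in records:
        # Number of split points <= key gives the index of the owning region.
        region = bisect_right(split_points, key)
        partitions[region].append((key, value))
    return {region: sorted(rows) for region, rows in sorted(partitions.items())}


# Hypothetical records and region boundaries.
records = [("b1", "x"), ("a9", "y"), ("c0", "z"), ("b0", "w")]
batches = partition_for_write(records, split_points=["b", "c"])
```

Each resulting batch targets a single region and is already in key order, which is the precondition for writing sorted files in a bulk upload rather than issuing individual puts.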
Case Study
Experiment realtime feedback
[Diagram: two jobs (job 1, job 2) defined in one AirStream config. Experiment Assignment Events update HBase (with TTL); Booking Events look up HBase; outputs go to Druid and Datadog.]
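The update and lookup flow in this case study can be sketched with an in-memory stand-in for the HBase table. This is an illustration under assumptions, not the production job: the class `TtlStateStore`, the event field names (`user_id`, `treatment`, `booking_id`, `ts`), and the one-hour TTL are hypothetical.

```python
class TtlStateStore:
    """In-memory stand-in for an HBase table whose cells expire after a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._rows = {}  # key -> (value, written_at)

    def put(self, key, value, now):
        self._rows[key] = (value, now)

    def get(self, key, now):
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at > self.ttl:
            # Expired cells are dropped, as a TTL would do in HBase.
            del self._rows[key]
            return None
        return value


def on_assignment(store, event):
    """Update path: remember which treatment the user was assigned."""
    store.put(event["user_id"], event["treatment"], event["ts"])


def on_booking(store, event):
    """Lookup path: enrich a booking with the user's treatment, if still live."""
    treatment = store.get(event["user_id"], event["ts"])
    if treatment is None:
        return None
    return {
        "user_id": event["user_id"],
        "booking_id": event["booking_id"],
        "treatment": treatment,
    }


store = TtlStateStore(ttl_seconds=3600)
on_assignment(store, {"user_id": "u1", "treatment": "B", "ts": 1000})
hit = on_booking(store, {"user_id": "u1", "booking_id": "bk1", "ts": 1500})
miss = on_booking(store, {"user_id": "u1", "booking_id": "bk2", "ts": 4601})
```

The enriched record in `hit` is the kind of row that could then be forwarded to a metrics sink; `miss` is `None` because the assignment expired before the second booking arrived.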
Realtime Ingestion on HBase
[Diagram, Data Infrastructure: Source (MySQL, Analytical Events); Ingest (Kafka, Spark Streaming); Realtime Query (HBase); Snapshot (HDFS); Batch Query (Presto/Hive/Spark)]
Access Data in HBase
Query engines: Hive, Presto, Spark SQL, Spark Streaming
Workloads: Batch Jobs, Interactive Query, Streaming
Storage: HBase, HDFS Snapshot
Table Mapping / Unified View on realtime data
Case Study 1: Events Ingestion
[Diagram components: Kafka topics; Spark executors (Executor1, …); HBase with DeDup; HDFS (Daily); Realtime; Hive; Presto; Events Partition]
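The slide does not say how the DeDup step works. One common way to get deduplication from a key-value store is to derive the row key from a unique event identifier, so that a replayed event overwrites its earlier copy instead of adding a row. The sketch below shows that idea with a dict standing in for the table; the field names `topic`, `event_id`, and `payload` are hypothetical.

```python
def event_row_key(event):
    """Row key built from fields that uniquely identify an event."""
    return f'{event["topic"]}|{event["event_id"]}'


def ingest(table, events):
    """Idempotent write: the same event always maps to the same row."""
    for event in events:
        table[event_row_key(event)] = event["payload"]
    return table


batch = [
    {"topic": "search", "event_id": "e1", "payload": "q=paris"},
    {"topic": "search", "event_id": "e2", "payload": "q=tokyo"},
]
table = {}
ingest(table, batch)
ingest(table, batch)  # a replay after a failure writes the same rows again
```

After the replay the table still holds two rows, which is why at-least-once delivery from the stream can still yield one copy of each event downstream.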
Case Study 2: Streaming DB Export
[Diagram components: RDS tables (Table1, Table2, …, TableN); Kafka topics (Spinaltap.…, Spinaltap.Table2, Spinaltap.TableN); Spark executors (Executor1, Executor2, …, Executor K); HBase regions (Region1, Region2, …, Region M); HDFS Daily Snapshot; Realtime Query]
Case Study: Streaming DB Export
Rows                               CF: Columns   Version                    Value
<ShardKey><DB_TABLE_#1><PK_a=A>    id            Fri May 19 00:33:19 2016   101
<ShardKey><DB_TABLE_#1><PK_a=A>    city          Fri May 19 00:33:19 2016   San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A>    city          Fri May 10 00:34:19 2016   New York
<ShardKey><DB_TABLE_#2><PK_a=A’>   id            Fri May 19 00:33:19 2016   1
Case Study: Streaming DB Export
Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 102), TXN 3 (Commit_TS: 103), …, TXN N (Commit_TS: N’)
Case Study: Streaming DB Export
Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 103), TXN 3 (Commit_TS: 102), …, TXN N (Commit_TS: N’)
NTP: commit timestamps no longer follow binlog order (TXN 2 has 103, TXN 3 has 102)
Case Study: Streaming DB Export
Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 103), TXN 3 (Commit_TS: 102), …, TXN N (Commit_TS: N’)
Point-in-Time Restore on TS 102
Case Study: Streaming DB Export
Rows                               CF: Columns   Version   Value
<ShardKey><DB_TABLE_#1><PK_a=A>    id            bin100    101
<ShardKey><DB_TABLE_#1><PK_a=A>    city          bin101    San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A>    city          bin102    New York
<ShardKey><DB_TABLE_#2><PK_a=A’>   id            bin100    1
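A small sketch of why the cell version in the table above is a binlog position rather than a commit timestamp. It assumes, for illustration only, that the San Francisco write is the transaction at position 101 with Commit_TS 103 and the New York write is at position 102 with Commit_TS 102; the slides do not state that pairing. The helper `latest_value` and the dict field names are hypothetical.

```python
def latest_value(cells, version_of):
    """Return the value of the cell with the highest version."""
    return max(cells, key=version_of)["value"]


# Two updates to the same `city` cell, listed in binlog order.
# Assumption: the clock stepped back between them, so commit_ts is not monotonic.
city_cells = [
    {"binlog_pos": 101, "commit_ts": 103, "value": "San Francisco"},
    {"binlog_pos": 102, "commit_ts": 102, "value": "New York"},
]

by_timestamp = latest_value(city_cells, lambda c: c["commit_ts"])
by_binlog = latest_value(city_cells, lambda c: c["binlog_pos"])
```

Versioning by timestamp picks the older write (`San Francisco`) because its clock reading is larger, while versioning by binlog position picks the write that MySQL actually applied last (`New York`).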
Case Study: Streaming DB Export
Rows                                          Version (Logical Offset)   Value
<ShardKey><DB_TABLE_#1><2016-05-23 23><100>   100                        mysql-bin.00000:100
<ShardKey><DB_TABLE_#1><2016-05-23 23><101>   101                        mysql-bin.00000:101
<ShardKey><DB_TABLE_#1><2016-05-23 23><103>   103                        mysql-bin.00000:103
<ShardKey><DB_TABLE_#1><2016-05-24 00><102>   102                        mysql-bin.00000:102
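One reading of the table above is that it is a time index: each row key carries an hour bucket and a logical offset, and the value is the binlog position. Under that assumption, which the slides do not spell out, a point-in-time restore can select the changes to replay by scanning hour buckets up to the target. The tuple layout and the function name below are hypothetical.

```python
# (hour_bucket, logical_offset, binlog_position), copied from the table above.
index = [
    ("2016-05-23 23", 100, "mysql-bin.00000:100"),
    ("2016-05-23 23", 101, "mysql-bin.00000:101"),
    ("2016-05-23 23", 103, "mysql-bin.00000:103"),
    ("2016-05-24 00", 102, "mysql-bin.00000:102"),
]


def offsets_through(index, hour_bucket):
    """Logical offsets whose hour bucket is at or before `hour_bucket`.

    "YYYY-MM-DD HH" strings compare chronologically as plain strings.
    """
    return sorted(offset for bucket, offset, _ in index if bucket <= hour_bucket)


to_replay = offsets_through(index, "2016-05-23 23")
```

Note that offset 103 is included while 102 is not: membership is decided by the time bucket, and ordering within the selected set is decided by the logical offset.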
Job Management: Scaling up
[Diagram: each Config maps to a Driver and a Streaming Job on Yarn; a Streaming Job runs Spark Job 1, Spark Job 2, …, Spark Job N concurrently. Scaling up adds more Config / Driver / Streaming Job sets.]
Job Management: Fault Tolerant
[Diagram components: Driver and Streaming Job running Spark Job 1, Spark Job 2, …, Spark Job N concurrently on Yarn; multiple Drivers, each with its own Config, on Mesos; Offset Management with Checkpoint and Rewind]
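"Checkpoint" and "Rewind" appear only as labels in the diagram. The sketch below shows one plausible shape for such an offset manager, assuming offsets are recorded per partition after each successful batch and that checkpoints arrive in increasing batch-time order. The class and method names are hypothetical and are not taken from AirStream.

```python
from bisect import bisect_right


class OffsetManager:
    """Tracks, per partition, the next offset to read after each batch."""

    def __init__(self):
        # partition -> list of (batch_time, next_offset), ascending by time
        self._checkpoints = {}

    def checkpoint(self, partition, batch_time, next_offset):
        """Record progress once a batch has been fully written to its sinks."""
        self._checkpoints.setdefault(partition, []).append((batch_time, next_offset))

    def resume_offset(self, partition):
        """Offset to start from after a restart (0 if nothing was recorded)."""
        history = self._checkpoints.get(partition)
        return history[-1][1] if history else 0

    def rewind(self, partition, to_time):
        """Discard checkpoints after `to_time` so later batches are reprocessed."""
        history = self._checkpoints.get(partition, [])
        times = [t for t, _ in history]
        keep = bisect_right(times, to_time)
        self._checkpoints[partition] = history[:keep]
        return self.resume_offset(partition)


manager = OffsetManager()
manager.checkpoint("events-0", batch_time=10, next_offset=500)
manager.checkpoint("events-0", batch_time=20, next_offset=900)
manager.checkpoint("events-0", batch_time=30, next_offset=1400)
after_restart = manager.resume_offset("events-0")
after_rewind = manager.rewind("events-0", to_time=25)
```

A restarted driver would resume from `after_restart`; an operator who needs to reprocess data written after time 25 would rewind and continue from `after_rewind`. Rewinding is only safe when the sinks tolerate replayed writes.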
Job Management: Monitoring & Alerting
[Diagram components: Driver and Streaming Job running Spark Job 1, Spark Job 2, …, Spark Job N concurrently on Yarn; AirStreamListener]
Summary
Simplify and Unify Stream Batch Pipeline
Rich Stateful Computation
Rich Integration with Hadoop EcoSystem
Easy Operation