Uber Real Time Data Analytics
![Page 1: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/1.jpg)
Real Time Data Analytics @ Uber
Ankur Bansal
Apache Big Data Europe, November 14, 2016
![Page 2: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/2.jpg)
About Me
● Sr. Software Engineer, Streaming Team @ Uber
  ○ Streaming team supports platform for real time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
  ○ Focused on scaling Kafka at Uber’s pace
● Staff Software Engineer @ eBay
  ○ Build & scale eBay’s cloud using OpenStack
● Apache Kylin: Committer, Emeritus PMC
![Page 3: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/3.jpg)
Agenda
● Real time Use Cases
● Kafka Infrastructure Deep Dive
● Our own Development:
  ○ Rest Proxy & Clients
  ○ Local Agent
  ○ uReplicator (MirrorMaker)
  ○ Chaperone (Auditing)
● Operations/Tooling
![Page 4: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/4.jpg)
Important Use Cases
![Page 5: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/5.jpg)
Real-time Price Surging
[Figure: diagram with labels Rider eyeballs, Open car information, Kafka, Stream Processing, Surge Multipliers]
![Page 6: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/6.jpg)
Real-time Machine Learning - UberEats ETD
![Page 7: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/7.jpg)
![Page 8: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/8.jpg)
● Fraud detection
● Share my ETA
● And many more ...
![Page 9: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/9.jpg)
Apache Kafka is Uber’s Lifeline
![Page 10: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/10.jpg)
Kafka ecosystem @ Uber
![Page 11: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/11.jpg)
Kafka cluster stats
● 100s of billions of messages/day
● 100s of TB/day
● Multiple data centers
![Page 12: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/12.jpg)
Kafka Infrastructure Deep Dive
![Page 13: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/13.jpg)
Requirements
● Scale to 100s Billions/day → 1 Trillion/day
● High Throughput (Scale: 100s TB → PB)
● Low Latency for most use cases (<5ms)
● Reliability - 99.99% (#Msgs Available / #Msgs Produced)
● Multi-Language Support
● Tens of thousands of simultaneous clients
● Reliable data replication across DC
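The reliability requirement above is defined as a ratio of messages available to messages produced. As a rough sketch of that arithmetic (the function names are hypothetical and not from the talk), the loss budget implied by 99.99% at the stated 1 trillion/day goal can be worked out like this:

```python
def reliability(available: int, produced: int) -> float:
    """Reliability as defined on the slide: #msgs available / #msgs produced."""
    if produced <= 0:
        raise ValueError("produced must be positive")
    return available / produced


def loss_budget(produced: int, target_bp: int = 9999) -> int:
    """Maximum messages that may go missing while still meeting the target.

    target_bp is the target in basis points (9999 means 99.99%). Integer
    math avoids floating-point rounding at trillion-message volumes.
    """
    return produced * (10_000 - target_bp) // 10_000


if __name__ == "__main__":
    per_day = 1_000_000_000_000  # the 1 trillion/day goal from the slide
    print(loss_budget(per_day))  # 100000000
```

At that volume, 99.99% still allows up to 100 million missing messages per day, which is one way to read why auditing gets its own section later in the deck.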
![Page 14: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/14.jpg)
Kafka Pipeline
[Figure: Kafka pipeline overview; surviving labels: Local Agent, uReplicator]
![Page 15: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/15.jpg)
Kafka Pipeline: Data Flow
[Figure: data flow through the pipeline, with stages numbered 1 to 8]
![Page 16: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/16.jpg)
Kafka Clusters
![Page 17: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/17.jpg)
Kafka Clusters
● Use case based clusters
  ○ Data (async, reliable)
  ○ Logging (High throughput)
  ○ Time Sensitive (Low Latency, e.g. Surge, Push notifications)
  ○ High Value Data (At-least once, Sync, e.g. Payments)
● Secondary cluster as fallback
● Aggregate clusters for all data topics
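One way to picture use-case-based clusters is a lookup from topic to cluster profile. The sketch below is illustrative only: the topic names, the `sync_produce` flag, and the registry itself are assumptions, and only the four cluster categories and their parenthetical notes come from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClusterProfile:
    name: str
    sync_produce: bool  # assumed flag: wait for the broker ack before returning
    note: str           # the characterization given on the slide


PROFILES: Dict[str, ClusterProfile] = {
    "data": ClusterProfile("data", False, "async, reliable"),
    "logging": ClusterProfile("logging", False, "high throughput"),
    "time-sensitive": ClusterProfile("time-sensitive", False, "low latency"),
    "high-value": ClusterProfile("high-value", True, "at-least once, sync"),
}

# Hypothetical topic registry; real topic names are not given in the talk.
TOPIC_TO_CLUSTER: Dict[str, str] = {
    "payments": "high-value",
    "surge-events": "time-sensitive",
    "service-logs": "logging",
}


def cluster_for(topic: str, default: str = "data") -> ClusterProfile:
    """Return the profile of the cluster a topic is routed to."""
    return PROFILES[TOPIC_TO_CLUSTER.get(topic, default)]


if __name__ == "__main__":
    print(cluster_for("payments").note)  # at-least once, sync
```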
![Page 18: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/18.jpg)
Kafka Clusters
![Page 19: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/19.jpg)
Kafka Rest Proxy
![Page 20: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/20.jpg)
Why Kafka Rest Proxy ?
● Simplified Client API
● Multi-language support (Java, Node.js, Python, Golang)
● Decouple client from Kafka broker
  ○ Thin clients = operational ease
  ○ Fewer connections to Kafka brokers
  ○ Easier future Kafka upgrades
● Enhanced Reliability
  ○ Primary & Secondary Kafka Clusters
![Page 21: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/21.jpg)
Kafka Rest Proxy: Internals
![Page 22: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/22.jpg)
Kafka Rest Proxy: Internals
![Page 23: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/23.jpg)
Kafka Rest Proxy: Internals
● Based on Confluent’s open-sourced REST Proxy
● Performance enhancements
  ○ Simple HTTP servlets on Jetty instead of Jersey
  ○ Optimized for binary payloads
  ○ Performance increase from 7K* to 45-50K QPS/box
● Caching of topic metadata
● Reliability improvements*
  ○ Support for fallback cluster
  ○ Support for multiple producers (SLA-based segregation)
● Plan to contribute back to community
*Based on benchmarking & analysis done in June 2015
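The fallback-cluster support listed above can be sketched as a wrapper that tries the primary cluster and falls back to the secondary when the primary is unreachable. This is a minimal illustration, not the proxy's actual code: `FallbackProducer`, the `SendFn` callable type, and the choice of `ConnectionError` as the failure signal are all assumptions.

```python
from typing import Callable

# Hypothetical send function: (topic, payload) -> None, raising on failure.
SendFn = Callable[[str, bytes], None]


class FallbackProducer:
    """Send to the primary cluster first; on failure, use the secondary."""

    def __init__(self, primary: SendFn, secondary: SendFn) -> None:
        self._primary = primary
        self._secondary = secondary

    def send(self, topic: str, payload: bytes) -> str:
        """Deliver one message and report which cluster accepted it."""
        try:
            self._primary(topic, payload)
            return "primary"
        except ConnectionError:
            # Primary unreachable: the secondary cluster takes the write.
            # An error from the secondary is deliberately left to propagate.
            self._secondary(topic, payload)
            return "secondary"
```

Returning the name of the cluster that accepted the write keeps the sketch easy to test; a real proxy would more likely record it as a metric.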
![Page 24: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/24.jpg)
Rest Proxy: performance (1 box)
[Figure: end-to-end latency (ms) vs. message rate (K/second) at a single node]
![Page 25: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/25.jpg)
Kafka Clusters + Rest Proxy
![Page 26: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/26.jpg)
Kafka Clients
![Page 27: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/27.jpg)
Client Libraries
● Support for multiple clusters
● High Throughput
  ○ Non-blocking, async, batching
  ○ <1ms produce latency for clients
  ○ Handles Throttling/BackOff signals from Rest Proxy
● Topic Discovery
  ○ Discovers the Kafka cluster a topic belongs to
  ○ Able to multiplex to different Kafka clusters
● Integration with Local Agent for critical data
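The batching and back-off behavior above can be sketched as follows. Everything here is an assumption made for illustration: the `BatchingClient` class, the injected `post_batch` transport, and the use of HTTP-style 200/429 codes as the throttling signal. The talk does not describe the actual wire protocol.

```python
import time
from typing import Callable, List, Tuple

Message = Tuple[str, bytes]  # (topic, payload)
# Hypothetical transport: posts one batch to the proxy, returns a status code.
PostBatch = Callable[[List[Message]], int]

OK, THROTTLED = 200, 429  # assumed status codes, not taken from the slides


class BatchingClient:
    def __init__(self, post_batch: PostBatch, max_batch: int = 100,
                 max_retries: int = 5, base_delay_s: float = 0.05,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._post_batch = post_batch
        self._max_batch = max_batch
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._sleep = sleep
        self._buffer: List[Message] = []

    def produce(self, topic: str, payload: bytes) -> None:
        """Enqueue only; the caller never waits on the network."""
        self._buffer.append((topic, payload))

    def flush(self) -> int:
        """Send buffered messages in batches; return how many were delivered."""
        delivered = 0
        while self._buffer:
            batch = self._buffer[: self._max_batch]
            if not self._send_with_backoff(batch):
                break  # keep the remainder buffered for a later flush
            del self._buffer[: len(batch)]
            delivered += len(batch)
        return delivered

    def _send_with_backoff(self, batch: List[Message]) -> bool:
        """Retry a throttled batch with exponentially growing delays."""
        delay = self._base_delay_s
        for _ in range(self._max_retries):
            status = self._post_batch(batch)
            if status == OK:
                return True
            if status != THROTTLED:
                return False  # treat any other status as a hard failure
            self._sleep(delay)
            delay *= 2
        return False
```

In a real client `flush` would typically run on a background thread or timer so that `produce` stays non-blocking; it is called explicitly here to keep the sketch deterministic.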
![Page 28: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/28.jpg)
Client Libraries
What if there is a network glitch or outage?
![Page 29: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/29.jpg)
Client Libraries
![Page 30: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/30.jpg)
Kafka Clusters + Rest Proxy + Clients
![Page 31: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/31.jpg)
Local Agent
![Page 32: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/32.jpg)
Local Agent
● Local spooling in case of downstream outage/backpressure
● Backfills at a controlled rate to avoid hammering infrastructure recovering from an outage
● Implementation:
  ○ Reuses code from rest-proxy and Kafka’s log module
  ○ Appends all topics to the same file for high throughput
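The spool-then-backfill idea can be sketched with a single append-only log shared by all topics and a replay loop that pauses between bursts. This is a simplified stand-in, not the Local Agent's implementation (which, per the slide, reuses Kafka's log module): the record layout, function names, and the `wait_for_tick` hook are assumptions.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterator, Tuple

# Assumed record header: topic length (2 bytes), payload length (4 bytes).
_HEADER = struct.Struct(">HI")


def spool(log: BinaryIO, topic: str, payload: bytes) -> None:
    """Append one record; every topic shares the same sequential log."""
    name = topic.encode("utf-8")
    log.write(_HEADER.pack(len(name), len(payload)))
    log.write(name)
    log.write(payload)


def read_spool(log: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (topic, payload) records in the order they were appended."""
    while True:
        header = log.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return  # end of log (or a truncated trailing record)
        name_len, payload_len = _HEADER.unpack(header)
        topic = log.read(name_len).decode("utf-8")
        yield topic, log.read(payload_len)


def backfill(log: BinaryIO, send: Callable[[str, bytes], None],
             max_per_tick: int, wait_for_tick: Callable[[], None]) -> int:
    """Replay spooled records, sending at most max_per_tick between waits."""
    sent = 0
    for topic, payload in read_spool(log):
        if sent and sent % max_per_tick == 0:
            wait_for_tick()  # e.g. sleep until the next second
        send(topic, payload)
        sent += 1
    return sent


if __name__ == "__main__":
    log = io.BytesIO()
    for i in range(5):
        spool(log, "trips", str(i).encode())
    log.seek(0)
    print(backfill(log, lambda t, p: None, 2, lambda: None))  # 5
```

Capping the number of records per tick is what keeps the backfill from hammering a downstream system that has only just recovered.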
![Page 33: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/33.jpg)
Local Agent Architecture
![Page 34: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/34.jpg)
Local Agent in Action
![Page 35: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/35.jpg)
Kafka Clusters + Rest Proxy + Clients + Local Agent
![Page 36: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/36.jpg)
uReplicator
![Page 37: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/37.jpg)
Multi-DC data flow
![Page 38: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/38.jpg)
MirrorMaker: existing problems
● New topic added
● New partitions added
● MirrorMaker bounced
● New MirrorMaker added
![Page 39: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/39.jpg)
uReplicator: In-house solution
[Figure: a Helix MM controller backed by ZooKeeper, and three MirrorMaker workers (MM worker1, MM worker2, MM worker3); each worker has a Helix agent and threads 1 to N, each handling topic-partitions]
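The figure suggests a central controller that hands topic-partitions to workers. One possible assignment policy is sketched below: a new partition goes to the least-loaded worker, and when a worker leaves only its own partitions are moved. This is an assumed policy for illustration, not uReplicator's documented algorithm, and all names are hypothetical.

```python
from typing import Dict, List, Set

Assignment = Dict[str, Set[str]]  # worker id -> topic-partitions it mirrors


def add_partition(assignment: Assignment, topic_partition: str) -> str:
    """Give a new topic-partition to the least-loaded worker."""
    if not assignment:
        raise ValueError("no workers available")
    # Ties are broken by worker id so the result is deterministic.
    worker = min(assignment, key=lambda w: (len(assignment[w]), w))
    assignment[worker].add(topic_partition)
    return worker


def remove_worker(assignment: Assignment, worker: str) -> List[str]:
    """Reassign only the departed worker's partitions; others stay put."""
    orphaned = sorted(assignment.pop(worker))
    for topic_partition in orphaned:
        add_partition(assignment, topic_partition)
    return orphaned


if __name__ == "__main__":
    workers: Assignment = {"w1": set(), "w2": set(), "w3": set()}
    for i in range(6):
        add_partition(workers, f"trips-{i}")
    print(remove_worker(workers, "w1"))  # ['trips-0', 'trips-3']
```

Under this policy, adding a topic or bouncing one worker touches only the affected partitions, which is the contrast with the four MirrorMaker triggers listed on the earlier slide.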
![Page 40: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/40.jpg)
uReplicator
[Figure: the same uReplicator layout: Helix MM controller with ZooKeeper, and MM worker1 to MM worker3, each with a Helix agent and threads 1 to N handling topic-partitions]
![Page 41: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/41.jpg)
Kafka Clusters + Rest Proxy + Clients + Local Agent
![Page 42: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/42.jpg)
uReplicator
● Running in production for 1+ year
● Open sourced: https://github.com/uber/uReplicator
● Blog: https://eng.uber.com/ureplicator/
![Page 43: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/43.jpg)
Chaperone - E2E Auditing
![Page 44: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/44.jpg)
Chaperone Architecture
![Page 45: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/45.jpg)
Chaperone: Track counts
![Page 46: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/46.jpg)
Chaperone: Track latency
![Page 47: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/47.jpg)
Chaperone
● Running in production for 1+ year
● Planning to open source in ~2 weeks
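Count tracking for end-to-end auditing can be sketched by bucketing messages into time windows at each stage and comparing the per-window counts. The window size, function names, and comparison logic below are assumptions for illustration; the slides do not describe Chaperone's internals.

```python
from collections import Counter
from typing import Dict, Iterable

WINDOW_S = 600  # assumed audit window of 10 minutes


def window_start(event_time_s: float, window_s: int = WINDOW_S) -> int:
    """Start of the window that contains this event time."""
    return int(event_time_s // window_s) * window_s


def count_by_window(event_times_s: Iterable[float]) -> Counter:
    """Messages seen at one stage of the pipeline, per window."""
    return Counter(window_start(t) for t in event_times_s)


def find_mismatches(upstream: Counter, downstream: Counter) -> Dict[int, int]:
    """Window -> upstream count minus downstream count, where nonzero.

    A positive value suggests loss between the two stages; a negative
    value suggests duplication.
    """
    windows = sorted(set(upstream) | set(downstream))
    diffs = {w: upstream[w] - downstream[w] for w in windows}
    return {w: d for w, d in diffs.items() if d != 0}


if __name__ == "__main__":
    produced = count_by_window([0, 10, 599, 600, 1200])
    consumed = count_by_window([0, 10, 600, 1200, 1200])
    print(find_mismatches(produced, consumed))  # {0: 1, 1200: -1}
```

Bucketing by the message's own event time rather than arrival time lets counts from different stages be compared even when messages arrive late.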
![Page 48: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/48.jpg)
At-least Once Kafka
![Page 49: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/49.jpg)
Why do we need it?
[Figure: pipeline data flow with stages numbered 1 to 8]
● Most of the infrastructure is tuned for high throughput
  ○ Batching at each stage
  ○ Ack before produce (ack’ed != committed)
● Single node failure in any stage leads to data loss
● Need a reliable pipeline for High Value Data, e.g. Payments
![Page 50: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/50.jpg)
How did we achieve it?
● Brokers:
  ○ min.insync.replicas=2, can only tolerate one node failure
  ○ unclean.leader.election.enable=false, need to wait until the old leader comes back
● Rest Proxy:
  ○ Partition failover
● Improved Operations:
  ○ Replication throttling, to reduce the impact of node bootstrap
  ○ Prevent catching-up nodes from joining the ISR
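The "one node failure" figure follows from simple arithmetic if the replication factor is 3. That value is an assumption, since the slide states only `min.insync.replicas=2`. The sketch also assumes producers use `acks=all`, because `min.insync.replicas` is only enforced for such writes. The helper names are hypothetical.

```python
def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replica failures a partition can absorb and still accept acks=all writes."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return replication_factor - min_insync_replicas


def accepts_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """An acks=all write is rejected once the ISR shrinks below the minimum."""
    return in_sync_replicas >= min_insync_replicas


if __name__ == "__main__":
    print(tolerated_failures(3, 2))  # 1
    print(accepts_write(1, 2))       # False
```

With two of three replicas down, the partition stops accepting writes rather than acknowledging data that exists on a single broker, trading availability for durability.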
![Page 51: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/51.jpg)
Operations/Tooling
![Page 52: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/52.jpg)
Partition Rebalancing
![Page 53: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/53.jpg)
Partition Rebalancing
● Calculates partition imbalance and inter-broker dependency.
● Generates & Executes Rebalance Plan.
● Rebalance plans are incremental, can be stopped and resumed.
● Currently on-demand, Automated in the future.
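A rebalance plan of this kind can be sketched as a greedy loop that measures imbalance and emits one partition move at a time. The sketch is deliberately simplified and is not the tool described in the talk: it balances partition counts only, ignores the inter-broker dependency the slide mentions, and does not check replica placement. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, int, int]  # (partition, from_broker, to_broker)


def imbalance(load: Dict[int, List[str]]) -> int:
    """Difference in partition count between the fullest and emptiest broker."""
    if not load:
        return 0
    sizes = [len(partitions) for partitions in load.values()]
    return max(sizes) - min(sizes)


def plan_moves(load: Dict[int, List[str]]) -> List[Move]:
    """Greedy plan: move one partition at a time from fullest to emptiest."""
    state = {broker: list(parts) for broker, parts in load.items()}
    moves: List[Move] = []
    while imbalance(state) > 1:
        # Ties go to the lowest broker id so the plan is deterministic.
        src = max(state, key=lambda b: (len(state[b]), -b))
        dst = min(state, key=lambda b: (len(state[b]), b))
        partition = state[src].pop()
        state[dst].append(partition)
        moves.append((partition, src, dst))
    return moves


if __name__ == "__main__":
    cluster = {1: ["a", "b", "c", "d", "e"], 2: ["f"], 3: []}
    print(plan_moves(cluster))  # [('e', 1, 3), ('d', 1, 2), ('c', 1, 3)]
```

Because each move is independent, execution can stop after any step and resume from the remaining moves, which matches the incremental property listed above.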
![Page 54: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/54.jpg)
XFS vs EXT4
![Page 55: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/55.jpg)
Summary: Scale
● Kafka Brokers:
  ○ Multiple clusters per DC
  ○ Use case based tuning
● Rest Proxy to reduce connections and better batching
● Rest Proxy & Clients
  ○ Batch everywhere, async produce
  ○ Replace Jersey with Jetty
● XFS
![Page 56: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/56.jpg)
Summary: Reliability
● Local Agent
● Secondary Clusters
● Multi Producer support in Rest Proxy
● uReplicator
● Auditing via Chaperone
![Page 57: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/57.jpg)
Future Work
● Open source contribution
  ○ Chaperone
  ○ Toolkit
● Data Lineage
● Active Active Kafka
● Chargeback
● Exactly once mirroring via uReplicator
![Page 59: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/59.jpg)
Extra Slides
![Page 60: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/60.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100 to 103; Broker 2 holds 100 to 101; Broker 3 holds 100 to 101. Labels: Leader, Committed, Producer, Acked]
![Page 61: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/61.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100 to 103; Broker 2 holds 100 to 101; Broker 3 holds 100 to 101. Labels: Leader, Committed, Producer, Failed, Acked]
![Page 62: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/62.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100 to 103; Broker 2 holds 100 to 101; Broker 3 holds 100 to 101. Labels: Leader, Committed, Producer]
![Page 63: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/63.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100 to 103; Broker 2 holds 100, 101, 104, 105, 106; Broker 3 holds 100, 101, 104, 105. Labels: Leader, Committed, Producer, Old HW]
![Page 64: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/64.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100 to 103; Broker 2 holds 100, 101, 104, 105, 106; Broker 3 holds 100, 101, 104, 105. Labels: Leader, Committed, Producer, Old HW; two entries are marked with an X]
![Page 65: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/65.jpg)
Kafka Durability (acks=1)
[Figure: Broker 1 holds messages 100, 101, 104, 105, 106; Broker 2 holds 100, 101, 104, 105, 106; Broker 3 holds 100, 101, 105, 106. Labels: Leader, Committed, Producer, data loss!!]
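The sequence in these durability slides can be replayed as a small simulation. It is a simplified model built on assumptions: Broker 1 is taken to be the original leader, the numbers are treated as message identifiers as in the figures, and fail-over and catch-up are collapsed into plain list operations rather than Kafka's actual replication protocol.

```python
from typing import Dict, List, Set


def simulate_acks_1() -> Set[int]:
    """Replay the slide scenario and return the acked messages that vanish."""
    logs: Dict[int, List[int]] = {
        1: [100, 101, 102, 103],  # leader (assumed to be Broker 1)
        2: [100, 101],
        3: [100, 101],
    }
    high_watermark = 101  # last message replicated to every in-sync replica
    # With acks=1 the leader acks as soon as its own log has the message.
    acked: Set[int] = set(logs[1])

    # Broker 1 fails before the followers fetch 102 and 103.
    # Broker 2 becomes leader and keeps accepting writes.
    for message in (104, 105, 106):
        logs[2].append(message)
        acked.add(message)
    logs[3] = list(logs[2])  # Broker 3 catches up with the new leader

    # Broker 1 returns as a follower: it truncates to its old high
    # watermark, then copies everything after that from the new leader.
    logs[1] = [m for m in logs[1] if m <= high_watermark]
    logs[1] += [m for m in logs[2] if m > high_watermark]

    surviving = set(logs[1]) & set(logs[2]) & set(logs[3])
    return acked - surviving


if __name__ == "__main__":
    print(sorted(simulate_acks_1()))  # [102, 103]
```

Messages 102 and 103 were acknowledged to the producer but never committed, so they disappear when the old leader truncates its log. This is the "ack’ed != committed" gap that the at-least-once settings earlier in the deck are meant to close.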
![Page 66: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/66.jpg)
Distributed Messaging system
* Supported in Kafka 0.8+
● High throughput
● Low latency
● Scalable
● Centralized
● Real-time
![Page 67: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/67.jpg)
What is Kafka?
● Distributed
● Partitioned
● Replicated
● Commit Log
[Figure: Broker 1, Broker 2, Broker 3, and ZooKeeper]
![Page 68: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/68.jpg)
What is Kafka?
● Distributed
● Partitioned
● Replicated
● Commit Log
[Figure: Broker 1 holds Partition 0; Broker 2 holds Partition 1; Broker 3 holds Partition 2; ZooKeeper]
![Page 69: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/69.jpg)
What is Kafka?
● Distributed
● Partitioned
● Replicated
● Commit Log
[Figure: Broker 1 holds Partitions 0 and 2; Broker 2 holds Partitions 1 and 0; Broker 3 holds Partitions 2 and 1; ZooKeeper]
![Page 70: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/70.jpg)
What is Kafka?
● Distributed
● Partitioned
● Replicated
● Commit Log
[Figure: Broker 1 holds Partitions 0 and 2; Broker 2 holds Partitions 1 and 0; Broker 3 holds Partitions 2 and 1; each partition is shown as a log with offsets 0 to 3; ZooKeeper]
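The layout in the figure, where each partition lives on two of the three brokers, matches a simple round-robin placement. The sketch below reproduces that layout; it is a simplification with hypothetical names, and Kafka's real assignment logic additionally uses a randomized starting broker and rack awareness.

```python
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[int],
                   replication_factor: int) -> Dict[int, List[int]]:
    """Round-robin placement: partition p starts at broker index p mod n."""
    n = len(brokers)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication factor must be between 1 and broker count")
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    # Three partitions, brokers 1 to 3, two replicas each, as in the figure.
    print(place_replicas(3, [1, 2, 3], 2))  # {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

Read per broker, this gives Broker 1 partitions 0 and 2, Broker 2 partitions 1 and 0, and Broker 3 partitions 2 and 1, so losing any single broker leaves a copy of every partition.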
![Page 71: Uber Real Time Data Analytics](https://reader034.fdocuments.in/reader034/viewer/2022051520/586f758f1a28ab10258b60db/html5/thumbnails/71.jpg)
Kafka Concepts