Zero to Insights
Real time analytics with Kafka, C*, and Spark
Peter Bakas
Peter Bakas | @peter_bakas
@ Netflix : Cloud Platform Engineering - Event and Data Pipelines
@ Ooyala : Analytics, Discovery, Platform Engineering & Infrastructure
@ Yahoo : Display Advertising, Behavioral Targeting, Payments
@ PayPal : Site Engineering and Architecture
@ Play : Advisor to various Startups (Data, Security, Containers)
Who is this guy?
Let’s get down to business
Netflix is a logging company
that occasionally streams video
● 450 billion events per day
● 8 million events & 17 GB per second during peak
● Hundreds of event types
By the Numbers
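As a rough back-of-the-envelope check on these figures, the sketch below derives the average event rate, the peak-to-average ratio, and the mean event size at peak. It assumes a uniform 86,400-second day and decimal gigabytes (1 GB = 10^9 bytes); neither assumption is stated on the slide.

```python
# Back-of-the-envelope arithmetic on the slide's figures.
# Assumptions (not from the slide): a day is 86,400 s and 1 GB = 10**9 bytes.
EVENTS_PER_DAY = 450e9
PEAK_EVENTS_PER_SEC = 8e6
PEAK_BYTES_PER_SEC = 17e9
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if the daily volume were spread evenly over the day.
avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY
# How much higher the stated peak is than that average.
peak_to_avg = PEAK_EVENTS_PER_SEC / avg_events_per_sec
# Mean payload per event during peak.
avg_event_bytes_at_peak = PEAK_BYTES_PER_SEC / PEAK_EVENTS_PER_SEC

print(f"average rate: {avg_events_per_sec / 1e6:.1f} million events/s")
print(f"peak / average: {peak_to_avg:.2f}x")
print(f"mean event size at peak: {avg_event_bytes_at_peak:.0f} bytes")
```

Under those assumptions the average works out to roughly 5.2 million events per second, the peak is about 1.5 times the average, and a typical event at peak is around 2 KB.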
Publish, Collect, Process, Aggregate & Move Data
What does it take to run @ Cloud Scale?
How did we get here?
[Diagram: EventProducer, Chukwa, EMR]
What are we supposed to do?
[Diagram: EventProducer, Suro, KafkaRouter, Druid, EMR, Stream Consumers]
Where are we going?
[Diagram: Keystone: EventProducer, FrontingKafka, Router, ConsumerKafka, Druid, EMR, Stream Consumers]
[Diagram: Keystone with Routing Service ++: EventProducer, FrontingKafka, Router, ConsumerKafka, Druid, EMR, Stream Consumers]
[Diagram: FrontingKafka, ConsumerKafka, Custom Apps, Real time processing]
Ooyala’s experience
About Ooyala
Powering personalized video experiences across all screens.
● 5 billion events per day
● 1 billion videos per month
● 200 million unique users per month
● 130 countries
● 25% of U.S. online viewers watch video powered by Ooyala
By the Numbers
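For a sense of scale, the sketch below converts these figures into an average event rate and a videos-per-user ratio. It assumes an 86,400-second day and that the video and unique-user counts refer to the same month; the slide does not say either.

```python
# Rough rates implied by the slide's figures.
# Assumptions (not from the slide): a day is 86,400 s, and "videos per month"
# and "unique users per month" cover the same month.
EVENTS_PER_DAY = 5e9
VIDEOS_PER_MONTH = 1e9
UNIQUE_USERS_PER_MONTH = 200e6

# Average rate if the daily volume were spread evenly over the day.
avg_events_per_sec = EVENTS_PER_DAY / (24 * 60 * 60)
# Monthly videos divided by monthly unique users.
videos_per_user = VIDEOS_PER_MONTH / UNIQUE_USERS_PER_MONTH

print(f"average rate: {avg_events_per_sec:,.0f} events/s")
print(f"videos per unique user per month: {videos_per_user:.0f}")
```

That is on the order of 58 thousand events per second on average, and about 5 videos per unique user per month.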
Where did it all start?
Precomputed Aggregates
What if we need more dynamic queries?
Why not just use C*?
What were the options?
100% Precomputation 100% Dynamic
Where we wanted to be
100% Precomputation 100% Dynamic
Partly dynamic
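The trade-off between the two extremes can be illustrated with a toy example. This is a minimal sketch of one possible reading of "partly dynamic", not Ooyala's implementation: the event layout, the function names, and the choice of dimensions are all hypothetical.

```python
from collections import Counter

# Hypothetical raw events: (video_id, country, plays).
EVENTS = [
    ("v1", "US", 3),
    ("v1", "DE", 1),
    ("v2", "US", 2),
    ("v2", "US", 4),
]

def precompute_by_video(events):
    """100% precomputation: roll up once at ingest along one fixed dimension."""
    totals = Counter()
    for video_id, _country, plays in events:
        totals[video_id] += plays
    return totals

def dynamic_query(events, predicate, key):
    """100% dynamic: scan raw events at query time with any filter and grouping."""
    totals = Counter()
    for event in events:
        if predicate(event):
            totals[key(event)] += event[2]
    return totals

def precompute_cube(events):
    """Partly dynamic, step 1: precompute a finer-grained (video, country) rollup."""
    cube = Counter()
    for video_id, country, plays in events:
        cube[(video_id, country)] += plays
    return cube

def query_cube(cube, country):
    """Partly dynamic, step 2: filter and regroup the small rollup at query time."""
    totals = Counter()
    for (video_id, cube_country), plays in cube.items():
        if cube_country == country:
            totals[video_id] += plays
    return totals

if __name__ == "__main__":
    print(dict(precompute_by_video(EVENTS)))
    print(dict(dynamic_query(EVENTS, lambda e: e[1] == "US", lambda e: e[0])))
    print(dict(query_cube(precompute_cube(EVENTS), "US")))
```

The fixed rollup answers only the question it was built for, the raw scan answers anything but touches every event, and the intermediate rollup answers a family of questions while scanning far less data than the raw events.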
Delphi - Real time Analytics
[Diagram: loggers, Kafka, ingest, job server, API]
● Hiring
● Rapidly evolving ecosystem
● Enterprise Service for Enterprise Software
Challenges
Obligatory...
Everyone is hiring
[email protected]