Monitorama 2015 Monitoring OpenConnect CDN
Monitoring OpenConnect CDN
Sergey Fedorov, Netflix, Monitorama 2015
What is OpenConnect
36.5% of US downstream traffic *
* 2015 Sandvine report
OpenConnect Cache Appliance
Space/Power optimized
10/40 Gb/s network interface
FreeBSD OS
Nginx server
BIRD routing proxy
Gizmodo, "This box can hold an entire Netflix", http://gizmodo.com/this-box-can-hold-an-entire-netflix-1592590450
Network
Transit
Internet Exchange
ISP embedded
Intelligent clients
Control Plane
end-user content request router
client location
network conditions
server utilization
content distribution
Who we are
Sergey Fedorov, Stefan Praszalowicz
Monitoring challenge
Testing in prod*
Network changes
Firmware deployments
App pushes
Updating content
...
Caches
Clients
Control Plane
Microservices
Network
Capacity
Config
Content
Telemetry (Atlas)
Logs (Elasticsearch)
Data sources
METRICS
Something breaks all the time
Big problems start small
Context matters
Small SRE team
Elastic
How we do it
[Architecture diagram, built up over three slides]
Data sources: Netflix Clients, Caches, Network, Config, ...
Orchestration
Data processing: stream processors, pollers
State processing: FSM
[FSM animation for a CPU metric, threshold=75%]
State label shown: MAINTENANCE
Transition labels shown: start fixing, end fixing
Input actions in the order the slides show them (consecutive repeats collapsed):
action: ok, from: cpu
action: silence, from: config
action: ok, from: cpu
action: silence, from: config
action: break, from: cpu
action: ok, from: cpu
action: unsilence, from: config
action: ok, from: cpu
action: break, from: cpu
action: start_fix, from: user
action: break, from: cpu
action: ok, from: cpu
action: end_fix, from: user
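A minimal sketch of how such a per-metric state machine might look. Only the MAINTENANCE state, the 75% threshold, and the action and source names (ok, break, silence, unsilence, start_fix, end_fix; cpu, config, user) come from the slides. The SILENCED state name, the transition table, and the MetricFSM and cpu_action names are assumptions made for illustration, not the OpenConnect implementation.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    BROKEN = "BROKEN"
    SILENCED = "SILENCED"        # assumed name; the slides only show the "silence" action
    MAINTENANCE = "MAINTENANCE"  # shown on the slides


# Hypothetical transition table: (current state, input action) -> new state.
# Any pair not listed leaves the state unchanged.
TRANSITIONS = {
    (State.OK, "break"): State.BROKEN,
    (State.BROKEN, "ok"): State.OK,
    (State.OK, "silence"): State.SILENCED,
    (State.BROKEN, "silence"): State.SILENCED,
    (State.SILENCED, "unsilence"): State.OK,
    (State.OK, "start_fix"): State.MAINTENANCE,
    (State.BROKEN, "start_fix"): State.MAINTENANCE,
    (State.MAINTENANCE, "end_fix"): State.OK,
}


def cpu_action(cpu_percent, threshold=75.0):
    """Turn a raw CPU reading into an input action for the FSM."""
    return "break" if cpu_percent > threshold else "ok"


class MetricFSM:
    def __init__(self, metric_name, state=State.OK):
        self.metric_name = metric_name
        self.state = state

    def feed(self, action, source):
        """Apply one input action; return an event dict on a state change, else None."""
        new_state = TRANSITIONS.get((self.state, action), self.state)
        if new_state is self.state:
            return None
        event = {
            "old_state": self.state.value,
            "new_state": new_state.value,
            "input_action": action,
            "metric_name": self.metric_name,
            "from": source,
        }
        self.state = new_state
        return event
```

With this table, break actions from the cpu source are ignored while the metric is silenced or in maintenance, which is one plausible reading of why the slides keep feeding break actions between start_fix and end_fix.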
Events processing: event handlers
STATE TRANSITION EVENT
● Old state
● New state
● Input action
● Metric name
● Action metadata
  ○ metric value
  ○ comments
  ○ tags
  ○ timestamp
  ○ ...
Triggers an event
Event handlers: RULES
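The event fields listed above can be sketched as a small record type, with rules modelled as (predicate, handler) pairs. This is an illustration under assumptions: the field names follow the slide, but StateTransitionEvent, dispatch, the rule format, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StateTransitionEvent:
    old_state: str
    new_state: str
    input_action: str
    metric_name: str
    # Action metadata: metric value, comments, tags, timestamp, ...
    metadata: Dict[str, Any] = field(default_factory=dict)


Predicate = Callable[[StateTransitionEvent], bool]
Handler = Callable[[StateTransitionEvent], None]


def dispatch(event: StateTransitionEvent,
             rules: List[Tuple[Predicate, Handler]]) -> int:
    """Run every handler whose rule matches; return how many fired."""
    fired = 0
    for matches, handler in rules:
        if matches(event):
            handler(event)
            fired += 1
    return fired


# Example rules (hypothetical): record a message on break and on recovery.
notified: List[str] = []
rules = [
    (lambda e: e.new_state == "BROKEN",
     lambda e: notified.append(f"{e.metric_name} is BROKEN")),
    (lambda e: e.old_state == "BROKEN" and e.new_state == "OK",
     lambda e: notified.append(f"{e.metric_name} recovered")),
]

# "cache-1" is a made-up tag used only for this example.
event = StateTransitionEvent("OK", "BROKEN", "break", "cpu",
                             {"metric value": 91.0, "tags": ["cache-1"]})
dispatch(event, rules)
```

Keeping rules as data rather than code branches means a new notification or automation can be added without touching the state machine itself.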
Events priority
[Matrix diagram, built up over three slides]
Severity: Info, Notice, Warning, Critical
Escalation: 0, 1, 2, 3
Quadrants: Do Never, Do Next, Do Last, Do Now
Notifications
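One way to read the matrix is as a lookup from (severity, escalation) to a quadrant. The transcript does not say which quadrant sits where, so the mapping below is an assumption: high severity with high escalation is "Do Now", low with low is "Do Never", and the two mixed cases are guesses. The priority function name and the cut-off between low and high are also hypothetical.

```python
SEVERITIES = ["Info", "Notice", "Warning", "Critical"]  # from the slide, lowest first
ESCALATION_LEVELS = (0, 1, 2, 3)                        # from the slide


def priority(severity, escalation):
    """Map an event's severity and escalation level to a quadrant label.

    Assumed split: Warning/Critical count as high severity, and
    escalation 2-3 counts as high escalation.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if escalation not in ESCALATION_LEVELS:
        raise ValueError(f"unknown escalation level: {escalation!r}")

    high_severity = SEVERITIES.index(severity) >= 2
    high_escalation = escalation >= 2

    if high_severity and high_escalation:
        return "Do Now"
    if high_severity:
        return "Do Next"   # assumed placement
    if high_escalation:
        return "Do Last"   # assumed placement
    return "Do Never"
```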
Aggregation
Cache state = aggregation of states of its metrics
Cluster state = aggregation of states of its caches
OK: all OK
DEGRADED: some BROKEN or DEGRADED
BROKEN: most BROKEN
All caches are OK → cluster state is OK
2/12 caches are BROKEN → cluster state is DEGRADED
7/12 caches are BROKEN → cluster state is BROKEN
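The aggregation rules above can be sketched as a single function applied at each level (metrics to cache, caches to cluster). "Most" is interpreted here as a strict majority, which is an assumption consistent with the 2/12 and 7/12 examples. The aggregate name is hypothetical.

```python
def aggregate(child_states):
    """Roll child states up into OK, DEGRADED, or BROKEN.

    OK       : all children are OK
    BROKEN   : more than half of the children are BROKEN (assumed meaning of "most")
    DEGRADED : anything else, i.e. some children are BROKEN or DEGRADED
    """
    states = list(child_states)
    if not states:
        raise ValueError("cannot aggregate an empty set of states")
    if all(s == "OK" for s in states):
        return "OK"
    broken = sum(1 for s in states if s == "BROKEN")
    if 2 * broken > len(states):
        return "BROKEN"
    return "DEGRADED"


# The three cluster examples from the slides, 12 caches each.
all_ok = aggregate(["OK"] * 12)
two_broken = aggregate(["BROKEN"] * 2 + ["OK"] * 10)
seven_broken = aggregate(["BROKEN"] * 7 + ["OK"] * 5)
```

Because the same function works on metric states and on cache states, a cluster state can be computed as aggregate(aggregate(metrics) for each cache).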
Challenges
Setup
Predefined groupings
UI
Issues correlation
Failure forecasting
OSS
Feedback
jobs.netflix.com/jobs/1693/
jobs.netflix.com/jobs/2240/
Sergey Fedorov
OpenConnect, [email protected]