© 2018 eBay. All rights reserved.
Some Numbers:
Application Monitoring @Scale
Eitan Schichmanter, 2018.10.31
Structured Data
• Pivotal unit in eBay
• eBay’s Catalog
• Backend - all-around service
• We’re down - everyone’s down
What’s it all About?
1. System Monitoring
2. Application Monitoring
3. Business Monitoring
Journey
3. Reactive
2. Proactive
1. Predictive
Requirements
• User-centric
• Fast
• Scalable & Expandable
• Maintainable
Challenge #1: It’s all about the KPIs, man!
Define KPIs to use:
• Latency, Request Count, Error/Exception Count, Pipeline vs. Pool vs. Host KPIs
Separating the wheat from the chaff:
• Filter white noise (“All metrics are equal, but some metrics are more equal than others”)
• Focus on what you’re looking for and what brings value (less is more)
https://www.dictionary.com/browse/all-animals-are-equal--but-some-animals-are-more-equal-than-others
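The KPIs named on this slide (latency, request count, error count, at host vs. pool level) can be illustrated with a small sketch. This is only an illustration under assumptions: the `Request` record, the field names, and the choice of a nearest-rank 95th percentile are hypothetical and do not come from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    pool: str          # hypothetical pool (service) name
    host: str          # hypothetical host name
    latency_ms: float
    is_error: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, (len(ordered) * pct + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def kpis(requests, key):
    """Group requests by `key` and compute the three KPIs for each group."""
    groups = defaultdict(list)
    for r in requests:
        groups[key(r)].append(r)
    return {
        name: {
            "request_count": len(rs),
            "error_count": sum(r.is_error for r in rs),
            "p95_latency_ms": percentile([r.latency_ms for r in rs], 95),
        }
        for name, rs in groups.items()
    }


requests = [
    Request("catalog", "host-1", 12.0, False),
    Request("catalog", "host-1", 30.0, True),
    Request("catalog", "host-2", 20.0, False),
    Request("catalog", "host-2", 250.0, False),
]
per_host = kpis(requests, key=lambda r: (r.pool, r.host))  # host-level KPIs
per_pool = kpis(requests, key=lambda r: r.pool)            # pool-level KPIs
```

The same function serves both levels because only the grouping key changes, which keeps the set of tracked metrics small.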
Challenge #2: scale, Scale, SCALE!
1st Attempt (oh, the naïveté...):
• Prometheus, Grafana, K8S, etcd
• Result:
  ⁃ Not scalable (aggregations)
  ⁃ Doesn’t handle load
  ⁃ No HA (no easy way to rotate DS)
Challenge #2: scale, Scale, SCALE!
2nd Attempt:
• InfluxDB, remote_read
• Result: It works, but…
  ⁃ FW bottleneck
  ⁃ InfluxDB performance
  ⁃ remote_read performance
  ⁃ InfluxDB data persistence
Challenge #2: scale, Scale, SCALE!
3rd Attempt:
• HAProxy, Promxy, NGINX, InfluxDB optimizations
• Poor-man’s sharding for Prometheus, HA for InfluxDB
• Result: Aha! moment
* We're evaluating M3DB as our persistence layer
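The slides do not describe how the "poor-man's sharding" splits work between Prometheus instances, so the following is only one plausible sketch: each scrape target is mapped to a shard by a stable hash of its name. The function names and the hash-modulo scheme are assumptions for illustration, not the setup used in the talk.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(target.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def assign_targets(targets, num_shards):
    """Return {shard index: [targets]} so each Prometheus instance scrapes a subset."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


# Hypothetical target names; every target lands on exactly one shard.
targets = [f"catalog-host-{n}:9100" for n in range(12)]
assignment = assign_targets(targets, num_shards=3)
```

With targets split this way, no single Prometheus instance has to scrape or aggregate everything, and a query layer in front (Promxy, in the stack listed above) can present the shards as one endpoint.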
Challenge #2: scale, Scale, SCALE!
Query from 10.16.2018, 5 days’ data:
Challenge #3: I want to break free!
Display production dashboards on hall monitors. Sounds easy, right?
Challenge #3: I want to break free!
Solution: HAProxy to the rescue!
• Create a scalable single point of failure
• NGINX for “poor-man’s sharding”
• Promxy for fast querying
Challenge #4: Business Metrics
Data retention (over 14 days)
Data aggregation
Business metrics galore
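One common way to combine long retention with aggregation is to downsample raw samples into coarser time buckets before storing them long-term. The sketch below assumes hourly buckets and a min/max/mean/count summary; both choices, and the function name, are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds=3600):
    """Aggregate (unix_timestamp, value) samples into fixed-size time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        {
            "bucket_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    ]


# Three raw samples collapse into two hourly rows.
hourly = downsample([(0, 1.0), (1800, 3.0), (3600, 10.0)])
```

Keeping only the aggregated rows past a cutoff (for example the 14 days mentioned above) trades resolution for storage, which usually suits business-level trends.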
Challenge #4: Business Metrics
Challenge #5: Microservices. Microservices Everywhere!
OpenTracing
Challenge #6: So, what’s next?
Canary Release
• Leverage system and application monitoring metrics
• Run a benchmark, version (X+1) vs. the rest of the pool (X)
• Deployment strategy: 1 → 15% → 30% → 45% → 100%
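The canary idea above (benchmark version X+1 against the rest of the pool on version X, then widen the rollout in stages) could be gated roughly as follows. The stage labels are copied from the slide; the thresholds, the KPI dictionary keys, and the `fetch_kpis` callback are hypothetical placeholders for whatever the monitoring stack provides.

```python
STAGES = ["1", "15%", "30%", "45%", "100%"]  # stage labels as listed on the slide


def canary_healthy(canary, baseline, max_latency_ratio=1.2, max_error_rate_delta=0.01):
    """Compare version X+1 (canary) with the rest of the pool on version X.

    Thresholds are illustrative assumptions, not values from the talk.
    """
    latency_ok = canary["p95_latency_ms"] <= baseline["p95_latency_ms"] * max_latency_ratio
    errors_ok = canary["error_rate"] <= baseline["error_rate"] + max_error_rate_delta
    return latency_ok and errors_ok


def rollout(fetch_kpis, stages=STAGES):
    """Advance through the stages; return the last stage reached.

    `fetch_kpis(stage)` is a hypothetical callback returning (canary, baseline)
    KPI dictionaries measured while the canary runs at that stage.
    """
    reached = stages[0]
    for current, following in zip(stages, stages[1:]):
        canary, baseline = fetch_kpis(current)
        if not canary_healthy(canary, baseline):
            return current  # hold here; a real system would likely roll back
        reached = following
    return reached
```

The comparison is made before each promotion, so a regression at any stage stops the rollout before it reaches more of the pool.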
Thank You!