Monitoring Your AWS Cloud Infrastructure
Transcript of Monitoring Your AWS Cloud Infrastructure
AWS Usergroup Greece
Monitoring your cloud-hosted app
18/07/2012
Andreas Chatzakis
@achatzakis on twitter
2
whoami
Andreas Chatzakis
CTO & co-founder /
High traffic Greek Real Estate portal
Software delivery team management
IT Operations
co-founder of AWS Usergroup Greece
@achatzakis
3
Why monitoring
You need monitoring to act proactively on, or react to, availability & performance risks and issues:
Detect problems before (many) users are aware
Alerts and notifications at 3 AM
Be informed of issues you wouldn't be able to recreate
Collect data to discover the root cause of an incident
...and automate the response for next time
Statistics and KPIs to track service quality trends
Visibility to prioritize optimization efforts
Make sense of large quantities of logs and data
4
Monitoring in the cloud
Principles are not that different from traditional infrastructure but...
Cloud allows us to build highly dynamic setups
More data
Our tools need to adapt
Ephemeral resources require a centralized approach
Need aggregation based on server role
Cloud promises agility
Only possible when cost of failure is low
Being able to spot issues in a more automated manner is key
The rise of devops
Developers need visibility to understand how their code affects costs and impacts availability
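The point about aggregating by server role can be illustrated with a small sketch. It assumes each metric sample arrives tagged with a role name; the function name `average_by_role` and the role labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_role(samples):
    """Average a metric per server role from (role, value) pairs."""
    grouped = defaultdict(list)
    for role, value in samples:
        grouped[role].append(value)
    # Individual instances come and go; the per-role figure stays comparable.
    return {role: mean(values) for role, values in grouped.items()}


# Hypothetical CPU readings (percent) from three instances.
cpu = [("web", 40.0), ("web", 60.0), ("db", 90.0)]
print(average_by_role(cpu))
```

Grouping by role rather than by hostname means a replaced or auto-scaled instance does not break the time series.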
5
Types of monitoring
External checks (is my app still up?)
Server monitoring (CPU, RAM, IO...)
Systems monitoring (MySQL, Apache etc. metrics)
Process monitoring (restart crashed services)
Application monitoring (bottlenecks in the code)
End user monitoring (client side performance)
Log aggregation & analysis (centralize storage)
Cloud Analytics (do I make the most out of AWS?)
There is a variety of monitoring tools that complement each other
6
Deployment models
Agent vs Agent-less
SaaS vs DIY on own computing instances
Consider different AZ or provider
Least privilege principle (e.g. read-only access to agent)
Consider the deployment model of each monitoring solution
7
Pricing models
Freeware
Per host
Per host-hour
Per user
Per alert
Per stored Gbyte
Different pricing models offered by the various solutions
8
External tests
External tests detect failure & alert you so that you react
Treats your app as a black box
Periodic check from a bot
Define expected response (specific string)
Tests from different geographies
Report on average response time, latency etc.
Alert via email, SMS, phone
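A single black-box check of this kind might look like the sketch below. It is not taken from any particular monitoring product: the function name `check_endpoint`, the injectable `fetch` parameter, and the example URL are assumptions made for illustration. Scheduling the check periodically and sending alerts are left out.

```python
import time
import urllib.request


def check_endpoint(url, expected_text, timeout=10.0, fetch=None):
    """Fetch `url` once and report whether the body contains `expected_text`."""
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")

    started = time.monotonic()
    try:
        body = fetch(url)
    except Exception as exc:
        # For a black-box check, any failure to get a response means "down".
        return {"ok": False, "seconds": time.monotonic() - started, "error": str(exc)}
    elapsed = time.monotonic() - started
    return {"ok": expected_text in body, "seconds": elapsed, "error": None}


# Stubbed fetch so the example runs without network access.
result = check_endpoint("http://example.invalid/", "Welcome",
                        fetch=lambda target: "<h1>Welcome</h1>")
print(result["ok"])
```

Checking for a specific string, not just an HTTP success status, catches pages that load but show an error message.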
9
Server & Systems monitoring
Server monitoring collects data from OS and Systems
Server metrics (CPU, Load Average, RAM, IO activity)
System metrics (Apache status, MySQL connections...)
Typically works via an agent or remote access
Can point towards root cause
But can't trace issues to specific parts of your code
Helps with capacity planning and scaling decisions
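As a rough idea of what an agent gathers on each host, the sketch below takes one snapshot using only the Python standard library. It assumes a Unix-like system (`os.getloadavg` is not available on Windows), and the function name `collect_sample` and the dictionary keys are hypothetical. A real agent would also ship the sample to a central collector.

```python
import os
import shutil
import time


def collect_sample(path="/"):
    """Return one snapshot of basic server metrics (Unix-like systems only)."""
    load_1m, load_5m, load_15m = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "cpu_count": os.cpu_count(),
        "disk_used_fraction": disk.used / disk.total,
    }


sample = collect_sample()
print(sorted(sample))
```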
10
Process monitoring
Processes die or misbehave... Monitor their health and automate response
Tools that check critical processes
Restart crashed processes
...or those using too many resources
Can configure complex scenarios
Beware of false positives
Beware of recurring restarts
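The warning about recurring restarts can be made concrete with a minimal supervisor sketch. It restarts a command that exits with a non-zero code, but gives up once too many restarts happen inside a time window. The name `supervise`, its parameters, and the default limits are assumptions for illustration and do not mirror any specific tool.

```python
import subprocess
import time


def supervise(command, max_restarts=3, window_seconds=60.0, run=None, clock=time.monotonic):
    """Run `command`, restarting on failure, but stop if it keeps crashing."""
    if run is None:
        def run(cmd):
            return subprocess.run(cmd).returncode

    restarts = []  # timestamps of recent restarts
    while True:
        code = run(command)
        if code == 0:
            return "exited cleanly"
        now = clock()
        # Keep only restarts that fall inside the sliding window.
        restarts = [t for t in restarts if now - t < window_seconds]
        if len(restarts) >= max_restarts:
            return "gave up: restart loop"
        restarts.append(now)


# Stubbed runner: fails twice, then succeeds.
codes = iter([1, 1, 0])
print(supervise(["my-service"], run=lambda cmd: next(codes)))
```

Giving up and alerting a human is usually preferable to restarting forever, since a restart loop can hide the underlying fault.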
11
Application monitoring
A 'Flight recorder' for your code helps you fix real issues.
It is often hard to recreate a production issue.
Plugs into your app servers & tracks execution
Code tracing
Captures errors, input variables and debugging info
Records performance metrics
Time spent on DB, Cache, external services
Overhead of specific classes or methods
Slow queries
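The 'flight recorder' idea can be approximated in a few lines with a decorator that records timing, input arguments, and any error for each call. This is only a sketch of the concept; commercial application monitoring agents instrument code automatically and far more deeply. The names `traced` and `RECORDS` are hypothetical.

```python
import functools
import time

RECORDS = []  # in a real setup these would be shipped to a monitoring backend


def traced(func):
    """Record duration, inputs and any raised error for each call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise  # the caller still sees the original exception
        finally:
            RECORDS.append({
                "name": func.__name__,
                "seconds": time.perf_counter() - started,
                "args": args,
                "kwargs": kwargs,
                "error": error,
            })
    return wrapper


@traced
def divide(a, b):
    return a / b


divide(6, 3)
print(RECORDS[-1]["name"], RECORDS[-1]["error"])
```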
12
End user monitoring
Get real data about the experience of your app's users
It works for you. Does it work for them?
Servers running ok. What about that 3rd party widget?
Typically collects actual end user data via JS
Capture performance issues faced by user segments
OS / browser / addons
Network connection speed
Geographical location
First time visit vs warm browser cache
13
Log aggregators
Centralized storage of logs for cloud setups with ephemeral instances
Logs are sent over to a centralized repository
Persist after the server has been decommissioned
Logs are captured, stored, archived & recycled
Logs are indexed and analyzed
Preconfigured analyzers for known apps
Free text analyzers for less known apps
Alerts based on specific patterns, frequencies
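An alert based on a pattern and a frequency can be sketched as a sliding-window counter over matching log lines. The class name `PatternRateAlert`, its parameters, and the sample log lines are assumptions for illustration and are not the interface of any particular log aggregation service.

```python
import re
from collections import deque


class PatternRateAlert:
    """Fire when a regex matches at least `threshold` times within a time window."""

    def __init__(self, pattern, threshold, window_seconds):
        self.regex = re.compile(pattern)
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.hits = deque()  # timestamps of recent matching lines

    def feed(self, timestamp, line):
        """Process one log line; return True if the alert condition is met."""
        if not self.regex.search(line):
            return False
        self.hits.append(timestamp)
        # Drop matches that have fallen out of the window.
        while self.hits and timestamp - self.hits[0] > self.window_seconds:
            self.hits.popleft()
        return len(self.hits) >= self.threshold


alert = PatternRateAlert(r"ERROR", threshold=3, window_seconds=60)
for ts, line in [(0, "ERROR db timeout"), (20, "ERROR db timeout"), (30, "ERROR db timeout")]:
    fired = alert.feed(ts, line)
print(fired)
```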
14
Swiss knives
The future might belong to holistic monitoring solutions
Monitoring at multiple levels
Correlating data can be a godsend for devops
Cloud management tools might move to integrate or provide such functionality
15
A common pitfall
While it does have its uses, you should not rely on custom application logging
Typically inconsistent logging that is added reactively
Developer bias and lack of understanding of operational issues
logging what you anticipate to go wrong
Increased code maintenance costs and risks
Can hurt performance if you are not careful
Instead use a proper monitoring toolset
let developers focus on building new functionality
16
Cloud Analytics
Combine traditional monitoring with Newvem's Analytics and make the most of the cloud
Powerful analytics of cloud usage data
Reveal security & availability issues in your cloud infra
Get actionable insights
Identify opportunities for cost reductions
Spot overloaded resources requiring vertical or horizontal scaling
Visibility and confidence that you are making the most of the cloud
17
18
Questions?