Post on 23-Jun-2015
www.applicationperformance.com
London Web Performance Meet-up
15 February 2012
APM across the lifecycle
or
APM and the Holy Grail
Martin Pinner
Application Performance Ltd
Who am I?
• 14 years in the APM space
• Started with one of the APM pioneers – Precise
• Co-founder of Application Performance Ltd
• Represented or encountered many APM solutions
• Interested in making applications go faster
What am I going to talk about?
• What is APM and why do we do it?
• The Holy Grail of APM
• What are the products?
• APM next generation
What is APM?
Application performance management, or APM, refers to the discipline … that focuses on monitoring and managing the performance and service availability of software applications – Wikipedia
Gartner defines five areas
• End-user experience monitoring
• Runtime application architecture discovery, modelling and display
• User-defined transaction profiling
• Component deep-dive monitoring in application context
• Analytics
Market spend
• APM market is worth over £1 billion
• Growing 15% per year
Why do we do it?
• Web apps just can’t be slow
• Ensure that we deliver a consistent quality of service
• Reduce the time (and cost) to identify and fix performance problems
• Save money and brand reputation
• “What you can’t measure, you can’t manage”
• Buying faster tin doesn’t always work
How do we do it?
• Measure resource consumption
• Measure response time
• Which is better…?
• Response time is generally better but resource consumption is still useful
• Many tools give you resource consumption (endless counters)
How did we get here?
Timeline: 1990 – 2000 – 2010
• NSM – Network and Systems Management
• IPM – Infrastructure Performance Management
• SLM – Service Level Management
• APM – Application Performance Management
• RUM – Real User Monitoring
• BPM – Business Process Monitoring
• BTM – Business Transaction Monitoring
Who’s who?
Gartner Magic Quadrant
The Holy Grail
Development Test & QA Production
The Holy Grail: we want to…
• Satisfy the needs of the business
• Talk in terms of the application and not the infrastructure
• Use the business transaction as our unit of measure
• Know how long it takes
• Segment the time into the different layers of the application
• Identify the root causes of issues wherever they occur
• Implies we follow business transactions from the browser, through the network, switches, firewalls, load balancers, web servers, application servers, middleware, databases, storage and external systems…
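Segmenting a transaction's time into layers can be sketched as follows. This is a minimal illustration only: it assumes each layer reports its total time including the calls it makes to the layer beneath, and that each layer calls only the next one. The `Span` type, the layer names and all figures are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Total time one layer spent on a transaction, including downstream calls."""
    layer: str
    total_ms: float


def exclusive_times(spans):
    """Return the time spent inside each layer itself.

    Assumes spans are ordered from the outermost layer (browser) to the
    innermost (database) and that each layer calls only the next one.
    """
    result = {}
    for i, span in enumerate(spans):
        downstream = spans[i + 1].total_ms if i + 1 < len(spans) else 0.0
        result[span.layer] = span.total_ms - downstream
    return result


# Hypothetical figures for a single business transaction.
transaction = [
    Span("browser", 2400.0),
    Span("network", 1900.0),
    Span("web server", 1500.0),
    Span("application server", 1450.0),
    Span("database", 1100.0),
]

breakdown = exclusive_times(transaction)
for layer, ms in breakdown.items():
    print(f"{layer:20s} {ms:8.1f} ms")
```

With these made-up numbers the database accounts for the largest share, which is the kind of answer the segmentation is meant to give.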
The Holy Grail: the technology
• We want it to be low-cost (or even free!)
• We want it to be easy to deploy
• We want it to discover our infrastructure
• We want it to learn our application behaviour
• We want the minimum of configuration
• We want a single version of the truth
• We want zero overhead
The Holy Grail: different requirements
• Operations
• Managers
• Developers and testers
• Consultants
The Holy Grail: operations
• We want a simple green traffic light to indicate all is well
• But with the ability to dive in and pinpoint the exact cause should the lights go red
• We want real-time alerts (e-mail, SMS, integrations)
• We want incident handling and workflow
• If we must, we need to know about Service Level Agreement (SLA) violations
• We quite like dashboards
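The traffic-light idea can be sketched as a simple classification of recent response times against an SLA threshold. The function name, the amber band at 80% of the SLA and the "unknown" state for an empty window are all assumptions made for this illustration, not a description of any particular product.

```python
def traffic_light(response_times_ms, sla_ms, amber_fraction=0.8):
    """Classify a window of response times against an SLA threshold.

    red     - the worst response time breaches the SLA
    amber   - the worst response time is above amber_fraction of the SLA
    green   - everything is comfortably inside the SLA
    unknown - no measurements in the window
    """
    if not response_times_ms:
        return "unknown"
    worst = max(response_times_ms)
    if worst > sla_ms:
        return "red"
    if worst > sla_ms * amber_fraction:
        return "amber"
    return "green"


# Hypothetical one-minute windows checked against a 1000 ms SLA.
print(traffic_light([120, 340, 280], sla_ms=1000))
print(traffic_light([120, 900, 280], sla_ms=1000))
print(traffic_light([120, 1400, 280], sla_ms=1000))
```

A real tool would more likely alert on a percentile or a sustained breach than on a single worst sample; the maximum is used here only to keep the sketch short.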
The Holy Grail: managers
• We want reports
• We need to see the history
• We want an aggregated view
• We want to slice and dice to our hearts’ content
• We love dashboards
The Holy Grail: developers and testers
We want to profile the performance with as much detail as required (deep-dive)
We want to understand what operations are complaining about
We aspire to be agile (DevOps)
The Holy Grail: consultants
• What about consultants like me?
• I haven’t got time to drill down on each problematic transaction (it might not happen while I am there)
• I want a developer-like tool with good aggregation and drill-down capabilities that I can use in production
• Ideally I want to find a missing database index that gives an order of magnitude performance improvement!
The reality: the technology
We can’t do true business transactions but we can approximate with request-response pairs
There is an overhead, roughly proportional to the amount of detail you want
We can measure time quite accurately except perhaps in virtual environments
But we could be timing a page or URL or method or SQL – our unit of measure changes unless we can tie it all together
The reality: physics
The “Heisenberg uncertainty principle” of APM:
You cannot monitor the performance of something without affecting its performance
Relativistic effects:
Time may run more slowly in a virtual machine
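The observer effect is easy to demonstrate. The sketch below times a small function with and without a timing wrapper around every call and reports the difference as a percentage. The `work`, `timed` and `measure` names and the call counts are assumptions chosen for the illustration; the figure you get will vary by machine and by run.

```python
import time


def work(n=200):
    """Stand-in for application code: a small CPU-bound loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(func):
    """Wrap func so that every call is timed, as a monitoring agent might."""
    samples = []

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            samples.append(time.perf_counter() - start)

    wrapper.samples = samples
    return wrapper


def measure(func, calls=5000):
    """Return the elapsed seconds for `calls` invocations of func."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return time.perf_counter() - start


bare = measure(work)
monitored = measure(timed(work))
overhead_pct = 100.0 * (monitored - bare) / bare
print(f"bare {bare:.4f}s, monitored {monitored:.4f}s, overhead {overhead_pct:.1f}%")
```

The smaller the unit of work being timed, the larger the relative overhead, which is why overhead grows with the level of detail collected.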
The reality: the organisation
• We are silo-based
• Network people don’t speak to web server people don’t speak to application server people…
• We have / want our own tools
• We cannot justify holistic tools
• Third parties provide much of the infrastructure – we don’t touch that
The reality: costs
• Licence costs
• Hardware and pre-requisite software costs
• Deployment and configuration costs
• Ongoing maintenance costs
The Holy Grail
Development Test & QA Production
Survey
• What web servers are you using? Apache? IIS? Other?
• What application servers are you using? PHP? .NET? Java? Other?
• What databases are you using? MySQL? SQL Server? Oracle? Other?
• Any other technologies – message queues, big data?
End-user experience
This is the ideal starting point for your end-to-end APM
Typically gives page and URL metrics
Development
• www.webpagetest.org – for internet sites
• dynaTrace AJAX edition – free tool, very good for JavaScript performance
• Other browser plug-ins – Firebug, Debug Bar, Chrome developer tools, HTTPWatch (basic edition is free) are all good sources of waterfall charts
• Proxies – Fiddler, BrowserMob
• Test one browser at a time
Load testing
On-premise – HP LoadRunner (supports non-web as well, good reports), SilkPerformer
Many free ones including JMeter, Selenium, …
Cloud testing – SOASTA, SOAtest, LoadRunner, Keynote, LoadStorm, Load Impact, Site Confidence, BrowserMob, NeoLoad (some are freemium)
Load testing
• Won’t tell you why the back-end is slow
• Results may not be what happens in live!
• You could use real data, suitably massaged – e.g. Atomic Labs Pion workflow engine
Synthetic transactions in production
• Closely related to load testing
• BrowserMob (NEUstar) uses Selenium scripts for both
• Site Confidence (biggest in UK), Gomez (Compuware), Keynote
• Test key transactions at various locations
• Complementary to capturing real data
• What if all your users log in at 9am? Is your site available at 8:55? No real data to use
• Often restricted to simple, read-only transactions
• Quickly go out of date – require a lot of maintenance
Real transactions in production
• Most tools use JavaScript to time the page
• Google Analytics Site Speed – free, but sampled (1%), low granularity (simple average), not real-time
• Other open source – Jiffy, Boomerang (Yahoo!), Episodes (Steve Souders)
• WebTuna, Precise, New Relic – not free but without above restrictions
• Now use web timing and navigation timing APIs in Chrome, Firefox and IE9
• In older browsers timing is less accurate, particularly for first view, when the JavaScript has not been downloaded yet
• SaaS is common but on-premise available as well
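To show what the navigation timing data makes possible, here is a rough sketch of the server-side arithmetic on a beacon. It assumes the page's JavaScript has posted the browser's `performance.timing` attributes as epoch milliseconds; the attribute names come from the Navigation Timing API, while the `page_timings` function, the choice of metrics and the sample values are invented for the example.

```python
def page_timings(t):
    """Derive headline metrics (in ms) from Navigation Timing attributes.

    `t` maps attribute names from the browser's performance.timing object
    to epoch milliseconds, as a JavaScript beacon might report them.
    """
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        "time_to_first_byte": t["responseStart"] - t["navigationStart"],
        "download": t["responseEnd"] - t["responseStart"],
        "page_load": t["loadEventEnd"] - t["navigationStart"],
    }


# Hypothetical beacon payload for one page view.
beacon = {
    "navigationStart": 1329307200000,
    "domainLookupStart": 1329307200010,
    "domainLookupEnd": 1329307200040,
    "connectStart": 1329307200040,
    "connectEnd": 1329307200110,
    "responseStart": 1329307200420,
    "responseEnd": 1329307200560,
    "loadEventEnd": 1329307202310,
}

metrics = page_timings(beacon)
for name, ms in metrics.items():
    print(f"{name:20s} {ms:6d} ms")
```

Because the browser records `navigationStart` itself, the first view can be timed even though the monitoring script has not yet been downloaded at that point.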
Real transactions in production
End users? – qualitative rather than quantitative
Aternity, Knoa – supports any Windows application, very good at correlation but eye-watering prices
RUM – Real User Monitoring
• Combination of end-user experience and network
• Coradiant (BMC), HP, Triometric (UK)
• Tealeaf, Atomic Labs (have session replay)
• Network tap or mirror port (span port) or software (for use in the cloud)
• Low overhead
RUM – Real User Monitoring
• Network component gives you URL response time but not easy to relate it to page response
• Gateways and proxies can limit visibility
• Can make it look like you only have one (busy) customer
• May be able to get originating IP from headers but timings will probably be out
• Therefore often includes a JavaScript component to tell you when the page has completed and who is the real user
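Recovering the originating IP from headers usually means reading `X-Forwarded-For`, which proxies and load balancers commonly append to. The sketch below is a best-effort illustration; the `originating_ip` function and the sample addresses are assumptions, and the header can be spoofed by the client, so the result should not be treated as trustworthy.

```python
def originating_ip(headers, peer_ip):
    """Best-effort client address for a request seen behind proxies.

    Uses the first entry of X-Forwarded-For when present, otherwise
    falls back to the address of the directly connected peer.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


# Hypothetical request arriving via a load balancer at 10.0.0.2.
print(originating_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.2"))
# Without the header every request appears to come from the proxy.
print(originating_ip({}, "10.0.0.2"))
```

This fixes the "one busy customer" problem for identification, but not the timing problem: the network probe still measures the proxy's leg of the journey rather than the user's.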
Network
• Usually an appliance, often monitoring a network tap or span port
• Packet capture + network flow analysis
• NetFlow (Cisco), sFlow, J-Flow
• Gives you usage: web, e-mail, file sharing, video, VoIP
• Riverbed Cascade, NetX, SolarWinds, OpNet
• Some offer packet-shaping (bandwidth control)
• Free tools: Wireshark, tcpdump based on packet capture (pcap)
Framework
• Do a bit of everything
• Mainly to be found in production but may cover test and dev
• Basic ones use SNMP to capture network stats, perfmon/WMI, OS commands via SSH to capture CPU, disk and memory
• More sophisticated ones have plug-ins for different technologies – Cisco, VMware, Exchange, application servers, databases etc
• Strengths – baselines, propagation of alerts, workflow, incident handling, SLAs and customisable dashboards
• Weaknesses – lots of counters (wide but not deep), can take months to configure, “alert storms”
Framework
• Some of the lightweight, free ones are Cricket, Graphite and Splunk
• They concentrate on the presentation layer (graphs)
• More fully featured free ones are Nagios and Zenoss
• The big players are IBM Tivoli, HP OpenView, Microsoft SCOM, CA Unicenter, BMC Patrol, Nimsoft, IG, SolarWinds
• One of the better ones IMHO was Indicative
• Service model built around a tree, alerts would automatically propagate
• Bought by Nimsoft and now disappeared
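A tree-based service model with automatic alert propagation can be sketched in a few lines. This is a generic illustration of the idea and makes no claim about how Indicative implemented it; the `Node` class, the three status levels and the example service are all assumptions.

```python
from dataclasses import dataclass, field

# Higher number means more severe.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


@dataclass
class Node:
    """One component in the service tree."""
    name: str
    status: str = "green"  # status reported for this node itself
    children: list = field(default_factory=list)

    def effective_status(self):
        """Worst status of this node and everything beneath it."""
        statuses = [self.status]
        statuses += [child.effective_status() for child in self.children]
        return max(statuses, key=SEVERITY.__getitem__)


# Hypothetical service: a red database turns the whole service red.
database = Node("database", "red")
app = Node("application server", children=[database])
web = Node("web server")
service = Node("online shop", children=[web, app])

print(service.name, "is", service.effective_status())
```

Because the parent derives its status from its children, only the leaf that failed needs an alert rule, which is one way to avoid alert storms.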
Web servers
• Apache mod or ISAPI filter – Precise
• May just use mod_status – dynaTrace, New Relic, AppDynamics
• Usually fairly lightweight
• The assumption must be that web servers are not the main cause of bottlenecks
Application servers
Monitoring typically implemented using “instrumentation”
Extra timing code automatically added to the application
Tag and follow HTTP, JMS protocols etc in order to track transactions between application tiers
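The two ideas on this slide, timing code added around the application and a tag that follows the transaction between tiers, can be sketched with a decorator. Real agents typically rewrite bytecode rather than relying on decorators, so treat this purely as an illustration; the `X-Transaction-Id` header name, the `instrument` decorator and the handler functions are hypothetical.

```python
import time
import uuid

TAG_HEADER = "X-Transaction-Id"  # hypothetical header name used as the tag

timings = []  # (transaction id, tier, seconds) records collected by the "agent"


def instrument(tier):
    """Decorator that times a request handler and propagates the tag."""
    def decorate(handler):
        def wrapper(headers):
            # Reuse the incoming tag, or start a new transaction at the edge.
            headers = dict(headers)
            headers.setdefault(TAG_HEADER, uuid.uuid4().hex)
            start = time.perf_counter()
            try:
                return handler(headers)
            finally:
                elapsed = time.perf_counter() - start
                timings.append((headers[TAG_HEADER], tier, elapsed))
        return wrapper
    return decorate


@instrument("database")
def query(headers):
    return ["row"]


@instrument("application server")
def handle_order(headers):
    # The tagged headers travel with the downstream call.
    return query(headers)


handle_order({})
for tag, tier, seconds in timings:
    print(tag, tier, f"{seconds * 1000:.3f} ms")
```

Both records carry the same tag, so the time spent in the database can be attributed to the application server request that caused it.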
Application servers
• Overhead is the big challenge here
• Would like 1-2% in production but may accept 30% in test and 100% in development
• Instrumentation may complicate exception handling, optimisation
method A ()
{
    stopwatch.start ();
    for (i = 1 to 1000) B ();
    stopwatch.stop ();
}
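The exception-handling complication is that if B fails part-way through the loop, the stop call is never reached and the timing is lost. One way to guard against that, sketched here in Python with a context manager, is to record the time in a `finally` block and re-raise the original exception untouched. The `stopwatch` context manager, the `recorded` list and the simulated failure are assumptions made for the illustration.

```python
import time
from contextlib import contextmanager

recorded = []  # (label, seconds, failed) tuples collected by the "agent"


@contextmanager
def stopwatch(label):
    """Time the enclosed block, even if it raises."""
    start = time.perf_counter()
    failed = False
    try:
        yield
    except BaseException:
        failed = True
        raise  # never swallow the application's exception
    finally:
        recorded.append((label, time.perf_counter() - start, failed))


def b(i):
    if i == 500:
        raise ValueError("simulated failure in B")


def a():
    with stopwatch("A"):
        for i in range(1, 1001):
            b(i)


try:
    a()
except ValueError as exc:
    print("application still sees:", exc)

print(recorded)
```

Note that the stopwatch wraps the whole loop rather than each call to B; timing every iteration would multiply the overhead a thousandfold.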
Application servers (development)
Can afford the overhead so can go for the greatest level of detail
Typically use profilers
Application servers (test)
One of the best for testing is probably dynaTrace (Compuware)
Captures every transaction (PurePaths) but high overhead as a consequence
Need to be careful using it in production
Application servers (production)
• 1st generation: Wily (CA), Precise
• Wide and deep but not originally designed for auto-discovery in service-orientated, virtualised environments
• 2nd generation: New Relic, OpTier, AVIcode (wide but not so deep), dynaTrace, AppDynamics (wide and deep)
• AppDynamics ‘Lite’ version is free but limited (freemium model)
• Betfair, Netflix – 1000s of instances
• Have mechanisms to limit overhead
Database
• Vendor tools (Oracle, SQL Server and MySQL) dominate but not integrated with other tiers
• Tend to measure statistics e.g. cache-hit ratio
• More important are wait states
• Precise – lots of detail, heavy infrastructure
• Quest Spotlight – easy on the eye
• DBTuna, Confio – lightweight, easy to use
• DBTuna particularly good for load testing – 1 minute aggregation, load test comparisons
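One-minute aggregation is simple to illustrate: raw samples are grouped into buckets keyed by the start of the minute, and each bucket is summarised. The sketch below is generic and is not a description of how DBTuna works; the `aggregate_by_minute` function, the choice of count, average and maximum, and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_minute(samples):
    """Group (epoch_seconds, response_ms) samples into 1-minute buckets.

    Returns {minute_start_epoch: {"count": n, "avg_ms": x, "max_ms": y}},
    ordered by time.
    """
    buckets = defaultdict(list)
    for ts, ms in samples:
        buckets[int(ts) // 60 * 60].append(ms)
    return {
        minute: {"count": len(values), "avg_ms": mean(values), "max_ms": max(values)}
        for minute, values in sorted(buckets.items())
    }


# Hypothetical SQL statement timings: (seconds since test start, ms).
samples = [(0, 100), (30, 300), (61, 50), (119, 150), (120, 1000)]

summary = aggregate_by_minute(samples)
for minute, stats in summary.items():
    print(minute, stats)
```

Summaries at this granularity line up naturally with the ramp-up steps of a load test, and two runs can be compared bucket by bucket.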
Big Data
• No monitoring products that I know of
• Can be monitored indirectly from the application tier
Storage
• Vendor tools typically measure I/O counters
• True APM relates I/O to activity e.g. SQL
• EMC – Precise
• NetApp – DBTuna
Survey
• What APM tools do you use?
• None?
• One of the big vendors – IBM, HP, Microsoft?
• Point solutions?
• A mixture?
APM next generation challenges
• Limitations of request-response
• Asynchronous e.g. Enterprise Service Bus
• Parallel processing
• Effort v elapsed time
• Variability in response time (particularly for mobile)
• Very low latency (milliseconds)
• More technology e.g. web sockets
• More agile / faster change
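The distinction between effort and elapsed time under parallel processing can be made concrete. In the sketch below, effort is the sum of the individual call durations and elapsed is the wall-clock time covered by at least one call, found by merging overlapping intervals. The `effort_and_elapsed` function and the three sample calls are assumptions made for the illustration.

```python
def effort_and_elapsed(intervals):
    """Return (effort, elapsed) for a set of (start, end) work intervals.

    effort  - sum of the individual durations
    elapsed - wall-clock time covered by at least one interval
    """
    effort = sum(end - start for start, end in intervals)
    elapsed = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged run.
            if current_end is not None:
                elapsed += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current run: extend it if necessary.
            current_end = max(current_end, end)
    if current_end is not None:
        elapsed += current_end - current_start
    return effort, elapsed


# Three hypothetical back-end calls issued in parallel (ms offsets).
calls = [(0, 400), (0, 250), (100, 500)]

effort, elapsed = effort_and_elapsed(calls)
print(f"effort {effort} ms, elapsed {elapsed} ms")
```

A tool that simply adds up the back-end timings would report more than twice what the user actually waited here, which is one reason a plain request-response view breaks down for parallel and asynchronous work.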
APM next generation challenges
• More auto-scaling
• Tighter integration between monitoring and workflow / provisioning (but spikes in demand cannot be predicted)
• More SaaS – end-user experience (WebTuna, Atomic Labs), net-flow (Boundary Networks, NetX)
• More cloud – load testing, synthetic transactions (Site Confidence, Gomez, Keynote)
• But how do you get Amazon, Rackspace etc to monitor your infrastructure?
• Continuous load testing – traditional tools require ramp-up, load and stop before you get any results
• Developers like the bleeding edge but can the monitoring tools keep up?
Q & A
martin.pinner@applicationperformance.com
www.applicationperformance.com
@appperf