APM across the lifecycle or APM and the Holy Grail

Martin Pinner, Application Performance Ltd (www.applicationperformance.com), London Web Performance Meet-up, 15 February 2012

London Web Performance Group Meetup presentation 15 Feb 2012

Transcript of APM across the lifecycle or APM and the Holy Grail

Page 1: APM across the lifecycle or APM and the Holy Grail

www.applicationperformance.com

London Web Performance Meet-up, 15 February 2012

APM across the lifecycle or

APM and the Holy Grail

Martin Pinner, Application Performance Ltd

Who am I?

• 14 years in the APM space
• Started with one of the APM pioneers – Precise
• Co-founder of Application Performance Ltd
• Represented or encountered many APM solutions
• Interested in making applications go faster

What am I going to talk about?

• What is APM and why do we do it?
• The Holy Grail of APM
• What are the products?
• APM next generation

What is APM?

Application performance management, or APM, refers to the discipline … that focuses on monitoring and managing the performance and service availability of software applications – Wikipedia

Gartner defines five areas

• End-user experience monitoring
• Runtime application architecture discovery, modelling and display
• User-defined transaction profiling
• Component deep-dive monitoring in application context
• Analytics

Market spend

• APM market is worth over £1 billion
• Growing 15% per year

Why do we do it?

• Web apps just can’t be slow
• Ensure that we deliver a consistent quality of service
• Reduce the time (and cost) to identify and fix performance problems
• Save money and brand reputation
• “What you can’t measure, you can’t manage”
• Buying faster tin doesn’t always work

How do we do it?

• Measure resource consumption
• Measure response time
• Which is better…? Response time is generally better, but resource consumption is still useful
• Many tools give you resource consumption (endless counters)
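To make the distinction concrete, here is a minimal Python sketch (an illustration, not any product's method) that records both quantities for one unit of work: wall-clock time from `time.perf_counter()` stands in for response time, and process CPU time from `time.process_time()` stands in for resource consumption. The function name `measure` is hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure(work: Callable[[], T]) -> Tuple[T, float, float]:
    """Run `work` once; return (result, response time in s, CPU time in s).

    Illustrative sketch only; `measure` is a hypothetical helper.
    """
    wall_start = time.perf_counter()   # response time: what the user waits for
    cpu_start = time.process_time()    # resource consumption: CPU used by this process
    result = work()
    cpu_used = time.process_time() - cpu_start
    response_time = time.perf_counter() - wall_start
    return result, response_time, cpu_used


if __name__ == "__main__":
    # A request that mostly waits (simulated here with sleep) looks cheap in
    # CPU counters but still makes the user wait.
    _, rt, cpu = measure(lambda: time.sleep(0.2))
    print(f"response time {rt:.3f}s, CPU {cpu:.3f}s")
```

The waiting request shows roughly 0.2 s of response time but almost no CPU, which is one reason response time tends to be the more telling measure.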

How did we get here?

Timeline, 1990 to 2010:

• NSM – Network and Systems Management
• IPM – Infrastructure Performance Management
• SLM – Service Level Management
• APM – Application Performance Management
• RUM – Real User Monitoring
• BPM – Business Process Monitoring
• BTM – Business Transaction Monitoring

Who’s who?

Gartner Magic Quadrant

The Holy Grail

Development Test & QA Production

The Holy Grail: we want to…

• Satisfy the needs of the business
• Talk in terms of the application and not the infrastructure
• Use the business transaction as our unit of measure
• Know how long it takes
• Segment the time into the different layers of the application
• Identify the root causes of issues wherever they occur

This implies that we follow business transactions from the browser, through the network, switches, firewalls, load balancers, web servers, application servers, middleware, databases, storage and external systems…
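Segmenting a transaction's time into layers amounts to attributing each layer's exclusive (self) time. A minimal sketch, assuming each layer's timing arrives as a nested span and that calls to lower layers are synchronous (they do not overlap); the `Span` type, the layer names and the checkout figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    # Hypothetical span record; assumes child calls do not overlap in time.
    layer: str                 # e.g. "web server", "app server", "database"
    duration_ms: float         # inclusive: includes time spent in lower layers
    children: List["Span"] = field(default_factory=list)


def time_by_layer(root: Span) -> Dict[str, float]:
    """Attribute each span's exclusive time (own time minus children) to its layer."""
    totals: Dict[str, float] = {}
    stack = [root]
    while stack:
        span = stack.pop()
        own = span.duration_ms - sum(c.duration_ms for c in span.children)
        totals[span.layer] = totals.get(span.layer, 0.0) + own
        stack.extend(span.children)
    return totals


checkout = Span("web server", 500.0, [
    Span("app server", 450.0, [
        Span("database", 300.0),
        Span("external system", 100.0),
    ]),
])
print(time_by_layer(checkout))
```

Here the 500 ms checkout splits into 50 ms in the web server, 50 ms in the application server, 300 ms in the database and 100 ms in the external system, so the layers add back up to the end-to-end time.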

The Holy Grail: the technology

• We want it to be low-cost (or even free!)
• We want it to be easy to deploy
• We want it to discover our infrastructure
• We want it to learn our application behaviour
• We want the minimum of configuration
• We want a single version of the truth
• We want zero overhead

The Holy Grail: different requirements

• Operations
• Managers
• Developers and testers
• Consultants

The Holy Grail: operations

We want a simple green traffic light to indicate all is well

But with the ability to dive in and pinpoint the exact cause should the lights go red

We want real-time alerts (e-mail, SMS, integrations)

We want incident handling and workflow

If we must, we need to know about Service Level Agreement (SLA) violations

We quite like dashboards
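A green/red light only needs a rule that turns raw response times into a status. The sketch below is one possible rule, not a standard: it compares the 95th percentile (nearest-rank) with an SLA threshold, and the amber state and its 80% warning band are assumptions added for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]; samples must not be empty."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def traffic_light(samples_ms: Sequence[float], sla_ms: float,
                  pct: float = 95.0, warn_fraction: float = 0.8) -> str:
    """Status for one transaction type; thresholds here are illustrative assumptions."""
    p = percentile(samples_ms, pct)
    if p > sla_ms:
        return "red"      # SLA violated: alert and dive in
    if p > warn_fraction * sla_ms:
        return "amber"    # assumed warning band, approaching the SLA
    return "green"


print(traffic_light([100.0] * 19 + [5000.0], sla_ms=1000.0))
```

Using a percentile rather than an average means a single slow outlier in twenty requests does not turn the light red, while a general slowdown does.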

The Holy Grail: managers

• We want reports
• We need to see the history
• We want an aggregated view
• We want to slice and dice to our heart’s content
• We love dashboards
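"Slice and dice" over history usually means grouping stored measurements by whichever dimensions the manager picks and summarising each group. A minimal sketch; the record fields (`day`, `transaction`, `response_ms`) and the `slice_and_dice` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def slice_and_dice(records: Iterable[Mapping[str, Any]],
                   by: Tuple[str, ...],
                   metric: str = "response_ms") -> Dict[Tuple[Any, ...], Dict[str, float]]:
    """Group raw measurements by the chosen dimensions and summarise each group.

    Illustrative sketch; field names are hypothetical.
    """
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for record in records:
        key = tuple(record[dim] for dim in by)
        groups[key].append(float(record[metric]))
    return {key: {"count": len(values), "mean": mean(values), "max": max(values)}
            for key, values in groups.items()}


history = [
    {"day": "Mon", "transaction": "login", "response_ms": 300},
    {"day": "Mon", "transaction": "login", "response_ms": 500},
    {"day": "Mon", "transaction": "search", "response_ms": 800},
    {"day": "Tue", "transaction": "login", "response_ms": 400},
]
print(slice_and_dice(history, by=("transaction",)))          # aggregated view
print(slice_and_dice(history, by=("day", "transaction")))    # sliced by day as well
```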

The Holy Grail: developers and testers

We want to profile the performance with as much detail as required (deep-dive)

We want to understand what operations are complaining about

We aspire to be agile (DevOps)

The Holy Grail: consultants

What about consultants like me?

I haven’t got time to drill down on each problematic transaction (it might not happen while I am there)

I want a developer-like tool with good aggregation and drill-down capabilities that I can use in production

Ideally I want to find a missing database index that gives an order of magnitude performance improvement!

The reality: the technology

We can’t do true business transactions but we can approximate with request-response pairs

There is an overhead, roughly proportional to the amount of detail you want

We can measure time quite accurately except perhaps in virtual environments

But we could be timing a page, a URL, a method or a SQL statement – our unit of measure changes unless we can tie them all together

The reality: physics

The “Heisenberg uncertainty principle” of APM:

You cannot monitor the performance of something without affecting its performance

Relativistic effects: time may run more slowly in a virtual machine

The reality: the organisation

• We are silo-based: network people don’t speak to web server people, who don’t speak to application server people…
• We have (or want) our own tools
• We cannot justify holistic tools
• Third parties provide much of the infrastructure – we don’t touch that

The reality: costs

• Licence costs
• Hardware and pre-requisite software costs
• Deployment and configuration costs
• Ongoing maintenance costs

The Holy Grail

Development Test & QA Production

Survey

What web servers are you using? Apache? IIS? Other?

What application servers are you using? PHP? .NET? Java? Other?

What databases are you using? MySQL? SQL Server? Oracle? Other?

Any other technologies – message queues, big data?

End-user experience

This is the ideal starting point for your end-to-end APM

Typically gives page and URL metrics

Development

• www.webpagetest.org – for internet sites
• dynaTrace AJAX edition – free tool, very good for JavaScript performance
• Other browser plug-ins – Firebug, Debug Bar, Chrome developer tools and HttpWatch (basic edition is free) are all good sources of waterfall charts
• Proxies – Fiddler, BrowserMob
• Test one browser at a time

Load testing

On-premise – HP LoadRunner (supports non-web as well, good reports), SilkPerformer

Many free ones including JMeter, Selenium, …

Cloud testing – SOASTA, SOAtest, LoadRunner, Keynote, LoadStorm, Load Impact, Site Confidence, BrowserMob, NeoLoad (some are freemium)

Load testing

• Won’t tell you why the back-end is slow
• Results may not be what happens in live!
• You could use real data, suitably massaged – e.g. Atomic Labs Pion workflow engine
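"Suitably massaged" typically means removing personal data from captured traffic while keeping its shape, so replayed sessions still behave like real ones. The sketch below is a generic substitute technique, not how Pion does it: it replaces sensitive fields with stable keyed pseudonyms (HMAC-SHA256), so the same user always maps to the same token. The field names and the secret are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable, Mapping

# Hypothetical field names; adjust to whatever the captured records contain.
SENSITIVE_FIELDS = ("user", "email", "card_number")


def massage(record: Mapping[str, Any], secret: bytes,
            sensitive: Iterable[str] = SENSITIVE_FIELDS) -> Dict[str, Any]:
    """Replace sensitive values with stable pseudonyms so captured traffic can be replayed."""
    out = dict(record)
    for name in sensitive:
        if name in out:
            digest = hmac.new(secret, str(out[name]).encode(), hashlib.sha256).hexdigest()
            out[name] = "anon-" + digest[:12]   # same input -> same token, so sessions still line up
    return out


captured = [
    {"user": "alice", "url": "/login", "response_ms": 310},
    {"user": "alice", "url": "/basket", "response_ms": 540},
    {"user": "bob", "url": "/login", "response_ms": 290},
]
replayable = [massage(r, secret=b"rotate-me") for r in captured]
print(replayable)
```

A keyed hash is used rather than a plain one so that nobody can recover names by hashing guesses without the secret.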

Synthetic transactions in production

• Closely related to load testing
• BrowserMob (NEUstar) uses Selenium scripts for both
• Site Confidence (biggest in UK), Gomez (Compuware), Keynote
• Test key transactions at various locations
• Complementary to capturing real data
• What if all your users log in at 9am? Is your site available at 8:55? No real data to use
• Often restricted to simple, read-only transactions
• Quickly go out of date – require a lot of maintenance
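At its core, a synthetic transaction is a scripted request run on a schedule, timed and checked for success. A minimal single-step sketch using only the standard library; `run_check`, `CheckResult` and the injectable `fetch` parameter are hypothetical names, and scheduling and multi-location execution are left out.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    elapsed_s: float
    detail: str


def http_get_status(url: str, timeout_s: float) -> int:
    """Fetch a URL and return its HTTP status (urlopen raises for 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def run_check(name: str, url: str, timeout_s: float = 10.0,
              fetch: Callable[[str, float], int] = http_get_status) -> CheckResult:
    """Time one scripted step and record whether it succeeded (illustrative sketch)."""
    start = time.perf_counter()
    try:
        status = fetch(url, timeout_s)
        ok, detail = 200 <= status < 400, f"HTTP {status}"
    except Exception as exc:  # timeouts, DNS failures and HTTP errors all count as failures
        ok, detail = False, f"error: {exc}"
    return CheckResult(name, ok, time.perf_counter() - start, detail)
```

Run, say, every few minutes from several locations, `run_check("home page", "https://www.example.com/")` gives a reading at 8:55 even before any real user has logged in. The `fetch` parameter lets the check be exercised without a network.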

Real transactions in production

• Most tools use JavaScript to time the page
• Google Analytics Site Speed – free, but sampled (1%), low granularity (simple average), not real-time
• Open-source alternatives – Jiffy, Boomerang (Yahoo!), Episodes (Steve Souders)
• WebTuna, Precise, New Relic – not free but without the above restrictions
• Now use the Web Timing and Navigation Timing APIs in Chrome, Firefox and IE9
• In older browsers, timing is less accurate, particularly for the first view, when the JavaScript has not been downloaded yet
• SaaS is common but on-premise is available as well
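With the Navigation Timing API, the page script can read timestamps from `window.performance.timing` and send them back in a beacon; the collector then turns them into phases. This sketch shows only that server-side arithmetic, assuming the beacon carries the Level 1 attribute names unchanged. `page_phases` and the beacon format are hypothetical, and the values are illustrative (real ones are milliseconds since the epoch).

```python
from typing import Dict, Mapping


def page_phases(t: Mapping[str, float]) -> Dict[str, float]:
    """Break one page view into phases (ms) from Navigation Timing Level 1 timestamps.

    Illustrative sketch; assumes the beacon uses the API's attribute names as keys.
    """
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        "time_to_first_byte": t["responseStart"] - t["requestStart"],
        "download": t["responseEnd"] - t["responseStart"],
        "front_end": t["loadEventEnd"] - t["responseEnd"],
        "total": t["loadEventEnd"] - t["navigationStart"],
    }


beacon = {  # illustrative values only
    "navigationStart": 1000, "domainLookupStart": 1010, "domainLookupEnd": 1030,
    "connectStart": 1030, "connectEnd": 1070, "requestStart": 1072,
    "responseStart": 1250, "responseEnd": 1300, "loadEventEnd": 2400,
}
print(page_phases(beacon))
```

In this example most of the 1.4 s page time (1,100 ms) is spent in the browser after the HTML has arrived, which is the kind of split that URL-level timing alone cannot show.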

Real transactions in production

End users? – qualitative rather than quantitative

Aternity, Knoa – supports any Windows application, very good at correlation but eye-watering prices

RUM – Real User Monitoring

• Combination of end-user experience and network
• Coradiant (BMC), HP, Triometric (UK)
• Tealeaf, Atomic Labs (have session replay)
• Network tap or mirror port (span port) or software (for use in the cloud)
• Low overhead

RUM – Real User Monitoring

• The network component gives you URL response time, but it is not easy to relate this to page response time
• Gateways and proxies can limit visibility: they can make it look like you only have one (busy) customer
• You may be able to get the originating IP from headers, but timings will probably be out
• Therefore RUM often includes a JavaScript component to tell you when the page has completed and who the real user is
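Getting the originating IP "from headers" usually means reading `X-Forwarded-For`, where each proxy appends the address it received the request from. A minimal sketch, assuming you know which proxy addresses are your own; `originating_ip` is a hypothetical helper, and for simplicity the header lookup here is case-sensitive, whereas real HTTP header names are not.

```python
from typing import Mapping, Set


def originating_ip(headers: Mapping[str, str], peer_ip: str, trusted_proxies: Set[str]) -> str:
    """Best guess at the real client IP behind gateways and proxies (illustrative sketch).

    X-Forwarded-For reads "client, proxy1, proxy2, ..."; walk back from the nearest
    hop and skip our own proxies, because entries further left can be forged by the client.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()] + [peer_ip]
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return hops[0]


print(originating_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"},
                     peer_ip="10.0.0.1", trusted_proxies={"10.0.0.1", "10.0.0.2"}))
```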

Network

• Usually an appliance, often monitoring a network tap or span port
• Packet capture + network flow analysis: NetFlow (Cisco), sFlow, J-Flow
• Gives you usage: web, e-mail, file sharing, video, VoIP
• Riverbed Cascade, NetX, SolarWinds, OPNET
• Some offer packet-shaping (bandwidth control)
• Free tools: Wireshark, tcpdump based on packet capture (pcap)
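Flow analysis answers "what is the network being used for?" by totalling bytes per application across flow records. A minimal sketch over already-decoded records; the field names (`dst_port`, `bytes`) and the port-to-application map are hypothetical simplifications, since real products also inspect protocols because many applications share or change ports.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical, simplified map from destination port to application.
PORT_TO_APP = {80: "web", 443: "web", 25: "e-mail", 993: "e-mail",
               445: "file sharing", 1935: "video", 5060: "VoIP"}


def usage_by_app(flows: Iterable[Mapping[str, int]]) -> Counter:
    """Total bytes per application from decoded flow records (illustrative sketch)."""
    usage: Counter = Counter()
    for flow in flows:
        usage[PORT_TO_APP.get(flow["dst_port"], "other")] += flow["bytes"]
    return usage


flows = [
    {"dst_port": 443, "bytes": 120_000},
    {"dst_port": 80, "bytes": 30_000},
    {"dst_port": 5060, "bytes": 8_000},
    {"dst_port": 6881, "bytes": 500_000},
]
print(usage_by_app(flows).most_common())
```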

Framework

Do a bit of everything

Mainly to be found in production but may cover test and dev

Basic ones use SNMP to capture network stats, and perfmon/WMI or OS commands via SSH to capture CPU, disk and memory

More sophisticated ones have plug-ins for different technologies – Cisco, VMware, Exchange, application servers, databases etc

Strengths – baselines, propagation of alerts, workflow, incident handling, SLAs and customisable dashboards

Weaknesses – lots of counters (wide but not deep), can take months to configure, “alert storms”
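A baseline flags a reading as unusual relative to its own history rather than a fixed threshold, which helps to reduce alert storms. One simple form, shown here as an assumption rather than any particular product's method, treats anything more than `k` standard deviations from the historical mean as abnormal; comparing like with like (for example 9am with previous 9ams) is also an assumed design choice.

```python
from statistics import mean, stdev
from typing import Sequence


def outside_baseline(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """True if value is more than k standard deviations from the historical mean.

    Illustrative sketch; needs at least two historical readings.
    """
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > k * sigma


# Compare 9am today with previous 9am readings, not with the quiet overnight hours.
nine_am_history = [410.0, 395.0, 420.0, 405.0, 400.0]
print(outside_baseline(nine_am_history, 415.0))
print(outside_baseline(nine_am_history, 650.0))
```

With a mean of 406 ms and a standard deviation of about 9.6 ms, 415 ms is within normal variation, while 650 ms is well outside it and worth an alert.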

Framework

Some of the lightweight, free ones are Cricket, Graphite and Splunk

They concentrate on the presentation layer (graphs)

More fully featured free ones are Nagios and Zenoss

The big players are IBM Tivoli, HP OpenView, Microsoft SCOM, CA Unicenter, BMC Patrol, Nimsoft, IG, SolarWinds

One of the better ones IMHO was Indicative

Service model built around a tree; alerts would automatically propagate

Bought by Nimsoft and now disappeared
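In a tree-based service model, an alert on a low-level component propagates upward to every service that depends on it. A minimal sketch assuming a "worst status wins" roll-up; that rule, the service names and the `rolled_up_status` helper are hypothetical, and this is not a description of how Indicative worked.

```python
from typing import Dict, List

SEVERITY = {"green": 0, "amber": 1, "red": 2}


def rolled_up_status(tree: Dict[str, List[str]], own_status: Dict[str, str], node: str) -> str:
    """A node's status is the worst of its own status and everything beneath it (assumed rule)."""
    worst = own_status.get(node, "green")
    for child in tree.get(node, []):
        child_status = rolled_up_status(tree, own_status, child)
        if SEVERITY[child_status] > SEVERITY[worst]:
            worst = child_status
    return worst


service_model = {
    "online shop": ["web tier", "payments"],
    "payments": ["app server", "database"],
    "database": ["SAN storage"],
}
alerts = {"SAN storage": "red", "web tier": "amber"}
print(rolled_up_status(service_model, alerts, "online shop"))
```

A red storage alert surfaces as red on "database", "payments" and "online shop", so operations see one business-level symptom with a path down to the cause.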

Web servers

• Apache mod or ISAPI filter – Precise
• May just use mod_status – dynaTrace, New Relic, AppDynamics
• Usually fairly lightweight
• The assumption must be that web servers are not the main cause of bottlenecks
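Apache's mod_status has a machine-readable form (`/server-status?auto`) that prints one `Key: value` pair per line, which is why it is such a cheap data source. A minimal parsing sketch, assuming mod_status is enabled and the page has already been fetched; `parse_mod_status` is a hypothetical helper and the sample text is illustrative.

```python
from typing import Dict, Union


def parse_mod_status(text: str) -> Dict[str, Union[float, str]]:
    """Parse Apache mod_status machine-readable output (server-status?auto)."""
    stats: Dict[str, Union[float, str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            stats[key.strip()] = value      # non-numeric values, e.g. the Scoreboard string
    return stats


sample = """Total Accesses: 2431
Total kBytes: 1870
ReqPerSec: .331
BusyWorkers: 3
IdleWorkers: 7
Scoreboard: _W_W_W____"""

stats = parse_mod_status(sample)
busy_pct = 100 * stats["BusyWorkers"] / (stats["BusyWorkers"] + stats["IdleWorkers"])
print(stats, busy_pct)
```

Worker saturation (here 30% busy) is typically the figure to watch, since a web server with no idle workers starts queueing requests.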

Application servers

Monitoring typically implemented using “instrumentation”

Extra timing code automatically added to the application

Tag and follow HTTP, JMS protocols etc in order to track transactions between application tiers
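Tagging means the first tier to see a request gives it an identifier, and every tier passes that identifier on with its outgoing calls (as an HTTP header, or as a message property for JMS). The sketch below is a simplified illustration; the header name `X-Correlation-ID` and the `handle` function are hypothetical, since each APM product uses its own tag.

```python
import uuid
from typing import Dict, List, Mapping, Tuple

# Hypothetical tag name: each APM product uses its own (often vendor-specific) header.
TAG_HEADER = "X-Correlation-ID"


def handle(tier: str, incoming: Mapping[str, str],
           log: List[Tuple[str, str]]) -> Dict[str, str]:
    """Join the caller's transaction (or start one), record this tier, and tag the next call."""
    tag = incoming.get(TAG_HEADER) or uuid.uuid4().hex
    log.append((tag, tier))               # a real agent would also record timings here
    return {TAG_HEADER: tag}              # headers/properties to send on the outgoing call


log: List[Tuple[str, str]] = []
to_app = handle("web server", {}, log)          # browser request arrives untagged
to_queue = handle("app server", to_app, log)    # the tag travels with the request
handle("message consumer", to_queue, log)
print(log)
```

Because every record carries the same tag, the measurements from the three tiers can later be stitched back into one transaction.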

Application servers

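Overhead is the big challenge here

We would like 1-2% in production but may accept 30% in test and 100% in development

Instrumentation may complicate exception handling and optimisation, as in this method A, where timing code has been added around a loop that calls B 1,000 times:

```python
def a(stopwatch, b):
    stopwatch.start()      # timing code added by the instrumentation
    try:
        for _ in range(1000):
            b()            # if b() is instrumented too, its timing cost is paid 1,000 times
    finally:
        stopwatch.stop()   # without try/finally, an exception in b() leaves the timer running
```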
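The overhead budgets can be checked by running the same load test with and without the monitoring agent and comparing response times. A minimal sketch; it assumes the two runs are otherwise repeatable, and the budget table simply encodes the figures given for production, test and development (the names are illustrative).

```python
# Budgets from the figures above (upper end of 1-2% for production); names are illustrative.
OVERHEAD_BUDGET_PCT = {"production": 2.0, "test": 30.0, "development": 100.0}


def overhead_pct(baseline_ms: float, instrumented_ms: float) -> float:
    """Extra response time caused by monitoring, as a percentage of the baseline."""
    return (instrumented_ms - baseline_ms) * 100.0 / baseline_ms


def within_budget(environment: str, baseline_ms: float, instrumented_ms: float) -> bool:
    """True if the measured overhead fits the budget for that environment."""
    return overhead_pct(baseline_ms, instrumented_ms) <= OVERHEAD_BUDGET_PCT[environment]


print(overhead_pct(200.0, 203.0), within_budget("production", 200.0, 203.0))
```

A transaction that goes from 200 ms to 203 ms with the agent attached carries 1.5% overhead, inside the production budget; the same agent adding 10% would be acceptable in test but not in production.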

Application servers (development)

Can afford the overhead so can go for the greatest level of detail

Typically use profilers
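As an example of development-time profiling, Python ships a deterministic profiler, `cProfile`, whose results can be summarised with `pstats`; Java and .NET have their own profilers that play the same role. In this sketch the `profile` helper and the `build_report` workload are hypothetical.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile(fn: Callable[..., Any], *args: Any, top: int = 10) -> Tuple[Any, str]:
    """Run fn(*args) under the deterministic profiler and return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


def build_report(rows: int) -> str:
    # Hypothetical example workload for the profiler to measure.
    text = ""
    for i in range(rows):
        text += f"{i},"
    return text


result, report_text = profile(build_report, 10_000)
print(report_text)
```

A deterministic profiler records every call, which is exactly the high-overhead, maximum-detail trade-off that is affordable in development but not in production.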

Application servers (test)

One of the best for testing is probably dynaTrace (Compuware)

Captures every transaction (PurePaths) but high overhead as a consequence

Need to be careful using it in production

Application servers (production)

1st generation: Wily (CA), Precise

Wide and deep but not originally designed for auto-discovery in service-orientated, virtualised environments

2nd generation: New Relic, OpTier, AVIcode (wide but not so deep), dynaTrace, AppDynamics (wide and deep)

AppDynamics ‘Lite’ version is free but limited (freemium model)

Betfair, Netflix – thousands of instances

Have mechanisms to limit overhead
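One common way to limit overhead across thousands of instances is to trace only a sample of transactions. The sketch below shows a hash-based variant (an assumption, not a description of any named product): hashing the transaction ID instead of calling a random-number generator means every tier that sees the same ID makes the same decision, so sampled traces stay complete. `should_trace` is a hypothetical name.

```python
import hashlib


def should_trace(transaction_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of transactions (illustrative sketch).

    Every tier that sees the same transaction id reaches the same decision.
    """
    digest = hashlib.sha256(transaction_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_rate * 10_000


sampled = sum(should_trace(f"txn-{i}", 0.1) for i in range(10_000))
print(f"traced {sampled} of 10000 transactions")
```

At a 10% rate roughly 1,000 of 10,000 transactions are traced in detail, while the remainder can still be counted and timed cheaply.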

Database

Vendor tools (Oracle, SQL Server and MySQL) dominate but not integrated with other tiers

Tend to measure statistics, e.g. cache-hit ratio

More important are wait states

Precise – lots of detail, heavy infrastructure

Quest Spotlight – easy on the eye

DBTuna, Confio – lightweight, easy to use

DBTuna particularly good for load testing – 1-minute aggregation, load test comparisons
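Wait states show where a statement's time actually goes, which a ratio such as cache hits cannot. One common approach is to sample active sessions at a fixed interval and count what each statement was doing; this sketch assumes the samples have already been collected, and the statement names, states and `wait_profile` helper are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def wait_profile(samples: Iterable[Tuple[str, str]],
                 interval_s: float = 1.0) -> Dict[str, Counter]:
    """Estimate time per SQL statement per state from periodic session samples.

    Each sample is (sql_id, state) for one active session at one sampling instant;
    every sample stands for roughly `interval_s` seconds of that state.
    """
    by_statement: Dict[str, Counter] = defaultdict(Counter)
    for sql_id, state in samples:
        by_statement[sql_id][state] += interval_s
    return by_statement


samples = (
    [("update_stock", "CPU")] * 2
    + [("update_stock", "lock wait")] * 8
    + [("search", "CPU")] * 3
    + [("search", "I/O wait")] * 1
)
for sql_id, states in wait_profile(samples).items():
    print(sql_id, states.most_common())
```

Here `update_stock` could have a perfect cache-hit ratio and still spend 8 of its 10 seconds waiting on locks.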

Big Data

No monitoring products that I know of

Can be monitored indirectly from the application tier

Storage

Vendor tools typically measure I/O counters

True APM relates I/O to activity, e.g. SQL

EMC – Precise

NetApp – DBTuna

Survey

What APM tools do you use? None? One of the big vendors – IBM, HP, Microsoft? Point solutions? A mixture?

APM next generation challenges

• Limitations of request-response
• Asynchronous, e.g. Enterprise Service Bus
• Parallel processing
• Effort v elapsed time
• Variability in response time (particularly for mobile)
• Very low latency (milliseconds)
• More technology, e.g. web sockets
• More agile / faster change
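With parallel processing, adding up the time of each request-response pair (effort) overstates what the user waits for (elapsed time). A minimal sketch that computes both from (start, end) intervals of the work done for one transaction; the interval data and the `effort_and_elapsed` helper are hypothetical.

```python
from typing import List, Optional, Tuple


def effort_and_elapsed(intervals: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Effort = sum of all work; elapsed = time covered by at least one piece of work."""
    effort = sum(end - start for start, end in intervals)
    elapsed = 0.0
    current_start: Optional[float] = None
    current_end: Optional[float] = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                elapsed += current_end - current_start   # close the previous busy period
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)          # overlapping work extends the period
    if current_end is not None:
        elapsed += current_end - current_start
    return effort, elapsed


# Three 100 ms calls made in parallel: 300 ms of effort, 100 ms of elapsed time.
print(effort_and_elapsed([(0, 100), (0, 100), (0, 100)]))
```

A tool that only sums request-response pairs would report 300 ms here, although the user waited 100 ms.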

APM next generation challenges

• More auto-scaling
• Tighter integration between monitoring and workflow / provisioning (but spikes in demand cannot be predicted)
• More SaaS – end-user experience (WebTuna, Atomic Labs), net-flow (Boundary Networks, NetX)
• More cloud – load testing, synthetic transactions (Site Confidence, Gomez, Keynote)
• But how do you get Amazon, Rackspace etc. to monitor your infrastructure?
• Continuous load testing – traditional tools require ramp-up, load and stop before you get any results
• Developers like the bleeding edge but can the monitoring tools keep up?

Q & A

martin.pinner@applicationperformance.com

www.applicationperformance.com

@appperf