Application Performance Management - Uni...

Post on 19-May-2020

6 views 0 download

Transcript of Application Performance Management - Uni...

Institute of Software Technology

Reliable Software Systems

ApplicationPerformanceManagement

State of the Art andChallenges for the Future

André van HoornMario MannDušan OkanovićChristoph Heger

Tutorial @ 8th ACM/SPEC International Conference on Performance Engineering,

April 22, 2017, L‘Aquila, Italy

André van Hoorn

University of Stuttgart

Inst. of Softw. Techn.

Reliable SW Systems

Mario Mann

NovaTec Consulting

Application

Performance

Management

Dušan Okanović

University of Stuttgart

Inst. of Softw. Techn.

Reliable SW Systems

Christoph Heger

NovaTec Consulting

Application

Performance

Management

Who Are These Guys?

Performance Problems are Omnipresent

An unexpected

error occured

Temporarily

not available

… more capacity

is on the way

Try again later

Please visit us

again later

We are

experiencing

heavy demand

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Influence of Poor Performance on the Success of Businesses

“Application performance management (APM), as a core IT operations discipline, aims to achieve an adequate level of performance during operations. To achieve this,

APM comprises methods, techniques, and tools for

• continuously monitoring the state of an application system and its usage, as well as for

• detecting, diagnosing, and resolving performance-related problems using the monitored data.”

C. Heger, A. van Hoorn, D. Okanović, M. Mann.:

Application performance management: State of the art and challenges for the future.

In: Proc. 8th ACM/SPEC ICPE, ACM (2017)

Agenda

Introductionto APM

APM Tools

APM Advanced

Discussion

45 min

45+15 min

60 min

15 min

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

André van Hoorn und Stefan Siegl.

Application Performance Management (APM). Continuous Monitoring of Application Performance.

(Poster in ObjektSpektrum magazine; in German)

Order for free: http://www.sigs-datacom.de/wissen/fachposter.html

• Agents collect data from all system levels

• On application level the agents are often technology-dependent

Collecting Data from All System Levels1.

Where? What? How?

Trace-based Metrics (Selection)

What?

Metric

Response Time

CPU Time

Method Name

Return Type

Logging Level

SQL Statement

Error Message

Okanović, D., van Hoorn, A., Heger, C., Wert, A., Siegl, S.:

Towards performance tooling interoperability: An open format for representing execution traces.

In: Proc. EPEW ’16. LNCS, Springer (2016)

Monitoring (Measurement-based Performance Evaluation)

Database

Server

Client

Browser

Request

Response

Query

Result

Application

Server

Data

Recorder

0.2s 0.3s 0.1s 3.0s

0.1s0.4s

0.5s

0.3s

3.0s

4.9s

VisualizationData Analysis

(fictio

naltim

ings)

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• Event-driven measurement

• Event: a change in system state

• e.g., incoming requests, access to HDD, operation execution, throwing exceptions

• Calculate performance metrics whenever an event occurs

• Simplest metric: counter

• Tracing (also event-driven)

• Recording a certain part of the system state when an event occurs

• Example: Code adjustments

• printf – easy, but with questionable maintainability

• Bytecode engineering

• Aspect-oriented Programming (AOP)

• Example: Sampling

• Measurement (state or counter) is performed in specified time intervals

• Through overhead and increased resource consumption, measurements

influence the system!

Measurement Approaches and Techniques (Examples)

How?

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• Data is collected from the system…

• represented as time series…

Reconstructing Information from Data2.

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• Data is collected from the system…

• represented as time series…

• … and as detailed execution traces, and

used to support problem analysis

Reconstructing Information from Data2.

Traces in an APM Tool

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• High quantity of information has to be pre-processed

• It has proven useful to use different views to show the data

• Views are navigable and can be categorized by both scope and detail level

Visualization Through Navigable Views3.

Example: Application Topology Discovery and Visualization

© AppDynamics

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Manual or automated conclusions and actions can be derived from the

information, e.g.,

• Problem detection and alerting

• E.g., increased response times and resource utilization

• Detection, for instance, based on thresholds and baselines

• Problem diagnosis and root cause isolation

• E.g., N+1 problem, too many remote calls, poor DB queries

• Detection based on monitoring information

• System refactoring and adaptation

• E.g., auto-scaling in cloud-based architectures

Interpreting and Using the Information4.

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• http://kieker-monitoring.net

1/20/2016 22

Dynamic Software Analysis and ApplicationPerformance Management

Application Performance Management: State of the Art and Challenges for the Future

Institute of Software Technology

Reliable Software Systems

ApplicationPerformanceManagement

Part 2 – APM Tools

André van HoornMario MannDušan OkanovićChristoph Heger

Tutorial @ 8th ACM/SPEC International Conference on Performance Engineering,

April 22, 2017, L‘Aquila, Italy

• Commercial APM tools

• Magic quadrant

• Timeline of APM tools

• Goals of APM tools

• Architecture

• EUM, Server, Database Monitoring

• How APM tools can help you to identify problems

• Open Source APM tools

Contents

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Commercial APM Tools

Magic Quadrant

Sourc

e: G

artn

er (2

016)

4/22/2017

Timeline of Tools

Wily – CA APM

DynaTrace

AppDynamics

Cisco

inspectIT

1998

2005

2005 2008

2017

Kieker

Commercial tools

Open source tools

Application Performance Management: State of the Art and Challenges for the Future

2006

Goals of APM Tools

End-User

PerspectiveEnd to End

Monitoring

Smart

Analytics

Lifecycle by

Design

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@Dynatrace

Architecture – Dynatrace

Java .NET

Mainframe,

Native, … Database

Dynatrace Sessions

Performance WarehouseDynatrace Frontend ServerDynatrace Backend Server

Agent/PurePath

Collector GroupMonitoring Collector

Web Server /Node.js

/ NGINX / PHP

Dynatrace

Client

Web UI Dashboards

@Dynatrace

Sensors – How Do They Work?

• Every sensor (custom and OOTB) instruments Java/.NET methods

• Code gets added to each method that matches the sensor rule to

• measure execution time

• capture method arguments and return values

• count method invocations

• capture exceptions

Uninstrumented Application Instrumented Application

Execution Time

of Diagnosis Code

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@Dynatrace

4/22/2017

Architecture – AppDynamics

Source:

https://docs.appdynamics.com/download/attachments/34271888/dbmondplussingleapp_appd_architecture.png?version=4

&modificationDate=1487053783658&api=v2

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

10 ms 10 ms 10 ms 10 ms

Sensors – How Do They Work?

@Dynatrace

Applications and Infrastructure Overview

Analysis dashboards

UEM Visits grouped byChannel and Application

Show the total Tier

time with Process,

Host, and

Transaction

Deeper analysis

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Application Overview – Dynatrace

Application Overview – AppDynamics

Tiers with technology

Databases

Remote Services

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@AppDynamics

Real User Monitoring – AppDynamics

Infrastructure

Visibility

End User Monitoring Application Performance Management

NoSQL

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@AppDynamics

Real User Monitoring – Dynatrace

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Source: https://qph.ec.quoracdn.net/main-qimg-cd5a4b6e8cc29465d2cea98e055f606e.webp

End User Monitoring – AppDynamics

Geolocation

CrashesRequests

Time

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Server Monitoring – AppDynamics

Availability CPU Usage

Memory Usage

Network

VolumesProcesses

Database Monitoring – AppDynamics

Alerting – AppDynamics

@A

ppD

ynam

ics

Dashboarding

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@Dynatrace

How APM tools can help to identify problems?

BusinessTransaction

Potential Issues

User Experience

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

@AppDynamics

Top findings to optimize

performance

Filter

How APM tools can help to identify problems?

@Dynatrace

• Share configurations between stages

• Identify business transactions and map this to functional level

Challenges

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Open Source APM tools

Licensing costs

Vendor Lock-In

• Typical model: x $ per agent scaling?

Weaknesses of Commercial APM Tools

Microservices Internet of Things

[Abbildung: http://blog.wso2.com] [Abbildung: Christian Hinkelmann, http://nahverkehrhamburg.de]

Flexibility Interoperability Sustainability

Mobile Revolution

[Abbildung: https://uxmag.com]

Adaption to own

needs

Bug fixing

Sources

Tools for analyse

OtherAPM tools

Stopp development

Change of strategy of

APM vendor

4/22/2017Application Performance Management: State of the Art and Challenges for the Future 4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Open Source APM tools

Load Testing

Performance

Modeling

Web Performance

Analysis

System &

Resources Monitoring

Real User Monitoring

JRatJMemProf

Low-Level

Performance Profiling

Monitoring &

Application Deep Dive

What is inspectIT?

pen Source

8 |2

Platf rm

Pareto-principle

• Focus on main functionality

• Experience of APM projects

Platform-principle

• Integration with tools

• Extensibility

Open-Source APM solution

• Development since 2005

• Open Source since 2015

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Gathering and Visualizing Timeseries Data

Time Series Databases

Graphing ToolsData Collectors

Custom Code…

Persist data Query & Visualize

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Dashboards

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Dashboards

The Flaw of Averages

The Flaw of Averages: Why We Underestimate Risk in the Face of

Uncertainty

Sam L. Savage, with illustrations by Jeff Danziger –

http://flawofaverages.com

Gil Tene – https://www.youtube.com/watch?v=lJ8ydIuPFeU

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Dashboards

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Alerting

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Anomaly Detection & Alerting

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

End User Experience?

[Image | http://cdn2.hubspot.net/hub/2224117/file-4141070752-jpg/blog-files/three-

e1409846588737.jpg]

Satisfied Tolerating Frustrated

Performance User Actions Users

Performance issues

Functional Errors

Network (Local ISPs, Mobile network carriers)

Third party content providers

First user action

Last user action

Did the visit convert?

Did the visit bounce?

Device

Resolution

Browser Versions

Geolocation

What is Web Performance Analysis about?

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Measuring End-User Experience

Real User

Testing

Synthetic

Monitoring

Continuous external testing

Known testing nodes

Different Locations (ISP, Networks)

No baseline traffic required

Competitor Benchmark

Availability testing (incl. 3rd Party)

e.g., every 30 mins from different nodes

Real User

Monitoring

Real End Users

Real Interaction

Real traffic

External testing with all major

browsers,

operating systems,

mobile devices

and real world data

QoS Validation

QoS ExpectationQoS Optimization

Anomaly

Testing and Monitoring Web Performance

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

[http://www.webpagetest.org/performance_optimization.php?test=160623_DF_MV7&run=1

&cached=0]

webpagetest.org

Server Monitoring

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Source: https://www.icinga.com/wp-content/uploads/2016/08/Screen-Shot-2016-08-29-at-16.11.23.png

• No licensing costs

• Growing community

• Pick the tools which work for your needs and combine them

Benefits of Open Source Tools

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• From perspectives of industry and research…

Challenges

Institute of Software Technology

Reliable Software Systems

ApplicationPerformanceManagement

Part 3 – APM Advanced

André van HoornMario MannDušan OkanovićChristoph Heger

Tutorial @ 8th ACM/SPEC International Conference on Performance Engineering,

April 22, 2017, L‘Aquila, Italy

Problem Diagnosis

Component

Deep-Dive

Interpretation of

measurements

Detect problem

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

Problem: Too Many Traces to Analyze

Lots of data to

analyze

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

http://diagnoseit.github.io/

diagnoseIT Overview

Node 1

JVM

App 1

App 2

Node n

SUT

Traces

Instrumentation

Refinement

Request

Result

Monitoring Tool

Dynamic

Instrumentation

API Labeling

Location

Identification

Instrumentation

Quality Manager

Result Query

4/22/2017

C. Heger, A. van Hoorn, D. Okanović, S. Siegl, and A. Wert.

Expert-guided automatic diagnosis of performance problems in enterprise applications.

In Proc. EDCC ’16. IEEE, 2016.

Agenda

• APM interoperability

• Anti-patterns and trace based detection

• Mobile

• Overhead control

• Results aggregation

• Business integration

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

convert

Application Performance Management: State of the Art and Challenges for the Future

APM Interoperability

.map

.dat X .itds

4/22/2017

Application Performance Management: State of the Art and Challenges for the Future

Solution

APM Interoperability

OPEN.xtrace

.dat .itds .txt

convert

convert convert

4/22/2017

Application Performance Management: State of the Art and Challenges for the Future

Data Export in APM Tools

Open

Source

Export

possible

✔ ✔

✔ ✔

✘ ✔

✘ ✔

Open

Source

Export

possible

✘ ✔

✘ ✘

✘ ✘

✘ ✘

4/22/2017

Metric OPEN.xtrace Kieker inspectIT AppDynamics Dynatrace New Relic

Response

Time ✔ ✔ ✔ ✔ ✔ ✔

CPU Time ✔ ✔ ✔ ✔

Method

Name ✔ ✔ ✔ ✔ ✔ ✔

Return Type ✔ ✔ ✔

Logging

Level ✔ ✔ ✔

SQL

Statement ✔ ✔ ✔ ✔ ✔

Error

Message ✔ ✔ ✔ ✔ ✔

Comparison table (excerpt)

4/22/2017

D. Okanovic, A. van Hoorn, C. Heger, A. Wert, and S. Siegl.

Towards performance tooling interoperability: An open format for representing execution traces.

In Proc. EPEW ’16, pages 94–108, 2016.

• Execution trace is a data structure that captures control flow of method

execution for a request served by the system (Ammons et al., 1997)

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

OPEN.xtrace Traces

Trace

SubTrace

SubTrace

Callable

Callable

Callable

Location

Callables

• Method executions

• DB calls

• Remote calls

• Logging

• Errors

• What is available:

• Trace model based on the available data in these tools

• Adapters to convert the data between OPEN.xtrace and tools

• Serialization

• Planned: OPEN.timeseries

• More on OPEN.xtrace

• D. Okanovic, A. van Hoorn, C. Heger, A. Wert, and S. Siegl. Towards performance

tooling interoperability: An open format for representing execution traces. In Proc.

EPEW ’16, pages 94–108, 2016.

• https://research.spec.org/apm-interoperability/

• https://github.com/spec-rgdevops

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

OPEN.xtrace

• Track what is going on in a complex, heterogeneous systems

• Capture important events during processing of client request

• Avoid vendor lock-in

• Wide industry support (http://opentracing.io)

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

OpenTracing

diagnoseIT Overview

Node 1

JVM

App 1

App 2

Node n

SUT

Traces

Instrumentation

Refinement

Request

Result

Monitoring Tool

Dynamic

Instrumentation

API Labeling

Location

Identification

Instrumentation

Quality Manager

Result Query

C. Heger, A. van Hoorn, D. Okanović, S. Siegl, and A. Wert.

Expert-guided automatic diagnosis of performance problems in enterprise applications.

In Proc. EDCC ’16. IEEE, 2016.

Anti-patterns are conceptually

similar to patterns, however

describe recurrent solutions to

design problems which, however,

may have a negative effect on

different software quality

attributes.

Koenig, A. (1998). “Patterns and Antipatterns”. In: The Patterns

Handbooks. Ed. by L. Rising. New York, NY, USA: Cambridge

University Press

• Performance Antipatterns

(Performance) Antipatterns

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

• Problem

„[…] a point in the execution where one, or only a few, processes

may continue to execute concurrently. All other processes must wait. […]”

• Cause

• „It frequently occurs in applications that access a database. Here, a lock ensures

that only one process may update the associated portion of the database at a

time.

• It may also occur when a set of processes make a synchronous call to another

process that is not multi-threaded; all of the processes making synchronous

calls must take turns “crossing the bridge.”“

• Solution

“Shared Resources principle[:] responsiveness improves when we minimize the

scheduling time plus the holding time. Holding time is reduced by reducing the

service time for the One-Lane Bridge, and by rerouting the work.”

[Smith and Williams, 2001]

Example 1: One-Lane Bridge

• Synchronization in source-code:

• Thread- and connection- pools in application servers

Example 1: One-Lane Bridge – Causes

public synchronized

void buyItems (Collection<Item> items) {

}

<Connector ... maxThreads="300" acceptCount="150"

... />

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Example 1: One-Lane Bridge – Symptoms

0

10

20

30

40

1 10 20 30 40 50

0

20

40

60

80

100

1 10 20 30 40 50

Number of users

Re

sp

on

se

tim

e

[ms]

Number of users

CP

U-u

tili

za

tio

n

[%]

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Example 2: N+1-Problem

Wert, A. (2015). Performance Problem

Diagnostics by Systematic

Experimentation, PhD thesis.

Parsons, T. (2007): “Automatic Detection of

Performance Design and Deployment

Antipatterns in Component Based

Enterprise Systems”. PhD thesis.

Performance-Antipatterns– Classifications

Application Performance Management: State of the Art and Challenges for the Future

Analysis automation

4/22/2017

List of Currently Detectable Antipatterns

• N+1 Query Problem

• Circuitous Treasure Hunt

• The Stifle

• The Ramp

• Traffic Jam

• More is Less

• Application Hiccups

• Garbage Collection Hiccups

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

Detecting Antipatterns in Mobile Environment

Application Performance Management: State of the Art and Challenges for the Future

Java

Agent

iOS

Agent

Buffer

OpenTracing

spans

GUI

OpenTracing

spans

OPEN.xtrace

4/22/2017

Antipatterns in Mobile Environment

• Anti-patterns

• Too many remote calls

• Too many remote calls to the same URL

• Too many remote calls to the same server

• Hard-disk utilization too high

• Hard-disk usage increases too fast (ramp)

• RAM utilization too high

• RAM usage increases too fast (ramp)

• Response is available after timeout (timeout occurred too early)

• Glitches

• No connection available during a request

• Latency between back end and mobile device is too high

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

diagnoseIT Overview

Node 1

JVM

App 1

App 2

Node n

SUT

Traces

Instrumentation

Refinement

Request

Result

Monitoring Tool

Dynamic

Instrumentation

API Labeling

Location

Identification

Instrumentation

Quality Manager

Result Query

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

Overhead Control

• diagnoseIT can request more data from the monitoring tool

• Tool has to support changing of monitoring at runtime

• Instrumentation Quality Monitor can deny this request

• How to assess?

• Modeling?

• Machine learning?

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

diagnoseIT Overview

Node 1

JVM

App 1

App 2

Node n

SUT

Traces

Instrumentation

Refinement

Request

Result

Monitoring Tool

Dynamic

Instrumentation

API Labeling

Location

Identification

Instrumentation

Quality Manager

Result Query

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

diagnoseIT – Diagnosis Results

Problems

Overview

Natural language

description

Problem instance

details

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

Aggregating the Results

PIPI

PIPI

PIPI

PIPI

PIPI

PIPI PI

PIPI

PI

PI

PI PIPI

PI PI

PI PI

PIPI

PI

PIPI

PIPI

PI

PIPI

PI

PIPI

PIPI

PI

PIPI

PI

PIPI

PIPI

PI

T. Angerstein, D. Okanović, C. Heger, A. van Hoorn, A. Kovačević, T. Kluge:

Many flies in one swat: Automated categorization of performance problem diagnosis results.

In: Proc. ACM/SPEC ICPE ’17

Categorization Process

Diagnosis results Categorization

Optimization

Categorization result

T. Angerstein, D. Okanović, C. Heger, A. van Hoorn, A. Kovačević, T. Kluge:

Many flies in one swat: Automated categorization of performance problem diagnosis results.

In: Proc. ACM/SPEC ICPE ’17

Application Performance Management: State of the Art and Challenges for the Future

Aggregation in diagnoseIT

4/22/2017

Missing Business Context

CPU utilization over time

How critical is this?

Asking the right questions:

How are the users affected?

Which business transactions are affected?

Are any time-critical batch jobs affected?

11pm 0am 1am 2am 3am 4am 5am

100 %

50 %

0%

4/22/2017Application Performance Management: State of the Art and Challenges for the Future

Defining Business Context

Application Performance Management: State of the Art and Challenges for the Future 4/22/2017

Conclusion

Node 1

JVM

App

1

App

2

Node n

S

U

T

Traces

Instrumentation

Refinement

Request

Result

Monitoring Tool

Dynamic

Instrumentation

API Labeling

Location

Identification

Instrumentation

Quality Manager

Result Query