Save the Users! Monitoring and Diagnosing Problems in Production

Post on 14-Jun-2015


Save the Users! Monitoring and Diagnosing Problems in Production

Steven Haines

J2EE Architect and Evangelist

Quest Software

Agenda

• Speaker introduction

• Overview of the problem

• Overview of the strategy throughout the development lifecycle

• What needs to be monitored

• Solving problems quickly in production

• Questions & Answers

Speaker Introduction

• J2EE Architect and Evangelist for Quest Software

• Author of Java 2 Primer Plus and Java 2 From Scratch

• Co-Author of Java Web Services Unleashed

• Java Host and columnist on InformIT.com (Pearson Education)

• Java Instructor at the University of California, Irvine (UCI) and previously Learning Tree University (LTU)

• Recruited as a J2EE architect in the “real world”

Why Do I Need To Worry?

Market Trends: The Pain of J2EE

IT must answer to the business

Fewer than 20% of J2EE applications meet their performance SLAs in production. (IDC Research)

The Cost of Failure

Business to Consumer

– Site abandonment = lost revenue

Business to Business

– Damaged business relationships = lost opportunity

Internal

– Loss of organizational efficiency

– Slower time-to-market

The Strategy:

Performance Throughout the Development Lifecycle

Analysts & Industry-Experts Agree…

Gartner recommends you approach J2EE performance throughout the lifecycle

Full Lifecycle Analysis

Application-level code assurance

Certify applications before deployment

24x7 application performance management

In Development…

Put performance requirements in Use Cases

Unit test your components for performance

– Both for memory usage and response time

Test your application for performance at every integration milestone

Integration of un-tuned components is analogous to building a car with broken parts!
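A component-level performance check can be as small as timing one call and comparing it with the requirement from the use case. The sketch below is only an illustration: `PerfCheck`, `measure` and the 200 ms budget are hypothetical names and values, and the heap figure is a rough estimate because a garbage collection can run at any time.

```java
/** Hypothetical helper: times one call and estimates how much the heap grew. */
final class PerfCheck {

    /** Outcome of one measured call. */
    static final class Result {
        final long elapsedNanos;
        final long heapDeltaBytes; // approximate: a garbage collection can skew it

        Result(long elapsedNanos, long heapDeltaBytes) {
            this.elapsedNanos = elapsedNanos;
            this.heapDeltaBytes = heapDeltaBytes;
        }

        double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }

        /** True when the call finished within the response-time requirement. */
        boolean withinBudget(double maxMillis) {
            return elapsedMillis() <= maxMillis;
        }
    }

    /** Runs the component once and records elapsed time and heap growth. */
    static Result measure(Runnable component) {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        component.run();
        long elapsed = System.nanoTime() - start;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        return new Result(elapsed, heapAfter - heapBefore);
    }

    public static void main(String[] args) {
        // Stand-in for a real component: build a list of 100,000 integers.
        Result r = measure(() -> {
            java.util.List<Integer> items = new java.util.ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                items.add(i);
            }
        });
        System.out.println("elapsed ms: " + r.elapsedMillis());
        System.out.println("approx heap delta bytes: " + r.heapDeltaBytes);
        // 200 ms is an assumed requirement, not a figure from the talk.
        System.out.println("within budget: " + r.withinBudget(200.0));
    }
}
```

A single run is noisy; in practice the call would be repeated after a warm-up and the results averaged before failing a build on them.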

In QA Testing…

Test performance along with functionality

Try to create load scripts that mirror your users’ actions

Analyze the reality (as much as possible)

Failed performance is not acceptable

The Performance Stakeholders

What code is behind the symptom?

Is the application architecture a problem?

What component is at fault?

Who should fix the problem?

Which SQL statements need tuning?

Is the DB really the problem?

Is the application available?

Is the app server configured correctly?

In Live Production…

Measure end-user performance

Watch for resource contention problems

Make sure you get warnings early, but avoid alarm storms

Keep historical data for trending and capacity planning

What Needs To Be Monitored?

End-User Measurement

This is the most important

Passive versus Active

A combination gives the best balance

Either way you must be able to follow users through the system

Resource Contention

Helps you to avoid system failure

Makes capacity planning easier and more reliable

Combine with end-user data

Tiered, composite alerts make your life easier

Setting Up An Alert

• Tiered alerts

• Normal

• Warning

• Critical

• Fatal

• Composite conditions

• Intelligent messaging

• Evaluation options

• Actions when triggered versus actions when cleared
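The alert settings listed above can be sketched in a few lines of Java. This is an illustration under assumptions, not any product's API: the class name `TieredAlert`, the two metrics and every threshold are hypothetical. The condition is composite (response time and CPU utilization must both be elevated), and an action fires only when the tier changes, which is one way to avoid alarm storms.

```java
import java.util.function.Consumer;

/** Hypothetical tiered, composite alert; names and thresholds are illustrative only. */
final class TieredAlert {

    enum Severity { NORMAL, WARNING, CRITICAL, FATAL }

    private final Consumer<String> onTriggered; // action when the alert enters or changes a non-normal tier
    private final Consumer<String> onCleared;   // action when the alert returns to NORMAL
    private Severity current = Severity.NORMAL;

    TieredAlert(Consumer<String> onTriggered, Consumer<String> onCleared) {
        this.onTriggered = onTriggered;
        this.onCleared = onCleared;
    }

    /** Composite condition: response time AND CPU utilization must both exceed a tier's limits. */
    static Severity classify(double responseMillis, double cpuUtilization) {
        if (responseMillis > 8000 && cpuUtilization > 0.95) return Severity.FATAL;
        if (responseMillis > 4000 && cpuUtilization > 0.90) return Severity.CRITICAL;
        if (responseMillis > 2000 && cpuUtilization > 0.80) return Severity.WARNING;
        return Severity.NORMAL;
    }

    /** Evaluates one sample and runs an action only if the tier changed. */
    Severity evaluate(double responseMillis, double cpuUtilization) {
        Severity next = classify(responseMillis, cpuUtilization);
        if (next != current) {
            String message = "Alert " + current + " -> " + next
                    + " (response " + responseMillis + " ms, CPU " + cpuUtilization + ")";
            if (next == Severity.NORMAL) {
                onCleared.accept(message);
            } else {
                onTriggered.accept(message);
            }
            current = next;
        }
        return next;
    }

    public static void main(String[] args) {
        TieredAlert alert = new TieredAlert(
                msg -> System.out.println("TRIGGERED: " + msg),
                msg -> System.out.println("CLEARED:   " + msg));
        alert.evaluate(500, 0.30);   // healthy, nothing fires
        alert.evaluate(2500, 0.85);  // both metrics elevated: warning fires once
        alert.evaluate(2600, 0.86);  // still warning, no repeat message
        alert.evaluate(2500, 0.50);  // CPU recovered: composite condition clears
    }
}
```

Keeping the triggered and cleared actions separate lets the first page an operator and the second simply close the ticket.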

Loaded System Behavior

[Figure: Response Time (R), Throughput (X) and Utilization (U) plotted against # Concurrent Users (Load), with regions labeled Light Load, Heavy Load, Buckle Zone and Resource Saturated]

Consider Service Demand

Best measure of resource utilization

Service Demand = Utilization / Throughput

Normalizes your utilization against throughput, provides clarity
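A short worked example of the formula, as a sketch only: the class name `ServiceDemand` and the sample figures are made up for illustration. Utilization is expressed as a fraction (0 to 1) and throughput in requests per second, so the result is seconds of resource time per request.

```java
/** Illustrative calculation of Service Demand = Utilization / Throughput. */
final class ServiceDemand {

    /**
     * @param utilization         busy fraction of the resource, 0.0 to 1.0
     * @param throughputPerSecond completed requests per second
     * @return seconds of resource time consumed per request
     */
    static double of(double utilization, double throughputPerSecond) {
        if (throughputPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return utilization / throughputPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical samples: CPU utilization and request rate at three moments.
        double baseline = of(0.60, 30.0); // about 0.02 s of CPU per request
        double busier   = of(0.90, 45.0); // higher utilization, same cost per request
        double degraded = of(0.90, 30.0); // same utilization, each request costs more

        System.out.println("baseline: " + baseline);
        System.out.println("busier:   " + busier);
        System.out.println("degraded: " + degraded);
    }
}
```

The second and third samples show the same 90% utilization, yet only the third indicates a problem: the cost per request rose by half, which raw utilization alone would not reveal.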

Breadth And Depth

J2EE problems come from many points across the system – not just the application or application server (Gartner Group)

You need to combine a broad, system-wide view and deep domain-specific data

Supporting Systems

Problem areas according to an IBM study

1. Database

2. Application Code

3. Application Server Configuration

4. Infrastructure: OS, Network

J2EE Complexity: Vertical and Horizontal

Solving Problems Quickly In Production

Solving Problems Effectively

Fast detection + clear diagnosis = quick resolution

Need to be able to transition from detecting to diagnosing problems quickly

Triaging is essential

What Makes Fast Detection Possible?

Targeted, composite alerts

Trending analysis

A clear process for dealing with issues
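Trending analysis can start as simply as fitting a straight line to historical samples and estimating when a limit will be reached. The sketch below assumes evenly spaced samples and a linear trend; `TrendForecast`, `fitLine` and `periodsUntil` are hypothetical names, and real workloads often need something more robust than a straight line.

```java
/** Illustrative linear trend over evenly spaced samples (x = 0, 1, 2, ...). */
final class TrendForecast {

    /** Ordinary least-squares fit; returns {slope, intercept}. */
    static double[] fitLine(double[] y) {
        int n = y.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += i;
            sumY += y[i];
            sumXY += i * y[i];
            sumXX += (double) i * i;
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] { slope, intercept };
    }

    /**
     * Sampling periods after the latest sample until the fitted line reaches the limit.
     * Returns positive infinity when the trend is flat or falling.
     */
    static double periodsUntil(double[] y, double limit) {
        double[] line = fitLine(y);
        if (line[0] <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        double crossing = (limit - line[1]) / line[0]; // x where the line equals the limit
        return Math.max(0, crossing - (y.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical weekly peak utilization of a connection pool.
        double[] weeklyPeak = { 0.50, 0.55, 0.60, 0.65 };
        System.out.println("weeks until 90%: " + periodsUntil(weeklyPeak, 0.90));
    }
}
```

A forecast like this is what turns stored historical data into an early warning: the alert is raised on the projected date, not on the day the resource saturates.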

How Do I Get Clear Diagnostic Data?

Make sure you can get deep data from:

– Application code

– Application server

– Database

– OS, Network and support systems

The data needs to be presented in a way that is tied to end-user requests

Guaranteeing Quick Resolution

Quick resolution is in the hands of the domain experts

They can work miracles with the right data at their fingertips

Getting the data smoothly from production to the developer is essential

Conclusions


Take a full lifecycle approach

Measure end-user response time

Track resource utilization / saturation

Ensure a smooth transition from detection to diagnosis and resolution

Quest APM Suite for J2EE

Product Overview

An integrated solution that empowers all the stakeholders in J2EE application performance management to accelerate the detection, diagnosis and resolution of business-threatening performance issues.

Quest’s Application Performance Management Suite for the J2EE platform

[Graphic label: Resolve]

Breadth And Depth

• Expert advice and intuitive interfaces make finding the root cause of problems simple in:

– The application server

– The database

– ERP, CRM, network or operating system

• Our developers’ real-world experience creates tools that are intuitive for your domain experts

• Broad coverage ensures problems are found before they impact your users

[Figure labels: Deep Source-code View; High-Level Systemic View]

[Graphic labels: Diagnose, Detect]

Measuring End-Users and the System

Customizable web dashboard may include all the following and more…

Availability

Business Status

System Usage

Alert Viewing

Business- and Silo-Specific Reporting

Choose from hundreds of out-of-the-box reports

Create fully customized reports

Automate their generation

Control how often they are created

Use them to measure historical performance and capacity planning

Problem Solving in QA and Production

Drill down to the Java components

– Enterprise Java Beans (EJBs)

– Servlets, JSPs

– HTTP Sessions

– Class and method response times

Find JDBC and database problems

Expose OS and network resource contention problems

Problem Solving in QA and Production

Real-time diagnostics

Context-sensitive expert help suggests solutions

End-to-end view includes in-depth data on:

– Web Servers

– Application Servers

– Databases

– Windows

– Unix

– ERP, CRM

Auto-Record for Deeper Diagnosis

Capture the transaction flow and easily find J2EE application bottlenecks and resource contention

The appearance of an anomaly in the J2EE system automatically starts deeper data recording for the domain-expert

Unique Call Tree

Shows the path of a request through an application

From HTTP to SQL

Cumulative response time shows critical path

Individual response time shows method-level bottleneck

Popup windows show relevant metrics
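The difference between cumulative and individual response time can be shown with a small call tree. This sketch is not the product's data model: `CallNode` and the request, servlet and bean names are hypothetical, timings are invented, and child calls are assumed to run one after another inside their parent.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative call-tree node; cumulative time includes all child calls. */
final class CallNode {
    final String name;
    final long cumulativeMillis;
    final List<CallNode> children = new ArrayList<>();

    CallNode(String name, long cumulativeMillis) {
        this.name = name;
        this.cumulativeMillis = cumulativeMillis;
    }

    CallNode add(CallNode child) {
        children.add(child);
        return this;
    }

    /** Individual time: cumulative time minus the time spent in direct children. */
    long individualMillis() {
        long childTotal = 0;
        for (CallNode c : children) {
            childTotal += c.cumulativeMillis;
        }
        return cumulativeMillis - childTotal;
    }

    /** Critical path: at each level follow the child with the largest cumulative time. */
    List<String> criticalPath() {
        List<String> path = new ArrayList<>();
        CallNode node = this;
        while (node != null) {
            path.add(node.name);
            CallNode slowest = null;
            for (CallNode c : node.children) {
                if (slowest == null || c.cumulativeMillis > slowest.cumulativeMillis) {
                    slowest = c;
                }
            }
            node = slowest;
        }
        return path;
    }

    /** Method-level bottleneck: the node in this subtree with the largest individual time. */
    CallNode bottleneck() {
        CallNode worst = this;
        for (CallNode c : children) {
            CallNode candidate = c.bottleneck();
            if (candidate.individualMillis() > worst.individualMillis()) {
                worst = candidate;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical request, from HTTP down to SQL.
        CallNode select = new CallNode("JDBC SELECT", 700);
        CallNode insert = new CallNode("JDBC INSERT", 50);
        CallNode ejb = new CallNode("OrderBean.place", 850).add(select).add(insert);
        CallNode servlet = new CallNode("CheckoutServlet.doPost", 880).add(ejb);
        CallNode root = new CallNode("HTTP POST /checkout", 900).add(servlet);

        System.out.println("critical path: " + root.criticalPath());
        System.out.println("bottleneck:    " + root.bottleneck().name);
    }
}
```

Every node on the critical path has a large cumulative time, but only the node with the large individual time is worth tuning.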

Correlated Metrics View

Quickly correlate metrics from the:

– Java/J2EE code

– Application server

– Database

– Operating system

– Web, ERP, CRM servers

– Network

Dynamically add metrics to the same graph to see them side-by-side

Line-of-Code Resolution

It’s Easy

– Bottlenecks are automatically highlighted in red

It’s Fast

– Find memory leaks quickly with the most detailed object allocation information

It’s Flexible

– Line-of-code differencing

– Reporting in Excel, HTML or text

Customer Results

“AutoDesk saves up to 80% of our time in investigation and diagnosing performance issues in our clustered WebLogic environment, which previously was done through manual log sifting and trial and error techniques.” – Senior Applications Manager, AutoDesk

“…Helped the team narrow down the bottlenecks within our Java code in days versus weeks.” – J2EE Architect, HSBC

“Within minutes, we were able to profile two different J2EE applications and get valuable results immediately.” – Manager of Technical Architecture, UICI

Using Quest Software products…

At Toyota, our tools found a problem that had plagued them for six months; with our deep data and expert advice, it was solved in less than two days.

Attend a PerformaSure Web Cast

Presented every Thursday, 1:00pm PST, 4:00pm EST

http://www.quest.com/events/webcast_index.asp

Thank you

http://www.quest.com