Save the Users! Monitoring and Diagnosing Problems in Production

Save the Users! Monitoring and Diagnosing Problems in Production, by Steven Haines, J2EE Architect and Evangelist, Quest Software

Transcript of Save the Users! Monitoring and Diagnosing Problems in Production

Page 1: Save the Users! Monitoring and Diagnosing Problems in Production

Save the Users! Monitoring and Diagnosing Problems in Production

Steven Haines

J2EE Architect and Evangelist

Quest Software

Page 2: Save the Users! Monitoring and Diagnosing Problems in Production

Agenda

• Speaker introduction

• Overview of the problem

• Overview of the strategy throughout the development lifecycle

• What needs to be monitored

• Solving problems quickly in production

• Questions & Answers

Page 3: Save the Users! Monitoring and Diagnosing Problems in Production

Speaker Introduction

J2EE Architect and Evangelist for Quest Software

Author of Java 2 Primer Plus and Java 2 From Scratch

Co-Author of Java Web Services Unleashed

Java Host and columnist on InformIT.com (Pearson Education)

Java Instructor at the University of California, Irvine (UCI) and previously Learning Tree University (LTU)

Recruited as a J2EE architect in the “real world”

Page 4: Save the Users! Monitoring and Diagnosing Problems in Production

Why Do I Need To Worry?

Page 5: Save the Users! Monitoring and Diagnosing Problems in Production

Market Trends: The Pain of J2EE

IT must answer to the business

Fewer than 20% of J2EE applications meet their performance SLAs in production. (IDC Research)

Page 6: Save the Users! Monitoring and Diagnosing Problems in Production

The Cost of Failure

Business to Consumer

– Site abandonment = lost revenue

Business to Business

– Damaged business relationships = lost opportunity

Internal

– Loss of organizational efficiency

– Slower time-to-market

Page 7: Save the Users! Monitoring and Diagnosing Problems in Production

The Strategy:

Performance Throughout the Development Lifecycle

Page 8: Save the Users! Monitoring and Diagnosing Problems in Production

Analysts & Industry Experts Agree…

Gartner recommends addressing J2EE performance throughout the development lifecycle

Page 9: Save the Users! Monitoring and Diagnosing Problems in Production

Full Lifecycle Analysis

Application-level code assurance

Certify applications before deployment

24x7 application performance management

Page 10: Save the Users! Monitoring and Diagnosing Problems in Production

In Development…

Put performance requirements in use cases

Unit test your components for performance

– Both for memory usage and response time

Test your application for performance at every integration milestone

Integration of un-tuned components is analogous to building a car with broken parts!
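A response-time check of the kind described above can be sketched in plain Java. This is a minimal illustration, not a prescribed method: the class name `ResponseTimeCheck`, the warm-up and run counts, and the 2-second budget are all assumptions, and the sketch covers response time only, not memory usage.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of a response-time check that could run inside a unit test. */
final class ResponseTimeCheck {
    private ResponseTimeCheck() {}

    /**
     * Runs the task a few times unmeasured (to let the JIT warm up), then
     * returns the slowest of the measured runs in milliseconds.
     */
    static long worstCaseMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long worstNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            worstNanos = Math.max(worstNanos, System.nanoTime() - start);
        }
        return worstNanos / 1_000_000;
    }

    /** Throws AssertionError when the slowest measured run exceeds the budget. */
    static void assertWithinBudget(Runnable task, long budgetMillis) {
        // 3 warm-up runs and 10 measured runs are arbitrary, assumed values.
        long worst = worstCaseMillis(task, 3, 10);
        if (worst > budgetMillis) {
            throw new AssertionError(
                "worst case " + worst + " ms exceeds budget of " + budgetMillis + " ms");
        }
    }

    public static void main(String[] args) {
        // Hypothetical component under test: sort 100,000 integers.
        Runnable component = () -> {
            int[] data = new Random(42).ints(100_000).toArray();
            Arrays.sort(data);
        };
        assertWithinBudget(component, 2_000); // assumed budget from a use case
        System.out.println("component met its response-time budget");
    }
}
```

Checking the worst measured run rather than the average is a deliberate choice here: a requirement stated as a budget is usually violated by the slow cases, which an average can hide.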

Page 11: Save the Users! Monitoring and Diagnosing Problems in Production

In QA Testing…

Test performance along with functionality

Try to create load scripts that mirror your users’ actions

Analyze the reality (as much as possible)

Failed performance is not acceptable

Page 12: Save the Users! Monitoring and Diagnosing Problems in Production

The Performance Stakeholders

What code is behind the symptom?

Is the application architecture a problem?

What component is at fault?

Who should fix the problem?

Which SQL statements need tuning?

Is the DB really the problem?

Is the application available?

Is the app server configured correctly?

Page 13: Save the Users! Monitoring and Diagnosing Problems in Production

In Live Production…

Measure end-user performance

Watch for resource contention problems

Make sure you get warnings early, but avoid alarm storms

Keep historical data for trending and capacity planning
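One simple way to use historical data for trending and capacity planning is to fit a straight line through recent samples and extrapolate. The sketch below assumes evenly spaced samples and a linear trend; the class name `TrendAnalysis` and the sample values are hypothetical, and real workloads may call for a more robust model.

```java
/** Sketch of linear trending over evenly spaced historical samples. */
final class TrendAnalysis {
    private TrendAnalysis() {}

    /** Least-squares slope of the values against their sample index. */
    static double slope(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double v : values) {
            meanY += v;
        }
        meanY /= n;

        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            numerator += dx * (values[i] - meanY);
            denominator += dx * dx;
        }
        return numerator / denominator;
    }

    /**
     * Estimates how many more sampling intervals pass before the metric
     * reaches the limit, extrapolating from the last sample at the fitted
     * slope. Returns 0 if the limit is already reached and -1 if the
     * metric is flat or falling.
     */
    static long samplesUntil(double[] values, double limit) {
        double last = values[values.length - 1];
        if (last >= limit) {
            return 0;
        }
        double s = slope(values);
        if (s <= 0.0) {
            return -1;
        }
        return (long) Math.ceil((limit - last) / s);
    }

    public static void main(String[] args) {
        // Hypothetical weekly peak heap usage in percent.
        double[] heapPercent = {62.0, 64.5, 66.0, 69.0, 71.5};
        System.out.println("growth per week: " + slope(heapPercent));
        System.out.println("weeks until 90%: " + samplesUntil(heapPercent, 90.0));
    }
}
```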

Page 14: Save the Users! Monitoring and Diagnosing Problems in Production

What Needs To Be Monitored?

Page 15: Save the Users! Monitoring and Diagnosing Problems in Production

End-User Measurement

This is the most important measurement

Passive versus Active

A combination gives the best balance

Either way you must be able to follow users through the system

Page 16: Save the Users! Monitoring and Diagnosing Problems in Production

Resource Contention

Helps you to avoid system failure

Makes capacity planning easier and more reliable

Combine with end-user data

Tiered, composite alerts make your life easier

Page 17: Save the Users! Monitoring and Diagnosing Problems in Production

Setting Up An Alert

• Tiered alerts

• Normal

• Warning

• Critical

• Fatal

• Composite conditions

• Intelligent messaging

• Evaluation options

• Actions when triggered versus actions when cleared
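The tiered and composite ideas in this list can be sketched together in a few lines of Java. The four tiers come from the slide; the class name `TieredAlert`, the threshold values, and the rule that a composite alert takes the lower of two severities are assumptions chosen for illustration.

```java
/** Sketch of tiered alerting with a two-metric composite condition. */
final class TieredAlert {
    /** The four tiers, ordered from least to most severe. */
    enum Severity { NORMAL, WARNING, CRITICAL, FATAL }

    private final double warningAt;
    private final double criticalAt;
    private final double fatalAt;

    TieredAlert(double warningAt, double criticalAt, double fatalAt) {
        if (!(warningAt < criticalAt && criticalAt < fatalAt)) {
            throw new IllegalArgumentException("thresholds must be strictly increasing");
        }
        this.warningAt = warningAt;
        this.criticalAt = criticalAt;
        this.fatalAt = fatalAt;
    }

    /** Maps a metric value to the highest tier whose threshold it has reached. */
    Severity classify(double value) {
        if (value >= fatalAt) {
            return Severity.FATAL;
        }
        if (value >= criticalAt) {
            return Severity.CRITICAL;
        }
        if (value >= warningAt) {
            return Severity.WARNING;
        }
        return Severity.NORMAL;
    }

    /**
     * Composite condition (assumed rule): report the lower of the two
     * severities, so the alert escalates only when both metrics agree.
     */
    static Severity composite(Severity first, Severity second) {
        return first.compareTo(second) <= 0 ? first : second;
    }

    public static void main(String[] args) {
        // Hypothetical thresholds: response time in seconds, CPU as a fraction.
        TieredAlert responseTime = new TieredAlert(2.0, 5.0, 10.0);
        TieredAlert cpu = new TieredAlert(0.70, 0.85, 0.95);

        Severity response = responseTime.classify(6.0); // CRITICAL
        Severity processor = cpu.classify(0.75);         // WARNING
        System.out.println(composite(response, processor)); // prints WARNING
    }
}
```

Requiring agreement between two metrics is one way to get early warnings while limiting alarm storms: a brief spike in a single metric does not escalate on its own.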

Page 18: Save the Users! Monitoring and Diagnosing Problems in Production

Loaded System Behavior

[Figure: response time (R), throughput (X), and utilization (U) plotted against the number of concurrent users (load). Labeled regions: Light Load, Heavy Load, Resource Saturated, Buckle Zone.]

Page 19: Save the Users! Monitoring and Diagnosing Problems in Production

Consider Service Demand

Best measure of resource utilization

Service Demand = Utilization / Throughput

Normalizes utilization against throughput, which makes readings taken at different load levels directly comparable
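As a worked example of the formula above, the sketch below computes service demand for two hypothetical samples. The class name `ServiceDemand`, the units (utilization as a fraction, throughput in requests per second), and the sample values are assumptions.

```java
/** Sketch: service demand = utilization / throughput. */
final class ServiceDemand {
    private ServiceDemand() {}

    /**
     * @param utilization fraction of time the resource was busy, from 0.0 to 1.0
     * @param throughput  completed requests per second, greater than zero
     * @return seconds of resource time consumed per request
     */
    static double secondsPerRequest(double utilization, double throughput) {
        if (utilization < 0.0 || utilization > 1.0) {
            throw new IllegalArgumentException("utilization must be between 0.0 and 1.0");
        }
        if (throughput <= 0.0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return utilization / throughput;
    }

    public static void main(String[] args) {
        // Hypothetical samples: CPU 30% busy at 50 req/s, then 60% busy at 100 req/s.
        double light = secondsPerRequest(0.30, 50.0);
        double heavy = secondsPerRequest(0.60, 100.0);
        System.out.printf("light=%.4f s/request, heavy=%.4f s/request%n", light, heavy);
    }
}
```

In this example utilization doubles, but so does throughput, so each request still costs about 6 ms of CPU time. Raw utilization alone would suggest the system got worse; service demand shows the per-request cost is unchanged.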

Page 20: Save the Users! Monitoring and Diagnosing Problems in Production

Breadth And Depth

J2EE problems come from many points across the system – not just the application or application server (Gartner Group)

You need to combine a broad, system-wide view and deep domain-specific data

Page 21: Save the Users! Monitoring and Diagnosing Problems in Production

Supporting Systems

Problem areas according to an IBM study

1. Database

2. Application Code

3. Application Server Configuration

4. Infrastructure: OS, Network

Page 22: Save the Users! Monitoring and Diagnosing Problems in Production

J2EE Complexity: Vertical and Horizontal

Page 23: Save the Users! Monitoring and Diagnosing Problems in Production

Solving Problems Quickly In Production

Page 24: Save the Users! Monitoring and Diagnosing Problems in Production

Solving Problems Effectively

Fast detection + clear diagnosis = quick resolution

Need to be able to transition from detecting to diagnosing problems quickly

Triaging is essential

Page 25: Save the Users! Monitoring and Diagnosing Problems in Production

What Makes Fast Detection Possible?

Targeted, composite alerts

Trending analysis

A clear process for dealing with issues

Page 26: Save the Users! Monitoring and Diagnosing Problems in Production

How Do I Get Clear Diagnostic Data?

Make sure you can get deep data from:

– Application code

– Application server

– Database

– OS, Network and support systems

The data needs to be presented in a way that is tied to end-user requests

Page 27: Save the Users! Monitoring and Diagnosing Problems in Production

Guaranteeing Quick Resolution

Quick resolution is in the hands of the domain experts

They can work miracles with the right data at their fingertips

Getting the data smoothly from production to the developer is essential

Page 28: Save the Users! Monitoring and Diagnosing Problems in Production

Conclusions

Page 29: Save the Users! Monitoring and Diagnosing Problems in Production

Conclusions

Take a full lifecycle approach

Measure end-user response time

Track resource utilization / saturation

Ensure a smooth transition from detection to diagnosis and resolution

Page 30: Save the Users! Monitoring and Diagnosing Problems in Production

Quest APM Suite for J2EE

Product Overview

Page 31: Save the Users! Monitoring and Diagnosing Problems in Production

An integrated solution that empowers all the stakeholders in J2EE application performance management to accelerate the detection, diagnosis and resolution of business-threatening performance issues.

Quest’s Application Performance Management Suite for the J2EE platform

Page 32: Save the Users! Monitoring and Diagnosing Problems in Production

Breadth And Depth

[Diagram: Detect, Diagnose, Resolve, spanning a high-level systemic view and a deep source-code view]

• Broad coverage ensures problems are found before they impact your users

• Expert advice and intuitive interfaces make finding the root cause of problems simple in:

– The application server

– The database

– ERP, CRM, network or operating system

• Our developers’ real-world experience creates tools which are intuitive for your domain experts

Page 33: Save the Users! Monitoring and Diagnosing Problems in Production

Measuring End-Users and the System

Customizable web dashboard may include all the following and more…

Availability

Business Status

System Usage

Alert Viewing

Page 34: Save the Users! Monitoring and Diagnosing Problems in Production

Business- and Silo-Specific Reporting

Choose from hundreds of out-of-the-box reports

Create fully customized reports

Automate their generation

Control how often they are created

Use them to measure historical performance and to plan capacity

Page 35: Save the Users! Monitoring and Diagnosing Problems in Production

Problem Solving in QA and Production

Drill down to the Java components

– Enterprise Java Beans (EJBs)

– Servlets, JSPs

– HTTP Sessions

– Class and method response times

Find JDBC and database problems

Expose OS and network resource contention problems

Page 36: Save the Users! Monitoring and Diagnosing Problems in Production

Problem Solving in QA and Production

Real-time diagnostics

Context-sensitive expert help suggests solutions

End-to-end view includes in-depth data on:

– Web Servers

– Application Servers

– Databases

– Windows

– Unix

– ERP, CRM

Page 37: Save the Users! Monitoring and Diagnosing Problems in Production

Auto-Record for Deeper Diagnosis

Capture the transaction flow and easily find J2EE application bottlenecks and resource contention

The appearance of an anomaly in the J2EE system automatically starts deeper data recording for the domain-expert

Page 38: Save the Users! Monitoring and Diagnosing Problems in Production

Unique Call Tree

Shows the path of a request through an application

From HTTP to SQL

Cumulative response time shows critical path

Individual response time shows method-level bottleneck

Popup windows show relevant metrics
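The difference between cumulative and individual response time can be illustrated with a small call-tree sketch. This is not the product's implementation: the class `CallNode`, the node names, and the timings are hypothetical, and the sketch assumes each call runs synchronously, so a node's cumulative time includes all of its children.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a call-tree node carrying cumulative time in milliseconds. */
final class CallNode {
    final String name;
    final long cumulativeMillis; // entry to exit, including all callees
    final List<CallNode> children = new ArrayList<>();

    CallNode(String name, long cumulativeMillis) {
        this.name = name;
        this.cumulativeMillis = cumulativeMillis;
    }

    CallNode add(CallNode child) {
        children.add(child);
        return this;
    }

    /** Individual time: cumulative time minus the time spent in direct callees. */
    long individualMillis() {
        long inChildren = 0;
        for (CallNode child : children) {
            inChildren += child.cumulativeMillis;
        }
        return cumulativeMillis - inChildren;
    }

    /** Critical path: at each level, follow the child with the largest cumulative time. */
    List<String> criticalPath() {
        List<String> path = new ArrayList<>();
        CallNode node = this;
        while (node != null) {
            path.add(node.name);
            CallNode next = null;
            for (CallNode child : node.children) {
                if (next == null || child.cumulativeMillis > next.cumulativeMillis) {
                    next = child;
                }
            }
            node = next;
        }
        return path;
    }

    /** Method-level bottleneck: the node in this subtree with the largest individual time. */
    CallNode bottleneck() {
        CallNode worst = this;
        for (CallNode child : children) {
            CallNode candidate = child.bottleneck();
            if (candidate.individualMillis() > worst.individualMillis()) {
                worst = candidate;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical request: servlet -> EJB -> (JDBC query, validation), then JSP render.
        CallNode ejb = new CallNode("OrderBean.placeOrder", 420)
            .add(new CallNode("JDBC: INSERT INTO orders", 300))
            .add(new CallNode("OrderValidator.validate", 40));
        CallNode root = new CallNode("OrderServlet.doPost", 500)
            .add(ejb)
            .add(new CallNode("confirm.jsp", 50));

        System.out.println("critical path: " + root.criticalPath());
        System.out.println("bottleneck: " + root.bottleneck().name);
    }
}
```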

Page 39: Save the Users! Monitoring and Diagnosing Problems in Production

Correlated Metrics View

Quickly correlate metrics from the:

– Java/J2EE code

– Application server

– Database

– Operating system

– Web, ERP, CRM servers

– Network

Dynamically add metrics to the same graph to see them side-by-side

Page 40: Save the Users! Monitoring and Diagnosing Problems in Production

Line-of-Code Resolution

It’s Easy

– Bottlenecks are automatically highlighted in red

It’s Fast

– Find memory leaks quickly with the most detailed object allocation information

It’s Flexible

– Line-of-code differencing

– Reporting in Excel, HTML or text

Page 41: Save the Users! Monitoring and Diagnosing Problems in Production

Customer Results

“AutoDesk saves up to 80% of our time in investigation and diagnosing performance issues in our clustered WebLogic environment, which previously was done through manual log sifting and trial and error techniques.” – Senior Applications Manager, AutoDesk

“…Helped the team narrow down the bottlenecks within our Java code in days versus weeks.” – J2EE Architect, HSBC

“Within minutes, we were able to profile two different J2EE applications and get valuable results immediately.” – Manager of Technical Architecture, UICI

Using Quest Software products…

At Toyota, our tools found a problem that had plagued the team for six months; with our deep data and expert advice, it was solved in less than two days.

Page 42: Save the Users! Monitoring and Diagnosing Problems in Production

Attend a PerformaSure Web Cast

Presented every Thursday at 1:00pm PST (4:00pm EST)

http://www.quest.com/events/webcast_index.asp

Page 43: Save the Users! Monitoring and Diagnosing Problems in Production

Thank you

http://www.quest.com