Save the Users! Monitoring and Diagnosing Problems in Production
Steven Haines
J2EE Architect and Evangelist
Quest Software
Agenda
• Speaker introduction
• Overview of the problem
• Overview of the strategy throughout the development lifecycle
• What needs to be monitored
• Solving problems quickly in production
• Questions & Answers
Speaker Introduction
• J2EE Architect and Evangelist for Quest Software
• Author of Java 2 Primer Plus and Java 2 From Scratch
• Co-Author of Java Web Services Unleashed
• Java Host and columnist on InformIT.com (Pearson Education)
• Java Instructor at the University of California, Irvine (UCI) and previously Learning Tree University (LTU)
• Recruited as a J2EE architect in the “real world”
Why Do I Need To Worry?
Market Trends: The Pain of J2EE
IT must answer to the business
Fewer than 20% of J2EE applications meet their performance SLAs in production. (IDC Research)
The Cost of Failure
Business to Consumer
– Site abandonment = lost revenue
Business to Business
– Damaged business relationships = lost opportunity
Internal
– Loss of organizational efficiency
– Slower time-to-market
The Strategy:
Performance Throughout the Development Lifecycle
Analysts & Industry-Experts Agree…
Gartner recommends you approach J2EE performance throughout the lifecycle
Full Lifecycle Analysis
Application-level code assurance
Certify applications before deployment
24x7 application performance management
In Development…
Put performance requirements in Use Cases
Unit test your components for performance
– Both for memory usage and response time
Test your application for performance at every integration milestone
Integration of un-tuned components is analogous to building a car with broken parts!
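As a rough illustration of unit testing a component for both response time and memory usage, the sketch below times repeated calls and records a coarse heap delta. The class `PerfCheck`, its methods, and the budget values are hypothetical names chosen for this example; the heap figure from `Runtime` is noisy and only suitable for catching gross regressions.

```java
// Hypothetical helper for a performance-aware unit test (illustrative only).
final class PerfCheck {

    /** Average time per call and a rough heap delta for one measured run. */
    static final class Result {
        final long avgNanosPerCall;
        final long heapDeltaBytes;

        Result(long avgNanosPerCall, long heapDeltaBytes) {
            this.avgNanosPerCall = avgNanosPerCall;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    /** Runs the component repeatedly and records time and approximate heap growth. */
    static Result measure(Runnable component, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        component.run(); // warm-up call so one-time class loading is not measured
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            component.run();
        }
        long elapsed = System.nanoTime() - start;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        return new Result(elapsed / iterations, heapAfter - heapBefore);
    }

    /** True when the measured run stays inside both budgets from the use case. */
    static boolean withinBudget(Result r, long maxNanosPerCall, long maxHeapDeltaBytes) {
        return r.avgNanosPerCall <= maxNanosPerCall
            && r.heapDeltaBytes <= maxHeapDeltaBytes;
    }
}
```

A unit test would call `PerfCheck.measure(...)` on the component and fail when `withinBudget` returns false, using budgets taken from the performance requirements in the use case.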
In QA Testing…
Test performance along with functionality
Try to create load scripts that mirror your users’ actions
Analyze the reality (as much as possible)
Failed performance is not acceptable
The Performance Stakeholders
What code is behind the symptom?
Is the application architecture a problem?
What component is at fault?
Who should fix the problem?
Which SQL statements need tuning?
Is the DB really the problem?
Is the application available?
Is the app server configured correctly?
In Live Production…
Measure end-user performance
Watch for resource contention problems
Make sure you get warnings early, but avoid alarm storms
Keep historical data for trending and capacity planning
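One simple way historical data can feed capacity planning is a straight-line fit over past utilization samples. The sketch below is an assumption-laden illustration, not a method from the talk: it assumes evenly spaced samples, fits a least-squares line, and estimates the sample index at which a chosen threshold would be reached. `UtilizationTrend` and its members are hypothetical names.

```java
// Hypothetical least-squares trend over evenly spaced utilization samples.
final class UtilizationTrend {
    final double slope;      // change in utilization per sampling interval
    final double intercept;  // fitted utilization at sample index 0

    UtilizationTrend(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += i;
            sumY += samples[i];
            sumXY += i * samples[i];
            sumXX += (double) i * i;
        }
        // Standard least-squares formulas for a line y = intercept + slope * x.
        slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        intercept = (sumY - slope * sumX) / n;
    }

    /**
     * Sample index at which the fitted line reaches the threshold, or
     * positive infinity when the trend is flat or falling.
     */
    double indexAt(double threshold) {
        if (slope <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (threshold - intercept) / slope;
    }
}
```

Real workloads are rarely linear, so a projection like this is best treated as an early hint that prompts a closer look rather than as a forecast.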
What Needs To Be Monitored?
End-User Measurement
This is the most important
Passive versus Active
A combination gives the best balance
Either way you must be able to follow users through the system
Resource Contention
Helps you to avoid system failure
Makes capacity planning easier and more reliable
Combine with end-user data
Tiered, composite alerts make your life easier
Setting Up An Alert
• Tiered alerts
• Normal
• Warning
• Critical
• Fatal
• Composite conditions
• Intelligent messaging
• Evaluation options
• Actions when triggered versus actions when cleared
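The alert features listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the configuration model of any product: `TieredAlert`, `Severity`, the thresholds, and the message format are all hypothetical. The composite condition here requires both slow response time and high utilization, and a message is recorded only when the tier changes, which is one way to avoid alarm storms.

```java
import java.util.ArrayList;
import java.util.List;

// Tiers taken from the slide; everything else is an illustrative assumption.
enum Severity { NORMAL, WARNING, CRITICAL, FATAL }

final class TieredAlert {
    private final double warnMs;
    private final double criticalMs;
    private final double fatalMs;
    private final double minUtilization;
    private Severity current = Severity.NORMAL;
    private final List<String> messages = new ArrayList<>();

    TieredAlert(double warnMs, double criticalMs, double fatalMs, double minUtilization) {
        this.warnMs = warnMs;
        this.criticalMs = criticalMs;
        this.fatalMs = fatalMs;
        this.minUtilization = minUtilization;
    }

    /** Composite condition: response time is tiered only while utilization is high. */
    Severity classify(double responseMs, double utilization) {
        if (utilization < minUtilization) return Severity.NORMAL;
        if (responseMs >= fatalMs) return Severity.FATAL;
        if (responseMs >= criticalMs) return Severity.CRITICAL;
        if (responseMs >= warnMs) return Severity.WARNING;
        return Severity.NORMAL;
    }

    /** Evaluates one sample; records a message only when the tier changes. */
    Severity evaluate(double responseMs, double utilization) {
        Severity next = classify(responseMs, utilization);
        if (next != current) {
            if (next == Severity.NORMAL) {
                // Action when cleared
                messages.add("CLEARED: response time " + responseMs + " ms");
            } else {
                // Action when triggered or escalated
                messages.add(next + ": response time " + responseMs
                    + " ms at utilization " + utilization);
            }
            current = next;
        }
        return next;
    }

    List<String> messages() {
        return messages;
    }
}
```

For example, `new TieredAlert(500, 2000, 5000, 0.7)` would stay quiet for repeated 800 ms samples after the first warning, then record a single cleared message once the condition no longer holds.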
Loaded System Behavior
[Figure: Response Time (R), Throughput (X) and Utilization (U) plotted against # Concurrent Users (Load). Labels: Light Load, Heavy Load, Buckle Zone, Resource Saturated.]
Consider Service Demand
Best measure of resource utilization
Service Demand = Utilization / Throughput
Normalizes your utilization against throughput, provides clarity
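A small worked example of the formula above, as a sketch. The class name `ServiceDemand` and the sample figures are assumptions made for illustration; utilization is taken as a fraction between 0 and 1 and throughput as requests per second, so the result is seconds of resource time per request.

```java
// Hypothetical helper applying Service Demand = Utilization / Throughput.
final class ServiceDemand {

    /**
     * @param utilization      fraction of time the resource is busy (0.0 to 1.0)
     * @param throughputPerSec completed requests per second
     * @return seconds of resource time consumed per request
     */
    static double of(double utilization, double throughputPerSec) {
        if (utilization < 0.0 || utilization > 1.0) {
            throw new IllegalArgumentException("utilization must be between 0 and 1");
        }
        if (throughputPerSec <= 0.0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return utilization / throughputPerSec;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: utilization rises with load, yet the
        // cost per request is about the same in both measurements.
        System.out.println(ServiceDemand.of(0.60, 30.0));
        System.out.println(ServiceDemand.of(0.90, 45.0));
    }
}
```

In this made-up case the CPU goes from 60% to 90% busy, but each request still costs about 0.02 seconds, which suggests the extra utilization comes from more traffic rather than from the application becoming less efficient.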
Breadth And Depth
J2EE problems come from many points across the system – not just the application or application server (Gartner Group)
You need to combine a broad, system-wide view and deep domain-specific data
Supporting Systems
Problem areas according to an IBM study
1. Database
2. Application Code
3. Application Server Configuration
4. Infrastructure: OS, Network
J2EE Complexity: Vertical and Horizontal
Solving Problems Quickly In Production
Solving Problems Effectively
Fast detection + clear diagnosis = quick resolution
Need to be able to transition from detecting to diagnosing problems quickly
Triaging is essential
What Makes Fast Detection Possible?
Targeted, composite alerts
Trending analysis
A clear process for dealing with issues
How Do I Get Clear Diagnostic Data?
Make sure you can get deep data from:
– Application code
– Application server
– Database
– OS, Network and support systems
The data needs to be presented in a way that is tied to end-user requests
Guaranteeing Quick Resolution
Quick resolution is in the hands of the domain experts
They can work miracles with the right data at their fingertips
Getting the data smoothly from production to the developer is essential
Conclusions
Take a full lifecycle approach
Measure end-user response time
Track resource utilization / saturation
Ensure a smooth transition from detection to diagnosis and resolution
Quest APM Suite for J2EE
Product Overview
An integrated solution that empowers all the stakeholders in J2EE application performance management to accelerate the detection, diagnosis and resolution of business-threatening performance issues.
Quest’s Application Performance Management Suite for the J2EE platform
Breadth And Depth
• Expert advice and intuitive interfaces make finding the root cause of problems simple in:
– The application server
– The database
– ERP, CRM, network or operating system
[Figure labels: Deep Source-code View; High-Level Systemic View]
• Our developers’ real-world experience creates tools that are intuitive for your domain experts
• Broad coverage ensures problems are found before they impact your users
Measuring End-Users and the System
Customizable web dashboard may include all the following and more…
Availability
Business Status
System Usage
Alert Viewing
Business- and Silo-Specific Reporting
Choose from hundreds of out-of-the-box reports
Create fully customized reports
Automate their generation
Control how often they are created
Use them to measure historical performance and for capacity planning
Problem Solving in QA and Production
Drill down to the Java components
– Enterprise JavaBeans (EJBs)
– Servlets, JSPs
– HTTP Sessions
– Class and method response times
Find JDBC and database problems
Expose OS and network resource contention problems
Problem Solving in QA and Production
Real-time diagnostics
Context-sensitive expert help suggests solutions
End-to-end view includes in-depth data on:
– Web Servers
– Application Servers
– Databases
– Windows
– Unix
– ERP, CRM
Auto-Record for Deeper Diagnosis
Capture the transaction flow and easily find J2EE application bottlenecks and resource contention
The appearance of an anomaly in the J2EE system automatically starts deeper data recording for the domain-expert
Unique Call Tree
Shows the path of a request through an application
From HTTP to SQL
Cumulative response time shows critical path
Individual response time shows method-level bottleneck
Popup windows show relevant metrics
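The difference between cumulative and individual response time in a call tree can be sketched as below. This is an illustrative model, not the product's implementation: `CallNode` and its methods are hypothetical, and it assumes synchronous, sequential calls so that a node's individual time is its cumulative time minus the cumulative time of its direct children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical call-tree node; times are cumulative (node plus everything it calls).
final class CallNode {
    final String name;
    final long cumulativeMs;
    final List<CallNode> children = new ArrayList<>();

    CallNode(String name, long cumulativeMs) {
        this.name = name;
        this.cumulativeMs = cumulativeMs;
    }

    CallNode add(CallNode child) {
        children.add(child);
        return this;
    }

    /** Time spent in this node alone, excluding the calls it makes. */
    long individualMs() {
        long childTotal = 0;
        for (CallNode c : children) {
            childTotal += c.cumulativeMs;
        }
        return cumulativeMs - childTotal;
    }

    /** Critical path: at each level, follow the child with the largest cumulative time. */
    List<String> criticalPath() {
        List<String> path = new ArrayList<>();
        CallNode node = this;
        while (node != null) {
            path.add(node.name);
            CallNode slowest = null;
            for (CallNode c : node.children) {
                if (slowest == null || c.cumulativeMs > slowest.cumulativeMs) {
                    slowest = c;
                }
            }
            node = slowest;
        }
        return path;
    }

    /** Method-level bottleneck: the node in this subtree with the largest individual time. */
    CallNode bottleneck() {
        CallNode worst = this;
        for (CallNode c : children) {
            CallNode candidate = c.bottleneck();
            if (candidate.individualMs() > worst.individualMs()) {
                worst = candidate;
            }
        }
        return worst;
    }
}
```

With a made-up request where a servlet (100 ms cumulative) calls an EJB (80 ms) that runs a JDBC query (60 ms), the critical path runs servlet to EJB to JDBC, and the JDBC call is the bottleneck because it has the largest individual time.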
Correlated Metrics View
Quickly correlate metrics from the:
– Java/J2EE code
– Application server
– Database
– Operating system
– Web, ERP, CRM servers
– Network
Dynamically add metrics to the same graph to see them side-by-side
Line-of-Code Resolution
It’s Easy
– Bottlenecks are automatically highlighted in red
It’s Fast
– Find memory leaks quickly with the most detailed object allocation information
It’s Flexible
– Line-of-code differencing
– Reporting in Excel, HTML or text
Customer Results
“AutoDesk saves up to 80% of our time in investigation and diagnosing performance issues in our clustered WebLogic environment, which previously was done through manual log sifting and trial and error techniques.” – Senior Applications Manager, AutoDesk
“…Helped the team narrow down the bottlenecks within our Java code in days versus weeks.” – J2EE Architect, HSBC
“Within minutes, we were able to profile two different J2EE applications and get valuable results immediately.” – Manager of Technical Architecture, UICI
Using Quest Software products…
At Toyota, our tools were able to find a problem that had plagued them for six months. With our deep data and expert advice, it was solved in less than two days.
Attend a PerformaSure Web Cast
Presented every Thursday
1:00pm PST, 4:00pm EST
http://www.quest.com/events/webcast_index.asp
Thank you
http://www.quest.com