Southwest Airlines: using HP Diagnostics to drive value
1 ©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Southwest Airlines: using HP Diagnostics to drive value
Subuhi Ali
Performance Tech Lead, Southwest Airlines
2
Topics
Company Introduction
What is HP Diagnostic
Installing HP Diagnostics
Configuring the JVM / App Server
Why Southwest Airlines chooses to use it?
Issues – HP Diagnostic – Resolution Time
Problem 1 – 100% CPU
Problem 2 – Application Response Time
Problem 3 – High Memory utilization
Advantages of using Diagnostic
Questions
3
Southwest Airlines - Intro
In 1971, Southwest began as a small Texas airline that served three cities with three airplanes.
As of 2008, we operate more than 3,400 flights a day to 64 destinations in the U.S.
Over the course of 37 years, Southwest has grown to become the largest U.S. carrier in terms of domestic passengers flown.
In 2009, Southwest moved 182 million pounds of cargo.
The shortest daily Southwest flight is between Ft. Myers (RSW) and Orlando (MCO) (133 miles). The longest daily Southwest flight is between Providence (PVD) and Las Vegas (LAS) (2,363 miles).
Southwest has 1,164 married couples. In other words, 2,328 Southwest Employees have spouses who also work for the Company.
Southwest was the first airline to establish a home page on the Internet. Initially, five Employees comprised Southwest’s web site development team, and the site took about nine months to create.
4
Southwest Airlines - Intro
More than 35,000 total Employees throughout the Southwest system and 3,200 flights daily flying to 68 cities.
5
Southwest Airlines - Intro
6
What Is HP Diagnostic
The HP Diagnostics tool is designed to help you improve the performance of your Java, .NET, and other enterprise applications throughout the application lifecycle. It enables you to:
Identify where time is spent in an application layer
Drill down from a business transaction that is taking a long time to the problematic component
Discover "rogue" code/components in real time as they are invoked
Identify memory leaks
Tune garbage collection
For developers, it means that tracing code doesn't have to be added and removed.
Resolution time is short
It can be enabled in a pre-production or production environment to quickly find resolutions
7
Installing HP Diagnostics
The installation order is:
Diagnostic Server
Probe
LoadRunner Integration
[Architecture diagram; labels: Diagnostic Servers, Java Applications, Metrics, Diagnostic UI, Interface with LoadRunner, Controller, NAS Filer, Probe]
8
Installing HP Diagnostics
Points to remember
Use a common ID across your Linux and Windows platforms
Install the probe once on the NAS Mount and share it
Use unique names for each application JVM Node
With LoadRunner integration, if the registered components change, start with a fresh Scenario
9
Configuring the JVM / App Server
Important: Ensure that the names of the probes defined for a machine are unique, and that each probe is assigned to only a single application.
This can be done in many ways and is specific to the type of server you are instrumenting (Tomcat vs. a generic Java app).
In general, look for the startup scripts that set the "JAVA_OPTIONS" parameter.
At a minimum, you must set the "-javaagent" parameter; it is a best practice to also set "-Dprobe.id" and "-Dprobe.group".
JAVA_OPTIONS: "-javaagent:<probe_install_dir>\lib\probeagent.jar -Dprobe.id=**Unique_Name** -Dprobe.group=**Group_Name**"
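As a rough illustration of the settings above, the following Python sketch assembles the options string and checks that probe names are unique. Only the three flags (-javaagent, -Dprobe.id, -Dprobe.group) and the lib\probeagent.jar path come from the slide; the helper functions, the install directory, and the probe and group names are hypothetical.

```python
from collections import Counter


def build_java_options(probe_install_dir: str, probe_id: str, probe_group: str) -> str:
    """Return the JVM options string that attaches the probe agent."""
    # Windows-style path separators, matching the example on the slide.
    agent_jar = f"{probe_install_dir}\\lib\\probeagent.jar"
    return f"-javaagent:{agent_jar} -Dprobe.id={probe_id} -Dprobe.group={probe_group}"


def duplicate_probe_ids(probe_ids: list[str]) -> list[str]:
    """Return the probe ids that appear more than once (each must be unique)."""
    return sorted(name for name, count in Counter(probe_ids).items() if count > 1)


# Hypothetical install directory and probe names, for illustration only.
options = build_java_options(r"N:\diag_probe", "booking_jvm_node1", "booking")
print(options)
print(duplicate_probe_ids(["booking_jvm_node1", "booking_jvm_node2", "booking_jvm_node1"]))
```

Running a check like duplicate_probe_ids over the planned probe names before editing the startup scripts is one way to catch a reused name early.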
10
Why Southwest Airlines Chooses to Use It?
The number one reason is to get more visibility, all the way down to the code level
Average response times are split across the different application layers
Ease of installation and setup
Very small footprint; you can actually run a load test with full instrumentation turned on
We have a lot of WebSphere and home-grown Java applications
It adds a nominal 2 hours to 1 day of setup time, provided you already have the load test set up
11
Issues – HP Diagnostic – Resolution Time
Problem 1: Mule Dispatcher thread hitting 100% CPU. Diagnostic was enabled. Took 2 minutes to identify the issue.
Problem 2: Response times very high on Complete Purchase transactions. Diagnostic was enabled. Took 2 days to identify the issue.
Problem 3: Memory leak in Production. Diagnostic was enabled. Took 8 hours to identify the issue.
12
Problem 1 – 100% CPU
Brief Problem Description:
In one of the application layers, the CPU for a single thread hit 100% as soon as the environment was started.
The thread stack trace page gave us the information required to rectify it.
A problem that the developer was looking at for 2 days was resolved in 2 minutes
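To show why a per-thread stack trace pinpoints a spinning thread so quickly, here is a small Python analogy. It is not the Diagnostics thread view and not the Mule dispatcher code; busy_dispatch_loop and the thread name are hypothetical stand-ins, and the stack capture uses the standard sys._current_frames() call.

```python
import sys
import threading
import traceback


def busy_dispatch_loop(entered: threading.Event, stop: threading.Event) -> None:
    """Hypothetical stand-in for a dispatcher thread spinning on the CPU."""
    entered.set()  # signal that the loop has been reached
    while not stop.is_set():
        pass  # busy-wait: burns CPU without doing useful work


def dump_thread_stacks() -> dict[str, str]:
    """Map each live thread's name to its current formatted stack trace."""
    names = {t.ident: t.name for t in threading.enumerate()}
    return {
        names.get(ident, str(ident)): "".join(traceback.format_stack(frame))
        for ident, frame in sys._current_frames().items()
    }


entered, stop = threading.Event(), threading.Event()
worker = threading.Thread(
    target=busy_dispatch_loop, args=(entered, stop), name="dispatcher-1"
)
worker.start()
entered.wait()                 # make sure the worker is inside the loop
stacks = dump_thread_stacks()  # snapshot every thread's stack
stop.set()
worker.join()
print(stacks["dispatcher-1"])
```

The printed trace names the function the busy thread is stuck in, which is the kind of information the stack trace page surfaced for the 100% CPU thread.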
13
Problem 2 – Application Response Time
Brief Problem Description:
The application's response-time SLAs were not being met.
We had a goal of 180 TPS and were not reaching even 10 TPS.
Using Diagnostics, we found that most of the time was being spent in one of the application layers.
14
Problem 2 – Application Response Time
Using Diagnostics, we could see not only the total transaction times in LoadRunner but also, for the same transaction, how much time was spent on the server side alone
Example – Complete Purchase
Average – Total = 6.838
Average – Server = 5.34
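The two averages can be turned into a simple split between server-side time and everything else. The sketch below is plain arithmetic on the slide's figures; the function name is hypothetical, and the slide does not state the units.

```python
def split_response_time(total: float, server: float) -> tuple[float, float]:
    """Return (time spent outside the server, server share of the total)."""
    if not 0 <= server <= total or total == 0:
        raise ValueError("expected 0 <= server <= total and total > 0")
    return total - server, server / total


# Averages for Complete Purchase as reported on the slide (units not stated there).
outside_server, server_share = split_response_time(total=6.838, server=5.34)
print(f"outside server: {outside_server:.3f}, server share: {server_share:.1%}")
```

On these figures, roughly 78% of the average transaction time is spent on the server side, which is why the investigation moved into the application layers.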
15
Problem 2 – Application Response Time
The tool gives you the options to:
Breakdown the Layer to – Classes
Breakdown the transaction to Server Request
Show VM
Breakdown the Layer to – Server Request
Show Chain of Calls
16
Problem 2 – Application Response Time
Breakdown the Layer to – Classes
This option drills down to the Class level – Byte Protocol is the issue
17
Problem 2 – Application Response Time
Breakdown the Layer to – Methods
java.io.OutputStream is the issue
18
Problem 2 – Application Response Time
Drill down to the Transaction Chain of calls, which gives you:
Method Name
Class Name
Package Name
Layer Name
19
Problem 2 – Application Response Time
[Diagram labels: User (multiple), Connection Pool, Apache, Thread Pool Client Framework, Tomcat, Connection Pool, Service]
Locking on a mutual exclusion (mutex) during synchronization between the Tomcat and Mule layer dispatchers
20
Problem 2 – Application Response Time
Changed code to remove the lock
[Diagram labels: User (multiple), Connection Pool, Apache, Thread Pool Client Framework, Tomcat, Connection Pool, Service]
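The effect of a shared lock on throughput, and of removing it, can be sketched with a toy Python model. This is not the application's code: the worker count, the sleep that stands in for the downstream call, and the function name are all assumptions chosen for illustration.

```python
import threading
import time
from contextlib import nullcontext


def run_requests(workers: int, work_seconds: float, shared_lock) -> float:
    """Run `workers` threads that each do `work_seconds` of simulated I/O
    while holding `shared_lock`; return the elapsed wall-clock time."""

    def handle_request() -> None:
        with shared_lock:
            time.sleep(work_seconds)  # stands in for the downstream call

    threads = [threading.Thread(target=handle_request) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


serialized = run_requests(8, 0.05, threading.Lock())  # every request waits its turn
concurrent = run_requests(8, 0.05, nullcontext())     # lock removed
print(f"with shared lock: {serialized:.2f}s, without: {concurrent:.2f}s")
```

With the lock in place the requests run one after another, so elapsed time grows with the number of users; without it they overlap, which is the behaviour the code change was aiming for.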
21
Problem 2 – Application Response Time
A problem that the development team and engineers were looking at for 2 weeks with no answer was resolved in two days
22
Problem 3 – High Memory Utilization
Brief Problem Description:
Opening a holiday schedule in the scheduling application drove heap usage up, and the application was constantly running out of memory in Production
23
Problem 3 – High Memory Utilization
We saw huge fluctuations in heap size. GC (garbage collection) did not kick in until the heap was very high, and we did not see any young-generation GC happening
24
Problem 3 – High Memory Utilization
Changed the garbage collection policy from "optthruput" to "gencon"
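The policy names on this slide match the IBM J9 JVM's -Xgcpolicy option, where the throughput policy is spelled optthruput. Assuming that is the JVM in use (the slides mention WebSphere but do not show the actual flags), the change amounts to swapping one startup argument. The helper and the heap sizes below are hypothetical.

```python
def set_gc_policy(jvm_args: list[str], policy: str) -> list[str]:
    """Return a copy of jvm_args with any -Xgcpolicy flag replaced by `policy`."""
    kept = [arg for arg in jvm_args if not arg.startswith("-Xgcpolicy:")]
    return kept + [f"-Xgcpolicy:{policy}"]


# Hypothetical starting arguments; heap sizes are illustrative, not from the slides.
before = ["-Xms512m", "-Xmx1024m", "-Xgcpolicy:optthruput"]
after = set_gc_policy(before, "gencon")
print(" ".join(after))
```

The gencon policy splits the heap into young and old generations, which fits the slide's observation that no young-generation collections were happening before the change.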
25
Problem 3 – High Memory Utilization
A problem that the development team and engineers had been looking at for weeks on end, repeatedly restarting the application with no answers, was solved in two sessions of 4 hours each
26
Advantages of using Diagnostic
Changes black-box load testing to white-box, as it gives a lot more visibility
Makes identifying issues in the code or the environment a lot easier
Splits the total transaction time into different application layers, making it easier to identify bottlenecks
Helps in tuning heap size, garbage collection, etc.
Issue identification is a lot faster, as it shows the culprit code causing the problem
27
Questions
28
To learn more on this topic, and to connect with your peers after the conference, visit the HP Software Solutions Community: www.hp.com/go/swcommunity
29