Exposing and Fixing Common App Performance Problems
Take Control of Application Performance
Jon C. Hodgson
Technical Director, Advanced Technology Group
APM Subject Matter Expert
“Hidden in Plain Sight”
© 2015 Riverbed Technology. All rights reserved. 2
Anatomy of a Transaction
[Diagram: a transaction runs from BEGIN to END between the Client Browser, across the WAN, to the Web tier (IIS Web Server, .NET Worker Process, OS TCP/IP Stack), across the LAN, to the App tier (Apache, Java App Server, OS TCP/IP Stack, VMware), and on to Remote Calls (Web Service, DB etc.). Sources of delay shown along the path: Request Payload and Response Payload Network/Bandwidth/Latency, Queuing, Code Processing, Hypervisor Oversubscription, and Page Render Time. Data sources shown: Packets, Code Instrumentation, Metrics, and EUE.]
Crash Course: Application Architecture
[Diagram layers, outermost to innermost:]
Physical OS Resources (CPU, RAM, I/O etc.)
VMware Hypervisor
Guest Operating System, each with its own OS Resources (CPU, RAM, I/O etc.)
OS Process: java.exe (Java JVM) or w3wp.exe (.NET CLR)
Reserved RAM: JVM Heap or CLR Heap
Java Code (Java Application) or .NET Code (.NET Application)
The Flaw of Averages & Aggregates
The Flaw of Averages
A classic example of the Flaw of Averages involves the statistician who drowned crossing a river that was, on average, 3 ft. deep.
Source: The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty by Sam L. Savage, with illustrations by Jeff Danziger – http://flawofaverages.com
CPU Aggregation
[Chart: CPU utilization (scale 0-100) for Core 1, Core 2, Core 3, Core 4 and the Host CPU Average, shown for four scenarios: Runaway Thread, Runaway Thread (Hyperthreading), Intermittent CPU Spikes, Overloaded Host]
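The masking effect of a host-wide average can be reproduced with simple arithmetic. The sketch below is illustrative only: the per-core figures, the 95% threshold, and the function names are assumptions made up for the example, not values from the slide.

```python
# Illustrative per-core utilization (percent) on a 4-core host.
# All numbers are invented for this sketch.
scenarios = {
    "runaway thread": [100, 2, 3, 1],
    "overloaded host": [97, 99, 98, 100],
}

def host_average(cores):
    """Aggregate view: mean utilization across all cores."""
    return sum(cores) / len(cores)

def pegged_cores(cores, threshold=95):
    """Disaggregated view: 1-based indices of cores at or above the threshold."""
    return [i for i, pct in enumerate(cores, start=1) if pct >= threshold]

for name, cores in scenarios.items():
    print(f"{name}: average {host_average(cores):.1f}%, "
          f"pegged cores {pegged_cores(cores)}")
```

In the runaway-thread case the host average stays low even though one core has no headroom left, which is why the per-core view matters.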
Data Granularity
[Chart: the four scenarios (Runaway Thread, Runaway Thread (Hyperthreading), Intermittent CPU Spikes, Overloaded Host) plotted on a 0-100 scale at three granularities: 1 sec, 15 sec Sampled, 15 sec Averaged]
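A small sketch can show why coarser data hides short spikes. It assumes an invented one-second series (a mostly idle host with a brief spike in each 15-second window) and compares point sampling with averaging; none of the numbers come from the slide.

```python
# One-second CPU readings (percent) over 30 seconds. Values are invented:
# a mostly idle host with a three-second spike in each 15-second window.
one_sec = [5] * 30
for t in (3, 4, 5, 20, 21, 22):
    one_sec[t] = 100

WINDOW = 15  # seconds per coarse data point

def sampled(series, window):
    """Keep only the first reading of each window (point sampling)."""
    return [series[i] for i in range(0, len(series), window)]

def averaged(series, window):
    """Replace each window with the mean of its readings."""
    chunks = [series[i:i + window] for i in range(0, len(series), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

print("1 sec peak:     ", max(one_sec))
print("15 sec sampled: ", sampled(one_sec, WINDOW))
print("15 sec averaged:", averaged(one_sec, WINDOW))
```

With this data, sampling misses the spikes entirely and averaging flattens them to a modest value, while the one-second series still shows the core hitting 100%.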
Case Study: It’s Not the Database …
…Or Is It?
§ Customer’s application was running very slowly
§ Everything looked fine on the app server & database server
§ Database CPU was very low, using only 8.4% CPU
Pcpu  time   args
8.4   41:41  oracle (DESCRIPTION=(LOCAL=NO)(SDU=1521))
“It’s not the database”
Database CPU: Disaggregated
Aggregate CPU Load: ~9% Average
At any given moment, a single CPU is pegged at 100% while the others are mostly idle. Why is the aggregate ~ 9%? Why is oracle the top process with only 8.4% utilization?
ANSWER: 100% CPU / 12 CPUs = 8.3%
It was the database after all!
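The arithmetic behind that answer generalizes: a single-threaded process that saturates one CPU can never show more than 100% / N of total host capacity on an N-CPU machine. The helper below is a minimal sketch of that check; the function names and the 0.5-point tolerance are assumptions chosen for illustration.

```python
def single_thread_ceiling(n_cpus):
    """Share of total host CPU (percent) that one fully busy thread can reach."""
    return 100.0 / n_cpus

def looks_single_thread_bound(process_pct, n_cpus, tolerance=0.5):
    """True when a process's share of host CPU sits at the one-CPU ceiling.

    The tolerance (in percentage points) is an arbitrary choice for this sketch.
    """
    return abs(process_pct - single_thread_ceiling(n_cpus)) <= tolerance

print(round(single_thread_ceiling(12), 1))
print(looks_single_thread_bound(8.4, 12))
```

A top process whose utilization hovers at exactly this ceiling is a hint that it is pegging one CPU rather than idling.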
[Chart: CPU Load of CPUs 1-12 (Scale 0-100%)]
Case Study: Forgotten freeware claims 10,000 CPUs
Hidden in Plain Sight
Using AppInternals, the customer noticed a CPU spike hopping from core to core.
§ The top process was a freeware sysadmin utility, with 6.25% CPU
§ 100% CPU / 16 CPUs = 6.25% CPU
§ This utility, running on Windows 2008 servers, had not been updated since 2003
§ It was part of the default build on 10,000+ servers
Hidden in Plain Sight
Now That You Know… you’ll see this everywhere
Android Phone
Dual Core XP
Quad Core Windows 7
8 Core Win 2012
The Power of Big Data
REMOVING THE HAYSTACK
“Slowest” Not Always “Worst”
“Slowest”: 2x slower, few transactions
“Worst”: 12x slower, many transactions
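One way to make the distinction between “Slowest” and “Worst” concrete is to rank transaction types two ways: by raw response time and by total impact. The sketch below is an assumption-laden illustration: the transaction names, baselines, counts, and the impact metric (affected transactions multiplied by added seconds) are all invented here and are not described in the slides.

```python
from dataclasses import dataclass

@dataclass
class TxType:
    name: str           # hypothetical transaction type
    baseline_s: float   # typical response time in seconds
    observed_s: float   # response time during the incident
    count: int          # number of affected transactions

    @property
    def slowdown(self):
        """How many times slower than its own baseline."""
        return self.observed_s / self.baseline_s

    @property
    def excess_seconds(self):
        """Total added waiting time across all affected transactions."""
        return self.count * (self.observed_s - self.baseline_s)

tx_types = [
    TxType("report_export", baseline_s=10.0, observed_s=20.0, count=50),
    TxType("search", baseline_s=0.5, observed_s=6.0, count=40_000),
]

slowest = max(tx_types, key=lambda t: t.observed_s)
worst = max(tx_types, key=lambda t: t.excess_seconds)
print(f"slowest: {slowest.name} ({slowest.slowdown:.0f}x slower)")
print(f"worst:   {worst.name} ({worst.slowdown:.0f}x slower)")
```

Sorting only by response time would put the rarely used, 2x-slower transaction at the top and bury the 12x regression that affects far more users.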
Case Study: Remote Dependency Blues
§ 3x production load test against 100 servers lasting 3 hours
– 7 million front-end transactions with tens of millions of backend calls
– All captured by AppInternals with call tree details
§ They could never reach their performance goals
Scalability Testing for Government Compliance
[Chart: Throughput (hits/second). Annotations: “Expected Behavior”; “Throughput would stall & then repeatedly thrash until the entire environment was reset”]
§ Application Development identified GetQuotes.jws as the root cause
§ The team that owned that web service disputed the finding
– Another APM product had previously identified this as well
– This was dismissed as a “Red Herring”
Remote Dependency: GetQuotes.jws
Big Data Reveals a Back-End Pattern
[Chart: AppInternals Response Time (s) plotted above Load Generator Throughput, with the throughput stall & thrashing marked]
Transactions which call GetQuotes.jws: This pattern correlates with the load thrashing. This pattern precedes every burst of traffic.
Transactions which do not call GetQuotes.jws: These do not show any relationship to the issue.
§ AppInternals clearly proved that GetQuotes.jws was the root cause of the thrash
§ The Application Owner used this information to force the web service team to take ownership of their issue
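The comparison on this slide, splitting transactions by whether they call the dependency and checking each group against throughput, can be approximated in a few lines. This is a simplified sketch, not the analysis AppInternals performs: the records, the time buckets, the throughput figures, and the use of a Pearson correlation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# (time bucket, calls GetQuotes.jws?, response time in seconds); invented data
transactions = [
    (0, True, 1.0), (0, False, 0.4),
    (1, True, 9.0), (1, False, 0.4),
    (2, True, 1.2), (2, False, 0.5),
    (3, True, 8.5), (3, False, 0.5),
]
# Load generator throughput (hits/second) per time bucket; invented data
throughput = {0: 900, 1: 100, 2: 880, 3: 120}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def correlation_with_throughput(calls_dependency):
    """Correlate a group's mean response time per bucket with throughput."""
    by_bucket = defaultdict(list)
    for bucket, calls, response_s in transactions:
        if calls == calls_dependency:
            by_bucket[bucket].append(response_s)
    buckets = sorted(by_bucket)
    return pearson([mean(by_bucket[b]) for b in buckets],
                   [throughput[b] for b in buckets])

print("calls GetQuotes.jws:  ", round(correlation_with_throughput(True), 2))
print("does not call it:     ", round(correlation_with_throughput(False), 2))
```

In this invented data set, response time for the callers moves in lockstep with the throughput drops, while the other group shows essentially no relationship, mirroring the pattern the slide describes.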
§ Multiple applications were affected
§ Dozens of transaction types were degraded
§ Months of effort had previously been wasted chasing phantoms
Business Impact
Key Takeaways
Troubleshoot in the context of the entire stack
Averages, aggregates & sampling can mask issues
Learn to spot tell-tale patterns
“Slowest” is not always “Worst”
Leverage Big Data approaches to eliminate noise
Try instantly at www.appinternals.com. No installation required!
Thank You