Bottlenecks Exposed: The Most Frequently Found Performance Problems – and How to Nail Them!
Dan Downing, VP Testing Services
MENTORA
Atlanta • Boston • DC • San Jose
404.250.6515 • www.mentora.com
Copyright Mentora 2001

Objectives
• Identify common website performance bottlenecks:
  – Source (what component they occur on)
  – Symptom (how you know there’s a problem)
  – Causes (what creates the problem)
  – Measurements (how to nail it)
  – Cures (how to make it go away)
• Illustrate with examples of B2C, B2B, B2E cases
Audience: Performance Engineer, Load Testing Expert, with intermediate experience

Terms & Concepts
• Application Performance Testing: A repeatable methodology for volume simulation of real-world applications in a customer’s environment to yield performance results that can be implemented to deliver efficient utilization of computing resources.
• Scalability: The demonstrated ability (or lack thereof) of a system (or component) to yield the same response time of a business process irrespective of the magnitude of the load applied to the system.
• Bottleneck: A hardware component, software component, or process of the system-under-test that is causing performance degradation and low scalability under load.
• Resource Utilization: The quantification of a shared computing resource being consumed by an application process or component.
• Symptom: The outwardly visible but unquantifiable effect of a performance bottleneck.
• Cause: The specific and measurable factor yielding one or more symptoms.
• Cure: The specific action applied to the Cause that will measurably improve the visible symptom.
• Measurement: A numeric value of a performance-affecting factor that can be quantified by a monitoring tool and related to a specific component of the system-under-test.
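
The Scalability definition above can be turned into a simple check: compare the response time at each load level with the response time at the lowest load. This is a minimal sketch; the sample numbers and the 1.5x tolerance are illustrative assumptions, not values from the talk.

```python
def scalability_report(samples, tolerance=1.5):
    """samples: list of (virtual_users, avg_response_seconds) pairs.

    Returns (users, ratio, ok) per load level, where ratio is the response
    time relative to the lowest-load baseline. tolerance is an assumed
    threshold, not a value from the presentation.
    """
    ordered = sorted(samples)
    baseline = ordered[0][1]
    return [(users, secs / baseline, secs / baseline <= tolerance)
            for users, secs in ordered]


# Hypothetical load-test results: response time holds to 200 users, then degrades.
results = scalability_report([(50, 1.0), (100, 1.1), (200, 1.3), (400, 3.2)])
for users, ratio, ok in results:
    print(f"{users:>4} users: {ratio:.1f}x baseline {'ok' if ok else 'DEGRADED'}")
```

A system that scales perfectly by the definition above would show a ratio near 1.0 at every load level.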

Symptoms
• “It’s Too Slow”
  – As perceived from slow browser response by functional testers
  – As measured by poor scalability during first low-load test
  – As experienced (too late!) by low productivity by real production users
• “It’s broken”
  – Page ‘never returns’ after button press
  – Web server errors (404, 500…)
  – Application error messages in application logs
Symptoms are usually very unspecific!

3-Tier Environment
• Network
  – Firewall, load balancer, routers, network interface cards, cabling between all components
• Web Server Tier
  – One or more (usually many) low capacity computers that receive, route, and display results of http requests from visitors’ browsers
• Application Server Tier
  – One or more (often 2) medium-high capacity computers that receive, apply business logic to, and return to the web server the results of the http request
• Database Server Tier
  – One or more (usually one with redundant stand-by) high capacity computers that operate database software, and access database (often on large disk arrays) for servicing user data requests
Example: Web Server Sun E220, App Server Sun E420, DB Server Sun E4500 (Oracle)

Performance Bottleneck Sources
What in your experience* do you find as the relative distribution of bottlenecks?

How often?   Web Srvr   Ntwk
>30%         16%        30%
21-30%       12%        16%
11-20%       40%        25%
<10%         29%        27%

How often?   App Srvr   DB Srvr
>60%         7%         9%
41-60%       21%        29%
21-40%       48%        32%
11-20%       11%        21%
<10%         11%        7%

* Poll results of 56 Mercury Conference ’01 attendees of intermediate to advanced experience.

Performance Bottleneck Sources
In my experience, it’s the application! (~80% of the time)
• Network: 8%
• Web Server: 12%
• App Server: 35%
• DB Server: 45%
Most of the application code resides here… (App Server and DB Server)
- % distribution is a SWAG based on experience testing dozens of apps
Highest ranges from poll (shown in color on the slide): 21-40% (48%), 21-40% (32%), 11-20% (40%), >30% (30%)

Database (Simple) Anatomy
[Diagram]
• App Server (e.g. Sun 420): DB Connection Pool; SQL goes to the DB server and data comes back
• DB Server (e.g. Sun 4500 quad cpu 2 GB memory): Client Comm Buffer, Query Parser, Query Optimizer, Query Plan Storage, Query Executor, Metadata cache, Write Buffer, Data Cache, Shared Memory
• Disk Array (e.g. Sun A10000): Data, Log

Key DB Server Measurements
Measurement: Impact/Range
• Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add cpus!
• Server Memory: Memory available should stay constant and average below 70-80%; else add memory
• Server Page Faults/s.: Should be low and constant, else yields virtual memory disk IO, which indicates insufficient memory allocated to DB processes
• Cache Hit Ratio: Should be hi (90-95% range); else data cache sized too low and too much physical IO
• Deadlocks: Should be zero at target loads; if not, indicates a transaction model design problem
• Table scan blocks/sec: Should be low for normal transactions (can be high for reporting functions); else indicates that indexes are missing or poorly designed
• Parse-to-execute ratio: Should be low (<20%); else could indicate under-sized query cache, old/no optimizer statistics, or flawed query model in app server function
• Transactions/second: A general indicator of db load handling, and should be compared run-to-run
• SQL*Net bytes rcvd/sent from/to client: A measure of the data-intensiveness of queries; read bytes should be <50% of sent bytes, else indicates complex application queries should become stored procedures
• Open cursors: A measure of the number of open client queries; should be low, or could be an indicator of inefficient query model
• Physical reads/writes: Correlates with cache-hit ratio; should decrease run-to-run as cache is tuned
• Server I/O: Should be balanced across all drives, else indicates ‘db hot spot’ on large, hi-access tables, which need to be striped across multiple drives; avg 20% below disk IO saturation level
• DB Memory: Should be ~80% of available user memory on Server, and should average < 75%; else, add!
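
Several of the ranges above can be checked mechanically after each test run. The sketch below is only an illustration: the dictionary keys are hypothetical names rather than those of any real monitoring tool, and the cache hit ratio uses the common definition of one minus physical reads over logical reads, which assumes the monitor reports both counters.

```python
def check_db_metrics(m):
    """Return warnings for a dict of DB measurements from one test run.

    Thresholds follow the ranges listed above; the key names are
    hypothetical and must be mapped to whatever the monitor reports.
    """
    warnings = []
    # Assumed definition: share of reads served from cache, not disk.
    hit_ratio = 1 - m["physical_reads"] / m["logical_reads"]
    if hit_ratio < 0.90:
        warnings.append(f"cache hit ratio {hit_ratio:.0%}: data cache may be too small")
    parse_ratio = m["parses"] / m["executes"]
    if parse_ratio > 0.20:
        warnings.append(f"parse-to-execute ratio {parse_ratio:.0%}: check query cache, statistics")
    if m["deadlocks"] > 0:
        warnings.append(f"{m['deadlocks']} deadlocks: review transaction model")
    if m["cpu_pct"] > 80:
        warnings.append(f"server CPU {m['cpu_pct']}%: consider adding cpus")
    return warnings


# Hypothetical run: low cache hit ratio and two deadlocks, everything else in range.
sample = {"physical_reads": 3000, "logical_reads": 20000,
          "parses": 500, "executes": 10000, "deadlocks": 2, "cpu_pct": 65}
found = check_db_metrics(sample)
for line in found:
    print(line)
```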

DB Server Causes & Cures
• Cause: Inefficient SQL statement
  – Measurement: Slow page (>10 sec) which ties to a specific function, thus an SQL query; hi db cpu | IO
  – Cure: Analyze query plan, optimize query
• Cause: Inefficient SQL query model
  – Measurement: Many slow pages; hi 'bytes recvd' by db server; low db cpu; or: many slow queries
  – Cure: Convert client SQL to stored procedures | optimize slow q’s
• Cause: Inefficient DB configuration
  – Measurement: Low correlation btw DB and Server resource utilization; unbalanced I/O
  – Cure: Reconfigure DB (add memory, write processes, threads, …)
• Cause: Overuse of row-at-a-time processing
  – Measurement: Hi open cursors; hi bytes sent from client
  – Cure: Tune query prepares in App server / code
• Cause: Missing/ineffective indexes
  – Measurement: high table scan blocks; slow function
  – Cure: Find/add/fix table indexes
• Cause: Query plan cache too small
  – Measurement: Hi parsed-to-executed queries ratio
  – Cure: Raise size of query plan cache
• Cause: Inefficient concurrency model
  – Measurement: Hi blocked transactions, high table locks
  – Cure: Review/fix transaction logic; modify DB locking strategy
• Cause: Data cache too small
  – Measurement: Low cache-hit ratio, hi physical reads
  – Cure: Increase cache size
• Cause: Out-of-date statistics
  – Measurement: high table scan blocks; many slow functions
  – Cure: Rerun optimizer statistics
• Cause: Deadlocks
  – Measurement: Deadlocks non-zero / errors in error log
  – Cure: Fix application transaction code
• Cause: Other
  – Measurement: Inefficient access method; too many DB connections; small comm buffers; …
  – Cure: Pinpoint and correct!
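
To make the "row-at-a-time processing" cause concrete, the sketch below contrasts fetching rows one by one from application code with letting the database aggregate in a single statement. It uses Python's built-in sqlite3 module purely as a stand-in for the database server; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 10, float(i)) for i in range(1000)])

# Row-at-a-time: one lookup query, then one more query per matching order,
# with the totals summed in application code.
ids = [row[0] for row in conn.execute("SELECT id FROM orders WHERE customer_id = 3")]
slow_total = 0.0
round_trips = 1
for order_id in ids:
    slow_total += conn.execute("SELECT total FROM orders WHERE id = ?",
                               (order_id,)).fetchone()[0]
    round_trips += 1

# Set-based: the database aggregates in a single statement.
fast_total = conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = 3").fetchone()[0]

print(round_trips, "statements vs 1; totals match:", slow_total == fast_total)
```

Against a networked database, each of those extra statements is also a client round trip, which is why the pattern shows up as high open cursors and high bytes from the client.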

Database Server Causes
• Inefficient SQL statement: 24%
• Inefficient SQL query model: 17%
• Inefficient DB configuration: 14%
• Hi row-at-a-time logic: 12%
• Missing indexes: 9%
• Inefficient concurrency model: 7%
• Query cache too small: 7%
• Data cache too small: 5%
• Other: 5%
~60% of the time it’s bad SQL or bad indexes!

Example: B2B Supply Chain Management
Environment: Web Server Sun E220 (Apache); App Server Sun E420 (WebLogic); DB Server Sun E420 (Oracle)
• Symptom:
  – Transactions that return list data running very slowly; they don’t scale
• Measurement: (using LR Oracle Monitor)
  – Hi table scan blocks
  – Low index fast full scans
• Cure:
  – Add additional indexes
  – Design indexes so queries can be resolved with index table columns w/o accessing base table
  – Enable fast scan Oracle parameter
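
The index cure in this example can be illustrated on a small scale. The sketch below uses SQLite (through Python's sqlite3 module) as a stand-in for Oracle, so the plan wording is SQLite's, not Oracle's; the table, column, and index names are hypothetical. It shows a list query going from a full table scan to being answered from index columns alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, supplier_id INTEGER, name TEXT, price REAL)")
conn.executemany("INSERT INTO parts (supplier_id, name, price) VALUES (?, ?, ?)",
                 [(i % 50, f"part-{i}", i * 0.5) for i in range(5000)])

query = "SELECT name, price FROM parts WHERE supplier_id = 7"


def plan(sql):
    """Return SQLite's query plan detail text for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)  # no usable index yet: the whole table is scanned

# The index holds every column the query needs, so the base table is never read.
conn.execute("CREATE INDEX idx_parts_supplier ON parts (supplier_id, name, price)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

The exact plan text varies between SQLite versions, but the first plan reports a scan and the second reports a covering index.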

LR Oracle Monitor
• Table scan blocks average = 12
• Index fast full scans = 0

App Server (Simple) Anatomy
[Diagram]
• Web Server: passes client requests to the app server and receives html pages back
• App Server (e.g. usually two; Sun 420 dual cpu 1GB memory): Connection Mgr, Presentation Manager, Presentation Logic, Business Logic, Object Cache, Security Mgr, Transaction Mgr, Messaging Mgr, Communic. Mgr, DB Conn. Mgr
• DB Server: receives SQL from the app server and returns data

Key App Server Measurements
Measurement: Impact/Range
• Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add cpus!
• Server Page Faults/s.: Should be low and constant, else yields virtual memory disk IO, which indicates insufficient memory allocated to App Server processes
• Cache Hit Ratios: Should be hi (90% range); else data/object caches sized too low and too much physical IO
• App Server memory: Memory should rise as active sessions grow, should shrink in garbage collection cycle, and should stabilize at target load at 70% average, else possible memory leak or add memory
• Active/Total Sessions: Should rise as load increases, stabilize at target load, approximate vendor target/instance; else, decrease inactive session keep-alive time
• SSL transactions/sec: Should be a relatively low ratio vs. non-secure transactions (<15%?); else, eating up cpu, bw
• Requests/second: A general indicator of app server load as evidenced by web server request volume, and should be compared run-to-run and track with load applied
• Active/Total DB Pool Connections: Active connections should rise with load, and stabilize at less than Total; if they do not stabilize, indicates insufficient processing power to keep up with DB; if they max out, too few connections
• Server Memory: Memory should track App Server memory, should stabilize at target load at 70% average, else possible memory leak or add memory
• Application log: Should contain low/no error messages, low warnings; else indicates application problems
• Load balancing: Should see all app server instances doing similar amount of work; else indicates load balancing problem
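
The Active/Total DB Pool Connections rule above has three outcomes, which a small classifier can make explicit. This is a rough sketch: the function name, the sample series, and the choice to look only at the last three samples are assumptions for illustration.

```python
def diagnose_pool(active_samples, total, tail=3):
    """Classify DB pool behavior from active-connection samples at target load.

    active_samples: active connection counts over time.
    total: configured pool size.
    tail: how many of the latest samples to inspect (assumed tuning knob).
    """
    recent = active_samples[-tail:]
    if max(recent) >= total:
        return "maxed out: too few connections"
    if all(b > a for a, b in zip(recent, recent[1:])):
        return "still rising: app server may not be keeping up with the DB"
    return "stable below total"


# Hypothetical runs against a pool of 25 connections.
print(diagnose_pool([5, 12, 18, 19, 19, 18], total=25))
print(diagnose_pool([5, 12, 18, 22, 25, 25], total=25))
print(diagnose_pool([5, 9, 13, 17, 20, 23], total=25))
```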

App Server Metrics & Cures
• Cause: Memory leak
  – Measurement: Memory utilization rises steadily, doesn't recover
  – Cure: Find and fix the application code that leaks memory
• Cause: Inefficient garbage collection
  – Measurement: Spikes in transaction times
  – Cure: Tune app server load balancing
• Cause: Sub-optimal session model
  – Measurement: Steadily rising active sessions
  – Cure: Tune session keep-alive setting
• Cause: Poorly configured App Server
  – Measurement: Low correlation btw App and HW resource utilization; overall poor performance
  – Cure: Validate proper JVM-to-app server match; increase data & object caches; add HW memory
• Cause: Insufficient hardware resources
  – Measurement: Hi cpu, memory, I/O utilization
  – Cure: Add cpus, memory; decrease no. of App Server instances
• Cause: Poorly configured DB connection pool
  – Measurement: Steadily rising active connections, hi cpu utilization
  – Cure: Raise DB connections; lower no. of App Server instances
• Cause: Inefficiently coded transaction
  – Measurement: Slow specific business function
  – Cure: Pinpoint & diagnose longest running business processes
• Cause: Inefficient security model
  – Measurement: Hi calls on port 7002
  – Cure: Review/relax app security
• Cause: Inefficient object access method
  – Measurement: Slow object creation
  – Cure: Change object access method
• Cause: Other
  – Measurement: Low OS resources; erratic transaction performance
  – Cure: Pinpoint and correct!

App Server Causes
• Memory leak: 15%
• Inefficient garbage collection: 12%
• Sub-optimal session model: 12%
• Poorly configured App Server: 12%
• Inefficiently coded transaction: 11%
• Insufficient hardware resources: 10%
• Other: 10%
• Poorly configured DB connection pool: 9%
• Inefficient object access method: 5%
• Inefficient DB access architecture: 4%
60% of the time: object caching, SQL, db connection pool; 20% of the time: inefficient application server

Example: B2C Large Retail Web Store
Environment: Web Server Sun E420 (Apache); App Server Sun E420 (ATG Dynamo); DB Server Sun E4500 (Oracle)
• Symptom:
  – App server memory leak
• Measurement:
  – Steadily increasing, non-recovering memory usage in Dynamo console
  – Memory exhausted and app server dies over 8 hour run
• Solution:
  – Test individual functions
  – Isolate errant function not releasing memory
  – Fix code!
  – Re-test to validate fix (longevity test)
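
"Steadily increasing, non-recovering" memory can be told apart from normal garbage-collection sawtooth by tracking the low points rather than the peaks. The sketch below is one simple way to do that during a longevity test; the window size and the sample readings are assumptions, and real console data would need a window matched to the collection interval.

```python
def looks_like_leak(samples_mb, window=4):
    """Flag steadily rising, non-recovering memory.

    samples_mb: memory readings taken at a fixed interval. The series is
    split into consecutive windows, and the minimum of each window (roughly
    the level just after garbage collection) must rise from every window
    to the next. window is an assumed tuning knob.
    """
    floors = [min(samples_mb[i:i + window])
              for i in range(0, len(samples_mb) - window + 1, window)]
    return len(floors) >= 3 and all(b > a for a, b in zip(floors, floors[1:]))


# Hypothetical readings in MB: both series spike, but only one fails to recover.
healthy = [300, 420, 510, 310, 305, 430, 500, 300, 310, 440, 520, 305]
leaking = [300, 420, 510, 340, 380, 470, 560, 400, 450, 540, 620, 470]
print(looks_like_leak(healthy), looks_like_leak(leaking))
```

Running the same check per business function, one function at a time, matches the "test individual functions" step in the solution above.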

Web Server Metrics & Cures
• Cause: Insufficient hw capacity
  – Measurement: Hi cpu, memory, I/O; timeout errors
  – Cure: Add cpus, memory; add web servers; distribute content; add specialized servers (images, streaming media…)
• Cause: Poorly configured server
  – Measurement: Hi I/O, hi memory utilization, low throughput
  – Cure: Tune web server configuration
• Cause: Unbalanced load across servers
  – Measurement: Uneven utilization across web servers
  – Cure: Review/revise load balancing policies
• Cause: Hi SSL transactions
  – Measurement: Memory utilization >70%, low throughput; hi port 443 calls
  – Cure: Review/relax secure transaction model
• Cause: Other
  – Measurement: Low OS resource utilization, overall poor throughput
  – Cure: Diagnose App, DB servers
• Cause: Inefficient transaction design
  – Measurement: Hi ip connections per active session
  – Cure: Reduce keep-alive time; correct transaction design
• Cause: Broken links
  – Measurement: Broken link errors
  – Cure: Diagnose / fix application
• Cause: Security too tight
  – Measurement: Hi firewall-to-web server traffic
  – Cure: Direct firewall and user traffic to different ports

Web Server Causes
• Insufficient hw capacity: 18%
• Poorly configured server: 15%
• Unbalanced load across servers: 15%
• Hi SSL transactions: 13%
• Other: 12%
• Inefficient transaction design: 11%
• Broken links: 8%
• Security too tight: 8%
Major contributor: Secure transactions; often: load balancing; sometimes: high-resource specialized functions (external links, email, chat)

Example: B2E Collaborating Communities
Environment: Cisco Load Director; Web/App Server Dell 1550 (IIS/Visual Basic); DB Server Dell 2450 (SQL Server)
• Symptom:
  – Slow overall performance
  – DB server low activity
• Measurement:
  – Web/App server resources maxed out
  – Non-scalable transaction times
• Solution:
  – Short-term: Move “Chat” function to dedicated server
  – Long-term: Re-architect system in Java, separate Web and App tiers, introduce dedicated server for chat and email functions

Network Metrics & Cures
• Cause: Load balancing ineffective
  – Measurement: Uneven load at web servers
  – Cure: Revise load balancing policy
• Cause: Insufficient overall bandwidth
  – Measurement: Low, maxed throughput; high collision rate
  – Cure: Get hoster to raise bw ceiling; increase system bw; add NICs for failover functions
• Cause: Security too tight
  – Measurement: High traffic btw firewall & servers
  – Cure: Loosen security policies; redesign application security
• Cause: Poorly configured/insufficient network interface cards
  – Measurement: Low throughput btw servers
  – Cure: Tune NIC buffers; add 2nd NIC for failover heartbeat
• Cause: Poor network architecture
  – Measurement: Hi latency values in network delay monitor; low throughput
  – Cure: Review/tune configuration of NICs, routers, other devices
• Cause: Other
  – Measurement: ???
  – Cure: ???

Network Causes
• Load balancing ineffective: 22%
• Poor network architecture: 20%
• Other: 20%
• Security too tight: 15%
• Insufficient overall bandwidth: 13%
• Poorly configured/insufficient NICs: 10%
No single major cause; often problem is load balancing, security, or network architecture.

Example: B2C On-line Printing Services
Environment: Cisco Load Director; Web Server Sun E420; App Server Sun E420; DB Server Sun E4500 (Oracle)
• Symptom:
  – Low transaction performance scalability under load
  – High latency across load balancer
• Measurement:
  – Unbalanced load on web server tier
• Solution:
  – Replace load balancer (bad hardware)
  – Change load balancer policies from IP-based to server-load based
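
The policy change in this example can be illustrated with a toy simulation. The sketch below is not how any particular load balancer works: the IP-based policy is modeled as a simple hash of the client address, the load-based policy as "pick the server with the fewest requests so far", and the traffic mix (most users behind one proxy address) is a made-up assumption.

```python
def ip_based(requests, servers):
    """Pin each client IP to one server (sum of IP octets modulo server count)."""
    load = {s: 0 for s in servers}
    for ip in requests:
        index = sum(int(part) for part in ip.split(".")) % len(servers)
        load[servers[index]] += 1
    return load


def load_based(requests, servers):
    """Send each request to whichever server has handled the fewest so far."""
    load = {s: 0 for s in servers}
    for _ in requests:
        least_busy = min(servers, key=lambda s: load[s])
        load[least_busy] += 1
    return load


servers = ["web1", "web2", "web3"]
# Hypothetical traffic: 70% of requests arrive through one proxy address.
requests = ["10.0.0.1"] * 70 + ["10.0.0.2"] * 15 + ["10.0.0.3"] * 15

by_ip = ip_based(requests, servers)
by_load = load_based(requests, servers)
print("IP-based:  ", by_ip)
print("load-based:", by_load)
```

With this traffic mix the IP-based policy sends 70 of 100 requests to a single server, which is the kind of unbalanced web tier load the measurement step above picked up.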

Monitoring Tools
• LoadRunner
  – Transaction performance monitor
  – Server resource monitor
  – Oracle, SQL Server, selected app servers monitors
  – Network delay monitor
• Database performance monitoring tools
  – Quest Oracle Instance Monitor, Embarcadero, BMC DB Patrol
• App Server System Console (from app server vendor)
• Java object monitoring tools
  – JProbe, Performasure (Sitraka)
• Network Analyzer (aka network sniffer)
• Operating system utilities
  – Unix top, sar, vmstat, iostat
  – Windows 2000/NT Perfmon

Tool Example: WebLogic Console

Lessons Learned
1. 80% of the time it is the application or system software, not the infrastructure!
2. Make friends with your app server, db server, and hardware monitoring tools!
3. Application architect, DBA, and App Server experts are indispensable and must be involved during load tests!
4. Arrive armed with the Top 10 Things to check for each component!
5. Identify the measurements you need to be able to make
6. Systems Engineer with networking, firewall, and load balancer expertise is very handy!