Resource Mapping: A Wait Time Based Methodology for
Database Performance Analysis
Prepared for NoCOUG, Fall Conference, 2004
Presented by Matt Larson
Chief Technology Officer, Confio Software
Presentation Agenda
Introduction
Conventional Tuning vs. Wait-based Tuning
Foundation: Resource Mapping Methodology
5 Key Steps of Applying RMM
RMM Advantages
Conclusion
Who am I?
Former DBA consultant specializing in Oracle performance tuning
Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2nd Edition, Oracle8 Server Unleashed)
Co-author of two other database related books
CTO and founder of Oracle performance software company
Conventional Tuning
Art, not a science
Ratio-based (cache hit ratios, etc.)
Sometimes fruitless
It’s “tuned” (I guess?)
Different tuning/investigation process for each DBA/DBA Team/Company
Problems with Conventional Tuning Tools
Optimize systems, not business results
Conventional tools:
• V$ Views: limited visibility & granularity
• Statspack: averages across entire database
• Tracing: limited to session, excessive volume
Incorrect data hides real results
• System-wide averages
• Event counters
• Incomplete visibility
What Problems are you Trying to Solve?
Methodology addresses a common problem space:
• I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why?
• Will more hardware make my application run faster? By how much?
• Will the new application run efficiently on the production server?
• Why does one application keep impacting my SLA compliance?
• If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?
You know you are working on the wrong thing when…
After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with:
• A. An all-expense-paid vacation
• B. A free lunch
• C. A stale donut
• D. Reward? Nobody even noticed!
You know you have a visibility problem when…
You measure database performance based on:
• A. Increasing trends in user response time
• B. Increasing system down time
• C. Increasing help desk calls
• D. Increasing decibel levels from irate users
Your role is sub-essential to the business of your organization when…
Your role in the rollout of a new customer-facing application results in:
• A. Keys to drive the CEO’s Porsche
• B. Keys to use the executive restroom
• C. A mop to use in the executive restroom
• D. Your office has been moved to the restroom
You know you are accustomed to measuring the wrong thing when…
You measure the commute time to work based on:
• A. The time it takes to get there
• B. Counting the times your wheels rotate
• C. Monitoring your tachometer
• D. The number of speeding tickets
Wait-based Performance Tuning
Emerging best-practice for database tuning
Proponents include leading consultants, trainers and authors
Oracle is starting to build wait-based tuning tools into the database, particularly in 10g
Tune by determining where processing time is spent
Oracle 10g - Moving towards wait-based
Adding wait-based columns to existing views
New wait-based views
Example: v$session_wait_history
• Provides the last 10 wait events for a session
• Session ID, Username, Event, Wait_Time, etc.
• Used to provide wait_time for only a few events
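A minimal sketch of the "last 10 wait events per session" idea, written in Python rather than against Oracle itself. It only models the fixed-depth history behavior; the names `history` and `record_wait` and the event labels are illustrative assumptions, not Oracle view or column definitions.

```python
from collections import defaultdict, deque

# Hypothetical in-memory model of "last 10 waits per session".
# Field names are illustrative, not Oracle column definitions.
HISTORY_DEPTH = 10
history = defaultdict(lambda: deque(maxlen=HISTORY_DEPTH))

def record_wait(session_id, event, wait_time):
    """Append a wait; the deque drops the oldest entry once it holds 10."""
    history[session_id].append((event, wait_time))

# Simulate 12 waits for one session; only the last 10 are retained.
for i in range(12):
    record_wait(42, f"event_{i}", wait_time=i)

last_ten = list(history[42])
```

A history this short is useful for spot checks on a live session, but it does not accumulate wait time per SQL statement, which is what the methodology in the following slides depends on.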
RMM
Resource Mapping Methodology
A set of requirements that define what data must be captured to effectively make tuning decisions and a process for applying the data to achieve the optimal outcome
Wait Based Tuning
DBA Success Stories using RMM
DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; Solution identified in 10 minutes during hands-on training
DBA ends “Crit Sit”. 2-week situation ends quickly after identification of library cache pin waits and load locks. Metalink identifies Oracle bug; patch successfully applied.
DBA saves $700K. CPU at 90% of capacity prompts planned expansion from a 12-CPU to a 24-CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint; no new server required.
Resource Mapping Methodology
Three Key Principles of RMM
1. SQL View: View all statistics at SQL statement level
2. Time View: Measure Time, not number of times a resource is utilized
3. Full View: Separately measure every resource to isolate source of problems
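One way to picture the three principles together is a single sample record that carries all three pieces of information. The sketch below is an assumption about how such data could be shaped; the `WaitSample` type, its field names, and the sample values are hypothetical and are not taken from any product.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class WaitSample:
    sql_id: str          # SQL View: which statement was waiting
    resource: str        # Full View: which resource it waited on
    wait_seconds: float  # Time View: how long, not how many times

def wait_matrix(samples):
    """Sum wait seconds for each (sql_id, resource) pair."""
    totals = defaultdict(float)
    for s in samples:
        totals[(s.sql_id, s.resource)] += s.wait_seconds
    return dict(totals)

# Hypothetical samples for illustration only.
samples = [
    WaitSample("sql_1", "db file scattered read", 4.0),
    WaitSample("sql_1", "db file scattered read", 6.0),
    WaitSample("sql_1", "log file sync", 1.5),
    WaitSample("sql_2", "library cache pin", 3.0),
]
matrix = wait_matrix(samples)
```

Keeping the statement, the resource, and the elapsed time in the same record is what allows later drill-down from a slow program to a specific SQL and a specific resource.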
Illustrating example: SQL View Principle
Example: ‘CEO’ measuring ‘employee’ output
Averaging over entire company gives no useful data
Must measure each job separately
DBA must manage database similarly
Measure and identify bottlenecks for each SQL independently
Illustrating example: Time View Principle
Example: ‘CEO’ counting ‘tasks’ vs. ‘time to complete’
Counting system statistics not meaningful
Must measure Time to complete
System stats (buffer size, hit ratios, I/O counts) do not identify where database customers are waiting
Identify and optimize Wait Time for each SQL as best indicator of performance
Illustrating example: Full View Principle
Example: ‘CEO’ measuring results with blind spot hiding key processes
Without direct visibility, valuable info is lost
Must have visibility to every process step
Distinctly identify and measure each Oracle resource for each distinct SQL
Track SQL Time, Not System Counters
• Watching counters leads to wrong conclusions: time is more relevant
• Total system counters hide information: need breakdown to individual SQLs
[Figure: SQL 1, SQL 2 and SQL 3 mapped against the resources I/O, Network, Redo and Locks. Total system counters (80K reads, 5K packets, 216K writes, 125 attempts) are contrasted with wait time in minutes per SQL and per resource.]
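The point of this slide can be sketched in a few lines of Python. The per-SQL numbers below are hypothetical, chosen only so that they add up to an 80K-read total; they are not the values from the original figure.

```python
# Hypothetical per-SQL figures: (read count, minutes spent waiting on I/O).
io_stats = {
    "SQL 1": (50_000, 5),
    "SQL 2": (25_000, 15),
    "SQL 3": (5_000, 10),
}

# The system-wide counter hides which statement is responsible.
total_reads = sum(reads for reads, _ in io_stats.values())

# Ranking by counter and ranking by time can point at different statements.
by_reads = max(io_stats, key=lambda s: io_stats[s][0])
by_time = max(io_stats, key=lambda s: io_stats[s][1])
```

In this made-up data the statement with the most reads is not the one whose users wait longest, so tuning by the counter would target the wrong SQL.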
Applying RMM for Business Results
Five Step Process focusing on what matters
1. Identify
2. Allocate
3. Quantify
4. Prioritize
5. Assign
Step 1: Identify
Identify SQL statements having largest impact
• (SQL View and Time View principles)
Longest wait times = most significant “pain points” for customers
Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix them?)
SQL statements prioritized by Total Wait Time
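The Identify step amounts to summing wait time per statement and sorting. A minimal sketch, assuming wait samples are available as `(sql_id, wait_seconds)` pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter

def top_sql_by_wait(samples, n=3):
    """Return the n statements with the largest accumulated wait time."""
    totals = Counter()
    for sql_id, wait_seconds in samples:
        totals[sql_id] += wait_seconds
    return totals.most_common(n)

# Hypothetical samples: (sql_id, seconds waited).
samples = [("a", 120.0), ("b", 30.0), ("a", 60.0), ("c", 45.0)]
ranking = top_sql_by_wait(samples)
```

Note that nothing in the ranking depends on hit ratios or latch counts; only accumulated wait time decides the order.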
Step 2: Allocate
Allocate impact to real customers (internal or external)
Allocate wait time to Program, Session, Machine
• SQL View principle makes this connection
Understanding database customer and application
Programs Prioritized by Total Wait Time
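The Allocate step can be sketched as the same wait data regrouped by whoever issued the SQL. The record layout, program names, and machine names below are assumptions made for illustration.

```python
from collections import defaultdict

def allocate(samples, dimension):
    """Total wait seconds per value of `dimension`, largest first."""
    totals = defaultdict(float)
    for s in samples:
        totals[s[dimension]] += s["wait_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical samples tagged with program and machine.
samples = [
    {"program": "order_entry", "machine": "app01", "wait_seconds": 90.0},
    {"program": "reporting", "machine": "app02", "wait_seconds": 200.0},
    {"program": "order_entry", "machine": "app02", "wait_seconds": 40.0},
]
by_program = allocate(samples, "program")
by_machine = allocate(samples, "machine")
```

Because each sample is tied to a SQL statement and its caller, the same data answers both "which program is hurting" and "which machine is hurting" without a second collection pass.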
Step 3: Quantify
How much is saved in time/money if fixed?
Enabled by Full View and Time View principles
Soft dollar savings
• Data entry clerks
• DBA time spent in problem resolution
Hard dollar savings
• Reduce hardware upgrades
• Meet SLAs, avoiding penalties
• Ensure business isn’t lost due to poorly performing or unavailable system
Quantifiable benefit of tuning a specific statement
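A rough sketch of how a soft-dollar figure might be estimated from wait time. The formula and every input (daily wait, expected reduction, hourly cost, working days) are assumptions for illustration, not figures from the presentation.

```python
def annual_soft_savings(wait_seconds_per_day, reduction, hourly_cost,
                        working_days=250):
    """Annual value of user time recovered if wait time drops by `reduction`.

    `reduction` is a fraction between 0 and 1.
    """
    hours_saved_per_day = wait_seconds_per_day * reduction / 3600
    return hours_saved_per_day * hourly_cost * working_days

# Hypothetical inputs: 7,200 s of user wait per day on one statement,
# tuning expected to halve it, user time valued at 30 per hour.
savings = annual_soft_savings(7200, 0.5, 30.0)
```

With these made-up inputs the statement costs one user-hour per day, so halving its wait time is worth 7,500 per year; the value of the exercise is that each candidate fix gets a comparable number.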
Step 4: Prioritize
If the last step is properly executed, this step is fairly straightforward
Allows the DBA to cut through the clutter of potential new projects, investigations, and trials.
Better justification for priorities (e.g., “We aren’t working on your problem since this other one has a higher demonstrable business impact”)
Step 5: Assign
Assign the right people to the problem
• Log_buffer waits
• Network issues
• Same query 10,000/hour
Enabled by Full View principle
Avoid finger-pointing by accurately assigning quickly
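The Assign step can be thought of as a routing rule from the dominant resource to an owner. The slide lists the problem types but not who owns them, so the team names, category keys, and threshold handling below are all assumptions.

```python
# Hypothetical routing table: resource category -> owning team.
ROUTING = {
    "log_buffer": "DBA team",
    "network": "Network team",
}
DEFAULT_OWNER = "DBA team (triage)"
EXEC_THRESHOLD_PER_HOUR = 10_000  # "same query 10,000/hour" from the slide

def assign(resource_category, executions_per_hour):
    """Pick an owner from the dominant resource and execution rate."""
    # A statement executed this often is treated as an application design issue.
    if executions_per_hour >= EXEC_THRESHOLD_PER_HOUR:
        return "Application team"
    return ROUTING.get(resource_category, DEFAULT_OWNER)

owner_a = assign("log_buffer", 50)
owner_b = assign("network", 50)
owner_c = assign("log_buffer", 10_000)
```

The rule only works when the wait data already names the resource, which is why the slide ties this step to the Full View principle.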
Silo Monitoring
Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view.

Software Layers     Team                 Tool
Web Server          Web Team             Sitescope
Custom Biz Logic    Custom App Team      Often No Commercial Tools
Network             Network Team         HP Openview
Database Server     Database/OS Teams    Wait-based tuning
Storage Box         Storage/OS Teams     EMC Control Center

IT Management: LIMITED VIEW
Business Management: LIMITED VIEW
The Solution - Integrated Vision
All teams see a complete picture of all layers and dependencies. Enables more efficient “Umbrella” solution.

Layer               Team
Web Server          Web Team
Custom Biz Logic    Custom App Team
Network             Network Team
Database Server     Database/OS Teams
Storage Box         Storage/OS Teams

RMM across the stack
IT Management
Business Management
RMM Achieved Business Benefits
RMM Does:                                                       Business Benefit:
35% reduction in database capacity requirement                  Reduce capital investment; avoid unnecessary additions; recover unused capacity
Standardizes “expert” analysis ability across entire DBA team   Reduce training & consulting costs
Quantifies performance impact                                   Focus tuning efforts on biggest business impacts
Identifies problem Root Cause and resolution                    Assign human resources and responsibility
Anticipates + resolves performance bottlenecks                  Maintain SLA and end user performance
Example 1: Problem Observed
Critical situation: Secure Service Center application performance unsatisfactory
• Response time between 2400 and 9000 seconds
• Very high network traffic (3x-4x normal), indicating time-outs and user refreshes
• “CritSit” declared: major effort to resolve problem
Observations using Resource Mapping Methods
Lib cache pin wait
Lib cache load lock
Notice scale: > 8000 secs
1: Identify accumulated Waits
2: Identify specific resources used
Results
Library cache pin nearly unobservable
Library cache load lock no longer observable
Notice scale: < 1400 secs max
Results
Response time improvement from 8000 seconds (worst case) to 900 seconds
Variance improvement:
• Before: response time 2400 - 8000 sec
• After: response time 800 - 900 sec
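The size of the improvement can be worked out directly from the figures on this slide. The short calculation below uses only those numbers; the variable names are illustrative.

```python
# Response-time figures from the slide, in seconds.
before_min, before_max = 2400, 8000
after_min, after_max = 800, 900

# Fractional reduction in worst-case response time.
worst_case_reduction = (before_max - after_max) / before_max

# Width of the response-time range before and after the fix.
spread_before = before_max - before_min
spread_after = after_max - after_min
```

The worst case drops by 88.75%, and the gap between best and worst response time narrows from 5600 seconds to 100 seconds, which is the "variance improvement" the slide refers to.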
Example 2: Performance Drain – Identify the Source
Slow response reported
DBA and database focus of delays
Database problem?
No: SQL*Net Message identified as source of delay
2nd highest wait event
RMM Drill Down identifies source of problem
Single application generates all SQL*Net Messages
App on same server as Oracle!
Answer: Misconfiguration (TCP/IP used within server)
Change to IPC, eliminate NIC traffic and 30% of wait time
Solution requires knowing: Which SQL, What Wait Time, Which Resource
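The drill-down described here can be sketched as a filter on one wait event followed by a breakdown by program. The sample data is hypothetical and merely arranged so that the event accounts for 30% of total wait time; the program names and the `drill_down` helper are assumptions.

```python
from collections import defaultdict

def drill_down(samples, event):
    """Return wait seconds per program for `event`, and its share of all waits."""
    total = sum(seconds for _, _, seconds in samples)
    by_program = defaultdict(float)
    for program, ev, seconds in samples:
        if ev == event:
            by_program[program] += seconds
    share = sum(by_program.values()) / total if total else 0.0
    return dict(by_program), share

# Hypothetical samples: (program, wait event, seconds waited).
samples = [
    ("billing_app", "SQL*Net message", 300.0),
    ("billing_app", "db file sequential read", 500.0),
    ("report_app", "db file sequential read", 200.0),
]
programs, share = drill_down(samples, "SQL*Net message")
```

When a single program accounts for all of one wait event, as in this made-up data, the investigation narrows from "the database is slow" to one application's connection setup.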
Example 3: Scattered Reads
Situation: LINS06 database - Hourly profile identifies high wait anomaly
3-10x higher than other periods – requires investigation
Wait time: 42,000 seconds, 10:00-11:00
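An hourly profile like this one can be screened automatically for anomalies. The sketch below flags any hour whose wait time is at least three times the median of the other hours; the threshold rule and all hourly values except the 42,000-second figure for 10:00-11:00 are assumptions.

```python
from statistics import median

def anomalous_hours(hourly_wait, factor=3.0):
    """Flag hours whose wait time is >= factor x the median of the other hours."""
    flagged = []
    for hour, seconds in hourly_wait.items():
        others = [v for h, v in hourly_wait.items() if h != hour]
        if others and seconds >= factor * median(others):
            flagged.append(hour)
    return flagged

# Only the 10:00-11:00 value comes from the slide; the rest are made up.
hourly_wait = {
    "08:00-09:00": 6_000,
    "09:00-10:00": 9_000,
    "10:00-11:00": 42_000,
    "11:00-12:00": 8_000,
    "12:00-13:00": 5_000,
}
flagged = anomalous_hours(hourly_wait)
```

Comparing each hour against the median of the others keeps a single extreme hour from masking itself, which a plain average across all hours would tend to do.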
Drill Down to Key RMM Parameters
Db file scattered reads
Notice scale: > 6000 secs
Conclusion
Look for what has an impact
Resource Mapping is more than Wait Time. Analysis must include:
• SQL level granularity
• Full Resource granularity
Isolating the SQL and Resource allows you to find and fix the Root Cause
DBAs can have an impact and be heroes!
Thank you for coming
Matt Larson
Contact Information
• [email protected]
• 303-938-8282 ext. 110
• Company website: www.confio.com