Resource Mapping A Wait Time Based Methodology for Database Performance Analysis Prepared for...

40
Resource Mapping A Wait Time Based Methodology for Database Performance Analysis Prepared for NoCOUG, Fall Conference, 2004 Presented by Matt Larson Chief Technology Officer Confio Software

Transcript of Resource Mapping A Wait Time Based Methodology for Database Performance Analysis Prepared for...

Resource MappingA Wait Time Based Methodology for

Database Performance Analysis

Prepared for NoCOUG, Fall Conference, 2004

Presented by Matt Larson

Chief Technology Officer Confio Software

2

Presentation Agenda

Introduction Conventional Tuning vs. Wait-based

Tuning Foundation: Resource Mapping

Methodology 5 Key Steps of Applying RMM RMM Advantages Conclusion

3

Who am I?

Former DBA consultant specializing in Oracle performance tuning

Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2nd Edition, Oracle8 Server Unleashed)

Co-author of two other database related books

CTO and founder of Oracle performance software company

4

Problems with Conventional Tuning Tools: Like the Drunk Under the Streetlight

5

Conventional Tuning

Art, not a science Ratio-based (cache hit ratios, etc.) Sometimes fruitless It’s “tuned” (I guess?) Different tuning/investigation process for

each DBA/DBA Team/Company

6

Problems with Conventional Tuning Tools

Optimize systems, not business results Conventional tools:

• V$ Views: limited visibility & granularity• Statspack: averages across entire database• Tracing: Limited to session, excessive

volume

Incorrect Data hides real results• System-wide averages• Event counters• Incomplete visibility

7

What Problems are you Trying to Solve?

Methodology addresses a common problem space: • I spend the whole week monitoring and optimizing

Oracle configurations, but I have no demonstrable results to show for it - why?

• Will more hardware make my application run faster? By how much?

• Will the new application run efficiently on the production server?

• Why does one application keep impacting my SLA compliance?

• If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?

8

You know you are working on the wrong thing when…

After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with:

• A. An all expense paid vacation• B. A free lunch• C. A stale donut• D. Reward? Nobody even noticed!

9

You know you have a visibility problem when…

You measure database performance based on:

• A. Increasing trends in user response time• B. Increasing system down time• C. Increasing help desk calls• D. Increasing decibel levels from irate users

10

Your role is sub-essential to the business of your organization when… Your role in the rollout of a new

customer facing application results in:

• A. Keys to drive the CEO’s Porsche• B. Keys to use the executive restroom• C. A mop to use in the executive restroom• D. Your office has been moved to the

restroom

11

You know you are accustomed to measuring the wrong thing when…

You measure the commute time to work based on:

• A. The time it takes to get there• B. Counting the times your wheels rotate• C. Monitoring your tachometer• D. The number of speeding tickets

12

Wait-based Performance Tuning

Emerging best-practice for database tuning

Proponents include leading consultants, trainers and authors

Oracle is starting to build wait-based tuning tools into the database particularly in 10g

Tune by determining where processing time is spent

13

Oracle 10g - Moving towards wait-based

Adding wait-based columns to existing views New wait-based views

Example: v$session_wait_history

• Provides the last 10 wait events for a session• Session ID, Username, Event, Wait_Time, etc.• Used to provide wait_time for only a few events

14

RMM

Resource Mapping Methodology

A set of requirements that define what data must be captured to effectively make tuning decisions and a process for applying the data to achieve the optimal outcome

Wait Based Tuning

15

DBA Success Stories using RMM

DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; Solution identified in 10 minutes during hands-on training

DBA ends “Crit Sit” 2 week situation ends quickly after identification of Library Cache pin wait and load locks. Metalink identifies Oracle bug, patch successfully applied

DBA saves $700K. 90% CPU capacity initiates expansion from 12 to 24 CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint, no new server required.

16

Resource Mapping Methodology

Three Key Principles of RMM

1. SQL View: View all statistics at SQL statement level

2. Time View: Measure Time, not number of times a resource is utilized

3. Full View: Separately measure every resource to isolate source of problems

17

Illustrating example: SQL View Principle

Example: ‘CEO’ measuring ‘employee’ output Averaging over entire company gives no useful data Must measure each job separately DBA must manage database similarly Measure and identify bottlenecks for each SQL

independently

18

Illustrating example: Time View Principle

Example: ‘CEO’ counting ‘tasks’ vs. ‘time to complete’ Counting system statistics not meaningful Must measure Time to complete System stats (buffer size, hit ratios, I/O counts) do not

identify where database customers are waiting Identify and optimize Wait Time for each SQL as best

indicator of performance

19

Illustrating example: Full View Principle

Example: ‘CEO’ measuring results with blind spot hiding key processes

Without direct visibility, valuable info is lost Must have visibility to every process step Distinctly identify and measure each Oracle resource for

each distinct SQL

20

Track SQL Time, Not System Counters

SQL 1

SQL 2

SQL 3

Resources I/O Network RedoLocks

• Watching Counters leads to wrong conclusions: Time is more relevant

• Total System Counters hide information: Need breakdown to individual SQLs

5 R

25 R

50 Reads

Total System Counter

80K Reads

30 Minutes

15M

5M

6 M

10 M

100 Minutes

35 A

50 A

50 A

125 Attempts

4 M

200 Minutes

5M

4 M

200 Minutes

5M

5K Packets 216K Writes

21

Applying RMM for Business Results

Five Step Process focusing on what matters

1. Identify 2. Allocate 3. Quantify 4. Prioritize 5. Assign

22

Step 1: Identify

Identify SQL Statements having largest impact • (SQL View and Time View

principles) Longest wait times = most

significant “pain points” for customers

Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix them?)

SQL statements prioritized by Total Wait Time

23

Step 2: Allocate

Allocate impact to real customers (internal or external)

Allocate wait time to Program, Session, Machine• SQL View principle makes

this connection

Understanding database customer and application

Programs Prioritized by Total Wait Time

24

Step 3: Quantify

How much is save in time/money if fixed? Enabled by Full View and Time View principles Soft dollar savings

• Data entry clerks• DBA time spent in problem resolution

Hard dollar savings• Reduce hardware upgrades• Meet SLA’s avoiding penality• Ensure business isn’t lost due to poor performing

or unavailable system

Quantifiable benefit ofTuning a specific statement

25

Step 4: Prioritize

If last step properly executed, this step is fairly straight forward

Allow’s DBA to cut through the clutter of potential new projects, investigations, and trials.

Better justification for priorities. (e.g. We aren’t working on your problem since this other has a higher demonstrable business impact)

26

Step 5: Assign

Assign the right people to the problem• Log_buffer waits• Network issues• Same query 10,000/hour

Enabled by Full View principle Avoid finger-pointing by accurately

assigning quickly

27

Resource Mapping Methodology

RMM

Wait Based Tuning Network, Storage, Application, Web, etc.

28

Web Server

Custom Biz Logic

Network

Database Server

Storage Box

Software Layers

Silo Monitoring

Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view.

Web Team

Custom App Team

Network Team

Database/OS Teams

Storage/OS Teams

IT Management

Business Management

Sitescope

Often No Commercial Tools

HP Openview

Wait-based tuning

EMC Control Center

LIMITED VIEW

LIMITED VIEW

29

The Solution - Integrated Vision

Web Server

Custom Biz Logic

Network

Database Server

Storage Box

Web Team

Custom App Team

Network Team

Database/OS Teams

Storage/OS Teams

All teams see a complete picture of all layers and dependencies. Enables more efficient “Umbrella” solution.

RMM across the stack

IT Management

Business Management

30

RMM Achieved Business Benefits

RMM Does: Business Benefit:

35% reduction in database capacity requirement

Reduce capital investment Avoid unnecessary additionsRecovers un-used capacity

Standardizes “expert” analysis ability across entire DBA team

Reduce training & consulting costs

Quantifies performance impact

Focus tuning efforts on biggest business impacts

Identifies problem Root Cause and resolution

Assign human resources and responsibility

Anticipates + resolves performance bottlenecks

Maintain SLA and end user performance

31

Example 1: Problem Observed

Critical situation: Secure Service Center application performance unsatisfactory• Response time between 2400 and 9000

seconds• Very high network traffic (3x—4x normal),

indicating time-outs and user refreshes• “CritSit” declared: major effort to resolve

problem

32

Observations using Resource Mapping Methods

Lib cache pin wait

Lib cache load lock

Notice scale: > 8000 secs

1: Identify accumulated Waits 2: Identify specific resources used

33

Results

Library cache pin nearly unobservable

Library cache load lock no longer observable

Notice scale: < 1400 secs max

34

Results

Response time improvement from 8000 seconds (worst case) to 900 seconds

Variance improvement:• Before: response time 2400 - 8000 sec• After: response time 800 - 900 sec

35

Example 2: Performance Drain – Identify the Source

Slow response reported DBA and database focus

of delays Database problem?

No – SQL*Net Message identified as source of delay

2nd highest wait event

36

RMM Drill Down identifies source of problem

Single application generates all SQL*Net Messages

App on same server as Oracle!

Answer: Misconfiguration – TCP/IP

used within server Change to IPC, eliminate

NIC traffic and 30% of wait time

Solution requires knowing: Which SQL, What Wait Time, Which Resource

37

Example 3: Scattered Reads

Situation: LINS06 database - Hourly profile identifies high wait anomaly

3-10x higher than other periods – requires investigation

wait time42,000 seconds

10:00-11:00

38

Drill Down to Key RMM Parameters

Db file scattered

reads

Db file scattered reads

Notice scale: > 6000 secs

39

Conclusion

Look for what has an impact Resource Mapping is more that Wait

Time – Analysis must include: • SQL level granularity• Full Resource granularity

Isolating the SQL and Resource allows you to find and fix the Root Cause

DBAs can have an impact and be heroes!

40

Thank you for coming

Matt Larson

Contact Information• [email protected]• 303-938-8282 ext. 110• Company website

www.confio.com