DATABASE MYSTERIES: BATCH JOB DIAGNOSTICS Chris Lawson May 4, 2007 "It is the brain, the little gray cells on which one must rely.” -- Hercule Poirot

DATABASE MYSTERIES: BATCH JOB DIAGNOSTICS

Chris Lawson
May 4, 2007

“It is the brain, the little gray cells on which one must rely.”

-- Hercule Poirot

Our Agenda for Today

• A fun trivia quiz

• Overview: different types of diagnostics

• Preparing for Oracle Statspack

• Running Statspack reports

• Limitations of Statspack

• Some helpful Sql scripts

• Case study: Diagnosing “eco_out” batch job.

• Awarding of prize for correct trivia quiz answer

“Use the little gray cells, Hastings”

And now, the Trivia Quiz . . .

TRIVIA QUIZ

Oracle’s Statspack does not show …

• Snapshot start/end times

• High disk i/o Sql

• High # of execution Sql

• High # of parse Sql

• Init.ora parameters

• “ITL” locking events

• Top 5 wait events

• Database privilege problems

• Database memory statistics

Vote now--prize at end of presentation!

“I say, Poirot!”

Here are 2 Kinds of Diagnostics

Diagnostics that identify problems:

• Simple run-time summaries, provided by the application, show runtime patterns.

• Graphs show server load or disk i/o over time.

Diagnostics that help resolve problems:

• Oracle-specific utilities, such as Statspack, that identify problem areas.

• Custom scripts that identify long-running transactions & poorly tuned Sql.

Run-Time Summaries

• Most batch job/data warehouse jobs provide runtime stats.

• Some even show the runtimes for each step.

• BillPay lists batch job stats in table Wfjob_Ctl:

SQL> desc Wfjob_Ctl

Name                      Null?    Type
------------------------- -------- ---------------
BATCH_DATE                NOT NULL VARCHAR2(8)
JOB_NAME                  NOT NULL VARCHAR2(40)
JOB_RUN_NUMBER            NOT NULL NUMBER
JOB_STATUS                         VARCHAR2(2)
JOB_RETURN_CODE                    NUMBER
JOB_START_TIME                     DATE
JOB_END_TIME                       DATE
HOSTNAME                           VARCHAR2(20)

Sample Diagnostic: Find All Slow Batch Jobs

Show jobs in past ½ day running > 20 mins:

Select Job_name,
       To_char(job_start_time,'mon-dd Hh24:mi') STTME,
       To_char(job_end_time,'mon-dd Hh24:mi') ENDTME,
       Round((job_end_time-job_start_time)*24*60) MINS
From Wfjob_ctl
Where Job_start_time > Sysdate - .5
And (Job_end_time-job_start_time)*24*60 > 20
Order By 2

Find Slow Batch Jobs

List jobs in past ½ day running longer than 20 mins:

JOB_NAME                        STTME        ENDTME       MINS
------------------------------- ------------ ------------ -------
BP.westbp.gen_dailyrun.C        dec-18 04:34 dec-18 06:08      94
BP.westbp.sam_gen_tkconf.C      dec-18 06:00 dec-18 06:36      36
BP.westbp.prc_instalerts.C      dec-18 06:01 dec-18 06:33      32
BP.westbp.cfin_prc_svcrs2.C     dec-18 08:11 dec-18 08:31      20
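The MINS column relies on Oracle date arithmetic: subtracting two DATE values yields a difference in days, so multiplying by 24*60 converts it to minutes. A minimal Python sketch of the same arithmetic, using rows from the listing above. The year 2006 is an assumption (the report prints only month and day), and the function name is illustrative only.

```python
import math
from datetime import datetime

def runtime_minutes(start: datetime, end: datetime) -> int:
    """Mirror Round((job_end_time - job_start_time)*24*60)."""
    # Oracle DATE subtraction returns a (possibly fractional) number of days.
    days = (end - start).total_seconds() / 86400
    # Oracle ROUND rounds halves up for non-negative values; floor(x + 0.5) does the same.
    return math.floor(days * 24 * 60 + 0.5)

# gen_dailyrun row: dec-18 04:34 to dec-18 06:08 (year assumed)
print(runtime_minutes(datetime(2006, 12, 18, 4, 34), datetime(2006, 12, 18, 6, 8)))
```

Note that the WHERE clause compares the unrounded minutes with 20, which is why a job displayed as 20 minutes can still appear in the list.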

Sample Report: Find ecoout Runtimes

Runtimes for ecoout job for last 3 days:

select job_name,
       to_char(job_start_time, 'DAY') wekday,
       TO_CHAR(job_start_time,'MON-DD hh24:mi') STTME,
       TO_CHAR(job_end_time,'MON-DD hh24:mi') ENDTME,
       ROUND((job_end_time-job_start_time)*24*60) MINS
from wfjob_ctl
WHERE JOB_START_TIME > SYSDATE - 3
and job_name like '%ecoout_gen_pmt%'
order by job_start_time

JOB_NAME                 WEKDAY    STTME        ENDTME       MINS
------------------------ --------- ------------ ------------ ----
BP.westbp.ecoout_gen_pmt SATURDAY  DEC-16 08:01 DEC-16 08:48   48

Find alert Runtimes

Runtimes for all alert jobs for the last 2 days:

select job_name,
       TO_CHAR(job_start_time,'MON-DD hh24:mi') STTME,
       TO_CHAR(job_end_time,'MON-DD hh24:mi') ENDTME,
       ROUND((job_end_time-job_start_time)*24*60) MINS
from wfjob_ctl
WHERE JOB_START_TIME > SYSDATE - 2
and job_name like '%alrt%'
order by job_start_time

JOB_NAME                     STTME        ENDTME       MINS
---------------------------- ------------ ------------ -----
BP.westbp.alrt_prc_daily.C   DEC-16 10:43 DEC-16 11:39    56
BP.westbp.alrt_cln_pmt.C     DEC-16 11:39 DEC-16 11:44     5
BP.westbp.alrt_cln_daily.C   DEC-16 11:39 DEC-16 13:08    89
BP.westbp.alrt2_gen_pmtmd.C  DEC-18 03:31 DEC-18 03:46    16
BP.westbp.alrt2_gen_ebnr.C   DEC-18 03:31 DEC-18 03:38     7

Another Type of Diagnostic: Server Load Charts

• This is a good way to spot trends and can help in capacity planning.

• We’ve used these graphs to launch investigation after odd spikes.

• Server load graphs help isolate the time period having trouble.

• This method is typically a visual inspection.

Problem Identification: Server Load Graphs

Spikes Help Identify Problem

Bill-Pay Late January Problem

Root cause: CSA users began running up to 8 bad SQL statements simultaneously.

Introduction to Oracle Statspack

• Statspack is an Oracle utility that reports performance statistics for a given database. Its output is called a “Statspack Report.”

• You define the time period of interest by specifying the start-time and end-time.

• The stats come from taking regular “snapshots” of the database, typically once per hour (the interval can be changed).

Oracle Statspack: Snaps

• The snaps taken are listed in Oracle table Stats$Snapshot

• Here’s how to find the snaps taken in last day:

Select Snap_id, To_char(snap_time,'dd-mon-yy-hh24:mi') Snaptime
From Stats$snapshot
Where Snap_time > Sysdate - 1
Order By 1

 SNAP_ID SNAPTIME
-------- ---------------
    4237 21-dec-06-02:49
    4238 21-dec-06-03:49
    4239 21-dec-06-04:49
* * *
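Since a Statspack report covers the interval between two snapshots, a job's time window has to be mapped to a begin and end snap ID. A minimal sketch of one way to pick them, using the three snapshots listed above. The job window is hypothetical, the function name is illustrative, and the code assumes snap IDs increase with snapshot time.

```python
from datetime import datetime

# (snap_id, snap_time) pairs copied from the listing above.
SNAPS = [
    (4237, datetime(2006, 12, 21, 2, 49)),
    (4238, datetime(2006, 12, 21, 3, 49)),
    (4239, datetime(2006, 12, 21, 4, 49)),
]

def bracketing_snaps(snaps, job_start, job_end):
    """Return (begin_snap, end_snap) enclosing the job window, or None."""
    before = [sid for sid, t in snaps if t <= job_start]
    after = [sid for sid, t in snaps if t >= job_end]
    if not before or not after:
        return None  # window is not fully covered by the available snapshots
    # Latest snap at or before the start, earliest snap at or after the end
    # (assumes snap IDs increase with time).
    return max(before), min(after)

# Hypothetical job window: 03:00 to 04:30 on 21-dec-06.
print(bracketing_snaps(SNAPS, datetime(2006, 12, 21, 3, 0), datetime(2006, 12, 21, 4, 30)))
```

The wider the bracket, the more unrelated activity is mixed into the report, so a shorter snapshot interval gives a sharper picture of a single job.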

Starting a Statspack Report

SQL> @spreport

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 1001
Enter value for end_snap: 1002

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is sp_1001_1002. To use this name,
press <return> to continue, otherwise enter an alternative.
Enter value for report_name: TESTREPORT

Statspack Key Sections

• Top 5 resource-consuming events

• All events causing waits

• Top SQL ordered by logical i/o

• Top SQL ordered by physical i/o

• Top SQL ordered by # of executions

• Aggregate db statistics

• Most-read objects

• Most-locked objects

Statspack: Top 5 Events

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                    % Total
Event                             Waits    Time (s)  Ela Time
---------------------------- ---------- ----------- --------
db file sequential read       6,672,182      25,852    62.21
CPU time                                     11,887    28.61
db file scattered read        2,611,855       2,831     6.81
log file sync                    63,184         208      .50
ARCH wait on SENDREQ                261         199      .48

Callouts:

• db file sequential read: single-block reads

• db file scattered read: multi-block reads

• log file sync: write of transaction after Commit

Statspack: List of “Top Sql”

• If you know the time period, Statspack will usually zero-in on the problem.

• Shows “resource hogs” that consume lots of disk, logical reads, physical reads, or huge number of executions.

• Nearly all bad Sql will be detected, but a few can slip through undetected!

• E.g., what if a user is blocked for a long time? That consumes neither disk nor much logical i/o, so it’s missed.

• More exceptions later.

Statspack: Top Sql Example

Top resource consumer: logical reads

                                                     CPU      Elapsed
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    195,093,132      109,303        1,784.9   17.5  8750.55  97170.32 3351508081
select cur1.pae_id, cur1.mpa_id, cur1.mpa_nickname, nvl(cur1.mpasta_id, -1) mpa_status, cur3.pmt_start_date last_pay_date, cur3.pmt_amt pmt_amt_complete, cur2.pmt_start_date schedule_date,
* * *

Top resource consumer: # of executions

                                                CPU per    Elap per
 Executions  Rows Processed   Rows per Exec    Exec (s)   Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
     589,109         589,066              1.0        0.00       0.00 3012166360
Module: p_direct_check_inbound_pkg
INSERT INTO CBPAYMENT_INSTRUCTION_AUDIT (PMTINSTR_ID, PMT_ID, INSTR_ID, MEM_ID, PAE_ID, MPA_ID, INSTRSTA_ID, PMTINSTRBAT_ID, WF_
* * *

Another type of Report: sprepsql

• This report shows the actual execution plan for a particular Sql statement that ran previously.

• You first identify the Sql by its “hash value,” shown in the first Statspack report:

  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     22,988,572        1,530       15,025.2    4.1  1939.33   1957.49  178837896
UPDATE CBNOTICE_MEMBER SET NTCSTA_ID = 3, NTCMEM_MODIFIER_ID = :1, NTCMEM_MODIFIED_DATE = :2 WHERE NTC_ID = :3 AND MEM_ID = :4

Note this value: the Hash Value, 178837896.

sprepsql: Sample Output for an Update Statement

--------------------------------------------------------------------------------
| Operation              | PHV/Object Name       |  Rows | Bytes |   Cost |
--------------------------------------------------------------------------------
|UPDATE STATEMENT        |----- 1119347130 ----  |       |       | 529076 |
|UPDATE                  |                       |       |       |        |
| HASH JOIN SEMI         |                       |  925K |   24M | 529076 |
|  TABLE ACCESS FULL     |MEMBER_ACTIVATION      |  954K |   18M |    732 |
|  TABLE ACCESS FULL     |CBPAYMENT              |   92M |  705M | 499796 |
--------------------------------------------------------------------------------

                     Statement Total      Per Execute   Total
                     ---------------  ---------------  ------
        Buffer Gets:      20,474,697     20,474,697.0    3.17
         Disk Reads:      19,895,404     19,895,404.0   22.28
     Rows processed:           4,401          4,401.0
     CPU Time(s/ms):           3,808      3,807,950.0
 Elapsed Time(s/ms):           6,369      6,369,227.7

UPDATE MEMBER_ACTIVATION MA SET PMT_ACTIVATION_INDICATOR = :B2 ,PMT_ACTIVATION_DATE = SYSDATE , MODIFIED_BY = :B1 , MODIFIED_DATE = SYSDATE WHERE MA.PMT_ACTIVATION_INDICATOR IS NULL AND EXISTS (SELECT NULL FROM
* * *

Statspack Good Practices: Tips & Traps

• Old Snapshots are occasionally purged, so you can’t go back forever!

• Certain problems can fly “under the radar.”

• Example: Statspack lists resource-intensive Sql. But what if the problem is due to the cumulative effect of millions of similar (not identical) Sql? Statspack will miss this because each Sql is below the reporting threshold.
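One way to surface the cumulative cost of similar-but-not-identical Sql is to strip literals and total the resource usage per normalized text. This is not something Statspack does, nor a method from this talk; it is a rough sketch under stated assumptions. The sample rows, table names, and numbers are invented for illustration, and the regular expressions handle only simple string and numeric literals.

```python
import re
from collections import Counter

def normalize(sql: str) -> str:
    """Collapse literals so statements differing only in values compare equal."""
    text = re.sub(r"'[^']*'", "?", sql)            # quoted string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)   # standalone numeric literals
    return re.sub(r"\s+", " ", text).strip().lower()

def cumulative_gets(rows):
    """rows: iterable of (sql_text, buffer_gets). Returns totals per normalized text."""
    totals = Counter()
    for sql_text, buffer_gets in rows:
        totals[normalize(sql_text)] += buffer_gets
    return totals

# Hypothetical rows, shaped like (sql_text, buffer_gets) pairs fetched from V$SQL.
rows = [
    ("select * from orders where id = 101", 900),
    ("select * from orders where id = 102", 950),
    ("SELECT * FROM orders WHERE id = 103", 870),
    ("select name from members where code = 'A7'", 40),
]
for text, gets in cumulative_gets(rows).most_common():
    print(gets, text)
```

Each of the three "orders" statements is small on its own, but together they dominate once grouped, which is the pattern that stays below a per-statement reporting threshold.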

Statspack Good Practices: Tips & Traps

Here’s one that most of us missed:

Oracle’s cumulative statistics for a Sql statement are reset when stats are gathered on its table, so Statspack will rate that Sql as perfect!

Before Analyze of “Chris Waivers”

 SECS EXEC BUFFER_GETS DISK_READS SQL_TEXT
----- ---- ----------- ---------- ----------------------------------
    2    2       10694      10683 select count(*) from chris_waivers

After Analyze of “Chris Waivers”

 SECS EXEC BUFFER_GETS DISK_READS SQL_TEXT
----- ---- ----------- ---------- ----------------------------------
    0    0           0          0 select count(*) from chris_waivers

Ha! I knew that!

Statspack Good Practices: Summary

• Statspack is an excellent diagnostic aid in identifying possible bottlenecks—not solving them.

• Statspack usually uncovers the resource drivers for a specified period of time.

• Remember, however, it will not find everything--a few things can slip by.

• Like any tool, it can’t fix the design for you.

My Favorite Diagnostic Scripts

I use custom scripts far more often than Statspack. Here are some favorites:

1. Show all Sql running on database.

2. Show the object a user is waiting on.

3. List cumulative wait events for a user.

4. List resource-intensive Sql run recently.

5. Show average speed of disk i/o.

Script 1: Show all Active Sql

• This gives you an idea of some of the critical Sql for an application.

• If you see the same Sql over and over, it is either inefficient or run extremely often.

• On most OLTP systems, only a handful of sessions are usually active at one point—because everyone else has already got their answer.

Example: Show all Active Sql

Select DISTINCT Sid, username, substr(sql_text,1,200) stext
From V$Session, V$Sql
Where status = 'ACTIVE'
And username is not null                  -- This eliminates Oracle background processes
And v$session.sql_hash_value = hash_value
and v$session.sql_address = v$sql.address
and sql_text not like '%sql_text%'        -- This eliminates this query itself

SID   USERNAME      STEXT
----- ------------- -------------------------------------------------------
  128 BMAPP         SELECT PM_ACCT_NO FROM CBPAYMENT_METHOD WHERE MEM_ID = :1 FOR UPDATE NOWAIT

Script 2: Show Object User is Waiting on

• This is especially helpful in isolating bottlenecks for a particular Sql.

• This shows you where your Sql is stuck; you may be surprised!

• On FFIEC, this revealed that most queries were always reading one particular table (or its indexes).

Actual Example: Show Object User is Waiting on

Select DISTINCT username, object_name, sql_text
From V$Session, V$Sql, dba_objects o
Where v$session.status = 'ACTIVE'
And username is not null
and o.object_id = row_wait_obj#
And v$session.sql_hash_value = hash_value
and v$session.sql_address = v$sql.address
and username <> 'SYS'

USERNAME      OBJECT_NAME        SQL_TEXT
------------- ------------------ ----------------------------------------
ADMAPP        PMT_PK             WITH bills AS (SELECT /*+ Materialize */ bls.bls_external_billid bls_external_billid, blr.blr_bspbiller_id blr_bspbiller_id, bls.mem_id mem_id, bls.mpa_id mpa_id, bls.bls_id bls_id, TO_CHAR (bls.bls_first_seen, 'yyyyMMddHHmiss') bls_first_seen, bls.bls_notify_sent_to_bsp bls_notify_sent_to_bsp FROM cbbill_s

Callout on PMT_PK: Scanning the primary key index

Script 3: Session Cumulative Wait Events

• This summarizes the bottlenecks for the totality of a particular session.

• Especially useful for a long-running batch job.

• This shows how much time is due to each type of bottleneck.

• This helps you avoid solving problems that aren’t really the bottleneck.

Actual Example: Session Cumulative Wait Events

Select Sid, Event, Total_waits,
       time_waited/100 Timewaited,
       Average_wait/100 Avgwait,
       Round(100*total_waits/Time_waited) RATE
From V$session_event
Where Sid = 339 And Time_waited > 10000
And Event Not Like '%Net%'
Order By Timewaited

SID  EVENT                     TOTAL_WAITS TIMEWAITED    AVGWAIT       RATE
---- ------------------------- ----------- ---------- ---------- ----------
 339 db file scattered read        2912329      534.3          0       5451
 339 log file sync                  138296     795.71        .01        174
 339 enqueue                          2594    6937.74       2.67          0

Callout on enqueue: Blocked by another user

Note: V$session_event wait times are in centiseconds, hence the division by 100.
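To see how much of the session's waiting belongs to each bottleneck, the three rows above can be turned into percentages. A small sketch, assuming the raw time_waited values are the displayed TIMEWAITED seconds multiplied back by 100. The percentages cover only the events listed, since the query filters out short waits and Net events, and the function name is illustrative.

```python
# (event, total_waits, time_waited in centiseconds), reconstructed from the output above.
EVENTS = [
    ("db file scattered read", 2912329, 53430),
    ("log file sync", 138296, 79571),
    ("enqueue", 2594, 693774),
]

def summarize(events):
    """Return (event, seconds, waits_per_second, pct_of_listed_time) per event."""
    total_cs = sum(cs for _, _, cs in events)
    out = []
    for event, waits, cs in events:
        seconds = cs / 100                    # centiseconds -> seconds
        rate = round(100 * waits / cs)        # same formula as the RATE column
        pct = round(100 * cs / total_cs, 1)   # share of the listed wait time
        out.append((event, seconds, rate, pct))
    return out

for row in summarize(EVENTS):
    print(row)
```

On these numbers the enqueue waits account for roughly 84% of the listed wait time, so tuning the multi-block reads first would attack the wrong bottleneck.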

Script 4: Resource-Intensive Sql

• Pretty similar to the Statspack Sql report. This often highlights serious problems.

• Quickly shows you “resource hogs” for a database—big CPU or disk users.

• You set a threshold—such as all SQL consuming over 1 million disk reads, or SQL that has run over 10 hours.

• This script exemplifies why we don’t need to “guess” at the cause of performance problems.

Although finding root cause is usually easy, applying the fix may not be.

Actual Example: Resource-Intensive Sql

Select Round(elapsed_time/1000000/1) Secs,
       Rows_processed Rowct, Executions, Buffer_gets, Disk_reads, Sql_text
From V$sql
Where Elapsed_time > 9910000000 And Executions > 0
And Abs(buffer_gets)/(Rows_processed+.01) > 100
Order By Elapsed_time

SECS  ROWCT EXECUTIONS BUFFER_GETS DISK_READS
----- ----- ---------- ----------- ----------
82136     0   20814258    62545409    3814708

select vcrypttrac0_."USER_NODE_LOG_ID" as USER1_, vcrypttrac0_."REQUEST_ID" as REQUEST2_25_, vcrypttrac0_."CLIENT_DEVICE_ID" as CLIENT3_25_, * * *

Callout on ROWCT = 0: Surprise! Sql doesn’t do anything! (actual production case)
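The WHERE clause combines an elapsed-time threshold (V$sql reports elapsed_time in microseconds) with a "buffer gets per row" test. A hedged sketch of the same filter applied to the row above. The elapsed time is reconstructed from the rounded SECS column, so it is an approximation, and the function name is illustrative.

```python
def passes_filter(elapsed_us, executions, buffer_gets, rows_processed):
    """Mirror the WHERE clause of the resource-intensive Sql script."""
    # Adding .01 keeps the division defined when no rows were processed.
    gets_per_row = abs(buffer_gets) / (rows_processed + 0.01)
    return elapsed_us > 9_910_000_000 and executions > 0 and gets_per_row > 100

# The threshold of 9,910,000,000 microseconds expressed in hours (about 2.75).
print(9_910_000_000 / 1_000_000 / 3600)

# Row from the example: SECS=82136, ROWCT=0, 20,814,258 executions.
print(passes_filter(82136 * 1_000_000, 20814258, 62545409, 0))

# Average buffer gets per execution (about 3): each run is cheap, but there are millions.
print(62545409 / 20814258)
```

Because the statement processed zero rows, its gets-per-row ratio is enormous even though each execution is cheap, which is how the script catches a statement that looks harmless per execution.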

Script 5: Show Disk i/o Rate

• On many systems, disk i/o is a significant performance driver.

• You want to eliminate the disk i/o first, but it’s still helpful to know roughly how fast you can perform a typical read.

• Numbers typically range from 100-300 single-block reads per second.

• Note: Oracle uses the non-intuitive term “sequential read” for a single-block read.

• Multi-block reads are called “scattered reads.”

Example: Show Disk I/O Rate for 2 Different Systems

Select EVENT, TOTAL_WAITS, TIME_WAITED,
       Round(100*total_waits/Time_waited) Rate
From V$system_event
Where Time_waited > 1000
And Event Like '%db file sequential read%'

BILLPAY

EVENT                     TOTAL_WAITS TIME_WAITED       RATE
------------------------- ----------- ----------- ----------
db file sequential read    7485200484  2059793947        363

FFIEC (Why is disk so slow?)

EVENT                     TOTAL_WAITS TIME_WAITED       RATE
------------------------- ----------- ----------- ----------
db file sequential read     663982417  2481360621         27
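The RATE column is reads per second; inverting it gives the average time per single-block read, which is often easier to compare against what the storage should deliver. A brief sketch using the two rows above, with TIME_WAITED in centiseconds as reported by V$system_event. The function name is illustrative.

```python
def read_stats(total_waits, time_waited_cs):
    """Return (reads per second, average milliseconds per read)."""
    rate = round(100 * total_waits / time_waited_cs)       # same formula as RATE
    avg_ms = round(10 * time_waited_cs / total_waits, 1)   # 1 centisecond = 10 ms
    return rate, avg_ms

print("BILLPAY", read_stats(7485200484, 2059793947))
print("FFIEC", read_stats(663982417, 2481360621))
```

On these figures a BILLPAY single-block read averages under 3 ms, while an FFIEC read averages over 37 ms.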

Case Study: Diagnosing “eco” Batch

• Called “ecoout_gen_pmt,” it is a critical Billpay batch job that starts at about 03:30 daily.

• Tight SLA, especially for “big payment” days. Job typically runs several hours.

Business is concerned that SLA is at risk, especially for Tuesday runs.

Eco-Payment Batch Runtimes

Querying wfjob_ctl shows historical runtimes and confirms issue:

WEKDAY    STTME        ENDTME       MINS
--------- ------------ ------------ ------
TUESDAY   JUL-11 03:47 JUL-11 06:33    165
TUESDAY   JUL-18 03:48 JUL-18 09:48    360
TUESDAY   JUL-25 03:41 JUL-25 06:04    143
TUESDAY   AUG-01 03:51 AUG-01 07:55    244

SLA failures

Ecoout: Identify Root Cause

The “resource-intensive” script finds this Sql running during the time period:

UPDATE CBPAYMENT PMT_UPDATE SET (COLS) =(SELECT “stuff” FROM CBMEMBER M, CBMEMBER_PAYEE_ACCOUNT MPA , CBELEMENT E, ETC.

Script shows that the Sql ran for 3 hours, consuming 25 million logical reads to process 250k rows (about 100 logical reads per row).

This suggests a problem with the way Oracle executed this transaction.

Eco-out Root Cause

• Running “Explain plan” for the Sql shows Oracle combines 2 indexes: “Bitmap conversion from ROWID.”

• Execution plan shows these indexes used:

    Pmtinstr_instrsta_id_idx
    Pmtinstr_instr_idx

  ?? Non-Selective!

• So Oracle combines two indexes, one of which is a terrible choice.

• The optimizer should simply do the obvious: just use the one “good” index.

• BTW: This has happened more than once.

Eco-out Root Cause

• This optimizer tactic is a notorious weakness in Oracle 9i--some shops turn off this feature.

• In some Billpay batch jobs, we issue an “Alter Session” command to disallow this feature.

• Our fix: Sql hint to specify the desired index:

/*+index(I PMTINSTR_INSTRSTA_ID_IDX) */

So what happened after the Sql hint was applied?

Eco-Payment New Runtimes

[Chart “PERFORMANCE FIX: ECOOUT”: RUNTIME (axis 0 to 400) by TUESDAY RUN (1 to 12)]

CHALLENGE QUESTION

Oracle’s Statspack does not show …

• Snapshot start/end times

• High disk i/o Sql

• High # of execution Sql

• High # of parse Sql

• Init.ora parameters

• Database privilege problems

• Database memory statistics

• “ITL” locking events

• Top wait events

QUIZ ANSWER

PRIVILEGE PROBLEMS

“This is NOT part of Statspack.”

“Case Closed”: Good Diagnostic Aids

• Consider logs that show job runtime

• Statspack:

  – spreport shows resource usage
  – sprepsql shows execution plan
  – In Oracle 10g, use AWR

• Custom scripts that show resource usage such as high CPU or disk i/o

• Don’t forget server CPU load graphs.

Case Closed: Tips & Traps

• There isn’t just “one way” to always identify performance bottlenecks.

• Statspack cannot detect everything; some things fly “under the radar.”

• No one has a script to cover every possible problem.

• So, develop a “toolkit” of approaches—custom Sql scripts, graphs, tools, etc.

Questions?

“The world is full of obvious things which nobody by any chance ever observes.” Sherlock Holmes, The Hound of the Baskervilles.