Finding SQL execution outliers

Measuring SQL Execution Outliers (to track performance better) Maxym Kharchenko

description

This presentation is about tracking the performance of OLTP queries that typically take […] and how to capture their execution times in an Oracle database.

Transcript of Finding SQL execution outliers

Page 1: Finding SQL execution outliers

Measuring SQL Execution Outliers (to track performance better)

Maxym Kharchenko

Page 2: Finding SQL execution outliers

500 ms

Page 3: Finding SQL execution outliers

A very important SQL

Typical elapsed time: 100 ms

*Bad* elapsed time: > 200 ms

MERGE INTO orders_table USING dual
ON (dual.dummy IS NOT NULL AND id = :1 AND p_id = :2 AND order_id = :3 AND relevance = :4 AND …

Page 4: Finding SQL execution outliers

SQL Latency

Page 5: Finding SQL execution outliers

SQL latency metrics

         Elapsed                 Elapsed Time
        Time (s)     Executions  per Exec (s) %Total   %CPU    %IO SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           635.5         10,090           0.1   31.5   16.5   77.6 fskp2vz7qrza2
Module: MYmodule
merge into orders_table using dual on (dual.dummy is not null and id = :1
and p_id = :2 and order_id = :3 and relevance = :4 and …

Page 6: Finding SQL execution outliers

What exactly is “average” ?

Page 7: Finding SQL execution outliers

What exactly is “average” ?

Average

Page 8: Finding SQL execution outliers

Most typical value

95 % of all executions

“average” = “most typical”

Page 9: Finding SQL execution outliers

Probability: >= 200ms: 0.6 %

You can make predictions with “average”

Average: 100 ms
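The prediction on this slide can be reproduced with a few lines of Python, as a rough sketch. The 100 ms mean is from the slide. The standard deviation is not given in the presentation, so the 40 ms used here is an assumption chosen only because it yields a tail probability close to the quoted 0.6 %.

```python
from statistics import NormalDist

# The mean comes from the slide; SIGMA_MS is NOT stated in the presentation
# and is a hypothetical value used only for illustration.
MEAN_MS = 100.0
SIGMA_MS = 40.0


def tail_probability(threshold_ms, mean_ms=MEAN_MS, sigma_ms=SIGMA_MS):
    """P(elapsed >= threshold) if elapsed times were normally distributed."""
    return 1.0 - NormalDist(mu=mean_ms, sigma=sigma_ms).cdf(threshold_ms)


p_bad = tail_probability(200.0)
print(f"P(elapsed >= 200 ms) = {p_bad:.1%}")
```

The calculation is only as good as the normality assumption behind it, which is the point the following slides examine.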

Page 10: Finding SQL execution outliers

Average is a pretty decent metric

Page 11: Finding SQL execution outliers

As long as distribution is normal

Page 12: Finding SQL execution outliers

Measured Execution Times

Page 13: Finding SQL execution outliers

Measured Execution Times

Page 14: Finding SQL execution outliers

Measured Execution Times

Page 15: Finding SQL execution outliers

Measured Execution Times

Page 16: Finding SQL execution outliers

Measured Execution Times

Page 17: Finding SQL execution outliers

What if the real distribution is not normal ?

Page 18: Finding SQL execution outliers

People feel *BAD* variance, not the average

Page 19: Finding SQL execution outliers

Percentiles

“average”

Page 20: Finding SQL execution outliers

Percentiles

“average”

99th percentile

Page 21: Finding SQL execution outliers

Average: (what we think) typical latency is: 102 ms

p99: The worst 1% of executions is at least as bad as: 532 ms
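To make the contrast between the two metrics concrete, here is a minimal sketch that computes both from a list of individual execution times. The data is made up for illustration (98 fast runs plus two slow ones) and is not taken from the presentation. The `percentile` helper uses the nearest-rank definition, which is one of several common conventions.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up elapsed times in ms: 98 "typical" runs and two slow outliers.
elapsed_ms = [90 + (i % 10) for i in range(98)] + [450, 520]

avg = mean(elapsed_ms)
p99 = percentile(elapsed_ms, 99)
print(f"average: {avg:.0f} ms, p99: {p99} ms")
```

The average barely moves when two outliers are added, while the p99 lands squarely on them.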

Page 22: Finding SQL execution outliers

SQL latency (but now with: p99)

Page 23: Finding SQL execution outliers

Ok, so how do we measure percentiles ?

Page 24: Finding SQL execution outliers

You need to capture individual query times

Page 25: Finding SQL execution outliers

Application side tracing

[Diagram: Db and App]

start_exec = time()

Elapsed = time() - start_exec

Exec: 4fucahsywt13m:19731969

o “True” user experience

o Precise (captures “everything”)

o (Lots of) DIY by developers

o Captures *not only* db time
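The `start_exec` / `Elapsed` pseudocode on this slide could look roughly like the sketch below in Python. The `timed_exec` wrapper, the `timings_ms` store and the `fake_query` stand-in are all hypothetical names introduced here; a real application would pass its own database call instead.

```python
import time
from collections import defaultdict

# sql_id -> list of elapsed times in milliseconds, one entry per execution
timings_ms = defaultdict(list)


def timed_exec(sql_id, run_query, *args, **kwargs):
    """Run run_query(*args, **kwargs) and record its wall-clock time under sql_id."""
    start_exec = time.perf_counter()
    try:
        return run_query(*args, **kwargs)
    finally:
        # Recorded even if the query raises, so failed runs are not lost.
        elapsed = time.perf_counter() - start_exec
        timings_ms[sql_id].append(elapsed * 1000.0)


def fake_query(order_id):
    """Stand-in for a real database call (hypothetical)."""
    time.sleep(0.01)
    return order_id


result = timed_exec("4fucahsywt13m", fake_query, 42)
print(timings_ms["4fucahsywt13m"])
```

Because the timer runs in the application, the recorded value includes network round trips and driver overhead, which is the "not only db time" caveat above.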

Page 26: Finding SQL execution outliers

Server side (10046) tracing

[Diagram: Db and App]

start_exec = time()

Elapsed = time() - start_exec

Exec: 4fucahsywt13m:19731969

o Precise (captures “everything”)

o Detailed: breakdown by events and SQL “stages”

o Cumbersome to process (lots of individual trace files and “events”)
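As an illustration of the processing burden, the sketch below pulls the elapsed time (`e=`, in microseconds) out of the database call lines of a raw trace. The sample lines are invented for this example and only modeled on the usual `PARSE`/`EXEC`/`FETCH` line layout; real trace files contain more fields and line types, and the exact layout varies by Oracle version.

```python
import re

# Matches database call lines such as: EXEC #140025:c=1000,e=17082,p=2,...
CALL_LINE = re.compile(r"^(PARSE|EXEC|FETCH) #(\d+):(.*)$")


def elapsed_by_call(trace_lines):
    """Yield (call_type, cursor_number, elapsed_microseconds) for each call line."""
    for line in trace_lines:
        match = CALL_LINE.match(line.strip())
        if not match:
            continue  # WAIT lines, headers, SQL text and so on are skipped here
        call_type, cursor, fields = match.groups()
        stats = dict(f.split("=", 1) for f in fields.split(",") if "=" in f)
        if "e" in stats:
            yield call_type, int(cursor), int(stats["e"])


# Invented sample lines, for illustration only.
sample = [
    "PARSE #140025:c=0,e=120,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=245875337,tim=1000",
    "WAIT #140025: nam='db file sequential read' ela= 16082 file#=5 block#=77 blocks=1 obj#=1234 tim=1500",
    "EXEC #140025:c=1000,e=17082,p=2,cr=5,cu=3,mis=0,r=1,dep=0,og=1,plh=245875337,tim=2000",
]
calls = list(elapsed_by_call(sample))
print(calls)
```

A complete tool would also have to group the calls belonging to one execution and merge many trace files, which is where most of the effort goes.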

Page 27: Finding SQL execution outliers

Sampling

• v$sql.elapsed_time

            Executions  Elapsed Time    CPU Time      IO Time  App Time
Sample 1         58825   298,986,074  20,326,883  279,055,026     5,635
Sample 2         58826   299,003,156  20,327,883  279,071,108     5,635
Difference           1        17,082       1,000       16,082         0
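The figures above are cumulative v$sql counters from two successive samples and their difference. A minimal sketch of that subtraction follows; the `Sample` type and the function names are introduced here for illustration, and the times are in microseconds, the unit v$sql uses for these columns. A delta is treated as one individual run only when exactly one execution completed between the two samples.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    executions: int
    elapsed_us: int
    cpu_us: int
    io_us: int
    app_us: int


def delta(prev, curr):
    """Field-by-field difference between two cumulative samples."""
    return Sample(*(c - p for p, c in zip(prev, curr)))


def single_runs(samples):
    """Yield deltas between consecutive samples that cover exactly one execution."""
    for prev, curr in zip(samples, samples[1:]):
        d = delta(prev, curr)
        if d.executions == 1:
            yield d


# The two samples shown on the slide.
samples = [
    Sample(58825, 298_986_074, 20_326_883, 279_055_026, 5_635),
    Sample(58826, 299_003_156, 20_327_883, 279_071_108, 5_635),
]
runs = list(single_runs(samples))
print(runs)
```

Intervals in which two or more executions finished cannot be split into individual times this way, so they are dropped here; a different policy, such as averaging them, is possible.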

Page 28: Finding SQL execution outliers

Sampling

with number_generator as (
  select level as l from dual connect by level <= 1000
), target_sqls as (
  select /*+ ordered no_merge use_nl(s) */
  …
  from number_generator i, gv$sql s

Page 29: Finding SQL execution outliers

Sampling

SQL> @sqlc fdcz4kx11era5

                                     Gets    Ela (ms) LAST
  C#   Plan hash   EXECUTIONS       pExec       pExec Active
---- ----------- ------------ ----------- ----------- ------------
   2   245875337    1,700,541      444.62      137.57 +0 00:00:01
   7   245875337            2       23.50       21.39 +0 01:15:16
   3   245875337            1       26.00       10.38 +27 04:42:52

Page 30: Finding SQL execution outliers

Sampling

SQL> @ssql fdcz4kx11era5 2 1000

           Elapsed      CPU           IO      App      CCS
   Ex         TIME     TIME         TIME     TIME     TIME   Pct
- --- ------------ -------- ------------ -------- -------- -----
    1          330        0            0        0        0     0
    1          340    1,000            0        0        0  3.33
    1          786      999            0        0        0  6.67
    1        1,518    2,000          188        0        0    10
*   2       11,963    1,999       11,103        0        0 13.33
    1       14,851    4,999       10,908        0        0 16.67
    1       15,724    2,000       14,780        0        0    20
    1       16,471    2,000       15,163        0        0 23.33
…
    1       90,256    5,999       87,365        0        0 86.67
    1       97,171    2,000       93,585        0       27    90
    1      120,635    1,999      117,660        0        0 93.33
    1      142,201    6,999      138,853        0        0 96.67
    1      167,552    4,998      165,333        0        0   100

Page 31: Finding SQL execution outliers

Sampling

SQL> @ssql2 fdcz4kx11era5 2 50000 avg 10

             Elapsed                                CPU          IO
Pct    Execs TIME                                  TIME        TIME
--- -------- ------------------------------ ----------- -----------
p0       148 .23-7.11                               .89        2.30
p10      148 7.18-14.03                            1.11        9.44
p20      146 14.03-20.26                           1.48       15.82
p30      143 20.39-29.01                           1.86       22.92
p40      146 29.1-40.73                            1.91       32.63
p50      143 40.77-55.21                           2.37       45.50
p60      142 55.22-77.92                           3.15       63.09
p70      145 77.99-113.33                          3.58       90.72
p80      141 113.41-173.64                         4.46      136.22
p90      138 174.34-634.15                         6.83      245.30

Page 32: Finding SQL execution outliers

Sampling

SQL> @ssql3 fdcz4kx11era5 2 50000 avg 10

                                                    Elapsed         CPU          IO
Bucket Range (ms)              Execs Graph             TIME        TIME        TIME
------ -------------------- -------- ---------- ----------- ----------- -----------
     1 .19-51.81                 686 ##########       22.39        1.51       20.91
     2 51.81-103.44              303 ####             76.37        2.89       73.75
     3 103.44-155.07             198 ##              127.59        3.55      124.23
     4 155.07-206.69              91 #               174.25        4.68      169.82
     5 206.69-258.32              46                 224.91        5.47      220.11
     6 258.32-309.95              22                 267.26        6.90      261.46
     7 309.95-361.57               7                 339.04        9.00      331.30
     8 361.57-413.2                8                 264.19        6.90      258.24
     9 413.2-464.83                3                 318.62        6.00      311.41
    10 464.83-516.45               2                 492.26       10.00      483.53
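A histogram like the one above can be built from captured execution times in a few lines. This is a sketch of equal-width bucketing in Python, not the author's script (which is written in SQL); the function names are introduced here and the input values are made up for illustration.

```python
def histogram(values, buckets=10):
    """Count values in equal-width buckets spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1.0  # avoid a zero width when all values are equal
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)  # the maximum goes in the last bucket
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, n) for i, n in enumerate(counts)]


def render(rows, max_bar=10):
    """Format rows as text lines with a '#' bar scaled to the fullest bucket."""
    peak = max(n for _, _, n in rows) or 1
    lines = []
    for i, (start, end, n) in enumerate(rows, 1):
        bar = "#" * int(n / peak * max_bar)
        lines.append(f"{i:>6} {start:8.2f}-{end:<8.2f} {n:>6} {bar}")
    return lines


# Made-up elapsed times in ms, for illustration only.
elapsed_ms = [5, 12, 20, 33, 48, 60, 75, 110, 180, 500]
rows = histogram(elapsed_ms, buckets=5)
print("\n".join(render(rows)))
```

Equal-width buckets make the long tail visible at a glance, at the cost of lumping most executions into the first bucket.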

Page 33: Finding SQL execution outliers

The scripts are here

http://intermediatesql.com

Page 34: Finding SQL execution outliers

Sampling

with i_gen as (
  select level as l from dual connect by level <= &REPS
), target_sqls as (
  select /*+ ordered no_merge use_nl(s) */
  …
  from i_gen i, gv$sql s

o SQL access to data

o Simplified time breakdown

o Can capture “hours”

o Slightly imprecise (captures 90-95 % of runs)

o x$ data: “suspect” ?

Page 35: Finding SQL execution outliers

Monitoring

SQL> desc v$session
  sql_id
  sql_exec_start
  sql_exec_id

v$sql_monitor

/*+ MONITOR */

Page 36: Finding SQL execution outliers

Monitoring

NAME                           VALUE   DESCRIPTION
------------------------------ ------- ------------------------------------------------------------
_sqlmon_binds_xml_format       default format of column binds_xml in [G]V$SQL_MONITOR
_sqlmon_max_plan               480     Maximum number of plans entry that can be monitored. Defaults to 20 per CPU
_sqlmon_max_planlines          300     Number of plan lines beyond which a plan cannot be monitored
_sqlmon_recycle_time           60      Minimum time (in s) to wait before a plan entry can be recycled
_sqlmon_threshold              5       CPU/IO time threshold before a statement is monitored. 0 is disabled

o Precise (captures “everything”)

o SQL access to data

o Capture size is limited (think: “seconds”)

Page 37: Finding SQL execution outliers

Can I find worst performers in ASH ?

[Diagram: executions numbered 1 to 11; sampled sets of executions: 1, 2, 3, 7 | 3, 5, 7, 9 | 7]
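One way to read this slide is sketched below. It assumes that the three groups of numbers (1, 2, 3, 7; then 3, 5, 7, 9; then 7) are the executions caught by three successive ASH samples; that reading of the diagram is an assumption, as are the names used in the code. Counting how many samples each execution appears in gives a rough ranking of the longest-running ones.

```python
from collections import Counter

# One set of execution numbers per sample, read off the slide (assumed meaning).
ash_samples = [{1, 2, 3, 7}, {3, 5, 7, 9}, {7}]
all_executions = set(range(1, 12))  # executions 1..11 appear in the diagram


def samples_per_execution(samples):
    """Count in how many samples each execution was seen."""
    return Counter(exec_id for sample in samples for exec_id in sample)


counts = samples_per_execution(ash_samples)
worst = counts.most_common(1)[0]        # (execution, number of samples)
never_seen = all_executions - set(counts)

print("most sampled:", worst)
print("never sampled:", sorted(never_seen))
```

Under this reading, long executions stand out because they span several samples, while executions shorter than the sampling interval can be missed entirely.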

Page 38: Finding SQL execution outliers

Can I find worst performers in ASH ?

Page 39: Finding SQL execution outliers

Takeaways

• Percentiles are better performance metrics than averages

• Percentile calculation: requires capturing (most of) individual SQL runs

• A number of ways exist to capture and measure individual SQL runs

Page 40: Finding SQL execution outliers

Thank you!