ADVT SQL Plan Explained

  • 8/13/2019 ADVT SQL Plan Explained

    The query ran for a long time because of high logical I/O (LIO).

    1. Such small row estimates are usually the result of missing stats.

    2. In this specific case, the query started before partition stats were ready.

    3. When there are no partition stats, global stats are used.

    A NESTED LOOPS join with a bad cardinality estimate on its first row source is a major reason for high LIO and CPU usage.

    1. Here is the plan currently running, which uses a SQL profile to force a hash join.

    2. Note the high cost of HASH GROUP BY at the bottom.

    Query Structure

    SELECT /*+ parallel(d,4) full(d) */
    FROM
      SOURCE_BY_SRCH_DLY_REV_MASK s,
      ( Complex View based on source_search_type_daily t1 ) t,
      ( Complex View based on DM_SUMMARY_DAILY d ) d
    where
      d.datestamp >= s.start_date
      and d.datestamp

    Inline View D

    select mrkt_id, datestamp, SOURCE, query_source, search_type, domain, pageview_type, country_of_origin,
           sum(pageviews) pageviews, sum(bidded_searches) bidded_searches, sum(bidded_results) bidded_results,
           sum(bidded_clicks) bidded_clicks, sum(revenue) revenue
    from DM_SUMMARY_DAILY d
    where d.datestamp = to_date('20120715', 'yyyymmdd')
      and d.source like 'geosign%derp'
      and d.mrkt_id = 0
    group by mrkt_id, datestamp, SOURCE, query_source, search_type, domain, pageview_type, country_of_origin

    1. Accesses a single partition of DM_SUMMARY_DAILY.

    2. MRKT_ID = 0.

    3. SOURCE uses a LIKE expression.

    How to Calculate Cardinality

    1. Cardinality = (num_rows) * (selectivity of column 1) * (selectivity of column 2) * ... * (selectivity of column n)

    2. Column selectivity:

       1. Without histograms, or with a bind value: 1/(number of distinct values (NDV))

       2. With frequency histograms: (number of buckets for the specified value) / (total number of buckets)

       3. With height-balanced histograms: if the value occupies more than 1 bucket, see 2. Otherwise, use the density from the column stats, but we can always use 1/NDV as a reference.

       4. For an inequality predicate with a bind variable or function: 0.05
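    The rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of Oracle internals; the function names and the sample table (1,000,000 rows, 100 distinct values, 30 of 200 buckets) are hypothetical.

```python
from math import prod

INEQUALITY_WITH_BIND = 0.05  # fixed guess for an inequality on a bind variable or function


def sel_without_histogram(ndv):
    """Selectivity of an equality predicate when only the NDV is known."""
    return 1.0 / ndv


def sel_with_histogram(buckets_for_value, total_buckets):
    """Selectivity from a histogram: share of buckets owned by the value."""
    return buckets_for_value / total_buckets


def estimate_cardinality(num_rows, selectivities):
    """num_rows multiplied by the selectivity of every filtered column."""
    return num_rows * prod(selectivities)


# Hypothetical table: column a has 100 distinct values, and the requested
# value of column b owns 30 of 200 histogram buckets.
estimate = estimate_cardinality(
    1_000_000,
    [sel_without_histogram(100), sel_with_histogram(30, 200)],
)
print(round(estimate))
```

    The estimate multiplies selectivities as if the columns were independent, which is why skewed or correlated columns can push it far from the real row count.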

    Bad Plan

    DM_SUMMARY_DAILY

    Partition stats not ready.

    Global stats: rows: 23,451,579,811; NDV: datestamp: 2145, mrkt_id: 25, source: 65524.

    Estimate: 23,451,579,811*(1/2145)*(1/25)*(1/65524) = 6.6742, rounded up to 7.

    Actual partition stats: rows: 5,127,832; NDV: datestamp: 1, mrkt_id: 23, source: 601.

    Estimate if partition stats had been ready: 5,127,832*(1/1)*(1/23)*(1/601) = 371.

    If using histograms for mrkt_id=0 (3946 out of 5551 buckets): 5,127,832*(3946/5551)*(1/601) = 6065.

    SRC_BY_SRCH_DREV_MASK_ED

    No stats. Defaults to (block_size - cache layer)*blocks/100. block_size is 16K, blocks is 5. 16*1024*5/100 = 819.2. Not sure about the value of the cache layer.

    SOURCE_SEARCH_TYPE_DAILY, per (datestamp, mrkt_id, source)

    No partition or global stats were captured.

    We attributed the bad plan to the missing stats, so I will skip further research on this plan.
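    The three DM_SUMMARY_DAILY estimates and the no-stats default can be reproduced with a short Python sketch. The helper names are hypothetical. The 24-byte cache layer in the last line is an assumption, not a value taken from the trace: it is simply the value that turns 819.2 into the 818 rows used on the later slides.

```python
def cardinality(num_rows, *selectivities):
    """num_rows multiplied by each column selectivity in turn."""
    estimate = float(num_rows)
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate


def default_rows(block_size, blocks, cache_layer=0, avg_row_len=100):
    """Default row count for a table without stats, per the slide's formula."""
    return (block_size - cache_layer) * blocks / avg_row_len


# Global stats, used because partition stats were not ready.
global_estimate = cardinality(23_451_579_811, 1 / 2145, 1 / 25, 1 / 65524)
# Partition stats without histograms.
partition_estimate = cardinality(5_127_832, 1 / 1, 1 / 23, 1 / 601)
# Partition stats with the mrkt_id histogram (3946 of 5551 buckets).
histogram_estimate = cardinality(5_127_832, 1 / 1, 3946 / 5551, 1 / 601)

print(round(global_estimate, 2), round(partition_estimate), round(histogram_estimate))
print(default_rows(16 * 1024, 5))                  # cache layer ignored
print(default_rows(16 * 1024, 5, cache_layer=24))  # ASSUMED cache layer of 24 bytes
```

    The spread between roughly 7 rows and roughly 6,000 rows for the same predicate is what makes a nested loops join look cheap to the optimizer when it is not.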

    Good Plan With SQL profile

    DM_SUMMARY_DAILY

    Actual partition stats: rows: 5,193,086; NDV: datestamp: 1, mrkt_id: 21, source: 609 (a huge difference from the global stats).

    Estimate using partition stats: 5,193,086*(1/1)*(1/21)*(1/609) = 406.

    If using histograms for mrkt_id=0 (3924 out of 5615 buckets) and for source like geosign%drep (6 out of 254 buckets): 5,193,086*(3924/5615)*(6/254) = 85,727.

    SRC_BY_SRCH_DREV_MASK_ED

    Still uses the default of 818 rows. The actual value is 61.

    SOURCE_SEARCH_TYPE_DAILY

    Partition stats: rows: 3,312,381; NDV: datestamp: 1, mrkt_id: 26 (histogram for value 0: 3015 out of 5590 buckets), source: 11890.

    When using a hash join, with datestamp and mrkt_id=0: 3,312,381*(3015/5590) = 1,786,542 (1786K in the plan).

    When using join predicate push down on column source, the estimate for each (datestamp, mrkt_id, source) is 3,312,381*(3015/5590)*(1/11890) = 150. Here column source is treated as a bind value.
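    As a cross-check, the estimates on this slide can be recomputed directly. The variable names below are hypothetical. Plain floating-point arithmetic lands within a few rows of the slide's figures; the hash join input in particular comes out about ten rows above 1,786,542, which still agrees with the 1786K shown in the plan.

```python
# DM_SUMMARY_DAILY, partition-level stats.
rows_d = 5_193_086
d_without_histograms = rows_d * (1 / 1) * (1 / 21) * (1 / 609)
d_with_histograms = rows_d * (3924 / 5615) * (6 / 254)

# SOURCE_SEARCH_TYPE_DAILY, partition-level stats.
rows_t = 3_312_381
t_hash_join_input = rows_t * (3015 / 5590)
# Join predicate push down: source is treated like a bind value (1/NDV).
t_rows_per_probe = rows_t * (3015 / 5590) * (1 / 11890)

print(f"D without histograms: {d_without_histograms:,.0f}")
print(f"D with histograms:    {d_with_histograms:,.0f}")
print(f"T hash join input:    {t_hash_join_input:,.0f}")
print(f"T rows per JPPD probe: {t_rows_per_probe:,.0f}")
```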

    MRKT_ID Histograms

    Data is skewed on MRKT_ID=0

    SOURCE Histograms

    It is not easy to count the actual buckets.

    How Does Oracle Evaluate Join Orders?

    Estimate the cardinality of each row source:

    SRC_BY_SRCH_DREV_MASK_ED: 818

    View D on DM_SUMMARY_DAILY: 406 or 85,727, depending on whether histograms are available

    View T on SOURCE_SEARCH_TYPE_DAILY: 1,786,542

    Oracle normally starts from the row source with the smallest cardinality, then the next smallest, and eventually considers all the combinations (the factorial of the total number of tables, here 3! = 6).

    So in this case, if histograms are used, the first table will be SRC_BY_SRCH_DREV_MASK_ED (818 < 85,727); otherwise, it will be DM_SUMMARY_DAILY (406 < 818).

    Since the view on SOURCE_SEARCH_TYPE_DAILY is the last to be evaluated, its cardinality estimate is usually not very important. The costs of its different access methods, however, are very important and very sensitive to the row count produced by the join of the other two tables.
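    A toy Python sketch of this ordering rule follows. It only enumerates the 3! permutations and sorts them so that orders starting from the smallest row sources come first; the real optimizer costs each order and can prune the search, which is not modeled here. The aliases S, D and T follow the query structure slide.

```python
from itertools import permutations


def join_orders(cardinalities):
    """All join orders, smallest-first orders listed first."""
    return sorted(
        permutations(cardinalities),
        key=lambda order: [cardinalities[name] for name in order],
    )


with_histograms = {"S": 818, "D": 85_727, "T": 1_786_542}
without_histograms = {"S": 818, "D": 406, "T": 1_786_542}

for label, cards in (
    ("with histograms", with_histograms),
    ("without histograms", without_histograms),
):
    orders = join_orders(cards)
    print(label, "-", len(orders), "orders, first:", " -> ".join(orders[0]))
```

    With histograms the first order is S -> D -> T; without them D moves to the front, matching the text above.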

    Join Cardinality Between S and D

    Join Cardinality = (num_rows_S - num_nulls_S) * (num_rows_D - num_nulls_D) / max(ndv(mrkt_id_S), ndv(mrkt_id_D))

    An NDV of 21 is found in the 10053 trace for the small table. It is interesting how Oracle derives this default value, because it is the actual NDV of the other table at partition level.

    If no histograms are used: (818-0)*(406-0)/max(21,1) = 15,814

    If histograms are used: (818-0)*(85727-0)/max(21,1) = 3,339,270
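    The join cardinality formula translates directly into a small function. This is a sketch with hypothetical parameter names, fed with the figures from this slide (no nulls on either side, NDVs of 21 and 1).

```python
def join_cardinality(rows_s, nulls_s, rows_d, nulls_d, ndv_s, ndv_d):
    """Equi-join cardinality: non-null rows multiplied, divided by the larger NDV."""
    return (rows_s - nulls_s) * (rows_d - nulls_d) / max(ndv_s, ndv_d)


without_histograms = join_cardinality(818, 0, 406, 0, 21, 1)
with_histograms = join_cardinality(818, 0, 85_727, 0, 21, 1)

# The slide truncates the fractional part.
print(int(without_histograms), int(with_histograms))  # 15814 3339270
```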

    Join Cardinality Between S and D

    After filtering by d.datestamp >= s.start_date and d.datestamp 40

    With histograms: 8348.175 -> 8349 (plan uses 8347).

    Fortunately, the result is inflated by the default cardinality of SRC_BY_SRCH_DREV_MASK_ED, by a factor of 818/61 = 13.4.

    Side note: when dynamic sampling was used in an attempt to resolve this issue, it gave the actual count of SRC_BY_SRCH_DREV_MASK_ED, that is, 61. So even with histograms, the join cardinality estimate was only 622, not enough for Oracle to pick the right plan.
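    The numbers on this slide can be reproduced under one assumption: that the two range predicates on d.datestamp each contribute the fixed 0.05 selectivity for inequalities. The slide text is truncated, so this is an inference from the figures (8348.175 / 3,339,270 = 0.0025 = 0.05 * 0.05), not something the slide states. Variable names are hypothetical.

```python
INEQUALITY_SELECTIVITY = 0.05  # ASSUMED to apply once per range predicate
FILTER_FACTOR = INEQUALITY_SELECTIVITY * INEQUALITY_SELECTIVITY

# Join cardinalities between S and D from the previous slide.
filtered_without_histograms = 15_814 * FILTER_FACTOR
filtered_with_histograms = 3_339_270 * FILTER_FACTOR

# Default estimate for S (818 rows) versus its actual size (61 rows).
inflation = 818 / 61

# Same estimate when dynamic sampling supplies the real 61 rows for S.
with_dynamic_sampling = 61 * 85_727 / 21 * FILTER_FACTOR

print(round(filtered_without_histograms, 3))
print(round(filtered_with_histograms, 3))
print(round(inflation, 1), int(with_dynamic_sampling))
```

    Seen this way, the wrong default of 818 rows happened to push the estimate high enough for the better plan, while the accurate count of 61 did not.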

    FTS Cost

    FTS CPU cost formula: cost = #SRds + #MRds*mreadtim/sreadtim + #CPUCycles/(cpuspeed*sreadtim)

    When using noworkload statistics, as in this case:

    MBRC = db_file_multiblock_read_count

    sreadtim = ioseektim + db_block_size/iotfrspeed

    mreadtim = ioseektim + db_file_multiblock_read_count*db_block_size/iotfrspeed

    #SRds: number of single block reads

    #MRds: number of multiblock reads, each of db_file_multiblock_read_count blocks

    Cost Estimate For View T

    FTS on SOURCE_SEARCH_TYPE_DAILY: 26,243 blocks.

    Parameters (from the 10053 trace; all default values except CPUSPEEDNW):

    db_file_multiblock_read_count: 16

    CPUSPEEDNW: 1583 million instructions/sec (default is 100)

    IOTFRSPEED: 4096 bytes per millisecond (default is 4096)

    IOSEEKTIM: 10 milliseconds (default is 10)

    sreadtim = (10 + 16*1024/4096) = 14

    mreadtim = (10 + 16*16*1024/4096) = 74

    FTS Cost = (0 + (26,243/16)*74/14) + cpu_cost = 8669.5625 + cpu_cost

    The plan shows a cost of 8736. The difference comes from the cpu_cost of reading rows and filtering the result.

    Because view T is an aggregated complex view, there is a huge cost associated with sorting and grouping, which brings the total cost to 31,796.
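    The I/O part of this calculation can be checked with a short Python sketch. The function names are hypothetical, and the CPU component is not modeled. Note that the number of multiblock reads is the block count divided by MBRC (26,243 / 16); that division is what yields 8669.5625.

```python
def noworkload_read_times(ioseektim, iotfrspeed, block_size, mbrc):
    """Single-block and multiblock read times (ms) from noworkload stats."""
    sreadtim = ioseektim + block_size / iotfrspeed
    mreadtim = ioseektim + mbrc * block_size / iotfrspeed
    return sreadtim, mreadtim


def fts_io_cost(blocks, mbrc, sreadtim, mreadtim, single_block_reads=0):
    """I/O cost of a full table scan, expressed in single-block read units."""
    multiblock_reads = blocks / mbrc
    return single_block_reads + multiblock_reads * mreadtim / sreadtim


sreadtim, mreadtim = noworkload_read_times(
    ioseektim=10, iotfrspeed=4096, block_size=16 * 1024, mbrc=16
)
io_cost = fts_io_cost(26_243, 16, sreadtim, mreadtim)

print(sreadtim, mreadtim, io_cost)  # 14.0 74.0 8669.5625
print(8736 - io_cost)               # part of the plan cost left for cpu_cost
```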

    Index Scan Cost

    Cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effective table selectivity)

    Effective index selectivity is calculated as the product of the selectivities of all leading index columns specified in the predicates. If an index has more columns than the predicates, stop when encountering the first column without

    Cost Estimate For T

    Cost estimate via JPPD (join predicate push down), per (datestamp, mrkt_id, source), via index range scan.

    Index IDX2_SOURCE_SEARCH_TYPE_DAILY

    Blevel: 2

    Leaf_blocks: 11818

    Clustering_factor: 530,949

    Effective selectivity: sel(datestamp)*sel(mrkt_id)*sel(source) = 1*(3015/5590)*(1/11890) = 0.0000453621

    Cost = 2 + ceil(0.536) + ceil(24.08) = 28

    Cardinality estimate: 3,312,381*0.0000453621 = 150

    Because of the low cardinality, the GROUP BY will be done in memory and its cost can be ignored.
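    The per-probe cost above follows from the index scan cost formula. A minimal Python sketch, with a hypothetical function name and the same selectivity used for the index and table parts, as on this slide:

```python
from math import ceil


def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, index_sel, table_sel):
    """blevel + leaf blocks visited + table blocks visited."""
    return (
        blevel
        + ceil(leaf_blocks * index_sel)
        + ceil(clustering_factor * table_sel)
    )


# Partition-level stats for IDX2_SOURCE_SEARCH_TYPE_DAILY.
selectivity = 1 * (3015 / 5590) * (1 / 11890)
cost = index_range_scan_cost(2, 11_818, 530_949, selectivity, selectivity)
rows = 3_312_381 * selectivity

print(cost, round(rows))  # 28 150
```

    Most of the cost comes from the clustering factor term (25 of the 28), that is, from visiting table blocks rather than index blocks.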

    Cost Estimate for T

    If partition stats are not ready, global stats are used (for index IDX2_SOURCE_SEARCH_TYPE_DAILY).

    Num_rows: 2,967,427,119, blevel: 3, leaf blocks: 10,109,550, clustering_factor: 213,185,700.

    NDV: datestamp: 3350, mrkt_id: 32, source: 50900. Histogram on mrkt_id for value 0: 4393 out of 9212 buckets.

    Effective index selectivity: (1/3350)*(4393/9212)*(1/50900) = 2.796692e-9

    Cardinality: 2,967,427,119*2.796692e-9 = 8.3 -> 9

    Cost: 3 + ceil(10,109,550*2.796692e-9) + ceil(213,185,700*2.796692e-9) = 3+1+1 = 5
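    To see how far the global stats pull the estimate away from the partition-level one, the two sets of index stats can be put side by side. This Python sketch uses a hypothetical helper and only figures quoted in the slides.

```python
from math import ceil


def probe_estimate(num_rows, blevel, leaf_blocks, clustering_factor, selectivity):
    """Return (estimated rows, index range scan cost) for one probe."""
    rows = num_rows * selectivity
    cost = blevel + ceil(leaf_blocks * selectivity) + ceil(clustering_factor * selectivity)
    return rows, cost


global_sel = (1 / 3350) * (4393 / 9212) * (1 / 50900)
partition_sel = (1 / 1) * (3015 / 5590) * (1 / 11890)

global_rows, global_cost = probe_estimate(
    2_967_427_119, 3, 10_109_550, 213_185_700, global_sel
)
partition_rows, partition_cost = probe_estimate(
    3_312_381, 2, 11_818, 530_949, partition_sel
)

print(f"global:    rows={global_rows:.1f} cost={global_cost}")
print(f"partition: rows={partition_rows:.1f} cost={partition_cost}")
```

    With global stats each probe looks like 5 cost units returning about 9 rows, against 28 units and about 150 rows with partition stats, so the index path appears several times cheaper than it is.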
