8/13/2019 ADVT SQL Plan Explained
ADVT SQL Plan Explained
The query ran for a long time because of high LIO (logical I/O).
1. Those small row estimates are usually the result of missing stats.
2. In this specific case, the query started before the partition stats were ready.
3. When there are no partition stats, global stats are used.
A NESTED LOOPS join with a bad cardinality estimate on the first row source is a major cause of high LIO and CPU usage.
1. Here is the plan currently running, which uses a SQL profile to force a hash join.
2. Note the high cost of the HASH GROUP BY at the bottom.
Query Structure
SELECT /*+ parallel(d,4) full(d) */
FROM
SOURCE_BY_SRCH_DLY_REV_MASK s,
( Complex View based on source_search_type_daily t1 ) t,
( Complex View based on DM_SUMMARY_DAILY d ) d
where
d.datestamp >= s.start_date
and d.datestamp
Inline View D
select mrkt_id, datestamp, SOURCE, query_source, search_type, domain, pageview_type, country_of_origin,
sum(pageviews) pageviews, sum(bidded_searches) bidded_searches, sum(bidded_results) bidded_results, sum(bidded_clicks) bidded_clicks, sum(revenue) revenue
from DM_SUMMARY_DAILY d
where
d.datestamp = to_date('20120715', 'yyyymmdd')
and d.source like 'geosign%derp'
and d.mrkt_id = 0
group by
mrkt_id, datestamp, SOURCE, query_source, search_type, domain, pageview_type, country_of_origin
1. Access a single partition of DM_SUMMARY_DAILY.
2. MRKT_ID=0
3. SOURCE uses LIKE expr.
How to Calculate Cardinality
1. Cardinality = (num_rows) * (selectivity of column 1) * (selectivity of column 2) * ... * (selectivity of column n)
2. Column Selectivity =
   1. Without histograms or with a bind value: 1/(number of distinct values (NDV))
   2. With frequency histograms: (number of buckets for the specified value) / (total number of buckets)
   3. With height-balanced histograms: if the value occupies more than 1 bucket, see 2. Otherwise, use the density from the column stats, but we can always use 1/NDV as a reference.
   4. For an inequality predicate with a bind variable or function: 0.05
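The rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Oracle's implementation; the function names and the sample table (1,000,000 rows, a column with 100 distinct values, and a value occupying 50 of 200 histogram buckets) are hypothetical.

```python
from math import prod

# Default selectivity for an inequality predicate with a bind variable or function.
INEQUALITY_BIND_SELECTIVITY = 0.05


def sel_no_histogram(ndv):
    """Selectivity without histograms (or with a bind value): 1/NDV."""
    return 1.0 / ndv


def sel_histogram(buckets_for_value, total_buckets):
    """Selectivity from a histogram: share of buckets holding the value."""
    return buckets_for_value / total_buckets


def cardinality(num_rows, selectivities):
    """Cardinality = num_rows times the product of the column selectivities."""
    return num_rows * prod(selectivities)


# Hypothetical example: 1,000,000 rows, one equality predicate on a column
# with 100 distinct values, one on a value filling 50 of 200 buckets.
example = cardinality(1_000_000, [sel_no_histogram(100), sel_histogram(50, 200)])
print(f"estimated rows: {example:.0f}")
```

Multiplying the selectivities assumes the columns are independent, which is why skewed or correlated data can push the estimate far from the actual row count.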
Bad Plan
DM_SUMMARY_DAILY: partition stats not ready
Global stats: rows: 23,451,579,811; NDV: datestamp: 2145, mrkt_id: 25, source: 65524
Estimate: 23,451,579,811*(1/2145)*(1/25)*(1/65524) = 6.6742, rounded up to 7.
Actual partition stats: rows: 5,127,832; NDV: datestamp: 1, mrkt_id: 23, source: 601
Estimate if the partition stats had been ready: 5,127,832*(1/1)*(1/23)*(1/601) = 371.
Estimate with the histogram for mrkt_id=0 (3946 out of 5551 buckets): 5,127,832*(3946/5551)*(1/601) = 6065.
SRC_BY_SRCH_DREV_MASK_ED: no stats
Defaults to (block_size - cache layer)*blocks/100. block_size is 16K and blocks is 5, so 16*1024*5/100 = 819.2. Not sure about the value of cache layer.
SOURCE_SEARCH_TYPE_DAILY, per (datestamp, mrkt_id, source): no partition or global stats captured.
We blamed the lack of stats, so I will skip further research on this plan.
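As a quick check of the DM_SUMMARY_DAILY figures on this slide, the three estimates can be recomputed as below. The helper name `estimate` is made up for this sketch; the inputs are the row counts, NDVs, and bucket counts quoted above.

```python
import math


def estimate(num_rows, *selectivities):
    """Multiply the row count by each column selectivity in turn."""
    result = float(num_rows)
    for selectivity in selectivities:
        result *= selectivity
    return result


# Global stats (what the bad plan actually used).
global_est = estimate(23_451_579_811, 1 / 2145, 1 / 25, 1 / 65524)

# Partition stats, using 1/NDV for every column.
partition_est = estimate(5_127_832, 1 / 1, 1 / 23, 1 / 601)

# Partition stats, using the histogram for mrkt_id=0 (3946 of 5551 buckets).
histogram_est = estimate(5_127_832, 3946 / 5551, 1 / 601)

print(f"global stats:    {global_est:.4f} -> {math.ceil(global_est)}")
print(f"partition stats: {partition_est:.1f}")
print(f"with histogram:  {histogram_est:.1f}")
```

The spread between the three numbers (single digits, hundreds, thousands) is the point of the slide: the same predicate set gives very different first-row-source estimates depending on which stats are available.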
Good Plan With SQL Profile
DM_SUMMARY_DAILY
Actual partition stats: rows: 5,193,086; NDV: datestamp: 1, mrkt_id: 21, source: 609 (huge difference from global stats)
Estimate using partition stats: 5,193,086*(1/1)*(1/21)*(1/609) = 406.
Estimate using histograms for mrkt_id=0 (3924 out of 5615 buckets) and for source like geosign%drep (6 out of 254 buckets): 5,193,086*(3924/5615)*(6/254) = 85,727.
SRC_BY_SRCH_DREV_MASK_ED
Still uses the default of 818 rows. The actual value is 61.
SOURCE_SEARCH_TYPE_DAILY
Partition stats: rows: 3,312,381; NDV: datestamp: 1, mrkt_id: 26 (histogram for value 0: 3015 out of 5590 buckets), source: 11890
When using a hash join, with datestamp and mrkt_id=0: 3,312,381*(3015/5590) = 1,786,542 (1786K in the plan).
When using join predicate push down with column source, the estimate for each (datestamp, mrkt_id, source) is 3,312,381*(3015/5590)*(1/11890) = 150. Here column source is treated as a bind value.
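The good-plan estimates can be reproduced the same way. This is a sketch of the arithmetic only; the variable names are invented here, and the hash-join figure comes out near the 1786K shown in the plan.

```python
# DM_SUMMARY_DAILY (view D), partition stats from the slide.
d_rows = 5_193_086
d_no_hist = d_rows * (1 / 1) * (1 / 21) * (1 / 609)
d_with_hist = d_rows * (3924 / 5615) * (6 / 254)

# SOURCE_SEARCH_TYPE_DAILY, partition stats from the slide.
t_rows = 3_312_381
mrkt_id_sel = 3015 / 5590           # histogram for mrkt_id = 0
t_hash_join = t_rows * mrkt_id_sel  # filtered by datestamp and mrkt_id only
t_jppd = t_rows * mrkt_id_sel * (1 / 11890)  # source pushed down as a bind

print(f"D without histograms: {d_no_hist:.0f}")
print(f"D with histograms:    {d_with_hist:.0f}")
print(f"T for hash join:      {t_hash_join / 1000:.0f}K")
print(f"T per JPPD probe:     {t_jppd:.0f}")
```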
MRKT_ID Histograms
Data is skewed on MRKT_ID=0
SOURCE Histograms
Not easy to count the actual
buckets
How does Oracle evaluate join orders?
Estimated cardinalities for each row source:
SRC_BY_SRCH_DREV_MASK_ED: 818
View D on DM_SUMMARY_DAILY: 406 or 85,727, depending on whether histograms are used
View T on SOURCE_SEARCH_TYPE_DAILY: 1,786,542
Oracle normally starts from the smallest row source, then the next smallest, and eventually evaluates all the combinations (the factorial of the total number of tables; here 3! = 6).
So in this case, if histograms are used, the first table will be SRC_BY_SRCH_DREV_MASK_ED; otherwise, it will be DM_SUMMARY_DAILY.
Since the view on SOURCE_SEARCH_TYPE_DAILY is the last to be evaluated, its cardinality estimate is usually not very important, but the costs of the different access methods are very important and very sensitive to the output count of the join between the other two tables.
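To make the join-order reasoning concrete, the sketch below enumerates the 3! = 6 permutations and picks the starting row source by smallest estimate. It only mirrors the description on this slide; it is not how the optimizer is implemented, and the function names are hypothetical.

```python
from itertools import permutations

# Cardinality estimates from the slide, keyed by underlying table name.
with_histograms = {
    "SRC_BY_SRCH_DREV_MASK_ED": 818,
    "DM_SUMMARY_DAILY": 85_727,
    "SOURCE_SEARCH_TYPE_DAILY": 1_786_542,
}
without_histograms = dict(with_histograms, DM_SUMMARY_DAILY=406)


def join_orders(estimates):
    """All candidate join orders: n! permutations of the row sources."""
    return list(permutations(estimates))


def first_row_source(estimates):
    """The row source with the smallest estimated cardinality."""
    return min(estimates, key=estimates.get)


print(len(join_orders(with_histograms)), "join orders")
print("with histograms, start from:   ", first_row_source(with_histograms))
print("without histograms, start from:", first_row_source(without_histograms))
```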
Join Cardinality Between S and D
Join Cardinality = (num_rows_S - num_null_S) * (num_rows_D - num_null_D) / max(ndv(mrkt_id_S), ndv(mrkt_id_D))
An NDV of 21 is found in the 10053 trace for the small table. It is interesting how Oracle derives this default value, because it is the actual NDV of the other table at partition level.
If no histogram is used: (818-0)*(406-0)/max(21,1) = 15,814
If histograms are used: (818-0)*(85727-0)/max(21,1) = 3,339,270
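The two worked results can be checked with a direct transcription of the formula, reading it as (rows minus nulls) for each side divided by the larger NDV of the join column, which is what the worked numbers imply. The function and argument names are illustrative only.

```python
def join_cardinality(rows_s, nulls_s, rows_d, nulls_d, ndv_s, ndv_d):
    """(rows_S - nulls_S) * (rows_D - nulls_D) / max(NDV_S, NDV_D)."""
    return (rows_s - nulls_s) * (rows_d - nulls_d) / max(ndv_s, ndv_d)


# S = SRC_BY_SRCH_DREV_MASK_ED (default 818 rows, NDV 21 from the 10053 trace)
# D = view on DM_SUMMARY_DAILY (NDV 1 for mrkt_id after the mrkt_id=0 filter)
no_histogram = join_cardinality(818, 0, 406, 0, 21, 1)
with_histogram = join_cardinality(818, 0, 85_727, 0, 21, 1)

print(f"no histogram:   {no_histogram:,.2f}")
print(f"with histogram: {with_histogram:,.2f}")
```

The slide's figures are these values with the fractional part dropped.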
Join Cardinality Between S and D
After filtering by d.datestamp >= s.start_date and the second d.datestamp range predicate (selectivity 0.05 each):
Without histograms: 15,814*0.05*0.05 = 39.5 -> 40
With histograms: 3,339,270*0.05*0.05 = 8348.175 -> 8349 (plan uses 8347)
Fortunately, the result is inflated by the default estimate for SRC_BY_SRCH_DREV_MASK_ED, by a factor of 818/61 = 13.4.
Side note: when dynamic sampling was used in an attempt to resolve this issue, it gave the actual count of SRC_BY_SRCH_DREV_MASK_ED, that is, 61. So even with histograms, the join cardinality estimate is only 622, not enough for Oracle to pick the right plan.
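A small sketch of the filtering step follows. It assumes that each of the two range predicates on d.datestamp gets the 0.05 default selectivity, an inference that reproduces the 40 and 8348.175 quoted on this slide; the names are hypothetical.

```python
import math

RANGE_PREDICATE_SELECTIVITY = 0.05  # default for an inequality on a join column


def filtered_join_cardinality(join_card, range_predicates=2):
    """Apply the default selectivity once per range predicate."""
    return join_card * RANGE_PREDICATE_SELECTIVITY ** range_predicates


# Join cardinalities between S and D, as printed on the previous slide.
no_histogram = filtered_join_cardinality(15_814)
with_histogram = filtered_join_cardinality(3_339_270)

# Dynamic sampling replaces the default 818 rows with the actual 61 rows.
inflation = 818 / 61
with_sampling = with_histogram / inflation

print(f"no histogram:       {no_histogram:.3f} -> {math.ceil(no_histogram)}")
print(f"with histogram:     {with_histogram:.3f}")
print(f"inflation factor:   {inflation:.1f}")
print(f"with actual 61 rows: {with_sampling:.1f}")
```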
FTS Cost
FTS CPU cost formula: cost = #SRds + #MRds*mreadtim/sreadtim + #CPUCycles/(cpuspeed*sreadtim)
When using noworkload statistics, as in this case:
MBRC = db_file_multiblock_read_count
Sreadtim = ioseektim + db_block_size/iotfrspeed
Mreadtim = ioseektim + db_file_multiblock_read_count*db_block_size/iotfrspeed
#SRds: number of single block reads
#MRds: number of multiblock reads of size db_file_multiblock_read_count
Cost Estimate For View T
FTS on SOURCE_SEARCH_TYPE_DAILY: 26,243 blocks
Parameters (from the 10053 trace; all default values except CPUSPEEDNW):
db_file_multiblock_read_count: 16
CPUSPEEDNW: 1583 million instructions/sec (default is 100)
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
sreadtim = (10 + 16*1024/4096) = 14
mreadtim = (10 + 16*16*1024/4096) = 74
FTS Cost = (0 + (26,243/16)*74/14) + cpu_cost = 8669.5625 + cpu_cost
The plan used cost 8736. The difference is the cpu_cost to read rows and filter the result.
Because view T is an aggregated complex view, there is a huge cost associated with sorting and grouping, which brings the total cost to 31,796.
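The I/O part of the full table scan cost can be recomputed as follows, taking #MRds as blocks divided by the multiblock read count and #SRds as 0. This is a sketch of the noworkload formulas quoted in this deck; the function names are invented for illustration.

```python
def noworkload_read_times(ioseektim, iotfrspeed, block_size, mbrc):
    """Derive sreadtim and mreadtim (ms) from noworkload system stats."""
    sreadtim = ioseektim + block_size / iotfrspeed
    mreadtim = ioseektim + mbrc * block_size / iotfrspeed
    return sreadtim, mreadtim


def fts_io_cost(blocks, mbrc, sreadtim, mreadtim, single_reads=0):
    """#SRds + #MRds * mreadtim / sreadtim, with #MRds = blocks / MBRC."""
    multiblock_reads = blocks / mbrc
    return single_reads + multiblock_reads * mreadtim / sreadtim


sreadtim, mreadtim = noworkload_read_times(
    ioseektim=10, iotfrspeed=4096, block_size=16 * 1024, mbrc=16
)
io_cost = fts_io_cost(blocks=26_243, mbrc=16, sreadtim=sreadtim, mreadtim=mreadtim)

# The plan shows 8736; the remainder is attributed to the CPU component.
cpu_component = 8736 - io_cost

print(f"sreadtim={sreadtim:.0f} ms, mreadtim={mreadtim:.0f} ms")
print(f"I/O cost={io_cost:.4f}, implied CPU component={cpu_component:.4f}")
```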
Index Scan Cost
Cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effective table selectivity)
Effective index selectivity is calculated as the product of the selectivities of all leading index columns specified in the predicates. If an index has more columns than the predicates, stop at the first column without a predicate.
Cost Estimate For T
Cost estimate via JPPD (join predicate push down), per (datestamp, mrkt_id, source), via index range scan
Index IDX2_SOURCE_SEARCH_TYPE_DAILY
Blevel: 2
Leaf_blocks: 11818
Clustering_factor: 530,949
Effective selectivity: sel(datestamp)*sel(mrkt_id)*sel(source) = 1*(3015/5590)*(1/11890) = 0.0000453621
Cost = 2 + ceil(0.536) + ceil(24.08) = 28
Cardinality estimate: 3,312,381*0.0000453621 = 150
Because of the low cardinality, the GROUP BY will be done in memory and its cost can be ignored.
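The index range scan numbers above can be verified with a short calculation. The function name is hypothetical, and the sketch uses the same selectivity for the index and the table, as the slide does.

```python
import math


def index_range_scan_cost(blevel, leaf_blocks, clustering_factor,
                          index_selectivity, table_selectivity):
    """blevel + ceil(leaf_blocks * ix_sel) + ceil(clustering_factor * tab_sel)."""
    return (blevel
            + math.ceil(leaf_blocks * index_selectivity)
            + math.ceil(clustering_factor * table_selectivity))


# Partition-level stats for IDX2_SOURCE_SEARCH_TYPE_DAILY, from the slide.
selectivity = 1 * (3015 / 5590) * (1 / 11890)
cost = index_range_scan_cost(2, 11_818, 530_949, selectivity, selectivity)
cardinality = 3_312_381 * selectivity

print(f"selectivity = {selectivity:.10f}")
print(f"cost per probe = {cost}")
print(f"rows per probe = {cardinality:.0f}")
```

Note that the clustering factor term dominates: 25 of the 28 cost units come from table block visits, not from walking the index.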
Cost Estimate for T
If partition stats are not ready, global stats are used (for index IDX2_SOURCE_SEARCH_TYPE_DAILY).
Num_rows: 2,967,427,119, blevel: 3, leaf blocks: 10,109,550, clustering_factor: 213,185,700.
NDV: datestamp: 3350, mrkt_id: 32, source: 50900.
Histogram for mrkt_id, value 0: 4393 out of 9212 buckets.
Effective index selectivity: (1/3350)*(4393/9212)*(1/50900) = 2.796692e-9
Cardinality: 2,967,427,119*2.796692e-9 = 8.3 -> 9
Cost: 3 + ceil(10,109,550*2.796692e-9) + ceil(213,185,700*2.796692e-9) = 3+1+1 = 5
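For comparison, here is the same index cost arithmetic with the global stats. The variable names are made up for this sketch; the figures are the ones listed on this slide.

```python
import math

# Global stats for index IDX2_SOURCE_SEARCH_TYPE_DAILY, from the slide.
num_rows = 2_967_427_119
blevel = 3
leaf_blocks = 10_109_550
clustering_factor = 213_185_700

# 1/NDV for datestamp and source, histogram for mrkt_id = 0.
selectivity = (1 / 3350) * (4393 / 9212) * (1 / 50900)

cardinality = num_rows * selectivity
cost = (blevel
        + math.ceil(leaf_blocks * selectivity)
        + math.ceil(clustering_factor * selectivity))

print(f"selectivity = {selectivity:.6e}")
print(f"cardinality = {cardinality:.1f} -> {math.ceil(cardinality)}")
print(f"cost per probe = {cost}")
```

With global stats both ceiling terms collapse to 1, so each probe looks almost free, which makes a nested loops plan driven by an underestimated row source look far cheaper than it really is.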