OTN tour 2015 AWR data mining
AWR DB Performance Data Mining
Yury Velikanov Oracle DBA
Mission
Help you remember to consider AWR the next time you troubleshoot a performance issue!
AWR Agenda
• Introduction & Background
• Examples, Examples, Examples
• Concept & Approach
• More examples
• Q & A
Few words about Yury
Yury Oracle
Few words about Google
Google careers
Background
• AWR is one of many RDBMS performance data sources
• Sometimes it isn't the best source (its data is aggregated). Alternatives include:
  • SQL extended trace (event 10046)
    • Raw trace
    • tkprof
    • TRCAnlzr [ID 224270.1]
    • Method-R state-of-the-art tools
  • PL/SQL Profiler
  • LTOM (Session Trace Collector)
  • Others
• Sometimes it is the best/most efficient source!
• Sometimes it is the only one available!
Background
• Once I was called to troubleshoot a high load
• After connecting to the database, I saw 8 active processes that had been running for 6 hours on average
• I enabled event 10046 for all 8 processes for 15 minutes
• I found several SQLs, each returning 1 row and executed a million times
• I passed the results to development and asked them to fix the logic
• I spent ~2 hours finding where the issue was
• The next day a colleague asked me:
  • Why did you use 10046 and spend 2 hours?
  • He had used an AWR report and come up with the same answer in less than 5 minutes
• Lesson learned: use the right tool for the right case!
When should you consider AWR mining?
• General resource tuning (high CPU or IO utilization)
  • Find the TOP resource-consuming SQLs
  • You are asked to reduce server load X times
• You would like to analyze load patterns/trends
• You need to travel back in time and see how things progressed
• You don't have any other source of performance information
• The standard AWR reports (Grid Control, awrrpt.sql) don't present the information from the right angle/dimension, or aren't available
• Historical analysis of SQL execution plans stored in AWR
When is it better to use other methods?
• You need to tune a specific procedure/function/activity
• You have a repeatable test case
• The problem can be reproduced in an idle environment
  • There is no concurrent resource usage
• SQL Trace (10046) is a far better troubleshooting method in such cases
• The application doesn't use bind variables
TOP CPU/IO Consuming SQLs?
select s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
from DBA_HIST_SQLSTAT s
group by s.SQL_ID
order by sum(s.CPU_TIME_DELTA) desc
/
SQL_ID        SUM(CPU_TIME_DELTA) SUM(DISK_READS_DELTA)   COUNT(*)
------------- ------------------- --------------------- ----------
05s9358mm6vrr            27687500                  2940          1
f6cz4n8y72xdc             7828125                  4695          2
5dfmd823r8dsp             6421875                     8         15
3h1rjtcff3wy1             5640625                   113          1
92mb1kvurwn8h             5296875                     0          1
bunssq950snhf             3937500                    18         15
7xa8wfych4mad             2859375                     0          2
...
TOP CPU Consuming SQLs?
select
  s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
from DBA_HIST_SQLSTAT s
group by
  s.SQL_ID
order by
  sum(s.CPU_TIME_DELTA) desc
TOP CPU Consuming SQLs?
select * from (
  select s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
  from DBA_HIST_SQLSTAT s
  group by s.SQL_ID
  order by sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
TOP CPU Consuming SQLs?
select * from (
  select s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
  from DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p
  where 1=1
    and s.SNAP_ID = p.SNAP_ID
    and s.DBID = p.DBID                        -- DBA_HIST_SNAPSHOT has one row per
    and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER  -- DBID/instance for each SNAP_ID
    and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
  group by s.SQL_ID
  order by sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
TOP CPU Consuming SQLs?
select * from (
  select s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
  from DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p
  where 1=1
    and s.SNAP_ID = p.SNAP_ID
    and s.DBID = p.DBID
    and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER
    and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
    and p.END_INTERVAL_TIME between SYSDATE-7 and SYSDATE
  group by s.SQL_ID
  order by sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
TOP CPU Consuming SQLs?
select * from (
  select s.SQL_ID, sum(s.CPU_TIME_DELTA), sum(s.DISK_READS_DELTA), count(*)
  from DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p, DBA_HIST_SQLTEXT t
  where 1=1
    and s.SNAP_ID = p.SNAP_ID and s.DBID = p.DBID and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER
    and s.SQL_ID = t.SQL_ID and s.DBID = t.DBID
    and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
    and t.COMMAND_TYPE != 47  -- exclude PL/SQL blocks from the output
    and p.END_INTERVAL_TIME between SYSDATE-7 and SYSDATE
  group by s.SQL_ID
  order by sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
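The filters added step by step above can be checked offline. Below is a minimal Python sketch that applies the same filters and ranking to in-memory rows. It assumes the DBA_HIST_SQLSTAT rows have already been joined to their snapshot END_INTERVAL_TIME and their DBA_HIST_SQLTEXT COMMAND_TYPE. The tuple layout, the function name `top_cpu_sqls`, and the sample SQL_IDs are hypothetical and are not part of AWR.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PLSQL_EXECUTE = 47  # COMMAND_TYPE excluded by the query (PL/SQL blocks)


def top_cpu_sqls(rows, now, top_n=10, days=7, first_hour=8, last_hour=16):
    """Rank SQL_IDs by summed CPU delta, applying the same filters as the query.

    Each row is a hypothetical pre-joined tuple:
    (sql_id, end_interval_time, command_type, cpu_time_delta, disk_reads_delta)
    """
    totals = defaultdict(lambda: [0, 0, 0])  # sql_id -> [cpu, disk, count(*)]
    for sql_id, end_time, command_type, cpu_delta, disk_delta in rows:
        if not (now - timedelta(days=days) <= end_time <= now):
            continue  # END_INTERVAL_TIME between SYSDATE-7 and SYSDATE
        if not (first_hour <= end_time.hour <= last_hour):
            continue  # EXTRACT(HOUR ...) between 8 and 16, both ends inclusive
        if command_type == PLSQL_EXECUTE:
            continue  # COMMAND_TYPE != 47
        acc = totals[sql_id]
        acc[0] += cpu_delta
        acc[1] += disk_delta
        acc[2] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    # rownum < 11 keeps the first ten rows of the ordered result
    return [(sql_id, cpu, disk, cnt) for sql_id, (cpu, disk, cnt) in ranked[:top_n]]


now = datetime(2015, 1, 20, 18, 0)
sample = [  # hypothetical rows; command type 3 is SELECT
    ("aaaaaaaaaaaaa", datetime(2015, 1, 20, 9, 0), 3, 500, 10),
    ("aaaaaaaaaaaaa", datetime(2015, 1, 19, 16, 30), 3, 300, 5),
    ("bbbbbbbbbbbbb", datetime(2015, 1, 20, 10, 0), 3, 600, 0),
    ("ccccccccccccc", datetime(2015, 1, 20, 11, 0), 47, 9999, 0),  # PL/SQL block
    ("ddddddddddddd", datetime(2015, 1, 20, 20, 0), 3, 9999, 0),   # outside 8-16
    ("eeeeeeeeeeeee", datetime(2015, 1, 1, 10, 0), 3, 9999, 0),    # older than 7 days
]
print(top_cpu_sqls(sample, now))
# [('aaaaaaaaaaaaa', 800, 15, 2), ('bbbbbbbbbbbbb', 600, 0, 1)]
```

Note that `between 8 and 16` is inclusive, so snapshots ending at 16:xx still count.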
Concept & Approach
AWR = DBA_HIST_% objects
• 223 DBA_HIST_% objects in 11.2.0.4.0
• 243 DBA_HIST_% objects in 12.1.0.1.0
• I use just a few on a regular basis:
  • DBA_HIST_ACTIVE_SESS_HISTORY (from V$ACTIVE_SESSION_HISTORY)
  • DBA_HIST_SEG_STAT (from V$SEGMENT_STATISTICS)
  • DBA_HIST_SQLSTAT (from V$SQL)
  • DBA_HIST_SQL_PLAN (from V$SQL_PLAN)
  • DBA_HIST_SYSSTAT (from V$SYSSTAT; session-level counterpart: V$SESSTAT)
  • DBA_HIST_SYSTEM_EVENT (from V$SYSTEM_EVENT; session-level counterpart: V$SESSION_EVENT)
• Most of these views contain snapshots of data from the V$ views listed in parentheses
• Some of them also have DELTA columns (e.g. DISK_READS_DELTA):
  • DBA_HIST_SEG_STAT
  • DBA_HIST_SQLSTAT
AWR Things to keep in mind …
• The data are just snapshots of V$ views
• Data is collected based on thresholds (default: top 30)
• Some data is excluded because of these thresholds
• Some data may no longer be in the SGA at the time of the snapshot
• The longer the interval between snapshots, the more data gets excluded
• For data mining, use ALL available snapshots
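Here is a toy model of why longer snapshot intervals lose more data. It assumes a single ranking statistic and keeps only the top `top_n` SQLs in each snapshot. Real AWR capture is more involved: it ranks by several statistics, with a default of top 30. The data and the helper names `captured_totals` and `merge` are invented for illustration.

```python
from collections import defaultdict


def captured_totals(snapshots, top_n):
    """Sum only what a top-N capture keeps: per snapshot, the top_n SQL_IDs by delta.

    snapshots: list of dicts {sql_id: cpu_delta_in_that_interval} (toy data).
    Simplification: one ranking statistic only.
    """
    totals = defaultdict(int)
    for snap in snapshots:
        ranked = sorted(snap.items(), key=lambda kv: kv[1], reverse=True)
        for sql_id, delta in ranked[:top_n]:
            totals[sql_id] += delta
    return dict(totals)


def merge(snapshots):
    """Model one longer snapshot covering all the given intervals."""
    merged = defaultdict(int)
    for snap in snapshots:
        for sql_id, delta in snap.items():
            merged[sql_id] += delta
    return dict(merged)


hourly = [{"A": 10, "B": 2, "C": 5}, {"A": 1, "B": 10, "C": 5}]  # true total: 33
print(captured_totals(hourly, top_n=1))          # {'A': 10, 'B': 10}
print(captured_totals([merge(hourly)], top_n=1))  # {'B': 12}
```

With two hourly snapshots the capture keeps 20 of 33 units. With a single two-hour snapshot it keeps only 12, and SQL C never shows up at all.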
AWR Things to keep in mind …
• Forget about AWR if there are literals in the code
  • Indicator: a high parse count (hard), e.g. 10-50 per second
  • cursor_sharing = FORCE (use very carefully)
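AWR keys SQL statistics by SQL_ID, which is computed from the statement text. Every distinct literal value therefore produces its own SQL_ID, and the load is scattered across many small entries. The rough Python sketch below shows the grouping that bind variables (or cursor_sharing = FORCE) would give. The regular expressions are a simplification and are not Oracle's actual literal replacement. The function name `normalize` and the sample statements are hypothetical.

```python
import re
from collections import Counter

STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")    # 'SCOTT', 'O''Brien'
NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # 7369, 3.14 (not the 1 in t1)


def normalize(sql_text):
    """Replace literals with a placeholder, roughly like cursor_sharing=FORCE."""
    text = STRING_LITERAL.sub(":b", sql_text)
    text = NUMBER_LITERAL.sub(":b", text)
    return " ".join(text.split()).lower()


statements = [
    "select ename from emp where empno = 7369",
    "select ename from emp where empno = 7499",
    "select ename from emp where empno = 7521",
    "select empno from emp where ename = 'SCOTT'",
]
print(len(set(statements)))  # 4 distinct texts, i.e. 4 distinct SQL_IDs
print(Counter(normalize(s) for s in statements))
```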
• In RAC configurations, don't forget the instance column in joins (INSTANCE_NUMBER in DBA_HIST_% views, INST_ID in GV$ views)
• Most of the V$ (DBA_HIST) performance views hold cumulative counters, so you need END - BEGIN values
  • You may get wrong results (sometimes negative):
    • Sometimes counters reach their max value and get reset
    • Counters are reset when the instance restarts
• The time between snapshots may differ
  • Suggestion: (ENDv - BEGINv)/(ENDs - BEGINs) = value/sec, where v is the counter value and s is the snapshot time in seconds
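The Python sketch below implements the suggested rate calculation. It assumes each sample is (snapshot time in seconds, cumulative counter value), ordered by time. A negative END - BEGIN is reported as None rather than as a rate. The function name and the readings are hypothetical.

```python
def per_second_rates(samples):
    """Turn cumulative counter snapshots into per-second rates.

    samples: [(seconds_since_some_start, cumulative_value)] in snapshot order.
    Returns one rate per interval, or None when END - BEGIN is negative
    (counter reset at instance restart or after reaching its maximum).
    """
    rates = []
    for (t_begin, v_begin), (t_end, v_end) in zip(samples, samples[1:]):
        delta = v_end - v_begin
        if delta < 0:
            rates.append(None)  # don't report a negative "rate"
        else:
            rates.append(delta / (t_end - t_begin))  # (ENDv - BEGINv) / (ENDs - BEGINs)
    return rates


# Hypothetical readings: a 1 h interval, a 30 min interval, then a restart
print(per_second_rates([(0, 1000), (3600, 8200), (5400, 9100), (9000, 300)]))
# [2.0, 0.5, None]
```

Dividing by each interval's own length keeps rates comparable even when snapshots were not taken at a regular pace.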
AWR Things to keep in mind …
• Seconds count between 2 snapshots:
select s.BEGIN_INTERVAL_TIME, s.END_INTERVAL_TIME,
       s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME DTIME,  -- returns an INTERVAL
       EXTRACT(HOUR FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) H,
       EXTRACT(MINUTE FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) M,
       EXTRACT(SECOND FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) S,
       EXTRACT(HOUR FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME)*60*60 + EXTRACT(MINUTE FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME)*60 + EXTRACT(SECOND FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) SECS,
       phy_get_secs(s.END_INTERVAL_TIME, s.BEGIN_INTERVAL_TIME),  -- write your own function
       (cast(s.END_INTERVAL_TIME as date) - cast(s.BEGIN_INTERVAL_TIME as date))*24*60*60
from DBA_HIST_SNAPSHOT s
where s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
  and s.DBID = (select DBID from V$DATABASE)
order by s.BEGIN_INTERVAL_TIME;
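The body of `phy_get_secs` is not shown. The Python functions below are a hypothetical stand-in. They contrast the full interval length with the H/M/S sum built from the EXTRACT columns, which ignores the DAY field of the interval. Ignoring DAY is harmless for hourly snapshots but wrong for intervals of a day or more. The function names are invented for illustration.

```python
from datetime import datetime


def secs_between(end, begin):
    """Hypothetical stand-in for the author's phy_get_secs(): days included."""
    return (end - begin).total_seconds()


def secs_hms_only(end, begin):
    """What H*3600 + M*60 + S from EXTRACT yields: the DAY field is ignored."""
    delta = end - begin
    return delta.seconds + delta.microseconds / 1_000_000


begin = datetime(2014, 8, 13, 19, 0, 30, 233000)
print(secs_between(datetime(2014, 8, 13, 20, 0, 30, 233000), begin))   # 3600.0
print(secs_between(datetime(2014, 8, 14, 20, 0, 30, 233000), begin))   # 90000.0
print(secs_hms_only(datetime(2014, 8, 14, 20, 0, 30, 233000), begin))  # 3600.0
```

The CAST to DATE variant in the query avoids the DAY problem but drops fractional seconds.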
AWR Things to keep in mind …
select SNAP_INTERVAL, RETENTION
from DBA_HIST_WR_CONTROL c, V$DATABASE d
where c.DBID = d.DBID;

SNAP_INTERVAL                  RETENTION
------------------------------ ------------------------------
+00000 01:00:00.0              +00007 00:00:00.0
select DBID, INSTANCE_NUMBER, count(*) C,
       min(BEGIN_INTERVAL_TIME) OLDEST, max(BEGIN_INTERVAL_TIME) YOUNGEST
from DBA_HIST_SNAPSHOT
group by DBID, INSTANCE_NUMBER;

      DBID INSTANCE_NUMBER          C OLDEST                    YOUNGEST
---------- --------------- ---------- ------------------------- -------------------------
3244685755               1        179 13-AUG-14 07.00.30.233 PM 21-AUG-14 05.00.01.855 AM
3244685755               2        179 13-AUG-14 07.00.30.309 PM 21-AUG-14 05.00.01.761 AM
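The snapshot counts above can be sanity-checked against SNAP_INTERVAL. If no snapshots are missing, a span from OLDEST to YOUNGEST should hold span/interval + 1 snapshots. The Python sketch below does this check, assuming the interval text format printed above. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

DS_INTERVAL = re.compile(r"\+(\d+) (\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_ds_interval(text):
    """Parse SQL*Plus output such as '+00000 01:00:00.0' (days, then hh:mm:ss)."""
    m = DS_INTERVAL.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


def expected_snapshots(oldest, youngest, interval):
    """Snapshot count if none is missing between OLDEST and YOUNGEST."""
    return round((youngest - oldest) / interval) + 1


snap_interval = parse_ds_interval("+00000 01:00:00.0")
retention = parse_ds_interval("+00007 00:00:00.0")
oldest = datetime(2014, 8, 13, 19, 0, 30, 233000)  # 13-AUG-14 07.00.30.233 PM
youngest = datetime(2014, 8, 21, 5, 0, 1, 855000)  # 21-AUG-14 05.00.01.855 AM
print(expected_snapshots(oldest, youngest, snap_interval))  # 179, same as C for instance 1
```

A lower C than expected would point to missing snapshots (for example, around an instance outage), which matters when computing END - BEGIN deltas.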
Trends Analysis Example (1) …
select s.BEGIN_INTERVAL_TIME, s.END_INTERVAL_TIME,
       (t.VALUE - LAG(t.VALUE) OVER (ORDER BY s.BEGIN_INTERVAL_TIME)) DVALUE,
       (t.VALUE - LAG(t.VALUE) OVER (ORDER BY s.BEGIN_INTERVAL_TIME)) / phy_get_secs(s.END_INTERVAL_TIME, s.BEGIN_INTERVAL_TIME) VAL_SEC
from DBA_HIST_SNAPSHOT s, DBA_HIST_SYSSTAT t
where 1=1
  and s.SNAP_ID = t.SNAP_ID and s.DBID = t.DBID and s.INSTANCE_NUMBER = t.INSTANCE_NUMBER
  and s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
  and s.DBID = (select DBID from V$DATABASE)
  and t.STAT_NAME = 'parse count (hard)'
order by s.BEGIN_INTERVAL_TIME;
DBA_HIST_SYSSTAT & DBA_HIST_SYSTEM_EVENT
Trends Analysis Example (1) …
select s.BEGIN_INTERVAL_TIME, s.END_INTERVAL_TIME,
       (t.VALUE - LAG(t.VALUE) OVER (ORDER BY s.END_INTERVAL_TIME)) DVALUE,
       (t.VALUE - LAG(t.VALUE) OVER (ORDER BY s.END_INTERVAL_TIME)) / phy_get_secs(s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) VAL_SEC
from DBA_HIST_SNAPSHOT s, DBA_HIST_SYSSTAT t
where 1=1
  and s.SNAP_ID = t.SNAP_ID and s.DBID = t.DBID and s.INSTANCE_NUMBER = t.INSTANCE_NUMBER
  and s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
  and s.DBID = (select DBID from V$DATABASE)
  and t.STAT_NAME = 'parse count (hard)'
order by s.END_INTERVAL_TIME;
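The same LAG-based delta, divided by the interval length, can flag intervals where hard parsing looks like the literals problem described earlier. The Python sketch below uses 10 per second, the lower end of that indicator, as its threshold. The readings are invented, and the threshold and names are assumptions.

```python
from datetime import datetime, timedelta


def hard_parse_spikes(snapshots, threshold_per_sec=10.0):
    """Flag intervals whose 'parse count (hard)' rate exceeds a threshold.

    snapshots: [(begin_interval_time, end_interval_time, cumulative_value)],
    ordered by time, for one DBID/instance (as the query restricts it).
    """
    flagged = []
    for (_, _, prev_value), (begin, end, value) in zip(snapshots, snapshots[1:]):
        dvalue = value - prev_value           # t.VALUE - LAG(t.VALUE)
        secs = (end - begin).total_seconds()  # what phy_get_secs() should return
        if dvalue < 0 or secs <= 0:
            continue                          # counter reset or bad interval
        val_sec = dvalue / secs
        if val_sec > threshold_per_sec:
            flagged.append((begin, end, val_sec))
    return flagged


t0 = datetime(2015, 1, 1)
hour = timedelta(hours=1)
values = [0, 18000, 108000, 111600]  # hypothetical cumulative hard parses
snaps = [(t0 + i * hour, t0 + (i + 1) * hour, v) for i, v in enumerate(values)]
print(hard_parse_spikes(snaps))  # only the third snapshot (25 per second) is flagged
```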
SQL Bad performance Example (2) …
• Called by a user to troubleshoot a badly performing SQL
• Sometimes the SQL hangs (never finishes) and needs to be killed and re-executed
• Upon re-execution, it always finishes successfully in a few minutes
• The client demanded a resolution ASAP …
select st.SQL_ID,
       st.PLAN_HASH_VALUE,
       sum(st.EXECUTIONS_DELTA) EXECUTIONS,
       sum(st.ROWS_PROCESSED_DELTA) CROWS,
       trunc(sum(st.CPU_TIME_DELTA)/1000000/60) CPU_MINS,     -- microseconds to minutes
       trunc(sum(st.ELAPSED_TIME_DELTA)/1000000/60) ELA_MINS
from DBA_HIST_SQLSTAT st
where st.SQL_ID in ('5ppdcygtcw7p6','gpj32cqd0qy9a')
group by st.SQL_ID, st.PLAN_HASH_VALUE
order by st.SQL_ID, CPU_MINS;
DBA_HIST_SQLSTAT
SQL_ID        PLAN_HASH_VALUE EXECUTIONS      CROWS         CPU_MINS         ELA_MINS
------------- --------------- ---------- ---------- ---------------- ----------------
5ppdcygtcw7p6       436796090         20      82733                1                3
5ppdcygtcw7p6       863350916         71     478268                5               11
5ppdcygtcw7p6      2817686509          9      32278            2,557            2,765

gpj32cqd0qy9a      3094138997         30      58400                1                3
gpj32cqd0qy9a      1700210966         36      69973                1                7
gpj32cqd0qy9a      1168845432          2        441              482              554
gpj32cqd0qy9a      2667660534          4       1489            1,501            1,642
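Normalizing the per-plan totals by executions makes the bad plan obvious. The small Python sketch below uses the 5ppdcygtcw7p6 rows from the output above. CPU_MINS there is truncated to whole minutes, so the small values are only approximate. The function name is hypothetical.

```python
# (plan_hash_value, executions, rows_processed, cpu_mins) for SQL_ID 5ppdcygtcw7p6,
# copied from the output above; CPU_MINS there is truncated to whole minutes.
plans = [
    (436796090, 20, 82733, 1),
    (863350916, 71, 478268, 5),
    (2817686509, 9, 32278, 2557),
]


def cost_per_execution(plans):
    """CPU minutes per execution for each plan, most expensive first."""
    per_exec = [(plan, cpu / execs) for plan, execs, _rows, cpu in plans if execs]
    return sorted(per_exec, key=lambda p: p[1], reverse=True)


for plan, mins in cost_per_execution(plans):
    print(f"{plan:>12} {mins:10.2f} CPU min/exec")
```

Plan 2817686509 costs roughly 284 CPU minutes per execution, against a fraction of a minute for the other two plans.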
• In the result …
• Two different jobs were gathering statistics on a daily basis:
  1. “ANALYZE …” as part of another batch job (developer)
  2. “DBMS_STATS…”, the traditional way (DBA)
• Sometimes “DBMS_STATS…” had not completed before the batch job started (+/- 10 minutes).
• By the time the job was killed (typically 10 minutes after it started), the new “correct” statistics were in place.
• Takeaways …
  A. Don't change your statistics that frequently (they should be consistent)
  B. AWR data helps you spot such issues easily
SQL Plan flipping Example (3) …
• I asked myself: Well!
• If we find that the execution plan of one SQL has changed from a good (fast) one to a bad (slow) one, are other SQLs affected by a similar issue?
• And if there are, how many?
• Would SQL Profiles (baselines, outlines) help address them?
SELECT st2.SQL_ID, st2.PLAN_HASH_VALUE, st_long.PLAN_HASH_VALUE l_PLAN_HASH_VALUE
     , st2.CPU_MINS, st_long.CPU_MINS l_CPU_MINS
     , st2.ELA_MINS, st_long.ELA_MINS l_ELA_MINS
     , st2.EXECUTIONS, st_long.EXECUTIONS l_EXECUTIONS
     , st2.CROWS, st_long.CROWS l_CROWS
     , st2.CPU_MINS_PER_ROW, st_long.CPU_MINS_PER_ROW l_CPU_MINS_PER_ROW
FROM (SELECT st.SQL_ID, st.PLAN_HASH_VALUE
           , SUM(st.EXECUTIONS_DELTA) EXECUTIONS
           , SUM(st.ROWS_PROCESSED_DELTA) CROWS
           , TRUNC(SUM(st.CPU_TIME_DELTA)/1000000/60) CPU_MINS
           , DECODE(SUM(st.ROWS_PROCESSED_DELTA), 0, 0,
                    (SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA)) CPU_MINS_PER_ROW
           , TRUNC(SUM(st.ELAPSED_TIME_DELTA)/1000000/60) ELA_MINS
      FROM DBA_HIST_SQLSTAT st
      WHERE 1 = 1
        AND (st.CPU_TIME_DELTA != 0 OR st.ROWS_PROCESSED_DELTA != 0)
      GROUP BY st.SQL_ID, st.PLAN_HASH_VALUE
     ) st2,                  -- every plan of every SQL_ID
     (SELECT st.SQL_ID, st.PLAN_HASH_VALUE
           , SUM(st.EXECUTIONS_DELTA) EXECUTIONS
           , SUM(st.ROWS_PROCESSED_DELTA) CROWS
           , TRUNC(SUM(st.CPU_TIME_DELTA)/1000000/60) CPU_MINS
           , DECODE(SUM(st.ROWS_PROCESSED_DELTA), 0, 0,
                    (SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA)) CPU_MINS_PER_ROW
           , TRUNC(SUM(st.ELAPSED_TIME_DELTA)/1000000/60) ELA_MINS
      FROM DBA_HIST_SQLSTAT st
      WHERE 1 = 1
        AND (st.CPU_TIME_DELTA != 0 OR st.ROWS_PROCESSED_DELTA != 0)
      GROUP BY st.SQL_ID, st.PLAN_HASH_VALUE
      HAVING TRUNC(SUM(st.CPU_TIME_DELTA)/1000000/60) > 10
     ) st_long               -- plans that used more than 10 CPU minutes
WHERE 1 = 1
  AND st2.SQL_ID = st_long.SQL_ID
  -- the long-running plan costs more than 2x the CPU per row of the other plan
  AND st_long.CPU_MINS_PER_ROW / DECODE(st2.CPU_MINS_PER_ROW, 0, 1, st2.CPU_MINS_PER_ROW) > 2
ORDER BY l_CPU_MINS DESC, st2.SQL_ID, st_long.CPU_MINS DESC, st2.PLAN_HASH_VALUE;
SELECT ...
FROM (SELECT st.SQL_ID, st.PLAN_HASH_VALUE, ...
           , DECODE(SUM(st.ROWS_PROCESSED_DELTA), 0, 0,
                    (SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA)) CPU_MINS_PER_ROW
           , ...
      FROM DBA_HIST_SQLSTAT st
      WHERE 1 = 1 ...
      GROUP BY st.SQL_ID, st.PLAN_HASH_VALUE
     ) st2,
     (SELECT st.SQL_ID, st.PLAN_HASH_VALUE, ...
      GROUP BY st.SQL_ID, st.PLAN_HASH_VALUE
      HAVING TRUNC(SUM(st.CPU_TIME_DELTA)/1000000/60) > 10
     ) st_long
WHERE 1 = 1
  AND st2.SQL_ID = st_long.SQL_ID
  AND st_long.CPU_MINS_PER_ROW / DECODE(st2.CPU_MINS_PER_ROW, 0, 1, st2.CPU_MINS_PER_ROW) > 2
ORDER BY l_CPU_MINS DESC, st2.SQL_ID, st_long.CPU_MINS DESC, st2.PLAN_HASH_VALUE;
SQL_ID        PLAN_HASH_VALUE L_PLAN_HASH_VALUE   CPU_MINS L_CPU_MINS   ELA_MINS L_ELA_MINS EXECUTIONS L_EXECUTIONS      CROWS    L_CROWS CPU_MINS_PER_ROW L_CPU_MINS_PER_ROW
------------- --------------- ----------------- ---------- ---------- ---------- ---------- ---------- ------------ ---------- ---------- ---------------- ------------------
db8yz0rfhvufm      3387634876         619162475         17       2673         21       4074    3106638          193    2121380     131453       8.3979E-06              .0203
5ppdcygtcw7p6       436796090        2817686509          1       2557          3       2765         20            9      82733      32278       .000016381               .079
5ppdcygtcw7p6       863350916        2817686509          5       2557         11       2765         71            9     478268      32278       .000011503               .079
1tab7mjut8j9h       875484785         911605088          9       2112         23       2284        980         1436        808        606       .011678951              3.485
1tab7mjut8j9h      2484900321         911605088          6       2112          6       2284       1912         1436       1516        606       .003998529              3.485
1tab7mjut8j9h      3141038411         911605088         50       2112         57       2284      32117         1436      26048        606        .00195365              3.485
gpj32cqd0qy9a      1700210966        2667660534          1       1501          7       1642         36            4      69973       1489       .000022548               1.00
gpj32cqd0qy9a      3094138997        2667660534          1       1501          3       1642         30            4      58400       1489       .000025147               1.00
2tf4p2anpwpk2       825403357        1679851684          6        824         71        913         17           13      21558        253       .000314321              3.260
csvwu3kqu43j4      3860135778        2851322291          0        784          0        874          1            2       1546        204       .000165255              3.846
0q9hpmtk8c1hf      3860135778        2851322291          0        779          0        867          1            2       4075        479       .000068144              1.627
2frwhbxvg1j69      3860135778        2851322291          0        776          0        865          1            2       1950        492       .000139004              1.578
4nzsxm3d9rspt      3860135778        2851322291          0        754          0        846          1            2       1901       3445       .000074031              .2190
1pc2npdb1kbp6         9772089        2800812079          0        511          0       3000          7          695        383      14021       .000007121              .0364
gpj32cqd0qy9a      1700210966        1168845432          1        482          7        554         36            2      69973        441       .000022548              1.093
gpj32cqd0qy9a      3094138997        1168845432          1        482          3        554         30            2      58400        441       .000025147              1.093
...
4bcx6kbbrg6bv      3781789023        2248191382          0         11          0         41          2            2         34         34       .000797865              .3298
6wh3untj05apd      3457450300        3233890669          0         11          0        131          1           20         21        393       .003310346              .0285
6wh3untj05apd      3477405755        3233890669          0         11          1        131          2           20         17        393       .007030779              .0285
8pzsjt5p64xfu      3998876049        3667423051          0         11          5         44          3           18         12         72       .065142643              .1566
bpfzx2hxf5x7f      1890295626         774548604          0         11          0         26          1           24     488580   11279752       .000000477              .0000
g67nkxd2nqqqd      1308088852        4202046543          0         11          1         57          1           49         32        393       .011233152              .0282
g67nkxd2nqqqd      1308088852        1991738870          0         11          1         39          1           38         32        414       .011233152              .0269
g67nkxd2nqqqd      2154937993        1991738870          1         11         27         39         72           38        371        414       .003401926              .0269
g67nkxd2nqqqd      2154937993        4202046543          1         11         27         57         72           49        371        393       .003401926              .0282

92 rows selected.
Elapsed: 00:00:02.53
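The self-join can be restated procedurally. For each SQL_ID, take every plan with more than 10 CPU minutes (st_long) and compare its CPU per row with each plan of the same SQL_ID (st2). The Python sketch below follows that reading and is fed with the 5ppdcygtcw7p6 rows from the output. CPU minutes there are truncated, so the ratios are approximate. The function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def find_plan_flips(plan_stats, min_long_cpu_mins=10, ratio=2.0):
    """Mirror the st2 / st_long self-join.

    plan_stats: [(sql_id, plan_hash_value, cpu_mins, rows_processed)], one row per
    SQL_ID and plan (what the inline views aggregate). Returns
    (sql_id, plan, slow_plan, slow_plan_cpu_mins, per_row_ratio), slowest first.
    """
    by_sql = defaultdict(list)
    for sql_id, plan, cpu_mins, rows in plan_stats:
        per_row = cpu_mins / rows if rows else 0.0  # DECODE(rows, 0, 0, cpu/rows)
        by_sql[sql_id].append((plan, cpu_mins, per_row))

    flips = []
    for sql_id, plans in by_sql.items():
        for slow_plan, slow_cpu, slow_per_row in plans:
            if math.trunc(slow_cpu) <= min_long_cpu_mins:
                continue  # HAVING TRUNC(CPU minutes) > 10 keeps only "long" plans
            for plan, _cpu, per_row in plans:
                denominator = per_row if per_row != 0 else 1.0  # DECODE(x, 0, 1, x)
                r = slow_per_row / denominator
                if r > ratio:
                    flips.append((sql_id, plan, slow_plan, slow_cpu, r))
    flips.sort(key=lambda f: (-f[3], f[0], f[1]))
    return flips


# Per-plan totals for 5ppdcygtcw7p6 from the output above (CPU minutes truncated)
stats = [
    ("5ppdcygtcw7p6", 436796090, 1, 82733),
    ("5ppdcygtcw7p6", 863350916, 5, 478268),
    ("5ppdcygtcw7p6", 2817686509, 2557, 32278),
]
for flip in find_plan_flips(stats):
    print(flip[:4], round(flip[4]))
```

The sketch also makes one property of the DECODE(rows, 0, 0, …) guard visible. A plan that processed no rows gets a per-row cost of 0, so it can never appear as the slow side.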
• In the result …
  • Load on the system was reduced five-fold
• Takeaways …
  A. SQL plans may flip from good plans to …
  B. SQL Outlines/Profiles may help sometimes
  C. AWR provides good input for such analysis
• Why may SQL plans flip?
  1. Bind variable peeking / adaptive cursor sharing
  2. Statistics changes (including differences in partition statistics)
  3. Adding/removing indexes
  4. Session/system init.ora parameters (nls_sort/optimizer_mode)
  5. Dynamic statistics gathering (sampling)
  6. Profiles/Outlines/Baselines evolution
Conclusions …
• AWR = DBA_HIST% views (snapshots of V$% views)
• Sometimes it is the only source of information
• AWR contains much more information than the default AWR reports and Grid Control can provide
• Be careful when mining the data (there are some gotchas)
• Don't be afraid to discover/mine the AWR data
I can show you the door … but it is you who should walk through it
Thank you!