Database Monitoring With
description
Transcript of Database Monitoring With
Database Monitoring With
Tom Bascom
President, Greenfield Technologies
2
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
3
Why Do You Need A Monitor?
Baselining Benchmarking Interactive troubleshooting Capacity management Resource Optimization
4
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
5
Monitoring Alternatives
SAR, vmstat, iostat Glance, TOPAS, Navisphere, Measureware,
PerfMon … TOP, NMON
PROMON Fathom ProMonitor ProTop!
6
Progress Focused Interactive, Real-Time Sample Oriented Multi-platform VST Based 4GL Code Open Source Free!
7
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
8
What Are VSTs?
Virtual System Tables A 4GL View of Progress Data Structures (the
same as those shown in PROMON.) No Performance Impact (unless you do some
really dumb things!) Primarily Read-Only Not Terribly “User Friendly” Quirky at times…
9
Some VST Quirks Updateable:
_startup._spin Private buffers APW settings
Table & Index Ranges: -tablebase, -tablerangesize -indexbase, -indexrangesize Table & Index Window can be reset!
Quirky Keys: _myconnection… _tablestat & _indexstat
10
User Number/Id VST Confusion…
find _myconnection no-lock.
find _connect no-lock where _connect-usr = _myconn-userid.
display _connect-usr _connect-id _myconn-userid.
find _userio no-lock where _userio-usr = _connect-usr.
display _userio-id _userio-usr.
User-Id _Connect-Id MyConn-UserId _UserIO-Id Usr
======= =========== ============= =========== ===========
253 254 253 254 253
11
Table Stats/** This does NOT work if –tablebase <> 1!!!
find _File no-lock where _File._File-num = p_tbl.
find _TableStat no-lock where _TableStat-id = p_tbl.
display p_tbl _file-num _TableStat-id.
**/
/*** instead, use the following: ***/
find _TableStat no-lock where _TableStat-id = p_tbl.
find _File no-lock where _File._File-num = _TableStat-id.
display p_tbl _file-num _TableStat-id.
12
Index Name
find _IndexStat no-lock where _IndexStat-id = p_idx.
find _Index no-lock where _Index._Idx-num = _IndexStat-id.
find _File where recid( _File ) = _Index._File-recid.
tt_index.idxnote = _File._File-name + “.” +
_Index._Index-name +
( if _file._prime-index = recid(_index)
then “ P" else “ " ) +
( if _index._unique then "U" else "" )
13
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
14
A Monitoring Architecture
VST Based Multi-Platform
UNIX Character HTML Windows GUI
Using Publish & Subscribe More than just a VST Browser! Customizable!
15
A Monitoring Architecture
MonitoredDatabase
/* ProTop.p */
Define….
Initialize…
Loop: Publish... Wait-For…End.
Support Library
Loadable Module
Loadable Module
Loadable Module
Output Module
Output Module
Temp Tables
LoggingDatabase(optional)
16
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
17
Customizing And Extending The Code
Events That A Module Handles Structure Of A Module Defining the Display Maintaining State Adding Help Making A Module Available
18
Events That A Module Handles
Mon-Restart Empty Temp-Table Remove self from memory
Mon-Init Empty Temp-Table Define Display Data Elements
Mon-Update Refresh Data Calculate intervals, rates and so forth Update UI Temp-Table with results
19
Structure Of A Module{lib/protop.i}def var support as character no-undo initial “Resources”.{lib/tt_xstat.i}
procedure mon-restart: empty temp-table tt_xstat. delete procedure this-procedure.end.
procedure mon-init: empty temp-table tt_xstat. /* define labels */end.
procedure mon-update: /* the real work */end.
subscribe to “mon-restart” anywhere run-procedure “mon-restart”.subscribe to “mon-init” anywhere run-procedure “mon-init”.subscribe to “mon-update” anywhere run-procedure “mon-update”.publish “register-disp-type” ( input support ).
20
Defining the Display
ui-define-label( support, 1, 1, "xid", " Id" ).
ui-define-label( support, 1, 2, "xname", "Resource " ).
ui-define-label( support, 1, 5, "stat1", " Locks" ).
ui-define-label( support, 1, 6, "stat2", " Waits" ).
ui-define-label( support, 1, 8, "stat-ratio", " Lock%" ).
ui-define-label(
support, /* display type */
1, /* variant */
8, /* order */
"stat-ratio",/* data element name */
" Lock%“ /* label value */
).
21
Maintaining State
define temp-table tt_xstat no-undo field xid as integer field xvalid as logical field xname as character field misc1 as character field misc2 as character field stat1 as integer extent 5 field stat2 as integer extent 5 field stat3 as integer extent 5 field stat-ratio as decimal index xid-idx is unique primary xid.
22
Sample, Summary, Rate & Raw Data•BaseValue
•LastValue
•ThisValue
•SampleTime
•SummaryTime
SampleRate = (ThisValue – LastValue) / SampleTime.
SummaryRate = (ThisValue – BaseValue) / SummaryTime.
SampleRaw = (ThisValue – LastValue) / 1.
SummaryRaw = (ThisValue – BaseValue) / 1.
23
Updating Datafor each dictdb._Resrc no-lock: run update_xstat ( input _Resrc-Id, input _Resrc-name, input "", input "", input _Resrc-lock, input _Resrc-wait, input 0 ).end.
ui-det(support, 1, i, 1, "xid", string(tt_xstat.xid, ">>9")).ui-det(support, 1, i, 2, "xname", string(tt_xstat.xname, "x(20)")).ui-det(support, 1, i, 5, "stat1", string((tt_xstat.stat1[x]/z), ">>>>>>>>>9")).ui-det(support, 1, i, 6, "stat2", string((tt_xstat.stat2[x]/z), ">>>>>>>>>9")).ui-det(support, 1, i, 8, "stat-ratio", string(tt_xstat.stat-ratio, ">>9.99%")).
24
Adding Help Help files are in the “hlp” directory. File name is value(“hlp/” + support + “.hlp”) Title the screen. Provide an overview of the screen. Try to explain
why the metrics are important and how they are related to other metrics.
Define each label and give some insight into its meaning.
Provide explanations of any codes that might appear under a label.
25
FileIO.hlpIO Operations to Database Extents
Id: The extent id number.
Extent Name: The file name of the extent.
Mode: The "mode" in which the file is opened. Possible values are:
BUFIO The extent is opened for buffered IO.
UNBUFIO The extent is opened for un-buffered IO.
BOTHIO The extent is opened for both buffered and un-buffered IO. Variable extents are opened with BOTHIO (there are two file descriptors unless you're using -directio.)
BlkSz: The Block size for the extent. This potentially varies between data, before-image and after-image extents. Values are expressed in bytes.
26
Making A Module Available Drop it into the “mon/” directory.
“mon/mymetric.p”
If it is OS specific use the “os/” directory “os/AIX/df.p” “os/Linux/netstat.p”
Send me a copy so that I can include it in the base distribution!
27
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
28
29
Basic Capabilities
Summary Data Blocked Clients & Open Transactions Table & Index Activity User Activity Estimating Big B Latches & Resources Storage Area Capacity Balancing IO Clients & Servers
30
Summary Data
11:32:52 ProTop xvi -- Progress Database Monitor 07/05/05
Sample sports [/db/sports] Rate
Hit Ratio: 182:1 195:1 Commits: 149 195 Sessions: 2057
Miss% : 0.549% 0.512% Latch Waits: 13 16 Local: 953
Hit% : 99.45% 99.48% Tot/Mod Bufs: 60002 3167 Remote: 956
Log Reads: 76342 80927 Evict Bufs: 2 1 Batch: 1045
OS Reads: 419 414 Lock Table: 1516 3 Server: 97
Rec Reads: 23789 23619 LkHWM|OldTrx: 1392 00:00 Other: 51
Log/Rec: 3.2091 3.4264 Old/Curr BI: 54 54 TRX: 26
Area Full: 1 98.60% After Image: Disabled Blocked: 0
31
BI Clustersfor each _Trans no-lock where _Trans-usrnum <> ?:
if _Trans-counter <> ? and _Trans-counter > 0 then
do:
if oldbi = 0 or _Trans-counter < oldbi then
oldbi = _Trans-counter.
currbi = max( currbi, _Trans-counter ).
end.
end.
find _BuffStatus no-lock.
currbi = _BfStatus-LastCkpNum.
if oldbi = 0 then oldbi = currbi. /* if no TRX is active… */
32
Blocked Sessions
Blocked Sessions
Usr Name Waiting Note
--- -------- -------- ----------------------------------
24 tom 00:00:32 REC XQH 102 [Order] julia, peter
22 tucker 00:00:02 REC XQH 201 [Cust] astro, tiger
321 julia 00:00:00 BKSH:83524928:
33
Locked Recordsfor each _Lock no-lock while _Lock-usr <> ?:
if _Lock-recid = _Connect-wait1 then
do:
find _file where _file._file-num = _Lock-table.
bxtbl = _file._file-name.
end.
if _Lock-usr = _Connect-usr then
bxwait = bxwait + “ “ + _Lock-flags.
else
bxque = bxque + " " + _Lock-name.
end.
bxnote = bxtbl + bxwait + bxque.
34
Open Transactions
Open Transactions
Usr Name TRX Num BI Clstr Start Trx Stat Duration Wait
---- ----- -------- -------- -------- -------- -------- ----------
9 tom 2432897 1024 15:39:05 ACTIVE 00:00:01 -- 29440
20 jami 2432896 - ALLOCATE 00:00:00 -- 20115
5 emily 2432898 1024 15:39:06 ACTIVE 00:00:00 -- 21952
7 peter 2432899 1024 15:39:06 ACTIVE 00:00:00 -- 19040
23 julia 2418661 - ALLOCATE 00:00:00 -- 0
22 astro 2417938 - ALLOCATE 00:00:00 -- 0
35
Table Activity
Table Statistics
Tbl# Table Name Create Read Update Delete
---- ---------------- --------- --------- --------- ---------
4 OrderLine 1 28715 11 1
18 Order 0 2384 1 0
24 POLine 0 848 1 0
23 PurchaseOrder 0 627 40 0
21 Bin 0 216 0 0
2 Customer 18 175 20 20
1 Invoice 1 148 3 0
36
Index Activity
Index Statistics
Idx# Index Name Create Read Split Delete BlkDel
---- --------------- -- ------ ------ ------ ------ ------
904 usage 14 31597 0 13 0
78 journal P 0 21011 0 0 0
435 keyindex 0 7376 0 0 0
388 icest PU 0 1995 0 0 0
1251 keyindex 0 1991 0 0 0
1247 warehs U 0 945 0 0 0
900 stuff PU 1 783 0 1 0
37
User IO Activity
UIO
Usr Name Flags PID DB Access OS Rd OS Wr Hit%
---- ------- ----- ------ --------- ----- ----- -------
13 tom SB 13590 2266 200 1 91.13%
10 jami SB 13584 190 6 1 97.10%
16 julia SB 13596 185 6 1 97.03%
17 peter SB 13598 181 5 1 97.07%
15 emily SB 13594 177 5 1 97.12%
11 tiger SB* 13586 166 4 0 97.58%
14 tucker SB 13592 159 5 1 97.10%
19 granite SB 13602 146 1 0 99.25%
7 astro SB 13578 145 4 1 97.16%
38
Estimating Big B
Big B GuessTimator
Pct Big B % db Size Hit:1 Miss% Hit% OS Rd
----- --------- --------- ----- ------ ------- -----
10% 6000 0.124% 30 3.306% 96.694% 1343
25% 15001 0.311% 48 2.091% 97.909% 849
50% 30001 0.622% 68 1.479% 98.521% 601
100% 60002 1.243% 96 1.046% 98.954% 425 <=
150% 90003 1.865% 117 0.854% 99.146% 347
200% 120004 2.486% 135 0.739% 99.261% 300
400% 240008 4.973% 191 0.523% 99.477% 213
39
Big B
http://www.peg.com/lists/dba/history/200301/msg00509.html
MissPct = 100 * ( 1 – ( LogRd – OSRd ) / LogRd )).
HitPct = 100 – MissPct.
OSRd = LogRd * ( MissPct / 100 ).
m2 = m1 * exp(( b1 / b2 ), 0.5 ).
40
Resource Waits
Resource Waits
Id Resource Locks Waits Lock%
--- -------------------- ---------- ---------- -------
10 DB Buf S Lock 2661 0 100.00%
6 Record Get 658 0 100.00%
7 DB Buf Read 40 0 100.00%
2 Record Lock 21 0 100.00%
11 DB Buf X Lock 11 0 100.00%
19 TXE Share Lock 11 0 100.00%
8 DB Buf Write 3 0 100.00%
21 TXE Commit Lock 2 0 100.00%
1 Shared Memory 0 0 0.00%
3 Schema Lock 0 0 0.00%
41
Latch Waits
Latch Waits
Id Latch Requests Waits Lock%
--- -------------------- ---------- ---------- -------
28 MTL_BF4 5540 33 99.40%
17 MTL_BHT 4205 106 97.49%
21 MTL_LRU 4154 55 98.68%
10 MTL_LHT 1800 24 98.65%
15 MTL_LKF 1798 0 100.00%
26 MTL_BF2 1218 6 99.48%
27 MTL_BF3 1184 10 99.13%
25 MTL_BF1 1150 10 99.16%
4 MTL_OM 913 4 99.60%
42
Storage Area Capacity
Area StatisticsA# Area Name Alloc Var Hi Water Free %Used Note-- ------------ ------- ----- -------- ------ ------- -------68 order_idx 16 1998 1927 87 12044% i(3)67 order 256 14670 14860 66 5805% t(1) 6 Schema Area 256 1454 1391 319 543% i(25) * 3 BI Area 32000 13070 45056 14 141%13 customer 512000 55565 567515 50 111% t(15)92 After Image 0 5199 5191 8 100% Busy49 order-line 32000 2 25164 6838 79% t(1)61 inventory 128000 2 94897 33105 74% t(1)55 discount 1024000 0 755885 268114 74% t(1)57 employee 2048000 0 1442919 605076 70% t(1)
43
Storage Area Capacity
for each _AreaStatus no-lock, _Area no-lock where
_Area._Area-num = _AreaStatus._AreaStatus-Areanum:
bfree = _AreaStatus-Totblocks - _AreaStatus-Hiwater.
if ( _AreaStatus-Freenum <> ? ) then
bfree = bfree + _AreaStatus-Freenum.
if bfree = ? then bfree = _AreaStatus-totblocks.
used = (( _AreaStatus-totblocks - bfree) /
_AreaStatus-totblocks ) * 100.
end.
44
Storage Area Contentsfor each _storageobject no-lock where
_storageobject._area-number = xid and
_storageobject._object-num > 0 and
_storageobject._object-associate > 0:
if _storageobject._object-type = 1 then
so_tbl = so_tbl + 1.
else if _storageobject._object-type = 2 then
so_idx = so_idx + 1.
end.
/* ianum = initial area number… */
45
Balancing IODatabase File IO Id Ext Name Mode Blksz Size Read Wrt Ext---- ---------- ------- ----- ------- ----- --- --- 63 s2k_29.d1 F UNBUF 8192 2048000 11828 0 0 64 s2k_29.d2 F UNBUF 8192 2048000 7790 0 0 124 s2k_55.d2 F UNBUF 8192 2048000 432 0 0 125 s2k_55.d3 F UNBUF 8192 2048000 367 8 0 123 s2k_55.d1 F UNBUF 8192 2048000 220 0 0 67 s2k_30.d1 F UNBUF 8192 2048000 106 0 0 57 s2k_26.d1 F UNBUF 8192 1024000 26 2 0 128 s2k_56.d1 F UNBUF 8192 2048000 19 1 0 135 s2k_57.d6 F UNBUF 8192 2048000 12 0 0 140 s2k_58.d2 F UNBUF 8192 1024000 11 1 0 121 s2k_54.d1 F UNBUF 8192 256000 7 0 0 139 s2k_58.d1 F UNBUF 8192 1024000 6 0 0 134 s2k_57.d5 F UNBUF 8192 2048000 5 0 0 69 s2k_31.d1 F UNBUF 8192 128000 4 0 0 73 s2k_33.d1 F UNBUF 8192 128000 3 0 0 3 s2k.b2 V UNBUF 16384 0 0 0 0
46
Servers and ClientsServers
Srv Type Port Con Max MRecv MSent RRecv RSent QSent Slice
--- ----- ----- --- --- ----- ----- ----- ----- ----- -----
1 Login 7150 0 1 0 0 0 0 0 0
2 Auto 1026 10 55 0 0 0 0 0 0
3 Auto 1027 10 55 23 13 0 6 10 86
Server IO
Srv Type Port Con Max DB Access OS Rd OS Wr Hit%
--- ----- ---- --- --- --------- ----- ------ -------
19 Auto 1043 10 55 5041 2 0 99.96%
20 Auto 1044 10 55 1348 1 0 99.96%
18 Auto 1042 10 55 157 1 0 99.51%
16 Auto 1040 10 55 42 1 0 98.70%
47
Agenda
Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features
48
Drill Down
User Details
Usr#: 23 Name: tom PID: 18570 Device: /dev/pts/3
Transaction: Jul 7 15:20:36 2005 ACTIVE 00:00:45 REC 5892
Blocked On: REC XQH 5892 [Customer] peter
User 23's Other Sessions
Usr Name Flags PID DB Access OS Rd OS Wr Hit%
---- ------- ----- ------ ---------- ------- ------ -------
23 tom S * 18570 9 2 0 81.61%
0 tom O 18017 0 0 0 0.00%
22 tom S 18542 8534 134 15 98.43%
24 tom S 18576 3964 64 31 98.38%
49
ProTop Alerts
50
Alerts & Alarms
# $PROTOP/etc/alert.cfg
#
# Metric Type ? Target Message Action
# ========= ==== == ====== =========== ====================
LogRd num > 100000 "&1 &2 &3" alert-log
OSRd num > 500 "&1 &2 &3" alert-log
BufFlsh num > 0 "&1 &2 &3" alert-log,alert-mail
Trx num > 200 "&1 &2 &3" alert-log,alert-mail
LatchTMO num > 200 "&1 &2 &3" alert-log,alert-mail
ResrcWt num > 200 "&1 &2 &3" alert-log,alert-mail
51
Summary
Reasons to monitor. Some tools that are available for monitoring. How Progress VSTs work. An architecture for monitoring. How to modify and extend ProTop. What ProTop can do for you “out of the box”. What is “under the covers” of ProTop. How to use VSTs more effectively.
52
Questions