Deep review of LMS process


Description: A review of the LMS process in an Oracle RAC database.


Page 1: Deep review of LMS process

©OraInternals Riyaj Shamsudeen

RAC Hack: Deep review of LMS/LGWR process

By Riyaj Shamsudeen

Page 2: Deep review of LMS process


LMS Processing (over simplified)

[Flow diagram, Node 1 → Node 2 → Node 1:]

1. A user session on Node 1 sends a GC message, which travels through the OS/network stack to Node 2.
2. LMSx on Node 2 receives the message and builds the CR or current (CUR) block.
3. If needed, LMSx sends a message to LGWR; LGWR wakes up, processes the log buffer, writes the log file, and signals LMSx.
4. LMSx wakes up and sends the block back through the OS/network stack.
5. The block is copied into the SGA on Node 1 and user session processing continues.

Page 3: Deep review of LMS process


GC CR latency

GC CR latency ~=
    Time spent sending the message to LMS +
    LMS processing (building the block, etc.) +
    LGWR latency (if any) +
    LMS send time +
    Wire latency

The middle components (LMS processing, LGWR latency, and LMS send time) are processing in the remote node.

Averages can be misleading. Always review both the total time and the average to understand the issue.
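As a quick, instance-level cross-check of the receive latency, the sketch below (not one of the author's demo scripts) derives the average gc cr block receive time from cumulative gv$sysstat counters; 'gc cr block receive time' is reported in centiseconds, hence the multiplication by 10 to get milliseconds.

select rt.inst_id,
       rt.value                                      receive_time_cs,
       rc.value                                      blocks_received,
       round(rt.value * 10 / nullif(rc.value, 0), 2) avg_receive_ms
  from gv$sysstat rt, gv$sysstat rc
 where rt.name    = 'gc cr block receive time'
   and rc.name    = 'gc cr blocks received'
   and rt.inst_id = rc.inst_id
 order by rt.inst_id;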

Page 4: Deep review of LMS process


LMS process – A deep dive

The LMS process uses the pollsys system call to listen for incoming packets, with a 10 ms timeout.

  Sockets are file descriptors in UNIX.

truss -d -E -v all -p 1485 |more

1.8531 0.0000 pollsys(0xFFFFFD7FFFDFBA70, 7, 0xFFFFFD7FFFDFBA20, 0x00000000) = 0

fd=36 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=29 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=33 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=41 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=42 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=39 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

fd=40 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0

timeout: 0.010000000 sec

1.8635 0.0000 pollsys(0xFFFFFD7FFFDFBA70, 7, 0xFFFFFD7FFFDFBA20, 0x00000000) = 0

Timeout 10ms

Page 5: Deep review of LMS process


LMS sockets

pfiles shows that these file descriptors are sockets; essentially, the LMS process is sending and receiving messages on these ports.

pfiles 1845

36: S_IFSOCK mode:0666 dev:298,0 ino:10517 uid:0 gid:0 size:0

O_RDWR|O_NONBLOCK FD_CLOEXEC

SOCK_DGRAM

SO_SNDBUF(57344),SO_RCVBUF(57344),IP_NEXTHOP(0.224.0.0)

sockname: AF_INET 127.0.0.1 port: 33320

29: S_IFSOCK mode:0666 dev:298,0 ino:41400 uid:0 gid:0 size:0

O_RDWR|O_NONBLOCK FD_CLOEXEC

SOCK_DGRAM

SO_SNDBUF(262144),SO_RCVBUF(131072),IP_NEXTHOP(0.0.2.0)

sockname: AF_INET 169.254.106.96 port: 33318

33: S_IFSOCK mode:0666 dev:298,0 ino:10518 uid:0 gid:0 size:0

O_RDWR|O_NONBLOCK FD_CLOEXEC

SOCK_DGRAM

SO_SNDBUF(262144),SO_RCVBUF(131072),IP_NEXTHOP(0.0.2.0)

sockname: AF_INET 169.254.201.54 port: 33319

Demo: demo_lms_truss.ksh demo_lms_pfiles.ksh

Page 6: Deep review of LMS process


LMS CPU usage

Just because the LMS process runs in the RT class does not mean that it is consuming CPU all the time.

#./trace_syscall_preempt_size.sh 1485

0 => pollsys timestamp : 45075622139230

0 | swtch:pswitch oracle sysinfo: timestamp : 45075622155047

0 | swtch:pswitch Vol context switch : 45075622155697 pswitch genunix`cv_timedwait_sig_hires+0x2ab

0 | resume:off-cpu On cpu 0 for: 92460

0 | resume:on-cpu Off cpu for: 10242512

0 <= pollsys timestamp : 45075632406018 elapsed : 10266788

Demo: as root trace_syscall_preempt_size.sh

The pollsys call voluntarily releases the CPU until a new packet arrives on a port or the timeout expires.

LMS uses very little CPU if there is no work to be done: with no work, just 92 microseconds of CPU were used in a 10,242-microsecond window.

Page 7: Deep review of LMS process


LMS – early wakeup

The kernel will schedule the LMS process when a network packet arrives on one of those ports.

#./trace_syscall_preempt_size.sh 1485

0 => pollsys timestamp : 45075592390763

0 | swtch:pswitch oracle sysinfo: timestamp : 45075592402281

0 | swtch:pswitch Vol context switch : 45075592402933 pswitch genunix`cv_timedwait_sig_hires+0x2ab

0 | resume:off-cpu On cpu 0 for: 55099

0 | resume:on-cpu Off cpu for: 29660449

0 <= pollsys timestamp : 45075622072662 elapsed : 29681899

Demo: as root trace_syscall_preempt_size.sh

The LMS process was woken up in 29 microseconds.

Page 8: Deep review of LMS process


LMS count

Even in busy environments, I have seen LMS busy only about 50% of the time.

To schedule a process, the CPU scheduler must load the CPU registers, refill the instruction pipeline, etc.; this is a costly operation.

If you have many LMS processes, the workload will be distributed among them. Due to their RT priority, they will keep moving in and out of the CPU.

In a multi-processor environment, this becomes even more complicated.

Version 11.2 uses much more meaningful default values for the LMS count (the gcs_server_processes parameter).

Page 9: Deep review of LMS process


LMS – prstat

In Solaris, another way to check the efficiency of the LMS process is through prstat microstate accounting.

# prstat -mL -p 18243

PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID

18243 prod 6.4 5.9 0.0 0.0 0.0 0.0 88 0.0 2K 0 30K 187 oracle/1

Demo: prstat command

prstat microstate accounting breaks the process time down into percentages. In this case, the breakdown of LMS time is:

6.4% in USR mode
5.9% in SYS mode
0% CPU latency (LAT)
88% sleep (SLP)

with 2K voluntary context switches (VCX) and 0 involuntary context switches (ICX).

Page 10: Deep review of LMS process


LMS – session

LMS session-level statistics can be used to measure the workload distribution among LMS processes. Of course, these statistics are cumulative from instance startup.

@lms_workload_perc

INST_ID PGM NAME VAL PROC_TO_INST PROC_TO_TOT INST_TO_TOT

---------- ------- ------------------------------ ---------- ------------ ----------- -----------

1 (LMS0) gc cr blocks served 62960382 15 3 25

1 (LMS1) gc cr blocks served 58701920 13 3 25

1 (LMS2) gc cr blocks served 57757849 13 3 25

...

2 (LMS5) gc cr blocks served 44476702 14 2 18

2 (LMS6) gc cr blocks served 42312824 13 2 18

...

3 (LMS0) gc cr blocks served 38465965 14 2 15

3 (LMS1) gc cr blocks served 37541589 14 2 15

...

4 (LMS6) gc cr blocks served 95517242 14 5 40

4 (LMS5) gc cr blocks served 94879180 13 5 40

...

Demo: lms_workload_distr.sql – to measure workload from the instance start

The LMS processes in Node 4 are busy serving CR blocks. Why?
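lms_workload_distr.sql is not shown here; a minimal sketch of the underlying idea, cumulative 'gc cr blocks served' per LMS process since instance startup (without the percentage columns of the real script), could look like this:

select st.inst_id,
       se.program,
       st.value  gc_cr_blocks_served
  from gv$session se, gv$sesstat st, gv$statname sn
 where sn.name       = 'gc cr blocks served'
   and sn.inst_id    = st.inst_id
   and sn.statistic# = st.statistic#
   and st.inst_id    = se.inst_id
   and st.sid        = se.sid
   and se.program like '%LMS%'
 order by st.inst_id, se.program;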

Page 11: Deep review of LMS process


LMS - workload

In a few cases, it is prudent to measure the current rate instead of relying upon the cumulative rate since instance startup.

@gc_lms_workload_distr_diff.sql

Enter value for search_string: gc cr blocks served

Enter value for sleep: 60

---------|--------------|----------------|----------|---------------|---------------|-------------|

Inst | Pgm | value | totvalue | instvalue | proc2inst |inst2total |

---------|--------------|----------------|----------|---------------|---------------|-------------|

1 | (LMS0)| 348| 16931| 2993| 11| 2|

1 | (LMS1)| 335| 16931| 2993| 11| 1|

...

2 | (LMS0)| 359| 16931| 2660| 13| 2|

2 | (LMS1)| 375| 16931| 2660| 14| 2|

...

3 | (LMS0)| 132| 16931| 1231| 10| 0|

3 | (LMS1)| 194| 16931| 1231| 15| 1|

...

4 | (LMS0)| 1164| 16931| 10047| 11| 6|

4 | (LMS1)| 1784| 16931| 10047| 17| 10|

---------|--------------|----------------|----------|---------------|---------------|-------------|

Demo: gc_lms_workload_distr_diff.sql

The LMS processes in Node 4 are busy here too.
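The real gc_lms_workload_distr_diff.sql is not shown here. Below is a minimal sketch of the same idea (snapshot the statistic, sleep, snapshot again, report the delta), assuming execute privilege on dbms_lock, a hard-coded 60-second interval, and the 'gc cr blocks served' statistic; it prints raw deltas rather than the percentage columns of the real script.

set serveroutput on
declare
  type num_tab is table of number index by varchar2(80);
  l_before num_tab;
  l_key    varchar2(80);
begin
  -- first snapshot: cumulative 'gc cr blocks served' per LMS process
  for r in (select st.inst_id, se.program, st.value
              from gv$session se, gv$sesstat st, gv$statname sn
             where sn.name       = 'gc cr blocks served'
               and sn.inst_id    = st.inst_id
               and sn.statistic# = st.statistic#
               and st.inst_id    = se.inst_id
               and st.sid        = se.sid
               and se.program like '%LMS%') loop
    l_before(r.inst_id || ' ' || r.program) := r.value;
  end loop;

  dbms_lock.sleep(60);   -- sampling interval in seconds (arbitrary choice)

  -- second snapshot: print the delta, i.e. blocks served during the interval
  for r in (select st.inst_id, se.program, st.value
              from gv$session se, gv$sesstat st, gv$statname sn
             where sn.name       = 'gc cr blocks served'
               and sn.inst_id    = st.inst_id
               and sn.statistic# = st.statistic#
               and st.inst_id    = se.inst_id
               and st.sid        = se.sid
               and se.program like '%LMS%') loop
    l_key := r.inst_id || ' ' || r.program;
    if l_before.exists(l_key) then
      dbms_output.put_line(rpad(l_key, 60) || to_char(r.value - l_before(l_key)));
    end if;
  end loop;
end;
/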

Page 12: Deep review of LMS process


LMS – applying undo

The LMS process applies undo records to construct the CR buffer it sends.

Demo: get_sesstat_sid.sql

474K undo records were applied to create CR blocks.

The following session-level statistics can be reviewed to see how many undo records have been applied:

@get_sesstat_sid

Enter the wildcard character (Null=All):undo

Enter value threshold :1

Enter sid :11000

NAME VALUE

---------------------------------------------------------------- ----------

transaction tables consistent reads - undo records applied 13180

data blocks consistent reads - undo records applied 474036
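get_sesstat_sid.sql itself is not shown; a minimal sketch of the kind of query it presumably runs (using the LMS SID 11000 from the demo above) is:

select sn.name, st.value
  from v$sesstat st, v$statname sn
 where sn.statistic# = st.statistic#
   and st.sid  = 11000                 -- LMS0 sid from the demo above
   and sn.name like '%undo records applied%'
   and st.value > 0
 order by st.value desc;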

Page 13: Deep review of LMS process


LMS – snapper.sql

Tanel's ultra-cool snapper is useful for finding the rate of a few statistics for an LMS session.

Demo: snapper.sql

@session_snapper out,gather=stw 15 4 11000

-- Session Snapper v2.01 by Tanel Poder ( http://www.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------

SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH

----------------------------------------------------------------------------------------------------------------------

11000, (LMS0) , STAT, cleanouts and rollbacks - consistent rea, 633, 42.2,

11000, (LMS0) , STAT, immediate (CR) block cleanout applicatio, 689, 45.93,

11000, (LMS0) , STAT, commit txn count during cleanout , 56, 3.73,

11000, (LMS0) , STAT, active txn count during cleanout , 633, 42.2,

11000, (LMS0) , STAT, cleanout - number of ktugct calls , 690, 46,

11000, (LMS0) , TIME, background cpu time , 696907, 46.46ms, 4.6%, |@ |

11000, (LMS0) , TIME, background elapsed time , 696907, 46.46ms, 4.6%, |@ |

11000, (LMS0) , WAIT, gcs remote message , 13987778, 932.52ms, 93.3%, |@@@@@@@@@@|

11000, (LMS0) , WAIT, events in waitclass Other , 150638, 10.04ms, 1.0%, |@ |

-- End of snap 2, end=2011-06-28 22:28:01, seconds=15

Page 14: Deep review of LMS process


GC TX/RX %

You can find the percentages of GC blocks transmitted (TX) and received (RX) using the gc_traffic_print.sql script too. These percentages are at the instance level.

Demo: gc_traffic_print.sql

@gc_traffic_print.sql

---------|--------------|---------|----------------|---------|---------------|---------|-------------|---------|

Inst | CR blocks Rx | CR Rx% | CUR blocks Rx | CUR RX %| CR blocks Tx | CR TX % | CUR blks TX | CUR TX% |

---------|--------------|---------|----------------|---------|---------------|---------|-------------|---------|

1 | 283| 3.6| 950| 27.12| 214| 3.33| 665| 16.8|

2 | 7185| 91.47| 1327| 37.89| 256| 3.98| 1117| 28.22|

3 | 119| 1.51| 886| 25.29| 5798| 90.22| 1617| 40.86|

4 | 268| 3.41| 339| 9.68| 158| 2.45| 558| 14.1|

In that sampling interval, node 3 transmitted about 90% of the CR blocks and node 2 received most of them. This insight is useful for measuring the workload distribution; use a larger sampling interval for a representative picture.
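gc_traffic_print.sql is not shown here and prints deltas over a sampling interval; the sketch below is a simplified, cumulative-since-startup approximation of the same idea using gv$sysstat (the RX/TX percentage columns can be derived from these per-instance sums).

select inst_id,
       sum(case when name = 'gc cr blocks received'      then value end) cr_rx,
       sum(case when name = 'gc current blocks received' then value end) cur_rx,
       sum(case when name = 'gc cr blocks served'        then value end) cr_tx,
       sum(case when name = 'gc current blocks served'   then value end) cur_tx
  from gv$sysstat
 where name in ('gc cr blocks received', 'gc current blocks received',
                'gc cr blocks served',   'gc current blocks served')
 group by inst_id
 order by inst_id;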

Page 15: Deep review of LMS process


gcs log flush sync

Before sending a reconstructed CR block or a current (CUR) block, LMS will verify that the corresponding redo vectors have been flushed to disk.

If the redo vectors are not yet flushed, LMS must request a log flush from LGWR and then wait on the 'gcs log flush sync' event, analogous to the 'log file sync' event.

This is not an idle event, even though some older documentation suggests that it is.

Page 16: Deep review of LMS process


Gcs log flush sync - ASH

ASH shows that LMS waits on the 'gcs log flush sync' event.

In this database there is no issue, so the number of waits for 'gcs log flush sync' is not high.

select event, count(*) from v$active_session_history

where session_id in (select sid from v$session where program like '%LMS0%')

and sample_time > sysdate -(4/24)

group by event

order by 2 desc;

EVENT COUNT(*)

---------------------------------------- ----------

767

gcs log flush sync 265

latch: KCL gc element parent latch 4
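A variant of the query above (my own sketch, not one of the author's demos) includes SESSION_STATE, which makes it explicit that the rows with a blank event are ON CPU samples, and uses gv$active_session_history to cover all instances.

select session_state, event, count(*)
  from gv$active_session_history
 where (inst_id, session_id) in
       (select inst_id, sid from gv$session where program like '%LMS0%')
   and sample_time > sysdate - (4/24)
 group by session_state, event
 order by 3 desc;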

Page 17: Deep review of LMS process


Gcs log flush sync – v$session_event

But v$session_event for the LMS process shows no waits for 'gcs log flush sync' at all!

The 'gcs log flush sync' event is rolled up, together with the other events of wait class "Other", into the 'events in waitclass Other' line.

select event, trunc(time_waited_micro/1000) wait_milli, total_waits

from v$session_event where sid in (select sid from v$session where program like '%LMS0%')

order by 2 desc;

EVENT WAIT_MILLI TOTAL_WAITS

---------------------------------------- ---------- -----------

gcs remote message 218407919 83934373

events in waitclass Other 3970180 5237156

buffer busy waits 356 2897

latch: cache buffers chains 316 5197

latch: row cache objects 0 2

latch: shared pool 0 3
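To see why the event disappears into 'events in waitclass Other', you can check which wait class it belongs to. The lookup below is a simple sketch against v$event_name (not one of the author's demo scripts); 'gcs log flush sync' belongs to the wait class "Other".

select name, wait_class
  from v$event_name
 where name = 'gcs log flush sync';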

Page 18: Deep review of LMS process


Gcs log flush sync - histogram

To review the impact of 'gcs log flush sync' waits, you should review v$event_histogram.

About 73% of the waits complete in under 1 ms. This is probably not an issue.

@event_histogram.sql

Enter value for event_name: gcs log flush sync

INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT PER

---------- ---------------------------------------------------------------- --------------- ---------- ----------

1 gcs log flush sync 1 24490064 73.42

1 gcs log flush sync 2 6250630 18.74

1 gcs log flush sync 4 1848333 5.54

1 gcs log flush sync 8 597646 1.79

1 gcs log flush sync 16 142603 .42

1 gcs log flush sync 32 25006 .07

1 gcs log flush sync 64 66 0
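The event_histogram.sql script itself is not shown; a minimal sketch of the same idea against gv$event_histogram, assuming the percentage is simply each bucket's share of the instance total, could look like this:

select inst_id, event, wait_time_milli, wait_count,
       round(100 * ratio_to_report(wait_count)
                   over (partition by inst_id), 2) pct
  from gv$event_histogram
 where event = 'gcs log flush sync'
 order by inst_id, wait_time_milli;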

Page 19: Deep review of LMS process


Gcs log flush sync – Not so good

The following histogram shows an example where there is a performance issue with log file sync (LFS).

  If you have LFS waits and GC waits, then you should consider tuning log file sync before tuning GC events.

@event_histogram.sql

Enter value for event_name: gcs log flush sync

INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT PER

---------- ---------------------------------------------------------------- --------------- ---------- ----------

1 gcs log flush sync 1 28 .07

1 gcs log flush sync 2 24 .06

1 gcs log flush sync 4 31 .08

1 gcs log flush sync 8 33 .08

1 gcs log flush sync 16 35757 95.96

1 gcs log flush sync 32 1378 3.69

1 gcs log flush sync 64 6 .01

1 gcs log flush sync 128 2 0

Page 20: Deep review of LMS process


Gcs log flush sync – LGWR interaction

If LGWR is suffering from performance issues, the LMS process can be seen waiting on the 'gcs log flush sync' wait event in a tight 10 ms loop.

  If you have LFS waits and GC waits, then you should consider tuning log file sync before tuning GC events.

LMS trace file:

...

WAIT #0: nam='gcs log flush sync' ela= 10281 waittime=3 poll=0 event=136 obj#=-1 tim=1381909996

WAIT #0: nam='gcs log flush sync' ela= 10274 waittime=3 poll=0 event=136 obj#=-1 tim=1381920366

WAIT #0: nam='gcs log flush sync' ela= 10291 waittime=3 poll=0 event=136 obj#=-1 tim=1381930735

WAIT #0: nam='gcs log flush sync' ela= 10321 waittime=3 poll=0 event=136 obj#=-1 tim=1381941178

...

Page 21: Deep review of LMS process


GCS log flush sync - Example

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                       Avg wait   % DB
Event                              Waits     Time(s)       (ms)   time  Wait Class
------------------------------ ------------ ----------- -------- ------ ----------
log file sync                          2,054      23,720    11548   45.8 Commit
gc buffer busy acquire                19,505      10,382      532   20.0 Cluster
gc cr block busy                       5,407       4,655      861    9.0 Cluster
enq: SQ - contention                     140       3,432    24514    6.6 Configurat
db file sequential read               38,062       1,305       34    2.5 User I/O

Host CPU (CPUs: 24 Cores: 24 Sockets: 24)
~~~~~~~~
Load Avg Begin  Load Avg End     %User   %System      %WIO     %Idle
--------------  ------------  --------  --------  --------  --------
          1.18          1.16       2.7       2.6       0.0      94.7

Excessive waits for log file sync for the foreground processes.

Page 22: Deep review of LMS process


GCS log flush sync – GC waits

Global Cache and Enqueue Services - Workload Characteristics

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Avg global enqueue get time (ms): 7.4

Avg global cache cr block receive time (ms): 222.0

Avg global cache current block receive time (ms): 27.5

Avg global cache cr block build time (ms): 0.0

Avg global cache cr block send time (ms): 0.1

Global cache log flushes for cr blocks served %: 2.7

Avg global cache cr block flush time (ms): 15879.9

Avg global cache current block pin time (ms): 0.0

Avg global cache current block send time (ms): 0.1

Global cache log flushes for current blocks served %: 0.3

Avg global cache current block flush time (ms): 1701.3

The average GC CR block receive time was 222 ms.

The high flush times indicate waits for the LGWR process.

Page 23: Deep review of LMS process


GCS log flush sync – GC waits

                                        %Time  Total Wait  Avg wait    Waits   % bg
Event                          Waits    -outs    Time (s)      (ms)     /txn   time
-------------------------- ------------ ------ ----------- --------- -------- ------
gcs log flush sync                80,695     51       1,862        23     34.7   32.9
log file parallel write           44,129      0         880        20     19.0   15.6
Log archive I/O                    1,607      0         876       545      0.7   15.5
gc cr block busy                     729     71         752      1031      0.3   13.3
db file parallel write            25,752      0         434        17     11.1    7.7
enq: CF - contention                 166     64         307      1850      0.1    5.4

Background processes are waiting excessively on the gcs log flush sync event.

There are also high waits for log file parallel write.

Page 24: Deep review of LMS process


LGWR is important

So, if you think LGWR performance is important in a single-instance database, it is ultra-important in RAC.

If you have LGWR-related performance issues, you can almost treat the other waits as symptoms of that underlying problem.

It is a pity that LGWR does not run in the RT class (or even the FX class).

LMS processes run at an elevated priority, but LGWR does not: a classic priority inversion!

Page 25: Deep review of LMS process


Contact info:
Email: [email protected]
Blog: orainternals.wordpress.com
URL: www.orainternals.com

Thank you for attending!