Wait Events in RAC Session 362
Arup Nanda, Wait Events in RAC
What is This?
• RAC Performance Tuning
– I teach a course: Performance Tuning in RAC
• Wait events are useful for understanding the bottlenecks
• All single-instance wait events are applicable to RAC
• RAC has some special cases
• This session talks about those RAC-specific wait events
• This is just a subset of the events, not a comprehensive list
What’s a Wait?
• A process in Oracle can only be in one of three states
– Doing something useful (consuming CPU) ….. U
– Idle, waiting for some work to be assigned ….. I
– Waiting for something ….. W, e.g.
  • a block from disk
  • a lock
  • a latch (could be waiting on CPU)
• Response time = U + I + W
• We must accurately measure each component time before we decide what and how to tune
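The breakdown Response time = U + I + W can be illustrated with a minimal sketch. The class name `TimeBreakdown`, its fields, and the sample numbers are assumptions made for illustration; they are not Oracle structures or measured values.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    """Hypothetical per-session time components, in seconds."""
    useful: float   # U: time spent on CPU doing work
    idle: float     # I: time waiting for work to be assigned
    waiting: float  # W: time waiting for a block, lock, latch, ...

    @property
    def response_time(self) -> float:
        # Response time = U + I + W
        return self.useful + self.idle + self.waiting

    def shares(self) -> dict:
        """Fraction of the response time taken by each component."""
        total = self.response_time
        if total == 0:
            return {"U": 0.0, "I": 0.0, "W": 0.0}
        return {
            "U": self.useful / total,
            "I": self.idle / total,
            "W": self.waiting / total,
        }

    def largest_component(self) -> str:
        """Name of the component to look at first when tuning."""
        shares = self.shares()
        return max(shares, key=shares.get)


# Illustrative numbers only.
sample = TimeBreakdown(useful=2.0, idle=1.0, waiting=7.0)
print(sample.response_time, sample.largest_component())
```

Here the waiting component dominates, so the wait events would be the first place to look.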
RAC for Beginners
[Figure: Node 1 and Node 2, each with a Buffer Cache, DBWR, and LMS, above a shared Database. Other labels: Cluster Coordination, SCN1, SCN2, Checkpoint! (twice)]
• DBWR must get a lock on the database block before writing to the disk. This is called a Block Lock.
Cache Fusion
[Figure: Instance 1 (session1) and Instance 2 (session2), each showing a block labeled 5. Labels: Has modified it; Wants to modify it; Will get it via interconnect]
Checking for Buffers
How exactly is this “check” performed?
• By checking for a lock on the block
• The request comes to the Grant Queue of the block
• Someone checks that no other instance has any lock
• Instance 1 can read from the disk
• i.e. Instance 1 is granted the lock
[Figure: a Block with a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7)]
Master Instance
• Only one instance holds the grant and convert queues of a specific block
• This instance is called the Master Instance of that block
• The master instance varies for each block
• The memory structure that shows the master instance of a buffer is called the Global Resource Directory (GRD)
• The GRD is replicated across all instances
• The requesting instance must check the GRD to find the master instance
• It then makes a request to the master instance for the lock
[Figure: a Block with a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7)]
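The lookup-then-request flow above can be sketched as a toy model. Everything here is an assumption for illustration: the GRD is modeled as a plain dictionary from block id to master instance number, the names `Cluster`, `BlockResource`, and `request_lock` are hypothetical, lock modes are ignored, and the convert queue is simply treated as the place where a request waits when another instance already holds the lock.

```python
from dataclasses import dataclass, field


@dataclass
class BlockResource:
    """Queues kept by the master instance for one block (toy model)."""
    grant_queue: list = field(default_factory=list)    # instances holding the lock
    convert_queue: list = field(default_factory=list)  # instances waiting for it


class Cluster:
    def __init__(self, grd):
        # grd: block id -> master instance number (stand-in for the GRD)
        self.grd = dict(grd)
        self.resources = {block: BlockResource() for block in self.grd}

    def master_of(self, block):
        """Step 1: the requester checks the GRD to find the master."""
        return self.grd[block]

    def request_lock(self, requester, block):
        """Step 2: ask the master for the lock.

        Returns (master, granted). The lock is granted only when no
        other instance is already in the grant queue.
        """
        master = self.master_of(block)
        resource = self.resources[block]
        others = [inst for inst in resource.grant_queue if inst != requester]
        if others:
            if requester not in resource.convert_queue:
                resource.convert_queue.append(requester)
            return master, False
        if requester not in resource.grant_queue:
            resource.grant_queue.append(requester)
        return master, True


# Illustrative block ids and instance numbers.
cluster = Cluster({"block-A": 2, "block-B": 1})
first = cluster.request_lock(requester=1, block="block-A")
second = cluster.request_lock(requester=3, block="block-A")
print(first, second)
```

The first request is granted by the master (instance 2); the second lands in the convert queue because instance 1 already holds the lock.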
Scenario 1
• Session connected to Instance 1 wants to select a block from the table
• Activities by Instance 1
1. Check its own buffer cache to see if the block exists
  1. If it is found, can it just use it?
  2. If it is not found, can it select from the disk?
2. If not, then check the other instances
• How will it know which copy of the block is the best source?
[Figure: Instance 1, Instance 2, a Session, and the DB]
Cache Fusion
[Figure: Node 1 and Node 2, each with a Buffer Cache, SMON, and LMS; arrows labeled "message" and "buffer" between the nodes]
• When node 2 wants a buffer, it sends a message to the other instance. The message is sent to the LMS (Lock Management Server) of the other instance. LMS then sends the buffer back to the requesting instance. LMS is also called the Global Cache Service (GCS) process.
Grant Scenario 2
1. Check its buffer cache to see if the block exists
2. The buffer is found. Can Instance 1 use it? Not really. The buffer may be old; it may have been changed
3. LMS of node 1 sends a message to the master of the buffer
4. Master checks the GES and doesn’t see any lock
5. Instance 1 is granted the global block lock
6. No buffer actually gets transferred
Grant Scenario 3
• Instance 1 is the master
– Then it doesn’t have to make a request for the grant
• In summary, here are the possible scenarios when Instance 1 requests a buffer
– Instance 1 is the master, so no more processing is required
– No one has the lock on the buffer; the grant is made by the master immediately
– Another instance has the buffer in an incompatible mode. The mode has to be changed.
Buffer States and Locks
• Buffers can be acquired in two states
– Current: when the intention is to modify
  • Shared Current: most recent copy. One copy per instance. Same as disk
  • Exclusive Current: only one copy in the entire cluster. No shared current present
– CR: when the intention is to only select
• Locks facilitate the state enforcement
– XCUR for Exclusive Current
– SCUR for Shared Current
– No locking for CR
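The state rules above imply a simple compatibility check, sketched below under stated assumptions: the names `Mode` and `is_compatible` are hypothetical, and the rule set is only what the slide states (CR needs no lock, Shared Current can coexist with other Shared Current copies, Exclusive Current must be the only current copy in the cluster). It is a simplification, not Oracle's actual lock manager logic.

```python
from enum import Enum


class Mode(Enum):
    CR = "CR"      # consistent read: no lock needed
    SCUR = "SCUR"  # shared current
    XCUR = "XCUR"  # exclusive current


def is_compatible(requested, held_elsewhere):
    """Return True if `requested` can coexist with the modes that
    other instances already hold on the same block."""
    if requested is Mode.CR:
        # No locking for CR, so nothing can conflict with it.
        return True
    # CR copies held elsewhere carry no lock, so ignore them.
    current_holders = [m for m in held_elsewhere if m is not Mode.CR]
    if requested is Mode.SCUR:
        # Shared current conflicts only with an exclusive holder.
        return Mode.XCUR not in current_holders
    # XCUR: must be the only current copy in the cluster.
    return not current_holders


print(is_compatible(Mode.SCUR, [Mode.SCUR]))
print(is_compatible(Mode.XCUR, [Mode.SCUR]))
```

When the check fails, the request corresponds to the "incompatible mode" case: the holder's mode has to change before the grant can be made.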
Placeholder Event
• When the buffer is first requested, the session does not know which of the three paths it will take
• Therefore it is assigned a placeholder event
• This event is known as gc cr block request (for Consistent Read requests)
• If the request is made for the buffer in Current mode, the event is gc current block request
• When one of the three options is chosen, the appropriate event replaces the placeholder event
Grant Event
• If the session merely requests a grant from the master:
– It waits with the gc cr|current grant 2-way event
– gc cr grant 2-way, for requests for buffers in Consistent Read mode
– gc current grant 2-way, for requests for buffers in Current mode
gc current|cr grant 2-way
[Figure: Instance 1 (with the Session) and Instance 2, each with an LMS, above the Database. The request goes to the GRD/GES, where the Block has a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7), and is granted. Wait events shown: gc current block request, gc current grant 2-way, db file scattered read]
Block Event
• After the request is made
– and assuming that the buffer is not in the local cache
– the buffer may be found in another instance
– Requestor requests the buffer from the holding instance
• Possibilities:
– Holder is also the master of that buffer
– Holder is not the master
• If the holder is the master, the requesting session waits with the event gc cr|current block 2-way
• If the holder is not the master, the event is gc cr|current block 3-way
gc current|cr block 2-way
[Figure: Instance 1 (with the Session) and Instance 2, each with an LMS, above the Database. Other labels: GRD, GES, Block with a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7). Wait events shown: gc current block request, gc current block 2-way]
gc current|cr block 3-way
[Figure: Instance 1, Instance 2, and Instance 3, with a Session; roles labeled Requestor, Master, and Holder. Other labels: Block with a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7). Wait event shown: gc current block 3-way]
Grant vs. Get Events
Action                                                                 Wait Event
Block is requested by a session                                        Placeholder event: gc cr|current block request
Buffer lock is requested from the master                               gc cr|current grant 2-way
Buffer lock is granted; block is read from the disk                    db file scattered|sequential read
Buffer is requested from the holder, which is the same as the master   gc cr|current block 2-way
Buffer is requested from the holder, which is not the master           gc cr|current block 3-way
1. There is no 3-way grant event, since the request is made to the master
2. There is no 4-way block event, since there will be a maximum of 2 hops: requestor -> master -> holder
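The mapping from action to wait event can be summarized as a small decision function. This is a sketch, not Oracle's internal logic: the function name `gc_wait_event` and the use of instance numbers are assumptions, and the case where the requester is itself the master while another instance holds the buffer is not covered by the slides, so it is treated here as a single hop (2-way) purely as an assumption.

```python
from typing import Optional

# Placeholder events shown while the outcome is not yet known.
PLACEHOLDER = {
    "cr": "gc cr block request",
    "current": "gc current block request",
}


def gc_wait_event(mode: str, requester: int, master: int,
                  holder: Optional[int] = None) -> Optional[str]:
    """Return the event that replaces the placeholder, or None when
    no global cache request is needed.

    mode:   "cr" or "current"
    holder: instance holding the buffer, or None if no instance has it
    """
    if mode not in PLACEHOLDER:
        raise ValueError("mode must be 'cr' or 'current'")

    remote_holder = holder is not None and holder != requester
    if not remote_holder:
        # Only a lock grant is needed; no buffer is shipped.
        if requester == master:
            return None  # requester is the master: no grant request
        return f"gc {mode} grant 2-way"

    if holder == master or requester == master:
        # One hop between requester and holder.
        # (requester == master is an assumption, see the lead-in.)
        return f"gc {mode} block 2-way"

    # requester -> master -> holder
    return f"gc {mode} block 3-way"


print(gc_wait_event("current", requester=1, master=2))
print(gc_wait_event("cr", requester=1, master=2, holder=3))
```

There is deliberately no branch that returns a 3-way grant or a 4-way block event, matching the two notes above.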
Interpretation

Scenario: The grant waits are very high compared to block waits
Interpretation: The requested blocks are in the current instance, but the master is another instance
Tuning Implication:
• Manually remaster the object?

Scenario: The 2-way block waits are very high compared to 3-way
Interpretation: Cache fusion is taking place as expected, and most of the blocks are held and mastered at the same instance. But most of the blocks are not found in the local instance.
Tuning Implication:
• Faster interconnect
• Shorten the run queue

Scenario: The 3-way block waits are high compared to 2-way
Interpretation: Cache fusion is taking place, but the master and the holder are usually different.
Tuning Implication:
• Manually remaster the object
• Application partitioning
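The three scenarios above can be turned into a rough triage helper. This is only a sketch: the function name `interpret_gc_waits`, the labels, and especially the `ratio` threshold are assumptions, since the slides say only "high" or "very high" without giving a number. The inputs can be wait counts or wait times, as long as all three use the same unit.

```python
# Tuning implications taken from the scenarios above.
ADVICE = {
    "grant-heavy": ["Manually remaster the object?"],
    "2-way-heavy": ["Faster interconnect", "Shorten the run queue"],
    "3-way-heavy": ["Manually remaster the object", "Application partitioning"],
}


def interpret_gc_waits(grant, block_2way, block_3way, ratio=3.0):
    """Return the labels of the scenarios that apply.

    `ratio` is a hypothetical cut-off for "high compared to";
    choose a value that suits your own workload.
    """
    findings = []
    if grant > ratio * (block_2way + block_3way):
        # Blocks are local, but the master is another instance.
        findings.append("grant-heavy")
    if block_2way > ratio * block_3way:
        # Holder and master coincide, but blocks are rarely local.
        findings.append("2-way-heavy")
    if block_3way > ratio * block_2way:
        # Master and holder are usually different instances.
        findings.append("3-way-heavy")
    return findings


# Illustrative numbers only.
for label in interpret_gc_waits(grant=10, block_2way=100, block_3way=900):
    print(label, ADVICE[label])
```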
gc current/cr block lost
• Blocks are lost while being transferred to the remote instance [in the interconnect]
• Cause: may not be the network itself
• Cause #1 Network
– Network traffic drops the packets
– Confirm from the ifconfig -a output
– If packets are dropping, this could be a cause
– Why?
  • Bad network configuration
  • CPU used for network processing
• Cause #2 CPU
– LMS process is CPU starved
[Figure: Instance 1 (with the Session) and Instance 2, each with an LMS, above the Database. Other labels: GRD, GES, Block with a Grant Queue (SID1, SID2, SID3) and a Convert Queue (SID5, SID6, SID7). Wait events shown: gc current block request, gc current block 2-way]
gc current/cr block busy
• What it Means
– Session wants a block from the remote instance
– Remote instance delays preparing the block to fulfill the [CR or current] read request
• Cause:
– Local delay on the remote instance
– Most likely: an I/O bottleneck on the remote instance
  • block is being accessed by some session which is CPU starved
  • LGWR has not written a buffer to redo yet
– Less likely: CPU starvation
gc current/cr block congested
• Meaning
– The instance has requested a block from the remote instance.
– The remote instance has prepared the block and shipped it
  • but it has not reached the requesting instance within 1 ms.
• What could be the cause: a network bottleneck?
• Not necessarily. Causes:
– Long run queues, causing the LMS process to be delayed in processing the incoming block
– Heavy paging due to memory deficiency. This causes the blocks to be paged in before being processed.
Putting it All Together
• Every Oracle process is either
– Doing some productive work
– Waiting for some work to be given (idle)
– Waiting for some resource
• Understand the reason for the wait
– Devise a plan accordingly
• RAC-related wait events are mostly manifestations of these issues
• Caused by
– Network issues
– LMS being overloaded
– Blocks busy
Thank You!
Session 362
• More Information:
– Blog: arup.blogspot.com
  • 100 Things You Probably Didn't Know About Oracle Database http://bit.ly/evr05e
– Twitter: arupnanda