Deadlock From Library Cache Lock

  • 8/13/2019 Deadlock From Library Cache Lock

    Deadlock Caused By Library Cache Lock

    Introduction

    The in-house Oracle performance analyzer tool is a handy and efficient way to analyze performance issues triggered by library cache locks.

    Here I present a real-life case to show how this type of performance issue can be analyzed and resolved in minutes.

    The Issue

    On 01/03/2014, DB sp2-stgadmdb suffered from library cache lock contention. Some queries had waited for more than 15 hours.

    In Oracle performance analyzer, we can see the active session list, wait events, running time, and wait time. Here I sorted the list by column SEC_WAIT.

    Oracle performance analyzer provides a lock finder tool via the context menu: Find Lock Holder. It can be used to retrieve the lock queue and the lock holder.

    The Lock Queue

    With one click, the lock queue information is displayed in another tab. The session from which I opened the context menu has requested an exclusive lock on table AMD_LB_FACT.LB_ACTIVE_DAILY_AGGR.

    There are 19 entries in the table queue (the first row is the session from which I started the lock search, and it is duplicated later in the list). Most of them are requesting the lock in Share mode.

    The Lock Holder

    Sorting by the column MODE_HELD, we can find the session that holds the library cache lock: sid 625 on node 2.

    Why does the session hold the lock for so long? The wait event for the holder session is also displayed. It is waiting for PX Deq: Parse Reply. Basically, it is a PX operation, and the QC session is waiting for the parallel slave processes to complete child cursor parsing. Still, why wait for so long (check WAIT_SEC)?

    Oracle Performance Analyzer provides context menus to check an individual session, all sessions of a PX operation by QC, or all sessions with a similar SQL_ID. Since the holder is running a PX operation, Track PX Operation is used for further research.

    The PX Operation Sessions

    Now we can see all the sessions related to the PX operation.

    Using the bottom pane, we can dig into SQL-specific information such as the SQL text.

    From the list, we can see that all but one of the slave sessions are waiting for cursor: pin S wait on X, a typical event when a PX operation runs into trouble during parsing. Usually one session is doing the parsing, and it is either blocked or slowly loading library objects into the shared pool. Here we can see that one slave session is waiting on library cache lock instead, so this session is the bottleneck for parsing.
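
The "all but one" observation can be sketched as a small filter over the session list. This is only an illustration in Python, not how the analyzer tool works internally: the sids 101 to 104, the dictionary layout, and the function name `odd_sessions_out` are all made up for the example. Only the two wait event names come from the case above.

```python
from collections import Counter

def odd_sessions_out(sessions):
    """Return the sessions whose wait event differs from the most common one."""
    counts = Counter(s["event"] for s in sessions)
    common_event, _ = counts.most_common(1)[0]
    return [s for s in sessions if s["event"] != common_event]

# Hypothetical PX slave sessions (sids are invented for illustration).
px_slaves = [
    {"sid": 101, "event": "cursor: pin S wait on X"},
    {"sid": 102, "event": "cursor: pin S wait on X"},
    {"sid": 103, "event": "cursor: pin S wait on X"},
    {"sid": 104, "event": "library cache lock"},
]

# The one slave that is not waiting on the cursor pin is the parsing bottleneck.
bottleneck = odd_sessions_out(px_slaves)
```

Here `bottleneck` contains only the session waiting on library cache lock, which is the one worth feeding into the lock search next.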

    We can further use the context menu to find the lock chain.

    Recursive SQL

    Note that the SQL_ID of the slave processes is different from that of the QC session. The bottom pane can be used to identify the actual SQL.

    The slave processes are working on a recursive SQL issued by the parsing process. Basically, the SQL is used to figure out partition pruning.

    A SQL_ID without a child number indicates that the related sessions have not finished parsing.

    The Lock Chain

    The Find Lock Holder function, triggered from the slave session waiting for library cache lock, displays another list of the lock queue.

    The lock holder is the query QC session. The lock is held in Share mode.

    Why can't the blocked slave sessions be granted the lock in Share mode?

    The SYS session (sid 13, node 2) is requesting the lock in Exclusive mode, and it made its request before the PX slave sessions did (check WAIT_SEC: it has waited for 54160 seconds, about 15 hours).
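
The queueing behavior described here can be illustrated with a toy model. The sketch below is a simplified first-come, first-served lock queue written in Python. It is an assumption about the general behavior, not Oracle's actual library cache implementation, and the class name `FifoLock` and the session labels are invented for the example.

```python
from collections import deque

SHARE, EXCLUSIVE = "S", "X"

def compatible(requested, held_modes):
    """Share is compatible with Share only; Exclusive needs no holders at all."""
    if requested == EXCLUSIVE:
        return not held_modes
    return all(mode == SHARE for mode in held_modes)

class FifoLock:
    """Toy lock: a request is granted only if nobody is queued ahead of it."""

    def __init__(self):
        self.holders = {}       # session -> mode held
        self.waiters = deque()  # (session, mode) in arrival order

    def request(self, session, mode):
        if not self.waiters and compatible(mode, list(self.holders.values())):
            self.holders[session] = mode
            return True
        self.waiters.append((session, mode))
        return False

    def cancel(self, session):
        """Remove a killed waiter, then grant from the head of the queue."""
        self.waiters = deque(w for w in self.waiters if w[0] != session)
        while self.waiters and compatible(self.waiters[0][1],
                                          list(self.holders.values())):
            granted_session, mode = self.waiters.popleft()
            self.holders[granted_session] = mode

lock = FifoLock()
qc_granted = lock.request("QC", SHARE)                 # first in, granted
sys_granted = lock.request("SYS stats job", EXCLUSIVE) # blocked by QC's share lock
slave_granted = lock.request("PX slave", SHARE)        # queued behind the exclusive waiter
```

In this model the PX slave's Share request is compatible with the QC's Share lock, yet it still waits because the Exclusive request arrived first. Calling `lock.cancel("SYS stats job")` afterwards grants the slave its lock, which mirrors what killing the stats job does in the real case.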

    How Could That Happen?

    Here is my understanding:

    When the PX query started, the QC session spent too much time parsing. Possibly the shared pool was not large enough, or the required objects had not been loaded into the shared pool.

    Before the QC completed parsing, the SYS query (a stats job, see next slide) needed to update the table statistics, so it requested an exclusive lock on the same table and was blocked by the library cache lock.

    When the QC session of the PX operation completed its parsing, the parallel slave processes could not get the library cache lock for their own parsing, because the SYS session had requested an exclusive lock before them.

    Since the QC session is not going to release the lock, we are effectively deadlocked.
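
The circular wait can be made explicit with a small wait-for graph. The Python sketch below follows the "waits for" edges described above until a session repeats. The session labels, the `waits_for` mapping, and the function `find_cycle` are illustrative names and not part of any Oracle tool, and the model assumes each session waits on at most one other session.

```python
def find_cycle(waits_for, start):
    """Follow waits-for edges from start; return the cycle if one is reached."""
    position = {}  # session -> index in path
    path = []
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a session that is not waiting
    return path[position[node]:]

# Edges taken from the analysis above (labels are illustrative):
waits_for = {
    "QC": "PX slave",             # QC waits on PX Deq: Parse Reply
    "PX slave": "SYS stats job",  # slave is queued behind the exclusive request
    "SYS stats job": "QC",        # exclusive request is blocked by QC's share lock
}
cycle = find_cycle(waits_for, "QC")

# After the SYS session is killed, the slave gets its lock and stops waiting.
after_kill = {"QC": "PX slave"}
no_cycle = find_cycle(after_kill, "QC")
```

With all three edges present, `cycle` contains all three sessions. Once the SYS edge is removed, the chain ends at the slave and `no_cycle` is `None`.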

    Still, it is an interesting question: why can't the slave processes inherit the lock from the PQ process?

    The Troublemaker

    The context menu item Track This Session can be used to track the SYS session that is requesting the exclusive lock.

    To break the deadlock, either the SYS stats job or the query has to be killed.

    Back To Normal

    Here is the DB status minutes after I cleared the SYS stats job session.

    All but one of the queries have a running time longer than 1 minute.

    At the time I am writing this slide (within one hour), all user queries have completed, and the only visible active sessions are from Oracle Perf analyzer.

    As a side note, if a stats job gets killed, it is better to restart it later. Missing or incomplete stats could cause other performance issues.