7/31/2019 Wp Global Cache Waits
Real Application Clusters WP - GC Waits i
Global Cache Waits
Technical White Paper
José Valerio
Jun 2010
Disclaimer
The following is intended for information purposes only. The author of this White Paper makes no guarantees and accepts no responsibility for the use of this information. Use of any supplied concepts or definitions constitutes acceptance and understanding of these disclaimers.
Content
INTRODUCTION
  About this White Paper
GLOBAL CACHE SERVICE (GCS)
  GCS - Accessing Block
    GCS - Block Access Cost
    GCS - Block Access Latency
    GCS - Block modes
  GCS - Response Time
  GCS - Monitoring
    Getting the GCS Hit Ratio
    Getting the Block Transfer Ratio
    Getting Blocks involved in busy workloads (Hotblocks)
      AWR Global Cache Efficiency Percentages
      AWR Messaging Statistics
      AWR Messaging Traffic
      AWR Network Traffic
WAIT EVENTS I
  gc current block 2-way (Fig. 1)
  gc current block 3-way (Fig. 2)
  gc cr block 2-way
  gc cr block 3-way
WAIT EVENTS II
  GC cr/current block congested
  GC cr/current block busy
  GC current grant busy
  GC cr/current block request
CONTENTION TYPES
  The block-oriented
  The message-oriented
  The contention-oriented
  The load-oriented
Best Practices
  General
  Network
  Hardware
  Monitoring
  Storage
Conclusions
Glossary
About the author - Summary
About Technical Reviewers
Acknowledgements
INTRODUCTION
About this White Paper
This paper explains how RAC nodes communicate with each other, which is essential knowledge for anyone working on performance. As with a single instance, it is very important to avoid disk I/O whenever possible, primarily by keeping frequently accessed data in memory. A RAC configuration is similar, except that the data might be in the memory of one of the other instances. RAC therefore uses the interconnect to request the required data from another instance that has it in memory, rather than reading it from disk. Each request across the interconnect is referred to as a Global Cache request. This document provides the tools database administrators need to understand the Real Application Clusters waits in the Global Cache.
Who should read this white paper?
This white paper is intended for readers who are not new to Real Application Clusters performance. Familiarity with advanced Oracle concepts and the SQL language is assumed.
GLOBAL CACHE SERVICE (GCS)
Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology and is represented by the LMS background processes. Its main function is to track the status and location of data blocks, where the status of a data block is its mode and role. GCS guarantees data integrity through the use of global access levels and maintains the block modes for data blocks in the global role. It is also responsible for block transfers between the instances: upon a request from an instance, GCS organizes the block shipping and performs the appropriate lock mode conversions.
GCS
Guarantees cache coherency.
Manages caching of shared data via Cache Fusion.
Minimizes access time to data which is not in local cache and would otherwise be read from disk or rolled back.
Implements fast direct memory access over high-speed interconnects for all data blocks and types.
Uses an efficient and scalable messaging protocol.
GCS - Accessing Block
The effect of accessing blocks in the global cache and keeping them coherent is reflected in the Global Cache Service statistics for current and cr blocks, for example gc current blocks received, gc cr blocks received, and so on.
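As a minimal sketch, the two statistics named above can be listed per instance from GV$SYSSTAT, the same view used by the monitoring queries later in this paper. The column alias is illustrative, and the values are cumulative since each instance started, so they are best compared between two points in time.

```sql
-- Blocks received through the global cache, per instance.
-- Values are cumulative since instance startup.
SELECT inst_id "Instance #",
       name,
       value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received',
                'gc current blocks received')
ORDER  BY inst_id, name;
```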
GCS - Block Access Cost
The process of block retrieval generates:
Message propagation delay
Inter-process communication (IPC) CPU
Operating system scheduling
Block server load
To calculate the cost, use the formula below.
Block Access Cost = message propagation delay + IPC CPU + Operating System Scheduling + Block Server Load
Note: A fifth factor, interconnect stability, comes into play when there are switch problems. At this time it is not clear how to calculate or estimate it exactly.
GCS - Block Access Latency
The following factors impact the processing time:
Operating System
CPU load on other nodes
Oracle processing time
Available interconnect network throughput
GCS - Block modes
A block can exist in multiple buffer caches and can be held by multiple instances in different modes, depending on whether the block is being read or updated by the instance.
Resource Modes
Null (N): No access rights.
Shared (S): Shared resources can be read by multiple database instances but cannot be updated by any instance.
Exclusive (X): An instance holding a block in exclusive mode can modify the block. Only one instance can hold a block in exclusive mode at a time.
Resource Roles
Local: When a data block is first read into an instance from disk it has a local role, meaning that only one copy of the data block exists in the cache. No other instance cache has a copy of this block.
Global: The global role indicates that multiple copies of the data block exist in the clustered instances. For example, a user connected to one of the instances requests a data block. The block is read from disk into that instance and is granted the local role. If another instance requests the same block, the block is copied to the requesting instance and the role becomes global.
This role and mode information is maintained in the GRD (Global Resource Directory) by the GCS (Global Cache Service).
GCS - Response Time
Response time for cache fusion transfers is determined by the messaging and processing times imposed by the physical interconnect components, the IPC protocol and the GCS protocol. It is not affected by disk I/O factors other than occasional log writes. The cache fusion protocol does not require I/O to data files in order to guarantee cache coherency, and Oracle RAC inherently does not cause any more I/O to disk than a non-clustered instance.
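One common way to put a number on this response time is to divide the cumulative block receive time by the number of blocks received. The sketch below assumes the receive-time statistics are reported in centiseconds (as in Oracle 10g and 11g), hence the factor of 10 to convert to milliseconds. The column aliases are illustrative.

```sql
-- Average global cache block receive time per instance, in milliseconds.
-- Assumption: the "receive time" statistics are in centiseconds.
SELECT inst_id "Instance #",
       ROUND(10 * MAX(DECODE(name, 'gc cr block receive time', value))
             / NULLIF(MAX(DECODE(name, 'gc cr blocks received', value)), 0),
             2) "Avg CR (ms)",
       ROUND(10 * MAX(DECODE(name, 'gc current block receive time', value))
             / NULLIF(MAX(DECODE(name, 'gc current blocks received', value)), 0),
             2) "Avg Current (ms)"
FROM   gv$sysstat
WHERE  name IN ('gc cr block receive time',
                'gc cr blocks received',
                'gc current block receive time',
                'gc current blocks received')
GROUP  BY inst_id
ORDER  BY inst_id;
```

NULLIF avoids a divide-by-zero error on an instance that has not yet received any blocks; in that case the average is shown as NULL.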
GCS - Monitoring
Getting the GCS Hit Ratio
--
-- Use SQL*Plus and connect AS SYSDBA
-- Getting the Global Cache Hit Ratio
--
SELECT
  A.INST_ID "Instance #",
  (A.VALUE+B.VALUE+C.VALUE+D.VALUE)/(E.VALUE+F.VALUE) "GCS HIT RATIO"
FROM
  GV$SYSSTAT A,
  GV$SYSSTAT B,
  GV$SYSSTAT C,
  GV$SYSSTAT D,
  GV$SYSSTAT E,
  GV$SYSSTAT F
WHERE
  A.NAME='gc gets'
  AND B.NAME='gc converts'
  AND C.NAME='gc cr blocks received'
  AND D.NAME='gc current blocks received'
  AND E.NAME='consistent gets'
  AND F.NAME='db block gets'
  AND B.INST_ID=A.INST_ID
  AND C.INST_ID=A.INST_ID
  AND D.INST_ID=A.INST_ID
  AND E.INST_ID=A.INST_ID
  AND F.INST_ID=A.INST_ID;
Instance # GCS CACHE HIT RATIO
---------- ----------------------
2 .028036562 .01279997
Getting the Block Transfer Ratio
--
-- Use SQL*Plus and connect AS SYSDBA
-- Getting Block Transfer Ratio
--
SELECT
  A.INST_ID "Instance #",
  A.VALUE/B.VALUE "BLOCK TRN RATIO"
FROM
  GV$SYSSTAT A, GV$SYSSTAT B
WHERE
  A.NAME='gc defers'
  AND B.NAME='gc current blocks served'
  AND B.INST_ID=A.INST_ID;
Instance # BLOCK TRN RATIO
---------- --------------------
2 .0526001052 .078004479
Note: A value over 0.3 is considered high.
Getting Blocks involved in busy workloads (Hotblocks)
--
-- Use SQL*Plus and connect AS SYSDBA
-- Getting Hot Blocks
--
SELECT INST_ID "Instance #", NAME, KIND,
  sum(FORCED_READS) "Forced Reads",
  sum(FORCED_WRITES) "Forced Writes"
FROM GV$CACHE_TRANSFER
WHERE owner#!=0
GROUP BY INST_ID, NAME, KIND
ORDER BY 1, 4 desc, 2
/
Instance  NAME                 KIND       Forced Reads Forced Writes
--------- -------------------- ---------- ------------ -------------
        1 TT_PROD_IND          INDEX               408             0
        1 TTQUEUE              TABLE                44             0
        2 TTQUEUE              TABLE               573             0
        2 TT_PROD_IND          INDEX               321             0
        2 AQ$_QUEUE_TABLES     TABLE                 9             0
Tip: GV$BH shows the buffer header information for all instances. If you run a multi-instance database, GV$BH can be of great help in finding the block numbers of blocks experiencing a lot of FORCED_READS and FORCED_WRITES. You can then find the rows in those blocks.
V$BH Status values
Status  Resource Mode  Details
FREE                   Buffer is not currently in use
CR      NULL           Consistent read (read only)
SCUR    S              Shared current block (read only)
XCUR    X              Exclusive current block (able to modify)
PI      NULL           Past image (read only)
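The tip above can be sketched as the following query, which lists the buffers with the most forced reads and writes together with their file and block numbers. It assumes the FORCED_READS and FORCED_WRITES columns are present in GV$BH, as in the 10g and 11g releases this paper covers. The limit of 20 rows is an arbitrary choice for illustration.

```sql
-- Buffers with the highest forced read/write activity across all instances.
-- The 20-row limit is arbitrary; adjust as needed.
SELECT *
FROM  (SELECT inst_id "Instance #",
              file#,
              block#,
              status,
              forced_reads,
              forced_writes
       FROM   gv$bh
       WHERE  forced_reads + forced_writes > 0
       ORDER  BY forced_reads + forced_writes DESC)
WHERE  ROWNUM <= 20;
```

The FILE# and BLOCK# values identify the physical block, and STATUS corresponds to the status values listed in the table above.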
AWR Global Cache Efficiency Percentages
Data blocks retrieved from local cache or remote instance
Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access - local cache %: 99.12
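Outside of AWR, similar percentages can be approximated from GV$SYSSTAT. The sketch below is an approximation, not the exact AWR calculation: it assumes that remote cache access is the sum of cr and current blocks received, that disk access is the 'physical reads cache' statistic, and that both are measured against 'session logical reads'. It also uses values cumulative since startup rather than an AWR snapshot interval.

```sql
-- Approximate buffer access percentages per instance (cumulative since startup).
-- Assumption: local = logical reads - remote blocks received - physical reads.
SELECT inst_id "Instance #",
       ROUND(100 * (lr - gc - pr) / NULLIF(lr, 0), 2) "Local cache %",
       ROUND(100 * gc / NULLIF(lr, 0), 2)             "Remote cache %",
       ROUND(100 * pr / NULLIF(lr, 0), 2)             "Disk %"
FROM  (SELECT inst_id,
              SUM(DECODE(name, 'session logical reads', value, 0)) lr,
              SUM(DECODE(name, 'gc cr blocks received', value,
                               'gc current blocks received', value, 0)) gc,
              SUM(DECODE(name, 'physical reads cache', value, 0)) pr
       FROM   gv$sysstat
       WHERE  name IN ('session logical reads',
                       'gc cr blocks received',
                       'gc current blocks received',
                       'physical reads cache')
       GROUP  BY inst_id)
ORDER  BY inst_id;
```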
WAIT EVENTS I
RAC wait events are grouped in a category called the Cluster wait class and are characterized as Current or CR.
Current - blocks read into memory for the first time
Consistent Read (CR) - denotes block for read access
The following wait events show that the remotely cached blocks were shipped to the local instance without having been busy, pinned or requiring a log flush:
gc current block 2-way (Fig. 1)
gc current block 3-way (Fig. 2)
gc cr block 2-way
gc cr block 3-way
When: during cache fusion process
Instance A requests block from master instance B
If the block is available on B then it is sent to A
Fig. 1
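As a minimal sketch, the four 2-way and 3-way events can be checked per instance in GV$SYSTEM_EVENT, with the average wait derived from the cumulative totals. The column alias is illustrative.

```sql
-- Wait counts and average wait time (ms) for the 2-way and 3-way block events.
SELECT inst_id "Instance #",
       event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) "Avg wait (ms)"
FROM   gv$system_event
WHERE  event IN ('gc current block 2-way',
                 'gc current block 3-way',
                 'gc cr block 2-way',
                 'gc cr block 3-way')
ORDER  BY inst_id, event;
```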
WAIT EVENTS II
GC cr/current block congested
  o Repeated requests by foreground processes, not serviced by LMS
  o Indicates LMS not able to keep up
  o Queue lengths & scheduling delays in OS, can cause LMS delays
GC cr/current block busy
  o Delay for some reason, before block sent to requestor
GC current grant busy
  o Permission to access the block granted, but blocked by other requests ahead of it
GC cr/current block request
  o Wait time, cr or current block is being retrieved
Tip: The number of GCS resource structures is determined by the _gcs_resources parameter. The number of free GCS resource structures is shown in X$KJBRFX.
CONTENTION TYPES
Block-oriented
  o gc current block 2-way
  o gc current block 3-way
  o gc cr block 2-way
  o gc cr block 3-way
Message-oriented
  o gc current grant 2-way
  o gc cr grant 2-way
Contention-oriented
  o gc current block busy
  o gc cr block busy
  o gc buffer busy acquire/release
Load-oriented
  o gc current block congested
  o gc cr block congested
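The grouping above can be sketched as a query that totals the Cluster wait class events by contention type. The LIKE patterns and the contention_type column name are illustrative assumptions derived from the event names listed here. Events that match none of the four groups are left out.

```sql
-- Total waits and time waited per contention type, per instance.
-- The patterns below are assumptions based on the event names in this paper.
SELECT inst_id "Instance #",
       contention_type,
       SUM(total_waits) "Waits",
       ROUND(SUM(time_waited_micro) / 1000000, 1) "Time waited (s)"
FROM  (SELECT inst_id,
              total_waits,
              time_waited_micro,
              CASE
                WHEN event LIKE 'gc % block _-way'     THEN 'Block-oriented'
                WHEN event LIKE 'gc % grant _-way'     THEN 'Message-oriented'
                WHEN event LIKE 'gc % block busy'
                  OR event LIKE 'gc buffer busy%'      THEN 'Contention-oriented'
                WHEN event LIKE 'gc % block congested' THEN 'Load-oriented'
              END contention_type
       FROM   gv$system_event
       WHERE  wait_class = 'Cluster')
WHERE  contention_type IS NOT NULL
GROUP  BY inst_id, contention_type
ORDER  BY inst_id, contention_type;
```

Comparing the time waited across the four types gives a first indication of which of the causes described in this section dominates.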
The block-oriented wait event statistics indicate that a block was received as the result of either a 2-way or a 3-way message. That is, the block was either sent from the resource master, requiring 1 message and 1 transfer, or was forwarded to a third node from which it was sent, requiring 2 messages and 1 block transfer.
The gc current block busy and gc cr block busy wait events indicate that the local instance making the request will not immediately receive a current or consistent read block. The term busy in these events reflects that the block was delayed on a remote instance. For example, a block cannot be shipped immediately if Oracle has not yet written the redo for the block's changes to a log file.
Unlike the block busy wait events, a gc buffer busy event means that Oracle cannot immediately grant access to data that is stored in the local buffer cache. This is because a global operation on the buffer is pending and has not yet completed. In other words, the buffer is busy and all other processes that are attempting to access the local buffer must wait until the operation completes.
The appearance of gc buffer busy events means that there is block contention that comes from multiple requests for access to the local block. Oracle must queue these requests. The length of time that Oracle needs to process the queue depends on the remaining service time for the block. The service time is affected by network latency and by the processing time on both the remote and local instances.
The average wait time and the total wait time should be considered when being alerted to performance issues. Usually, either interconnect or load issues or SQL execution against a large shared working set can be found to be the root cause.
The message-oriented wait event statistics indicate that no block was received because it was not cached in any instance. Instead a global grant was given, enabling the requesting instance to read the block from disk or modify it.
If the time consumed by these events is high, then it may be assumed that the frequently used SQL causes a lot of disk I/O (in the case of the cr grant) or that the workload inserts a huge amount of data and needs to find and format new blocks frequently (in the case of the current grant).
The contention-oriented wait event statistics indicate that a block was received which was pinned by a session on another node, was deferred because a change had not yet been flushed to disk, or was delayed because of high concurrency, and therefore could not be shipped immediately. A buffer may also be busy locally when a session has already initiated a cache fusion operation and is waiting for its completion while another session on the same node is trying to read or modify the same data. High service times for blocks exchanged in the global cache may exacerbate the contention, which can be caused by frequent concurrent read and write accesses to the same data.
The load-oriented wait events indicate that a delay in processing has occurred in the GCS, which is usually caused by high load or CPU saturation and would have to be solved by additional CPUs, load balancing, offloading processing to different time frames, or a new cluster node. For the events mentioned, the wait time encompasses the entire round trip, from the time a session starts waiting after initiating a block request until the block arrives.
The column CLUSTER_WAIT_TIME in V$SQLAREA represents the wait time incurred by individual SQL statements for global cache events and will identify the SQL which may need to be tuned.
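A minimal sketch of how CLUSTER_WAIT_TIME might be used: the query below lists the statements with the highest cluster wait time across all instances, using GV$SQLAREA rather than V$SQLAREA. The limit of 10 rows and the 60-character text excerpt are arbitrary choices, and the times are assumed to be in microseconds.

```sql
-- Top 10 statements by cluster wait time across all instances.
-- CLUSTER_WAIT_TIME and ELAPSED_TIME are assumed to be in microseconds.
SELECT *
FROM  (SELECT inst_id "Instance #",
              sql_id,
              ROUND(cluster_wait_time / 1000000, 1) "Cluster wait (s)",
              ROUND(100 * cluster_wait_time / NULLIF(elapsed_time, 0), 1) "% of elapsed",
              SUBSTR(sql_text, 1, 60) sql_text
       FROM   gv$sqlarea
       WHERE  cluster_wait_time > 0
       ORDER  BY cluster_wait_time DESC)
WHERE  ROWNUM <= 10;
```

Statements whose cluster wait time is a large share of their elapsed time are usually the first candidates for tuning.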
Best Practices
A senior DBA is needed to successfully advise on and tune a RAC system, which is complex by default. Over the life of the system, many default configuration settings need to change according to your production workload.
One of the most important aspects of RAC tuning is the monitoring and tuning of the global services directory processes. The processes in the Global Service Daemon (GSD) communicate through the cluster interconnects. If the cluster interconnects do not perform properly, the entire RAC structure will be compromised.
General
Avoid serialization during the application design
Ensure adequate resources on surviving nodes
Benchmark cluster configuration
Load test on single instance first
Apply few changes at a time
Fix bad plans, serialization and schemas
Fix I/O issues avoiding full scans
Reduce or eliminate the hard parsing
Avoid the use of non ASSM segments
Control High Rate DLMs on small cached segments.
Network
Monitor dropped packets, timeouts, buffer overflows, transmit and receive errors
Hardware
Redundancy server, storage, network components
Add HBA cards, switches, disk array controllers
Load balance LUNs across HBA ports
Enable hyperthreading at the OS level
Use asynchronous I/O
Avoid dissimilar disks within disk group
Verify and set aio-max-size and aio-max-nr
Monitoring
Monitoring and tuning require deep RAC skills and knowledge; the DBA will need specialized training.
Use Database Control or Grid Control
View overall system status, status of cluster, alert logs
Monitor throughput across Interconnect
Make decisions to add or redistribute resources
Tune SQL with full scans plans or heavy physical access.
V$BH can definitely be of great interest even on non-OPS systems if you want to know which blocks of which objects are currently in your buffer cache and what is happening to them.
Verify that Operating System and RAC Best Practices were applied.
Storage
RAID 1 is the better choice for performance. RAID 5 is known to be slower for writes, because of the CPU overhead that is required for each write.
Measure the I/O frequently, and be sure your storage can handle the database requests.
Conclusions
The integration between the development group and database administration is essential to design robust applications.
Application persistence should be carefully considered.
Although most applications will run on RAC without modifications, many changes could be applied to get better performance.
RAC performance depends on the application.
The Global Cache Service is highly dependent on data block usage.
Applications should minimize the sharing of blocks between the instances.
Minimizing the interconnect traffic maximizes platform performance and scalability.
In a database cluster environment, badly tuned SQL will not run better.
Serializing contention makes applications less scalable.
Glossary
RAC: Real Application Clusters. RAC is a shared disk clustered database. Every instance in the cluster has equal access to the database's data on disk.
GCS: The Global Cache Service is the controlling process that implements Cache Fusion. It maintains the block mode for blocks in the global role. It is responsible for block transfers between instances. The Global Cache Service employs various background processes such as the Global Cache Service Processes (LMSn) and Global Enqueue Service Daemon (LMD).
Global Enqueue Service: The Global Enqueue Service Daemon (LMD) is the resource agent process that manages Global Enqueue Service (GES) resource requests. The LMD process also handles deadlock detection for Global Enqueue Service (GES) requests. Remote resource requests are requests originating from another instance.
LMSn: Oracle process that provides inter-instance resource management.
Hotblocks: Most accessed blocks in the buffer cache.
OPS: Oracle Parallel Server. Old version of the Oracle database system designed for massively parallel processors (pre-RAC).
OS: Operating System.
LUN: Logical Unit Number.
Hyperthreading: Hyper-Threading technology is a technique which enables a single CPU to act like multiple CPUs.
RAID: RAID, an acronym for redundant array of independent disks or redundant array of inexpensive disks, is a technology that provides increased storage reliability through redundancy, combining multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.
AWR: Automatic Workload Repository (available from Oracle 10g and above). The AWR is used to collect performance statistics.
Oracle waits: When Oracle executes an SQL statement, it is not constantly executing. Sometimes it has to wait for a specific event to happen before it can proceed.
About the author - Summary
Twelve+ years working in Oracle Consulting. Extensive training and field experience defining and implementing technology strategies at key companies in Latin America. Advanced knowledge of all RAC versions from 9.0.x to 11gR2, leading whole projects. Technical and managerial leadership specializing in infrastructure and high availability.
Personal Blog Site: http://jose-valerio.com.ar
eMail: [email protected]
About Technical Reviewers
Pablo Albeck - Oracle Practice Manager at Oracle Corporation, with several years of experience in performance tuning complex systems. Pablo enjoys exploring the world of databases and the process of tuning.
Juan Carserta - Oracle Technology Specialist. Juan was very involved in RAC performance, being one of the most experienced professionals in high availability in LAD.
Acknowledgements
This simple white paper is dedicated to all those in pursuit of extraordinary performance. I really appreciate the opinions, ideas and concepts proposed by my friends and colleagues. In fact I used many of them, so thank you very much.
Jorge Teodoro - Sr. Engineer Technologist
Reinaldo González - Oracle Database Specialist
Fernando Sciaccaluga - Oracle Technology Specialist
Rosa Zahora - Oracle Database Specialist
Marcelo Ochoa - Oracle Developer Specialist, Oracle ACE
Erik Peterson - Oracle Director, Reviewer
Bibliography
Oracle Documentation - http://www.oracle.com/technology/documentation/index.html
Pro Oracle Database 10g RAC on Linux - Julian Dyke & Steve Shaw
Pro Oracle Database 11g RAC on Linux - Julian Dyke, Martin Bach & Steve Shaw
Julian Dyke - http://www.juliandyke.com
Tom Kyte - http://asktom.oracle.com
Oracle Performance Survival Guide - Guy Harrison
Personal experience in field - http://Jose-valerio.com.ar
OOW 2006, "RAC Performance Experts Reveal All" (public) - Michael Zoll and Barb Lundhild. Portions of this WP were inspired by this presentation. Thanks.