S311441 Practical Performance Management for Oracle Real Application Clusters
Michael Zoll, Consulting Member of Technical Staff
Barb Lundhild, Product Manager, Oracle Real Application Clusters
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
• Oracle RAC Infrastructure and Technical Fundamentals
• Application and Database Design
• Common Problems and Symptoms
• Diagnostics and Problem Determination
• Appendix
Objective
• Convey a few simple and fundamental concepts of Oracle RAC performance
• Summarize application level performance and scalability information
• Provide some simple sizing hints
• Give an exemplary overview of common problems and solutions
• Builds on a similar presentation from OOW 2008: http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
Oracle RAC Infrastructure: Technical Fundamentals, Sizing and Configuration
Oracle RAC Architecture

[Architecture diagram: n nodes (Node 1 ... Node n), each running the operating system, Oracle Clusterware, a database instance, ASM, a VIP, a Listener, and a SCAN_Listener, and offering Services over the public network. All nodes attach to shared storage managed by ASM, which holds the database and control files, the redo and archive logs of all instances, and the OCR and voting disks.]
Global Cache and Global Enqueue Service: Processes and Functions

[Diagram: each instance's SGA contains the buffer cache, library cache, dictionary cache, and log buffer, served by LGWR and DBW0. The Global Cache Service (LMSx processes) and Global Enqueue Service (LMON, LMD0) maintain the Global Resource Directory and communicate with Oracle processes on other instances over the cluster's private high-speed network. LMSx runs in real-time priority.]
Global Cache Access

[Message-flow diagram: (1) a shadow process sends a block request, (2) the LMS process on the holding instance receives it, (3)-(4) if the buffer contains uncommitted changes, LMS posts LGWR to flush the redo, (5) LGWR posts back on write completion, (6) LMS sends the block from its buffer cache. Immediate direct sends account for > 96% of transfers; log write and send for < 4%.]
Basic Performance Facts

• Global cache access takes 100 - 500 usecs (roundtrip)
– Data is immediately served from remote instances via the private, high-speed interconnect
– Redo may have to be written to the log file before the send if the data was changed and has not been committed yet
– Performance varies with network infrastructure and network protocol
• Maximum is 3 network hops (messages)
– For clusters with more than 2 nodes, independent of total cluster size
• CPU cost per OLTP transaction
– Depends on locality of access, i.e. messages per tx
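The locality of access behind this CPU cost can be sampled from the global cache statistics; a minimal sketch, assuming the statistic names exposed in 10g/11g (check V$STATNAME on your release):

```sql
-- Global cache blocks received versus commits: a rough measure of
-- inter-instance messaging per transaction on this instance.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('gc cr blocks received',
                'gc current blocks received',
                'user commits');
```

Dividing the block counts by 'user commits' gives an approximate messages-per-transaction figure to track over time.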
Basic Performance Facts: Latency (UDP/GbE and RDS/IB)

Roundtrip time (ms) by block size:

Protocol   2K     4K     8K     16K
UDP/GE     0.30   0.31   0.36   0.46
RDS/IB     0.12   0.13   0.16   0.20

• RDS/IB also has lower CPU cost relative to other protocols and network infrastructure
• Actual interconnect latency is generally not the problem unless you have exceeded capacity or you are experiencing errors
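Observed latency can be compared against the baseline table above using the cumulative global cache statistics; a minimal sketch, assuming the 10g/11g statistic names, where the receive time is reported in centiseconds:

```sql
-- Average CR block receive latency in milliseconds
-- ('gc cr block receive time' is in centiseconds, hence the *10).
SELECT 10 * t.value / NULLIF(c.value, 0) AS avg_cr_receive_ms
FROM   v$sysstat t, v$sysstat c
WHERE  t.name = 'gc cr block receive time'
AND    c.name = 'gc cr blocks received';
```

A result well above the baseline for your protocol and block size suggests capacity or configuration problems rather than normal global cache behavior.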
Private Interconnect

• The network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes
• Large (Jumbo) Frames are recommended for GbE
– Avoids fragmentation and reassembly (8K / 1500 MTU = 6 fragments)
• Interconnect bandwidth should be tested with non-Oracle utilities (e.g. iperf)
– No packet loss at 75% - 80% of bandwidth utilization
Interconnect Bandwidth

• Generally, 1Gb/sec is sufficient for performance and scalability in OLTP
• DSS/DW systems should categorically be designed with > 1Gb/sec capacity
• Prediction of interconnect traffic is difficult
– Depends on transaction instruction length per message
– Empirical rule of thumb: 1Gb/sec per 32 CPU cores
• Infiniband and 10GbE are supported for scale-out
Performance and Scalability of Applications and Database Design with RAC
General Scalability

• DML-intensive OLTP workloads
– Scale well if contention is low and database/working set size scales (i.e. add a node when demand grows)
• Read-intensive workloads scale predictably and linearly
– Bigger cache when adding more nodes
– Faster read access to the global cache than to disk, less disk IO
• If cluster size and database size growth are balanced, the system will perform and scale well
Performance and Scaling in Application and Database Design

Response Time Impact
• Index contention on INSERTs when the index is right-growing
– System-generated "artificial" keys such as consecutive order numbers, or "natural" keys such as dates
• UPDATEs or DELETEs of rows in a small working set
– Session logging and tracking
– First-in first-out queues
– State of messages in queues
• Bulk INSERTs of large amounts of data
– LOBs
DML Contention and Serialization

[Diagram: INSERT INTO I with a sequence-generated key concentrates inserts from all instances into the right-most "busy blocks" of index I; UPDATE T SET ... WHERE the row falls into blocks [1..n], with n a small number, concentrates changes into a few "busy blocks" of table T. Modification-intensive operations on a small set of (cached) blocks cause contention.]
Performance and Scaling in Application and Database Design

CPU Cost due to Inter-Instance Messaging and Non-Linear Scaling
• In-memory databases
– Working set spans multiple buffer caches
– Frequent modifications and reads of recent modifications
• Working set fits into the memory of one instance
– Locality of access worsens when nodes are added and users are load balanced
• Scales as long as sufficient CPU power is available
Read-intensive

[Diagram: two instances, each with a 32GB buffer cache, over a 64GB working set on disk. Eventually all blocks are cached, giving a larger effective read cache; reads are then satisfied by cache transfer instead of disk transfer. In 11g, no interconnect messages are needed for this read-mostly case.]
Performance and Scalability

• Good linear or near-linear scaling out of the box for IO- and CPU-intensive applications with large working sets and low proximity of access
– Self-service web applications (shopping carts etc.)
– CRM
– Document storage and retrieval
– Business analytics and data warehousing
Performance and Scalability

• Partitioning or load direction may optimize performance for workloads with high proximity of access, e.g. adding and removing from message queues
– Advanced Queuing and Workflow
• Batch and bulk processes
– Order processing and inventory
– Payroll processing
Identifying Performance and Scaling Bottlenecks in Database Design

The Golden Rules:
• #1: For a first approximation, disregard read-mostly objects and focus on the INSERT-, UPDATE- and DELETE-intensive indexes and tablespaces
• #2: If DML access to data is random, there are no worries as long as CPU is not an issue
• #3: Standard SQL and schema tuning solves > 80% of performance problems. There are usually only a few problem SQL statements and tables.
• #4: Almost everything can be scaled out quickly with load direction and load balancing
Identifying Performance and Scaling Bottlenecks in Database Design

• Look for indexes with right-growing characteristics
– Keys comprising DATE columns or keys generated by sequence numbers
• Find frequent updates of "small" and compact tables
– "small" = fits into a single buffer cache
• Identify frequently and concurrently modified LOBs
HOW?

• Look at segment and SQL statistics in the Automatic Workload Repository
• Use Oracle Enterprise Manager Access Advisories and the Automatic Database Diagnostics Monitor (ADDM)
• Instrumentation with MODULE and ACTION helps identify and quantify components of the workload
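The AWR segment-statistics approach can be sketched as a query against the DBA_HIST views; a minimal sketch, assuming the 11g column names of DBA_HIST_SEG_STAT and DBA_HIST_SEG_STAT_OBJ and the snapshot range bound via :begin_snap/:end_snap:

```sql
-- Segments with the most global cache buffer busy activity between
-- two AWR snapshots: candidates for the contention patterns above.
SELECT   o.owner, o.object_name, o.object_type,
         SUM(s.gc_buffer_busy_delta)        AS gc_buffer_busy,
         SUM(s.gc_cr_blocks_received_delta
           + s.gc_cu_blocks_received_delta) AS gc_blocks_received
FROM     dba_hist_seg_stat s
JOIN     dba_hist_seg_stat_obj o
         ON  o.dbid     = s.dbid
         AND o.ts#      = s.ts#
         AND o.obj#     = s.obj#
         AND o.dataobj# = s.dataobj#
WHERE    s.snap_id BETWEEN :begin_snap AND :end_snap
GROUP BY o.owner, o.object_name, o.object_type
ORDER BY gc_buffer_busy DESC;
```

The top rows of this query correspond to the "Segments by Global Cache Buffer Busy" section of the AWR report.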
Quick Fixes (Without Modifying the Application)

• Indexes with right-growing characteristics
– Cache sequence numbers per instance
– Hash or range partition the table with LOCAL indexes
• Frequent updates of "small" and compact tables
– Reduce block size (2K) and row density of blocks (PCTFREE 99)
• Frequently modified LOBs
– Hash partitions (128 - 256)
– FREEPOOLS
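The first two index fixes can be sketched in DDL; the table, column, and sequence names here are hypothetical:

```sql
-- Cache sequence numbers per instance: each instance pre-allocates its
-- own range of 1000 values, so inserts on different nodes land in
-- different index leaf blocks. NOORDER matters in RAC.
CREATE SEQUENCE order_seq CACHE 1000 NOORDER;

-- Hash partitioning spreads a right-growing key across many segments,
-- diluting the "busy block" at the right edge of the index.
CREATE TABLE orders (
  order_id   NUMBER NOT NULL,
  order_date DATE,
  payload    VARCHAR2(4000)
)
PARTITION BY HASH (order_id) PARTITIONS 64;

-- A LOCAL index is partitioned the same way as the table, so each
-- partition has its own insertion point.
CREATE UNIQUE INDEX orders_pk ON orders (order_id) LOCAL;
```

Partition counts are typically powers of two; 64 here is illustrative, not a recommendation.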
Quick Fixes

• Application modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster-managed services
– For Administrator Managed databases and older releases, create the service with 1 preferred node and the rest available
– For Policy Managed databases, use a singleton service
• Some large-scale and high-performance applications may be optimized by data partitioning (range, hash, or composite) and routing per partitioning key in the application server tier
– E.g. hash by CLIENT_ID, REGION etc.
Leverage Connection Pools (UCP): Load Balancing and Affinity

[Diagram: the application's UCP connection pool receives load advisory messages from the RAC database instances ("I'm busy", "I'm very busy", "I'm idle") and distributes work accordingly across Instance1, Instance2, and Instance3 (e.g. 30% / 60% / 10% of the work).]
Performance and Scalability Enhancements in 11.1 and 11.2

• Read Mostly
– An automatic policy detects read- and disk IO-intensive tables
– No interconnect messages when the policy kicks in -> CPU savings
• Direct reads for large (serial and parallel) scans
– No locks, no buffer cache contention
– Good when the IO subsystem is fast or IO processing is offloaded to storage caches or servers (e.g. Exadata)
• Fusion Compression
– Reduces message sizes and therefore CPU cost
• Dynamic policies to trade off disk IO against global cache transfers
Performance Diagnostics and Checks: Metrics and Method
Normal Behaviour

• It is normal to see time consumed in
– CPU
– db file sequential/scattered read
– direct read
– gc cr/current block 2-way/3-way (transfer from a remote cache)
– gc cr/current grant 2-way (correlates with buffered disk IOs)
• Average latencies should be within baseline parameters
• Most problems boil down to CPU, IO, or network capacity, or to application issues
Normality, Baselines and Significance

Event                      Waits      Time(s)  Avg(ms)  %Time
db file sequential read    2,627,295  21,808   8        43.2   <- most significant response time component
CPU time                              9,156             18.2
gc current block 3-way     3,289,371  4,019    1        8.0  \
gc buffer busy acquire     373,777    3,272    9        6.5   } 20.8%
gc current block 2-way     3,982,284  3,192    1        6.3  /
gc current block busy      125,595    2,931    4

The 2-way/3-way transfers average < 1 ms; the "busy" events indicate contention.
GC waits are influenced by interconnect or remote effects which are not always obvious.
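To see which of these events dominate right now, rather than over an AWR window, the active sessions can be inspected directly; a minimal sketch using the GV$ views:

```sql
-- Active sessions currently waiting on Cluster-class events,
-- grouped per instance across the whole cluster.
SELECT   inst_id, event, COUNT(*) AS sessions_waiting
FROM     gv$session
WHERE    wait_class = 'Cluster'
AND      status     = 'ACTIVE'
GROUP BY inst_id, event
ORDER BY sessions_waiting DESC;
```

A burst of "busy" or "congested" events in this output points to the contention and LMS scenarios discussed in the following slides.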
Distributed Cause and Effect

[Diagram: a root cause on Node 1 (disk capacity exceeded, disk or controller bottleneck) surfaces as global cache wait symptoms on Node 2.]
Example: Cluster-wide Impact of a Log File IO Problem

Global metrics view: WORKLOAD REPOSITORY reports for instance OOW8 (host oowdb8) and instance OOW4 (host oowdb4).

Local symptom (investigate serialization):
  gc buffer busy          9 ms
  gc current block busy   4 ms

Waits for remote instances - Global Cache Transfer Stats:
  Inst  Block Class  Blocks Received  % Immed  % Busy
  4     data block   114,426          95.9     24.1    <- Not OK!
  7     data block   162,630          76.6     23.4

Cause on the remote instance (log file IO):
  gc current block busy:   23 ms
  log file parallel write: 20 ms
  Avg global cache current block flush time (ms): 21 (vs. a 3.7 ms baseline)
Example: Segment Statistics

Segments by Global Cache Buffer Busy:   ES_BILLING TABLE  97.41%
Segments by Current Blocks Received:    ES_BILLING TABLE  85.81%

ANALYSIS: Table ES_BILLING is frequently read and modified on all nodes. The majority of global cache accesses and serialization can be attributed to it.
Comprehensive Cluster-wide Analysis via Global ADDM
Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability
Common Problems and Symptoms
• Interconnect or switch problems
• Slow or bottlenecked disks
• High log file sync latency
• System load and scheduling
Symptoms of Interconnect Problems

Root causes (capacity limit, congestion, dropped packets) show up as high latencies and serialization:

Top 5 Timed Events
Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
log file sync     286,038  49,872   174           41.7              Commit
gc buffer busy    177,315  29,021   164           24.3              Cluster
gc cr block busy  110,348  5,703    52            4.8               Cluster
gc cr block lost  4,272    4,953    1159          4.1               Cluster
cr request retry  6,316    4,668    739           3.9               Other
Symptoms of an Interconnect Problem

"gc cr block lost" should never appear here: lost blocks are always a severe performance problem.
Interconnect or IPC Problems

[Diagram: the path from Oracle down through protocol processing (IP, UDP), socket queues, device drivers, NICs and ports to the switch, with the diagnostics to check at each layer:]

• Oracle: "gc blocks lost" statistics
• netstat -s (protocol processing):
  – UDP: packet receive errors, socket buffer overflows
  – IP: 1201 fragments dropped after timeout, 467 reassembly failures, 468 incoming packets discarded
• ifconfig -a (device drivers / NICs):
  – TX errors: 135, dropped, overruns; RX errors: 0, dropped: 27, overruns
Cluster-wide Impact of a Database File IO Problem

[Diagram: a root cause on Node 1 (disk capacity exceeded, disk or controller bottleneck, IO-intensive queries) impacts Node 2.]
Cluster-Wide Disk I/O Impact

Top 5 Timed Events (Node 1)
Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time
log file sync     286,038  49,872   174           41.7
gc buffer busy    177,315  29,021   164           24.3
gc cr block busy  110,348  5,703    52            4.8

Load Profile (Node 2)   Per Second
Redo size:              40,982.21
Logical reads:          81,652.41
Physical reads:         51,193.37

CAUSE: an expensive query on Node 2 causes an IO bottleneck:
1. IO on the disk group containing the redo logs is slow
2. Block shipping for frequently modified blocks is delayed by the log flush IO
3. Serialization builds up
Log File Sync Latency: Causes and Symptoms

Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery

Causes of High Commit Latency

• Symptom of slow log writes
– An I/O service time spike may last only seconds or minutes
– Threshold-based warning message in the LGWR trace file:
  • "Warning: log write elapsed time xx ms, size xxKB"
  • Dumped when write latency >= 500ms
– A large log_buffer makes a bad situation worse
• Fixes
– Smooth out log file IO on the primary system and the standby redo apply I/O pattern
– Primary and standby storage subsystems should be configured for peaks
– Apply the bug fixes in the appendix
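One way to confirm slow or spiky log writes is the wait-event histogram; a minimal sketch against V$EVENT_HISTOGRAM (available since 10g):

```sql
-- Latency distribution of LGWR's writes. A healthy system concentrates
-- in the low-millisecond buckets; counts in the buckets at or above
-- 512 ms line up with the LGWR warning threshold mentioned above.
SELECT   wait_time_milli, wait_count
FROM     v$event_histogram
WHERE    event = 'log file parallel write'
ORDER BY wait_time_milli;
```

Because the view is cumulative since instance startup, comparing two samples taken minutes apart isolates a transient spike.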
Block Server Process Busy or Starved

[Diagram: a root cause on Node 1 (too few LMS processes, LMS not running at high priority, memory problems/swapping) impacts Node 2.]
Block Server Process Busy or Starved

Top 5 Timed Events
Event                       Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
gc cr grant congested       26,146   28,761   1100          39.1              Cluster
gc current block congested  13,237   13,703   1035          18.6              Cluster
gc cr grant 2-way           340,281  12,810   38            17.4              Cluster
gc current block 2-way      119,098  4,276    36            5.8               Cluster
gc buffer busy              8,109    3,460    427           4.7               Cluster

"Congested": LMS could not dequeue messages fast enough.
On the remote node: Avg message sent queue time (ms): 16.1
Block Server Processes Busy

• Increase the number of LMS processes based on
– Occurrence of "congested" wait events
– Heuristics: 75 - 80% busy is ok
– Avg send queue time > 1ms
• Caveat: the number of CPUs should always be >= the number of LMS processes to avoid starvation
• On NUMA and CMT architectures
– Bind LMS to a NUMA board or to cores in a processor set
– Fence off hardware interrupts from the processor sets
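The LMS count is set by the GCS_SERVER_PROCESSES initialization parameter, which is static; a sketch of checking and raising it (the value 4 is illustrative, and it should stay below the CPU count per the caveat above):

```sql
-- Current number of LMS (global cache server) processes.
SELECT value
FROM   v$parameter
WHERE  name = 'gcs_server_processes';

-- Raise it cluster-wide for the next restart; the parameter is static,
-- so SCOPE=SPFILE is required.
ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';
```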
High Latencies in Global Cache

[Diagram: root causes of high global cache latencies.]
Transient Problems and Hangs

Temporary Slowness and Hangs

• Can affect one or more instances in the cluster
• Can be related to
– IO issues at log switch time (checkpoint or archiver slow)
– A process stuck waiting for IO
– A connection storm
• Hard to establish causality with AWR statistics
• Use Oracle Enterprise Manager and Active Session History
Temporary Cluster Wait Spike

[Screenshot: spike in global cache response time; SQL with high global cache wait time]

Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability

Temporary Slowness or Hang

[Screenshot: ASH report for a slowdown from 5:00-5:30, generated with $ORACLE_HOME/rdbms/admin/ashrpt.sql]
Additional Diagnostics

• For all slowdowns with high averages in gc wait time
– Active Session History report (all nodes)
– Set event 10708 on selected processes:
  • event 10708 trace name context forever, level 7
  • Collect the trace files
– Set event 10899 system-wide
  • Threshold-based, i.e. no cost
– Continuous OS statistics
  • Cluster Health Monitor (IPD/OS)
– LMS, LMD, LGWR trace files
– DIA0 trace files
  • Hang analysis
Conclusions

Golden Rules for Performance and Scalability in Oracle RAC

• Thorough configuration and testing of the infrastructure is the basis for stable performance
• Anticipating application and database bottlenecks and their possibly magnified impact in Oracle RAC is relatively simple
• Enterprise Manager provides monitoring and quick diagnosis of cluster-wide issues
• Basic intuitive and empirical guidelines to approach performance problems suffice for all practical purposes
Q U E S T I O N S   &   A N S W E R S
http://otn.oracle.com/rac
Recommended Sessions

DATE and TIME                      SESSION
Tuesday, October 13, 1:00 PM       Next Generation Database Grid - Moscone South 104
Tuesday, October 13, 2:30 PM       Single Instance Oracle Real Application Clusters - Better Virtualization for Databases - Moscone South 300
Wednesday, October 14, 11:45 AM    Understanding Oracle Real Application Clusters Internals - Moscone South 104
Thursday, October 15, 9:00 AM      Oracle ACFS: The Awaited Missing Feature - Moscone South 305
Visit us in the Moscone West Demogrounds Booth W-037
Appendix
References
• http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
• http://otn.oracle.com/rac